Simulating multi-modal spatial data from dissociated single-cell data

Simulating multi-modal spatial data from dissociated single-cell data#

In this tutorial, we will simulate spatial transcriptomics and spatial ATAC data from a single-cell dataset.

[ ]:
import warnings
warnings.filterwarnings('ignore')
from deconvatac.tl import generate_spatial_data
import muon as mu
import scanpy as sc
import pandas as pd

Load single-cell reference#

We will use single-cell data from human heart tissue by Kanemaru et al. The data is available for download on cellxgene.

[ ]:
mdata = mu.read("./submission_data/data/human_cardiac_niches.h5mu")
mdata
MuData object with n_obs × n_vars = 139835 × 462560
  var:      'highly_variable'
  2 modalities
    atac:   139835 x 429828
      obs:  'sangerID', 'combinedID', 'donor', 'donor_type', 'region', 'region_finest', 'age', 'gender', 'facility', 'cell_or_nuclei', 'modality', 'kit_10x', 'flushed', 'batch_key', 'cell_type', 'cell_state'
      var:  'highly_variable', 'highly_accessible'
      uns:  'log1p'
      layers:       'log_norm', 'tfidf_normalized'
    rna:    139835 x 32732
      obs:  'sangerID', 'combinedID', 'donor', 'donor_type', 'region', 'region_finest', 'age', 'gender', 'facility', 'cell_or_nuclei', 'modality', 'kit_10x', 'flushed', 'n_genes', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'total_counts_ribo', 'pct_counts_ribo', 'scrublet_score', 'scrublet_leiden', 'cluster_scrublet_score', 'doublet_pval', 'doublet_bh_pval', 'batch_key', 'leiden_scVI', 'cell_type', 'cell_state_HCAv1', 'cell_state_scNym', 'cell_state_scNym_confidence', 'cell_state', 'latent_RT_efficiency', 'latent_cell_probability', 'latent_scale', 'n_counts', '_scvi_batch', '_scvi_labels', 'clus20', 'doublet_cls', 'original_or_new', 'batch', 'scANVI_predictions', 'leiden_scArches'
      var:  'gene_name-new', 'gene_name_scRNA-0-original', 'gene_name_snRNA-1-original', 'gene_name_multiome-2-original', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
      uns:  'age_colors', 'cell_or_nuclei_colors', 'cell_state_colors', 'cell_type_colors', 'donor_colors', 'donor_type_colors', 'facility_colors', 'flushed_colors', 'gender_colors', 'hvg', 'kit_10x_colors', 'leiden', 'log1p', 'modality_colors', 'neighbors', 'original_or_new_colors', 'region_colors', 'region_finest_colors', 'scANVI_predictions_colors', 'umap'
      obsm: 'X_scArches', 'X_umap'
      layers:       'log_norm'
      obsp: 'connectivities', 'distances'

Simulate spatial data#

We will simulate two example datasets. In the first dataset, we simulate a homogeneous slide with one zone. The second simulation has four zones, with a different number of cell types in each.

[6]:
# Create parameter dictionary
params = {
    "Heart_homogeneous_1zone": {
        "n_regions": 1,
        "cell_type_number": [3],
        "cell_number_nu": [20],
        "cell_number_mean": [5],
        "region_type": "stripes",
        "num_spots": 1000,
        "balance": "balanced",
    },
    "Heart_heterogeneous_4zones": {
        "n_regions": 4,
        "cell_type_number": [10, 5, 7, 3],
        "cell_number_nu": [20, 20, 20, 20],
        "cell_number_mean": [15, 10, 12, 5],
        "region_type": "circles",
        "num_spots": 1000,
        "balance": "balanced",
    }
}
[ ]:
for key, param in params.items():
    print(key)
    print(param)
    simulation, samples = generate_spatial_data(mdata, cell_type_key="cell_type", **param)
    print(simulation)
    simulation.write(f"./submission_data/example_notebooks/simulation/{key}.h5mu")

Heart_homogeneous_1zone
{'n_regions': 1, 'cell_type_number': [3], 'cell_number_nu': [20], 'cell_number_mean': [5], 'region_type': 'stripes', 'num_spots': 1000, 'balance': 'balanced'}
Region 0: ['Mast cell' 'Mural cell' 'Mesothelial cell'] (mean cells per cluster: 4.485952133194589)
MuData object with n_obs × n_vars = 961 × 462560
  2 modalities
    atac:       961 x 429828
      obs:      'cell_count'
      uns:      'density', 'proportion_names'
      obsm:     'proportions', 'spatial'
    rna:        961 x 32732
      obs:      'cell_count'
      uns:      'density', 'proportion_names'
      obsm:     'proportions', 'spatial'
Heart_heterogeneous_4zones
{'n_regions': 4, 'cell_type_number': [10, 5, 7, 3], 'cell_number_nu': [20, 20, 20, 20], 'cell_number_mean': [15, 10, 12, 5], 'region_type': 'circles', 'num_spots': 1000, 'balance': 'balanced'}
Region 0: ['Mast cell' 'Mural cell' 'Mesothelial cell' 'Lymphoid' 'Neural cell'
 'Ventricular Cardiomyocyte' 'Lymphatic Endothelial cell' 'Fibroblast'
 'Endothelial cell' 'Myeloid'] (mean cells per cluster: 14.666666666666666)
Region 1: ['Adipocyte' 'Atrial Cardiomyocyte' 'Myeloid' 'Neural cell'
 'Ventricular Cardiomyocyte'] (mean cells per cluster: 9.545454545454545)
Region 2: ['Myeloid' 'Lymphoid' 'Atrial Cardiomyocyte' 'Mesothelial cell'
 'Ventricular Cardiomyocyte' 'Lymphatic Endothelial cell' 'Fibroblast'] (mean cells per cluster: 11.40625)
Region 3: ['Fibroblast' 'Myeloid' 'Lymphoid'] (mean cells per cluster: 4.503571428571429)
MuData object with n_obs × n_vars = 961 × 462560
  2 modalities
    atac:       961 x 429828
      obs:      'cell_count'
      uns:      'density', 'proportion_names'
      obsm:     'proportions', 'spatial'
    rna:        961 x 32732
      obs:      'cell_count'
      uns:      'density', 'proportion_names'
      obsm:     'proportions', 'spatial'

Visualize simulated data#

[ ]:
Heart_heterogeneous_4zones = mu.read_h5mu("./example_notebooks/simulation/Heart_heterogeneous_4zones.h5mu")
Heart_homogeneous_1zone = mu.read_h5mu("./example_notebooks/simulation/Heart_homogeneous_1zone.h5mu")
[15]:
# Save most abundant cell type in each spot to .obs
Heart_heterogeneous_4zones["atac"].obs = Heart_heterogeneous_4zones["atac"].obs.reset_index().join(
    pd.DataFrame(Heart_heterogeneous_4zones["atac"].obsm["proportions"], columns=Heart_heterogeneous_4zones["atac"].uns["proportion_names"]).reset_index(drop=True))
Heart_heterogeneous_4zones['atac'].obs['most_abundant'] = Heart_heterogeneous_4zones["atac"].obs.iloc[:, 2:].idxmax(axis=1)

Heart_homogeneous_1zone["atac"].obs = Heart_homogeneous_1zone["atac"].obs.reset_index().join(
    pd.DataFrame(Heart_homogeneous_1zone["atac"].obsm["proportions"], columns=Heart_homogeneous_1zone["atac"].uns["proportion_names"]).reset_index(drop=True))
Heart_homogeneous_1zone['atac'].obs['most_abundant'] = Heart_homogeneous_1zone["atac"].obs.iloc[:, 2:].idxmax(axis=1)
[19]:
sc.pl.embedding(
    Heart_homogeneous_1zone["atac"],
    basis="spatial",
    color=['cell_count', 'most_abundant'],
)
../_images/notebooks_simulate_data_dissociated_13_0.png
[17]:
sc.pl.embedding(
    Heart_heterogeneous_4zones["atac"],
    basis="spatial",
    color=['cell_count', 'most_abundant'],
)
../_images/notebooks_simulate_data_dissociated_14_0.png
[ ]: