Deconvoluting spatialATAC data with Cell2Location

Deconvoluting spatialATAC data with Cell2Location#

In this tutorial, we go over how to run Cell2Location with deconvATAC.

Import libraries#

[1]:
import warnings
warnings.filterwarnings('ignore')
from deconvatac.tl import cell2location
from deconvatac.pp import highly_variable_peaks
import scanpy as sc
import pandas as pd
import numpy as np

Read in the data#

We will use the human melanoma dataset by Russell et al.. The data is available for download on Zenodo in example_notebooks.zip.

[ ]:
russell_st = sc.read_h5ad("./example_notebooks/cell2location/russell_250_atac.h5ad")
russell_st
AnnData object with n_obs × n_vars = 360 × 53451
    obs: 'cell_count'
    var: 'features'
    uns: 'log1p', 'proportion_names'
    obsm: 'proportions', 'spatial'
    layers: 'log_norm', 'tfidf_normalized'
[ ]:
russell_sc = sc.read_h5ad("./example_notebooks/cell2location/russel_ref_atac.h5ad")
russell_sc
AnnData object with n_obs × n_vars = 2535 × 53451
    obs: 'barcodes', 'NAME_x', 'X', 'Y', 'cell_type', 'NAME_y', 'biosample_id', 'donor_id', 'species', 'species__ontology_label', 'disease', 'disease__ontology_label', 'organ', 'organ__ontology_label', 'library_preparation_protocol', 'library_preparation_protocol__ontology_label', 'sex', 'cluster'
    var: 'features'
    uns: 'cell_type_colors', 'log1p'
    obsm: 'X_umap', 'spatial'
    layers: 'log_norm', 'tfidf_normalized'
[4]:
# Calculate the most probable cluster assignment
max_prob_cluster = np.argmax(russell_st.obsm["proportions"], axis=1)
# Get the column names
cluster_id = russell_st.uns["proportion_names"]
russell_st.obs["ground truth"] = cluster_id[max_prob_cluster]
russell_st.obs['ground truth'] = pd.Categorical(russell_st.obs['ground truth'], categories=russell_sc.obs.cell_type.cat.categories)
russell_st.uns["ground truth_colors"] = russell_sc.uns["cell_type_colors"].copy()
sc.pl.embedding(russell_st, basis = "spatial", color = "ground truth")
sc.pl.embedding(russell_sc, basis = "spatial", color = "cell_type")
../_images/notebooks_run_cell2location_russell_8_0.png
../_images/notebooks_run_cell2location_russell_8_1.png

Select HVFs with deconvATAC#

Before running deconvolution, the feature space should be subsetted onto a smaller space. Since the method of highly variable feature selection shows greater performance than the highly accessible feature approach, we will select features using the deconvatac.pp.highly_variable_peaks function.

The deconvatac.pp.highly_variable_peaks function selects the highly variable features in place and saves the result to .var["highly_variable"]. It takes as input the key in .obs of the clusters by which the method identifies highly variable peaks across. In our case, the cluster_key is the cell type.

[26]:
highly_variable_peaks(adata=russell_sc, cluster_key="cell_type")
[27]:
russell_st = russell_st[:, russell_sc.var["highly_variable"]]
russell_sc = russell_sc[:, russell_sc.var["highly_variable"]]

Run Cell2Location#

[16]:
russell_st, russell_sc = cell2location(adata_spatial=russell_st, adata_ref=russell_sc, N_cells_per_location=8, detection_alpha=20
              , labels_key="cell_type", use_gpu=False, max_epochs_spatial=400, max_epochs_ref=400, return_adatas=True)
Anndata setup with scvi-tools version 1.0.3.
Setup via `RegressionModel.setup_anndata` with arguments:
{
'layer': None,
'batch_key': None,
'labels_key': 'cell_type',
'categorical_covariate_keys': None,
'continuous_covariate_keys': None
}
         Summary Statistics         
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃     Summary Stat Key      Value ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩
│         n_batch             1   │
│         n_cells           2535  │
│ n_extra_categorical_covs    0   │
│ n_extra_continuous_covs     0   │
│         n_labels           10   │
│          n_vars           20000 │
└──────────────────────────┴───────┘
               Data Registry                
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Registry Key     scvi-tools Location    ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│      X                 adata.X          │
│    batch      adata.obs['_scvi_batch']  │
│    ind_x        adata.obs['_indices']   │
│    labels     adata.obs['_scvi_labels'] │
└──────────────┴───────────────────────────┘
                     batch State Registry                      
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓
┃     Source Location       Categories  scvi-tools Encoding ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩
│ adata.obs['_scvi_batch']      0                0          │
└──────────────────────────┴────────────┴─────────────────────┘
                    labels State Registry                    
┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓
┃    Source Location      Categories  scvi-tools Encoding ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩
│ adata.obs['cell_type']    T_CD4              0          │
│                           T_CD8              1          │
│                           T_reg              2          │
│                            mDC               3          │
│                          mono-mac            4          │
│                          myeloid             5          │
│                            pDC               6          │
│                           plasma             7          │
│                          tumour_1            8          │
│                          tumour_2            9          │
└────────────────────────┴────────────┴─────────────────────┘
/vol/storage/miniconda3/envs/cell2loc_env/lib/python3.9/site-packages/scvi/train/_trainrunner.py:76: UserWarning: `use_gpu` is deprecated in v1.0 and will be removed in v1.1. Please use `accelerator` and `devices` instead.
  accelerator, lightning_devices, device = parse_device_args(
/vol/storage/miniconda3/envs/cell2loc_env/lib/python3.9/site-packages/lightning/fabric/plugins/environments/slurm.py:168: PossibleUserWarning: The `srun` command is available on your system but is not used. HINT: If your intention is to run Lightning on SLURM, prepend your python command with `srun` like so: srun python /vol/storage/miniconda3/envs/cell2loc_env/lib/python ...
  rank_zero_warn(
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/vol/storage/miniconda3/envs/cell2loc_env/lib/python3.9/site-packages/lightning/fabric/plugins/environments/slurm.py:168: PossibleUserWarning: The `srun` command is available on your system but is not used. HINT: If your intention is to run Lightning on SLURM, prepend your python command with `srun` like so: srun python /vol/storage/miniconda3/envs/cell2loc_env/lib/python ...
  rank_zero_warn(
/vol/storage/miniconda3/envs/cell2loc_env/lib/python3.9/site-packages/lightning/pytorch/trainer/configuration_validator.py:69: UserWarning: You passed in a `val_dataloader` but have no `validation_step`. Skipping val loop.
  rank_zero_warn("You passed in a `val_dataloader` but have no `validation_step`. Skipping val loop.")
/vol/storage/miniconda3/envs/cell2loc_env/lib/python3.9/site-packages/lightning/pytorch/loops/fit_loop.py:281: PossibleUserWarning: The number of training batches (2) is smaller than the logging interval Trainer(log_every_n_steps=10). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
  rank_zero_warn(
Epoch 400/400: 100%|██████████| 400/400 [16:34<00:00,  2.57s/it, v_num=1, elbo_train=8.85e+6]
`Trainer.fit` stopped: `max_epochs=400` reached.
Epoch 400/400: 100%|██████████| 400/400 [16:34<00:00,  2.49s/it, v_num=1, elbo_train=8.85e+6]
/vol/storage/miniconda3/envs/cell2loc_env/lib/python3.9/site-packages/lightning/fabric/plugins/environments/slurm.py:168: PossibleUserWarning: The `srun` command is available on your system but is not used. HINT: If your intention is to run Lightning on SLURM, prepend your python command with `srun` like so: srun python /vol/storage/miniconda3/envs/cell2loc_env/lib/python ...
  rank_zero_warn(
Sampling local variables, batch:   0%|          | 0/20 [00:00<?, ?it/s]
Sampling global variables, sample: 100%|██████████| 999/999 [00:18<00:00, 55.49it/s]
../_images/notebooks_run_cell2location_russell_15_15.png
../_images/notebooks_run_cell2location_russell_15_16.png
Anndata setup with scvi-tools version 1.0.3.
Setup via `Cell2location.setup_anndata` with arguments:
{
'layer': None,
'batch_key': None,
'labels_key': None,
'categorical_covariate_keys': None,
'continuous_covariate_keys': None
}
         Summary Statistics         
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃     Summary Stat Key      Value ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩
│         n_batch             1   │
│         n_cells            360  │
│ n_extra_categorical_covs    0   │
│ n_extra_continuous_covs     0   │
│         n_labels            1   │
│          n_vars           20000 │
└──────────────────────────┴───────┘
               Data Registry                
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Registry Key     scvi-tools Location    ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│      X                 adata.X          │
│    batch      adata.obs['_scvi_batch']  │
│    ind_x        adata.obs['_indices']   │
│    labels     adata.obs['_scvi_labels'] │
└──────────────┴───────────────────────────┘
                     batch State Registry                      
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓
┃     Source Location       Categories  scvi-tools Encoding ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩
│ adata.obs['_scvi_batch']      0                0          │
└──────────────────────────┴────────────┴─────────────────────┘
                     labels State Registry                      
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓
┃      Source Location       Categories  scvi-tools Encoding ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩
│ adata.obs['_scvi_labels']      0                0          │
└───────────────────────────┴────────────┴─────────────────────┘
/vol/storage/miniconda3/envs/cell2loc_env/lib/python3.9/site-packages/lightning/fabric/plugins/environments/slurm.py:168: PossibleUserWarning: The `srun` command is available on your system but is not used. HINT: If your intention is to run Lightning on SLURM, prepend your python command with `srun` like so: srun python /vol/storage/miniconda3/envs/cell2loc_env/lib/python ...
  rank_zero_warn(
/vol/storage/miniconda3/envs/cell2loc_env/lib/python3.9/site-packages/scvi/train/_trainrunner.py:76: UserWarning: `use_gpu` is deprecated in v1.0 and will be removed in v1.1. Please use `accelerator` and `devices` instead.
  accelerator, lightning_devices, device = parse_device_args(
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/vol/storage/miniconda3/envs/cell2loc_env/lib/python3.9/site-packages/lightning/fabric/plugins/environments/slurm.py:168: PossibleUserWarning: The `srun` command is available on your system but is not used. HINT: If your intention is to run Lightning on SLURM, prepend your python command with `srun` like so: srun python /vol/storage/miniconda3/envs/cell2loc_env/lib/python ...
  rank_zero_warn(
/vol/storage/miniconda3/envs/cell2loc_env/lib/python3.9/site-packages/lightning/pytorch/trainer/configuration_validator.py:69: UserWarning: You passed in a `val_dataloader` but have no `validation_step`. Skipping val loop.
  rank_zero_warn("You passed in a `val_dataloader` but have no `validation_step`. Skipping val loop.")
/vol/storage/miniconda3/envs/cell2loc_env/lib/python3.9/site-packages/lightning/pytorch/loops/fit_loop.py:281: PossibleUserWarning: The number of training batches (1) is smaller than the logging interval Trainer(log_every_n_steps=10). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
  rank_zero_warn(
Epoch 400/400: 100%|██████████| 400/400 [01:12<00:00,  6.39it/s, v_num=1, elbo_train=4.14e+6]
`Trainer.fit` stopped: `max_epochs=400` reached.
Epoch 400/400: 100%|██████████| 400/400 [01:12<00:00,  5.50it/s, v_num=1, elbo_train=4.14e+6]
/vol/storage/miniconda3/envs/cell2loc_env/lib/python3.9/site-packages/lightning/fabric/plugins/environments/slurm.py:168: PossibleUserWarning: The `srun` command is available on your system but is not used. HINT: If your intention is to run Lightning on SLURM, prepend your python command with `srun` like so: srun python /vol/storage/miniconda3/envs/cell2loc_env/lib/python ...
  rank_zero_warn(
Sampling local variables, batch: 100%|██████████| 3/3 [01:09<00:00, 23.07s/it]
Sampling global variables, sample: 100%|██████████| 999/999 [00:20<00:00, 48.87it/s]
../_images/notebooks_run_cell2location_russell_15_32.png

In the returned AnnDatas, we can find the posteriors of the spatial mapping and reference model:

[19]:
russell_st, russell_sc
[19]:
(AnnData object with n_obs × n_vars = 360 × 20000
     obs: 'cell_count', 'leiden_pca', 'leiden_lsi', '_indices', '_scvi_batch', '_scvi_labels'
     var: 'features', 'highly_variable', 'highly_accessible'
     uns: 'leiden', 'log1p', 'lsi', 'neighbors', 'pca', 'proportion_names', '_scvi_uuid', '_scvi_manager_uuid', 'mod'
     obsm: 'X_lsi', 'X_pca', 'proportions', 'spatial', 'means_cell_abundance_w_sf', 'stds_cell_abundance_w_sf', 'q05_cell_abundance_w_sf', 'q95_cell_abundance_w_sf'
     varm: 'LSI', 'PCs'
     layers: 'log_norm', 'tfidf_normalized'
     obsp: 'connectivities', 'distances',
 AnnData object with n_obs × n_vars = 2535 × 20000
     obs: 'barcodes', 'NAME_x', 'X', 'Y', 'cell_type', 'NAME_y', 'biosample_id', 'donor_id', 'species', 'species__ontology_label', 'disease', 'disease__ontology_label', 'organ', 'organ__ontology_label', 'library_preparation_protocol', 'library_preparation_protocol__ontology_label', 'sex', 'cluster', '_indices', '_scvi_batch', '_scvi_labels'
     var: 'features', 'highly_accessible', 'highly_variable'
     uns: 'cell_type_colors', 'log1p', '_scvi_uuid', '_scvi_manager_uuid', 'mod'
     obsm: 'X_umap', 'spatial'
     varm: 'means_per_cluster_mu_fg', 'stds_per_cluster_mu_fg', 'q05_per_cluster_mu_fg', 'q95_per_cluster_mu_fg'
     layers: 'log_norm', 'tfidf_normalized')

The resulting means_cell_abundance_w_sf and q05_cell_abundance_w_sf cell type abundances are also saved to directory ./cell2location_results

Visualize results#

[6]:
russell_st.obs[russell_st.uns["mod"]["factor_names"]] = russell_st.obsm["q05_cell_abundance_w_sf"]
sc.pl.embedding(russell_st,basis="spatial",color=russell_st.uns["mod"]["factor_names"])

../_images/notebooks_run_cell2location_russell_20_0.png
[ ]: