preprocessing

stereoAlign.preprocessing.summarize_counts(adata, count_matrix=None, min_genes=20, min_cells=20)

Summarise counts of the given count matrix

This function is useful for quality control. Aggregates counts per cell and per gene as well as mitochondrial fraction.

Parameters

count_matrix:

count matrix, by default uses adata.X

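min_cells:

min_cells parameter of scanpy.pp.filter_genes (minimum number of cells in which a gene must be expressed)

min_genes:

min_genes parameter of scanpy.pp.filter_cells (minimum number of genes that must be expressed in a cell)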

Returns

Adds the following keys to adata.obs:

'n_counts': number of counts per cell (count depth)

'log_counts': np.log of counts per cell

'n_genes': number of counts per gene
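As a rough illustration of the per-cell summaries, the following plain-Python sketch computes them for a tiny made-up count matrix (rows are cells, columns are genes). The helper name summarize and the example data are hypothetical. Here n_genes is taken to mean the number of genes with nonzero counts in each cell, which is an assumption about that key; the library's own implementation may differ.

```python
import math

def summarize(counts):
    """Per-cell QC summaries for a dense count matrix given as a list of rows.

    Hypothetical helper for illustration; not part of stereoAlign.
    """
    summary = []
    for row in counts:
        n_counts = sum(row)  # count depth of the cell
        summary.append({
            "n_counts": n_counts,
            # log of the count depth; an empty cell has no finite log
            "log_counts": math.log(n_counts) if n_counts > 0 else float("-inf"),
            # assumed meaning: genes with at least one count in this cell
            "n_genes": sum(1 for c in row if c > 0),
        })
    return summary

counts = [
    [3, 0, 1, 0],  # cell 0
    [0, 0, 5, 2],  # cell 1
    [1, 1, 1, 1],  # cell 2
]
qc = summarize(counts)
```

Cells with a low n_counts or n_genes are the usual candidates for removal during quality control.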

stereoAlign.preprocessing.norma_log(adata)

Normalisation and log transformation

Parameters:

adata

Returns:

stereoAlign.preprocessing.scale_batch(adata, batch)

Batch-aware scaling of count matrix

Scales counts to a mean of 0 and a standard deviation of 1 using scanpy.pp.scale, applied to each batch separately.

Parameters

adata:

anndata object with normalised and log-transformed counts

batch:

name of the adata.obs column that holds the batch labels

Returns

scaled adata
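To make the idea of batch-aware scaling concrete, here is a minimal sketch for a single gene using only the standard library. The helper scale_per_batch and the example values are hypothetical. The sketch divides by the population standard deviation, which is an assumption and may not match the exact variance estimate used by scanpy.pp.scale.

```python
from collections import defaultdict
from statistics import mean, pstdev

def scale_per_batch(values, batches):
    """Scale one gene's values to mean 0 and standard deviation 1 within each batch.

    Hypothetical helper for illustration; uses the population standard deviation.
    """
    # Group cell indices by their batch label
    idx_by_batch = defaultdict(list)
    for i, b in enumerate(batches):
        idx_by_batch[b].append(i)

    scaled = [0.0] * len(values)
    for idx in idx_by_batch.values():
        batch_vals = [values[i] for i in idx]
        mu = mean(batch_vals)
        sd = pstdev(batch_vals)
        for i in idx:
            # A gene that is constant within a batch ends up at 0 after centring
            scaled[i] = (values[i] - mu) / sd if sd > 0 else 0.0
    return scaled

expr = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
batch = ["a", "a", "a", "b", "b", "b"]
scaled = scale_per_batch(expr, batch)
```

In this example the two batches differ by a factor of ten before scaling and become identical afterwards, which is the effect that per-batch scaling is meant to achieve.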

stereoAlign.preprocessing.hvg_intersect(adata, batch, target_genes=2000, flavor='cell_ranger', n_bins=20, adataOut=False, n_stop=8000, min_genes=500, step_size=1000)

Highly variable gene selection

Legacy approach to HVG selection that uses only the intersection of HVGs across all batches.

Parameters

adata:

anndata object with preprocessed counts

batch:

name of the adata.obs column that holds the batch labels

target_genes:

maximum number of genes (intersection reduces the number of genes)

min_genes:

minimum number of intersection HVGs targeted

step_size:

step size to increase HVG selection per dataset

Returns

list of at most target_genes highly variable genes
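The interplay of target_genes, min_genes and step_size can be sketched as follows. This is one plausible reading of the parameters, not the library's implementation: the helper intersect_hvgs, its input format and the stopping rule tied to n_stop (which is not described in this reference) are all assumptions made for illustration.

```python
def intersect_hvgs(ranked_by_batch, target_genes=3, n_stop=6, min_genes=1, step_size=1):
    """Intersect the top-ranked genes of every batch, widening the per-batch
    selection until the intersection is large enough.

    ranked_by_batch: dict mapping batch name -> genes ordered from most to
    least variable (hypothetical input; a real pipeline would derive this
    ranking from a dispersion statistic).
    """
    n_top = target_genes
    while True:
        top_sets = [set(genes[:n_top]) for genes in ranked_by_batch.values()]
        shared = set.intersection(*top_sets)
        if len(shared) >= target_genes:
            break
        if n_top > n_stop:
            # Assumed stopping rule: stop widening once n_top exceeds n_stop
            if len(shared) < min_genes:
                raise ValueError("too few shared highly variable genes")
            break
        n_top += step_size  # consider more genes per batch and try again
    return sorted(shared)

ranked = {
    "batch1": ["g1", "g2", "g3", "g4", "g5", "g6"],
    "batch2": ["g2", "g1", "g5", "g3", "g6", "g4"],
}
shared = intersect_hvgs(ranked, target_genes=3)
```

With the top three genes per batch only g1 and g2 are shared, so the selection is widened by one step, after which g3 also appears in both batches.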

stereoAlign.preprocessing.hvg_batch(adata, batch_key=None, target_genes=2000, flavor='cell_ranger', n_bins=20, adataOut=False)

Batch-aware highly variable gene selection

Selects HVGs based on the mean dispersions of genes that are highly variable in all batches, taking the top target_genes ranked by average normalised dispersion. If target_genes has still not been reached, HVGs found in all but one batch are used to fill up. This continues until HVGs found in only a single batch are considered.

Parameters

adata:

anndata object

batch_key:

name of the adata.obs column that holds the batch labels

target_genes:

maximum number of genes (intersection reduces the number of genes)

flavor:

parameter for scanpy.pp.highly_variable_genes

n_bins:

parameter for scanpy.pp.highly_variable_genes

adataOut:

whether to return an anndata object or a list of highly variable genes
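The fill-up strategy described for this function can be sketched in plain Python as a two-level ranking: genes that are highly variable in more batches come first, and ties are broken by mean normalised dispersion. The helper select_hvgs and its inputs are hypothetical, and averaging the dispersion over all batches is an assumption; the actual function works on an anndata object via scanpy.pp.highly_variable_genes.

```python
def select_hvgs(hvg_flags, dispersions, target_genes):
    """Pick up to target_genes genes, preferring genes that are HVGs in more batches.

    hvg_flags:   gene -> list of bools, one per batch (is the gene an HVG there?)
    dispersions: gene -> list of normalised dispersions, one per batch
    Hypothetical helper for illustration; not part of stereoAlign.
    """
    def rank_key(gene):
        n_batches = sum(hvg_flags[gene])
        mean_disp = sum(dispersions[gene]) / len(dispersions[gene])
        # More batches first, then higher mean dispersion
        return (-n_batches, -mean_disp)

    # Only genes that are an HVG in at least one batch are eligible
    candidates = [g for g in hvg_flags if any(hvg_flags[g])]
    return sorted(candidates, key=rank_key)[:target_genes]

hvg_flags = {
    "g1": [True, True, True],
    "g2": [True, True, True],
    "g3": [True, True, False],
    "g4": [True, False, False],
    "g5": [False, False, False],
}
dispersions = {
    "g1": [1.0, 1.2, 0.8],
    "g2": [2.0, 1.5, 2.5],
    "g3": [3.0, 2.0, 0.1],
    "g4": [4.0, 0.0, 0.2],
    "g5": [0.1, 0.1, 0.1],
}
selected = select_hvgs(hvg_flags, dispersions, target_genes=3)
```

Only two genes are highly variable in all three batches here, so the third slot is filled by g3, which is highly variable in all but one batch.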

stereoAlign.preprocessing.reduce_data(adata, pca=True, pca_comps=50, neighbors=True, use_rep='X_pca', umap=False)

Apply feature selection and dimensionality reduction steps.

Wrapper function of PCA, neighbours computation and dimensionality reduction.

Parameters

adata:

anndata object with normalised and log-transformed data in adata.X

pca:

whether to compute PCA

pca_comps:

number of principal components

neighbors:

whether to compute neighbours graph

use_rep:

embedding to use for neighbourhood graph

umap:

whether to compute UMAP representation