preprocessing¶

stereoAlign.preprocessing.summarize_counts(adata, count_matrix=None, min_genes=20, min_cells=20)[source]¶

Summarise counts of the given count matrix

This function is useful for quality control. Aggregates counts per cell and per gene as well as mitochondrial fraction.

Parameters¶

count_matrix:: count matrix, by default uses adata.X
min_cells:: scanpy.pp.filter_genes parameter
min_genes:: scanpy.pp.filter_cells parameter

Returns¶

Adds the following keys to adata.obs:

‘n_counts’: number of counts per cell (count depth)
‘log_counts’: np.log of counts per cell
‘n_genes’: number of genes detected per cell
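As a rough illustration of the per-cell summaries described here, the following pure-Python sketch computes count depth, its natural log, and the number of detected genes for a toy matrix. It does not call stereoAlign or scanpy. The helper name `summarize_cells` and the toy data are hypothetical, and `n_genes` is assumed to mean the number of genes with a nonzero count in a cell.

```python
import math

def summarize_cells(counts):
    """Per-cell QC summaries for a cells x genes matrix given as a list of rows.

    Hypothetical helper for illustration; not part of stereoAlign.
    """
    summary = []
    for row in counts:
        n_counts = sum(row)  # count depth of the cell
        summary.append({
            "n_counts": n_counts,
            # natural log, mirroring np.log; empty cells map to -inf
            "log_counts": math.log(n_counts) if n_counts > 0 else float("-inf"),
            # assumption: genes with at least one count in this cell
            "n_genes": sum(1 for c in row if c > 0),
        })
    return summary

toy = [
    [3, 0, 1],  # cell 0
    [0, 0, 5],  # cell 1
]
qc = summarize_cells(toy)
```

Cells with very low `n_counts` or `n_genes` are the usual candidates for removal during quality control.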

stereoAlign.preprocessing.norma_log(adata)[source]¶

Normalization and Log transform

Parameters:: adata –
Returns:

stereoAlign.preprocessing.scale_batch(adata, batch)[source]¶

Batch-aware scaling of count matrix

Scaling counts to a mean of 0 and standard deviation of 1 using scanpy.pp.scale for each batch separately.

Parameters¶

adata:: anndata object with normalised and log-transformed counts
batch:: adata.obs column

Returns¶

scaled adata
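The idea of scaling each batch separately can be sketched without any dependencies. The example below handles a single gene and uses the population standard deviation. It is not stereoAlign's implementation, which relies on scanpy.pp.scale and may differ in such details. The helper name `scale_per_batch` and the toy values are hypothetical.

```python
from statistics import mean, pstdev

def scale_per_batch(values, batches):
    """Scale one gene's values to mean 0 and std 1 within each batch.

    values  : expression of a single gene, one entry per cell
    batches : batch label of each cell (same length as values)
    Hypothetical helper for illustration; not stereoAlign's implementation.
    """
    scaled = [0.0] * len(values)
    for label in set(batches):
        idx = [i for i, b in enumerate(batches) if b == label]
        mu = mean(values[i] for i in idx)
        sd = pstdev([values[i] for i in idx])
        for i in idx:
            # genes that are constant within a batch stay at 0 to avoid dividing by zero
            scaled[i] = (values[i] - mu) / sd if sd > 0 else 0.0
    return scaled

gene = [1.0, 3.0, 10.0, 30.0]
batch = ["a", "a", "b", "b"]
scaled = scale_per_batch(gene, batch)
```

Although batch "b" has values ten times larger than batch "a", both end up on the same scale, which is the point of scaling per batch rather than over the whole matrix.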

stereoAlign.preprocessing.hvg_intersect(adata, batch, target_genes=2000, flavor='cell_ranger', n_bins=20, adataOut=False, n_stop=8000, min_genes=500, step_size=1000)[source]¶

Highly variable gene selection

Legacy approach to HVG selection only using HVG intersections between all batches

Parameters¶

adata:: anndata object with preprocessed counts
batch:: adata.obs column
target_genes:: maximum number of genes (intersection reduces the number of genes)
min_genes:: minimum number of intersection HVGs targeted
step_size:: step size to increase HVG selection per dataset

Returns¶

list of at most target_genes highly variable genes
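One plausible reading of these parameters is an iterative intersection: take the top genes of every batch, intersect them, and widen the per-batch selection by step_size until at least min_genes are shared or n_stop is reached. The sketch below implements that reading with plain Python sets. The control flow is an assumption drawn from the parameter descriptions, not stereoAlign's code, and the helper name `intersect_hvgs` and the toy rankings are hypothetical.

```python
def intersect_hvgs(ranked, target_genes=4, min_genes=2, step_size=1, n_stop=6):
    """Intersect the top-n genes of every batch, widening n until enough overlap.

    ranked : dict mapping batch label -> genes ordered from most to least variable
    Hypothetical helper; the loop is an assumed reading of the parameters.
    """
    n = target_genes
    while True:
        tops = [set(genes[:n]) for genes in ranked.values()]
        shared = set.intersection(*tops)
        if len(shared) >= min_genes or n >= n_stop:
            break
        n += step_size  # consider more candidates per batch
    # keep at most target_genes, in the order of the first batch's ranking
    first = next(iter(ranked.values()))
    return [g for g in first if g in shared][:target_genes]

ranked = {
    "batch1": ["g1", "g2", "g3", "g4", "g5"],
    "batch2": ["g5", "g4", "g1", "g6", "g2"],
}
hvgs = intersect_hvgs(ranked, target_genes=2, min_genes=2, step_size=1, n_stop=5)
```

With these toy rankings the top two and top three genes per batch do not share enough genes, so the selection widens to the top four before two shared genes are found.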

stereoAlign.preprocessing.hvg_batch(adata, batch_key=None, target_genes=2000, flavor='cell_ranger', n_bins=20, adataOut=False)[source]¶

Batch-aware highly variable gene selection

Method to select HVGs based on the mean dispersions of genes that are highly variable in all batches. The top target_genes per batch are ranked by average normalized dispersion. If target_genes has still not been reached, HVGs found in all but one batch are used to fill up. This continues until HVGs found in only a single batch are considered.

Parameters¶

adata:: anndata object
batch_key:: adata.obs column
target_genes:: maximum number of genes (intersection reduces the number of genes)
flavor:: parameter for scanpy.pp.highly_variable_genes
n_bins:: parameter for scanpy.pp.highly_variable_genes
adataOut:: whether to return an anndata object or a list of highly variable genes
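The fill-up strategy described for this function amounts to ranking genes first by the number of batches in which they are highly variable and then by their mean normalized dispersion. A minimal sketch of that ranking, assuming the per-batch HVG sets and averaged dispersions have already been computed, is shown below. The helper name `select_hvgs_across_batches` and the toy inputs are hypothetical, and this is not stereoAlign's implementation.

```python
def select_hvgs_across_batches(hvg_by_batch, mean_dispersion, target_genes):
    """Pick genes that are highly variable in as many batches as possible.

    hvg_by_batch    : dict mapping batch label -> set of HVGs found in that batch
    mean_dispersion : dict mapping gene -> normalized dispersion averaged over batches
    Hypothetical helper for illustration; not stereoAlign's implementation.
    """
    # number of batches in which each gene was called highly variable
    n_batches = {}
    for genes in hvg_by_batch.values():
        for g in genes:
            n_batches[g] = n_batches.get(g, 0) + 1
    # genes shared by more batches come first; ties are broken by dispersion
    ranked = sorted(n_batches, key=lambda g: (-n_batches[g], -mean_dispersion[g]))
    return ranked[:target_genes]

hvg_by_batch = {
    "batch1": {"g1", "g2", "g3"},
    "batch2": {"g1", "g3", "g4"},
    "batch3": {"g1", "g2", "g5"},
}
mean_dispersion = {"g1": 1.2, "g2": 2.0, "g3": 0.9, "g4": 3.1, "g5": 0.4}
selected = select_hvgs_across_batches(hvg_by_batch, mean_dispersion, target_genes=4)
```

Here g1 is variable in all three batches and is taken first, g2 and g3 (two batches each) follow in order of dispersion, and g4 fills the last slot from the single-batch genes despite having the highest dispersion overall.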

stereoAlign.preprocessing.reduce_data(adata, pca=True, pca_comps=50, neighbors=True, use_rep='X_pca', umap=False)[source]¶

Apply feature selection and dimensionality reduction steps.

Wrapper function of PCA, neighbours computation and dimensionality reduction.

Parameters¶

adata:: anndata object with normalised and log-transformed data in adata.X
pca:: whether to compute PCA
pca_comps:: number of principal components
neighbors:: whether to compute neighbours graph
use_rep:: embedding to use for neighbourhood graph
umap:: whether to compute UMAP representation
