preprocessing¶
- stereoAlign.preprocessing.summarize_counts(adata, count_matrix=None, min_genes=20, min_cells=20)¶
Summarise counts of the given count matrix
This function is useful for quality control. It aggregates counts per cell and per gene and computes the mitochondrial fraction.
Parameters¶
- count_matrix:
count matrix, by default uses adata.X
- min_cells:
scanpy.pp.filter_cells parameter
- min_genes:
scanpy.pp.filter_genes parameter
Returns¶
- Include the following keys in adata.obs:
‘n_counts’: number of counts per cell (count depth)
‘log_counts’: np.log of counts per cell
‘n_genes’: number of counts per gene
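The per-cell summaries listed above can be illustrated with plain NumPy. This is a minimal sketch and not the stereoAlign implementation: the function name `summarize_counts_sketch`, the `mito_prefix` argument, and the `mito_fraction` key are hypothetical. It also assumes that `n_genes` means the number of genes detected per cell, because `adata.obs` is indexed by cell.

```python
import numpy as np

def summarize_counts_sketch(counts, gene_names, mito_prefix="MT-"):
    """Per-cell QC summaries for a dense cells x genes count matrix.

    Hypothetical helper for illustration only; names and the "MT-"
    prefix convention are assumptions, not the library's API.
    """
    counts = np.asarray(counts, dtype=float)
    n_counts = counts.sum(axis=1)  # count depth per cell
    # Flag mitochondrial genes by name prefix (assumed convention).
    is_mito = np.array([g.startswith(mito_prefix) for g in gene_names])
    return {
        "n_counts": n_counts,
        "log_counts": np.log(n_counts),            # natural log of count depth
        "n_genes": (counts > 0).sum(axis=1),       # genes detected per cell (assumed meaning)
        "mito_fraction": counts[:, is_mito].sum(axis=1) / n_counts,
    }

# Two cells, three genes; the first gene is mitochondrial.
counts = [[1, 0, 3], [0, 2, 2]]
genes = ["MT-CO1", "ACTB", "GAPDH"]
summary = summarize_counts_sketch(counts, genes)
```

Cells with zero total counts would produce `-inf` and `nan` here, which is one reason to filter cells before summarising.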
- stereoAlign.preprocessing.norma_log(adata)¶
Normalization and Log transform
- Parameters:
adata –
- Returns:
- stereoAlign.preprocessing.scale_batch(adata, batch)¶
Batch-aware scaling of count matrix
Scaling counts to a mean of 0 and standard deviation of 1 using scanpy.pp.scale for each batch separately.
Parameters¶
- adata:
anndata object with normalised and log-transformed counts
- batch:
adata.obs column
Returns¶
scaled adata
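The idea of batch-aware scaling can be sketched without scanpy. The helper below, `scale_per_batch`, is a hypothetical illustration, not the library's code: it standardises each gene within each batch on a dense matrix. Using the unbiased standard deviation (`ddof=1`) and leaving constant genes at zero are assumptions made for this sketch.

```python
import numpy as np

def scale_per_batch(X, batches):
    """Standardise each gene to mean 0 and unit variance within each batch.

    X is a dense cells x genes matrix; batches holds one label per cell.
    Illustrative only: ddof=1 and the zero-variance handling are assumptions.
    """
    X = np.asarray(X, dtype=float)
    batches = np.asarray(batches)
    scaled = np.empty_like(X)
    for b in np.unique(batches):
        mask = batches == b
        sub = X[mask]
        mean = sub.mean(axis=0)
        std = sub.std(axis=0, ddof=1)
        std[std == 0] = 1.0  # constant genes become 0 instead of dividing by zero
        scaled[mask] = (sub - mean) / std
    return scaled

X = [[1.0, 10.0], [3.0, 10.0], [10.0, 0.0], [20.0, 4.0]]
batches = ["a", "a", "b", "b"]
scaled = scale_per_batch(X, batches)
```

Because each batch is centred separately, batch-specific shifts in mean expression are removed before the batches are combined again.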
- stereoAlign.preprocessing.hvg_intersect(adata, batch, target_genes=2000, flavor='cell_ranger', n_bins=20, adataOut=False, n_stop=8000, min_genes=500, step_size=1000)¶
Highly variable gene selection
Legacy approach to HVG selection that uses only the intersection of HVGs across all batches
Parameters¶
- adata:
anndata object with preprocessed counts
- batch:
adata.obs column
- target_genes:
maximum number of genes (intersection reduces the number of genes)
- min_genes:
minimum number of intersection HVGs targeted
- step_size:
step size to increase HVG selection per dataset
Returns¶
list of at most target_genes highly variable genes
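One way the intersection parameters could interact is sketched below. This is an assumption-laden illustration, not the documented behaviour: it takes precomputed per-batch gene rankings (a hypothetical input) rather than an `anndata` object, treats `n_stop` as an upper limit on the per-batch selection size (the parameter is not described above), and raises an error when fewer than `min_genes` genes are shared.

```python
def hvg_intersect_sketch(ranked_genes_per_batch, target_genes=2000,
                         n_stop=8000, min_genes=500, step_size=1000):
    """Grow the per-batch HVG selection until the intersection is large enough.

    ranked_genes_per_batch: one list of gene names per batch, ordered from
    most to least variable. Hypothetical helper; the stopping rule is assumed.
    """
    n_hvg = target_genes
    while True:
        top_sets = [set(ranked[:n_hvg]) for ranked in ranked_genes_per_batch]
        shared = set.intersection(*top_sets)
        if len(shared) >= target_genes:
            break
        if n_hvg > n_stop:
            # Selection size exhausted: accept what we have if it is enough.
            if len(shared) < min_genes:
                raise ValueError("fewer than min_genes HVGs shared by all batches")
            break
        n_hvg += step_size
    # Report shared genes in the first batch's ranking order, capped at target_genes.
    ordered = [g for g in ranked_genes_per_batch[0] if g in shared]
    return ordered[:target_genes]

rankings = [["a", "b", "c", "d", "e"], ["c", "a", "e", "f", "b"]]
hvgs = hvg_intersect_sketch(rankings, target_genes=2, n_stop=5,
                            min_genes=1, step_size=1)
```

With these toy rankings the top-2 sets share only one gene, so the selection grows to the top 3 per batch before two shared genes are found.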
- stereoAlign.preprocessing.hvg_batch(adata, batch_key=None, target_genes=2000, flavor='cell_ranger', n_bins=20, adataOut=False)¶
Batch-aware highly variable gene selection
Method to select HVGs based on the mean dispersions of genes that are highly variable in all batches, using the top target_genes per batch by average normalized dispersion. If the target number of genes has not been reached, HVGs found in all but one batch are used to fill up. This continues until HVGs found in only a single batch are considered.
Parameters¶
- adata:
anndata object
- batch_key:
adata.obs column
- target_genes:
maximum number of genes (intersection reduces the number of genes)
- flavor:
parameter for scanpy.pp.highly_variable_genes
- n_bins:
parameter for scanpy.pp.highly_variable_genes
- adataOut:
whether to return an anndata object or a list of highly variable genes
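The fill-up strategy described for this function can be sketched as a tiered selection. The helper `hvg_batch_sketch` is hypothetical and works on precomputed arrays instead of an `anndata` object; averaging the normalized dispersion over all batches (rather than only the batches where a gene is highly variable) is an assumption of this sketch.

```python
import numpy as np

def hvg_batch_sketch(gene_names, is_hvg, norm_disp, target_genes):
    """Tiered HVG selection across batches.

    is_hvg:    batches x genes boolean array, True where a gene is an HVG in that batch
    norm_disp: batches x genes array of normalized dispersions
    Hypothetical helper for illustration; not the library's implementation.
    """
    is_hvg = np.asarray(is_hvg, dtype=bool)
    norm_disp = np.asarray(norm_disp, dtype=float)
    n_batches = is_hvg.shape[0]
    hvg_count = is_hvg.sum(axis=0)       # number of batches in which each gene is an HVG
    mean_disp = norm_disp.mean(axis=0)   # average normalized dispersion per gene (assumed)

    selected = []
    # Start with genes that are HVGs in every batch, then relax one batch at a time.
    for required in range(n_batches, 0, -1):
        remaining = target_genes - len(selected)
        if remaining <= 0:
            break
        candidates = np.flatnonzero(hvg_count == required)
        order = candidates[np.argsort(-mean_disp[candidates], kind="stable")]
        selected.extend(gene_names[i] for i in order[:remaining])
    return selected

genes = ["g0", "g1", "g2", "g3", "g4"]
is_hvg = [[1, 1, 0, 1, 0],
          [1, 0, 1, 1, 0]]
norm_disp = [[1.0, 3.0, 0.5, 2.0, 0.1],
             [1.0, 0.2, 2.5, 2.0, 0.1]]
top3 = hvg_batch_sketch(genes, is_hvg, norm_disp, target_genes=3)
```

In this toy input `g3` and `g0` are HVGs in both batches and are taken first; the remaining slot is filled by the best-ranked gene that is an HVG in only one batch.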
- stereoAlign.preprocessing.reduce_data(adata, pca=True, pca_comps=50, neighbors=True, use_rep='X_pca', umap=False)¶
Apply feature selection and dimensionality reduction steps.
Wrapper function of PCA, neighbours computation and dimensionality reduction.
Parameters¶
- adata:
anndata object with normalised and log-transformed data in adata.X
- pca:
whether to compute PCA
- pca_comps:
number of principal components
- neighbors:
whether to compute neighbours graph
- use_rep:
embedding to use for neighbourhood graph
- umap:
whether to compute UMAP representation