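Learning scanpy: tutorials, in the same spirit as the Seurat tutorials
Tutorials (Scanpy 1.9.1 documentation): index of all scanpy tutorials
Usage Principles (Scanpy 1.9.1 documentation): general introduction
API (Scanpy 1.9.1 documentation): summary of commonly used commands, reading data, and the Datasets module for built-in datasets
https://scanpy-tutorials.readthedocs.io/en/latest/plotting/core.html summary of common scanpy plotting commands
External API (Scanpy 1.9.1 documentation): wrappers that supplement the scanpy tools
Ecosystem (Scanpy 1.9.1 documentation): explanation of the principles behind the methods used
#### Advanced tutorials
https://theislab.github.io/scanpy-in-R/ Using Scanpy in R. This failed on my computer the first time but worked on the second attempt. Related post: converting between scanpy and Seurat data formats (Python, R, renv, reticulate) and using Python from R; it worked after installing a Python environment, with no conda environment needed (YoungLeelight's blog on CSDN)
Using Python in R: reticulate, R Markdown
Using Python inside RStudio (bilibili video)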
Usage Principles
Import Scanpy as:
import scanpy as sc
Workflow
The typical workflow consists of subsequent calls of data analysis tools in sc.tl, e.g.:
sc.tl.umap(adata, **tool_params) # embed a neighborhood graph of the data using UMAP
where adata is an AnnData object. Each of these calls adds annotation to an expression matrix X, which stores n_obs observations (cells) of n_vars variables (genes). For each tool, there typically is an associated plotting function in sc.pl:
sc.pl.umap(adata, **plotting_params)
If you pass show=False, an Axes instance is returned and you have all of matplotlib's detailed configuration possibilities.
To facilitate writing memory-efficient pipelines, Scanpy tools by default operate in place on adata and return None; this also makes it easy to transition to out-of-memory pipelines. If you want to return a copy of the AnnData object and leave the passed adata unchanged, pass copy=True or inplace=False.
AnnData
Scanpy is based on anndata, which provides the AnnData class.

At the most basic level, an AnnData object adata stores a data matrix adata.X, annotation of observations adata.obs and variables adata.var as pd.DataFrame and unstructured annotation adata.uns as dict. Names of observations and variables can be accessed via adata.obs_names and adata.var_names, respectively. AnnData objects can be sliced like dataframes, for example, adata_subset = adata[:, list_of_gene_names]. For more, see this blog post.
To read a data file to an AnnData object, call:
adata = sc.read(filename)
to initialize an AnnData object. Possibly add further annotation using, e.g., pd.read_csv:
import pandas as pd
anno = pd.read_csv(filename_sample_annotation)
adata.obs['cell_groups'] = anno['cell_groups']  # categorical annotation of type pandas.Categorical
adata.obs['time'] = anno['time']  # numerical annotation of type float
# alternatively, you could also set the whole dataframe
# adata.obs = anno
To write, use:
adata.write(filename)
adata.write_csvs(filename)
adata.write_loom(filename)
API: summary of common scanpy commands
Import Scanpy as:
import scanpy as sc
Note
Additional functionality is available in the broader ecosystem, with some tools being wrapped in the scanpy.external module.
Preprocessing: pp
Filtering of highly-variable genes, batch-effect correction, per-cell normalization, preprocessing recipes.
Any transformation of the data matrix that is not a tool. Other than tools, preprocessing steps usually don’t return an easily interpretable annotation, but perform a basic transformation on the data matrix.
Basic Preprocessing
For visual quality control, see highest_expr_genes() and filter_genes_dispersion() in scanpy.pl.
pp.calculate_qc_metrics(adata, *[, …])
Calculate quality control metrics.
pp.filter_cells(data[, min_counts, …])
Filter cell outliers based on counts and numbers of genes expressed.
pp.filter_genes(data[, min_counts, …])
Filter genes based on number of cells or counts.
pp.highly_variable_genes(adata[, layer, …])
Annotate highly variable genes [Satija15] [Zheng17] [Stuart19].
pp.log1p()
Logarithmize the data matrix.
pp.pca(data[, n_comps, zero_center, …])
Principal component analysis [Pedregosa11].
pp.normalize_total(adata[, target_sum, …])
Normalize counts per cell.
pp.regress_out(adata, keys[, n_jobs, copy])
Regress out (mostly) unwanted sources of variation.
pp.scale()
Scale data to unit variance and zero mean.
pp.subsample(data[, fraction, n_obs, …])
Subsample to a fraction of the number of observations.
pp.downsample_counts(adata[, …])
Downsample counts from count matrix.
Recipes
pp.recipe_zheng17(adata[, n_top_genes, log, …])
Normalization and filtering as of [Zheng17].
pp.recipe_weinreb17(adata[, log, …])
Normalization and filtering as of [Weinreb17].
pp.recipe_seurat(adata[, log, plot, copy])
Normalization and filtering as of Seurat [Satija15].
Batch effect correction
Also see [Data integration]. Note that a simple batch correction method is available via pp.regress_out(). Check out scanpy.external for more.
pp.combat(adata[, key, covariates, inplace])
ComBat function for batch effect correction [Johnson07] [Leek12] [Pedersen12].
Neighbors
pp.neighbors(adata[, n_neighbors, n_pcs, …])
Compute a neighborhood graph of observations [McInnes18].
Tools: tl
Any transformation of the data matrix that is not preprocessing. In contrast to a preprocessing function, a tool usually adds an easily interpretable annotation to the data matrix, which can then be visualized with a corresponding plotting function.
Embeddings
tl.pca(data[, n_comps, zero_center, …])
Principal component analysis [Pedregosa11].
tl.tsne(adata[, n_pcs, use_rep, perplexity, …])
t-SNE [Maaten08] [Amir13] [Pedregosa11].
tl.umap(adata[, min_dist, spread, …])
Embed the neighborhood graph using UMAP [McInnes18].
tl.draw_graph(adata[, layout, init_pos, …])
Force-directed graph drawing [Islam11] [Jacomy14] [Chippada18].
tl.diffmap(adata[, n_comps, neighbors_key, …])
Diffusion Maps [Coifman05] [Haghverdi15] [Wolf18].
Compute densities on embeddings.
tl.embedding_density(adata[, basis, …])
Calculate the density of cells in an embedding (per condition).
Clustering and trajectory inference
tl.leiden(adata[, resolution, restrict_to, …])
Cluster cells into subgroups [Traag18].
tl.louvain(adata[, resolution, …])
Cluster cells into subgroups [Blondel08] [Levine15] [Traag17].
tl.dendrogram(adata, groupby[, n_pcs, …])
Computes a hierarchical clustering for the given groupby categories.
tl.dpt(adata[, n_dcs, n_branchings, …])
Infer progression of cells through geodesic distance along the graph [Haghverdi16] [Wolf19].
tl.paga(adata[, groups, use_rna_velocity, …])
Mapping out the coarse-grained connectivity structures of complex manifolds [Wolf19].
Data integration
tl.ingest(adata, adata_ref[, obs, …])
Map labels and embeddings from reference data to new data.
Marker genes
tl.rank_genes_groups(adata, groupby[, …])
Rank genes for characterizing groups.
tl.filter_rank_genes_groups(adata[, key, …])
Filters out genes based on log fold change and fraction of cells expressing the gene within and outside the groupby categories.
tl.marker_gene_overlap(adata, …[, key, …])
Calculate an overlap score between data-derived marker genes and provided markers.
Gene scores, Cell cycle
tl.score_genes(adata, gene_list[, …])
Score a set of genes [Satija15].
tl.score_genes_cell_cycle(adata, s_genes, …)
Score cell cycle genes [Satija15].
Simulations
tl.sim(model[, params_file, tmax, …])
Simulate dynamic gene expression data [Wittmann09] [Wolf18].
Plotting: pl
The plotting module scanpy.pl largely parallels the tl.* and a few of the pp.* functions. For most tools and for some preprocessing functions, you’ll find a plotting function with the same name.
See → tutorial: plotting/core for an overview of how to use these functions.
Note
See the Settings section for all important plotting configurations.
Generic
pl.scatter(adata[, x, y, color, use_raw, …])
Scatter plot along observations or variables axes.
pl.heatmap(adata, var_names, groupby[, …])
Heatmap of the expression values of genes.
pl.dotplot(adata, var_names, groupby[, …])
Makes a dot plot of the expression values of var_names.
pl.tracksplot(adata, var_names, groupby[, …])
In this type of plot each var_name is plotted as a filled line plot where the y values correspond to the var_name values and x is each of the cells.
pl.violin(adata, keys[, groupby, log, …])
Violin plot.
pl.stacked_violin(adata, var_names, groupby)
Stacked violin plots.
pl.matrixplot(adata, var_names, groupby[, …])
Creates a heatmap of the mean expression values per group of each var_names.
pl.clustermap(adata[, obs_keys, use_raw, …])
Hierarchically-clustered heatmap.
pl.ranking(adata, attr, keys[, dictionary, …])
Plot rankings.
pl.dendrogram(adata, groupby, *[, …])
Plots a dendrogram of the categories defined in groupby.
Classes
These classes allow fine tuning of visual parameters.
pl.DotPlot(adata, var_names, groupby[, …])
Allows the visualization of two values that are encoded as dot size and color.
pl.MatrixPlot(adata, var_names, groupby[, …])
Allows the visualization of values using a color map.
pl.StackedViolin(adata, var_names, groupby)
Stacked violin plots.
Preprocessing
Methods for visualizing quality control and results of preprocessing functions.
pl.highest_expr_genes(adata[, n_top, show, …])
Fraction of counts assigned to each gene over all cells.
pl.filter_genes_dispersion(result[, log, …])
Plot dispersions versus means for genes.
pl.highly_variable_genes(adata_or_result[, …])
Plot dispersions or normalized variance versus means for genes.
Tools
Methods that extract and visualize tool-specific annotation in an AnnData object. For any method in module tl, there is a method with the same name in pl.
PCA
pl.pca(adata, *[, color, gene_symbols, …])
Scatter plot in PCA coordinates.
pl.pca_loadings(adata[, components, …])
Rank genes according to contributions to PCs.
pl.pca_variance_ratio(adata[, n_pcs, log, …])
Plot the variance ratio.
pl.pca_overview(adata, **params)
Plot PCA results.
Embeddings
pl.tsne(adata, *[, color, gene_symbols, …])
Scatter plot in tSNE basis.
pl.umap(adata, *[, color, gene_symbols, …])
Scatter plot in UMAP basis.
pl.diffmap(adata, *[, color, gene_symbols, …])
Scatter plot in Diffusion Map basis.
pl.draw_graph(adata, *[, color, …])
Scatter plot in graph-drawing basis.
pl.spatial(adata, *[, color, gene_symbols, …])
Scatter plot in spatial coordinates.
pl.embedding(adata, basis, *[, color, …])
Scatter plot for user specified embedding basis.
Compute densities on embeddings.
pl.embedding_density(adata[, basis, key, …])
Plot the density of cells in an embedding (per condition).
Branching trajectories and pseudotime, clustering
Visualize clusters using one of the embedding methods, passing color='louvain'.
pl.dpt_groups_pseudotime(adata[, color_map, …])
Plot groups and pseudotime.
pl.dpt_timeseries(adata[, color_map, show, …])
Heatmap of pseudotime series.
pl.paga(adata[, threshold, color, layout, …])
Plot the PAGA graph through thresholding low-connectivity edges.
pl.paga_path(adata, nodes, keys[, use_raw, …])
Gene expression and annotation changes along paths in the abstracted graph.
pl.paga_compare(adata[, basis, edges, …])
Scatter and PAGA graph side-by-side.
Marker genes
pl.rank_genes_groups(adata[, groups, …])
Plot ranking of genes.
pl.rank_genes_groups_violin(adata[, groups, …])
Plot ranking of genes for all tested comparisons.
pl.rank_genes_groups_stacked_violin(adata[, …])
Plot ranking of genes using stacked_violin plot (see stacked_violin())
pl.rank_genes_groups_heatmap(adata[, …])
Plot ranking of genes using heatmap plot (see heatmap())
pl.rank_genes_groups_dotplot(adata[, …])
Plot ranking of genes using dotplot plot (see dotplot())
pl.rank_genes_groups_matrixplot(adata[, …])
Plot ranking of genes using matrixplot plot (see matrixplot())
pl.rank_genes_groups_tracksplot(adata[, …])
Plot ranking of genes using tracksplot plot (see tracksplot())
Simulations
pl.sim(adata[, tmax_realization, …])
Plot results of simulation.
Reading
Note
For reading annotation use pandas.read_… and add it to your anndata.AnnData object. The following read functions are intended for the numeric data in the data matrix X.
Read common file formats using
read(filename[, backed, sheet, ext, …])
Read file and return AnnData object.
Read 10x formatted hdf5 files and directories containing .mtx files using
read_10x_h5(filename[, genome, gex_only, …])
Read 10x-Genomics-formatted hdf5 file.
read_10x_mtx(path[, var_names, make_unique, …])
Read 10x-Genomics-formatted mtx directory.
read_visium(path[, genome, count_file, …])
Read 10x-Genomics-formatted Visium dataset.
Read other formats using functions borrowed from anndata
read_h5ad(filename[, backed, as_sparse, …])
Read .h5ad-formatted hdf5 file.
read_csv(filename[, delimiter, …])
Read .csv file.
read_excel(filename, sheet[, dtype])
Read .xlsx (Excel) file.
read_hdf(filename, key)
Read .h5 (hdf5) file.
read_loom(filename, *[, sparse, cleanup, …])
Read .loom-formatted hdf5 file.
read_mtx(filename[, dtype])
Read .mtx file.
read_text(filename[, delimiter, …])
Read .txt, .tab, .data (text) file.
read_umi_tools(filename[, dtype])
Read a gzipped condensed count matrix from umi_tools.
Get object from AnnData: get
The module sc.get provides convenience functions for getting values back in useful formats.
get.obs_df(adata[, keys, obsm_keys, layer, …])
Return values for observations in adata.
get.var_df(adata[, keys, varm_keys, layer])
Return values for variables in adata.
get.rank_genes_groups_df(adata, group, *[, …])
scanpy.tl.rank_genes_groups() results in the form of a DataFrame.
Queries
This module provides useful queries for annotation and enrichment.
queries.biomart_annotations(org, attrs, *[, …])
Retrieve gene annotations from ensembl biomart.
queries.gene_coordinates(org, gene_name, *)
Retrieve gene coordinates for specific organism through BioMart.
queries.mitochondrial_genes(org, *[, …])
Mitochondrial gene symbols for specific organism through BioMart.
queries.enrich(container, *[, org, …])
Get enrichment for DE results.
Metrics
Collections of useful measurements for evaluating results.
metrics.confusion_matrix(orig, new[, data, …])
Given an original and new set of labels, create a labelled confusion matrix.
metrics.gearys_c()
Calculate Geary’s C, as used by VISION.
metrics.morans_i()
Calculate Moran’s I Global Autocorrelation Statistic.
Experimental
New methods that are in early development which are not (yet) integrated in Scanpy core.
experimental.pp.normalize_pearson_residuals(…)
Applies analytic Pearson residual normalization, based on [Lause21].
experimental.pp.normalize_pearson_residuals_pca(…)
Applies analytic Pearson residual normalization and PCA, based on [Lause21].
experimental.pp.highly_variable_genes(adata, *)
Select highly variable genes using analytic Pearson residuals [Lause21].
experimental.pp.recipe_pearson_residuals(…)
Full pipeline for HVG selection and normalization by analytic Pearson residuals ([Lause21]).
Classes
AnnData is reexported from anndata.
Represent data as a neighborhood structure, usually a knn graph.
Neighbors(adata[, n_dcs, neighbors_key])
Data represented as graph of nearest neighbors.
Settings
A convenience function for setting some default matplotlib.rcParams and a high-resolution jupyter display backend useful for use in notebooks.
set_figure_params([scanpy, dpi, dpi_save, …])
Set resolution/size, styling and format of figures.
An instance of the ScanpyConfig is available as scanpy.settings and allows configuring Scanpy.
_settings.ScanpyConfig(*[, verbosity, …])
Config manager for scanpy.
Some selected settings are discussed in the following.
Influence the global behavior of plotting functions. In non-interactive scripts, you’d usually want to set settings.autoshow to False.
autoshow
Automatically show figures if autosave == False (default True).
autosave
Automatically save figures in figdir (default False).
The default directories for saving figures, caching files and storing datasets.
figdir
Directory for saving figures (default './figures/').
cachedir
Directory for cache files (default './cache/').
datasetdir
Directory for example datasets (default './data/').
The verbosity of logging output, where verbosity levels have the following meaning: 0='error', 1='warning', 2='info', 3='hint', 4=more details, 5=even more details, etc.
verbosity
Verbosity level (default warning).
Print versions of packages that might influence numerical results.
logging.print_header(*[, file])
Versions that might influence the numerical results.
logging.print_versions(*[, file])
Print versions of imported packages, OS, and jupyter environment.
Datasets
datasets.blobs([n_variables, n_centers, …])
Gaussian Blobs.
datasets.ebi_expression_atlas(accession, *)
Load a dataset from the EBI Single Cell Expression Atlas.
datasets.krumsiek11()
Simulated myeloid progenitors [Krumsiek11].
datasets.moignard15()
Hematopoiesis in early mouse embryos [Moignard15].
datasets.pbmc3k()
3k PBMCs from 10x Genomics.
datasets.pbmc3k_processed()
Processed 3k PBMCs from 10x Genomics.
datasets.pbmc68k_reduced()
Subsampled and processed 68k PBMCs.
datasets.paul15()
Development of Myeloid Progenitors [Paul15].
datasets.toggleswitch()
Simulated toggleswitch.
datasets.visium_sge([sample_id, …])
Processed Visium Spatial Gene Expression data from 10x Genomics.
Deprecated functions
pp.filter_genes_dispersion(data[, flavor, …])
Extract highly variable genes [Satija15] [Zheng17].
pp.normalize_per_cell(data[, …])
Normalize total counts per cell.