Single-cell Transcriptomics

A single-cell RNA sequencing pipeline, implemented in Nextflow and part of the Online Pipelines Platform (OP²).

Pipeline overview

Our OP² single-cell transcriptomics pipeline is a bioinformatics workflow for analyzing single-cell RNA sequencing data with a gold-standard analysis approach. It gives you insight into the quality of your data, the expression profiles of your cells, differential gene expression, cell annotations and identities, and gene enrichment analysis.

The workflow takes raw FASTQ files as input, aligns the reads, generates per-gene counts and performs extensive quality control on the results. The results are made available through two interactive reports and a data package containing all essential intermediate files for more in-depth analysis. The pre-processing workflow takes your raw sequence data through to QC-approved aligned data. The post-processing workflow then lets you explore the biological meaning of your data through statistical analysis.

See the pipeline page for a more detailed overview.

Do you have any questions about these results? Email us at helpdesk@excelra.com

Report info

Generated on: 2025-04-16 08:46
Experiment: testscRNA_Mouse_brain_GDM_Ctrl
Pipeline: Single-cell Transcriptomics
Report: Post-processing Report
Species: mus_musculus
Species Build: mm10

Metadata

Below is an overview of the metadata for the count tables and samples. The high-quality count tables are produced and filtered by STARsolo. The Filename column holds the prefix of each uploaded sample file; it is included because it may differ from the sample name.

Sample       Group ID  Filename
SRR32158738  GDM       SRR32158738
SRR32158739  GDM       SRR32158739
SRR32158740  GDM       SRR32158740
SRR32158741  GDM       SRR32158741
SRR32158742  Control   SRR32158742
SRR32158743  Control   SRR32158743
SRR32158744  Control   SRR32158744
SRR32158745  Control   SRR32158745

Quality Control

Low-quality libraries from different cells can cluster together because of similarities in their damage-induced expression profiles. STARsolo's filtering does not remove these low-quality libraries, so quality control relies mainly on three metrics:
- Number of UMIs: identifies cells with low total counts.
- Number of genes: identifies cells in which few genes are detected.
- Percentage of mitochondrial counts: cells with a high percentage can form their own distinct clusters.
QC plots showing cell density are created instead of the usual violin plots from the Seurat package, as they are easier to interpret. To identify cells that are outliers for the various QC metrics, the pipeline uses the median absolute deviation (MAD) from the median value of each metric across all cells. Specifically, a value is considered an outlier if it lies more than 3 MADs from the median in the “problematic” direction. This is loosely motivated by the fact that such a filter retains 99% of non-outlier values that follow a normal distribution.
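The outlier rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline's actual code: the function name `mad_outliers`, its parameters and the example values are hypothetical, and the MAD is scaled by 1.4826 (the usual constant that makes it comparable to a standard deviation for normally distributed data), which is an assumption about how the pipeline computes it.

```python
import numpy as np

def mad_outliers(values, nmads=3.0, direction="lower"):
    """Flag values lying more than `nmads` MADs from the median.

    direction selects the "problematic" side: "lower", "higher" or "both".
    """
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    # Assumption: scale the raw MAD by 1.4826 so that it estimates the
    # standard deviation when the values are normally distributed.
    mad = 1.4826 * np.median(np.abs(values - median))
    lower, upper = median - nmads * mad, median + nmads * mad
    if direction == "lower":
        return values < lower
    if direction == "higher":
        return values > upper
    if direction == "both":
        return (values < lower) | (values > upper)
    raise ValueError("direction must be 'lower', 'higher' or 'both'")

# Hypothetical log10 UMI counts for seven cells; the last one is very low.
log10_umi = [3.50, 3.60, 3.55, 3.58, 3.52, 3.61, 2.00]
low_umi = mad_outliers(log10_umi, nmads=3, direction="lower")

# Hypothetical mitochondrial percentages; the last cell is very high.
percent_mito = [2.0, 3.0, 2.5, 3.5, 2.8, 40.0]
high_mito = mad_outliers(percent_mito, nmads=3, direction="higher")
```

In this sketch, low UMI and gene counts would be checked in the lower direction and a high mitochondrial percentage in the higher direction; which direction the pipeline treats as problematic for each metric is assumed here.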

The count table of each sample is converted to a Seurat object [1]. Filtering is based on the median absolute deviation (MAD), with a default threshold of 3 MADs.

The QC metrics (log10nUMI, log10nGene, percentage mito) have been calculated and added to the metadata. Plots for up to 5 samples are shown here; the remaining plots can be found in the QC folder inside the results folder.
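As a rough illustration of how these three QC metrics can be derived from a count table, the sketch below uses plain NumPy rather than Seurat. The function name `qc_metrics`, the genes-by-cells orientation, the toy counts and the `mt-` prefix used to recognize mouse mitochondrial genes are assumptions made for this example, not details taken from the pipeline.

```python
import numpy as np

def qc_metrics(counts, gene_names, mito_prefix="mt-"):
    """Compute per-cell QC metrics from a genes x cells count matrix."""
    counts = np.asarray(counts, dtype=float)
    # Assumption: mitochondrial genes are recognized by their name prefix.
    is_mito = np.array([g.lower().startswith(mito_prefix) for g in gene_names])

    n_umi = counts.sum(axis=0)          # total UMI counts per cell
    n_gene = (counts > 0).sum(axis=0)   # genes detected per cell
    mito_umi = counts[is_mito].sum(axis=0)
    # Avoid division by zero for cells without any counts.
    percent_mito = np.divide(100.0 * mito_umi, n_umi,
                             out=np.zeros_like(n_umi), where=n_umi > 0)
    # The log10 values assume every cell has at least one count.
    return {
        "nUMI": n_umi,
        "nGene": n_gene,
        "log10nUMI": np.log10(n_umi),
        "log10nGene": np.log10(n_gene),
        "percent_mito": percent_mito,
    }

# Toy matrix: 3 genes (rows) x 3 cells (columns), hypothetical values.
genes = ["mt-Co1", "Actb", "Gapdh"]
counts = [[10, 0, 5],
          [0, 3, 5],
          [90, 7, 0]]
metrics = qc_metrics(counts, genes)
```

Each entry of `metrics` is an array with one value per cell, which is the shape needed to add it as a per-cell metadata column.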

UMI counts per cell should generally be above 500, which is the low end of what we expect. Cells with between 500 and 1000 UMI counts are usable, but they probably should have been sequenced more deeply.

Note: Applying the MAD assumes that the batches are of high quality. Samples from multiple batches can distort the MAD: if sequencing coverage is lower in one batch, it drags down the median and inflates the MAD, which makes the adaptive threshold less suitable for the other batches.
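The effect described in this note can be seen in a small synthetic example. The sketch below is an illustration under stated assumptions (a hypothetical helper `lower_threshold`, made-up log10 UMI values, and a MAD scaled by 1.4826). It compares the lower cutoff computed on two pooled batches with the cutoff computed on the deeper batch alone.

```python
import numpy as np

def lower_threshold(values, nmads=3.0):
    """Lower cutoff: median minus `nmads` scaled MADs (assumed scaling 1.4826)."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    mad = 1.4826 * np.median(np.abs(values - median))
    return median - nmads * mad

# Hypothetical log10 UMI counts: one deeply and one shallowly sequenced batch.
deep_batch = np.linspace(3.8, 4.2, 101)
shallow_batch = np.linspace(2.8, 3.2, 101)

pooled_cutoff = lower_threshold(np.concatenate([deep_batch, shallow_batch]))
deep_cutoff = lower_threshold(deep_batch)

# A suspect cell from the deep batch with a log10 UMI count of 3.0.
suspect_cell = 3.0
flagged_per_batch = suspect_cell < deep_cutoff
flagged_pooled = suspect_cell < pooled_cutoff
```

With these made-up values the pooled cutoff falls well below the cutoff of the deeper batch, so the suspect cell is flagged when the deep batch is filtered on its own but kept when the batches are pooled.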