Our OP² germline genome pipeline is a bioinformatics analysis workflow for whole-genome, whole-exome or targeted DNA sequencing data.
It analyzes your genome sequencing data with established tools such as BWA and GATK.
You gain insight into the quality of your data, identify variation ranging from single-nucleotide variants and small indels to structural variants, and annotate those variants with biological knowledge.
The workflow processes raw data from FastQ inputs, aligns the reads, calls variants and performs variant annotation.
The results are delivered as two interactive reports plus a data package containing all essential intermediate files for more in-depth analysis.
The pre-processing workflow takes your raw sequence data through to QC-approved aligned data.
Next, the post-processing workflow calls variants and annotates them so that you can review the biological meaning of your data.
Whole-genome, whole-exome and targeted sequencing data
Paired-end compressed raw FastQ files
Reference genome (GRCh37, GRCh38, GRCm38)
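As an illustration of the paired-end compressed FastQ input listed above, here is a minimal Python sketch that reads two gzip-compressed files in lockstep. The function and class names (`read_fastq`, `read_pairs`, `FastqRecord`) are hypothetical and not part of the pipeline; the sketch assumes four-line FastQ records and mate names that differ at most by a trailing `/1` or `/2`.

```python
import gzip
from itertools import islice, zip_longest
from typing import Iterator, NamedTuple, TextIO, Tuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FastQ stream with four lines per record."""
    while True:
        block = [line.rstrip("\r\n") for line in islice(handle, 4)]
        if not block:
            return
        if len(block) < 4 or not block[0].startswith("@") or not block[2].startswith("+"):
            raise ValueError("malformed FastQ record")
        header, sequence, _, qualities = block
        if len(sequence) != len(qualities):
            raise ValueError("sequence and quality strings differ in length")
        fields = header[1:].split()
        yield FastqRecord(fields[0] if fields else "", sequence, qualities)


def _mate_free_name(name: str) -> str:
    """Drop a trailing /1 or /2 so that mate names can be compared."""
    return name[:-2] if name.endswith(("/1", "/2")) else name


def read_pairs(r1_path: str, r2_path: str) -> Iterator[Tuple[FastqRecord, FastqRecord]]:
    """Yield (R1, R2) record pairs from two gzip-compressed FastQ files."""
    with gzip.open(r1_path, "rt") as r1, gzip.open(r2_path, "rt") as r2:
        for first, second in zip_longest(read_fastq(r1), read_fastq(r2)):
            if first is None or second is None:
                raise ValueError("R1 and R2 contain different numbers of reads")
            if _mate_free_name(first.name) != _mate_free_name(second.name):
                raise ValueError(f"mate names differ: {first.name} vs {second.name}")
            yield first, second
```

Checking that both files stay in sync catches truncated uploads early, before any alignment time is spent.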
Low-quality reads are discarded
Adaptor and quality trimming of reads
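The read clean-up described above can be sketched in a few lines of Python. This is a simplified illustration, not the algorithm of any particular trimming tool: it assumes Phred+33 quality encoding, exact adaptor matching, and a plain trailing-base quality trim. All function names and the default thresholds (`min_quality=20`, `min_length=30`) are assumptions chosen for the example.

```python
from typing import List, Optional, Tuple


def phred33(qualities: str) -> List[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(char) - 33 for char in qualities]


def trim_adapter(seq: str, qual: str, adapter: str, min_overlap: int = 3) -> Tuple[str, str]:
    """Cut the read at a full adaptor match, or at an adaptor prefix on the 3' end."""
    index = seq.find(adapter)
    if index != -1:
        return seq[:index], qual[:index]
    max_overlap = min(len(adapter) - 1, len(seq))
    for overlap in range(max_overlap, max(min_overlap, 1) - 1, -1):
        if seq.endswith(adapter[:overlap]):
            return seq[:-overlap], qual[:-overlap]
    return seq, qual


def trim_low_quality_tail(seq: str, qual: str, min_quality: int = 20) -> Tuple[str, str]:
    """Remove trailing bases whose quality is below the threshold."""
    scores = phred33(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return seq[:end], qual[:end]


def clean_read(seq: str, qual: str, adapter: str,
               min_quality: int = 20, min_length: int = 30) -> Optional[Tuple[str, str]]:
    """Trim a read; return None when it should be discarded."""
    seq, qual = trim_adapter(seq, qual, adapter)
    seq, qual = trim_low_quality_tail(seq, qual, min_quality)
    if not seq or len(seq) < min_length:
        return None  # too short after trimming
    if sum(phred33(qual)) / len(qual) < min_quality:
        return None  # mean quality too low
    return seq, qual
```

For paired-end data, a read that is discarded would normally take its mate with it so that both files stay in sync; that bookkeeping is left out here.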
BWA aligns reads to the reference genome
Alignment statistics: read depths, per base, GC content, …
GATK MarkDuplicates marks duplicate reads, which are potential PCR artefacts
Construction of expression matrices
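To make the alignment statistics mentioned above concrete, the following Python sketch computes per-base read depth and GC content for a toy set of alignments. It is a simplification under stated assumptions: alignments are given as hypothetical `(start, sequence)` tuples with 0-based starts and no gaps or clipping, whereas a real pipeline would derive these values from the alignment file.

```python
from typing import Iterable, List, Tuple


def per_base_depth(alignments: Iterable[Tuple[int, str]], region_length: int) -> List[int]:
    """Count how many ungapped alignments cover each position of a region."""
    depth = [0] * region_length
    for start, sequence in alignments:
        first = max(start, 0)
        last = min(start + len(sequence), region_length)
        for position in range(first, last):
            depth[position] += 1
    return depth


def gc_content(sequence: str) -> float:
    """Return the fraction of G and C bases in a sequence (0.0 if empty)."""
    if not sequence:
        return 0.0
    gc_bases = sum(base in "GCgc" for base in sequence)
    return gc_bases / len(sequence)


def mean_depth(depth: List[int]) -> float:
    """Average depth over the region (0.0 for an empty region)."""
    return sum(depth) / len(depth) if depth else 0.0
```

For example, two four-base reads starting at positions 0 and 2 of an eight-base region overlap on positions 2 and 3, so only those two positions reach a depth of 2.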
Base Quality Score Recalibration
BQSR model is built
BQSR model is applied
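The two BQSR steps above (build a model, then apply it) can be illustrated with a deliberately reduced Python sketch. It uses the reported quality score as the only covariate and a simple smoothed error rate, whereas GATK also models covariates such as read group, machine cycle and sequence context. The function names and the input layout (`(position, reported_quality, is_mismatch)` tuples) are assumptions made for this example.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_model(observations: Iterable[Tuple[int, int, bool]],
                known_sites: Set[int]) -> Dict[int, int]:
    """Map each reported quality to an empirical quality.

    Mismatches at known variant sites are skipped, because they are
    more likely to be real variation than sequencing errors.
    """
    counts = defaultdict(lambda: [0, 0])  # quality -> [bases seen, mismatches]
    for position, reported_quality, is_mismatch in observations:
        if position in known_sites:
            continue
        counts[reported_quality][0] += 1
        counts[reported_quality][1] += int(is_mismatch)

    model = {}
    for quality, (total, errors) in counts.items():
        # Add-one smoothing keeps the error rate away from exactly 0 or 1.
        error_rate = (errors + 1) / (total + 2)
        model[quality] = round(-10 * math.log10(error_rate))
    return model


def apply_model(qualities: List[int], model: Dict[int, int]) -> List[int]:
    """Replace reported qualities with empirical ones where the model has data."""
    return [model.get(quality, quality) for quality in qualities]
```

In this sketch, a quality value that was never observed while building the model is passed through unchanged rather than guessed.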
Merge to final alignment file
The results of all pre-processing steps are consolidated into one alignment file per sample
Trimmed, recalibrated alignment file
SNVs, small indels and structural variants are called
GATK HaplotypeCaller, Strelka2, FreeBayes, Manta, …
Merge multiple variant files
All variant-calling results are consolidated into one variant call file per sample
Variants are annotated with biological knowledge
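The consolidation of calls from several variant callers can be sketched as follows. This is an illustration only: it assumes each caller's output has already been reduced to hypothetical `(chrom, pos, ref, alt)` tuples, and it leaves out the header handling, normalization and genotype reconciliation that merging real variant files requires.

```python
from typing import Dict, Iterable, List, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def merge_calls(calls_by_caller: Dict[str, Iterable[Variant]]) -> List[Tuple[Variant, List[str]]]:
    """Combine per-caller variant lists into one list per sample.

    Each merged variant keeps the names of the callers that reported it.
    """
    support: Dict[Variant, set] = {}
    for caller, variants in calls_by_caller.items():
        for variant in variants:
            support.setdefault(variant, set()).add(caller)
    # Plain tuple sorting: chromosome names are ordered as text, not karyotypically.
    return [(variant, sorted(support[variant])) for variant in sorted(support)]
```

Keeping the supporting callers alongside each variant makes it possible to filter later on, for instance by requiring agreement between at least two callers.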
snpEff and VEP
Variant QC and reporting
Quality scores of variants are summarized
Summary statistics on variant categories, etc.
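As a rough picture of what such summary statistics can look like, the Python sketch below counts variants per category and reports the mean quality score and the transition/transversion ratio. The category names, the function names and the `(ref, alt, qual)` input layout are assumptions for this example and do not describe the contents of the actual reports.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def classify(ref: str, alt: str) -> str:
    """Assign a simple category based on allele lengths."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) < len(alt):
        return "insertion"
    if len(ref) > len(alt):
        return "deletion"
    return "MNV"  # same length, more than one base


def summarize(variants: Iterable[Tuple[str, str, float]]) -> Dict[str, object]:
    """Summarize (ref, alt, qual) records into counts, mean quality and Ti/Tv."""
    counts = Counter()
    qualities = []
    transitions = transversions = 0
    for ref, alt, qual in variants:
        category = classify(ref, alt)
        counts[category] += 1
        qualities.append(qual)
        if category == "SNV":
            if (ref.upper(), alt.upper()) in TRANSITIONS:
                transitions += 1
            else:
                transversions += 1
    return {
        "counts": dict(counts),
        "mean_qual": sum(qualities) / len(qualities) if qualities else None,
        "ti_tv": transitions / transversions if transversions else None,
    }
```

The Ti/Tv ratio is returned as `None` when no transversions are present, which avoids a division by zero on very small call sets.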