We've moved!
This site is now read-only. You can find our new documentation site and support forum for posting questions here.
Be sure to read our welcome blog!

After calling and variant recalibration data shows very low titv ratios, what could be the cause?

As per suggestions, I am sharing my analysis pipeline which I have used for calling variants for my exome data (Paired end), kindly guide me if any changes could improve my analysis:

I have followed GATK Best practices as much as I could.

I have used GATK v 3.3 for my analysis.

Raw data QC (FastQC) , Alignment using BWA MEM (-M) parameter. BAM file - sorted, RG header added, indexed, duplicates marked and removed.

Following are the commands I have used for for Indel realignment, Base recalibration, Variant Calling and Variant recalibration with my data:

Indel realignment:

java -Xmx4g -jar /share/apps/GenomeAnalysisTK-3.3-0.tar.bz2_FILES/GenomeAnalysisTK.jar -T RealignerTargetCreator -R hg19_ref.fa -o RG_index.marked.bam.list -I RG_index.marked.bam -known dbsnp_138.hg19.vcf -L nexterarapidcapture_exome_targetedregions_v1.2.bed

java -Xmx4g -jar /share/apps/GenomeAnalysisTK-3.3-0.tar.bz2_FILES/GenomeAnalysisTK.jar -I RG_index.marked.bam -T IndelRealigner -R hg19_ref.fa -targetIntervals RG_index.marked.bam.list -o RG_index.marked.realigned.bam

java -jar /share/apps/picard-tools-1.60/FixMateInformation.jar INPUT=RG_index.marked.realigned.bam OUTPUT=RG_index.marked.realigned.fixed.bam SO=coordinate VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true

Base Recalibration:

java -Xmx4g -jar /share/apps/GenomeAnalysisTK-3.3-0.tar.bz2_FILES/GenomeAnalysisTK.jar -T BaseRecalibrator -R hg19_ref.fa -I RG_index.marked.realigned.fixed.bam -knownSites dbsnp_138.hg19.vcf -knownSites Mills_and_1000G_gold_standard.indels.hg19.vcf -o recal_data.table -L nexterarapidcapture_exome_targetedregions_v1.2.bed

java -Xmx4g -jar /share/apps/GenomeAnalysisTK-3.3-0.tar.bz2_FILES/GenomeAnalysisTK.jar -T PrintReads -R hg19_ref.fa -I RG_index.marked.realigned.fixed.bam -BQSR recal_data.table -o RG_index.marked.realigned.fixed.recal.bam

java -Xmx4g -jar /share/apps/GenomeAnalysisTK-3.3-0.tar.bz2_FILES/GenomeAnalysisTK.jar -T BaseRecalibrator -R hg19_ref.fa -I RG_index.marked.realigned.fixed.recal.bam -knownSites dbsnp_138.hg19.vcf -knownSites Mills_and_1000G_gold_standard.indels.hg19.vcf -BQSR recal_data.table -o after_recal_data.table

java -Xmx4g -jar /share/apps/GenomeAnalysisTK-3.3-0.tar.bz2_FILES/GenomeAnalysisTK.jar -T AnalyzeCovariates -R hg19_ref.fa -before recal_data.table -after after_recal_data.table -plots recal_plots.pdf -csv xyz.csv

java -Xmx4g -jar /share/apps/GenomeAnalysisTK-3.3-0.tar.bz2_FILES/GenomeAnalysisTK.jar -glm BOTH -T UnifiedGenotyper -R hg19_ref.fa -I RG_index.marked.realigned.fixed.recal.bam -D dbsnp_138.hg19.vcf -L nexterarapidcapture_exome_targetedregions_v1.2.bed -stand_call_conf 30 -stand_emit_conf 20 -dcov 1000 -mbq 20 -o raw.vcf -metrics unified.metrics -log unifiedgenotyper.log

java -Xmx4g -jar /share/apps/GenomeAnalysisTK-3.3-0.tar.bz2_FILES/GenomeAnalysisTK.jar -T SelectVariants -R hg19_ref.fa -V raw.vcf -o raw_indel.vcf -selectType INDEL

java -Xmx4g -jar /share/apps/GenomeAnalysisTK-3.3-0.tar.bz2_FILES/GenomeAnalysisTK.jar -T SelectVariants -R hg19_ref.fa -V raw.vcf -o raw_snp.vcf -selectType SNP

java -Xmx4g -jar /share/apps/GenomeAnalysisTK-3.3-0.tar.bz2_FILES/GenomeAnalysisTK.jar -T VariantRecalibrator -R hg19_ref.fa -input raw_snp.vcf -recalFile snp_output.recal -tranchesFile snp_output.tranches --maxGaussians 4 -resource:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap_3.3.hg19.sites.vcf -resource:omni,known=false,training=true,truth=true,prior=12.0 1000G_omni2.5.hg19.sites.vcf -resource:1000G,known=false,training=true,truth=false,prior=10.0 1000G_phase1.snps.high_confidence.hg19.vcf -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 dbsnp_138.hg19.vcf -an QD -an HaplotypeScore -an FS -an MQ -an MQRankSum -an ReadPosRankSum -mode SNP -rscriptFile recal_snp.plots.R -titv 2.8 -tranche 100 -tranche 99.9 -tranche 99 -tranche 90

java -Xmx4g -jar /share/apps/GenomeAnalysisTK-3.3-0.tar.bz2_FILES/GenomeAnalysisTK.jar -T VariantRecalibrator -R hg19_ref.fa -input raw_indel.vcf -recalFile indel.output.recal -tranchesFile indel.output.tranches --maxGaussians 3 -resource:mills,known=false,training=true,truth=true,prior=12.0 Mills_and_1000G_gold_standard.indels.hg19.vcf -an QD -an MQRankSum -an ReadPosRankSum -an FS -an MQ -mode INDEL -rscriptFile recal_indel.plots.R

After these steps, I get recalibrated variants:

targetTruthSensitivity,numKnown,numNovel,knownTiTv,novelTiTv,minVQSLod,filterName,model,accessibleTruthSites,callsAtTruthSites,truthSensitivity
90.00,24841,534,2.8894,0.7919,3.1376,VQSRTrancheSNP0.00to90.00,SNP,19189,17270,0.9000
99.00,28610,656,2.8046,0.7218,0.3253,VQSRTrancheSNP90.00to99.00,SNP,19189,18997,0.9900
99.50,29383,731,2.7843,0.6961,-0.4800,VQSRTrancheSNP99.00to99.50,SNP,19189,19093,0.9950
99.90,33344,1460,2.6352,0.9185,-5.2153,VQSRTrancheSNP99.50to99.90,SNP,19189,19169,0.9990
100.00,35255,2126,2.5657,0.8455,-1287.5865,VQSRTrancheSNP99.90to100.00,SNP,19189,19189,1.0000

And a tranches plot which is attached below.

Your guidance would be helpful.

Thank you in advance!

Best Answer

Answers

Sign In or Register to comment.