VQSR extremely low numbers of TP variants in tranche.(0.01 novel variants?)Weird tranche and VQSR

wlaiwlai Member
Dear staff, (wondering why am only allowed to select "Zoo & Garden" for the category...HELP)

I have 45108 variants from 31 exome vcf files. After VariantRecalibrator, my tranche specificity shows the best Ti/Tv ratio at 57.8 which is very different from several examples from the tutorial. Is it normal and should i proceed with ApplyVQSR with -tranche cutoff level at 55.7? I have read several tutorials and video which tells me it is alright to lower down the value but i have not yet seen anyone has lowered it to below 80.

If this is not normal, should i switch to hardfilter?

Command:

gatk --java-options "-Xmx48g -Xms48g" VariantRecalibrator -R $reference_dir -V tmp_sitesonly.vcf.gz -O tmp_sitesonly_recal_snps.vcf.gz --tranches-file tmp_sitesonly.snps.tranches -an MQ -an DP -an QD -an FS -an ReadPosRankSum --max-gaussians 4 --trust-all-polymorphic -tranche 60 -tranche 55.9 -tranche 55.8 -tranche 55.7 -tranche 55.5 -tranche 55 -mode SNP --rscript-file tmp_Merged_snps.plot.R --resource hapmap,known=false,training=true,truth=true,prior=15.0:$hapmap --resource omni,known=false,training=true,truth=false,prior=12.0:$omni --resource 1000G,known=false,training=true,truth=false,prior=10.0:$oneksnp --resource dbsnp,known=true,training=false,truth=false,prior=2.0:$dbsnp

Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx48g -Xms48g -jar /home/cruadmin01/installations/gatk-4.0.12.0/gatk-package-4.0.12.0-local.jar VariantRecalibrator -R /home/Graceca/references/hs38/hs38DH.fa -V tmp_sitesonly.vcf.gz -O tmp_sitesonly_recal_snps.vcf.gz --tranches-file tmp_sitesonly.snps.tranches -an MQ -an DP -an QD -an FS -an ReadPosRankSum --max-gaussians 4 --trust-all-polymorphic -tranche 60 -tranche 55.9 -tranche 55.8 -tranche 55.7 -tranche 55.5 -tranche 55 -mode SNP --rscript-file tmp_Merged_snps.plot.R --resource hapmap,known=false,training=true,truth=true,prior=15.0:/home/Graceca/references/GATK/hapmap_3.3.hg38.vcf.gz --resource omni,known=false,training=true,truth=false,prior=12.0:/home/Graceca/references/GATK/1000G_omni2.5.hg38.vcf.gz --resource 1000G,known=false,training=true,truth=false,prior=10.0:/home/Graceca/references/GATK/1000G_phase1.snps.high_confidence.hg38.vcf.gz --resource dbsnp,known=true,training=false,truth=false,prior=2.0:/home/Graceca/references/GATK/dbsnp_146.hg38.vcf.gz
16:13:44.468 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/cruadmin01/installations/gatk-4.0.12.0/gatk-package-4.0.12.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
16:13:46.193 INFO VariantRecalibrator - ------------------------------------------------------------
16:13:46.193 INFO VariantRecalibrator - The Genome Analysis Toolkit (GATK) v4.0.12.0
16:13:46.194 INFO VariantRecalibrator - For support and documentation go to https://software.broadinstitute.org/gatk/
16:13:46.194 INFO VariantRecalibrator - Executing as [email protected] on Linux v3.13.0-103-generic amd64
16:13:46.194 INFO VariantRecalibrator - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_101-b13
16:13:46.194 INFO VariantRecalibrator - Start Date/Time: 22 March, 2019 4:13:44 PM SGT
16:13:46.194 INFO VariantRecalibrator - ------------------------------------------------------------
16:13:46.195 INFO VariantRecalibrator - ------------------------------------------------------------
16:13:46.195 INFO VariantRecalibrator - HTSJDK Version: 2.18.1
16:13:46.195 INFO VariantRecalibrator - Picard Version: 2.18.16
16:13:46.195 INFO VariantRecalibrator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
16:13:46.195 INFO VariantRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
16:13:46.196 INFO VariantRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
16:13:46.196 INFO VariantRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
16:13:46.196 INFO VariantRecalibrator - Deflater: IntelDeflater
16:13:46.196 INFO VariantRecalibrator - Inflater: IntelInflater
16:13:46.196 INFO VariantRecalibrator - GCS max retries/reopens: 20
16:13:46.196 INFO VariantRecalibrator - Requester pays: disabled
16:13:46.196 INFO VariantRecalibrator - Initializing engine
16:13:46.593 INFO FeatureManager - Using codec VCFCodec to read file file:///home/Graceca/references/GATK/hapmap_3.3.hg38.vcf.gz
16:13:46.733 INFO FeatureManager - Using codec VCFCodec to read file file:///home/Graceca/references/GATK/1000G_omni2.5.hg38.vcf.gz
16:13:46.833 INFO FeatureManager - Using codec VCFCodec to read file file:///home/Graceca/references/GATK/1000G_phase1.snps.high_confidence.hg38.vcf.gz
16:13:46.903 INFO FeatureManager - Using codec VCFCodec to read file file:///home/Graceca/references/GATK/dbsnp_146.hg38.vcf.gz
16:13:46.957 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt1/fastq/hs38dr/tmp_sitesonly.vcf.gz
16:13:47.041 WARN IndexUtils - Feature file "/home/Graceca/references/GATK/dbsnp_146.hg38.vcf.gz" appears to contain no sequence dictionary. Attempting to retrieve a sequence dictionary from the associated index file
16:13:47.148 INFO VariantRecalibrator - Done initializing engine
16:13:47.150 INFO TrainingSet - Found hapmap track: Known = false Training = true Truth = true Prior = Q15.0
16:13:47.150 INFO TrainingSet - Found omni track: Known = false Training = true Truth = false Prior = Q12.0
16:13:47.150 INFO TrainingSet - Found 1000G track: Known = false Training = true Truth = false Prior = Q10.0
16:13:47.150 INFO TrainingSet - Found dbsnp track: Known = true Training = false Truth = false Prior = Q2.0
16:13:47.206 INFO ProgressMeter - Starting traversal
16:13:47.206 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
16:13:52.601 INFO ProgressMeter - chr20:44357841 0.1 49559 551268.1
16:13:52.601 INFO ProgressMeter - Traversal complete. Processed 49559 total variants in 0.1 minutes.
16:13:52.616 INFO VariantDataManager - MQ: mean = 59.81 standard deviation = 1.40
16:13:52.629 INFO VariantDataManager - DP: mean = 90.09 standard deviation = 369.36
16:13:52.635 INFO VariantDataManager - QD: mean = 23.68 standard deviation = 7.09
16:13:52.640 INFO VariantDataManager - FS: mean = 0.16 standard deviation = 0.97
16:13:52.644 INFO VariantDataManager - ReadPosRankSum: mean = 0.15 standard deviation = 1.02
16:13:52.712 INFO VariantDataManager - Annotation order is: [DP, MQ, QD, FS, ReadPosRankSum]
16:13:52.718 INFO VariantDataManager - Training with 13094 variants after standard deviation thresholding.
16:13:52.720 INFO GaussianMixtureModel - Initializing model with 100 k-means iterations...
16:13:52.940 INFO VariantRecalibratorEngine - Finished iteration 0.
16:13:53.076 INFO VariantRecalibratorEngine - Finished iteration 5. Current change in mixture coefficients = 0.07580
16:13:53.165 INFO VariantRecalibratorEngine - Finished iteration 10. Current change in mixture coefficients = 0.06460
16:13:53.253 INFO VariantRecalibratorEngine - Finished iteration 15. Current change in mixture coefficients = 0.01033
16:13:53.341 INFO VariantRecalibratorEngine - Finished iteration 20. Current change in mixture coefficients = 0.01247
16:13:53.414 INFO VariantRecalibratorEngine - Convergence after 24 iterations!
16:13:53.434 INFO VariantRecalibratorEngine - Evaluating full set of 45108 variants...
16:13:54.081 INFO VariantDataManager - Selected worst 1354 scoring variants --> variants with LOD <= -5.0000.
16:13:54.081 INFO GaussianMixtureModel - Initializing model with 100 k-means iterations...
16:13:54.098 INFO VariantRecalibratorEngine - Finished iteration 0.
16:13:54.104 INFO VariantRecalibratorEngine - Finished iteration 5. Current change in mixture coefficients = 0.02325
16:13:54.110 INFO VariantRecalibratorEngine - Finished iteration 10. Current change in mixture coefficients = 0.00563
16:13:54.117 INFO VariantRecalibratorEngine - Finished iteration 15. Current change in mixture coefficients = 0.00108
16:13:54.117 INFO VariantRecalibratorEngine - Convergence after 15 iterations!
16:13:54.127 INFO VariantRecalibratorEngine - Evaluating full set of 45108 variants...
16:13:54.784 INFO TrancheManager - Finding 6 tranches for 45108 variants
16:13:54.809 INFO TrancheManager - TruthSensitivityTranche threshold 60.00 => selection metric threshold 0.400
16:13:54.821 INFO TrancheManager - Found tranche for 60.000: 0.400 threshold starting with variant 38815; running score is 0.400
16:13:54.821 INFO TrancheManager - TruthSensitivityTranche is TruthSensitivityTranche targetTruthSensitivity=60.00 minVQSLod=19.2431 known=(6221 @ 2.0883) novel=(72 @ 0.5000) truthSites(6689 accessible, 4013 called), name=anonymous]
16:13:54.821 INFO TrancheManager - TruthSensitivityTranche threshold 55.90 => selection metric threshold 0.441
16:13:54.827 INFO TrancheManager - Found tranche for 55.900: 0.441 threshold starting with variant 39577; running score is 0.441
16:13:54.827 INFO TrancheManager - TruthSensitivityTranche is TruthSensitivityTranche targetTruthSensitivity=55.90 minVQSLod=19.4768 known=(5517 @ 2.1017) novel=(14 @ 1.3333) truthSites(6689 accessible, 3739 called), name=anonymous]
16:13:54.827 INFO TrancheManager - TruthSensitivityTranche threshold 55.80 => selection metric threshold 0.442
16:13:54.830 INFO TrancheManager - Found tranche for 55.800: 0.442 threshold starting with variant 39592; running score is 0.442
16:13:54.830 INFO TrancheManager - TruthSensitivityTranche is TruthSensitivityTranche targetTruthSensitivity=55.80 minVQSLod=19.4814 known=(5503 @ 2.1025) novel=(13 @ 1.6000) truthSites(6689 accessible, 3732 called), name=anonymous]
16:13:54.830 INFO TrancheManager - TruthSensitivityTranche threshold 55.70 => selection metric threshold 0.443
16:13:54.833 INFO TrancheManager - Found tranche for 55.700: 0.443 threshold starting with variant 39607; running score is 0.443
16:13:54.833 INFO TrancheManager - TruthSensitivityTranche is TruthSensitivityTranche targetTruthSensitivity=55.70 minVQSLod=19.4869 known=(5489 @ 2.1034) novel=(12 @ 2.0000) truthSites(6689 accessible, 3725 called), name=anonymous]
16:13:54.833 INFO TrancheManager - TruthSensitivityTranche threshold 55.50 => selection metric threshold 0.445
16:13:54.836 INFO TrancheManager - Found tranche for 55.500: 0.445 threshold starting with variant 39643; running score is 0.445
16:13:54.836 INFO TrancheManager - TruthSensitivityTranche is TruthSensitivityTranche targetTruthSensitivity=55.50 minVQSLod=19.4990 known=(5455 @ 2.1041) novel=(10 @ 2.3333) truthSites(6689 accessible, 3712 called), name=anonymous]
16:13:54.836 INFO TrancheManager - TruthSensitivityTranche threshold 55.00 => selection metric threshold 0.450
16:13:54.839 INFO TrancheManager - Found tranche for 55.000: 0.450 threshold starting with variant 39744; running score is 0.450
16:13:54.839 INFO TrancheManager - TruthSensitivityTranche is TruthSensitivityTranche targetTruthSensitivity=55.00 minVQSLod=19.5381 known=(5359 @ 2.0886) novel=(5 @ 5.0000) truthSites(6689 accessible, 3678 called), name=anonymous]
16:13:54.840 INFO VariantRecalibrator - Writing out recalibration table...
16:13:55.390 INFO VariantRecalibrator - Writing out visualization Rscript file...
16:13:55.406 INFO VariantRecalibrator - Building DP x MQ plot...
16:13:55.410 INFO VariantRecalibratorEngine - Evaluating full set of 3660 variants...
16:13:55.856 INFO VariantRecalibratorEngine - Evaluating full set of 3660 variants...
16:13:56.079 INFO VariantRecalibrator - Building DP x QD plot...
16:13:56.082 INFO VariantRecalibratorEngine - Evaluating full set of 3660 variants...
16:13:56.424 INFO VariantRecalibratorEngine - Evaluating full set of 3660 variants...
16:13:56.700 INFO VariantRecalibrator - Building DP x FS plot...
16:13:56.703 INFO VariantRecalibratorEngine - Evaluating full set of 3660 variants...
16:13:57.044 INFO VariantRecalibratorEngine - Evaluating full set of 3660 variants...
16:13:57.255 INFO VariantRecalibrator - Building DP x ReadPosRankSum plot...
16:13:57.257 INFO VariantRecalibratorEngine - Evaluating full set of 3660 variants...
16:13:57.608 INFO VariantRecalibratorEngine - Evaluating full set of 3660 variants...
16:13:57.810 INFO VariantRecalibrator - Building MQ x QD plot...
16:13:57.811 INFO VariantRecalibratorEngine - Evaluating full set of 3721 variants...
16:13:58.207 INFO VariantRecalibratorEngine - Evaluating full set of 3721 variants...
16:13:58.395 INFO VariantRecalibrator - Building MQ x FS plot...
16:13:58.396 INFO VariantRecalibratorEngine - Evaluating full set of 3721 variants...
16:13:58.707 INFO VariantRecalibratorEngine - Evaluating full set of 3721 variants...
16:13:58.919 INFO VariantRecalibrator - Building MQ x ReadPosRankSum plot...
16:13:58.921 INFO VariantRecalibratorEngine - Evaluating full set of 3721 variants...
16:13:59.846 INFO VariantRecalibratorEngine - Evaluating full set of 3721 variants...
16:14:00.017 INFO VariantRecalibrator - Building QD x FS plot...
16:14:00.018 INFO VariantRecalibratorEngine - Evaluating full set of 3721 variants...
16:14:00.301 INFO VariantRecalibratorEngine - Evaluating full set of 3721 variants...
16:14:00.485 INFO VariantRecalibrator - Building QD x ReadPosRankSum plot...
16:14:00.485 INFO VariantRecalibratorEngine - Evaluating full set of 3721 variants...
16:14:00.768 INFO VariantRecalibratorEngine - Evaluating full set of 3721 variants...
16:14:00.944 INFO VariantRecalibrator - Building FS x ReadPosRankSum plot...
16:14:00.944 INFO VariantRecalibratorEngine - Evaluating full set of 3721 variants...
16:14:01.223 INFO VariantRecalibratorEngine - Evaluating full set of 3721 variants...
16:14:01.405 INFO VariantRecalibrator - Executing: Rscript /mnt1/fastq/hs38dr/tmp_Merged_snps.plot.R
16:14:12.480 INFO VariantRecalibrator - Executing: Rscript (resource)org/broadinstitute/hellbender/tools/walkers/vqsr/plot_Tranches.R /mnt1/fastq/hs38dr/tmp_sitesonly.snps.tranches 2.15



Last question, this is the result after ApplyVQSR:

chr2 88606308 . T *,A 597.39 PASS AC=2,10;AF=0.125,0.625;AN=16;DP=34;ExcessHet=0.0458;FS=0.000;MLEAC=7,24;MLEAF=0.438,1.00;MQ=60.00;POSITIVE_TRAIN_SITE;QD=29.11;SOR=0.804;VQSLOD=20.04;culprit=MQ GT:AD:DP:GQ:PGT:PID:PL:PS ./.:2,0,0:2:.:.:.:0,0,0,0,0,0 ./.:0,0,0:0:.:.:.:0,0,0,0,0,0 ./.:3,0,0:3:.:.:.:0,0,0,0,0,0 ./.:2,0,0:2:.:.:.:0,0,0,0,0,0 1/2:0,2,1:3:27:.:.:126,36,27,75,0,68 ./.:0,0,0:0:.:.:.:0,0,0,0,0,0 0/0:2,0,0:2:6:.:.:0,6,41,6,41,41 ./.:2,0,0:2:.:.:.:0,0,0,0,0,0 ./.:0,0,0:0:.:.:.:0,0,0,0,0,0 ./.:3,0,0:3:.:.:.:0,0,0,0,0,0 ./.:0,0,0:0:.:.:.:0,0,0,0,0,0 ./.:0,0,0:0:.:.:.:0,0,0,0,0,0 2|2:0,0,1:1:3:1|1:88606308_T_A:45,45,45,3,3,0:88606308 ./.:0,0,0:0:.:.:.:0,0,0,0,0,0 ./.:0,0,0:0:.:.:.:0,0,0,0,0,0 ./.:1,0,0:1:.:.:.:0,0,0,0,0,0 ./.:0,0,0:0:.:.:.:0,0,0,0,0,0 2|2:0,0,4:4:12:1|1:88606308_T_A:180,180,180,12,12,0:88606308 ./.:0,0,0:0:.:.:.:0,0,0,0,0,0 ./.:0,0,0:0:.:.:.:0,0,0,0,0,0 ./.:0,0,0:0:.:.:.:0,0,0,0,0,0 2|2:0,0,3:3:9:1|1:88606308_T_A:124,124,124,9,9,0:88606308 1/2:0,2,2:4:42:.:.:168,54,42,56,0,44 ./.:0,0,0:0:.:.:.:0,0,0,0,0,0 2|2:0,0,2:2:6:1|1:88606308_T_A:90,90,90,6,6,0:88606308 ./.:0,0,0:0:.:.:.:0,0,0,0,0,0 ./.:0,0,0:0:.:.:.:0,0,0,0,0,0 ./.:0,0,0:0:.:.:.:0,0,0,0,0,0 ./.:0,0,0:0:.:.:.:0,0,0,0,0,0 0/0:2,0,0:2:6:.:.:0,6,39,6,39,39 ./.:0,0,0:0:.:.:.:0,0,0,0,0,0

This result shows one of the variant with a PASS status for tranche = 55.7, I dont get why DP is 34 while there are no DP that is 34 in all 31 samples shown, therefore how does the DP value comes(calculated) from ? I feel there is something really weird with the output.. is it simply because of low DP thats why that looks a bit abnormal to me? Some samples seems like only have one or two DP!
Tagged:
Sign In or Register to comment.