BaseQRankSum variations with interval size in HaplotypeCaller

When analyzing the same sample with and without Queue, I noticed that a variant was filtered out in one of the runs, with VQSRTrancheSNP99.00to99.90 in the FILTER column.

While debugging the problem, I found that the size of the interval passed to HaplotypeCaller with -L can greatly influence both BaseQRankSum and ReadPosRankSum in the g.vcf file. The two commands below are identical except for the start coordinate of the interval.

Commands:
1)
java -Xmx8g -Djava.io.tmpdir=tmp -jar /com/extra/GATK/3.5/jar-bin/GenomeAnalysisTK.jar -T HaplotypeCaller -I BDD.sorted.markdup.realigned.recal.bam -R ucsc.hg19_chrY_PAR1_PAR2_masked.fasta -L chr5:171333106-177333146 --genotyping_mode DISCOVERY --dbsnp dbsnp_138.hg19.vcf -ERC GVCF -variant_index_type LINEAR -variant_index_parameter 128000 -o BDD.sorted.markdup.realigned.recal.HaplotypeCaller_gVCF_chr5.vcf.gz

2)
java -Xmx8g -Djava.io.tmpdir=tmp -jar /com/extra/GATK/3.5/jar-bin/GenomeAnalysisTK.jar -T HaplotypeCaller -I BDD.sorted.markdup.realigned.recal.bam -R ucsc.hg19_chrY_PAR1_PAR2_masked.fasta -L chr5:175333106-177333146 --genotyping_mode DISCOVERY --dbsnp dbsnp_138.hg19.vcf -ERC GVCF -variant_index_type LINEAR -variant_index_parameter 128000 -o BDD.sorted.markdup.realigned.recal.HaplotypeCaller_gVCF_chr5.vcf.gz

The results for the SNP in question in the g.vcf file:
1)
chr5 176333126 rs2292256 C T,<NON_REF> 5817.77 . BaseQRankSum=0.389;ClippingRankSum=2.280;DB;DP=314;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=-1.360;RAW_MQ=1130400.00;ReadPosRankSum=-1.733 GT:AD:DP:GQ:PGT:PID:PL:SB 0/1:154,160,0:314:99:0|1:176333126_C_T:5846,0,7455,6310,7937,14246:53,101,56,104

2)
chr5 176333126 rs2292256 C T,<NON_REF> 5817.77 . BaseQRankSum=-0.254;ClippingRankSum=0.132;DB;DP=314;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=-1.278;RAW_MQ=1130400.00;ReadPosRankSum=-1.679 GT:AD:DP:GQ:PGT:PID:PL:SB 0/1:154,160,0:314:99:0|1:176333126_C_T:5846,0,7455,6310,7937,14246:53,101,56,104

This difference is probably why the SNP is filtered in one run (without Queue) and not in the other (with Queue). That leaves me with the question of which result is more correct.

But why are these values different?
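
To see exactly which annotations differ between the two records, the INFO columns can be compared field by field. Below is a minimal sketch in Python; the helper names `parse_info` and `diff_info` are made up for illustration and are not part of GATK, and the two strings are simply copied from the records above.

```python
# INFO columns copied verbatim from the two g.vcf records in this post.
INFO_RUN1 = (
    "BaseQRankSum=0.389;ClippingRankSum=2.280;DB;DP=314;ExcessHet=3.0103;"
    "MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=-1.360;RAW_MQ=1130400.00;"
    "ReadPosRankSum=-1.733"
)
INFO_RUN2 = (
    "BaseQRankSum=-0.254;ClippingRankSum=0.132;DB;DP=314;ExcessHet=3.0103;"
    "MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=-1.278;RAW_MQ=1130400.00;"
    "ReadPosRankSum=-1.679"
)


def parse_info(info):
    """Split a VCF INFO string into a dict; flag-only keys (e.g. DB) map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def diff_info(a, b):
    """Return {key: (value_a, value_b)} for every key whose values differ."""
    fa, fb = parse_info(a), parse_info(b)
    return {
        key: (fa.get(key), fb.get(key))
        for key in sorted(set(fa) | set(fb))
        if fa.get(key) != fb.get(key)
    }


differences = diff_info(INFO_RUN1, INFO_RUN2)
for key, (run1, run2) in differences.items():
    print(f"{key}: run 1 = {run1}, run 2 = {run2}")
```

For these two records, only the four rank-sum annotations (BaseQRankSum, ClippingRankSum, MQRankSum and ReadPosRankSum) differ. DP, RAW_MQ, MLEAC, MLEAF and ExcessHet are identical, and so are the genotype columns.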
