We've moved!
This site is now read-only. You can find our new documentation site and support forum for posting questions here.
Be sure to read our welcome blog!

HaplotypeCaller output modes EMIT_ALL_CONFIDENT_SITES and EMIT_ALL_SITES not working

davidmarquesdavidmarques Univesity of BernMember
Dear GATK-Team,

First of all, thank you for your great support and constant development of GATK! I was very pleased to see that the output mode options EMIT_ALL_CONFIDENT_SITES and EMIT_ALL_SITES were reintroduced recently in HaplotypeCaller (GitHub Issue 2865, I am not allowed to post links yet). However, I think these options or not yet working.

When I am using the --output-mode options EMIT_ALL_CONFIDENT_SITES and EMIT_ALL_SITES, only some seemingly random monomorphic sites are reported with site quality QUAL="Infinity", instead of all sites (monomorphic sites and variants). This, despite many, many high depth covered monomorphic sites in my dataset. I get the same buggy result for single-sample calling and for multi-sample calling with HaplotypeCaller. I am using GATK version v4.1.2.0, java 1.8.0_101-b13.

Here is the commands I used to get all confident sites for a two samples:
gatk --java-options "-Xmx10G -Xms1G" HaplotypeCaller --tmp-dir ./temp -R ref.fasta -I 22363.subset.bam -I 22365.subset.bam -L scaffold_7232:1-1000 -O test.allconf.vcf.gz --pcr-indel-model NONE --output-mode EMIT_ALL_CONFIDENT_SITES

The resulting VCF file look like this (barring all except the last header lines):
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 22363 22365
scaffold_7232 49 . C T 539.88 . AC=1;AF=0.250;AN=4;BaseQRankSum=-0.334;DP=132;ExcessHet=3.0103;FS=0.857;MLEAC=1;MLEAF=0.250;MQ=42.22;MQRankSum=-0.781;QD=5.57;ReadPosRankSum=-1.285;SOR=0.578 GT:AD:DP:GQ:PL 0/1:74,23:97:99:548,0,2548 0/0:33,0:33:99:0,99,1296
scaffold_7232 241 . A . Infinity . AN=4;DP=332;MQ=43.51 GT:AD:DP 0/0:232:232 0/0:100:100
scaffold_7232 276 . A C 1514.46 . AC=2;AF=0.500;AN=4;BaseQRankSum=2.611;DP=318;ExcessHet=4.7712;FS=2.430;MLEAC=2;MLEAF=0.500;MQ=42.98;MQRankSum=-0.010;QD=4.76;ReadPosRankSum=1.388;SOR=0.544 GT:AD:DP:GQ:PL 0/1:172,41:213:99:1039,0,5920 0/1:85,20:105:99:485,0,2913
scaffold_7232 333 . T . Infinity . AN=4;DP=311;MQ=41.99 GT:AD:DP 0/0:202:202 0/0:109:109
scaffold_7232 343 . C T 4427.46 . AC=2;AF=0.500;AN=4;BaseQRankSum=0.018;DP=319;ExcessHet=4.7712;FS=5.050;MLEAC=2;MLEAF=0.500;MQ=41.79;MQRankSum=-1.223;QD=13.97;ReadPosRankSum=-0.937;SOR=0.901 GT:AD:DP:GQ:PL 0/1:84,120:204:99:3812,0,2721 0/1:88,25:113:99:625,0,2792
scaffold_7232 453 . G A 5311.46 . AC=2;AF=0.500;AN=4;BaseQRankSum=2.043;DP=303;ExcessHet=4.7712;FS=7.804;MLEAC=2;MLEAF=0.500;MQ=40.00;MQRankSum=-0.946;QD=17.65;ReadPosRankSum=0.174;SOR=0.761 GT:AD:DP:GQ:PL 0/1:91,98:189:99:3866,0,5942 0/1:72,40:112:99:1455,0,3886
scaffold_7232 460 . T C 5174.46 . AC=2;AF=0.500;AN=4;BaseQRankSum=1.691;DP=300;ExcessHet=4.7712;FS=7.886;MLEAC=2;MLEAF=0.500;MQ=39.96;MQRankSum=-1.060;QD=17.36;ReadPosRankSum=-0.211;SOR=0.674 GT:AD:DP:GQ:PL 0/1:90,95:185:99:3727,0,5928 0/1:73,40:113:99:1457,0,3901
scaffold_7232 475 . G . Infinity . AN=4;DP=306;MQ=39.88 GT:AD:DP 0/0:194:194 0/0:112:112
scaffold_7232 476 . A . Infinity . AN=4;DP=305;MQ=39.90 GT:AD:DP 0/0:193:193 0/0:112:112
scaffold_7232 485 . G A 4907.46 . AC=2;AF=0.500;AN=4;BaseQRankSum=2.955;DP=313;ExcessHet=4.7712;FS=0.000;MLEAC=2;MLEAF=0.500;MQ=39.98;MQRankSum=-0.180;QD=15.73;ReadPosRankSum=-1.691;SOR=0.694 GT:AD:DP:GQ:PL 0/1:107,95:202:99:2997,0,3343 0/1:51,59:110:99:1920,0,1412
scaffold_7232 589 . T TAACTTAGGTTGTCTTAAAGAG 1148.91 . AC=1;AF=0.250;AN=4;BaseQRankSum=2.466;DP=286;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.250;MQ=40.83;MQRankSum=-3.627;QD=6.64;ReadPosRankSum=-2.513;SOR=0.661 GT:AD:DP:GQ:PL 0/1:139,34:173:99:1157,0,6676 0/0:105,0:105:99:0,317,5193
scaffold_7232 592 . A C 2229.46 . AC=2;AF=0.500;AN=4;BaseQRankSum=2.278;DP=278;ExcessHet=4.7712;FS=6.428;MLEAC=2;MLEAF=0.500;MQ=40.93;MQRankSum=0.022;QD=8.02;ReadPosRankSum=-0.480;SOR=0.964 GT:AD:DP:GQ:PL 0/1:130,43:173:99:1242,0,4637 0/1:71,34:105:99:997,0,2331
scaffold_7232 682 . C CT 1404.91 . AC=1;AF=0.250;AN=4;BaseQRankSum=-3.692;DP=291;ExcessHet=3.0103;FS=7.626;MLEAC=1;MLEAF=0.250;MQ=43.48;MQRankSum=1.029;QD=13.01;ReadPosRankSum=-0.245;SOR=0.905 GT:AD:DP:GQ:PL 0/0:161,0:161:99:0,488,7902 0/1:68,40:108:99:1413,0,3038
scaffold_7232 696 . G A 1337.88 . AC=1;AF=0.250;AN=4;BaseQRankSum=-2.273;DP=287;ExcessHet=3.0103;FS=4.200;MLEAC=1;MLEAF=0.250;MQ=43.62;MQRankSum=0.599;QD=12.62;ReadPosRankSum=1.324;SOR=0.730 GT:AD:DP:GQ:PL 0/0:178,0:178:99:0,539,8590 0/1:69,37:106:99:1346,0,3079
scaffold_7232 874 . A . Infinity . AN=4;DP=284;MQ=42.99 GT:AD:DP 0/0:191:191 0/0:93:93
scaffold_7232 884 . C A 1741.46 . AC=2;AF=0.500;AN=4;BaseQRankSum=-1.238;DP=272;ExcessHet=4.7712;FS=3.370;MLEAC=2;MLEAF=0.500;MQ=42.96;MQRankSum=-0.331;QD=7.35;ReadPosRankSum=-1.538;SOR=0.486 GT:AD:DP:GQ:PL 0/1:113,43:156:99:1215,0,3899 0/1:59,22:81:99:536,0,2058
scaffold_7232 911 . C . Infinity . AN=4;DP=287;MQ=42.32 GT:AD:DP 0/0:197:197 0/0:90:90
'''

Note that the depth across the whole 1 kb scaffold is very high in each sample and that both mapping and base qualities are nearly all very high, so I don't understand why other monomorphic sites are not called as confident. When I repeat the same with the --output-mode option EMIT_ALL_SITES, I get the exact same sites called in the VCF file, with QUAL=Infinity. Also, when I call single individuals instead, I get the same output with Infinity QUAL fields and only a handfull of the almost 1000 covered sites.

I am happy to provide the files above for replicating the issues in a different context, please let me know how I can provide them.

Thank you and best wishes,
David

Issue · Github
by bhanuGandham

Issue Number
6063
State
open
Last Updated
Assignee
Array
Milestone
Array

Answers

Sign In or Register to comment.