HaplotypeCaller seems not to be working well for several of my samples

Hi,

I am using GATK v3.1-1-g07a4bf8. It seems that HaplotypeCaller made many more heterozygous calls than UnifiedGenotyper for several of my samples. As you can see from the scatter plot (each dot is one individual), for the individuals in the red parallelogram HaplotypeCaller made on average 1500 more heterozygous calls than UnifiedGenotyper. For the other individuals, the numbers of heterozygous genotypes called by the two callers are similar. One thing I should point out: the individuals in the red parallelogram were captured with a different capture array. However, I provided all BAM files to both HaplotypeCaller and UnifiedGenotyper (joint genotyping of all 87 samples) and used the same interval file. I checked the raw VCF file; one line is attached here.

16 29818864 . C G 534.97 PASS AC=24;AF=0.145;AN=166;BaseQRankSum=-4.753;ClippingRankSum=-2.284;DP=1251;FS=0.000;InbreedingCoeff=-0.1260;MLEAC=21;MLEAF=0.127;MQ=59.97;MQ0=0;MQRankSum=-0.374;NEGATIVE_TRAIN_SITE;QD=2.03;ReadPosRankSum=-4.426;VQSLOD=-1.830e+00;culprit=MQ GT:AD:DP:GQ:PL 0/1:20,0:20:40:40,0,1402 0/0:19,0:19:60:0,60,1306 0/0:19,0:19:23:0,23,1295 0/1:14,0:14:23:23,0,986 0/1:5,0:5:19:19,0,336 0/1:7,0:7:7:7,0,510 0/0:5,0:5:15:0,15,324 0/1:8,1:9:49:49,0,493 0/1:32,1:33:17:17,0,2292 0/0:7,0:7:21:0,21,451 0/1:7,0:7:8:8,0,388 0/0:6,0:6:14:0,14,399 0/1:25,0:25:9:9,0,1722 0/0:4,0:4:12:0,12,279 0/0:23,0:23:36:0,36,1487 0/0:22,0:22:39:0,39,1532 0/0:9,0:9:27:0,27,534 0/0:23,0:23:46:0,46,1714 0/0:26,0:26:51:0,51,1780 0/1:26,0:26:16:16,0,1855 0/0:32,1:33:53:0,53,2150 0/0:33,0:33:17:0,17,2247 0/0:20,0:20:4:0,4,1446 0/0:1,0:1:6:0,6,62 0/0:3,0:3:9:0,9,236 0/0:33,0:33:99:0,105,2403 0/0:31,0:31:93:0,93,2189 0/0:39,0:39:85:0,85,2665 0/0:36,0:36:79:0,79,2536 0/0:16,0:16:48:0,48,1105 0/0:1,0:1:3:0,3,58 0/0:18,0:18:54:0,54,1325 0/0:22,0:22:37:0,37,1594 0/0:35,0:35:45:0,45,2273 0/1:26,0:26:1:1,0,1774 ./. 0/0:3,0:3:9:0,9,196 0/1:18,0:18:9:9,0,1298 0/0:1,0:1:6:0,6,106 0/0:15,0:15:45:0,45,1102 0/0:3,0:3:9:0,9,218 0/0:35,1:36:57:0,57,2510 ./. ./. 0/0:18,0:18:57:0,57,1353 0/0:16,0:16:22:0,22,1239 0/0:13,0:13:39:0,39,854 0/0:3,0:3:9:0,9,185 0/0:2,0:2:12:0,12,210 0/0:16,0:16:48:0,48,1123 0/0:21,0:21:52:0,52,1495 0/0:2,0:2:9:0,9,192 0/1:20,2:22:41:41,0,1384 0/0:5,0:5:15:0,15,360 0/0:16,0:16:14:0,14,1185 0/0:1,0:1:6:0,6,97 0/0:2,0:2:6:0,6,144 0/0:3,0:3:9:0,9,174 0/0:4,0:4:12:0,12,293 0/0:1,0:1:3:0,3,69 0/0:28,0:28:84:0,84,1988 0/0:29,0:29:96:0,96,2174 0/0:33,0:33:76:0,76,2174 ./. 
0/0:3,0:3:9:0,9,225 0/0:20,0:20:29:0,29,1444 0/0:2,0:2:6:0,6,129 0/0:13,0:13:6:0,6,964 0/0:35,0:35:47:0,47,2432 0/0:25,1:26:62:0,62,1885 0/0:27,0:27:84:0,84,1806 0/0:31,0:31:79:0,79,2217 0/0:31,0:31:65:0,65,2182 0/0:34,0:34:99:0,114,2517 0/1:2,1:3:17:17,0,88 0/1:3,0:3:99:112,0,186 0/1:3,0:3:5:5,0,167 1/1:.:.:3:10,3,0 0/0:3,0:3:12:0,12,215 0/1:8,1:9:43:43,0,473 0/1:2,0:2:39:39,0,131 0/1:6,0:6:68:68,0,386 0/1:12,1:13:64:64,0,767 0/1:5,0:5:99:113,0,297 0/0:5,0:5:8:0,8,323 0/1:3,0:3:18:18,0,148 0/1:4,0:4:17:17,0,235

It looks like HaplotypeCaller incorrectly made heterozygous calls for some individuals, especially where the read depth is low.
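For anyone who wants to check how widespread this pattern is, here is a minimal Python sketch that scans one VCF record and lists the heterozygous calls whose AD field shows zero reads for the alternate allele. The function name is made up for illustration, and the example record is a shortened version of the line above (INFO trimmed to `AC=24`, only five of the sample columns kept). It assumes a whitespace-separated record with GT and AD in the FORMAT column.

```python
def het_calls_without_alt_reads(vcf_line):
    """Return (sample_index, sample_field) for het calls with zero ALT reads in AD."""
    fields = vcf_line.split()
    format_keys = fields[8].split(":")
    gt_index = format_keys.index("GT")
    ad_index = format_keys.index("AD")

    flagged = []
    for index, sample in enumerate(fields[9:]):
        values = sample.split(":")
        if len(values) <= ad_index:
            continue  # no-call such as "./." with no further subfields
        alleles = values[gt_index].replace("|", "/").split("/")
        if "." in alleles or len(set(alleles)) < 2:
            continue  # missing or homozygous genotype
        if values[ad_index] == ".":
            continue  # AD not reported for this sample
        depths = [int(d) for d in values[ad_index].split(",")]
        if sum(depths[1:]) == 0:  # depths[0] is the REF allele
            flagged.append((index, sample))
    return flagged


# Shortened illustration of the record above (not the full line).
record = (
    "16\t29818864\t.\tC\tG\t534.97\tPASS\tAC=24\tGT:AD:DP:GQ:PL\t"
    "0/1:20,0:20:40:40,0,1402\t0/0:19,0:19:60:0,60,1306\t./.\t"
    "0/1:8,1:9:49:49,0,493\t1/1:.:.:3:10,3,0"
)

for index, sample in het_calls_without_alt_reads(record):
    print(index, sample)
```

Running this over every record of both callers' VCFs, and tallying the hits per sample, would show whether the extra heterozygous calls are concentrated in calls of this kind.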

This is part of the log from the GATK.

INFO 11:06:37,484 HelpFormatter - --------------------------------------------------------------------------------
INFO 11:06:37,486 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.1-1-g07a4bf8, Compiled 2014/03/18 06:09:21
INFO 11:06:37,486 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 11:06:37,486 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 11:06:37,484 HelpFormatter - --------------------------------------------------------------------------------
INFO 11:06:37,487 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.1-1-g07a4bf8, Compiled 2014/03/18 06:09:21
INFO 11:06:37,487 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 11:06:37,487 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 11:06:37,490 HelpFormatter - Program Args: -T HaplotypeCaller -l INFO -R /data/houl3/Amish/Reference/human_g1k_v37_decoy.fasta -I sample1.bam -I sample2.bam -I sample3.bam -I ... -I sample87.bam --dbsnp /data/houl3/Amish/Reference/dbsnp_138.b37.vcf -o output.vcf --intervals Exome.bed -stand_call_conf 30.0 -stand_emit_conf 30.0

Answers

  • houliping (Member)

    Thank you very much for your quick response!

    I QCed the data with VQSR.
    One thing I can't understand is why HaplotypeCaller made heterozygous calls even when no reads carry the alternate allele.

    For example:

    GT:AD:DP:GQ:PL
    0/1:5,0:5:99:113,0,297
    0/1:6,0:6:68:68,0,386
    0/1:7,0:7:8:8,0,388

    -Liping
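One way to read the three examples above: the GT and GQ columns follow from the PL values, not from AD. For a biallelic diploid site the PL field lists the Phred-scaled likelihoods in the order 0/0, 0/1, 1/1, the called genotype is the one with PL 0, and GQ is the gap to the next-best PL, capped at 99. The sketch below (the function name is hypothetical) only checks that the reported GT and GQ are consistent with the PLs. It does not explain why the likelihoods favor 0/1 when AD shows no alternate reads.

```python
GENOTYPES = ("0/0", "0/1", "1/1")  # PL order for a biallelic diploid site


def call_from_pl(pl_field, gq_cap=99):
    """Derive (genotype, GQ) from a three-value PL string such as '113,0,297'."""
    pls = [int(value) for value in pl_field.split(",")]
    if len(pls) != len(GENOTYPES):
        raise ValueError("expected three PL values for a biallelic diploid site")
    best = min(range(len(pls)), key=pls.__getitem__)
    second_best_pl = sorted(pls)[1]
    gq = min(second_best_pl - pls[best], gq_cap)
    return GENOTYPES[best], gq


for pl in ("113,0,297", "68,0,386", "8,0,388"):
    print(pl, call_from_pl(pl))
```

All three rows come out as 0/1, with GQ values matching the posted ones (99, 68 and 8), so the genotype column is at least internally consistent with the likelihoods.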

  • houliping (Member)

    Ok, I will try the latest nightly build and keep you updated.

    Thanks,

    -Liping

  • houliping (Member)

    The nightly build did not call the variant I mentioned in my first post, and the numbers of heterozygotes called by HaplotypeCaller and UnifiedGenotyper are now similar. It looks like the nightly build is working well.

    But I am still seeing some variant calls like this:

    1 218607552 . T A 1216.56 PASS AC=13;AF=0.075;AN=174;BaseQRankSum=-5.511;ClippingRankSum=4.484;DP=8238;FS=0.000;InbreedingCoeff=-0.0808;MLEAC=13;MLEAF=0.075;MQ=60.00;MQ0=0;MQRankSum=0.163;QD=1.82;ReadPosRankSum=-2.663;VQSLOD=15.10;culprit=QD GT:AD:DP:GQ:PL 0/0:140,0:140:99:0,462,9946 0/0:135,0:135:99:0,476,9519 0/0:155,0:155:99:0,497,11165 0/0:136,0:136:99:0,462,9365 0/1:47,1:48:99:100,0,2831 0/0:54,1:55:80:0,80,3208 0/0:63,0:63:29:0,29,3511 0/0:73,2:75:77:0,77,4165 0/0:89,0:89:99:0,318,6764 0/0:74,1:75:36:0,36,4445 0/0:65,2:67:24:0,24,4061 0/0:62,2:64:27:0,27,3330 0/0:109,0:109:99:0,360,8154 0/0:62,1:63:69:0,69,3362 0/0:166,0:166:99:0,562,11348 0/0:153,0:153:99:0,519,11100 0/0:141,0:141:99:0,472,9820 0/0:125,0:125:99:0,429,8947 0/0:100,0:100:99:0,361,7570 0/0:100,0:100:99:0,321,6761 0/0:106,0:106:99:0,375,7920 0/0:121,0:121:99:0,400,8823 0/0:148,0:148:99:0,511,9887 0/0:60,0:60:99:0,211,4181 0/0:72,0:72:99:0,240,4615 0/0:109,0:109:99:0,379,8278 0/0:126,0:126:99:0,414,9193 0/0:131,0:131:99:0,398,9210 0/0:111,0:111:99:0,370,7929 0/0:116,0:116:99:0,386,8181 0/0:86,0:86:99:0,301,5931 0/0:51,0:51:99:0,178,3866 0/0:145,0:145:99:0,526,10819 0/0:129,0:129:99:0,438,9645 0/0:110,0:110:99:0,366,8214 0/0:91,0:91:99:0,318,6442 0/0:75,0:75:99:0,270,5383 0/0:126,0:126:99:0,444,8968 0/0:56,0:56:99:0,204,3949 0/0:143,0:143:99:0,476,11978 0/0:78,0:78:99:0,255,5103 0/0:107,0:107:99:0,370,7816 0/0:65,0:65:99:0,231,4655 0/0:92,0:92:99:0,286,6420 0/0:136,0:136:99:0,466,9514 0/0:160,0:160:99:0,520,10958 0/0:157,0:157:99:0,519,11508 0/0:80,0:80:99:0,251,5547 0/0:47,0:47:99:0,162,3260 0/0:126,0:126:99:0,462,9283 0/0:123,0:123:99:0,406,8383 0/0:55,0:55:99:0,198,3758 0/0:98,0:98:99:0,310,7336 0/0:51,0:51:99:0,189,3708 0/0:115,0:115:99:0,396,8225 0/0:78,0:78:99:0,253,5078 0/0:87,0:87:99:0,321,6248 0/0:86,0:86:99:0,280,5540 0/0:62,0:62:99:0,215,4102 0/0:89,0:89:99:0,315,6852 0/0:97,0:97:99:0,300,7260 0/0:113,0:113:99:0,385,8042 0/0:120,1:121:99:0,375,8915 0/0:69,0:69:99:0,229,4361 
0/0:49,0:49:99:0,161,3257 0/0:129,0:129:99:0,441,8898 0/0:68,1:69:99:0,244,4527 0/0:136,0:136:99:0,484,9827 0/0:142,0:142:99:0,471,10469 0/0:96,0:96:99:0,328,7129 0/0:99,0:99:99:0,327,7456 0/0:125,0:125:99:0,435,9484 0/0:106,0:106:99:0,343,7155 0/0:106,0:106:99:0,349,7757 0/1:50,1:51:99:162,0,2798 0/1:47,0:47:99:101,0,2664 0/1:32,0:32:31:31,0,1824 0/1:39,0:39:99:147,0,1998 0/1:38,1:39:99:164,0,2305 0/1:60,1:61:24:24,0,3393 0/1:56,1:57:88:88,0,3015 0/0:51,0:51:37:0,37,2893 0/1:51,1:52:72:72,0,2783 0/1:48,0:48:99:270,0,2845 0/1:77,0:77:97:97,0,4540 0/1:42,1:43:32:32,0,2520 0/1:50,0:50:99:103,0,2611

    Any suggestions?

    Thanks,

    Liping

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    @houliping‌

    Hi Liping,

    Have you tried using the -bamout argument? You can read about it here: http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_haplotypecaller_HaplotypeCaller.html#--bamOutput

    I think this could be due to HaplotypeCaller's reassembly of the reads in the active region. The output BAM file will show you the reads after reassembly.

    -Sheila
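As a rough illustration of the -bamout suggestion, the Python sketch below assembles a GATK 3 command line that reruns HaplotypeCaller on a small window around one site and writes the reassembled reads to a BAM file. The helper name, the 500 bp flank and the `debug` output prefix are assumptions made for this example. The reference path and `sample1.bam` are taken from the log in the first post, and `GenomeAnalysisTK.jar` is assumed to be in the working directory.

```python
def bamout_command(reference, bams, chrom, pos, flank=500,
                   jar="GenomeAnalysisTK.jar", out_prefix="debug"):
    """Build a HaplotypeCaller command restricted to a window around one site."""
    start = max(1, pos - flank)
    end = pos + flank
    command = ["java", "-jar", jar, "-T", "HaplotypeCaller", "-R", reference]
    for bam in bams:
        command += ["-I", bam]
    command += [
        "-L", f"{chrom}:{start}-{end}",   # limit the run to the region of interest
        "-bamout", f"{out_prefix}.bam",   # reads as realigned after reassembly
        "-o", f"{out_prefix}.vcf",
    ]
    return command


command = bamout_command(
    "/data/houl3/Amish/Reference/human_g1k_v37_decoy.fasta",
    ["sample1.bam"],
    "16",
    29818864,
)
print(" ".join(command))
```

The printed command can be pasted into a shell or passed to `subprocess.run(command, check=True)`. The resulting `debug.bam` can then be loaded in a genome browser next to the original BAM to compare the reads before and after reassembly.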
