Test-drive the GATK tools and Best Practices pipelines on Terra
Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
Missing positions in the GVCF file
It seems the banded gvcf produced by HaplotypeCaller is missing many positions in the genome. We are not able to find half the specific positions we want to look at. To be clear, these positions are not even contained in any band, they are simply not there. For example, we were looking for position 866438 on chromosome 1 and these are the relevant lines in the gvcf:
1 866433 . G C, 412.77 . BaseQRankSum=-1.085;DP=16;MLEAC=1,0;MLEAF=0.500,0.00;MQ=60.00;MQ0=0;MQRankSum=-1.519;ReadPosRankSum=0.651 GT:AD:GQ:PL:SB 0/1:15,1,0:99:441,0,23572,546,23612,24157:0,0,0,0
1 866434 . G . . END=866435 GT:DP:GQ:MIN_DP:PL 0/0:22:45:21:0,45,675
1 866436 . ACGTG A, 403.73 . BaseQRankSum=-0.510;DP=17;MLEAC=1,0;MLEAF=0.500,0.00;MQ=60.00;MQ0=0;MQRankSum=-0.714;ReadPosRankSum=0.434 GT:AD:GQ:PL:SB 0/1:16,1,0:99:441,0,23572,546,23612,24157:0,0,0,0
1 866441 . A . . END=866441 GT:DP:GQ:MIN_DP:PL 0/0:21:0:21:0,0,402
We looked at the corresponding bam files and there seems to be good coverage for these positions. Moreover, if HaplotypeCaller is run in BP_RESOLUTION mode, then these positions are present in the resulting vcf. Any idea what might be going wrong?