Hi there,

We're seeing some annotation values vary more than we would have expected when the downsample-to-coverage setting is adjusted. In the following example, BaseQRankSum, FS, and ReadPosRankSum change a fair bit. Is this expected behavior? Any suggestions as to why this might happen?

Downsample = 250

chr19 1228412 rs142990629 AAAGCTTGGG A 3444.73 . AC=1;AF=0.500;AN=2;BaseQRankSum=-1.674;ClippingRankSum=2.617;DB;DP=514;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=1.059;QD=6.70;ReadPosRankSum=0.173 GT:AD:DP:GQ:PL 0/1:355,108:463:99:3482,0,36818

Downsample = 2000

chr19 1228412 rs142990629 AAAGCTTGGG A 3811.73 . AC=1;AF=0.500;AN=2;BaseQRankSum=1.395;ClippingRankSum=2.358;DB;DP=532;FS=2.961;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=1.443;QD=7.16;ReadPosRankSum=2.027 GT:AD:DP:GQ:PL 0/1:380,119:499:99:3849,0,39507
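
For anyone who wants to see exactly which annotations differ between the two records, here is a minimal Python sketch that diffs the INFO columns. It is not part of GATK; the helper names `parse_info` and `diff_info` are made up for illustration, and the two strings are the INFO fields from the records above with stray spaces removed.

```python
def parse_info(info: str) -> dict:
    """Parse a VCF INFO string into a dict; flag-only keys (e.g. DB) map to True."""
    fields = {}
    for token in info.split(";"):
        token = token.strip()  # tolerate stray spaces introduced by copy-paste
        if not token:
            continue
        key, sep, value = token.partition("=")
        fields[key] = value if sep else True
    return fields


def diff_info(info_a: str, info_b: str) -> dict:
    """Return {key: (value_a, value_b)} for every annotation whose value differs."""
    a, b = parse_info(info_a), parse_info(info_b)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }


# INFO columns copied from the two records in this post (hypothetical variable names).
dcov_250 = (
    "AC=1;AF=0.500;AN=2;BaseQRankSum=-1.674;ClippingRankSum=2.617;DB;DP=514;"
    "FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=1.059;QD=6.70;"
    "ReadPosRankSum=0.173"
)
dcov_2000 = (
    "AC=1;AF=0.500;AN=2;BaseQRankSum=1.395;ClippingRankSum=2.358;DB;DP=532;"
    "FS=2.961;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=1.443;QD=7.16;"
    "ReadPosRankSum=2.027"
)

for key, (v250, v2000) in diff_info(dcov_250, dcov_2000).items():
    print(f"{key}: {v250} -> {v2000}")
```

Values are compared as strings, so this only reports whether an annotation changed, not whether the change is statistically meaningful.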


  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    Sorry for the late response -- were these called with UG or HC, and which version?

  • lmose (Member)

    These were called with HC. Definitely 3.X, although I'm not sure if it was 3.1 or 3.2.

  • freeseek (Member)

    I have the exact same problem. I tried the "-dt None" and "-dcov 10000" options with the HaplotypeCaller, and also "--maxReadsInRegionPerSample 10000", and nothing changed: the read count is never above a few hundred reads. With the UnifiedGenotyper, the counts are in the thousands. This is extremely deceptive, as there is no mention of this behavior in the HaplotypeCaller documentation (https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_haplotypecaller_HaplotypeCaller.php). I actually need to stop the downsampling because I am trying to call heteroplasmic mutations in the mitochondrial genome. Are there any workarounds, or should I just give up on the HaplotypeCaller for this? I am using GATK v3.3-0-g37228af.

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    @freeseek Apologies for the lack of clear documentation on the non-straightforward behavior of downsampling in HaplotypeCaller. You may be better off using UnifiedGenotyper in this case; though I wonder if you might not even benefit from a somatic caller like MuTect instead. It's not a use case I'm familiar with so I can't provide more help than that, sorry.

  • mjtiv, Newark, DE (Member)

    I'm slightly confused by how "downsampling" works. HaplotypeCaller downsamples an active region to a coverage of 500. What confuses me is that in a VCF's Allele Depth (AD) field, the counts for alleles are sometimes much larger than 500. How does this occur if there is downsampling? I'm asking specifically about GATK 3.7 HaplotypeCaller, in case newer versions have tweaked things.

