Lower GATK's HaplotypeCaller threshold for allele frequency as part of a variant calling pipeline

Hi!
As part of a variant calling pipeline, I'd like to lower the allele frequency threshold in GATK's HaplotypeCaller to 0.01 (1%), if possible. If that isn't possible, is there another variant caller that doesn't filter out variants with low allele frequencies?
Background: I know for a fact that there's at least one variant in my sample that is missing from my VCF file, presumably because variants with allele frequencies below 0.1 (10%) are filtered out during variant calling, as is the default in some programs. I can see the variant when I inspect the corresponding BAM file with samtools tview, and another lab has called that same variant.
Thanks in advance,
Alon
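To make the 10% vs 1% cutoff concrete: the allele frequency at a site is the fraction of reads supporting the alternate allele. The following is a minimal Python sketch, assuming you already have per-site reference and alternate read counts (for example, from the AD field of a VCF or from a pileup); the helper names and the read counts are hypothetical.

```python
def allele_fraction(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads at a site that support the alternate allele."""
    depth = ref_reads + alt_reads
    if depth == 0:
        raise ValueError("no reads cover this site")
    return alt_reads / depth


def passes_threshold(ref_reads: int, alt_reads: int, min_fraction: float) -> bool:
    """True if the alternate allele fraction reaches the given cutoff."""
    return allele_fraction(ref_reads, alt_reads) >= min_fraction


# Hypothetical site: 3 alternate reads out of 200 total.
ref, alt = 197, 3
print(f"allele fraction: {allele_fraction(ref, alt):.3f}")
print("kept at 10% cutoff:", passes_threshold(ref, alt, 0.10))
print("kept at 1% cutoff: ", passes_threshold(ref, alt, 0.01))
```

A variant like this one (1.5% of reads) disappears under a 10% cutoff but survives a 1% cutoff.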
Answers
@alons123
Hi Alon,
Do you know if the variant is real, or are you assuming it is real from the IGV screenshot? HaplotypeCaller performs a local realignment that may change the positions of the reads. You can use the --bamOutput argument to see the BAM file after realignment: https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_haplotypecaller_HaplotypeCaller.php#--bamOutput
Thanks,
Sheila
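For reference, here is a minimal Python sketch that assembles such a command. It assumes the GATK 4 `gatk` launcher, where the option's short name is `-bamout` (long form `--bam-output`); GATK 3, which the link above documents, spells the long form `--bamOutput` and is launched differently. The helper name and file names are hypothetical. The sketch only prints the command rather than running it.

```python
import shlex


def haplotypecaller_bamout_cmd(reference: str, bam: str, vcf_out: str, bamout: str) -> list[str]:
    """Build a GATK 4 HaplotypeCaller command that also writes realigned reads."""
    return [
        "gatk", "HaplotypeCaller",
        "-R", reference,    # reference FASTA (with its .fai index and .dict alongside)
        "-I", bam,          # input alignments
        "-O", vcf_out,      # variant calls
        "-bamout", bamout,  # reads realigned by HaplotypeCaller in its active regions
    ]


# Hypothetical file names; replace with your own.
cmd = haplotypecaller_bamout_cmd("ref.fasta", "sample.bam", "sample.vcf.gz", "sample.bamout.bam")
print(shlex.join(cmd))
```

The resulting bamout file can then be opened in IGV next to the original BAM to compare read placement at the site in question.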
Hi Sheila,
Thanks for the quick answer!
I'm now running HaplotypeCaller with the --bamOutput option as you suggested, and I'm waiting for the results.
Yes, I'm assuming it's real from the IGV screenshot, but also because a lab we work with found that variant and has it in their VCF file. I think it has to do with allele frequency, as I've managed to find it with another program after manually lowering that program's allele frequency threshold argument.
Thanks,
Alon
HC doesn't explicitly filter on allele frequency, but because of how its probability evaluation is set up, it's unlikely that a variant at such a low frequency would be called. You'd be better off either using an artificially high ploidy setting or using a somatic variant caller like MuTect, which can handle that sort of frequency.
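The two alternatives can be sketched as command lines. This assumes GATK 4, where HaplotypeCaller takes `-ploidy` (long form `--sample-ploidy`) and the somatic caller is Mutect2, the successor to the original MuTect, run here in tumor-only mode. The helper names, the file names, and the ploidy of 100 (chosen so that one allele copy corresponds to about 1% of reads) are illustrative assumptions, not recommended settings. Expect longer runtimes at high ploidy, because the number of possible genotypes grows with it.

```python
import shlex


def high_ploidy_hc_cmd(reference: str, bam: str, vcf_out: str, ploidy: int) -> list[str]:
    """HaplotypeCaller with an artificially high ploidy, so one allele copy
    corresponds to roughly 1/ploidy of the reads."""
    return ["gatk", "HaplotypeCaller",
            "-R", reference, "-I", bam, "-O", vcf_out,
            "-ploidy", str(ploidy)]


def mutect2_tumor_only_cmd(reference: str, bam: str, vcf_out: str) -> list[str]:
    """Mutect2 in tumor-only mode; it does not assume a fixed ploidy,
    so low allele fractions are not forced into 0/50/100% genotypes."""
    return ["gatk", "Mutect2",
            "-R", reference, "-I", bam, "-O", vcf_out]


target_fraction = 0.01
ploidy = round(1 / target_fraction)  # 1 copy out of 100 is about 1% of reads

print(shlex.join(high_ploidy_hc_cmd("ref.fasta", "sample.bam", "sample.hc.vcf.gz", ploidy)))
print(shlex.join(mutect2_tumor_only_cmd("ref.fasta", "sample.bam", "sample.mutect2.vcf.gz")))
```

In GATK 4, raw Mutect2 calls are normally passed through FilterMutectCalls before use.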
Thank you Geraldine,
I'll probably start working with MuTect as you suggested.
One thing, though: I don't quite understand why a high ploidy setting might solve it. Can you please explain?
@alons123
Hi Alon,
When you set a higher ploidy, the tool expects each allele copy to account for a smaller fraction of the reads, which effectively lowers the detection threshold. For example, with ploidy 2 the tool expects each allele copy to be present in approximately 50% of the reads, but with ploidy 3 it expects approximately 33% each.
I hope this helps.
-Sheila
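A toy calculation illustrates the point. This is a simplified binomial model, not HaplotypeCaller's actual genotype likelihood computation: with `k` alternate copies out of `ploidy`, each read is assumed to show the alternate allele with probability `k/ploidy`, adjusted by a uniform per-base error rate (0.001 here, an assumption), and the genotype with the highest likelihood wins under a flat prior. The function name and read counts are hypothetical.

```python
import math


def best_alt_copies(ref_reads: int, alt_reads: int, ploidy: int, error: float = 0.001) -> int:
    """Return the number of alternate allele copies (0..ploidy) that maximizes a
    simple binomial likelihood of the observed read counts."""
    best_k, best_loglik = 0, -math.inf
    for k in range(ploidy + 1):
        expected = k / ploidy
        # Probability a read shows the alt allele: alt copies read correctly,
        # plus ref copies miscalled as alt (a simplification).
        p_alt = expected * (1 - error) + (1 - expected) * error
        # The binomial coefficient is the same for every k, so it is omitted.
        loglik = alt_reads * math.log(p_alt) + ref_reads * math.log(1 - p_alt)
        if loglik > best_loglik:
            best_k, best_loglik = k, loglik
    return best_k


# Hypothetical site: 3 alternate reads out of 100.
for ploidy in (2, 3, 30, 100):
    k = best_alt_copies(ref_reads=97, alt_reads=3, ploidy=ploidy)
    print(f"ploidy {ploidy:>3}: best genotype has {k} alt copies "
          f"(expected alt fraction {k / ploidy:.0%})")
```

With ploidy 2 or 3 the only candidate fractions are 0%, 33% or 50%, and higher, so 3% alternate reads are best explained as sequencing error and the best genotype has 0 alt copies. With ploidy 30 or 100 there is a genotype whose expected fraction is close to 3%, so the alternate allele is kept.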
Thank you Sheila,
This helps tremendously; I completely understand now.
Thanks again everybody!