How does HaplotypeCaller discriminate between heterozygous and homozygous variants?

Dear members of the GATK team

I am using several GATK modules to detect SNPs in my RNA-seq data set. I did a test run for one individual to get an idea of the output of HaplotypeCaller. I know that I still need to filter my variants, but I was wondering how HaplotypeCaller decides whether a variant is heterozygous or homozygous. There must be other parameters that it takes into account (besides the AD values). Am I right?

Here is an example:

0|*|TRINITY_DN53108_c0_g1::TRINITY_DN53108_c0_g1_i1::g.132814::m.132814 7333 . G T 42.77 PASS AC=1;AF=0.500;AN=2;BaseQRankSum=1.644;ClippingRankSum=0.000;DP=21;ExcessHet=3.0103;FS=3.109;MLEAC=1;MLEAF=0.500;MQ=42.00;MQRankSum=0.000;QD=2.04;ReadPosRankSum=0.629;SOR=0.132 GT:AD:DP:GQ:PL 0/1:18,3:21:71:71,0,703

The genotype is 0/1 (G/T) and the AD is 18 to 3. Based on that, I would say that this site is homozygous.
Before I mapped the reads to the reference, I checked them with FastQC and did other processing steps such as adapter trimming. I also marked and removed duplicate reads from the BAM file. So I would say my reads are processed correctly and I can trust the final reads.

Nevertheless, with a ratio of 18:3, I would still call this site homozygous (just based on the AD values). I would change my mind if there is another value that matters for the decision, or if one can say: "If you trust your read files, then this ratio is still a reliable result for heterozygous variants."

But still, if I doubt the files:
Is there a way to filter the variants based on their AD values? For example, could I filter out all heterozygous variants whose allele ratio is more skewed than 30% : 70%?

Thanks in advance for your reply and I am looking forward to your answers.
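
To make the 30% : 70% idea concrete, here is a minimal Python sketch that computes the ALT read fraction from an AD string and flags heterozygous calls outside that window. The function names and the 0.30 threshold are hypothetical and only illustrate the arithmetic; this is not a GATK feature, and the reply below advises against filtering on AD alone.

```python
def alt_allele_fraction(ad_field):
    """Fraction of reads supporting the first ALT allele, or None if unusable."""
    if "." in ad_field:
        return None  # missing AD value
    depths = [int(x) for x in ad_field.split(",")]
    total = sum(depths)
    if total == 0 or len(depths) < 2:
        return None
    return depths[1] / total


def is_unbalanced_het(gt, ad_field, min_fraction=0.30):
    """True if GT is heterozygous and the ALT fraction lies outside
    [min_fraction, 1 - min_fraction]. The threshold is an illustrative choice."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles or len(set(alleles)) < 2:
        return False  # missing or homozygous genotype
    frac = alt_allele_fraction(ad_field)
    return frac is not None and not (min_fraction <= frac <= 1 - min_fraction)


# Values taken from the example record: GT=0/1, AD=18,3
print(round(alt_allele_fraction("18,3"), 3))  # 0.143 (3 of 21 reads)
print(is_unbalanced_het("0/1", "18,3"))       # True
```

For the example record, 3 of 21 reads support T, so the site would be flagged under this rule even though the caller's genotype likelihoods favor 0/1.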


  • Sheila (Broad Institute; Member, Broadie ✭✭✭✭✭)

    Hi Julia,

    There are indeed other factors that HaplotypeCaller takes into account. Have a look in the Methods and Algorithms section for step-by-step articles about the way HaplotypeCaller works.

    As for filtering on AD values, we don't recommend it, because the tool takes into account factors other than just the allele fraction. For example, a site could have a bunch of reads that support the ref allele but have low base quality, while just a few reads support an alternate allele but have very high base quality.

    I hope that helps.
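
As an illustration of those "other factors", the PL field of the example record (71,0,703) already summarizes the caller's decision. Per the VCF specification, PL holds Phred-scaled genotype likelihoods, normalized so the most likely genotype is 0, in the order 0/0, 0/1, 1/1 for a biallelic diploid site. The Python sketch below decodes them; the variable and function names are hypothetical, and the GQ rule shown (second-lowest PL, capped at 99) is stated as an assumption about how GATK derives GQ, although it matches GQ=71 in this record.

```python
def pl_to_relative_likelihoods(pls):
    """Convert normalized Phred-scaled PLs to likelihoods relative to the best genotype."""
    return [10 ** (-pl / 10) for pl in pls]


genotypes = ["0/0", "0/1", "1/1"]  # PL order for a biallelic diploid site
pls = [int(x) for x in "71,0,703".split(",")]  # PL from the example record

best = genotypes[pls.index(min(pls))]  # genotype with the lowest PL
gq = min(sorted(pls)[1], 99)           # assumed GQ rule: second-lowest PL, capped at 99
rel = pl_to_relative_likelihoods(pls)

print(best)  # 0/1
print(gq)    # 71
for g, r in zip(genotypes, rel):
    print(g, f"{r:.3g}")
```

Read this way, the homozygous-reference genotype is about 10^7.1 times less likely than the heterozygous one, and the homozygous-alternate genotype about 10^70.3 times less likely, given the reads and their qualities. That is why the site is called 0/1 despite the 18:3 read split.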