How does HaplotypeCaller discriminate between heterozygous and homozygous variants?
Dear members of the GATK team,
I am using several GATK tools to detect SNPs in my RNA-seq data set. I did a test run for one individual to get an idea of the HaplotypeCaller output. I know that I still need to filter my variants, but I was wondering how HaplotypeCaller decides whether a variant is heterozygous or homozygous. There must be another parameter that is taken into account (other than the AD values). Am I right?
Here is an example:
0|*|TRINITY_DN53108_c0_g1::TRINITY_DN53108_c0_g1_i1::g.132814::m.132814 7333 . G T 42.77 PASS AC=1;AF=0.500;AN=2;BaseQRankSum=1.644;ClippingRankSum=0.000;DP=21;ExcessHet=3.0103;FS=3.109;MLEAC=1;MLEAF=0.500;MQ=42.00;MQRankSum=0.000;QD=2.04;ReadPosRankSum=0.629;SOR=0.132 GT:AD:DP:GQ:PL 0/1:18,3:21:71:71,0,703
The genotype is 0/1 (G/T) and the AD is 18 to 3. Based on these counts alone, I would say that this site is homozygous.
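To make the numbers concrete, here is a minimal Python sketch that unpacks the sample column of the record above and computes the fraction of reads supporting the alternate allele. The function names `parse_sample` and `alt_fraction` are my own and not part of GATK, and the sketch assumes a biallelic site with a complete AD field.

```python
def parse_sample(format_field, sample_field):
    """Pair the FORMAT keys with the sample values of one VCF record."""
    return dict(zip(format_field.split(":"), sample_field.split(":")))


def alt_fraction(ad_field):
    """Fraction of reads supporting the first ALT allele, from an AD string like '18,3'."""
    depths = [int(x) for x in ad_field.split(",")]
    total = sum(depths)
    # Return None instead of dividing by zero when no reads are counted.
    return depths[1] / total if total else None


# FORMAT and sample columns copied from the example record.
sample = parse_sample("GT:AD:DP:GQ:PL", "0/1:18,3:21:71:71,0,703")
print(sample["GT"], sample["AD"], sample["PL"])
print(round(alt_fraction(sample["AD"]), 3))
```

For this record the alternate allele is supported by 3 of 21 reads, i.e. roughly 14%.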
Before I mapped the reads to the reference, I filtered the reads with FastQC and did other processing steps such as adapter trimming. I also marked and removed duplicate reads from the BAM file. So I would say that my reads are processed correctly and that I can trust the final reads.
Nevertheless, with a ratio of 18:3, I would still suggest a homozygous variant (just based on the AD values). I would change my mind if there is another value that is important for the decision, or if one can say: "If you trust your read files, then this ratio is still a reliable result for heterozygous variants."
But still, in case I doubt the files:
Is there any way to filter the variants based on their AD values? An example would be to filter out all heterozygous variants whose allele ratio is more skewed than 30% : 70%.
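To illustrate the kind of filter I have in mind, here is a minimal Python sketch. It is not an existing GATK option; the function names, the `min_fraction` parameter and the contig name `contig1` are hypothetical. It assumes a single-sample VCF in which every record has GT and AD in the FORMAT column. A heterozygous call is kept only if the less-supported of its two called alleles accounts for at least 30% of the AD reads, which for a biallelic site means the same as staying within 30% : 70%.

```python
def passes_allele_balance(gt, ad, min_fraction=0.30):
    """Return False only for heterozygous calls whose minor called allele is under-supported."""
    alleles = gt.replace("|", "/").split("/")
    # Non-diploid, missing, and homozygous genotypes are not tested.
    if len(alleles) != 2 or "." in alleles or alleles[0] == alleles[1]:
        return True
    depths = [int(x) for x in ad.split(",")]
    total = sum(depths)
    if total == 0:
        return False  # no reads support the heterozygous call
    minor = min(depths[int(a)] for a in alleles)
    return minor / total >= min_fraction


def filter_records(lines, min_fraction=0.30):
    """Yield header lines and the records that pass the allele-balance test."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        cols = line.rstrip("\n").split("\t")
        # Column 9 is FORMAT, column 10 is the single sample (assumption).
        fields = dict(zip(cols[8].split(":"), cols[9].split(":")))
        if passes_allele_balance(fields["GT"], fields["AD"], min_fraction):
            yield line


# Shortened version of the example record; 'contig1' is a placeholder name.
record = "\t".join(["contig1", "7333", ".", "G", "T", "42.77", "PASS",
                    "DP=21", "GT:AD:DP:GQ:PL", "0/1:18,3:21:71:71,0,703"])
kept = list(filter_records(["##fileformat=VCFv4.2", record]))
print(len(kept))
```

With the 30% threshold, the 18:3 record from my example would be removed and only the header line would remain.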
Thanks in advance for your reply. I am looking forward to your answers.