How does HaplotypeCaller discriminate between heterozygous and homozygous variants?
Dear members of the GATK team
I am using several GATK tools to detect SNPs in my RNA-seq data set. I did a test run on one individual to get an idea of the output of HaplotypeCaller. I know that I still need to filter my variants, but I was wondering how HaplotypeCaller decides whether a variant is heterozygous or homozygous. There must be another parameter that it takes into account (other than the AD values). Am I right?
Here is an example:
0|*|TRINITY_DN53108_c0_g1::TRINITY_DN53108_c0_g1_i1::g.132814::m.132814 7333 . G T 42.77 PASS AC=1;AF=0.500;AN=2;BaseQRankSum=1.644;ClippingRankSum=0.000;DP=21;ExcessHet=3.0103;FS=3.109;MLEAC=1;MLEAF=0.500;MQ=42.00;MQRankSum=0.000;QD=2.04;ReadPosRankSum=0.629;SOR=0.132 GT:AD:DP:GQ:PL 0/1:18,3:21:71:71,0,703
The genotype is 0/1 (G/T) and the AD is 18 to 3. Based on that alone, I would say that this site is homozygous.
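For reference, the sample column of this record carries more than AD: PL holds the Phred-scaled genotype likelihoods, normalized so that the most likely genotype gets 0. For a biallelic diploid site the VCF specification orders them as 0/0, 0/1, 1/1. The following is a minimal Python sketch (not GATK code; the helper names are made up for illustration) that unpacks the fields of the record above:

```python
# Sample column copied from the record above (FORMAT = GT:AD:DP:GQ:PL).
FORMAT = "GT:AD:DP:GQ:PL"
SAMPLE = "0/1:18,3:21:71:71,0,703"


def parse_sample(fmt, sample):
    """Pair each FORMAT key with the corresponding sample value."""
    return dict(zip(fmt.split(":"), sample.split(":")))


def alt_fraction(ad):
    """Fraction of reads supporting ALT (assumes a biallelic site)."""
    ref, alt = (int(x) for x in ad.split(","))
    total = ref + alt
    return alt / total if total else None


def pl_by_genotype(pl):
    """Map the three PL values of a biallelic diploid site to genotypes."""
    return dict(zip(("0/0", "0/1", "1/1"), (int(x) for x in pl.split(","))))


fields = parse_sample(FORMAT, SAMPLE)
pls = pl_by_genotype(fields["PL"])
best = min(pls, key=pls.get)   # genotype with the lowest PL
ranked = sorted(pls.values())  # PLs from most to least likely

print(fields["GT"], best)                     # reported vs. lowest-PL genotype
print(round(alt_fraction(fields["AD"]), 3))   # ALT read fraction
print(ranked[1] - ranked[0])                  # gap between best and runner-up
```

For this record the lowest PL belongs to 0/1, the ALT fraction is 3/21 (about 0.143), and the gap between the best and second-best PL is 71, which matches the reported GQ. So the call rests on the likelihoods in PL rather than on the raw AD ratio: 0/0 is the runner-up at PL 71, and 1/1 is far behind at PL 703.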
Before mapping the reads to the reference, I quality-checked them with FastQC and did other processing steps such as adapter trimming. I also marked and removed duplicate reads from the BAM file. So I would say my reads are processed correctly and I can trust the final reads.
Nevertheless, with a ratio of 18:3, I would still suggest a homozygous call (just based on the AD values). I would change my mind if there is another value that matters for the decision, or if one can say: "If you trust your read files, then this ratio is still a reliable result for heterozygous variants."
But still, if I doubt the files:
Is there a way to filter the variants based on their AD values? For example, could I filter out all heterozygous variants whose allele ratio is more skewed than 30% : 70%?
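To make the intended filter concrete, here is a minimal sketch in plain Python. It is not a GATK feature; the function name and the default thresholds are hypothetical and simply encode the 30% : 70% rule from the question, assuming biallelic sites with AD given as "ref,alt".

```python
def het_passes_allele_balance(gt, ad, low=0.30, high=0.70):
    """Return False only for het calls whose ALT fraction is outside [low, high].

    Assumes a biallelic site, so AD has exactly two values: ref,alt.
    """
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles or len(set(alleles)) < 2:
        return True  # no-call or homozygous: this filter does not apply

    ref, alt = (int(x) for x in ad.split(","))
    total = ref + alt
    if total == 0:
        return False  # a het call without informative reads is not trusted

    return low <= alt / total <= high


# The record from the question: 3 of 21 reads support ALT.
print(het_passes_allele_balance("0/1", "18,3"))
# A balanced het and a homozygous call, for comparison.
print(het_passes_allele_balance("0/1", "10,11"))
print(het_passes_allele_balance("1/1", "0,20"))
```

With these thresholds the example record would be flagged, since 3/21 is below 30%, while the balanced het and the homozygous call pass. Note that in RNA-seq data allele-specific expression can skew the ratio for real heterozygous sites, so a hard cutoff like this may also remove true variants.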
Thanks in advance for your reply and I am looking forward to your answers.