HaplotypeCaller and GenotypeGVCFs sensitivity on heterozygous variants
Hello, I recently compared results from the GATK best practices (bwa, Picard, HaplotypeCaller, GenotypeGVCFs) with a SNP array set (a high-confidence method for detecting known variants) for 6 samples (data from an Illumina HiSeq 2500) and got a really interesting confusion matrix.
This means that GATK (like any other caller) has trouble calling heterozygous variants. We are discussing the causes of this phenomenon and how HC+GG deal with it.
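For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of how the confusion matrix and the het sensitivity could be computed. It assumes the array and sequencing genotypes have already been reduced to per-site classes; the class labels, the dictionary layout and the function names are hypothetical, not part of GATK.

```python
from collections import Counter

# Hypothetical genotype classes; "no_call" covers sites missing from the call set.
GENOTYPE_CLASSES = ("hom_ref", "het", "hom_alt", "no_call")


def confusion_matrix(truth, calls):
    """Count (array genotype, sequencing genotype) pairs over the array sites.

    Both arguments map a site key, e.g. (chrom, pos), to a genotype class.
    """
    counts = Counter()
    for site, truth_gt in truth.items():
        call_gt = calls.get(site, "no_call")
        counts[(truth_gt, call_gt)] += 1
    return counts


def het_sensitivity(counts):
    """Fraction of array het sites that were also called het by sequencing."""
    total = sum(n for (truth_gt, _), n in counts.items() if truth_gt == "het")
    return counts[("het", "het")] / total if total else float("nan")


# Toy data, made up purely to show the call pattern.
array_gt = {("1", 100): "het", ("1", 200): "het", ("1", 300): "hom_alt", ("2", 50): "het"}
gatk_gt = {("1", 100): "het", ("1", 200): "hom_ref", ("1", 300): "hom_alt"}

matrix = confusion_matrix(array_gt, gatk_gt)
sensitivity = het_sensitivity(matrix)
```

The "het" row of the matrix is the one of interest here: array hets that end up in the "hom_ref" or "no_call" columns are the missed calls.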
At first we thought it was a DP problem and yes, it is: when we kept only variants with DP>20, the het column changed to:
This means that the proportion of ref to alt bases at a site is critical when calling heterozygous variants.
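To illustrate why depth matters, here is a rough back-of-the-envelope sketch. It assumes each read at a true het site carries the alt allele independently with probability 0.5 (a simple binomial model, not HaplotypeCaller's actual genotype likelihood model) and asks how often a site would show at most k alt reads. The function name is hypothetical.

```python
from math import comb


def prob_at_most_k_alt(depth, k, alt_fraction=0.5):
    """P(alt reads <= k) at a het site under a simple binomial sampling model.

    Assumption: reads are independent and each carries the alt allele with
    probability `alt_fraction`; sequencing errors and mapping bias are ignored.
    """
    return sum(
        comb(depth, i) * alt_fraction**i * (1 - alt_fraction) ** (depth - i)
        for i in range(k + 1)
    )


# Chance of seeing 0 or 1 alt reads at a true het site, at two depths.
p_low_depth = prob_at_most_k_alt(10, 1)   # 11 / 1024, about 1.1%
p_high_depth = prob_at_most_k_alt(20, 1)  # 21 / 2**20, about 0.002%
```

Under this simplified model, a true het with almost no alt support is much more likely at low depth, which is consistent with the improvement we saw after filtering on DP.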
We hope you can give us more ideas on the causes of this problem and on how we can move those het variants that were called as wild type into the called variants, even at the cost of getting more false positives.
We used bwa 0.7.10-r789 and GATK 3.7-0-gcfedb67.