HaplotypeCaller and GenotypeGVCFs sensitivity on heterozygous variants
Hello, I recently compared results from the GATK Best Practices pipeline (bwa, Picard, HaplotypeCaller, GenotypeGVCFs) against a SNP array set (a high-confidence method for detecting known variants) for 6 samples (Illumina HiSeq 2500 data) and got a really interesting confusion matrix.
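A matrix like this can be tabulated by pairing each array genotype with the GATK genotype at the same site. The sketch below is a minimal illustration, not the exact script we used: it assumes biallelic SNPs, genotypes already extracted as VCF-style `GT` strings keyed by a hypothetical `(chrom, pos)` site ID, and it counts array sites missing from the GATK VCF as no-calls (`./.`). If GenotypeGVCFs emitted only variant sites, many of those "missing" sites may really be hom-ref.

```python
from collections import Counter

# Genotype classes on both axes; "./." is a no-call (assumes biallelic SNPs).
CLASSES = ["0/0", "0/1", "1/1", "./."]


def normalize(gt):
    """Map a VCF GT string to an unphased class, e.g. '1/0' or '0|1' -> '0/1'."""
    if gt is None or "." in gt:
        return "./."
    return "/".join(sorted(gt.replace("|", "/").split("/")))


def confusion_matrix(array_calls, gatk_calls):
    """Count (array genotype, GATK genotype) pairs over all array sites."""
    counts = Counter()
    for site, truth in array_calls.items():
        called = gatk_calls.get(site)  # absent from the VCF -> treated as no-call
        counts[(normalize(truth), normalize(called))] += 1
    return counts


def het_sensitivity(counts):
    """Fraction of array het sites that GATK also calls het."""
    total = sum(counts[("0/1", c)] for c in CLASSES)
    return counts[("0/1", "0/1")] / total if total else float("nan")


def print_matrix(counts):
    """Rows are array genotypes, columns are GATK genotypes."""
    print(f"{'array/gatk':>10} " + " ".join(f"{c:>6}" for c in CLASSES))
    for truth in CLASSES:
        print(f"{truth:>10} " + " ".join(f"{counts[(truth, c)]:>6}" for c in CLASSES))


if __name__ == "__main__":
    # Toy genotypes for illustration only (not our data).
    array = {("1", 100): "0/1", ("1", 200): "0/1", ("1", 300): "1/1", ("1", 400): "0/0"}
    gatk = {("1", 100): "0/1", ("1", 200): "0/0", ("1", 300): "1/1"}
    m = confusion_matrix(array, gatk)
    print_matrix(m)
    print("het sensitivity:", het_sensitivity(m))
```

Het variants called as wild type show up in the `("0/1", "0/0")` cell, which is the cell this thread is about.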
This shows that GATK, like any other caller, has trouble calling heterozygous variants. We are discussing the causes of this phenomenon and how HC+GG deal with it.
At first we thought it was a depth (DP) problem, and it is: when keeping only variants with DP>20, the het column changed to:
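The effect of a depth cutoff on the het column can be checked by recomputing het sensitivity at several DP thresholds. A minimal sketch, assuming one hypothetical record `(array_gt, gatk_gt, dp)` per sample and array site, with genotypes already normalized and a depth value available even where GATK did not call a variant (for example taken from the gVCF reference blocks).

```python
# Each record: (array_gt, gatk_gt, dp) for one sample at one array site.
# Assumes genotypes are already normalized to "0/0", "0/1", "1/1" or "./.".


def het_sensitivity_by_depth(records, thresholds):
    """Return {threshold: (n_array_hets, fraction called het)} for sites with DP > threshold."""
    result = {}
    for t in thresholds:
        hets = [gatk for array_gt, gatk, dp in records if array_gt == "0/1" and dp > t]
        called = sum(1 for g in hets if g == "0/1")
        result[t] = (len(hets), called / len(hets) if hets else float("nan"))
    return result


if __name__ == "__main__":
    # Toy records for illustration only (not our data).
    records = [
        ("0/1", "0/0", 6), ("0/1", "0/1", 8), ("0/1", "0/0", 12),
        ("0/1", "0/1", 25), ("0/1", "0/1", 31), ("0/0", "0/0", 40),
    ]
    for t, (n, frac) in het_sensitivity_by_depth(records, [0, 10, 20]).items():
        print(f"DP > {t:>2}: {n} array hets, {frac:.2f} called het")
```

Keep in mind that a DP cutoff also removes low-depth sites from the denominator, so part of the apparent gain comes from dropping the hardest sites rather than from calling them correctly.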
This suggests that the proportion of ref/alt reads (the allele balance) is critical when calling heterozygous variants.
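One way to see why depth and allele balance interact is a simple binomial model: at a true het site, assume each read carries the alt allele with probability 0.5, ignoring sequencing error, reference/mapping bias and duplicates (all simplifying assumptions). This is only an illustration, not how HaplotypeCaller computes genotype likelihoods; `allele_balance` and `prob_alt_at_most` are hypothetical helpers.

```python
from math import comb


def allele_balance(ad):
    """Alt fraction from a biallelic AD field given as (ref_count, alt_count)."""
    ref, alt = ad
    total = ref + alt
    return alt / total if total else float("nan")


def prob_alt_at_most(depth, k, p=0.5):
    """P(X <= k) for X ~ Binomial(depth, p): chance of seeing at most k alt reads."""
    return sum(comb(depth, i) * p**i * (1 - p) ** (depth - i) for i in range(k + 1))


if __name__ == "__main__":
    # How often does a true het show an allele balance of 20% or less?
    for depth in (5, 10, 20, 30):
        k = depth // 5  # 20% of depth, rounded down (integer arithmetic avoids float error)
        print(f"DP={depth:>2}: P(AB <= 0.2) = {prob_alt_at_most(depth, k):.4f}")
    print(allele_balance((17, 3)))  # 0.15
```

Under this toy model, a true het at DP=10 shows 20% or fewer alt reads about 5% of the time (56/1024), versus well under 0.1% at DP=30, which is consistent with low-depth hets being the ones called as wild type.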
We hope you can give us more ideas about the causes of this problem and how we can move those het variants that were called as wild type (hom-ref) into the called set, even at the cost of getting more false positives.
We used bwa 0.7.10-r789 and gatk 3.7-0-gcfedb67