HaplotypeCaller and GenotypeGVCFs sensitivity on heterozygous variants

Pascual (Las Palmas), Member
edited July 2017 in Ask the GATK team

Hello, I recently compared results from the GATK Best Practices workflow (bwa, Picard, HaplotypeCaller, GenotypeGVCFs) with a SNP array set (a high-confidence method for detecting known variants) for 6 samples (data from an Illumina HiSeq 2500) and got a really interesting confusion matrix. Rows are GATK calls and columns are SNP array calls.

gatk \ snp-array       wild       het       hom
wild                109,575    20,122        63
het                      60    44,579        28
hom                     378    26,493    28,402

This means that GATK (like any other caller) has trouble calling heterozygous variants: of the sites the array reports as het, 20,122 were called wild and 26,493 were called hom. We are discussing the causes of this phenomenon and how HC+GG deal with it.
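To put a number on this, here is a minimal sketch that computes, for each array genotype class, the fraction of sites where GATK made the same call. The counts are copied from the first matrix above; the function name `column_concordance` is just an illustrative helper, not part of any GATK tool.

```python
# Rows: GATK call; columns: SNP array call (order: wild, het, hom).
# Counts are taken from the unfiltered confusion matrix in this post.
LABELS = ["wild", "het", "hom"]
unfiltered = [
    [109575, 20122, 63],
    [60, 44579, 28],
    [378, 26493, 28402],
]


def column_concordance(matrix, label):
    """Fraction of array genotypes of class `label` that GATK called identically."""
    j = LABELS.index(label)
    column_total = sum(row[j] for row in matrix)
    return matrix[j][j] / column_total


het_concordance = column_concordance(unfiltered, "het")
hom_concordance = column_concordance(unfiltered, "hom")
print(f"het: {het_concordance:.3f}")  # het: 0.489
print(f"hom: {hom_concordance:.3f}")  # hom: 0.997
```

So only about half of the array het sites get a het call from GATK, while the hom column is almost fully concordant.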

At first we thought it was a DP problem, and yes, it is: when keeping only variants with DP>20, the matrix (and the het column in particular) becomes:

gatk \ snp-array       wild       het       hom
wild                 46,323     1,524        42
het                      22    32,337        14
hom                     273     1,325     9,207

This means that the proportion of ref/alt bases is critical when calling heterozygous variants.
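For anyone who wants to look at the same quantities per site, here is a rough sketch of pulling the depth and the alt-read fraction out of a single sample column of a VCF. It assumes the DP in question is the per-sample FORMAT DP and that AD is present, as in typical HaplotypeCaller output; the function name and the example genotype string are made up for illustration and are not from our data.

```python
def depth_and_alt_fraction(format_field, sample_field):
    """Return (DP, alt read fraction) for one sample column of a VCF record.

    Assumes FORMAT keys DP and AD; either value is None when missing.
    """
    fields = dict(zip(format_field.split(":"), sample_field.split(":")))

    dp_raw = fields.get("DP", ".")
    dp = None if dp_raw == "." else int(dp_raw)

    ad_raw = fields.get("AD", ".")
    ad_parts = ad_raw.split(",")
    if "." in ad_parts:
        return dp, None

    counts = [int(x) for x in ad_parts]  # first entry is ref, the rest are alts
    total = sum(counts)
    alt_fraction = None if total == 0 else sum(counts[1:]) / total
    return dp, alt_fraction


# Illustrative values only (hypothetical het call with 14 ref and 9 alt reads).
dp, alt_fraction = depth_and_alt_fraction("GT:AD:DP:GQ:PL", "0/1:14,9:23:99:255,0,380")
passes_depth_filter = dp is not None and dp > 20
```

A het site is expected to have an alt fraction around 0.5, so at low depth a few reads either way can push the call towards wild or hom.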

We hope you can give us more ideas on the causes of this problem and on how we can recover the het variants that are currently called wild, even at the cost of getting more false positives.

We used bwa 0.7.10-r789 and GATK 3.7-0-gcfedb67.
