CombineGVCFs assigns incorrect reference allele in GATK3.7 and GATK4
I'm analyzing 284 exomes using GATK 3.7-0-gcfedb67. My workflow is to run HaplotypeCaller on each individual exome, split across 32 BED files. After HaplotypeCaller, I merge the 32 split g.vcf files with CombineGVCFs to create a single g.vcf per individual, and then genotype all individuals together with GenotypeGVCFs.

In my last BED file, I've run into an issue: for some individuals, a site that is missing from the post-HaplotypeCaller g.vcf shows up in the combined g.vcf with the wrong reference allele. For other individuals, the site is present in the post-HaplotypeCaller g.vcf, and for them it is assigned the correct reference allele after CombineGVCFs. This discordance causes problems downstream, because GenotypeGVCFs sees multiple reference alleles at the same site and throws an error.

I used the same reference genome for every step of the pipeline, and I haven't found a discussion on the GATK forum that solves this. I'm using GATK 3.7, but I also tried running CombineGVCFs in GATK 4 and the output file had the same issue.

Any help on this would be appreciated. Thanks!
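For anyone trying to pin down which sites are affected, here is a minimal Python sketch that scans several combined g.vcf files and reports positions where the files disagree on the reference base. It is not part of GATK; the names `open_vcf`, `first_ref_bases`, and `find_ref_conflicts` are hypothetical helpers made up for illustration. It assumes standard tab-delimited VCF records and compares only the first base of REF among records that start at the same POS, since REF strings of different lengths (for example at deletions) can legitimately differ beyond the first base.

```python
import gzip
import sys
from collections import defaultdict


def open_vcf(path):
    """Open a plain-text or gzip/bgzip-compressed VCF for reading as text."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path, "rt")


def first_ref_bases(lines):
    """Yield ((chrom, pos), first base of REF) for each record line."""
    for line in lines:
        # Skip blank lines and both kinds of header lines (## and #CHROM).
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        yield (chrom, pos), ref[0].upper()


def find_ref_conflicts(records_by_file):
    """Return {site: {base: [labels]}} for sites where files disagree on REF.

    records_by_file maps a label (e.g. a file name) to an iterable of VCF lines.
    Only records that start at the same position are compared.
    """
    seen = defaultdict(lambda: defaultdict(list))
    for label, lines in records_by_file.items():
        for site, base in first_ref_bases(lines):
            if label not in seen[site][base]:
                seen[site][base].append(label)
    return {site: dict(bases) for site, bases in seen.items() if len(bases) > 1}


def main(paths):
    """Print every site whose first REF base differs between the given files."""
    handles = {path: open_vcf(path) for path in paths}
    try:
        conflicts = find_ref_conflicts(handles)
    finally:
        for handle in handles.values():
            handle.close()
    for (chrom, pos), bases in sorted(conflicts.items()):
        print(chrom, pos, bases)


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

Run as a script with the combined g.vcf paths as arguments, it prints one line per discordant site together with the files carrying each base. Comparing the reported base against the reference FASTA at that position would then show which files hold the incorrect allele. Note that the sketch keeps every site in memory, so for 284 exome-sized files it may be worth restricting the input to the affected interval first.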