1000 Genomes reference GVCF files for GenotypeGVCFs

AswathyN · Bangalore · Member

I am using GATK 2014.3-3.2.2-7-gf9cba99. I use the GenotypeGVCFs tool for joint genotyping of my samples, including the 1000 Genomes reference GVCF files along with all the GVCF files in my batch. After this step, when I split out the individual sample VCF files, I see many more variants (1328 in my sample). After VQSR filtering, the same sample has 230 variants.

I also ran GenotypeGVCFs on the same batch of samples (GVCF files) without the 1000 Genomes reference GVCF files. After splitting, the same sample's VCF file had 419 variants, and after VQSR filtering it had 268 variants. My questions are:

1) What is the impact of including the 1000 Genomes GVCF files in joint genotyping?
2) Why does VQSR filtering reduce the variant count so much more when the 1000 Genomes GVCF files are included (1328 to 230, versus 419 to 268 without them)?

Thanks in advance,
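A minimal sketch for reproducing the per-sample counts described above, assuming a plain or gzipped multi-sample VCF produced by GenotypeGVCFs; the helper names (`open_vcf`, `is_non_ref`, `count_sample_variants`) are hypothetical. It counts only records where the chosen sample carries a non-reference allele in its GT field. As an assumption not confirmed in this thread: if a per-sample split keeps every site of the joint callset, sites where this sample is homozygous-reference or uncalled would also be counted, which could inflate the total when many extra samples are genotyped together. Setting `pass_only=True` approximates the post-VQSR count by keeping only records whose FILTER is `PASS` (or `.`, meaning unfiltered).

```python
import gzip


def open_vcf(path):
    """Open a plain or gzip-compressed VCF for text reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def is_non_ref(gt):
    """True if a GT string such as '0/1' or '1|1' carries a non-reference allele."""
    alleles = gt.replace("|", "/").split("/")
    # '0' is the reference allele and '.' is a missing call.
    return any(a not in ("0", ".") for a in alleles)


def count_sample_variants(lines, sample, pass_only=False):
    """Count records where `sample` has a non-reference genotype.

    Hypothetical helper for illustration; `lines` is any iterable of VCF lines.
    """
    col = None
    count = 0
    for line in lines:
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            col = fields.index(sample)  # column holding this sample's data
            continue
        # FILTER is column 7: 'PASS' passed filtering, '.' means no filter was applied.
        if pass_only and fields[6] not in ("PASS", "."):
            continue
        fmt_keys = fields[8].split(":")
        values = fields[col].split(":")
        if "GT" in fmt_keys and is_non_ref(values[fmt_keys.index("GT")]):
            count += 1
    return count
```

For example, `count_sample_variants(open_vcf("joint.vcf.gz"), "MY_SAMPLE")`, where the file and sample names are placeholders, can be run on the callsets with and without the 1000 Genomes GVCFs so that both counts are computed the same way.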



  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie admin

    It's difficult to say based on this information alone. It sounds like the 1000 Genomes samples are causing you to call more potential variants, but after filtering the effect is minimal: the results end up being fairly similar. That's actually a pretty good sign. You should check whether the calls that differ are marginal calls, and whether the technical profiles of the 1000 Genomes samples you used are similar to those of your own samples.
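One way to act on the suggestion to check whether the differing calls are marginal is sketched below, with hypothetical helper names (`load_calls`, `diff_calls`, `median_qual`). It loads one sample's non-reference calls from each joint callset, keyed by (CHROM, POS, REF, ALT), and returns the calls unique to each set with their QUAL values. Using QUAL as the measure of a marginal call is an assumption; INFO annotations such as QD or VQSLOD may be more informative. Exact key matching will also treat the same variant as different if the two callsets represent a multiallelic site differently.

```python
import statistics


def load_calls(lines, sample):
    """Map (CHROM, POS, REF, ALT) -> QUAL for records where `sample` is non-reference.

    Hypothetical helper; assumes GT is the first FORMAT key, as the VCF spec requires.
    """
    calls = {}
    col = None
    for line in lines:
        if line.startswith("##"):
            continue  # meta-information lines
        f = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            col = f.index(sample)  # column holding this sample's data
            continue
        gt = f[col].split(":")[0]
        alleles = gt.replace("|", "/").split("/")
        if any(a not in ("0", ".") for a in alleles):
            qual = None if f[5] == "." else float(f[5])  # QUAL is column 6
            calls[(f[0], int(f[1]), f[3], f[4])] = qual
    return calls


def diff_calls(a, b):
    """Return (calls only in a, calls only in b), keeping each set's QUAL values."""
    only_a = {k: a[k] for k in a.keys() - b.keys()}
    only_b = {k: b[k] for k in b.keys() - a.keys()}
    return only_a, only_b


def median_qual(calls):
    """Median QUAL of a call set, ignoring missing values; None if there are none."""
    quals = [q for q in calls.values() if q is not None]
    return statistics.median(quals) if quals else None
```

Comparing `median_qual` of each unique set against that of the shared calls (for example `{k: a[k] for k in a.keys() & b.keys()}`) gives a quick, rough sense of whether the extra calls are low-confidence.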
