Joint/batch variant calling when too few variants per sample

Hi there,

We are sequencing a set of regions that covers about 1.5 megabases in total. We're running into problems with VQSR -- VariantRecalibrator says there are too few variants to do recalibration. To give a sense of numbers, in one sample we have about 3000 SNVs and 600 indels.

We seem to have too few indels to do VQSR on them and have a couple of questions:

  1. Can we combine multiple samples to increase the number of variants, or does VariantRecalibrator need to work on each sample individually?

  2. If we do not use VQSR for indels, should we also avoid VQSR for the SNPs?

  3. Would joint or batch variant calling across several samples help in this case?

Thanks in advance!

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    VQSR definitely works better on multisample cohorts than on single samples. For exomes we recommend processing at least 30 samples together. If you do not have enough samples in your cohort, you can get additional exomes from the 1000 Genomes project, call them together with your samples and then recalibrate them all together.

    If you're still having trouble with the indels, it's OK to do VQSR on the SNPs but hard-filter the indels, as long as you analyze the results separately.
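
    A minimal sketch of the indel half of that split, assuming GATK4-style command syntax (SelectVariants followed by VariantFiltration). The file names, filter names and threshold values here are hypothetical placeholders, not recommendations from this thread; check the current hard-filtering documentation before picking real thresholds. The SNPs would go through VariantRecalibrator separately.

```python
import shlex


def indel_hard_filter_commands(cohort_vcf, indels_vcf, filtered_vcf, filters):
    """Build two command lines: extract indels, then hard-filter them.

    `filters` maps a filter name to a JEXL expression. Records matching an
    expression are annotated with that name in the FILTER column; they are
    not removed from the output file.
    """
    # Step 1: pull only the indel records out of the multisample callset.
    select = [
        "gatk", "SelectVariants",
        "-V", cohort_vcf,
        "--select-type-to-include", "INDEL",
        "-O", indels_vcf,
    ]

    # Step 2: apply each hard filter to the indel-only file.
    filt = ["gatk", "VariantFiltration", "-V", indels_vcf]
    for name, expression in filters.items():
        filt += ["--filter-name", name, "--filter-expression", expression]
    filt += ["-O", filtered_vcf]

    return [select, filt]


# Hypothetical thresholds, for illustration only.
EXAMPLE_FILTERS = {"QD2": "QD < 2.0", "FS200": "FS > 200.0"}

if __name__ == "__main__":
    commands = indel_hard_filter_commands(
        "cohort.vcf.gz",
        "cohort.indels.vcf.gz",
        "cohort.indels.filtered.vcf.gz",
        EXAMPLE_FILTERS,
    )
    for command in commands:
        # shlex.join quotes the expressions so the line can be pasted into a shell.
        print(shlex.join(command))
```

    The script only prints the command lines; nothing is run. Keeping the SNP and indel outputs in separate files makes it easier to analyze the two result sets separately, as suggested above.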
