VariantFiltration and VariantRecalibrator

Hello,
I have 2 questions:
I want to use VariantFiltration after HaplotypeCaller. There are different filtering recommendations for SNPs and INDELs. How should I run VariantFiltration? Should I create separate VCFs for SNPs and INDELs, or is there a mode for this?
The second question is regarding VariantRecalibrator. It is recommended to run it with several samples. What does that mean: should these samples be family related, or should they be from the same sequencing run? Currently I run each sample separately, but I do have many samples to analyze. All were sequenced on the same sequencing platform, by the same lab, but not necessarily in the same sequencing run. Can I use them all together for VariantRecalibrator?
Many thanks for your help,
Lily
Answers
@Lily
Hi Lily,
You can look here for a nice tutorial on how to use Variant Filtration: http://gatkforums.broadinstitute.org/discussion/2806/howto-apply-hard-filters-to-a-call-set
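As a rough illustration of the approach in that tutorial (split the call set by variant type with SelectVariants, then run VariantFiltration on each subset with its own expression), here is a minimal Python sketch that only builds the GATK 3.x command lines and prints them. The jar path, file names and filter names are placeholders, and the thresholds are example values as I recall them from the hard-filtering recommendations, so please check them against the tutorial before using them.

```python
import shlex

# Assumption: GATK 3.x jar in the working directory; adjust to your install.
GATK = ["java", "-jar", "GenomeAnalysisTK.jar"]

# Example thresholds only; verify against the linked tutorial before use.
SNP_FILTER = ("QD < 2.0 || FS > 60.0 || MQ < 40.0 || "
              "MQRankSum < -12.5 || ReadPosRankSum < -8.0")
INDEL_FILTER = "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0"


def hard_filter_commands(reference, raw_vcf, variant_type, expression):
    """Return [select_cmd, filter_cmd] for one variant type (SNP or INDEL)."""
    tag = variant_type.lower()
    subset = f"raw_{tag}s.vcf"          # placeholder output names
    filtered = f"filtered_{tag}s.vcf"
    select = GATK + ["-T", "SelectVariants", "-R", reference, "-V", raw_vcf,
                     "-selectType", variant_type, "-o", subset]
    filt = GATK + ["-T", "VariantFiltration", "-R", reference, "-V", subset,
                   "--filterExpression", expression,
                   "--filterName", f"my_{tag}_filter", "-o", filtered]
    return [select, filt]


if __name__ == "__main__":
    # Print the four commands (two per variant type) without running them.
    for vtype, expr in (("SNP", SNP_FILTER), ("INDEL", INDEL_FILTER)):
        for cmd in hard_filter_commands("reference.fa", "raw_variants.vcf",
                                        vtype, expr):
            print(" ".join(shlex.quote(arg) for arg in cmd))
```

Sites that fail an expression are kept in the output and marked with the filter name in the FILTER column; they are not deleted.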
As for VariantRecalibrator, it is best to run it with many samples because the model it builds will then be more accurate. The samples do not have to be family related or from the same sequencing run, because the earlier processing steps have already accounted for potential errors and discrepancies arising from those differences.
In your case, it is fine to use all of your samples together in VQSR even if they were run separately.
-Sheila
Hi Sheila,
Thank you very much for your answer.
The link for the Variant Filtration was really clear and helpful.
I still have a question about the Variant Recalibrator - and I apologize if it appears in the tutorial - I couldn't find a clear answer....
How should I run the samples together? Should I run them all together through HaplotypeCaller to create one VCF, combine them after HaplotypeCaller using CombineVariants, give all the VCFs as inputs to VariantRecalibrator, or do something else?
Hi Lily,
If you're using a 3.x version of GATK (and you should) you will run HC on your samples individually to generate GVCFs, then run all your GVCFs through GenotypeGVCFs together. This will produce a multi-sample VCF that you can then put through VQSR.
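To make the two steps concrete, here is a minimal Python sketch, assuming GATK 3.x and placeholder sample and file names, that builds one HaplotypeCaller command per sample in GVCF mode and then a single GenotypeGVCFs command over all the resulting GVCFs. It only prints the commands. Some 3.x releases needed extra index arguments for GVCF output, so check the HaplotypeCaller documentation for your exact version.

```python
import shlex

# Assumption: GATK 3.x jar in the working directory; adjust to your install.
GATK = ["java", "-jar", "GenomeAnalysisTK.jar"]


def haplotypecaller_gvcf(reference, bam, gvcf):
    """Per-sample HaplotypeCaller command in GVCF mode."""
    return GATK + ["-T", "HaplotypeCaller", "-R", reference, "-I", bam,
                   "--emitRefConfidence", "GVCF", "-o", gvcf]


def genotype_gvcfs(reference, gvcfs, out_vcf):
    """Joint genotyping command over all per-sample GVCFs."""
    cmd = GATK + ["-T", "GenotypeGVCFs", "-R", reference]
    for gvcf in gvcfs:
        cmd += ["--variant", gvcf]
    return cmd + ["-o", out_vcf]


if __name__ == "__main__":
    # Hypothetical sample names and BAM files, for illustration only.
    bams = {"sampleA": "sampleA.bam", "sampleB": "sampleB.bam"}
    gvcfs = []
    for name, bam in bams.items():
        gvcf = f"{name}.g.vcf"
        gvcfs.append(gvcf)
        cmd = haplotypecaller_gvcf("reference.fa", bam, gvcf)
        print(" ".join(shlex.quote(arg) for arg in cmd))
    joint = genotype_gvcfs("reference.fa", gvcfs, "cohort.vcf")
    print(" ".join(shlex.quote(arg) for arg in joint))
```

The multi-sample cohort.vcf written by the last command is the file that would then go into VQSR.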
Hi Geraldine,
Thank you - I followed the above and it works fine.
However, I want to ask about the recommended practice:
In my routine work, I will get a few additional samples from time to time. Is there a way to add samples to an existing multi-sample VCF file?
And, after using VQSR, are there tools for splitting the file back into samples, in order to run downstream tools for adding annotations?
Thanks a lot for your help,
Lily
@Lily
Hi Lily,
Our new GVCF pipeline is designed specifically for adding in more samples as they arrive. You can simply save the gvcfs you produce for each sample; then, when you get new samples, produce gvcfs for them and run all of the gvcfs, old and new, through GenotypeGVCFs together to generate a new multi-sample vcf. Please refer here for more information: http://gatkforums.broadinstitute.org/discussion/3893/calling-variants-on-cohorts-of-samples-using-the-haplotypecaller-in-gvcf-mode
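One way to organize this, sketched below in Python under the assumption that every per-sample GVCF is kept in a single directory and named with a .g.vcf suffix (both are my conventions for the example, not GATK requirements): whenever new GVCFs are dropped into that directory, rebuild the GenotypeGVCFs command from everything found there. The sketch prints the command and does not run GATK.

```python
import shlex
import tempfile
from pathlib import Path

# Assumption: GATK 3.x jar in the working directory; adjust to your install.
GATK = ["java", "-jar", "GenomeAnalysisTK.jar"]


def regenotype_command(reference, gvcf_dir, out_vcf):
    """Build a GenotypeGVCFs command over every *.g.vcf in gvcf_dir."""
    gvcfs = sorted(str(p) for p in Path(gvcf_dir).glob("*.g.vcf"))
    if not gvcfs:
        raise ValueError(f"no .g.vcf files found in {gvcf_dir}")
    cmd = GATK + ["-T", "GenotypeGVCFs", "-R", reference]
    for gvcf in gvcfs:
        cmd += ["--variant", gvcf]
    return cmd + ["-o", out_vcf]


if __name__ == "__main__":
    # Demo with empty placeholder files standing in for real GVCFs.
    with tempfile.TemporaryDirectory() as store:
        for name in ("old_sample1", "old_sample2", "new_sample3"):
            Path(store, f"{name}.g.vcf").touch()
        cmd = regenotype_command("reference.fa", store, "cohort_v2.vcf")
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

The per-sample HaplotypeCaller runs are never repeated; only the joint genotyping step is redone over the enlarged set.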
After VQSR, you can use SelectVariants to split the file back into samples. Please read about it here: http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_variantutils_SelectVariants.html
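For illustration, a minimal Python sketch that builds one SelectVariants command per sample using the -sn (sample name) argument, assuming GATK 3.x. The sample names and file names are hypothetical. The optional -env flag (exclude non-variants) is off by default here; turning it on should drop sites where the selected sample carries no variant allele.

```python
import shlex

# Assumption: GATK 3.x jar in the working directory; adjust to your install.
GATK = ["java", "-jar", "GenomeAnalysisTK.jar"]


def split_by_sample(reference, cohort_vcf, sample_names,
                    exclude_non_variants=False):
    """Return {sample: SelectVariants command} for a multi-sample VCF."""
    commands = {}
    for sample in sample_names:
        cmd = GATK + ["-T", "SelectVariants", "-R", reference,
                      "-V", cohort_vcf, "-sn", sample]
        if exclude_non_variants:
            cmd.append("-env")
        commands[sample] = cmd + ["-o", f"{sample}.vcf"]
    return commands


if __name__ == "__main__":
    # Hypothetical sample names; they must match the names in the VCF header.
    per_sample = split_by_sample("reference.fa", "recalibrated.vcf",
                                 ["sampleA", "sampleB"])
    for cmd in per_sample.values():
        print(" ".join(shlex.quote(arg) for arg in cmd))
```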
-Sheila