germline variant calling workflow

This discussion was created from comments split from: New to the forum? Ask your questions here!.


  • lzhan140 Member

    Variant Filter
    Hi GATK team,

    I'm testing the germline variant calling workflow. After GenotypeGVCFs, should I filter the cohort VCF with VQSR or hard filters, or should I extract each sample's variants into an individual VCF and apply the filters there?

    I noticed that some values in the INFO column are calculated across all samples; for example, DP in INFO appears to be the sum of the individual sample DPs. Would it still be valid if my resource files contain only one sample and I use them to train the model and filter my cohort VCF?
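To illustrate the DP observation above, here is a minimal sketch that compares the site-level DP in the INFO column with the sum of the per-sample DP values in one VCF record. The record is made up for illustration, and the helper names `info_dp` and `sum_sample_dp` are hypothetical. In real GATK output the two numbers may not match exactly, since the site-level value can include reads that the per-sample values exclude.

```python
def info_dp(info_field):
    """Return the DP value from a VCF INFO string, or None if it is absent."""
    for entry in info_field.split(";"):
        key, _, value = entry.partition("=")
        if key == "DP":
            return int(value)
    return None


def sum_sample_dp(format_field, sample_fields):
    """Sum the per-sample DP values, skipping missing ones."""
    keys = format_field.split(":")
    if "DP" not in keys:
        return None
    idx = keys.index("DP")
    total = 0
    for sample in sample_fields:
        values = sample.split(":")
        # Trailing fields may be dropped, and missing values are written as "."
        if idx < len(values) and values[idx] != ".":
            total += int(values[idx])
    return total


# Made-up three-sample record (tab-separated), for illustration only.
line = (
    "chr1\t1000\t.\tA\tG\t500.0\t.\tAC=2;AN=6;DP=60\tGT:AD:DP:GQ"
    "\t0/1:10,10:20:99\t0/0:25,0:25:60\t0/1:8,7:15:99"
)
cols = line.split("\t")
site_dp = info_dp(cols[7])
sample_dp = sum_sample_dp(cols[8], cols[9:])
print(site_dp, sample_dp)
```

Running a check like this over a few records of a cohort VCF is a quick way to see how far the site-level and per-sample depths diverge in practice.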


  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator admin

    Hi @lzhan140

    I have moved this question to the FireCloud forum, and @SChaluvadi will be able to help you out with it.


  • SChaluvadi Member, Broadie, Moderator admin

    @lzhan140 I am working on getting you an answer to your question and will get back to you!

  • lzhan140 Member

    Great, appreciate it @SChaluvadi

  • SChaluvadi Member, Broadie, Moderator admin

    @lzhan140 Thank you for your patience!

    VQSR works better when it is run on calls from multiple samples, so using your single cohort VCF will probably yield a more accurate model than splitting your samples into individual VCFs and running VQSR on each one. If you would like more details, here is a great blog post that describes the inner workings of VQSR in both high-level and more technical language. Additionally, here is a post that explains in more detail some caveats about combining samples so that they form a coherent cohort.

    Regarding your question about DP: this document describes which training sets and arguments the GATK Best Practices suggest for training with VQSR. It states that, for exome data, DP should not be used because of variation in depth. Therefore, in your case, I suppose it would be okay to use the resource that you have, but I would recommend reading through the Best Practices document anyway!
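As a rough sketch of the DP advice above, the snippet below builds the annotation arguments for a VQSR training run and leaves DP out when the data is exome. The annotation list is an assumption based on the SNP annotations commonly cited for VQSR, so check it against the Best Practices document for your GATK version. The helper name `vqsr_annotation_args` is hypothetical, and `-an` is used here as the VariantRecalibrator annotation flag.

```python
# Assumed SNP annotation set; verify against the Best Practices document.
SNP_ANNOTATIONS = ["QD", "MQ", "MQRankSum", "ReadPosRankSum", "FS", "SOR", "DP"]


def vqsr_annotation_args(is_exome):
    """Return a list of '-an NAME' arguments, omitting DP for exome data."""
    annotations = [
        name for name in SNP_ANNOTATIONS if not (is_exome and name == "DP")
    ]
    args = []
    for name in annotations:
        args += ["-an", name]
    return args


# Print the fragment that would be appended to a VariantRecalibrator command.
print(" ".join(vqsr_annotation_args(is_exome=True)))
```

Only the annotation fragment is generated here; the reference, input, resource, and output arguments still have to be supplied as described in the tool documentation.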

    I hope I was able to address your questions, but if not, please reply with any follow-ups you might have!

  • lzhan140 Member

    @SChaluvadi Thanks a lot! Perfectly answered my question.
