Deprecated command-line argument?

Hello GATK Team,

I am following a workflow that assumes the user is working with GATK 2.2-16. I am, however, using the current GATK version. Is -V:variant,VCF vcfFile still a valid way to pass the input VCF when running VariantAnnotator on the latest GATK release?

Thanks!

Answers

  • Nick (Member)

    Sounds good. I already ran the walker using the way I described and it seemed to work, so I was curious. Thanks!

  • Nick (Member)

    Thanks, I've also seen this used with the -B argument. I can't find any documentation on it, though.

  • pdexheimer (Member ✭✭✭✭)

    That one's long gone

  • Nick (Member)

    Haha, good to know!

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    @Nick, is there any reason you are using such an old workflow? Our recommendations have changed a lot, and I'm worried you may not be benefiting from our latest work if you're not following our recent recommendations.

  • Nick (Member)
    edited March 2015

    I am following the documentation on the GATK website, changing things that require updating. The old workflow is an outline, i.e., MarkDuplicates > RealignerTargetCreator > IndelRealigner > UnifiedGenotyper (on snps) > VariantAnnotator (on snps) > UnifiedGenotyper (on indels) > VariantFiltration (around indels) > VariantFiltration (on snps) > VariantRecalibrator (on snps) > ApplyRecalibration (on snps). This is in a non-model context, and I am just interested in high quality, filtered snps. Would you suggest any additional steps or modifications?

    Thanks!

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    Ah, I see. I would recommend switching to HaplotypeCaller instead of UnifiedGenotyper; it will give you much better calls and will reduce the number of steps, since it calls SNPs and indels jointly.

  • Sheila, Broad Institute (Member, Broadie admin)

    @Nick
    Hi,

    We do not recommend using UnifiedGenotyper anymore. We recommend using HaplotypeCaller for variant calling. You can use HaplotypeCaller to call both SNPs and indels at the same time. VariantAnnotator can then be used to add any annotations that you may need for downstream analysis or variant recalibration. You can use VariantRecalibrator if you have enough samples; otherwise you can use hard filtering.

    -Sheila

  • Nick (Member)
    edited April 2015

    Geraldine_VdAuwera, thank you so much for your suggestion. Just to clarify, here is my updated workflow as per your recommendation: MarkDuplicates > RealignerTargetCreator > IndelRealigner > HaplotypeCaller (on snps/indels) > VariantAnnotator (on snps) > VariantFiltration (on snps; or hard filter with grep 'PASS|^#'?) > BQSR (on filtered snps) > VariantRecalibrator (on recalibrated snps) > ApplyRecalibration (on recalibrated snps). Am I missing something? I am only working with 16 samples, so I am not sure whether I should take into account what Sheila said regarding VariantRecalibrator or just use hard filtering. You guys are awesome, btw. Many thanks :)
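
The grep idea in the post above can be made column-aware. As written, plain grep treats the | in 'PASS|^#' literally (alternation needs grep -E), and a whole-line match would also keep records that merely contain the string PASS somewhere outside the FILTER column. A minimal Python sketch, assuming a tab-delimited VCF in which FILTER is the seventh column; the function name keep_pass_records and the file names in the usage comment are hypothetical:

```python
def keep_pass_records(lines):
    """Yield VCF header lines plus records whose FILTER column is exactly PASS."""
    for line in lines:
        # Header and meta-information lines start with '#': always keep them.
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # FILTER is the 7th VCF column (index 6); skip malformed short lines.
        if len(fields) > 6 and fields[6] == "PASS":
            yield line


# Usage (file names are placeholders):
# with open("filtered.vcf") as src, open("pass_only.vcf", "w") as dst:
#     dst.writelines(keep_pass_records(src))
```

Checking the FILTER column directly avoids false matches, for example a filtered record whose INFO field happens to contain the text PASS.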

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    Hi @Nick,

    What do you mean exactly by "BQSR (on filtered snps)"? Are you trying to do some BQSR bootstrapping? Or is it a typo for VQSR?

    Our recommendation for VQSR is to pad your cohort with additional samples from e.g. 1000G or ExAC if you don't have enough (~30). It's better than using hard-filtering.

    I'm not sure I understand why the annotation/filtering steps come before the VariantRecalibrator. What are you trying to do there?

  • Nick (Member)
    edited April 2015

    Hi @Geraldine_VdAuwera,

    My apologies for not being clear. I am using high quality, filtered SNPs generated from the variant caller as my "known" sites when using BaseRecalibrator, as there is no database of known SNPs for my organism. After looking over the workflow and doing a little homework, it looks like I should just stick with hard-filtering using VariantFiltration on the recalibrated variant call set. My workflow looks something like this now: MarkDuplicates > RealignerTargetCreator > IndelRealigner > HaplotypeCaller > VariantAnnotator > VariantFiltration > BQSR > HaplotypeCaller (using recalibrated bam) > VariantFiltration. Does this make sense? One other question: if I am just concerned with snps, can I use BQSR on just snps, excluding indels?

    Many thanks,
    Nick
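
The bootstrapped workflow in the post above can be laid out as an ordered list of command lines. This is a rough sketch only, assuming GATK 3.x-style invocation (-T, -R, -I, -V, -o, -knownSites, -BQSR) and starting from the merged, realigned BAM. The jar path, all file names, the Coverage annotation, and the "QD < 2.0" filter threshold are placeholders, and the PrintReads step is an assumption about how the "BQSR" stage is applied to produce the recalibrated BAM. The sketch builds the commands without running them.

```python
import shlex

# Assumed location of a GATK 3.x jar; adjust for your installation.
GATK = ["java", "-jar", "GenomeAnalysisTK.jar"]


def gatk(tool, *args):
    """Return the argument list for one GATK 3.x-style tool invocation."""
    return GATK + ["-T", tool, *args]


def hard_filter(ref, vcf_in, vcf_out):
    """VariantFiltration step; the expression is a placeholder, not a recommendation."""
    return gatk("VariantFiltration", "-R", ref, "-V", vcf_in,
                "--filterExpression", "QD < 2.0", "--filterName", "lowQD",
                "-o", vcf_out)


def bootstrap_bqsr_commands(ref, bam, prefix):
    """Commands for: call > annotate > filter > BQSR > call again > filter."""
    raw = f"{prefix}.raw.vcf"
    annotated = f"{prefix}.annotated.vcf"
    known = f"{prefix}.bootstrap_known.vcf"   # filtered calls reused as "known" sites
    table = f"{prefix}.recal.table"
    recal_bam = f"{prefix}.recal.bam"
    recal_raw = f"{prefix}.recal.raw.vcf"
    final = f"{prefix}.recal.filtered.vcf"
    return [
        gatk("HaplotypeCaller", "-R", ref, "-I", bam, "-o", raw),
        gatk("VariantAnnotator", "-R", ref, "-I", bam, "-V", raw,
             "-A", "Coverage", "-o", annotated),
        hard_filter(ref, annotated, known),
        gatk("BaseRecalibrator", "-R", ref, "-I", bam,
             "-knownSites", known, "-o", table),
        # Assumption: recalibration is applied by writing a new BAM with PrintReads.
        gatk("PrintReads", "-R", ref, "-I", bam, "-BQSR", table, "-o", recal_bam),
        gatk("HaplotypeCaller", "-R", ref, "-I", recal_bam, "-o", recal_raw),
        hard_filter(ref, recal_raw, final),
    ]


if __name__ == "__main__":
    # Print the commands for inspection instead of executing them.
    for cmd in bootstrap_bqsr_commands("ref.fasta", "merged.realigned.bam", "run1"):
        print(shlex.join(cmd))
```

Keeping each command as an argument list means the quoted filter expression survives intact if the list is later handed to subprocess.run.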

  • Nick (Member)

    Also, sorry this thread ended up changing directions from the original post. I will be cognizant next time, and post in a new thread. Thanks.

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    Oh I see -- yes that looks fine.

    BQSR won't go substantially faster if you leave out indels, so you might as well run on everything.

  • Nick (Member)
    edited April 2015
  • Nick (Member)

    Hi @Geraldine_VdAuwera,

    When using BaseRecalibrator, is it fine to use a merged, realigned bam file (with @RG info incorporated) as opposed to individual bam files? I think I read somewhere that using a merged bam file lowers statistical power. I figured @RG info would take care of this, but I could be wrong. (After removing duplicates, all downstream GATK analyses were used with this merged, realigned bam file)

    Thanks,
    Nick

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    @Nick As long as the read group IDs are properly assigned (to distinguish lanes of data), you can run on a merged file. Internally, the recalibrator will process read groups (as identified by their ID) separately. The only caveat is that if there are many read groups' worth of data merged together, processing the entire set will take more compute power & memory, so it might be slower -- that's why we like to parallelize by processing lanes' worth separately. But it changes nothing in the calculations.
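
To illustrate why a merged file carries the same information as per-lane files, here is a minimal Python sketch that partitions SAM text records by their read group ID. It is not GATK's implementation; it assumes each alignment record carries an RG:Z: optional tag after the 11 mandatory SAM fields, and the function names are hypothetical.

```python
from collections import defaultdict


def read_group_of(sam_line):
    """Return the read group ID from the RG:Z: tag, or None if the tag is absent."""
    # Optional tags start after the 11 mandatory SAM fields.
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        if field.startswith("RG:Z:"):
            return field[len("RG:Z:"):]
    return None


def partition_by_read_group(sam_lines):
    """Group alignment records by read group ID, skipping '@' header lines."""
    groups = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):
            continue
        groups[read_group_of(line)].append(line)
    return dict(groups)
```

Records with no RG tag end up under the key None, which is a quick way to spot reads whose read group was never assigned before merging.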

  • Nick (Member)