How does the "-aggregate" argument in VariantRecalibrator compare to genotyping more samples together?

Dear GATK-team,

I've tried to search for the answer to this question on the guidelines and forums pages, but I haven't been able to figure it out. I apologize if I'm missing something that should be obvious from the documentation.

So, I'm familiar with the current best practices for DNA-seq variant discovery with HC, GVCF calling and VQSR, and with the requirement to have ample data for building the VQSR model. To get enough data, one might add in extra variants, which you recommend doing at the CALLING stage.

I have a "ploidy 20" dataset of several hundred samples, where for practical computational reasons calling needs to be done in batches to avoid running out of memory. But I'd nevertheless like to use all the variants for optimal VQSR. It looks like this might be done with the --aggregate argument in VariantRecalibrator, by adding in the raw VCFs from all batches at that stage. Would this really differ significantly from a workflow where all samples were called together? Why is the "--aggregate" option never mentioned in your advice on how to achieve a VQSR-worthy dataset?

Thanks for a great resource and website
Best regards
Lasse

Answers

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @Pihlstrom
    Hi Lasse,

    The -aggregate argument can be used to add training variants; however, those variants will not be output in your VCF. I suspect you would like those variants to be output in your final VCF. I think the argument was probably added for development purposes, but we do not recommend using it for analysis.

    -Sheila

  • Thanks for your reply!
    But it's really not imperative to get all the variants in the same VCF, as long as I get the best possible VQSR quality. So say I have eight multisample VCFs generated by the same pipeline: couldn't I feed all of them into VariantRecalibrator (one as the regular input, the other seven through --aggregate), and then run ApplyRecalibration with the same, "common" recal/tranches files on all eight individually?

    Best regards
    Lasse

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @Pihlstrom
    Hi Lasse,

    Sure. That will work.

    -Sheila

  • Thanks a lot, Sheila - I get it! :)
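
For readers who want to try the batch workflow discussed in this thread, here is a minimal sketch of how the command lines might be assembled. It only builds the argument lists and does not run GATK. It assumes GATK 3-style syntax (`-T VariantRecalibrator`, `-T ApplyRecalibration`); the function name, file names, resource definition, annotation list and filter level are all hypothetical placeholders, not values recommended anywhere in the thread.

```python
from typing import List, Sequence


def build_vqsr_commands(
    batch_vcfs: Sequence[str],
    reference: str,
    resource_args: Sequence[str],
    annotations: Sequence[str],
    mode: str = "SNP",
    ts_filter_level: str = "99.0",          # placeholder, tune for your data
    gatk_jar: str = "GenomeAnalysisTK.jar",  # assumed GATK 3 jar name
) -> List[List[str]]:
    """Return one VariantRecalibrator command, followed by one
    ApplyRecalibration command per batch VCF."""
    if not batch_vcfs:
        raise ValueError("need at least one batch VCF")

    base = ["java", "-jar", gatk_jar]
    recal_file = "all_batches.recal"        # hypothetical output names
    tranches_file = "all_batches.tranches"

    # Build the model once: the first batch goes in via -input,
    # every other batch is added as training data via -aggregate.
    recalibrate = base + ["-T", "VariantRecalibrator", "-R", reference,
                          "-input", batch_vcfs[0]]
    for vcf in batch_vcfs[1:]:
        recalibrate += ["-aggregate", vcf]
    recalibrate += list(resource_args)
    for annotation in annotations:
        recalibrate += ["-an", annotation]
    recalibrate += ["-mode", mode,
                    "-recalFile", recal_file,
                    "-tranchesFile", tranches_file]

    commands = [recalibrate]

    # Apply the shared recal/tranches files to each batch separately.
    for vcf in batch_vcfs:
        stem = vcf[:-4] if vcf.endswith(".vcf") else vcf
        commands.append(base + [
            "-T", "ApplyRecalibration", "-R", reference,
            "-input", vcf,
            "-mode", mode,
            "--ts_filter_level", ts_filter_level,
            "-recalFile", recal_file,
            "-tranchesFile", tranches_file,
            "-o", stem + ".recalibrated.vcf",
        ])
    return commands


if __name__ == "__main__":
    batches = ["batch%d.vcf" % i for i in range(1, 9)]
    # Hypothetical training resource; replace with one suited to your organism.
    resources = ["-resource:truth_set,known=false,training=true,truth=true,prior=15.0",
                 "truth_set.vcf"]
    for cmd in build_vqsr_commands(batches, "reference.fasta", resources, ["QD", "FS"]):
        print(" ".join(cmd))
```

One thing worth checking on a small test before relying on this: since variants passed via -aggregate are not output, the shared recal file may not contain entries for variants that appear only in the aggregated batches. If ApplyRecalibration complains about missing variants, an alternative is to run VariantRecalibrator once per batch, with that batch as -input and the remaining batches as -aggregate.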
