Variant Recalibration - Number of Whole Exome Samples Needed and Where?

darbrob | Posts: 2 | Member

Hello,

I've just made a long-needed update to the most recent version of GATK. I had been toying with the variant quality score recalibrator before, but now that I have many more exomes at my disposal I'd like to fully implement it in a meaningful way.

The phrase I'm confused about is "In our testing we've found that in order to achieve the best exome results one needs to use an exome callset with at least 30 samples." How exactly do I arrange these 30+ exomes?

Is there any difference or reason to choose one of the following two workflows over the other?

  1. Pass 30+ exomes via the "-I" argument to either the UnifiedGenotyper or HaplotypeCaller, perform the variant recalibration procedure on the resulting multi-sample VCF, and then split the individual call sets out of the multi-sample VCF with SelectVariants?

  2. Take 30+ individual VCF files, merge them together, perform variant recalibration on the merged VCF, and then split the individual call sets out of the merged VCF with SelectVariants?

  3. Or some third option I'm missing?

Any help is appreciated.

Thanks

Answers

  • Geraldine_VdAuwera | Posts: 2,239 | Administrator, GSA Official Member

    Hi there,

    Your question is not so much about variant recalibration as it is about how to call variants on a cohort of multiple samples, which is addressed earlier in the same Best Practices document. (Basically, you don't need to merge the input files.)

    Geraldine Van der Auwera, PhD
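
For readers who want to see what calling a cohort without merging might look like, here is a minimal sketch of workflow 1 from the question. It only builds the command lines; it does not run GATK. The jar path, file names, sample name, and the helper functions `joint_calling_cmd` and `split_sample_cmd` are hypothetical, and the flags assume the GATK 2.x-style command line (`-T`, `-R`, `-I`, `-V`, `-sn`, `-o`). The variant recalibration step between calling and splitting is left out because its resource arguments depend on your data.

```python
from typing import List


def joint_calling_cmd(reference: str, bams: List[str], out_vcf: str,
                      gatk_jar: str = "GenomeAnalysisTK.jar") -> List[str]:
    """Build one UnifiedGenotyper command that takes every BAM as its own -I input."""
    cmd = ["java", "-jar", gatk_jar, "-T", "UnifiedGenotyper", "-R", reference]
    for bam in bams:
        # One -I per sample: the caller produces a single multi-sample VCF,
        # so there is nothing to merge beforehand.
        cmd += ["-I", bam]
    cmd += ["-o", out_vcf]
    return cmd


def split_sample_cmd(reference: str, cohort_vcf: str, sample: str, out_vcf: str,
                     gatk_jar: str = "GenomeAnalysisTK.jar") -> List[str]:
    """Build a SelectVariants command that extracts one sample from the cohort VCF."""
    return ["java", "-jar", gatk_jar, "-T", "SelectVariants", "-R", reference,
            "-V", cohort_vcf, "-sn", sample, "-o", out_vcf]


if __name__ == "__main__":
    # Hypothetical file names for a 30-exome cohort.
    bams = [f"exome_{i:02d}.bam" for i in range(1, 31)]
    print(" ".join(joint_calling_cmd("reference.fasta", bams, "cohort.raw.vcf")))
    # In a real run, this would be applied to the recalibrated cohort VCF.
    print(" ".join(split_sample_cmd("reference.fasta", "cohort.recal.vcf",
                                    "exome_01", "exome_01.vcf")))
```

The resulting lists can be passed to `subprocess.run` as they are. Returning argument lists rather than a single shell string avoids quoting problems with unusual file names.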
