It looks like you're new here. If you want to get involved, click one of these buttons!
Hello,
I've just made a long needed update to the most recent version of GATK. I had been toying with the variant quality score recalibrator before but now that I have a great deal more exomes at my disposal I'd like to fully implement it in a meaningful way.
The phrase I'm confused about is "In our testing we've found that in order to achieve the best exome results one needs to use an exome callset with at least 30 samples." How exactly do I arrange these 30+ exomes?
Is there any difference or reason to choose one of the following two workflows over the other?
Input 30+ exomes in the "-I" argument of either the UnifiedGenotyper or HaplotypeCaller and then with my multi-sample VCF perform the variant recalibration procedure and then split the individual call sets out of the multi-sample vcf with SelectVariants?
Take 30+ individual vcf files, merge them together, and then perform variant recalibration on the merged vcf and then split the individual call sets out of the multi-sample vcf with SelectVariants?
Or some third option I'm missing
Any help is appreciated.
Thanks
Answers
Hi there,
Your question is not so much about variant recalibration as it is about how to call variants on a cohort of multiple samples, which is addressed earlier in the same Best Practices document. (Basically, you don't need to merge the input files.)
Geraldine Van der Auwera, PhD
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •