HaplotypeCaller on 2000 samples, feasible?

tommycarstensen | United Kingdom | Posts: 153 | Member

I have previously tested HaplotypeCaller (GATK 2.7) on 100 samples. It takes a long time to run compared to UnifiedGenotyper. Is it feasible to use HaplotypeCaller on 2000 samples? Can I make it run faster other than by using multi-threading? Would it be an advantage for me to run ReduceReads prior to variant calling? I know the details are sparse, but I currently do not have any additional information. Thank you.

Answers

  • tommycarstensen | United Kingdom | Posts: 153 | Member

    Geraldine, thank you for your answer. I have heard from multiple sources that HC was one of the tools used to call SNPs for phase III of the 1000G project, i.e. 2500 samples. Would you happen to know what procedure was used to scale to this number of samples?

    It's almost like growing up hearing about the awesome dance moves of the phantom dancer "BreakArray", but nobody has ever witnessed him in action. It would be great if you or someone else could enlighten me. Thank you.

  • tommycarstensen | United Kingdom | Posts: 153 | Member

    Geraldine, is your new approach for single-sample variant discovery with per-site, across-sample likelihood analysis ready? I would like to test it if a beta is available in a developer branch.

  • Geraldine_VdAuwera | Posts: 6,822 | Administrator, GATK Developer

    Hi Tommy,

    We've got it working internally and are now finalizing the last details, tweaks, etc. before we release it. I'll check whether it's beta-testable in the public nightlies, but I think there are still a few pieces that aren't accessible. FYI, we plan to release 3.0 in the next couple of weeks.

    Geraldine Van der Auwera, PhD
