HaplotypeCaller on 2000 samples, feasible?

tommycarstensen, United Kingdom, Member

I have previously tested HaplotypeCaller (GATK 2.7) on 100 samples. It takes a long time to run compared to UnifiedGenotyper. Is it feasible to use HaplotypeCaller on 2000 samples? Is there any way to make it run faster besides multi-threading? Would it be an advantage to run ReduceReads prior to variant calling? I know the details are sparse, but I currently do not have any additional information. Thank you.


Best Answers


  • tommycarstensen, United Kingdom, Member

    Geraldine, thank you for your answer. I have heard from multiple sources that HC was one of the tools used for calling SNPs for phase III of the 1000G project, i.e. 2500 samples. Would you happen to know what procedure was used to scale to this number of samples?

    It's almost like growing up hearing about the awesome dance moves of the phantom dancer "BreakArray", but nobody has ever witnessed him in action. It would be great if you or someone else could enlighten me. Thank you.

  • tommycarstensen, United Kingdom, Member

    Geraldine, is your new approach for single-sample variant discovery with per-site, across-samples likelihood analysis ready? I would like to test it if a beta is available in a developer branch.

  • Geraldine_VdAuwera, Cambridge, MA, Member, Administrator, Broadie

    Hi Tommy,

    We've got it working internally and are now finalizing the last details and tweaks before we release it. I'll check whether it's beta-testable in the public nightlies, but I think there are still a few pieces that aren't accessible. FYI, we plan to release 3.0 in the next couple of weeks.
