HaplotypeCaller on 2000 samples, feasible?

tommycarstensen Posts: 49 Member

I have previously tested HaplotypeCaller (GATK 2.7) on 100 samples. It takes a long time to run compared to UnifiedGenotyper. Is it feasible to use HaplotypeCaller on 2000 samples? Can I make it run faster other than by using multi-threading? Would it be an advantage for me to run ReduceReads prior to variant calling? I know the details are sparse, but I currently do not have any additional information. Thank you.


Best Answers


  • tommycarstensen Posts: 49 Member

    Geraldine, thank you for your answer. I have heard from multiple sources that HC was one of the tools used for calling SNPs for phase III of the 1000G project, i.e. 2500 samples. Would you happen to know what procedure was used to scale to this number of samples?

    It's almost like growing up hearing about the awesome dance moves of the phantom dancer "BreakArray", but nobody has ever witnessed him in action. It would be great if you or someone else could enlighten me. Thank you.

  • tommycarstensen Posts: 49 Member

    Geraldine, is your new approach for single-sample variant discovery with per-site-across-samples-likelihood analysis ready? I would like to test it if a beta is available in a developer branch.

  • Geraldine_VdAuwera Posts: 5,235 Administrator, GSA Member

    Hi Tommy,

    We've got it working internally and are now finalizing the last details, tweaks etc. before we release it. I'll check if it's beta-testable in the public nightlies but I think there are still a few pieces that aren't accessible. FYI we plan to release 3.0 in the next couple of weeks.

    Geraldine Van der Auwera, PhD
