HaplotypeCaller Multisample Variant Calling

Hey there!

I've been using HaplotypeCaller as part of a new whole genome variant calling pipeline I'm working on, and I have a question about the number of samples to use. From my tests, it seems that increasing the number of samples HaplotypeCaller runs on simultaneously improves accuracy no matter how many samples I add (I've tried 1, 4, and 8 samples at a time). Before I try 16 samples, I was wondering if you could tell me whether there's a point of diminishing returns for adding samples to HaplotypeCaller. With every sample I add, the time per sample increases, so I don't want to keep adding samples if it's not going to result in an improved call set. If it does improve the results, I'll live with the longer run times. I should note that I'm building this pipeline for an experiment with up to 50 individuals, some of whom form family groups of 3-4 people. If running HaplotypeCaller on all 50 simultaneously would result in the best call set, that's what I'll do. Thanks! (By the way, I love the improvements you made with 2.5!)

  • Grant

Best Answers


  • Thanks a ton Geraldine,

    This was really helpful. I guess I'll have to experiment a bit more. I'm usually working with around 20x coverage so I was wondering if that 100 sample approximation was with similar coverage. If so, that should work out well for the short term and I look forward to what comes in 2.6!

  • Thank you again for your suggestions. For now it looks like I can just keep increasing sample counts for a while, but if I hit any hiccups I'll tweak those defaults :)

  • I've begun testing the rate of diminishing returns for my data and I have a question. How do you determine the quality of a call set produced by HaplotypeCaller? I've noticed that in some figures (like the ones on this page) you just put "True positive rate" or "False positive rate", but it's not clear (at least to me) how you derived those values. I know of some QC metrics you can use, like Ti/Tv ratios, but I was wondering what you use at Broad to evaluate these tools so I know if I'm heading in the right direction. Sorry to bother you again, and thanks for all of the help so far.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Hi Grant,

    Call set quality evaluation is a complex topic. The basic way we calculate false vs. true positives is to compare calls to a database of highly curated calls, which we use as "truth" data. The selection of the truth data is of course key to the validity of the comparison. We have some internal resources for this, as well as some public resources such as the datasets provided in our resource bundle. They are described (with an estimate of their reliability) in the FAQ article on VQSR training/truth datasets.
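
As a rough illustration of the truth-set comparison described above, here is a minimal Python sketch. It is not the method used at Broad: the function names (`parse_vcf_lines`, `compare_to_truth`, `titv_ratio`) are hypothetical, variants are matched by exact chromosome/position/REF/ALT only, and genotypes and differing indel representations are ignored. It also assumes the truth set is complete over the region compared, so any call absent from it is counted as a false positive. The Ti/Tv ratio mentioned in the question is included as a truth-free sanity check on SNPs.

```python
from typing import Dict, Iterable, Set, Tuple

# A variant is keyed by (chromosome, position, REF allele, ALT allele).
Variant = Tuple[str, int, str, str]

# Transitions are purine<->purine or pyrimidine<->pyrimidine substitutions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def parse_vcf_lines(lines: Iterable[str]) -> Set[Variant]:
    """Collect variant keys from VCF text lines, one key per ALT allele."""
    variants: Set[Variant] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        for alt in alts.split(","):
            if alt != ".":  # "." means no ALT allele at this site
                variants.add((chrom, pos, ref.upper(), alt.upper()))
    return variants


def compare_to_truth(calls: Set[Variant], truth: Set[Variant]) -> Dict[str, float]:
    """Count exact-match true/false positives and false negatives."""
    tp = len(calls & truth)   # called and present in the truth set
    fp = len(calls - truth)   # called but absent from the truth set
    fn = len(truth - calls)   # in the truth set but not called
    return {
        "tp": tp,
        "fp": fp,
        "fn": fn,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


def titv_ratio(variants: Set[Variant]) -> float:
    """Transition/transversion ratio over biallelic SNP keys only."""
    ts = tv = 0
    for _, _, ref, alt in variants:
        is_snp = len(ref) == 1 and len(alt) == 1 and ref != alt
        if is_snp and ref in "ACGT" and alt in "ACGT":
            if (ref, alt) in TRANSITIONS:
                ts += 1
            else:
                tv += 1
    return ts / tv if tv else float("nan")


if __name__ == "__main__":
    # Tiny made-up records, purely to exercise the functions.
    calls = parse_vcf_lines([
        "##fileformat=VCFv4.1",
        "chr1\t100\t.\tA\tG\t50\tPASS\t.",
        "chr1\t200\t.\tC\tA\t50\tPASS\t.",
        "chr1\t300\t.\tG\tT\t50\tPASS\t.",
    ])
    truth = parse_vcf_lines([
        "chr1\t100\t.\tA\tG\t.\t.\t.",
        "chr1\t200\t.\tC\tA\t.\t.\t.",
        "chr1\t400\t.\tT\tC\t.\t.\t.",
    ])
    print(compare_to_truth(calls, truth))
    print(titv_ratio(calls))
```

Running the same comparison on call sets produced with 1, 4, 8, and more samples against one fixed truth set is one way to see where sensitivity and precision stop improving. Note that a strict false positive rate would also require counting true negatives, which this sketch does not attempt.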

  • Hi,

    Time increases when you add samples, but what about virtual memory used?!

  • Sheila (Broad Institute; Member, Broadie, Moderator)



    I am assuming you are asking about RAM. RAM requirements do increase with the number of samples, because more data needs to be loaded into memory for processing. This is one of the reasons why the single-sample/GVCF workflow is better than classic multisample calling. Please read more about it here:

