The current GATK version is 3.7-0



HaplotypeCaller Multisample Variant Calling

Hey there!

I've been using HaplotypeCaller as part of a new whole genome variant calling pipeline I'm working on, and I had a question about the number of samples to use. From my tests, it seems like increasing the number of samples run through HaplotypeCaller simultaneously improves the accuracy no matter how many samples I add (I've tried 1, 4, and 8 samples at a time).

Before I try 16 samples, I was wondering if you could tell me whether there's a point of diminishing returns for adding samples to HaplotypeCaller. With every sample I add, the time per sample increases, so I don't want to keep adding samples if it's not going to result in an improved call set; but if it does improve the results, I'll live with the longer run times.

I should note that I'm making this pipeline for an experiment with up to 50 individuals, including family groups of 3-4 people. If running HaplotypeCaller on all 50 simultaneously would produce the best call set, that's what I'll do. Thanks! (By the way, I love the improvements you made with 2.5!)

  • Grant

Best Answers


  • Thanks a ton, Geraldine,

    This was really helpful. I guess I'll have to experiment a bit more. I'm usually working with around 20x coverage, so I was wondering if that 100-sample approximation assumed similar coverage. If so, that should work out well for the short term, and I look forward to what comes in 2.6!

  • Thank you again for your suggestions. For now it looks like I can just keep increasing sample counts for a while, but if I hit any hiccups I'll tweak those defaults :)

  • I've begun testing the rate of diminishing returns for my data, and I have a question: how do you determine the quality of a call set produced by HaplotypeCaller? I've noticed that in some figures (like the ones on this page) you just put "True positive rate" or "False positive rate", but it's not clear (at least to me) how you derived those values. I know of some QC metrics, like Ti/Tv ratios, but I was wondering what you use at the Broad to evaluate these tools, so I know if I'm heading in the right direction. Sorry to bother you again, and thanks for all of the help so far.
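As an aside on the Ti/Tv ratio mentioned above: it is simply the count of transitions (purine-purine or pyrimidine-pyrimidine substitutions) divided by the count of transversions. A minimal sketch, using made-up SNP allele pairs purely for illustration:

```python
# Sketch: compute the Ti/Tv (transition/transversion) ratio from a list of
# (ref, alt) SNP allele pairs. The input data below is hypothetical.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

def ti_tv_ratio(snps):
    """Return the transition/transversion ratio for (ref, alt) pairs."""
    ti = sum(1 for pair in snps if pair in TRANSITIONS)
    tv = len(snps) - ti  # assumes every pair is a biallelic SNP
    return ti / tv if tv else float("inf")

snps = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("T", "G")]
print(ti_tv_ratio(snps))  # 3 transitions / 2 transversions = 1.5
```

Genome-wide call sets are typically expected to land near a Ti/Tv of roughly 2, so large deviations can flag a noisy call set, though this is a coarse check compared to a truth-set comparison.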

  • Geraldine_VdAuwera (Cambridge, MA) Member, Administrator, Broadie

    Hi Grant,

    Call set quality evaluation is a complex topic. The basic way we calculate false vs. true positives is to compare calls to a database of highly curated calls which we use as "truth" data. Here, the selection of the truth data is of course key to the validity of the comparison. We have some internal resources for this, as well as some public resources such as the datasets provided in our resource bundle. They are described (with an estimate of their reliability) in the FAQ article on VQSR training/truth datasets.
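The truth-set comparison described above can be sketched in a few lines if variants are reduced to (chrom, pos, ref, alt) tuples. This is a simplification (it ignores genotype matching and representation differences), and the call/truth sets below are hypothetical stand-ins for real VCF contents:

```python
# Sketch of truth-set comparison: count true/false positives and false
# negatives by set intersection over (chrom, pos, ref, alt) tuples.
def compare_to_truth(calls, truth):
    calls, truth = set(calls), set(truth)
    tp = calls & truth   # called and present in the truth set
    fp = calls - truth   # called but absent from the truth set
    fn = truth - calls   # in the truth set but not called
    sensitivity = len(tp) / len(truth) if truth else 0.0
    precision = len(tp) / len(calls) if calls else 0.0
    return sensitivity, precision, len(fp), len(fn)

truth = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")]
calls = [("chr1", 100, "A", "G"), ("chr2", 50, "G", "A"), ("chr2", 99, "T", "C")]
sens, prec, fp, fn = compare_to_truth(calls, truth)
print(sens, prec, fp, fn)  # 2/3 sensitivity, 2/3 precision, 1 FP, 1 FN
```

As noted in the reply, the result is only as meaningful as the truth set: sites absent from the truth data get counted as false positives even when they are real variants.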

  • Hi,

    Time increases when you add samples, but what about the virtual memory used?

  • Sheila (Broad Institute) Member, Broadie, Moderator

    I am assuming you are asking about RAM. RAM requirements do increase with the number of samples, because more data needs to be loaded into memory for processing. This is one of the reasons the single-sample GVCF workflow is preferable to classic multisample calling. Please read more about it here:
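The single-sample GVCF workflow mentioned above can be sketched as two stages for GATK 3.x: run HaplotypeCaller per sample in GVCF mode, then joint-genotype all the per-sample GVCFs with GenotypeGVCFs. The file names below are hypothetical; the flags mirror the documented GATK 3 command line:

```python
# Sketch of the GATK 3.x GVCF workflow: per-sample HaplotypeCaller in GVCF
# mode, followed by joint genotyping with GenotypeGVCFs. Sample/file names
# are made up; commands are built as argument lists (e.g. for subprocess).
REF = "reference.fasta"
SAMPLES = ["sample1", "sample2", "sample3"]

def haplotypecaller_cmd(sample):
    """Per-sample variant calling, emitting a GVCF instead of a plain VCF."""
    return ["java", "-jar", "GenomeAnalysisTK.jar",
            "-T", "HaplotypeCaller",
            "-R", REF,
            "-I", f"{sample}.bam",
            "--emitRefConfidence", "GVCF",
            "-o", f"{sample}.g.vcf"]

def genotypegvcfs_cmd(samples):
    """Joint genotyping across all per-sample GVCFs."""
    cmd = ["java", "-jar", "GenomeAnalysisTK.jar",
           "-T", "GenotypeGVCFs", "-R", REF]
    for s in samples:
        cmd += ["--variant", f"{s}.g.vcf"]
    return cmd + ["-o", "joint.vcf"]

for s in SAMPLES:
    print(" ".join(haplotypecaller_cmd(s)))
print(" ".join(genotypegvcfs_cmd(SAMPLES)))
```

Because each HaplotypeCaller run sees only one BAM, per-job memory stays flat as the cohort grows, and adding a new sample later only requires one new GVCF plus a rerun of the (comparatively cheap) joint genotyping step.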

