Maximum size for HaplotypeCaller

eflynn90 (Washington DC; Member)

I'm running HaplotypeCaller, and I was wondering what the recommended cohort size is. Would 30 to 40 families (100 to 150 individuals) be too large? Is there a maximum recommended number of independent alleles or individuals? How does the run time scale with additional individuals?



Best Answers


  • eflynn90 (Washington DC; Member)
    edited April 2014

    Thank you, very interesting!

    A question about GVCF mode:

    • Does it output a genotype, PHRED-scaled genotype likelihoods, and allelic depths at every position where there are any aligned reads?

    Questions about GenotypeGVCFs:

    • Is it able to merge similar INDELs into one row in the VCF file?
    • Can it call different genotypes for an individual at the same position if the cohort is different?
  • eflynn90 (Washington DC; Member)

    One more question: Is there any way to make this workflow take relationships in a ped file into account when calling genotypes?

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    In a nutshell, yes on all points.

    • I'm working on a new doc that explains the GVCF format and how the underlying reference model works; it should be ready in a day or two.

    • They will be merged into a single record with different indel alleles.

    • GenotypeGVCFs basically replicates the effect of multisample calling but without the computational cost.
  • eflynn90 (Washington DC; Member)

    Okay, thank you. I'll try to be patient and wait for the documentation :-)

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    No, the caller will not consider ped data. There are some post-processing tools to refine genotypes based on that info.

  • eflynn90 (Washington DC; Member)

    What are the post-processing tools? Are you referring to PhaseByTransmission?

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Yes that's right.

  • eflynn90 (Washington DC; Member)

    Hi, I have a follow-up question about GenotypeGVCFs. Is there an advantage to running it with an ethnically matched cohort of individuals vs. running it on an individual or a family at a time? Is it population-aware?


  • eflynn90 (Washington DC; Member)

    Great, thanks!

  • eflynn90 (Washington DC; Member)

    Hi, another followup question. I work with a lab that accumulates samples over time. Would it be fine to just re-genotype everyone together each time we get a new batch of samples, or should we try to separate out samples from different ethnicities into different genotyping cohorts? Does the presence of non-ethnically matched samples disturb anything?

    For example, we could genotype 70 African American individuals together. Or, we could genotype them along with 30 Asian individuals and 300 European American individuals. Will the first method produce better results than the second?
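
For readers following the thread, the workflow discussed above (per-sample HaplotypeCaller in GVCF mode, joint genotyping with GenotypeGVCFs, then optional pedigree-aware refinement with PhaseByTransmission) might be sketched roughly as below. This is only an illustration, not an official recipe: it assumes GATK 3.x command-line syntax, the jar path and all file names are placeholders, and the helper function names are hypothetical. It also assumes that per-sample GVCFs from earlier batches are kept, so that only new samples need a HaplotypeCaller run before everyone is re-genotyped together. The script only builds and prints the command lines; it does not run GATK. Exact arguments differ between GATK releases, so check the tool documentation for your version.

```python
from typing import List

# Assumed location of a GATK 3.x jar; adjust to your installation.
GATK_JAR = "GenomeAnalysisTK.jar"


def haplotypecaller_gvcf_cmd(reference: str, bam: str, gvcf: str) -> List[str]:
    """Per-sample calling in GVCF mode (run once per sample)."""
    return ["java", "-jar", GATK_JAR, "-T", "HaplotypeCaller",
            "-R", reference, "-I", bam,
            "--emitRefConfidence", "GVCF", "-o", gvcf]


def genotype_gvcfs_cmd(reference: str, gvcfs: List[str], out_vcf: str) -> List[str]:
    """Joint genotyping across all per-sample GVCFs."""
    cmd = ["java", "-jar", GATK_JAR, "-T", "GenotypeGVCFs", "-R", reference]
    for gvcf in gvcfs:
        cmd += ["--variant", gvcf]
    return cmd + ["-o", out_vcf]


def phase_by_transmission_cmd(reference: str, vcf: str, ped: str, out_vcf: str) -> List[str]:
    """Optional post-processing that uses family relationships from a ped file."""
    return ["java", "-jar", GATK_JAR, "-T", "PhaseByTransmission",
            "-R", reference, "-V", vcf, "-ped", ped, "-o", out_vcf]


def commands_for_new_batch(reference: str, existing_gvcfs: List[str],
                           new_bams: List[str], out_vcf: str) -> List[List[str]]:
    """Call only the new samples, then re-genotype old and new GVCFs together."""
    commands = []
    all_gvcfs = list(existing_gvcfs)
    for bam in new_bams:
        # Hypothetical naming scheme: sample.bam -> sample.g.vcf
        gvcf = bam.rsplit(".", 1)[0] + ".g.vcf"
        commands.append(haplotypecaller_gvcf_cmd(reference, bam, gvcf))
        all_gvcfs.append(gvcf)
    commands.append(genotype_gvcfs_cmd(reference, all_gvcfs, out_vcf))
    return commands


if __name__ == "__main__":
    batch = commands_for_new_batch(
        "ref.fasta", ["s1.g.vcf", "s2.g.vcf"], ["s3.bam"], "cohort.vcf")
    batch.append(phase_by_transmission_cmd(
        "ref.fasta", "cohort.vcf", "families.ped", "cohort.phased.vcf"))
    for cmd in batch:
        print(" ".join(cmd))
```

Building the commands as argument lists keeps the sketch easy to inspect and to hand to `subprocess.run` later if desired. Whether to split cohorts by ancestry, as asked above, is a separate question that this sketch does not address.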
