Maximum size for HaplotypeCaller

eflynn90 | Washington DC | Posts: 56 | Member

I'm running HaplotypeCaller and was wondering what the recommended cohort size is. Would 30 to 40 families (100 to 150 individuals) be too large? Is there a recommended maximum for the number of independent alleles or the number of individuals? How does the run time scale with additional individuals?

Thanks,
Elise

Answers

  • eflynn90 | Washington DC | Posts: 56 | Member
    edited April 2014

    Thank you, very interesting!

    A question about GVCF mode:

    • Does it output a genotype, Phred-scaled genotype likelihoods, and allelic depths at every position where there are any aligned reads?

    Questions about GenotypeGVCFs:

    • Is it able to merge similar INDELs into one row in the VCF file?
    • Can it call different genotypes for an individual at the same position if the cohort is different?
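
For readers unfamiliar with the per-sample fields named in the GVCF question above, here is a minimal Python sketch of how they appear in a VCF-style record. The FORMAT keys GT, AD, and PL are standard VCF/GATK keys; the function name `parse_sample_fields` and the sample values are made up for illustration and do not come from real data.

```python
def parse_sample_fields(format_column, sample_column):
    """Pair the FORMAT keys of a VCF/GVCF line with one sample's values."""
    keys = format_column.split(":")
    values = sample_column.split(":")
    fields = dict(zip(keys, values))
    # AD (allelic depths) and PL (Phred-scaled genotype likelihoods) are
    # comma-separated integer lists; GT stays a string such as "0/1".
    for key in ("AD", "PL"):
        if key in fields and fields[key] != ".":
            fields[key] = [int(x) for x in fields[key].split(",")]
    return fields


# Hypothetical values, chosen only to illustrate the layout.
record = parse_sample_fields("GT:AD:DP:GQ:PL", "0/1:12,9:21:99:255,0,310")
print(record["GT"], record["AD"], record["PL"])
```

The genotype is the one whose PL entry is 0; larger PL values mark less likely genotypes.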
  • eflynn90 | Washington DC | Posts: 56 | Member

    One more question: Is there any way to make this workflow take relationships in a ped file into account when calling genotypes?

  • Geraldine_VdAuwera | Posts: 8,271 | Administrator, GATK Dev admin

    In a nutshell, yes on all points.

    • I'm working on a new doc that explains the GVCF format and how the underlying reference model works, should be ready in a day or two.

    • They will be merged into a single record with different indel alleles.

    • GenotypeGVCFs basically replicates the effect of multisample calling but without the computational cost.
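
The two-step workflow discussed here (HaplotypeCaller in GVCF mode once per sample, then GenotypeGVCFs across the cohort) can be sketched roughly as follows. This is only an illustration: it builds command lines without running them, the helper names, jar path, and file names are hypothetical, and the flags follow GATK 3.x-era usage, so they should be checked against the documentation for the installed version.

```python
def haplotypecaller_gvcf_cmd(reference, bam, out_gvcf, gatk_jar="GenomeAnalysisTK.jar"):
    """Per-sample step: run once for each BAM, producing one GVCF."""
    return [
        "java", "-jar", gatk_jar,
        "-T", "HaplotypeCaller",
        "-R", reference,
        "-I", bam,
        "--emitRefConfidence", "GVCF",
        # Assumption: index settings as used with early GATK 3.x releases.
        "--variant_index_type", "LINEAR",
        "--variant_index_parameter", "128000",
        "-o", out_gvcf,
    ]


def genotype_gvcfs_cmd(reference, gvcfs, out_vcf, gatk_jar="GenomeAnalysisTK.jar"):
    """Joint step: genotype all per-sample GVCFs together."""
    cmd = ["java", "-jar", gatk_jar, "-T", "GenotypeGVCFs", "-R", reference]
    for gvcf in gvcfs:
        cmd += ["--variant", gvcf]
    cmd += ["-o", out_vcf]
    return cmd


# Hypothetical sample and file names.
samples = {"child": "child.bam", "mother": "mother.bam", "father": "father.bam"}
per_sample = [
    haplotypecaller_gvcf_cmd("ref.fasta", bam, name + ".g.vcf")
    for name, bam in samples.items()
]
joint = genotype_gvcfs_cmd("ref.fasta", [n + ".g.vcf" for n in samples], "cohort.vcf")

for cmd in per_sample + [joint]:
    print(" ".join(cmd))
```

Under this layout, adding samples later means running the per-sample step only for the new BAMs and then repeating the joint step over all of the GVCFs.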

    Geraldine Van der Auwera, PhD

  • eflynn90 | Washington DC | Posts: 56 | Member

    Okay, thank you. I'll try to be patient and wait for the documentation :-)

  • Geraldine_VdAuwera | Posts: 8,271 | Administrator, GATK Dev admin

    No, the caller will not consider ped data. There are some post-processing tools to refine genotypes based on that info.


  • eflynn90 | Washington DC | Posts: 56 | Member

    What are the post-processing tools? Are you referring to PhaseByTransmission?

  • Geraldine_VdAuwera | Posts: 8,271 | Administrator, GATK Dev admin

    Yes that's right.


  • eflynn90 | Washington DC | Posts: 56 | Member

    Hi, I have a follow-up question about GenotypeGVCFs. Is there an advantage to running it on an ethnically matched cohort of individuals versus running it on one individual or one family at a time? Is it population-aware?

    Thanks.

  • eflynn90 | Washington DC | Posts: 56 | Member

    Great, thanks!

  • eflynn90 | Washington DC | Posts: 56 | Member

    Hi, another follow-up question. I work with a lab that accumulates samples over time. Would it be fine to just re-genotype everyone together each time we get a new batch of samples, or should we try to separate samples from different ethnicities into different genotyping cohorts? Does the presence of non-ethnically matched samples disturb anything?

    For example, we could genotype 70 African American individuals together. Or, we could genotype them along with 30 Asian individuals and 300 European American individuals. Will the first method produce better results than the second?
