Cannot get HaplotypeCaller to output phase information in pooled multi-sample calling mode.

egafni Member
edited November 2016 in Ask the GATK team

There is currently no way to run HaplotypeCaller with pooled multi-sample calling and still get phasing output. By pooled multi-sample calling, I'm referring to running HaplotypeCaller to simultaneously call multiple samples.

For example:
HaplotypeCaller
-I sample1.bam
-I sample2.bam

There are two scenarios we have attempted which do not work:
1) If HaplotypeCaller is run with emitRefConfidence NONE, it accepts multiple samples but does not output any phasing information (this seems like a bug).
2) If HaplotypeCaller is run with emitRefConfidence GVCF, it does not support pooled multi-sample calling (i.e. the BAMs must contain reads from only one sample per invocation).
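
To confirm whether a given output VCF carries any phase information at all, something like the following Python sketch could be used. It is only a rough check under stated assumptions: it treats a record as phased for a sample if the GT uses the "|" separator or if one of the FORMAT tags PGT, PID or HP is populated. That tag list is my assumption about what HaplotypeCaller (in GVCF mode) and ReadBackedPhasing write, and the function names are made up for illustration.

```python
# Assumed phasing-related FORMAT tags; adjust if your GATK version differs.
PHASE_TAGS = ("PGT", "PID", "HP")


def sample_is_phased(format_field, sample_field):
    """Return True if one sample column of a VCF record looks phased."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    # zip() tolerates sample columns with dropped trailing fields.
    fields = dict(zip(keys, values))
    if "|" in fields.get("GT", ""):
        return True
    return any(fields.get(tag, ".") not in (".", "") for tag in PHASE_TAGS)


def count_phased(vcf_lines):
    """Count, per sample, the records that carry phase information."""
    samples = []
    counts = {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information headers
        cols = line.split("\t")
        if line.startswith("#CHROM"):
            samples = cols[9:]  # sample names follow the FORMAT column
            counts = {name: 0 for name in samples}
            continue
        if len(cols) < 10:
            continue  # sites-only record, no genotypes to inspect
        for name, column in zip(samples, cols[9:]):
            if sample_is_phased(cols[8], column):
                counts[name] += 1
    return counts
```

Running count_phased(open("calls.vcf")) on the multi-sample output (the file name is a placeholder) and getting all zeros would match the behavior described in scenario 1.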

We have seen pooled multi-sample calling perform significantly better but would also greatly benefit from HaplotypeCaller's phasing information. Is there a way to get both?

Answers

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    Unfortunately, what you're trying to do is not currently possible. The phasing behavior is hooked up to the GVCF generation machinery, and it would not be practical to change this in the current framework. However, you can perform post-hoc phasing by running ReadBackedPhasing on the multisample callset.

    Btw, we don't call this "pooled calling"; that term is reserved to describe experiments where samples are pooled at or before the library preparation level (either by directly pooling tissue or DNA) before barcoding, so individuals' sequence cannot be distinguished. What you describe is simply called "multisample calling".
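
For anyone wanting to try the post-hoc phasing suggested in the answer above, here is a minimal Python sketch that assembles a ReadBackedPhasing command line. It assumes the GATK 3-style invocation (-T ReadBackedPhasing with -R, -I, --variant, -L, -o and --phaseQualityThresh); the jar name, file names, default threshold and the helper function itself are placeholders of mine, not something given in this thread, so check them against the tool documentation for your GATK version.

```python
import shlex


def read_backed_phasing_cmd(reference, bams, variants, output,
                            gatk_jar="GenomeAnalysisTK.jar",
                            phase_quality_thresh=20.0):
    """Build an argument list for a GATK 3-style ReadBackedPhasing run.

    All paths and the threshold default are illustrative placeholders.
    """
    cmd = ["java", "-jar", gatk_jar, "-T", "ReadBackedPhasing",
           "-R", reference]
    for bam in bams:
        cmd += ["-I", bam]  # one -I per input BAM, as in multisample calling
    cmd += [
        "--variant", variants,  # the unphased multisample callset
        "-L", variants,         # restrict traversal to the called sites
        "-o", output,
        "--phaseQualityThresh", str(phase_quality_thresh),
    ]
    return cmd


if __name__ == "__main__":
    cmd = read_backed_phasing_cmd(
        "reference.fasta",
        ["sample1.bam", "sample2.bam"],
        "multisample.vcf",
        "multisample.phased.vcf",
    )
    # Print the command for review instead of executing it.
    print(" ".join(shlex.quote(part) for part in cmd))
```

Once the printed command looks right, the same list can be passed to subprocess.run(cmd, check=True) to actually run it.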
