Multi-allelic sites being dropped in PhaseByTransmission?

After switching to the GATK 2.x series, I noticed that PhaseByTransmission (PBT) now drops multi-allelic sites entirely from its output. Shouldn't the correct behavior be to write them out unmodified? Or is there a specific reason multi-allelic sites are excluded?

Specifically, here is the current code:

// Current behavior: null and non-biallelic records are skipped and never written
if (vc == null || !vc.isBiallelic())
    return metricsCounters;

I think it should be something like this instead:

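// Skip records with no variant context
if (vc == null)
    return metricsCounters;
// Pass non-biallelic (e.g. multi-allelic) sites through to the output unmodified
if (!vc.isBiallelic()) {
    vcfWriter.add(vc);
    return metricsCounters;
}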
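To make the difference concrete, here is a small self-contained Java sketch of the two control flows. It does not use the real GATK/htsjdk classes: `Site`, `phase`, `processCurrent`, `processProposed`, and the `List<Site>` standing in for `vcfWriter` are hypothetical stand-ins, the metrics counters are omitted, and "phasing" is reduced to a placeholder. Only the filtering logic mirrors the snippets above.

```java
import java.util.ArrayList;
import java.util.List;

public class PbtFilterSketch {
    // Hypothetical stand-in for a VariantContext: a position, its alleles (ref first),
    // and whether the placeholder phasing step has touched it.
    public static final class Site {
        public final int pos;
        public final List<String> alleles;
        public final boolean phased;

        public Site(int pos, List<String> alleles, boolean phased) {
            this.pos = pos;
            this.alleles = alleles;
            this.phased = phased;
        }

        // Same meaning as VariantContext.isBiallelic(): exactly two alleles.
        public boolean isBiallelic() {
            return alleles.size() == 2;
        }
    }

    // Placeholder for the real phasing logic; here it only marks the site as phased.
    static Site phase(Site s) {
        return new Site(s.pos, s.alleles, true);
    }

    // Mirrors the current code: null and non-biallelic sites never reach the writer.
    public static void processCurrent(Site vc, List<Site> writer) {
        if (vc == null || !vc.isBiallelic())
            return;
        writer.add(phase(vc));
    }

    // Mirrors the proposed code: non-biallelic sites are written out unmodified.
    public static void processProposed(Site vc, List<Site> writer) {
        if (vc == null)
            return;
        if (!vc.isBiallelic()) {
            writer.add(vc);
            return;
        }
        writer.add(phase(vc));
    }

    public static void main(String[] args) {
        List<Site> sites = List.of(
                new Site(100, List.of("A", "G"), false),        // biallelic
                new Site(200, List.of("C", "T", "TA"), false)); // multi-allelic
        List<Site> current = new ArrayList<>();
        List<Site> proposed = new ArrayList<>();
        for (Site s : sites) {
            processCurrent(s, current);
            processProposed(s, proposed);
        }
        System.out.println("current:  " + current.size() + " site(s) written");
        System.out.println("proposed: " + proposed.size() + " site(s) written");
    }
}
```

Running it should report 1 written site for the current logic and 2 for the proposed one: the biallelic site is phased either way, while the multi-allelic site survives only under the proposed logic, and it is written without being marked as phased.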

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Sorry to get to your question so late; it got dropped during a shift change.

    I agree that it would make sense to write out multi-allelic sites unmodified rather than drop them. I'll ask whether the author of PBT, @Laurent, can shed some light on this.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Hi Laurent, thanks for answering! Yes, we think it would be preferable to include multi-allelic sites in the output by default. That said, it might make sense to offer a command-line flag to omit them from the output, but that's up to you.

    Good to hear you're working on supporting multi-allelic sites. Good luck!

  • mlinderm (Member)

    Thanks for the responses. The change is so small that it is probably not worth submitting a patch.
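Geraldine's suggestion in the thread, keeping multi-allelic sites in the output by default while offering a command-line flag to omit them, might look roughly like the Java sketch below. The flag name `omitMultiallelic` and the `filterSites` method are hypothetical (they are not existing PhaseByTransmission arguments or GATK APIs), and each site is reduced to a plain allele list so the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;

public class MultiallelicFlagSketch {
    // Keeps multi-allelic sites unless the hypothetical omitMultiallelic flag is set.
    // Each site is reduced to its allele list (ref first).
    public static List<List<String>> filterSites(List<List<String>> sites, boolean omitMultiallelic) {
        List<List<String>> out = new ArrayList<>();
        for (List<String> alleles : sites) {
            boolean multiallelic = alleles.size() > 2;
            if (multiallelic && omitMultiallelic)
                continue; // opt-out: drop the site only when explicitly requested
            // Biallelic sites would be phased at this point; multi-allelic ones pass through unmodified.
            out.add(alleles);
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> sites = List.of(
                List.of("A", "G"),        // biallelic
                List.of("C", "T", "TA")); // multi-allelic
        System.out.println("default (flag off): " + filterSites(sites, false).size() + " site(s)");
        System.out.println("flag on: " + filterSites(sites, true).size() + " site(s)");
    }
}
```

Defaulting the flag to `false` keeps the pass-through behavior proposed in the question, while setting it drops multi-allelic sites much as the current code does.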

Sign In or Register to comment.