Multi-allelic sites being dropped in PhaseByTransmission?

After switching to the 2.x series of GATK, I noticed that PhaseByTransmission (PBT) now drops multi-allelic sites from the output entirely. Shouldn't the correct behavior be to write them out unmodified? Or is there a specific reason multi-allelic sites are not being written out?

Specifically, here is the current code:

if (vc == null || !vc.isBiallelic())
    return metricsCounters;

But I think it should be something like this:

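if (vc == null)
    return metricsCounters;
if (!vc.isBiallelic()) {
    // Pass multi-allelic sites through unmodified instead of dropping them
    // (assuming the walker's output writer field is named vcfWriter).
    vcfWriter.add(vc);
    return metricsCounters;
}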

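To make the behavior I am asking for concrete, here is a minimal, self-contained sketch that does not depend on GATK. The `Site` class, the `process` method, and the `:phased` suffix are hypothetical stand-ins for illustration only; they are not the actual PBT code or the GATK `VariantContext` API. The point is only that non-biallelic records reach the output unchanged while biallelic records go through phasing.

```java
import java.util.ArrayList;
import java.util.List;

class MultiAllelicPassThrough {

    /** Hypothetical stand-in for a variant record (not GATK's VariantContext). */
    static final class Site {
        final String id;
        final int altAlleleCount;

        Site(String id, int altAlleleCount) {
            this.id = id;
            this.altAlleleCount = altAlleleCount;
        }

        /** Biallelic here means one reference allele plus exactly one alternate allele. */
        boolean isBiallelic() {
            return altAlleleCount == 1;
        }
    }

    /** Returns the records "written" to the output, in input order. */
    static List<String> process(List<Site> sites) {
        List<String> written = new ArrayList<>();
        for (Site site : sites) {
            if (site == null) {
                continue; // nothing to write for a missing record
            }
            if (!site.isBiallelic()) {
                written.add(site.id); // pass through unmodified instead of dropping
                continue;
            }
            written.add(site.id + ":phased"); // placeholder for the real phasing step
        }
        return written;
    }
}
```

With this structure every non-null input record appears in the output exactly once, so downstream tools see the same set of sites that went in.
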
  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Sorry to get to your question so late, it got dropped during a shift change.

    I agree that it would make sense to write out multi-allelic sites as unmodified rather than drop them... I'll ask if the author of PBT, @Laurent, can shed some light on this.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Hi Laurent, thanks for answering! Yes, we think it would be preferable to have the multi-allelic sites included in the output by default. Although it might make sense to give the option to omit them from the output using a flag in the command... but that's up to you.

    Good to hear you're working on supporting multi-allelic sites. Good luck!

  • mlinderm (Member)

    Thanks for the responses. The change is so small, it is probably not worth submitting a patch...

