Bug: No PLs Produced in CombineGVCFs Leads to Error in GenotypeGVCFs

gtiao (Cambridge, MA) | Posts: 9 | Member

I've run into the following bug while running GenotypeGVCFs:

##### ERROR MESSAGE: cannot merge genotypes from samples without PLs; sample <ID redacted> does not have likelihoods at position 1:1115551

The input file in question is a gVCF produced by merging a large number of smaller gVCFs with CombineGVCFs (all tasks were run with version 3.1). The problem is that position 1115551 doesn't exist in that particular sample:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  <Sample_ID>
1       1115550 .       AC      A,<NON_REF>     118.73  .       BaseQRankSum=-0.377;DP=15;MLEAC=1,0;MLEAF=0.500,0.00;MQ=60.72;MQ0=0;MQRankSum=-1.093;ReadPosRankSum=-0.811      GT:AD:DP:GQ:PL:SB       0/1:7,6,0:13:99:156,0,188,177,207,384:4,3,3,3
1       1115552 .       C       <NON_REF>       .       .       END=1115552     GT:DP:GQ:MIN_DP:PL      0/0:15:0:15:0,0,31

But when the sample is combined with other samples, that position is filled in with a bare "0/0" without any PLs (or any of the other fields, such as AD, DP, GQ, etc.), which causes GenotypeGVCFs to choke.
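To see how widespread the problem is in a merged file, one option is to scan it for genotype columns that carry no PL value, which is the condition GenotypeGVCFs reports. The Python sketch below is not part of GATK; the function names are hypothetical, and it assumes a plain or gzip/bgzip-compressed VCF with standard tab-separated columns.

```python
import gzip
from typing import Iterable, Iterator, List, Tuple


def samples_missing_pl(lines: Iterable[str]) -> Iterator[Tuple[str, int, str]]:
    """Yield (chrom, pos, sample) for every genotype that has no PL value.

    Trailing FORMAT fields may be omitted for a sample (as with a bare "0/0"),
    so zip() pairs the FORMAT keys only with the values actually present.
    """
    samples: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        chrom, pos = fields[0], int(fields[1])
        keys = fields[8].split(":")
        for name, column in zip(samples, fields[9:]):
            values = dict(zip(keys, column.split(":")))
            if values.get("PL", ".") == ".":  # absent or explicitly missing
                yield chrom, pos, name


def scan_file(path: str) -> None:
    """Print chrom:pos<TAB>sample for each genotype lacking PLs."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for chrom, pos, name in samples_missing_pl(handle):
            print(f"{chrom}:{pos}\t{name}")
```

Running `scan_file("merged.g.vcf.gz")` (a hypothetical path) prints one line per affected genotype. The VCF specification allows trailing FORMAT fields to be dropped for a sample, so a bare "0/0" simply has no PL key, which is what the check looks for.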

I can imagine other scenarios that might also produce a bare "0/0" genotype field, so perhaps the easiest fix would be to make sure that any such "0/0" is output as "./.:.:.:.:." instead.

Thanks,

Grace

Answers

  • Geraldine_VdAuwera | Posts: 6,672 | Administrator, GATK Developer

    Hi Grace,

    This matches another recent bug report, which is currently under review. Let me check with the devs whether they need any test data. @vruano may be in touch to follow up with you.

    Geraldine Van der Auwera, PhD

  • gtiao (Cambridge, MA) | Posts: 9 | Member
  • valentin | Posts: 17 | Member, GATK Developer

    Hi Grace,

    Could you post the details of the CombineGVCFs run that produced that file, i.e. the CombineGVCFs command line and the input files?

    We just need a slice of those files around the problematic position (including the header, of course). You should be able to prepare it with SelectVariants (… e.g. -T SelectVariants -R my-standard-ref.fasta -V my-gvcf-input1.vcf -L 1:1115051-1116051). Keep the range as small as possible while still reproducing the problem.
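    If running SelectVariants is inconvenient, a similar slice can be cut with a short script. This is a swapped-in alternative, not what was suggested above: a Python sketch with hypothetical names that keeps every header line plus each record whose reference span overlaps the interval. The span is taken from END= for gVCF reference blocks and from the REF length otherwise, so a block that starts before the interval but extends into it is retained.

```python
import gzip
import re
from typing import Iterable, Iterator, List

# Matches END=<int> as a complete INFO key (not a key that merely ends in "END").
END_RE = re.compile(r"(?:^|;)END=(\d+)")


def record_end(fields: List[str]) -> int:
    """Last reference base a record covers: END= for gVCF blocks, else POS+len(REF)-1."""
    match = END_RE.search(fields[7])
    if match:
        return int(match.group(1))
    return int(fields[1]) + len(fields[3]) - 1


def slice_region(lines: Iterable[str], chrom: str, start: int, stop: int) -> Iterator[str]:
    """Yield all header lines plus records overlapping chrom:start-stop (1-based, inclusive)."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep the full header
            continue
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[0] == chrom and int(fields[1]) <= stop and record_end(fields) >= start:
            yield line


def write_slice(src: str, dst: str, chrom: str, start: int, stop: int) -> None:
    """Write the slice of a plain or gzip/bgzip-compressed VCF to an uncompressed file."""
    opener = gzip.open if src.endswith(".gz") else open
    with opener(src, "rt") as handle, open(dst, "w") as out:
        out.writelines(slice_region(handle, chrom, start, stop))
```

    For example, `write_slice("my-gvcf-input1.vcf.gz", "slice.vcf", "1", 1115051, 1116051)` (paths hypothetical) writes an uncompressed slice around the position discussed here.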

    Feel free to send it to me or Geraldine by private message.

    Cheers, V.

  • gtiao (Cambridge, MA) | Posts: 9 | Member

    Hi, Valentin -- I just sent you a private message with the files and commands. Let me know if you have any follow-up questions about them. Thanks again for looking into this!

    Grace

  • valentin | Posts: 17 | Member, GATK Developer

    Thanks, Grace. Looking into it now. V.

  • valentin | Posts: 17 | Member, GATK Developer

    How did you generate the multi-sample GVCFs? You ran HaplotypeCaller on each sample independently and then did a first round of CombineGVCFs, right?

    I think the problem might be related to a redundant output line at 1115551:

    In Bug_Slices.FinalMergedGVCF.gvcf.gz

    1       1115550 .       AC      A,<NON_REF>   …
    1       1115551 .       C       <NON_REF>       …
    

    Since the record at 1115550 is a 1 bp deletion of the following base (1115551), these two lines should have been combined into a single record at 1115550. That is a bug. I can't guarantee it, but it might be the cause of the exception in the second round of CombineGVCFs.
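    The redundancy described here can be searched for mechanically: look for a record whose POS falls inside the reference span of an earlier record on the same contig. The Python sketch below (hypothetical names, not a GATK tool) computes each span as POS..END for reference blocks and POS..POS+len(REF)-1 otherwise, so the 1 bp deletion AC>A at 1115550 covers 1115551 and a following line at 1115551 is flagged. Overlapping records can be legitimate in multi-sample VCFs in general, so treat the hits as candidates to inspect rather than confirmed errors.

```python
import re
from typing import Iterable, Iterator, Optional, Tuple

# Matches END=<int> as a complete INFO key.
END_RE = re.compile(r"(?:^|;)END=(\d+)")


def overlapping_records(lines: Iterable[str]) -> Iterator[Tuple[str, int, int]]:
    """Yield (chrom, pos, covering_pos) for records that start inside the
    reference span of an earlier record on the same contig.

    covering_pos is the POS of the earlier record whose span reaches furthest.
    """
    current_chrom: Optional[str] = None
    span_start, span_end = 0, 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])
        match = END_RE.search(fields[7])
        end = int(match.group(1)) if match else pos + len(fields[3]) - 1
        if chrom != current_chrom:
            # First record on a new contig: start tracking its span.
            current_chrom, span_start, span_end = chrom, pos, end
            continue
        if pos <= span_end:
            yield chrom, pos, span_start
        if end > span_end:
            span_start, span_end = pos, end
```

    On the slice discussed here, the line at 1115551 would be reported as lying inside the span of the deletion at 1115550.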

  • gtiao (Cambridge, MA) | Posts: 9 | Member

    Hi, Valentin -- Yes, the multi-sample GVCFs were created by running HC on each sample independently. There was a prior round of CombineGVCFs (which produced "./." for the genotypes in question), but in the second round of CombineGVCFs that field was converted to "0/0".

    I see what you mean by the redundancy in the two lines you highlighted. I'll give the command another whirl once that bug is fixed, and we'll see whether that fixes the CombineGVCFs problem.

    Thanks!

    Grace

  • srynearson1 | Posts: 23 | Member
    edited May 21

    @valentin @Geraldine_VdAuwera

    I'm getting the same error. I also merged many individual samples in stages, because CombineGVCFs seems to have an unknown limit on the number of files it can handle, which forces you to merge smaller sets of files first.
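    The staged merge described here can be planned with a small script. The Python sketch below only builds GATK 3 command lines (`-T CombineGVCFs`, `-R`, `--variant`, `-o`) round by round; the jar path, batch size, and output names are hypothetical. With GATK 3.1 this is exactly the kind of multi-round merge that hit the bug in this thread, so it assumes a build that includes the fix.

```python
from typing import List, Sequence


def chunk(items: Sequence[str], size: int) -> List[List[str]]:
    """Split items into consecutive batches of at most `size` elements."""
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def plan_combine_rounds(gvcfs: Sequence[str], reference: str, batch_size: int,
                        jar: str = "GenomeAnalysisTK.jar") -> List[List[List[str]]]:
    """Return one list of CombineGVCFs command lines per merge round.

    Each round merges batches of at most `batch_size` inputs; the outputs of
    one round become the inputs of the next, until a single file remains.
    """
    if batch_size < 2:
        raise ValueError("batch_size must be at least 2")
    rounds: List[List[List[str]]] = []
    inputs = list(gvcfs)
    round_no = 1
    while len(inputs) > 1:
        commands, outputs = [], []
        for batch_no, batch in enumerate(chunk(inputs, batch_size), start=1):
            out = f"round{round_no}.batch{batch_no}.g.vcf"  # hypothetical naming
            cmd = ["java", "-jar", jar, "-T", "CombineGVCFs", "-R", reference]
            for path in batch:
                cmd += ["--variant", path]
            cmd += ["-o", out]
            commands.append(cmd)
            outputs.append(out)
        rounds.append(commands)
        inputs = outputs
        round_no += 1
    return rounds
```

    Each command could then be run with `subprocess.run(cmd, check=True)`, finishing one round before starting the next, since later rounds read the earlier outputs.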

    Is there a way to clean up these files once they have been merged, or an option for CombineGVCFs that avoids writing redundant lines?

    -Thanks, SR

    P.S. My error:

    ##### ERROR MESSAGE: cannot merge genotypes from samples without PLs; sample HLHS0136 does not have likelihoods at position 1:10929

    Version: The Genome Analysis Toolkit (GATK) v3.1-1-g07a4bf8,

    And the tool is still reporting 0/0:

    1       10929   .       C       <NON_REF>       .       .       END=11532       GT:DP:GQ:MIN_DP:PL      0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 
0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0:0:0:0:0,0,0 0/0     0/0
         0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0
  • srynearson1 | Posts: 23 | Member

    I searched around the forum and decided to download the nightly build, which fixes this issue. Just posting this for anyone else who runs into the same problem when merging large sets.

    -SR

  • Geraldine_VdAuwera | Posts: 6,672 | Administrator, GATK Developer

    @srynearson1 That was the right thing to do -- the problem you encountered stems from a bug we fixed recently in development.

    Geraldine Van der Auwera, PhD
