CombineGVCFs Error - Features added out of order

donfreeddonfreed BaltimoreMember

Hi GATK team,

I am running the best practices pipeline on a large set of WES data. On some of the CombineGVCFs steps I am getting errors with the GATK v3.5.

Here is the command:

java -Xmx51200m -jar /home/ndaranalysis/GenomeAnalysisTK.jar -T CombineGVCFs -R /data/Build37/human_g1k_v37.fasta \
-o output.g.vcf.gz  -A AS_BaseQualityRankSumTest -A AS_FisherStrand -A AS_MappingQualityRankSumTest \
-A AS_QualByDepth -A AS_RMSMappingQuality -A AS_ReadPosRankSumTest -A AS_StrandOddsRatio -A AlleleBalance \
-A ClippingRankSumTest -A AS_InbreedingCoeff -A GCContent -A TandemRepeatAnnotator  -G StandardAnnotation \
-L /data/Exome/seqcap_ez_exome_v2.bed  -im ALL -ip 50 --logging_level ERROR -V input1.g.vcf -V input2.g.vcf ...

Here is the error:

##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace
java.lang.IllegalArgumentException: Features added out of order: previous (TabixFeature{referenceIndex=9, start=1354
40296, end=135440296, featureStartFilePosition=356127518800018, featureEndFilePosition=-1}) > next (TabixFeature{ref
erenceIndex=9, start=193026, end=193026, featureStartFilePosition=356127518807875, featureEndFilePosition=-1})
        at htsjdk.tribble.index.tabix.TabixIndexCreator.addFeature(TabixIndexCreator.java:89)
        at htsjdk.variant.variantcontext.writer.IndexingVariantContextWriter.add(IndexingVariantContextWriter.java:1
70)
        at htsjdk.variant.variantcontext.writer.VCFWriter.add(VCFWriter.java:219)
        at org.broadinstitute.gatk.engine.io.storage.VariantContextWriterStorage.add(VariantContextWriterStorage.jav
a:200)
        at org.broadinstitute.gatk.engine.io.stubs.VariantContextWriterStub.add(VariantContextWriterStub.java:272)
        at org.broadinstitute.gatk.tools.walkers.variantutils.CombineGVCFs.endPreviousStates(CombineGVCFs.java:368)
        at org.broadinstitute.gatk.tools.walkers.variantutils.CombineGVCFs.reduce(CombineGVCFs.java:243)
        at org.broadinstitute.gatk.tools.walkers.variantutils.CombineGVCFs.reduce(CombineGVCFs.java:114)
        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociReduce.apply(TraverseLociNano.java
:291)
        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociReduce.apply(TraverseLociNano.java
:280)
        at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:279)
        at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
        at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
        at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:315)
        at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
        at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
        at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
        at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:106)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 3.5-0-g36282e4):
##### ERROR
##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Features added out of order: previous (TabixFeature{referenceIndex=9, start=135440296, end=1354
40296, featureStartFilePosition=356127518800018, featureEndFilePosition=-1}) > next (TabixFeature{referenceIndex=9,
start=193026, end=193026, featureStartFilePosition=356127518807875, featureEndFilePosition=-1})
##### ERROR ------------------------------------------------------------------------------------------

Thanks!

Best Answer

Answers

  • donfreeddonfreed BaltimoreMember

    I found what was causing the error and a fix.

    TL;DR: some individuals have deletions which overlap the end of the last target region of a chromosome, which is causing the error. Expanding the BED interval so that the entire deletion is within the interval fixes the problem.

    The BED file I am using has these lines:

    10      135440068       135440246
    11      193076  193177
    

    Here are some lines from the input GVCFs:

    for gvcf in `ls /data/GVCF/11549.fa.g.vcf.gz /data/GVCF/11549.mo.g.vcf.gz`
    do
        echo $gvcf
        tabix $gvcf 10:135440280-135440300
    done
    

    Produces:

    /data/GVCF/11549.fa.g.vcf.gz
    10      135440279       .       C       <NON_REF>       .       .       END=135440293   GT:DP:GQ:MIN_DP:PL    0/0/0/0/0:767:39:704:0,39,89,159,280,1800
    10      135440294       .       AGTGTGGGTTAGT   A,<NON_REF>     1487.56 .       AS_RAW_BaseQRankSum=20,103,21,2,22,1,23,2,25,4,26,1,27,2,28,4,29,15,30,36,31,230,32,88,33,51,34,58,35,73,36,13|4,1,20,4,27,1,28,1,29,7,30,19,31,9,32,8,33,8|;AS_RAW_MQ=447537.00|41743.00|0.00;AS_RAW_MQRankSum=23,455,25,161,29,4,31,3,35,4,37,4,40,9,41,11,42,29,43,3|23,46,29,3,40,3,41,6|;AS_RAW_ReadPosRankSum=0,2,1,27,2,20,3,17,4,13,5,17,6,6,7,14,8,15,9,18,10,9,11,12,12,12,13,16,14,20,15,11,16,14,17,10,18,20,19,9,20,5,21,9,22,10,23,16,24,7,25,9,26,13,27,12,28,17,29,10,30,20,31,15,32,15,33,12,34,10,35,7,36,23,37,15,38,11,39,22,40,16,41,31,42,9,43,13,44,4,45,12,46,4|0,1,1,1,2,3,3,2,4,7,6,1,7,4,9,3,11,2,12,1,13,2,14,1,19,2,20,4,22,1,23,1,25,1,26,4,27,3,28,1,29,1,32,1,35,2,37,2,40,3,41,2,44,1,45,1|;AS_SB_TABLE=68,615|4,54|0,0;BaseQRankSum=-3.552;ClippingRankSum=-0.315;DP=752;MLEAC=1,0;MLEAF=0.200,0.00;MQRankSum=-2.127;RAW_MQ=498926.00;ReadPosRankSum=-2.229 GT:AD:GQ:PL:SB  0/0/0/0/1:683,58,0:99:1523,0,683,1790,3786,29492,2190,859,1894,3860,29551,3049,2071,3964,29625,4260,4141,29729,6331,29906,32095:68,615,4,54
    /data/GVCF/11549.mo.g.vcf.gz
    10      135440279       .       C       <NON_REF>       .       .       END=135440296   GT:DP:GQ:MIN_DP:PL    0/0/0/0/0:776:39:637:0,39,89,159,280,1800
    

    Merging these two GVCFs reproduces the error.

    java -Xmx28672m -jar /home/ndaranalysis/GenomeAnalysisTK.jar -T CombineGVCFs -R /data/Build37/human_g1k_v37.fasta \
    -o /data/GVCF/merge_test.g.vcf.gz -A AS_BaseQualityRankSumTest -A AS_FisherStrand -A AS_MappingQualityRankSumTest \
    -A AS_QualByDepth -A AS_RMSMappingQuality -A AS_ReadPosRankSumTest -A AS_StrandOddsRatio -A AlleleBalance \
    -A ClippingRankSumTest -A AS_InbreedingCoeff -A GCContent -A TandemRepeatAnnotator  -G StandardAnnotation \
    -L 10:135440068-135440246 -L 11:193076-193177 -im ALL -ip 50 \
    -V /data/GVCF/11549.mo.g.vcf.gz -V /data/GVCF/11549.fa.g.vcf.gz
    

    However, adjusting the first interval so that the interval + padding encompasses the deletion fixes the problem. This command works:

    java -Xmx28672m -jar /home/ndaranalysis/GenomeAnalysisTK.jar -T CombineGVCFs -R /data/Build37/human_g1k_v37.fasta \
    -o /data/GVCF/merge_test.g.vcf.gz -A AS_BaseQualityRankSumTest -A AS_FisherStrand -A AS_MappingQualityRankSumTest \
    -A AS_QualByDepth -A AS_RMSMappingQuality -A AS_ReadPosRankSumTest -A AS_StrandOddsRatio -A AlleleBalance \
    -A ClippingRankSumTest -A AS_InbreedingCoeff -A GCContent -A TandemRepeatAnnotator  -G StandardAnnotation \
    -L 10:135440068-135440260 -L 11:193076-193177 -im ALL -ip 50 \
    -V /data/GVCF/11549.mo.g.vcf.gz -V /data/GVCF/11549.fa.g.vcf.gz
    
  • donfreeddonfreed BaltimoreMember

    I would be happy to upload a data snippet.

    I uploaded a tar archive "combineGvcfsError.tar.gz" to the FTP site according to the instructions.

    Issue · Github
    by Sheila

    Issue Number
    481
    State
    closed
    Last Updated
    Assignee
    Array
    Milestone
    Array
    Closed By
    chandrans
  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    Thanks, we'll try to process this soon.

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @donfreed
    Hi,

    I was not able to reproduce the error message you originally got. However, I did get this nice warning statement:

    WARN 15:30:42,116 CombineGVCFs - You have asked for an interval that cuts in the middle of one or more gVCF blocks. Please note that this will cause you to lose records that don't end within your interval.

    I tried with both intervals you suggested and both gave the same warning message.

    -Sheila

  • donfreeddonfreed BaltimoreMember

    Hi @Sheila,

    Thank you for looking into this. I am able to reproduce the error on three different machines (2 RHEL, 1 Ubuntu) using both 3.5-0 and the latest nightly build, so I am not sure why it is not reproducing for you. Also, I do not see the warning message you described, although I did see many warning messages which seem to be related to subsetting the gVCFs that are similar to:

    WARN  17:08:25,231 ReferenceConfidenceVariantContextMerger - WARNING: remaining (non-reducible) annotations are assumed to be ints or doubles or booleans, but 68,615|4,54|0,0 doesn't parse and will not be annotated in the final VC.
    
  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @donfreed
    Hi,

    Hmm. I get a fully combined GVCF with the WARNING message I posted above when I run this command:

    java -jar GenomeAnalysisTK.jar -T CombineGVCFs -R human_g1k_v37.fasta -V input_1.g.vcf.gz -V input_2.g.vcf.gz -o Sheila.CombineGVCFs.default.g.vcf -L 10:135440068-135440246 -L 11:193076-193177 -im ALL

    I also get a combined GVCF when I run the command above with -ip 50. However, with -ip 50, I do not get the WARNING message I posted above.

    I am not able to reproduce your error message with either the latest stable build or latest nightly build.

    -Sheila

  • donfreeddonfreed BaltimoreMember

    @Sheila

    OK, your command is working for me:

    java -jar $GATK -T CombineGVCFs -R human_g1k_v37.fasta -V input_1.g.vcf.gz -V input_2.g.vcf.gz -o Sheila.CombineGVCFs.default.g.vcf -L 10:135440068-135440246 -L 11:193076-193177 -im ALL -ip 50
    

    This runs without a WARNING.

    However this will not work:

    java -jar $GATK -T CombineGVCFs -R human_g1k_v37.fasta -V input_1.g.vcf.gz -V input_2.g.vcf.gz -o Sheila.CombineGVCFs.default.g.vcf.gz -L 10:135440068-135440246 -L 11:193076-193177 -im ALL -ip 50
    

    So it seems that having the output bgzip-compressed is crucial to reproducing the error.

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @donfreed
    Hi,

    Alright. I was able to reproduce the error with a .gz output. However, if you add -ip 100 (make the padding larger) or do not use -ip at all, the error does not occur. Unfortunately, since there is a fix/workaround for this, I don't think the developers will be devoted to fixing this.

    Besides, if you have been using -L for all the steps up to and through HaplotypeCaller, you don't need to use -L for the steps afterwards :smile:
    https://www.broadinstitute.org/gatk/guide/article?id=4133

    -Sheila

  • mmokrejsmmokrejs Czech RepublicMember

    Alright. I was able to reproduce the error with a .gz output. However, if you add -ip 100 (make the padding larger) or do not use -ip at all, the error does not occur. Unfortunately, since there is a fix/workaround for this, I don't think the developers will be devoted to fixing this.

    I think this is a bad programming practice and you should have raised the issue to your developers. I will open a new issue as I am also getting these:

    WARN  10:35:44,161 ReferenceConfidenceVariantContextMerger - WARNING: remaining (non-reducible) annotations are assumed to be ints or doubles or booleans, but 55291.00|1250.00|0.00 doesn't parse and will not be annotated in the final VC. 
    WARN  10:35:44,162 ReferenceConfidenceVariantContextMerger - WARNING: remaining (non-reducible) annotations are assumed to be ints or doubles or booleans, but 0,4,2,2,4,1,5,3,7,1,10,4,12,4,13,2,15,1,16,1,17,2,18,1,20,17|15,2| doesn't parse and will not be annotated in the final VC. 
    WARN  10:35:44,162 ReferenceConfidenceVariantContextMerger - WARNING: remaining (non-reducible) annotations are assumed to be ints or doubles or booleans, but 25,27,49,16|25,2| doesn't parse and will not be annotated in the final VC. 
    WARN  10:35:44,162 ReferenceConfidenceVariantContextMerger - WARNING: remaining (non-reducible) annotations are assumed to be ints or doubles or booleans, but 25,18|1,1|0,0 doesn't parse and will not be annotated in the final VC. 
    WARN  10:35:44,162 ReferenceConfidenceVariantContextMerger - WARNING: remaining (non-reducible) annotations are assumed to be ints or doubles or booleans, but 20,18,28,3,29,7,30,5,31,2,32,7,33,1|20,2| doesn't parse and will not be annotated in the final VC. 
    WARN  10:35:44,232 ReferenceConfidenceVariantContextMerger - WARNING: remaining (non-reducible) annotations are assumed to be ints or doubles or booleans, but 0.00|80134.00|0.00 doesn't parse and will not be annotated in the final VC. 
    WARN  10:35:44,232 ReferenceConfidenceVariantContextMerger - WARNING: remaining (non-reducible) annotations are assumed to be ints or doubles or booleans, but 0,0|26,33|0,0 doesn't parse and will not be annotated in the final VC. 
    
  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin
    @mmokrejs Our developers have been working on new and improved implementations of this functionality in GATK4 which will be released soon. That is why we do not plan to address this in the GATK3 framework.
Sign In or Register to comment.