ArrayIndexOutOfBoundsException in GenotypeGVCFs on chrX with male/female adapted ploidy

I am attempting to call exomes using GATK 3.8, the new quality model, and AS annotations. However, on chrX I get an ArrayIndexOutOfBoundsException, probably because I am using different ploidy for males and females.

INFO 20:01:42,079 ProgressMeter - X:140994551 3505.0 30.0 s 2.4 h 1.9% 26.9 m 26.4 m

ERROR --
ERROR stack trace

java.lang.ArrayIndexOutOfBoundsException: 24
at org.broadinstitute.gatk.tools.walkers.genotyper.GeneralPloidyGenotypeLikelihoods.getNumLikelihoodElements(GeneralPloidyGenotypeLikelihoods.java:440)
at org.broadinstitute.gatk.tools.walkers.genotyper.GeneralPloidyGenotypeLikelihoods.subsetToAlleles(GeneralPloidyGenotypeLikelihoods.java:339)
at org.broadinstitute.gatk.tools.walkers.genotyper.afcalc.IndependentAllelesExactAFCalculator.subsetAlleles(IndependentAllelesExactAFCalculator.java:494)
at org.broadinstitute.gatk.tools.walkers.genotyper.GenotypingEngine.calculateGenotypes(GenotypingEngine.java:292)
at org.broadinstitute.gatk.tools.walkers.genotyper.UnifiedGenotypingEngine.calculateGenotypes(UnifiedGenotypingEngine.java:392)
at org.broadinstitute.gatk.tools.walkers.genotyper.UnifiedGenotypingEngine.calculateGenotypes(UnifiedGenotypingEngine.java:375)
at org.broadinstitute.gatk.tools.walkers.genotyper.UnifiedGenotypingEngine.calculateGenotypes(UnifiedGenotypingEngine.java:330)
at org.broadinstitute.gatk.tools.walkers.variantutils.GenotypeGVCFs.regenotypeVC(GenotypeGVCFs.java:327)
at org.broadinstitute.gatk.tools.walkers.variantutils.GenotypeGVCFs.map(GenotypeGVCFs.java:305)
at org.broadinstitute.gatk.tools.walkers.variantutils.GenotypeGVCFs.map(GenotypeGVCFs.java:136)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:98)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:323)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:123)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.8-0-ge9d806836):

Answers

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @marchulsman
    Hi,

    Can you post the exact command you ran?
    Can you also confirm your input GVCFs validate with ValidateVariants?

    Thanks,
    Sheila

  • I used the following commands to create the GVCFs:

    {GATK} -T HaplotypeCaller -R {REF_FASTA} -I {input[0]} -nct 4 -contamination {contamination_fraction} -BQSR {input[2]} -L {chrX exome regions} -ip 100 -ploidy {ploidy: 2 for females, 1 for male} --dbsnp {DBSNP_150} --genotyping_mode DISCOVERY --minPruning 2 -newQual --emitRefConfidence GVCF {GATK_ANNOTATIONS} -variant_index_type LINEAR -variant_index_parameter 128000 -l INFO -log {output[1]} -o {output[0]}

    Where GATK_ANNOTATIONS = -G StandardAnnotation -A AS_BaseQualityRankSumTest -A AS_FisherStrand -A AS_MappingQualityRankSumTest -A AS_QualByDepth -A AS_RMSMappingQuality -A AS_ReadPosRankSumTest -A AS_StrandOddsRatio -A AS_MQMateRankSumTest -A AS_InsertSizeRankSum -A FractionInformativeReads -A LikelihoodRankSumTest -A StrandBiasBySample -A MappingQualityZeroBySample -A GCContent

    And to combine the GVCFs:
    {GATK} -T CombineGVCFs {variants} -R {REF_FASTA} --dbsnp {DBSNP_150} {GATK_ANNOTATIONS} -o {output}

    And finally to call the combined GVCFs:
    {GATK} -T GenotypeGVCFs --disable_auto_index_creation_and_locking_when_reading_rods -R {REF_FASTA} -D {DBSNP_150} -newQual --annotateNDA -o {output[0]} -L {part of chrX} -ip 100 {GATK_ANNOTATIONS} {combined gvcfs}

    I validated the combined GVCFs using the following command, and found no errors:
    gatk-3.8 -T ValidateVariants -R {REF_FASTA} -V {combined gvcf} --validateGVCF -L {chrX regions} -ip 100

    The other chromosomes work without problems (including chrY, which is called as haploid); I only have an issue when mixing male and female ploidy.

    Thank you for the help!
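
For context on why mixed ploidy matters here: the number of genotype likelihood (PL) entries per sample depends on both the allele count and the ploidy, so haploid and diploid samples carry PL arrays of different lengths at the same site. The sketch below only illustrates that standard count, C(alleles + ploidy - 1, ploidy). It is not GATK's code, the function name is made up for illustration, and nothing in this thread confirms that this is what triggers the exception.

```python
from math import comb


def num_genotype_likelihoods(num_alleles: int, ploidy: int) -> int:
    """Count the unordered genotypes (PL entries) for one sample.

    Hypothetical helper for illustration only, not a GATK function.
    """
    return comb(num_alleles + ploidy - 1, ploidy)


# Example site with REF, one ALT and the <NON_REF> symbolic allele (3 alleles).
for label, ploidy in (("haploid (male chrX)", 1), ("diploid (female chrX)", 2)):
    print(label, num_genotype_likelihoods(3, ploidy))
```

With three alleles a haploid sample has 3 PL values and a diploid sample has 6, so any code that assumes a single ploidy for every sample at a site would read the wrong number of entries for some samples.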

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @marchulsman
    Hi,

    Hmm. Can you submit a bug report? Instructions are here.

    Thanks,
    Sheila

  • twooldridge (Member)
    edited June 2018

    Hi Sheila,

    Was this issue ever resolved? I'm experiencing an identical error, using GATK 3.8 and Java 1.8. All other chromosomes complete fine, but with mixed ploidy (from HaplotypeCaller) chrX fails at around 10-11 Mb each time. Here's a snippet of the output:

    INFO  20:30:32,451 ProgressMeter -   chrX:10556201       1.0E7    25.3 m       2.5 m       92.0%    27.5 m       2.2 m 
    ##### ERROR --
    ##### ERROR stack trace 
    java.lang.ArrayIndexOutOfBoundsException: 20
        at org.broadinstitute.gatk.tools.walkers.genotyper.GeneralPloidyGenotypeLikelihoods.getNumLikelihoodElements(GeneralPloidyGenotypeLikelihoods.java:440)
        at org.broadinstitute.gatk.tools.walkers.genotyper.GeneralPloidyGenotypeLikelihoods.subsetToAlleles(GeneralPloidyGenotypeLikelihoods.java:339)
        at org.broadinstitute.gatk.tools.walkers.genotyper.afcalc.IndependentAllelesExactAFCalculator.subsetAlleles(IndependentAllelesExactAFCalculator.java:494)
        at org.broadinstitute.gatk.tools.walkers.genotyper.GenotypingEngine.calculateGenotypes(GenotypingEngine.java:292)
        at org.broadinstitute.gatk.tools.walkers.genotyper.UnifiedGenotypingEngine.calculateGenotypes(UnifiedGenotypingEngine.java:392)
        at org.broadinstitute.gatk.tools.walkers.genotyper.UnifiedGenotypingEngine.calculateGenotypes(UnifiedGenotypingEngine.java:375)
        at org.broadinstitute.gatk.tools.walkers.genotyper.UnifiedGenotypingEngine.calculateGenotypes(UnifiedGenotypingEngine.java:330)
        at org.broadinstitute.gatk.tools.walkers.variantutils.GenotypeGVCFs.regenotypeVC(GenotypeGVCFs.java:327)
        at org.broadinstitute.gatk.tools.walkers.variantutils.GenotypeGVCFs.map(GenotypeGVCFs.java:305)
        at org.broadinstitute.gatk.tools.walkers.variantutils.GenotypeGVCFs.map(GenotypeGVCFs.java:136)
        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
        at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
        at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
        at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:98)
        at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:323)
        at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:123)
        at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
        at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
        at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A GATK RUNTIME ERROR has occurred (version 3.8-0-ge9d806836):
    ##### ERROR
    ##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ##### ERROR Visit our website and forum for extensive documentation and answers to 
    ##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
    ##### ERROR
    ##### ERROR MESSAGE: 20
    ##### ERROR ------------------------------------------------------------------------------------------
    

    All GVCFs pass ValidateVariants, so they appear to be fine. I should also note that I've successfully run GenotypeGVCFs on the X chromosome from a different cohort in the past, using the same version of GATK (3.8).

    Thank you,
    Brock

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @twooldridge
    Hi Brock,

    No, I don't think this was ever resolved, as we never got any test data.

    Can you post the command you ran? Also, can you try with GATK4?

    -Sheila
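
Since the report stalled for lack of test data, one way to narrow the failure down to a small, shareable region is to scan the records near the last position shown by the ProgressMeter and flag samples whose PL length does not match what their GT ploidy implies. This is a rough sketch, not a GATK tool: the function names are hypothetical, it assumes an uncompressed, tab-delimited (G)VCF with GT and PL in the FORMAT column, and a clean result does not rule out a bug elsewhere.

```python
from math import comb


def pl_mismatches(vcf_line: str):
    """Return (sample_index, ploidy, expected, observed) for every sample on
    one VCF data line whose PL length disagrees with its GT ploidy."""
    fields = vcf_line.rstrip("\n").split("\t")
    alts = [] if fields[4] == "." else fields[4].split(",")
    num_alleles = 1 + len(alts)  # REF plus every ALT, including <NON_REF>
    keys = fields[8].split(":")
    if "GT" not in keys or "PL" not in keys:
        return []
    gt_i, pl_i = keys.index("GT"), keys.index("PL")
    problems = []
    for idx, sample in enumerate(fields[9:]):
        values = sample.split(":")
        # Trailing FORMAT fields may be dropped, and PL may be missing (".").
        if len(values) <= pl_i or values[pl_i] == ".":
            continue
        ploidy = len(values[gt_i].replace("|", "/").split("/"))
        expected = comb(num_alleles + ploidy - 1, ploidy)
        observed = len(values[pl_i].split(","))
        if expected != observed:
            problems.append((idx, ploidy, expected, observed))
    return problems


def scan(path: str, contig: str, start: int, end: int) -> None:
    """Print mismatching samples for records in contig:start-end."""
    with open(path) as handle:
        for line in handle:
            if line.startswith("#"):
                continue
            chrom, pos = line.split("\t", 2)[:2]
            if chrom == contig and start <= int(pos) <= end:
                for problem in pl_mismatches(line):
                    print(chrom, pos, *problem)
```

For example, `scan("combined.g.vcf", "chrX", 10500000, 10600000)` (file name and window are placeholders) would list any suspicious records just around the position where the run above stopped.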
