ArrayIndexOutOfBoundsException in GenotypeGVCFs on chrX with male/female adapted ploidy

I am attempting to call exomes using GATK 3.8, the new quality model, and AS annotations. However, on chrX I get an ArrayIndexOutOfBoundsException, likely because I am using different ploidy for males and females.

INFO 20:01:42,079 ProgressMeter - X:140994551 3505.0 30.0 s 2.4 h 1.9% 26.9 m 26.4 m

ERROR --
ERROR stack trace

java.lang.ArrayIndexOutOfBoundsException: 24
at org.broadinstitute.gatk.tools.walkers.genotyper.GeneralPloidyGenotypeLikelihoods.getNumLikelihoodElements(GeneralPloidyGenotypeLikelihoods.java:440)
at org.broadinstitute.gatk.tools.walkers.genotyper.GeneralPloidyGenotypeLikelihoods.subsetToAlleles(GeneralPloidyGenotypeLikelihoods.java:339)
at org.broadinstitute.gatk.tools.walkers.genotyper.afcalc.IndependentAllelesExactAFCalculator.subsetAlleles(IndependentAllelesExactAFCalculator.java:494)
at org.broadinstitute.gatk.tools.walkers.genotyper.GenotypingEngine.calculateGenotypes(GenotypingEngine.java:292)
at org.broadinstitute.gatk.tools.walkers.genotyper.UnifiedGenotypingEngine.calculateGenotypes(UnifiedGenotypingEngine.java:392)
at org.broadinstitute.gatk.tools.walkers.genotyper.UnifiedGenotypingEngine.calculateGenotypes(UnifiedGenotypingEngine.java:375)
at org.broadinstitute.gatk.tools.walkers.genotyper.UnifiedGenotypingEngine.calculateGenotypes(UnifiedGenotypingEngine.java:330)
at org.broadinstitute.gatk.tools.walkers.variantutils.GenotypeGVCFs.regenotypeVC(GenotypeGVCFs.java:327)
at org.broadinstitute.gatk.tools.walkers.variantutils.GenotypeGVCFs.map(GenotypeGVCFs.java:305)
at org.broadinstitute.gatk.tools.walkers.variantutils.GenotypeGVCFs.map(GenotypeGVCFs.java:136)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:98)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:323)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:123)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.8-0-ge9d806836):

Answers

  • Sheila (Broad Institute) · Member, Broadie, Moderator, admin

    @marchulsman
    Hi,

    Can you post the exact command you ran?
    Can you also confirm your input GVCFs validate with ValidateVariants?

    Thanks,
    Sheila

  • I used the following commands to create the GVCFs:

    {GATK} -T HaplotypeCaller -R {REF_FASTA} -I {input[0]} -nct 4 -contamination {contamination_fraction} -BQSR {input[2]} -L {chrX exome regions} -ip 100 -ploidy {ploidy: 2 for females, 1 for male} --dbsnp {DBSNP_150} --genotyping_mode DISCOVERY --minPruning 2 -newQual --emitRefConfidence GVCF {GATK_ANNOTATIONS} -variant_index_type LINEAR -variant_index_parameter 128000 -l INFO -log {output[1]} -o {output[0]}

    Where GATK_ANNOTATIONS = -G StandardAnnotation -A AS_BaseQualityRankSumTest -A AS_FisherStrand -A AS_MappingQualityRankSumTest -A AS_QualByDepth -A AS_RMSMappingQuality -A AS_ReadPosRankSumTest -A AS_StrandOddsRatio -A AS_MQMateRankSumTest -A AS_InsertSizeRankSum -A FractionInformativeReads -A LikelihoodRankSumTest -A StrandBiasBySample -A MappingQualityZeroBySample -A GCContent
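The per-sample parts of the HaplotypeCaller template above can be made explicit as argument lists. This is a minimal sketch assuming a Python-based workflow (the `{input[0]}`-style placeholders look like Snakemake, but that is an assumption). The helper names `annotation_args` and `chrx_ploidy_args` and the `"female"`/`"male"` labels are hypothetical; only the flag names (`-G`, `-A`, `-ploidy`) and the annotation names come from the post.

```python
from typing import List

# Annotation names copied from GATK_ANNOTATIONS in the post above.
ANNOTATIONS = [
    "AS_BaseQualityRankSumTest", "AS_FisherStrand", "AS_MappingQualityRankSumTest",
    "AS_QualByDepth", "AS_RMSMappingQuality", "AS_ReadPosRankSumTest",
    "AS_StrandOddsRatio", "AS_MQMateRankSumTest", "AS_InsertSizeRankSum",
    "FractionInformativeReads", "LikelihoodRankSumTest", "StrandBiasBySample",
    "MappingQualityZeroBySample", "GCContent",
]


def annotation_args() -> List[str]:
    """Expand the annotation list into '-G StandardAnnotation -A <name> ...' tokens."""
    args = ["-G", "StandardAnnotation"]
    for name in ANNOTATIONS:
        args += ["-A", name]
    return args


def chrx_ploidy_args(sex: str) -> List[str]:
    """Hypothetical helper: -ploidy for chrX calls (2 for females, 1 for males)."""
    ploidy_by_sex = {"female": 2, "male": 1}
    key = sex.strip().lower()
    if key not in ploidy_by_sex:
        raise ValueError(f"unknown sex label: {sex!r}")
    return ["-ploidy", str(ploidy_by_sex[key])]


# Tokens that would be spliced into the HaplotypeCaller command for a male sample.
print(" ".join(chrx_ploidy_args("male") + annotation_args()))
```

Keeping the arguments as lists rather than one string makes them safe to pass to `subprocess.run` without shell quoting issues.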

    And to combine the GVCFs:
    {GATK} -T CombineGVCFs {variants} -R {REF_FASTA} --dbsnp {DBSNP_150} {GATK_ANNOTATIONS} -o {output}

    And finally to call the combined GVCFs:
    {GATK} -T GenotypeGVCFs --disable_auto_index_creation_and_locking_when_reading_rods -R {REF_FASTA} -D {DBSNP_150} -newQual --annotateNDA -o {output[0]} -L {part of chrX} -ip 100 {GATK_ANNOTATIONS} {combined gvcfs}

    I validated the combined GVCFs using the following command and found no errors.
    gatk-3.8 -T ValidateVariants -R {REF_FASTA} -V {combined gvcf} --validateGVCF -L {chrX regions} -ip 100

    The other chromosomes work without problems (including chrY, which is called as haploid); I only have the issue on chrX, where male and female ploidy are mixed.
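Mixing ploidies matters because the number of PL values stored per sample depends on ploidy as well as on the number of alleles at the site: a sample of ploidy P at a site with A alleles (REF, the ALTs, and `<NON_REF>`) has C(P + A - 1, P) genotype likelihoods, for example 3 for a haploid male and 6 for a diploid female when A = 3. The sketch below is a hypothetical consistency check on a single GVCF data line, not a GATK tool, and it does not claim to reproduce or explain the exception in the stack trace. It assumes the FORMAT column contains GT and PL and that GT preserves each sample's ploidy (e.g. `1` vs `0/1`, or `.` vs `./.` for no-calls); the function names and sample names are illustrative.

```python
import re
from math import comb
from typing import List


def num_likelihoods(ploidy: int, num_alleles: int) -> int:
    # Number of unordered genotypes for this ploidy and allele count (REF included).
    return comb(ploidy + num_alleles - 1, ploidy)


def mismatched_samples(vcf_line: str, samples: List[str]) -> List[str]:
    """Names of samples whose PL length disagrees with the ploidy implied by GT."""
    fields = vcf_line.rstrip("\n").split("\t")
    alt = fields[4]
    num_alleles = 1 if alt == "." else 1 + len(alt.split(","))
    format_keys = fields[8].split(":")
    bad = []
    for name, sample_field in zip(samples, fields[9:]):
        values = dict(zip(format_keys, sample_field.split(":")))
        gt, pl = values.get("GT"), values.get("PL")
        if gt is None or pl in (None, "."):
            continue  # nothing to compare for this sample
        ploidy = len(re.split(r"[/|]", gt))  # '0/1' -> 2, '1' -> 1
        if len(pl.split(",")) != num_likelihoods(ploidy, num_alleles):
            bad.append(name)
    return bad


# Made-up record: F1 (diploid, 6 PLs) and M1 (haploid, 3 PLs) are consistent,
# F2 is diploid but carries only 3 PLs.
line = ("chrX\t100\t.\tA\tG,<NON_REF>\t50\t.\t.\tGT:PL\t"
        "0/1:10,0,20,30,40,50\t1:0,10,20\t0/1:1,2,3")
print(mismatched_samples(line, ["F1", "M1", "F2"]))  # ['F2']
```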

    Thank you for the help!

  • Sheila (Broad Institute) · Member, Broadie, Moderator, admin

    @marchulsman
    Hi,

    Hmm. Can you submit a bug report? Instructions are here.

    Thanks,
    Sheila

  • twooldridge · Member
    edited June 2018

    Hi Sheila,

    Was this issue ever resolved? I am seeing an identical error with GATK 3.8 and Java 1.8. All other chromosomes complete fine, but chrX with mixed ploidy (from HaplotypeCaller) fails around 10-11 Mb every time. Here's a snippet of the output:

    INFO  20:30:32,451 ProgressMeter -   chrX:10556201       1.0E7    25.3 m       2.5 m       92.0%    27.5 m       2.2 m 
    ##### ERROR --
    ##### ERROR stack trace 
    java.lang.ArrayIndexOutOfBoundsException: 20
        at org.broadinstitute.gatk.tools.walkers.genotyper.GeneralPloidyGenotypeLikelihoods.getNumLikelihoodElements(GeneralPloidyGenotypeLikelihoods.java:440)
        at org.broadinstitute.gatk.tools.walkers.genotyper.GeneralPloidyGenotypeLikelihoods.subsetToAlleles(GeneralPloidyGenotypeLikelihoods.java:339)
        at org.broadinstitute.gatk.tools.walkers.genotyper.afcalc.IndependentAllelesExactAFCalculator.subsetAlleles(IndependentAllelesExactAFCalculator.java:494)
        at org.broadinstitute.gatk.tools.walkers.genotyper.GenotypingEngine.calculateGenotypes(GenotypingEngine.java:292)
        at org.broadinstitute.gatk.tools.walkers.genotyper.UnifiedGenotypingEngine.calculateGenotypes(UnifiedGenotypingEngine.java:392)
        at org.broadinstitute.gatk.tools.walkers.genotyper.UnifiedGenotypingEngine.calculateGenotypes(UnifiedGenotypingEngine.java:375)
        at org.broadinstitute.gatk.tools.walkers.genotyper.UnifiedGenotypingEngine.calculateGenotypes(UnifiedGenotypingEngine.java:330)
        at org.broadinstitute.gatk.tools.walkers.variantutils.GenotypeGVCFs.regenotypeVC(GenotypeGVCFs.java:327)
        at org.broadinstitute.gatk.tools.walkers.variantutils.GenotypeGVCFs.map(GenotypeGVCFs.java:305)
        at org.broadinstitute.gatk.tools.walkers.variantutils.GenotypeGVCFs.map(GenotypeGVCFs.java:136)
        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
        at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
        at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
        at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:98)
        at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:323)
        at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:123)
        at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
        at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
        at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A GATK RUNTIME ERROR has occurred (version 3.8-0-ge9d806836):
    ##### ERROR
    ##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ##### ERROR Visit our website and forum for extensive documentation and answers to 
    ##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
    ##### ERROR
    ##### ERROR MESSAGE: 20
    ##### ERROR ------------------------------------------------------------------------------------------
    

    All GVCFs pass ValidateVariants, so they appear to be fine. I should also note that I have previously run GenotypeGVCFs successfully on chrX for a different cohort, using the same GATK version (3.8).

    Thank you,
    Brock

  • Sheila (Broad Institute) · Member, Broadie, Moderator, admin

    @twooldridge
    Hi Brock,

    No, I don't think this was ever resolved, as we never got any test data.

    Can you post the command you ran? Also, can you try with GATK4?

    -Sheila
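Since the bug could not be investigated without test data, one generic way to produce a small reproducible case is to rerun GenotypeGVCFs with `-L` on successively smaller windows around the failing position and keep the smallest one that still triggers the exception. A minimal sketch of generating such windows, assuming GATK's `contig:start-end` interval syntax (1-based, inclusive). The helper `split_interval`, the window bounds, and the 50 kb chunk size are illustrative choices. The ProgressMeter position (chrX:10556201 in the log above) is only the last reported locus, so the failing site may lie somewhat after it.

```python
from typing import List


def split_interval(contig: str, start: int, end: int, chunk: int) -> List[str]:
    """Split a 1-based, inclusive region into consecutive 'contig:start-end' strings."""
    if start < 1 or end < start or chunk < 1:
        raise ValueError("need 1 <= start <= end and chunk >= 1")
    intervals = []
    pos = start
    while pos <= end:
        stop = min(pos + chunk - 1, end)
        intervals.append(f"{contig}:{pos}-{stop}")
        pos = stop + 1
    return intervals


# Candidate -L values covering the region where the run reportedly fails.
for interval in split_interval("chrX", 10_500_000, 10_999_999, 50_000):
    print(interval)
```

Each window that fails can be split again with a smaller chunk until only a handful of records remain, which is small enough to share as test data.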
