GATK (v 3.5) VariantAnnotator ERROR: potential bug?

Hi, I am using VariantAnnotator and, among other annotations, I would like to add the 1000G allele frequency of my SNPs to the resulting annotated .vcf file.
I did this with the following command line:

java -Xmx8g -jar /bio/GenomeAnalysisTK.jar -T VariantAnnotator -V ${1}/recalibratedSNPsINDELs.vcf -R /data/shared/noPAR.hg19_decoy.fa \
-resource:1000G /data/shared/sort.1000G_phase1.snps.high_confidence.hg19.vcf.gz \
-E 1000G.AF -E 1000G.pop --resourceAlleleConcordance \
-o ${1}.annotation1bis.vcf \
&> ${1}.annotation1bis.log

Nonetheless, the script stops after (successfully) annotating a few hundred variants, with the following error message:

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
at java.util.ArrayList.rangeCheck(ArrayList.java:653)
at java.util.ArrayList.get(ArrayList.java:429)
at org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotatorEngine.annotateExpressions(VariantAnnotatorEngine.java:462)
at org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotatorEngine.annotateContext(VariantAnnotatorEngine.java:217)
at org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotatorEngine.annotateContext(VariantAnnotatorEngine.java:203)
at org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotator.map(VariantAnnotator.java:357)
at org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotator.map(VariantAnnotator.java:114)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:315)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:106)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.5-0-g36282e4):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Index: 1, Size: 1
ERROR ------------------------------------------------------------------------------------------

I have found some similar errors in the forum, but never the same one as mine for VariantAnnotator.
Moreover, when I remove the annotation of allele frequencies from 1000G, the script seems to work properly. What could be the reason for this?

I am looking forward to hearing from you.

Best,
Alessandro
(v3.5-0-g36282e4)
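
Since the run stops partway through, one way to narrow down the record being processed at the crash is to compare the partial output VCF with the input callset. The following is a minimal Python sketch under stated assumptions, not a GATK feature: it assumes the output was written in input order, that records can be matched on CHROM, POS, REF and ALT, and that both files are read as plain text lines. The function names are hypothetical. Because output may be buffered, the record it returns is only a starting point for the search.

```python
from typing import Iterable, Iterator, Optional, Tuple

Key = Tuple[str, int, str, str]


def data_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield VCF data lines, skipping headers ('#...') and blank lines."""
    for line in lines:
        if line.strip() and not line.startswith("#"):
            yield line


def record_key(line: str) -> Optional[Key]:
    """Return (CHROM, POS, REF, ALT), or None for a truncated line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:  # a crash can leave a half-written final line
        return None
    return fields[0], int(fields[1]), fields[3], fields[4]


def first_unwritten_record(
    input_lines: Iterable[str], partial_output_lines: Iterable[str]
) -> Optional[str]:
    """Return the first input record that follows the last complete
    record in the partial output, or None if every record was written."""
    last_written: Optional[Key] = None
    for line in data_lines(partial_output_lines):
        key = record_key(line)
        if key is not None:
            last_written = key

    if last_written is None:
        # Nothing was written, so the first input record is the candidate.
        return next((ln.rstrip("\n") for ln in data_lines(input_lines)), None)

    found = False
    for line in data_lines(input_lines):
        if found:
            return line.rstrip("\n")
        if record_key(line) == last_written:
            found = True
    return None
```

To use it on real files, open the input and the partial output in text mode (`gzip.open(path, "rt")` for `.vcf.gz`) and pass the two file objects in. The position it reports can then be looked up in the resource file.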

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)
    My guess is that there is a formatting error in the dbsnp file. Can you find out which variant the tool is choking on?
  • alegia (Italy; Member)

    Hi @Geraldine_VdAuwera ,
    I think I found the problematic variant. It is present in the dbsnp file that I used in the pipeline, but not in the 1000G resource file that I used in the command line above.
    Does it have anything to do with my issue?
    Below is the line for the variant in my dbsnp file:

    CHROM POS ID REF ALT QUAL FILTER INFO

    chr1 948846 rs5772027 T TA . . ASP;GENEINFO=ISG15:9636;GNO;OTHERKG;REF;RSPOS=948847;SAO=0;SLO;SSR=0;U5;VC=DIV;VP=050100400205000102000200;WGT=1;dbSNPBuildID=114

    Maybe now you have some more hints on what the problem could be?

    All the resource and support files that I use (including the FASTA) were provided by our external sequencing service. Sadly, at the moment I cannot use the ones publicly available on your platform, since their contigs differ slightly from mine and I would need to restart the whole pipeline from scratch to use your resource files (which is not feasible right now because of the storage space limitations I have for BAM files).

    Best,
    Alessandro

  • Sheila (Broad Institute; Member, Broadie, Moderator admin)

    @alegia
    Hi Alessandro,

    I am sorry if I am missing something obvious here, but why is the dbSNP file important when the error is thrown using the 1000G file?

    Thanks,
    Sheila

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Sorry folks, I must have been using "dbsnp" as short-hand to refer to the resource file.

    Try validating the resource file and see if there's any obvious formatting error that pops out.

  • alegia (Italy; Member)

    Hi @Geraldine_VdAuwera,
    I did, but nothing obvious comes up.

    INFO 21:16:34,719 HelpFormatter - --------------------------------------------------------------------------------
    INFO 21:16:34,721 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.5-0-g36282e4, Compiled 2015/11/25 04:03:56
    INFO 21:16:34,721 HelpFormatter - Copyright (c) 2010 The Broad Institute
    INFO 21:16:34,721 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
    INFO 21:16:34,723 HelpFormatter - Program Args: -T ValidateVariants -R /data/shared/noPAR.hg19_decoy.fa -V ALLgoodSamples/recalibratedSNPsINDELs.vcf --warnOnErrors --reference_window_stop 500
    INFO 21:16:34,728 HelpFormatter - Executing as [email protected] on Linux 3.10.0-123.20.1.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_45-b13.
    INFO 21:16:34,728 HelpFormatter - Date/Time: 2017/01/09 21:16:34
    INFO 21:16:34,728 HelpFormatter - --------------------------------------------------------------------------------
    INFO 21:16:34,728 HelpFormatter - --------------------------------------------------------------------------------
    INFO 21:16:34,769 GenomeAnalysisEngine - Strictness is SILENT
    INFO 21:16:34,848 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
    INFO 21:16:35,034 GenomeAnalysisEngine - Preparing for traversal
    INFO 21:16:35,038 GenomeAnalysisEngine - Done preparing for traversal
    INFO 21:16:35,039 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
    INFO 21:16:35,041 ProgressMeter - | processed | time | per 1M | | total | remaining
    INFO 21:16:35,041 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime
    INFO 21:17:05,047 ProgressMeter - chr5:135700901 194134.0 30.0 s 2.6 m 32.5% 92.0 s 62.0 s
    INFO 21:17:35,049 ProgressMeter - chr12:40791665 407500.0 60.0 s 2.5 m 63.5% 94.0 s 34.0 s
    INFO 21:18:05,051 ProgressMeter - chr20:32265229 633832.0 90.0 s 2.4 m 87.8% 102.0 s 12.0 s
    Successfully validated the input file. Checked 539341 records with no failures.
    INFO 21:18:12,768 ProgressMeter - done 689068.0 97.0 s 2.4 m 97.7% 99.0 s 2.0 s
    INFO 21:18:12,769 ProgressMeter - Total runtime 97.73 secs, 1.63 min, 0.03 hours
    INFO 21:18:13,892 GATKRunReport - Uploaded run statistics report to AWS S3

    Any suggestion on what else it could be?

    Best,
    Alessandro

    Issue · Github (by Sheila): Issue Number 1617, State: closed, Closed By: vdauwera
  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Hmm. Can you please choose a site that you think should be annotated, and post both the VCF record for that site from your callset, and the one from the resource file?

  • SpaceCade7 (Space, the final frontier; Member)
    edited March 2017

    Hi. I'm getting the same cryptic error. I am using the ExAC resource that the GATK bundle provides to annotate my VCF with VariantAnnotator.

    This runs successfully:

    java -jar $GATK \
       -R human_g1k_v37.fasta \
       -T VariantAnnotator \
       -V in.vcf.gz \
       -o out_exacAF.vcf.gz \
       --resource:exac ExAC.r0.3.nonTCGA.sites.vep.b37.vcf.gz \
       -E exac.AF \
       --resourceAlleleConcordance
    

    This gives an error (using the exact same input and resource files):

    java -jar $GATK \
       -R human_g1k_v37.fasta \
       -T VariantAnnotator \
       -V in.vcf.gz \
       -o out_exac5.vcf.gz \
       --resource:exac ExAC.r0.3.nonTCGA.sites.vep.b37.vcf.gz \
       -E exac.AC_EAS \
       -E exac.AN_EAS \
       -E exac.Hemi_EAS \
       -E exac.Het_EAS \
       -E exac.Hom_EAS \
       --resourceAlleleConcordance
    

    And the error is:

            INFO  15:35:41,026 ProgressMeter -      7:61999901   1.294657027E9     7.0 m       0.0 s       41.8%    16.8 m       9.8 m 
            ##### ERROR --
            ##### ERROR stack trace 
            java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
                at java.util.ArrayList.rangeCheck(ArrayList.java:653)
                at java.util.ArrayList.get(ArrayList.java:429)
                at org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotatorEngine.annotateExpressions(VariantAnnotatorEngine.java:471)
                at org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotatorEngine.annotateContext(VariantAnnotatorEngine.java:226)
                at org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotatorEngine.annotateContext(VariantAnnotatorEngine.java:212)
                at org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotator.map(VariantAnnotator.java:355)
                at org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotator.map(VariantAnnotator.java:112)
                at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
                at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
                at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
                at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
                at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
                at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
                at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
                at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:98)
                at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:316)
                at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:123)
                at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
                at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
                at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)
            ##### ERROR ------------------------------------------------------------------------------------------
            ##### ERROR A GATK RUNTIME ERROR has occurred (version 3.7-0-gcfedb67):
            ##### ERROR
            ##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
            ##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
            ##### ERROR Visit our website and forum for extensive documentation and answers to 
            ##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
            ##### ERROR
            ##### ERROR MESSAGE: Index: 1, Size: 1
            ##### ERROR ------------------------------------------------------------------------------------------
    

    I'm running GATK v3.7 and Java 8.

    INFO  15:28:40,506 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.7-0-gcfedb67, Compiled 2016/12/12 11:21:18 
    

    Any idea what may be the cause of the error? Thanks.

  • Sheila (Broad Institute; Member, Broadie, Moderator admin)

    @SpaceCade7
    Hi,

    Have you tried validating the input VCFs with ValidateVariants?

    Thanks,
    Sheila
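
The message "Index: 1, Size: 1" means the code asked for the second element of a one-element list. One hypothesis consistent with both reports, though not confirmed anywhere in this thread, is that with --resourceAlleleConcordance the requested INFO field is indexed by ALT allele position. If so, a multi-allelic resource record whose field carries a single value (for example a per-site count rather than a per-allele one) would trigger exactly this error. The Python sketch below only tests that assumption: it lists multi-allelic records in a resource VCF where a requested INFO key has a different number of comma-separated values than there are ALT alleles. The function names are hypothetical, and the INFO keys are whatever was passed to -E without the resource prefix.

```python
from typing import Dict, Iterable, Iterator, Sequence, Tuple

Mismatch = Tuple[str, int, str, int, int]  # CHROM, POS, key, n_alt, n_values


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO column into a key -> raw value mapping.
    Flag entries (no '=') map to an empty string."""
    parsed: Dict[str, str] = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, _, value = item.partition("=")
        parsed[key] = value
    return parsed


def value_count_mismatches(
    lines: Iterable[str], info_keys: Sequence[str]
) -> Iterator[Mismatch]:
    """Yield multi-allelic records where a requested INFO key does not
    hold one value per ALT allele."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # header or blank line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # malformed or truncated record
        n_alt = len(fields[4].split(","))
        if n_alt < 2:
            continue  # a single ALT allele cannot be indexed past position 0
        info = parse_info(fields[7])
        for key in info_keys:
            if key in info:
                n_values = len(info[key].split(","))
                if n_values != n_alt:
                    yield fields[0], int(fields[1]), key, n_alt, n_values
```

Running this over the resource file with the keys from the failing command (opening a `.vcf.gz` with `gzip.open(path, "rt")`) and comparing the reported positions with the last location in the ProgressMeter log would show whether the crash coincides with such a record. An empty result would argue against this explanation.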
