
VariantAnnotator (GATK 3.2) error "java.lang.reflect.InvocationTargetException" on Ubuntu 12 with Java 1.7

Hi, when I try to use VariantAnnotator (GATK 3.2) on larger SnpEff VCF files (about 40-50 MB), I always get the same error: a stack trace followed by "java.lang.reflect.InvocationTargetException".

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at htsjdk.tribble.index.IndexFactory.loadIndex(IndexFactory.java:189)
at org.broadinstitute.gatk.engine.refdata.tracks.RMDTrackBuilder.loadFromDisk(RMDTrackBuilder.java:336)
at org.broadinstitute.gatk.engine.refdata.tracks.RMDTrackBuilder.attemptToLockAndLoadIndexFromDisk(RMDTrackBuilder.java:320)
at org.broadinstitute.gatk.engine.refdata.tracks.RMDTrackBuilder.loadIndex(RMDTrackBuilder.java:279)
at org.broadinstitute.gatk.engine.refdata.tracks.RMDTrackBuilder.getFeatureSource(RMDTrackBuilder.java:225)
at org.broadinstitute.gatk.engine.refdata.tracks.RMDTrackBuilder.createInstanceOfTrack(RMDTrackBuilder.java:148)
at org.broadinstitute.gatk.engine.datasources.rmd.ReferenceOrderedQueryDataPool.<init>(ReferenceOrderedDataSource.java:208)
at org.broadinstitute.gatk.engine.datasources.rmd.ReferenceOrderedDataSource.<init>(ReferenceOrderedDataSource.java:88)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.getReferenceOrderedDataSources(GenomeAnalysisEngine.java:990)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.initializeDataSources(GenomeAnalysisEngine.java:772)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:285)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:107)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at htsjdk.tribble.index.IndexFactory.loadIndex(IndexFactory.java:185)
... 14 more
Caused by: java.io.EOFException
at htsjdk.tribble.util.LittleEndianInputStream.readFully(LittleEndianInputStream.java:138)
at htsjdk.tribble.util.LittleEndianInputStream.readLong(LittleEndianInputStream.java:80)
at htsjdk.tribble.index.linear.LinearIndex$ChrIndex.read(LinearIndex.java:271)
at htsjdk.tribble.index.AbstractIndex.read(AbstractIndex.java:363)
at htsjdk.tribble.index.linear.LinearIndex.<init>(LinearIndex.java:101)
... 19 more

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.2-2-gec30cee):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: java.lang.reflect.InvocationTargetException
ERROR ------------------------------------------------------------------------------------------

The original command was:

$java -Xmx$jmem -jar $gatk_dir/GenomeAnalysisTK.jar -T VariantAnnotator -l DEBUG \
    -R $genome \
    -A SnpEff \
    --variant $resultsDir/$hq_vcf \
    --snpEffFile $resultsDir/${hq_vcf/.vcf/.snpEff.vcf} \
    -L $resultsDir/${hq_vcf/.vcf/.snpEff.vcf} \
    -o $resultsDir/${hq_vcf/.vcf/.variantCalls.snpEff.va.vcf} \
    -rf BadCigar

I had already added the -rf BadCigar argument, with no effect. I also regenerated the indexes for the snpEff VCF files used as input with igvtools, again with no effect. $jmem was set to 11G.
I can open all the VCF files in a text editor, so there seems to be no file corruption. Is there a way to combine reads other than with bedtools?

Any help is very much appreciated,

Philipp
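The innermost cause in the trace is a java.io.EOFException thrown by LittleEndianInputStream.readFully while LinearIndex is being read. That suggests the index file ended before the reader expected, rather than that the VCF itself is damaged; opening the VCF in a text editor does not test this, because the index is a separate binary file. Below is a minimal bash sketch for spotting obviously bad indexes, assuming uncompressed VCFs whose index sits next to them as <file>.vcf.idx. check_vcf_index is a hypothetical helper, not part of GATK, and an "ok" result does not prove the index is intact.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check_vcf_index <file.vcf>
# Reports the state of the Tribble index (<file.vcf>.idx) next to a
# plain-text VCF. Assumes an uncompressed VCF; bgzipped VCFs use a
# .tbi index instead and are not handled here.
check_vcf_index() {
    local vcf="$1"
    local idx="${vcf}.idx"
    if [[ ! -e "$idx" ]]; then
        echo "missing"      # no index file at all
    elif [[ ! -s "$idx" ]]; then
        echo "empty"        # zero-byte index: certainly unusable
    elif [[ "$idx" -ot "$vcf" ]]; then
        echo "stale"        # index is older than the VCF it describes
    else
        echo "ok"           # passes these checks, but may still be truncated
    fi
}
```

For example: for f in "$resultsDir"/*.snpEff.vcf; do echo "$f: $(check_vcf_index "$f")"; done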

Answers

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @philipp_henrich

    Hi Philipp,

    This is probably caused by the index file of the snpEff VCF (the .idx file next to it). You can delete it, and GATK will regenerate it on the next run.

    -Sheila
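Deleting the index can be scripted. The following is a minimal bash sketch, assuming plain-text (uncompressed) VCFs whose Tribble index sits next to them as <file>.vcf.idx, which is the kind of linear index the stack trace shows GATK trying to read. remove_vcf_index is a hypothetical helper, not part of GATK; it deletes only the .idx files and leaves the VCFs untouched, so that GATK can rebuild the indexes on the next run.

```bash
#!/usr/bin/env bash
# Hypothetical helper: remove_vcf_index <file.vcf> [<file.vcf> ...]
# Deletes <file.vcf>.idx for each argument so GATK 3.x can rebuild it.
# Returns non-zero if any named VCF does not exist.
remove_vcf_index() {
    local status=0 vcf
    for vcf in "$@"; do
        if [[ ! -f "$vcf" ]]; then
            echo "VCF not found: $vcf" >&2
            status=1
            continue
        fi
        # rm -f: succeed silently when there is no index to remove
        rm -f -- "${vcf}.idx"
    done
    return "$status"
}
```

For example, before rerunning VariantAnnotator: remove_vcf_index "$resultsDir/$hq_vcf" "$resultsDir/${hq_vcf/.vcf/.snpEff.vcf}"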

  • philipp_henrich (Member)
    edited February 2015

    Thank you, Sheila, I will try that. I had been working on this issue for a while, and it turned out that the problem lay in generating the raw VCF files with UnifiedGenotyper from BAM files with ~400x sample depth (base coverage) using the -nct option. As a result, I assume the indexes for the downstream files (such as the snpEff output) were not properly written or finished because of incomplete/malformed BAM files. Using -nt with an appropriate memory allocation helped with the large files.

    Also, it appears that on Linux, when a job uses more than 16-20 GB of RAM (per thread), the Java memory (-Xmx) is not properly freed after the command finishes (so far observed with UnifiedGenotyper and PrintReads); the RAM had to be reclaimed manually by running "sync" as root.

    That was a hard one to solve, but I was finally able to finish my analysis. I am documenting my changes and could write them up in a new thread for others, if that is of interest.
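According to the GATK 3 multi-threading documentation, each -nt data thread needs its own full memory allocation, while -nct CPU threads share the memory of their parent data thread, so the Java heap for an -nt run should grow with the number of data threads. Below is a minimal bash sketch of that arithmetic, assuming a per-thread figure chosen by the user. build_ug_cmd is a hypothetical helper; it only prints a GATK 3.x UnifiedGenotyper command built from the real -T, -R, -I, -o and -nt arguments and does not run anything.

```bash
#!/usr/bin/env bash
# Hypothetical helper:
#   build_ug_cmd <jar> <ref.fa> <in.bam> <out.vcf> <nt> <gb_per_thread>
# Prints one argument per line: a UnifiedGenotyper command that uses
# data threads (-nt) and sizes -Xmx as nt * gb_per_thread.
build_ug_cmd() {
    local jar="$1" ref="$2" bam="$3" out="$4" nt="$5" gb="$6"
    local heap_gb=$(( nt * gb ))    # each -nt data thread gets its own share
    local cmd=(java "-Xmx${heap_gb}g" -jar "$jar"
        -T UnifiedGenotyper
        -R "$ref" -I "$bam" -o "$out"
        -nt "$nt")
    printf '%s\n' "${cmd[@]}"
}
```

For example: mapfile -t cmd < <(build_ug_cmd GenomeAnalysisTK.jar ref.fa sample.bam raw.vcf 4 4) && "${cmd[@]}". Printing the arguments instead of executing them lets you inspect the command (here with a 16 GB heap) before launching a long run.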

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @philipp_henrich

    Hi Philipp,

    Thank you for getting back to us. A document with your changes/findings would be very helpful. We really appreciate any input from users!

    -Sheila
