GenotypeGVCFs multiple gvcf bug

scottyler89, University of Iowa (Member)

Hi all - I've tried using GenotypeGVCFs in both 3.7 and 4.0, but I get a different error in each. For 3.7 (nightly-2017-05-17-g44b6fa2), I get:

java -Xmx124g -jar ~/bin/gatk/GenomeAnalysisTK.jar -T GenotypeGVCFs -R /home/elab/references/mus_musculus/Mus_musculus.GRCm38.dna_sm.fa -nt 19 --max_alternate_alleles 4 --variant Sample_147003/Sample_147003.g.vcf --variant Sample_148112/Sample_148112.g.vcf --variant Sample_151203/Sample_151203.g.vcf --variant Sample_206082/Sample_206082.g.vcf --variant Sample_206083/Sample_206083.g.vcf --variant Sample_212034/Sample_212034.g.vcf --variant Sample_213965/Sample_213965.g.vcf --variant Sample_214051/Sample_214051.g.vcf --variant sample_14831_3/sample_14831_3.g.vcf --variant sample_14851_3/sample_14851_3.g.vcf --variant sample_15183_5/sample_15183_5.g.vcf --variant sample_15639_2/sample_15639_2.g.vcf --variant sample_15640_3/sample_15640_3.g.vcf --variant sample_15675_4/sample_15675_4.g.vcf --variant sample_15708_2/sample_15708_2.g.vcf --variant sample_15714_3/sample_15714_3.g.vcf --variant sample_15744_2/sample_15744_2.g.vcf --variant sample_15746_3/sample_15746_3.g.vcf --variant sample_15766_3/sample_15766_3.g.vcf --variant sample_15791_4/sample_15791_4.g.vcf --variant sample_15811_3/sample_15811_3.g.vcf --variant sample_15822_3/sample_15822_3.g.vcf --variant sample_15836_2/sample_15836_2.g.vcf --variant sample_15836_3/sample_15836_3.g.vcf --variant sample_15858_5/sample_15858_5.g.vcf --variant sample_15871_3/sample_15871_3.g.vcf --variant sample_15875_2/sample_15875_2.g.vcf --variant sample_15876_3/sample_15876_3.g.vcf --variant sample_15876_5/sample_15876_5.g.vcf --variant sample_15914_2/sample_15914_2.g.vcf --variant sample_15915_3/sample_15915_3.g.vcf --variant sample_20544_1/sample_20544_1.g.vcf --variant sample_20549_5/sample_20549_5.g.vcf --variant sample_20551_1/sample_20551_1.g.vcf --variant sample_20551_3/sample_20551_3.g.vcf --variant sample_20605_2/sample_20605_2.g.vcf --variant sample_20605_4/sample_20605_4.g.vcf --variant sample_20605_5/sample_20605_5.g.vcf --variant sample_20836_5/sample_20836_5.g.vcf --variant sample_20952_2/sample_20952_2.g.vcf -o 
/media/elab/Seagate_Expansion_Drive_1/ms_exomes/final/all.combined.vcf
INFO  09:32:54,680 HelpFormatter - --------------------------------------------------------------------------------------------- 
INFO  09:32:54,683 HelpFormatter - The Genome Analysis Toolkit (GATK) vnightly-2017-05-17-g44b6fa2, Compiled 2017/05/17 00:01:17 
INFO  09:32:54,683 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute 
INFO  09:32:54,683 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk 
INFO  09:32:54,683 HelpFormatter - [Thu Jul 27 09:32:54 CDT 2017] Executing on Linux 4.4.0-87-generic amd64 
INFO  09:32:54,684 HelpFormatter - OpenJDK 64-Bit Server VM 1.8.0_131-8u131-b11-2ubuntu1.16.04.2-b11 
INFO  09:32:54,686 HelpFormatter - Program Args: -T GenotypeGVCFs -R /home/elab/references/mus_musculus/Mus_musculus.GRCm38.dna_sm.fa -nt 19 --max_alternate_alleles 4 --variant Sample_147003/Sample_147003.g.vcf --variant Sample_148112/Sample_148112.g.vcf --variant Sample_151203/Sample_151203.g.vcf --variant Sample_206082/Sample_206082.g.vcf --variant Sample_206083/Sample_206083.g.vcf --variant Sample_212034/Sample_212034.g.vcf --variant Sample_213965/Sample_213965.g.vcf --variant Sample_214051/Sample_214051.g.vcf --variant sample_14831_3/sample_14831_3.g.vcf --variant sample_14851_3/sample_14851_3.g.vcf --variant sample_15183_5/sample_15183_5.g.vcf --variant sample_15639_2/sample_15639_2.g.vcf --variant sample_15640_3/sample_15640_3.g.vcf --variant sample_15675_4/sample_15675_4.g.vcf --variant sample_15708_2/sample_15708_2.g.vcf --variant sample_15714_3/sample_15714_3.g.vcf --variant sample_15744_2/sample_15744_2.g.vcf --variant sample_15746_3/sample_15746_3.g.vcf --variant sample_15766_3/sample_15766_3.g.vcf --variant sample_15791_4/sample_15791_4.g.vcf --variant sample_15811_3/sample_15811_3.g.vcf --variant sample_15822_3/sample_15822_3.g.vcf --variant sample_15836_2/sample_15836_2.g.vcf --variant sample_15836_3/sample_15836_3.g.vcf --variant sample_15858_5/sample_15858_5.g.vcf --variant sample_15871_3/sample_15871_3.g.vcf --variant sample_15875_2/sample_15875_2.g.vcf --variant sample_15876_3/sample_15876_3.g.vcf --variant sample_15876_5/sample_15876_5.g.vcf --variant sample_15914_2/sample_15914_2.g.vcf --variant sample_15915_3/sample_15915_3.g.vcf --variant sample_20544_1/sample_20544_1.g.vcf --variant sample_20549_5/sample_20549_5.g.vcf --variant sample_20551_1/sample_20551_1.g.vcf --variant sample_20551_3/sample_20551_3.g.vcf --variant sample_20605_2/sample_20605_2.g.vcf --variant sample_20605_4/sample_20605_4.g.vcf --variant sample_20605_5/sample_20605_5.g.vcf --variant sample_20836_5/sample_20836_5.g.vcf --variant sample_20952_2/sample_20952_2.g.vcf -o 
/media/elab/Seagate_Expansion_Drive_1/ms_exomes/final/all.combined.vcf 
INFO  09:32:54,689 HelpFormatter - Executing as elab@elab on Linux 4.4.0-87-generic amd64; OpenJDK 64-Bit Server VM 1.8.0_131-8u131-b11-2ubuntu1.16.04.2-b11. 
INFO  09:32:54,689 HelpFormatter - Date/Time: 2017/07/27 09:32:54 
INFO  09:32:54,690 HelpFormatter - --------------------------------------------------------------------------------------------- 
INFO  09:32:54,690 HelpFormatter - --------------------------------------------------------------------------------------------- 
ERROR StatusLogger Unable to create class org.apache.logging.log4j.core.impl.Log4jContextFactory specified in jar:file:/home/elab/bin/gatk/GenomeAnalysisTK.jar!/META-INF/log4j-provider.properties
ERROR StatusLogger Log4j2 could not find a logging implementation. Please add log4j-core to the classpath. Using SimpleLogger to log to the console...
INFO  09:32:54,857 GenomeAnalysisEngine - Deflater: IntelDeflater 
INFO  09:32:54,857 GenomeAnalysisEngine - Inflater: IntelInflater 
INFO  09:32:54,858 GenomeAnalysisEngine - Strictness is SILENT 
INFO  09:32:54,974 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000 
##### ERROR --
##### ERROR stack trace 
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
    at htsjdk.tribble.index.IndexFactory.loadIndex(IndexFactory.java:187)
    at htsjdk.tribble.index.IndexFactory.loadIndex(IndexFactory.java:165)
    at org.broadinstitute.gatk.utils.refdata.tracks.RMDTrackBuilder.loadFromDisk(RMDTrackBuilder.java:375)
    at org.broadinstitute.gatk.utils.refdata.tracks.RMDTrackBuilder.attemptToLockAndLoadIndexFromDisk(RMDTrackBuilder.java:359)
    at org.broadinstitute.gatk.utils.refdata.tracks.RMDTrackBuilder.loadIndex(RMDTrackBuilder.java:319)
    at org.broadinstitute.gatk.utils.refdata.tracks.RMDTrackBuilder.getFeatureSource(RMDTrackBuilder.java:264)
    at org.broadinstitute.gatk.utils.refdata.tracks.RMDTrackBuilder.createInstanceOfTrack(RMDTrackBuilder.java:153)
    at org.broadinstitute.gatk.engine.datasources.rmd.ReferenceOrderedQueryDataPool.<init>(ReferenceOrderedDataSource.java:208)
    at org.broadinstitute.gatk.engine.datasources.rmd.ReferenceOrderedDataSource.<init>(ReferenceOrderedDataSource.java:88)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.getReferenceOrderedDataSources(GenomeAnalysisEngine.java:1074)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.initializeDataSources(GenomeAnalysisEngine.java:851)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:294)
    at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:123)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
    at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at htsjdk.tribble.index.IndexFactory.loadIndex(IndexFactory.java:181)
    ... 15 more
Caused by: java.io.EOFException
    at htsjdk.tribble.util.LittleEndianInputStream.readFully(LittleEndianInputStream.java:138)
    at htsjdk.tribble.util.LittleEndianInputStream.readLong(LittleEndianInputStream.java:80)
    at htsjdk.tribble.index.interval.IntervalTreeIndex$ChrIndex.read(IntervalTreeIndex.java:203)
    at htsjdk.tribble.index.AbstractIndex.read(AbstractIndex.java:367)
    at htsjdk.tribble.index.interval.IntervalTreeIndex.<init>(IntervalTreeIndex.java:52)
    ... 20 more
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version nightly-2017-05-17-g44b6fa2):
##### ERROR
##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: java.lang.reflect.InvocationTargetException
##### ERROR ------------------------------------------------------------------------------------------

Notably, however, if I feed in only a smaller subset of the GVCFs, I don't get this error.

In 4.0, on the other hand, it doesn't look like GenotypeGVCFs can handle multiple GVCF inputs:

***********************************************************************

A USER ERROR has occurred: Argument '[V, variant]' cannot be specified more than once.

***********************************************************************

Is this just changed syntax between 3.7 and 4.0, or can 4.0 genuinely not perform joint genotyping on multiple gvcfs?

Thanks,
Scott

Best Answer

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)
    Accepted Answer

    Ah, the log4j thing is probably a red herring -- it's a minor logging issue that shouldn't cause the run to fail; it just pollutes the log output (I think I saw some chatter about getting that fixed).

    Possibly more relevant: the program is failing in a function that is supposed to read an index file (htsjdk's IndexFactory.loadIndex in the stack trace), and one of the errors along the way is java.io.EOFException, which translates to "end of file" -- this is generally seen when a file is corrupted or incomplete. I would recommend checking the index files for all your GVCFs. Maybe try running on just one that you know is valid (e.g. because you can run a different tool on it, like ValidateVariants) and see if that works. If so, it's a matter of checking all your files for a rotten index and regenerating the bad file(s).

    I could be wrong but that seems the most likely problem/solution based on these errors.
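A minimal sketch of a first-pass index check in Python, in case it helps narrow things down. It assumes each index sits next to its GVCF as `<file>.idx` (the usual naming for uncompressed VCFs); the function name is made up for illustration. It only catches the obvious cases (index missing, empty, or older than its GVCF) and cannot prove an index is intact, so a truncated index with a fresh timestamp would still pass.

```python
import os


def find_suspect_indexes(gvcf_paths):
    """Return (gvcf_path, reason) pairs for GVCFs whose .idx file looks suspicious."""
    suspects = []
    for vcf in gvcf_paths:
        idx = vcf + ".idx"  # assumed index location: right next to the GVCF
        if not os.path.exists(idx):
            suspects.append((vcf, "index missing"))
        elif os.path.getsize(idx) == 0:
            suspects.append((vcf, "index empty"))
        elif os.path.getmtime(idx) < os.path.getmtime(vcf):
            # The GVCF was modified after the index was written.
            suspects.append((vcf, "index older than VCF"))
    return suspects
```

For example, `find_suspect_indexes(glob.glob("*/*.g.vcf"))` run from the directory holding the sample folders would list the files worth looking at first.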

Answers

  • EADG, Kiel (Member)

    Hi @scottyler89,

    Maybe I can help with the GATK4 issue. If I remember right from the GATK workshop (@Geraldine_VdAuwera), they changed the approach to GenotypeGVCFs in GATK4. So you first use MergeVcfs (a Picard tool) to merge your GVCFs and then run GenotypeGVCFs on the resulting GVCF.

    An example of how to run MergeVcfs:

    java -Xmx2g -jar /usr/gitc/picard.jar \
    MergeVcfs \
    INPUT=${sep=' INPUT=' input_vcfs} \
    OUTPUT=${output_vcf_name}
    

    Shamelessly stolen from the official pipeline ;)

    Maybe the tool description should be altered. It currently reads:
    "Perform joint genotyping on one or more samples pre-called with HaplotypeCaller"
    I think that is a leftover from version 3.7.

    Hope this helps...

    Greetings EADG

  • scottyler89, University of Iowa (Member)

    Thanks for the advice. Unfortunately, I ran into errors trying that as well. It looks like CombineGVCFs has been deprecated in GATK4? I tried using it in 3.7, and got the following error:

    java -Xmx124g -jar ~/bin/gatk/GenomeAnalysisTK.jar -T CombineGVCFs -R /home/elab/references/mus_musculus/Mus_musculus.GRCm38.dna_sm.fa --variant Sample_147003/Sample_147003.g.vcf --variant Sample_148112/Sample_148112.g.vcf --variant Sample_151203/Sample_151203.g.vcf --variant Sample_206082/Sample_206082.g.vcf --variant Sample_206083/Sample_206083.g.vcf --variant Sample_212034/Sample_212034.g.vcf --variant Sample_213965/Sample_213965.g.vcf --variant Sample_214051/Sample_214051.g.vcf --variant sample_14831_3/sample_14831_3.g.vcf --variant sample_14851_3/sample_14851_3.g.vcf --variant sample_15183_5/sample_15183_5.g.vcf --variant sample_15639_2/sample_15639_2.g.vcf --variant sample_15640_3/sample_15640_3.g.vcf --variant sample_15675_4/sample_15675_4.g.vcf --variant sample_15708_2/sample_15708_2.g.vcf --variant sample_15714_3/sample_15714_3.g.vcf --variant sample_15744_2/sample_15744_2.g.vcf --variant sample_15746_3/sample_15746_3.g.vcf --variant sample_15766_3/sample_15766_3.g.vcf --variant sample_15791_4/sample_15791_4.g.vcf --variant sample_15811_3/sample_15811_3.g.vcf --variant sample_15822_3/sample_15822_3.g.vcf --variant sample_15836_2/sample_15836_2.g.vcf --variant sample_15836_3/sample_15836_3.g.vcf --variant sample_15858_5/sample_15858_5.g.vcf --variant sample_15871_3/sample_15871_3.g.vcf --variant sample_15875_2/sample_15875_2.g.vcf --variant sample_15876_3/sample_15876_3.g.vcf --variant sample_15876_5/sample_15876_5.g.vcf --variant sample_15914_2/sample_15914_2.g.vcf --variant sample_15915_3/sample_15915_3.g.vcf --variant sample_20544_1/sample_20544_1.g.vcf --variant sample_20549_5/sample_20549_5.g.vcf --variant sample_20551_1/sample_20551_1.g.vcf --variant sample_20551_3/sample_20551_3.g.vcf --variant sample_20605_2/sample_20605_2.g.vcf --variant sample_20605_4/sample_20605_4.g.vcf --variant sample_20605_5/sample_20605_5.g.vcf --variant sample_20836_5/sample_20836_5.g.vcf --variant sample_20952_2/sample_20952_2.g.vcf -o 
/media/elab/Seagate_Expansion_Drive_1/ms_exomes/final/all.combined.raw.g.vcf
    INFO  08:40:02,922 HelpFormatter - --------------------------------------------------------------------------------------------- 
    INFO  08:40:02,924 HelpFormatter - The Genome Analysis Toolkit (GATK) vnightly-2017-05-17-g44b6fa2, Compiled 2017/05/17 00:01:17 
    INFO  08:40:02,924 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute 
    INFO  08:40:02,924 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk 
    INFO  08:40:02,925 HelpFormatter - [Fri Jul 28 08:40:02 CDT 2017] Executing on Linux 4.4.0-87-generic amd64 
    INFO  08:40:02,925 HelpFormatter - OpenJDK 64-Bit Server VM 1.8.0_131-8u131-b11-2ubuntu1.16.04.2-b11 
    INFO  08:40:02,928 HelpFormatter - Program Args: -T CombineGVCFs -R /home/elab/references/mus_musculus/Mus_musculus.GRCm38.dna_sm.fa --variant Sample_147003/Sample_147003.g.vcf --variant Sample_148112/Sample_148112.g.vcf --variant Sample_151203/Sample_151203.g.vcf --variant Sample_206082/Sample_206082.g.vcf --variant Sample_206083/Sample_206083.g.vcf --variant Sample_212034/Sample_212034.g.vcf --variant Sample_213965/Sample_213965.g.vcf --variant Sample_214051/Sample_214051.g.vcf --variant sample_14831_3/sample_14831_3.g.vcf --variant sample_14851_3/sample_14851_3.g.vcf --variant sample_15183_5/sample_15183_5.g.vcf --variant sample_15639_2/sample_15639_2.g.vcf --variant sample_15640_3/sample_15640_3.g.vcf --variant sample_15675_4/sample_15675_4.g.vcf --variant sample_15708_2/sample_15708_2.g.vcf --variant sample_15714_3/sample_15714_3.g.vcf --variant sample_15744_2/sample_15744_2.g.vcf --variant sample_15746_3/sample_15746_3.g.vcf --variant sample_15766_3/sample_15766_3.g.vcf --variant sample_15791_4/sample_15791_4.g.vcf --variant sample_15811_3/sample_15811_3.g.vcf --variant sample_15822_3/sample_15822_3.g.vcf --variant sample_15836_2/sample_15836_2.g.vcf --variant sample_15836_3/sample_15836_3.g.vcf --variant sample_15858_5/sample_15858_5.g.vcf --variant sample_15871_3/sample_15871_3.g.vcf --variant sample_15875_2/sample_15875_2.g.vcf --variant sample_15876_3/sample_15876_3.g.vcf --variant sample_15876_5/sample_15876_5.g.vcf --variant sample_15914_2/sample_15914_2.g.vcf --variant sample_15915_3/sample_15915_3.g.vcf --variant sample_20544_1/sample_20544_1.g.vcf --variant sample_20549_5/sample_20549_5.g.vcf --variant sample_20551_1/sample_20551_1.g.vcf --variant sample_20551_3/sample_20551_3.g.vcf --variant sample_20605_2/sample_20605_2.g.vcf --variant sample_20605_4/sample_20605_4.g.vcf --variant sample_20605_5/sample_20605_5.g.vcf --variant sample_20836_5/sample_20836_5.g.vcf --variant sample_20952_2/sample_20952_2.g.vcf -o 
/media/elab/Seagate_Expansion_Drive_1/ms_exomes/final/all.combined.raw.g.vcf 
    INFO  08:40:02,931 HelpFormatter - Executing as elab@elab on Linux 4.4.0-87-generic amd64; OpenJDK 64-Bit Server VM 1.8.0_131-8u131-b11-2ubuntu1.16.04.2-b11. 
    INFO  08:40:02,931 HelpFormatter - Date/Time: 2017/07/28 08:40:02 
    INFO  08:40:02,931 HelpFormatter - --------------------------------------------------------------------------------------------- 
    INFO  08:40:02,931 HelpFormatter - --------------------------------------------------------------------------------------------- 
    ERROR StatusLogger Unable to create class org.apache.logging.log4j.core.impl.Log4jContextFactory specified in jar:file:/home/elab/bin/gatk/GenomeAnalysisTK.jar!/META-INF/log4j-provider.properties
    ERROR StatusLogger Log4j2 could not find a logging implementation. Please add log4j-core to the classpath. Using SimpleLogger to log to the console...
    INFO  08:40:03,512 GenomeAnalysisEngine - Deflater: IntelDeflater 
    INFO  08:40:03,512 GenomeAnalysisEngine - Inflater: IntelInflater 
    INFO  08:40:03,512 GenomeAnalysisEngine - Strictness is SILENT 
    INFO  08:40:04,237 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000 
    ##### ERROR --
    ##### ERROR stack trace 
    java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
        at htsjdk.tribble.index.IndexFactory.loadIndex(IndexFactory.java:187)
        at htsjdk.tribble.index.IndexFactory.loadIndex(IndexFactory.java:165)
        at org.broadinstitute.gatk.utils.refdata.tracks.RMDTrackBuilder.loadFromDisk(RMDTrackBuilder.java:375)
        at org.broadinstitute.gatk.utils.refdata.tracks.RMDTrackBuilder.attemptToLockAndLoadIndexFromDisk(RMDTrackBuilder.java:359)
        at org.broadinstitute.gatk.utils.refdata.tracks.RMDTrackBuilder.loadIndex(RMDTrackBuilder.java:319)
        at org.broadinstitute.gatk.utils.refdata.tracks.RMDTrackBuilder.getFeatureSource(RMDTrackBuilder.java:264)
        at org.broadinstitute.gatk.utils.refdata.tracks.RMDTrackBuilder.createInstanceOfTrack(RMDTrackBuilder.java:153)
        at org.broadinstitute.gatk.engine.datasources.rmd.ReferenceOrderedQueryDataPool.<init>(ReferenceOrderedDataSource.java:208)
        at org.broadinstitute.gatk.engine.datasources.rmd.ReferenceOrderedDataSource.<init>(ReferenceOrderedDataSource.java:88)
        at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.getReferenceOrderedDataSources(GenomeAnalysisEngine.java:1074)
        at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.initializeDataSources(GenomeAnalysisEngine.java:851)
        at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:294)
        at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:123)
        at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
        at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
        at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)
    Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at htsjdk.tribble.index.IndexFactory.loadIndex(IndexFactory.java:181)
        ... 15 more
    Caused by: java.io.EOFException
        at htsjdk.tribble.util.LittleEndianInputStream.readFully(LittleEndianInputStream.java:138)
        at htsjdk.tribble.util.LittleEndianInputStream.readLong(LittleEndianInputStream.java:80)
        at htsjdk.tribble.index.interval.IntervalTreeIndex$ChrIndex.read(IntervalTreeIndex.java:203)
        at htsjdk.tribble.index.AbstractIndex.read(AbstractIndex.java:367)
        at htsjdk.tribble.index.interval.IntervalTreeIndex.<init>(IntervalTreeIndex.java:52)
        ... 20 more
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A GATK RUNTIME ERROR has occurred (version nightly-2017-05-17-g44b6fa2):
    ##### ERROR
    ##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ##### ERROR Visit our website and forum for extensive documentation and answers to 
    ##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
    ##### ERROR
    ##### ERROR MESSAGE: java.lang.reflect.InvocationTargetException
    ##### ERROR -----------------------------------------------------------------------------------------
    

    When I tried using Picard, I got this error:

    Input file /media/elab/Seagate_Expansion_Drive_1/ms_exomes/final/Sample_148112/Sample_148112.g.vcf has sample entries that don't match the other files.
    

    I'm assuming this is because the output from HaplotypeCaller (at least with the parameters I used) didn't yield base-level calls. I tried the 3.7 CombineGVCFs again with the --convertToBasePairResolution flag, but got the same error as without it.

    Thanks again for your help!

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)

    @scottyler89 You understood correctly that in GATK4, GenotypeGVCFs only takes a single input. And indeed, CombineGVCFs is gone, because it was a horribly inefficient tool. Instead, we have a tool called GenomicsDBImport that takes in all your GVCFs and produces a database (really a directory with a bunch of files) that you can then provide as input to GenotypeGVCFs. See this document: https://software.broadinstitute.org/gatk/documentation/article?id=10061
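To avoid typing forty `--variant` arguments by hand, the import command can be assembled programmatically. The sketch below is only an illustration: the flag spellings (`--genomicsDBWorkspace`, `--intervals`, `--variant`) are copied from the 4.beta.3 command posted further down in this thread and may differ in other GATK4 builds, so check the tool's help output for your version. The function name is hypothetical.

```python
def build_genomicsdb_import_args(gvcf_paths, workspace, interval):
    """Build the GenomicsDBImport argument list: one workspace, one interval, many GVCFs."""
    if not gvcf_paths:
        raise ValueError("at least one GVCF is required")
    # Flag names are assumptions taken from the 4.beta.3 command in this thread.
    args = [
        "GenomicsDBImport",
        "--genomicsDBWorkspace", workspace,
        "--intervals", interval,
    ]
    for path in sorted(gvcf_paths):
        args += ["--variant", path]  # one --variant per per-sample GVCF
    return args
```

Prepending `["java", "-jar", "<path to your GATK4 jar>"]` and handing the list to `subprocess.run` would launch the tool; keeping the builder separate makes it easy to print and inspect the command first.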

    The error you got with 3.7 reminds me of a bug that was fixed a little while ago so you might have better luck with a more recent nightly build, or the 3.8 version which I'm trying to get out right now (having a few technical difficulties but it might be ready by the time you read this). But the GATK4 version is better anyway so if you're willing to upgrade while it's still in beta, I would recommend using that.

    By the way, what @EADG described is the procedure for merging GVCFs produced from the same sample by scatter over genomic intervals. It will not work to prepare multiple sample GVCFs for input to GenotypeGVCFs.
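One quick way to tell the two situations apart is to compare the sample columns in each file's `#CHROM` header line. The following is a rough sketch in Python for plain-text (uncompressed) VCFs; the function names are made up for illustration. If every file carries the same sample(s), they are shards of one sample and a merge applies; if the samples differ, which is what the Picard "sample entries that don't match" message above points to, the files need the multi-sample route instead.

```python
def read_sample_names(vcf_path):
    """Return the sample columns from the #CHROM header line of a plain-text VCF."""
    with open(vcf_path) as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                # Nine fixed columns (CHROM .. FORMAT) precede the sample names.
                return line.rstrip("\r\n").split("\t")[9:]
            if not line.startswith("#"):
                break  # reached the records without finding a header line
    raise ValueError("no #CHROM header line in " + vcf_path)


def same_samples(vcf_paths):
    """True if every file lists exactly the same samples in the same order."""
    sample_sets = {tuple(read_sample_names(p)) for p in vcf_paths}
    return len(sample_sets) <= 1
```

With forty per-sample GVCFs like the ones in this thread, `same_samples(...)` would be expected to return False, since each file holds a different sample.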

  • scottyler89, University of Iowa (Member)

    Thanks for the help, Geraldine! I gave the beta a try with GenomicsDBImport, but unfortunately ended up getting an error.

    java -Xmx124g -jar ~/Downloads/gatk-4.beta.3-SNAPSHOT/gatk-package-4.beta.3-SNAPSHOT-local.jar GenomicsDBImport --intervals 20 --genomicsDBWorkspace /media/elab/Seagate_Expansion_Drive_1/ms_exomes/final/all.combined.vcf.gendb --variant Sample_147003/Sample_147003.g.vcf --variant Sample_148112/Sample_148112.g.vcf --variant Sample_151203/Sample_151203.g.vcf --variant Sample_206082/Sample_206082.g.vcf --variant Sample_206083/Sample_206083.g.vcf --variant Sample_212034/Sample_212034.g.vcf --variant Sample_213965/Sample_213965.g.vcf --variant Sample_214051/Sample_214051.g.vcf --variant sample_14831_3/sample_14831_3.g.vcf --variant sample_14851_3/sample_14851_3.g.vcf --variant sample_15183_5/sample_15183_5.g.vcf --variant sample_15639_2/sample_15639_2.g.vcf --variant sample_15640_3/sample_15640_3.g.vcf --variant sample_15675_4/sample_15675_4.g.vcf --variant sample_15708_2/sample_15708_2.g.vcf --variant sample_15714_3/sample_15714_3.g.vcf --variant sample_15744_2/sample_15744_2.g.vcf --variant sample_15746_3/sample_15746_3.g.vcf --variant sample_15766_3/sample_15766_3.g.vcf --variant sample_15791_4/sample_15791_4.g.vcf --variant sample_15811_3/sample_15811_3.g.vcf --variant sample_15822_3/sample_15822_3.g.vcf --variant sample_15836_2/sample_15836_2.g.vcf --variant sample_15836_3/sample_15836_3.g.vcf --variant sample_15858_5/sample_15858_5.g.vcf --variant sample_15871_3/sample_15871_3.g.vcf --variant sample_15875_2/sample_15875_2.g.vcf --variant sample_15876_3/sample_15876_3.g.vcf --variant sample_15876_5/sample_15876_5.g.vcf --variant sample_15914_2/sample_15914_2.g.vcf --variant sample_15915_3/sample_15915_3.g.vcf --variant sample_20544_1/sample_20544_1.g.vcf -V sample_20549_5/sample_20549_5.g.vcf --variant sample_20551_1/sample_20551_1.g.vcf -V sample_20551_3/sample_20551_3.g.vcf --variant sample_20605_2/sample_20605_2.g.vcf -V sample_20605_4/sample_20605_4.g.vcf --variant sample_20605_5/sample_20605_5.g.vcf -V sample_20836_5/sample_20836_5.g.vcf --variant 
sample_20952_2/sample_20952_2.g.vcf 
    13:57:03.698 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/elab/Downloads/gatk-4.beta.3-SNAPSHOT/gatk-package-4.beta.3-SNAPSHOT-local.jar!/com/intel/gkl/native/libgkl_compression.so
    [July 31, 2017 1:57:03 PM CDT] GenomicsDBImport  --genomicsDBWorkspace /media/elab/Seagate_Expansion_Drive_1/ms_exomes/final/all.combined.vcf.gendb --variant Sample_147003/Sample_147003.g.vcf --variant Sample_148112/Sample_148112.g.vcf --variant Sample_151203/Sample_151203.g.vcf --variant Sample_206082/Sample_206082.g.vcf --variant Sample_206083/Sample_206083.g.vcf --variant Sample_212034/Sample_212034.g.vcf --variant Sample_213965/Sample_213965.g.vcf --variant Sample_214051/Sample_214051.g.vcf --variant sample_14831_3/sample_14831_3.g.vcf --variant sample_14851_3/sample_14851_3.g.vcf --variant sample_15183_5/sample_15183_5.g.vcf --variant sample_15639_2/sample_15639_2.g.vcf --variant sample_15640_3/sample_15640_3.g.vcf --variant sample_15675_4/sample_15675_4.g.vcf --variant sample_15708_2/sample_15708_2.g.vcf --variant sample_15714_3/sample_15714_3.g.vcf --variant sample_15744_2/sample_15744_2.g.vcf --variant sample_15746_3/sample_15746_3.g.vcf --variant sample_15766_3/sample_15766_3.g.vcf --variant sample_15791_4/sample_15791_4.g.vcf --variant sample_15811_3/sample_15811_3.g.vcf --variant sample_15822_3/sample_15822_3.g.vcf --variant sample_15836_2/sample_15836_2.g.vcf --variant sample_15836_3/sample_15836_3.g.vcf --variant sample_15858_5/sample_15858_5.g.vcf --variant sample_15871_3/sample_15871_3.g.vcf --variant sample_15875_2/sample_15875_2.g.vcf --variant sample_15876_3/sample_15876_3.g.vcf --variant sample_15876_5/sample_15876_5.g.vcf --variant sample_15914_2/sample_15914_2.g.vcf --variant sample_15915_3/sample_15915_3.g.vcf --variant sample_20544_1/sample_20544_1.g.vcf --variant sample_20549_5/sample_20549_5.g.vcf --variant sample_20551_1/sample_20551_1.g.vcf --variant sample_20551_3/sample_20551_3.g.vcf --variant sample_20605_2/sample_20605_2.g.vcf --variant sample_20605_4/sample_20605_4.g.vcf --variant sample_20605_5/sample_20605_5.g.vcf --variant sample_20836_5/sample_20836_5.g.vcf --variant sample_20952_2/sample_20952_2.g.vcf --intervals 20  
--genomicsDBSegmentSize 1048576 --genomicsDBVCFBufferSize 16384 --overwriteExistingGenomicsDBWorkspace false --batchSize 0 --consolidate false --validateSampleNameMap false --readerThreads 1 --interval_set_rule UNION --interval_padding 0 --interval_exclusion_padding 0 --readValidationStringency SILENT --secondsBetweenProgressUpdates 10.0 --disableSequenceDictionaryValidation false --createOutputBamIndex true --createOutputBamMD5 false --createOutputVariantIndex true --createOutputVariantMD5 false --lenient false --addOutputSAMProgramRecord true --addOutputVCFCommandLine true --cloudPrefetchBuffer 0 --cloudIndexPrefetchBuffer 0 --disableBamIndexCaching false --help false --version false --showHidden false --verbosity INFO --QUIET false --use_jdk_deflater false --use_jdk_inflater false --disableToolDefaultReadFilters false
    [July 31, 2017 1:57:03 PM CDT] Executing as elab@elab on Linux 4.4.0-87-generic amd64; OpenJDK 64-Bit Server VM 1.8.0_131-8u131-b11-2ubuntu1.16.04.2-b11; Version: 4.beta.3-SNAPSHOT
    13:57:03.868 INFO  GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 5
    13:57:03.868 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    13:57:03.868 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : false
    13:57:03.868 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    13:57:03.868 INFO  GenomicsDBImport - Deflater: IntelDeflater
    13:57:03.868 INFO  GenomicsDBImport - Inflater: IntelInflater
    13:57:03.869 INFO  GenomicsDBImport - GCS max retries/reopens: 20
    13:57:03.869 INFO  GenomicsDBImport - Using google-cloud-java patch 317951be3c2e898e3916a4b1abf5a9c220d84df8
    13:57:03.869 INFO  GenomicsDBImport - Initializing engine
    13:57:05.879 INFO  GenomicsDBImport - Shutting down engine
    [July 31, 2017 1:57:05 PM CDT] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.04 minutes.
    Runtime.totalMemory()=1583874048
    java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
        at htsjdk.tribble.index.IndexFactory.loadIndex(IndexFactory.java:187)
        at htsjdk.tribble.TribbleIndexedFeatureReader.loadIndex(TribbleIndexedFeatureReader.java:163)
        at htsjdk.tribble.TribbleIndexedFeatureReader.<init>(TribbleIndexedFeatureReader.java:132)
        at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:110)
        at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.getReaderFromPath(GenomicsDBImport.java:510)
        at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.getHeaderFromPath(GenomicsDBImport.java:278)
        at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.initializeHeaderAndSampleMappings(GenomicsDBImport.java:249)
        at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.onStartup(GenomicsDBImport.java:227)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:114)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:173)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:192)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:131)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:152)
        at org.broadinstitute.hellbender.Main.main(Main.java:233)
    Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at htsjdk.tribble.index.IndexFactory.loadIndex(IndexFactory.java:181)
        ... 13 more
    Caused by: java.io.EOFException
        at htsjdk.tribble.util.LittleEndianInputStream.readFully(LittleEndianInputStream.java:138)
        at htsjdk.tribble.util.LittleEndianInputStream.readLong(LittleEndianInputStream.java:80)
        at htsjdk.tribble.index.interval.IntervalTreeIndex$ChrIndex.read(IntervalTreeIndex.java:219)
        at htsjdk.tribble.index.AbstractIndex.read(AbstractIndex.java:404)
        at htsjdk.tribble.index.interval.IntervalTreeIndex.<init>(IntervalTreeIndex.java:53)
        ... 18 more
    

    I also tried running CombineGVCFs in a more recent nightly build, but got a similar log4j2-related error. The production version of 3.8 fails the same way.

    java -Xmx124g -jar '/home/elab/Downloads/gatk/GenomeAnalysisTK.jar'    -T CombineGVCFs -R /home/elab/references/mus_musculus/Mus_musculus.GRCm38.dna_sm.fa --variant Sample_147003/Sample_147003.g.vcf --variant Sample_148112/Sample_148112.g.vcf --variant Sample_151203/Sample_151203.g.vcf --variant Sample_206082/Sample_206082.g.vcf --variant Sample_206083/Sample_206083.g.vcf --variant Sample_212034/Sample_212034.g.vcf --variant Sample_213965/Sample_213965.g.vcf --variant Sample_214051/Sample_214051.g.vcf --variant sample_14831_3/sample_14831_3.g.vcf --variant sample_14851_3/sample_14851_3.g.vcf --variant sample_15183_5/sample_15183_5.g.vcf --variant sample_15639_2/sample_15639_2.g.vcf --variant sample_15640_3/sample_15640_3.g.vcf --variant sample_15675_4/sample_15675_4.g.vcf --variant sample_15708_2/sample_15708_2.g.vcf --variant sample_15714_3/sample_15714_3.g.vcf --variant sample_15744_2/sample_15744_2.g.vcf --variant sample_15746_3/sample_15746_3.g.vcf --variant sample_15766_3/sample_15766_3.g.vcf --variant sample_15791_4/sample_15791_4.g.vcf --variant sample_15811_3/sample_15811_3.g.vcf --variant sample_15822_3/sample_15822_3.g.vcf --variant sample_15836_2/sample_15836_2.g.vcf --variant sample_15836_3/sample_15836_3.g.vcf --variant sample_15858_5/sample_15858_5.g.vcf --variant sample_15871_3/sample_15871_3.g.vcf --variant sample_15875_2/sample_15875_2.g.vcf --variant sample_15876_3/sample_15876_3.g.vcf --variant sample_15876_5/sample_15876_5.g.vcf --variant sample_15914_2/sample_15914_2.g.vcf --variant sample_15915_3/sample_15915_3.g.vcf --variant sample_20544_1/sample_20544_1.g.vcf --variant sample_20549_5/sample_20549_5.g.vcf --variant sample_20551_1/sample_20551_1.g.vcf --variant sample_20551_3/sample_20551_3.g.vcf --variant sample_20605_2/sample_20605_2.g.vcf --variant sample_20605_4/sample_20605_4.g.vcf --variant sample_20605_5/sample_20605_5.g.vcf --variant sample_20836_5/sample_20836_5.g.vcf --variant sample_20952_2/sample_20952_2.g.vcf -o 
/media/elab/Seagate_Expansion_Drive_1/ms_exomes/final/all.combined.raw.g.vcf
    INFO  14:04:14,831 HelpFormatter - --------------------------------------------------------------------------------------------- 
    INFO  14:04:14,834 HelpFormatter - The Genome Analysis Toolkit (GATK) vnightly-2017-07-29-g8c82c73, Compiled 2017/07/29 00:01:14 
    INFO  14:04:14,834 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute 
    INFO  14:04:14,834 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk 
    INFO  14:04:14,834 HelpFormatter - [Mon Jul 31 14:04:14 CDT 2017] Executing on Linux 4.4.0-87-generic amd64 
    INFO  14:04:14,835 HelpFormatter - OpenJDK 64-Bit Server VM 1.8.0_131-8u131-b11-2ubuntu1.16.04.2-b11 
    INFO  14:04:14,837 HelpFormatter - Program Args: -T CombineGVCFs -R /home/elab/references/mus_musculus/Mus_musculus.GRCm38.dna_sm.fa --variant Sample_147003/Sample_147003.g.vcf --variant Sample_148112/Sample_148112.g.vcf --variant Sample_151203/Sample_151203.g.vcf --variant Sample_206082/Sample_206082.g.vcf --variant Sample_206083/Sample_206083.g.vcf --variant Sample_212034/Sample_212034.g.vcf --variant Sample_213965/Sample_213965.g.vcf --variant Sample_214051/Sample_214051.g.vcf --variant sample_14831_3/sample_14831_3.g.vcf --variant sample_14851_3/sample_14851_3.g.vcf --variant sample_15183_5/sample_15183_5.g.vcf --variant sample_15639_2/sample_15639_2.g.vcf --variant sample_15640_3/sample_15640_3.g.vcf --variant sample_15675_4/sample_15675_4.g.vcf --variant sample_15708_2/sample_15708_2.g.vcf --variant sample_15714_3/sample_15714_3.g.vcf --variant sample_15744_2/sample_15744_2.g.vcf --variant sample_15746_3/sample_15746_3.g.vcf --variant sample_15766_3/sample_15766_3.g.vcf --variant sample_15791_4/sample_15791_4.g.vcf --variant sample_15811_3/sample_15811_3.g.vcf --variant sample_15822_3/sample_15822_3.g.vcf --variant sample_15836_2/sample_15836_2.g.vcf --variant sample_15836_3/sample_15836_3.g.vcf --variant sample_15858_5/sample_15858_5.g.vcf --variant sample_15871_3/sample_15871_3.g.vcf --variant sample_15875_2/sample_15875_2.g.vcf --variant sample_15876_3/sample_15876_3.g.vcf --variant sample_15876_5/sample_15876_5.g.vcf --variant sample_15914_2/sample_15914_2.g.vcf --variant sample_15915_3/sample_15915_3.g.vcf --variant sample_20544_1/sample_20544_1.g.vcf --variant sample_20549_5/sample_20549_5.g.vcf --variant sample_20551_1/sample_20551_1.g.vcf --variant sample_20551_3/sample_20551_3.g.vcf --variant sample_20605_2/sample_20605_2.g.vcf --variant sample_20605_4/sample_20605_4.g.vcf --variant sample_20605_5/sample_20605_5.g.vcf --variant sample_20836_5/sample_20836_5.g.vcf --variant sample_20952_2/sample_20952_2.g.vcf -o 
/media/elab/Seagate_Expansion_Drive_1/ms_exomes/final/all.combined.raw.g.vcf 
    INFO  14:04:14,840 HelpFormatter - Executing as elab@elab on Linux 4.4.0-87-generic amd64; OpenJDK 64-Bit Server VM 1.8.0_131-8u131-b11-2ubuntu1.16.04.2-b11. 
    INFO  14:04:14,841 HelpFormatter - Date/Time: 2017/07/31 14:04:14 
    INFO  14:04:14,841 HelpFormatter - --------------------------------------------------------------------------------------------- 
    INFO  14:04:14,841 HelpFormatter - --------------------------------------------------------------------------------------------- 
    ERROR StatusLogger Unable to create class org.apache.logging.log4j.core.impl.Log4jContextFactory specified in jar:file:/home/elab/Downloads/gatk/GenomeAnalysisTK.jar!/META-INF/log4j-provider.properties
    ERROR StatusLogger Log4j2 could not find a logging implementation. Please add log4j-core to the classpath. Using SimpleLogger to log to the console...
    INFO  14:04:15,135 GenomeAnalysisEngine - Deflater: IntelDeflater 
    INFO  14:04:15,136 GenomeAnalysisEngine - Inflater: IntelInflater 
    INFO  14:04:15,136 GenomeAnalysisEngine - Strictness is SILENT 
    INFO  14:04:15,837 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000 
    ##### ERROR --
    ##### ERROR stack trace 
    java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
        at htsjdk.tribble.index.IndexFactory.loadIndex(IndexFactory.java:187)
        at htsjdk.tribble.index.IndexFactory.loadIndex(IndexFactory.java:165)
        at org.broadinstitute.gatk.utils.refdata.tracks.RMDTrackBuilder.loadFromDisk(RMDTrackBuilder.java:375)
        at org.broadinstitute.gatk.utils.refdata.tracks.RMDTrackBuilder.attemptToLockAndLoadIndexFromDisk(RMDTrackBuilder.java:359)
        at org.broadinstitute.gatk.utils.refdata.tracks.RMDTrackBuilder.loadIndex(RMDTrackBuilder.java:319)
        at org.broadinstitute.gatk.utils.refdata.tracks.RMDTrackBuilder.getFeatureSource(RMDTrackBuilder.java:264)
        at org.broadinstitute.gatk.utils.refdata.tracks.RMDTrackBuilder.createInstanceOfTrack(RMDTrackBuilder.java:153)
        at org.broadinstitute.gatk.engine.datasources.rmd.ReferenceOrderedQueryDataPool.<init>(ReferenceOrderedDataSource.java:208)
        at org.broadinstitute.gatk.engine.datasources.rmd.ReferenceOrderedDataSource.<init>(ReferenceOrderedDataSource.java:88)
        at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.getReferenceOrderedDataSources(GenomeAnalysisEngine.java:1071)
        at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.initializeDataSources(GenomeAnalysisEngine.java:851)
        at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:294)
        at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:123)
        at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
        at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
        at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)
    Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at htsjdk.tribble.index.IndexFactory.loadIndex(IndexFactory.java:181)
        ... 15 more
    Caused by: java.io.EOFException
        at htsjdk.tribble.util.LittleEndianInputStream.readFully(LittleEndianInputStream.java:138)
        at htsjdk.tribble.util.LittleEndianInputStream.readLong(LittleEndianInputStream.java:80)
        at htsjdk.tribble.index.interval.IntervalTreeIndex$ChrIndex.read(IntervalTreeIndex.java:219)
        at htsjdk.tribble.index.AbstractIndex.read(AbstractIndex.java:404)
        at htsjdk.tribble.index.interval.IntervalTreeIndex.<init>(IntervalTreeIndex.java:53)
        ... 20 more
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A GATK RUNTIME ERROR has occurred (version nightly-2017-07-29-g8c82c73):
    ##### ERROR
    ##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ##### ERROR Visit our website and forum for extensive documentation and answers to 
    ##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
    ##### ERROR
    ##### ERROR MESSAGE: java.lang.reflect.InvocationTargetException
    ##### ERROR ------------------------------------------------------------------------------------------
    

    Thanks again for all your help!

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)
    Accepted Answer

    Ah, the log4j thing is probably a red herring -- that's a minor logging thing that shouldn't cause the run to actually fail, it just pollutes the log output (I think I saw some chatter about getting that fixed).

    Possibly more relevant: the program is failing in a function that is supposed to read an index file, and one of the exceptions along the way is java.io.EOFException ("end of file"), which is generally seen when a file is corrupted or incomplete. I would recommend checking the index files for all your GVCFs. Maybe try running on just one that you know is valid (e.g. because you can run a different tool on it, like ValidateVariants) and see if that works OK. If so, it's a matter of checking all your files for a rotten index and regenerating the bad file(s).

    I could be wrong but that seems the most likely problem/solution based on these errors.
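As a first pass at the index check suggested above, here is a minimal sketch in Python. It assumes each GVCF has a Tribble index stored next to it as `<file>.idx`; the function name `find_suspect_indexes` and the reason strings are made up for illustration. It only flags indexes that are missing, empty, or older than their VCF, so a truncated but non-empty index (which could also produce an EOFException) may still slip through and would need to be caught by actually running a tool on that file.

```python
import os
from typing import Iterable, List, Tuple


def find_suspect_indexes(vcf_paths: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (vcf_path, reason) pairs for GVCFs whose index looks suspicious.

    Heuristic only: it inspects file presence, size, and modification
    time, and does not parse the index contents.
    """
    suspects = []
    for vcf in vcf_paths:
        idx = vcf + ".idx"  # assumption: index sits next to the VCF as <file>.idx
        if not os.path.exists(vcf):
            suspects.append((vcf, "VCF missing"))
        elif not os.path.exists(idx):
            suspects.append((vcf, "index missing"))
        elif os.path.getsize(idx) == 0:
            suspects.append((vcf, "index is empty"))
        elif os.path.getmtime(idx) < os.path.getmtime(vcf):
            suspects.append((vcf, "index older than VCF"))
    return suspects
```

Calling `find_suspect_indexes` with the same paths passed to `--variant` gives a short list of files to look at first. An empty result does not prove that every index is intact.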

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)

    (which I realize I did not catch in your original message; sorry about that)

  • scottyler89, University of Iowa (Member)

    Thanks Geraldine - I'll check those. It may also explain why CombineGVCFs worked when I tried it on a smaller subset: I may have just arbitrarily excluded the VCF with the culprit index. I'll let you know if that fixes it. Thanks again.

  • scottyler89, University of Iowa (Member)

    The VCF was the problem! Thanks so much for your help. I just re-did the variant calling on the one sample that was causing problems, and got my pipeline up and running again.

    Best,
    Scott

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)

    Wheee, excellent, we love to hear about problems being solved :)
