Possible bug - either HaplotypeCaller or CombineGVCFs

Hello again,
I have finished processing and combining most of my gVCF files, but ran into a problem combining the last batch.

The individual samples were called with HaplotypeCaller (GATK 4.0). Then I tried to combine them with CombineGVCFs (GATK 4.0), which gave me the following error:

15:01:40.127 INFO CombineGVCFs - Shutting down engine
[17 May 2018 15:01:40 BST] org.broadinstitute.hellbender.tools.walkers.CombineGVCFs done. Elapsed time: 257.39 minutes.
Runtime.totalMemory()=75151441920
htsjdk.samtools.util.RuntimeIOException: /exports/2222.variants.g.vcf.gz has invalid uncompressedLength: -1589094972
at htsjdk.samtools.util.BlockCompressedInputStream.inflateBlock(BlockCompressedInputStream.java:543)
at htsjdk.samtools.util.BlockCompressedInputStream.processNextBlock(BlockCompressedInputStream.java:532)
at htsjdk.samtools.util.BlockCompressedInputStream.nextBlock(BlockCompressedInputStream.java:468)
at htsjdk.samtools.util.BlockCompressedInputStream.seek(BlockCompressedInputStream.java:380)
at htsjdk.tribble.readers.TabixReader$IteratorImpl.next(TabixReader.java:427)
at htsjdk.tribble.readers.TabixIteratorLineReader.readLine(TabixIteratorLineReader.java:46)
at htsjdk.tribble.TabixFeatureReader$FeatureIterator.readNextRecord(TabixFeatureReader.java:170)
at htsjdk.tribble.TabixFeatureReader$FeatureIterator.<init>(TabixFeatureReader.java:159)
at htsjdk.tribble.TabixFeatureReader.query(TabixFeatureReader.java:133)
at org.broadinstitute.hellbender.engine.FeatureIntervalIterator.queryNextInterval(FeatureIntervalIterator.java:135)
at org.broadinstitute.hellbender.engine.FeatureIntervalIterator.loadNextFeature(FeatureIntervalIterator.java:92)
at org.broadinstitute.hellbender.engine.FeatureIntervalIterator.loadNextNovelFeature(FeatureIntervalIterator.java:74)
at org.broadinstitute.hellbender.engine.FeatureIntervalIterator.<init>(FeatureIntervalIterator.java:47)
at org.broadinstitute.hellbender.engine.FeatureDataSource.iterator(FeatureDataSource.java:462)
at org.broadinstitute.hellbender.engine.MultiVariantDataSource.lambda$iterator$2(MultiVariantDataSource.java:157)
at org.broadinstitute.hellbender.engine.MultiVariantDataSource.lambda$getMergedIteratorFromDataSources$4(MultiVariantDataSource.java:196)
at java.util.ArrayList.forEach(ArrayList.java:1257)
at org.broadinstitute.hellbender.engine.MultiVariantDataSource.getMergedIteratorFromDataSources(MultiVariantDataSource.java:196)
at org.broadinstitute.hellbender.engine.MultiVariantDataSource.iterator(MultiVariantDataSource.java:157)
at java.lang.Iterable.spliterator(Iterable.java:101)
at org.broadinstitute.hellbender.engine.MultiVariantWalker.getSpliteratorForDrivingVariants(MultiVariantWalker.java:41)
at org.broadinstitute.hellbender.engine.VariantWalkerBase.traverse(VariantWalkerBase.java:106)
at org.broadinstitute.hellbender.engine.MultiVariantWalkerGroupedOnStart.traverse(MultiVariantWalkerGroupedOnStart.java:118)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:893)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:135)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:180)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:199)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:159)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:202)
at org.broadinstitute.hellbender.Main.main(Main.java:288)

I've checked the variant file again: an index file was created for this sample, there were no error messages during its creation (though I have lost the log file), running vcftools on it works fine, and visual inspection of the file doesn't show anything odd; it stops at the same location as the other samples' VCFs. I assumed this was a bug in CombineGVCFs in GATK 4.0, so I went back to GATK 3.5, and now I am getting this error:

```

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

org.broadinstitute.gatk.utils.exceptions.ReviewedGATKException: Unable to create iterator for rod named variant25
at org.broadinstitute.gatk.engine.datasources.rmd.ReferenceOrderedQueryDataPool.createIteratorFromResource(ReferenceOrderedDataSource.java:248)
at org.broadinstitute.gatk.engine.datasources.rmd.ReferenceOrderedQueryDataPool.createIteratorFromResource(ReferenceOrderedDataSource.java:185)
at org.broadinstitute.gatk.engine.datasources.rmd.ResourcePool.iterator(ResourcePool.java:93)
at org.broadinstitute.gatk.engine.datasources.rmd.ReferenceOrderedDataSource.seek(ReferenceOrderedDataSource.java:168)
at org.broadinstitute.gatk.engine.datasources.providers.RodLocusView.<init>(RodLocusView.java:82)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.getLocusView(TraverseLociNano.java:129)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:80)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:315)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:106)
Caused by: htsjdk.samtools.SAMFormatException: Invalid GZIP header
at htsjdk.samtools.util.BlockGunzipper.unzipBlock(BlockGunzipper.java:72)
at htsjdk.samtools.util.BlockCompressedInputStream.inflateBlock(BlockCompressedInputStream.java:402)
at htsjdk.samtools.util.BlockCompressedInputStream.readBlock(BlockCompressedInputStream.java:384)
at htsjdk.samtools.util.BlockCompressedInputStream.seek(BlockCompressedInputStream.java:292)
at htsjdk.tribble.readers.TabixReader$IteratorImpl.next(TabixReader.java:382)
at htsjdk.tribble.readers.TabixIteratorLineReader.readLine(TabixIteratorLineReader.java:45)
at htsjdk.tribble.TabixFeatureReader$FeatureIterator.readNextRecord(TabixFeatureReader.java:162)
at htsjdk.tribble.TabixFeatureReader$FeatureIterator.<init>(TabixFeatureReader.java:150)
at htsjdk.tribble.TabixFeatureReader.query(TabixFeatureReader.java:125)
at org.broadinstitute.gatk.utils.refdata.tracks.RMDTrack.query(RMDTrack.java:119)
at org.broadinstitute.gatk.engine.datasources.rmd.ReferenceOrderedQueryDataPool.createIteratorFromResource(ReferenceOrderedDataSource.java:241)
... 13 more

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.5-0-g36282e4):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Unable to create iterator for rod named variant25
ERROR ---------------------------------------------------------------------------

```

So it appears that the variant caller's output was truncated at some point, without any error being flagged, and yet an index file was still produced?
Is it possible that the job died while the index file was being created? Could I try recreating the .tbi index before I resubmit the whole HaplotypeCaller run for this sample?
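
Both stack traces fail inside `BlockCompressedInputStream` while inflating a block, so one way to separate "bad index" from "bad gVCF" would be to check the compressed file directly, without going through the .tbi at all. Below is a minimal sketch of such a check, assuming the file is ordinary BGZF as described in the SAM/BGZF specification. The function name `check_bgzf` and its messages are hypothetical and not part of GATK or htsjdk. It walks every block, inflates it, compares the stored length and CRC, and finally looks for the 28-byte empty block that marks a complete file.

```python
import struct
import zlib

# Fixed 28-byte empty BGZF block that the spec uses as an end-of-file marker.
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00" "ff" "0600"   # gzip header, XLEN = 6
    "4243" "0200" "1b00"                     # 'BC' subfield, BSIZE = 27
    "0300" "00000000" "00000000"             # empty deflate, CRC32, ISIZE
)


def check_bgzf(path):
    """Walk all BGZF blocks in `path`; return (ok, message). Hypothetical helper."""
    offset = 0
    last_block = b""
    with open(path, "rb") as fh:
        while True:
            header = fh.read(12)
            if not header:
                break  # clean end of file
            if len(header) < 12 or header[:4] != b"\x1f\x8b\x08\x04":
                return False, f"bad gzip header at byte {offset}"
            xlen = struct.unpack("<H", header[10:12])[0]
            extra = fh.read(xlen)
            if len(extra) < xlen:
                return False, f"truncated block at byte {offset}"
            # Find the 'BC' extra subfield, which stores (block size - 1).
            bsize, pos = None, 0
            while pos + 4 <= len(extra):
                si1, si2, slen = struct.unpack_from("<BBH", extra, pos)
                if (si1, si2, slen) == (66, 67, 2) and pos + 6 <= len(extra):
                    bsize = struct.unpack_from("<H", extra, pos + 4)[0]
                pos += 4 + slen
            if bsize is None:
                return False, f"no BGZF size field at byte {offset}"
            remaining = bsize + 1 - 12 - xlen
            if remaining < 8:
                return False, f"impossible block size at byte {offset}"
            body = fh.read(remaining)
            if len(body) < remaining:
                return False, f"truncated block at byte {offset}"
            crc, isize = struct.unpack("<II", body[-8:])
            try:
                data = zlib.decompress(body[:-8], -15)  # raw deflate stream
            except zlib.error as exc:
                return False, f"cannot inflate block at byte {offset}: {exc}"
            if len(data) != isize or zlib.crc32(data) != crc:
                return False, f"length or CRC mismatch at byte {offset}"
            last_block = header + extra + body
            offset += bsize + 1
    if last_block != BGZF_EOF:
        return False, "no BGZF EOF marker (file is probably truncated)"
    return True, "ok"
```

Calling `check_bgzf("/exports/2222.variants.g.vcf.gz")` would then report the byte offset of the first damaged block, if there is one. If the file passes, the .tbi is the more likely culprit and rebuilding it (for example with `tabix -p vcf`) seems worth trying first; if it fails, the gVCF itself is damaged and a new index would presumably not help.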
