GATK Runtime Error on GenotypeGVCFs: java.lang.Double cannot be cast to java.lang.Integer

james_lawlor (Huntsville, AL; Member)

Hi GATK Team,
I've run into the following error when trying to genotype ~1200 GVCFs:

java.lang.ClassCastException: java.lang.Double cannot be cast to java.lang.Integer

I've replicated this error on both the nightly-2017-11-22-1 build and v3.8-0-ge9d806836 (compiled 2017/07/28 21:26:50), using OpenJDK 64-Bit Server VM 1.8.0_65-b17 on CentOS 7.

This error has only occurred on Chromosome 10--all others have run correctly or are still pending (chr14). All input files are gzipped and tabix-indexed and I've verified that I can properly access all 1202 using tabix.

My best guess would be that there's some unexpected input buried in a line in one of the GVCFs--I've found this on a small subset of these data where I ran into tabix issues due to an occasional malformed line, which I then fixed--but I have no idea how I'd easily determine the culprit.

GATK input and output below (trimmed to remove repeated input flags and warnings).

Any guidance would be much appreciated.
Thank you!

Sender: LSF System <[email protected]>
Subject: Job 388878: <batchcall.vcf_10> in cluster <helion-poc> Exited

Job <batchcall.vcf_10> was submitted from host <login01> by user <jlawlor> in cluster <helion-poc>.
Job was executed on host(s) <16*hpc0010>, in queue <c7normal>, as user <jlawlor> in cluster <helion-poc>.
</gpfs/gpfs1/home/jlawlor> was used as the home directory.
</gpfs/gpfs2/cooperlab/CSER_batch_calls/cser_wgs_batch_cumulative_201710> was used as the working directory.
Started at Wed Nov 22 11:28:18 2017
Results reported on Wed Nov 22 11:32:14 2017

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
java -Xmx385g -Xms385g -jar /gpfs/gpfs2/cooperlab/test_batch/inadvisable_batch_call/nightly_1122/GenomeAnalysisTK.jar   -T GenotypeGVCFs    -R /gpfs/gpfs1/myerslab/reference/genomes/bwa-0.7.8/GRCh37.fa   -nt 16 -L 10 -o batchcall.vcf_10.vcf -V /gpfs/gpfs2/cooperlab/CSER_batch_calls/cser_wgs_batch_cumulative_201710/SL102359_10.g.vcf.gz -V /gpfs/gpfs2/cooperlab/CSER_batch_calls/cser_wgs_batch_cumulative_201710/SL102360_10.g.vcf.gz [ REPEATED FOR 1200 MORE SAMPLES ] 

Exited with exit code 1.

Resource usage summary:

    CPU time :                                   573.35 sec.
    Max Memory :                                 131697 MB
    Average Memory :                             31937.71 MB
    Total Requested Memory :                     460800.00 MB
    Delta Memory :                               329103.00 MB
    Max Processes :                              3
    Max Threads :                                89

The output (if any) follows:

INFO  11:28:22,217 HelpFormatter - -------------------------------------------------------------------------------------- 
INFO  11:28:22,221 HelpFormatter - The Genome Analysis Toolkit (GATK) vnightly-2017-11-22-1, Compiled 2017/11/22 00:01:18 
INFO  11:28:22,224 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute 
INFO  11:28:22,224 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk 
INFO  11:28:22,225 HelpFormatter - [Wed Nov 22 11:28:22 CST 2017] Executing on Linux 3.10.0-327.3.1.el7.x86_64 amd64 
INFO  11:28:22,226 HelpFormatter - OpenJDK 64-Bit Server VM 1.8.0_65-b17 
INFO  11:28:22,231 HelpFormatter - Program Args: -T GenotypeGVCFs -R /gpfs/gpfs1/myerslab/reference/genomes/bwa-0.7.8/GRCh37.fa -nt 16 -L 10 -o batchcall.vcf_10.vcf -V /gpfs/gpfs2/cooperlab/CSER_batch_calls/cser_wgs_batch_cumulative_201710/SL102359_10.g.vcf.gz -V /gpfs/gpfs2/cooperlab/CSER_batch_calls/cser_wgs_batch_cumulative_201710/SL102360_10.g.vcf.gz [ REPEATED FOR 1200 MORE SAMPLES ] 
INFO  11:28:22,241 HelpFormatter - Executing as [email protected] on Linux 3.10.0-327.3.1.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_65-b17. 
INFO  11:28:22,242 HelpFormatter - Date/Time: 2017/11/22 11:28:22 
INFO  11:28:22,242 HelpFormatter - -------------------------------------------------------------------------------------- 
INFO  11:28:22,243 HelpFormatter - -------------------------------------------------------------------------------------- 
INFO  11:29:15,805 NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gpfs/gpfs2/cooperlab/test_batch/inadvisable_batch_call/nightly_1122/GenomeAnalysisTK.jar!/com/intel/gkl/native/libgkl_compression.so 
INFO  11:29:15,830 GenomeAnalysisEngine - Deflater: IntelDeflater 
INFO  11:29:15,830 GenomeAnalysisEngine - Inflater: IntelInflater 
INFO  11:29:15,831 GenomeAnalysisEngine - Strictness is SILENT 
INFO  11:29:16,120 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000 
INFO  11:30:40,514 IntervalUtils - Processing 135534747 bp from intervals 
WARN  11:30:40,515 IndexDictionaryUtils - Track variant doesn't have a sequence dictionary built in, skipping dictionary validation 
WARN  11:30:40,515 IndexDictionaryUtils - Track variant2 doesn't have a sequence dictionary built in, skipping dictionary validation 

[ REPEATED FOR 1200 MORE SAMPLES ] 

INFO  11:30:40,654 MicroScheduler - Running the GATK in parallel mode with 16 total threads, 1 CPU thread(s) for each of 16 data thread(s), of 64 processors available on this machine 
INFO  11:30:40,701 GenomeAnalysisEngine - Preparing for traversal 
INFO  11:30:40,702 GenomeAnalysisEngine - Done preparing for traversal 
INFO  11:30:40,702 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING] 
INFO  11:30:40,703 ProgressMeter -                 | processed |    time |    per 1M |           |   total | remaining 
INFO  11:30:40,703 ProgressMeter -        Location |     sites | elapsed |     sites | completed | runtime |   runtime 
WARN  11:30:42,986 StrandBiasTest - StrandBiasBySample annotation exists in input VCF header. Attempting to use StrandBiasBySample values to calculate strand bias annotation values. If no sample has the SB genotype annotation, annotation may still fail. 
WARN  11:30:42,988 StrandBiasTest - StrandBiasBySample annotation exists in input VCF header. Attempting to use StrandBiasBySample values to calculate strand bias annotation values. If no sample has the SB genotype annotation, annotation may still fail. 
INFO  11:30:42,988 GenotypeGVCFs - Notice that the -ploidy parameter is ignored in GenotypeGVCFs tool as this is automatically determined by the input variant files 
INFO  11:31:10,708 ProgressMeter -        10:26601         0.0    30.0 s      49.6 w        0.0%    42.5 h      42.5 h 
WARN  11:31:42,051 HaplotypeScore - Annotation will not be calculated, must be called from UnifiedGenotyper, not GenotypeGVCFs 
INFO  11:32:11,121 ProgressMeter -        10:64601         0.0    90.0 s     149.5 w        0.0%    52.5 h      52.4 h 
##### ERROR --
##### ERROR stack trace 
java.lang.ClassCastException: java.lang.Double cannot be cast to java.lang.Integer
    at java.lang.Integer.compareTo(Integer.java:52)
    at java.util.ComparableTimSort.binarySort(ComparableTimSort.java:262)
    at java.util.ComparableTimSort.sort(ComparableTimSort.java:207)
    at java.util.Arrays.sort(Arrays.java:1312)
    at java.util.Arrays.sort(Arrays.java:1506)
    at java.util.ArrayList.sort(ArrayList.java:1454)
    at java.util.Collections.sort(Collections.java:141)
    at org.broadinstitute.gatk.utils.MathUtils.median(MathUtils.java:1010)
    at org.broadinstitute.gatk.tools.walkers.variantutils.ReferenceConfidenceVariantContextMerger.combineAnnotationValues(ReferenceConfidenceVariantContextMerger.java:84)
    at org.broadinstitute.gatk.tools.walkers.variantutils.ReferenceConfidenceVariantContextMerger.merge(ReferenceConfidenceVariantContextMerger.java:206)
    at org.broadinstitute.gatk.tools.walkers.variantutils.GenotypeGVCFs.map(GenotypeGVCFs.java:303)
    at org.broadinstitute.gatk.tools.walkers.variantutils.GenotypeGVCFs.map(GenotypeGVCFs.java:136)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
    at org.broadinstitute.gatk.engine.executive.ShardTraverser.call(ShardTraverser.java:98)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version nightly-2017-11-22-1):
##### ERROR
##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: java.lang.Double cannot be cast to java.lang.Integer
##### ERROR ------------------------------------------------------------------------------------------

Answers

  • Sheila (Broad Institute; Member, Broadie, Moderator)
    edited November 2017

    @james_lawlor
    Hi,

    Does this still occur if you remove -nt 16 from your command?

    Thanks,
    Sheila

  • james_lawlor (Huntsville, AL; Member)

    Hi @Sheila,
    Yes, the error still occurs if I set -nt 1 in the command. (Let me know if I should also try with the -nt flag removed entirely, just in case.)

    INFO  16:01:18,345 GenomeAnalysisEngine - Preparing for traversal 
    INFO  16:01:18,347 GenomeAnalysisEngine - Done preparing for traversal 
    INFO  16:01:18,347 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING] 
    INFO  16:01:18,347 ProgressMeter -                 | processed |    time |    per 1M |           |   total | remaining 
    INFO  16:01:18,347 ProgressMeter -        Location |     sites | elapsed |     sites | completed | runtime |   runtime 
    WARN  16:01:20,592 StrandBiasTest - StrandBiasBySample annotation exists in input VCF header. Attempting to use StrandBiasBySample values to calculate strand bias annotation values. If no sample has the SB genotype annotation, annotation may still fail. 
    WARN  16:01:20,594 StrandBiasTest - StrandBiasBySample annotation exists in input VCF header. Attempting to use StrandBiasBySample values to calculate strand bias annotation values. If no sample has the SB genotype annotation, annotation may still fail. 
    INFO  16:01:20,594 GenotypeGVCFs - Notice that the -ploidy parameter is ignored in GenotypeGVCFs tool as this is automatically determined by the input variant files 
    INFO  16:01:48,353 ProgressMeter -        10:37101         0.0    30.0 s      49.6 w        0.0%    30.4 h      30.4 h 
    WARN  16:02:05,631 HaplotypeScore - Annotation will not be calculated, must be called from UnifiedGenotyper, not GenotypeGVCFs 
    ##### ERROR --
    ##### ERROR stack trace 
    java.lang.ClassCastException: java.lang.Double cannot be cast to java.lang.Integer
        at java.lang.Integer.compareTo(Integer.java:52)
        at java.util.ComparableTimSort.binarySort(ComparableTimSort.java:262)
        at java.util.ComparableTimSort.sort(ComparableTimSort.java:207)
        at java.util.Arrays.sort(Arrays.java:1312)
        at java.util.Arrays.sort(Arrays.java:1506)
        at java.util.ArrayList.sort(ArrayList.java:1454)
        at java.util.Collections.sort(Collections.java:141)
        at org.broadinstitute.gatk.utils.MathUtils.median(MathUtils.java:1010)
        at org.broadinstitute.gatk.tools.walkers.variantutils.ReferenceConfidenceVariantContextMerger.combineAnnotationValues(ReferenceConfidenceVariantContextMerger.java:84)
        at org.broadinstitute.gatk.tools.walkers.variantutils.ReferenceConfidenceVariantContextMerger.merge(ReferenceConfidenceVariantContextMerger.java:206)
        at org.broadinstitute.gatk.tools.walkers.variantutils.GenotypeGVCFs.map(GenotypeGVCFs.java:303)
        at org.broadinstitute.gatk.tools.walkers.variantutils.GenotypeGVCFs.map(GenotypeGVCFs.java:136)
        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
        at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
        at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
        at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:98)
        at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:323)
        at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:123)
        at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
        at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
        at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A GATK RUNTIME ERROR has occurred (version 3.8-0-ge9d806836):
    ##### ERROR
    ##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ##### ERROR Visit our website and forum for extensive documentation and answers to 
    ##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
    ##### ERROR
    ##### ERROR MESSAGE: java.lang.Double cannot be cast to java.lang.Integer
    ##### ERROR ------------------------------------------------------------------------------------------
    

    Thanks!

  • james_lawlor (Huntsville, AL; Member)

    (Incidentally, these GVCFs were made using GATK 3.3-0-g37228af, if that makes a difference. That raises its own question of whether it's appropriate to use a newer version of GATK for genotyping, which I've asked here.)

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @james_lawlor
    Hi,

    Ah, yes I did not mention other issues in my response there, but the best thing to do is not use different versions of GATK :smile:
    Have a look at this article.

    It is true that sometimes you can get away with mixing and matching versions, but you have to be really careful. In this case, it will not work out.

    -Sheila

  • james_lawlor (Huntsville, AL; Member)

    @Sheila - Thanks! I'm now running into a different error when going back to GATK 3.3-0-g37228af:

    INFO  16:15:29,861 HelpFormatter - Executing as [email protected] on Linux 3.10.0-327.3.1.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_65-b17. 
    INFO  16:15:29,862 HelpFormatter - Date/Time: 2017/11/29 16:15:29 
    INFO  16:15:29,862 HelpFormatter - -------------------------------------------------------------------------------- 
    INFO  16:15:29,862 HelpFormatter - -------------------------------------------------------------------------------- 
    INFO  16:16:53,749 GenomeAnalysisEngine - Strictness is SILENT 
    INFO  16:16:54,369 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000 
    INFO  16:18:45,606 GATKRunReport - Uploaded run statistics report to AWS S3 
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A USER ERROR has occurred (version 3.3-0-g37228af): 
    ##### ERROR
    ##### ERROR This means that one or more arguments or inputs in your command are incorrect.
    ##### ERROR The error message below tells you what is the problem.
    ##### ERROR
    ##### ERROR If the problem is an invalid argument, please check the online documentation guide
    ##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
    ##### ERROR
    ##### ERROR Visit our website and forum for extensive documentation and answers to 
    ##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ##### ERROR
    ##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
    ##### ERROR
    ##### ERROR MESSAGE: Unable to parse header with error: For input string: "R", for input source: /gpfs/gpfs2/cooperlab/CSER_batch_calls/cser_wgs_batch_cumulative_201710/SL93455_10.g.vcf.gz
    ##### ERROR ------------------------------------------------------------------------------------------
    

    I'm still investigating why we don't see this in our other pipelines that use GATK 3.3.0, but let me know if there's anything known about this bug/error in this version that I should check.

  • Sheila (Broad Institute; Member, Broadie, Moderator)
    edited November 2017

    @james_lawlor
    Hi,

    So, yes this was a bug that was fixed in the later version :smile: However, there is a simple workaround. You can find help in this thread.

    -Sheila

  • james_lawlor (Huntsville, AL; Member)

    Thanks--I was able to isolate the samples causing the header error and have temporarily removed them.
    However, now I'm back to the original error. Looks like it's not a version incompatibility--I've verified that every GVCF was created with 3.3-0-g37228af, and I get the same output whether using -nt 1 or -nt 32.

    Is there anything else I can do to isolate this problem?
    (I'm not using the CombineGVCFs tool--is this required?)

    INFO  11:03:53,472 HelpFormatter - -------------------------------------------------------------------------------- 
    INFO  11:03:53,474 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.3-0-g37228af, Compiled 2014/10/24 01:07:22 
    INFO  11:03:53,474 HelpFormatter - Copyright (c) 2010 The Broad Institute 
    INFO  11:03:53,474 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk 
    INFO  11:03:53,478 HelpFormatter - Program Args: -T GenotypeGVCFs -R /gpfs/gpfs1/myerslab/reference/genomes/bwa-0.7.8/GRCh37.fa -nt 32 -L 10 -o new_test_32_10.vcf -V [...]
    INFO  11:03:53,486 HelpFormatter - Executing as [email protected] on Linux 3.10.0-327.3.1.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_65-b17. 
    INFO  11:03:53,487 HelpFormatter - Date/Time: 2017/11/30 11:03:53 
    INFO  11:03:53,487 HelpFormatter - -------------------------------------------------------------------------------- 
    INFO  11:03:53,487 HelpFormatter - -------------------------------------------------------------------------------- 
    INFO  11:05:04,536 GenomeAnalysisEngine - Strictness is SILENT 
    INFO  11:05:04,694 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000 
    INFO  11:06:52,253 IntervalUtils - Processing 135534747 bp from intervals 
    WARN  11:06:52,255 IndexDictionaryUtils - Track variant doesn't have a sequence dictionary built in, skipping dictionary validation 
    [...]
    INFO  11:06:52,466 MicroScheduler - Running the GATK in parallel mode with 32 total threads, 1 CPU thread(s) for each of 32 data thread(s), of 64 processors available on this machine 
    INFO  11:06:52,512 GenomeAnalysisEngine - Preparing for traversal 
    INFO  11:06:52,513 GenomeAnalysisEngine - Done preparing for traversal 
    INFO  11:06:52,514 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING] 
    INFO  11:06:52,514 ProgressMeter -                 | processed |    time |    per 1M |           |   total | remaining 
    INFO  11:06:52,514 ProgressMeter -        Location |     sites | elapsed |     sites | completed | runtime |   runtime 
    
    INFO  11:06:53,756 GenotypeGVCFs - Notice that the -ploidy parameter is ignored in GenotypeGVCFs tool as this is automatically determined by the input variant files 
    INFO  11:07:22,517 ProgressMeter -        10:28001         0.0    30.0 s      49.6 w        0.0%    40.3 h      40.3 h 
    INFO  11:08:22,519 ProgressMeter -        10:63901         0.0    90.0 s     148.8 w        0.0%    53.0 h      53.0 h 
    INFO  11:08:29,154 GATKRunReport - Uploaded run statistics report to AWS S3 
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR stack trace 
    java.lang.ClassCastException: java.lang.Double cannot be cast to java.lang.Integer
            at java.lang.Integer.compareTo(Integer.java:52)
            at java.util.ComparableTimSort.binarySort(ComparableTimSort.java:262)
            at java.util.ComparableTimSort.sort(ComparableTimSort.java:207)
            at java.util.Arrays.sort(Arrays.java:1312)
            at java.util.Arrays.sort(Arrays.java:1506)
            at java.util.ArrayList.sort(ArrayList.java:1454)
            at java.util.Collections.sort(Collections.java:141)
            at org.broadinstitute.gatk.utils.MathUtils.median(MathUtils.java:1000)
            at org.broadinstitute.gatk.utils.variant.ReferenceConfidenceVariantContextMerger.combineAnnotationValues(ReferenceConfidenceVariantContextMerger.java:73)
            at org.broadinstitute.gatk.utils.variant.ReferenceConfidenceVariantContextMerger.merge(ReferenceConfidenceVariantContextMerger.java:158)
            at org.broadinstitute.gatk.tools.walkers.variantutils.GenotypeGVCFs.map(GenotypeGVCFs.java:200)
            at org.broadinstitute.gatk.tools.walkers.variantutils.GenotypeGVCFs.map(GenotypeGVCFs.java:119)
            at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
            at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
            at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
            at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
            at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
            at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
            at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
            at org.broadinstitute.gatk.engine.executive.ShardTraverser.call(ShardTraverser.java:98)
            at java.util.concurrent.FutureTask.run(FutureTask.java:266)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
            at java.lang.Thread.run(Thread.java:745)
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A GATK RUNTIME ERROR has occurred (version 3.3-0-g37228af):
    ##### ERROR
    ##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ##### ERROR Visit our website and forum for extensive documentation and answers to 
    ##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ##### ERROR
    ##### ERROR MESSAGE: java.lang.Double cannot be cast to java.lang.Integer
    ##### ERROR ------------------------------------------------------------------------------------------
    
  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @james_lawlor
    Hi,

    Hmm. So, you get this ThreadPoolExecutor error even when not using multi-threading? That is odd. Can you check if one or some GVCFs are causing the error? You can try running on different groups of GVCFs and seeing if some groups run without error.

    Thanks,
    Sheila
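
Testing groups of GVCFs, as suggested above, amounts to a bisection over the input list. The following is a minimal Python sketch, not a GATK feature. It assumes exactly one problematic file, and it relies on a hypothetical callback `batch_fails(files)` that you would write yourself to run GenotypeGVCFs on the given files and return True when the run crashes. Because a later post in this thread reports that the bad file genotypes cleanly on its own, the sketch always keeps a known-good companion file in each test batch.

```python
from typing import Callable, List, Optional, Sequence


def find_bad_gvcf(
    files: Sequence[str],
    batch_fails: Callable[[List[str]], bool],
) -> str:
    """Bisect `files` to find the one file that makes a joint run fail.

    Assumptions (not guaranteed by GATK): exactly one file is bad, and a
    batch fails if and only if it contains the bad file plus at least one
    other file. `batch_fails` is a hypothetical user-supplied callback.
    """
    candidates = list(files)
    if len(candidates) < 4:
        # With fewer than four files the first half could be a single
        # file, which would pass even if it were the bad one.
        raise ValueError("need at least four files to bisect safely")

    known_good: Optional[str] = None
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first, second = candidates[:mid], candidates[mid:]
        # Pad the batch with a known-good file once one is available.
        batch = first if known_good is None else first + [known_good]
        if batch_fails(batch):
            candidates = first
            if known_good is None:
                known_good = second[0]  # the other half must be clean
        else:
            candidates = second
            if known_good is None:
                known_good = first[0]  # the tested half must be clean
    return candidates[0]
```

For roughly 1200 inputs this needs about 11 runs. In practice the callback would wrap the same `-T GenotypeGVCFs` command used in this thread, ideally with `-L` restricted to a small region around the crash site so that each round finishes quickly.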

  • james_lawlor (Huntsville, AL; Member)

    Hi @Sheila! I was just about to post again--I did isolate the single file that was causing a problem (binary search FTW!). The only unique thing I've found about this one file (in my total set of 30,000 per-chromosome GVCFs) is that at some point in its lifetime it was processed with bcftools, according to these lines in the header:

    ##bcftools_viewVersion=1.2+htslib-1.2.1
    ##bcftools_viewCommand=view SL86574_COMBINED.g.vcf.gz 10
    

    So my best guess is that some data value somewhere in the file got changed in a bad way? (Other possibility that I haven't fully investigated yet: a data type is specified incorrectly in the VCF header?)

    Strangely, the bad file will genotype properly on its own, but will fail as soon as any other GVCF is included. Removing the file from the batch works (and is almost finished running). I also was able to go back to the source and re-generate that one file, and I'm hoping that will fix the problem when running the full batch.

    Once the batch with the clean file starts running, I'll let you know if that fixed the problem. If so, let me know if there's anything I can provide that would help you track down any issues.
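
Both clues in the post above (the `##bcftools_` provenance lines, and the possibility of a mismatched data type in the header) can be checked across a whole batch without running GATK. The sketch below is one possible way to do it in Python. The function names are made up for illustration, and it assumes that the bgzipped GVCFs can be read with Python's standard `gzip` module.

```python
import gzip
import re
from collections import defaultdict

# Captures the line kind (INFO or FORMAT), the ID, and the declared Type.
TYPE_RE = re.compile(r"^##(INFO|FORMAT)=<ID=([^,]+),.*?Type=([^,>]+)")


def read_header(path):
    """Return the '##' meta lines from the top of a gzipped VCF."""
    lines = []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if not line.startswith("##"):
                break  # reached the #CHROM line or the first record
            lines.append(line.rstrip("\n"))
    return lines


def summarize_headers(headers):
    """Compare headers given as a dict of {path: [meta lines]}.

    Returns (foreign, conflicts):
      foreign   - {path: [lines starting with '##bcftools_']}
      conflicts - {(kind, ID): {Type: [paths]}} for every INFO/FORMAT ID
                  that is declared with more than one Type across files
    """
    foreign = {}
    types = defaultdict(lambda: defaultdict(list))
    for path, lines in headers.items():
        extra = [line for line in lines if line.startswith("##bcftools_")]
        if extra:
            foreign[path] = extra
        for line in lines:
            match = TYPE_RE.match(line)
            if match:
                kind, field_id, field_type = match.groups()
                types[(kind, field_id)][field_type].append(path)
    conflicts = {key: dict(val) for key, val in types.items() if len(val) > 1}
    return foreign, conflicts
```

A typical call would be `summarize_headers({p: read_header(p) for p in paths})`, where `paths` is your own list of GVCF files. An empty result does not prove the files are equivalent. It only rules out these two specific differences.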

  • james_lawlor (Huntsville, AL; Member)
    Accepted Answer

    Oh. This could be the cause of the trouble: In the malformed GVCF, bcftools (or some other process that occurred before I got the file) filled in genotypes.
    E.g.:
    10 60969 rs187110906 C A,<NON_REF> 269.77 . DB;DP=10;MLEAC=2,0;MLEAF=1,0;MQ=27.92;MQ0=0 GT:AD:DP:GQ:PL:SB 1/1:0,10,0:10:30:298,30,0,298,30,298:0,0,4,6

    VS.

    10 60969 rs187110906 C A,<NON_REF> . . DP=10;MQ=27.92;MQ0=0 GT:AD:DP:GQ:PL:SB ./.:0,10,0:10:30:298,30,0,298,30,298:0,0,4,6

    I expect that makes the bad input file not count as a GVCF, or at least behave strangely. I guess the lesson here is always second-guess your input data. :)

    I'm going to go ahead and mark this as closed, since my full batch with the re-generated troublesome file is happily running. Thanks for the help!
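
The two records quoted above differ in only a few fields, which is easy to miss by eye. As an illustration, here is a small Python sketch that diffs two single-sample VCF records. The helper names are hypothetical. It splits on any whitespace because the lines were pasted into the forum with spaces, whereas real VCF files are tab-delimited.

```python
def parse_record(line):
    """Split one single-sample VCF record into QUAL, INFO, and sample maps."""
    fields = line.split()
    qual, info, fmt, sample = fields[5], fields[7], fields[8], fields[9]
    info_map = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")  # flags such as DB have no value
        info_map[key] = value
    sample_map = dict(zip(fmt.split(":"), sample.split(":")))
    return {"QUAL": qual, "INFO": info_map, "SAMPLE": sample_map}


def diff_records(first, second):
    """Report where two records for the same site disagree."""
    a, b = parse_record(first), parse_record(second)
    sample_keys = sorted(set(a["SAMPLE"]) | set(b["SAMPLE"]))
    return {
        "qual": (a["QUAL"], b["QUAL"]),
        "info_only_in_first": sorted(set(a["INFO"]) - set(b["INFO"])),
        "info_only_in_second": sorted(set(b["INFO"]) - set(a["INFO"])),
        "sample": {
            key: (a["SAMPLE"].get(key), b["SAMPLE"].get(key))
            for key in sample_keys
            if a["SAMPLE"].get(key) != b["SAMPLE"].get(key)
        },
    }


# The two records exactly as quoted in this thread.
RECORD_1 = ("10 60969 rs187110906 C A,<NON_REF> 269.77 . "
            "DB;DP=10;MLEAC=2,0;MLEAF=1,0;MQ=27.92;MQ0=0 GT:AD:DP:GQ:PL:SB "
            "1/1:0,10,0:10:30:298,30,0,298,30,298:0,0,4,6")
RECORD_2 = ("10 60969 rs187110906 C A,<NON_REF> . . "
            "DP=10;MQ=27.92;MQ0=0 GT:AD:DP:GQ:PL:SB "
            "./.:0,10,0:10:30:298,30,0,298,30,298:0,0,4,6")

if __name__ == "__main__":
    print(diff_records(RECORD_1, RECORD_2))
```

For these two lines the differences are QUAL (269.77 versus a missing value), the INFO keys DB, MLEAC and MLEAF (present only in the first record), and GT (1/1 versus ./.). The AD, DP, GQ, PL and SB values are identical.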

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @james_lawlor
    Hi,

    We have seen that running other tools on VCFs produced by GATK can cause issues. I am happy you have things working now.

    -Sheila
