CombineGVCFs: java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Double

Hi

I'm trying to combine a bunch of gVCFs, generated by bcbio-nextgen, with GATK CombineGVCFs.
However, when I run the command I get the following error:

INFO  09:32:20,508 GenomeAnalysisEngine - Preparing for traversal 
INFO  09:32:20,519 GenomeAnalysisEngine - Done preparing for traversal 
INFO  09:32:20,520 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING] 
INFO  09:32:20,520 ProgressMeter -                 | processed |    time |    per 1M |           |   total | remaining 
INFO  09:32:20,521 ProgressMeter -        Location |     sites | elapsed |     sites | completed | runtime |   runtime 
WARN  09:32:21,591 StrandBiasTest - StrandBiasBySample annotation exists in input VCF header. Attempting to use StrandBiasBySample values to calculate strand bias annotation values. If no sample has the SB genotype annotation, annotation may still fail. 
WARN  09:32:21,592 StrandBiasTest - StrandBiasBySample annotation exists in input VCF header. Attempting to use StrandBiasBySample values to calculate strand bias annotation values. If no sample has the SB genotype annotation, annotation may still fail. 
##### ERROR --
##### ERROR stack trace 
java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Double
        at java.lang.Double.compareTo(Double.java:49)
        at java.util.ComparableTimSort.countRunAndMakeAscending(ComparableTimSort.java:320)
        at java.util.ComparableTimSort.sort(ComparableTimSort.java:188)
        at java.util.Arrays.sort(Arrays.java:1312)
        at java.util.Arrays.sort(Arrays.java:1506)
        at java.util.ArrayList.sort(ArrayList.java:1454)
        at java.util.Collections.sort(Collections.java:141)
        at org.broadinstitute.gatk.utils.MathUtils.median(MathUtils.java:1010)
        at org.broadinstitute.gatk.tools.walkers.variantutils.ReferenceConfidenceVariantContextMerger.combineAnnotationValues(ReferenceConfidenceVariantContextMerger.java:84)
        at org.broadinstitute.gatk.tools.walkers.variantutils.ReferenceConfidenceVariantContextMerger.merge(ReferenceConfidenceVariantContextMerger.java:206)
        at org.broadinstitute.gatk.tools.walkers.variantutils.CombineGVCFs.endPreviousStates(CombineGVCFs.java:366)
        at org.broadinstitute.gatk.tools.walkers.variantutils.CombineGVCFs.reduce(CombineGVCFs.java:254)
        at org.broadinstitute.gatk.tools.walkers.variantutils.CombineGVCFs.reduce(CombineGVCFs.java:116)
        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociReduce.apply(TraverseLociNano.java:291)
        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociReduce.apply(TraverseLociNano.java:280)
        at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:279)
        at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
        at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:98)
        at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:323)
        at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:123)
        at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
        at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
        at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 3.8-0-ge9d806836):
##### ERROR
##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: java.lang.Integer cannot be cast to java.lang.Double
##### ERROR ------------------------------------------------------------------------------------------

I'm aware that gVCFs that went through bcftools have triggered the same stack trace in the past, but I've already been able to complete several merges; only some of them fail.

Any idea on how I could fix this?

Thanks a lot
M

Sample VCF header (contig lines omitted for brevity):

##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##ALT=<ID=NON_REF,Description="Represents any possible alternative allele at this location">
##FILTER=<ID=LowQual,Description="Low quality">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=MIN_DP,Number=1,Type=Integer,Description="Minimum DP observed within the GVCF block">
##FORMAT=<ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another">
##FORMAT=<ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##FORMAT=<ID=SB,Number=4,Type=Integer,Description="Per-sample component statistics which comprise the Fisher's Exact Test to detect strand bias.">
##GATKCommandLine.HaplotypeCaller=<ID=HaplotypeCaller,Version=3.8-0-ge9d806836,Date="Wed Oct 04 07:09:47 CEST 2017",Epoch=1507093787231,CommandLineOptions="analysis_type=HaplotypeCaller input_file=[/home/projects/bcbio_annotation/exomes/HSQExomes/hg38/HSQ_RUN_055/samples_HSQ055-merged/work/align/D1308739/D1308739-sort.bam] showFullBamList=fal
se read_buffer_size=null read_filter=[BadCigar, NotPrimaryAlignment] disable_read_filter=[] intervals=[/home/projects/bcbio_annotation/exomes/HSQExomes/hg38/HSQ_RUN_055/samples_HSQ055-merged/work/gatk-haplotype/chr1/D1308739-chr1_0_16125340-regions.bed] excludeIntervals=null interval_set_rule=INTERSECTION interval_merging=ALL interval_padding
=0 reference_sequence=/home/galaxy/bcbio/genomes/Hsapiens/hg38/seq/hg38.fa nonDeterministicRandomSeed=false disableDithering=false maxRuntime=-1 maxRuntimeUnits=MINUTES downsampling_type=BY_SAMPLE downsample_to_fraction=null downsample_to_coverage=500 baq=OFF baqGapOpenPenalty=40.0 refactor_NDN_cigar_string=false fix_misencoded_quality_scores=false allow_potentially_misencoded_quality_scores=false useOriginalQualities=false defaultBaseQualities=-1 performanceLog=null BQSR=null quantize_quals=0 static_quantized_quals=null round_down_quantized=false disable_indel_quals=false emit_original_quals=false preserve_qscores_less_than=6 globalQScorePrior=-1.0 secondsBetweenProgressUpdates=10 validation_strictness=SILENT remove_program_records=false keep_program_records=false sample_rename_mapping_file=null unsafe=LENIENT_VCF_PROCESSING use_jdk_deflater=false use_jdk_inflater=false disable_auto_index_creation_and_locking_when_reading_rods=false no_cmdline_in_header=false sites_only=false never_trim_vcf_format_field=false bcf=false bam_compression=null simplifyBAM=false disable_bam_indexing=false generate_md5=false num_threads=1 num_cpu_threads_per_data_thread=1 num_io_threads=0 monitorThreadEfficiency=false num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT allow_intervals_with_unindexed_bam=false generateShadowBCF=false variant_index_type=LINEAR variant_index_parameter=128000 reference_window_stop=0 phone_home= gatk_key=null tag=NA logging_level=INFO log_to_file=null help=false version=false likelihoodCalculationEngine=PairHMM heterogeneousKmerSizeResolution=COMBO_MIN dbsnp=(RodBinding name=dbsnp source=/home/galaxy/bcbio/genomes/Hsapiens/hg38/variation/dbsnp-150.vcf.gz) dontTrimActiveRegions=false maxDiscARExtension=25 maxGGAARExtension=300 paddingAroundIndels=150 paddingAroundSNPs=20 comp=[] annotation=[FisherStrand, MappingQualityRankSumTest, MappingQualityZero, QualByDepth, ReadPosRankSumTest, 
RMSMappingQuality, BaseQualityRankSumTest, GCContent, HaplotypeScore, HomopolymerRun, DepthPerAlleleBySample, Coverage, ClippingRankSumTest, DepthPerSampleHC, StrandBiasBySample] excludeAnnotation=[ChromosomeCounts, FisherStrand, StrandOddsRatio, QualByDepth] group=[StandardAnnotation, StandardHCAnnotation] debug=false useFilteredReadsForAnnotations=false emitRefConfidence=GVCF bamOutput=null bamWriterType=CALLED_HAPLOTYPES emitDroppedReads=false disableOptimizations=false annotateNDA=false useNewAFCalculator=false heterozygosity=0.001 indel_heterozygosity=1.25E-4 heterozygosity_stdev=0.01 standard_min_confidence_threshold_for_calling=-0.0 standard_min_confidence_threshold_for_emitting=30.0 max_alternate_alleles=6 max_genotype_count=1024 max_num_PL_values=100 input_prior=[] sample_ploidy=2 genotyping_mode=DISCOVERY alleles=(RodBinding name= source=UNBOUND) contamination_fraction_to_filter=0.0 contamination_fraction_per_sample_file=null p_nonref_model=null exactcallslog=null output_mode=EMIT_VARIANTS_ONLY allSitePLs=true gcpHMM=10 pair_hmm_implementation=VECTOR_LOGLESS_CACHING phredScaledGlobalReadMismappingRate=45 noFpga=false nativePairHmmThreads=1 useDoublePrecision=false sample_name=null kmerSize=[10, 25] dontIncreaseKmerSizesForCycles=false allowNonUniqueKmersInRef=false numPruningSamples=1 recoverDanglingHeads=false doNotRecoverDanglingBranches=false minDanglingBranchLength=4 consensus=false maxNumHaplotypesInPopulation=128 errorCorrectKmers=false minPruning=2 debugGraphTransformations=false allowCyclesInKmerGraphToGeneratePaths=false graphOutput=null kmerLengthForReadErrorCorrection=25 minObservationsForKmerToBeSolid=20 GVCFGQBands=[10, 20, 30, 40, 60, 80] indelSizeToEliminateInRefModel=10 min_base_quality_score=10 includeUmappedReads=false useAllelesTrigger=false doNotRunPhysicalPhasing=false keepRG=null justDetermineActiveRegions=false dontGenotype=false dontUseSoftClippedBases=false captureAssemblyFailureBAM=false errorCorrectReads=false 
pcr_indel_model=CONSERVATIVE maxReadsInRegionPerSample=10000 minReadsPerAlignmentStart=10 mergeVariantsViaLD=false activityProfileOut=null activeRegionOut=null activeRegionIn=null activeRegionExtension=null forceActive=false activeRegionMaxSize=null bandPassSigma=null maxReadsInMemoryPerSample=30000 maxTotalReadsInMemory=10000000 maxProbPropagationDistance=50 activeProbabilityThreshold=0.002 min_mapping_quality_score=20 filter_reads_with_N_cigar=false filter_mismatching_base_and_quals=false filter_bases_not_stored=false">
##GVCFBlock0-10=minGQ=0(inclusive),maxGQ=10(exclusive)
##GVCFBlock10-20=minGQ=10(inclusive),maxGQ=20(exclusive)
##GVCFBlock20-30=minGQ=20(inclusive),maxGQ=30(exclusive)
##GVCFBlock30-40=minGQ=30(inclusive),maxGQ=40(exclusive)
##GVCFBlock40-60=minGQ=40(inclusive),maxGQ=60(exclusive)
##GVCFBlock60-80=minGQ=60(inclusive),maxGQ=80(exclusive)
##GVCFBlock80-100=minGQ=80(inclusive),maxGQ=100(exclusive)
##INFO=<ID=BaseQRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities">
##INFO=<ID=ClippingRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref number of hard clipped bases">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP Membership">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">
##INFO=<ID=DS,Number=0,Type=Flag,Description="Were any of the samples downsampled?">
##INFO=<ID=END,Number=1,Type=Integer,Description="Stop position of the interval">
##INFO=<ID=ExcessHet,Number=1,Type=Float,Description="Phred-scaled p-value for exact test of excess heterozygosity">
##INFO=<ID=GC,Number=1,Type=Float,Description="GC content around the variant (see docs for window size details)">
##INFO=<ID=HRun,Number=1,Type=Integer,Description="Largest Contiguous Homopolymer Run of Variant Allele In Either Direction">
##INFO=<ID=HaplotypeScore,Number=1,Type=Float,Description="Consistency of the site with at most two segregating haplotypes">
##INFO=<ID=InbreedingCoeff,Number=1,Type=Float,Description="Inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation">
##INFO=<ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed">
##INFO=<ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">
##INFO=<ID=MQ,Number=1,Type=Float,Description="RMS Mapping Quality">
##INFO=<ID=MQ0,Number=1,Type=Integer,Description="Total Mapping Quality Zero Reads">
##INFO=<ID=MQRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities">
##INFO=<ID=RAW_MQ,Number=1,Type=Float,Description="Raw data for RMS Mapping Quality">
##INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias">
##contig=<ID=chr1,length=248956422>
...
##contig=<ID=HLA-DRB1*16:02:01,length=11005>
##reference=file:///home/galaxy/bcbio/genomes/Hsapiens/hg38/seq/hg38.fa
##bcftools_concatVersion=1.5+htslib-1.5
##bcftools_concatCommand=concat --allow-overlaps -O z --file-list /home/projects/bcbio_annotation/exomes/HSQExomes/hg38/HSQ_RUN_055/samples_HSQ055-merged/work/gatk-haplotype/D1308739-files.list -o /tmp/bcbio/tmpWDKfKz/D1308739.vcf.gz; Date=Wed Oct  4 22:37:37 2017
##bcftools_viewVersion=1.5+htslib-1.5
##bcftools_viewCommand=view -h /home/projects/bcbio_annotation/exomes/HSQExomes/hg38/HSQ_RUN_055/samples_HSQ055-merged/final/D1308739/D1308739-gatk-haplotype.vcf.gz; Date=Wed Oct 18 09:19:18 2017
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  D1308739
Answers

  • SkyWarrior (Turkey, Member ✭✭✭)

    I could not spot the --EMIT_REFERENCE_CONFIDENCE parameter in the VCF header. Are you sure these are g.vcf files?

  • matdmset (Ghent, Member)

    Hi,

    Thanks for the reply.
    Yes, I'm 100% sure these are gVCFs.

    M

  • SkyWarrior (Turkey, Member ✭✭✭)

    Is it possible that bcftools messed with something during the concat step?

    GenotypeGVCFs can take a file.list to process all g.vcf files directly.

  • matdmset (Ghent, Member)

    My guess is that bcftools does have something to do with the error; that has been the case in the past. What I'm trying to figure out is whether I can
    a) fix some formatting in the files, or
    b) use some kind of CLI parameter
    to make CombineGVCFs work.
    I'm aware that GenotypeGVCFs can process multiple files, but I'm dealing with 700+ samples, so I'm first creating batches with CombineGVCFs, as per the Best Practices.

    Cheers
    M
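
The batching described in this post could be scripted roughly as follows. This is only a sketch: the batch size, the file names, and the jar path are hypothetical placeholders, and the command layout assumes the GATK 3 style invocation (`-T CombineGVCFs -R ... --variant ... -o ...`). The function only builds the argument list; it does not run GATK.

```python
def batches(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def combine_command(gvcfs, reference, output, jar="GenomeAnalysisTK.jar"):
    """Build a GATK 3 style CombineGVCFs argument list (jar path is a placeholder)."""
    cmd = ["java", "-jar", jar, "-T", "CombineGVCFs", "-R", reference]
    for path in gvcfs:
        cmd += ["--variant", path]
    cmd += ["-o", output]
    return cmd


# Hypothetical sample list and batch size, for illustration only.
sample_gvcfs = ["sample%03d.g.vcf.gz" % i for i in range(1, 8)]
commands = [
    combine_command(batch, "hg38.fa", "batch%02d.g.vcf.gz" % n)
    for n, batch in enumerate(batches(sample_gvcfs, 3), start=1)
]
for cmd in commands:
    print(" ".join(cmd))
```

Each resulting list can be passed to `subprocess.run` once the paths point at real files; the combined batch files are then the inputs to GenotypeGVCFs.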

  • matdmset (Ghent, Member)

    Is it possible that there's a tag in the header that has an Integer type, but has double type values for some reason? Or is it the other way around?
    If so, wouldn't changing the type in the header fix the issue? Or am I looking in the wrong direction here?
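
One way to check the question raised here is to scan a gVCF for INFO keys that the header declares as `Type=Float` but whose values are written without a decimal part. The sketch below is a hypothetical diagnostic, not part of GATK or bcftools. It assumes an uncompressed, tab-separated VCF passed in as an iterable of text lines, and it only looks at the INFO column.

```python
import re
from collections import Counter

# Matches header lines such as ##INFO=<ID=MQ,Number=1,Type=Float,...>
FLOAT_INFO = re.compile(r"^##INFO=<ID=([^,]+),Number=[^,]+,Type=Float")
INTEGER_LIKE = re.compile(r"-?\d+")


def find_integer_floats(lines):
    """Count, per Float-typed INFO key, the values written as bare integers."""
    float_ids = set()
    hits = Counter()
    for line in lines:
        if line.startswith("##"):
            match = FLOAT_INFO.match(line)
            if match:
                float_ids.add(match.group(1))
            continue
        if line.startswith("#"):
            continue  # the #CHROM column header
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # not a complete VCF record
        for entry in cols[7].split(";"):
            key, sep, value = entry.partition("=")
            if sep and key in float_ids:
                count = sum(1 for v in value.split(",") if INTEGER_LIKE.fullmatch(v))
                if count:
                    hits[key] += count
    return hits
```

Running it on each input file, for example `find_integer_floats(open("sample.g.vcf"))` with a placeholder file name, would show which files and which keys are candidates for the mismatch.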

  • bgrenier (France, Member)

    Hi,

    Every time I have seen this message, it was due to bcftools, which can change some float values to an integer representation (e.g. before bcftools: MQ=31.0; after bcftools: MQ=31).

    The fact that GATK is very strict on this point (40.0 is treated as a float while 40 is not) has some advantages and some drawbacks. I hope this problem will be resolved in GATK4, because bcftools is really useful and widely used when dealing with VCF files.

    GitHub issue #3734 (filed by Sheila; state: closed; closed by lbergelson)
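
If the cause is what this post describes (a Float-typed INFO value such as MQ written as `31` instead of `31.0`), one possible workaround is to rewrite those values before merging. The following is a minimal sketch under that assumption. The helper names are hypothetical, it handles only the INFO column of uncompressed VCF text, and it has not been validated against GATK 3.8.

```python
import re

FLOAT_INFO = re.compile(r"^##INFO=<ID=([^,]+),Number=[^,]+,Type=Float")
INTEGER_LIKE = re.compile(r"-?\d+")


def float_info_ids(header_lines):
    """Collect the IDs of INFO fields that the header declares as Type=Float."""
    ids = set()
    for line in header_lines:
        match = FLOAT_INFO.match(line)
        if match:
            ids.add(match.group(1))
    return ids


def fix_info(info, float_ids):
    """Append '.0' to integer-looking values of Float INFO keys (MQ=31 -> MQ=31.0)."""
    fixed = []
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        if sep and key in float_ids:
            value = ",".join(
                v + ".0" if INTEGER_LIKE.fullmatch(v) else v
                for v in value.split(",")
            )
        fixed.append(key + sep + value)
    return ";".join(fixed)


def fix_line(line, float_ids):
    """Return a VCF line with its INFO column normalised; header lines pass through."""
    if line.startswith("#"):
        return line
    body = line.rstrip("\n")
    cols = body.split("\t")
    if len(cols) > 7:
        cols[7] = fix_info(cols[7], float_ids)
    return "\t".join(cols) + line[len(body):]
```

A caller would read the header lines first to build `float_ids`, then stream every line through `fix_line` into a new file and re-index it before handing it to CombineGVCFs. Re-calling the affected samples, as chosen later in this thread, avoids editing the files at all.
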
  • matdmset (Ghent, Member)

    @bgrenier said:
    Hi,

    Every time I have seen this message, it was due to bcftools, which can change some float values to an integer representation (e.g. before bcftools: MQ=31.0; after bcftools: MQ=31).

    The fact that GATK is very strict on this point (40.0 is treated as a float while 40 is not) has some advantages and some drawbacks. I hope this problem will be resolved in GATK4, because bcftools is really useful and widely used when dealing with VCF files.

    Seconded!

  • matdmset (Ghent, Member)

    Is there an option to make GATK more lenient here?

  • matdmset (Ghent, Member)

    Hi Sheila,

    Thanks for the update! We'll re-call the faulty samples for now and switch to GATK4 as soon as it's ready for prime time.

    Thanks again
    M

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @pd3
    Hi,

    Thank you for posting a solution :smiley:

    -Sheila
