VariantAnnotator error with Freebayes merge: Must initialize the cache of allele anyploid indices...

mmterpstrammterpstra NetherlandsMember

Main question (for the community):
How to solve / avoid getting the error Must initialize the cache of allele anyploid indices for ploidy 1 with the VariantAnnotator (3.7-0) and freebayes1.1.0 .

The best solution might be Do not use third party tools! as the mod will say. Thus I might not need a perfect solution and so experiences merging variants from different sources are welcome. My workflow producing the results: (BWA->AddOrReplaceReadgroups-> MergeSamFiles->MarkDuplicates-> IndelrealignmentKnownIndels->(Freebayes + HC) -> CombineVariants -> snpEff+ VariantAnnotator). Optionally with a VariantsToAllelicPrimitives after freebayes.

The usual stuff

stack trace,an afflicting record and my solution

java -Xmx8g -Djava.io.tmpdir=/scratch/ -XX:+UseConcMarkSweepGC -XX:ParallelGCThreads=1 -jar /data/GATK/3.7-0-foss-2016a-Java-1.8.0_121/GenomeAnalysisTK.jar -T VariantAnnotator -R /data/ftp.broadinstitute.org/bundle/2.8/b37//human_g1k_v37_decoy.fasta --dbsnp /data/ftp.broadinstitute.org/bundle/2.8/b37//dbsnp_138.b37.vcf -I 1.bam -I 2.bam -I 3.bam -I 4.bam -I 5.bam -I 6.bam --excludeAnnotation MVLikelihoodRatio --excludeAnnotation TechnologyComposition --excludeAnnotation DepthPerSampleHC --excludeAnnotation PercentNBaseSolid --excludeAnnotation PossibleDeNovo --excludeAnnotation AS_RMSMappingQuality --excludeAnnotation ClusteredReadPosition --filter_bases_not_stored --useAllAnnotations --snpEffFile /scratch/snpEffGatk.intermediate.vcf --resource:cosmic,VCF /data/sftp-cancer.sanger.ac.uk/files/grch37/cosmic//v72/VCF/CosmicCodingMuts.bundle_2.8_b37.vcf.gz -E cosmic.ID --resource:1000gPhase1Snps,vcf /data//ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20130502/ALL.wgs.phase3_shapeit2_mvncall_integrated_v5b.20130502.sites.vcf.gz -E 1000gPhase1Snps.AF -E 1000gPhase1Snps.AFR_AF -E 1000gPhase1Snps.AMR_AF -E 1000gPhase1Snps.ASN_AF -E 1000gPhase1Snps.EUR_AF --resource:1000gPhase1Indels,vcf /data/ftp.broadinstitute.org/bundle/2.8/b37//1000G_phase1.indels.b37.vcf -E 1000gPhase1Indels.AF -E 1000gPhase1Indels.AFR_AF -E 1000gPhase1Indels.AMR_AF -E 1000gPhase1Indels.ASN_AF -E 1000gPhase1Indels.EUR_AF --resource:dbSnp,vcf /data/ftp.broadinstitute.org/bundle/2.8/b37//dbsnp_138.b37.vcf -E dbSnp.CAF -E dbSnp.COMMON -E dbSnp.dbSNPBuildID --resource:exac,vcf /data//ExAC/release0.3/ExAC.r0.3.sites.vep.vcf.gz -E exac.AC -E exac.AN -E exac.AC_Adj -E exac.AC_Hom -E exac.AC_Het -E exac.AC_Hemi -E exac.AN_AFR -E exac.AC_AFR -E exac.AC_AMR -E exac.AN_AMR -E exac.AC_EAS -E exac.AN_EAS -E exac.AC_FIN -E exac.AN_FIN -E exac.AC_OTH -E exac.AN_OTH -E exac.AN_SAS -E exac.AC_SAS -V:input,VCF /scratch/3_178927848_178927848.vcf --out /scratch/3_178927848_178927848.vcf -L /scratch/test.vcf &>/dev/stdout  | tail -n 150
INFO  10:10:06,670 HelpFormatter - ---------------------------------------------------------------------------------- 
INFO  10:10:06,672 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.7-0-gcfedb67, Compiled 2016/12/12 11:21:18 
INFO  10:10:06,672 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute 
INFO  10:10:06,672 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk 
INFO  10:10:06,672 HelpFormatter - [Thu Jun 08 10:10:06 CEST 2017] Executing on Linux 2.6.32-642.6.2.el6.x86_64 amd64 
INFO  10:10:06,672 HelpFormatter - Java HotSpot(TM) 64-Bit Server VM 1.8.0_121-b13 
INFO  10:10:06,675 HelpFormatter - Program Args: -T VariantAnnotator -R /data/ftp.broadinstitute.org/bundle/2.8/b37//human_g1k_v37_decoy.fasta --dbsnp /data//ftp.broadinstitute.org/bundle/2.8/b37//dbsnp_138.b37.vcf -I /1.bam -I 2.bam -I 3.bam -I 4.bam -I 5.bam -I 6.bam --excludeAnnotation MVLikelihoodRatio --excludeAnnotation TechnologyComposition --excludeAnnotation DepthPerSampleHC --excludeAnnotation PercentNBaseSolid --excludeAnnotation PossibleDeNovo --excludeAnnotation AS_RMSMappingQuality --excludeAnnotation ClusteredReadPosition --filter_bases_not_stored --useAllAnnotations --snpEffFile /scratch/npEffGatk.intermediate.vcf --resource:cosmic,VCF /data/sftp-cancer.sanger.ac.uk/files/grch37/cosmic//v72/VCF/CosmicCodingMuts.bundle_2.8_b37.vcf.gz -E cosmic.ID --resource:1000gPhase1Snps,vcf /data/ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20130502/ALL.wgs.phase3_shapeit2_mvncall_integrated_v5b.20130502.sites.vcf.gz -E 1000gPhase1Snps.AF -E 1000gPhase1Snps.AFR_AF -E 1000gPhase1Snps.AMR_AF -E 1000gPhase1Snps.ASN_AF -E 1000gPhase1Snps.EUR_AF --resource:1000gPhase1Indels,vcf /data/ftp.broadinstitute.org/bundle/2.8/b37//1000G_phase1.indels.b37.vcf -E 1000gPhase1Indels.AF -E 1000gPhase1Indels.AFR_AF -E 1000gPhase1Indels.AMR_AF -E 1000gPhase1Indels.ASN_AF -E 1000gPhase1Indels.EUR_AF --resource:dbSnp,vcf /data/ftp.broadinstitute.org/bundle/2.8/b37/dbsnp_138.b37.vcf -E dbSnp.CAF -E dbSnp.COMMON -E dbSnp.dbSNPBuildID --resource:exac,vcf /data/ExAC/release0.3/ExAC.r0.3.sites.vep.vcf.gz -E exac.AC -E exac.AN -E exac.AC_Adj -E exac.AC_Hom -E exac.AC_Het -E exac.AC_Hemi -E exac.AN_AFR -E exac.AC_AFR -E exac.AC_AMR -E exac.AN_AMR -E exac.AC_EAS -E exac.AN_EAS -E exac.AC_FIN -E exac.AN_FIN -E exac.AC_OTH -E exac.AN_OTH -E exac.AN_SAS -E exac.AC_SAS -V:input,VCF /scratch/3_178927848_178927848.vcf --out /scratch/3_178927848_178927848.vcf -L /scratch/3_178927848_178927848.vcf
INFO  10:10:06,680 HelpFormatter - Executing as [email protected] on Linux 2.6.32-642.6.2.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_121-b13. 
INFO  10:10:06,680 HelpFormatter - Date/Time: 2017/06/08 10:10:06 
INFO  10:10:06,680 HelpFormatter - ---------------------------------------------------------------------------------- 
INFO  10:10:06,680 HelpFormatter - ---------------------------------------------------------------------------------- 
INFO  10:10:06,711 GenomeAnalysisEngine - Strictness is SILENT 
INFO  10:10:06,790 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 250 
INFO  10:10:06,796 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
INFO  10:10:07,894 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 1.10 
INFO  10:10:10,340 IntervalUtils - Processing 6042 bp from intervals 
WARN  10:10:10,345 IndexDictionaryUtils - Track cosmic doesn't have a sequence dictionary built in, skipping dictionary validation 
WARN  10:10:10,345 IndexDictionaryUtils - Track 1000gPhase1Snps doesn't have a sequence dictionary built in, skipping dictionary validation 
WARN  10:10:10,346 IndexDictionaryUtils - Track exac doesn't have a sequence dictionary built in, skipping dictionary validation 
INFO  10:10:10,483 GenomeAnalysisEngine - Preparing for traversal over 6 BAM files 
INFO  10:10:10,613 GenomeAnalysisEngine - Done preparing for traversal 
INFO  10:10:10,613 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING] 
INFO  10:10:10,614 ProgressMeter -                 | processed |    time |    per 1M |           |   total | remaining 
INFO  10:10:10,614 ProgressMeter -        Location |     sites | elapsed |     sites | completed | runtime |   runtime 
WARN  10:10:11,001 AS_RankSumTest - Allele-specific annotations can only be used with HaplotypeCaller, CombineGVCFs and GenotypeGVCFs -- no data will be output 
WARN  10:10:11,002 AS_StrandBiasTest - Allele-specific annotations can only be used with HaplotypeCaller, CombineGVCFs and GenotypeGVCFs -- no data will be output 
WARN  10:10:11,002 AS_InbreedingCoeff - Annotation will not be calculated. InbreedingCoeff requires at least 10 unrelated samples. 
WARN  10:10:11,003 AS_RankSumTest - Allele-specific annotations can only be used with HaplotypeCaller, CombineGVCFs and GenotypeGVCFs -- no data will be output 
WARN  10:10:11,003 AS_RankSumTest - Allele-specific annotations can only be used with HaplotypeCaller, CombineGVCFs and GenotypeGVCFs -- no data will be output 
WARN  10:10:11,003 AS_RankSumTest - Allele-specific annotations can only be used with HaplotypeCaller, CombineGVCFs and GenotypeGVCFs -- no data will be output 
WARN  10:10:11,003 AS_RankSumTest - Allele-specific annotations can only be used with HaplotypeCaller, CombineGVCFs and GenotypeGVCFs -- no data will be output 
WARN  10:10:11,003 AS_StrandBiasTest - Allele-specific annotations can only be used with HaplotypeCaller, CombineGVCFs and GenotypeGVCFs -- no data will be output 
WARN  10:10:11,004 StrandBiasTest - StrandBiasBySample annotation exists in input VCF header. Attempting to use StrandBiasBySample values to calculate strand bias annotation values. If no sample has the SB genotype annotation, annotation may still fail. 
WARN  10:10:11,004 InbreedingCoeff - Annotation will not be calculated. InbreedingCoeff requires at least 10 unrelated samples. 
WARN  10:10:11,004 StrandBiasTest - StrandBiasBySample annotation exists in input VCF header. Attempting to use StrandBiasBySample values to calculate strand bias annotation values. If no sample has the SB genotype annotation, annotation may still fail. 
WARN  10:10:28,397 BaseQualitySumPerAlleleBySample - Annotation will not be calculated, can only be called from MuTect2, not org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotator 
WARN  10:10:28,397 OxoGReadCounts - Annotation will not be calculated, can only be called from MuTect2, not org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotator 
WARN  10:10:28,398 AnnotationUtils - SAC annotation will not be calculated, must be called from HaplotypeCaller or MuTect2, not VariantAnnotator 
WARN  10:10:28,398 AnnotationUtils - SB annotation will not be calculated, must be called from HaplotypeCaller or MuTect2, not VariantAnnotator 
WARN  10:10:28,429 HaplotypeScore - Annotation will not be calculated, must be called from UnifiedGenotyper, not org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotator 
WARN  10:10:28,429 HardyWeinberg - Too few genotypes 
WARN  10:10:28,435 SpanningDeletions - Annotation will not be calculated, must be called from UnifiedGenotyper, not org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotator 
WARN  10:10:28,437 TransmissionDisequilibriumTest - Transmission disequilibrium test annotation requires a valid ped file be passed in. 
##### ERROR --
##### ERROR stack trace 
java.lang.IllegalStateException: Must initialize the cache of allele anyploid indices for ploidy 1
    at htsjdk.variant.variantcontext.GenotypeLikelihoods.getAlleles(GenotypeLikelihoods.java:532)
    at org.broadinstitute.gatk.utils.variant.GATKVariantContextUtils.getLikelihoodIndexes(GATKVariantContextUtils.java:681)
    at org.broadinstitute.gatk.utils.variant.GATKVariantContextUtils.determineLikelihoodIndexesToUse(GATKVariantContextUtils.java:639)
    at org.broadinstitute.gatk.utils.variant.GATKVariantContextUtils.subsetAlleles(GATKVariantContextUtils.java:610)
    at org.broadinstitute.gatk.utils.variant.GATKVariantContextUtils.splitVariantContextToBiallelics(GATKVariantContextUtils.java:1071)
    at org.broadinstitute.gatk.utils.variant.GATKVariantContextUtils.splitVariantContextToBiallelics(GATKVariantContextUtils.java:1019)
    at org.broadinstitute.gatk.utils.variant.GATKVariantContextUtils.splitVariantContextToBiallelics(GATKVariantContextUtils.java:1000)
    at org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotatorEngine.getMinRepresentationBiallelics(VariantAnnotatorEngine.java:535)
    at org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotatorEngine.annotateExpressions(VariantAnnotatorEngine.java:452)
    at org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotatorEngine.annotateContext(VariantAnnotatorEngine.java:226)
    at org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotatorEngine.annotateContext(VariantAnnotatorEngine.java:212)
    at org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotator.map(VariantAnnotator.java:355)
    at org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotator.map(VariantAnnotator.java:112)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
    at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:98)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:316)
    at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:123)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
    at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 3.7-0-gcfedb67):
##### ERROR
##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Must initialize the cache of allele anyploid indices for ploidy 1
##### ERROR ------------------------------------------------------------------------------------------


Afflicting record (multiple present in VCF)

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  1   2   3   4   5   6
3   178927848   rs67871207  AT  A   61.21   .   AC=4;AF=1.00;AN=4;DB;DP=5;ExcessHet=3.0103;FS=0.000;MLEAC=4;MLEAF=1.00;MQ=60.00;QD=20.40;SOR=3.258;set=GATK GT:AD:DP:GQ:PL  ./../.  1/1:0,1:1:3:28,3,0  ./. ./. 1/1:0,3:3:9:66,9,0
3   178927848   .   ATTTTTTTTTTTTA  ATTTTTTTTTTTA,ATTTTTTTTTTA,ATTTTTTTTCTTTA   117.54  .   AB=0.666667,0.333333,0;ABP=3.73412,3.73412,0;AC=3,1,2;AF=0.500,0.167,0.333;AN=6;AO=3,1,1;CIGAR=1M1D12M,1M2D11M,9M1X4M;DP=5;DPB=4.64286;DPRA=2,3,1;EPP=3.73412,5.18177,5.18177;EPPR=0;GTI=1;LEN=1,2,1;MEANALT=1.5,2,1;MQM=60,60,60;MQMR=0;NS=3;NUMALT=3;ODDS=0.405465;PAIRED=0,0,0;PAIREDR=0;PAO=0,0,0;PQA=0,0,0;PQR=0;PRO=0;QA=86,36,36;QR=0;RO=0;RPL=1,0,0;RPP=3.73412,5.18177,5.18177;RPPR=0;RPR=2,1,1;RUN=1,1,1;SAF=3,1,1;SAP=9.52472,5.18177,5.18177;SAR=0,0,0;SRF=0;SRP=0;SRR=0;TYPE=del,del,snp;set=freebayes;technology.illumina=1,1,1   GT:AD:AO:DP:PL:QA:QR:RO .   .   1/1:0,1,0,0:1,0,0:1:36,3,0,36,3,36,36,3,36,36:36,0,0:0:0    .   3/3:0,0,0,1:0,0,1:1:36,36,36,36,36,36,3,3,3,0:0,0,36:0:0    1/2:0,2,1,0:2,1,0:3:71,33,27,41,0,38,71,33,41,71:50,36,0:0:0

My solution

#keep only simple variants not multiallelic
perl -i.bak -wpe 'if(not((m/^#/) ||  (m/TYPE=complex;/||m/TYPE=snp;/||m/TYPE=del;/||m/TYPE=ins;/||m/TYPE=mnp;/))){$_="";}' probematic.vcf
#CombineVariants with  -genotypeMergeOptions PRIORITIZE -priority GATK,freebayes  --filteredrecordsmergetype KEEP_UNCONDITIONAL
#select the GATK records if two records have the same CHROM/POS
#VariantAnnotator....

Best Answer

Answers

  • mmterpstrammterpstra NetherlandsMember

    Thanks for your comment. If I am the only one with the problem the error is not high prio. The debugging will take more time than the workaround in place. Will look at it if other people experience the same problem (so please respond if you have the same problem).

  • mmterpstrammterpstra NetherlandsMember

    For people looking for a quick fix these errors are triggered by the '.' sample fields this assumes a ploidy of 1 which triggers the error. So cleanup Freebayes for this kind of data before running it though the GATK.

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @mmterpstra
    Hi,

    Thank you for posting your suggestion/solution :smiley:

    -Sheila

Sign In or Register to comment.