
Error in VariantEval

akiezun Posts: 13 Member
edited August 2012 in Ask the GATK team

Hi, I'm running 1.6-512-gafa4399 (unstable).

When running VariantEval, I got an error: "Couldn't find state for 88 at node org.broadinstitute.sting.gatk.walkers.varianteval.stratifications.manager.StratNode"

##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace 
org.broadinstitute.sting.utils.exceptions.ReviewedStingException: Couldn't find state for 88 at node org.broadinstitute.sting.gatk.walkers.varianteval.stratifications.manager.StratNode@1a7df60a
at org.broadinstitute.sting.gatk.walkers.varianteval.stratifications.manager.StratNode.find(StratNode.java:117)
at org.broadinstitute.sting.gatk.walkers.varianteval.stratifications.manager.StratificationManager.getKeys(StratificationManager.java:208)
at org.broadinstitute.sting.gatk.walkers.varianteval.stratifications.manager.StratificationManager.values(StratificationManager.java:260)
at org.broadinstitute.sting.gatk.walkers.varianteval.VariantEvalWalker.getEvaluationContexts(VariantEvalWalker.java:480)
at org.broadinstitute.sting.gatk.walkers.varianteval.VariantEvalWalker.map(VariantEvalWalker.java:406)
at org.broadinstitute.sting.gatk.walkers.varianteval.VariantEvalWalker.map(VariantEvalWalker.java:89)
at org.broadinstitute.sting.gatk.traversals.TraverseLoci.traverse(TraverseLoci.java:65)
at org.broadinstitute.sting.gatk.traversals.TraverseLoci.traverse(TraverseLoci.java:18)
at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:63)
at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:257)
at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)
at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:92)

This happened on running VariantEval like this:

-T VariantEval -L /xchip/cga/reference/hg19/whole_exome_agilent_1.1_refseq_plus_3_boosters_plus_10bp_padding_minus_mito.Homo_sapiens_assembly19.targets.interval_list -R /seq/references/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.fasta -nt 1 -o myFile.byAC.eval -eval myFile.vcf -D dbsnp_132_b37.leftAligned.vcf -gold /humgen/gsa-hpprojects/GATK/bundle/current/b37/Mills_and_1000G_gold_standard.indels.b37.sites.vcf -ST EvalRod -ST CompRod -ST Novelty -ST FunctionalClass -ST AlleleCount -noST -EV TiTvVariantEvaluator -EV CountVariants -EV CompOverlap -EV IndelSummary -noEV 

Do you know what may be causing this error? If it helps, I can distill a small failing vcf. ./adam


Best Answer

  • ebanks Posts: 682 mod
    Answer ✓

    Hi Adam,

    Thanks for the report. I'm about to push in a patch to SelectVariants to handle selecting with MLEAC (and MLEAF) and to VariantEval to throw a useful error when the AC (or MLEAC) is not a possible value. This will be available for GATK 2.1.

    Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT

Answers

  • Mark_DePristo Posts: 153 Administrator, GATK Developer admin

    Yes, a tiny test file that reproduces the error would be very much appreciated.

    -- Mark A. DePristo, Ph.D. Co-Director, Medical and Population Genetics Broad Institute of MIT and Harvard

  • akiezun Posts: 13 Member

    Here's one

    ##fileformat=VCFv4.1
    #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  P1
    1       877831  .       T       C       10000   PASS    MLEAC=88        GT:AD:DP:GQ:PL  1/1:0,1:1:3:37,3,0
    

    The POS and MLEAC values both seem to be relevant (i.e., I can't change either of them without making the error go away).

    ./adam

  • ebanks Posts: 682 GATK Developer mod

    I think I know what's going on here. Did you subset your VCF to a single sample but keep the original MLEAC? The tool is trying to use the MLEAC for the AlleleCount stratification, but with only 1 sample the possible values should only be 0, 1, or 2 (and not 88). Knowing how you produced this file would help a lot.

    Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT
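
The bound described above can be checked before running VariantEval. The following is a minimal sketch, not part of GATK: the function name `find_impossible_allele_counts` is hypothetical, and it assumes diploid samples, a tab-delimited VCF, and integer AC/MLEAC values in the INFO column. It flags any record whose AC or MLEAC exceeds ploidy times the number of samples in the header.

```python
def find_impossible_allele_counts(vcf_lines, ploidy=2):
    """Return (chrom, pos, key, value, max_allowed) for out-of-range AC/MLEAC.

    Assumption (not from GATK): every sample has the same ploidy, so the
    largest possible allele count is ploidy * number_of_samples.
    """
    n_samples = None
    problems = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Columns after the 9 fixed ones (CHROM..FORMAT) are samples.
            n_samples = max(len(fields) - 9, 0)
            continue
        if n_samples is None:
            raise ValueError("no #CHROM header line before the first record")
        if n_samples == 0:
            continue  # sites-only file: no genotypes to bound the count
        max_allowed = ploidy * n_samples
        for entry in fields[7].split(";"):
            key, _, value = entry.partition("=")
            if key not in ("AC", "MLEAC"):
                continue
            for v in value.split(","):  # one value per ALT allele
                if v.isdigit() and int(v) > max_allowed:
                    problems.append(
                        (fields[0], int(fields[1]), key, int(v), max_allowed)
                    )
    return problems


# The single-sample record from this thread, rebuilt with explicit tabs.
header = "\t".join(
    ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT", "P1"]
)
record = "\t".join(
    ["1", "877831", ".", "T", "C", "10000", "PASS", "MLEAC=88",
     "GT:AD:DP:GQ:PL", "1/1:0,1:1:3:37,3,0"]
)
print(find_impossible_allele_counts(["##fileformat=VCFv4.1", header, record]))
# prints [('1', 877831, 'MLEAC', 88, 2)]
```

With one diploid sample the largest possible count is 2, so MLEAC=88 is reported. To run it on a real file, pass an open file handle as `vcf_lines`.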

  • akiezun Posts: 13 Member

    Hmm, this tiny file I made by subsetting to 1 line and 1 sample. But the original file was not subset and had the same exact error.

    The file was created using something pretty close to the standard HybridSelectionPipeline Queue script (removing the snpEff annotation step but including calling with 1kg samples). That is, BAMs were reduced, then SNPs and indels were called together from 10 samples and 50 random 1kg samples, then SNPs were VQSR'd and indels were filtered, then SNPs and indels were combined, then the 1kg samples were removed. Here are the options extracted from the VCF file. I'm not sure which part is important here, so I'm pasting them all. (Now that I'm writing it, I bet it's the 1kg samples and their removal that is causing the error.)

    ##ApplyRecalibration="analysis_type=ApplyRecalibration input_file=[] read_buffer_size=null phone_home=STANDARD gatk_key=null read_filter=[] intervals=[/xchip/cga/reference/hg19/whole_exome_agilent_1.1_refseq_plus_3_boosters_plus_10bp_padding_minus_mito.Homo_sapiens_assembly19.targets.interval_list] excludeIntervals=null interval_set_rule=UNION interval_merging=ALL interval_padding=50 reference_sequence=/seq/references/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.fasta nonDeterministicRandomSeed=false downsampling_type=BY_SAMPLE downsample_to_fraction=null downsample_to_coverage=1000 baq=OFF baqGapOpenPenalty=40.0 performanceLog=null useOriginalQualities=false BQSR=null quantize_quals=0 no_indel_quals=false defaultBaseQualities=-1 validation_strictness=SILENT unsafe=null num_threads=1 num_cpu_threads=null num_io_threads=null num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT allow_intervals_with_unindexed_bam=false generateShadowBCF=false useSlowGenotypes=false repairVCFHeader=null logging_level=INFO log_to_file=null help=false input=[(RodBinding name=input source=/xchip/cga/gdac-prod/cga/jobResults/UnifiedGenotyperComplete/PR_MyData_Capture_10/1626687/PR_MyData_Capture_10.snps.unfiltered.vcf)] recal_file=(RodBinding name=recal_file source=/xchip/cga/gdac-prod/cga/jobResults/UnifiedGenotyperComplete/PR_MyData_Capture_10/1626687/PR_MyData_Capture_10.snps.recal) tranches_file=/xchip/cga/gdac-prod/cga/jobResults/UnifiedGenotyperComplete/PR_MyData_Capture_10/1626687/PR_MyData_Capture_10.snps.tranches out=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub no_cmdline_in_header=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub sites_only=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub ts_filter_level=98.5 ignore_filter=null mode=SNP filter_mismatching_base_and_quals=false"

    ##CombineVariants="analysis_type=CombineVariants input_file=[] read_buffer_size=null phone_home=STANDARD gatk_key=null read_filter=[] intervals=[/xchip/cga/reference/hg19/whole_exome_agilent_1.1_refseq_plus_3_boosters_plus_10bp_padding_minus_mito.Homo_sapiens_assembly19.targets.interval_list] excludeIntervals=null interval_set_rule=UNION interval_merging=ALL interval_padding=50 reference_sequence=/seq/references/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.fasta nonDeterministicRandomSeed=false downsampling_type=BY_SAMPLE downsample_to_fraction=null downsample_to_coverage=1000 baq=OFF baqGapOpenPenalty=40.0 performanceLog=null useOriginalQualities=false BQSR=null quantize_quals=0 no_indel_quals=false defaultBaseQualities=-1 validation_strictness=SILENT unsafe=null num_threads=1 num_cpu_threads=null num_io_threads=null num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT allow_intervals_with_unindexed_bam=false generateShadowBCF=false useSlowGenotypes=false repairVCFHeader=null logging_level=INFO log_to_file=null help=false variant=[(RodBinding name=indels source=/xchip/cga/gdac-prod/cga/jobResults/UnifiedGenotyperComplete/PR_MyData_Capture_10/1626687/PR_MyData_Capture_10.indels.filtered.vcf), (RodBinding name=snps source=/xchip/cga/gdac-prod/cga/jobResults/UnifiedGenotyperComplete/PR_MyData_Capture_10/1626687/PR_MyData_Capture_10.snps.recalibrated.vcf)] out=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub no_cmdline_in_header=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub sites_only=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub genotypemergeoption=PRIORITIZE filteredrecordsmergetype=KEEP_IF_ANY_UNFILTERED multipleallelesmergetype=BY_TYPE rod_priority_list=indels,snps printComplexMerges=false filteredAreUncalled=false minimalVCF=false setKey=set assumeIdenticalSamples=true minimumN=1 suppressCommandLineHeader=false mergeInfoWithMaxAC=false 
filter_mismatching_base_and_quals=false"

    ##SelectVariants="analysis_type=SelectVariants input_file=[] read_buffer_size=null phone_home=STANDARD gatk_key=null read_filter=[] intervals=[/xchip/cga/reference/hg19/whole_exome_agilent_1.1_refseq_plus_3_boosters_plus_10bp_padding_minus_mito.Homo_sapiens_assembly19.targets.interval_list] excludeIntervals=null interval_set_rule=UNION interval_merging=ALL interval_padding=50 reference_sequence=/seq/references/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.fasta nonDeterministicRandomSeed=false downsampling_type=BY_SAMPLE downsample_to_fraction=null downsample_to_coverage=1000 baq=OFF baqGapOpenPenalty=40.0 performanceLog=null useOriginalQualities=false BQSR=null quantize_quals=0 no_indel_quals=false defaultBaseQualities=-1 validation_strictness=SILENT unsafe=null num_threads=1 num_cpu_threads=null num_io_threads=null num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT allow_intervals_with_unindexed_bam=false generateShadowBCF=false useSlowGenotypes=false repairVCFHeader=null logging_level=INFO log_to_file=null help=false variant=(RodBinding name=variant source=/xchip/cga/gdac-prod/cga/jobResults/UnifiedGenotyperComplete/PR_MyData_Capture_10/1626687/PR_MyData_Capture_10.unfiltered.vcf) discordance=(RodBinding name= source=UNBOUND) concordance=(RodBinding name= source=UNBOUND) out=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub no_cmdline_in_header=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub sites_only=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub sample_name=[] sample_expressions=null sample_file=null exclude_sample_name=[] exclude_sample_file=[] select_expressions=[] excludeNonVariants=false excludeFiltered=false regenotype=false restrictAllelesTo=ALL keepOriginalAC=false mendelianViolation=false mendelianViolationQualThreshold=0.0 select_random_number=0 select_random_fraction=0.0 remove_fraction_genotypes=0.0 selectTypeToInclude=[INDEL] 
keepIDs=null outMVFile=null fullyDecode=false forceGenotypesDecode=false justRead=false filter_mismatching_base_and_quals=false"

    ##UnifiedGenotyper="analysis_type=UnifiedGenotyper input_file=[/xchip/cga/gdac-prod/cga/jobResults/UnifiedGenotyperComplete/PR_MyData_Capture_10/1626687/PR_MyData_Capture_10.bam.list, /xchip/cga2/germline/resources/1kg_reduced_random50samples.bam] read_buffer_size=null phone_home=STANDARD gatk_key=null read_filter=[] intervals=[/xchip/cga/gdac-prod/cga/jobResults/UnifiedGenotyperComplete/PR_MyData_Capture_10/1626687/queueScatterGather/PR_MyData_Capture_10-1-sg/temp_064_of_100/scatter.intervals] excludeIntervals=null interval_set_rule=UNION interval_merging=ALL interval_padding=0 reference_sequence=/seq/references/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.fasta nonDeterministicRandomSeed=false downsampling_type=BY_SAMPLE downsample_to_fraction=null downsample_to_coverage=60 baq=OFF baqGapOpenPenalty=40.0 performanceLog=null useOriginalQualities=false BQSR=null quantize_quals=0 no_indel_quals=false defaultBaseQualities=-1 validation_strictness=SILENT unsafe=null num_threads=1 num_cpu_threads=null num_io_threads=null num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT allow_intervals_with_unindexed_bam=false generateShadowBCF=false useSlowGenotypes=false repairVCFHeader=null logging_level=INFO log_to_file=null help=false genotype_likelihoods_model=BOTH p_nonref_model=EXACT heterozygosity=0.0010 pcr_error_rate=1.0E-4 genotyping_mode=DISCOVERY output_mode=EMIT_VARIANTS_ONLY standard_min_confidence_threshold_for_calling=30.0 standard_min_confidence_threshold_for_emitting=30.0 noSLOD=false annotateNDA=false alleles=(RodBinding name= source=UNBOUND) min_base_quality_score=17 max_deletion_fraction=0.05 max_alternate_alleles=3 cap_max_alternate_alleles_for_indels=true min_indel_count_for_genotyping=5 min_indel_fraction_per_sample=0.25 indel_heterozygosity=1.25E-4 indelGapContinuationPenalty=10 indelGapOpenPenalty=45 indelHaplotypeSize=80 noBandedIndel=false indelDebug=false ignoreSNPAlleles=false 
dbsnp=(RodBinding name=dbsnp source=/humgen/gsa-pipeline/resources/b37/v4/dbsnp_135.b37.vcf) comp=[] out=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub no_cmdline_in_header=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub sites_only=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub debug_file=null metrics_file=null annotation=[] excludeAnnotation=[] filter_mismatching_base_and_quals=false"

    ##VariantFiltration="analysis_type=VariantFiltration input_file=[] read_buffer_size=null phone_home=STANDARD gatk_key=null read_filter=[] intervals=[/xchip/cga/reference/hg19/whole_exome_agilent_1.1_refseq_plus_3_boosters_plus_10bp_padding_minus_mito.Homo_sapiens_assembly19.targets.interval_list] excludeIntervals=null interval_set_rule=UNION interval_merging=ALL interval_padding=50 reference_sequence=/seq/references/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.fasta nonDeterministicRandomSeed=false downsampling_type=BY_SAMPLE downsample_to_fraction=null downsample_to_coverage=1000 baq=OFF baqGapOpenPenalty=40.0 performanceLog=null useOriginalQualities=false BQSR=null quantize_quals=0 no_indel_quals=false defaultBaseQualities=-1 validation_strictness=SILENT unsafe=null num_threads=1 num_cpu_threads=null num_io_threads=null num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT allow_intervals_with_unindexed_bam=false generateShadowBCF=false useSlowGenotypes=false repairVCFHeader=null logging_level=INFO log_to_file=null help=false variant=(RodBinding name=variant source=/xchip/cga/gdac-prod/cga/jobResults/UnifiedGenotyperComplete/PR_MyData_Capture_10/1626687/PR_MyData_Capture_10.indels.unfiltered.vcf) mask=(RodBinding name= source=UNBOUND) out=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub no_cmdline_in_header=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub sites_only=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub filterExpression=[FS>200.0, QD<2.0, ReadPosRankSum<-20.0, InbreedingCoeff<-0.8] filterName=[Indel_FS, Indel_QD, Indel_ReadPosRankSum, Indel_InbreedingCoeff] genotypeFilterExpression=[] genotypeFilterName=[] clusterSize=3 clusterWindowSize=0 maskExtension=0 maskName=Mask missingValuesInExpressionsShouldEvaluateAsFailing=false invalidatePreviousFilters=false filter_mismatching_base_and_quals=false"

    ./adam

  • ebanks Posts: 682 GATK Developer mod

    Adam, you used SelectVariants to pull out the 1000G samples, right? If so, can you please point me to the file (on the local disk) in the pipeline before SelectVariants was called?

    Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT

  • akiezun Posts: 13 Member

    Yes, I used SelectVariants. I deleted those intermediate files. I'll remake them and send you a pointer.

  • hoosier060 Posts: 8 Member

    Hi, I was wondering how I can perform a 3-way comparison using VariantEval. In other words, say you have datasets from 3 different sources and you want VariantEval's report on the 3-way comparison. Thanks -Charles

  • Geraldine_VdAuwera Posts: 6,176 Administrator, GATK Developer admin

    Hi Charles, this question belongs in a separate thread since it has nothing to do with the topic of this discussion. Please post it as a new discussion or as a comment on one of the VariantEval documentation articles.

    Geraldine Van der Auwera, PhD
