-T CombineVariants ##### ERROR MESSAGE: For input string: "20"

tota (Posts: 3, Member)
edited January 2013 in Ask the GATK team

Hello,

I called variants on chromosome 20 in 100,000 bp chunks across 450 reduced BAMs using -T UnifiedGenotyper, which produced 631 VCFs.

Now I want to combine these 631 VCFs using -T CombineVariants. However, there seems to be a problem with some of the VCFs, and all I get is an error message.

I can combine the first 5 VCFs that were generated, but when I add a sixth one (chr20:400000_500000) I get the following error:

##### ERROR MESSAGE: For input string: "20"

If I exclude that file, I can combine about 30 VCFs until I encounter a similar error message for a different VCF file:

##### ERROR MESSAGE: For input string: "120"

I've looked through both of the "problematic" VCFs and can't see how they differ from the ones that I can combine.

Any idea what may be going wrong and how I can solve it? At the moment I can only identify problematic VCFs by trying to combine them with CombineVariants, since I don't know what it is about these particular files that causes the problem.

Thanks for your help, Tota

Answers

  • Geraldine_VdAuwera (Posts: 6,176; Administrator, GATK Developer)

    Hi Tota,

    Can you tell me what version of GATK you're using? Also, can you post the full error message with stack trace?

    Geraldine Van der Auwera, PhD

  • tota (Posts: 3, Member)

    Hi Geraldine,

    I'm using GATK v2.1-11-g13c0244 and this command (for 2 VCFs, where chr20_400000_500000.vcf is the problematic one):

    java -jar gatk-2.1-11/GenomeAnalysisTK.jar -R hs37d5.fa --variant chr20_300000_400000.vcf --variant chr20_400000_500000.vcf -o combined.var.vcf -T CombineVariants

    and here is the ERROR message:

    INFO 09:27:14,157 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 09:27:14,159 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.1-11-g13c0244, Compiled 2012/09/29 06:03:05
    INFO 09:27:14,159 HelpFormatter - Copyright (c) 2010 The Broad Institute
    INFO 09:27:14,159 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
    INFO 09:27:14,160 HelpFormatter - Program Args: -R /ddn/projects8/got2d/SUMMIT/pipeline/lib/hs37d5.fa --variant /ddn/projects11/got2d/SUMMIT/oxford/calling_451_rbams_chr20_smaller_chunks/vcf.raw.chunks/chr20_300000_400000.vcf --variant /ddn/projects11/got2d/SUMMIT/oxford/calling_451_rbams_chr20_smaller_chunks/vcf.raw.chunks/chr20_400000_500000.vcf -o /ddn/projects11/got2d/SUMMIT/oxford/calling_451_rbams_chr20_smaller_chunks/vcf.raw.combined/combined.var.vcf -T CombineVariants
    INFO 09:27:14,160 HelpFormatter - Date/Time: 2013/01/28 09:27:14
    INFO 09:27:14,160 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 09:27:14,160 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 09:27:14,231 ArgumentTypeDescriptor - Dynamically determined type of /ddn/projects11/got2d/SUMMIT/oxford/calling_451_rbams_chr20_smaller_chunks/vcf.raw.chunks/chr20_300000_400000.vcf to be VCF
    INFO 09:27:14,288 ArgumentTypeDescriptor - Dynamically determined type of /ddn/projects11/got2d/SUMMIT/oxford/calling_451_rbams_chr20_smaller_chunks/vcf.raw.chunks/chr20_400000_500000.vcf to be VCF
    INFO 09:27:14,294 GenomeAnalysisEngine - Strictness is SILENT
    INFO 09:27:14,441 RMDTrackBuilder - Loading Tribble index from disk for file /ddn/projects11/got2d/SUMMIT/oxford/calling_451_rbams_chr20_smaller_chunks/vcf.raw.chunks/chr20_300000_400000.vcf
    INFO 09:27:15,273 RMDTrackBuilder - Loading Tribble index from disk for file /ddn/projects11/got2d/SUMMIT/oxford/calling_451_rbams_chr20_smaller_chunks/vcf.raw.chunks/chr20_400000_500000.vcf
    INFO 09:27:15,449 CombineVariants - Priority string not provided, using arbitrary genotyping order: null
    WARN 09:27:15,522 VCFUtils$HeaderConflictWarner - Ignoring header line already in map: this header line = UnifiedGenotyper="analysis_type=UnifiedGenotyper input_file=[/ddn/projects11/got2d/SUMMIT/oxford/proj1/bam.reduced/bams.list] read_buffer_size=null phone_home=STANDARD gatk_key=null read_filter=[] intervals=[20:300000-400000] excludeIntervals=null interval_set_rule=UNION interval_merging=ALL interval_padding=0 reference_sequence=/ddn/projects8/got2d/SUMMIT/pipeline/lib/hs37d5.fa nonDeterministicRandomSeed=false downsampling_type=BY_SAMPLE downsample_to_fraction=null downsample_to_coverage=250 baq=OFF baqGapOpenPenalty=40.0 performanceLog=null useOriginalQualities=false BQSR=null quantize_quals=0 disable_indel_quals=false emit_original_quals=false preserve_qscores_less_than=6 defaultBaseQualities=-1 validation_strictness=SILENT remove_program_records=false keep_program_records=false unsafe=null num_threads=1 num_cpu_threads=null num_io_threads=null num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT allow_intervals_with_unindexed_bam=false generateShadowBCF=false logging_level=INFO log_to_file=null help=false genotype_likelihoods_model=BOTH p_nonref_model=EXACT pcr_error_rate=1.0E-4 noSLOD=false annotateNDA=false min_base_quality_score=17 max_deletion_fraction=0.05 cap_max_alternate_alleles_for_indels=false min_indel_count_for_genotyping=5 min_indel_fraction_per_sample=0.25 indel_heterozygosity=1.25E-4 indelGapContinuationPenalty=10 indelGapOpenPenalty=45 indelHaplotypeSize=80 noBandedIndel=false indelDebug=false ignoreSNPAlleles=false allReadsSP=false ignoreLaneInfo=false reference_sample_calls=(RodBinding name= source=UNBOUND) reference_sample_name=null sample_ploidy=2 min_quality_score=1 max_quality_score=40 site_quality_prior=20 min_power_threshold_for_calling=0.95 min_reference_depth=100 exclude_filtered_reference_sites=false heterozygosity=0.001 genotyping_mode=DISCOVERY output_mode=EMIT_VARIANTS_ONLY standard_min_confidence_threshold_for_calling=30.0 standard_min_confidence_threshold_for_emitting=30.0 alleles=(RodBinding name= source=UNBOUND) max_alternate_alleles=3 dbsnp=(RodBinding name=dbsnp source=/ddn/projects8/got2d/SUMMIT/pipeline/lib//dbsnp_135.b37.vcf) comp=[] out=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub no_cmdline_in_header=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub sites_only=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub bcf=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub debug_file=null metrics_file=null annotation=[] excludeAnnotation=[] filter_mismatching_base_and_quals=false" already present header = UnifiedGenotyper="analysis_type=UnifiedGenotyper input_file=[/ddn/projects11/got2d/SUMMIT/oxford/proj1/bam.reduced/bams.list] read_buffer_size=null phone_home=STANDARD gatk_key=null read_filter=[] intervals=[20:400000-500000] excludeIntervals=null interval_set_rule=UNION interval_merging=ALL interval_padding=0 reference_sequence=/ddn/projects8/got2d/SUMMIT/pipeline/lib/hs37d5.fa nonDeterministicRandomSeed=false downsampling_type=BY_SAMPLE downsample_to_fraction=null downsample_to_coverage=250 baq=OFF baqGapOpenPenalty=40.0 performanceLog=null useOriginalQualities=false BQSR=null quantize_quals=0 disable_indel_quals=false emit_original_quals=false preserve_qscores_less_than=6 defaultBaseQualities=-1 validation_strictness=SILENT remove_program_records=false keep_program_records=false unsafe=null num_threads=1 num_cpu_threads=null num_io_threads=null num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT allow_intervals_with_unindexed_bam=false generateShadowBCF=false logging_level=INFO log_to_file=null help=false genotype_likelihoods_model=BOTH p_nonref_model=EXACT pcr_error_rate=1.0E-4 noSLOD=false annotateNDA=false min_base_quality_score=17 max_deletion_fraction=0.05 cap_max_alternate_alleles_for_indels=false min_indel_count_for_genotyping=5 min_indel_fraction_per_sample=0.25 indel_heterozygosity=1.25E-4 indelGapContinuationPenalty=10 indelGapOpenPenalty=45 indelHaplotypeSize=80 noBandedIndel=false indelDebug=false ignoreSNPAlleles=false allReadsSP=false ignoreLaneInfo=false reference_sample_calls=(RodBinding name= source=UNBOUND) reference_sample_name=null sample_ploidy=2 min_quality_score=1 max_quality_score=40 site_quality_prior=20 min_power_threshold_for_calling=0.95 min_reference_depth=100 exclude_filtered_reference_sites=false heterozygosity=0.001 genotyping_mode=DISCOVERY output_mode=EMIT_VARIANTS_ONLY standard_min_confidence_threshold_for_calling=30.0 standard_min_confidence_threshold_for_emitting=30.0 alleles=(RodBinding name= source=UNBOUND) max_alternate_alleles=3 dbsnp=(RodBinding name=dbsnp source=/ddn/projects8/got2d/SUMMIT/pipeline/lib//dbsnp_135.b37.vcf) comp=[] out=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub no_cmdline_in_header=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub sites_only=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub bcf=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub debug_file=null metrics_file=null annotation=[] excludeAnnotation=[] filter_mismatching_base_and_quals=false"
    INFO 09:27:16,393 TraversalEngine - [INITIALIZATION COMPLETE; TRAVERSAL STARTING]
    INFO 09:27:16,394 TraversalEngine - Location processed.sites runtime per.1M.sites completed total.runtime remaining
    INFO 09:27:18,735 GATKRunReport - Uploaded run statistics report to AWS S3

    ERROR ------------------------------------------------------------------------------------------
    ERROR stack trace

    java.lang.NumberFormatException: For input string: "20"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Integer.parseInt(Integer.java:470)
        at java.lang.Integer.valueOf(Integer.java:570)
        at org.broadinstitute.sting.utils.codecs.vcf.AbstractVCFCodec.decodeInts(AbstractVCFCodec.java:680)
        at org.broadinstitute.sting.utils.codecs.vcf.AbstractVCFCodec.createGenotypeMap(AbstractVCFCodec.java:641)
        at org.broadinstitute.sting.utils.codecs.vcf.AbstractVCFCodec$LazyVCFGenotypesParser.parse(AbstractVCFCodec.java:92)
        at org.broadinstitute.sting.utils.variantcontext.LazyGenotypesContext.decode(LazyGenotypesContext.java:130)
        at org.broadinstitute.sting.utils.variantcontext.LazyGenotypesContext.getGenotypes(LazyGenotypesContext.java:120)
        at org.broadinstitute.sting.utils.variantcontext.GenotypesContext.iterator(GenotypesContext.java:450)
        at org.broadinstitute.sting.utils.variantcontext.VariantContextUtils.mergeGenotypes(VariantContextUtils.java:882)
        at org.broadinstitute.sting.utils.variantcontext.VariantContextUtils.simpleMerge(VariantContextUtils.java:543)
        at org.broadinstitute.sting.gatk.walkers.variantutils.CombineVariants.map(CombineVariants.java:292)
        at org.broadinstitute.sting.gatk.walkers.variantutils.CombineVariants.map(CombineVariants.java:114)
        at org.broadinstitute.sting.gatk.traversals.TraverseLoci.traverse(TraverseLoci.java:65)
        at org.broadinstitute.sting.gatk.traversals.TraverseLoci.traverse(TraverseLoci.java:18)
        at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:62)
        at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:265)
        at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)
        at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:93)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 2.1-11-g13c0244):
    ERROR
    ERROR Please visit the wiki to see if this is a known problem
    ERROR If not, please post the error, with stack trace, to the GATK forum
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: For input string: "20"
    ERROR ------------------------------------------------------------------------------------------

    Thank you for your help, Tota

  • tota (Posts: 3, Member)

    I can use pseq to identify the problematic VCF files and then regenerate them with GATK, which seems to solve the problem!

    Tota
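
For readers without pseq, a rough pre-check can be sketched in Python. The stack trace above fails in AbstractVCFCodec.decodeInts, called from createGenotypeMap, which suggests that an integer-typed genotype field held something that is not a plain integer. One possibility (an assumption, not confirmed in this thread) is a non-printing character attached to the digits, since a clean "20" would parse. The sketch below is not what pseq or GATK does internally; the function name `find_bad_int_fields` and the set of keys treated as integers (DP, GQ, AD, PL) are illustrative choices, so adjust them to match your FORMAT headers.

```python
import re
import sys

# FORMAT keys assumed (for illustration) to hold integers; adjust to your headers.
INT_FORMAT_KEYS = {"DP", "GQ", "AD", "PL"}
# Optional sign followed by ASCII digits only, so hidden characters are rejected.
STRICT_INT = re.compile(r"[+-]?[0-9]+")


def find_bad_int_fields(path, int_keys=INT_FORMAT_KEYS):
    """Return (line_no, sample_no, key, raw_value) for values that are not plain integers."""
    problems = []
    # newline="\n" keeps stray carriage returns and other control bytes visible.
    with open(path, encoding="utf-8", errors="replace", newline="\n") as handle:
        for line_no, line in enumerate(handle, start=1):
            if line.startswith("#"):
                continue  # header lines carry no genotype data
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) < 10:
                continue  # no FORMAT/sample columns on this record
            keys = fields[8].split(":")
            for sample_no, sample in enumerate(fields[9:], start=1):
                # Trailing keys may be dropped from a sample; zip stops at the shorter list.
                for key, value in zip(keys, sample.split(":")):
                    if key not in int_keys:
                        continue
                    for item in value.split(","):
                        if item != "." and not STRICT_INT.fullmatch(item):
                            problems.append((line_no, sample_no, key, item))
    return problems


if __name__ == "__main__":
    for vcf_path in sys.argv[1:]:
        for line_no, sample_no, key, item in find_bad_int_fields(vcf_path):
            # repr() makes non-printing characters visible in the report.
            print(f"{vcf_path}: line {line_no}, sample {sample_no}, {key}={item!r}")
```

Saved under a name of your choosing and run with the chunk VCFs as arguments, it prints one line per suspicious value. Any file it reports is a candidate for regeneration, as described above; a file it does not report may still fail in CombineVariants for other reasons.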
