Notice:
If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!

Test-drive the GATK tools and Best Practices pipelines on Terra


Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.

DP statistic matches bamfile before realignment and not the bamout file

sezimmersezimmer Bronx, New YorkMember

Hi Everyone,

I am using GATK version 3.3 and I noticed that the numbers in the DP and AD fields actually match the original bam file before realignment and does not match the bamout file produced by haplotype caller. For example, at chromosome 1 position 6253228 the DP is 27 and the AD is 27,0. The number of reads (counted by IGV) in the original bam file at that position is also 27. However, in the "bamout" bam file the number of reads is 56. Below is the line from the VCF file.

chr1 6253228 . C . . . GT:AD:DP:GQ:PL 0/0:27,0:27:75:0,75,1125

Is the DP and AD fields supposed to match the original bamfile or the bamout bam file numbers? When I produce the "bamout" bamfile I use parameters -forceActive and -disableOptimizations.

Thank you.

Best,
Sam

Best Answer

Answers

  • SheilaSheila Broad InstituteMember, Broadie admin

    @sezimmer
    Hi Sam,

    I think that was an issue that was fixed in the latest version of GATK (probably somewhere between 3.3 and 3.4). Can you please try with the latest version?

    Thanks,
    Sheila

  • sezimmersezimmer Bronx, New YorkMember

    Hi Sheila,

    I just tried it with the version 3.4 using this command:

    java -jar -Xmx26g /public/apps/GenomeAnalysisTK/3.4-0/java.1.7.0_67/GenomeAnalysisTK.jar -T HaplotypeCaller -R /home/sezimmer/variantCallingPipeline/referenceFiles/fasta/mm9.fa -I /home/sezimmer/variantCallingPipeline/variantCallingD4/dedupFolder/D4.indel.recal.dedup.reordered.bam --dbsnp /home/sezimmer/variantCallingPipeline/referenceFiles/dbsnp_128.mm9.reorder.vcf --genotyping_mode DISCOVERY -mmq 30 -mbq 20 -stand_call_conf 20.0 -stand_emit_conf 20.0 -ERC BP_RESOLUTION --variant_index_type LINEAR --variant_index_parameter 128000 -L chrY -nt 1 -nct 1 -o /home/sezimmer/variantCallingPipeline/variantCallingD4/variants/gatk3.4/D4.allBases.chrY.vcf -bamout /home/sezimmer/variantCallingPipeline/variantCallingD4/variants/gatk3.4/D4.allBases.chrY.bam -forceActive -disableOptimizations

    However it did not work. I got the error "ERROR MESSAGE: READ_MAX_LENGTH must be > 0 but got 0". Then I tried the command a second time, without the "-disableOptimizations" paramter and did not get the error. Below is the output of the error. Do you know why "-disableOptimizations" causes this to happen in version 3.4 but not 3.3?

    Thank you.

    Best,
    Sam

    INFO 20:43:23,682 HelpFormatter - --------------------------------------------------------------------------------
    INFO 20:43:23,685 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.4-0-g7e26428, Compiled 2015/05/15 03:25:41
    INFO 20:43:23,685 HelpFormatter - Copyright (c) 2010 The Broad Institute
    INFO 20:43:23,685 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
    INFO 20:43:23,689 HelpFormatter - Program Args: -T HaplotypeCaller -R /home/sezimmer/variantCallingPipeline/referenceFiles/fasta/mm9.fa -I /home/sezimmer/variantCallingPipeline/variantCallingD4/dedupFolder/D4.indel.recal.dedup.reordered.bam --dbsnp /home/sezimmer/variantCallingPipeline/referenceFiles/dbsnp_128.mm9.reorder.vcf --genotyping_mode DISCOVERY -mmq 30 -mbq 20 -stand_call_conf 20.0 -stand_emit_conf 20.0 -ERC BP_RESOLUTION --variant_index_type LINEAR --variant_index_parameter 128000 -L chrY -nt 1 -nct 1 -o /home/sezimmer/variantCallingPipeline/variantCallingD4/variants/gatk3.4/D4.allBases.chrY.vcf -bamout /home/sezimmer/variantCallingPipeline/variantCallingD4/variants/gatk3.4/D4.allBases.chrY.bam -forceActive -disableOptimizations
    INFO 20:43:23,693 HelpFormatter - Executing as [email protected] on Linux 2.6.32-431.23.3.el6.657g0001.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_67-b01.
    INFO 20:43:23,694 HelpFormatter - Date/Time: 2016/01/12 20:43:23
    INFO 20:43:23,694 HelpFormatter - --------------------------------------------------------------------------------
    INFO 20:43:23,694 HelpFormatter - --------------------------------------------------------------------------------
    WARN 20:43:23,716 GATKVCFUtils - Naming your output file using the .g.vcf extension will automatically set the appropriate values for --variant_index_type and --variant_index_parameter
    INFO 20:43:24,464 GenomeAnalysisEngine - Strictness is SILENT
    INFO 20:43:24,514 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 500
    INFO 20:43:24,523 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
    INFO 20:43:24,562 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.04
    INFO 20:43:24,569 HCMappingQualityFilter - Filtering out reads with MAPQ < 30
    INFO 20:43:24,689 IntervalUtils - Processing 15902555 bp from intervals
    INFO 20:43:24,759 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
    INFO 20:43:24,916 GenomeAnalysisEngine - Done preparing for traversal
    INFO 20:43:24,917 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
    INFO 20:43:24,917 ProgressMeter - | processed | time | per 1M | | total | remaining
    INFO 20:43:24,918 ProgressMeter - Location | active regions | elapsed | active regions | completed | runtime | runtime
    INFO 20:43:24,919 HaplotypeCaller - Standard Emitting and Calling confidence set to 0.0 for reference-model confidence output
    INFO 20:43:24,919 HaplotypeCaller - All sites annotated with PLs forced to true for reference-model confidence output
    INFO 20:43:25,041 HaplotypeCaller - Using global mismapping rate of 45 => -4.5 in log10 likelihood units
    WARN 20:43:27,750 PairHMMLikelihoodCalculationEngine$1 - Failed to load native library for VectorLoglessPairHMM - using Java implementation of LOGLESS_CACHING
    INFO 20:43:28,321 HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Network is unreachable
    INFO 20:43:28,322 HttpMethodDirector - Retrying request
    INFO 20:43:28,323 HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Network is unreachable
    INFO 20:43:28,323 HttpMethodDirector - Retrying request
    INFO 20:43:28,325 HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Network is unreachable
    INFO 20:43:28,325 HttpMethodDirector - Retrying request
    INFO 20:43:28,326 HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Network is unreachable
    INFO 20:43:28,327 HttpMethodDirector - Retrying request
    INFO 20:43:28,328 HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Network is unreachable
    INFO 20:43:28,328 HttpMethodDirector - Retrying request

    ERROR ------------------------------------------------------------------------------------------
    ERROR stack trace

    java.lang.IllegalArgumentException: READ_MAX_LENGTH must be > 0 but got 0
    at org.broadinstitute.gatk.utils.pairhmm.PairHMM.initialize(PairHMM.java:126)
    at org.broadinstitute.gatk.utils.pairhmm.N2MemoryPairHMM.initialize(N2MemoryPairHMM.java:60)
    at org.broadinstitute.gatk.utils.pairhmm.LoglessPairHMM.initialize(LoglessPairHMM.java:66)
    at org.broadinstitute.gatk.utils.pairhmm.PairHMM.initialize(PairHMM.java:159)
    at org.broadinstitute.gatk.tools.walkers.haplotypecaller.PairHMMLikelihoodCalculationEngine.initializePairHMM(PairHMMLikelihoodCalculationEngine.java:268)
    at org.broadinstitute.gatk.tools.walkers.haplotypecaller.PairHMMLikelihoodCalculationEngine.computeReadLikelihoods(PairHMMLikelihoodCalculationEngine.java:283)
    at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:870)
    at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:226)
    at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:709)
    at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:705)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
    at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:274)
    at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:78)
    at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:315)
    at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
    at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:106)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 3.4-0-g7e26428):
    ERROR
    ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: READ_MAX_LENGTH must be > 0 but got 0
    ERROR ------------------------------------------------------------------------------------------
  • sezimmersezimmer Bronx, New YorkMember

    Hi Sheila,

    I just tried the command again, but this time with GATK version 3.4-46. It works with this version. Was the error just a bug in version 3.4-0?

    Thank for your help!

    Best,
    Sam

Sign In or Register to comment.