SVCNVDiscovery Error: java.lang.RuntimeException: Read count cache file

Dear Genome STRiP users,

I am running the SVCNVDiscovery pipeline on 10686 samples, following a successfully completed SVPreprocess run. I encountered two "java.lang.RuntimeException" errors in the output.

Exception in thread "main" java.lang.RuntimeException: Read count cache file file:///proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svpre_freeze6-AA_chr16_standard_full_single_1-500000over1-500000_parallel_success/md_tempdir/rccache.bin is truncated
    at org.broadinstitute.sv.commandline.CommandLineProgram.execute(CommandLineProgram.java:65)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
    at org.broadinstitute.sv.commandline.CommandLineProgram.runAndReturnResult(CommandLineProgram.java:29)
    at org.broadinstitute.sv.commandline.CommandLineProgram.run(CommandLineProgram.java:25)
    at org.broadinstitute.sv.genotyping.RefineCNVBoundaries.main(RefineCNVBoundaries.java:133)
Caused by: java.lang.RuntimeException: Read count cache file file:///proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svpre_freeze6-AA_chr16_standard_full_single_1-500000over1-500000_parallel_success/md_tempdir/rccache.bin is truncated
    at org.broadinstitute.sv.metadata.depth.ReadCountFileReader$ReadCountDataIterator.decodeRow(ReadCountFileReader.java:516)
    at org.broadinstitute.sv.metadata.depth.ReadCountFileReader$ReadCountDataIterator.getReadCacheItems(ReadCountFileReader.java:470)
    at org.broadinstitute.sv.metadata.depth.ReadCountFileReader$ReadCountDataIterator.aggregateSampleReadCounts(ReadCountFileReader.java:476)
    at org.broadinstitute.sv.metadata.depth.ReadCountFileReader.getReadCounts(ReadCountFileReader.java:266)
    at org.broadinstitute.sv.common.ReadCountCache.getReadCounts(ReadCountCache.java:100)
    at org.broadinstitute.sv.genotyping.GenotypingDepthModule.computeRefReadCounts(GenotypingDepthModule.java:295)
    at org.broadinstitute.sv.genotyping.GenotypingDepthModule.computeRefReadCounts(GenotypingDepthModule.java:245)
    at org.broadinstitute.sv.genotyping.GenotypingDepthModule.getReadCounts(GenotypingDepthModule.java:230)
    at org.broadinstitute.sv.genotyping.GenotypingDepthModule.getCnpReadCounts(GenotypingDepthModule.java:217)
    at org.broadinstitute.sv.genotyping.GenotypingDepthModule.genotypeCnp(GenotypingDepthModule.java:141)
    at org.broadinstitute.sv.genotyping.BoundaryRefinementAlgorithm.genotypeCnp(BoundaryRefinementAlgorithm.java:287)
    at org.broadinstitute.sv.genotyping.BoundaryRefinementAlgorithm.refineOneBoundary(BoundaryRefinementAlgorithm.java:633)
    at org.broadinstitute.sv.genotyping.BoundaryRefinementAlgorithm.refineBoundaryStep(BoundaryRefinementAlgorithm.java:553)
    at org.broadinstitute.sv.genotyping.BoundaryRefinementAlgorithm.refineBoundaries(BoundaryRefinementAlgorithm.java:536)
    at org.broadinstitute.sv.genotyping.BoundaryRefinementAlgorithm.processVariant(BoundaryRefinementAlgorithm.java:232)
    at org.broadinstitute.sv.genotyping.RefineCNVBoundaries.run(RefineCNVBoundaries.java:204)
    at org.broadinstitute.sv.commandline.CommandLineProgram.execute(CommandLineProgram.java:54)
    ... 5 more 
INFO  23:29:51,212 QGraph - Writing incremental jobs reports... 
INFO  23:29:51,213 QGraph - 4 Pend, 5 Run, 1 Fail, 33 Done 
INFO  23:30:51,242 FunctionEdge - Done:  'java'  '-Xmx102400m' ...
...
INFO  23:25:21,377 MetaData - Opened metadata. 
INFO  23:25:21,436 RefineCNVBoundaries - Initializing input data set ... 
INFO  23:25:31,820 RefineCNVBoundaries - Initialized data set: 1 file, 121337 read groups, 10148 samples. 
INFO  23:25:32,452 ReadCountCache - Initializing read count cache with 1 file. 
mInputFile=file:///proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svpre_freeze6-AA_chr16_standard_full_single_1-500000over1-500000_parallel_success/md_tempdir/rccache.bin mCurrentSequenceName=chr16; mCurrentPosition=500001
Exception in thread "main" java.lang.RuntimeException: Read count cache file file:///proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svpre_freeze6-AA_chr16_standard_full_single_1-500000over1-500000_parallel_success/md_tempdir/rccache.bin is truncated
    at org.broadinstitute.sv.commandline.CommandLineProgram.execute(CommandLineProgram.java:65)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
    at org.broadinstitute.sv.commandline.CommandLineProgram.runAndReturnResult(CommandLineProgram.java:29)
    at org.broadinstitute.sv.commandline.CommandLineProgram.run(CommandLineProgram.java:25)
    at org.broadinstitute.sv.genotyping.RefineCNVBoundaries.main(RefineCNVBoundaries.java:133)
Caused by: java.lang.RuntimeException: Read count cache file file:///proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svpre_freeze6-AA_chr16_standard_full_single_1-500000over1-500000_parallel_success/md_tempdir/rccache.bin is truncated
    at org.broadinstitute.sv.metadata.depth.ReadCountFileReader$ReadCountDataIterator.decodeRow(ReadCountFileReader.java:516)
    at org.broadinstitute.sv.metadata.depth.ReadCountFileReader$ReadCountDataIterator.getReadCacheItems(ReadCountFileReader.java:470)
    at org.broadinstitute.sv.metadata.depth.ReadCountFileReader$ReadCountDataIterator.aggregateSampleReadCounts(ReadCountFileReader.java:476)
    at org.broadinstitute.sv.metadata.depth.ReadCountFileReader.getReadCounts(ReadCountFileReader.java:266)
    at org.broadinstitute.sv.common.ReadCountCache.getReadCounts(ReadCountCache.java:100)
    at org.broadinstitute.sv.genotyping.GenotypingDepthModule.computeRefReadCounts(GenotypingDepthModule.java:295)
    at org.broadinstitute.sv.genotyping.GenotypingDepthModule.computeRefReadCounts(GenotypingDepthModule.java:245)
    at org.broadinstitute.sv.genotyping.GenotypingDepthModule.getReadCounts(GenotypingDepthModule.java:230)
    at org.broadinstitute.sv.genotyping.GenotypingDepthModule.getCnpReadCounts(GenotypingDepthModule.java:217)
    at org.broadinstitute.sv.genotyping.GenotypingDepthModule.genotypeCnp(GenotypingDepthModule.java:141)
    at org.broadinstitute.sv.genotyping.BoundaryRefinementAlgorithm.genotypeCnp(BoundaryRefinementAlgorithm.java:287)
    at org.broadinstitute.sv.genotyping.BoundaryRefinementAlgorithm.refineOneBoundary(BoundaryRefinementAlgorithm.java:633)
    at org.broadinstitute.sv.genotyping.BoundaryRefinementAlgorithm.refineBoundaryStep(BoundaryRefinementAlgorithm.java:558)
    at org.broadinstitute.sv.genotyping.BoundaryRefinementAlgorithm.refineBoundaries(BoundaryRefinementAlgorithm.java:536)
    at org.broadinstitute.sv.genotyping.BoundaryRefinementAlgorithm.processVariant(BoundaryRefinementAlgorithm.java:232)
    at org.broadinstitute.sv.genotyping.RefineCNVBoundaries.run(RefineCNVBoundaries.java:204)
    at org.broadinstitute.sv.commandline.CommandLineProgram.execute(CommandLineProgram.java:54)
    ... 5 more 
INFO  23:42:51,601 QGraph - Writing incremental jobs reports... 
INFO  23:42:51,602 QGraph - 4 Pend, 0 Run, 2 Fail, 37 Done 
INFO  23:42:51,605 QCommandLine - Writing final jobs report... 
INFO  23:42:51,606 QCommandLine - Done with errors 
INFO  23:42:51,609 QGraph - ------- 
INFO  23:42:51,609 QGraph - Failed:   'java'  '-Xmx102400m'  ...
...

Has anyone encountered this error? Is it related to the original .bam files, or is my SVPreprocess output broken? May I have your suggestions? Thank you very much.

Best regards,
Wusheng

Answers

  • zhangwusheng Member

    Updated error info:

    I re-ran SVPreprocess and it completed successfully in a single run (the earlier SVPreprocess needed three runs to complete). I then ran SVCNVDiscovery on the new SVPreprocess metadata. Nearly (but not exactly) the same error occurred:

    The first error (no specific info)

    INFO  08:55:25,719 QGraph - 5 Pend, 1 Run, 0 Fail, 9 Done 
    ERROR 09:57:22,940 FunctionEdge - Error:  'java'  '-Xmx102400m'  '-XX:+UseParallelOldGC'  '-XX:ParallelGCThreads=4'  '-XX:GCTimeLimit=50'  '-XX:GCHeapFreeLimit=10'  '-Djava.io.tmpdir=/proj/yunligrp/users/minzhi/gs/gs_tempdir/svcnv_tmp'  '-cp' '/proj/yunligrp/users/minzhi/svtoolkit/lib/SVToolkit.jar:/proj/yunligrp/users/minzhi/svtoolkit/lib/gatk/GenomeAnalysisTK.jar:/proj/yunligrp/users/minzhi/svtoolkit/lib/gatk/Queue.jar'  '-cp' '/proj/yunligrp/users/minzhi/svtoolkit/lib/SVToolkit.jar:/proj/yunligrp/users/minzhi/svtoolkit/lib/gatk/GenomeAnalysisTK.jar:/proj/yunligrp/users/minzhi/svtoolkit/lib/gatk/Queue.jar'  'org.broadinstitute.gatk.queue.QCommandLine'  '-cp' '/proj/yunligrp/users/minzhi/svtoolkit/lib/SVToolkit.jar:/proj/yunligrp/users/minzhi/svtoolkit/lib/gatk/GenomeAnalysisTK.jar:/proj/yunligrp/users/minzhi/svtoolkit/lib/gatk/Queue.jar'  '-S' '/proj/yunligrp/users/minzhi/svtoolkit/qscript/discovery/cnv/CNVDiscoveryStage7.q'  '-S' '/proj/yunligrp/users/minzhi/svtoolkit/qscript/discovery/cnv/CNVDiscoveryStageBase.q' '-S' '/proj/yunligrp/users/minzhi/svtoolkit/qscript/discovery/cnv/CNVDiscoveryGenotyper.q'  '-S' '/proj/yunligrp/users/minzhi/svtoolkit/qscript/SVQScript.q'  '-gatk' '/proj/yunligrp/users/minzhi/svtoolkit/lib/gatk/GenomeAnalysisTK.jar'  '-jobLogDir' '/proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svcnv_freeze6-AA_chr16_standard_full_single_1-500000over1-500000/cnv_stage7/seq_chr16/logs'  '-memLimit' '100.0'  '-jobRunner' 'Drmaa'  '-gatkJobRunner' 'Drmaa'  '-jobNative' '--mem=100000 --time=08:00:00 --nodes=1 --ntasks-per-node=8'  '-jobQueue' 'general'  -run  '-sequenceName' 'chr16'  '-runDirectory' '/proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svcnv_freeze6-AA_chr16_standard_full_single_1-500000over1-500000/cnv_stage7/seq_chr16'  '-sentinelFile' '/proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svcnv_freeze6-AA_chr16_standard_full_single_1-500000over1-500000/cnv_sentinel_files/stage_7_seq_chr16.sent'  --disableJobReport  '-configFile' 
'/proj/yunligrp/users/minzhi/svtoolkit/conf/genstrip_parameters.txt'  '-R' '/proj/yunligrp/users/minzhi/gs/Homo_sapiens_assembly38/Homo_sapiens_assembly38.fasta'  '-ploidyMapFile' '/proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/supporting_freeze6-AA_chr16/freeze6-AA_chr16_standard_ploidy.map'  '-genomeMaskFile' '/proj/yunligrp/users/minzhi/gs/Homo_sapiens_assembly38/Homo_sapiens_assembly38.svmask.fasta' '-genomeMaskFile' '/proj/yunligrp/users/minzhi/gs/Homo_sapiens_assembly38/Homo_sapiens_assembly38.lcmask.fasta'  '-copyNumberMaskFile' '/proj/yunligrp/users/minzhi/gs/Homo_sapiens_assembly38/Homo_sapiens_assembly38.gcmask.fasta'  '-readDepthMaskFile' '/proj/yunligrp/users/minzhi/gs/Homo_sapiens_assembly38/Homo_sapiens_assembly38.rdmask.bed'  '-genderMaskBedFile' '/proj/yunligrp/users/minzhi/gs/Homo_sapiens_assembly38/Homo_sapiens_assembly38.gendermask.bed'  '-vdjBedFile' '/proj/yunligrp/users/minzhi/gs/Homo_sapiens_assembly38/Homo_sapiens_assembly38.vdjregions.bed'  '-genderMapFile' '/proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/supporting_freeze6-AA_chr16/freeze6-AA_chr16_full_all-male_gender.map'  '-md' '/proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svpre_freeze6-AA_chr16_standard_full_single_1-500000over1-500000_parallel_success/md_tempdir'  -disableGATKTraversal  '-I' '/proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svcnv_freeze6-AA_chr16_standard_full_single_1-500000over1-500000/cnv_stage6/seq_chr16/seq_chr16.merged_headers.bam'  '-vcf' '/proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svcnv_freeze6-AA_chr16_standard_full_single_1-500000over1-500000/cnv_stage4/seq_chr16/seq_chr16.merged.genotypes.vcf.gz'  '-siteListFile' '/proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svcnv_freeze6-AA_chr16_standard_full_single_1-500000over1-500000/cnv_stage6/seq_chr16/eval/SelectedVariants.list'  '-boundaryPrecision' '100'  '-maximumReferenceGapLength' '1000'  '-minimumRefinedLength' '500'  '-brigSiteVcfFile' 
'/proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svcnv_freeze6-AA_chr16_standard_full_single_1-500000over1-500000/cnv_stage7/seq_chr16/seq_chr16.brig.sites.vcf.gz'  
    ERROR 09:57:22,950 FunctionEdge - Contents of /proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svcnv_freeze6-AA_chr16_standard_full_single_1-500000over1-500000/logs/CNVDiscoveryPipeline-10.out:
    INFO  09:20:44,142 FunctionEdge - Done:  'java'  '-Xmx102400m'  '-XX:+UseParallelOldGC'  '-XX:ParallelGCThreads=4'  '-XX:GCTimeLimit=50'  '-XX:GCHeapFreeLimit=10'  '-Djava.io.tmpdir=/proj/yunligrp/users/minzhi/gs/gs_tempdir/svcnv_tmp'  '-cp' '/proj/yunligrp/users/minzhi/svtoolkit/lib/SVToolkit.jar:/proj/yunligrp/users/minzhi/svtoolkit/lib/gatk/GenomeAnalysisTK.jar:/proj/yunligrp/users/minzhi/svtoolkit/lib/gatk/Queue.jar'  '-cp' '/proj/yunligrp/users/minzhi/svtoolkit/lib/SVToolkit.jar:/proj/yunligrp/users/minzhi/svtoolkit/lib/gatk/GenomeAnalysisTK.jar:/proj/yunligrp/users/minzhi/svtoolkit/lib/gatk/Queue.jar'  'org.broadinstitute.sv.genotyping.RefineCNVBoundaries'  '-I' '/proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svcnv_freeze6-AA_chr16_standard_full_single_1-500000over1-500000/cnv_stage6/seq_chr16/seq_chr16.merged_headers.bam'  '-O' '/proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svcnv_freeze6-AA_chr16_standard_full_single_1-500000over1-500000/cnv_stage7/seq_chr16/P0031/seq_chr16.merged.brig.vcf'  '-R' '/proj/yunligrp/users/minzhi/gs/Homo_sapiens_assembly38/Homo_sapiens_assembly38.fasta'  '-md' '/proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svpre_freeze6-AA_chr16_standard_full_single_1-500000over1-500000_parallel_success/md_tempdir'  '-configFile' '/proj/yunligrp/users/minzhi/svtoolkit/conf/genstrip_parameters.txt' '-configFile' '/proj/yunligrp/users/minzhi/gs/Homo_sapiens_assembly38/Homo_sapiens_assembly38.gsparams.txt'  '-P' 'depth.readCountCacheIgnoreGenomeMask:true'  '-genomeMaskFile' '/proj/yunligrp/users/minzhi/gs/Homo_sapiens_assembly38/Homo_sapiens_assembly38.svmask.fasta' '-genomeMaskFile' '/proj/yunligrp/users/minzhi/gs/Homo_sapiens_assembly38/Homo_sapiens_assembly38.lcmask.fasta'  '-genderMapFile' '/proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/supporting_freeze6-AA_chr16/freeze6-AA_chr16_full_all-male_gender.map'  '-ploidyMapFile' 
'/proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/supporting_freeze6-AA_chr16/freeze6-AA_chr16_standard_ploidy.map'  '-vcf' '/proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svcnv_freeze6-AA_chr16_standard_full_single_1-500000over1-500000/cnv_stage4/seq_chr16/seq_chr16.merged.genotypes.vcf.gz'  '-site' '/proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svcnv_freeze6-AA_chr16_standard_full_single_1-500000over1-500000/cnv_stage7/seq_chr16/P0031.sites.list'  '-boundaryPrecision' '100'  '-minimumRefinedLength' '500'  '-maximumReferenceGapLength' '1000'  
    INFO  09:20:44,143 QGraph - Writing incremental jobs reports... 
    INFO  09:20:44,145 QGraph - 4 Pend, 14 Run, 1 Fail, 24 Done 
    

    Second error: Caused by java.lang.RuntimeException

    INFO  09:23:14,303 QGraph - 4 Pend, 11 Run, 1 Fail, 27 Done 
    ERROR 09:24:14,360 FunctionEdge - Error:  'java'  '-Xmx102400m'  '-XX:+UseParallelOldGC'  '-XX:ParallelGCThreads=4'  '-XX:GCTimeLimit=50'  '-XX:GCHeapFreeLimit=10'  '-Djava.io.tmpdir=/proj/yunligrp/users/minzhi/gs/gs_tempdir/svcnv_tmp'  '-cp' '/proj/yunligrp/users/minzhi/svtoolkit/lib/SVToolkit.jar:/proj/yunligrp/users/minzhi/svtoolkit/lib/gatk/GenomeAnalysisTK.jar:/proj/yunligrp/users/minzhi/svtoolkit/lib/gatk/Queue.jar'  '-cp' '/proj/yunligrp/users/minzhi/svtoolkit/lib/SVToolkit.jar:/proj/yunligrp/users/minzhi/svtoolkit/lib/gatk/GenomeAnalysisTK.jar:/proj/yunligrp/users/minzhi/svtoolkit/lib/gatk/Queue.jar'  'org.broadinstitute.sv.genotyping.RefineCNVBoundaries'  '-I' '/proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svcnv_freeze6-AA_chr16_standard_full_single_1-500000over1-500000/cnv_stage6/seq_chr16/seq_chr16.merged_headers.bam'  '-O' '/proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svcnv_freeze6-AA_chr16_standard_full_single_1-500000over1-500000/cnv_stage7/seq_chr16/P0037/seq_chr16.merged.brig.vcf'  '-R' '/proj/yunligrp/users/minzhi/gs/Homo_sapiens_assembly38/Homo_sapiens_assembly38.fasta'  '-md' '/proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svpre_freeze6-AA_chr16_standard_full_single_1-500000over1-500000_parallel_success/md_tempdir'  '-configFile' '/proj/yunligrp/users/minzhi/svtoolkit/conf/genstrip_parameters.txt' '-configFile' '/proj/yunligrp/users/minzhi/gs/Homo_sapiens_assembly38/Homo_sapiens_assembly38.gsparams.txt'  '-P' 'depth.readCountCacheIgnoreGenomeMask:true'  '-genomeMaskFile' '/proj/yunligrp/users/minzhi/gs/Homo_sapiens_assembly38/Homo_sapiens_assembly38.svmask.fasta' '-genomeMaskFile' '/proj/yunligrp/users/minzhi/gs/Homo_sapiens_assembly38/Homo_sapiens_assembly38.lcmask.fasta'  '-genderMapFile' '/proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/supporting_freeze6-AA_chr16/freeze6-AA_chr16_full_all-male_gender.map'  '-ploidyMapFile' 
'/proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/supporting_freeze6-AA_chr16/freeze6-AA_chr16_standard_ploidy.map'  '-vcf' '/proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svcnv_freeze6-AA_chr16_standard_full_single_1-500000over1-500000/cnv_stage4/seq_chr16/seq_chr16.merged.genotypes.vcf.gz'  '-site' '/proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svcnv_freeze6-AA_chr16_standard_full_single_1-500000over1-500000/cnv_stage7/seq_chr16/P0037.sites.list'  '-boundaryPrecision' '100'  '-minimumRefinedLength' '500'  '-maximumReferenceGapLength' '1000'  
    ERROR 09:24:14,364 FunctionEdge - Contents of /proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svcnv_freeze6-AA_chr16_standard_full_single_1-500000over1-500000/cnv_stage7/seq_chr16/logs/CNVDiscoveryStage7-38.out:
    INFO  09:06:37,401 HelpFormatter - ------------------------------------------------------------------ 
    INFO  09:06:37,403 HelpFormatter - Program Name: org.broadinstitute.sv.genotyping.RefineCNVBoundaries 
    INFO  09:06:37,406 HelpFormatter - Program Args: -I /proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svcnv_freeze6-AA_chr16_standard_full_single_1-500000over1-500000/cnv_stage6/seq_chr16/seq_chr16.merged_headers.bam -O /proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svcnv_freeze6-AA_chr16_standard_full_single_1-500000over1-500000/cnv_stage7/seq_chr16/P0037/seq_chr16.merged.brig.vcf -R /proj/yunligrp/users/minzhi/gs/Homo_sapiens_assembly38/Homo_sapiens_assembly38.fasta -md /proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svpre_freeze6-AA_chr16_standard_full_single_1-500000over1-500000_parallel_success/md_tempdir -configFile /proj/yunligrp/users/minzhi/svtoolkit/conf/genstrip_parameters.txt -configFile /proj/yunligrp/users/minzhi/gs/Homo_sapiens_assembly38/Homo_sapiens_assembly38.gsparams.txt -P depth.readCountCacheIgnoreGenomeMask:true -genomeMaskFile /proj/yunligrp/users/minzhi/gs/Homo_sapiens_assembly38/Homo_sapiens_assembly38.svmask.fasta -genomeMaskFile /proj/yunligrp/users/minzhi/gs/Homo_sapiens_assembly38/Homo_sapiens_assembly38.lcmask.fasta -genderMapFile /proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/supporting_freeze6-AA_chr16/freeze6-AA_chr16_full_all-male_gender.map -ploidyMapFile /proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/supporting_freeze6-AA_chr16/freeze6-AA_chr16_standard_ploidy.map -vcf /proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svcnv_freeze6-AA_chr16_standard_full_single_1-500000over1-500000/cnv_stage4/seq_chr16/seq_chr16.merged.genotypes.vcf.gz -site /proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svcnv_freeze6-AA_chr16_standard_full_single_1-500000over1-500000/cnv_stage7/seq_chr16/P0037.sites.list -boundaryPrecision 100 -minimumRefinedLength 500 -maximumReferenceGapLength 1000 
    INFO  09:06:37,409 HelpFormatter - Executing as [email protected] on Linux 3.10.0-957.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_191-b12. 
    INFO  09:06:37,409 HelpFormatter - Date/Time: 2019/03/04 09:06:37 
    INFO  09:06:37,409 HelpFormatter - ------------------------------------------------------------------ 
    INFO  09:06:37,409 HelpFormatter - ------------------------------------------------------------------ 
    INFO  09:06:37,415 RefineCNVBoundaries - Opening reference sequence ... 
    INFO  09:06:37,416 RefineCNVBoundaries - Opened reference sequence. 
    INFO  09:06:37,426 RefineCNVBoundaries - Opening genome mask ... 
    INFO  09:06:37,427 RefineCNVBoundaries - Opened genome mask. 
    INFO  09:06:37,429 MetaData - Opening metadata ...  
    INFO  09:06:37,429 MetaData - Adding metadata location /proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svpre_freeze6-AA_chr16_standard_full_single_1-500000over1-500000_parallel_success/md_tempdir ... 
    INFO  09:06:37,432 MetaData - Opened metadata. 
    INFO  09:06:37,473 RefineCNVBoundaries - Initializing input data set ... 
    INFO  09:06:46,690 RefineCNVBoundaries - Initialized data set: 1 file, 121337 read groups, 10148 samples. 
    INFO  09:06:47,236 ReadCountCache - Initializing read count cache with 1 file. 
    mInputFile=file:///proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svpre_freeze6-AA_chr16_standard_full_single_1-500000over1-500000_parallel_success/md_tempdir/rccache.bin mCurrentSequenceName=chr16; mCurrentPosition=500001
    Exception in thread "main" java.lang.RuntimeException: Read count cache file file:///proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svpre_freeze6-AA_chr16_standard_full_single_1-500000over1-500000_parallel_success/md_tempdir/rccache.bin is truncated
        at org.broadinstitute.sv.commandline.CommandLineProgram.execute(CommandLineProgram.java:65)
        at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
        at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
        at org.broadinstitute.sv.commandline.CommandLineProgram.runAndReturnResult(CommandLineProgram.java:29)
        at org.broadinstitute.sv.commandline.CommandLineProgram.run(CommandLineProgram.java:25)
        at org.broadinstitute.sv.genotyping.RefineCNVBoundaries.main(RefineCNVBoundaries.java:133)
    Caused by: java.lang.RuntimeException: Read count cache file file:///proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svpre_freeze6-AA_chr16_standard_full_single_1-500000over1-500000_parallel_success/md_tempdir/rccache.bin is truncated
        at org.broadinstitute.sv.metadata.depth.ReadCountFileReader$ReadCountDataIterator.decodeRow(ReadCountFileReader.java:516)
        at org.broadinstitute.sv.metadata.depth.ReadCountFileReader$ReadCountDataIterator.getReadCacheItems(ReadCountFileReader.java:470)
        at org.broadinstitute.sv.metadata.depth.ReadCountFileReader$ReadCountDataIterator.aggregateSampleReadCounts(ReadCountFileReader.java:476)
        at org.broadinstitute.sv.metadata.depth.ReadCountFileReader.getReadCounts(ReadCountFileReader.java:266)
        at org.broadinstitute.sv.common.ReadCountCache.getReadCounts(ReadCountCache.java:100)
        at org.broadinstitute.sv.genotyping.GenotypingDepthModule.computeRefReadCounts(GenotypingDepthModule.java:295)
        at org.broadinstitute.sv.genotyping.GenotypingDepthModule.computeRefReadCounts(GenotypingDepthModule.java:245)
        at org.broadinstitute.sv.genotyping.GenotypingDepthModule.getReadCounts(GenotypingDepthModule.java:230)
        at org.broadinstitute.sv.genotyping.GenotypingDepthModule.getCnpReadCounts(GenotypingDepthModule.java:217)
        at org.broadinstitute.sv.genotyping.GenotypingDepthModule.genotypeCnp(GenotypingDepthModule.java:141)
        at org.broadinstitute.sv.genotyping.BoundaryRefinementAlgorithm.genotypeCnp(BoundaryRefinementAlgorithm.java:287)
        at org.broadinstitute.sv.genotyping.BoundaryRefinementAlgorithm.refineOneBoundary(BoundaryRefinementAlgorithm.java:633)
        at org.broadinstitute.sv.genotyping.BoundaryRefinementAlgorithm.refineBoundaryStep(BoundaryRefinementAlgorithm.java:558)
        at org.broadinstitute.sv.genotyping.BoundaryRefinementAlgorithm.refineBoundaries(BoundaryRefinementAlgorithm.java:536)
        at org.broadinstitute.sv.genotyping.BoundaryRefinementAlgorithm.processVariant(BoundaryRefinementAlgorithm.java:232)
        at org.broadinstitute.sv.genotyping.RefineCNVBoundaries.run(RefineCNVBoundaries.java:204)
        at org.broadinstitute.sv.commandline.CommandLineProgram.execute(CommandLineProgram.java:54)
        ... 5 more 
    INFO  09:24:14,364 QGraph - Writing incremental jobs reports... 
    INFO  09:24:14,365 QGraph - 4 Pend, 10 Run, 2 Fail, 27 Done 
    INFO  09:25:44,434 FunctionEdge - Done:  'java'  '-Xmx102400m'  '-XX:+UseParallelOldGC'  '-XX:ParallelGCThreads=4'  '-XX:GCTimeLimit=50'  '-XX:GCHeapFreeLimit=10'  '-Djava.io.tmpdir=/proj/yunligrp/users/minzhi/gs/gs_tempdir/svcnv_tmp'  '-cp' '/proj/yunligrp/users/minzhi/svtoolkit/lib/SVToolkit.jar:/proj/yunligrp/users/minzhi/svtoolkit/lib/gatk/GenomeAnalysisTK.jar:/proj/yunligrp/users/minzhi/svtoolkit/lib/gatk/Queue.jar'  '-cp' '/proj/yunligrp/users/minzhi/svtoolkit/lib/SVToolkit.jar:/proj/yunligrp/users/minzhi/svtoolkit/lib/gatk/GenomeAnalysisTK.jar:/proj/yunligrp/users/minzhi/svtoolkit/lib/gatk/Queue.jar'  'org.broadinstitute.sv.genotyping.RefineCNVBoundaries'  '-I' '/proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svcnv_freeze6-AA_chr16_standard_full_single_1-500000over1-500000/cnv_stage6/seq_chr16/seq_chr16.merged_headers.bam'  '-O' '/proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svcnv_freeze6-AA_chr16_standard_full_single_1-500000over1-500000/cnv_stage7/seq_chr16/P0003/seq_chr16.merged.brig.vcf'  '-R' '/proj/yunligrp/users/minzhi/gs/Homo_sapiens_assembly38/Homo_sapiens_assembly38.fasta'  '-md' '/proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svpre_freeze6-AA_chr16_standard_full_single_1-500000over1-500000_parallel_success/md_tempdir'  '-configFile' '/proj/yunligrp/users/minzhi/svtoolkit/conf/genstrip_parameters.txt' '-configFile' '/proj/yunligrp/users/minzhi/gs/Homo_sapiens_assembly38/Homo_sapiens_assembly38.gsparams.txt'  '-P' 'depth.readCountCacheIgnoreGenomeMask:true'  '-genomeMaskFile' '/proj/yunligrp/users/minzhi/gs/Homo_sapiens_assembly38/Homo_sapiens_assembly38.svmask.fasta' '-genomeMaskFile' '/proj/yunligrp/users/minzhi/gs/Homo_sapiens_assembly38/Homo_sapiens_assembly38.lcmask.fasta'  '-genderMapFile' '/proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/supporting_freeze6-AA_chr16/freeze6-AA_chr16_full_all-male_gender.map'  '-ploidyMapFile' 
'/proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/supporting_freeze6-AA_chr16/freeze6-AA_chr16_standard_ploidy.map'  '-vcf' '/proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svcnv_freeze6-AA_chr16_standard_full_single_1-500000over1-500000/cnv_stage4/seq_chr16/seq_chr16.merged.genotypes.vcf.gz'  '-site' '/proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svcnv_freeze6-AA_chr16_standard_full_single_1-500000over1-500000/cnv_stage7/seq_chr16/P0003.sites.list'  '-boundaryPrecision' '100'  '-minimumRefinedLength' '500'  '-maximumReferenceGapLength' '1000'  
    INFO  09:25:44,435 QGraph - Writing incremental jobs reports... 
    INFO  09:25:44,436 QGraph - 4 Pend, 9 Run, 2 Fail, 28 Done 
    

    In particular, the second error output contains this line:

    mInputFile=file:///proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svpre_freeze6-AA_chr16_standard_full_single_1-500000over1-500000_parallel_success/md_tempdir/rccache.bin mCurrentSequenceName=chr16; mCurrentPosition=500001
    

    But the content in my interval list is

    chr16:1-500000
    

    which does not include the position reported in "mCurrentPosition=500001". Is this related to the error?

    I then re-ran the SVCNVDiscovery pipeline in the same run directory. This time, the two errors were exactly the same as those in the question I posted yesterday. May I have your suggestions about this "Read count cache file" error? Thank you very much.

    Best regards,
    Wusheng
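
The mismatch described in the post above (an interval list of chr16:1-500000 versus mCurrentPosition=500001 in the log) can be checked mechanically. The following is a minimal sketch, assuming the usual 1-based, inclusive interval convention; `Interval`, `parse_interval` and `contains_position` are hypothetical helper names and are not part of Genome STRiP.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    contig: str
    start: int  # 1-based, inclusive (assumed convention)
    end: int    # 1-based, inclusive (assumed convention)


def parse_interval(text: str) -> Interval:
    """Parse a 'contig:start-end' string such as 'chr16:1-500000'."""
    contig, _, span = text.strip().rpartition(":")
    start, _, end = span.partition("-")
    return Interval(contig, int(start), int(end))


def contains_position(interval: Interval, contig: str, position: int) -> bool:
    """Return True if the position lies inside the interval on the same contig."""
    return interval.contig == contig and interval.start <= position <= interval.end


# Interval passed to SVPreprocess via -L, as reported in the thread.
preprocessed = parse_interval("chr16:1-500000")

print(contains_position(preprocessed, "chr16", 500000))  # True
print(contains_position(preprocessed, "chr16", 500001))  # False
```

Under these assumptions, the position in the log is one base past the end of the preprocessed interval. The sketch only confirms that; it does not explain why the reader requests that position.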

  • bhandsaker Member, Broadie ✭✭✭✭

    You say the content of your interval list is chr16:1-500000. But where are you applying this interval list? If you tried to preprocess just a portion of the genome by supplying -L to SVPreprocess, this may be the problem.

  • zhangwusheng Member
    Accepted Answer

    Hi @bhandsaker ,

    I am not sure I understand you correctly. Do you mean that when I set -L chr16:x0-y0 for SVPreprocess and the -intervalList for SVCNVDiscovery contains chr16:x1-y1, I should not get the error above if [x1, y1] ⊆ [x0, y0], but I will get it if [x1, y1] ⊄ [x0, y0], especially when y1 > y0?

    Here is what I set for -L and -intervalList in SVPreprocess and SVCNVDiscovery:

    SVPreprocess

    -L chr16:1-500000
    

    SVCNVDiscovery

        -intervalList ${gs_dir}/freeze6-AA_chr16/supporting_freeze6-AA_chr16/freeze6-AA_chr16_1-500000_interval.list \
    

    And the content of freeze6-AA_chr16_1-500000_interval.list is

    chr16:1-500000
    

    In short, I set -intervalList and -L to exactly the same range. I think this is an acceptable, normal way to set -L and -intervalList, right? But I got the error shown above.

    Anyway, prompted by your comment, I changed freeze6-AA_chr16_1-500000_interval.list to freeze6-AA_chr16_1-400000_interval.list, whose content is

    chr16:1-400000
    

    to ensure that the -intervalList range lies strictly inside the -L range.
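The containment condition discussed in this post can be sketched in Python. This is only an illustration of the interval arithmetic, not Genome STRiP code: `Interval`, `parse_interval`, `is_within`, and the `end_margin` parameter are all hypothetical names. The `end_margin` option reflects the observation in this thread that identical ranges still failed (the log shows mCurrentPosition=500001, one past the end of the interval); the thread does not establish how large a margin is actually needed.

```python
import re
from typing import NamedTuple


class Interval(NamedTuple):
    contig: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def parse_interval(text: str) -> Interval:
    """Parse a 'contig:start-end' string such as 'chr16:1-500000'."""
    match = re.fullmatch(r"\s*([^:\s]+):(\d+)-(\d+)\s*", text)
    if match is None:
        raise ValueError(f"not a contig:start-end interval: {text!r}")
    contig = match.group(1)
    start, end = int(match.group(2)), int(match.group(3))
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinates in {text!r}")
    return Interval(contig, start, end)


def is_within(inner: Interval, outer: Interval, end_margin: int = 0) -> bool:
    """Return True if `inner` lies inside `outer` on the same contig.

    `end_margin` (hypothetical) requires `inner` to stop at least that many
    bases before the end of `outer`.
    """
    return (
        inner.contig == outer.contig
        and inner.start >= outer.start
        and inner.end + end_margin <= outer.end
    )


if __name__ == "__main__":
    preprocessed = parse_interval("chr16:1-500000")  # range given to -L
    for candidate in ("chr16:1-500000", "chr16:1-400000"):
        ok = is_within(parse_interval(candidate), preprocessed, end_margin=1)
        print(candidate, "inside with margin:", ok)
```

With `end_margin=1`, the identical range chr16:1-500000 is rejected while chr16:1-400000 is accepted, which matches what was observed here. Whether this is the actual cause of the truncated-cache error is not confirmed.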

    Then I completed the SVCNVDiscovery successfully ... amazing...

    Does this mean that this is a bug? May I have your comment? I ask because the same situation happened before (https://gatkforums.broadinstitute.org/gatk/discussion/12870/error-when-running-cnvdiscovery-in-a-batch-like-way-read-count-cache-file-is-truncated#latest).

    Anyway, these two posts (this one and https://gatkforums.broadinstitute.org/gatk/discussion/12870/error-when-running-cnvdiscovery-in-a-batch-like-way-read-count-cache-file-is-truncated#latest) are both based on v2.00.1833, and I just found that there is a v2.00.1902. I will try the newer version and report whether the same thing happens. Thank you very much.

    Best regards,
    Wusheng

  • zhangwusheng Member

    Hi @bhandsaker ,

    Thank you very much, this is indeed very clear. The reason I always set -L (and I think I have to) is that the whole-genome cram file is too large. For example, chr16 has 96 million base pairs, and the file size of each sample's cram will be ~20GB. With 10000 samples, this would be a disaster for the memory system of our HPC. So what we do now is cut each sample's cram file down to a 1-5,000,000 bam file, which tremendously reduces the memory burden. On the other hand, we know the range of the region we are interested in (< 5000 base pairs). So I tried running SVPreprocess on 1-500K and running SVDiscovery on this 1-500K range to find a 5K region. Now, based on your comment and detailed explanation, I have three questions:

    First, when you say setting -L is "not recommended or supported behavior", do you mean that the main problem is that the search area in SVDiscovery might go out of range, or are there other serious problems that will affect the accuracy of SVCNVDiscovery?

    Second, in my case, where I know the potential range of the region I am interested in, would it be safe (keeping the accuracy and avoiding the "out of bound" error) to extend the range by 10 times its length on each side for the -L flag of SVPreprocess? For example, if I am interested in [x, y], I would set -L chr16:[x-10(y-x), y+10(y-x)]. Or what -L range would you suggest?
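The padded range proposed in the second question can be worked out with a short Python sketch. It only shows the arithmetic of [x-10(y-x), y+10(y-x)], clamped to valid coordinates; it says nothing about whether such a range is safe to use with SVPreprocess. The function name `padded_interval` and its parameters are hypothetical, and the coordinates in the usage example are made up for illustration.

```python
from typing import Optional


def padded_interval(
    contig: str,
    start: int,
    end: int,
    factor: int = 10,
    contig_length: Optional[int] = None,
) -> str:
    """Extend [start, end] by factor * (end - start) on each side.

    Coordinates are 1-based and inclusive. The result is clamped to 1 on the
    left and, if `contig_length` is given, to the contig length on the right.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    pad = factor * (end - start)
    lo = max(1, start - pad)
    hi = end + pad
    if contig_length is not None:
        hi = min(hi, contig_length)
    return f"{contig}:{lo}-{hi}"


if __name__ == "__main__":
    # Illustrative 5 kb region of interest; these coordinates are invented.
    print(padded_interval("chr16", 100000, 105000))
```

For a region of interest [100000, 105000], the padding is 10 × 5000 = 50000 on each side, giving chr16:50000-155000.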

    Third, I agree with you that this is not a bug. But this is a real need for users: as I mentioned above, whole-genome data (the whole cram) would be a disaster for the memory system, so allowing a search within SVPreprocess's -L range would be very helpful to some research projects. Is there any way to submit a user suggestion or request to Genome STRiP, such as this request: "CNV pipeline is always constrained to stay within the -intervalList argument"? If so, and if the Genome STRiP project works as open-source software, I would be very happy to contribute whatever I can to advance at least this request (I am familiar with high-level programming languages and am familiar with Java).

    Thank you very much.

    Best regards,
    Wusheng

  • bhandsaker Member, Broadie ✭✭✭✭

    The reason I can't recommend using -L in preprocessing is that we gather statistics on different properties of the sample / libraries during preprocessing, using all of the data genome-wide. If you use -L in preprocessing, then we are only gathering these statistics on one region of the genome, which may be non-representative. For example, if there is an aneuploidy on one chromosome, results will be skewed. I also don't know how much the loss of power (by sampling only a small fraction of the genome) will affect the results. To the extent I have looked at this, I have found that there can be considerable variation across a chromosome (so sampling just part of the chromosome can yield worrisomely different results). I wanted to try sampling like this to speed up preprocessing, but was not happy with the results.

    I would encourage you to use PlotGenotypingResults to see how clean the data looks, for example at some randomly chosen sites in the 1kb to 10kb length range.
