Raising the heap size in the CNV discovery pipeline

Hi,

I am trying to run the CNV discovery pipeline using the newest release of genomestrip on ~400 BAM files, but I keep getting the following error at various stages of the pipeline, shown here for stage 6:

ERROR 14:30:20,951 FunctionEdge - Error: 'java' '-Xmx2048m' '-XX:+UseParallelOldGC' '-XX:ParallelGCThreads=4' '-XX:GCTimeLimit=50' '-XX:GCHeapFreeLimit=10' '-Djava.io.tmpdir=/u/nobackup/eeskin2/alden/bipolar_sv/svtoolkit/cleaned_scripts/.queue/tmp' '-cp' '/u/home/a/alden/svtoolkit/lib/SVToolkit.jar:/u/home/a/alden/svtoolkit/lib/gatk/GenomeAnalysisTK.jar:/u/home/a/alden/svtoolkit/lib/gatk/Queue.jar' '-cp' '/u/home/a/alden/svtoolkit/lib/SVToolkit.jar:/u/home/a/alden/svtoolkit/lib/gatk/GenomeAnalysisTK.jar:/u/home/a/alden/svtoolkit/lib/gatk/Queue.jar' 'org.broadinstitute.sv.apps.ExtractBAMSubset' '-I' '/u/home/a/alden/eeskin2/bipolar_sv/svtoolkit/cleaned_scripts/cnvdiscovery_batch1/bam_headers/merged_headers.bam' '-O' '/u/home/a/alden/eeskin2/bipolar_sv/svtoolkit/cleaned_scripts/cnvdiscovery_batch1/cnv_stage6/seq_chr2/seq_chr2.merged_headers.bam' '-L' 'NONE' '-sample' '/u/home/a/alden/eeskin2/bipolar_sv/svtoolkit/cleaned_scripts/cnvdiscovery_batch1/cnv_stage5/eval/DiscoverySamples.list'
ERROR 14:30:20,960 FunctionEdge - Contents of /u/home/a/alden/eeskin2/bipolar_sv/svtoolkit/cleaned_scripts/cnvdiscovery_batch1/cnv_stage6/seq_chr2/logs/CNVDiscoveryStage6-1.out:
INFO 14:29:38,959 HelpFormatter - ---------------------------------------------------------
INFO 14:29:38,962 HelpFormatter - Program Name: org.broadinstitute.sv.apps.ExtractBAMSubset
INFO 14:29:38,966 HelpFormatter - Program Args: -I /u/home/a/alden/eeskin2/bipolar_sv/svtoolkit/cleaned_scripts/cnvdiscovery_batch1/bam_headers/merged_headers.bam -O /u/home/a/alden/eeskin2/bipolar_sv/svtoolkit/cleaned_scripts/cnvdiscovery_batch1/cnv_stage6/seq_chr2/seq_chr2.merged_headers.bam -L NONE -sample /u/home/a/alden/eeskin2/bipolar_sv/svtoolkit/cleaned_scripts/cnvdiscovery_batch1/cnv_stage5/eval/DiscoverySamples.list
INFO 14:29:38,971 HelpFormatter - Executing as [email protected] on Linux 2.6.32-573.26.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_77-b03.
INFO 14:29:38,972 HelpFormatter - Date/Time: 2016/10/31 14:29:38
INFO 14:29:38,972 HelpFormatter - ---------------------------------------------------------
INFO 14:29:38,972 HelpFormatter - ---------------------------------------------------------
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3332)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:421)
at java.lang.StringBuilder.append(StringBuilder.java:136)
at htsjdk.samtools.SAMTextHeaderCodec.advanceLine(SAMTextHeaderCodec.java:131)
at htsjdk.samtools.SAMTextHeaderCodec.decode(SAMTextHeaderCodec.java:86)
at htsjdk.samtools.SAMFileHeader.clone(SAMFileHeader.java:355)
at org.broadinstitute.sv.util.sam.SAMUtils.filterHeaderToSampleSet(SAMUtils.java:151)
at org.broadinstitute.sv.util.sam.SAMUtils.getMergedSAMFileHeader(SAMUtils.java:89)
at org.broadinstitute.sv.apps.ExtractBAMSubset.run(ExtractBAMSubset.java:86)
at org.broadinstitute.sv.commandline.CommandLineProgram.execute(CommandLineProgram.java:54)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
at org.broadinstitute.sv.commandline.CommandLineProgram.runAndReturnResult(CommandLineProgram.java:29)
at org.broadinstitute.sv.commandline.CommandLineProgram.run(CommandLineProgram.java:25)
at org.broadinstitute.sv.apps.ExtractBAMSubset.main(ExtractBAMSubset.java:56)
)

I think the obvious fix is to raise the maximum heap size using the -Xmx flag, which I have set in the queue script that I am using to run the CNV discovery portion of genomestrip (I have attached my *.sh file to this post as a text file; it was basically pilfered from the installtest example).

However, this value does not seem to be passed on to the downstream java commands launched by the pipeline (I notice in the output that it is only 2g instead of 4g).
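To confirm which heap limit a given job actually received, the -Xmx value can be read straight out of the command line that the pipeline logs. The snippet below is only a minimal sketch: the helper name `heap_limit_mib` is made up for illustration, and the sample string is shortened from the FunctionEdge error line above.

```python
import re

def heap_limit_mib(command_line):
    """Return the -Xmx value in MiB from a logged java command line, or None if absent.

    Hypothetical helper for illustration; not part of genomestrip or Queue.
    """
    match = re.search(r"-Xmx(\d+)([kKmMgG]?)", command_line)
    if match is None:
        return None
    value = int(match.group(1))
    unit = match.group(2).lower()
    # The JVM reads a bare number as bytes; k, m and g are binary multiples.
    factor = {"": 1 / (1024 * 1024), "k": 1 / 1024, "m": 1, "g": 1024}[unit]
    return value * factor

# Shortened copy of the command reported in the error above.
logged = "'java' '-Xmx2048m' '-XX:+UseParallelOldGC' '-XX:ParallelGCThreads=4'"
print(heap_limit_mib(logged))  # prints 2048
```

Running this over each stage's log would show which jobs still use the 2g limit rather than the 4g requested in the wrapper script.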

How can I raise this value?

Thanks so much for your attention,

alden

Answers

  • alden Member

    I know that genomestrip sets sensible values for Xmx at various stages of the pipeline. I wonder if this error could be related to the fact that my *.bam headers are somewhat absurdly large?

    They are about 1200 lines for each sample.

    I know that if I run the sample set broken up into smaller batches, the entire pipeline seems to run fine. I could remove all the extraneous @PG header lines, but I wanted to avoid rewriting 400+ WGS samples...
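For reference, a minimal sketch of what removing the @PG lines would involve, working on the header text only (as printed by `samtools view -H`). The function names and the tiny example header are hypothetical. Whether genomestrip needs any of the @PG records is not something this sketch checks, and reads that carry a PG tag would then point at program IDs that are no longer in the header.

```python
from collections import Counter

def count_header_records(header_text):
    """Count SAM header lines by record type (@HD, @SQ, @RG, @PG, @CO)."""
    return Counter(
        line.split("\t", 1)[0]
        for line in header_text.splitlines()
        if line.startswith("@")
    )

def strip_pg_lines(header_text):
    """Return the header text with every @PG line removed."""
    kept = [line for line in header_text.splitlines() if not line.startswith("@PG")]
    return "".join(line + "\n" for line in kept)

# Hypothetical miniature header; real ones here are about 1200 lines per sample.
example = (
    "@HD\tVN:1.4\tSO:coordinate\n"
    "@SQ\tSN:chr2\tLN:1000\n"
    "@RG\tID:rg1\tSM:sample1\n"
    "@PG\tID:bwa\tPN:bwa\n"
    "@PG\tID:gatk\tPN:GATK\n"
)
print(count_header_records(example)["@PG"])  # prints 2
print(strip_pg_lines(example), end="")
```

The cleaned header could then be applied with `samtools reheader`, which still writes a new BAM file per sample, so it does not remove the cost of touching all 400+ files.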

    Is there any easy way to change the sensible defaults?

    Thanks,

    alden
