
Raising the heap size in the CNV discovery pipeline


I am trying to run the CNV discovery pipeline using the newest release of GenomeSTRiP on ~400 BAM files, but I keep getting the following error at various points in the pipeline, shown here for stage 6:

ERROR 14:30:20,951 FunctionEdge - Error: 'java' '-Xmx2048m' '-XX:+UseParallelOldGC' '-XX:ParallelGCThreads=4' '-XX:GCTimeLimit=50' '-XX:GCHeapFreeLimit=10' '' '-cp' '/u/home/a/alden/svtoolkit/lib/SVToolkit.jar:/u/home/a/alden/svtoolkit/lib/gatk/GenomeAnalysisTK.jar:/u/home/a/alden/svtoolkit/lib/gatk/Queue.jar' '-cp' '/u/home/a/alden/svtoolkit/lib/SVToolkit.jar:/u/home/a/alden/svtoolkit/lib/gatk/GenomeAnalysisTK.jar:/u/home/a/alden/svtoolkit/lib/gatk/Queue.jar' '' '-I' '/u/home/a/alden/eeskin2/bipolar_sv/svtoolkit/cleaned_scripts/cnvdiscovery_batch1/bam_headers/merged_headers.bam' '-O' '/u/home/a/alden/eeskin2/bipolar_sv/svtoolkit/cleaned_scripts/cnvdiscovery_batch1/cnv_stage6/seq_chr2/seq_chr2.merged_headers.bam' '-L' 'NONE' '-sample' '/u/home/a/alden/eeskin2/bipolar_sv/svtoolkit/cleaned_scripts/cnvdiscovery_batch1/cnv_stage5/eval/DiscoverySamples.list'
ERROR 14:30:20,960 FunctionEdge - Contents of /u/home/a/alden/eeskin2/bipolar_sv/svtoolkit/cleaned_scripts/cnvdiscovery_batch1/cnv_stage6/seq_chr2/logs/CNVDiscoveryStage6-1.out:
INFO 14:29:38,959 HelpFormatter - ---------------------------------------------------------
INFO 14:29:38,962 HelpFormatter - Program Name:
INFO 14:29:38,966 HelpFormatter - Program Args: -I /u/home/a/alden/eeskin2/bipolar_sv/svtoolkit/cleaned_scripts/cnvdiscovery_batch1/bam_headers/merged_headers.bam -O /u/home/a/alden/eeskin2/bipolar_sv/svtoolkit/cleaned_scripts/cnvdiscovery_batch1/cnv_stage6/seq_chr2/seq_chr2.merged_headers.bam -L NONE -sample /u/home/a/alden/eeskin2/bipolar_sv/svtoolkit/cleaned_scripts/cnvdiscovery_batch1/cnv_stage5/eval/DiscoverySamples.list
INFO 14:29:38,971 HelpFormatter - Executing as alden@n7261 on Linux 2.6.32-573.26.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_77-b03.
INFO 14:29:38,972 HelpFormatter - Date/Time: 2016/10/31 14:29:38
INFO 14:29:38,972 HelpFormatter - ---------------------------------------------------------
INFO 14:29:38,972 HelpFormatter - ---------------------------------------------------------
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(
at java.lang.AbstractStringBuilder.expandCapacity(
at java.lang.AbstractStringBuilder.ensureCapacityInternal(
at java.lang.AbstractStringBuilder.append(
at java.lang.StringBuilder.append(
at htsjdk.samtools.SAMTextHeaderCodec.advanceLine(
at htsjdk.samtools.SAMTextHeaderCodec.decode(
at htsjdk.samtools.SAMFileHeader.clone(
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(

I think the obvious fix is to raise the maximum heap size using the -Xmx flag, which I have set in the Queue script that I use to run the CNV discovery portion of GenomeSTRiP (I have attached my *.sh file to this post as a text file; it was basically adapted from the installtest example).

However, this value does not seem to be propagated to the downstream Java commands launched by the pipeline: the log above shows -Xmx2048m (2 GB) rather than the 4 GB I requested.

How can I raise this value?
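For reference, here is a hedged sketch of the kind of Queue invocation I mean, adapted from the installtest example. The paths are placeholders, not my exact script, and I am assuming -memLimit is the Queue option that controls the per-job Java memory limit (in GB):

```shell
# Hypothetical sketch: SV_DIR and the classpath layout follow the
# installtest example; -memLimit is my assumption about how Queue
# passes a heap limit down to the jobs it spawns.
classpath="${SV_DIR}/lib/SVToolkit.jar:${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar:${SV_DIR}/lib/gatk/Queue.jar"

java -Xmx4g -cp "${classpath}" \
    org.broadinstitute.gatk.queue.QCommandLine \
    -S "${SV_DIR}/qscript/discovery/cnv/CNVDiscoveryPipeline.q" \
    -S "${SV_DIR}/qscript/SVQScript.q" \
    -gatk "${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar" \
    -cp "${classpath}" \
    -memLimit 4 \
    -run
```

Note that the outer -Xmx4g only governs the Queue driver process itself; whatever the stage scripts set for their child jobs appears to take precedence, which is why I am asking how to override it.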

Thanks so much for your attention,



  • I know that GenomeSTRiP sets its own default -Xmx values at various stages of the pipeline. I wonder if this error could be related to the fact that my *.bam headers are unusually large?

    They are about 1200 lines for each sample.

    If I break the sample set into smaller batches, the entire pipeline seems to run fine. I could remove all the extraneous @PG header lines, but I wanted to avoid rewriting 400+ WGS samples...

    Is there an easy way to override these default heap sizes?
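    If stripping the headers does turn out to be necessary, my understanding is that samtools reheader can swap in a slimmed-down header without rewriting the read data, which would be much cheaper than a full rewrite of 400+ BAMs. A hedged sketch (filenames are placeholders, and I have not verified this against what GenomeSTRiP expects):

```shell
# Hedged sketch: replace a BAM's header with a copy that drops the @PG
# lines, leaving the alignments untouched. in.bam / out.bam are
# placeholder names.
#
#   samtools view -H in.bam | grep -v '^@PG' > slim_header.sam
#   samtools reheader slim_header.sam in.bam > out.bam
#
# The filtering step on its own, demonstrated on a toy header:
printf '@HD\tVN:1.5\n@SQ\tSN:chr1\tLN:248956422\n@PG\tID:bwa\n@PG\tID:MarkDuplicates\n' \
  | grep -v '^@PG'
# prints only the @HD and @SQ lines
```

    One caveat: @PG lines carry the provenance chain (PP tags point at earlier @PG IDs), so dropping them loses that history; it should not affect the reads themselves.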


