
Raising the heap size in the CNV discovery pipeline


I am trying to run the CNV discovery pipeline from the newest release of GenomeSTRiP on ~400 BAM files, but I keep getting the following error at various points in the pipeline, shown here for stage 6:

ERROR 14:30:20,951 FunctionEdge - Error: 'java' '-Xmx2048m' '-XX:+UseParallelOldGC' '-XX:ParallelGCThreads=4' '-XX:GCTimeLimit=50' '-XX:GCHeapFreeLimit=10' '' '-cp' '/u/home/a/alden/svtoolkit/lib/SVToolkit.jar:/u/home/a/alden/svtoolkit/lib/gatk/GenomeAnalysisTK.jar:/u/home/a/alden/svtoolkit/lib/gatk/Queue.jar' '-cp' '/u/home/a/alden/svtoolkit/lib/SVToolkit.jar:/u/home/a/alden/svtoolkit/lib/gatk/GenomeAnalysisTK.jar:/u/home/a/alden/svtoolkit/lib/gatk/Queue.jar' '' '-I' '/u/home/a/alden/eeskin2/bipolar_sv/svtoolkit/cleaned_scripts/cnvdiscovery_batch1/bam_headers/merged_headers.bam' '-O' '/u/home/a/alden/eeskin2/bipolar_sv/svtoolkit/cleaned_scripts/cnvdiscovery_batch1/cnv_stage6/seq_chr2/seq_chr2.merged_headers.bam' '-L' 'NONE' '-sample' '/u/home/a/alden/eeskin2/bipolar_sv/svtoolkit/cleaned_scripts/cnvdiscovery_batch1/cnv_stage5/eval/DiscoverySamples.list'
ERROR 14:30:20,960 FunctionEdge - Contents of /u/home/a/alden/eeskin2/bipolar_sv/svtoolkit/cleaned_scripts/cnvdiscovery_batch1/cnv_stage6/seq_chr2/logs/CNVDiscoveryStage6-1.out:
INFO 14:29:38,959 HelpFormatter - ---------------------------------------------------------
INFO 14:29:38,962 HelpFormatter - Program Name:
INFO 14:29:38,966 HelpFormatter - Program Args: -I /u/home/a/alden/eeskin2/bipolar_sv/svtoolkit/cleaned_scripts/cnvdiscovery_batch1/bam_headers/merged_headers.bam -O /u/home/a/alden/eeskin2/bipolar_sv/svtoolkit/cleaned_scripts/cnvdiscovery_batch1/cnv_stage6/seq_chr2/seq_chr2.merged_headers.bam -L NONE -sample /u/home/a/alden/eeskin2/bipolar_sv/svtoolkit/cleaned_scripts/cnvdiscovery_batch1/cnv_stage5/eval/DiscoverySamples.list
INFO 14:29:38,971 HelpFormatter - Executing as [email protected] on Linux 2.6.32-573.26.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_77-b03.
INFO 14:29:38,972 HelpFormatter - Date/Time: 2016/10/31 14:29:38
INFO 14:29:38,972 HelpFormatter - ---------------------------------------------------------
INFO 14:29:38,972 HelpFormatter - ---------------------------------------------------------
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(
at java.lang.AbstractStringBuilder.expandCapacity(
at java.lang.AbstractStringBuilder.ensureCapacityInternal(
at java.lang.AbstractStringBuilder.append(
at java.lang.StringBuilder.append(
at htsjdk.samtools.SAMTextHeaderCodec.advanceLine(
at htsjdk.samtools.SAMTextHeaderCodec.decode(
at htsjdk.samtools.SAMFileHeader.clone(
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(

I think the obvious fix is to raise the maximum heap size with the -Xmx flag, which I have set in the Queue script I am using to run the CNV discovery portion of GenomeSTRiP (I have attached my *.sh file to this post as a text file; it was basically lifted from the installtest example).

However, this value does not seem to be passed on to the downstream java commands launched by the pipeline (the log above shows -Xmx2048m, i.e. 2 GB, instead of the 4 GB I set).

How can I raise this value?
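For reference, a sketch of the kind of invocation I mean (paths and values are placeholders, not my actual script). My understanding is that the -Xmx on the outer java process only sizes Queue itself, and that Queue's -memLimit argument (in GB) is what sets the heap for the java jobs it spawns — but I'm not certain that's the intended mechanism here:

```shell
# Sketch only: raising the per-job memory limit when launching the CNV
# discovery QScript. ${SV_DIR} and the trailing arguments are placeholders.
# -Xmx4g here sizes only the Queue driver process; -memLimit 4 is (as I
# understand it) what Queue uses to build the -Xmx for each spawned job.
java -Xmx4g -cp ${SV_DIR}/lib/SVToolkit.jar:${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar:${SV_DIR}/lib/gatk/Queue.jar \
    org.broadinstitute.gatk.queue.QCommandLine \
    -S ${SV_DIR}/qscript/discovery/cnv/CNVDiscoveryPipeline.q \
    -S ${SV_DIR}/qscript/SVQScript.q \
    -memLimit 4 \
    ...
```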

Thanks so much for your attention,



  • alden (Member)

    I know that GenomeSTRiP sets sensible default values for -Xmx at various stages of the pipeline. Could this error be related to my *.bam headers being unusually large?

    They are about 1200 lines for each sample.

    I know that if I run the sample set broken up into smaller batches, the entire pipeline runs fine. I could remove all the extraneous @PG header lines, but I wanted to avoid rewriting 400+ WGS samples...

    Is there any easy way to change the sensible defaults?
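On the header-size point, a minimal sketch of stripping @PG lines (the sample header and filenames below are made up for illustration). With real BAMs, this text-filtering step would sit between `samtools view -H` and `samtools reheader`, which rewrites only the header rather than the whole file:

```shell
# With samtools this would look roughly like:
#   samtools view -H sample.bam | grep -v '^@PG' > header.clean.sam
#   samtools reheader header.clean.sam sample.bam > sample.reheadered.bam
# The filtering step itself, on a toy header:
printf '@HD\tVN:1.5\n@SQ\tSN:chr1\tLN:248956422\n@PG\tID:bwa\tPN:bwa\n@PG\tID:MarkDuplicates\n' > header.sam
grep -v '^@PG' header.sam > header.clean.sam   # keep @HD, @SQ; drop @PG lines
cat header.clean.sam
```

Note that `samtools reheader` still has to copy the BAM, so for 400+ WGS files this is only worth it if the oversized headers really are the cause.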


