FilterSamReads: java.lang.OutOfMemoryError: Java heap space

How can I estimate the memory (-Xmx?G) that FilterSamReads needs? I always get this error:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.regex.Pattern$Start.<init>(Pattern.java:3450)
at java.util.regex.Pattern.compile(Pattern.java:1716)
at java.util.regex.Pattern.<init>(Pattern.java:1351)
at java.util.regex.Pattern.compile(Pattern.java:1028)
at java.lang.String.split(String.java:2380)
at java.lang.String.split(String.java:2422)
at htsjdk.samtools.filter.ReadNameFilter.<init>(ReadNameFilter.java:58)
at picard.sam.FilterSamReads.doWork(FilterSamReads.java:233)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:208)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)

Answers

  • shlee, Cambridge (Member, Broadie)

    Hi @Yingya,

    Taken together, the two documents below should help you determine how much memory is available on your system and how much memory the GATK Best Practices processes use. They do not cover FilterSamReads specifically, but they can serve as general guidelines for the memory that similar processes require. For example, I would expect FilterSamReads to have memory requirements similar to GATK's PrintReads.

    • To determine memory available on your system, take a look at instructions in section 2 of this document.
    • For memory use of GATK tools, see Blog#7249's CPU utilization chart. There is a link to an updated Intel white paper in Blog#8605 with a similar utilization chart.

    I hope these are helpful.
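
As a rough complement to the guidelines above: the stack trace fails inside the ReadNameFilter constructor while splitting strings, which suggests (but does not prove) that the read-name list is being loaded into the heap. The sketch below is a back-of-the-envelope estimate under that assumption. The class name, the method names, and the 100-byte per-entry overhead are all hypothetical placeholders, not values taken from Picard or htsjdk. Only `Runtime.getRuntime().maxMemory()` is a standard JVM call, and it reports the heap ceiling the current JVM was actually given.

```java
public class HeapCheck {
    // ASSUMPTION: fixed per-name overhead (object headers, hash-set entry, references).
    // This is a guessed placeholder, not a measured value for Picard/htsjdk.
    static final long ASSUMED_OVERHEAD_BYTES_PER_NAME = 100L;

    // Rough heap estimate for holding a read-name list in memory.
    // ASSUMPTION: 2 bytes per character (UTF-16 strings) plus the fixed overhead above.
    static long estimateReadListHeapBytes(long readNameCount, int averageNameLength) {
        return readNameCount * (2L * averageNameLength + ASSUMED_OVERHEAD_BYTES_PER_NAME);
    }

    // Round a byte count up to whole GiB and format it as a JVM -Xmx flag (at least 1g).
    static String toXmxFlag(long bytes) {
        long oneGib = 1L << 30;
        long gib = (bytes + oneGib - 1) / oneGib;
        return "-Xmx" + Math.max(1L, gib) + "g";
    }

    public static void main(String[] args) {
        // Heap ceiling of the JVM running this program (reflects any -Xmx passed to it).
        long maxHeapBytes = Runtime.getRuntime().maxMemory();
        System.out.printf("Current JVM max heap: %.2f GiB%n", maxHeapBytes / (double) (1L << 30));

        // Hypothetical example: 10 million read names averaging 40 characters each.
        long estimate = estimateReadListHeapBytes(10_000_000L, 40);
        System.out.println("Estimated bytes for read names: " + estimate);
        System.out.println("Suggested starting flag: " + toXmxFlag(estimate));
    }
}
```

Treat the result as a lower bound: the tool also needs heap for reading and writing the BAM records, so it is safer to start above the suggested flag and confirm that the machine actually has that much free memory.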