A different "java.lang.OutOfMemoryError: Java heap space" near the end of SVPreprocess

Dear Genome STRiP users,

I have nearly completed SVPreprocess for all 10686 samples:

INFO  01:17:45,002 QGraph - 4 Pend, 2 Run, 0 Fail, 32076 Done

However, I ran into a "java.lang.OutOfMemoryError: Java heap space" error that is similar to, but not the same as, the one I encountered before:

ERROR 01:19:31,326 FunctionEdge - Error:  'java'  '-Xmx2048m'  '-XX:+UseParallelOldGC'  '-XX:ParallelGCThreads=4'  '-XX:GCTimeLimit=50'  '-XX:GCHeapFreeLimit=10'  '-Djava.io.tmpdir=/proj/yunligrp/users/minzhi/gs/gs_tempdir/svpre_tmp'  '-cp' '/proj/yunligrp/users/minzhi/svtoolkit/lib/SVToolkit.jar:/proj/yunligrp/users/minzhi/svtoolkit/lib/gatk/GenomeAnalysisTK.jar:/proj/yunligrp/users/minzhi/svtoolkit/lib/gatk/Queue.jar'  '-cp' '/proj/yunligrp/users/minzhi/svtoolkit/lib/SVToolkit.jar:/proj/yunligrp/users/minzhi/svtoolkit/lib/gatk/GenomeAnalysisTK.jar:/proj/yunligrp/users/minzhi/svtoolkit/lib/gatk/Queue.jar'  'org.broadinstitute.sv.apps.ComputeDepthProfiles'  '-O' '/proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svpre_freeze6-AA_chr16_standard_full_single_1-500000over1-500000/md_tempdir/profiles_100Kb/profile_seq_chr16_100000.dat.gz'  '-I' '/proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svpre_freeze6-AA_chr16_standard_full_single_1-500000over1-500000/md_tempdir/headers.bam'  '-configFile' '/proj/yunligrp/users/minzhi/svtoolkit/conf/genstrip_parameters.txt' '-configFile' '/proj/yunligrp/users/minzhi/gs/Homo_sapiens_assembly38/Homo_sapiens_assembly38.gsparams.txt'  '-R' '/proj/yunligrp/users/minzhi/gs/Homo_sapiens_assembly38/Homo_sapiens_assembly38.fasta'  '-L' 'chr16:1-500000'  '-genomeMaskFile' '/proj/yunligrp/users/minzhi/gs/Homo_sapiens_assembly38/Homo_sapiens_assembly38.svmask.fasta'  '-md' '/proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svpre_freeze6-AA_chr16_standard_full_single_1-500000over1-500000/md_tempdir'  '-profileBinSize' '100000'  '-maximumReferenceGapLength' '10000'  
ERROR 01:19:31,333 FunctionEdge - Contents of /proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svpre_freeze6-AA_chr16_standard_full_single_1-500000over1-500000/logs/SVPreprocess-32077.out:
INFO  01:18:49,973 HelpFormatter - ------------------------------------------------------------- 
INFO  01:18:49,976 HelpFormatter - Program Name: org.broadinstitute.sv.apps.ComputeDepthProfiles 
INFO  01:18:49,979 HelpFormatter - Program Args: -O /proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svpre_freeze6-AA_chr16_standard_full_single_1-500000over1-500000/md_tempdir/profiles_100Kb/profile_seq_chr16_100000.dat.gz -I /proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svpre_freeze6-AA_chr16_standard_full_single_1-500000over1-500000/md_tempdir/headers.bam -configFile /proj/yunligrp/users/minzhi/svtoolkit/conf/genstrip_parameters.txt -configFile /proj/yunligrp/users/minzhi/gs/Homo_sapiens_assembly38/Homo_sapiens_assembly38.gsparams.txt -R /proj/yunligrp/users/minzhi/gs/Homo_sapiens_assembly38/Homo_sapiens_assembly38.fasta -L chr16:1-500000 -genomeMaskFile /proj/yunligrp/users/minzhi/gs/Homo_sapiens_assembly38/Homo_sapiens_assembly38.svmask.fasta -md /proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svpre_freeze6-AA_chr16_standard_full_single_1-500000over1-500000/md_tempdir -profileBinSize 100000 -maximumReferenceGapLength 10000 
INFO  01:18:49,983 HelpFormatter - Executing as [email protected] on Linux 3.10.0-957.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_191-b12. 
INFO  01:18:49,984 HelpFormatter - Date/Time: 2019/03/01 01:18:49 
INFO  01:18:49,984 HelpFormatter - ------------------------------------------------------------- 
INFO  01:18:49,984 HelpFormatter - ------------------------------------------------------------- 
INFO  01:18:49,999 ComputeDepthProfiles - Opening reference sequence ... 
INFO  01:18:50,002 ComputeDepthProfiles - Opened reference sequence. 
INFO  01:18:50,003 ComputeDepthProfiles - Opening genome mask ... 
INFO  01:18:50,005 ComputeDepthProfiles - Opened genome mask. 
INFO  01:18:50,007 MetaData - Opening metadata ...  
INFO  01:18:50,007 MetaData - Adding metadata location /proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svpre_freeze6-AA_chr16_standard_full_single_1-500000over1-500000/md_tempdir ... 
INFO  01:18:50,018 MetaData - Opened metadata. 
INFO  01:18:50,018 ComputeDepthProfiles - Opened metadata. 
INFO  01:18:50,018 ComputeDepthProfiles - Initializing input data set ... 
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:3332)
    at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
    at java.lang.StringBuilder.append(StringBuilder.java:136)
    at htsjdk.samtools.SAMTextHeaderCodec.advanceLine(SAMTextHeaderCodec.java:139)
    at htsjdk.samtools.SAMTextHeaderCodec.decode(SAMTextHeaderCodec.java:94)
    at htsjdk.samtools.BAMFileReader.readHeader(BAMFileReader.java:655)
    at htsjdk.samtools.BAMFileReader.<init>(BAMFileReader.java:298)
    at htsjdk.samtools.BAMFileReader.<init>(BAMFileReader.java:176)
    at htsjdk.samtools.SamReaderFactory$SamReaderFactoryImpl.open(SamReaderFactory.java:376)
    at htsjdk.samtools.SamReaderFactory$SamReaderFactoryImpl.open(SamReaderFactory.java:202)
    at org.broadinstitute.sv.dataset.SAMFileLocation.createSamFileReader(SAMFileLocation.java:97)
    at org.broadinstitute.sv.dataset.SAMLocation.createSamFileReader(SAMLocation.java:41)
    at org.broadinstitute.sv.dataset.DataSet.initInputFile(DataSet.java:138)
    at org.broadinstitute.sv.dataset.DataSet.initialize(DataSet.java:128)
    at org.broadinstitute.sv.apps.ComputeDepthProfiles.initDataSet(ComputeDepthProfiles.java:263)
    at org.broadinstitute.sv.apps.ComputeDepthProfiles.run(ComputeDepthProfiles.java:141)
    at org.broadinstitute.sv.commandline.CommandLineProgram.execute(CommandLineProgram.java:54)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
    at org.broadinstitute.sv.commandline.CommandLineProgram.runAndReturnResult(CommandLineProgram.java:29)
    at org.broadinstitute.sv.commandline.CommandLineProgram.run(CommandLineProgram.java:25)
    at org.broadinstitute.sv.apps.ComputeDepthProfiles.main(ComputeDepthProfiles.java:109) 

After this error, the log shows:

INFO  01:20:13,358 QGraph - 4 Pend, 1 Run, 1 Fail, 32076 Done 

Then the same error occurred again in another job (CallSampleGender), rather than the run exiting directly:

ERROR 01:20:43,371 FunctionEdge - Error:  'java'  '-Xmx2048m'  '-XX:+UseParallelOldGC'  '-XX:ParallelGCThreads=4'  '-XX:GCTimeLimit=50'  '-XX:GCHeapFreeLimit=10'  '-Djava.io.tmpdir=/proj/yunligrp/users/minzhi/gs/gs_tempdir/svpre_tmp'  '-cp' '/proj/yunligrp/users/minzhi/svtoolkit/lib/SVToolkit.jar:/proj/yunligrp/users/minzhi/svtoolkit/lib/gatk/GenomeAnalysisTK.jar:/proj/yunligrp/users/minzhi/svtoolkit/lib/gatk/Queue.jar'  '-cp' '/proj/yunligrp/users/minzhi/svtoolkit/lib/SVToolkit.jar:/proj/yunligrp/users/minzhi/svtoolkit/lib/gatk/GenomeAnalysisTK.jar:/proj/yunligrp/users/minzhi/svtoolkit/lib/gatk/Queue.jar'  'org.broadinstitute.sv.apps.CallSampleGender'  '-O' '/proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svpre_freeze6-AA_chr16_standard_full_single_1-500000over1-500000/md_tempdir/sample_gender.report.txt'  '-I' '/proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svpre_freeze6-AA_chr16_standard_full_single_1-500000over1-500000/md_tempdir/headers.bam'  '-configFile' '/proj/yunligrp/users/minzhi/svtoolkit/conf/genstrip_parameters.txt' '-configFile' '/proj/yunligrp/users/minzhi/gs/Homo_sapiens_assembly38/Homo_sapiens_assembly38.gsparams.txt'  '-R' '/proj/yunligrp/users/minzhi/gs/Homo_sapiens_assembly38/Homo_sapiens_assembly38.fasta'  '-md' '/proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svpre_freeze6-AA_chr16_standard_full_single_1-500000over1-500000/md_tempdir'  '-genderBedFile' '/proj/yunligrp/users/minzhi/gs/Homo_sapiens_assembly38/Homo_sapiens_assembly38.gendermask.bed'  '-genomeMaskFile' '/proj/yunligrp/users/minzhi/gs/Homo_sapiens_assembly38/Homo_sapiens_assembly38.svmask.fasta'  '-L' 'chr16:1-500000'  '-ploidyMapFile' '/proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/supporting_freeze6-AA_chr16/freeze6-AA_chr16_standard_ploidy.map'  
ERROR 01:20:43,375 FunctionEdge - Contents of /proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svpre_freeze6-AA_chr16_standard_full_single_1-500000over1-500000/logs/SVPreprocess-32080.out:
INFO  01:19:29,576 HelpFormatter - --------------------------------------------------------- 
INFO  01:19:29,580 HelpFormatter - Program Name: org.broadinstitute.sv.apps.CallSampleGender 
INFO  01:19:29,587 HelpFormatter - Program Args: -O /proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svpre_freeze6-AA_chr16_standard_full_single_1-500000over1-500000/md_tempdir/sample_gender.report.txt -I /proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svpre_freeze6-AA_chr16_standard_full_single_1-500000over1-500000/md_tempdir/headers.bam -configFile /proj/yunligrp/users/minzhi/svtoolkit/conf/genstrip_parameters.txt -configFile /proj/yunligrp/users/minzhi/gs/Homo_sapiens_assembly38/Homo_sapiens_assembly38.gsparams.txt -R /proj/yunligrp/users/minzhi/gs/Homo_sapiens_assembly38/Homo_sapiens_assembly38.fasta -md /proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svpre_freeze6-AA_chr16_standard_full_single_1-500000over1-500000/md_tempdir -genderBedFile /proj/yunligrp/users/minzhi/gs/Homo_sapiens_assembly38/Homo_sapiens_assembly38.gendermask.bed -genomeMaskFile /proj/yunligrp/users/minzhi/gs/Homo_sapiens_assembly38/Homo_sapiens_assembly38.svmask.fasta -L chr16:1-500000 -ploidyMapFile /proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/supporting_freeze6-AA_chr16/freeze6-AA_chr16_standard_ploidy.map 
INFO  01:19:29,594 HelpFormatter - Executing as [email protected] on Linux 3.10.0-957.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_191-b12. 
INFO  01:19:29,594 HelpFormatter - Date/Time: 2019/03/01 01:19:29 
INFO  01:19:29,594 HelpFormatter - --------------------------------------------------------- 
INFO  01:19:29,595 HelpFormatter - --------------------------------------------------------- 
INFO  01:19:29,595 CallSampleGender - Opening reference sequence ... 
INFO  01:19:29,598 CallSampleGender - Opened reference sequence. 
INFO  01:19:29,609 MetaData - Opening metadata ...  
INFO  01:19:29,610 MetaData - Adding metadata location /proj/yunligrp/users/minzhi/gs/freeze6-AA_chr16/svpre_freeze6-AA_chr16_standard_full_single_1-500000over1-500000/md_tempdir ... 
INFO  01:19:29,626 MetaData - Opened metadata. 
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:3332)
    at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
    at java.lang.StringBuilder.append(StringBuilder.java:136)
    at htsjdk.samtools.SAMTextHeaderCodec.advanceLine(SAMTextHeaderCodec.java:139)
    at htsjdk.samtools.SAMTextHeaderCodec.decode(SAMTextHeaderCodec.java:94)
    at htsjdk.samtools.BAMFileReader.readHeader(BAMFileReader.java:655)
    at htsjdk.samtools.BAMFileReader.<init>(BAMFileReader.java:298)
    at htsjdk.samtools.BAMFileReader.<init>(BAMFileReader.java:176)
    at htsjdk.samtools.SamReaderFactory$SamReaderFactoryImpl.open(SamReaderFactory.java:376)
    at htsjdk.samtools.SamReaderFactory$SamReaderFactoryImpl.open(SamReaderFactory.java:202)
    at org.broadinstitute.sv.dataset.SAMFileLocation.createSamFileReader(SAMFileLocation.java:97)
    at org.broadinstitute.sv.dataset.SAMLocation.createSamFileReader(SAMLocation.java:41)
    at org.broadinstitute.sv.dataset.DataSet.initInputFile(DataSet.java:138)
    at org.broadinstitute.sv.dataset.DataSet.initialize(DataSet.java:128)
    at org.broadinstitute.sv.apps.CallSampleGender.run(CallSampleGender.java:105)
    at org.broadinstitute.sv.commandline.CommandLineProgram.execute(CommandLineProgram.java:54)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
    at org.broadinstitute.sv.commandline.CommandLineProgram.runAndReturnResult(CommandLineProgram.java:29)
    at org.broadinstitute.sv.commandline.CommandLineProgram.run(CommandLineProgram.java:25)
    at org.broadinstitute.sv.apps.CallSampleGender.main(CallSampleGender.java:92) 
INFO  01:20:43,375 QGraph - Writing incremental jobs reports... 
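
Both failing steps and their heap settings can be pulled out of the Queue log in one pass. This is a minimal sketch, assuming the "FunctionEdge - Error:" line format shown above; the function name summarize_failures and the file name queue.log are hypothetical:

```shell
#!/usr/bin/env bash
# Read a Queue log on stdin and, for every failed job, print the
# -Xmx heap setting followed by the Genome STRiP main class.
summarize_failures() {
    # Keep only the lines that carry the failed command line,
    # then reduce each one to "<-Xmx value> <main class>".
    grep 'FunctionEdge - Error:' \
        | sed -E "s/.*'(-Xmx[0-9]+[mMgG])'.*'(org\.broadinstitute\.sv\.apps\.[A-Za-z]+)'.*/\1 \2/"
}
```

Usage: `summarize_failures < queue.log`. For the two errors above, this would list ComputeDepthProfiles and CallSampleGender, both started with -Xmx2048m.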

The details of this error differ from the other "java.lang.OutOfMemoryError: Java heap space" reports I found in the forum. Is it possible to solve this problem by editing the SVQScript? May I have your suggestions? Thank you very much.

Best regards,
Wusheng

Answers

  • zhangwusheng Member

    Hi @bhandsaker ,

    Thank you very much for your detailed instructions. One more question about editing the script: when you say "fix each failing step" and "Here the failing steps are ComputeDepthProfiles and CallSampleGender", do you mean these two classes in the "SVMergeMetadataPart2.q" file (copied below)?

    ComputeDepthProfile

        class ComputeDepthProfile(profilesDir: File, sequenceName: String, intervalList: List[GenomeInterval]) extends JavaCommand with BAMInputOutput {
            this.javaMainClass = "org.broadinstitute.sv.apps.ComputeDepthProfiles"
            // this.dependsOnFile = mergeOutputFiles
            this.outputFile = new File(profilesDir, "profile_seq_%s_%d.dat.gz".format(sequenceName, profileBinSize))
    
            commandArguments +=
                repeat(" -I ", bamLocations) +
                repeat(" -configFile ", parameterFiles) +
                repeat(" -P ", parameterList) +
                required(" -R ", referenceFile) +
                repeat(" -L ", intervalList) +
                repeat(" -genomeMaskFile ", getReferenceMetadata.genomeMasks) +
                repeat(" -md ", metaDataLocationList) +
                repeat(" -genderMapFile ", getGenderMapFileList) +
                required(" -profileBinSize ", profileBinSize) +
                optional(" -maximumReferenceGapLength ", maximumReferenceGapLength)
        }
    

    CallSampleGender

        class CallSampleGender() extends JavaCommand {
            this.javaMainClass = "org.broadinstitute.sv.apps.CallSampleGender"
            // this.dependsOnFile = mergeOutputFiles
            this.outputFile = new File(metaDataLocation, "sample_gender.report.txt")
    
            commandArguments +=
                repeat(" -I ", bamLocations) +
                required(" -R ", referenceFile) +
                required(" -md ", metaDataLocation) +
                repeat(" -genomeMaskFile ", getReferenceMetadata.genomeMasks) +
                repeat(" -genomeInterval ", genomeIntervalList) +
                optional(" -ploidyMapFile ", getReferenceMetadata.ploidyMap) +
                required(" -genderBedFile ", getReferenceMetadata.genderMaskBed) +
                optional(" -minMapQ ", depthMinimumMappingQuality)
        }
    

    I cannot find the lines that specify the memory, so would it be a good idea to add the following lines to each class to set it?

            this.memoryLimit = Some(5)
            this.javaMemoryLimit = Some(5)
    

    Thank you very much.

    Best regards,
    Wusheng

  • zhangwusheng Member

    Hi @bhandsaker ,

    Thank you very much! The problem I posted was solved by adding the flag "-memLimit 85" in the svpreprocess script.

    For anyone who has the same problem, the script I use now is:

    classpath="${SV_DIR}/lib/SVToolkit.jar:${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar:${SV_DIR}/lib/gatk/Queue.jar"
    gs_dir="$4"
    rundir="${gs_dir}/$7_$1/$3"
    
    java -Xmx4g -cp "${classpath}" \
         org.broadinstitute.gatk.queue.QCommandLine \
         -S "${SV_DIR}/qscript/SVPreprocess.q" \
         -S "${SV_DIR}/qscript/SVQScript.q" \
         -cp "${classpath}" \
         -gatk "${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar" \
         -configFile "${SV_DIR}/conf/genstrip_parameters.txt" \
         -R "${gs_dir}/Homo_sapiens_assembly38/Homo_sapiens_assembly38.fasta" \
         -L "$1:$2" \
         -I "${gs_dir}/$7_$1/supporting_$7_$1/$7_$1_$5_sample.list" \
         -md "${rundir}/md_tempdir" \
         -tempDir "${gs_dir}/gs_tempdir/svpre_tmp" \
         -runDirectory "${rundir}" \
         -ploidyMapFile "${gs_dir}/$7_$1/supporting_$7_$1/$7_$1_$8_ploidy.map" \
         -jobLogDir "${rundir}/logs" \
         -jobRunner Drmaa \
         -gatkJobRunner Drmaa \
         -memLimit 85 \
         -jobNative "--mem=85000 --time=20:00:00 --nodes=1 --ntasks-per-node=8" \
         -jobQueue general \
         -run \
         || exit 1
    

    Note the units: -memLimit is in GB, so "-memLimit 85" allows 85 GB of memory, whereas "--mem=85000" inside -jobNative is in MB, i.e. 85000 MB or about 85 GB. I am using SLURM.
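
    To keep the two values in step, both can be derived from a single number. This is a minimal sketch, assuming the same decimal conversion as above (1 GB = 1000 MB); the function name mem_flags is hypothetical and not part of Genome STRiP or Queue:

    ```shell
    #!/usr/bin/env bash
    # Print the Queue and SLURM memory settings for a given number of GB.
    mem_flags() {
        local gb="$1"
        local mb=$(( gb * 1000 ))        # SLURM --mem is read as MB by default
        printf -- '-memLimit %d\n' "$gb" # Queue -memLimit is given in GB
        printf -- '--mem=%d\n' "$mb"
    }
    ```

    Calling `mem_flags 85` prints the two settings used in the script above, one per line.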

    Best regards,
    Wusheng
