
GStrip discovery: java heap space?

Will_Gilks, University of Sussex, UK


I'm having a recurrent problem at the "Discovering deletions" stage. The error message reports an out-of-memory condition (Java heap space), but I wonder whether it is really caused by an error in my code. I'm not sure how the " export SV_DIR='cd .. && pwd' " command fits in: I've omitted it from previous runs, but maybe it's required now. The log also indicates that /Queue.jar is copied twice, which seems strange. I'm going to try with fewer samples anyway and experiment with the settings. Any advice much appreciated.
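On the SV_DIR line quoted above: in a POSIX shell, single quotes keep their contents literal, so that form stores the text `cd .. && pwd` and does not run it. The sketch below contrasts it with command substitution, assuming the intent was to capture the parent directory's path. The variable names `literal` and `parent` are illustrative only and do not come from any Genome STRiP script.

```shell
# Single quotes: the variable holds the literal text; nothing is executed.
literal='cd .. && pwd'

# Command substitution: the commands run in a subshell and their output
# (the absolute path of the parent directory) is captured.
# Backticks around the same commands behave the same way.
parent=$(cd .. && pwd)

echo "literal: ${literal}"
echo "parent:  ${parent}"
```

The classpath in the log further down points under /cm/shared/apps/svtoolkit/2.0.1602, which suggests SV_DIR was already set in that run, presumably by the loaded module. If so, re-exporting it would only be needed to point at a different installation.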



Code below then log:
Work flow preparation:
. /etc/profile.d/
module load sge
module load genomestrip/2.0
HASH HERE
export SV_DIR='cd .. && pwd'
SV_TMPDIR=./tmpdir
find ../../read_mapping/gwas_samples/*/ -type f -name "????.bam" -exec ls {} \; > lhm_rg_bams.list

runDir=lhm_rg_final
bam=lhm_rg_bams.list
disco_genos=lhm_rg_final/lhm_rg_size1.discovery.vcf
final_genos=lhm_rg_final/lhm_rg_size1.genotypes.vcf
ref_seq=../../reference_sequences/dmel/v6.0/dm6.fa

mkdir -p ${runDir}/logs || exit 1
mkdir -p ${runDir}/metadata || exit 1

which java > /dev/null || exit 1
which Rscript > /dev/null || exit 1
which samtools > /dev/null || exit 1

export PATH=${SV_DIR}/bwa:${PATH}
export LD_LIBRARY_PATH=${SV_DIR}/bwa:${LD_LIBRARY_PATH}

mx="-Xmx4g"
classpath="${SV_DIR}/lib/SVToolkit.jar:${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar:${SV_DIR}/lib/gatk/Queue.jar"

java -cp ${classpath} ${mx} -jar ${SV_DIR}/lib/SVToolkit.jar'

java -cp ${classpath} ${mx} org.broadinstitute.gatk.queue.QCommandLine \
    -S ${SV_DIR}/qscript/SVPreprocess.q \
    -S ${SV_DIR}/qscript/SVQScript.q \
    -gatk ${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar \
    -jobRunner Drmaa \
    -gatkJobRunner Drmaa \
    -jobNative '-V -pe openmp 40 -q bioinf.q' \
    -cp ${classpath} \
    -configFile conf/genstrip_test3_parameters.txt \
    -tempDir ${SV_TMPDIR} \
    -R ${ref_seq} \
    -genomeMaskFile ref_metadata/dm6.svmask.fasta \
    -readDepthMaskFile ref_metadata/dm6.rdmask.bed \
    -runDirectory ${runDir} \
    -md ${runDir}/metadata \
    -disableGATKTraversal \
    -ploidyMapFile conf/ \
    -genderMapFile data/ \
    -useMultiStep \
    -computeGCProfiles true \
    -reduceInsertSizeDistributions true \
    -computeReadCounts true \
    -jobLogDir ${runDir}/logs \
    -I ${bam} \
    -run \
    || exit 1

Discovering deletions:
java -cp ${classpath} ${mx} org.broadinstitute.gatk.queue.QCommandLine \
    -S ${SV_DIR}/qscript/SVDiscovery.q \
    -S ${SV_DIR}/qscript/SVQScript.q \
    -gatk ${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar \
    -jobRunner Drmaa \
    -gatkJobRunner Drmaa \
    -jobNative '-V -pe openmp 30 -q bioinf.q' \
    -cp ${classpath} \
    -configFile conf/genstrip_test3_parameters.txt \
    -tempDir ${SV_TMPDIR} \
    -R ${ref_seq} \
    -genomeMaskFile ref_metadata/dm6.svmask.fasta \
    -readDepthMaskFile ref_metadata/dm6.rdmask.bed \
    -runDirectory ${runDir} \
    -md ${runDir}/metadata \
    -disableGATKTraversal \
    -jobLogDir ${runDir}/logs \
    -ploidyMapFile conf/ \
    -genderMapFile data/ \
    -maximumSize 100000 \
    -minimumSize 100 \
    -I ${bam} \
    -O ${disco_genos} \
    -debug true \
    -run \
    || exit 1

INFO 10:19:13,097 FunctionEdge - Output written to /lustre/scratch/bioenv/ab12/LHm_analysis/genotyping/CNVs/lhm_rg_final/logs/SVDiscovery-1871.out
INFO 10:19:13,256 DrmaaJobRunner - Submitted job id: 7226858
INFO 10:19:13,354 QGraph - 1 Pend, 1 Run, 0 Fail, 1870 Done
ERROR 10:22:12,120 FunctionEdge - Error: 'java' '-Xmx2048m' '-XX:+UseParallelOldGC' '-XX:ParallelGCThreads=4' '-XX:GCTimeLimit=50' '-XX:GCHeapFreeLimit=10' '' '-cp' '/cm/shared/apps/svtoolkit/2.0.1602/lib/SVToolkit.jar:/cm/shared/apps/svtoolkit/2.0.1602/lib/gatk/GenomeAnalysisTK.jar:/cm/shared/apps/svtoolkit/2.0.1602/lib/gatk/Queue.jar' '-cp' '/cm/shared/apps/svtoolkit/2.0.1602/lib/SVToolkit.jar:/cm/shared/apps/svtoolkit/2.0.1602/lib/gatk/GenomeAnalysisTK.jar:/cm/shared/apps/svtoolkit/2.0.1602/lib/gatk/Queue.jar' '' '-O' '/lustre/scratch/bioenv/ab12/LHm_analysis/genotyping/CNVs/lhm_rg_final/lhm_rg_size1.unfiltered.vcf' '-R' '../../reference_sequences/dmel/v6.0/dm6.fa' '-runDirectory' 'lhm_rg_final'
ERROR 10:22:12,127 FunctionEdge - Contents of /lustre/scratch/bioenv/ab12/LHm_analysis/genotyping/CNVs/lhm_rg_final/logs/SVDiscovery-1871.out:
INFO 10:19:18,451 HelpFormatter - -------------------------------------------------------------
INFO 10:19:18,454 HelpFormatter - Program Name:
INFO 10:19:18,459 HelpFormatter - Program Args: -O /lustre/scratch/bioenv/ab12/LHm_analysis/genotyping/CNVs/lhm_rg_final/lhm_rg_size1.unfiltered.vcf -R ../../reference_sequences/dmel/v6.0/dm6.fa -runDirectory lhm_rg_final
INFO 10:19:18,464 HelpFormatter - Executing as [email protected] on Linux 2.6.32-431.40.2.el6.nsc1.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_45-b14.
INFO 10:19:18,465 HelpFormatter - Date/Time: 2016/01/10 10:19:18
INFO 10:19:18,465 HelpFormatter - -------------------------------------------------------------
INFO 10:19:18,466 HelpFormatter - -------------------------------------------------------------
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at htsjdk.tribble.readers.PositionalBufferedStream.<init>(
    at htsjdk.tribble.readers.PositionalBufferedStream.<init>(
    at htsjdk.tribble.TabixFeatureReader.iterator(
    at
    at
    at<init>(
    at<init>(
    at
    at
    at
    at
    at
    at
    at
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(
    at
    at
    at
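One detail worth checking in the log above: the failing job's command line carries '-Xmx2048m', whereas the script sets mx="-Xmx4g" for the java process that launches Queue. That suggests the submitted jobs get their heap limit separately from ${mx}. The sketch below shows one way to pull the heap flags out of a log and confirm what each job ran with. The temporary file and its single abridged line are stand-ins created for illustration; in practice the pattern would be run against the real Queue output.

```shell
# Hypothetical stand-in log holding one line abridged from the error above.
log=$(mktemp)
printf '%s\n' "ERROR 10:22:12,120 FunctionEdge - Error: 'java' '-Xmx2048m' '-XX:+UseParallelOldGC'" > "$log"

# List each distinct -Xmx setting found, with a count of occurrences.
# The -e is needed because the pattern itself starts with a dash.
heap_flags=$(grep -o -e '-Xmx[0-9]*[kKmMgG]' "$log" | sort | uniq -c)
echo "$heap_flags"

rm -f "$log"
```

If every failing job reports the smaller value, raising ${mx} alone is unlikely to help. The limit applied to submitted jobs would need changing instead, and the Queue documentation is the place to check how that is configured.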

