
RealignerTargetCreator issue: "did not provide enough memory to run this program"

cecmonat Posts: 5 Member

Hi,

I want to run RealignerTargetCreator with this command line:

qsub -b Y -N RTC -q bigmem.q "/usr/local/java/latest/bin/java -Xmx36g -jar /home/sabotf/sources/GenomeAnalysisTK/GenomeAnalysisTK.jar -T RealignerTargetCreator -R /data/projects/assembling-glab/PacBio_test/XL/filtered_subreads_XL.fasta -o /data/projects/assembling-glab/mappingResults/Tog5681Clean_vs_CG14_XL/output.intervals -I /data/projects/assembling-glab/mappingResults/Tog5681Clean_vs_CG14_XL/rmdup.bam"

but it returns this error:

`##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 2.3-9-ge5ebf34):
##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
##### ERROR Please do not post this error to the GATK forum
##### ERROR
##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: There was a failure because you did not provide enough memory to run this program. See the -Xmx JVM argument to adjust the maximum heap size provided to Java
##### ERROR ------------------------------------------------------------------------------------------`

I tried -Xmx4g, then -Xmx12g, then -Xmx48g, and always got the same error. I don't know what to do. Any ideas? Thanks.

Answers

  • Geraldine_VdAuwera Posts: 6,192 Administrator, GATK Developer

    Hmm, this should run without needing quite that much memory. Can you try with the latest version?

    Also, what is the size of your dataset? Is it very large?

    Geraldine Van der Auwera, PhD
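
One rough way to answer the dataset-size question is to count the sequences in the reference FASTA and check the size of the BAM file. The sketch below is only an illustration: the helper names `count_fasta_records` and `file_size_bytes` are hypothetical, and the demo runs on a tiny temporary FASTA rather than on the files from the post.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a quick dataset summary (not part of GATK).

# Count FASTA records by counting header lines, which start with '>'.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Report a file's size in bytes (works for a BAM or any other file).
file_size_bytes() {
    wc -c < "$1" | tr -d ' '
}

# Demo on a tiny temporary FASTA with two records.
demo_fasta=$(mktemp)
printf '>contig1\nACGT\n>contig2\nGGCC\n' > "$demo_fasta"
echo "records: $(count_fasta_records "$demo_fasta")"
echo "bytes:   $(file_size_bytes "$demo_fasta")"
rm -f "$demo_fasta"
```

On real data the same two functions would be pointed at the file given to -R and the file given to -I. A reference made of a very large number of short sequences is worth noting when reporting the numbers, although this thread does not establish that as the cause.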

  • cecmonat Posts: 5 Member

    Yes, that's what I thought, because I have already used this tool on bigger data and it worked with only -Xmx4g! Sorry, how can I find out which version I am using, and where can I get the latest one? Thanks.

  • Geraldine_VdAuwera Posts: 6,192 Administrator, GATK Developer

    The version is stated in the console at the beginning of each run and then again in the error message if there is one, as here:

    ERROR A USER ERROR has occurred (version 2.3-9-ge5ebf34)
    

    You can get the newest version (currently 2.4-9 but 2.5 is coming out in a few days) from the Downloads page of our website (link in the top menu bar).

    Geraldine Van der Auwera, PhD
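
If the console output was saved to a log file, the version string described above can be pulled out with standard text tools. This is a minimal sketch: the function name `gatk_version_from_log` is hypothetical, and it assumes only that the log contains a "(version ...)" fragment like the one quoted in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read GATK console output on stdin and print the
# first version string that appears inside "(version ...)".
gatk_version_from_log() {
    # Keep only the text between "(version " and the closing ")".
    sed -n 's/.*(version \([^)]*\)).*/\1/p' | head -n 1
}

# Example using the error line quoted in this thread.
printf '%s\n' '##### ERROR A USER ERROR has occurred (version 2.3-9-ge5ebf34):' \
    | gatk_version_from_log
```

With a saved log, the call would look like `gatk_version_from_log < run.log`, where `run.log` is a placeholder file name.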

  • cecmonat Posts: 5 Member

    OK, I tried the latest version with -Xmx4g, -Xmx16g and -Xmx32g, and always got the same error. Any ideas? Thanks.

  • cecmonat Posts: 5 Member

    OK, after a lot of checking, I think the problem came from an earlier step, during my mapping. Thanks.

  • Meenu Posts: 10 Member

    Hi Cecmonat,

    I am getting the same memory problem. I tried -Xmx4g and -Xmx16g, still with no success.

    Please let me know how to solve it.

    Thanks

  • OliverDrechsel Posts: 5 Member

    Hi all,

    Thanks for GATK. It's been an extremely useful tool and we use it on a daily basis. But recently I have repeatedly run into a problem that is the same as the one cecmonat describes.

    I have to align whole-genome samples. For that I use the GEM aligner (http://www.nature.com/nmeth/journal/v9/n12/abs/nmeth.2221.html).

    As suggested in your Best Practices, I do the duplicate marking on single lanes and afterwards merge and sort the files using novosort.

    The TargetCreator works fine: Program Args: -nt 8 -T RealignerTargetCreator -R hg19.fasta -I ###.sort.bam -o ###.intervals -known dbindel137_121217.vcf --minReadsAtLocus 6 --maxIntervalSize 200 --downsampling_type NONE

    As soon as I run the indel realignment, I get the error message saying that I provided too little memory. I tried -Xmx20g, -Xmx35g and up to -Xmx95g and still get the same error. As cecmonat said, there might be errors from the mapping. However, I have already used the same settings successfully for >500 exome sequences, so I don't have a clue what could be wrong.

    java -Xmx35g -jar $GATK -T IndelRealigner -R $REF -I $TMPDIR/$NAME.sort.bam -targetIntervals $TMPDIR/$NAME.intervals -o $TMPDIR/$NAME.realigned.bam -known $DBINDEL --maxReadsForRealignment 10000 --consensusDeterminationModel USE_SW --downsampling_type NONE

    Program Args: -T IndelRealigner -R hg19.fasta -I ###.sort.bam -targetIntervals ###.intervals -o ###.realigned.bam -known dbindel137_121217.vcf --maxReadsForRealignment 10000 --consensusDeterminationModel USE_SW --downsampling_type NONE

    ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
    ProgressMeter - Location processed.reads runtime per.1M.reads completed total.runtime remaining
    ReadShardBalancer$1 - Loading BAM index data for next contig
    ReadShardBalancer$1 - Done loading BAM index data for next contig
    ProgressMeter - Starting 0.00e+00 30.0 s 49.6 w 100.0% 30.0 s 0.0 s
    ProgressMeter - Starting 0.00e+00 74.0 s 123.6 w 100.0% 74.0 s 0.0 s

    #### ERROR MESSAGE: There was a failure because you did not provide enough memory to run this program. See the -Xmx JVM argument to adjust the maximum heap size provided to Java

    I'd be glad if you could point me to my error.

    thanks, Oliver

    PS: GATK versions tried: 2.3-9-ge5ebf34, 2.4-9-g532efad, 2.6-4-g3e5ff60

  • Geraldine_VdAuwera Posts: 6,192 Administrator, GATK Developer

    I think the problem in your case is that you're running with USE_SW, which is very memory-intensive, especially in areas of deep coverage. If you really want to go with the SW realignment, you might want to try reducing the value of the --maxReadsForRealignment argument.

    Geraldine Van der Auwera, PhD
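
As an illustration of that suggestion, the sketch below rebuilds the IndelRealigner command from the earlier post with a lower --maxReadsForRealignment value. It only prints the command rather than running it. The function name `build_indel_realigner_cmd` is hypothetical, `$GATK`, `$REF`, `$TMPDIR` and `$DBINDEL` are the same placeholders used in the original command, and the value 2000 is an arbitrary example, not a recommended setting.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) an IndelRealigner command line.
#   $1 = Java heap size, e.g. 35g
#   $2 = sample name used in the file names
#   $3 = value for --maxReadsForRealignment
# The escaped \$GATK, \$REF, \$TMPDIR and \$DBINDEL stay as literal
# placeholders in the printed command, as in the original post.
build_indel_realigner_cmd() {
    echo "java -Xmx$1 -jar \$GATK -T IndelRealigner -R \$REF" \
         "-I \$TMPDIR/$2.sort.bam -targetIntervals \$TMPDIR/$2.intervals" \
         "-o \$TMPDIR/$2.realigned.bam -known \$DBINDEL" \
         "--maxReadsForRealignment $3" \
         "--consensusDeterminationModel USE_SW --downsampling_type NONE"
}

# Same command as in the post, but with 2000 instead of 10000 reads
# (2000 is an arbitrary illustration value).
build_indel_realigner_cmd 35g sample1 2000
```

Printing the command first makes it easy to compare a few values side by side before submitting any of them to the cluster.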

  • OliverDrechsel Posts: 5 Member

    Hi Geraldine,

    Thanks for the immediate answer, and sorry for my delayed response.

    I switched USE_SW to KNOWNS_ONLY and subsequently reduced maxReadsForRealignment to 500. I still get the error message.

    If I run the IndelRealigner on a single lane, it works fine. Are there any adverse effects I should expect if I proceed with single-lane realignment and recalibration and merge the lanes just before variant calling? I know it's against the "best practice", but since I'm not making any progress, I would be OK with "good practice".

    thanks a lot, Oliver

  • Geraldine_VdAuwera Posts: 6,192 Administrator, GATK Developer

    Hi Oliver,

    That's actually completely fine -- these pre-processing steps should typically be done per lane. Then you merge the lane data per sample, and optionally repeat the dedup & realign steps (but that's not required) before finally calling variants. I'm working on rewriting the best practices doc to make that clearer.

    Geraldine Van der Auwera, PhD
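
The per-lane order described above can be sketched as a dry run that prints the commands for each lane and then a single merge. Several things here are assumptions for illustration only: the function name `plan_per_lane`, the lane file naming, the -Xmx4g heap size, and the use of `samtools merge` as the merge tool (the thread itself mentions novosort). Recalibration and the optional repeat of dedup and realignment after merging are left out to keep the sketch short.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run planner: prints the commands, does not execute them.
#   $1      = sample name (used for the merged BAM)
#   $2 ...  = lane prefixes; each lane is assumed to have <lane>.sort.bam
plan_per_lane() {
    sample="$1"
    shift
    realigned=""
    for lane in "$@"; do
        # Step 1 per lane: find the intervals that need realignment.
        echo "java -Xmx4g -jar \$GATK -T RealignerTargetCreator -R \$REF -I ${lane}.sort.bam -o ${lane}.intervals -known \$DBINDEL"
        # Step 2 per lane: realign reads over those intervals.
        echo "java -Xmx4g -jar \$GATK -T IndelRealigner -R \$REF -I ${lane}.sort.bam -targetIntervals ${lane}.intervals -o ${lane}.realigned.bam -known \$DBINDEL"
        realigned="${realigned} ${lane}.realigned.bam"
    done
    # Final step: merge the realigned lanes into one BAM per sample.
    echo "samtools merge ${sample}.merged.bam${realigned}"
}

plan_per_lane sampleA lane1 lane2
```

Keeping the planner as a dry run lets the printed lines be reviewed or fed to a job scheduler one by one, so a failure in one lane does not block the others.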

  • OliverDrechsel Posts: 5 Member

    Hi Geraldine,

    Thanks a lot. So I will proceed lane-wise, and if the whole-sample data still fails, I will keep going towards SNP and indel calling. Reads with the same read group tags would then be merged into one observation.

    Sorry for the misunderstanding.

    cheers, Oliver

  • Geraldine_VdAuwera Posts: 6,192 Administrator, GATK Developer

    Yep, that's the way to go. Good luck!

    Geraldine Van der Auwera, PhD
