GenomeSTRiP BAM file error

zhaouu hzau Member
edited July 2015 in GenomeSTRiP

Hi,
I have 100 BAM files, and SVPreprocess fails with an error when I run it on all of them. When I remove the last BAM file named in the logs (CX230.bam), the process seems to run OK, but I can't find any difference between this BAM file and the others. How can I run all individuals together?

My command:
java -cp ${classpath} ${mx} \
org.broadinstitute.gatk.queue.QCommandLine \
-S ${SV_DIR}/qscript/SVPreprocess.q \
-S ${SV_DIR}/qscript/SVQScript.q \
-gatk ${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar \
--disableJobReport \
-cp ${classpath} \
-configFile /home/hzhao/software/svtoolkit/conf/genstrip_parameters.txt \
-tempDir ${SV_TMPDIR} \
-R /home/hzhao/rice_genome_sequence/MSU/version_7.0/sv_genome/rice_all_genomes_v7.fasta \
-genomeMaskFile /home/hzhao/my_data/SV_data/rice7.svmask.fasta \
-genderMapFile /home/hzhao/script/candidate_gene_priori/rice3k_head_geneder.map \
-ploidyMapFile /home/hzhao/my_data/SV_data/rice7_reference.ploidymap.txt \
-runDirectory ${runDir} \
-md ${runDir}/metadata \
-disableGATKTraversal \
-useMultiStep \
-L chr07:9132402-9172402 \
-reduceInsertSizeDistributions true \
-computeReadCounts true \
-computeGCProfiles true \
-jobLogDir ${runDir}/logs \
-I ${bam} \
-run \
|| exit 1

My log file is attached.

Thanks very much!

Answers

  • bhandsaker Member, Broadie, Moderator admin

    It could be a couple of different things.

    One possibility is that the .hist.bin file is somehow corrupted. You could try removing /home/hzhao/my_data/GATK_res/sv_test/metadata/isd/CX230.hist.bin
    and the corresponding .done file
    /home/hzhao/my_data/GATK_res/sv_test/metadata/isd/.CX230.hist.bin.done
    and rerunning to see if that fixes the problem.

    The other possibility is that there is a bug, perhaps triggered by the fact that you are trying to preprocess only a very small genomic interval. If you are able to send us the file
    /home/hzhao/my_data/GATK_res/sv_test/metadata/isd/CX230.hist.bin
    which should be fairly small, we can take a look.
    Emailing it to me separately is fine.
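
The cleanup step described above can be sketched in shell as follows. This is a minimal sketch: `reset_isd_histogram` is a hypothetical helper name, not part of Genome STRiP, and it assumes the `isd/` layout shown in the paths above.

```shell
#!/usr/bin/env bash
# Remove one sample's insert-size histogram and its hidden .done marker so
# that the next SVPreprocess run has to regenerate both files.
# NOTE: reset_isd_histogram is a hypothetical helper, not a Genome STRiP tool.
reset_isd_histogram() {
    local metadata_dir="$1" sample="$2"
    local hist="${metadata_dir}/isd/${sample}.hist.bin"
    local done_marker="${metadata_dir}/isd/.${sample}.hist.bin.done"
    # -f: do not fail if a file is already gone; -v: report what was removed
    rm -fv -- "$hist" "$done_marker"
}
```

For the case in this thread the call would be `reset_isd_histogram /home/hzhao/my_data/GATK_res/sv_test/metadata CX230`, followed by rerunning the original SVPreprocess command.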

  • zhaouu hzau Member

    Hi Bob.
    I removed CX230.hist.bin and the .CX230.hist.bin.done file and reran, but I get the same problem.

    I have uploaded the CX230.hist.bin file.

    Thanks.

  • bhandsaker Member, Broadie, Moderator admin

    Thanks very much. I'll put it in our queue.

  • bhandsaker Member, Broadie, Moderator admin

    I was able to reproduce this and verify that it is a bug. We will fix it in the next release, but that may be a little while.
    In the meantime, I think you should be able to work around this problem by using a slightly larger region (if you aim to have at least 10,000 reads in each read group I think you will be OK). Can you try with a slightly larger region?
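
A rough way to check the 10,000-reads-per-read-group guideline for a candidate region is sketched below. It assumes samtools is installed and the BAM files are indexed. The two function names are made up for illustration and are not part of Genome STRiP. Reads are grouped by the standard `RG:Z:` tag in the optional SAM fields.

```shell
#!/usr/bin/env bash
# Count reads per read group from SAM records on stdin.
# Output: "<read group><TAB><count>", one line per read group, sorted.
# NOTE: hypothetical helper names; only awk and sort are needed here.
reads_per_read_group() {
    awk -F'\t' '
        {
            rg = "(none)"                      # reads without an RG tag
            for (i = 12; i <= NF; i++)         # optional fields start at column 12
                if ($i ~ /^RG:Z:/) rg = substr($i, 6)
            n[rg]++
        }
        END { for (rg in n) printf "%s\t%d\n", rg, n[rg] }
    ' | sort
}

# Keep only read groups whose count is below the threshold (default 10000).
flag_small_read_groups() {
    awk -F'\t' -v min="${1:-10000}" '$2 + 0 < min + 0'
}
```

A possible invocation for one file would be `samtools view CX230.bam chr07:9132402-9172402 | reads_per_read_group | flag_small_read_groups 10000`. Any read group it prints falls short of the guideline for that region.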

  • zhaouu hzau Member

    Hi Bob,
    I have changed the region from chr07:9132402-9172402 to chr07:8832402-9472402 and am testing with many more individuals, but I get the same error message. Please help me, thanks very much.

  • bhandsaker Member, Broadie, Moderator admin

    The fix for this is not in r1602.

    The fix is a little more complicated than I originally thought and will require quite a bit of testing.
    While this is a bug in the code, you really shouldn't be running preprocessing on a small interval of the genome - your results will likely be quite unsatisfactory.
    Try at least running preprocessing on all of chr07.

  • zhaouu hzau Member

    Hi,
    Thank you very much for your time and patience. I have 1500+ varieties, all with read depth > 15. When I run preprocessing on 'chr07', I get this:

    ERROR 21:22:14,732 FunctionEdge - Error: 'java' '-Xmx2048m' '-XX:+UseParallelOldGC' '-XX:ParallelGCThreads=4' '-XX:GCTimeLimit=50' '-XX:GCHeapFreeLimit=10' '-Djava.io.tmpdir=/h
    ERROR 21:22:14,735 FunctionEdge - Contents of /home/hzhao/my_data/SV_data/sv_test/logs/SVPreprocess-1422.out:
    INFO 21:20:55,377 HelpFormatter - ------------------------------------------------------------------
    INFO 21:20:55,379 HelpFormatter - Program Name: org.broadinstitute.sv.apps.MergeInsertSizeHistograms
    INFO 21:20:55,387 HelpFormatter - Program Args: -I /home/hzhao/my_data/SV_data/sv_test/metadata/isd/B001.hist.bin -I /home/hzhao/my_data/SV_data/sv_test/metadata/isd/B002.hist.bin -I
    INFO 21:20:55,398 HelpFormatter - Executing as [email protected] on Linux 2.6.32-431.el6.x86_64 amd64; OpenJDK 64-Bit Server VM 1.7.0_45-mockbuild_2013_11_22_18_30-b00.
    INFO 21:20:55,399 HelpFormatter - Date/Time: 2015/07/29 21:20:55
    INFO 21:20:55,399 HelpFormatter - ------------------------------------------------------------------
    INFO 21:20:55,399 HelpFormatter - ------------------------------------------------------------------
    Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.TreeMap.put(TreeMap.java:569)
    at org.broadinstitute.sv.metadata.isize.InsertSizeHistogram.readObject(InsertSizeHistogram.java:421)
    at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at org.broadinstitute.sv.metadata.isize.InsertSizeHistogramFile$FileIterator.readHistogram(InsertSizeHistogramFile.java:140)
    at org.broadinstitute.sv.metadata.isize.InsertSizeHistogramFile$FileIterator.advance(InsertSizeHistogramFile.java:129)
    at org.broadinstitute.sv.metadata.isize.InsertSizeHistogramFile$FileIterator.next(InsertSizeHistogramFile.java:107)
    at org.broadinstitute.sv.metadata.isize.InsertSizeHistogramFile$FileIterator.next(InsertSizeHistogramFile.java:86)
    at org.broadinstitute.sv.metadata.isize.InsertSizeHistogramMerger.mergeNonDisjoint(InsertSizeHistogramMerger.java:72)
    at org.broadinstitute.sv.metadata.isize.InsertSizeHistogramMerger.mergeHistograms(InsertSizeHistogramMerger.java:51)
    at org.broadinstitute.sv.apps.MergeInsertSizeHistograms.run(MergeInsertSizeHistograms.java:45)
    at org.broadinstitute.sv.commandline.CommandLineProgram.execute(CommandLineProgram.java:54)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
    at org.broadinstitute.sv.commandline.CommandLineProgram.runAndReturnResult(CommandLineProgram.java:29)
    at org.broadinstitute.sv.commandline.CommandLineProgram.run(CommandLineProgram.java:25)
    at org.broadinstitute.sv.apps.MergeInsertSizeHistograms.main(MergeInsertSizeHistograms.java:39)
    INFO 21:22:14,736 QGraph - Writing incremental jobs reports...
    INFO 21:22:15,022 QGraph - 7099 Pend, 0 Run, 1 Fail, 1421 Done
    INFO 21:22:15,082 QCommandLine - Writing final jobs report...
    INFO 21:22:15,083 QCommandLine - Done with errors
    INFO 21:22:15,398 QGraph - -------
    INFO 21:22:15,408 QGraph - Failed: 'java' '-Xmx2048m' '-XX:+UseParallelOldGC' '-XX:ParallelGCThreads=4' '-XX:GCTimeLimit=50' '-XX:GCHeapFreeLimit=10' '-Djava.io.tmpdir=/home/
    INFO 21:22:15,409 QGraph - Log: /home/hzhao/my_data/SV_data/sv_test/logs/SVPreprocess-1422.out
    INFO 21:22:15,410 QCommandLine - Script failed: 7099 Pend, 0 Run, 1 Fail, 1421 Done

    I have set mx="-Xmx1000g".

    Thanks
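
One detail visible in the log above: the failed MergeInsertSizeHistograms job was launched with `-Xmx2048m`, not the `-Xmx1000g` set in `mx`. This suggests that `mx` sizes only the outer Queue JVM and not the jobs it spawns. That is an inference from the log, not something confirmed in this thread. Below is a small shell sketch for pulling the heap settings out of a Queue log; `job_heap_settings` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the distinct -Xmx heap settings that appear in a log read from stdin.
# NOTE: job_heap_settings is a hypothetical helper, not a Queue or GATK tool.
job_heap_settings() {
    # -o prints only the matching part; -e lets the pattern start with '-'
    grep -o -e '-Xmx[0-9]*[kKmMgG]' | sort -u
}
```

Run as `job_heap_settings < ${runDir}/logs/SVPreprocess-1422.out` (or on the console output of the Queue run) to see which heap size the failing job actually received.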

  • bhandsaker Member, Broadie, Moderator admin

    This is good progress, as you are getting much farther.

    Are you using (or can you use) -bamFilesAreDisjoint true?
    This option is recommended if none of your samples are split across multiple bam files.
    It will avoid this histogram merging step, which can require a lot of memory if you have many libraries.
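
Before turning on `-bamFilesAreDisjoint true`, it may be worth confirming that no sample is split across BAM files. The sketch below does this by comparing the `SM:` tags in the `@RG` header lines of each file. It assumes samtools is installed and that sample identity is carried by the `SM` tag. All three function names are hypothetical.

```shell
#!/usr/bin/env bash
# Print the distinct sample (SM) names from the @RG lines of a SAM header on stdin.
header_samples() {
    awk -F'\t' '$1 == "@RG" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SM:/) print substr($i, 4)
    }' | sort -u
}

# Emit "<bam path><TAB><sample>" for every BAM file given as an argument.
# Requires samtools; "samtools view -H" prints only the header.
bam_sample_pairs() {
    for bam in "$@"; do
        samtools view -H "$bam" | header_samples |
            while IFS= read -r sample; do
                printf '%s\t%s\n' "$bam" "$sample"
            done
    done
}

# stdin: "<bam path><TAB><sample>" pairs; stdout: samples seen in more than one BAM.
samples_split_across_bams() {
    sort -u | cut -f2 | sort | uniq -d
}
```

Running `bam_sample_pairs /path/to/*.bam | samples_split_across_bams` and getting no output would indicate that every sample lives in a single file. In that case, adding `-bamFilesAreDisjoint true` to the QCommandLine arguments is the option recommended above.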

  • haojam Member

    Hi Guys,

    Can anyone suggest how to fix the error I got while running the GenomeSTRiP command line on multiple BAM files in one go? I have attached the error message below and look forward to any suggestions.

    With regards,
    Rocky
