
GenomeSTRiP BAM file error

zhaouu · hzau, Member
edited July 2015 in GenomeSTRiP

Hi,
I have 100 BAM files, and when I run SVPreprocess it fails with an error. If I remove the last BAM file listed in the logs (CX230.bam), the run seems to complete OK, but I can't find any difference between this BAM file and the others. How can I run all individuals simultaneously?

My command:
java -cp ${classpath} ${mx} \
org.broadinstitute.gatk.queue.QCommandLine \
-S ${SV_DIR}/qscript/SVPreprocess.q \
-S ${SV_DIR}/qscript/SVQScript.q \
-gatk ${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar \
--disableJobReport \
-cp ${classpath} \
-configFile /home/hzhao/software/svtoolkit/conf/genstrip_parameters.txt \
-tempDir ${SV_TMPDIR} \
-R /home/hzhao/rice_genome_sequence/MSU/version_7.0/sv_genome/rice_all_genomes_v7.fasta \
-genomeMaskFile /home/hzhao/my_data/SV_data/rice7.svmask.fasta \
-genderMapFile /home/hzhao/script/candidate_gene_priori/rice3k_head_geneder.map \
-ploidyMapFile /home/hzhao/my_data/SV_data/rice7_reference.ploidymap.txt \
-runDirectory ${runDir} \
-md ${runDir}/metadata \
-disableGATKTraversal \
-useMultiStep \
-L chr07:9132402-9172402 \
-reduceInsertSizeDistributions true \
-computeReadCounts true \
-computeGCProfiles true \
-jobLogDir ${runDir}/logs \
-I ${bam} \
-run \
|| exit 1

My log file is attached.

Thanks very much!

Answers

  • bhandsaker · Member, Broadie, Moderator, admin

    It could be a couple of different things.

    One possibility is that the .hist.bin file is somehow corrupted. You could try removing /home/hzhao/my_data/GATK_res/sv_test/metadata/isd/CX230.hist.bin
    and the corresponding .done file
    /home/hzhao/my_data/GATK_res/sv_test/metadata/isd/.CX230.hist.bin.done
    and rerunning to see if that fixes the problem.
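A minimal sketch of that cleanup, using the paths quoted above (the `MD` variable is just a convenience introduced here; adjust it to your own metadata directory):

```shell
# Remove the cached insert-size histogram and its hidden .done marker
# so SVPreprocess regenerates them on the next run.
MD=/home/hzhao/my_data/GATK_res/sv_test/metadata
rm -f "${MD}/isd/CX230.hist.bin"
rm -f "${MD}/isd/.CX230.hist.bin.done"
```

With the `.done` marker gone, Queue treats that step as incomplete and re-runs it rather than reusing the cached file.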

    The other possibility is that there is a bug, perhaps triggered by the fact that you are trying to preprocess only a very small genomic interval. If you are able to send us the file
    /home/hzhao/my_data/GATK_res/sv_test/metadata/isd/CX230.hist.bin
    which should be fairly small, we can take a look.
    Emailing it to me separately is fine.

  • zhaouu · hzau, Member

    Hi Bob,
    I tried removing CX230.hist.bin and the .CX230.hist.bin.done file, but I still get the same problem.

    The CX230.hist.bin file is uploaded.

    Thanks.

  • bhandsaker · Member, Broadie, Moderator, admin

    Thanks very much. I'll put it in our queue.

  • bhandsaker · Member, Broadie, Moderator, admin

    I was able to reproduce this and verify that it is a bug. We will fix it in the next release, but that may be a little while.
    In the meantime, I think you should be able to work around this problem by using a slightly larger region (if you aim to have at least 10,000 reads in each read group I think you will be OK). Can you try with a slightly larger region?

  • zhaouu · hzau, Member

    Hi Bob,
    I have extended the region from chr07:9132402-9172402 to chr07:8832402-9472402 and tested with many more individuals, but I get the same error message. Please help me, thanks very much.

  • bhandsaker · Member, Broadie, Moderator, admin

    The fix for this is not in r1602.

    The fix is a little more complicated than I originally thought and will require quite a bit of testing.
    While this is a bug in the code, you really shouldn't be running preprocessing on a small interval of the genome - your results will likely be quite unsatisfactory.
    Try at least running preprocessing on all of chr07.
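As a sketch, the only change needed to the SVPreprocess command from the original post would be the interval argument:

```
# In the SVPreprocess command above, replace the small window
#   -L chr07:9132402-9172402 \
# with the whole chromosome:
-L chr07 \
```

This keeps the rest of the invocation unchanged while giving every read group enough reads for stable metadata.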

  • zhaouu · hzau, Member

    Hi,
    Thank you very much for your time and patience. I have about 1,500 varieties, all with read depth > 15. When I run preprocessing on chr07, I get this:

    ERROR 21:22:14,732 FunctionEdge - Error: 'java' '-Xmx2048m' '-XX:+UseParallelOldGC' '-XX:ParallelGCThreads=4' '-XX:GCTimeLimit=50' '-XX:GCHeapFreeLimit=10' '-Djava.io.tmpdir=/h
    ERROR 21:22:14,735 FunctionEdge - Contents of /home/hzhao/my_data/SV_data/sv_test/logs/SVPreprocess-1422.out:
    INFO 21:20:55,377 HelpFormatter - ------------------------------------------------------------------
    INFO 21:20:55,379 HelpFormatter - Program Name: org.broadinstitute.sv.apps.MergeInsertSizeHistograms
    INFO 21:20:55,387 HelpFormatter - Program Args: -I /home/hzhao/my_data/SV_data/sv_test/metadata/isd/B001.hist.bin -I /home/hzhao/my_data/SV_data/sv_test/metadata/isd/B002.hist.bin -I
    INFO 21:20:55,398 HelpFormatter - Executing as [email protected] on Linux 2.6.32-431.el6.x86_64 amd64; OpenJDK 64-Bit Server VM 1.7.0_45-mockbuild_2013_11_22_18_30-b00.
    INFO 21:20:55,399 HelpFormatter - Date/Time: 2015/07/29 21:20:55
    INFO 21:20:55,399 HelpFormatter - ------------------------------------------------------------------
    INFO 21:20:55,399 HelpFormatter - ------------------------------------------------------------------
    Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.TreeMap.put(TreeMap.java:569)
    at org.broadinstitute.sv.metadata.isize.InsertSizeHistogram.readObject(InsertSizeHistogram.java:421)
    at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at org.broadinstitute.sv.metadata.isize.InsertSizeHistogramFile$FileIterator.readHistogram(InsertSizeHistogramFile.java:140)
    at org.broadinstitute.sv.metadata.isize.InsertSizeHistogramFile$FileIterator.advance(InsertSizeHistogramFile.java:129)
    at org.broadinstitute.sv.metadata.isize.InsertSizeHistogramFile$FileIterator.next(InsertSizeHistogramFile.java:107)
    at org.broadinstitute.sv.metadata.isize.InsertSizeHistogramFile$FileIterator.next(InsertSizeHistogramFile.java:86)
    at org.broadinstitute.sv.metadata.isize.InsertSizeHistogramMerger.mergeNonDisjoint(InsertSizeHistogramMerger.java:72)
    at org.broadinstitute.sv.metadata.isize.InsertSizeHistogramMerger.mergeHistograms(InsertSizeHistogramMerger.java:51)
    at org.broadinstitute.sv.apps.MergeInsertSizeHistograms.run(MergeInsertSizeHistograms.java:45)
    at org.broadinstitute.sv.commandline.CommandLineProgram.execute(CommandLineProgram.java:54)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
    at org.broadinstitute.sv.commandline.CommandLineProgram.runAndReturnResult(CommandLineProgram.java:29)
    at org.broadinstitute.sv.commandline.CommandLineProgram.run(CommandLineProgram.java:25)
    at org.broadinstitute.sv.apps.MergeInsertSizeHistograms.main(MergeInsertSizeHistograms.java:39)
    INFO 21:22:14,736 QGraph - Writing incremental jobs reports...
    INFO 21:22:15,022 QGraph - 7099 Pend, 0 Run, 1 Fail, 1421 Done
    INFO 21:22:15,082 QCommandLine - Writing final jobs report...
    INFO 21:22:15,083 QCommandLine - Done with errors
    INFO 21:22:15,398 QGraph - -------
    INFO 21:22:15,408 QGraph - Failed: 'java' '-Xmx2048m' '-XX:+UseParallelOldGC' '-XX:ParallelGCThreads=4' '-XX:GCTimeLimit=50' '-XX:GCHeapFreeLimit=10' '-Djava.io.tmpdir=/home/
    INFO 21:22:15,409 QGraph - Log: /home/hzhao/my_data/SV_data/sv_test/logs/SVPreprocess-1422.out
    INFO 21:22:15,410 QCommandLine - Script failed: 7099 Pend, 0 Run, 1 Fail, 1421 Done

    I have set mx="-Xmx1000g".

    Thanks

  • bhandsaker · Member, Broadie, Moderator, admin

    This is good progress, as you are getting much farther.

    Are you using (or can you use) -bamFilesAreDisjoint true ?
    This option is recommended if none of your samples are split across multiple BAM files.
    It will avoid this histogram-merging step, which can require a lot of memory when you have many libraries.
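Assuming each sample lives in exactly one BAM file, the flag would simply be added to the SVPreprocess invocation from the original post:

```
# Add to the SVPreprocess command line (only valid when no sample
# is split across multiple BAM files):
-bamFilesAreDisjoint true \
```

With disjoint BAM files, the per-file insert-size histograms never need to be merged across files, which is the step that ran out of memory above.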

  • haojam · Member

    Hi Guys,

    Can anyone suggest how to rectify this error, which I got while executing the GenomeSTRiP command line for multiple BAM files in one go? I have attached the error message below and look forward to any suggestions.

    With regards,
    Rocky
