Depth of Coverage java memory issue

I am running the DepthOfCoverage (DoC) walker on a multi-sample (384 samples) BAM file (~100 GB). Even after assigning 200 GB to the Java heap (-Xmx204800m), I get the following error:

ERROR MESSAGE: There was a failure because you did not provide enough memory to run this program. See the -Xmx JVM argument to adjust the maximum heap size provided to Java # ERROR

The relevant log file is also attached.

It's not that the program fails to start at all; it fails somewhere in the middle of the job, which is even more frustrating because it wastes a lot of time rather than immediately reporting the lack of memory.
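For reference, the failing invocation was along these lines (the reference and file names here are placeholders, not my actual paths):

```shell
# GATK 3-style DepthOfCoverage run over the merged 384-sample BAM,
# with a 200 GB maximum Java heap
java -Xmx204800m -jar GenomeAnalysisTK.jar \
    -T DepthOfCoverage \
    -R reference.fasta \
    -I merged_384_samples.bam \
    -o doc_output
```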

Also, the documentation for DoC says it outputs a 'summary: total, mean, median, quartiles, and threshold proportions, aggregated over all bases', but I only see a mean coverage column in the partial output file. I want to know the median coverage but cannot find any option to add it if it is not generated by default.

Regards,
Sanjeev

Comments

  • Sheila (Broad Institute) Member, Broadie, Moderator admin

    @sanjeevksh‌

    Hi Sanjeev,

    This looks like there may be a region of extremely deep coverage somewhere in your bam.

    One way to confirm this is to set -dcov to something reasonable, like 500 or 1000, and see if that runs without error. You should also try to determine whether it always fails at the same place in your file.
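    For example, something along these lines (substitute your own reference, BAM, and output names; the ones below are placeholders):

```shell
# Cap the pileup depth so a single ultra-deep region cannot exhaust the heap
java -Xmx8g -jar GenomeAnalysisTK.jar \
    -T DepthOfCoverage \
    -R reference.fasta \
    -I merged_384_samples.bam \
    -o doc_output \
    -dcov 1000
```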

    -Sheila

  • sanjeevksh Member

    Hi Sheila,

    Thanks for your helpful feedback. I ran UnifiedGenotyper on the same BAM file with '-dcov 250', which worked absolutely fine. So it looks like this issue relates to some region of extreme coverage in my data. I shall run DepthOfCoverage with -dcov and hopefully it will work.

    Regards,
    Sanjeev

  • sanjeevksh Member

    Hi Sheila,

    I have run DepthOfCoverage with -dcov 96000, but it still stopped before completing the job. I assume -dcov in this walker applies to the overall coverage rather than per-sample coverage, which is why I specified a high but not extremely large value. I also ran with -dcov 80000 with the same result, and it is now running with -dcov 50000. There are 384 samples in the merged file, so I guess this number is reasonable. Also, the program does not stop at the same position every time.

    I also noticed that the Java version on our main shared computing node is 1.8.0; however, I have Java 1.7.0_25 installed locally in my account, and that is what I am using in the GATK command-line submission. As GATK only recommends Java 1.7, do you think that could be an issue? Having said that, I have not experienced any such issues with the DiagnoseTargets and UnifiedGenotyper walkers in the same setup, using the same BAM files. In their case, jobs ran for all 12 chromosomes without returning any error. Do you think this Java 1.8 issue could be walker-specific?

    Regards,
    Sanjeev

  • Sheila (Broad Institute) Member, Broadie, Moderator admin

    @sanjeevksh‌
    Hi Sanjeev,

    I am happy it is working for -dcov 250.

    If -dcov 250 is enough coverage for you, you can set the downsampling type to per sample. That way, you can control what the coverage cap needs to be for each sample individually. Please read about it here: http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_CommandLineGATK.html#--downsampling_type
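    For example (file names are placeholders; see the linked documentation for the accepted values of the downsampling options):

```shell
# -dt BY_SAMPLE makes the -dcov cap apply per sample
# rather than to the whole multi-sample pileup
java -Xmx8g -jar GenomeAnalysisTK.jar \
    -T DepthOfCoverage \
    -R reference.fasta \
    -I merged_384_samples.bam \
    -o doc_output \
    -dt BY_SAMPLE \
    -dcov 250
```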

    -Sheila

  • sanjeevksh Member

    Hi Sheila,

    I said that -dcov 250 worked for DiagnoseTargets and UnifiedGenotyper. I did not try DepthOfCoverage with -dcov 250, as that would be too low for this walker, which looks at the read depth across all samples at the locus in question. The remaining DepthOfCoverage issues were described in my previous message; any input would be highly appreciated.

    Regards,
    Sanjeev

  • Sheila (Broad Institute) Member, Broadie, Moderator admin

    @sanjeevksh‌

    Hi Sanjeev,

    -dcov 50000 is quite large. I am pretty sure -dcov works per sample, so you can use a much lower number, like 1000 or less.

    -Sheila

  • sanjeevksh Member

    Hi Sheila,

    Thanks again! Yes, if it works on a per-sample basis, then 50000 is too high. However, it still seems strange to me: DepthOfCoverage is the tool you use to find the coverage range in your mapping files, so restricting this walker to a particular depth does not feel right.

    Regards,
    Sanjeev

  • Sheila (Broad Institute) Member, Broadie, Moderator admin
    edited June 2014

    @sanjeevksh‌

    Hi Sanjeev,

    Restricting coverage makes sense even for DepthOfCoverage. You are basically setting a threshold above which any coverage is too much coverage. We don't think it is necessary to know there is a site with exactly 52,476 reads of coverage; you only need to know there is a region with coverage far in excess of what is reasonable.

    I hope this makes sense.

    -Sheila

  • sanjeevkshsanjeevksh Member

    Thanks Sheila!
    Sanjeev
