GenotypeGVCF and memory failure

mgdesaix, Fort Collins, Member
Hello,

I am working through GATK4 (4.1.4.0) for germline short variant discovery. I have been following the Best Practices, but this is my first time using the pipeline. So far, I have run HaplotypeCaller to produce GVCF files, splitting the HaplotypeCaller jobs by scaffold (-L).

I then combined each individual's split GVCF files back into one file with GatherVcfs (in hindsight, maybe this was unnecessary?).

I then ran GenomicsDBImport as follows:

```
gatk --java-options "-Xmx12g" \
GenomicsDBImport \
--genomicsdb-workspace-path "$out"/database_"$interval" \
--batch-size 50 \
-L "$list" \
--sample-name-map cohort.sample_map \
--tmp-dir ./AMRE/tmp/ \
--reader-threads 4
```

This ran successfully, producing 25 multi-sample databases; most jobs took ~12-16 hours and one took 33 hours.

Now I am attempting to run GenotypeGVCFs for each of the databases, but I get an error about the tmp directory:

```
A USER ERROR has occurred: Failure working with the tmp directory /scratch/summit/[email protected]/AMRE/tmp. Try changing the tmp dir with with --tmp-dir on the command line. Exact error was should exist and have read/write access
```

I am running GenotypeGVCFs as shown below. The --tmp-dir value is the same as for GenomicsDBImport, and I have deleted everything in the tmp directory since running GenomicsDBImport.

```
gatk --java-options "-Xmx4g" GenotypeGVCFs \
-R "$reference" \
-V gendb://"$database" \
-O "$out"/AMRE."$database".vcf.gz \
--tmp-dir ./AMRE/tmp/
```
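
The tmp-dir error above says the directory "should exist and have read/write access". Below is a minimal sketch of a check to run before launching GATK. It assumes one of two things, neither confirmed in this thread: the directory itself was removed during cleanup, or the relative path ./AMRE/tmp/ resolves differently from the job's working directory. The function name prepare_tmp_dir and the TMP_DIR variable are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: make sure the directory passed to --tmp-dir exists and is
# readable/writable before launching GATK.
# prepare_tmp_dir and TMP_DIR are hypothetical names used for illustration.

prepare_tmp_dir() {
    local dir="$1"
    # Recreate the directory in case a cleanup step removed it.
    mkdir -p "$dir" || return 1
    # The error message asks for read/write access, so test both.
    if [ ! -r "$dir" ] || [ ! -w "$dir" ]; then
        echo "no read/write access to $dir" >&2
        return 1
    fi
    # Print an absolute path so the result does not depend on the
    # working directory of the job.
    (cd "$dir" && pwd)
}

if tmp_dir="$(prepare_tmp_dir "${TMP_DIR:-./AMRE/tmp}")"; then
    echo "tmp dir ready: $tmp_dir"
    # The absolute path can then be passed on, for example:
    #   gatk ... GenotypeGVCFs ... --tmp-dir "$tmp_dir"
fi
```

Passing an absolute path removes any dependence on where the scheduler starts the job, which is one way a relative path can work for one job and fail for another.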

When I run it without --tmp-dir, I get a memory error message:
```
12:32:57.180 INFO GenotypeGVCFs - Shutting down engine
[December 13, 2019 12:32:57 PM MST] org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs done. Elapsed time: 0.41 minutes.
Runtime.totalMemory()=3131572224
java.lang.RuntimeException: java.io.IOException: No space left on device
```

I just realized that my memory setting for GenotypeGVCFs (-Xmx4g) was lower than for GenomicsDBImport (-Xmx12g), but that doesn't fix the issue with the tmp directory.

Any insight into what is causing these problems would be greatly appreciated! I held off on asking the forum, but I haven't been able to make any progress on this for a while.

Thank you,

Matt

Answers

  • bhanuGandham, Cambridge MA (Member, Administrator, Broadie, Moderator)
    edited December 2019

    Hi @mgdesaix

    This looks like a disk space issue rather than a GATK issue. The error java.lang.RuntimeException: java.io.IOException: No space left on device suggests that there isn't enough free space on the tmp drive. You can check how much space is available on your file system with the command df -h.

  • mgdesaix, Fort Collins, Member
    Hello,

    Thank you for the quick response! That makes sense; I thought it might be a tmp drive issue.

    But that seems to go back to my original issue: I cannot set the tmp directory for GenotypeGVCFs as I did for GenomicsDBImport. Should setting the tmp directory resolve the space problem? If so, any suggestions on why I would receive the --tmp-dir error message above when the same setting worked for GenomicsDBImport?

    Thank you for your help,

    Matt
  • bhanuGandham, Cambridge MA (Member, Administrator, Broadie, Moderator)

    What is the output of df -h? It will tell you how much space you have in /tmp.
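
The df check suggested in the answers can be narrowed to the locations that matter here. The sketch below assumes that, without --tmp-dir, temporary files go to Java's default temporary directory (usually /tmp on Linux, or whatever TMPDIR points to on a given cluster). The helper free_kb is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: report free space on the file systems that may hold temporary files.
# free_kb is a hypothetical helper. -P asks df for POSIX-format output and
# -k for 1024-byte blocks, so column 4 of the second line is the available space.

free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Check the default temporary location and the current working directory.
for dir in "${TMPDIR:-/tmp}" "."; do
    echo "$dir: $(free_kb "$dir") KB available"
done

# Human-readable view of the same information:
df -h "${TMPDIR:-/tmp}" .
```

If the default temporary location is nearly full while the scratch file system has room, pointing --tmp-dir at an existing, writable scratch directory is the usual remedy.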
