I'm trying to call variants on metagenomic data using the UnifiedGenotyper. I know that the diploid genotype calls and likelihoods will not be valid since my data is not diploid, but I want to use the VCF output to sum up base frequencies at detected variant loci.
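For illustration, summing base frequencies from the VCF could look roughly like the sketch below. It assumes each sample column carries an `AD` (per-allele depth) entry in its FORMAT field; the function names are hypothetical, and any depth filtering or downsampling applied by the caller would affect these counts.

```python
from collections import Counter


def allele_depths(vcf_line):
    """Sum per-allele read depths (AD) across all samples for one VCF data line.

    Assumes a tab-separated VCF record whose FORMAT column includes AD.
    Returns an empty Counter if AD is absent.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    alleles = [fields[3]] + fields[4].split(",")  # REF followed by ALT alleles
    fmt_keys = fields[8].split(":")
    if "AD" not in fmt_keys:
        return Counter()
    ad_idx = fmt_keys.index("AD")

    totals = Counter()
    for sample in fields[9:]:
        parts = sample.split(":")
        # Skip no-call samples such as "./." that carry no AD value.
        if len(parts) <= ad_idx or parts[ad_idx] in (".", ""):
            continue
        for allele, depth in zip(alleles, parts[ad_idx].split(",")):
            if depth != ".":
                totals[allele] += int(depth)
    return totals


def allele_frequencies(vcf_line):
    """Convert pooled allele depths into frequencies (empty dict if no reads)."""
    totals = allele_depths(vcf_line)
    depth = sum(totals.values())
    return {a: n / depth for a, n in totals.items()} if depth else {}
```

This pools reads over all samples and ignores the genotype calls entirely, which matches the intent of using the VCF only as a list of variant loci.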
I mapped 100+ samples (each about 2 Illumina GA2 lanes of data, which after host filtering usually contain about 20-40 million reads per sample) against a database of 671 bacterial reference sequences. Each reference can be in multiple parts, so I probably have tens of thousands of sequence records in my reference database, spanning the 671 reference genomes and totaling around 2.2 Gb. I am then feeding the resulting 100+ BAM files to the UnifiedGenotyper.
After some initial mistakes on my part (yes I have entered the future and am using GATK 2.2-5 now :) ) I've now started a run in proper fashion, but after a couple of hours it's dying with the message that the Java application has run out of memory:
I had set -Xmx60g for that failed run, so now I'm wondering if it's possible to estimate how much memory would be needed for this job I'm trying to run. Do you think a job of this size is even possible with the UG? Is it the number of references that is killing me here? Or the number of samples?
Answers
Hi there,
Welcome to the future, and sorry for the delay in answering! It was due to a time differential adjustment from your recent temporal acceleration ;)
If your references are draft genomes with lots of contigs, then yes, that's going to be a big problem. We haven't had that problem ourselves, but we recently had a user post a similar problem on this forum. As far as we know, they solved it by obtaining a more completely assembled version of their organism's reference. If you can't do something like that, you might want to try batching your reference genomes rather than using them all at once.
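As a rough sketch of what batching could look like, the snippet below groups whole genomes into batches so that no batch exceeds a chosen number of sequence records. The function name, the input mapping, and the cap are all hypothetical; how each batch is then used (for example, as a smaller reference to map and call against) is left open, and whether this brings memory use down enough is something you would have to test.

```python
def batch_genomes(records_by_genome, max_records_per_batch):
    """Split genomes into batches without separating a genome's records.

    records_by_genome: dict mapping a genome ID to its list of sequence
    record names (contigs, plasmids, and so on).
    A genome with more records than the cap ends up in a batch of its own.
    """
    batches, current, count = [], {}, 0
    for genome, records in records_by_genome.items():
        # Start a new batch if adding this genome would exceed the cap.
        if current and count + len(records) > max_records_per_batch:
            batches.append(current)
            current, count = {}, 0
        current[genome] = list(records)
        count += len(records)
    if current:
        batches.append(current)
    return batches
```

Keeping each genome's records together means a given organism is always handled within a single batch, at the cost of somewhat uneven batch sizes.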
I'll transfer this to "Ask the Community", hopefully someone out there will have some better idea of how to do this.
Geraldine Van der Auwera, PhD