
# GATK UnifiedGenotyper: Unable to merge temporary Tribble output file

Posts: 18Member
edited January 2013

Hi all,

I've been analyzing some Illumina whole-exome sequencing data these days. Yesterday I used GATK (version 2.0) UnifiedGenotyper to call SNPs and indels with the following command:

run_gatk.sh \
    -T UnifiedGenotyper \
    -R GRCh37/human_g1k_v37.fasta \
    -I GATK_recal_result.bam \
    -glm BOTH \
    --dbsnp reference/dbsnp_134.b37.vcf \
    -stand_call_conf 50 \
    -stand_emit_conf 10 \
    -o raw2.vcf \
    -dcov 200

After running this command, I got a VCF file which is very small (when I checked it, I found that the called SNPs and indels are all from chromosome 1).
The error message is as follows:

##### ERROR stack trace

org.broadinstitute.sting.utils.exceptions.ReviewedStingException: Unable to merge temporary Tribble output file.
Caused by: org.broad.tribble.TribbleException$MalformedFeatureFile: Unable to parse header with error: /rd/tmp/org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub8005277156701491219.tmp (Too many open files), for input source: /rd/tmp/org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub8005277156701491219.tmp
    at org.broad.tribble.TribbleIndexedFeatureReader.readHeader(TribbleIndexedFeatureReader.java:104)
    at org.broad.tribble.TribbleIndexedFeatureReader.<init>(TribbleIndexedFeatureReader.java:58)
    at org.broad.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:69)
    at org.broadinstitute.sting.gatk.io.storage.VariantContextWriterStorage.mergeInto(VariantContextWriterStorage.java:182)
    at org.broadinstitute.sting.gatk.io.storage.VariantContextWriterStorage.mergeInto(VariantContextWriterStorage.java:52)
    at org.broadinstitute.sting.gatk.executive.OutputMergeTask.merge(OutputMergeTask.java:48)
    at org.broadinstitute.sting.gatk.executive.HierarchicalMicroScheduler.mergeExistingOutput(HierarchicalMicroScheduler.java:263)
    ... 6 more
Caused by: java.io.FileNotFoundException: /rd/tmp/org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub8005277156701491219.tmp (Too many open files)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:120)
    at org.broad.tribble.util.ParsingUtils.openInputStream(ParsingUtils.java:56)
    at org.broad.tribble.TribbleIndexedFeatureReader.readHeader(TribbleIndexedFeatureReader.java:96)
    ... 12 more

##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 2.0-39-gd091f72):
##### ERROR
##### ERROR Please visit the wiki to see if this is a known problem
##### ERROR If not, please post the error, with stack trace, to the GATK forum
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Unable to merge temporary Tribble output file.
##### ERROR ------------------------------------------------------------------------------------------

Would you please help me solve it? Thanks a lot.

Post edited by Geraldine_VdAuwera

## Best Answer

• Boston, MAPosts: 28Member
Answer ✓

Your system has run out of available file handles, so it can't open new files:

/rd/tmp/org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub8005277156701491219.tmp (Too many open files)

You can check the current limit on open files with:

$ ulimit -a | grep open
open files                      (-n) 1024


GATK can be aggressive in opening files, so you'll probably have to increase your current limit.

http://stackoverflow.com/questions/34588/how-do-i-change-the-number-of-open-files-limit-in-linux
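In practice, checking and raising the per-session limit looks something like the sketch below (exact numbers vary by system, and only the soft limit can be raised without root; raising the hard limit needs an administrator):

```shell
# Inspect the current limits on open files for this shell.
soft=$(ulimit -S -n)   # soft limit: what running processes actually hit
hard=$(ulimit -H -n)   # hard limit: the ceiling the soft limit can be raised to without root
echo "soft=$soft hard=$hard"
# A non-root user can raise the soft limit up to the hard limit for this session:
ulimit -S -n "$hard" 2>/dev/null && echo "soft limit is now $(ulimit -n)"
```

Note this only affects the current shell session (and processes launched from it); a persistent change usually goes through /etc/security/limits.conf or your cluster's job scheduler.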

Hope this helps.

Brad Chapman, Bioinformatics Core at Harvard Chan School

• Posts: 74Member

Hi there,

Is there any way around this issue for large datasets? It runs fine with ~800 samples (multi-sample UG, version 2.5-2), but when I increase it to about ~1,100 (my whole set), it can't handle it anymore. I do not have permission to change the ulimits on the cluster...

• Posts: 10,469Administrator, Dev admin

Unfortunately there's no workaround from the GATK side of things. Maybe try contacting your systems administrator to get them to customize your environment...

Geraldine Van der Auwera, PhD

• Posts: 27Member
edited September 2013

Hello Geraldine,

Perhaps I can contribute a solution. My user on my cluster has:

$ ulimit -a | grep open
open files                      (-n) 50000

I, like probably 99% of people posting here, cannot easily change the number of handles allowed on my cluster.

You would think that this would be enough to run UnifiedGenotyper on a few genomes. But while UnifiedGenotyper works fine for one BAM at a time, as soon as I increase it to even two, I get the same message:

##### ERROR ----

However, when I then reduce my number of threads to 1 (down from 11), I am able to run it. I've not tried to optimize the grey area between 1 and 11 threads yet.

Worth a try, douym?

Ah yes, @redzengenoist, you make a good point that anyone encountering this issue should consider lowering any multithreading counts they're using. That should definitely help mitigate the problem.

Geraldine Van der Auwera, PhD
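As a concrete illustration of the suggestion above (the jar path, reference, and inputs here are placeholders, not taken from the thread):

```shell
# Hypothetical invocation: dropping the data-threads flag to a single thread.
# Each data thread (-nt) writes its own set of temporary output files, so -nt 1
# minimizes the number of file handles held open at merge time.
java -jar GenomeAnalysisTK.jar \
    -T UnifiedGenotyper \
    -R human_g1k_v37.fasta \
    -I bam.list \
    -nt 1 \
    -o out.vcf
```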

• Posts: 27Member
edited September 2013

Thanks @Geraldine_VdAuwera,

Maybe I can ask you something in return, which isn't really worth a full thread: the correct format for a bam.list file is just like this, right?

/xxx/file1.bam
/xxx/file2.bam
/xxx/file3.bam

Nothing fancy? The -I option in the java command just points to the file, right?

java blablabla -I /xxx/xxx/bam.list

That is entirely correct.

Geraldine Van der Auwera, PhD
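For anyone else setting this up, one way to generate such a list is with find; the directory and file names below are invented for the demo:

```shell
# Build a bam.list for GATK's -I option: one absolute BAM path per line.
# A throwaway directory with empty .bam files stands in for real data.
demo=$(mktemp -d)
touch "$demo/file1.bam" "$demo/file2.bam" "$demo/file3.bam"
find "$demo" -name '*.bam' | sort > "$demo/bam.list"
cat "$demo/bam.list"
```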

I updated to 2-7.2 and encountered the "too many open files" error as well. I raised the ulimit to 65535 files, but it does not work: same error. The -nt and -nct flags were perfectly fine-tuned for the previous version; what is going on with this new one? Should I alter my multithreading sweet spot?

Hmm, I can't think of any recent change we made that would explain this. Is it with UG that you're experiencing this issue? Have you run the same data through both versions to make sure it's the GATK version, not the batch of data, that is responsible?

Geraldine Van der Auwera, PhD

• Posts: 74Member

Hi there, I am encountering the same issue with version 2-7.2. In my previous post, I was using version 2.5-2. I got my cluster admin to increase the open files limit to 2048, and I was able to run UG on 1112 samples (no nt or nct flags). Now I am trying again on the same set of samples, but with version 2-7.2. Although ulimit is the same (2048), I am getting the 'too many open files' error, which makes me think something has changed between the two versions.

Were there any changes in the default parameters between the two? This is the exact command I have used for both:

java -Xmx8g -jar $path2Gatk/GenomeAnalysisTK.jar \
    -T UnifiedGenotyper \
    -l INFO \
    -R $path2SeqIndex.fasta \
    -I list_of_bams \
    -o out.vcf \
    --dbsnp:vcf $path2Dbsnp \
    -stand_call_conf 10 \
    -stand_emit_conf 10 \
    -rf BadCigar \
    -glm BOTH \
    --intervals:bed $intfile \
    --pedigree $ped \
    --pedigreeValidationType SILENT \
    -dcov 250

• Posts: 74Member

Also, do you have an approximate number for estimating how many files GATK is trying to open, given the number of input samples? It would help to know what limit to request from the cluster admin.

Hi @vsvinti, do you get the same issue if you leave out the dbsnp argument from your command?

Geraldine Van der Auwera, PhD

• Posts: 74Member

Geraldine, I don't see any changes in the behaviour when taking out the dbsnp option...

Caused by: java.io.FileNotFoundException: ~/java/jre1.7.0_40/lib/resources.jar (Too many open files)

Anything different in default settings that might be hidden? What does this number of open files depend on - only on number of input files, or does their size matter, etc.?

This depends on the number of open files, not their sizes. Typically this issue is linked to GATK creating temporary files. We don't have any guidelines to predict how many temp files may need to be opened, and I'm not sure what could have changed between versions to explain why it is failing now. We work almost exclusively in a cluster environment with much higher limits, so we have little experience with this type of constraint. I would recommend trying to double the ulimit to see if that works.

Geraldine Van der Auwera, PhD
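If you'd rather measure than guess, on Linux you can count the file descriptors a running process currently holds via the /proc interface. A minimal sketch (this shell's own PID stands in for the actual java/GATK process ID):

```shell
# Count open file descriptors for a process via Linux's /proc interface.
# $$ (this shell's PID) is a stand-in for the real java/GATK process ID.
pid=$$
fd_count=$(ls "/proc/$pid/fd" | wc -l)
soft_limit=$(ulimit -S -n)
echo "process $pid holds $fd_count of $soft_limit allowed file handles"
```

Polling this while UG runs would show how close the job gets to the limit, which is more useful than trial-and-error doubling.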

• Posts: 74Member

Seems like the error had to do with changes made to our cluster. It would be useful at some point to publish some numbers on how the number of temporary files grows with the number of inputs in UG. It's hard to know what happens in the 'black box', and difficult to estimate what to request from cluster admins. Nevertheless, thank you for your responses.

Hi @vsvinti,

Temp file usage (for tools that primarily process BAMs) should be a function of the number of contigs in the intervals to process, not of the number of samples. Are you working with draft genomes that have many contigs, by any chance?

Geraldine Van der Auwera, PhD

• Posts: 74Member

I am working with whole exomes, calling only at capture intervals. Aha, so if this number of temp files are due to the intervals, I could break them up into smaller chunks and see what happens! I'll report back if/when I get to it!
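In case it helps anyone trying the same thing: an interval BED file can be split into fixed-size chunks with the standard split utility, and each chunk passed to a separate UG run. The file contents below are invented for the demo:

```shell
# Split an interval BED file into 2-line chunks named chunk_aa, chunk_ab, ...
# targets.bed here is a tiny made-up example, not real capture intervals.
demo=$(mktemp -d) && cd "$demo"
printf '1\t100\t200\n1\t300\t400\n2\t100\t150\n2\t500\t600\n' > targets.bed
split -l 2 targets.bed chunk_
ls chunk_*
```

Each chunk can then be supplied to a separate run via --intervals:bed, and the per-chunk VCFs merged afterwards.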