Misleading error message ("file system blip") when writing compressed output with threaded processing?

Trying to run

java -jar $GATKJAR -R $REF -T UnifiedGenotyper -I file1.bam -I file2.bam -I file3.bam -glm BOTH -o output.vcf.gz

gives an error like:

 ##### ERROR ------------------------------------------------------------------------------------------
 ##### ERROR A USER ERROR has occurred (version 2.4-9-g532efad): 
 ##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
 ##### ERROR Please do not post this error to the GATK forum
 ##### ERROR
 ##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
 ##### ERROR Visit our website and forum for extensive documentation and answers to 
 ##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
 ##### ERROR
 ##### ERROR MESSAGE: There was a failure because temporary file /tmp/org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub1033673347640679118.tmp could not be found while running the GATK with more than one thread.  Possible causes for this problem include: your system's open file handle limit is too small, your output or temp directories do not have sufficient space, or just an isolated file system blip
 ##### ERROR ------------------------------------------------------------------------------------------

The file is actually there, and is gzip-compressed and vcf-formatted.

However, if I specify -o output.vcf instead of -o output.vcf.gz, then everything works. I suspect the problem is with the autodetection of the codec. In VariantContextWriterStorage, LocalParallelizationProblem is thrown not only if the tmp file cannot be found, but whenever a FeatureDescriptor cannot be found for the file.

So it seems that compressed output cannot be used with threaded processing in UnifiedGenotyper. Is my assessment correct?

  1. A better error message would help prevent others from trying the same thing I did.
  2. It would be nice to be able to write compressed output from a multi-threaded UnifiedGenotyper. Perhaps: a) the temp files could be written uncompressed even though the final file will be compressed, or b) the codec detection could recognize gzip-compressed files?
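Items 1 and 2b can be sketched together: detect gzip from its two-byte magic number (0x1f 0x8b, defined in RFC 1952) and, when a temp file still cannot be read, report which failure actually happened instead of a generic "file system blip". This is a minimal sketch under stated assumptions, not GATK's FeatureManager API: the class `TempVcfOpener` and its methods are hypothetical, and "recognized as VCF" is approximated by checking that the first line starts with `##fileformat=VCF`.

```java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, not part of GATK: opens a per-thread temp VCF whether or not it is gzipped.
final class TempVcfOpener {

    // Every gzip stream begins with the magic bytes 0x1f 0x8b (RFC 1952).
    static boolean isGzip(Path path) throws IOException {
        try (InputStream in = Files.newInputStream(path)) {
            return in.read() == 0x1f && in.read() == 0x8b;
        }
    }

    // Returns a reader over the (decompressed) text; each failure mode gets its own message.
    static BufferedReader open(Path path) throws IOException {
        if (!Files.exists(path)) {
            throw new FileNotFoundException("Temporary file " + path + " does not exist");
        }
        boolean gzip = isGzip(path);
        InputStream raw = new BufferedInputStream(Files.newInputStream(path));
        InputStream in = gzip ? new GZIPInputStream(raw) : raw;
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        reader.mark(8192); // the ##fileformat line is short, so this read-ahead limit covers it
        String first = reader.readLine();
        if (first == null || !first.startsWith("##fileformat=VCF")) {
            reader.close();
            throw new IOException("Temporary file " + path + " exists"
                    + (gzip ? " (gzip-compressed)" : "")
                    + " but its first line is not a VCF ##fileformat header");
        }
        reader.reset(); // rewind so the caller also sees the header line
        return reader;
    }
}
```

With this split, a missing file, a compressed file, and an unrecognized file produce three different messages, which is the distinction the original error message collapses.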

Answers

  • anupam Member

    Hi. I have been facing a similar problem. Here is a possible solution:
    increase the open-file limit on your server (this can be done by root).
    By default, the limit (ulimit -n) is usually 1024.

    ulimit -n 50000

    Then run GATK with the multi-threading option. Hope this helps.
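Since the limit that matters is the one the GATK JVM actually runs with, it can help to check it from inside Java. A minimal sketch, assuming an OpenJDK/HotSpot JVM on Linux or another Unix, using the JDK-specific `com.sun.management.UnixOperatingSystemMXBean`; the class name `FdLimitCheck` is hypothetical. HotSpot on Linux may raise its own soft limit up to the hard limit at startup, so the reported maximum can be higher than what `ulimit -n` prints in the shell.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

// Hypothetical check: reports the open-file limit as seen by the running JVM.
final class FdLimitCheck {

    // Returns {open, max} file descriptors, or null if the platform does not expose them.
    static long[] fileDescriptorUsage() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            return new long[] {unix.getOpenFileDescriptorCount(), unix.getMaxFileDescriptorCount()};
        }
        return null; // e.g. Windows, or a JVM without the com.sun.management extensions
    }

    public static void main(String[] args) {
        long[] usage = fileDescriptorUsage();
        if (usage == null) {
            System.out.println("File-descriptor counts are not available on this platform");
        } else {
            System.out.printf("open=%d max=%d%n", usage[0], usage[1]);
        }
    }
}
```

On Java 11 or later this can be run directly as `java FdLimitCheck.java`, ideally under the same account and job environment that launches GATK.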

  • bpow Member
    edited June 2013

    The ulimit change can fix some problems with multi-threading, but this is not a ulimit issue. The temp files do get created, but GATK does not read them back. When -nt and compressed output are both specified, the temp files are created but are themselves gzip-compressed. It appears that FeatureManager().getByFiletype(file) cannot recognize the compressed temp files, so they cannot be merged.
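Option (a) from the original post, keeping the per-thread temp files as plain text and compressing only the final output, can be sketched as a merge step. This is a hypothetical illustration, not GATK's actual merge code: `ShardMerger` and `mergeToGzip` are invented names, and it assumes every shard starts with the same header and that shards are passed in genomic order. It writes plain gzip via `java.util.zip.GZIPOutputStream`, which is valid gzip but not BGZF, so unlike a block-gzipped .vcf.gz the result could not be tabix-indexed.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.zip.GZIPOutputStream;

// Hypothetical merge step: uncompressed per-thread shards -> one gzip-compressed VCF.
final class ShardMerger {

    static void mergeToGzip(List<Path> shardsInOrder, Path finalOutput) throws IOException {
        try (Writer out = new BufferedWriter(new OutputStreamWriter(
                new GZIPOutputStream(Files.newOutputStream(finalOutput)), StandardCharsets.UTF_8))) {
            boolean headerWritten = false;
            for (Path shard : shardsInOrder) {
                // Shards are plain text, so no codec guessing is needed to read them back.
                try (BufferedReader in = Files.newBufferedReader(shard, StandardCharsets.UTF_8)) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        boolean isHeader = line.startsWith("#");
                        // Copy the header from the first shard only; copy records from every shard.
                        if (!isHeader || !headerWritten) {
                            out.write(line);
                            out.write('\n');
                        }
                    }
                }
                headerWritten = true;
            }
        } // closing the writer finishes the gzip stream
    }
}
```

The design choice is that compression happens exactly once, at the end, so the readers that reload the temp files never have to detect a codec for compressed input.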

    I noticed, looking at my initial post, that I left -nt out of the command line that demonstrates the problem. My bad...

    I reported this initially in March, and the issue still exists in 2.5-2-gf57256b. Could someone else confirm that UnifiedGenotyper gives this (unclear) error message when used with -nt and with output specified as a .vcf.gz file? It happens predictably for me regardless of which BAM files I provide, and either removing the parallelization or writing to output.vcf instead of output.vcf.gz predictably lets processing continue.

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    Hi there,

    Sorry for not answering your question back in March; it must have slipped through my net. I'll check, but I believe you're correct that it's the interaction between multithreading and compression. Working with compressed files is not trivial; we've had other issues with it.

  • ploher Member

    Just curious: we have run into this same issue. Has it been fixed in an upcoming version? Our ulimit is unlimited, and our machines are very powerful with plenty of resources (tmp space, available disk space, memory, threads, CPU, etc.).

  • ploher Member

    FWIW, I'm not writing .gz output and I still sometimes get the error. Here is the command line:

    java ${JAVAOPT} -jar ${GATKDIR}/GenomeAnalysisTK.jar -T UnifiedGenotyper -R ${ASSEMBLY} -I ${OUTFILE}.mapped.uniquely.sorted.SNPTMP.recal.reducedreads.bam --dbsnp ${PLFILES}/dbSNP_137.sorted.vcf -o ${OUTFILE}.SNP.RawSNPCall.vcf -glm BOTH -stand_call_conf 50 -stand_emit_conf 20 -nt 32

  • Mark_DePristo Broad Institute Member admin

    Multi-threaded gzip VCF output now works with -nt. What is the other problem?

    What exact error are you getting with -nt?
