HaplotypeCaller -Djava.io.tmpdir

zzq China Member
edited May 2016 in Ask the GATK team

Hi @Geraldine_VdAuwera ,

Recently, I ran the pipeline according to the Best Practices, but I get a strange error when I call a GVCF for each individual. The error output is as follows:

`INFO 06:50:39,055 ProgressMeter - opera_scaffold_6771:1480 2.68554067E9 79.6 h 106.0 s 95.6% 83.3 h 3.7 h

ERROR ------------------------------------------------------------------------------------------
ERROR A USER ERROR has occurred (version 3.5-0-g36282e4):
ERROR
ERROR This means that one or more arguments or inputs in your command are incorrect.
ERROR The error message below tells you what is the problem.
ERROR
ERROR If the problem is an invalid argument, please check the online documentation guide
ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
ERROR
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
ERROR
ERROR MESSAGE: Unable to create a temporary BAM schedule file. Please make sure Java can write to the default temp directory or use -Djava.io.tmpdir= to instruct it to use a different temp directory instead.
ERROR ------------------------------------------------------------------------------------------

`
Many samples appear to run almost to completion and then fail at the very end, while some samples complete successfully. I have already set -Djava.io.tmpdir in the command, and there is enough space on the drive. The Java version I am using is 1.8. I hope this information helps with diagnosing my problem.

Best

Answers

  • Sheila Broad Institute Member, Broadie, Moderator admin

    @zzq
    Hi,

    If you are using the latest stable version, you will need to use Java-1.7. However, if you upgrade to the latest nightly build, you can use Java-1.8.

    -Sheila

    P.S. The newest stable release should come out soon, and that will accept Java-1.8 if you want to wait for a stable release :smile:
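
A minimal sketch of how the Java requirement mentioned above could be checked before launching a long GATK 3.5 job. The function names are hypothetical, and the parsing assumes the usual `java -version` banner format (for example `java version "1.7.0_80"`); other JVM vendors may print something different.

```python
import re


def parse_java_major(version_output):
    """Return the Java major version found in `java -version` output.

    "1.7.0_80" -> 7, "1.8.0_91" -> 8, "11.0.2" -> 11.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("could not find a version string in the output")
    first = int(match.group(1))
    second = match.group(2)
    # Legacy scheme ("1.7", "1.8"): the second number is the major version.
    if first == 1 and second is not None:
        return int(second)
    return first


def is_supported_for_gatk35(version_output):
    """GATK up to and including 3.5 officially supports only Java 1.7."""
    return parse_java_major(version_output) == 7
```

Note that `java -version` writes its banner to stderr, so a caller would typically pass in `subprocess.run(["java", "-version"], capture_output=True, text=True).stderr`.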

  • zzq China Member
    edited May 2016

    Hi @Sheila

    Thanks. I have already generated GVCFs for many individuals successfully using the latest stable version with Java 1.8. Is there any problem with these GVCF files, given that they were produced with Java 1.8?

    Best

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    As documented in the program requirements, only Java 1.7 is officially supported for versions up to and including 3.5. There can be correctness issues with GATK 3.5 on Java 1.8 which I recently discussed on the forum and on the blog.

    That being said I doubt that this particular error is due to the java version. Are you sure that there is enough space on the drive that you are using for tmp?
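
The temp-directory question above can be checked directly rather than estimated. The following is a rough sketch, not a GATK utility: `check_tmpdir` and the `min_free_gb` threshold are assumptions made for illustration. It reports free space, free inodes where the platform exposes them (a directory can have free bytes and still be unable to create new files), and whether a file can actually be created there.

```python
import os
import shutil
import tempfile


def check_tmpdir(path, min_free_gb=10.0):
    """Report whether `path` looks usable as a Java temp directory.

    `min_free_gb` is an arbitrary illustrative threshold, not a GATK requirement.
    """
    report = {"path": path, "writable": False, "free_inodes": None}
    report["free_gb"] = shutil.disk_usage(path).free / 1024 ** 3
    # os.statvfs is POSIX-only; some file systems report 0 here, so treat the
    # value as informational only.
    if hasattr(os, "statvfs"):
        report["free_inodes"] = os.statvfs(path).f_favail
    # The most direct test: try to create (and automatically delete) a file.
    try:
        with tempfile.NamedTemporaryFile(dir=path):
            report["writable"] = True
    except OSError:
        report["writable"] = False
    report["enough_space"] = report["free_gb"] >= min_free_gb
    return report
```

Running this as the same user, and on the same compute node, that runs GATK matters, since permissions and mounted file systems can differ between a login node and a cluster node.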

  • zzq China Member
    edited May 2016

    Hi @Geraldine_VdAuwera ,

    Yes, I am sure. But the number of files in the tmp directory is very large (about 11,000; it is even difficult to list them with ls). If Java 1.8 is not the cause, I suspect that reading and writing temp files in this directory has become difficult, so I will switch to a new tmp directory and retry the failed samples.

    best
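
One way to try the "new tmp directory" idea is to give every sample its own fresh temp directory, so that no single directory accumulates thousands of files. This is a sketch under assumptions: the original command line was not posted, so the jar name, file paths, and helper names below are hypothetical, and the arguments shown are only the usual GATK 3 HaplotypeCaller GVCF-mode ones.

```python
import os
import tempfile


def count_entries(path):
    """Count directory entries lazily, without sorting them the way `ls` does."""
    with os.scandir(path) as entries:
        return sum(1 for _ in entries)


def build_haplotypecaller_cmd(sample, bam, reference, out_gvcf, tmp_root,
                              gatk_jar="GenomeAnalysisTK.jar"):
    """Return (command, sample_tmp) using a fresh per-sample temp directory."""
    # mkdtemp creates a new, uniquely named directory under tmp_root.
    sample_tmp = tempfile.mkdtemp(prefix=sample + "_", dir=tmp_root)
    cmd = [
        "java",
        # JVM options must come before -jar to be seen by the JVM.
        "-Djava.io.tmpdir=" + sample_tmp,
        "-jar", gatk_jar,
        "-T", "HaplotypeCaller",
        "-R", reference,
        "-I", bam,
        "--emitRefConfidence", "GVCF",
        "-o", out_gvcf,
    ]
    return cmd, sample_tmp
```

The returned list can be passed to `subprocess.run(cmd, check=True)`, and `sample_tmp` can be deleted with `shutil.rmtree` once the sample has finished, which keeps the temp area from filling up across a large batch.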

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    To be clear, I'm saying that I don't think Java 1.8 is responsible for this particular error; however, we have some indications that running GATK 3.5 on Java 1.8 is NOT OK from the perspective of correctness. It runs without failing, but some annotation values may be incorrect, leading to possible negative effects on filtering.

  • zzq China Member
    edited May 2016

    Hi @Geraldine_VdAuwera ,

    Do you mean that the samples that finished successfully with Java 1.8 should also be rerun (GVCF calling)? If the effect on filtering is small, I would rather not rerun these samples, because it would take me a long time.

    Many thanks!

    Issue · Github, by Sheila: Issue Number 901, State closed, Closed By vdauwera
  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    As I mentioned elsewhere, we haven't tested this thoroughly, since we only run on the platform that we officially support (by definition). So I can't tell you what might be the scale of the problem in your data. I would recommend running either one sample again, or a chromosome's worth of intervals across multiple samples, and evaluating how much difference there is in the annotation values. That will allow you to estimate whether your results might be affected substantially or not, and whether it's worth rerunning samples or not. It's up to you to decide how much risk you are willing to take.
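
The comparison suggested above could be sketched roughly as follows: call the same sample (or the same intervals) once under Java 1.7 and once under Java 1.8, then compare the INFO annotations site by site. This is an illustrative helper, not a GATK tool; the function names and the numeric tolerance are assumptions, it works on uncompressed VCF text, and it only compares sites and annotations present in both files.

```python
def parse_info(info):
    """Turn an INFO string such as "DP=10;MQ=60.00" into a dict of strings."""
    out = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            out[key] = value
    return out


def load_annotations(vcf_text):
    """Map (CHROM, POS, REF, ALT) to the parsed INFO annotations."""
    records = {}
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip header and blank lines
        fields = line.split("\t")
        key = (fields[0], int(fields[1]), fields[3], fields[4])
        records[key] = parse_info(fields[7])
    return records


def compare_annotations(vcf_a, vcf_b, tol=1e-6):
    """List (site, annotation, value_a, value_b) for values that differ."""
    a, b = load_annotations(vcf_a), load_annotations(vcf_b)
    diffs = []
    for site in sorted(a.keys() & b.keys()):
        for name in sorted(a[site].keys() & b[site].keys()):
            va, vb = a[site][name], b[site][name]
            if va == vb:
                continue
            try:
                if abs(float(va) - float(vb)) <= tol:
                    continue  # numerically equal within tolerance
            except ValueError:
                pass  # non-numeric values: any textual difference counts
            diffs.append((site, name, va, vb))
    return diffs
```

Counting how many sites appear in `diffs`, and which annotation names dominate, gives a first estimate of whether the Java 1.8 results differ enough to affect filtering and therefore whether rerunning is worth the time.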
