piping GATK output to stdout

freeseek Member
edited February 2014 in Ask the GATK team

I want to pipe GATK output to standard output.

I am using a command like this (GATK v2.8-1-g932cd3a):
java -Xmx4g -jar GenomeAnalysisTK.jar -R human_g1k_v37.fasta -T CombineVariants -V in1.vcf.gz -V in2.vcf.gz -o /dev/stdout

However, GATK echoes its INFO log messages to standard output, mixing in text that is not meant to end up in a VCF file.

I have also tried the following command line:
java -Xmx4g -jar GenomeAnalysisTK.jar -R human_g1k_v37.fasta -T CombineVariants -V in1.vcf.gz -V in2.vcf.gz -log /dev/stderr -o /dev/stdout

But this only results in the INFO messages being sent to both standard output and standard error.

Is there a way to have GATK not use the standard output to communicate information to the user?

I have checked the documentation at http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_CommandLineGATK.html#--log_to_file but I don't understand how I could do this.

Answers

  • pdexheimer Member ✭✭✭✭

    You're so close - use --logging_level/-l with a value of OFF

    Side note to the documentation crew - a list of the valid levels (OFF, DEBUG, INFO, WARN, ERROR, FATAL) would be handy there.
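A minimal sketch of the suggestion above: the command from the question with `-l OFF` added, wrapped in a function so the VCF stream can be piped onward. The function names `combine_quiet` and `count_records` are hypothetical, the file names are the ones from the question, and the sketch assumes the GATK 2.x command line used in this thread. Note that a logging level of OFF also hides warnings and errors.

```shell
# Hypothetical wrapper: the command from the question plus "-l OFF"
# (--logging_level OFF), so that no log lines are written to the stream.
combine_quiet() {
  java -Xmx4g -jar GenomeAnalysisTK.jar \
    -R human_g1k_v37.fasta -T CombineVariants \
    -V in1.vcf.gz -V in2.vcf.gz \
    -l OFF -o /dev/stdout
}

# Hypothetical downstream consumer: count non-header VCF records with awk.
count_records() {
  awk '!/^#/ { n++ } END { print n + 0 }'
}

# Usage (requires GATK and the input files from the question):
#   combine_quiet | count_records
```

Any other awk program can stand in for `count_records`; the point is only that the pipe receives nothing but VCF lines.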

  • freeseek Member

    That is a good workaround. But what if I still want the logs output in a separate file?

  • pdexheimer Member ✭✭✭✭

    Not sure it's possible - the log_to_file argument eventually results in a Logger.addAppender() call. I think it's a tee rather than a redirect.

    What about using the shell to redirect stderr?

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    a list of the valid levels (OFF, DEBUG, INFO, WARN, ERROR, FATAL) would be handy there

    Indeed it would! Will add to the to-do list.

  • freeseek Member
    edited February 2014

    You are correct, it is a tee rather than a redirect.

    I am not sure what you mean by using the shell to redirect stderr. I could use:

    java -Xmx4g -jar GenomeAnalysisTK.jar -R human_g1k_v37.fasta -T CombineVariants -V in1.vcf.gz -V in2.vcf.gz -o /dev/stderr >/dev/null 2>out

    But then I could just run:

    java -Xmx4g -jar GenomeAnalysisTK.jar -R human_g1k_v37.fasta -T CombineVariants -V in1.vcf.gz -V in2.vcf.gz -o out >/dev/null

    I want to be able to pipe the GATK output to another program (in my case it is awk). I can indeed use "-l OFF" or easily grep out the INFO information (with "grep -v ^INFO"), but I think the design of the GATK is just flawed here.

    It would be good practice to have the log information be output on stderr by default (see http://en.wikipedia.org/wiki/Standard_streams#Standard_error_.28stderr.29) and not mixed with the VCF output, and "-log" should not work as a tee. The way it is now is just far from how most UNIX tools work.
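To illustrate the convention being argued for, here is a minimal sketch using a hypothetical stand-in function (`well_behaved_tool`, which is not GATK) that writes records to stdout and log messages to stderr. With that separation, the shell alone can send the log to a file while the data flows through the pipe. The log file path is also an assumption made for the example.

```shell
# Hypothetical stand-in for a tool that follows the convention:
# data goes to stdout, log messages go to stderr.
well_behaved_tool() {
  echo "INFO starting" >&2
  printf '##fileformat=VCFv4.1\n'
  printf '1\t100\t.\tA\tG\n'
  echo "INFO done" >&2
}

# Log lines are diverted to a file; only data reaches the pipe.
log_file="${TMPDIR:-/tmp}/tool.$$.log"
records=$(well_behaved_tool 2>"$log_file" | grep -vc '^#')
echo "records=$records"
echo "log lines=$(grep -c '^INFO' "$log_file")"
rm -f "$log_file"
```

No tee-like logging option is needed in this arrangement: `2>file` keeps the log, and `2>/dev/null` discards it.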

  • pdexheimer Member ✭✭✭✭

    I had assumed that the log went to stderr, and so was suggesting something like $GATK -o /dev/stdout 2> /dev/null. But now that I think about it, you're right - the log data does always get saved to the stdout file by LSF

  • freeseek Member
    edited February 2014

    Never mind, the "grep -v ^INFO" option does not work, since by default GATK even mixes VCF lines with the INFO lines, creating chimeric lines. I did indeed get two lines in the output like the following:

    ... 0/0:43,1:43:99:0,12INFO 12:17:58,790 ProgressMeter - 2:113943823 2.61e+05 95.5 m 6.1 h 11.7% 13.6 h 12.0 h

    9,1652 0/0:39,0:39:99:0,117,1429 ...

    I have no other way to classify this as anything other than a bug.
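One way to check a captured stream for this kind of corruption, sketched under the assumption that the `#CHROM` header line is present and that every data line should have the same number of tab-separated fields as that header. `find_chimeric_lines` is a hypothetical helper, not part of GATK.

```shell
# Hypothetical helper: print the line numbers of data lines that either have
# a different number of tab-separated fields than the #CHROM header line or
# contain an embedded "INFO " log marker followed by a digit (a timestamp).
find_chimeric_lines() {
  awk -F '\t' '
    /^##/     { next }                 # meta-information lines
    /^#CHROM/ { ncol = NF; next }      # column header defines the field count
    NF != ncol || /INFO +[0-9]/ { print NR }
  '
}
```

Usage, with `captured.vcf` standing in for whatever file the stream was saved to: `find_chimeric_lines < captured.vcf`. Empty output means no line was flagged by this heuristic; it does not prove the file is valid VCF.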

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    @freeseek It would be a bug if we claimed that this capability (piping results to stdout) existed... But I don't think we do anywhere (and if we do it's a mistake). At present we don't support doing this, and to be honest I don't think we ever will unless someone contributes a patch to do this. Perhaps it is a design flaw, but that is how the GATK was designed to work in the Broad's pipelines (which it is our primary mission to support), so we cannot devote resources to implementing this. I'm sorry for the inconvenience and for not jumping in sooner to tell you it was unsupported; I've been traveling and am still catching up on forum questions.

  • freeseek Member

    I am sorry Geraldine, but if that is the case, then the documentation is very misleading. See http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_variantutils_CombineVariants.html#--out where it is mentioned that by default GATK will output to stdout. I think nobody would expect to see the log information in the standard output if this is used by default for the output of the walker.

    Though I confess that I am a bit perplexed. The real problem here is not that GATK does not support piping to stdout. I can feed /dev/stdout to GATK and GATK will believe that it is a regular file so that it does not even have to be aware that it is being used for piping the output to another program.

    The real problem here is that GATK outputs the log to standard output. This is an unconventional practice (see http://www.cplusplus.com/reference/iostream/clog/) and other users might very well find this practice confusing. The solution is extremely easy. Just output the log to standard error. It is not a matter of implementing something new, it is a matter of changing one line of code. If there are no resources to do this, I am happy to delve into the code.

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    Alright, if you're going to use logic... good thing my jetlag is wearing off and I'm regaining the ability to follow it :)

    It is true that the docs state that the default output of most walkers is to stdout. This comes down to legacy code from the early days of GATK, when it was a very different animal than what it has evolved into. But actually letting walkers output to stdout (by not specifying -o) is a bad idea in the current state of affairs, for the reasons that you've pointed out. And it's something we never do in our pipelines, because every intermediate file is kept until we're really really sure it's not needed anymore.

    That being said I agree with you on principle that the streams should be kept separate. I just worry that changing it is not going to be that simple -- there's usually something twisted deep in the GATK's bowels that makes things more complicated than they need to be (don't quote me on that).

    But you'll be happy to hear that the devs I've asked so far agree with you, so there's a good chance this might even get done for the 3.0 release...

  • bryand San Francisco, USA Member

    This issue occurs in v3.5 when using the CombineGVCFs walker.

  • Sheila Broad Institute Member, Broadie admin

    @bryand
    Hi,

    Can you please post the exact command you ran?

    Thanks,
    Sheila