Fallback and documentation for GATK tools that don't support multithreading

hussein (DSP @ Broad Institute; Member, Broadie, Dev)
edited May 2014 in Ask the GATK team

I've been using the GATK - in particular, the DiagnoseTargets and VariantsToTable tools - and have been running into trouble attempting to parallelise these tasks.

I've tried both the -nt and -nct flags, and it turns out that neither is supported by the above tools. Unfortunately, nothing in the documentation indicates this, so I only ever find out the hard way when I try to run them. As such, I have a couple of questions:

  1. Does the documentation list which of the engine-wide parameters are unsupported by certain tools? If not, could it?

  2. Even if the tools aren't automagically parallelisable, I still want to run them -- it's a little frustrating to kick off a long-running process over the weekend and get back on Monday to find it failed a few hours in! Is there an option to fall back to single-threaded execution if one of the multithreading flags isn't supported? If not, could there be?

Thanks!

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie; admin)
    edited May 2014

    Hi Hussein,

    Support for parallelism options is indicated in the tool docs when they are applicable, e.g. if you look at the UG's doc (http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_genotyper_UnifiedGenotyper.html) there is a "parallelism options" section. Currently we don't have anything that explicitly states that parallelism is not available if that is the case, but I understand how that might be helpful, and I'll add that as a feature request for the next version.

    Your jobs should be failing immediately with a USER ERROR if you are trying to use nt/nct where they are not supported. If that's not the case, please let us know the conditions, the exact error message, etc., because that's not the expected behavior.

    Have you run into any similar issues with other engine arguments being unsupported by specific tools?

  • hussein (DSP @ Broad Institute; Member, Broadie, Dev)

    Ah, I see - I wasn't aware that there was a specific section that was present if parallelism was supported and missing otherwise. That's good to know!

    It was indeed failing with a USER ERROR, but I happened to be calling that GATK tool in the middle of a long-running script, and the error brought the whole pipeline down. I understand that this is an instance of PEBCAK, but it would have been more helpful if it had printed a warning and fallen back to single-threaded execution, hence the request for a (possibly optional) fallback. I appreciate this may not be a worthwhile feature for the GATK in the general case!

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie; admin)

    Hmm, you mean you'd want the program to just issue a warning, ignore the instruction and run single-threaded, then, rather than exit? I can see the value in that. I'll discuss with the devs to see if we might change the default behavior, or provide an "ignore inapplicable arguments" option.

  • hussein (DSP @ Broad Institute; Member, Broadie, Dev)

    Yes, that'd be ideal. I can't think of many situations where I'd rather the program exit if it can't run multithreaded.

    Thanks!

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie; admin)

    OK, will try to push it through. FYI, the most likely objection will be "but people should test each command line before putting them into a pipeline script!". If you have a counterargument to that, feel free to share -- it will give me ammunition :)

  • hussein (DSP @ Broad Institute; Member, Broadie, Dev)

    As a developer myself, my conscience doesn't allow me to argue that that objection isn't valid ;)

    You could just as easily ask why the option exists if it's invalid, but the problem (in this case, "stupid user") could just as easily be fixed with documentation or, as you say, testing on the user's part.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie; admin)

    hussein wrote: "You could just as easily ask why the option exists if it's invalid"

    Inherited option, easier to include along with the rest of the collection than exclude case-by-case...

    Perhaps you are familiar with this quote from the great Douglas Adams?

    "A common mistake that people make when trying to design something completely foolproof is to underestimate the ingenuity of complete fools."
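
Until something like an "ignore inapplicable arguments" option exists, the fallback discussed in this thread (warn, drop the threading flag, run single-threaded) can be approximated on the pipeline side. The following is only a rough sketch, not GATK behaviour: the function names are hypothetical, the flag names -nt and -nct are taken from the posts above and are assumed to take exactly one value each, and detecting the failure by looking for the text "USER ERROR" in the tool's output is an assumption about how the error is reported.

```python
import subprocess
import sys

# Assumed flag names, taken from the thread; each is assumed to take one value.
THREADING_FLAGS = ("-nt", "-nct")


def strip_flags(argv, flags=THREADING_FLAGS):
    """Return a copy of argv without the given flags and the value after each."""
    cleaned = []
    skip_next = False
    for arg in argv:
        if skip_next:
            # This is the value that belonged to a removed flag.
            skip_next = False
            continue
        if arg in flags:
            skip_next = True
            continue
        cleaned.append(arg)
    return cleaned


def run_with_fallback(argv, run=subprocess.run, marker="USER ERROR"):
    """Run argv; on a user error, warn and retry once without threading flags.

    `marker` is an assumed substring of the tool's error output.
    """
    result = run(argv, capture_output=True, text=True)
    if result.returncode == 0:
        return result
    single = strip_flags(argv)
    output = (result.stdout or "") + (result.stderr or "")
    if single == argv or marker not in output:
        # Nothing to strip, or a failure that does not look like a user error.
        return result
    print("warning: retrying without threading flags", file=sys.stderr)
    return run(single, capture_output=True, text=True)
```

A pipeline script would pass the full command as a list, for example `run_with_fallback(["java", "-jar", "GenomeAnalysisTK.jar", "-T", "VariantsToTable", "-nt", "4"])` plus the tool's usual arguments, and check the returned `returncode` as usual. Since the tool is said to fail immediately when the flag is unsupported, the retry costs little, but it will also retry on unrelated user errors, so testing each command line beforehand remains the safer habit.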
