I am having difficulty getting Queue to determine the order of the jobs I add to the queue. As I understand it, the @Input and @Output definitions of input and output files define the dependencies, and Queue waits for the method that produces a file to finish before starting the method that consumes it.
Since the order in which methods are added to the queue does not determine the dependencies, my assumption is that Queue looks at the names of the variables added to the queue to determine which method's output is another method's input. I've tried matching the variable names passed to the added methods with those defined under @Input and @Output. All of my trials come up short: Queue runs the jobs in an order inconsistent with the @Input and @Output definitions and with the variables passed as arguments to the methods added to the queue.
What is the secret to defining the order of jobs added to the queue? Are there any additional rules for defining variables or the @Input/@Output that I am missing?
Any help is good help. Thanks.
Geraldine_VdAuwera
Posts: 2,238 admin
Hi there,
You've got the principle right (Queue will arrange steps in the right order based on the names of input/output files), but you're misunderstanding how inputs/outputs are specified. The @Input and @Output annotations are used for passing arguments through the command line. We typically use those to pass the starting file, and maybe the name we want for the final output file. Filenames for intermediate steps are typically not specified on the command line and instead get generated with standard/formulaic name patterns. I recommend looking at some of the simpler example Scala scripts included in the repository, in the scala >> qscripts section.
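To illustrate the formulaic naming of intermediate files described above, here is a minimal sketch in plain Scala. It does not use Queue at all: `swapExtension`, `IntermediateNames`, and the file names are hypothetical, chosen only to show one starting file name yielding the names of the intermediate files.

```scala
import java.io.File

object IntermediateNames {
  // Hypothetical helper (not part of Queue): swap a file's extension,
  // keeping it in the same directory.
  def swapExtension(file: File, oldExt: String, newExt: String): File = {
    val name = file.getName
    val base = if (name.endsWith(oldExt)) name.dropRight(oldExt.length) else name
    new File(file.getParentFile, base + newExt)
  }

  def main(args: Array[String]): Unit = {
    // Assumed scenario: only the starting file is given on the command line.
    val input = new File("sample1.bam")
    // Intermediate names are derived from it rather than passed in.
    val sai = swapExtension(input, ".bam", ".sai")
    val sam = swapExtension(input, ".bam", ".sam")
    println(s"$input -> $sai -> $sam")
  }
}
```

Because every intermediate name is derived from the same starting name, a step that writes a file and a step that reads it can refer to the same path without either name appearing on the command line.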
I hope that helps!
Geraldine_VdAuwera
I'm glad you found the solution to your problem! I was about to comment that the DPP is a pretty complex script to begin with and you may want to play with the example scripts first, but it sounds like you've got it all figured out.
Just to reiterate for everyone else: @Input and @Output aren't used to define dependencies; they are used to annotate inputs and outputs that the engine needs to look for in the command line.
Answers
Thanks for the response. Looking at DataProcessingPipeline.scala, the helper functions also use @Input and @Output to define their dependencies. I assume it is those definitions that determine which methods are run in what order. One example is running bwa aln followed by bwa sampe: the aln output is the sampe input.
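The ordering idea in the bwa aln / bwa sampe example can be sketched as a toy model in plain Scala. This is not Queue's implementation and uses none of its classes: `Job`, `dependencies`, `runOrder`, and the file names are all hypothetical, and the only assumption is that a job must wait for any job that writes one of its input files.

```scala
import scala.annotation.tailrec

object DependencyOrder {
  // Toy stand-in for a job: a name plus the files it reads and writes.
  final case class Job(name: String, inputs: Set[String], outputs: Set[String])

  // A job depends on every other job that writes one of its input files.
  def dependencies(job: Job, all: Seq[Job]): Seq[Job] =
    all.filter(other => other != job && (other.outputs & job.inputs).nonEmpty)

  // Repeatedly schedule the jobs whose dependencies have all been scheduled.
  def runOrder(jobs: Seq[Job]): Seq[Job] = {
    @tailrec
    def loop(pending: Seq[Job], done: Vector[Job]): Vector[Job] =
      if (pending.isEmpty) done
      else {
        val (ready, blocked) =
          pending.partition(j => dependencies(j, jobs).forall(d => done.contains(d)))
        require(ready.nonEmpty, "cyclic dependency between jobs")
        loop(blocked, done ++ ready)
      }
    loop(jobs, Vector.empty)
  }

  def main(args: Array[String]): Unit = {
    val sampe = Job("bwa sampe", inputs = Set("reads.sai"), outputs = Set("reads.sam"))
    val aln   = Job("bwa aln", inputs = Set("reads.bam"), outputs = Set("reads.sai"))
    // sampe is added first, yet aln is ordered first because it writes reads.sai.
    runOrder(Seq(sampe, aln)).foreach(j => println(j.name))
  }
}
```

In this model the order in which jobs are added plays no role; only the overlap between one job's declared outputs and another job's declared inputs does.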
My problem, however, is that even though I define the dependencies like those in DataProcessingPipeline.scala, the jobs run out of order.
If you would be willing to take a look at what I am doing in my script, it would be much appreciated. I've looked through what seems to be the entire GATK site, GitHub, and other places for the answer, and I am down to trial and error as my only means of correcting the problem.
Thanks so much again.
P.S. I attached my script, but I cannot locate where it put it in my post. Here is the gist of it.
class idVariants extends QScript with Logging {
I made a mistake in my code: the @Input I use in bwaMakeSam should be an @Output. Because of this, the dependency is not defined and the method runs in parallel with the method it should precede. My apologies for the error. I think I had the idea right, but the code was wrong.
Thanks again for your help!
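The mistake described in this post can be reproduced in a small, self-contained Scala model. It is only a sketch under stated assumptions: `Job` and `dependsOn` are hypothetical stand-ins rather than Queue classes, and `nextStep` and the file names are invented for illustration, since the original script is not shown in full.

```scala
object MisdeclaredOutput {
  // Toy stand-in for a job: a name plus the files it declares as inputs and outputs.
  final case class Job(name: String, inputs: Set[String], outputs: Set[String])

  // True when `later` reads a file that `earlier` declares as an output.
  def dependsOn(later: Job, earlier: Job): Boolean =
    (earlier.outputs & later.inputs).nonEmpty

  def main(args: Array[String]): Unit = {
    // Hypothetical downstream step that reads the SAM file.
    val consumer = Job("nextStep", inputs = Set("reads.sam"), outputs = Set("reads.sorted.bam"))

    // Buggy declaration: reads.sam is listed as an input, so no job claims to produce it.
    val buggy = Job("bwaMakeSam", inputs = Set("reads.sai", "reads.sam"), outputs = Set.empty)
    // Fixed declaration: reads.sam is listed as the output.
    val fixed = Job("bwaMakeSam", inputs = Set("reads.sai"), outputs = Set("reads.sam"))

    println(dependsOn(consumer, buggy)) // false: nothing links the two jobs
    println(dependsOn(consumer, fixed)) // true: consumer must wait for bwaMakeSam
  }
}
```

With the buggy declaration nothing ties the two jobs together, which matches the symptom of both running in parallel.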