
Queue dependencies determine order of jobs

beezwax0 Posts: 3 Member
edited January 2013 in Ask the GATK team

I am having difficulty getting Queue to run the jobs I add to the queue in the right order. My understanding is that the @Input and @Output annotations on input and output files define the dependencies, so that Queue waits for the method that produces an output to finish before starting the method that consumes it.

Since the order in which methods are added to the queue does not determine the dependencies, I assumed that Queue looks at the names of the variables passed to each method to work out which method's output is another method's input. I have tried varying the variable names both in the added methods and in the @Input and @Output definitions, but every attempt falls short: Queue runs the jobs in an order that is inconsistent with the @Input and @Output annotations and with the arguments passed to the methods added to the queue.

What is the secret to defining the order of jobs added to the queue? Are there additional rules for defining variables or @Input/@Output that I am missing?

Any help is good help. Thanks.


Best Answers

  • Geraldine_VdAuwera Posts: 6,682 Administrator, GATK Developer
    Answer ✓

    Hi there,

    You've got the principle right -- Queue will arrange steps in the right order based on the names of input/output files -- but you're misunderstanding how inputs/outputs are specified. The @Input and @Output annotations are used for passing arguments through the command line. We typically use those to pass the starting file, and maybe the name we want for the final output file. Filenames for intermediate steps are typically not specified on the command line; instead they are generated with standard, formulaic name patterns. I recommend looking at some of the simpler example Scala scripts included in the repository, in the scala >> qscripts section.

    I hope that helps!
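To make the principle concrete, here is a minimal sketch of how a scheduler can derive run order purely from declared input and output files. It is a toy model under stated assumptions, not Queue's actual scheduler: `ToyJob`, `ToyScheduler.order` and `ToyDemo` are hypothetical names, and the model assumes jobs are matched on file paths (not on Scala variable names) and that inputs no job produces, such as the starting FASTQ files, add no dependency. The intermediate names are derived from a single output prefix, and the jobs are deliberately listed out of order.

```scala
import java.io.File

// Toy model, NOT Queue's API: a job is a name plus the files it declares
// as inputs and outputs (the role @Input/@Output fields play in a
// CommandLineFunction).
final case class ToyJob(name: String, inputs: Set[File], outputs: Set[File])

object ToyScheduler {
  // Returns job names so that every job producing a file comes before any
  // job that reads it. Jobs are matched on File paths, not variable names.
  def order(jobs: Seq[ToyJob]): Seq[String] = {
    // Which job writes each file (if two jobs claim one file, the last wins here).
    val producer: Map[File, ToyJob] =
      jobs.flatMap(j => j.outputs.toSeq.map(f => f -> j)).toMap
    // A job depends on the producers of its inputs; inputs that no job
    // produces (e.g. the starting FASTQ files) add no dependency.
    val deps: Map[ToyJob, Set[ToyJob]] =
      jobs.map(j => j -> j.inputs.flatMap(f => producer.get(f).toSet)).toMap

    // Repeatedly release every job whose dependencies have all finished.
    var done = Vector.empty[ToyJob]
    var remaining = jobs.toVector
    while (remaining.nonEmpty) {
      val ready = remaining.filter(j => deps(j).forall(d => done.contains(d)))
      require(ready.nonEmpty, "cycle in the job graph")
      done = done ++ ready
      remaining = remaining.filterNot(j => ready.contains(j))
    }
    done.map(_.name)
  }
}

object ToyDemo {
  def main(args: Array[String]): Unit = {
    // Intermediate names derived from one output prefix.
    val prefix = "sample"
    val fq1 = new File("reads_1.fq")
    val fq2 = new File("reads_2.fq")
    val sai0 = new File(prefix + ".0.sai")
    val sai1 = new File(prefix + ".1.sai")
    val sam = new File(prefix + ".sam")
    val clean = new File(prefix + ".clean.sam")

    // Listed out of order on purpose: list order is not run order.
    val jobs = Seq(
      ToyJob("cleanSam", Set(sam), Set(clean)),
      ToyJob("sampe", Set(sai0, sai1, fq1, fq2), Set(sam)),
      ToyJob("aln0", Set(fq1), Set(sai0)),
      ToyJob("aln1", Set(fq2), Set(sai1)))

    // prints "aln0 -> aln1 -> sampe -> cleanSam"
    println(ToyScheduler.order(jobs).mkString(" -> "))
  }
}
```

The two aln jobs are released together because neither reads a file the other writes, which is the same reason independent jobs can run in parallel.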

  • Geraldine_VdAuwera Posts: 6,682 Administrator, GATK Developer
    Answer ✓

    I'm glad you found the solution to your problem! I was about to comment that the DPP (DataProcessingPipeline.scala) is a pretty complex script to begin with and that you may want to play with the example scripts first, but it sounds like you've got it all figured out.

    Just to reiterate for everyone else, @Input and @Output aren't used to define dependencies, they are used to annotate inputs and outputs that the engine needs to look for in the command line.


  • beezwax0 Posts: 3 Member
    edited September 2012

    Thanks for the response. Looking at DataProcessingPipeline.scala, the helper functions also use @Input and @Output to define their dependencies. I assume it is those definitions that determine which methods run in which order. One example is running bwa aln followed by bwa sampe: the output of aln is the input of sampe.

    My problem, however, is that even though I define the dependencies the same way as DataProcessingPipeline.scala does, the jobs still run out of order.

    If you would be willing to take a look at what I am doing in my script, it would be much appreciated. I've looked through what seems to be the entire GATK site, GitHub, and other places for the answer, and I am down to trial and error as my only means of fixing the problem.

    Thanks so much again.

    P.S. I attached my script, but I cannot find where it was placed in my post. Here is the gist of it.

    // Package names as in GATK 2.x-era Queue (org.broadinstitute.sting.queue)
    import org.broadinstitute.sting.queue.QScript
    import org.broadinstitute.sting.queue.function.CommandLineFunction
    import org.broadinstitute.sting.queue.util.Logging

    class idVariants extends QScript with Logging {

        // job parameters (a plain flag, not a file, so @Argument rather than @Input)
        @Argument(doc="do alignment, default: true", fullName="align", shortName="a", required=false)
        var align: Boolean = true
        // bwa parameters
        @Input(doc="fastq file(s) to be processed (maximum of 2)", fullName="fastq", shortName="fq", required=true)
        var infiles: List[File] = _
        @Output(doc="output filename", fullName="output", shortName="out", required=true)
        var outfile: File = _
        @Argument(doc="bwa: prefix of reference index filepath", fullName="databaseprefix", shortName="db", required=true)
        var refPrefix: String = _
        @Argument(doc="bwa: number of threads to spawn", fullName="numthreads", shortName="thread", required=false)
        var threadCount: Int = 8
        @Argument(doc="bwa: minimum allowable quality value", fullName="qualthreshold", shortName="thresh", required=false)
        var qualThresh: Int = 0
        @Argument(doc="picard: absolute path to picard jar files", fullName="picardpath", shortName="p", required=true)
        var pathToPicard: String = _

        def script(): Unit = {
            // do alignment
            if (align) {
                // intermediate filenames, all derived from the output filename
                val sam = new File(outfile + ".sam")
                val clean = new File(outfile + ".clean.sam")
                val bam = new File(outfile + ".bam")
                val sort = new File(outfile + ".sort.bam")
                val dup = new File(outfile + ".dup.bam")
                val bai = new File(outfile + ".bai")
                // create a .sai filename for each input FASTQ and queue one bwa aln job per file
                var sais: Seq[File] = Nil
                var filenum = 0
                for (infile <- infiles) {
                    val sai = new File(outfile + "." + filenum + ".sai")
                    sais = sais :+ sai
                    filenum += 1
                    add(bwaAlign(infile, sai, refPrefix, threadCount, qualThresh))
                }
                // merge the .sai files into SAM format, then clean the SAM with Picard
                add(bwaMakeSam(sais, infiles, refPrefix, sam),
                    cleanSam(sam, clean, pathToPicard))
            }
        }
        case class bwaAlign(inFq: File, outSai: File, refPrefix: String, threadCount: Int, qualThresh: Int) extends CommandLineFunction {
            // call bwa aln
            // include read group info
            @Input(doc="fastq file to align")
            var infile: File = inFq
            @Output(doc="alignment output filename")
            var outfile: File = outSai
            // queue commandLine definition: bwa aln -f <out.sai> [-t N] [-q Q] <refPrefix> <in.fq>
            def commandLine = required("bwa") + required("aln") +
                              required("-f", outfile) +
                              optional("-t", threadCount) +
                              optional("-q", qualThresh) +
                              required(refPrefix) +
                              required(infile)
            //this.isIntermediate = true
            this.analysisName = outfile + "-bwa_aln"
            this.jobName = outfile + "-bwa_aln"
        }
        case class bwaMakeSam(inSai: Seq[File], inFq: Seq[File], refPrefix: String, outSam: File) extends CommandLineFunction {
            // bwa samse/sampe call to merge alignments and format as SAM file
            @Input(doc="alignment files (max 2)")
            var infile: Seq[File] = inSai
            // NOTE: this is the mistake identified in the follow-up post; it should be @Output.
            // As written, Queue never learns that this job produces outSam, so the
            // cleanSam job that reads outSam gets no dependency on this one.
            @Input(doc="output SAM filename")
            var outfile: File = outSam
            // samse for single-end (one .sai file), sampe for paired-end (two .sai files)
            def commandLine =
                if (infile.length == 1)
                    "bwa samse -f " + outfile + " " + refPrefix + " " + infile(0) + " " + inFq(0)
                else
                    "bwa sampe -f " + outfile + " " + refPrefix + " " + infile(0) + " " + infile(1) + " " + inFq(0) + " " + inFq(1)
            //this.isIntermediate = true
            this.analysisName = outfile + "-bwa_samse_or_sampe"
            this.jobName = outfile + "-bwa_samse_or_sampe"
        }
        case class cleanSam(inSam: File, outSam: File, picardPath: String) extends CommandLineFunction {
            // picard's CleanSam.jar
            @Input(doc="SAM file to clean")
            var infile: File = inSam
            @Output(doc="cleaned SAM filename")
            var outfile: File = outSam
            // use the picardPath constructor parameter, not the script-level pathToPicard
            def commandLine = required("java") + required("-jar", picardPath + "/CleanSam.jar") +
                              required("INPUT=" + infile) +
                              required("OUTPUT=" + outfile)
            //this.isIntermediate = true
            this.analysisName = outfile + "-picard_cleanSam"
            this.jobName = outfile + "-picard_cleanSam"
        }
    }
  • beezwax0 Posts: 3 Member

    I made a mistake in my code: in bwaMakeSam, the output SAM file was annotated with @Input when it should have been @Output. Because of this, the dependency was never defined, and the job ran in parallel with the one it should have preceded. My apologies for the confusion; the approach was right, but the code was wrong.

    Thanks again for your help!
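Why a single wrong annotation lets a downstream job start early can be illustrated with a small toy model. This is a sketch under stated assumptions, not Queue code: `Decl`, `Roots.startable` and `RootsDemo` are hypothetical names, and the model assumes a scheduler that trusts only the declared files, so a job whose declared inputs are not any job's declared output is free to start immediately. It compares the pipeline as posted (SAM file declared as an input of bwaMakeSam) with the corrected version.

```scala
import java.io.File

// Toy model, NOT Queue's API: each job lists the files it *declares* as
// inputs and outputs, mirroring @Input/@Output fields.
final case class Decl(name: String, inputs: Set[File], outputs: Set[File])

object Roots {
  // Jobs with no declared input that another job declares as an output.
  // A scheduler that trusts only the declarations can start these at once.
  def startable(jobs: Seq[Decl]): Seq[String] = {
    val produced: Set[File] = jobs.flatMap(_.outputs).toSet
    jobs.filterNot(j => j.inputs.exists(f => produced.contains(f))).map(_.name)
  }
}

object RootsDemo {
  def main(args: Array[String]): Unit = {
    val fq = new File("reads.fq")
    val sai = new File("out.0.sai")
    val sam = new File("out.sam")
    val clean = new File("out.clean.sam")

    // As posted: bwaMakeSam declares its SAM under inputs, so no job outputs it.
    val buggy = Seq(
      Decl("bwaAlign", Set(fq), Set(sai)),
      Decl("bwaMakeSam", Set(sai, sam), Set.empty),
      Decl("cleanSam", Set(sam), Set(clean)))

    // Fixed: the SAM is declared as bwaMakeSam's output.
    val fixed = Seq(
      Decl("bwaAlign", Set(fq), Set(sai)),
      Decl("bwaMakeSam", Set(sai), Set(sam)),
      Decl("cleanSam", Set(sam), Set(clean)))

    println(Roots.startable(buggy)) // List(bwaAlign, cleanSam)
    println(Roots.startable(fixed)) // List(bwaAlign)
  }
}
```

In the buggy version cleanSam appears alongside the first alignment job, which matches the symptom described above: it does not wait for the SAM file to be written.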
