
Queue dependencies determine order of jobs

beezwax0 Member Posts: 3
edited January 2013 in Ask the GATK team

I am having difficulty getting Queue to determine the order of jobs added to the queue. My understanding is that the @Input and @Output definitions of input and output files establish the dependencies, so that Queue waits for one method's output to finish before starting the subsequent method.

Since the order in which methods are added to the queue does not determine the dependencies, my assumption is that Queue looks at the names of the variables added to the queue to determine which method's output is another method's input. Regardless, I've tried working with variable names in both added methods, along with those defined in the @Input and @Output. All of my attempts come up short: Queue runs the jobs in a manner inconsistent with the @Input, @Output, and the variables defined and passed as arguments to the methods added to the queue.

What is the secret to defining the order of jobs added to the queue? Are there any additional rules for defining variables or the @Input/@Output that I am missing?

Any help is good help. Thanks.



Answers

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie Posts: 11,428 admin
    Accepted Answer

    Hi there,

    You've got the principle right -- Queue will arrange steps in the right order based on the names of input/output files -- but you're misunderstanding how inputs/outputs are specified. The @Input and @Output annotations are used for passing arguments through the command line. We typically use those to pass the starting file, and maybe the name we want for the final output file. Filenames for intermediate steps are typically not specified on the command line and instead get generated with standard/formulaic name patterns. I recommend looking at some of the simpler example Scala scripts included in the repository, in the scala >> qscripts section.

    I hope that helps!

    Geraldine Van der Auwera, PhD
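The resolution principle in this answer can be sketched as a small, self-contained Scala program. This is only an illustration of the idea, not Queue's actual code (the names Job and orderJobs are hypothetical): jobs declare input and output files, and the engine orders them by matching one job's inputs to another job's outputs, regardless of the order in which the jobs were added.

```scala
// Hypothetical sketch of dependency resolution by file name (not Queue's real API).
case class Job(name: String, inputs: Seq[String], outputs: Seq[String])

def orderJobs(jobs: Seq[Job]): Seq[Job] = {
  // Map each declared output file to the job that produces it.
  val producer = jobs.flatMap(j => j.outputs.map(_ -> j)).toMap
  var ordered = Vector.empty[Job]
  var seen = Set.empty[String]
  def visit(j: Job): Unit =
    if (!seen(j.name)) {
      seen += j.name
      // Schedule the producers of this job's inputs first (depth-first).
      j.inputs.flatMap(producer.get).foreach(visit)
      ordered :+= j
    }
  jobs.foreach(visit)
  ordered
}

// Jobs added "out of order": sampe first, aln second.
val aln   = Job("bwa_aln",   inputs = Seq("reads.fq"),              outputs = Seq("reads.sai"))
val sampe = Job("bwa_sampe", inputs = Seq("reads.sai", "reads.fq"), outputs = Seq("out.sam"))
val plan  = orderJobs(Seq(sampe, aln))
// bwa_aln is scheduled before bwa_sampe because reads.sai links them.
```

The add() order never enters into it; only the declared files do, which is why a wrongly annotated file silently breaks the ordering.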

  • beezwax0 Member Posts: 3
    edited September 2012

    Thanks for the response. Looking at DataProcessingPipeline.scala, the helper functions also use @Input and @Output to define their dependencies. I assume it is those definitions that determine which methods are run in what order. One example is running bwa aln followed by bwa sampe, where the aln output is the sampe input.

    My problem, however, is that even though I define the dependencies like those in DataProcessingPipeline.scala, the jobs run out of order.

    If you would be willing to take a look at what I am doing in my script, it would be much appreciated. I've looked through what seems to be the entire GATK site, GitHub, and other spots for the answer, and I am down to trial and error as my only means of correcting the problem.

    Thanks so much again.

    P.S. I attached my script, but I cannot locate where it ended up in my post. Here is the gist of it.

    class idVariants extends QScript with Logging {

    // job parameters
    @Input(doc="do alignment, default: true", fullName="align", shortName="a", required=false)
    var align: Boolean = true
    
    // bwa parameters
    @Input(doc="fastq file(s) to be processed (maximum of 2)", fullName="fastq", shortName="fq", required=true) 
    var infiles: List[File] = _
    
    @Output(doc="output filename", fullName="output", shortName="out", required=true)
    var outfile: File = _
    
    @Argument(doc="bwa: prefix of reference index filepath", fullName="databaseprefix", shortName="db", required=true) 
    var refPrefix: String = _
    
    @Argument(doc="bwa: number of threads to spawn", fullName="numthreads", shortName="thread", required=false)
    var threadCount: Int = 8
    
    @Argument(doc="bwa: minimum allowable quality value", fullName="qualthreshold", shortName="thresh", required=false) 
    var qualThresh: Int = 0
    
    @Argument(doc="picard: absolute path to picard jar files", fullName="picardpath", shortName="p", required=true)
    var pathToPicard: String = _
    
    def script() {
    
        // do alignment
        if(align == true){
    
            // intermediate filenames
            val sam = new File(outfile + ".sam")
            val clean = new File(outfile + ".clean.sam")
            val bam = new File(outfile + ".bam")
            val sort = new File(outfile + ".sort.bam")
            val dup = new File(outfile + ".dup.bam")
            val bai = new File(outfile + ".bai")
    
            // create a sai filename for each inputted file and put in list
            //var tmpSais = new ListBuffer[File]
            var sais: Seq[File] = Nil
    
            var filenum = 0
    
            for(infile <- infiles) {
    
                val sai = new File(outfile + "." + filenum + ".sai")
                logger.debug(sais.toString())
                sais = sais :+ sai
    
                filenum += 1
    
                add(bwaAlign(infile, sai, refPrefix, threadCount, qualThresh))
            }
    
            // merge and convert to SAM format
            add(bwaMakeSam(sais, infiles, refPrefix, sam),
                cleanSam(sam, clean, pathToPicard))
    
    
        }
    
    
    }
    
    case class bwaAlign(inFq: File, outSai: File, refPrefix: String, threadCount: Int, qualThresh: Int) extends CommandLineFunction {
        // call bwa aln
    
        //include read group info
    
        @Input(doc="fastq file to align")
        val infile = inFq
        @Output(doc="alignment output filename")
        val outfile = outSai
    
        // queue commandLine definition
        def commandLine =   required("bwa") + required("aln") + 
                            required("-f", outfile) +
                            optional("-t", threadCount) + 
                            optional("-q", qualThresh) +
                            required(refPrefix) +
                            required(infile) 
    
        //this.isIntermediate = true
        this.analysisName = outfile + "-bwa_aln"
        this.jobName = outfile + "-bwa_aln"
    
    }
    
    case class bwaMakeSam(inSai: Seq[File], inFq: Seq[File], refPrefix: String, outSam: File) extends CommandLineFunction {
        // bwa samse/sampe call to merge alignments and format as SAM file
    
        @Input(doc="alignment files (max 2)")
        val infile = inSai
        @Input(doc="output SAM filename")
        val outfile = outSam
    
        var cmdStr = new String("bwa")
    
        if(inSai.length == 1) {
            cmdStr += " samse " + "-f " + outfile + " " + refPrefix + " " + infile(0) + " " + inFq(0) 
        }
        else {
            cmdStr += " sampe " + "-f " + outfile + " " + refPrefix + " " + infile(0) + " " + infile(1) + " " + inFq(0) + " " + inFq(1)
        }
    
        def commandLine = cmdStr
    
        //this.isIntermediate = true
        this.analysisName = outfile + "-bwa_samse_or_sampe"
        this.jobName = outfile + "-bwa_samse_or_sampe"
    }
    
    case class cleanSam(inSam: File, outSam: File, picardPath: String) extends CommandLineFunction {
        // picard's CleanSam.jar
    
        @Input(doc="SAM file to clean")
        val infile = inSam
        @Output(doc="cleaned SAM filename")
        val outfile = outSam
    
        def commandLine = required("java") + required("-jar", picardPath+"/CleanSam.jar") +
                          required("INPUT="+infile) +
                          required("OUTPUT="+outfile)
    
        //this.isIntermediate = true
        this.analysisName = outfile + "-picard_cleanSam"
        this.jobName = outfile + "-picard_cleanSam"
    }
    } // closes class idVariants
    
  • beezwax0 Member Posts: 3

    I made a mistake in my code: the @Input I use in bwaMakeSam on the output SAM file should be @Output. Because of this, the dependency is not defined, and the job runs in parallel with the method it should precede. My apologies for the error. I had the right idea, but the code was wrong.
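Concretely, the fix is flipping that one annotation. Here is a sketch of the corrected bwaMakeSam; the annotation classes and CommandLineFunction below are stubs standing in for Queue's real ones, included only so the fragment compiles on its own:

```scala
import scala.annotation.StaticAnnotation

// Stubs in place of Queue's real @Input/@Output and CommandLineFunction,
// so this sketch is self-contained.
class Input(doc: String) extends StaticAnnotation
class Output(doc: String) extends StaticAnnotation
trait CommandLineFunction { def commandLine: String }

case class bwaMakeSam(inSai: Seq[String], inFq: Seq[String],
                      refPrefix: String, outSam: String) extends CommandLineFunction {
  @Input(doc = "alignment files (max 2)")
  val infile = inSai

  @Output(doc = "output SAM filename") // was @Input in the original post: no dependency edge
  val outfile = outSam

  def commandLine =
    if (inSai.length == 1)
      s"bwa samse -f $outfile $refPrefix ${infile(0)} ${inFq(0)}"
    else
      s"bwa sampe -f $outfile $refPrefix ${infile(0)} ${infile(1)} ${inFq(0)} ${inFq(1)}"
}
```

With outfile annotated as @Output, the engine can see that bwaMakeSam consumes the .sai files that bwaAlign produces and will schedule it after those jobs instead of alongside them.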

    Thanks again for your help!

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie Posts: 11,428 admin
    Accepted Answer

    I'm glad you found the solution to your problem! I was about to comment that the DPP (DataProcessingPipeline) is a pretty complex script to begin with and that you may want to play with the example scripts first, but it sounds like you've got it all figured out.

    Just to reiterate for everyone else: @Input and @Output aren't used to define dependencies; they are used to annotate the inputs and outputs that the engine needs to look for on the command line.

    Geraldine Van der Auwera, PhD
