(howto) Run Queue for the first time

Geraldine_VdAuwera (Posts: 6,643; Administrator, GATK Developer)
edited July 2013 in Tutorials

Objective

Run a basic analysis command on example data, parallelized with Queue.

Prerequisites

Steps

  1. Set up a dry run of Queue
  2. Run the analysis for real
  3. Running on a computing farm

1. Set up a dry run of Queue

One very cool feature of Queue is that you can test your script by doing a "dry run". That means Queue will prepare the analysis and build the scatter commands, but not actually run them. This makes it easier to check the sanity of your script and command.

Here we're going to set up a dry run of a CountReads analysis. You should be familiar with the CountReads walker and the example files from the bundles, as used in the basic "GATK for the first time" tutorial. In addition, we're going to use the example QScript called ExampleCountReads.scala provided in the Queue package download.

Action

Type the following command:

java -Djava.io.tmpdir=tmp -jar Queue.jar -S ExampleCountReads.scala -R exampleFASTA.fasta -I exampleBAM.bam

where -S ExampleCountReads.scala specifies which QScript we want to run, -R exampleFASTA.fasta specifies the reference sequence, and -I exampleBAM.bam specifies the file of aligned reads we want to analyze.

Expected Result

After a few seconds you should see output that looks nearly identical to this:

INFO  00:30:45,527 QScriptManager - Compiling 1 QScript 
INFO  00:30:52,869 QScriptManager - Compilation complete 
INFO  00:30:53,284 HelpFormatter - ---------------------------------------------------------------------- 
INFO  00:30:53,284 HelpFormatter - Queue v2.0-36-gf5c1c1a, Compiled 2012/08/08 20:18:21 
INFO  00:30:53,284 HelpFormatter - Copyright (c) 2012 The Broad Institute 
INFO  00:30:53,284 HelpFormatter - Fro support and documentation go to http://www.broadinstitute.org/gatk 
INFO  00:30:53,285 HelpFormatter - Program Args: -S ExampleCountReads.scala -R exampleFASTA.fasta -I exampleBAM.bam 
INFO  00:30:53,285 HelpFormatter - Date/Time: 2012/08/09 00:30:53 
INFO  00:30:53,285 HelpFormatter - ---------------------------------------------------------------------- 
INFO  00:30:53,285 HelpFormatter - ---------------------------------------------------------------------- 
INFO  00:30:53,290 QCommandLine - Scripting ExampleCountReads 
INFO  00:30:53,364 QCommandLine - Added 1 functions 
INFO  00:30:53,364 QGraph - Generating graph. 
INFO  00:30:53,388 QGraph - ------- 
INFO  00:30:53,402 QGraph - Pending:  'java'  '-Xmx1024m'  '-Djava.io.tmpdir=/Users/vdauwera/sandbox/Q2/resources/tmp'  '-cp' '/Users/vdauwera/sandbox/Q2/Queue.jar'  'org.broadinstitute.sting.gatk.CommandLineGATK'  '-T' 'CountReads'  '-I' '/Users/vdauwera/sandbox/Q2/resources/exampleBAM.bam'  '-R' '/Users/vdauwera/sandbox/Q2/resources/exampleFASTA.fasta'  
INFO  00:30:53,403 QGraph - Log:     /Users/vdauwera/sandbox/Q2/resources/ExampleCountReads-1.out 
INFO  00:30:53,403 QGraph - Dry run completed successfully! 
INFO  00:30:53,404 QGraph - Re-run with "-run" to execute the functions. 
INFO  00:30:53,409 QCommandLine - Script completed successfully with 1 total jobs 
INFO  00:30:53,410 QCommandLine - Writing JobLogging GATKReport to file /Users/vdauwera/sandbox/Q2/resources/ExampleCountReads.jobreport.txt 

If you don't see this, check your spelling (GATK commands are case-sensitive), check that the files are in your working directory, and if necessary, re-check that the GATK and Queue are properly installed.

If you do see this output, congratulations! You just successfully ran your first Queue dry run!
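The file check suggested above can be scripted. What follows is only a minimal sketch and not part of Queue or GATK: check_queue_inputs is a hypothetical helper name, and the file names in the usage comment are simply the ones used in this tutorial.

```shell
# Hypothetical pre-flight helper (not part of Queue or GATK).
# Prints one "missing: <file>" line per argument that is not a regular file
# and returns 1 if anything is missing, 0 otherwise.
check_queue_inputs() {
    missing=0
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    return "$missing"
}

# Usage, from the directory where you plan to launch Queue:
#   check_queue_inputs Queue.jar ExampleCountReads.scala exampleFASTA.fasta exampleBAM.bam
```

If the helper prints nothing and returns 0, all the listed files are present in the working directory, which rules out one of the common causes mentioned above.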


2. Run the analysis for real

Once you have verified that the Queue functions have been generated successfully, you can execute the pipeline by appending -run to the command line.

Action

Instead of this command, which we used earlier:

java -Djava.io.tmpdir=tmp -jar Queue.jar -S ExampleCountReads.scala -R exampleFASTA.fasta -I exampleBAM.bam

this time you type this:

java -Djava.io.tmpdir=tmp -jar Queue.jar -S ExampleCountReads.scala -R exampleFASTA.fasta -I exampleBAM.bam -run

See the difference?
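Since the two commands differ only by -run, the "dry run first, then run for real" habit can be wrapped in a small shell function. This is a sketch under assumptions: run_after_dry_run and the JAVA_BIN variable are hypothetical names introduced here, and the function relies only on the "Dry run completed successfully!" message shown in the dry-run output above.

```shell
# Hypothetical wrapper (not part of Queue): perform a dry run, and relaunch
# with -run only if the dry-run output contains Queue's success message.
# JAVA_BIN is an assumption of this sketch; it defaults to plain "java".
run_after_dry_run() {
    java_bin="${JAVA_BIN:-java}"
    # Dry run: capture stdout and stderr so the log can be inspected.
    log=$("$java_bin" -Djava.io.tmpdir=tmp -jar Queue.jar "$@" 2>&1) || {
        printf '%s\n' "$log"
        return 1
    }
    case "$log" in
        *"Dry run completed successfully!"*) ;;
        *) printf '%s\n' "$log"; return 1 ;;
    esac
    # Same arguments again, this time with -run appended.
    "$java_bin" -Djava.io.tmpdir=tmp -jar Queue.jar "$@" -run
}

# Usage:
#   run_after_dry_run -S ExampleCountReads.scala -R exampleFASTA.fasta -I exampleBAM.bam
```

On success the dry-run log is not echoed; only the output of the real run is shown. If the dry run fails or lacks the success message, its log is printed and nothing is executed.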

Result

You should see output that looks nearly identical to this:

INFO  00:56:33,688 QScriptManager - Compiling 1 QScript 
INFO  00:56:39,327 QScriptManager - Compilation complete 
INFO  00:56:39,487 HelpFormatter - ---------------------------------------------------------------------- 
INFO  00:56:39,487 HelpFormatter - Queue v2.0-36-gf5c1c1a, Compiled 2012/08/08 20:18:21 
INFO  00:56:39,488 HelpFormatter - Copyright (c) 2012 The Broad Institute 
INFO  00:56:39,488 HelpFormatter - Fro support and documentation go to http://www.broadinstitute.org/gatk 
INFO  00:56:39,489 HelpFormatter - Program Args: -S ExampleCountReads.scala -R exampleFASTA.fasta -I exampleBAM.bam -run 
INFO  00:56:39,490 HelpFormatter - Date/Time: 2012/08/09 00:56:39 
INFO  00:56:39,490 HelpFormatter - ---------------------------------------------------------------------- 
INFO  00:56:39,491 HelpFormatter - ---------------------------------------------------------------------- 
INFO  00:56:39,498 QCommandLine - Scripting ExampleCountReads 
INFO  00:56:39,569 QCommandLine - Added 1 functions 
INFO  00:56:39,569 QGraph - Generating graph. 
INFO  00:56:39,589 QGraph - Running jobs. 
INFO  00:56:39,623 FunctionEdge - Starting:  'java'  '-Xmx1024m'  '-Djava.io.tmpdir=/Users/vdauwera/sandbox/Q2/resources/tmp'  '-cp' '/Users/vdauwera/sandbox/Q2/Queue.jar'  'org.broadinstitute.sting.gatk.CommandLineGATK'  '-T' 'CountReads'  '-I' '/Users/vdauwera/sandbox/Q2/resources/exampleBAM.bam'  '-R' '/Users/vdauwera/sandbox/Q2/resources/exampleFASTA.fasta'  
INFO  00:56:39,623 FunctionEdge - Output written to /Users/GG/codespace/GATK/Q2/resources/ExampleCountReads-1.out 
INFO  00:56:50,301 QGraph - 0 Pend, 1 Run, 0 Fail, 0 Done 
INFO  00:57:09,827 FunctionEdge - Done:  'java'  '-Xmx1024m'  '-Djava.io.tmpdir=/Users/vdauwera/sandbox/Q2/resources/tmp'  '-cp' '/Users/vdauwera/sandbox/Q2/resources/Queue.jar'  'org.broadinstitute.sting.gatk.CommandLineGATK'  '-T' 'CountReads'  '-I' '/Users/vdauwera/sandbox/Q2/resources/exampleBAM.bam'  '-R' '/Users/vdauwera/sandbox/Q2/resources/exampleFASTA.fasta'  
INFO  00:57:09,828 QGraph - 0 Pend, 0 Run, 0 Fail, 1 Done 
INFO  00:57:09,835 QCommandLine - Script completed successfully with 1 total jobs 
INFO  00:57:09,835 QCommandLine - Writing JobLogging GATKReport to file /Users/vdauwera/sandbox/Q2/resources/ExampleCountReads.jobreport.txt 
INFO  00:57:10,107 QCommandLine - Plotting JobLogging GATKReport to file /Users/vdauwera/sandbox/Q2/resources/ExampleCountReads.jobreport.pdf 
WARN  00:57:18,597 RScriptExecutor - RScript exited with 1. Run with -l DEBUG for more info. 

Great! It works!

The results of the traversal will be written to a file in the current directory. The name of the file is printed in the output: ExampleCountReads-1.out in this example.
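To pull the read count out of the job's .out file without opening it, a one-line sed filter is enough. This is a sketch that assumes the log contains a line of the form "Walker - [REDUCE RESULT] Traversal result is: 33", as in the CountReads log posted in the comments below; traversal_result is a hypothetical helper name.

```shell
# Hypothetical helper: print the number that follows "Traversal result is:"
# in a CountReads log file. Prints nothing if no such line is present.
traversal_result() {
    sed -n 's/.*Traversal result is: \([0-9][0-9]*\).*/\1/p' "$1"
}

# Usage (the file name is the one printed in the Queue output above):
#   traversal_result ExampleCountReads-1.out
```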

If for some reason the run was interrupted, in most cases you can resume by simply relaunching the same command. Queue will pick up where it left off without redoing the parts that ran successfully.
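When deciding whether a relaunch is needed, the "N Pend, N Run, N Fail, N Done" status lines are the quickest thing to look at. The sketch below assumes you saved Queue's console output to a file yourself (for example with tee; queue.log is a hypothetical name) and that the status lines look like the ones shown above. queue_failed_jobs is likewise a hypothetical helper.

```shell
# Hypothetical helper: print the Fail count from the last
# "N Pend, N Run, N Fail, N Done" status line in a saved Queue console log.
queue_failed_jobs() {
    sed -n 's/.* \([0-9][0-9]*\) Fail, .*/\1/p' "$1" | tail -n 1
}

# Usage, assuming the console output was captured to queue.log:
#   queue_failed_jobs queue.log
```

A result of 0 on the final status line, together with all jobs listed as Done, suggests there is nothing left for Queue to resume.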


3. Running on a computing farm

Run with -bsub to run on LSF, or for early Grid Engine support see Queue with Grid Engine.

See also QFunction and Command Line Options for more info on Queue options.

Geraldine Van der Auwera, PhD

Comments

  • oriol_senan (Posts: 1; Member)

    The link "how to use GATK for the first time" is not working

  • Geraldine_VdAuwera (Posts: 6,643; Administrator, GATK Developer)

    Prerequisites links are fixed, thanks for reporting this.

    Geraldine Van der Auwera, PhD

  • lucdh (Posts: 10; Member)
    edited October 2012

    It looks like some more links need fixing:

  • Geraldine_VdAuwera (Posts: 6,643; Administrator, GATK Developer)

    Thanks for reporting, we'll fix these asap.

    Geraldine Van der Auwera, PhD

  • omedvedeva (Posts: 1; Member)

    I can't perform a first dry run on Windows 7 with Queue 2.2.5. The installation seems to be correct since --help option works. It looks like it can't find the tmp directory that it creates at the correct location. The same problem occurs with QueueLite too. What am I missing? In the stack trace below fasta, bam and scala files were in the working directory:

    C:\GATK\Queue-2.2-5-g3bf5e3f>java -Djava.io.tmpdir=tmp -jar Queue.jar -S ExampleCountReads.scala -R exampleFASTA.fasta -I exampleBAM.bam
    ERROR 10:17:34,493 QScriptManager - \GATK\Queue-2.2-5-g3bf5e3f\tmp\Q-Classes-8075780960630530304 does not exist or is not a directory
    INFO 10:17:35,965 QScriptManager - Compiling 1 QScript
    INFO 10:17:40,538 QScriptManager - Compilation complete

    ...

    ERROR stack trace

    org.broadinstitute.sting.commandline.InvalidArgumentException: Argument with name 'R' isn't defined.
        at org.broadinstitute.sting.commandline.ParsingEngine.validate(ParsingEngine.java:303)
        at org.broadinstitute.sting.commandline.ParsingEngine.validate(ParsingEngine.java:276)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:204)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)
        at org.broadinstitute.sting.queue.QCommandLine$.main(QCommandLine.scala:62)
        at org.broadinstitute.sting.queue.QCommandLine.main(QCommandLine.scala)

    ##### ERROR --------------------------------------------------------------------

    ERROR A GATK RUNTIME ERROR has occurred (version 2.2-5-g3bf5e3f):

    ...

    ERROR MESSAGE: Argument with name 'R' isn't defined.
    ERROR --------------------------------------------------------------------

    Thank you, Olga.

  • Geraldine_VdAuwera (Posts: 6,643; Administrator, GATK Developer)
    edited November 2012

    I'm sorry Olga, we can't provide support for running GATK or Queue on Windows. There are differences in I/O management that cause problems with filepaths, and we can't shoulder the support burden of helping you figure that out. You should post this question in the Ask the Community section; perhaps others will be able to advise you on this point.

    Geraldine Van der Auwera, PhD

  • elisa1507 (Posts: 2; Member)
    edited February 2013

    Hi, I'm trying to run this tutorial. The first command runs fine and its output looks exactly like step 1. Once I put -run at the end of it, I get an error that looks like this:

    ERROR 15:07:45,844 FunctionEdge - Error:  'java'  '-Xmx1024m'  '-XX:+UseParallelOldGC'  '-XX:ParallelGCThreads=4'  '-XX:GCTimeLimit=50'  '-XX:GCHeapFreeLimit=10'  '-Djava.io.tmpdir=/Users/jones/bin/Queue/tmp'  '-cp' '/Users/jones/bin/Queue/Queue.jar'  'org.broadinstitute.sting.gatk.CommandLineGATK'  '-T' 'CountReads'  '-I' '/Users/jones/bin/Queue/resources/exampleBAM.bam'  '-R' '/Users/jones/bin/Queue/resources/exampleFASTA.fasta'  
    ERROR 15:07:45,851 FunctionEdge - Contents of /Users/jones/bin/Queue/ExampleCountReads-1.out:
    Conflicting collector combinations in option list; please refer to the release notes for the combinations allowed
    Could not create the Java virtual machine. 
    INFO  15:07:45,852 QGraph - Writing incremental jobs reports... 
    INFO  15:07:45,853 QJobsReporter - Writing JobLogging GATKReport to file /Users/jones/bin/Queue/ExampleCountReads.jobreport.txt 
    INFO  15:07:45,884 QGraph - 0 Pend, 0 Run, 1 Fail, 0 Done 
    INFO  15:07:45,886 QCommandLine - Script failed with 1 total jobs 
    INFO  15:07:45,889 QCommandLine - Writing final jobs report... 
    INFO  15:07:45,889 QJobsReporter - Writing JobLogging GATKReport to file /Users/jones/bin/Queue/ExampleCountReads.jobreport.txt 
    INFO  15:07:45,893 QJobsReporter - Plotting JobLogging GATKReport to file /Users/jones/bin/Queue/ExampleCountReads.jobreport.pdf 
    WARN  15:07:46,693 RScriptExecutor - RScript exited with 1. Run with -l DEBUG for more info. 
    INFO  15:07:46,695 QCommandLine - Done with errors 
    INFO  15:07:46,697 QGraph - ------- 
    INFO  15:07:46,699 QGraph - Failed:   'java'  '-Xmx1024m'  '-XX:+UseParallelOldGC'  '-XX:ParallelGCThreads=4'  '-XX:GCTimeLimit=50'  '-XX:GCHeapFreeLimit=10'  '-Djava.io.tmpdir=/Users/jones/bin/Queue/tmp'  '-cp' '/Users/jones/bin/Queue/Queue.jar'  'org.broadinstitute.sting.gatk.CommandLineGATK'  '-T' 'CountReads'  '-I' '/Users/jones/bin/Queue/resources/exampleBAM.bam'  '-R' '/Users/jones/bin/Queue/resources/exampleFASTA.fasta'  
    INFO  15:07:46,700 QGraph - Log:     /Users/jones/bin/Queue/ExampleCountReads-1.out 
    

    Do you know why this could be please? I'm new to this!

    Thanks!

  • grumblr (Posts: 1; Member)

    The QFunction and Command Line Options links point to this same page....

    See also QFunction and Command Line Options for more info on Queue options.

  • Geraldine_VdAuwera (Posts: 6,643; Administrator, GATK Developer)

    Hi @grumblr, sorry about the dead links, I'll fix them asap. The articles they refer to should be in the Developer Zone.

    Geraldine Van der Auwera, PhD

  • Geraldine_VdAuwera (Posts: 6,643; Administrator, GATK Developer)

    Hi @elisa1507, I just realized I never answered your question. Sorry about that, it must have slipped through my net. Did you find the solution to your problem or do you still need help with that?

    Geraldine Van der Auwera, PhD

  • chukhman (Posts: 5; Member)

    Hi all, I ran the above tutorial and received the specified output but I'm not sure how to interpret it. The ExampleCountReads-1.out file seems error free but the ExampleCountReads.jobreport.txt file only contains the line "#:GATKReport.v1.1:0" and nothing else. Also, the ExampleCountReads.jobreport.pdf file is unreadable. The warning "RScriptExecutor - RScript exited with 1" bothers me and upon rerunning with -l DEBUG, it shows several issues with R packages having functions masked (not sure what that means) and the exit status 1 seems to be caused by some "argument 1 is not a vector". Is this all the correct behavior or are these issues really problems that I need to worry about? Thanks for your help!

    Morris Chukhman, MS UIC Bioinformatics

  • Geraldine_VdAuwera (Posts: 6,643; Administrator, GATK Developer)

    Hi Morris,

    It sounds like your analysis run went fine but it's the peripheral reporting that screwed up. Can you post the contents of the ExampleCountReads-1.out file to be sure? Also, do you know if you have gsalib installed?

    Geraldine Van der Auwera, PhD

  • chukhman (Posts: 5; Member)

    Thanks Geraldine for your reply!

    Here is the contents of ExampleCountReads-1.out:

    INFO  15:24:31,080 GenomeAnalysisEngine - Strictness is SILENT
    INFO  15:24:31,083 ReferenceDataSource - Dict file /mnt/pinal/pinal/sgreen/genotype_11_samples/dry_run_gatk/exampleFASTA.dict does not exist. Trying to create it now.
    [Tue Feb 05 15:24:31 CST 2013] net.sf.picard.sam.CreateSequenceDictionary REFERENCE=/mnt/pinal/pinal/sgreen/genotype_11_samples/dry_run_gatk/exampleFASTA.fasta OUTPUT=/mnt/pinal/pinal/sgreen/genotype_11_samples/dry_run_gatk/dict3620772975149938405.tmp    TRUNCATE_NAMES_AT_WHITESPACE=true NUM_SEQUENCES=2147483647 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
    [Tue Feb 05 15:24:31 CST 2013] Executing as pkanabar@nike.structure.uic.edu on Linux 2.6.32-279.1.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.6.0_17-b04; Picard version: null
    [Tue Feb 05 15:24:31 CST 2013] net.sf.picard.sam.CreateSequenceDictionary done. Elapsed time: 0.00 minutes.
    Runtime.totalMemory()=244187136
    INFO  15:24:31,406 GenomeAnalysisEngine - Downsampling Settings: No downsampling
    INFO  15:24:31,415 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
    INFO  15:24:31,428 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.01
    INFO  15:24:31,461 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
    INFO  15:24:31,461 ProgressMeter -        Location processed.reads  runtime per.1M.reads completed total.runtime remaining
    INFO  15:24:31,517 ReadShardBalancer$1 - Loading BAM index data for next contig
    INFO  15:24:31,521 ReadShardBalancer$1 - Done loading BAM index data for next contig
    INFO  15:24:31,540 ReadShardBalancer$1 - Loading BAM index data for next contig
    INFO  15:24:31,549 Walker - [REDUCE RESULT] Traversal result is: 33
    INFO  15:24:31,551 ProgressMeter -            done        3.30e+01    0.1 s       44.9 m     97.3%         0.1 s     0.0 s
    INFO  15:24:31,552 ProgressMeter - Total runtime 0.09 secs, 0.00 min, 0.00 hours
    INFO  15:24:31,669 MicroScheduler - 0 reads were filtered out during traversal out of 33 total (0.00%)
    INFO  15:24:32,547 GATKRunReport - Uploaded run statistics report to AWS S3
    ~

    It seems to be working properly since that is exactly what the sample output in the GATK tutorial looks like.

    Here is the output when I run the whole Queue.jar job and the command that I used:

    java -Djava.io.tmpdir=tmp -jar /data1/rhel60/gatk_git20130205/dist/Queue.jar -S ExampleCountReads.scala -R exampleFASTA.fasta -I exampleBAM.bam -run -l DEBUG

    INFO  11:06:34,442 QScriptManager - Compiling 1 QScript
    DEBUG 11:06:34,446 QScriptManager - Compilation directory: /mnt/pinal/pinal/sgreen/genotype_11_samples/dry_run_gatk/tmp/Q-Classes-5894914949320226077
    INFO  11:06:38,335 QScriptManager - Compilation complete
    INFO  11:06:38,578 HelpFormatter - ----------------------------------------------------------------------
    INFO  11:06:38,578 HelpFormatter - Queue vexported, Compiled 2013/02/06 15:30:41
    INFO  11:06:38,578 HelpFormatter - Copyright (c) 2012 The Broad Institute
    INFO  11:06:38,578 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
    DEBUG 11:06:38,578 HelpFormatter - Current directory: /mnt/pinal/pinal/sgreen/genotype_11_samples/dry_run_gatk
    INFO  11:06:38,579 HelpFormatter - Program Args: -S ExampleCountReads.scala -R exampleFASTA.fasta -I exampleBAM.bam -run -l DEBUG
    INFO  11:06:38,579 HelpFormatter - Date/Time: 2013/02/11 11:06:38
    INFO  11:06:38,579 HelpFormatter - ----------------------------------------------------------------------
    INFO  11:06:38,579 HelpFormatter - ----------------------------------------------------------------------
    INFO  11:06:38,587 QCommandLine - Scripting ExampleCountReads
    DEBUG 11:06:38,635 QGraph - adding QNode: 0
    INFO  11:06:38,644 QCommandLine - Added 1 functions
    INFO  11:06:38,645 QGraph - Generating graph.
    INFO  11:06:38,659 QGraph - Running jobs.
    INFO  11:06:38,663 QGraph - -------
    INFO  11:06:38,676 QGraph - Done:     'java'  '-Xmx1024m'  '-XX:+UseParallelOldGC'  '-XX:ParallelGCThreads=4'  '-XX:GCTimeLimit=50'  '-XX:GCHeapFreeLimit=10'  '-Djava.io.tmpdir=/mnt/pinal/pinal/sgreen/genotype_11_samples/dry_run_gatk/tmp'  '-cp' '/data1/rhel60/gatk_git20130205/dist/Queue.jar'  'org.broadinstitute.sting.gatk.CommandLineGATK'  '-T' 'CountReads'  '-I' '/mnt/pinal/pinal/sgreen/genotype_11_samples/dry_run_gatk/exampleBAM.bam'  '-R' '/mnt/pinal/pinal/sgreen/genotype_11_samples/dry_run_gatk/exampleFASTA.fasta'
    DEBUG 11:06:38,676 QGraph - Inputs:  List(/mnt/pinal/pinal/sgreen/genotype_11_samples/dry_run_gatk/exampleBAM.bai, /mnt/pinal/pinal/sgreen/genotype_11_samples/dry_run_gatk/exampleBAM.bam, /mnt/pinal/pinal/sgreen/genotype_11_samples/dry_run_gatk/exampleBAM.bam.bai, /mnt/pinal/pinal/sgreen/genotype_11_samples/dry_run_gatk/exampleFASTA.fasta)
    DEBUG 11:06:38,676 QGraph - Outputs: List(/mnt/pinal/pinal/sgreen/genotype_11_samples/dry_run_gatk/ExampleCountReads-1.out)
    DEBUG 11:06:38,677 QGraph - Done+:   List(/mnt/pinal/pinal/sgreen/genotype_11_samples/dry_run_gatk/.ExampleCountReads-1.out.done)
    DEBUG 11:06:38,677 QGraph - Done-:   List()
    DEBUG 11:06:38,677 QGraph - CmdDir:  /mnt/pinal/pinal/sgreen/genotype_11_samples/dry_run_gatk
    DEBUG 11:06:38,677 QGraph - Temp?:   false
    DEBUG 11:06:38,678 QGraph - Prev:    none (reset = false)
    INFO  11:06:38,678 QGraph - Log:     /mnt/pinal/pinal/sgreen/genotype_11_samples/dry_run_gatk/ExampleCountReads-1.out
    INFO  11:06:38,685 QGraph - 0 Pend, 0 Run, 0 Fail, 1 Done
    INFO  11:06:38,687 QCommandLine - Script failed with 1 total jobs
    INFO  11:06:38,687 QCommandLine - Writing final jobs report...
    INFO  11:06:38,687 QJobsReporter - Writing JobLogging GATKReport to file /mnt/pinal/pinal/sgreen/genotype_11_samples/dry_run_gatk/ExampleCountReads.jobreport.txt
    INFO  11:06:38,698 QJobsReporter - Plotting JobLogging GATKReport to file /mnt/pinal/pinal/sgreen/genotype_11_samples/dry_run_gatk/ExampleCountReads.jobreport.pdf
    DEBUG 11:06:38,709 RScriptExecutor - Executing:
    DEBUG 11:06:38,709 RScriptExecutor -   Rscript
    DEBUG 11:06:38,709 RScriptExecutor -   -e
    DEBUG 11:06:38,709 RScriptExecutor -   tempLibDir = '/mnt/pinal/pinal/sgreen/genotype_11_samples/dry_run_gatk/tmp/Rlib.4673101731368374405';install.packages(pkgs=c('/mnt/pinal/pinal/sgreen/genotype_11_samples/dry_run_gatk/tmp/gsalib.tar.6822779882938808174.gz'), lib=tempLibDir, repos=NULL, type='source', INSTALL_opts=c('--no-libs', '--no-data', '--no-help', '--no-demo', '--no-exec'));library('gsalib', lib.loc=tempLibDir);source('/mnt/pinal/pinal/sgreen/genotype_11_samples/dry_run_gatk/tmp/queueJobReport.6906968465526462577.R');
    DEBUG 11:06:38,710 RScriptExecutor -   /mnt/pinal/pinal/sgreen/genotype_11_samples/dry_run_gatk/ExampleCountReads.jobreport.txt
    DEBUG 11:06:38,710 RScriptExecutor -   /mnt/pinal/pinal/sgreen/genotype_11_samples/dry_run_gatk/ExampleCountReads.jobreport.pdf
    * installing *source* package ‘gsalib’ ...
    ** Creating default NAMESPACE file
    ** R
    ** preparing package for lazy loading
    ** building package indices
    ** testing if installed package can be loaded
    
    * DONE (gsalib)
    Loading required package: methods
    Loading required package: gtools
    Loading required package: gdata
    gdata: read.xls support for 'XLS' (Excel 97-2004) files ENABLED.
    
    gdata: read.xls support for 'XLSX' (Excel 2007+) files ENABLED.
    
    Attaching package: ‘gdata’
    
    The following object(s) are masked from ‘package:stats’:
    
        nobs
    
    The following object(s) are masked from ‘package:utils’:
    
        object.size
    
    Loading required package: caTools
    Loading required package: grid
    Loading required package: KernSmooth
    KernSmooth 2.23 loaded
    Copyright M. P. Wand 1997-2009
    Loading required package: MASS
    
    Attaching package: ‘gplots’
    
    The following object(s) are masked from ‘package:stats’:
    
        lowess
    
    Loading required package: plyr
    
    Attaching package: ‘reshape’
    
    The following object(s) are masked from ‘package:plyr’:
    
        rename, round_any
    
    [1] "Report"
    [1] "Project          : /mnt/pinal/pinal/sgreen/genotype_11_samples/dry_run_gatk/ExampleCountReads.jobreport.txt"
    Error in order(allJobs$analysisName, allJobs$startTime, decreasing = T) :
      argument 1 is not a vector
    Calls: source ... withVisible -> eval -> eval -> plotJobsGantt -> order
    Execution halted
    DEBUG 11:06:42,668 RScriptExecutor - Result: 1
    WARN  11:06:42,668 RScriptExecutor - RScript exited with 1
    DEBUG 11:06:42,674 IOUtils - Deleted /mnt/pinal/pinal/sgreen/genotype_11_samples/dry_run_gatk/tmp/Q-Classes-5894914949320226077
    

    It doesn't seem to be complaining about 'gsalib' in particular, but the objects masked from the packages seem a bit odd. The failure seems to be in plotJobsGantt, but I'm not sure if it's the app itself or something upstream that is causing the failure.

    Thanks so much for helping us debug this!

    Cheers!

    Morris

  • chukhmanchukhman Posts: 5Member

    Thanks Geraldine for you reply!

    Here is the contents of ExampleCountReads-1.out:

    INFO  15:24:31,080 GenomeAnalysisEngine - Strictness is SILENT
    INFO  15:24:31,083 ReferenceDataSource - Dict file /dry_run_gatk/exampleFASTA.dict does not exist. Trying to create it now.
    [Tue Feb 05 15:24:31 CST 2013] net.sf.picard.sam.CreateSequenceDictionary REFERENCE=/dry_run_gatk/exampleFASTA.fasta OUTPUT=/dry_run_gatk/dict3620772975149938405.tmp    TRUNCATE_NAMES_AT_WHITESPACE=true NUM_SEQUENCES=2147483647 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
    [Tue Feb 05 15:24:31 CST 2013] Executing as pkanabar@nike.structure.uic.edu on Linux 2.6.32-279.1.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.6.0_17-b04; Picard version: null
    [Tue Feb 05 15:24:31 CST 2013] net.sf.picard.sam.CreateSequenceDictionary done. Elapsed time: 0.00 minutes.
    Runtime.totalMemory()=244187136
    INFO  15:24:31,406 GenomeAnalysisEngine - Downsampling Settings: No downsampling
    INFO  15:24:31,415 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
    INFO  15:24:31,428 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.01
    INFO  15:24:31,461 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
    INFO  15:24:31,461 ProgressMeter -        Location processed.reads  runtime per.1M.reads completed total.runtime remaining
    INFO  15:24:31,517 ReadShardBalancer$1 - Loading BAM index data for next contig
    INFO  15:24:31,521 ReadShardBalancer$1 - Done loading BAM index data for next contig
    INFO  15:24:31,540 ReadShardBalancer$1 - Loading BAM index data for next contig
    INFO  15:24:31,549 Walker - [REDUCE RESULT] Traversal result is: 33
    INFO  15:24:31,551 ProgressMeter -            done        3.30e+01    0.1 s       44.9 m     97.3%         0.1 s     0.0 s
    INFO  15:24:31,552 ProgressMeter - Total runtime 0.09 secs, 0.00 min, 0.00 hours
    INFO  15:24:31,669 MicroScheduler - 0 reads were filtered out during traversal out of 33 total (0.00%)
    INFO  15:24:32,547 GATKRunReport - Uploaded run statistics report to AWS S3
    ~

    It seems to be working properly since that is exactly what the sample output in the GATK tutorial looks like.

    Here is the output when I run the whole Queue.jar job and the command that I used:

    java -Djava.io.tmpdir=tmp -jar /data1/rhel60/gatk_git20130205/dist/Queue.jar -S ExampleCountReads.scala -R exampleFASTA.fasta -I exampleBAM.bam -run -l DEBUG

    INFO  11:06:34,442 QScriptManager - Compiling 1 QScript
    DEBUG 11:06:34,446 QScriptManager - Compilation directory: /dry_run_gatk/tmp/Q-Classes-5894914949320226077
    INFO  11:06:38,335 QScriptManager - Compilation complete
    INFO  11:06:38,578 HelpFormatter - ----------------------------------------------------------------------
    INFO  11:06:38,578 HelpFormatter - Queue vexported, Compiled 2013/02/06 15:30:41
    INFO  11:06:38,578 HelpFormatter - Copyright (c) 2012 The Broad Institute
    INFO  11:06:38,578 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
    DEBUG 11:06:38,578 HelpFormatter - Current directory: /dry_run_gatk
    INFO  11:06:38,579 HelpFormatter - Program Args: -S ExampleCountReads.scala -R exampleFASTA.fasta -I exampleBAM.bam -run -l DEBUG
    INFO  11:06:38,579 HelpFormatter - Date/Time: 2013/02/11 11:06:38
    INFO  11:06:38,579 HelpFormatter - ----------------------------------------------------------------------
    INFO  11:06:38,579 HelpFormatter - ----------------------------------------------------------------------
    INFO  11:06:38,587 QCommandLine - Scripting ExampleCountReads
    DEBUG 11:06:38,635 QGraph - adding QNode: 0
    INFO  11:06:38,644 QCommandLine - Added 1 functions
    INFO  11:06:38,645 QGraph - Generating graph.
    INFO  11:06:38,659 QGraph - Running jobs.
    INFO  11:06:38,663 QGraph - -------
    INFO  11:06:38,676 QGraph - Done:     'java'  '-Xmx1024m'  '-XX:+UseParallelOldGC'  '-XX:ParallelGCThreads=4'  '-XX:GCTimeLimit=50'  '-XX:GCHeapFreeLimit=10'  '-Djava.io.tmpdir=/dry_run_gatk/tmp'  '-cp' '/data1/rhel60/gatk_git20130205/dist/Queue.jar'  'org.broadinstitute.sting.gatk.CommandLineGATK'  '-T' 'CountReads'  '-I' '/dry_run_gatk/exampleBAM.bam'  '-R' '/dry_run_gatk/exampleFASTA.fasta'
    DEBUG 11:06:38,676 QGraph - Inputs:  List(/dry_run_gatk/exampleBAM.bai, /dry_run_gatk/exampleBAM.bam, /dry_run_gatk/exampleBAM.bam.bai, /dry_run_gatk/exampleFASTA.fasta)
    DEBUG 11:06:38,676 QGraph - Outputs: List(/dry_run_gatk/ExampleCountReads-1.out)
    DEBUG 11:06:38,677 QGraph - Done+:   List(/dry_run_gatk/.ExampleCountReads-1.out.done)
    DEBUG 11:06:38,677 QGraph - Done-:   List()
    DEBUG 11:06:38,677 QGraph - CmdDir:  /dry_run_gatk
    DEBUG 11:06:38,677 QGraph - Temp?:   false
    DEBUG 11:06:38,678 QGraph - Prev:    none (reset = false)
    INFO  11:06:38,678 QGraph - Log:     /dry_run_gatk/ExampleCountReads-1.out
    INFO  11:06:38,685 QGraph - 0 Pend, 0 Run, 0 Fail, 1 Done
    INFO  11:06:38,687 QCommandLine - Script failed with 1 total jobs
    INFO  11:06:38,687 QCommandLine - Writing final jobs report...
    INFO  11:06:38,687 QJobsReporter - Writing JobLogging GATKReport to file /dry_run_gatk/ExampleCountReads.jobreport.txt
    INFO  11:06:38,698 QJobsReporter - Plotting JobLogging GATKReport to file /dry_run_gatk/ExampleCountReads.jobreport.pdf
    DEBUG 11:06:38,709 RScriptExecutor - Executing:
    DEBUG 11:06:38,709 RScriptExecutor -   Rscript
    DEBUG 11:06:38,709 RScriptExecutor -   -e
    DEBUG 11:06:38,709 RScriptExecutor -   tempLibDir = '/dry_run_gatk/tmp/Rlib.4673101731368374405';install.packages(pkgs=c('/dry_run_gatk/tmp/gsalib.tar.6822779882938808174.gz'), lib=tempLibDir, repos=NULL, type='source', INSTALL_opts=c('--no-libs', '--no-data', '--no-help', '--no-demo', '--no-exec'));library('gsalib', lib.loc=tempLibDir);source('/dry_run_gatk/tmp/queueJobReport.6906968465526462577.R');
    DEBUG 11:06:38,710 RScriptExecutor -   /dry_run_gatk/ExampleCountReads.jobreport.txt
    DEBUG 11:06:38,710 RScriptExecutor -   /dry_run_gatk/ExampleCountReads.jobreport.pdf
    * installing *source* package âgsalibâ ...
    ** Creating default NAMESPACE file
    ** R
    ** preparing package for lazy loading
    ** building package indices
    ** testing if installed package can be loaded
    
    * DONE (gsalib)
    Loading required package: methods
    Loading required package: gtools
    Loading required package: gdata
    gdata: read.xls support for 'XLS' (Excel 97-2004) files ENABLED.
    
    gdata: read.xls support for 'XLSX' (Excel 2007+) files ENABLED.
    
    Attaching package: âgdataâ
    
    The following object(s) are masked from âpackage:statsâ:
    
        nobs
    
    The following object(s) are masked from âpackage:utilsâ:
    
        object.size
    
    Loading required package: caTools
    Loading required package: grid
    Loading required package: KernSmooth
    KernSmooth 2.23 loaded
    Copyright M. P. Wand 1997-2009
    Loading required package: MASS
    
    Attaching package: âgplotsâ
    
    The following object(s) are masked from âpackage:statsâ:
    
        lowess
    
    Loading required package: plyr
    
    Attaching package: âreshapeâ
    
    The following object(s) are masked from âpackage:plyrâ:
    
        rename, round_any
    
    [1] "Report"
    [1] "Project          : /dry_run_gatk/ExampleCountReads.jobreport.txt"
    Error in order(allJobs$analysisName, allJobs$startTime, decreasing = T) :
      argument 1 is not a vector
    Calls: source ... withVisible -> eval -> eval -> plotJobsGantt -> order
    Execution halted
    DEBUG 11:06:42,668 RScriptExecutor - Result: 1
    WARN  11:06:42,668 RScriptExecutor - RScript exited with 1
    DEBUG 11:06:42,674 IOUtils - Deleted /dry_run_gatk/tmp/Q-Classes-5894914949320226077
    

    It doesn't seem to be complaining about 'gsalib' in particular, but the objects masked from the packages seem a bit odd. The failure seems to be in plotJobsGantt, but I'm not sure if it's the app itself or something upstream that is causing the failure.

    Thanks so much for helping us debug this!

    Cheers!

    Morris

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,643Administrator, GATK Developer admin

    Hi Morris,

    OK, your analysis job definitely executed correctly. What is screwing up is just Queue's reporting about the job(s) that it ran, which is annoying but not of real importance. I think the failure may be linked to a bug in the reporting system which we've fixed in our development version. You can safely ignore this error for now; if it persists in the next version (2.4, estimated for release next week) let us know in this thread.

    Geraldine Van der Auwera, PhD

  • chukhmanchukhman Posts: 5Member

    The same error occurs with both the 2.3.9 tarball and the version on GitHub. Is the dev version different than the GitHub version?

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,643Administrator, GATK Developer admin

    That's correct, the dev version is different and is currently not available to the public. The github version is the last stable version we released, and is the same thing as the tarball. We're in the process of changing our release workflow and may in the near future start providing nightly builds of the dev source; but right now that's just not possible, sorry.

    Geraldine Van der Auwera, PhD

  • chukhmanchukhman Posts: 5Member

    Has 2.4 been released yet? The downloads page still links to 2.3.9.

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,643Administrator, GATK Developer admin

    Not yet -- we're planning on releasing it Monday if all goes well.

    Geraldine Van der Auwera, PhD

  • AlessandroAlessandro Posts: 3Member

    Hi Geraldine, I have a problem running the "dry run" pre-analysis that you suggest. I've read the comments above, but none seemed to help in my case, so I'm posting the command line that I used and the error. Thanks in advance!

    java -Xmx10g -jar /path/directory/2.4-9/Queue.jar --temp_directory /path/directory/tmp_processes/ -S ExampleCountReads.scala -R /path/directory/reference_sorted_normalized.fasta -I input.bam

    INFO 11:41:27,268 QScriptManager - Compiling 1 QScript
    ERROR 11:41:27,274 QScriptManager - IO error while decoding ExampleCountReads.scala with UTF-8 Please try specifying another one using the -encoding option
    ERROR 11:41:27,275 QScriptManager - one error found

    ERROR ------------------------------------------------------------------------------------------
    ERROR stack trace

    org.broadinstitute.sting.queue.QException: Compile of ExampleCountReads.scala failed with 1 error
        at org.broadinstitute.sting.queue.QScriptManager.loadScripts(QScriptManager.scala:71)
        at org.broadinstitute.sting.queue.QCommandLine.org$broadinstitute$sting$queue$QCommandLine$$qScriptPluginManager(QCommandLine.scala:95)
        at org.broadinstitute.sting.queue.QCommandLine.getArgumentSources(QCommandLine.scala:227)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:202)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:152)
        at org.broadinstitute.sting.queue.QCommandLine$.main(QCommandLine.scala:62)
        at org.broadinstitute.sting.queue.QCommandLine.main(QCommandLine.scala)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 2.4-9-g532efad):
    ERROR
    ERROR Please visit the wiki to see if this is a known problem
    ERROR If not, please post the error, with stack trace, to the GATK forum
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: Compile of ExampleCountReads.scala failed with 1 error
    ERROR ------------------------------------------------------------------------------------------

    INFO 11:41:27,348 QCommandLine - Shutting down jobs. Please wait...

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,643Administrator, GATK Developer admin

    Hi Alessandro, this is actually just telling you that it didn't find the Scala script you specified. Unlike the regular GATK commands, where you just give the tool name, with Queue scripts you need to provide the path to the script file itself, either as an absolute path or relative to your working directory.
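
    As a minimal sketch of that check, assuming a POSIX shell: the helper name `check_qscript` is hypothetical and not part of Queue. It only confirms that the path you intend to pass to -S resolves to a readable file from the current working directory before you launch Queue.

```shell
#!/bin/sh
# Hypothetical helper (not part of Queue): verify that the QScript path
# resolves to a readable regular file from the current working directory.
check_qscript() {
    script_path="$1"
    if [ -f "$script_path" ] && [ -r "$script_path" ]; then
        echo "found: $script_path"
        return 0
    fi
    echo "not found from $(pwd): $script_path" >&2
    return 1
}

# Example usage (all paths are placeholders):
# check_qscript /path/to/ExampleCountReads.scala && \
#     java -Djava.io.tmpdir=tmp -jar Queue.jar \
#         -S /path/to/ExampleCountReads.scala \
#         -R exampleFASTA.fasta -I exampleBAM.bam
```

    If the check fails, the message shows the directory the path was resolved from, which is usually enough to spot a wrong relative path.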

    Geraldine Van der Auwera, PhD

  • AlessandroAlessandro Posts: 3Member

    Fantastic, dry run successfully executed! Thank you so much!

  • blueskypyblueskypy Posts: 228Member

    hi, I'm getting the following error. Could someone help me? Thanks a lot!

    [usnee1-lph001-n066 42] ~ $ ls
    R  script  seqs  test
    [14:04 0.04]
    [usnee1-lph001-n066 43] ~ $   java -Djava.io.tmpdir=tmp -jar $queue_jar -S ./seqs/softwares/Queue-2.5-2/resources/ExampleCountReads.scala -R /site/ne/app/x86_64/gatk/v2.4.9/resources/exampleFASTA.fasta -I /site/ne/app/x86_64/gatk/v2.4.9/resources/exampleBAM.bam -l DEBUG -run
    INFO  14:05:29,297 QScriptManager - Compiling 1 QScript
    DEBUG 14:05:29,298 QScriptManager - Compilation directory: /site/ne/home/cuiji01/tmp/Q-Classes-568453805836268123
    INFO  14:05:32,089 QScriptManager - Compilation complete
    INFO  14:05:32,193 HelpFormatter - ----------------------------------------------------------------------
    INFO  14:05:32,193 HelpFormatter - Queue v2.5-2-gf57256b, Compiled 2013/05/01 09:29:04
    INFO  14:05:32,193 HelpFormatter - Copyright (c) 2012 The Broad Institute
    INFO  14:05:32,193 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
    DEBUG 14:05:32,194 HelpFormatter - Current directory: /site/ne/home/cuiji01
    INFO  14:05:32,194 HelpFormatter - Program Args: -S ./seqs/softwares/Queue-2.5-2/resources/ExampleCountReads.scala -R /site/ne/app/x86_64/gatk/v2.4.9/resources/exampleFASTA.fasta -I /site/ne/app/x86_64/gatk/v2.4.9/resources/exampleBAM.bam -l DEBUG -run
    INFO  14:05:32,194 HelpFormatter - Date/Time: 2013/05/06 14:05:32
    INFO  14:05:32,194 HelpFormatter - ----------------------------------------------------------------------
    INFO  14:05:32,194 HelpFormatter - ----------------------------------------------------------------------
    INFO  14:05:32,202 QCommandLine - Scripting ExampleCountReads
    DEBUG 14:05:32,259 QGraph - adding QNode: 0
    INFO  14:05:32,268 QCommandLine - Added 1 functions
    INFO  14:05:32,269 QGraph - Generating graph.
    INFO  14:05:32,279 QGraph - Running jobs.
    INFO  14:05:32,281 QGraph - -------
    INFO  14:05:32,289 QGraph - Done:     'java'  '-Xmx1024m'  '-XX:+UseParallelOldGC'  '-XX:ParallelGCThreads=4'  '-XX:GCTimeLimit=50'  '-XX:GCHeapFreeLimit=10'  '-Djava.io.tmpdir=/site/ne/home/cuiji01/tmp'  '-cp' '/site/ne/home/cuiji01/seqs/softwares/Queue-2.5-2/Queue.jar'  'org.broadinstitute.sting.gatk.CommandLineGATK'  '-T' 'CountReads'  '-I' '/site/ne/app/x86_64/gatk/v2.4.9/resources/exampleBAM.bam'  '-R' '/site/ne/app/x86_64/gatk/v2.4.9/resources/exampleFASTA.fasta'
    DEBUG 14:05:32,289 QGraph - Inputs:  List(/site/ne/app/x86_64/gatk/v2.4.9/resources/exampleBAM.bai, /site/ne/app/x86_64/gatk/v2.4.9/resources/exampleBAM.bam, /site/ne/app/x86_64/gatk/v2.4.9/resources/exampleBAM.bam.bai, /site/ne/app/x86_64/gatk/v2.4.9/resources/exampleFASTA.fasta)
    DEBUG 14:05:32,289 QGraph - Outputs: List(/site/ne/home/cuiji01/ExampleCountReads-1.out)
    DEBUG 14:05:32,290 QGraph - Done+:   List(/site/ne/home/cuiji01/.ExampleCountReads-1.out.done)
    DEBUG 14:05:32,290 QGraph - Done-:   List()
    DEBUG 14:05:32,290 QGraph - CmdDir:  /site/ne/home/cuiji01
    DEBUG 14:05:32,290 QGraph - Temp?:   false
    DEBUG 14:05:32,290 QGraph - Prev:    none (reset = false)
    INFO  14:05:32,290 QGraph - Log:     /site/ne/home/cuiji01/ExampleCountReads-1.out
    INFO  14:05:32,295 QGraph - 0 Pend, 0 Run, 0 Fail, 1 Done
    INFO  14:05:32,296 QCommandLine - Writing final jobs report...
    INFO  14:05:32,296 QJobsReporter - Writing JobLogging GATKReport to file /site/ne/home/cuiji01/ExampleCountReads.jobreport.txt
    INFO  14:05:32,310 QJobsReporter - Plotting JobLogging GATKReport to file /site/ne/home/cuiji01/ExampleCountReads.jobreport.pdf
    DEBUG 14:05:32,344 RScriptExecutor - Executing:
    DEBUG 14:05:32,345 RScriptExecutor -   Rscript
    DEBUG 14:05:32,345 RScriptExecutor -   -e
    DEBUG 14:05:32,345 RScriptExecutor -   tempLibDir = '/site/ne/home/cuiji01/tmp/Rlib.8304889352133617132';install.packages(pkgs=c('/site/ne/home/cuiji01/tmp/RlibSources.3490673511527363174/gsalib'), lib=tempLibDir, repos=NULL, type='source', INSTALL_opts=c('--no-libs', '--no-data', '--no-help', '--no-demo', '--no-exec'));library('gsalib', lib.loc=tempLibDir);source('/site/ne/home/cuiji01/tmp/queueJobReport.2164983308823639078.R');
    DEBUG 14:05:32,345 RScriptExecutor -   /site/ne/home/cuiji01/ExampleCountReads.jobreport.txt
    DEBUG 14:05:32,345 RScriptExecutor -   /site/ne/home/cuiji01/ExampleCountReads.jobreport.pdf
    * installing *source* package ‘gsalib’ ...
    ** Creating default NAMESPACE file
    ** R
    ** preparing package for lazy loading
    ** building package indices ...
    ** testing if installed package can be loaded
    
    * DONE (gsalib)
    Loading required package: methods
    Loading required package: gtools
    Loading required package: gdata
    gdata: read.xls support for 'XLS' (Excel 97-2004) files ENABLED.
    
    gdata: read.xls support for 'XLSX' (Excel 2007+) files ENABLED.
    
    Attaching package: ‘gdata’
    
    The following object(s) are masked from ‘package:stats’:
    
        nobs
    
    The following object(s) are masked from ‘package:utils’:
    
        object.size
    
    Loading required package: caTools
    Loading required package: bitops
    Loading required package: grid
    Loading required package: KernSmooth
    KernSmooth 2.23 loaded
    Copyright M. P. Wand 1997-2009
    
    Attaching package: ‘gplots’
    
    The following object(s) are masked from ‘package:stats’:
    
        lowess
    
    Loading required package: plyr
    
    Attaching package: ‘reshape’
    
    The following object(s) are masked from ‘package:plyr’:
    
        rename, round_any
    
    [1] "Report"
    [1] "Project          : /site/ne/home/cuiji01/ExampleCountReads.jobreport.txt"
    Error in order(allJobs$analysisName, allJobs$startTime, decreasing = T) :
      argument 1 is not a vector
    Calls: source ... eval.with.vis -> eval.with.vis -> plotJobsGantt -> order
    Execution halted
    DEBUG 14:05:39,893 RScriptExecutor - Result: 1
    WARN  14:05:39,894 RScriptExecutor - RScript exited with 1
    INFO  14:05:39,930 QCommandLine - Script completed successfully with 1 total jobs
    DEBUG 14:05:39,953 IOUtils - Deleted /site/ne/home/cuiji01/tmp/Q-Classes-568453805836268123
    [14:05 0.32]
    [usnee1-lph001-n066 44] ~ $ ls
    ExampleCountReads.jobreport.pdf  ExampleCountReads.jobreport.txt  R  script  seqs  test  tmp
    
  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,643Administrator, GATK Developer admin

    Hi @blueskypy,

    We've seen this error from another user recently -- it looks like there's a software version issue that is affecting the generation of the job report plots. Unfortunately we don't have the resources to track down the exact issue right now, sorry. On the bright side you can ignore the rscript error, since it's not an issue with the Queue run, it's just the plot that summarizes the run info.

    Geraldine Van der Auwera, PhD

  • blueskypyblueskypy Posts: 228Member

    hi, Geraldine, Thanks for the help! In another thread, a user suggested the error was caused by an outdated version of ggplot2. So I updated ggplot2, but I still get the error. The file ExampleCountReads-1.out was not produced either; could you help me find the reason?

    Thanks a lot!

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,643Administrator, GATK Developer admin

    Hi @blueskypy,

    Based on the output you posted earlier, the file should be there: /site/ne/home/cuiji01/ExampleCountReads-1.out. Is that not the case? Do you get a different "Outputs" line in your second run than in your first? Any error messages?

    Geraldine Van der Auwera, PhD

  • blueskypyblueskypy Posts: 228Member
    edited May 2013

    before I run the Queue:

    [usnee1-lph001-n066 42] ~ $ ls

    R script seqs test

    After

    [usnee1-lph001-n066 44] ~ $ ls

    ExampleCountReads.jobreport.pdf ExampleCountReads.jobreport.txt R script seqs test tmp

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,643Administrator, GATK Developer admin

    Can you list hidden files to see if there is a .ExampleCountReads-1.out.done file there?

    Geraldine Van der Auwera, PhD

  • blueskypyblueskypy Posts: 228Member

    that's right! it's there but it's empty!

  • blueskypyblueskypy Posts: 228Member
    edited May 2013

    the ExampleCountReads.jobreport.pdf cannot be opened either, the error says there is no page. Also very little content in the 3rd file:

    [usnee1-lph001-n066 74] ~ $ more ExampleCountReads.jobreport.txt
    
    #:GATKReport.v1.1:0
    
  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,643Administrator, GATK Developer admin

    That file tells Queue that the job has already been successfully completed, and it doesn't need to do it again. This is useful for bigger jobs, to be able to resume after a failure without redoing all the work that has already successfully completed. You can either delete the .done file, or add -startFromScratch to the Queue command line to override it.
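
    A minimal sketch of the first option, assuming a `find` that supports -maxdepth (GNU and BSD versions do). The helper names `list_done_markers` and `clear_done_markers` are hypothetical and not part of Queue; they rely only on the naming seen in the log above, where the marker is a hidden ".<output>.done" file next to the output.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Queue). They assume the marker naming
# seen in the log above: a hidden ".<output>.done" file beside the output.

# List the .done markers directly inside a directory (default: current one).
list_done_markers() {
    find "${1:-.}" -maxdepth 1 -type f -name '.*.done'
}

# Delete those markers so the next Queue run re-executes the jobs.
clear_done_markers() {
    find "${1:-.}" -maxdepth 1 -type f -name '.*.done' -exec rm -f {} +
}

# Example usage:
# list_done_markers .     # inspect first
# clear_done_markers .    # then remove
```

    Listing before deleting is deliberate: on a larger pipeline you may want to keep the markers for jobs that really did finish, in which case removing individual files by hand is safer than clearing them all.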

    Geraldine Van der Auwera, PhD

  • blueskypyblueskypy Posts: 228Member

    hi, Geraldine, Good news! I deleted the .ExampleCountReads-1.out.done file and re-ran Queue. This time everything works fine and the output looks correct as well.

    So I think the error may indeed have been due to the outdated ggplot2. But in my previous runs, even after I updated ggplot2, I hadn't deleted the old 'done' file, so Queue didn't really run and I still got the same error message. Is my understanding right?

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,643Administrator, GATK Developer admin

    Great, I'm glad to hear that! Good to know about the ggplot2 version, thanks for reporting your solution.

    Yes, I believe that's correct -- the "failure" of your second run was due to the leftover .done file telling Queue not to do anything. This generated an empty table in the job report (since nothing was done) so you got the same error (rscript couldn't run) for a slightly different reason.

    Geraldine Van der Auwera, PhD

  • blueskypyblueskypy Posts: 228Member
    edited May 2013

    Thanks Geraldine for your help! You may want to provide the solution to this thread: http://gatk.vanillaforums.com/discussion/2467/install-gsalib

    I was going to post the suggestion but somehow had a problem logging in with Google on that page.

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,643Administrator, GATK Developer admin

    Done, thanks for pointing it out! FYI the problem you encountered on that page is that it uses an older URL format for the forum, which affects some of our older articles; you should be able to access it normally by changing "https" to "http" in the link.

    Geraldine Van der Auwera, PhD

  • blueskypyblueskypy Posts: 228Member

    hi, Geraldine, I wonder if I can ask another question. Is the '-jobRunner GridEngine' option the same as using the following?

    bsub java -Djava.io.tmpdir=tmp -jar Queue.jar -S ExampleCountReads.scala -R exampleFASTA.fasta -I exampleBAM.bam -run

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,643Administrator, GATK Developer admin

    Hi @blueskypy,

    That option is used to specify which job runner your cluster/server uses for job management. I can't tell you the details of the syntax used with GridEngine as that's not what we use in-house, but we do have other users around who use it -- hopefully they will jump in to contribute their experience.

    Geraldine Van der Auwera, PhD

  • blueskypyblueskypy Posts: 228Member

    How do I specify an output dir for CountReads?

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,643Administrator, GATK Developer admin

    If you're running CountReads on its own, it will always output the result to stdout. If you're running it via Queue, it depends on how the scala script is set up. In the example script it's predetermined, but you can either change the hardcoded default, or add an argument to the script to set it from the command line.
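
    For the standalone case only, a minimal sketch that relies on the point above that the result goes to stdout, so ordinary shell redirection can place it wherever you like. The helper name `run_to_dir` is hypothetical and not part of GATK or Queue, and the jar and file paths in the usage comment are placeholders. The Queue case needs a change to the Scala script and is not covered here.

```shell
#!/bin/sh
# Hypothetical helper (not part of GATK or Queue): run any command and save
# its standard output to <out_dir>/<name>, creating the directory if needed.
run_to_dir() {
    out_dir="$1"
    name="$2"
    shift 2
    mkdir -p "$out_dir" || return 1
    "$@" > "$out_dir/$name"
}

# Example usage (jar and file paths are placeholders):
# run_to_dir results countreads.out \
#     java -jar GenomeAnalysisTK.jar -T CountReads \
#     -R exampleFASTA.fasta -I exampleBAM.bam
```

    Only stdout is captured; anything the tool writes to stderr still appears in the terminal.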

    Geraldine Van der Auwera, PhD

  • adouble2adouble2 Posts: 12Member

    Hi, I think the link at the bottom "Queue with Grid Engine" should point to: http://www.broadinstitute.org/gatk/guide/article?id=1313 and I think the QFunction link and Command Line options should both now point to: broadinstitute.org/gatk/guide/article?id=1311

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,643Administrator, GATK Developer admin

    @adouble2, that's correct. Thanks for reporting the missing links, I've added them to the document.

    Geraldine Van der Auwera, PhD

  • AdrianVeresAdrianVeres HarvardPosts: 1Member
    edited July 2013

    I'm getting an error that is preventing me from submitting LSF jobs using Queue.

    ERROR 17:15:19,132 FunctionEdge - Error: echo hello world 
    scala.MatchError: M (of class java.lang.String)
        at org.broadinstitute.sting.queue.engine.lsf.Lsf706JobRunner$.unitDivisor(Lsf706JobRunner.scala:409)
        at org.broadinstitute.sting.queue.engine.lsf.Lsf706JobRunner$.org$broadinstitute$sting$queue$engine$lsf$Lsf706JobRunner$$convertUnits(Lsf706JobRunner.scala:420)
        at org.broadinstitute.sting.queue.engine.lsf.Lsf706JobRunner.start(Lsf706JobRunner.scala:99)
        at org.broadinstitute.sting.queue.engine.FunctionEdge.start(FunctionEdge.scala:84)
        at org.broadinstitute.sting.queue.engine.QGraph.runJobs(QGraph.scala:434)
        at org.broadinstitute.sting.queue.engine.QGraph.run(QGraph.scala:156)
        at org.broadinstitute.sting.queue.QCommandLine.execute(QCommandLine.scala:171)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:245)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:152)
        at org.broadinstitute.sting.queue.QCommandLine$.main(QCommandLine.scala:62)
        at org.broadinstitute.sting.queue.QCommandLine.main(QCommandLine.scala)
    

    This happens using HelloWorld.scala from the GitHub repo, but also running any other script with the LSF JobRunner.

    This error does not occur during dry runs, or in runs using the shell JobRunner. Using -bsub, this error occurs whether or not I specify reasonable -jobQueue, -memLimit, -resMemLimit, -resMemReq values. I am on an LSF 8.0.1 cluster, as shown by lsid.

    Platform LSF 8.0.1, Jun 13 2011
    Copyright 1992-2011 Platform Computing Corporation
    
    My cluster name is cmucluster
    My master name is cmulsf
    

    I am using Queue 2.6-4, this is the context prior to the error.

    [x-removed]$ java -jar Queue-2.6-4-g3e5ff60/Queue.jar -S queue/HelloWorld.scala -l DEBUG -jobQueue short -bsub -startFromScratch -run
    INFO  17:15:15,244 QScriptManager - Compiling 1 QScript 
    DEBUG 17:15:15,245 QScriptManager - Compilation directory: /tmp/Q-Classes-2975930027453708515 
    INFO  17:15:18,723 QScriptManager - Compilation complete 
    INFO  17:15:18,778 HelpFormatter - ---------------------------------------------------------------------- 
    INFO  17:15:18,779 HelpFormatter - Queue v2.6-4-g3e5ff60, Compiled 2013/06/24 14:50:50 
    INFO  17:15:18,779 HelpFormatter - Copyright (c) 2012 The Broad Institute 
    INFO  17:15:18,779 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk 
    DEBUG 17:15:18,779 HelpFormatter - Current directory: /x/x 
    INFO  17:15:18,779 HelpFormatter - Program Args: -S queue/HelloWorld.scala -l DEBUG -jobQueue short -bsub -startFromScratch -run 
    INFO  17:15:18,779 HelpFormatter - Date/Time: 2013/07/10 17:15:18 
    INFO  17:15:18,779 HelpFormatter - ---------------------------------------------------------------------- 
    INFO  17:15:18,780 HelpFormatter - ---------------------------------------------------------------------- 
    INFO  17:15:18,785 QCommandLine - Scripting HelloWorld 
    DEBUG 17:15:18,796 QGraph - adding QNode: 0 
    INFO  17:15:18,802 QCommandLine - Added 1 functions 
    INFO  17:15:18,803 QGraph - Generating graph. 
    INFO  17:15:18,809 QGraph - Running jobs. 
    INFO  17:15:18,810 QGraph - Removing outputs from previous runs. 
    DEBUG 17:15:18,817 IOUtils - Deleted /x/x/.HelloWorld-1.out.fail 
    DEBUG 17:15:18,967 FunctionEdge - Starting: /x/x > echo hello world 
    INFO  17:15:18,968 FunctionEdge - Output written to /x/x/HelloWorld-1.out 
    DEBUG 17:15:19,089 IOUtils - Deleted /x/x/HelloWorld-1.out 
    DEBUG 17:15:19,124 IOUtils - Deleted /x/x/.queue/tmp/.exec7638084941875402342 
    ERROR 17:15:19,132 FunctionEdge - Error: echo hello world 
    
  • galeanogaleano usPosts: 1Member

    Hi, Running this tutorial I have had the same problem that Olga reported. Has anyone found a solution?

    thanks, Carlos

    @omedvedeva said: I can't perform a first dry run on Windows 7 with Queue 2.2.5. The installation seems to be correct since --help option works. It looks like it can't find the tmp directory that it creates at the correct location. The same problem occurs with QueueLite too. What am I missing? In the stack trace below fasta, bam and scala files were in the working directory:

    C:\GATK\Queue-2.2-5-g3bf5e3f>java -Djava.io.tmpdir=tmp -jar Queue.jar -S ExampleCountReads.scala -R exampleFASTA.fasta -I exampleBAM.bam
    ERROR 10:17:34,493 QScriptManager - \GATK\Queue-2.2-5-g3bf5e3f\tmp\Q-Classes-8075780960630530304 does not exist or is not a directory
    INFO 10:17:35,965 QScriptManager - Compiling 1 QScript
    INFO 10:17:40,538 QScriptManager - Compilation complete

    ...

    ERROR stack trace

    org.broadinstitute.sting.commandline.InvalidArgumentException: Argument with name 'R' isn't defined.
        at org.broadinstitute.sting.commandline.ParsingEngine.validate(ParsingEngine.java:303)
        at org.broadinstitute.sting.commandline.ParsingEngine.validate(ParsingEngine.java:276)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:204)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)
        at org.broadinstitute.sting.queue.QCommandLine$.main(QCommandLine.scala:62)
        at org.broadinstitute.sting.queue.QCommandLine.main(QCommandLine.scala)

    ##### ERROR --------------------------------------------------------------------

    ERROR A GATK RUNTIME ERROR has occurred (version 2.2-5-g3bf5e3f):

    ...

    ERROR MESSAGE: Argument with name 'R' isn't defined.
    ERROR --------------------------------------------------------------------

    Thank you, Olga.

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,643Administrator, GATK Developer admin

    Hi @AdrianVeres and @galeano, sorry to respond so late. Unfortunately we're not able to provide support for Queue issues at the moment. The software is provided as is, and if you have different system configurations it's up to you to get it to work. You may need to ask for help from your IT department. Good luck!

    Geraldine Van der Auwera, PhD

  • Philipp79Philipp79 Posts: 2Member

    Hi Geraldine, I am trying to run GenomeStrip but got stuck at the second step, the "SVPreprocess" Queue script. The error prevents the compilation of the local SVQScript.q and SVPreprocess.q. I assume this is an R issue, since I am running R version 3.0.2 (hence, an older one). It may have to do with the address specified in my script:

    $java -Xmx8g -cp $home/Queue/Queue.jar:$home/svtoolkit/SVToolkit.jar:$gatk_dir/GenomeAnalysisTK.jar org.broadinstitute.sting.queue.QCommandLine \ ...

    While the error refers to: org.broadinstitute.sv.queue.ComputeVCFPartitions

    I added the R-package "coin" already as suggested by a colleague of yours regarding a different compilation issue.

    Thanks for your help.

    The whole error is as follows:

    SLF4J: Class path contains multiple SLF4J bindings.

    SLF4J: Found binding in [jar:file:/home/user/NGS_2013_exp/Queue-2.7-4-g6f46d11/Queue.jar!/org/slf4j/impl/StaticLoggerBinder.class]

    SLF4J: Found binding in [jar:file:/home/user/NGS_2013_exp/GenomeAnalysisTK-2.7-4-g6f46d11/GenomeAnalysisTK.jar!/org/slf4j/impl/StaticLoggerBinder.class]

    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

    INFO 15:17:54,265 QScriptManager - Compiling 2 QScripts

    ERROR 15:17:56,229 QScriptManager - SVQScript.q:15: object queue is not a member of package org.broadinstitute.sv

    ERROR 15:17:56,269 QScriptManager - import org.broadinstitute.sv.queue.ComputeDiscoveryPartitions

    ERROR 15:17:56,270 QScriptManager - ^

    ERROR 15:17:56,271 QScriptManager - SVQScript.q:16: object queue is not a member of package org.broadinstitute.sv

    ERROR 15:17:56,305 QScriptManager - import org.broadinstitute.sv.queue.ComputeVCFPartitions

    ERROR 15:17:56,306 QScriptManager - ^

    ERROR 15:17:56,307 QScriptManager - SVQScript.q:17: object util is not a member of package org.broadinstitute.sv

    ERROR 15:17:56,341 QScriptManager - import org.broadinstitute.sv.util.GenomeInterval

    ERROR 15:17:56,342 QScriptManager - ^

    ERROR 15:17:57,896 QScriptManager - three errors found

    ERROR ------------------------------------------------------------------------------------------
    ERROR stack trace

    org.broadinstitute.sting.queue.QException: Compile of /home/user/NGS_2013_exp/svtoolkit/qscript/SVPreprocess.q, /home/user/NGS_2013_exp/svtoolkit/qscript/SVQScript.q failed with 3 errors

    at org.broadinstitute.sting.queue.QScriptManager.loadScripts(QScriptManager.scala:71)

    at org.broadinstitute.sting.queue.QCommandLine.org$broadinstitute$sting$queue$QCommandLine$$qScriptPluginManager(QCommandLine.scala:95)

    at org.broadinstitute.sting.queue.QCommandLine.getArgumentSources(QCommandLine.scala:227)

    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:202)

    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:152)

    at org.broadinstitute.sting.queue.QCommandLine$.main(QCommandLine.scala:62)

    at org.broadinstitute.sting.queue.QCommandLine.main(QCommandLine.scala)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 2.7-4-g6f46d11):
    ERROR
    ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: Compile of /home/user/NGS_2013_exp/svtoolkit/qscript/SVPreprocess.q, /home/user/NGS_2013_exp/svtoolkit/qscript/SVQScript.q failed with 3 errors
    ERROR ------------------------------------------------------------------------------------------

    INFO 15:17:57,968 QCommandLine - Shutting down jobs. Please wait...

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,643Administrator, GATK Developer admin

    Hi @Philipp79,

    This looks more like a GATK/Queue/GenomeStrip version issue (you may be using versions of the different jars that are not compatible together) than anything to do with R. But this intro-level article's comments section is really not the right place to discuss this problem. Please post this question as a separate discussion, preferably in the GenomeStrip section of the forum.

    Geraldine Van der Auwera, PhD

  • wallysb01wallysb01 Posts: 3Member

    Hi everyone,

    I am trying to add the best practices options to the haplotypecaller scala script with no luck. I have zero scala experience but from basic pattern matching I've tried these additions:

    package org.broadinstitute.sting.queue.qscripts.examples

    import org.broadinstitute.sting.queue.QScript
    import org.broadinstitute.sting.queue.extensions.gatk._

    /**
     * An example building on the intro ExampleCountReads.scala.
     * Runs an INCOMPLETE variant calling pipeline with just the UnifiedGenotyper, VariantEval and optional VariantFiltration.
     * For a complete description of the suggested for a variant calling pipeline see the latest version of the Best Practice Variant Detection document
     */
    class ExampleHaplotypeCaller extends QScript {
      // Create an alias 'qscript' to be able to access variables
      // in the ExampleHaplotypeCaller.
      // 'qscript' is now the same as 'ExampleHaplotypeCaller.this'
      qscript =>

      // Required arguments. All initialized to empty values.

      @Input(doc="The reference file for the bam files.", shortName="R")
      var referenceFile: File = _ // _ is scala shorthand for null

      @Input(doc="Bam file to genotype.", shortName="I")
      var bamFile: File = _

      // The following arguments are all optional.

      @Input(doc="An optional file with a list of intervals to proccess.", shortName="L", required=false)
      var intervals: File = _

      @Argument(doc="A optional list of filter names.", shortName="filter", required=false)
      var filterNames: List[String] = Nil // Nil is an empty List, versus null which means a non-existent List.

      @Argument(doc="An optional list of filter expressions.", shortName="filterExpression", required=false)
      var filterExpressions: List[String] = Nil

      @Argument(doc="The minimum phred-scaled confidence threshold at which variants should be called", fullName="standard_min_confidence_threshold_for_emitting", shortName="stand_call_conf", required=false)
      var standCallConf: Int = _

      @Argument(doc="The minimum phred-scaled confidence threshold at which variants should be emitted", fullName="standard_min_confidence_threshold_for_calling", shortName="stand_emit_conf", required=false)
      var standEmitConf: Int = _

      @Argument(doc="Specifies how to determine the alternate alleles to use for genotyping (DISCOVERY|GENOTYPE_GIVEN_ALLELES)", fullName="genotyping_mode", shortName="gt_mode", required=false)
      var gtMode: List[String] = Nil

      // This trait allows us set the variables below in one place,
      // and then reuse this trait on each CommandLineGATK function below.
      trait UnifiedGenotyperArguments extends CommandLineGATK {
        this.reference_sequence = qscript.referenceFile
        this.intervals = if (qscript.intervals == null) Nil else List(qscript.intervals)
        this.standCallConf = Int(qscript.standEmitConf)
        this.standCallConf = Int(qscript.standCallConf)
        this.gtMode = List(qscript.gtMode)
        // Set the memory limit to 8 gigabytes on each command.
        this.memoryLimit = 8
      }

    Then I use the following command:

    java -Xmx12g -jar ~/tools/Queue-2.7-4-g6f46d11/Queue.jar -S ../../Queue-2.7-4-g6f46d11/resources/ExampleHaplotypeCaller.scala -R exampleFASTA.fasta -I exampleBAM.bam -stand_emit_conf 10 -stand_call_conf 30 -gt_mode DISCOVERY -jobRunner PbsEngine -startFromScratch -jobQueue batch -memLimit 4

    And get the following error:

    INFO 02:00:26,722 QScriptManager - Compiling 1 QScript
    DEBUG 02:00:26,723 QScriptManager - Compilation directory: /tmp/Q-Classes-2057664968471242235
    ERROR 02:00:27,856 QScriptManager - ExampleHaplotypeCaller.scala:86: value standard_min_confidence_threshold_for_emitting is not a member of ExampleHaplotypeCaller.this.UnifiedGenotyperArguments
    ERROR 02:00:27,859 QScriptManager - this.standard_min_confidence_threshold_for_emitting = qscript.standEmitConf
    ERROR 02:00:27,859 QScriptManager - ^
    ERROR 02:00:27,868 QScriptManager - ExampleHaplotypeCaller.scala:87: value standard_min_confidence_threshold_for_calling is not a member of ExampleHaplotypeCaller.this.UnifiedGenotyperArguments
    ERROR 02:00:27,870 QScriptManager - this.standard_min_confidence_threshold_for_calling = qscript.standCallConf
    ERROR 02:00:27,870 QScriptManager - ^
    ERROR 02:00:27,878 QScriptManager - ExampleHaplotypeCaller.scala:88: value genotypeing_mode is not a member of ExampleHaplotypeCaller.this.UnifiedGenotyperArguments
    ERROR 02:00:27,880 QScriptManager - this.genotypeing_mode = qscript.gtMode
    ERROR 02:00:27,880 QScriptManager - ^
    ERROR 02:00:28,227 QScriptManager - three errors found

    Any ideas?

    Thanks for any help.

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,643Administrator, GATK Developer admin

    Hi there, you seem to have errors in the argument names (e.g. genotypeing_mode is misspelled). Are you using an IDE to develop your script? A good IDE will enable you to look up available argument names easily and reduce the chance of making such errors.

    Geraldine Van der Auwera, PhD

  • wallysb01wallysb01 Posts: 3Member

    Ok, I fixed that spelling mistake and reran, but I get the same message. I am looking into Scala IDEs. Do you have a favorite?

    Also, do you know if this should work if I get the syntax right? I'm just trying to speed up HaplotypeCaller, as it's estimating a 17-day run time with just one 30x genome. For now I suppose 17 days isn't so bad, but I need to eventually add several more genomes.

    Thanks for the help

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,643Administrator, GATK Developer admin

    We use IntelliJ IDEA; it's convenient for developing Java and Scala together.

    The script looks fine overall, assuming it includes the def script() section (which you didn't include in what you posted).

    Have a look at the slides of the workshop we held earlier this week; you can find a link in the announcements section, and we'll have a full info page up later today. The presentations will walk you through understanding the important parts of a QScript and how to modify it to suit your needs.

    Geraldine Van der Auwera, PhD

  • wallysb01wallysb01 Posts: 3Member

    It does have the def script(). I also tried working off another example that was very similar, but already had some modifications for the HaplotypeCaller. I've attached the full thing thus far if you want to check the def script() part. I get the same errors with this script as well, so it's got to be something I'm missing.

    I'll take a look at those slides and keep checking back for the info page. Thanks for pointing those out.

    HaplotypeCaller.scala.txt
    6K