Qscript of Picard tools

Hi,
So I've finally taken the plunge and migrated our analysis pipeline to Queue. With some great feedback from @johandahlberg, I have gotten to a state where most of the stuff is running smoothly on the cluster.
I'm trying to add Picard's CalculateHsMetrics to the pipeline, but am having some issues. This code:
case class hsmetrics(inBam: File, baitIntervals: File, targetIntervals: File, outMetrics: File)
    extends CalculateHsMetrics with ExternalCommonArgs with SingleCoreJob with OneDayJob {
  @Input(doc="Input BAM file") val bam: File = inBam
  @Output(doc="Metrics file") val metrics: File = outMetrics
  this.input :+= bam
  this.targets = targetIntervals
  this.baits = baitIntervals
  this.output = metrics
  this.reference = refGenome
  this.isIntermediate = false
}
Gives the following error message:
ERROR 06:56:25,047 QGraph - Missing 2 values for function: 'java' '-Xmx2048m' '-XX:+UseParallelOldGC' '-XX:ParallelGCThreads=4' '-XX:GCTimeLimit=50' '-XX:GCHeapFreeLimit=10' '-Djava.io.tmpdir=/Users/dankle/IdeaProjects/eclipse/AutoSeq/.queue/tmp' null 'INPUT=/Users/dankle/tmp/autoseqscala/exampleIND2/exampleIND2.panel.bam' 'TMP_DIR=/Users/dankle/IdeaProjects/eclipse/AutoSeq/.queue/tmp' 'VALIDATION_STRINGENCY=SILENT' 'OUTPUT=/Users/dankle/tmp/autoseqscala/exampleIND2/exampleIND2.panel.preMarkDupsHsMetrics.metrics' 'BAIT_INTERVALS=/Users/dankle/IdeaProjects/eclipse/AutoSeq/resources/exampleINTERVAL.intervals' 'TARGET_INTERVALS=/Users/dankle/IdeaProjects/eclipse/AutoSeq/resources/exampleINTERVAL.intervals' 'REFERENCE_SEQUENCE=/Users/dankle/IdeaProjects/eclipse/AutoSeq/resources/bwaindex0.6/exampleFASTA.fasta' 'METRIC_ACCUMULATION_LEVEL=SAMPLE'
ERROR 06:56:25,048 QGraph - @Argument: jarFile - jar
ERROR 06:56:25,049 QGraph - @Argument: javaMainClass - Main class to run from javaClasspath
And yeah, it seems that the jar file is currently set to null in the command line. However, MarkDuplicates runs fine without setting the jar:
case class dedup(inBam: File, outBam: File, metricsFile: File)
    extends MarkDuplicates with ExternalCommonArgs with SingleCoreJob with OneDayJob {
  @Input(doc = "Input bam file") var inbam = inBam
  @Output(doc = "Output BAM file with dups removed") var outbam = outBam
  this.REMOVE_DUPLICATES = true
  this.input :+= inBam
  this.output = outBam
  this.metrics = metricsFile
  this.memoryLimit = 3
  this.isIntermediate = false
}
Why does CalculateHsMetrics need the jar, but not MarkDuplicates? Both are imported with:
import org.broadinstitute.sting.queue.extensions.picard._
Best Answer
pdexheimer
Hmm, that's interesting. It looks like the Queue extension for CalculateHsMetrics is missing a line. Try adding
this.javaMainClass = "net.sf.picard.analysis.directed.CalculateHsMetrics"
into your hsmetrics case class.
Answers
Huh, so it is indeed missing the class line. I'll patch that in the codebase; in the meantime @pdexheimer's suggestion should get your script up and working. (Thanks Phil!)
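A note on why the command line contained null: the error output lists jarFile and javaMainClass as the two missing values, which suggests that a Java job needs at least one of them before a launch command can be built. The sketch below is a toy model of that reading only. JavaJob, missingValues and launchArgs are hypothetical names for illustration, not Queue's actual classes or source.

```scala
// Simplified, hypothetical model of a Java-based job; NOT Queue's real API.
object JavaJobSketch {
  final case class JavaJob(
      jarFile: Option[String] = None,       // path to a tool jar, if one is set
      javaMainClass: Option[String] = None  // fully qualified main class, if one is set
  )

  /** Names of the launch settings still missing; empty when the job can start. */
  def missingValues(job: JavaJob): Seq[String] =
    if (job.jarFile.isEmpty && job.javaMainClass.isEmpty) Seq("jarFile", "javaMainClass")
    else Seq.empty

  /** The launch part of the command line, or None when neither value is set. */
  def launchArgs(job: JavaJob): Option[Seq[String]] =
    job.jarFile.map(jar => Seq("-jar", jar))
      .orElse(job.javaMainClass.map(cls => Seq(cls)))
}
```

In this model, a job with neither value set reports both as missing and has no launch arguments, which mirrors the null seen in the log. Setting javaMainClass to "net.sf.picard.analysis.directed.CalculateHsMetrics", as suggested above, is enough to make launchArgs return a value.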
Hmm… I'm now getting this:
Any ideas?
Can you try running the Picard command (using the same inputs, parameters etc.) directly from the command line? That would show whether the Picard tool itself is failing or Queue is misbehaving.
Got the same error. (sigh) It seems this is on Picard. Thanks.
For future reference: The latter error happens when
METRIC_ACCUMULATION_LEVEL=SAMPLE
is set but no read groups are present in the BAM file. If the metric accumulation level is unset, or read groups are added, it runs fine. Sorry about the non-GATK-related part of this post, and thanks for the help identifying the initial issue.
Ah, thanks for reporting your solution. Feel free to tell the Picard devs they need to add more graceful handling for that error case.
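One way to guard against the read-group condition described above is to inspect the BAM header before requesting sample-level accumulation. This is a minimal sketch, assuming the header text has already been extracted into lines (for example with samtools view -H). ReadGroupCheck, hasReadGroups and accumulationLevel are hypothetical names, not part of Queue or Picard.

```scala
// Hypothetical pre-flight check; not part of Queue or Picard.
object ReadGroupCheck {
  /** True if any SAM header line is a read-group record (starts with "@RG" and a tab). */
  def hasReadGroups(headerLines: Seq[String]): Boolean =
    headerLines.exists(_.startsWith("@RG\t"))

  /** Request SAMPLE-level accumulation only when read groups exist; otherwise leave it unset. */
  def accumulationLevel(headerLines: Seq[String]): Option[String] =
    if (hasReadGroups(headerLines)) Some("SAMPLE") else None
}
```

A result of None here corresponds to leaving the metric accumulation level unset, which is one of the two workarounds reported in this thread. The other is to add read groups to the BAM file.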
Update: we reported the issue to the Picard team, and they have developed a fix to handle this error case. Now, for any reads that are missing a read group, there will be a row at whatever level of accumulation is requested, with "unknown" in the appropriate columns.
Great work, both teams!