Qscript of Picard tools

dklevebring (Posts: 53, Member)
edited November 2013 in Ask the team

Hi,

So I've finally taken the plunge and migrated our analysis pipeline to Queue. With some great feedback from @johandahlberg, I have gotten to a state where most of the stuff is running smoothly on the cluster.

I'm trying to add Picard's CalculateHsMetrics to the pipeline, but am having some issues. This code:

case class hsmetrics(inBam: File, baitIntervals: File, targetIntervals: File, outMetrics: File) extends CalculateHsMetrics with ExternalCommonArgs with SingleCoreJob with OneDayJob {
    @Input(doc="Input BAM file") val bam: File = inBam
    @Output(doc="Metrics file") val metrics: File = outMetrics
    this.input :+= bam
    this.targets = targetIntervals
    this.baits = baitIntervals
    this.output = metrics
    this.reference =  refGenome
    this.isIntermediate = false
}

Gives the following error message:

ERROR 06:56:25,047 QGraph - Missing 2 values for function:  'java'  '-Xmx2048m'  '-XX:+UseParallelOldGC'  '-XX:ParallelGCThreads=4'  '-XX:GCTimeLimit=50'  '-XX:GCHeapFreeLimit=10'  '-Djava.io.tmpdir=/Users/dankle/IdeaProjects/eclipse/AutoSeq/.queue/tmp' null 'INPUT=/Users/dankle/tmp/autoseqscala/exampleIND2/exampleIND2.panel.bam'  'TMP_DIR=/Users/dankle/IdeaProjects/eclipse/AutoSeq/.queue/tmp'  'VALIDATION_STRINGENCY=SILENT'  'OUTPUT=/Users/dankle/tmp/autoseqscala/exampleIND2/exampleIND2.panel.preMarkDupsHsMetrics.metrics'  'BAIT_INTERVALS=/Users/dankle/IdeaProjects/eclipse/AutoSeq/resources/exampleINTERVAL.intervals'  'TARGET_INTERVALS=/Users/dankle/IdeaProjects/eclipse/AutoSeq/resources/exampleINTERVAL.intervals'  'REFERENCE_SEQUENCE=/Users/dankle/IdeaProjects/eclipse/AutoSeq/resources/bwaindex0.6/exampleFASTA.fasta'  'METRIC_ACCUMULATION_LEVEL=SAMPLE'  
ERROR 06:56:25,048 QGraph -   @Argument: jarFile - jar 
ERROR 06:56:25,049 QGraph -   @Argument: javaMainClass - Main class to run from javaClasspath 

And yeah, it seems that the jar file is currently set to null in the command line. However, MarkDuplicates runs fine without setting the jar:

case class dedup(inBam: File, outBam: File, metricsFile: File) extends MarkDuplicates with ExternalCommonArgs with SingleCoreJob with OneDayJob {
    @Input(doc = "Input bam file") var inbam = inBam
    @Output(doc = "Output BAM file with dups removed") var outbam = outBam
    this.REMOVE_DUPLICATES = true
    this.input :+= inBam
    this.output = outBam
    this.metrics = metricsFile
    this.memoryLimit = 3
    this.isIntermediate = false
}

Why does CalculateHsMetrics need the jar, but not MarkDuplicates? Both are imported with import org.broadinstitute.sting.queue.extensions.picard._.

Answers

  • Geraldine_VdAuwera (Posts: 5,285; Administrator, GSA Member)

    Huh, so it is indeed missing the class line. I'll patch that in the codebase; in the meantime @pdexheimer's suggestion should work to get your script up and working. (thanks Phil!)

    Geraldine Van der Auwera, PhD
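
The suggestion referred to above is not reproduced in this thread. As a hedged sketch only: the QGraph error lists jarFile and javaMainClass as the missing arguments, so one plausible workaround is to set javaMainClass by hand inside the case class from the question. The class name is taken from the Picard log that appears further down in this thread. Whether this matches the suggested fix is an assumption, and the fragment relies on the traits and refGenome defined elsewhere in the asker's Qscript, so it cannot run on its own.

```scala
// Fragment for use inside the asker's QScript; ExternalCommonArgs, SingleCoreJob,
// OneDayJob and refGenome are defined elsewhere in that script.
case class hsmetrics(inBam: File, baitIntervals: File, targetIntervals: File, outMetrics: File)
    extends CalculateHsMetrics with ExternalCommonArgs with SingleCoreJob with OneDayJob {
  // Assumption: naming the Picard main class explicitly fills the
  // javaMainClass argument that QGraph reports as missing.
  this.javaMainClass = "net.sf.picard.analysis.directed.CalculateHsMetrics"
  this.input :+= inBam
  this.targets = targetIntervals
  this.baits = baitIntervals
  this.output = outMetrics
  this.reference = refGenome
  this.isIntermediate = false
}
```

Pointing jarFile at a CalculateHsMetrics jar instead would presumably satisfy the same check, since the error names both arguments.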

  • dklevebring (Posts: 53, Member)

    Hmm… I'm now getting this:

    [Wed Nov 20 16:37:13 CET 2013] net.sf.picard.analysis.directed.CalculateHsMetrics BAIT_INTERVALS=/Users/dankle/IdeaProjects/eclipse/AutoSeq/resources/exampleINTERVAL.intervals TARGET_INTERVALS=/Users/dankle/IdeaProjects/eclipse/AutoSeq/resources/exampleINTERVAL.intervals INPUT=/Users/dankle/tmp/autoseqscala/exampleIND2/exampleIND2.panel.bam OUTPUT=/Users/dankle/tmp/autoseqscala/exampleIND2/exampleIND2.panel.hsMetricsPreMarkDups.metrics METRIC_ACCUMULATION_LEVEL=[SAMPLE, ALL_READS] REFERENCE_SEQUENCE=/Users/dankle/IdeaProjects/eclipse/AutoSeq/resources/bwaindex0.6/exampleFASTA.fasta TMP_DIR=[/Users/dankle/IdeaProjects/eclipse/AutoSeq/.queue/tmp] VALIDATION_STRINGENCY=SILENT    VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
    [Wed Nov 20 16:37:13 CET 2013] Executing as dankle@LM0004MEB.local on Mac OS X 10.8.5 x86_64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_40-b43; Picard version: 1.96(1534)
    [Wed Nov 20 16:37:13 CET 2013] net.sf.picard.analysis.directed.CalculateHsMetrics done. Elapsed time: 0,00 minutes.
    Runtime.totalMemory()=128974848
    To get help, see http://picard.sourceforge.net/index.shtml#GettingHelp
    Exception in thread "main" java.lang.NullPointerException
    at net.sf.picard.metrics.MultiLevelCollector$Distributor.acceptRecord(MultiLevelCollector.java:146)
    at net.sf.picard.metrics.MultiLevelCollector.acceptRecord(MultiLevelCollector.java:277)
    at net.sf.picard.analysis.directed.CollectTargetedMetrics.doWork(CollectTargetedMetrics.java:123)
    at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
    at net.sf.picard.analysis.directed.CalculateHsMetrics.main(CalculateHsMetrics.java:74)
    

    Any ideas?

  • Geraldine_VdAuwera (Posts: 5,285; Administrator, GSA Member)

    Can you try running the Picard command (using the same inputs, parameters, etc.) directly from the command line? This is to check whether the Picard tool itself is failing or Queue is misbehaving.

    Geraldine Van der Auwera, PhD

  • dklevebring (Posts: 53, Member)

    Got the same error. *sigh* It seems this is on Picard. Thanks.

  • dklevebring (Posts: 53, Member)

    For future reference: The latter error happens when METRIC_ACCUMULATION_LEVEL=SAMPLE is set but no read groups are present in the BAM file. If the metric accumulation level is unset, or read groups are added, it runs fine. Sorry about the non-GATK-related part of this post, and thanks for the help identifying the initial issue.
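
A quick way to spot this situation before running the metrics is to look for @RG lines in the BAM header. The following is a minimal sketch, not part of Queue or Picard: the object name ReadGroupCheck and its helper are hypothetical, and it only inspects header text (for example piped in from samtools view -H, assuming samtools is installed). It does not check whether individual reads carry an RG tag.

```scala
object ReadGroupCheck {
  /** Extracts the ID field of every @RG line in SAM header text. */
  def readGroupIds(headerText: String): Seq[String] =
    headerText.split("\n").toSeq
      .filter(_.startsWith("@RG"))
      .flatMap { line =>
        // Header fields are tab-separated TAG:VALUE pairs after the record type.
        line.split("\t").drop(1).find(_.startsWith("ID:")).map(_.stripPrefix("ID:"))
      }

  def main(args: Array[String]): Unit = {
    // Reads a SAM header from standard input.
    val header = scala.io.Source.stdin.getLines().mkString("\n")
    val ids = readGroupIds(header)
    if (ids.isEmpty) println("No @RG lines found")
    else println("Read groups: " + ids.mkString(", "))
  }
}
```

If no read groups are reported, either leave METRIC_ACCUMULATION_LEVEL unset or add read groups to the BAM first, as described above.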

  • Geraldine_VdAuwera (Posts: 5,285; Administrator, GSA Member)

    Ah, thanks for reporting your solution. Feel free to tell the Picard devs they need to add more graceful handling for that error case.

    Geraldine Van der Auwera, PhD

  • Geraldine_VdAuwera (Posts: 5,285; Administrator, GSA Member)

    Update: we reported the issue to the Picard team, and they have developed a fix to handle this error case. Now, for any reads that are missing a read group, there will be a row at whatever level of accumulation is requested, with "unknown" in the appropriate columns.

    Geraldine Van der Auwera, PhD
