# MarkDuplicates Queue extension and intermediate files

Member, Dev

I was frustrated that the .metrics file from MarkDuplicates was being deleted as an intermediate file, so I set `isIntermediate = false` for that step in the DataProcessingPipeline. But now I'm tired of manually deleting the intermediate BAMs.

So my request is: could the metrics file field be changed from an @Output to an @Argument? It is on line 50 of org.broadinstitute.sting.queue.extensions.picard.MarkDuplicates.scala. In my local copy I also made it a required field, since the Picard tool requires it to run.
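Why the annotation matters can be illustrated with a toy model. The sketch below is a hypothetical, self-contained simplification, not Queue's actual classes or cleanup code: it assumes, as described in this thread, that Queue deletes every @Output-annotated file of a function marked `isIntermediate` and does not track @Argument-annotated files as outputs. The names `FieldTag`, `FileField`, `Step`, `IntermediateCleanup` and `MarkDupExample` are invented for illustration.

```scala
import java.io.File

// HYPOTHETICAL toy model, not Queue's real API. Each file-valued field of a
// step is tagged with the annotation it would carry in a Queue extension.
sealed trait FieldTag
case object OutputTag extends FieldTag
case object ArgumentTag extends FieldTag

final case class FileField(name: String, tag: FieldTag, file: File)
final case class Step(name: String, isIntermediate: Boolean, fields: Seq[FileField])

object IntermediateCleanup {
  // Assumed behavior: every @Output file of an intermediate step is deleted;
  // @Argument files are not tracked as outputs, so they are left alone.
  def filesToDelete(steps: Seq[Step]): Seq[File] =
    for {
      step  <- steps if step.isIntermediate
      field <- step.fields if field.tag == OutputTag
    } yield field.file
}

object MarkDupExample {
  val bam     = new File("sample.dedup.bam")
  val metrics = new File("sample.dedup.metrics")

  // As shipped: metrics is an @Output, so it is deleted along with the BAM.
  val asShipped = Step("MarkDuplicates", isIntermediate = true,
    Seq(FileField("output", OutputTag, bam), FileField("metrics", OutputTag, metrics)))

  // Requested change: metrics becomes an @Argument and survives cleanup.
  val requested = asShipped.copy(fields =
    Seq(FileField("output", OutputTag, bam), FileField("metrics", ArgumentTag, metrics)))
}
```

In this model, clearing the flag on the whole step (`asShipped.copy(isIntermediate = false)`) keeps both files, which is exactly the trade-off the thread is about: all or nothing at the step level.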

A similar but opposite problem is that the .bai file from the IndelRealigner step is not deleted. Fixing that looks like it would require either special handling for that walker in Queue or making the index file an argument to the Java walker, and neither is a particularly appealing solution.
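One way to picture the "special handling" option is a cleanup pass that also lists the index files a tool writes implicitly next to its declared BAM outputs. This is a hypothetical helper, not part of Queue, and it rests on an assumption: that the index is written alongside the BAM with the ".bam" suffix swapped for ".bai" (sample.bam to sample.bai). Verify the naming your tool version actually uses before relying on it.

```scala
import java.io.File

// HYPOTHETICAL helper, not part of Queue: given the files an intermediate
// step declares as outputs, also list the BAM index files written implicitly,
// so that a cleanup pass could remove them too.
object ImplicitIndex {
  // ASSUMPTION: the index sits in the same directory as the BAM and is named
  // by replacing the ".bam" suffix with ".bai". Non-BAM files get no index.
  def indexFor(bam: File): Option[File] = {
    val name = bam.getName
    if (name.endsWith(".bam"))
      Some(new File(bam.getParentFile, name.stripSuffix(".bam") + ".bai"))
    else None
  }

  // Each declared output, followed by its implied index (if any).
  def withIndexes(declaredOutputs: Seq[File]): Seq[File] =
    declaredOutputs.flatMap(f => f :: indexFor(f).toList)
}
```

`new File(parent, child)` accepts a null parent, so a bare filename such as `sample.bam` maps to `sample.bai` in the current directory.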


• Charlestown, MA · Member

The data processing pipeline is really only intended as a reference for users writing their own pipelines. It was written many years ago, is not maintained, and does not necessarily reflect our best practices. That being said, you are more than welcome to make changes and use it however you want. In particular, I think the changes you suggest here are very sensible.

• Member, Dev

Thanks, but the problem is that the changes I'm talking about are in the Queue Picard extensions, not in DPP itself. I can work around it (my copy of DPP now includes a `MyMarkDuplicates` class), but I thought this particular change, besides being more convenient for me, would also help other people trying to use this extension.

• Charlestown, MA · Member

You're right about the metrics file being required; I've changed it in MarkDuplicates.scala.

But why do you want to switch from @Output to @Argument?

If you don't want the metrics file to be deleted, just add `isIntermediate = false` to the MarkDuplicates class.

• Member, Dev

The problem is that I only want certain outputs to be intermediate. In this specific case of DPP, I'm running clean-dedup-recal in that order. I want the bams that MarkDuplicates produces to be intermediate, but I want the metrics file to be permanent. With the system as it is now, I can either clear isIntermediate and delete the intermediate bams manually, or set isIntermediate and lose the duplication metrics.

So making the metrics file an @Argument is really just a hack to let it persist even when MarkDuplicates has isIntermediate set. Without a more granular intermediate system, it's the only solution I could come up with (although maybe the correct approach is to add an extra field to the @Output metadata that could override intermediate-ness). In the bigger picture, I think most use cases for that file involve manual review (as opposed to being consumed by another step in a Queue pipeline), though I could certainly be wrong about that. But even if I am, this file is much smaller than most of the .out files, so it probably won't hurt anything to have it persist.
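The per-output override floated above can be sketched as a small model. This is hypothetical: Queue's @Output annotation is not known to have such a field, and `OutputSpec`, `keepEvenIfIntermediate`, `PipelineStep`, `GranularCleanup` and `CleanDedupRecal` are invented names. The idea is that each declared output carries its own flag, and an output of an intermediate step is deleted only if it has not opted out.

```scala
import java.io.File

// HYPOTHETICAL sketch of an "override intermediate-ness" flag on each output.
// Neither the flag nor these classes exist in Queue; names are illustrative.
final case class OutputSpec(file: File, keepEvenIfIntermediate: Boolean = false)
final case class PipelineStep(name: String, isIntermediate: Boolean, outputs: Seq[OutputSpec])

object GranularCleanup {
  // Delete an output only if its step is intermediate AND the output itself
  // has not opted out of intermediate handling.
  def filesToDelete(steps: Seq[PipelineStep]): Seq[File] =
    for {
      step <- steps if step.isIntermediate
      out  <- step.outputs if !out.keepEvenIfIntermediate
    } yield out.file
}

// The clean-dedup-recal ordering from the thread, with made-up file names.
object CleanDedupRecal {
  val cleanBam = new File("sample.clean.bam")
  val dedupBam = new File("sample.clean.dedup.bam")
  val metrics  = new File("sample.clean.dedup.metrics")
  val recalBam = new File("sample.clean.dedup.recal.bam")

  val steps = Seq(
    PipelineStep("clean", isIntermediate = true, Seq(OutputSpec(cleanBam))),
    PipelineStep("dedup", isIntermediate = true,
      Seq(OutputSpec(dedupBam), OutputSpec(metrics, keepEvenIfIntermediate = true))),
    PipelineStep("recal", isIntermediate = false, Seq(OutputSpec(recalBam))))
}
```

Here the intermediate BAMs are removed while the duplication metrics and the final recalibrated BAM are kept, without demoting the metrics file from an output to an argument.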