# Picard tools MarkDuplicates using CRAM format: how to pass a valid CRAM reference?

Member Posts: 9

Hello there!

I am trying to use Picard tools to mark duplicates in a CRAM file; however, I could not find any documentation that addresses this problem. How can I pass a valid CRAM reference?

-lili

[Sat Feb 06 16:09:42 CST 2016] picard.sam.markduplicates.MarkDuplicatesWithMateCigar MINIMUM_DISTANCE=250 INPUT=[/EXOME/gatk/test_10990_bwa_srtd.cram] OUTPUT=EXOME/gatk/test_10990_wes_dupMC.cram METRICS_FILE=/EXOME/gatk/test_10990_wes_dupMC_metrics.txt OPTICAL_DUPLICATE_PIXEL_DISTANCE=2500 CREATE_INDEX=true SKIP_PAIRS_WITH_NO_MATE_CIGAR=true BLOCK_SIZE=100000 REMOVE_DUPLICATES=false ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=TOTAL_MAPPED_REFERENCE_LENGTH PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicatesWithMateCigar READ_NAME_REGEX= VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
[Sat Feb 06 16:09:42 CST 2016] Executing as antunes@gpu10 on Linux 2.6.32-431.29.2.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_40-b26; Picard version: 2.1.0(25ebc07f7fbaa7c1a4a8e6c130c88c1d10681802_1454776546) IntelDeflater
[Sat Feb 06 16:09:42 CST 2016] picard.sam.markduplicates.MarkDuplicatesWithMateCigar done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=4116185088
Exception in thread "main" java.lang.IllegalStateException: A valid CRAM reference was not supplied and one cannot be acquired via the property settings reference_fasta or use_cram_ref_download
at htsjdk.samtools.cram.ref.ReferenceSource.getDefaultCRAMReferenceSource(ReferenceSource.java:98)
at htsjdk.samtools.SamReaderFactory$SamReaderFactoryImpl.open(SamReaderFactory.java:269)
at picard.sam.markduplicates.util.AbstractMarkDuplicatesCommandLineProgram.openInputs(AbstractMarkDuplicatesCommandLineProgram.java:205)
at picard.sam.markduplicates.MarkDuplicatesWithMateCigar.doWork(MarkDuplicatesWithMateCigar.java:118)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:209)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)

-lili

#### Issue · Github

February 2016 by Sheila
Issue Number 559
State closed
Closed By vdauwera

## Answers

• Broad Institute Member, Broadie, Moderator, Dev Posts: 4,443 admin

@la_br2016

Hi,

I have to check with the team. We will get back to you shortly.

-Sheila

• Member Posts: 9

Hi Sheila,

Were you able to find anything about this issue?

Thanks,

-lili

• Broad Institute Member, Broadie, Moderator, Dev Posts: 82 admin
edited February 2016

Hi Lili:

I too obtained the same result using MarkDuplicates with a CRAM file. However, the tool works fine with an equivalent BAM file. Thus, I would tentatively conclude that support for CRAM in Picard is not ubiquitous. An easy workaround would be to convert your CRAM to a BAM file using the "bam" command in cramtools. Let us know if this works.

• Member Posts: 9
edited February 2016

Hi @dekling

This is exactly what I have been doing, and it works fine. I was hoping to get this pipeline to stop using BAM files altogether (to save space and to make it easier to transfer files between servers), but Picard MarkDuplicates is the only exception for now. Any thoughts on whether Picard tools will have CRAM support any time soon?

Thanks for your feedback.

-lili

• Administrator, Dev Posts: 11,118 admin

Hi @la_br2016,

Picard tools do support CRAM.

The underlying issue here is that CRAM, as a format, is entirely reference-based: only information that differs from the reference is stored in the file, so the CRAM's content is useless without a reference. I think that you should be able to provide the reference to MarkDuplicates using the REFERENCE_SEQUENCE argument, which according to the Picard docs is applicable to all Picard tools. Let us know if that doesn't work.

Geraldine Van der Auwera, PhD

• Member Posts: 9

Yes! I did run it passing the argument R=${REF}, but it didn't recognize that argument and complained about it; I've also tried REFERENCE_SEQUENCE = ${REF}, and it gave the same error. (I don't have the error message file handy now, but I can post it later.)

I see that some Picard tools accept a reference option (R=<fasta>); however, the documentation for MarkDuplicatesWithMateCigar (http://broadinstitute.github.io/picard/command-line-overview.html#MarkDuplicatesWithMateCigar) does not list an R= option.

...so that was my understanding of the CRAM format: the reference file has to be provided. I knew about GATK's CRAM support, but I was not sure how far CRAM support extends in Picard tools, since MarkDuplicates didn't handle it well. That was the reason for this question: "How to pass a valid CRAM reference".

Thanks,

-lili
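
For anyone reading along: the exception above names two property settings, reference_fasta and use_cram_ref_download. A minimal sketch of supplying the first one follows, assuming htsjdk reads it from the JVM system property `samjdk.reference_fasta`. Nothing in this thread confirms that it resolves the error with Picard 2.1.0, and whether writing CRAM output works with this setting alone is also untested here. `PICARD_JAR` and `REF` are placeholder paths; the input, output, and metrics paths are copied from the log in the first post. The script only prints the command so that it can be reviewed before running.

```shell
#!/usr/bin/env bash
# Sketch: hand htsjdk the CRAM reference through a JVM system property.
# ASSUMPTION: the "reference_fasta" setting named in the exception is read
# from the system property samjdk.reference_fasta.
# PICARD_JAR and REF are placeholders, not paths from the original post.

PICARD_JAR="${PICARD_JAR:-picard.jar}"
REF="${REF:-reference.fasta}"

build_markdup_cmd() {
  # Prints one command line; JVM -D options must come before -jar.
  in_cram="$1"
  out_cram="$2"
  metrics="$3"
  printf '%s ' java "-Dsamjdk.reference_fasta=${REF}" -jar "${PICARD_JAR}" \
    MarkDuplicatesWithMateCigar \
    "INPUT=${in_cram}" "OUTPUT=${out_cram}" "METRICS_FILE=${metrics}"
  printf '\n'
}

# Paths below are taken from the log in the first post.
cmd="$(build_markdup_cmd /EXOME/gatk/test_10990_bwa_srtd.cram \
                         EXOME/gatk/test_10990_wes_dupMC.cram \
                         /EXOME/gatk/test_10990_wes_dupMC_metrics.txt)"

# Dry run: show the command instead of executing it.
echo "${cmd}"
```

If the printed line looks right for your setup, paste it into the shell to run it. If the same exception still appears, the property name assumed here is the first thing to check against your htsjdk version.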


• Member Posts: 9

That will be super!

Thanks a lot @Geraldine_VdAuwera

-lili

• Cambridge, UK Member Posts: 5

Hello there,
I wonder if there is any way around this issue, as I am having the same problem with MarkDuplicates. I have tried adding the reference, but it complains: "A valid CRAM reference was not supplied and one cannot be acquired via the property settings reference_fasta or use_cram_ref_download".
Many thanks
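
The only workaround reported as working earlier in this thread is converting the CRAM to BAM before marking duplicates. A rough sketch of that round trip follows. Note the swap: the thread used the "bam" command in cramtools, while this sketch uses `samtools view` for the conversion instead. `REF`, `PICARD_JAR`, and the output file names are placeholders chosen for illustration. The script prints the planned steps rather than executing them.

```shell
#!/usr/bin/env bash
# Sketch of the CRAM -> BAM -> MarkDuplicates workaround from this thread.
# ASSUMPTION: samtools does the conversion (the thread used cramtools).
# REF, PICARD_JAR and the derived file names are placeholders.

REF="${REF:-reference.fasta}"
PICARD_JAR="${PICARD_JAR:-picard.jar}"

plan_workaround() {
  prefix="$1"
  # 1. Decode the CRAM against its reference into a temporary BAM.
  echo "samtools view -b -T ${REF} -o ${prefix}.bam ${prefix}.cram"
  # 2. Mark duplicates on the BAM, which carries full reads and needs no reference.
  echo "java -jar ${PICARD_JAR} MarkDuplicatesWithMateCigar INPUT=${prefix}.bam OUTPUT=${prefix}_dupMC.bam METRICS_FILE=${prefix}_dupMC_metrics.txt"
  # 3. Re-encode the result as CRAM to recover the space savings.
  echo "samtools view -C -T ${REF} -o ${prefix}_dupMC.cram ${prefix}_dupMC.bam"
  # 4. Remove the intermediate BAM files.
  echo "rm ${prefix}.bam ${prefix}_dupMC.bam"
}

# The sample prefix is taken from the file name in the first post.
steps="$(plan_workaround test_10990_bwa_srtd)"
echo "${steps}"
```

The cost is temporary disk space for two BAM files per sample, which is the overhead the original poster was hoping to avoid.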