Picard tools MarkDuplicates using CRAM format: how to pass a valid CRAM reference?

la_br2016 Member Posts: 9

Hello there!

I am trying to use Picard tools to mark duplicates in a CRAM file, but I could not find any documentation that addresses this problem. How can I pass a valid CRAM reference?

Thanks in advance,

-lili

[Sat Feb 06 16:09:42 CST 2016] picard.sam.markduplicates.MarkDuplicatesWithMateCigar MINIMUM_DISTANCE=250 INPUT=[/EXOME/gatk/test_10990_bwa_srtd.cram] OUTPUT=EXOME/gatk/test_10990_wes_dupMC.cram METRICS_FILE=/EXOME/gatk/test_10990_wes_dupMC_metrics.txt OPTICAL_DUPLICATE_PIXEL_DISTANCE=2500 CREATE_INDEX=true SKIP_PAIRS_WITH_NO_MATE_CIGAR=true BLOCK_SIZE=100000 REMOVE_DUPLICATES=false ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=TOTAL_MAPPED_REFERENCE_LENGTH PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicatesWithMateCigar READ_NAME_REGEX= VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
[Sat Feb 06 16:09:42 CST 2016] Executing as antunes@gpu10 on Linux 2.6.32-431.29.2.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_40-b26; Picard version: 2.1.0(25ebc07f7fbaa7c1a4a8e6c130c88c1d10681802_1454776546) IntelDeflater
[Sat Feb 06 16:09:42 CST 2016] picard.sam.markduplicates.MarkDuplicatesWithMateCigar done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=4116185088
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.IllegalStateException: A valid CRAM reference was not supplied and one cannot be acquired via the property settings reference_fasta or use_cram_ref_download
at htsjdk.samtools.cram.ref.ReferenceSource.getDefaultCRAMReferenceSource(ReferenceSource.java:98)
at htsjdk.samtools.SamReaderFactory$SamReaderFactoryImpl.open(SamReaderFactory.java:269)
at picard.sam.markduplicates.util.AbstractMarkDuplicatesCommandLineProgram.openInputs(AbstractMarkDuplicatesCommandLineProgram.java:205)
at picard.sam.markduplicates.MarkDuplicatesWithMateCigar.doWork(MarkDuplicatesWithMateCigar.java:118)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:209)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)

-lili
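
The exception above names two property settings, reference_fasta and use_cram_ref_download. Below is a minimal sketch of one way to supply the first. It assumes the setting is exposed as the JVM system property samjdk.reference_fasta (the samjdk. prefix is htsjdk's usual convention and is an assumption here; this thread does not confirm it). The jar location and file paths are hypothetical placeholders, and the helper only builds the command, it does not run it.

```python
from typing import List


def picard_cmd_with_reference_property(
    picard_jar: str, tool: str, reference_fasta: str, tool_args: List[str]
) -> List[str]:
    """Build (but do not run) a Picard command that sets a reference property."""
    return [
        "java",
        # Assumed property name; JVM options must precede -jar to reach the JVM.
        f"-Dsamjdk.reference_fasta={reference_fasta}",
        "-jar",
        picard_jar,
        tool,
        *tool_args,
    ]


cmd = picard_cmd_with_reference_property(
    "picard.jar",                 # hypothetical jar location
    "MarkDuplicatesWithMateCigar",
    "/path/to/reference.fasta",   # hypothetical reference path
    ["INPUT=in.cram", "OUTPUT=out.cram", "METRICS_FILE=metrics.txt"],
)
print(" ".join(cmd))
```

The reference given here should be the same FASTA that was used when the CRAM was written, since the records are decoded against it.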

Issue · GitHub
by Sheila

Issue Number: 559
State: closed
Closed By: vdauwera

Answers

  • Sheila Broad Institute Member, Broadie, Moderator, Dev Posts: 4,579 admin

    @la_br2016
    Hi,

    I have to check with the team. We will get back to you shortly.

    -Sheila

  • la_br2016 Member Posts: 9

    Hi Sheila, were you able to find something about this issue?

    thanks,

    -lili

  • dekling Broad Institute Member Posts: 82 admin
    edited February 2016

    Hi Lili: I got the same result using MarkDuplicates with a CRAM file. However, the tool works fine with an equivalent BAM file, so I would tentatively conclude that CRAM support in Picard is not yet universal. An easy workaround would be to convert your CRAM to a BAM file using the "bam" command in cramtools. Let us know if this works.
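
The conversion workaround can be sketched as follows. Note that this swaps in samtools view in place of the cramtools "bam" command mentioned above, because its flags (-b for BAM output, -T for the reference FASTA, -o for the output file) are well documented. All file names are hypothetical placeholders, and the helper only assembles the command.

```python
from typing import List


def cram_to_bam_cmd(cram_path: str, reference_fasta: str, bam_path: str) -> List[str]:
    """Build (but do not run) a samtools command that decodes a CRAM into a BAM."""
    return [
        "samtools", "view",
        "-b",                    # write BAM output
        "-T", reference_fasta,   # reference FASTA needed to decode the CRAM
        "-o", bam_path,          # output file
        cram_path,               # input CRAM
    ]


convert = cram_to_bam_cmd(
    "sample.cram",               # hypothetical input
    "/path/to/reference.fasta",  # hypothetical reference
    "sample.bam",                # hypothetical output
)
print(" ".join(convert))
```

The resulting list can be handed to subprocess.run(convert, check=True) on a machine where samtools is installed.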

  • la_br2016 Member Posts: 9
    edited February 2016

    Hi @dekling

    This is exactly what I have been doing and it works fine.

    I was hoping to stop using BAM files in this pipeline altogether (CRAM saves space and makes it easier to transfer files between servers), but Picard MarkDuplicates is the only exception for now.

    Any thoughts on whether Picard tools will have CRAM support any time soon?

    Thanks for your feedback.

    -lili

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie Posts: 11,371 admin

    Hi @la_br2016,

    Picard tools do support CRAM. The underlying issue here is that CRAM, as a format, is entirely reference-based: only information that differs from the reference is stored in the file, so the CRAM's content is useless without a reference.

    I think that you should be able to provide the reference to MarkDuplicates using the REFERENCE_SEQUENCE argument, which according to the Picard docs is applicable to all Picard tools.

    Let us know if that doesn't work.

    Geraldine Van der Auwera, PhD
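
As an illustration of the suggestion above, here is a minimal sketch of a MarkDuplicates invocation that passes REFERENCE_SEQUENCE alongside the INPUT, OUTPUT and METRICS_FILE arguments seen in the log earlier in the thread. Whether a given Picard version accepts REFERENCE_SEQUENCE for this tool is not confirmed by this thread. The jar location and file paths are hypothetical, and the helper builds the command without running it.

```python
from typing import List


def mark_duplicates_cmd(
    picard_jar: str,
    input_path: str,
    output_path: str,
    metrics_path: str,
    reference_fasta: str,
) -> List[str]:
    """Build (but do not run) a MarkDuplicates command with a reference."""
    return [
        "java", "-jar", picard_jar, "MarkDuplicates",
        f"INPUT={input_path}",
        f"OUTPUT={output_path}",
        f"METRICS_FILE={metrics_path}",
        # Each KEY=value pair is a single argument with no surrounding spaces.
        f"REFERENCE_SEQUENCE={reference_fasta}",
    ]


markdup = mark_duplicates_cmd(
    "picard.jar",                # hypothetical jar location
    "in.cram",
    "out.cram",
    "metrics.txt",
    "/path/to/reference.fasta",  # hypothetical reference path
)
print(" ".join(markdup))
```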

  • la_br2016 Member Posts: 9

    Hi @Geraldine_VdAuwera

    Yes! I did run it with the argument R=${REF}, but the tool did not recognize that argument and complained about it. I also tried REFERENCE_SEQUENCE = ${REF}, which gave the same error. (I don't have the error message file handy right now, but I can post it later.)

    I see that some Picard tools ask for a reference option R=fasta; however, the MarkDuplicates documentation (http://broadinstitute.github.io/picard/command-line-overview.html#MarkDuplicatesWithMateCigar) does not list an R= option.

    So that matches my understanding of the CRAM format: the reference file has to be provided. I knew about GATK's CRAM support, but I was not sure how far Picard's support extends, since MarkDuplicates did not handle the file well. That was the reason for this question, "How to pass a valid CRAM reference".

    Thanks,

    -lili
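
One detail worth ruling out, offered only as a possibility: if the spaces in "REFERENCE_SEQUENCE = ${REF}" were typed literally, a shell would split the expression into three separate arguments instead of one KEY=value pair. The sketch below uses Python's shlex.split, which follows POSIX shell word-splitting rules, to show the difference. The reference path is a hypothetical placeholder, and nothing in this thread confirms that spacing caused the error.

```python
import shlex

# Hypothetical reference path, used only to illustrate word splitting.
spaced = "REFERENCE_SEQUENCE = /path/to/reference.fasta"
joined = "REFERENCE_SEQUENCE=/path/to/reference.fasta"

spaced_tokens = shlex.split(spaced)
joined_tokens = shlex.split(joined)

print(spaced_tokens)  # three separate tokens
print(joined_tokens)  # one KEY=value token
```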

  • la_br2016 Member Posts: 9

    That will be super!

    Thanks a lot @Geraldine_VdAuwera

    -lili

  • cdiaz81 Cambridge, UK Member Posts: 5

    Hello there,
    I wonder if there is any way around this issue, as I am having the same problem with MarkDuplicates. I have tried adding the reference, but it complains that "A valid CRAM reference was not supplied and one cannot be acquired via the property settings reference_fasta or use_cram_ref_download".
    Many thanks

  • Sheila Broad Institute Member, Broadie, Moderator, Dev Posts: 4,579 admin

    @cdiaz81
    Hi,

    I believe Geraldine has answered your question here.

    -Sheila
