
Markduplicates in Cram files. A valid CRAM reference was not supplied...

cdiaz81, Cambridge, UK (Member)

Hello,
I wonder if you can help with the following. I am trying to mark duplicates in a CRAM file with the following command (latest Picard):
'picard-tools MarkDuplicates I=09_1#21.cram O=09_1#21_md.cram M=09_1#21_md.txt'

I keep getting the following errors:

To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.IllegalStateException: A valid CRAM reference was not supplied and one cannot be acquired via the property settings reference_fasta or use_cram_ref_download
at htsjdk.samtools.cram.ref.ReferenceSource.getDefaultCRAMReferenceSource(ReferenceSource.java:107)
at htsjdk.samtools.SamReaderFactory$SamReaderFactoryImpl.open(SamReaderFactory.java:301)
at picard.sam.markduplicates.util.AbstractMarkDuplicatesCommandLineProgram.openInputs(AbstractMarkDuplicatesCommandLineProgram.java:212)
at picard.sam.markduplicates.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:421)
at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:220)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:208)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)

I have also tried supplying the FASTA reference with R=in.fasta, but it made no difference.
There is a similar post asking the same question, but it doesn't seem to have been resolved.
Many thanks in advance,

Carmen
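
The error message names two htsjdk property settings, reference_fasta and use_cram_ref_download. One thing that might be worth trying is passing the reference as a Java system property in addition to R=. The sketch below only prints such a command so it can be reviewed first. It rests on assumptions: that Picard is launched directly from a jar (the picard.jar path is a placeholder), that htsjdk reads its properties with a samjdk. prefix, and that the helper name markdup_with_ref_property is made up for illustration. Nothing in this thread confirms that it resolves the error.

```shell
# Dry-run sketch: print (do not run) a MarkDuplicates command that also sets
# the htsjdk reference property named in the error message.
# Assumptions: picard.jar is a placeholder path; the samjdk. prefix is the
# htsjdk system-property convention; this is not a confirmed fix.
markdup_with_ref_property() {
    cram=$1                 # input CRAM file
    ref=$2                  # FASTA the CRAM was written against
    jar=${3:-picard.jar}    # hypothetical location of the Picard jar
    base=${cram%.cram}      # strip the .cram suffix to name the outputs
    printf 'java -Dsamjdk.reference_fasta=%s -jar %s MarkDuplicates I=%s O=%s_md.cram M=%s_md.txt R=%s\n' \
        "$ref" "$jar" "$cram" "$base" "$base" "$ref"
}
```

Calling markdup_with_ref_property '09_1#21.cram' in.fasta prints a single command line that can be copied into the terminal. File names containing spaces would need quoting before the printed line is run.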

Issue · Github
by Sheila

Issue Number: 1266
State: closed
Closed By: vdauwera

Answers

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)
    When you provided the reference, did you get the exact same error message, or a different one?
  • cdiaz81, Cambridge, UK (Member)

    Hello Geraldine,
    Thank you for looking into this. I get exactly the same error as you see below. For some reason, Picard does not like the reference.
    The only way around it that I have found is to use scramble to convert CRAM to BAM and then run MarkDuplicates.
    The reason I use scramble and not Samtools is that converting CRAM to BAM with Samtools also throws an error about 'allocating reads to wrong bins'. From looking online and talking to people at the Sanger, it appears they are aware of this, so it might change in the near future. As you can imagine, I would ideally like to use my CRAM files to save space.

    Anyway, this is what I get when I try to work with my cram files.

    picard-tools MarkDuplicates I=09_1#21.cram O=09_1#21_md.cram M=09_1#21_md.txt R=/lustre/scratch108/viruses/cds1/References/Homo_sapiens.GRCh37.dna.all.fa
    [Tue Sep 13 06:39:03 BST 2016] picard.sam.markduplicates.MarkDuplicates INPUT=[09_1#21.cram] OUTPUT=09_1#21_md.cram METRICS_FILE=09_1#21_md.txt REFERENCE_SEQUENCE=/lustre/scratch108/viruses/cds1/References/Homo_sapiens.GRCh37.dna.all.fa MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag REMOVE_DUPLICATES=false ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX= OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
    [Tue Sep 13 06:39:03 BST 2016] Executing as cds1@pcs5b on Linux 3.2.0-105-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_74-b02; Picard version: 2.6.0-SNAPSHOT
    INFO 2016-09-13 06:39:03 MarkDuplicates Start of doWork freeMemory: 2042802184; totalMemory: 2058354688; maxMemory: 2058354688
    INFO 2016-09-13 06:39:03 MarkDuplicates Reading input file and constructing read end information.
    INFO 2016-09-13 06:39:03 MarkDuplicates Will retain up to 7916748 data points before spilling to disk.
    [Tue Sep 13 06:39:04 BST 2016] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 0.01 minutes.
    Runtime.totalMemory()=2058354688
    To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
    Exception in thread "main" java.lang.IllegalStateException: A valid CRAM reference was not supplied and one cannot be acquired via the property settings reference_fasta or use_cram_ref_download
    at htsjdk.samtools.cram.ref.ReferenceSource.getDefaultCRAMReferenceSource(ReferenceSource.java:107)
    at htsjdk.samtools.SamReaderFactory$SamReaderFactoryImpl.open(SamReaderFactory.java:301)
    at picard.sam.markduplicates.util.AbstractMarkDuplicatesCommandLineProgram.openInputs(AbstractMarkDuplicatesCommandLineProgram.java:212)
    at picard.sam.markduplicates.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:421)
    at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:220)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:208)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)

    Many thanks,

    Carmen

  • cdiaz81, Cambridge, UK (Member)

    Hello,
    Yes, the data is actually indexed. So I guess it is not possible to work with CRAMs at the moment. That's not a problem. I will continue with my BAMs. I will also give the conversion tool a try, as you suggest.
    Cheers!

  • cdiaz81, Cambridge, UK (Member)

    Hello, thank you for looking into this, Geraldine. The bug in Samtools is fixed now, so it can be used to convert to BAM.
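
For anyone landing here with the same error, the workaround discussed in this thread (convert the CRAM to BAM, then mark duplicates on the BAM) can be sketched roughly as follows. This is a dry run that only prints the two commands. It assumes samtools and the picard-tools wrapper are on the PATH and that the reference is the FASTA the CRAM was written against. The helper name cram_markdup_cmds and the output file naming are illustrative, not something from Picard or Samtools.

```shell
# Dry-run sketch of the workaround from this thread: CRAM -> BAM with
# samtools, then MarkDuplicates on the BAM. It prints the commands only.
# Assumptions: samtools and picard-tools are on PATH; output names are
# derived from the input name purely for illustration.
cram_markdup_cmds() {
    cram=$1              # input CRAM file
    ref=$2               # FASTA the CRAM was written against
    base=${cram%.cram}   # strip the .cram suffix to name the outputs
    # samtools view: -b writes BAM, -T gives the reference needed to decode CRAM
    printf 'samtools view -b -T %s -o %s.bam %s\n' "$ref" "$base" "$cram"
    # Same I=/O=/M= arguments as in the original post, but on the BAM
    printf 'picard-tools MarkDuplicates I=%s.bam O=%s_md.bam M=%s_md.txt\n' "$base" "$base" "$base"
}
```

Running cram_markdup_cmds '09_1#21.cram' followed by the path to the reference FASTA prints the conversion and the MarkDuplicates command in order. This trades the disk savings of CRAM for a pipeline that avoids the reference lookup in the CRAM reader.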
