Using CRAM files in Picard SamToFastq with Queue

Hi,
I'm trying to convert CRAM files to FASTQ format with SamToFastq as part of a pipeline I have written in a Queue script. Is there any parameter for passing a reference file in the SamToFastq Queue module? I didn't see any such parameter in SamToFastq.scala, PicardBamFunction.scala, or JavaCommandLineFunction.scala. I couldn't find anything on the forum, and I tried a bunch of different things by guessing, but didn't have any luck. If there is no parameter to use a reference file in a Queue script for Picard tools, is there any sort of workaround (e.g., environment variable that Picard will pick up on)?
Thanks!
Andrew
Best Answer
shlee (Cambridge)
Hi @andrewo,
I recommend you first convert your CRAM to BAM before processing with SamToFastq. It's possible that the input parameter that takes a BAM will also take a CRAM. The tools recognize the file type by the extension, e.g. .bam or .cram. You can try it out to see if it works and whether the results are as expected. Be sure you are using the latest version of Picard, currently at v2.9.0.
Also, let me mention that we are moving towards using WDL scripting for pipeline work. Check it out at the WDL website.
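As a rough sketch of the two-step route suggested above, the snippet below only assembles the two command lines and does not run them. It assumes samtools is used for the CRAM-to-BAM step (the post does not name a tool) and that Picard is invoked as java -jar picard.jar. The file names and the helper name cram_to_fastq_commands are hypothetical.

```python
from pathlib import Path


def cram_to_fastq_commands(cram, reference, picard_jar="picard.jar"):
    """Return (samtools_cmd, picard_cmd) as argument lists; nothing is executed."""
    cram = Path(cram)
    bam = cram.with_suffix(".bam")
    fastq = cram.with_suffix(".fastq")
    # Step 1: decode the CRAM against its reference and write a BAM.
    # Assumption: samtools does the conversion; the thread does not name a tool.
    samtools_cmd = [
        "samtools", "view", "-b",
        "-T", str(reference),
        "-o", str(bam),
        str(cram),
    ]
    # Step 2: run SamToFastq on the BAM; no reference is needed at this point.
    picard_cmd = [
        "java", "-jar", str(picard_jar), "SamToFastq",
        f"INPUT={bam}",
        f"FASTQ={fastq}",
    ]
    return samtools_cmd, picard_cmd


if __name__ == "__main__":
    for cmd in cram_to_fastq_commands("sample.cram", "hg19.fa"):
        print(" ".join(cmd))
```

For paired-end data, SamToFastq would also need SECOND_END_FASTQ so that the mates go to a second file.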
Answers
CRAM support is still patchy in Picard tools -- each tool has to be modified individually to support CRAM and this has not yet been done across the board. I second @shlee's recommendation to convert to BAM first.
@Geraldine_VdAuwera @shlee
Thanks to you both for the responses. I did try to input CRAM files but I got an error message referring to a reference file as input, hence my question about how to supply the reference file. Here's the command that was sent to the cluster by Queue, and the accompanying error message I received:
The error message mentioned that 'chr1' was not found in the reference file. I wasn't sure if there was some default reference it looks for, but I assumed this error was because I had not provided a reference FASTA file. I can manually run Picard and add the parameter REFERENCE_SEQUENCE=hg19.fa, which works, but I couldn't find a way to add that parameter to the SamToFastq object in my Queue script. I think I'm limited to the options that Queue allows me to specify (for example, https://github.com/broadgsa/gatk/blob/master/public/gatk-queue-extensions-public/src/main/scala/org/broadinstitute/gatk/queue/extensions/picard/SamToFastq.scala). The solution you mentioned is what I came up with as well, which is to convert to BAM and then run SamToFastq. I was hoping to go straight from CRAM to FASTQ, but this works.
Thanks for mentioning WDL as well. It appears that GridEngine is supported now so I will probably try to move our pipelines to WDL in the near future.
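For reference, the manual (non-Queue) invocation described in this post could be assembled along the following lines. This is a minimal sketch that assumes Picard's KEY=VALUE argument style and a java -jar picard.jar launcher. The helper name picard_command and the file names are hypothetical, and it does not address how to pass the argument from inside a Queue script.

```python
def picard_command(tool, picard_jar="picard.jar", **options):
    """Build a Picard command line from KEY=VALUE options; nothing is executed."""
    cmd = ["java", "-jar", picard_jar, tool]
    # Keyword arguments keep their call order, so the options appear as written.
    cmd += [f"{key}={value}" for key, value in options.items()]
    return cmd


if __name__ == "__main__":
    cmd = picard_command(
        "SamToFastq",
        INPUT="sample.cram",
        FASTQ="sample.fastq",
        REFERENCE_SEQUENCE="hg19.fa",  # reference used to decode the CRAM
    )
    print(" ".join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True) once the paths point at real files.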
It should be possible to add an argument setting for REFERENCE_SEQUENCE in Queue but to be honest I forget how it's done. I definitely encourage you to consider migrating to WDL; we're starting to share and support full workflow scripts, which should help.
@andrewo, the error sounds like you are using the wrong reference. Be sure the contig naming matches between your data and the reference, e.g. chr1 vs. just 1.
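One way to check the contig naming is to compare the @SQ names in the file header against the names in the reference's FASTA index. The sketch below assumes the header text has already been obtained (for example with samtools view -H) and that a .fai index exists for the reference. The function names and the toy header and index contents are made up for illustration.

```python
def fai_contigs(fai_text):
    """Contig names from a FASTA index (.fai): the first tab-separated column."""
    return [line.split("\t")[0] for line in fai_text.splitlines() if line.strip()]


def header_contigs(header_text):
    """Contig names from the SN: field of @SQ lines in a SAM-style header."""
    names = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def missing_from_reference(header_text, fai_text):
    """Header contigs that have no identically named entry in the reference."""
    reference = set(fai_contigs(fai_text))
    return [name for name in header_contigs(header_text) if name not in reference]


if __name__ == "__main__":
    # Toy inputs: data aligned to "chr"-prefixed contigs, reference without the prefix.
    header = "@HD\tVN:1.5\n@SQ\tSN:chr1\tLN:249250621\n@SQ\tSN:chr2\tLN:243199373\n"
    fai = "1\t249250621\t3\t60\t61\n2\t243199373\t253404805\t60\t61\n"
    print(missing_from_reference(header, fai))  # prints ['chr1', 'chr2']
```

An empty list means every contig named in the header is present in the reference under the same name.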