# Using ReduceReads as part of a GATK pipeline

Posts: 10
edited October 2012

We are trying to find out whether ReduceReads will help with the overwhelming file sizes we face when calling SNPs on whole-genome BAM files. We have been using a protocol similar to the one described in the Best Practices document: "Best: multi-sample realignment with known sites and recalibration". My question is: what is the best point in the pipeline to run ReduceReads?

## Answers

• Broad Institute · Posts: 693 ✭✭✭

After all of the data processing steps, right before variant calling. Note that ReduceReads will want to work on single sample files.

Eric Banks, PhD -- Director, Data Sciences and Data Engineering, Broad Institute of Harvard and MIT

• Broad Institute · Posts: 153 admin

I would no longer do multi-sample realignment. I'd do per-sample realignment with known sites, recalibrate, and then run ReduceReads per sample to make a reduced BAM. That's the recommended option today, and it allows you to avoid the (ridiculously) expensive joint realignment step. Let us know your experiences; I'm very interested to hear how well this works for you.
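The per-sample order described above can be sketched as a list of command lines. This is only an illustration, assuming GATK 2.x-style tool names and flags (`RealignerTargetCreator`, `IndelRealigner`, `BaseRecalibrator`, `PrintReads`, `ReduceReads`); the function name and all file names are hypothetical, and the flags should be checked against the documentation for the version in use. The code only builds the commands and does not run them.

```python
# Sketch only: builds GATK command lines for one sample, in execution order.
# File names and the helper name are hypothetical; flags assume GATK 2.x.

GATK = ["java", "-jar", "GenomeAnalysisTK.jar"]


def per_sample_commands(sample, ref, known_indels, known_variants):
    """Return command lines: realign, recalibrate, then reduce, for one sample."""
    raw = f"{sample}.bam"
    intervals = f"{sample}.intervals"
    realigned = f"{sample}.realigned.bam"
    recal_table = f"{sample}.recal.grp"
    recalibrated = f"{sample}.recal.bam"
    reduced = f"{sample}.reduced.bam"
    return [
        # 1. Find intervals that need local realignment, using known sites.
        GATK + ["-T", "RealignerTargetCreator", "-R", ref, "-I", raw,
                "-known", known_indels, "-o", intervals],
        # 2. Realign reads over those intervals.
        GATK + ["-T", "IndelRealigner", "-R", ref, "-I", raw,
                "-targetIntervals", intervals, "-known", known_indels,
                "-o", realigned],
        # 3. Build the base quality recalibration table.
        GATK + ["-T", "BaseRecalibrator", "-R", ref, "-I", realigned,
                "-knownSites", known_variants, "-o", recal_table],
        # 4. Write a recalibrated BAM.
        GATK + ["-T", "PrintReads", "-R", ref, "-I", realigned,
                "-BQSR", recal_table, "-o", recalibrated],
        # 5. Reduce last, right before variant calling.
        GATK + ["-T", "ReduceReads", "-R", ref, "-I", recalibrated,
                "-o", reduced],
    ]


if __name__ == "__main__":
    for cmd in per_sample_commands("sampleA", "ref.fasta",
                                   "known_indels.vcf", "dbsnp.vcf"):
        print(" ".join(cmd))
```

Each sample goes through the whole chain independently, so the samples can be processed in parallel and only the reduced BAMs need to be passed on to the caller.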

--
Mark A. DePristo, Ph.D.
Co-Director, Medical and Population Genetics
Broad Institute of MIT and Harvard

• Posts: 10

So then if I need to combine samples, I should reduce the reads and then combine?

• Broad Institute · Posts: 693 ✭✭✭

The GATK can combine BAMs on the fly, so you shouldn't need to physically combine them.
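As a rough illustration of combining on the fly, the sketch below builds a single UnifiedGenotyper command that repeats `-I` once per reduced BAM instead of merging the files first. The helper name and file names are hypothetical, and the flags assume GATK 2.x-style usage; the command is built but not executed.

```python
# Sketch only: one caller invocation with several per-sample BAMs as input.
# The helper name and file names are hypothetical.


def unified_genotyper_command(ref, bams, out_vcf):
    """Build a UnifiedGenotyper command line with one -I per input BAM."""
    if not bams:
        raise ValueError("at least one BAM is required")
    cmd = ["java", "-jar", "GenomeAnalysisTK.jar",
           "-T", "UnifiedGenotyper", "-R", ref]
    for bam in bams:
        cmd += ["-I", bam]  # repeated -I: inputs are combined at run time
    cmd += ["-o", out_vcf]
    return cmd


if __name__ == "__main__":
    print(" ".join(unified_genotyper_command(
        "ref.fasta",
        ["sampleA.reduced.bam", "sampleB.reduced.bam"],
        "calls.vcf",
    )))
```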

Eric Banks, PhD -- Director, Data Sciences and Data Engineering, Broad Institute of Harvard and MIT

• Posts: 10

OK, that leads to a question I had earlier but never posted. It is really outside the scope of this thread, but since you mentioned GATK combining BAMs: I stumbled when I got to VariantAnnotator, since that didn't seem to support multiple BAMs.

• Broad Institute · Posts: 693 ✭✭✭

Variant Annotator absolutely does support multiple BAMs as input...

Eric Banks, PhD -- Director, Data Sciences and Data Engineering, Broad Institute of Harvard and MIT

• Posts: 10

Thank you. I'll try it again, and if I hit the same snag, I'll start another thread asking where I went wrong with VariantAnnotator.

• Posts: 10

My mistake. It did accept multiple BAMs. It was just taking an enormous amount of time to process. I'll try reducing the reads.

• Broad Institute · Posts: 153 admin

Why are you running VariantAnnotator anyway? You can just tell UnifiedGenotyper to add all of the annotations you want while calling.
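A minimal sketch of requesting annotations at calling time, assuming the `-A` argument of UnifiedGenotyper in GATK 2.x. The helper name is hypothetical, and the annotation names shown (`Coverage`, `QualByDepth`, `FisherStrand`) are only examples; check which annotations your version provides.

```python
# Sketch only: append one "-A <name>" pair per requested annotation.
# The helper name is hypothetical; annotation names are examples.


def add_annotations(base_cmd, annotations):
    """Return a copy of base_cmd with -A arguments appended."""
    cmd = list(base_cmd)  # do not modify the caller's list
    for name in annotations:
        cmd += ["-A", name]
    return cmd


if __name__ == "__main__":
    base = ["java", "-jar", "GenomeAnalysisTK.jar", "-T", "UnifiedGenotyper",
            "-R", "ref.fasta", "-I", "sampleA.reduced.bam", "-o", "calls.vcf"]
    print(" ".join(add_annotations(
        base, ["Coverage", "QualByDepth", "FisherStrand"])))
```

Doing this in the calling step avoids a second pass over the BAMs, which is where the separate VariantAnnotator run was spending its time.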

--
Mark A. DePristo, Ph.D.
Co-Director, Medical and Population Genetics
Broad Institute of MIT and Harvard

• Posts: 10

I just asked that question of my collaborator. We're going to rework our protocol. Thank you for all the help. I'm off to find more info about parallelizing now.

• Posts: 10

After some time working on analysis for our Nature paper, I have returned my focus to this issue and have results to discuss. Using the default arguments with ReduceReads and using the resulting BAM as input to UnifiedGenotyper, I get a result set that is smaller than I would get using non-reduced inputs. If I split the file by chromosome intervals, I get an even smaller VCF file. We are guessing here that this is due to some threshold values dropping reads that would otherwise be processed. I am still trying to determine if the smaller files are subsets of the larger ones, but I'm hoping that you might have advice regarding the arguments that would allow for equivalent outputs.
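One way to check whether the smaller call set is a subset of the larger one is to compare the sites in the two VCF files directly. The sketch below is plain Python with no GATK dependency; it keys each record on CHROM, POS, REF and ALT, which is an assumption about what counts as "the same call" (genotypes and annotations are ignored). The function names and file names are hypothetical.

```python
# Sketch only: compare two VCF call sets by site (CHROM, POS, REF, ALT).


def site_keys(vcf_lines):
    """Collect (chrom, pos, ref, alt) for every record line of a VCF."""
    keys = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        keys.add((chrom, int(pos), ref, alt))
    return keys


def compare_call_sets(reduced_lines, full_lines):
    """Report whether the reduced call set is contained in the full one."""
    reduced = site_keys(reduced_lines)
    full = site_keys(full_lines)
    return {
        "is_subset": reduced <= full,
        "only_in_reduced": sorted(reduced - full),
        "only_in_full": sorted(full - reduced),
    }


if __name__ == "__main__":
    # Hypothetical file names for the two UnifiedGenotyper outputs.
    with open("reduced.vcf") as r, open("full.vcf") as f:
        report = compare_call_sets(r, f)
    print(report["is_subset"], len(report["only_in_full"]))
```

Sites listed under `only_in_full` are the calls lost after reduction, and they are the natural places to inspect when tuning the ReduceReads arguments.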

• Posts: 10

I was using GenomeAnalysisTK-2.0-38-g45f7b0d. I hadn't seen that 2.1 was available. I will try that instead. Thank you.