Using ReduceReads as part of the GATK pipeline

mpw6 Member Posts: 10
edited October 2012 in Ask the GATK team

We are trying to see whether ReduceReads will help with the overwhelming file sizes in the SNP calling we are doing on whole-genome BAM files. We have been using a protocol similar to the one described in the Best Practices document ("Best: multi-sample realignment with known sites and recalibration"). My question: what is the best point in the pipeline to run ReduceReads?

Answers

  • ebanks Broad Institute Member, Broadie, Dev Posts: 692 admin

    After all of the data processing steps, right before variant calling. Note that ReduceReads will want to work on single sample files.

    Eric Banks, PhD -- Director, Data Sciences and Data Engineering, Broad Institute of Harvard and MIT

  • Mark_DePristo Broad Institute Member Posts: 153 admin

    I would no longer do multi-sample realignment. I'd do per-sample realignment with known sites, recalibrate, and then run ReduceReads per sample to make a reduced BAM. That's the recommended option today, and it allows you to avoid the (ridiculously) expensive joint realignment step. Let us know your experiences; I'm very interested to hear how well this works for you.

    --
    Mark A. DePristo, Ph.D.
    Co-Director, Medical and Population Genetics
    Broad Institute of MIT and Harvard
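
The per-sample order described above (realign with known sites, recalibrate, then reduce) can be sketched as a list of command lines. This is only an illustration, not a command set given in the thread: the jar path, file names, and the use of a single known-sites file for both realignment and recalibration are assumptions, and the flags are the GATK 2.x ones as best recalled, so check them against the documentation for your installed version. Nothing is executed; the function only builds argument lists.

```python
from typing import List

GATK_JAR = "GenomeAnalysisTK.jar"  # assumed path to a GATK 2.x jar


def gatk(tool: str, *args: str) -> List[str]:
    """Build one GATK command as an argument list (nothing is run)."""
    return ["java", "-jar", GATK_JAR, "-T", tool, *args]


def per_sample_commands(sample: str, ref: str, known: str) -> List[List[str]]:
    """Commands for one sample, in order: realign, recalibrate, reduce.

    File names are hypothetical and derived from the sample name.
    """
    raw = f"{sample}.bam"
    intervals = f"{sample}.intervals"
    realigned = f"{sample}.realigned.bam"
    recal_table = f"{sample}.recal.grp"
    recal = f"{sample}.recal.bam"
    reduced = f"{sample}.reduced.bam"
    return [
        # Per-sample indel realignment with known sites
        gatk("RealignerTargetCreator", "-R", ref, "-I", raw,
             "-known", known, "-o", intervals),
        gatk("IndelRealigner", "-R", ref, "-I", raw, "-known", known,
             "-targetIntervals", intervals, "-o", realigned),
        # Base quality score recalibration
        gatk("BaseRecalibrator", "-R", ref, "-I", realigned,
             "-knownSites", known, "-o", recal_table),
        gatk("PrintReads", "-R", ref, "-I", realigned,
             "-BQSR", recal_table, "-o", recal),
        # Reduce last, right before variant calling
        gatk("ReduceReads", "-R", ref, "-I", recal, "-o", reduced),
    ]
```

Each returned list could be passed to `subprocess.run(cmd, check=True)` once the flags have been verified for the GATK version in use.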

  • mpw6 Member Posts: 10

    So then if I need to combine samples, I should reduce the reads and then combine?

  • ebanks Broad Institute Member, Broadie, Dev Posts: 692 admin

    The GATK can combine BAMs on the fly, so you shouldn't need to physically combine them.

    Eric Banks, PhD -- Director, Data Sciences and Data Engineering, Broad Institute of Harvard and MIT
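
Combining "on the fly" means passing each per-sample BAM with its own `-I` argument instead of merging files first. A minimal sketch, assuming a GATK 2.x jar at a hypothetical path and hypothetical file names; the command is only built here, not run:

```python
from typing import List, Sequence


def multi_bam_command(bams: Sequence[str], ref: str, out_vcf: str) -> List[str]:
    """Build a UnifiedGenotyper call that reads several BAMs at once."""
    cmd = ["java", "-jar", "GenomeAnalysisTK.jar",  # jar path is an assumption
           "-T", "UnifiedGenotyper", "-R", ref]
    for bam in bams:
        cmd += ["-I", bam]  # one -I per (reduced) per-sample BAM
    cmd += ["-o", out_vcf]
    return cmd
```

For example, `multi_bam_command(["s1.reduced.bam", "s2.reduced.bam"], "ref.fasta", "calls.vcf")` yields a single command with two `-I` arguments.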

  • mpw6 Member Posts: 10

    OK, that leads to a question I had earlier but never posted. It is really outside the scope of this thread, but since you mentioned GATK combining BAMs: I stumbled when I got to VariantAnnotator, since that didn't seem to support multiple BAMs.

  • ebanks Broad Institute Member, Broadie, Dev Posts: 692 admin

    VariantAnnotator absolutely does support multiple BAMs as input...

    Eric Banks, PhD -- Director, Data Sciences and Data Engineering, Broad Institute of Harvard and MIT

  • mpw6 Member Posts: 10

    Thank you. I'll try it again, and if I hit the same snag, I'll start another thread asking where I went wrong with VariantAnnotator.

  • mpw6 Member Posts: 10

    My mistake. It did accept multiple BAMs. It was just taking an enormous amount of time to process. I'll try reducing the reads.

  • Mark_DePristo Broad Institute Member Posts: 153 admin

    Why are you running VariantAnnotator anyway? You can just tell UnifiedGenotyper to add all of the annotations you want while calling.

    --
    Mark A. DePristo, Ph.D.
    Co-Director, Medical and Population Genetics
    Broad Institute of MIT and Harvard
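
Requesting annotations at calling time is done by repeating UnifiedGenotyper's `-A` argument, one per annotation. The sketch below assumes a GATK 2.x jar at a hypothetical path; the annotation names in the usage note are examples only, so check which ones your version provides before relying on them.

```python
from typing import List, Sequence


def annotated_call_command(bam: str, ref: str, out_vcf: str,
                           annotations: Sequence[str]) -> List[str]:
    """Build a UnifiedGenotyper call that adds annotations while calling."""
    cmd = ["java", "-jar", "GenomeAnalysisTK.jar",  # jar path is an assumption
           "-T", "UnifiedGenotyper", "-R", ref, "-I", bam, "-o", out_vcf]
    for name in annotations:
        cmd += ["-A", name]  # one -A per requested annotation
    return cmd
```

For instance, `annotated_call_command("s1.reduced.bam", "ref.fasta", "calls.vcf", ["QualByDepth", "FisherStrand"])` builds one calling command that makes a separate VariantAnnotator pass unnecessary for those two annotations.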

  • mpw6 Member Posts: 10

    I just asked that question of my collaborator. We're going to rework our protocol. Thank you for all the help. I'm off to find more info about parallelizing now.

  • mpw6 Member Posts: 10

    After some time working on analysis for our Nature paper, I have returned my focus to this issue and have results to discuss. Using the default arguments with ReduceReads and using the resulting BAM as input to UnifiedGenotyper, I get a result set that is smaller than I would get using non-reduced inputs. If I split the file by chromosome intervals, I get an even smaller VCF file. We are guessing here that this is due to some threshold values dropping reads that would otherwise be processed. I am still trying to determine if the smaller files are subsets of the larger ones, but I'm hoping that you might have advice regarding the arguments that would allow for equivalent outputs.
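
One way to check whether the smaller call set is a subset of the larger one is to compare the sites in the two VCFs directly. The sketch below is a plain-Python illustration, not a GATK feature: it keys each record on CHROM, POS, REF, and ALT (an assumption about what counts as "the same call") and ignores genotypes, quality, and filters.

```python
from typing import Dict, Iterable, Set, Tuple

Site = Tuple[str, int, str, str]  # (CHROM, POS, REF, ALT)


def vcf_sites(lines: Iterable[str]) -> Set[Site]:
    """Collect the site key of every record line in a VCF."""
    sites: Set[Site] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        sites.add((chrom, int(pos), ref, alt))
    return sites


def compare_calls(reduced_lines: Iterable[str],
                  full_lines: Iterable[str]) -> Dict[str, object]:
    """Summarise how calls from reduced input overlap calls from full input."""
    reduced = vcf_sites(reduced_lines)
    full = vcf_sites(full_lines)
    return {
        "shared": len(reduced & full),
        "only_reduced": len(reduced - full),
        "only_full": len(full - reduced),
        "reduced_is_subset": reduced <= full,
    }
```

Both arguments can be open file handles, for example `compare_calls(open("reduced.vcf"), open("full.vcf"))` with hypothetical file names. A nonzero `only_reduced` count would mean the smaller file is not a strict subset.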

  • mpw6 Member Posts: 10

    I was using GenomeAnalysisTK-2.0-38-g45f7b0d. I hadn't seen that 2.1 was available. I will try that instead. Thank you.
