Need to run FixMateInformation after the realignment step?

Tiphaine Member Posts: 53
edited January 2013 in Ask the GATK team

Dear Community and GATK's team,

I have one question about the cleaning step before SNP calling, mainly about local realignment around indels.

I read on a website describing someone's workflow that, because alignments may change during realignment, it is preferable to fix the mate information afterwards, and that Picard offers a utility to do this. Is that true? Or is it only the insert sizes that can change? If insert sizes do change, is there a tool that checks whether these changes are OK?
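To make the insert-size part of the question concrete: the insert size stored in a BAM record is the SAM TLEN field, which the SAM specification defines as running from the leftmost to the rightmost mapped base of the pair. It therefore depends on both mates' start positions and CIGARs. The minimal Python sketch below assumes both mates map to the same reference and uses hypothetical helper names (`reference_end`, `template_length`). It shows that shifting one mate or adding a deletion to its CIGAR, as realignment around an indel can do, changes TLEN.

```python
import re

# CIGAR operations that consume reference bases (SAM spec): M, D, N, =, X.
REF_CONSUMING_OPS = set("MDN=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_end(pos: int, cigar: str) -> int:
    """Return the 1-based, inclusive reference coordinate of the last aligned base."""
    span = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in REF_CONSUMING_OPS)
    return pos + span - 1


def template_length(pos1: int, cigar1: str, pos2: int, cigar2: str) -> int:
    """Signed TLEN as seen from read 1, assuming both mates map to the same reference.

    The leftmost mate gets a positive value and the rightmost a negative one.
    Ties (equal start positions) are treated as read 1 being leftmost.
    """
    left = min(pos1, pos2)
    right = max(reference_end(pos1, cigar1), reference_end(pos2, cigar2))
    length = right - left + 1
    return length if pos1 <= pos2 else -length


before = template_length(100, "50M", 300, "50M")
after = template_length(102, "20M2D30M", 300, "50M")
print(before, after)  # 250 248
```

Here read 1 starts at 100 with `50M` and its mate at 300 with `50M`, which gives a TLEN of 250. If realignment moves read 1 to 102 with `20M2D30M`, the same pair gives 248. The spec leaves the sign undefined when both mates start at the same position; this sketch simply treats read 1 as leftmost.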

Do you use Picard's tool, FixMateInformation.jar, to fix the mate information when working with paired-end data?

Up to now, I have not used it in my pipeline. Maybe this is a mistake.
If we do have to add this step, should it go after the realignment step or after the recalibration step?

Thank you for your help,

Tiphaine

Post edited by Geraldine_VdAuwera on

Best Answer

  • Mark_DePristo Broad Institute Member Posts: 153 admin
    Accepted Answer

    The indel realigner fixes mates itself, so the file is valid after realignment. How this is accomplished is a bit of magic code from Eric Banks and myself from several years ago. A previous version of the realigner did require you to run two passes, but that was before the magic code was written.
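The thread doesn't show how the realigner does this internally. As a rough illustration of what "fixing mate information" involves for a single pair, here is a minimal Python sketch. It is not GATK's or Picard's implementation. After one mate moves, each read's RNEXT, PNEXT and mate-reverse flag (0x20) are copied from its mate, and TLEN is recomputed. The `Read` record and `fix_mate_info` function are hypothetical. The sketch assumes both mates are mapped, and for simplicity it stores the mate's reference name rather than the `=` shorthand SAM allows when RNEXT equals RNAME.

```python
import re
from dataclasses import dataclass

FLAG_REVERSE = 0x10       # this read is on the reverse strand
FLAG_MATE_REVERSE = 0x20  # the mate is on the reverse strand
REF_CONSUMING_OPS = set("MDN=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


@dataclass
class Read:
    """Hypothetical minimal record holding only the mate-related SAM fields."""
    name: str
    flag: int
    ref: str              # RNAME
    pos: int              # POS (1-based)
    cigar: str            # CIGAR
    mate_ref: str = "*"   # RNEXT
    mate_pos: int = 0     # PNEXT
    tlen: int = 0         # TLEN


def _ref_end(read: Read) -> int:
    """1-based inclusive reference coordinate of the read's last aligned base."""
    span = sum(int(n) for n, op in CIGAR_RE.findall(read.cigar) if op in REF_CONSUMING_OPS)
    return read.pos + span - 1


def fix_mate_info(a: Read, b: Read) -> None:
    """Copy each mate's position and strand onto the other read and recompute TLEN.

    Assumes both reads are mapped and belong to the same pair.
    """
    for read, mate in ((a, b), (b, a)):
        read.mate_ref = mate.ref
        read.mate_pos = mate.pos
        if mate.flag & FLAG_REVERSE:
            read.flag |= FLAG_MATE_REVERSE
        else:
            read.flag &= ~FLAG_MATE_REVERSE
    if a.ref == b.ref:
        length = max(_ref_end(a), _ref_end(b)) - min(a.pos, b.pos) + 1
        a.tlen = length if a.pos <= b.pos else -length
        b.tlen = -a.tlen
    else:
        # TLEN is conventionally 0 when mates are on different references.
        a.tlen = b.tlen = 0


# Read 1 was just moved from 100 to 102, so its mate's fields are now stale.
r1 = Read("pair1", 0x41, "chr1", 102, "20M2D30M", "chr1", 300, 250)
r2 = Read("pair1", 0x91, "chr1", 300, "50M", "chr1", 100, -250)
fix_mate_info(r1, r2)
print(r2.mate_pos, r1.tlen, r2.tlen)  # 102 248 -248
```

Fixing both reads of the pair together in one call mirrors the point that a moved read's mate must be updated too. Reads that were never moved are left exactly as they were.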

    --
    Mark A. DePristo, Ph.D.
    Co-Director, Medical and Population Genetics
    Broad Institute of MIT and Harvard

Answers

  • mht Member Posts: 8

    Hi Mark,

    I have the same question as Tiphaine, except that I'm using GATK-lite. Does the indel realigner in this version also fix mates, or do I need to do it myself?

    Thanks.

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie Posts: 11,388 admin

    The realigner in Lite is the same tool as in the full version, so yes, it also fixes mates.

    Geraldine Van der Auwera, PhD

  • BerntPopp Member Posts: 49

    Mark_DePristo wrote:

    "The indel realigner fixes mates itself, so the file is valid after realignment. How this is accomplished is a bit of magic code from Eric Banks and myself from several years ago. A previous version of the realigner did require you to run two passes, but that was before the magic code was written."

    Hi Mark,

    So is it all right that I still get errors from Picard's ValidateSamFile after running the Indel Realigner (GATK 2.13) on my paired-end files? Or does the code keep the original information and store the fixed mate information somewhere else?

  • ebanks Broad Institute Member, Broadie, Dev Posts: 692 admin

    The Indel Realigner only fixes the mates of reads that it moves. If the mate information for other reads was not correct to begin with, it will still be wrong after running through the IR.
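Eric's point can be illustrated with a per-pair consistency check, similar in spirit to the mate checks ValidateSamFile reports. The check compares what each read records about its mate (RNEXT, PNEXT, mate-reverse flag) with where the mate actually is. In this minimal Python sketch, the `Read` record, `mate_mismatches` and `check_pair` are hypothetical. Real mates share one QNAME; the `/1` and `/2` labels are only there to make messages readable. A pair whose mate fields were already stale before realignment, and which the realigner never moved, is still reported afterwards.

```python
from dataclasses import dataclass

FLAG_REVERSE = 0x10       # this read is on the reverse strand
FLAG_MATE_REVERSE = 0x20  # the mate is on the reverse strand


@dataclass
class Read:
    """Hypothetical record holding only the fields needed for mate checks."""
    label: str     # readable label for messages (real mates share one QNAME)
    flag: int
    ref: str       # RNAME
    pos: int       # POS
    mate_ref: str  # RNEXT (stored as a name, not "=")
    mate_pos: int  # PNEXT


def mate_mismatches(read: Read, mate: Read) -> list[str]:
    """List the ways `read`'s recorded mate fields disagree with `mate` itself."""
    problems = []
    if read.mate_ref != mate.ref:
        problems.append(f"{read.label}: mate reference {read.mate_ref} != {mate.ref}")
    if read.mate_pos != mate.pos:
        problems.append(f"{read.label}: mate start {read.mate_pos} != {mate.pos}")
    if bool(read.flag & FLAG_MATE_REVERSE) != bool(mate.flag & FLAG_REVERSE):
        problems.append(f"{read.label}: mate-reverse flag disagrees with mate strand")
    return problems


def check_pair(a: Read, b: Read) -> list[str]:
    """Check the pair in both directions."""
    return mate_mismatches(a, b) + mate_mismatches(b, a)


# A pair whose mate information was already wrong before realignment:
stale = check_pair(
    Read("pairA/1", 0x41, "chr1", 500, "chr1", 790),
    Read("pairA/2", 0x91, "chr1", 800, "chr1", 500),
)
for problem in stale:
    print(problem)
```

Only pairA/1 is reported, once for its stale mate start and once for the missing mate-reverse flag. A consistent pair returns an empty list.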

    Eric Banks, PhD -- Director, Data Sciences and Data Engineering, Broad Institute of Harvard and MIT

  • BerntPopp Member Posts: 49
    edited October 2012

    All right, I understand.
    So should one run FixMateInformation, even though it adds an extra step to the BAM processing pipeline?
    Or, asked the other way around: does the UnifiedGenotyper cope with malformed mate information?

  • ebanks Broad Institute Member, Broadie, Dev Posts: 692 admin

    UG doesn't care about the exact locations of the mates.

  • ocanela Member Posts: 5

    Hi,

    In my pipeline, I use the 'NotPrimaryAlignment' filter when I run IndelRealigner. As a consequence, when some duplicates are removed, the mate information of the remaining reads is sometimes wrong and I have to fix it with Picard.

  • ocanela Member Posts: 5

    Hi, sorry about my last comment, I was wrong. It seems the problem is not with GATK. It arises when Picard's FixMateInformation is used in a previous step.
