(howto) Perform local realignment around indels

Geraldine_VdAuwera | Posts: 7,781 | Administrator, GATK Dev | admin
edited July 2013 in Tutorials

Objective

Perform local realignment around indels to correct mapping-related artifacts.

Prerequisites

  • TBD

Steps

  1. Create a target list of intervals to be realigned
  2. Perform realignment of the target intervals

1. Create a target list of intervals to be realigned

Action

Run the following GATK command:

java -jar GenomeAnalysisTK.jar \
    -T RealignerTargetCreator \
    -R reference.fa \
    -I dedup_reads.bam \
    -L 20 \
    -known gold_indels.vcf \
    -o target_intervals.list

Expected Result

This creates a file called target_intervals.list containing the list of intervals that the program identified as needing realignment within our target, chromosome 20.

The known indel sites listed in gold_indels.vcf are used as targets for realignment. Only provide this file if such a list exists for your organism.
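
As a quick sanity check on the result, the sketch below counts the intervals in the list and the bases they span. It assumes the file holds one interval per line in the form contig:start-stop (or contig:position for a single base), and that contig names contain no ':' or '-'. The function name summarize_intervals is made up for this illustration; it is not part of GATK.

```shell
# Hypothetical helper (not a GATK tool): summarize an interval list.
# Assumes one interval per line, written as contig:start-stop or contig:position.
summarize_intervals() {
  # $1 = path to the interval list
  awk -F'[:-]' '
    NF >= 2 {
      start = $2
      stop = (NF >= 3) ? $3 : $2   # single-position lines span one base
      n++
      bases += stop - start + 1
    }
    END { printf "%d intervals, %d bases\n", n, bases }
  ' "$1"
}
```

Calling summarize_intervals target_intervals.list prints a single summary line. A count of zero would suggest that nothing was flagged for realignment in the requested region.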


2. Perform realignment of the target intervals

Action

Run the following GATK command:

java -jar GenomeAnalysisTK.jar \
    -T IndelRealigner \
    -R reference.fa \
    -I dedup_reads.bam \
    -targetIntervals target_intervals.list \
    -known gold_indels.vcf \
    -o realigned_reads.bam

Expected Result

This creates a file called realigned_reads.bam containing all the original reads, but with better local alignments in the regions that were realigned.

Note that here, we didn’t include the -L 20 argument. It's not necessary since the program will only realign reads within the target intervals we are providing, all of which already lie on chromosome 20.
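
The two steps can be chained in a small script. The sketch below only prints the two commands from this tutorial for a given set of inputs, so they can be reviewed before anything runs. The function name print_realign_commands is invented for this example, and the intermediate and output file names are the ones used above. It assumes file names without spaces or shell metacharacters.

```shell
# Hypothetical helper: print the two tutorial commands for the given inputs.
# Uses only the GATK arguments already shown in this howto.
print_realign_commands() {
  # $1 = reference FASTA, $2 = input BAM, $3 = known indels VCF, $4 = interval for -L
  printf 'java -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -R %s -I %s -L %s -known %s -o target_intervals.list\n' \
    "$1" "$2" "$4" "$3"
  printf 'java -jar GenomeAnalysisTK.jar -T IndelRealigner -R %s -I %s -targetIntervals target_intervals.list -known %s -o realigned_reads.bam\n' \
    "$1" "$2" "$3"
}
```

For example, print_realign_commands reference.fa dedup_reads.bam gold_indels.vcf 20 prints both commands. Piping that output to sh -e would run them in order and stop if the first step fails.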

Geraldine Van der Auwera, PhD

Comments

  • Geraldine_VdAuwera | Posts: 7,781 | Administrator, GATK Dev | admin

    Please have a look at the FAQ article on recommended known sets to use per tool.

  • Vewehr | Czech Republic | Posts: 3 | Member

    I'd like to analyze our data from malignant cells containing a 4 bp somatic indel, but BWA does not align it properly. I tried to realign it with the indel realignment procedure from the Best Practices and this howto, but without success. I also tried editing the final VCF file so that it contains the proper indel instead of the wrong one, but the indel was still not realigned when I used this file as the known indels VCF in both steps of indel realignment. The sequences are:

    Consensus:
    CAAGATCTCTGGCAGTGGAGG

    Alignment according to bwa mem (indel is bold):
    CAAGATCTCTGCCTGGCAGTGGAGG

    Real indel (indel is bold):
    CAAGATCTCTGCCTGGCAGTGGAGG

    Could you please help me set up indel realignment so that it works the way I'd like?

    Thank you.

  • pdexheimer | Posts: 463 | Member, GATK Dev, DSDE Member | mod

    @Vewehr -

    I'm confused by this - the two sequences you posted are exactly the same, only the position of the reported indel in them differs. But in terms of identifying the correct indel, you're already there.

    The GATK's convention is to left-align indels - that is, report the smallest coordinate that correctly represents the variation. I don't believe that can be changed. Ultimately, you just need to make sure that all of your indels are treated the same way (always use the smallest or the largest correct coordinate). The simplest way to do that is to run your comparison VCFs through LeftAlignAndTrimVariants.
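
To make the left-alignment point concrete, here is a minimal sketch that lists every position at which a single insertion into the consensus reproduces the read, using the two sequences posted above. The function name equivalent_insertions is made up for this illustration. It is a brute-force check, not what GATK does internally, and it assumes the read differs from the reference by exactly one insertion.

```shell
# Hypothetical helper: list all equivalent placements of one insertion.
# Prints "<offset> <inserted bases>", where offset is the number of
# reference bases that precede the insertion.
equivalent_insertions() {
  # $1 = reference sequence, $2 = read carrying a single insertion
  awk -v ref="$1" -v rd="$2" 'BEGIN {
    n = length(rd) - length(ref)          # length of the inserted segment
    for (i = 0; i <= length(ref); i++) {
      # rebuild the read by inserting n read bases after i reference bases
      cand = substr(ref, 1, i) substr(rd, i + 1, n) substr(ref, i + 1)
      if (cand == rd) print i, substr(rd, i + 1, n)
    }
  }'
}

equivalent_insertions CAAGATCTCTGGCAGTGGAGG CAAGATCTCTGCCTGGCAGTGGAGG
```

For these sequences the sketch reports four equivalent placements, at offsets 8 through 11 (inserting CTGC, TGCC, GCCT or CCTG respectively). The left-aligned representation is the one with the smallest offset. All four describe the same read, which is why two callers can report "different" indels for identical data.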

  • Vewehr | Czech Republic | Posts: 3 | Member

    Thanks a lot. I was not aware of the left-align convention. The problem is that there are several different variants present and they are all misaligned. I thought I would be able to realign them properly, but I will probably have to make some kind of conversion table :-).

  • mlyon | UK | Posts: 2 | Member

    For realignment of amplicon reads with a very small target ROI (~3kb) can I omit the RealignerTargetCreator step and submit the entire ROI to the IndelRealigner? Would there be any negative consequence of doing this (apart from performance)?

    Many thanks
    Matt

  • pdexheimer | Posts: 463 | Member, GATK Dev, DSDE Member | mod
    edited July 2014

    @mlyon - If memory serves, IndelRealigner will only attempt a single realignment per target. So if your amplicon contains multiple indels, it would only clean up one of them.

  • Geraldine_VdAuwera | Posts: 7,781 | Administrator, GATK Dev | admin

    Hi @mlyon

    @pdexheimer is correct, so you're better off running the RTC step. On such a small area it will take a very short time, so I can't really see the point of skipping it.

  • mghandi | mghandi1 | Posts: 1 | Member
    edited September 2014

    Thanks for the great tool.
    The last sentence in the tutorial can be a bit confusing: "Note that here, we didn’t include the -L 20 argument. It's not necessary ...". If you add -L 20, it will limit the output to chr 20 only (other reads will be filtered out in the output bam) so the output will not be the same.

  • Sheila | Broad Institute | Posts: 1,241 | Member, GATK Dev, Broadie, Moderator, DSDE Member | admin

    @mghandi

    Hi,

    The last sentence is meant to point out that even if you used the -L argument in Realigner Target Creator, you do not need to use it when running Indel Realigner. This is because Indel Realigner only realigns around the intervals given in the output file of Realigner Target Creator. So, if you have already put in the intervals you want to focus on in Realigner Target Creator, you do not need to input them again into Indel Realigner.

    I hope this clarifies things.

    -Sheila

  • simonsanchezj | Germany | Posts: 45 | Member

    Hello,
    In the 2.8 bundle (hg19), there are two dbsnp VCF files (dbsnp_138 and dbsnp_130.excluding_sites_after_129). Which of these is recommended for use with the base recalibrator (knownSites)?
    Thanks,

  • Sheila | Broad Institute | Posts: 1,241 | Member, GATK Dev, Broadie, Moderator, DSDE Member | admin

    @simonsanchezj

    Hi,

    The dbsnp_138 file is the basic recommendation for our users. The other file, dbsnp_130.excluding_sites_after_129, is a specially modified callset containing the set of sites from dbsnp 129, but updated with annotations from 130. It's meant to be used to replicate some findings from the original GATK paper, so it is more for internal Broad purposes.

    -Sheila

  • simonsanchezj | Germany | Posts: 45 | Member

    thanks!

  • WVNicholson | Warwick University, Coventry | Posts: 14 | Member
    edited February 19

    I'm thinking about performing local realignment around indels with one of my own datasets. It turns out that I have a VCF for structural variants that includes insertions and deletions (but also other things like copy number variation, gene duplication and tandem duplication). I'm just wondering if simply removing all of the lines for types of structural variant other than insertions and deletions is going to be a safe way of creating a VCF with known indels only? Also, I'm assuming that it is undesirable to do re-alignment around the other types of structural variant; but perhaps that is wrong (as I guess some of these structural variants have various things in common with indels and possibly some of the same issues)? Indeed having a look at the notes in the presentation it seems likely that it would be desirable to do re-alignment around all of the structural variants I mentioned; but it's not clear what are the best options to use with the IndelRealigner (e.g. should I use the full Smith-Waterman realignment)? (By the way, I have two examples of structural variants that are described in my VCF file as "complex structural alterations" - I'm not sure what those are or if they are even remotely like indels?)

    William

  • Sheila | Broad Institute | Posts: 1,241 | Member, GATK Dev, Broadie, Moderator, DSDE Member | admin

    @WVNicholson
    Hi William,

    You should be able to simply input your file as it is, but if you want, you can use Select Variants to select the indels only. https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_variantutils_SelectVariants.php

    -Sheila

  • hoosier060 | Posts: 9 | Member

    Hi,

    I was wondering what the differences are between IndelRealigner and LeftAlignAndTrimVariants.
    Is running one of the two sufficient for accurate indel calling?
    I have my own understanding of the two modules, but I would like official clarification from the GATK team.

    Thanks in advance!

  • Geraldine_VdAuwera | Posts: 7,781 | Administrator, GATK Dev | admin

    Hi @hoosier060,

    Those are two completely different tools -- IndelRealigner cleans up indels in BAM files (important for calling) and LAATV normalizes indel representations in VCF files (typically used for files generated with a different program or after subsetting).

  • SerenaRhie | Posts: 7 | Member

    @Carneiro @Geraldine_VdAuwera (I've posted this question elsewhere, but I think this is a better place to ask.) I'm using GATK-3.3.0 and want to be clear about the default value of -mode.
    Is USE_READS the default behavior if not specified? Please update your tool documentation with the default value, which is currently stated as "NA". Thanks.

  • Geraldine_VdAuwera | Posts: 7,781 | Administrator, GATK Dev | admin

    @SerenaRhie In future please don't post the same question in two different places, it makes the job of maintaining the forum more difficult.

    Yes, USE_READS is the default value. I'm not sure why the defaults are not getting written properly in the doc; will check & fix.

  • SerenaRhie | Posts: 7 | Member
    edited April 13

    @Geraldine_VdAuwera Thank you for your reply! ...And sorry for the inconvenience. :smile:
