The current GATK version is 3.3-0

(howto) Perform local realignment around indels

Posts: 7,408Administrator, GATK Developer admin
edited July 2013

Objective

Perform local realignment around indels to correct mapping-related artifacts.

• TBD

Steps

1. Create a target list of intervals to be realigned
2. Perform realignment of the target intervals

1. Create a target list of intervals to be realigned

Action

Run the following GATK command:

java -jar GenomeAnalysisTK.jar \
-T RealignerTargetCreator \
-R reference.fa \
-I dedup_reads.bam \
-L 20 \
-known gold_indels.vcf \
-o target_intervals.list


Expected Result

This creates a file called target_intervals.list containing the list of intervals that the program identified as needing realignment within our target, chromosome 20.

The list of known indel sites (gold_indels.vcf) is used as a set of targets for realignment. Only include it if such a list exists for your organism.
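
If you script this step, the optional -known argument can be added only when a known-indels file is actually available. The following is a minimal sketch, not part of the official howto: the function name rtc_command and its argument order are hypothetical, and it only prints the command (a dry run) built from the arguments shown above.

```shell
#!/bin/sh
# Hypothetical helper: print the RealignerTargetCreator command from this howto,
# appending -known only if a known-indels VCF was given and the file exists.
rtc_command() {
  ref=$1; bam=$2; region=$3; out=$4; known=${5:-}
  set -- java -jar GenomeAnalysisTK.jar -T RealignerTargetCreator \
    -R "$ref" -I "$bam" -L "$region" -o "$out"
  if [ -n "$known" ] && [ -f "$known" ]; then
    set -- "$@" -known "$known"
  fi
  echo "$@"   # dry run only; swap the echo for a bare "$@" to execute
}

rtc_command reference.fa dedup_reads.bam 20 target_intervals.list gold_indels.vcf
```

If gold_indels.vcf is not present in the working directory, the printed command simply omits -known, which matches the advice to use it only when such a list exists.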

2. Perform realignment of the target intervals

Action

Run the following GATK command:

java -jar GenomeAnalysisTK.jar \
-T IndelRealigner \
-R reference.fa \
-I dedup_reads.bam \
-targetIntervals target_intervals.list \
-known gold_indels.vcf \
-o realigned_reads.bam


Expected Result

This creates a file called realigned_reads.bam containing all the original reads, but with better local alignments in the regions that were realigned.

Note that here, we didn’t include the -L 20 argument. It's not necessary since the program will only run on the target intervals we are providing.
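
One way to see why -L is redundant at this step is to check which contigs appear in target_intervals.list. The sketch below is an illustration only. It assumes the list holds one interval per line, written as contig:start-stop (or contig:position for a single base), and the function name summarise_targets is made up for this example. It would miscount contig names that themselves contain ':' or '-'.

```shell
#!/bin/sh
# Hypothetical helper: per contig, print the number of intervals and the
# total number of bases they cover (tab-separated).
# Assumed line format: contig:start-stop or contig:position
summarise_targets() {
  awk -F '[-:]' '
    NF >= 2 {
      stop = (NF >= 3) ? $3 : $2     # single-position intervals have no stop
      count[$1]++
      bases[$1] += stop - $2 + 1     # intervals are inclusive on both ends
    }
    END {
      for (c in count) printf "%s\t%d\t%d\n", c, count[c], bases[c]
    }' "$1"
}

# Example call (needs the file produced in step 1):
# summarise_targets target_intervals.list
```

If step 1 was run with -L 20, every line of the summary should belong to contig 20, so the realignment work is already restricted to that chromosome.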

Post edited by Geraldine_VdAuwera on

Geraldine Van der Auwera, PhD

Comments

• Posts: 2Member
• Posts: 7,408Administrator, GATK Developer admin

Please have a look at the FAQ article on recommended known sets to use per tool.

Geraldine Van der Auwera, PhD

• Czech RepublicPosts: 3Member

I'd like to analyze our data from malignant cells containing a 4 bp somatic indel, but BWA does not align it properly. I tried to realign it with the indel realignment procedure from the Best Practices and this howto, but I was not successful. I also tried editing the final VCF file so that it contains the proper indel instead of the wrong one, but the indel was still not realigned when I used this file as the known-indels VCF in both steps of indel realignment. The sequences are:

Consensus:
CAAGATCTCTGGCAGTGGAGG

Alignment according to bwa mem (indel is bold):
CAAGATCTCTGCCTGGCAGTGGAGG

Real indel (indel is bold):
CAAGATCTCTGCCTGGCAGTGGAGG

Could you please help me set up indel realignment so that it works as I'd like it to?

Thank you.

• Posts: 442Member, GSA Collaborator ✭✭✭✭

I'm confused by this - the two sequences you posted are exactly the same, only the position of the reported indel in them differs. But in terms of identifying the correct indel, you're already there.

The GATK's convention is to left-align indels, that is, to report the smallest coordinate that correctly represents the variation. I don't believe that can be changed. Ultimately, you just need to make sure that all of your indels are treated the same way (always use the smallest or the largest correct coordinate). The simplest way to do that is to run your comparison VCFs through LeftAlignAndTrimVariants.
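
To make the left-alignment idea concrete, here is a small sketch using the two sequences posted above. It is not how the GATK does it internally. It simply lists every offset at which removing the extra bases from the read gives back the consensus. The function name equivalent_insertions is hypothetical, and offsets are counted as the number of consensus bases before the insertion.

```shell
#!/bin/sh
# Hypothetical helper: list all equivalent placements of a single insertion.
# $1 = reference (consensus) sequence, $2 = read carrying the insertion
equivalent_insertions() {
  awk -v ref="$1" -v rd="$2" 'BEGIN {
    len = length(rd) - length(ref)            # number of inserted bases
    for (k = 0; k <= length(ref); k++) {
      # drop len bases after the first k bases and compare with the reference
      rest = substr(rd, 1, k) substr(rd, k + len + 1)
      if (rest == ref) print k, substr(rd, k + 1, len)
    }
  }'
}

equivalent_insertions CAAGATCTCTGGCAGTGGAGG CAAGATCTCTGCCTGGCAGTGGAGG
```

For these sequences the script prints four placements, from "8 CTGC" to "11 CCTG". They all describe the same read; the first one (smallest offset) is the left-aligned representation.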

• Czech RepublicPosts: 3Member

Thanks a lot. I was not aware of the left-align convention. The problem is that there are several different variants present and they are all misaligned. I thought I would be able to realign them properly, but I will probably have to make some kind of conversion table :-).

• UKPosts: 2Member

For realignment of amplicon reads with a very small target ROI (~3 kb), can I omit the RealignerTargetCreator step and submit the entire ROI to the IndelRealigner? Would there be any negative consequences of doing this (apart from performance)?

Many thanks
Matt

• Posts: 442Member, GSA Collaborator ✭✭✭✭
edited July 2014

@mlyon - If memory serves, IndelRealigner will only attempt a single realignment per target. So if your amplicon contains multiple indels, it would only clean up one of them

Post edited by pdexheimer on
• Posts: 7,408Administrator, GATK Developer admin

Hi @mlyon

@pdexheimer is correct, so you're better off running the RTC step. On such a small area it will take a very short time, so I can't really see the point of skipping it.

Geraldine Van der Auwera, PhD

• mghandi1Posts: 1Member
edited September 2014

Thanks for the great tool.
The last sentence in the tutorial can be a bit confusing: "Note that here, we didn’t include the -L 20 argument. It's not necessary ...". If you add -L 20, it will limit the output to chr 20 only (other reads will be filtered out in the output bam) so the output will not be the same.

Post edited by mghandi on
• Broad InstitutePosts: 1,062Member, GATK Developer, Broadie, Moderator admin

Hi,

The last sentence is meant to point out that even if you used the -L argument in RealignerTargetCreator, you do not need to use it when running IndelRealigner. This is because IndelRealigner only realigns around the intervals given in the output file of RealignerTargetCreator. So, if you have already specified the intervals you want to focus on in RealignerTargetCreator, you do not need to supply them again to IndelRealigner.

I hope this clarifies things.

-Sheila

• GermanyPosts: 29Member

Hello,
In the 2.8 bundle (hg19), there are two dbsnp vcf files (dbsnp_138 and dbsnp_130.excluding_sites_after_129). Which of these is recommended to be used with the base recalibrator (knownsites)?
Thanks,

• Broad InstitutePosts: 1,062Member, GATK Developer, Broadie, Moderator admin

Hi,

The dbsnp_138 file is the basic recommendation for our users. The other file, dbsnp_130.excluding_sites_after_129, is a specially modified callset containing the set of sites from dbsnp 129, but updated with annotations from 130. It's meant to be used to replicate some findings from the original GATK paper, so it is more for internal Broad purposes.

-Sheila

• GermanyPosts: 29Member

thanks!

• Warwick University, CoventryPosts: 4Member
edited February 19

I'm thinking about performing local realignment around indels with one of my own datasets. It turns out that I have a VCF for structural variants that includes insertions and deletions (but also other things like copy number variation, gene duplication and tandem duplication). I'm just wondering if simply removing all of the lines for types of structural variant other than insertions and deletions is going to be a safe way of creating a VCF with known indels only?

Also, I'm assuming that it is undesirable to do realignment around the other types of structural variant, but perhaps that is wrong (as I guess some of these structural variants have various things in common with indels and possibly some of the same issues)? Indeed, having a look at the notes in the presentation, it seems likely that it would be desirable to do realignment around all of the structural variants I mentioned, but it's not clear what the best options are to use with the IndelRealigner (e.g. should I use the full Smith-Waterman realignment)?

(By the way, I have two examples of structural variants that are described in my VCF file as "complex structural alterations". I'm not sure what those are or if they are even remotely like indels?)

William

Post edited by WVNicholson on
• Broad InstitutePosts: 1,062Member, GATK Developer, Broadie, Moderator admin

@WVNicholson
Hi William,

You should be able to simply input your file as it is, but if you want, you can use SelectVariants to select the indels only: https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_variantutils_SelectVariants.php
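
For a quick look at what an indel-only filter would keep, here is a rough stand-in written with awk. It is only a sketch and not a replacement for SelectVariants. It relies on the standard tab-delimited VCF layout (REF in column 4, ALT in column 5) and keeps a record when at least one ALT allele differs in length from REF. Symbolic alleles such as <DUP>, <DEL> or <INS> and breakend notation are skipped, so structural variants written that way are dropped. The function name select_indels is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print header lines plus records that contain at least
# one sequence-resolved indel allele (ALT length differs from REF length).
select_indels() {
  awk -F '\t' '
    /^#/ { print; next }                   # pass header lines through
    {
      n = split($5, alts, ",")             # ALT may list several alleles
      for (i = 1; i <= n; i++) {
        a = alts[i]
        # skip symbolic, breakend, missing and spanning-deletion alleles
        if (a ~ /^</ || index(a, "[") || index(a, "]") || a == "." || a == "*")
          continue
        if (length(a) != length($4)) { print; next }
      }
    }' "$1"
}

# Example call (the file name is a placeholder):
# select_indels structural_variants.vcf > indels_only.vcf
```

Comparing the number of records before and after filtering gives a feel for how much of the file is made up of simple insertions and deletions.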

-Sheila

• Posts: 9Member

Hi,

I was wondering what the differences are between IndelRealigner and LeftAlignAndTrimVariants.
Is running one of the two sufficient for accurate indel calling?
I have my own understanding of the two modules, but I would like an official clarification from the GATK team.

Thanks in advance!

• Posts: 7,408Administrator, GATK Developer admin

Hi @hoosier060,

Those are two completely different tools -- IndelRealigner cleans up indels in BAM files (important for calling) and LAATV normalizes indel representations in VCF files (typically used for files generated with a different program or after subsetting).

Geraldine Van der Auwera, PhD
