# (howto) Perform local realignment around indels

edited July 2013

#### Objective

Perform local realignment around indels to correct mapping-related artifacts.

#### Prerequisites

• TBD

#### Steps

1. Create a target list of intervals to be realigned
2. Perform realignment of the target intervals

### 1. Create a target list of intervals to be realigned

#### Action

Run the following GATK command:

java -jar GenomeAnalysisTK.jar \
-T RealignerTargetCreator \
-R reference.fa \
-I dedup_reads.bam \
-L 20 \
-known gold_indels.vcf \
-o target_intervals.list


#### Expected Result

This creates a file called target_intervals.list containing the list of intervals that the program identified as needing realignment within our target, chromosome 20.

The known indel sites listed in gold_indels.vcf are used as targets for realignment. Only pass -known if such a list exists for your organism.
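Before moving on, you may want a quick idea of how many targets step 1 flagged and how much sequence they cover. A minimal bash/awk sketch follows; summarize_targets is a hypothetical helper, not part of GATK, and it assumes the list holds one interval per line written as contig:start-stop, or contig:pos for a single base. Check the format of your own file before relying on it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count intervals and total bases in an interval list.
# Assumes one interval per line as "contig:start-stop" or "contig:pos";
# coordinates are treated as 1-based and inclusive. Blank lines are skipped.
summarize_targets() {
  local list_file="$1"
  awk '
    NF == 0 { next }                     # skip blank lines
    {
      n = split($1, parts, ":")          # last field holds the coordinates
      m = split(parts[n], se, "-")       # "start-stop" or a single "pos"
      start = se[1]
      stop = (m > 1) ? se[2] : se[1]
      total += stop - start + 1
      count++
    }
    END { printf "intervals=%d total_bp=%d\n", count, total }
  ' "$list_file"
}
```

Usage: summarize_targets target_intervals.list prints a single line of the form intervals=N total_bp=M.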

### 2. Perform realignment of the target intervals

#### Action

Run the following GATK command:

java -jar GenomeAnalysisTK.jar \
-T IndelRealigner \
-R reference.fa \
-I dedup_reads.bam \
-targetIntervals target_intervals.list \
-known gold_indels.vcf \
-o realigned_reads.bam


#### Expected Result

This creates a file called realigned_reads.bam containing all the original reads, but with better local alignments in the regions that were realigned.

Note that here we didn't include the -L 20 argument. It's not necessary, since the program only runs on the target intervals we provide.
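The two steps share the reference, the input reads and the known sites, and only the first one takes -L. A minimal bash sketch of chaining them follows, assuming the input reads are passed to both tools with -I (IndelRealigner needs reads to realign). The function run_realignment and the RUN and GATK_JAR variables are hypothetical names for illustration, not part of GATK.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the two GATK 3 commands in this howto.
# Arguments: reference, input BAM, known-indels VCF (may be empty),
# region for step 1, target interval list, output BAM.
# Set RUN=echo to print the commands instead of executing them.
run_realignment() {
  local ref="$1" bam="$2" known="$3" region="$4" targets="$5" out_bam="$6"
  local jar="${GATK_JAR:-GenomeAnalysisTK.jar}"
  local run="${RUN:-}"

  # Pass -known only when a known-indels file was supplied.
  local known_args=()
  if [[ -n "$known" ]]; then
    known_args=(-known "$known")
  fi

  # Step 1: -L restricts target discovery to the region of interest.
  $run java -jar "$jar" -T RealignerTargetCreator -R "$ref" -I "$bam" \
    -L "$region" "${known_args[@]}" -o "$targets" || return 1

  # Step 2: no -L; IndelRealigner only works on the listed targets.
  $run java -jar "$jar" -T IndelRealigner -R "$ref" -I "$bam" \
    -targetIntervals "$targets" "${known_args[@]}" -o "$out_bam"
}
```

For example, RUN=echo run_realignment reference.fa dedup_reads.bam gold_indels.vcf 20 target_intervals.list realigned_reads.bam prints both command lines so you can check the arguments before a real run. Passing an empty string as the known-indels argument drops -known, for organisms without such a list.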

Geraldine Van der Auwera, PhD

• Posts: 2 · Member

Please have a look at the FAQ article on recommended known sets to use per tool.

Geraldine Van der Auwera, PhD

• Czech Republic · Posts: 3 · Member

I'd like to analyze data from malignant cells carrying a 4 bp somatic indel, but BWA does not align it properly. I tried to realign it with the indel realignment procedure described in the Best Practices and in this howto, but without success. I also edited the final VCF so that it contains the correct indel instead of the wrong one, and used that file as the known-indels input in both steps of indel realignment, but the indel was still not realigned. The sequences are:

Consensus:
CAAGATCTCTGGCAGTGGAGG

Alignment according to bwa mem (indel is bold):
CAAGATCTCTGCCTGGCAGTGGAGG

Real indel (indel is bold):
CAAGATCTCTGCCTGGCAGTGGAGG

Could you please help me set up indel realignment so that it works the way I'd like?

Thank you.

• Posts: 322 · Member, GSA Collaborator ✭✭✭

I'm confused by this: the two sequences you posted are exactly the same, and only the position of the reported indel in them differs. In terms of identifying the correct indel, you're already there.

GATK's convention is to left-align indels, that is, to report the smallest coordinate that correctly represents the variation. I don't believe that can be changed. Ultimately, you just need to make sure that all of your indels are represented the same way (always the smallest, or always the largest, correct coordinate). The simplest way to do that is to run your comparison VCFs through LeftAlignAndTrimVariants.
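To make the left-alignment convention concrete, here is a minimal bash sketch. The helpers left_align_deletion and apply_deletion are hypothetical names, not GATK commands, and the sketch handles only a single deletion (no insertions). It treats the 25-base sequence from the question as the reference and the 21-base consensus as a read carrying a 4 bp deletion; the post doesn't say which is which, so that is an assumption. A deletion can be shifted one base left whenever the base just before it equals its last deleted base, because both placements leave the same sequence; the leftmost placement is the one a left-aligning tool reports. For real VCFs, use LeftAlignAndTrimVariants rather than this sketch.

```bash
#!/usr/bin/env bash
# Minimal sketch: left-align one deletion of LEN bases that starts at
# 1-based position START in a reference string. Illustrative only;
# not GATK's implementation, and insertions are not handled.
left_align_deletion() {
  local ref="$1" start="$2" len="$3"
  while (( start > 1 )); do
    local before="${ref:start-2:1}"    # base at position start-1
    local last="${ref:start+len-2:1}"  # last deleted base (start+len-1)
    # If they match, deleting one base further left gives the same sequence.
    [[ "$before" == "$last" ]] || break
    start=$((start - 1))
  done
  echo "$start"
}

# Print the sequence that remains after deleting LEN bases at 1-based START.
apply_deletion() {
  local ref="$1" start="$2" len="$3"
  echo "${ref:0:start-1}${ref:start+len-1}"
}
```

With ref=CAAGATCTCTGCCTGGCAGTGGAGG, a deletion of GCCT starting at position 11 shifts left to CTGC starting at position 9 (left_align_deletion "$ref" 11 4 prints 9), and apply_deletion returns CAAGATCTCTGGCAGTGGAGG for both placements. Positions here refer to the first deleted base; a VCF record for the same deletion starts one base earlier, at the padding base.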

• Czech Republic · Posts: 3 · Member

Thanks a lot. I was not aware of the left-align convention. The problem is that there are several different variants present and they are all misaligned. I thought I would be able to realign them properly, but I will probably have to make some conversion table :-).