The current GATK version is 3.4-46

# (howto) Perform local realignment around indels

edited July 27

#### Objective

Perform local realignment around indels to correct mapping-related artifacts.

• TBD

#### Steps

1. Create a target list of intervals to be realigned
2. Perform realignment of the target intervals

### 1. Create a target list of intervals to be realigned

#### Action

Run the following GATK command:

    java -jar GenomeAnalysisTK.jar \
        -T RealignerTargetCreator \
        -R reference.fa \
        -I dedup_reads.bam \
        -L 20 \
        -known gold_indels.vcf \
        -o realignment_targets.list


#### Expected Result

This creates a file called realignment_targets.list containing the list of intervals that the program identified as needing realignment within our target, chromosome 20.

The list of known indel sites (gold_indels.vcf) is used to define targets for realignment. Only supply the -known argument if such a list exists for your organism.
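To get a feel for what the target creator found, you can summarize the interval file before moving on. The sketch below assumes the usual `chr:start-stop` layout (a single position appears as `chr:pos`). The sample file name and coordinates are made up for illustration, and the field splitting would need adjusting for contig names that contain `:` or `-`.

```shell
# Made-up sample in the chr:start-stop layout; replace with your own realignment_targets.list.
cat > sample_targets.list <<'EOF'
20:100200-100210
20:250731
20:300015-300040
EOF

# Count intervals and total bases covered; a line without "-" is a single position.
summary=$(awk -F'[:-]' '{ n++; bp += (NF == 3 ? $3 - $2 + 1 : 1) }
  END { printf "%d intervals, %d bp", n, bp }' sample_targets.list)
echo "$summary"   # prints: 3 intervals, 38 bp
```

A target list that is empty or unexpectedly small is worth checking before running the realignment step.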

### 2. Perform realignment of the target intervals

#### Action

Run the following GATK command:

    java -jar GenomeAnalysisTK.jar \
        -T IndelRealigner \
        -R reference.fa \
        -I dedup_reads.bam \
        -targetIntervals realignment_targets.list \
        -known gold_indels.vcf \
        -o realigned_reads.bam

#### Expected Result

This creates a file called realigned_reads.bam containing all the original reads, but with better local alignments in the regions that were realigned.

Note that here, we didn’t include the -L 20 argument. It's not necessary since the program will only run on the target intervals we are providing.

Geraldine Van der Auwera, PhD

• Posts: 2, Member

Please have a look at the FAQ article on recommended known sets to use per tool.

Geraldine Van der Auwera, PhD

• Czech Republic, Posts: 3, Member

I'd like to analyze data from malignant cells that contain a 4 bp somatic indel, but BWA does not align it properly. I tried to realign it with the indel realignment procedure from the Best Practices and this howto, but I was not successful. I then edited the final VCF file so that it contains the proper indel instead of the wrong one and supplied it as the known-indels VCF in both realignment steps, but the indel was still not realigned. The sequences are:

Consensus:
CAAGATCTCTGGCAGTGGAGG

Alignment according to bwa mem (indel is bold):
CAAGATCTCTGCCTGGCAGTGGAGG

Real indel (indel is bold):
CAAGATCTCTGCCTGGCAGTGGAGG

Could you please help me set up indel realignment so that it works the way I'd like?

Thank you.

• Posts: 502, Member, GATK Dev, DSDE Dev mod

I'm confused by this - the two sequences you posted are exactly the same, only the position of the reported indel in them differs. But in terms of identifying the correct indel, you're already there.

The GATK's convention is to left-align indels - that is, report the smallest coordinate that correctly represents the variation. I don't believe that can be changed. Ultimately, you just need to make sure that all of your indels are treated the same way (always use the smallest or the largest correct coordinate). The simplest way to do that is to run your comparison VCFs through LeftAlignAndTrimVariants.
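To make the point about equivalent positions concrete, here is a small sketch (not GATK code) that takes the consensus and the read sequence posted above and lists every placement of the 4 bp insertion that reproduces the read. The variable names are made up for illustration. The first placement printed is the left-aligned one.

```shell
ref=CAAGATCTCTGGCAGTGGAGG        # consensus from the post above
obs=CAAGATCTCTGCCTGGCAGTGGAGG    # read containing the insertion

placements=$(awk -v ref="$ref" -v obs="$obs" 'BEGIN {
  len = length(obs) - length(ref)            # inserted length, 4 here
  for (p = 0; p <= length(ref); p++) {
    # An insertion after the first p reference bases is valid if both flanks match.
    if (substr(obs, 1, p) == substr(ref, 1, p) &&
        substr(obs, p + len + 1) == substr(ref, p + 1))
      printf "%s%d:%s", (n++ ? " " : ""), p, substr(obs, p + 1, len)
  }
}')
echo "$placements"   # prints: 8:CTGC 9:TGCC 10:GCCT 11:CCTG
```

Each entry is "number of reference bases before the insertion : inserted bases". All four describe the same read, which is why two callers can report the same event at different coordinates.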

• Czech Republic, Posts: 3, Member

Thanks a lot. I was not aware of the left-align convention. The problem is that there are several different variants present and they are all misaligned. I thought I would be able to realign them properly, but I will probably have to make a conversion table :-).

• UK, Posts: 2, Member

For realignment of amplicon reads with a very small target ROI (~3kb) can I omit the RealignerTargetCreator step and submit the entire ROI to the IndelRealigner? Would there be any negative consequence of doing this (apart from performance)?

Many thanks
Matt

• Posts: 502, Member, GATK Dev, DSDE Dev mod
edited July 2014

@mlyon - If memory serves, IndelRealigner will only attempt a single realignment per target. So if your amplicon contains multiple indels, it would only clean up one of them.

Hi @mlyon

@pdexheimer is correct, so you're better off running the RTC step. On such a small area it will take a very short time, so I can't really see the point of skipping it.

Geraldine Van der Auwera, PhD

• mghandi1, Posts: 1, Member
edited September 2014

Thanks for the great tool.
The last sentence in the tutorial can be a bit confusing: "Note that here, we didn’t include the -L 20 argument. It's not necessary ...". If you add -L 20, it will limit the output to chr 20 only (other reads will be filtered out in the output bam) so the output will not be the same.

@mghandi

Hi,

The last sentence is meant to point out that even if you used the -L argument in RealignerTargetCreator, you do not need to use it when running IndelRealigner. This is because IndelRealigner only realigns around the intervals given in the output file of RealignerTargetCreator. So, if you have already specified the intervals you want to focus on in RealignerTargetCreator, you do not need to input them again into IndelRealigner.

I hope this clarifies things.

-Sheila

• Germany, Posts: 47, Member

Hello,
In the 2.8 bundle (hg19), there are two dbsnp VCF files (dbsnp_138 and dbsnp_130.excluding_sites_after_129). Which of these is recommended for use with BaseRecalibrator (-knownSites)?
Thanks,

@simonsanchezj

Hi,

The dbsnp_138 file is the basic recommendation for our users. The other file, dbsnp_130.excluding_sites_after_129, is a specially modified callset containing the set of sites from dbsnp 129, but updated with annotations from 130. It's meant to be used to replicate some findings from the original GATK paper, so it is more for internal Broad purposes.

-Sheila

• Germany, Posts: 47, Member

thanks!

• Warwick University, Coventry, Posts: 14, Member
edited February 19

I'm thinking about performing local realignment around indels with one of my own datasets. It turns out that I have a VCF for structural variants that includes insertions and deletions (but also other things like copy number variation, gene duplication and tandem duplication). I'm just wondering if simply removing all of the lines for types of structural variant other than insertions and deletions is going to be a safe way of creating a VCF with known indels only?

Also, I'm assuming that it is undesirable to do realignment around the other types of structural variant; but perhaps that is wrong (as I guess some of these structural variants have various things in common with indels and possibly some of the same issues)? Indeed, having a look at the notes in the presentation, it seems likely that it would be desirable to do realignment around all of the structural variants I mentioned; but it's not clear what the best options are to use with the IndelRealigner (e.g. should I use the full Smith-Waterman realignment)?

(By the way, I have two examples of structural variants that are described in my VCF file as "complex structural alterations" - I'm not sure what those are or if they are even remotely like indels?)

William

@WVNicholson
Hi William,

You should be able to simply input your file as it is, but if you want, you can use SelectVariants to select the indels only. https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_variantutils_SelectVariants.php

-Sheila
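As a rough sketch of the SelectVariants route, the command could look like the one below. The input and output file names are placeholders, not files from this thread. `-selectType INDEL` keeps only records that GATK classifies as indels, so structural variants written with symbolic alleles (for example `<DUP>`) would most likely be dropped rather than converted.

```shell
# Placeholder file names; adjust to your own reference and VCF.
select_cmd="java -jar GenomeAnalysisTK.jar -T SelectVariants \
-R reference.fa -V structural_variants.vcf \
-selectType INDEL -o known_indels.vcf"

# Print the command for review; once the paths are real, run it with: $select_cmd
echo "$select_cmd"
```

The resulting known_indels.vcf can then be passed to -known in both realignment steps.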

• Posts: 9, Member

Hi,

I was wondering what the differences are between IndelRealigner and LeftAlignAndTrimVariants.
Is running one of the two sufficient for accurate indel calling?
I have my own understanding of the two modules but would like official clarification from the GATK team.

Hi @hoosier060,

Those are two completely different tools -- IndelRealigner cleans up indels in BAM files (important for calling) and LAATV normalizes indel representations in VCF files (typically used for files generated with a different program or after subsetting).

Geraldine Van der Auwera, PhD
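For reference, a minimal LeftAlignAndTrimVariants invocation might look like the sketch below. It operates on a VCF, not on a BAM, and the VCF file names here are placeholders chosen for illustration.

```shell
# Placeholder file names; the tool reads a VCF and writes a normalized VCF.
laatv_cmd="java -jar GenomeAnalysisTK.jar -T LeftAlignAndTrimVariants \
-R reference.fa --variant comparison_calls.vcf \
-o comparison_calls.leftaligned.vcf"

# Print the command for review; once the paths are real, run it with: $laatv_cmd
echo "$laatv_cmd"
```

Running every call set you want to compare through the same normalization step keeps equivalent indels at the same coordinate.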

• Posts: 7, Member

@Carneiro @Geraldine_VdAuwera (I've posted this question elsewhere, but I think this is a better place to ask.) I'm using GATK-3.3.0 and want to be clear about the default option for -model.
Is USE_READS the default behavior if not specified? Please update your tool documentation with the default value, which is currently stated as "NA". Thanks.

@SerenaRhie In future please don't post the same question in two different places, it makes the job of maintaining the forum more difficult.

Yes, `USE_READS` is the default value. I'm not sure why the defaults are not getting written properly in the doc; will check & fix.

Geraldine Van der Auwera, PhD
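If you prefer not to rely on the default, the model can be set explicitly with `--consensusDeterminationModel` (short form `-model`). The sketch below reuses the file names from the tutorial above, with dedup_reads.bam standing in for your input BAM.

```shell
model=USE_READS   # documented alternatives: KNOWNS_ONLY, USE_SW (full Smith-Waterman)

realign_cmd="java -jar GenomeAnalysisTK.jar -T IndelRealigner \
-R reference.fa -I dedup_reads.bam \
-targetIntervals realignment_targets.list -known gold_indels.vcf \
--consensusDeterminationModel $model -o realigned_reads.bam"

# Print the command for review; once the paths are real, run it with: $realign_cmd
echo "$realign_cmd"
```

Stating the model on the command line also documents the choice in your pipeline scripts, whatever the tool documentation shows as the default.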

• Posts: 38, Member
edited August 24

Hello,

While creating a target list of intervals to be realigned from my BAM file I am getting the following error message:

##### ERROR MESSAGE: SAM/BAM/CRAM file htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter@7b085bce is malformed: BAM file has a read with mismatching number of bases and base qualities. Offender: HISEQ:429:C68GLACXX:1:1214:18721:100660 [126 bases] [0 quals]. You can use --defaultBaseQualities to assign a default base quality for all reads, but this can be dangerous in you don't know what you are doing.

What should I do to check the error in the BAM file? I have used BWA Mem for realignment and Picardtools for marking duplicates. This is an exome sample.

• Posts: 38, Member

More information regarding this error: I have checked the BAM file with the Picard tools ValidateSamFile function and the program reported no errors. I would appreciate any reply to my query or advice on how to solve this problem.
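One way to see how many reads are affected is to compare the lengths of the SEQ and QUAL columns directly. This is only a diagnostic sketch, not an official fix. The helper name `find_bad_quals` and the two sample records are made up for illustration. The SAM format allows `*` in the QUAL column when no qualities are stored, which may be why a validator accepts a file that GATK rejects.

```shell
# Print name, base count and quality count for reads whose SEQ and QUAL lengths disagree.
# SAM columns: 1 = read name, 10 = SEQ, 11 = QUAL ("*" means no qualities stored).
find_bad_quals() {
  awk -F'\t' '!/^@/ && $10 != "*" {
    nq = ($11 == "*") ? 0 : length($11)
    if (length($10) != nq) print $1, length($10), nq
  }'
}

# Made-up SAM records for illustration: the second one has no qualities.
bad=$(printf 'r1\t0\t20\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\nr2\t0\t20\t200\t60\t4M\t*\t0\t0\tACGT\t*\n' | find_bad_quals)
echo "$bad"   # prints: r2 4 0
```

On a real file, the same helper could be fed with `samtools view your.bam | find_bad_quals | head`, which would show whether the offending read is an isolated case or part of a larger group.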