The current GATK version is 3.8-0
Examples: Monday, today, last week, Mar 26, 3/26/04

#### Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

You can opt in to receive email notifications, for example when your questions get answered or when there are new announcements, by following the instructions given here.

#### ☞ Got a problem?

1. Search using the upper-right search box, e.g. using the error message.
3. Include tool and Java versions.
4. Tell us whether you are following GATK Best Practices.
5. Include relevant details, e.g. platform, DNA- or RNA-Seq, WES (+capture kit) or WGS (PCR-free or PCR+), paired- or single-end, read length, expected average coverage, somatic data, etc.
6. For tool errors, include the error stacktrace as well as the exact command.
7. For format issues, include the result of running ValidateSamFile for BAMs or ValidateVariants for VCFs.
8. For weird results, include an illustrative example, e.g. attach IGV screenshots according to Article#5484.
9. For a seeming variant that is uncalled, include results of following Article#1235.

#### ☞ Formatting tip!

Wrap blocks of code, error messages and BAM/VCF snippets--especially content with hashes (#)--with lines with three backticks ( ` ) each to make a code block as demonstrated here.

GATK version 4.beta.3 (i.e. the third beta release) is out. See the GATK4 beta page for download and details.

# Indel realigner introducing errors

Member

Hi

I'm indel realigning with version 2.4-9 using generic commands such as:

java -Xmx4g -jar /path/to/GenomeAnalysisTK.jar \
-T RealignerTargetCreator \
-R /path/to/reference.fasta \
-I /path/to/input.bam \
-o /path/to/realigner.intervals

java -Xmx4g -jar /path/to/GenomeAnalysisTK.jar \
-T IndelRealigner \
-R /path/to/reference.fasta \
-I /path/to/sample-level.bam \
-targetIntervals /path/to/realigner.intervals.from.rtc \
-o /path/to/realigned.bam \
-model USE_SW \
-LOD 0.4

In most cases this is working fine, but in a few cases it is introducing artefacts that subsequently cause the bam file to fail Picard's ValidateSamFile, and in a couple of cases it introduces errors that can't be fixed by CleanSam and/or FixMateInformation.

Here's an example of an error that can be fixed by CleanSam:

Before indel realignment:

HS24_08564:7:2311:4630:19372#87 69 AAKM01002546 471 0 * = 471 0
HS24_08564:7:2311:4630:19372#87 137 AAKM01002546 471 25 100M = 471 0

After indel realignment:

HS24_08564:7:2311:4630:19372#87 69 AAKM01002546 471 0 * = 471 0
HS24_08564:7:2311:4630:19372#87 137 AAKM01002546 471 35 91M1D9M = 471 0

Here's an example of an error that can't be fixed:

Before indel realignment:

HS24_10061:6:1312:10172:98346#54 69 AAKM01002280 649 0 * = 649 0
HS24_10061:6:1312:10172:98346#54 137 AAKM01002280 649 37 100M = 649 0

After indel realignment:

HS24_10061:6:1312:10172:98346#54 69 AAKM01002280 649 0 * = 649 0
HS24_10061:6:1312:10172:98346#54 137 AAKM01002280 649 47 91M2D9M0S = 649 0

Is this a known bug? Any chance of a fix?

Thanks!

Richard

Tagged:

• Member

BTW, also have the same issue with version 2.7-4

Hi Richard, good to hear from you.

1) We can definitely take a look and patch it up if you are able to upload a small bam snippet that easily reproduces the error(s). See the posting about uploading to our ftp for more details.

2) I'm embarrassed to admit that I don't see the error in the first example. In the second example, is the problem that there's a trailing 0S? While that seems weird, is it technically an error? (In any case, we'll fix it of course - but I'm surprised that it fails validation)

3) Keep in mind that using the Smith-Waterman capabilities of the realigner with such a low LOD is really dangerous. That LOD is more appropriate when realigner only around known indels. Please see the best practices for more details.

• Member

1) OK, I've now uploaded a tarball (GATK_0S_bug_reproducible_example.tar.gz). See the script indel_realignment_bug_for_Eric_Banks.sh in here for details.

2) The first example causes errors such as the following with ValidateSamFile

ERROR: Record 80, Read name HS24_08564:7:2311:4630:19372#87, CIGAR M operator maps off end of reference

I think it probably is the trailing 0S that is causing CleanSam problems in the second example. It seems like Alec Wysoker thinks this is fixable in Picard so that CleanSam should be able to clean up such problems (http://sourceforge.net/mailarchive/forum.php?thread_name=90E74C31-3DEC-4773-BFC9-45AEA9209F87@broadinstitute.org&amp;forum_name=samtools-help), but would still be good to ensure GATK isn't introducing such errors in the first place.

3) Thanks for this. I'll look at the best practices again and reconsider

Best wishes and thanks for your help, Richard