Problem with IndelRealigner assigning incorrect mate information (GATK v3.4-46-gbc02625)

kw10, Cambridge, Member

IndelRealigner seems to assign mate information incorrectly when a supplementary alignment is on the same chromosome as one of the primary alignments and the two primary alignments are on different chromosomes. For example:

(The first 9 columns of the SAM lines for read HS36_15753:4:1106:13106:57119#25)

HS36_15753:4:1106:13106:57119#25 2209 12 2870282 60 37H38M = 2870346 0 # supplementary, second read; mate info (columns 7-9) matches that of the other part of the second read (flag 161)
HS36_15753:4:1106:13106:57119#25 81 12 2870346 60 75M hs37d5 32110385 0 # first read in a pair
HS36_15753:4:1106:13106:57119#25 161 hs37d5 32110385 60 45M30S 12 2870346 0 # first part of the second read
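
For anyone reading the flags in the records above, here is a minimal sketch that decodes them. The bit values are the standard ones from the SAM specification; the names `SAM_FLAGS` and `decode_flag` are just illustrative and not part of GATK or Picard.

```python
# Standard SAM FLAG bits, as defined in the SAM specification.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of the bits set in a SAM FLAG value (hypothetical helper)."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


for flag in (2209, 81, 161):
    print(flag, decode_flag(flag))
```

So 2209 is simply 161 plus the supplementary bit (2048), which is why both records belong to the second read, while 81 is the reverse-strand first read.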

After realignment this becomes:

HS36_15753:4:1106:13106:57119#25 2209 12 2870282 60 37H38M = 2870346 139 # mate coord is correct but insert size is changed to 139
HS36_15753:4:1106:13106:57119#25 81 12 2870346 60 75M = 2870282 -139 # first read; mate info says the supplementary alignment is the mate
HS36_15753:4:1106:13106:57119#25 161 hs37d5 32110385 60 45M30S 12 2870346 0 # no change
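
The new insert size of 139 is consistent with the supplementary alignment (flag 2209) and the first read (flag 81) being treated as a pair on chromosome 12. The sketch below reproduces that number from the positions and CIGAR strings, assuming the usual SAM convention that the template length spans from the leftmost mapped base to the rightmost mapped base. This is only my reading of where the value comes from, not a statement about what IndelRealigner does internally; the function names are illustrative.

```python
import re


def reference_length(cigar):
    """Reference bases consumed by a CIGAR string (operations M, D, N, = and X)."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    return sum(int(length) for length, op in ops if op in "MDN=X")


def observed_template_length(pos_a, cigar_a, pos_b, cigar_b):
    """Span from the leftmost start to the rightmost end of two alignments.

    Positions are 1-based, so pos + reference_length is an exclusive end.
    """
    start = min(pos_a, pos_b)
    end = max(pos_a + reference_length(cigar_a), pos_b + reference_length(cigar_b))
    return end - start


# Supplementary alignment (2209) and first read (81), both on chromosome 12.
print(observed_template_length(2870282, "37H38M", 2870346, "75M"))  # 139
```

The two primary alignments are on different chromosomes (12 and hs37d5), so the insert size for this template should stay 0, as in the input.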

When I run Picard FixMateInformation I get back the original alignments:

HS36_15753:4:1106:13106:57119#25 2209 12 2870282 60 37H38M = 2870346 0
HS36_15753:4:1106:13106:57119#25 81 12 2870346 60 75M hs37d5 32110385 0
HS36_15753:4:1106:13106:57119#25 161 hs37d5 32110385 60 45M30S 12 2870346 0

I noticed this because it causes problems downstream with duplicate marking: duplicates with exactly the same 3 alignments are not getting marked as duplicates.
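
In case it helps with reproducing this, here is a rough sketch of a check for affected reads. It relies on the SAM specification's definition that RNEXT and PNEXT refer to the primary alignment of the next read in the template, and it assumes all records passed in share one read name. The helpers `parse` and `mate_mismatches` are hypothetical and only look at the first 9 columns.

```python
SECONDARY, SUPPLEMENTARY, FIRST_IN_PAIR = 0x100, 0x800, 0x40


def parse(line):
    """Pull the fields needed for the check out of one SAM line."""
    f = line.split()
    return {"flag": int(f[1]), "rname": f[2], "pos": int(f[3]),
            "rnext": f[6], "pnext": int(f[7])}


def mate_mismatches(lines):
    """Return flags of records whose RNEXT/PNEXT do not point at the mate's primary alignment."""
    records = [parse(line) for line in lines]
    # Index the primary alignment of each end: True = first read, False = second read.
    primary = {}
    for r in records:
        if not r["flag"] & (SECONDARY | SUPPLEMENTARY):
            primary[bool(r["flag"] & FIRST_IN_PAIR)] = r
    bad = []
    for r in records:
        mate = primary.get(not (r["flag"] & FIRST_IN_PAIR))
        if mate is None:
            continue  # mate's primary alignment not among the given records
        rnext = r["rname"] if r["rnext"] == "=" else r["rnext"]
        if (rnext, r["pnext"]) != (mate["rname"], mate["pos"]):
            bad.append(r["flag"])
    return bad


name = "HS36_15753:4:1106:13106:57119#25"
before = [
    name + " 2209 12 2870282 60 37H38M = 2870346 0",
    name + " 81 12 2870346 60 75M hs37d5 32110385 0",
    name + " 161 hs37d5 32110385 60 45M30S 12 2870346 0",
]
after = [
    name + " 2209 12 2870282 60 37H38M = 2870346 139",
    name + " 81 12 2870346 60 75M = 2870282 -139",
    name + " 161 hs37d5 32110385 60 45M30S 12 2870346 0",
]
print(mate_mismatches(before))  # []
print(mate_mismatches(after))   # [81]
```

This only compares mate chromosome and position, so it flags the first read (81) after realignment but does not catch the changed insert size on the supplementary record.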

Thanks

Issue · Github (by Sheila)
Issue Number: 71
State: closed
Closed By: vdauwera
