"ERROR:INVALID_TAG_NM" possibly causing CollectWgsMetrics to fail

sp580, Germany, Member

Hello!

I've been trying to run CollectWgsMetrics on my BAM files, but it fails with this message: Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException

I went through some threads dealing with this issue, and the recommendation was to run ValidateSamFile on the BAM. This is what I got for one file:
## HISTOGRAM java.lang.String
Error Type Count
ERROR:INVALID_TAG_NM 1587

I also saw that CollectWgsMetrics always reports the error after the same position in all BAM files: Last read position: 1:193.462.779.

These are the corresponding lines from the same BAM file that was run through ValidateSamFile:
A00556:11:HGLC7DSXX:3:1428:32669:14168 163 1 193462779 40 151M = 193463115 487 GGTTGAGTCCTTAGGGTTTTCTATGTATAGTATCATGTCATCTGCATACAGTGACAGTTTGATCTCTTCTCTTCCTATATGGATGCCTTTTATTTCTTTTGTTTGTCTAATTGCTGTGGCTAGGACTTCCAAAACTATGTTGAAGAGCAGT FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FF::FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FF:FFFFFFFFF:FFFFFFF:FFFFFFFFFFFFFFFFFF,FF MC:Z:151M MD:Z:151 PG:Z:MarkDuplicates RG:Z:HGLC7DSXX.3 NM:i:0AS:i:151 XS:i:151
A00556:11:HGLC7DSXX:2:2517:5168:19711 83 1 193462779 11 2S149M = 193462532 -396 TGGGTTGAGTCCTTAGGGTTTTCTATGTATAGTACCATGTCATCTGCATACAGTGACAGTTTGATCTCTTCTCTTCCTATATGGATGCCTTTTATTTCTTTTGTTTGTCTAATTGCTGTGGCCAGGACTTCCAAAACTATGTTGAAGAGCA FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF MC:Z:151M MD:Z:32T87T28 PG:Z:MarkDuplicates RG:Z:HGLC7DSXX.2 NM:i:2AS:i:139 XS:i:146
A00556:11:HGLC7DSXX:3:2330:29369:7921 83 1 193462779 0 4S147M = 193462581 -345 TTGGGGTTGAGTCCTTAGGGTTTTCTATGTATAGTATCATGTCATCTGCATACAGTGACAGTTTGATCTCTTCTCTTCCCATATGGATGCCTTTTATTTCTTTTGTTTGTCTAATTGCTGTGGCTAGGACTTCCAAAACTATGTTGAAGAG FFF,F:FFFFFFFF:FFFFF,F,F:F:FFFFFFFFFF::FFFFFFFFFF:FFFFFFFFFFFFF::FFFFFFFF:FFFFFFFFFFF,FFF:FFFFFFFFFF,FFFF:FF:FFFF:FFFFFFFFFFFFF::FFFFFF,FFFFFFFFFF:,F,, MC:Z:151M MD:Z:75T71 PG:Z:MarkDuplicates RG:Z:HGLC7DSXX.3 NM:i:1AS:i:142 XS:i:146
A00556:11:HGLC7DSXX:4:1376:31602:35650 83 1 193462779 3 4S147M = 193462570 -356 TTTGGGTTGAGTCCTTAGGGTTTTCTATGTATAGTATCATGTCATCTGCATACAGTGACAGTTTGATCTCTTCTCTTCCTATATGGATGCCTTTTATTTCTTTTGTTTGTCTAATTGCTGTGGCTAGGACTTCCAAAACTATGTCGAAGAG FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF MC:Z:151M MD:Z:140T6 PG:Z:MarkDuplicates RG:Z:HGLC7DSXX.4 NM:i:1AS:i:142 XS:i:146
A00556:11:HGLC7DSXX:4:2266:24867:32816 147 1 193462779 10 2S149M = 193462538 -390 GGGGTTGAGTCCCTAGGGTTTTCTATGTATAGTATCATGTCATCTGCATACAGTGACAGTTTGATCTCTTCTCTTCCTATATGGATGCCTTTTATTTCTTTTGTTTGTCTAATTGCTGTGGCTAGGACTTCCAAAACGATGTTGAAGAGCA FFFFFFFFFFFFFFFF:FF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFF,FFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF MC:Z:151M MD:Z:10T124T13 PG:Z:MarkDuplicates RG:Z:HGLC7DSXX.4 NM:i:2AS:i:139 XS:i:146
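
As a quick sanity check, the NM value of each record can be compared with what its CIGAR and MD tags imply, since NM should equal the number of mismatches plus inserted and deleted bases. Below is a minimal Python sketch of that comparison; the function name `nm_from_cigar_and_md` is made up for illustration and is not part of Picard or any other tool. It only tests whether the tags agree with each other. As far as I understand, ValidateSamFile checks NM against the reference sequence itself, so tags that agree here could still be flagged if they were computed against a different reference.

```python
import re

def nm_from_cigar_and_md(cigar: str, md: str) -> int:
    """Edit distance implied by CIGAR and MD (hypothetical helper).

    NM = mismatches + inserted bases + deleted bases.
    """
    # Inserted bases are only visible in the CIGAR string (I operations).
    inserted = sum(
        int(length)
        for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
        if op == "I"
    )
    # In MD, a deletion is written as '^' followed by the deleted reference bases.
    deleted = sum(len(bases) for bases in re.findall(r"\^([A-Za-z]+)", md))
    # After removing deletions, every remaining letter is a mismatched base.
    md_without_deletions = re.sub(r"\^[A-Za-z]+", "", md)
    mismatches = len(re.findall(r"[A-Za-z]", md_without_deletions))
    return mismatches + inserted + deleted

# (CIGAR, MD, reported NM) copied from the five records above.
records = [
    ("151M", "151", 0),
    ("2S149M", "32T87T28", 2),
    ("4S147M", "75T71", 1),
    ("4S147M", "140T6", 1),
    ("2S149M", "10T124T13", 2),
]

for cigar, md, reported_nm in records:
    implied_nm = nm_from_cigar_and_md(cigar, md)
    status = "agree" if implied_nm == reported_nm else "DISAGREE"
    print(f"{cigar}\t{md}\treported={reported_nm}\timplied={implied_nm}\t{status}")
```

For the five records above, the reported NM matches the value implied by CIGAR and MD in every case, so the tags are at least consistent with each other.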

I cannot, however, figure out what the problem with the NM tag is, and I would really appreciate some suggestions.

Thanks in advance!

PS: Previously I was working with sequences produced by HiSeq4000, and this problem did not appear. Now I am working with sequences produced by NovaSeq.
