
"ERROR:INVALID_TAG_NM" possibly causing CollectWgsMetrics to fail

sp580 (Germany, Member)

Hello!

I've been trying to run CollectWgsMetrics on my BAM files, but it fails with this message: Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException

I went through some threads dealing with this issue, and the recommendation was to run ValidateSamFile on the BAM. This is what I got for one file:
## HISTOGRAM java.lang.String
Error Type Count
ERROR:INVALID_TAG_NM 1587

I also saw that CollectWgsMetrics always reports the error after the same position in all BAM files: Last read position: 1:193.462.779.

These are the reads at that position in the same BAM file that was run through ValidateSamFile:
A00556:11:HGLC7DSXX:3:1428:32669:14168 163 1 193462779 40 151M = 193463115 487 GGTTGAGTCCTTAGGGTTTTCTATGTATAGTATCATGTCATCTGCATACAGTGACAGTTTGATCTCTTCTCTTCCTATATGGATGCCTTTTATTTCTTTTGTTTGTCTAATTGCTGTGGCTAGGACTTCCAAAACTATGTTGAAGAGCAGT FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FF::FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FF:FFFFFFFFF:FFFFFFF:FFFFFFFFFFFFFFFFFF,FF MC:Z:151M MD:Z:151 PG:Z:MarkDuplicates RG:Z:HGLC7DSXX.3 NM:i:0AS:i:151 XS:i:151

A00556:11:HGLC7DSXX:2:2517:5168:19711 83 1 193462779 11 2S149M = 193462532 -396 TGGGTTGAGTCCTTAGGGTTTTCTATGTATAGTACCATGTCATCTGCATACAGTGACAGTTTGATCTCTTCTCTTCCTATATGGATGCCTTTTATTTCTTTTGTTTGTCTAATTGCTGTGGCCAGGACTTCCAAAACTATGTTGAAGAGCA FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF MC:Z:151M MD:Z:32T87T28 PG:Z:MarkDuplicates RG:Z:HGLC7DSXX.2 NM:i:2AS:i:139 XS:i:146

A00556:11:HGLC7DSXX:3:2330:29369:7921 83 1 193462779 0 4S147M = 193462581 -345 TTGGGGTTGAGTCCTTAGGGTTTTCTATGTATAGTATCATGTCATCTGCATACAGTGACAGTTTGATCTCTTCTCTTCCCATATGGATGCCTTTTATTTCTTTTGTTTGTCTAATTGCTGTGGCTAGGACTTCCAAAACTATGTTGAAGAG FFF,F:FFFFFFFF:FFFFF,F,F:F:FFFFFFFFFF::FFFFFFFFFF:FFFFFFFFFFFFF::FFFFFFFF:FFFFFFFFFFF,FFF:FFFFFFFFFF,FFFF:FF:FFFF:FFFFFFFFFFFFF::FFFFFF,FFFFFFFFFF:,F,, MC:Z:151M MD:Z:75T71 PG:Z:MarkDuplicates RG:Z:HGLC7DSXX.3 NM:i:1AS:i:142 XS:i:146

A00556:11:HGLC7DSXX:4:1376:31602:35650 83 1 193462779 3 4S147M = 193462570 -356 TTTGGGTTGAGTCCTTAGGGTTTTCTATGTATAGTATCATGTCATCTGCATACAGTGACAGTTTGATCTCTTCTCTTCCTATATGGATGCCTTTTATTTCTTTTGTTTGTCTAATTGCTGTGGCTAGGACTTCCAAAACTATGTCGAAGAG FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF MC:Z:151M MD:Z:140T6 PG:Z:MarkDuplicates RG:Z:HGLC7DSXX.4 NM:i:1AS:i:142 XS:i:146

A00556:11:HGLC7DSXX:4:2266:24867:32816 147 1 193462779 10 2S149M = 193462538 -390 GGGGTTGAGTCCCTAGGGTTTTCTATGTATAGTATCATGTCATCTGCATACAGTGACAGTTTGATCTCTTCTCTTCCTATATGGATGCCTTTTATTTCTTTTGTTTGTCTAATTGCTGTGGCTAGGACTTCCAAAACGATGTTGAAGAGCA FFFFFFFFFFFFFFFF:FF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFF,FFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF MC:Z:151M MD:Z:10T124T13 PG:Z:MarkDuplicates RG:Z:HGLC7DSXX.4 NM:i:2AS:i:139 XS:i:146

However, I cannot figure out what the problem with the NM tag is, and I would really appreciate some suggestions.
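One quick sanity check is to recompute NM from each read's own CIGAR and MD strings. NM is the edit distance to the reference, i.e. mismatches plus inserted plus deleted bases. The sketch below is only an illustration: the function name nm_from_cigar_md is made up for this example, and the (CIGAR, MD, NM) triples are copied by hand from the five reads above. It only tests whether the tags agree with each other. As far as I understand, ValidateSamFile checks NM against the reference sequence it is given, so tags that are internally consistent can still be flagged if the reads were aligned to a different reference than the one used for validation.

```python
import re

def nm_from_cigar_md(cigar: str, md: str) -> int:
    """Recompute NM (mismatches + inserted + deleted bases) from CIGAR and MD.

    Hypothetical helper for illustration; it never looks at the reference.
    """
    # Inserted bases are only visible in the CIGAR string ('I' operations).
    inserted = sum(
        int(length)
        for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
        if op == "I"
    )
    # In MD, a deletion is written as '^' followed by the deleted reference bases.
    deleted = sum(len(bases) for bases in re.findall(r"\^([A-Za-z]+)", md))
    # Any letter left after removing the deletions is a mismatched reference base.
    md_without_deletions = re.sub(r"\^[A-Za-z]+", "", md)
    mismatches = len(re.findall(r"[A-Za-z]", md_without_deletions))
    return mismatches + inserted + deleted

# (CIGAR, MD, reported NM) copied by hand from the five reads in this post.
reads = [
    ("151M", "151", 0),
    ("2S149M", "32T87T28", 2),
    ("4S147M", "75T71", 1),
    ("4S147M", "140T6", 1),
    ("2S149M", "10T124T13", 2),
]

for cigar, md, reported_nm in reads:
    recomputed = nm_from_cigar_md(cigar, md)
    status = "consistent" if recomputed == reported_nm else "MISMATCH"
    print(f"{cigar}\t{md}\treported={reported_nm}\trecomputed={recomputed}\t{status}")
```

If every read comes out as consistent here, the NM values at least match the MD strings, and the next thing to compare would be the reference FASTA used for alignment against the one passed to the validation and metrics tools.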

Thanks in advance!

PS: Previously I was working with sequences produced by a HiSeq 4000, and this problem did not appear. Now I am working with sequences produced by a NovaSeq.
