Unclear error message on missing @RG tag in header

TechnicalVault | Sanger, Cambridge, UK | Posts: 86 | Member
edited January 15 in Ask the GATK team

When GATK finds a read whose RG tag has no corresponding @RG line in the header, the error message implies that the read itself is lacking an RG tag, when the real problem is that the header is missing the matching @RG entry. Could this be fixed, please, so that the two error conditions are differentiated? It will save people time when debugging their pipelines if they don't have to go looking at the wrong thing.

ERROR MESSAGE: SAM/BAM file SAMFileReader{/lustre/blah/DDD_MAIN5247030.bam} is malformed: Read HS7_7515:4:2101:12189:66438#2 is missing the read group (RG) tag, which is required by the GATK. Please use to fix this problem

The reads have the RG tag but an @RG tag matching their ID does not exist in the header.

901282:HS7_7515:4:2101:12189:66438#2 99 1 37000590 60 75M = 37000629 114 * * X0:i:1 X1:i:0 BC:Z:CGATGTAT BD:Z:* MD:Z:75 PG:Z:MarkDuplicates RG:Z:1#2 BI:Z:* AM:i:37 NM:i:0 SM:i:37 MQ:i:60 QT:Z:BCAADFFE XT:A:U BQ:Z:*
901283:HS7_7515:4:2101:12189:66438#2 147 1 37000629 60 75M = 37000590 -114 * * X0:i:1 X1:i:0 BD:Z:* MD:Z:75 PG:Z:MarkDuplicates RG:Z:1#2 BI:Z:* AM:i:37 NM:i:0 SM:i:37 MQ:i:60 XT:A:U BQ:Z:*
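
For anyone who wants to tell the two conditions apart in their own file while the message is ambiguous, here is a minimal sketch in plain Python. It is not GATK code and does not reflect how GATK performs the check internally; the function name `check_read_groups` and the two problem labels are made up for illustration. It assumes tab-separated SAM text (for example, the output of `samtools view -h`) with the header lines appearing before the reads.

```python
# Hypothetical helper, not part of GATK: distinguishes a read that lacks an
# RG tag from a read whose RG value has no matching @RG line in the header.

MISSING_RG_TAG = "missing_rg_tag"        # read has no RG:Z: field at all
RG_NOT_IN_HEADER = "rg_not_in_header"    # read has RG:Z:<id>, header has no @RG ID:<id>


def check_read_groups(sam_lines):
    """Return a list of (read_name, problem) tuples for problematic reads.

    sam_lines: iterable of tab-separated SAM text lines, header first.
    """
    header_ids = set()
    problems = []
    for raw in sam_lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        if line.startswith("@"):
            # Header line: collect the ID of every @RG record.
            if fields[0] == "@RG":
                for field in fields[1:]:
                    if field.startswith("ID:"):
                        header_ids.add(field[3:])
            continue
        # Alignment line: optional tags start at the 12th column.
        read_name = fields[0]
        rg_value = None
        for tag in fields[11:]:
            if tag.startswith("RG:Z:"):
                rg_value = tag[5:]
                break
        if rg_value is None:
            problems.append((read_name, MISSING_RG_TAG))
        elif rg_value not in header_ids:
            problems.append((read_name, RG_NOT_IN_HEADER))
    return problems


if __name__ == "__main__":
    import sys

    # Example: samtools view -h file.bam | python check_rg.py
    for name, problem in check_read_groups(sys.stdin):
        print(name, problem, sep="\t")
```

With the reads shown above, a check like this would be expected to report the second condition (the RG value `1#2` is present on the read but not declared in the header), which is the case the current error message does not describe.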

P.S. Your spam filter is stopping me from posting discussions that contain URLs. Could you whitelist any gatkforums dot broad institute dot org URLs?

Martin Pollard, Human Genetics Informatics - Wellcome Trust Sanger Institute
