
Unclear error message on missing @RG tag in header

TechnicalVault (Sanger, Cambridge, UK), edited January 2014 in Ask the GATK team

When GATK encounters a read whose RG tag has no corresponding @RG line in the header, the error message implies that the read itself lacks an RG tag, rather than that the header lacks the matching @RG line. Could this be fixed so that the two error conditions produce different messages? It would save people time when debugging their pipelines if they didn't have to go looking at the wrong thing.

ERROR MESSAGE: SAM/BAM file SAMFileReader{/lustre/blah/DDD_MAIN5247030.bam} is malformed: Read HS7_7515:4:2101:12189:66438#2 is missing the read group (RG) tag, which is required by the GATK. Please use to fix this problem

The reads do carry an RG tag, but no @RG line with a matching ID exists in the header.
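To illustrate the distinction being requested, here is a minimal sketch of a per-read check that reports the two conditions separately. It is not GATK's actual validation code; the function name `check_read_group` and the message wording are hypothetical, and it assumes the set of @RG IDs has already been collected from the header.

```python
from typing import Optional, Set


def check_read_group(read_name: str, rg_tag: Optional[str],
                     header_rg_ids: Set[str]) -> Optional[str]:
    """Return an error message for a read-group problem, or None if the read is fine.

    Hypothetical helper: distinguishes the two error conditions described above.
    """
    if rg_tag is None:
        # Condition 1: the read record itself carries no RG:Z: tag.
        return f"Read {read_name} is missing the read group (RG) tag, which is required."
    if rg_tag not in header_rg_ids:
        # Condition 2: the read has an RG tag, but the header has no @RG line with that ID.
        return (f"Read {read_name} has read group '{rg_tag}', but no @RG line "
                f"with ID:{rg_tag} exists in the header.")
    # The read's RG tag matches a declared @RG line.
    return None
```

With the reads in this post (RG:Z:1#2) and a header lacking `@RG ID:1#2`, the second message would fire, pointing the user at the header rather than the reads.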

901282:HS7_7515:4:2101:12189:66438#2 99 1 37000590 60 75M = 37000629 114 * * X0:i:1 X1:i:0 BC:Z:CGATGTAT BD:Z:* MD:Z:75 PG:Z:MarkDuplicates RG:Z:1#2 BI:Z:* AM:i:37 NM:i:0 SM:i:37 MQ:i:60 QT:Z:BCAADFFE XT:A:U BQ:Z:*
901283:HS7_7515:4:2101:12189:66438#2 147 1 37000629 60 75M = 37000590 -114 * * X0:i:1 X1:i:0 BD:Z:* MD:Z:75 PG:Z:MarkDuplicates RG:Z:1#2 BI:Z:* AM:i:37 NM:i:0 SM:i:37 MQ:i:60 XT:A:U BQ:Z:*
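Until the message is changed, one way to confirm which condition applies is to compare the RG values used by the reads against the IDs declared in the header. The sketch below assumes tab-separated SAM text, such as the output of `samtools view -h`; the function name `undeclared_read_groups` is hypothetical. Note that the records quoted above have had their tabs collapsed to spaces, so they would need the original tab-separated form to be parsed this way.

```python
from typing import Iterable, Set


def undeclared_read_groups(sam_lines: Iterable[str]) -> Set[str]:
    """Return RG IDs used by reads but not declared by any @RG header line."""
    declared: Set[str] = set()
    used: Set[str] = set()
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        if line.startswith("@"):
            # Header line: @RG lines carry the read group identifier in an ID: field.
            if fields[0] == "@RG":
                for field in fields[1:]:
                    if field.startswith("ID:"):
                        declared.add(field[3:])
            continue
        # Alignment line: optional tags follow the 11 mandatory columns, as TAG:TYPE:VALUE.
        for tag in fields[11:]:
            if tag.startswith("RG:Z:"):
                used.add(tag[5:])
    return used - declared
```

An empty result means every RG value in the reads is declared; a non-empty set such as `{"1#2"}` indicates the header, not the reads, is what needs fixing.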

P.S. Your spam filter is stopping me from posting discussions that contain URLs. Could you whitelist any gatkforums dot broad institute dot org URLs?


Martin Pollard, Human Genetics Informatics - Wellcome Trust Sanger Institute



