Service note: Geraldine is on vacation this week; other members of GSA will be responding to questions, but they have a lot of work besides this, so be aware that responses may be a little slower than usual. Thank you for your patience.

GATK Error SAM/BAM file SAMFileReader is malformed BAM header

dashtidashti Posts: 1Member

Hi there,

I get an error when I try to run GATK with the following command:

java -jar GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -T RealignerTargetCreator -R reference.fa -I  merged_bam_files_indexed_markduplicate.bam -o reads.intervals

However I get this error:

SAM/BAM file SAMFileReader{/merged_bam_files_indexed_markduplicate.bam} is malformed: Read HWI-ST303_0093:5:5:13416:34802#0 is either missing the read group or its read group is not defined in the BAM header, both of which are required by the GATK.  Please use http://gatkforums.broadinstitute.org/discussion/59/companion-utilities-replacereadgroups to fix this problem

It suggest that it a header issue however my bam file has a header:

samtools view -h merged_bam_files_indexed_markduplicate.bam | grep ^@RG
@RG     ID:test1      PL:Illumina     PU:HWI-ST303    LB:test     PI:75   SM:test   CN:japan
@RG     ID:test2      PL:Illumina     PU:HWI-ST303    LB:test     PI:75   SM:test    CN:japan

when I grep the read within the error:

HWI-ST303_0093:5:5:13416:34802#0        99      1       1090    29      23S60M17S       =       1150    160     TGTTTGGGTTGAAGATTGATACTGGAAGAAGATTAGAATTGTAGAAAGGGGAAAACGATGTTAGAAAGTTAATACGGCTTACTCCAGATCCTTGGATCTC        GGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGEGFGGGGGGGGGDGFGFGGGGGFEDFGEGGGDGEG?FGGDDGFFDGGEDDFFFFEDG?E        MD:Z:60 PG:Z:MarkDuplicates     RG:Z:test1      XG:i:0  AM:i:29 NM:i:0  SM:i:29 XM:i:0  XO:i:0  XT:A:M

Following Picard solution:

java -XX:MaxDirectMemorySize=4G -jar picard-tools-1.85/AddOrReplaceReadGroups.jar I= test.bam O= test.header.bam SORT_ORDER=coordinate RGID=test RGLB=test  RGPL=Illumina RGSM=test/ RGPU=HWI-ST303  RGCN=japan CREATE_INDEX=True 

I get this error after 2 min.:

Exception in thread "main" net.sf.samtools.SAMFormatException: SAM validation error: ERROR: Record 12247781, Read name HWI-ST303_0093:5:26:10129:50409#0, MAPQ should be 0 for unmapped read.`

Any recommendation on how to solve this issue ?

My plan is to do the following to resolve the issue:

picard/MarkDuplicates.jar I=test.bam O=test_markduplicate.bam M=test.matrix AS=true VALIDATION_STRINGENCY=LENIANT
samtools  index test_markduplicate.bam

I see a lot of messages like below but the command still running:

Ignoring SAM validation error: ERROR: Record (number), Read name HWI-ST303_0093:5:5:13416:34802#0, RG ID on SAMRecord not found in header: test1

while running the command

then try the GATK RealignerTargetCreator

I already tried to do the following

java -jar GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -T RealignerTargetCreator -R reference.fa -I  merged_bam_files_indexed_markduplicate.bam -o reads.intervals --validation_strictness LENIENT

But I still got the same error

N.B: the same command run with no issue with GATK version (1.2)

My pipeline in short: mapping the paired end reads with

bwa aln -q 20 ref.fa read > files.sai
bwa sampe ref.fa file1.sai file2.sai read1 read2 > test1.sam
samtools view -bS test1.sam | samtools sort - test
samtools  index test1.bam
samtools merge -rh RG.txt test test1.bam test2.bam

RG.txt

@RG     ID:test1      PL:Illumina     PU:HWI-ST303    LB:test     PI:75   SM:test   CN:japan
@RG     ID:test2      PL:Illumina     PU:HWI-ST303    LB:test     PI:75   SM:test    CN:japan

samtools  index test.bam
picard/MarkDuplicates.jar I=test.bam O=test_markduplicate.bam M=test.matrix AS=true VALIDATION_STRINGENCY=SILENT
samtools  index test_markduplicate.bam
Post edited by Geraldine_VdAuwera on
Jack

Answers

Sign In or Register to comment.