
SAM/BAM file has inconsistent mapping information

mike Posts: 103 Member
edited January 2013 in Ask the GATK team

Hi,

I ran into an error at the IndelRealigner step with GATK v2.0, complaining that the SAM/BAM file has inconsistent mapping information.

Here is the command I used (full paths removed for clarity):

java -Xmx4g -jar /Path/GenomeAnalysisTK-2.1-8-g5efb575/bin/GenomeAnalysisTK.jar -T IndelRealigner -I /Path/myBam.bam -R /path/hg19.fa -targetIntervals /path/myBam.output.intervals -o /Path/my_realignedBam.bam -known /Path/bundle-1.5/hg19/Mills_and_1000G_gold_standard.indels.hg19.sites.vcf -known /Path/bundle-1.5/hg19/1000G_phase1.indels.hg19.vcf

Here is the error message I encountered: ...

ERROR MESSAGE: SAM/BAM file SAMFileReader{/Path/myBam.bam} is malformed: read NCI-GA1:30:70BETAAXX:2:114:10000:10163 145 chr1 * 37 108M chr14 59648529 * GCAAGACCAACAAGAAGATCGCCATTGCTAACTGTGGACAACTCTAATAAATTTGGCTTGTGTTTTATCTTAGCCACCACACTGTTCTTTCTGTAGCTCAAGAGAGTA @?BEC@BCB@DB@;=8BAB<8BDDDEFIIHEIHI>I<IIDHHIHDIIII@GIDIIIIICIIHIHIHIIIIIIBIIHIIDIHIIIIIIFIDI has inconsistent mapping information.

ERROR ------------------------------------------------------------------------------------------

...

Has anybody encountered a similar issue? Advice would be greatly appreciated!

Mike

Post edited by Geraldine_VdAuwera

Best Answer

  • pdexheimer Posts: 360 Member, GSA Collaborator ✭✭✭
    Answer ✓

    Huh. I'm surprised that the content of the read was modified in the error message. The two reads you posted look legal to me, but they do both have the inconsistency that GATK complained about. Positions in SAM files are 1-based, so a value of 0 means "unknown" - which means the first read you posted aligned somewhere on chr1, but we don't know where. It's reasonable for GATK to consider this unmapped, which leads to the same scenario I outlined before.

    The second read has the same problem, but this time in the position of the read's mate. Again, the flags say the mate is mapped but there's no position provided. GATK may not choke on this read, though, because it might not look at the mate position information.

    I've never seen bwa output alignments like this. My best suggestion would be to try aligning the reads again (maybe this is a filesystem/threading hiccup?). I'll note that BLAT aligns that first read to chr14:59648613, so even the chr1 entry is probably wrong.
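
The inconsistency described in this answer can be checked mechanically. The sketch below is only an illustration, not the validation that GATK or Picard actually performs: `find_inconsistencies` is a hypothetical helper name, and it assumes that a position of 0 means "unknown" as explained above. It reads the mandatory SAM fields and reports any case where the FLAG says a read (or its mate) is mapped while the matching position is unset.

```python
# Bit values from the SAM specification's FLAG field.
PAIRED = 0x1
READ_UNMAPPED = 0x4
MATE_UNMAPPED = 0x8


def find_inconsistencies(sam_line):
    """Return a list of problems found in one SAM alignment line.

    Hypothetical helper for illustration; not the check GATK performs.
    """
    fields = sam_line.split()
    flag = int(fields[1])
    rname, pos = fields[2], int(fields[3])
    pnext = int(fields[7])

    problems = []
    # Flag says the read is mapped, yet no usable position is recorded.
    if not flag & READ_UNMAPPED and (pos == 0 or rname == "*"):
        problems.append("read flagged as mapped but POS/RNAME is unset")
    # Same check for the mate, which only applies to paired reads.
    if flag & PAIRED and not flag & MATE_UNMAPPED and pnext == 0:
        problems.append("mate flagged as mapped but PNEXT is unset")
    return problems


# Mandatory fields of the two reads from this thread.
# SEQ and QUAL are shortened to '*' here purely to keep the example short.
read1 = "NCI-GA1:30:70BETAAXX:2:114:10000:10163 145 chr1 0 37 108M chr14 59648529 0 * *"
read2 = "NCI-GA1:30:70BETAAXX:2:114:10000:10163 97 chr14 59648529 37 108M chr1 0 0 * *"

print(find_inconsistencies(read1))
print(find_inconsistencies(read2))
```

Under these assumptions the first read is reported for its own position and the second for its mate's position, which matches the two cases described in the answer.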

Answers

  • pdexheimer Posts: 360 Member, GSA Collaborator ✭✭✭

    This looks like a malformed BAM: both the POS and TLEN fields are *, which is illegal according to the spec. The error is most likely "Inconsistent mapping information" because the parser treats the read as unaligned due to the lack of a POS, while the FLAG specifies that it is aligned.

  • mike Posts: 103 Member

    Thanks for the input. However, that read came from the GATK error message, which somehow changes the read content a bit. If I pull the reads directly out of the BAM file, the POS and TLEN fields look normal. Below are the actual reads from the BAM file, which look fine to me except that POS is 0; I'm not sure whether that is the issue. (The * values in the message above were substituted by GATK in its error output; I'm not sure why.)

    NCI-GA1:30:70BETAAXX:2:114:10000:10163 145 chr1 0 37 108M chr14 59648529 0 GCAAGACCAACAAGAAGATCGCCATTGCTAACTGTGGACAACTCTAATAAATTTGGCTTGTGTTTTATCTTAGCCACCACACTGTTCTTTCTGTAGCTCAAGAGAGTA @?BEC@BCB@DB@;=8BAB<8BDDDEFIIHEIHI>I<IIDHHIHDIIII@GIDIIIIICIIHIHIHIIIIIIBIIHIIDIHIIIIIIFIDI X0:i:1 X1:i:0 MD:Z:0 RG:Z:70BETAAXX_Sample_F4 XG:i:0 AM:i:37 NM:i:0 SM:i:37 XM:i:0 XN:i:107 XO:i:0 XT:A:N NCI-GA1:30:70BETAAXX:2:114:10000:10163 97 chr14 59648529 37 108M chr1 0 0 TGGATGGCAAGCATGTGGTTTTTTGGCAAGGTAAAGACAGAAGGAATATCTTGGAAGGCACAGAGTGCTTTGGGTCCAGAAATGGCAAGACCAACAAGAAGATCGCCA HHHHHHHGHHEBHHHDGGBGGGEDGHHHHFHEHHHHHGHHGHHH>HHFHGHDHHHHHGDHHHH<HBHHFHDFFFBGHGHBEEB@EFHGEBDB3BB2@@>@>BB@@B@A X0:i:1 X1:i:0 MD:Z:108 RG:Z:70BETAAXX_Sample_F4 XG:i:0 AM:i:37 NM:i:0 SM:i:37 XM:i:0 XO:i:0 XT:A:U

    Plus, I have 5 BAM files in total, and this is the only one that GATK v2.0 complained about; the other 4 seem fine. (BTW, the exome-seq data came from an Illumina GA IIx and was mapped with bwa.)

    Thanks in advance for any other advice!

    Mike

  • mike Posts: 103 Member

    Sorry, my bad: the two reads I pasted above should be separate (they got stuck together on screen).

    NCI-GA1:30:70BETAAXX:2:114:10000:10163 145 chr1 0 37 108M chr14 59648529 0 GCAAGACCAACAAGAAGATCGCCATTGCTAACTGTGGACAACTCTAATAAATTTGGCTTGTGTTTTATCTTAGCCACCACACTGTTCTTTCTGTAGCTCAAGAGAGTA @?BEC@BCB@DB@;=8BAB<8BDDDEFIIHEIHI>I<IIDHHIHDIIII@GIDIIIIICIIHIHIHIIIIIIBIIHIIDIHIIIIIIFIDI X0:i:1 X1:i:0 MD:Z:0 RG:Z:70BETAAXX_Sample_F4 XG:i:0 AM:i:37 NM:i:0 SM:i:37 XM:i:0 XN:i:107 XO:i:0 XT:A:N

    NCI-GA1:30:70BETAAXX:2:114:10000:10163 97 chr14 59648529 37 108M chr1 0 0 TGGATGGCAAGCATGTGGTTTTTTGGCAAGGTAAAGACAGAAGGAATATCTTGGAAGGCACAGAGTGCTTTGGGTCCAGAAATGGCAAGACCAACAAGAAGATCGCCA HHHHHHHGHHEBHHHDGGBGGGEDGHHHHFHEHHHHHGHHGHHH>HHFHGHDHHHHHGDHHHH<HBHHFHDFFFBGHGHBEEB@EFHGEBDB3BB2@@>@>BB@@B@A X0:i:1 X1:i:0 MD:Z:108 RG:Z:70BETAAXX_Sample_F4 XG:i:0 AM:i:37 NM:i:0 SM:i:37 XM:i:0 XO:i:0 XT:A:U

    I just realized that a POS of 0 means unmapped. According to http://picard.sourceforge.net/explain-flags.html, the flags for the paired reads above decode as follows:

    flag 97: read paired, mate reverse strand, first in pair

    flag 145: read paired, read reverse strand, second in pair

    The flags do not say mapped or unmapped. Could that be the issue?

    Thanks

    Mike
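
On the flag question in this post: the FLAG field is a bit mask, and a flag summary lists only the bits that are set. The sketch below decodes a flag using the bit values from the SAM specification; `decode_flag` is a hypothetical helper name, not part of Picard or GATK. Neither 97 nor 145 has the 0x4 (read unmapped) or 0x8 (mate unmapped) bit set, so both flags implicitly declare the read and its mate to be mapped, which is what conflicts with a position of 0.

```python
# FLAG bits and their meanings as defined in the SAM specification.
FLAG_BITS = [
    (0x1, "read paired"),
    (0x2, "read mapped in proper pair"),
    (0x4, "read unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "read reverse strand"),
    (0x20, "mate reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
    (0x100, "not primary alignment"),
    (0x200, "read fails platform/vendor quality checks"),
    (0x400, "read is PCR or optical duplicate"),
]


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value.

    Hypothetical helper for illustration only.
    """
    return [name for bit, name in FLAG_BITS if flag & bit]


# The two flags discussed in this thread.
for flag in (97, 145):
    names = decode_flag(flag)
    print(flag, names)
    # "Mapped" is never listed explicitly; it is the absence of the unmapped bits.
    print("  read declared mapped:", "read unmapped" not in names)
    print("  mate declared mapped:", "mate unmapped" not in names)
```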

  • mike Posts: 103 Member
    edited October 2012

    Dear pdexheimer:

    Thanks so much for the insight, which sounds very reasonable to me. I will realign this sample and see.

    Thanks again and best,

    Mike

    Post edited by mike