Cigar after picard MergeBamAlignment

gtangtan Member
edited March 9 in Ask the GATK team

Hi,
I got confused by the CIGAR in bam file. after MergeBamAlignment.
I have reads in uBam. Then it's converted into fastq and trimmed on 5' side. Then aligned with STAR and MergeBamAlignment the mapped Bam with the uBam to recover the tags.

Below is an example:
original fastq read

@K00206:101:HH7C2BBXX:1:1101:10003:10141
GATAATAACCTTGGAGCCGTTTCAACCACAGGACATGGGGAAAGTATCCTGAAGGTGAATCTGGCCAGACTTGCCCTCTTCCATGTAGAGCAAGGAAAGACCGTAGAGGAGGCTGCTCAGTTGGCA

I did 5bp triming on the 5'side.

@K00206:101:HH7C2BBXX:1:1101:10003:10141
TAACCTTGGAGCCGTTTCAACCACAGGACATGGGGAAAGTATCCTGAAGGTGAATCTGGCCAGACTTGCCCTCTTCCATGTAGAGCAAGGAAAGACCGTAGAGGAGGCTGCTCAGTTGGCA

After STAR mapping.

K00206:101:HH7C2BBXX:1:1101:10003:10141 16  19  9113177 255 32M975N89M  * 0 0 TGCCAACTGAGCAGCCTCCTCTACGGTCTTTCCTTGCTCTACATGGAAGAGGGCAAGTCTGGCCAGATTCACCTTCAGGATACTTTCCCCATGTCCTGTGGTTGAAACGGCTCCAAGGTTA JJJJJJJJJFJFAJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJAJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ NH:i:1  HI:i:1  AS:i:120  nM:i:0  NM:i:0  MD:Z:121  jM:B:c,22 jI:B:i,9113209,9114183  XS:A:-

After MergeBamAlignment.

K00206:101:HH7C2BBXX:1:1101:10003:10141 16  19  9113177 255 5S32M975N89M  * 0 0 TGCCAACTGAGCAGCCTCCTCTACGGTCTTTCCTTGCTCTACATGGAAGAGGGCAAGTCTGGCCAGATTCACCTTCAGGATACTTTCCCCATGTCCTGTGGTTGAAACGGCTCCAAGGTTATTATC  JJJJJJJJJFJFAJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJAJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFFFAA  MD:Z:0T0G0C0C1A1T1A0G0C0A0G0C2C1T0C1A0C0G0G0T3T1C0T1G1T0C0T0A0C1T0G1A0A0G1G1G1A0A1T1T1G0C0C0A0G0A0T1C0A0C0C0T0T0C1G0G0A1A1T0T0T0C0C0C1A3C0C1G0T0G0G0T0T1A0A0A1G0G0C0T0C0C0A1G0G0T1A0  PG:Z:STAR RG:Z:E11_m_Mut-6-2_A2B5 NH:i:1  NM:i:87 UQ:i:3524 AS:i:120

It's aligned to the reverse strand. The SEQ in bam is fine. But the CIGAR has "5S" on the left. Is it supposed to be on the right? When I visualise the bam after MergeBamAlignment, it's full of mismatches because of the CIGAR.

Any ideas?

Many thanks,
Ge

Answers

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @gtan
    Hi Ge,

    I don't think we recommend doing any trimming. Have a look at this article for more information.

    -Sheila

  • gtangtan Member

    Hi Sheila,

    Trimming is not the issue here.
    In my opinion, the problem is that MergeBamAlignment deals with the CIGAR in a wrong way. It simply doesn't consider the strand information.
    Or if MergeBamAlignment cannot merge original reads with trimmed alignment, it should stop and warn in the first place, instead of giving faulty results.

    Ge

  • shleeshlee CambridgeMember, Administrator, Broadie, Moderator admin

    Hi @gtan,

    When your uBAM contains untrimmed reads and your alignments contain trimmed reads, then MergeBamAlignment restores full-length sequences from the uBAM and soft-clips those bases not present in the alignment BAM.

    What you can do is create a uBAM from your FASTQ reads using FastqToSam, then AddOrReplaceReadGroups, then perform MergeBamAlignment.

    Alternatively, you could forego the features offered by MergeBamAlignment and proceed by using AddOrReplaceReadGroups directly on your alignment file. I just learned that BWA's -Y option asks the tool to soft-clip supplementary alignments instead of the default hard-clipping. So if you ran BWA with this particular option then there is less of a need to perform MergeBamAlignment.

  • gtangtan Member

    Thanks for your answer.
    Maybe I should make myself more clear.

    From the sequencer, we have fastq file for each sample. Then FastqToSam with sample name as ReadGroup RG tag, and then MergeSamFiles into one uBam to store.

    The intended pipeline for aligmment:
    uBam is SamToFastq to one fastq. The fastq is trimmed and aligned to get BAM. MergeBamAlignment to merge uBam and Bam to recover the RG tags from uBam. And this step caueses the problem I mentioned in the original answer.

    The alternative:
    uBam is SamToFastq with OUTPUT_PER_RG=true into multiple fastqs. Do the trimming, FastqToSam with RG, MergeSamFiles into a trimmed uBam. Do the alignment to get BAM. MergeBamAlignment to merge trimmed uBam and Bam to recover the RG tags from trimmed uBam.
    This will work, but requires extra steps of conversion.

    If you have better suggestions...

  • shleeshlee CambridgeMember, Administrator, Broadie, Moderator admin

    Hi @gtan,

    uBam is SamToFastq to one fastq. The fastq is trimmed and aligned to get BAM. MergeBamAlignment to merge uBam and Bam to recover the RG tags from uBam. And this step caueses the problem I mentioned in the original answer.

    Yes, this step causes the problem because you trimmed your reads. I'm glad to hear that one of the alternatives we proposed works for you. However, the extra steps in the conversion seems wasteful, so you are asking for better suggestions. Again, as I said above, you could try forgoing the MergeBamAlignment step. However, you lose out on maintaining a lossless dataset. You already lose the trimmed bases and you will lose reads and information that MergeBamAlignment salvages. However, these can be reconstituted using alternative means as I alluded to before. E.g. per-read-group mapping with BWA plus AddOrReplaceReadGroups, FixMateInformation and SetNmMdAndUqTags. You would have to check to see if the attributes you need that MergeBamAlignment provides are in place for your final clean BAM, e.g. unmapped reads.

Sign In or Register to comment.