
Sam flags down a boat

shlee, Cambridge (Member, Broadie, Moderator)
edited February 2016 in Announcements

Let's talk about SAM flags. We'll breeze over some background (skipping details better left to CS majors), filter alignment records via their flag bits using Samtools, and end with how to create a valid BAM containing read pairs or sets where at least one record has the specified flag bit.

SAM flags summarize many properties of reads, represented by flag bits, into a single number. Think of each flag bit as one layer of watercolor paint on a canvas. Each SAM flag is then the distinct chroma resulting from multiple layers of overlapping pigment.
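To make the layering concrete, here is a minimal Python sketch that combines four flag bits into one number with bitwise OR. The constant names are illustrative labels of my own; the bit values are the ones defined in the SAM specification.

```python
# Bit values as defined in the SAM specification (constant names are illustrative).
PAIRED = 0x1         # read is paired
PROPER_PAIR = 0x2    # read is mapped in a proper pair
MATE_REVERSE = 0x20  # mate is on the reverse strand
READ1 = 0x40         # first read in the pair

# Layer the bits with bitwise OR to get a single flag value.
flag = PAIRED | PROPER_PAIR | MATE_REVERSE | READ1

print(flag)       # 99
print(hex(flag))  # 0x63
```

Because each bit is a distinct power of two, the same combination of properties always yields the same number, and the number can be unpacked again without ambiguity.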



Cats pPuUrR12; Sam flags down a boat

Typically, I use the Explain flags page on the Picard website to decipher SAM flag values. To do better, I made a summary chart of the twelve SAM flag bits and a mnemonic, the heading above, for memorizing the twelve characters in pPuUrR12sfdb. The chart's rows 1–8 refer to cats purring (pur1 and PUR2), and rows 9–12 refer to the title of this blog post, sfdb.

If you are an American football fan and/or watched the Super Bowl last week, you may know the Carolina Panthers' mascot, Sir Purr. Sir Purr's team lost 24–10, which gives us another mnemonic: Sir pPuUrR12 loses Super Bowl Fifty to Denver Broncos.



Dive deeper into bits with this exercise

If you're interested, this exercise uses Mac OS's Calculator app. Switch to Calculator's Programmer mode and play around with the features (A–E) I've highlighted in the figure.

In the example in (F), SAM flag 99 displays in binary as 0000 0110 0011. Read it right-to-left, where 0=no and 1=yes: positions 1, 2, 6 and 7 are yes, corresponding to bit values 1 + 2 + 32 + 64 = 99. Using our mnemonic, this translates to pPR1, or paired, properly paired, mate reverse and read1.
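If you don't have the Calculator app handy, the same decoding can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for the Picard page: the function name decode_flag and the short meanings are my own labels, and the pairing of the character b with the twelfth bit (supplementary in the SAM specification) is inferred from its position in the mnemonic.

```python
# Position (right to left) -> (mnemonic character, meaning).
# Characters follow the pPuUrR12sfdb mnemonic; meanings follow the SAM specification.
FLAG_BITS = [
    ("p", "paired"),
    ("P", "properly paired"),
    ("u", "unmapped"),
    ("U", "mate unmapped"),
    ("r", "reverse"),
    ("R", "mate reverse"),
    ("1", "read1"),
    ("2", "read2"),
    ("s", "secondary"),
    ("f", "QC fail"),
    ("d", "duplicate"),
    ("b", "supplementary"),  # assumption: b is matched to bit 12 by position
]


def decode_flag(flag):
    """Return (grouped binary string, mnemonic, list of meanings) for a SAM flag."""
    binary = format(flag, "012b")  # twelve bits, zero-padded
    grouped = " ".join(binary[i:i + 4] for i in range(0, 12, 4))
    set_bits = [(c, m) for i, (c, m) in enumerate(FLAG_BITS) if flag & (1 << i)]
    mnemonic = "".join(c for c, _ in set_bits)
    meanings = [m for _, m in set_bits]
    return grouped, mnemonic, meanings


grouped, mnemonic, meanings = decode_flag(99)
print(grouped)   # 0000 0110 0011
print(mnemonic)  # pPR1
print(meanings)  # ['paired', 'properly paired', 'mate reverse', 'read1']
```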

If you forget what a particular bit represents, use the samtools flags command as shown below.

samtools flags 256

Replacing 256 with SECONDARY or 0x100 returns the same answer.

0x100   256 SECONDARY
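The three spellings are interchangeable because they all name the same bit. The Python sketch below imitates that lookup under stated assumptions: describe_flag is a hypothetical helper of mine, not part of Samtools, and it only reproduces the idea of the hex, decimal and name columns, using the flag names that samtools flags reports.

```python
# Flag names as reported by `samtools flags`, in bit order (0x1 first).
NAMES = ["PAIRED", "PROPER_PAIR", "UNMAP", "MUNMAP", "REVERSE", "MREVERSE",
         "READ1", "READ2", "SECONDARY", "QCFAIL", "DUP", "SUPPLEMENTARY"]
NAME_TO_VALUE = {name: 1 << i for i, name in enumerate(NAMES)}


def describe_flag(text):
    """Accept a decimal, hex, or comma-separated name string; return (hex, decimal, names)."""
    try:
        value = int(text, 0)  # base 0 handles both "256" and "0x100"
    except ValueError:
        value = 0
        for name in text.split(","):
            value |= NAME_TO_VALUE[name.strip().upper()]
    names = ",".join(n for n, v in NAME_TO_VALUE.items() if value & v)
    return hex(value), value, names


for query in ("256", "0x100", "SECONDARY"):
    print(describe_flag(query))  # ('0x100', 256, 'SECONDARY') each time
```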

Filter reads by their flag bits using Samtools

The flagstat command gives a useful summary count for the SAM flag categories.

samtools flagstat xyz.bam 

The next set of commands either display the actual records or, with -c, return counts of records. Capitalization matters: -f keeps only records that have all of the specified flag bits set, whereas -F drops records that have any of the specified flag bits set. Combining the two, e.g. -F 2 -f 256, returns records that satisfy both conditions, e.g. not properly paired and secondary.

samtools view -f 0x63 xyz.bam | more #display records with all bits represented by 99 
samtools view -c -f 99 xyz.bam #count records with all bits represented by 99
samtools view -c -f 1 -f 2 -f 32 -f 64 xyz.bam #same as above
samtools view -c -F 2 xyz.bam #count records that do not have the 2 bit in flag
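The include/exclude logic boils down to two bitwise tests. Here is a minimal Python sketch of that logic on a handful of made-up flag values; the function keep and its parameter names are hypothetical, and this illustrates the selection rule rather than Samtools' actual code.

```python
def keep(flag, require=0, exclude=0):
    """Selection rule of -f / -F: all `require` bits set, no `exclude` bits set."""
    return (flag & require) == require and (flag & exclude) == 0


# Made-up flag values for illustration only.
flags = [99, 147, 83, 355, 73, 329]

# Like -f 99: records carrying all four bits of 99.
print([f for f in flags if keep(f, require=99)])               # [99, 355]

# Like -F 2: records without the properly paired bit.
print([f for f in flags if keep(f, exclude=2)])                # [73, 329]

# Like -F 2 -f 256: secondary and not properly paired.
print([f for f in flags if keep(f, require=256, exclude=2)])   # [329]

# Requiring 1, 2, 32 and 64 together is the same as requiring 99.
print((1 | 2 | 32 | 64) == 99)                                 # True
```

Note that 355 passes the first filter: it is 99 plus the secondary bit 256, and requiring bits says nothing about the bits you did not list.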

Subset reads by flag bit into a valid BAM

For transforming BAMs, I stick to using Picard tools for downstream GATK compatibility.

At one point I wanted to visualize secondary alignments in isolation from the rest of the data, yet in the context of their read sets. To create such a subsetted BAM, I used three commands. The first uses the 256 SAM flag to extract the names of reads that have secondary alignments, the second sorts the BAM by queryname, and the third uses FilterSamReads with the list of read names to generate a valid BAM containing our reads of interest.

# Generate a list of unique read names of secondary alignments
samtools view -f 256 xyz.bam | cut -f1 | sort | uniq > xyz_f256.txt 

# Queryname sort the input BAM
java -jar picard.jar SortSam INPUT=xyz.bam OUTPUT=xyz_querynamesort.bam SORT_ORDER=queryname

# Create a new BAM containing read sets
java -jar picard.jar FilterSamReads INPUT=xyz_querynamesort.bam OUTPUT=xyz_f256.bam \
FILTER=includeReadList READ_LIST_FILE=xyz_f256.txt SORT_ORDER=coordinate CREATE_INDEX=true TMP_DIR=/tmp

Since forward and reverse reads in a pair and multiple alignments for a given read are all identically named in the BAM, the list of read names pulls out the alignment set for a given read name. You can also set the FILTER parameter to excludeReadList.
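To see why a list of names is enough, here is a small Python sketch of the same idea on toy data. The records are invented (name, flag) tuples, not real alignments, and the code only mirrors the logic of the three commands; it does not read or write BAM files.

```python
# Toy alignment records as (read name, SAM flag); values are invented for illustration.
records = [
    ("readA", 99), ("readA", 147), ("readA", 355),   # pair plus a secondary alignment
    ("readB", 83), ("readB", 163),                   # pair, no secondary
    ("readC", 73), ("readC", 133), ("readC", 329),   # mate unmapped, plus a secondary
]

SECONDARY = 256

# Step 1: unique names of reads with at least one secondary alignment.
names = sorted({name for name, flag in records if flag & SECONDARY})
print(names)  # ['readA', 'readC']

# Step 2: pull every record that shares one of those names (like includeReadList).
wanted = set(names)
subset = [(n, f) for n, f in records if n in wanted]
print(len(subset))  # 6

# The complement corresponds to excludeReadList.
excluded = [(n, f) for n, f in records if n not in wanted]
print(excluded)  # [('readB', 83), ('readB', 163)]
```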


Nautical flags at top spell SAM FLAGS. If I've gone overboard with sailing references, je m'excuse. I've had a great year racing solings.
