# Sam flags down a boat

edited February 2016

Let's talk about SAM flags. We'll breeze over some background (skipping details better left to CS majors), filter alignment records via their flag bits using Samtools, and end with how to create a valid BAM containing read pairs or sets where at least one record has the specified flag bit.

SAM flags summarize many properties of reads, represented by flag bits, into a single number. Think of each flag bit as one layer of watercolor paint on a canvas. Each SAM flag is then the distinct chroma resulting from multiple layers of overlapping pigment.
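To make the layering concrete, here is a minimal sketch in plain shell arithmetic that combines four flag bits into one flag value with bitwise OR. The bit values come from the SAM specification; the variable names are mine and only for illustration.

```shell
# Bit values from the SAM specification; variable names are illustrative.
PAIRED=1          # 0x1   read is paired
PROPER_PAIR=2     # 0x2   read is mapped in a proper pair
MATE_REVERSE=32   # 0x20  mate is on the reverse strand
READ1=64          # 0x40  first read in the pair

# Bitwise OR layers the individual bits into a single flag value.
flag=$(( $PAIRED | $PROPER_PAIR | $MATE_REVERSE | $READ1 ))
echo "$flag"   # prints 99
```

Because each bit is a distinct power of two, OR-ing them here gives the same result as adding them.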

### Cats pPuUrR12; Sam flags down a boat

That's the mnemonic I created to memorize the twelve characters in `pPuUrR12sfdb`. Typically, I use the Explain flags page on the Picard website to decipher SAM flag values. To do better, I made a summary chart and a mnemonic for the twelve SAM flag bits. The chart's rows 1–8 refer to cats purring (`pur1` and `PUR2`), and rows 9–12 refer to the title of this blog post (`sfdb`).

If you are an American football fan and/or watched the Super Bowl last week, you may know the Carolina Panthers' mascot, Sir Purr. Sir Purr's team lost 24–10, and the loss gives us another mnemonic: Sir pPuUrR12 loses Superbowl Fifty to Denver Broncos.

### Dive deeper into bits with this exercise

If you're interested, this exercise uses Mac OS's Calculator app. Switch to Calculator's Programmer mode and play around with the features (A–E) I've highlighted in the figure.

In the example in (F), SAM flag `99` displays in binary as `0000 0110 0011`. This is read right-to-left, where 0=no and 1=yes, giving us yes values for positions 1, 2, 6 and 7. Using our mnemonic, this translates to `pPR1` or paired, properly paired, mate reverse and read1.
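The same right-to-left reading can be sketched in the shell without a calculator. The function below is a hypothetical helper of mine, not part of Samtools, and it assumes the twelve mnemonic letters map to the twelve flag bits in order from lowest to highest.

```shell
# decode_flag: print the mnemonic letter for every bit set in a SAM flag.
# Hypothetical helper; assumes the letters of pPuUrR12sfdb map to bits
# 1, 2, 4, ... 2048 in that order.
decode_flag() {
    flag=$1
    bit=1
    out=""
    for letter in p P u U r R 1 2 s f d b; do
        # A nonzero AND means this bit is set in the flag.
        if [ $(( $flag & $bit )) -ne 0 ]; then
            out="$out$letter"
        fi
        bit=$(( $bit * 2 ))
    done
    echo "$out"
}

decode_flag 99    # prints pPR1
decode_flag 147   # prints pPr2
```

Flag `147` is the usual partner of `99` in a properly paired read pair: paired, properly paired, read reverse and read2.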

If you forget what a particular bit represents, use the `samtools flags` command as shown below.

``````
samtools flags 256
``````

Replacing `256` with `SECONDARY` or `0x100` returns the same answer.

``````
0x100   256 SECONDARY
``````
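The hexadecimal and decimal spellings are the same number in two notations. As a quick check that needs no Samtools, plain shell arithmetic converts between them. This is only a sketch, and the variable names are mine.

```shell
# 0x100 is hexadecimal for 256, the secondary-alignment bit.
dec=$(( 0x100 ))            # hexadecimal to decimal
hex=$(printf '0x%x' 256)    # decimal to hexadecimal
echo "$hex $dec"            # prints 0x100 256
```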

### Filter reads by their flag bits using Samtools

The `flagstat` subcommand gives a useful summary count of records for the SAM flag categories.

``````
samtools flagstat xyz.bam
``````

The next set of commands either display the records themselves or, with `-c`, return a count of records. Capitalization matters: lowercase `-f` includes only records that have all of the specified flag bits set, and uppercase `-F` excludes records that have any of the specified flag bits set. Combining the two, e.g. `-F 2 -f 256`, returns records that satisfy both conditions, e.g. not properly paired and secondary.

``````
samtools view -f 0x63 xyz.bam | more  # display records with all bits represented by 99
samtools view -c -f 99 xyz.bam  # count records with all bits represented by 99
samtools view -c -f 1 -f 2 -f 32 -f 64 xyz.bam  # same as above
samtools view -c -F 2 xyz.bam  # count records that do not have the 2 bit in flag
``````
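Under the hood, these filters come down to two bitwise tests on each record's flag. The sketch below mimics that logic in plain shell so you can try flag values without a BAM file. `keep_record` is a hypothetical helper of mine, not a Samtools command.

```shell
# keep_record FLAG REQUIRE EXCLUDE
# Hypothetical helper mimicking `samtools view -f REQUIRE -F EXCLUDE`:
# succeeds when FLAG has every REQUIRE bit set and no EXCLUDE bit set.
keep_record() {
    [ $(( $1 & $2 )) -eq $(( $2 )) ] && [ $(( $1 & $3 )) -eq 0 ]
}

# Try the -F 2 -f 256 example on a few flag values.
for flag in 99 355 321 97; do
    if keep_record "$flag" 256 2; then
        echo "$flag kept by -F 2 -f 256"
    fi
done
# prints: 321 kept by -F 2 -f 256
```

Flag `355` (`99` plus `256`) is secondary but also properly paired, so the `-F 2` test rejects it. Flag `321` is paired, read1 and secondary without the proper-pair bit, so it passes both tests.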

### Subset reads by flag bit into a valid BAM

For transforming BAMs, I stick to using Picard tools for downstream GATK compatibility.

At one point I wanted to visualize in isolation only secondary alignments but in the context of their read sets. To create such a subsetted BAM, I used three commands. The first uses the `256` SAM flag to extract the read names, the second sorts reads by queryname, and the third command uses FilterSamReads and the list of read names to generate a valid BAM containing our reads of interest.

``````
# Generate a list of unique read names of secondary alignments
samtools view -f 256 xyz.bam | cut -f1 | sort | uniq > xyz_f256.txt

# Queryname sort the input BAM
java -jar picard.jar SortSam INPUT=xyz.bam OUTPUT=xyz_querynamesort.bam SORT_ORDER=queryname

# Create a new BAM containing read sets
java -jar picard.jar FilterSamReads INPUT=xyz_querynamesort.bam OUTPUT=xyz_f256.bam \
``````

Since forward and reverse reads in a pair and multiple alignments for a given read are all identically named in the BAM, the list of read names pulls out the alignment set for a given read name. You can also set the `FILTER` parameter to `excludeReadList`.
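To see why a list of names pulls out whole read sets, here is a toy sketch of the name-matching idea in plain shell. It does not call Samtools or Picard. The five records are invented for illustration, each reduced to a read name and a SAM flag.

```shell
# Toy stand-in for alignment records: read name, then SAM flag.
# These records are invented for illustration.
records="readA 99
readA 147
readA 355
readB 83
readB 163"

# Step 1: collect unique names of records with the secondary bit (256) set.
names=$(printf '%s\n' "$records" | while read -r name flag; do
    if [ $(( $flag & 256 )) -ne 0 ]; then echo "$name"; fi
done | sort -u)

# Step 2: keep every record whose name is on the list,
# whatever its own flag is.
subset=$(printf '%s\n' "$records" | while read -r name flag; do
    for wanted in $names; do
        if [ "$name" = "$wanted" ]; then echo "$name $flag"; fi
    done
done)

echo "$subset"
# prints:
# readA 99
# readA 147
# readA 355
```

Only the `355` record carries the secondary bit, yet all three `readA` records come through because they share a name. Neither `readB` record has the bit, so the pair is left out.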