Sam flags down a boat
Let's talk about SAM flags. We'll breeze over some background (skipping details better left to CS majors), filter alignment records via their flag bits using Samtools, and end with how to create a valid BAM containing read pairs or sets where at least one record has the specified flag bit.
SAM flags summarize many properties of reads, represented by flag bits, into a single number. Think of each flag bit as one layer of watercolor paint on a canvas. Each SAM flag is then the distinct chroma resulting from multiple layers of overlapping pigment.
Cats pPuUrR12; Sam flags down a boat
That's the mnemonic I created to memorize the twelve characters in pPuUrR12sfdb. Typically, I use the Explain flags page on the Picard website to decipher SAM flag values. To do better, I made a summary chart and mnemonic for the twelve SAM flag bits. The chart's rows 1–8 refer to cats purring (PUR2), and rows 9–12 refer to the title of this blogpost.
If you are an American football fan and/or watched the Super Bowl last week, you may know the Carolina Panthers' mascot Sir Purr. Sir Purr's team lost 24:10 and gives us another mnemonic: Sir pPuUrR12 loses Superbowl Fifty to Denver Broncos.
Dive deeper into bits with this exercise
If you're interested, this exercise uses Mac OS's Calculator app. Switch to Calculator's Programmer mode and play around with the features (A–E) I've highlighted in the figure.
In the example in (F), SAM flag 99 displays in binary as 0000 0110 0011. This is read right-to-left, where 0=no and 1=yes, giving us yes values for positions 1, 2, 6 and 7. Using our mnemonic, this translates to pPR1 or paired, properly paired, mate reverse and read1.
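The same right-to-left reading can be sketched in plain shell arithmetic. This is only an illustrative helper (`decode_flag` is a made-up name, not a Samtools command); the bit labels are the ones `samtools flags` prints, and the function simply tests each power of two against the flag value.

```shell
#!/bin/sh
# decode_flag is a hypothetical helper for illustration, not part of Samtools.
# It lists the names of all bits set in a SAM flag (decimal or 0x hex).
decode_flag() {
    flag=$1
    out=""
    bit=1
    for name in PAIRED PROPER_PAIR UNMAP MUNMAP REVERSE MREVERSE \
                READ1 READ2 SECONDARY QCFAIL DUP SUPPLEMENTARY; do
        # A bit is set when the bitwise AND with its value is non-zero
        if [ $(( $flag & $bit )) -ne 0 ]; then
            out="${out:+$out,}$name"
        fi
        bit=$(( bit * 2 ))   # move one position to the left: 1, 2, 4, 8, ...
    done
    printf '%s\n' "$out"
}

decode_flag 99      # PAIRED,PROPER_PAIR,MREVERSE,READ1
decode_flag 0x100   # SECONDARY
```

Positions 1, 2, 6 and 7 correspond to bit values 1, 2, 32 and 64, which sum to 99.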
If you forget what a particular bit represents, use the samtools flags command as shown below.
samtools flags 256    # 0x100 returns the same answer
0x100 256 SECONDARY
Filter reads by their flag bits using Samtools
The flagstat command gives a useful summary count for the SAM flag categories.
samtools flagstat xyz.bam
The next set of commands either display actual records or return counts (-c) of records. Depending on capitalization, -f or -F either includes or excludes records with the specified flag bits, respectively. Listing two flag values, e.g. -F 2 -f 256, returns records that satisfy both conditions, e.g. not proper pair and secondary.
samtools view -f 0x63 xyz.bam | more           #display records with all bits represented by 99
samtools view -c -f 99 xyz.bam                 #count records with all bits represented by 99
samtools view -c -f 1 -f 2 -f 32 -f 64 xyz.bam #same as above
samtools view -c -F 2 xyz.bam                  #count records that do not have the 2 bit in flag
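For intuition, the include/exclude test can be written as two bitwise checks. The sketch below is my own approximation of that logic in plain shell, not Samtools code; `keep_record` is a hypothetical name, and the flag values are picked purely for illustration.

```shell
#!/bin/sh
# keep_record FLAG REQUIRE EXCLUDE  (hypothetical helper, not a Samtools command)
# Succeeds when FLAG has every bit in REQUIRE (the -f idea)
# and none of the bits in EXCLUDE (the -F idea).
keep_record() {
    flag=$1; require=$2; exclude=$3
    [ $(( $flag & $require )) -eq $(( $require )) ] &&
    [ $(( $flag & $exclude )) -eq 0 ]
}

# Listing the bits separately is equivalent to their bitwise OR
combined=$(( 1 | 2 | 32 | 64 ))
echo "$combined"   # 99

keep_record 99 0x63 0  && echo "99 passes -f 0x63"
keep_record 355 256 2  || echo "355 fails -f 256 -F 2 (the 2 bit is set)"
keep_record 257 256 2  && echo "257 passes -f 256 -F 2"
```

Here 355 is 256 + 99, a secondary alignment that is also flagged as properly paired, so the exclude check on bit 2 rejects it.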
Subset reads by flag bit into a valid BAM
For transforming BAMs, I stick to using Picard tools for downstream GATK compatibility.
At one point I wanted to visualize only the secondary alignments, but in the context of their read sets. To create such a subsetted BAM, I used three commands. The first uses the 256 SAM flag to extract the read names, the second sorts reads by queryname, and the third command uses FilterSamReads and the list of read names to generate a valid BAM containing our reads of interest.
# Generate a list of unique read names of secondary alignments
samtools view -f 256 xyz.bam | cut -f1 | sort | uniq > xyz_f256.txt
# Queryname sort the input BAM
java -jar picard.jar SortSam INPUT=xyz.bam OUTPUT=xyz_querynamesort.bam SORT_ORDER=queryname
# Create a new BAM containing read sets
java -jar picard.jar FilterSamReads INPUT=xyz_querynamesort.bam OUTPUT=xyz_f256.bam \
    FILTER=includeReadList READ_LIST_FILE=xyz_f256.txt SORT_ORDER=coordinate CREATE_INDEX=true TMP_DIR=/tmp
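To see why a name list is enough to recover whole read sets, here is a toy sketch in plain shell. It does not call Samtools or Picard; the read names and flags are invented, and the two loops only mimic the "collect names by flag bit, then pull every record with those names" idea.

```shell
#!/bin/sh
# Toy records: read name, tab, SAM flag. All values are made up for illustration.
records=$(printf 'readA\t99\nreadA\t147\nreadA\t355\nreadB\t99\nreadB\t147\n')

# Step 1: unique names of records carrying the 256 (secondary) bit
names=$(printf '%s\n' "$records" | while read -r name flag; do
    if [ $(( $flag & 256 )) -ne 0 ]; then
        printf '%s\n' "$name"
    fi
done | sort | uniq)

# Step 2: keep every record whose name is on the list,
# whether or not that particular record has the 256 bit
selected=$(printf '%s\n' "$records" | while read -r name flag; do
    for wanted in $names; do
        if [ "$name" = "$wanted" ]; then
            printf '%s\t%s\n' "$name" "$flag"
        fi
    done
done)

printf '%s\n' "$names"      # readA
printf '%s\n' "$selected"   # the three readA records (flags 99, 147, 355)
```

Only one readA record is secondary, yet all three readA records come back, while readB is dropped entirely.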
Since forward and reverse reads in a pair and multiple alignments for a given read are all identically named in the BAM, the list of read names pulls out the alignment set for a given read name. You can also set the FILTER parameter to
Nautical flags at top spell SAM FLAGS. If I've gone overboard with sailing references, je m'excuse. I've had a great year racing solings.