RG header SM field: pooling
The SAM standard and the GATK documentation describe the SM field of the RG tag in similar terms: it contains the sample name, or, when a pool of samples is being sequenced, the pool name instead of an individual sample name. The LB field is described as containing the library name.
Say I have samples S1 through S10 and make a barcoded library from each one, giving libraries S1 through S10. I then pool S1 through S5 into Pool1 and sequence it on one lane, and pool S6 through S10 into Pool2 and sequence it on a second lane. My understanding was that I should set the SM field to either Pool1 or Pool2, and LB to S1, S2, ... S10.
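To make the setup concrete, here is a minimal Python sketch of the @RG header lines this interpretation produces. The `build_rg_lines` helper and the `Pool.Library` ID scheme are hypothetical, chosen only for illustration; the tab-separated `@RG` line with `ID`, `SM`, and `LB` tags follows the SAM header format.

```python
def build_rg_lines(pools):
    """Return one @RG header line per barcoded library.

    `pools` maps a pool name to the list of library names sequenced in it.
    SM is set to the pool name and LB to the library name, which is the
    interpretation described in this post.
    """
    lines = []
    for pool, libraries in pools.items():
        for lib in libraries:
            fields = [
                "@RG",
                f"ID:{pool}.{lib}",  # hypothetical ID scheme; any unique string works
                f"SM:{pool}",        # pool name used as the "sample"
                f"LB:{lib}",         # barcoded library name
            ]
            lines.append("\t".join(fields))
    return lines


pools = {
    "Pool1": [f"S{i}" for i in range(1, 6)],   # S1..S5 on the first lane
    "Pool2": [f"S{i}" for i in range(6, 11)],  # S6..S10 on the second lane
}
rg_lines = build_rg_lines(pools)
for line in rg_lines:
    print(line)
```

This gives ten read groups, but only two distinct SM values across them.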
This is in fact what I did. Now I discover that UnifiedGenotyper puts the SM value, not the LB value, into the sample column headers of the VCF file.
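The effect can be sketched in a few lines of Python. This is not GATK's code, only an illustration that assumes one VCF sample column per distinct SM value, which matches what I observed; `sample_columns` and the `rg1`..`rg10` IDs are hypothetical names.

```python
def sample_columns(rg_lines):
    """Return the distinct SM values from @RG lines, in first-seen order.

    Assumption: the caller emits one VCF sample column per distinct SM value.
    """
    samples = []
    for line in rg_lines:
        # Skip the leading "@RG" token and split each TAG:VALUE pair once.
        tags = dict(field.split(":", 1) for field in line.split("\t")[1:])
        if tags["SM"] not in samples:
            samples.append(tags["SM"])
    return samples


pooled = []      # SM = pool name, LB = library name (what I did)
per_sample = []  # SM = individual sample name, LB = library name
for i in range(1, 11):
    pool = "Pool1" if i <= 5 else "Pool2"
    pooled.append(f"@RG\tID:rg{i}\tSM:{pool}\tLB:S{i}")
    per_sample.append(f"@RG\tID:rg{i}\tSM:S{i}\tLB:S{i}")

print(sample_columns(pooled))      # two columns: Pool1, Pool2
print(sample_columns(per_sample))  # ten columns: S1 .. S10
```

Under that assumption, the LB values never reach the VCF header, so the ten barcoded libraries collapse into two columns.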
How did I misinterpret the seemingly clear documents?
I suggest that the description of SM be changed to NOT say that it should be the pool name, since that can be interpreted in more than one way.