If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We appreciate your help!
Test-drive the GATK tools and Best Practices pipelines on Terra
Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
Panel of Normals (PON)
A Panel of Normal or PON is a type of resource used in somatic variant analysis. Depending on the type of variant you're looking for, the PON will be generated differently. What all PONs have in common is that (1) they are made from normal samples (in this context, "normal" means derived from healthy tissue that is believed to not have any somatic alterations) and (2) their main purpose is to capture recurrent technical artifacts in order to improve the results of the variant calling analysis.
As a result, the most important selection criteria for choosing normals to include in any PON are the technical properties of how the data was generated. It's very important to use normals that are as technically similar as possible to the tumor (same exome or genome preparation methods, sequencing technology and so on). Additionally, the samples should come from subjects that were young and healthy to minimize the chance of using as normal a sample from someone who has an undiagnosed tumor. Normals are typically derived from blood samples.
There is no definitive rule for how many samples should be used to make a PON (even a small PON is better than no PON) but in practice we recommend aiming for a minimum of 40.
At the Broad Institute, we typically make a standard PON for a given version of the pipeline (corresponding to the combination of all protocols used in production to generate the sequence data, starting from sample preparation and including the analysis software) and use it to process all tumor samples that go through that version of the pipeline. Because we process many samples in the same way, we are able to make PONs composed of hundreds of samples.
Variant type-specific recommendations are given below.
Short variants (SNVs and indels)
For short variant discovery, the PON is created by running the variant caller Mutect2 individually on a set of normal samples and combining the resulting variant calls with some criteria (e.g. excluding any sites that are not present in at least 2 normals) as defined in the Best Practices documentation. This produces a sites-only VCF file that can be used as PON for Mutect2.
Copy Number Variants
For CNV discovery, the PON is created by running the initial coverage collection tools individually on a set of normal samples and combining the resulting copy ratio data using a dedicated PON creation tool. This produces a binary file that can be used as PON.