
Panel of Normals


I am currently running Mutect2 with and without a panel of normals, and I would like to know which I should use. Do you have documentation on how a panel of normals is built? How is the panel of normals selected from the normal samples?

Thanks in advance, Gufran


  • davidben (Boston) Member, Broadie, Dev ✭✭✭

    @gufran A generic pon is usually not so bad, and we provide some public ones in the GATK resource bucket. For example: gs://gatk-best-practices/somatic-hg38/1000g_pon.hg38.vcf.gz and gs://gatk-best-practices/somatic-b37/Mutect2-exome-panel.vcf. In fact, we use these in our validations to keep ourselves honest.

    If you make your own panel, you will want at least 20 samples. The more similar they are to your tumor samples, the better, but an imperfect match is much better than nothing.

  • Linda Member
    @davidben How can I access the GATK resource bucket if I don't have a valid Google account? I have tried many times, but I never receive the validation text from Google.
  • gufran Member

    But what I am asking is how you choose variants from the normal BAM files. I am suspicious of all the results I got, so I want to know how you make your pon.

  • davidben (Boston) Member, Broadie, Dev ✭✭✭

    @gufran We make a panel of normals using CreateSomaticPanelOfNormals. This is described in section III A of our documentation: https://github.com/broadinstitute/gatk/blob/master/docs/mutect/mutect.pdf. We also provide a WDL workflow: https://github.com/broadinstitute/gatk/blob/master/scripts/mutect2_wdl/mutect2_pon.wdl. Finally, a blog post on this workflow is also coming soon.
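A minimal sketch of the three-step build that the documentation and WDL linked above describe: call each normal with Mutect2, merge the calls with GenomicsDBImport, then run CreateSomaticPanelOfNormals. The flag spellings follow the GATK 4.1-era tool documentation and may differ in other releases. The reference, interval list, workspace name, and BAM names are placeholders, and the `run` and `build_pon` helpers are hypothetical names introduced only for this example.

```shell
#!/bin/sh
# Sketch of a panel-of-normals build. All file names are placeholders.
set -eu

REF="reference.fasta"               # placeholder reference
INTERVALS="targets.interval_list"   # placeholder interval list

# Print each command by default; set DRY_RUN=0 to execute it instead.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# build_pon NORMAL.bam [NORMAL.bam ...]
build_pon() {
  # Step 1: call each normal on its own, with MNP merging disabled.
  for bam in "$@"; do
    run gatk Mutect2 -R "$REF" -I "$bam" --max-mnp-distance 0 \
      -O "${bam%.bam}.vcf.gz"
  done

  # Step 2: replace the BAM arguments with "-V <vcf>" pairs, then merge
  # the per-sample calls into a GenomicsDB workspace.
  for bam in "$@"; do
    set -- "$@" -V "${bam%.bam}.vcf.gz"
    shift
  done
  run gatk GenomicsDBImport -R "$REF" -L "$INTERVALS" \
    --genomicsdb-workspace-path pon_db "$@"

  # Step 3: build the panel from the workspace.
  run gatk CreateSomaticPanelOfNormals -R "$REF" -V gendb://pon_db \
    -O pon.vcf.gz
}

build_pon normal1.bam normal2.bam normal3.bam
```

Run as written, the script only prints the commands it would issue, which makes it easy to review them before setting `DRY_RUN=0`.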

  • davidben (Boston) Member, Broadie, Dev ✭✭✭

    @Linda you can also access the GATK resource bundle via FTP: https://software.broadinstitute.org/gatk/download/bundle.
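Another route that may avoid the sign-in problem: objects in a publicly readable Google Cloud Storage bucket can normally be fetched over plain HTTPS at `https://storage.googleapis.com/BUCKET/OBJECT`. The sketch below assumes the `gatk-best-practices` bucket permits anonymous reads, which this thread does not confirm. `gs_to_https` and `run` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: fetch the public panels over HTTPS instead of gsutil.
set -eu

# Print each command by default; set DRY_RUN=0 to execute it instead.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Map gs://BUCKET/OBJECT to the HTTPS endpoint used for public objects.
gs_to_https() {
  case "$1" in
    gs://*) echo "https://storage.googleapis.com/${1#gs://}" ;;
    *) echo "not a gs:// URI: $1" >&2; return 1 ;;
  esac
}

# The two panel paths quoted earlier in this thread.
for uri in \
  gs://gatk-best-practices/somatic-hg38/1000g_pon.hg38.vcf.gz \
  gs://gatk-best-practices/somatic-b37/Mutect2-exome-panel.vcf
do
  run curl -fLO "$(gs_to_https "$uri")"
done
```

Mutect2 also expects an index next to the panel VCF, so either download the index as well if the bucket provides one, or create it locally.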

  • manolis Member ✭✭✭

    Hi @davidben, I want to run Mutect2 in tumor-only mode, but I will have only 2 normal WGS samples to create a pon. Are 2 better than nothing, or does a panel built from only 2 normal WGS samples (tumors and normals are from different people) not work? Many thanks.

  • davidben (Boston) Member, Broadie, Dev ✭✭✭

    @manolis Two is better than nothing. I would probably set --min-sample-count 1 in CreateSomaticPanelOfNormals if I were you.

    That being said, I think you would be much better off using a larger, unmatched panel than a small, matched panel. Because FilterMutectCalls catches most technical artifacts with its other filters, the benefit of the panel is overwhelmingly due to catching mapping errors, which a generic panel can do reasonably well. We have some public ones in the GATK resource bundle / bucket, or you could make one with whatever WGS normals you have lying around.
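As a rough illustration of the two options above: the first function builds a panel from a very small cohort with `--min-sample-count 1`, and the second runs tumor-only Mutect2 against whichever panel is chosen, followed by FilterMutectCalls. Flag spellings follow the GATK 4.1-era tool documentation and may differ in other releases. All file names, the workspace name, and the `run`, `small_panel`, and `tumor_only_call` helpers are placeholders for this example.

```shell
#!/bin/sh
# Sketch: small-cohort panel and tumor-only calling. Names are placeholders.
set -eu

REF="reference.fasta"   # placeholder reference

# Print each command by default; set DRY_RUN=0 to execute it instead.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# small_panel WORKSPACE OUTPUT.vcf.gz
# Keep sites seen in a single normal, as suggested for a two-sample panel.
small_panel() {
  run gatk CreateSomaticPanelOfNormals -R "$REF" -V "gendb://$1" \
    --min-sample-count 1 -O "$2"
}

# tumor_only_call TUMOR.bam PANEL.vcf.gz
# No matched normal is given; the panel is the only normal information.
tumor_only_call() {
  run gatk Mutect2 -R "$REF" -I "$1" --panel-of-normals "$2" \
    -O unfiltered.vcf.gz
  run gatk FilterMutectCalls -R "$REF" -V unfiltered.vcf.gz \
    -O filtered.vcf.gz
}

small_panel pon_db two_sample_pon.vcf.gz
tumor_only_call tumor.bam 1000g_pon.hg38.vcf.gz
```

Swapping the second argument of `tumor_only_call` between the small matched panel and a larger generic one is enough to compare the two approaches on the same tumor.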

  • manolis Member ✭✭✭

    Many thanks @davidben. I wanted to use my own PON just to be sure that the data were produced with the same NGS protocol/machine... but I have too few samples. I will also try your file gs://gatk-best-practices/somatic-hg38/1000g_pon.hg38.vcf.gz. Best!

  • manolis Member ✭✭✭

    gatk, bash pipeline, linux server

    Hi @davidben ,

    When I create my own pon file (in this example it was created from WES data; I used CreateSomaticPanelOfNormals after GenomicsDBImport), I can see the sample names of my input files in the header:

    ##GATKCommandLine=<ID=CreateSomaticPanelOfNormals,CommandLine="CreateSomaticPanelOfNormals  ....
    ##INFO=<ID=BETA,Number=2,Type=Float,Description="Beta distribution parameters to fit artifact allele fractions">
    ##INFO=<ID=FRACTION,Number=1,Type=Float,Description="Fraction of samples exhibiting artifact">
    #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO

    I would also like to use your general pon file:


    ##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
    ##FORMAT=<ID=SA_POST_PROB,Number=3,Type=Float,Description="posterior probabilities of the presence of strand artifact">
    ##GATKCommandLine=<ID=Mutect2,CommandLine="Mutect2  --tumorSampleName HG02775 --output normalForPON.vcf.gz --input shlee-dev/1kg/exome_pon/CreateMutect2Pon1of2/218e3427-9fb6-4868-970d-038c6649138a/call-CramToBamAndIndex/HG02775.alt_bwamem_GRCh38DH.20150826.PJL.exome.bam ... ...
    ##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">
    ##INFO=<ID=TLOD,Number=A,Type=String,Description="Tumor LOD score">
    ##Mutect Version=2.1-beta
    ##filtering_status=Warning: unfiltered Mutect 2 calls.  Please run FilterMutectCalls to remove false positives.
    #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO

    The input files/normal samples are not clear to me... in this case they are not reported in the header.
    Why is a "tumor sample" reported?
    It was created with Mutect Version=2.1-beta. Can I use it with gatk v4.1.1.0?
    Do I have to pre-process it before using it as a pon? If so, is running "FilterMutectCalls v4.1.1.0" enough?

    Many thanks,

  • davidben (Boston) Member, Broadie, Dev ✭✭✭

    Why is a "tumor sample" reported?

    That's an old oversight but it won't affect the functionality of the pon.

    It was created with Mutect Version=2.1-beta. Can I use it with gatk v4.1.1.0?

    Yes, it will work.

    Do I have to pre-process it before using it as a pon? If so, is running "FilterMutectCalls v4.1.1.0" enough?

    You do not need to (nor should you) preprocess it in any way.
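Both headers quoted earlier in this thread end at the INFO column, which suggests that neither panel carries per-sample genotype columns. A quick way to check this for any panel file is to count the columns on its `#CHROM` line. This is only a sketch: `vcf_column_count` and `describe_panel` are hypothetical helper names, and the script assumes a tab-delimited VCF that is either plain text or gzip/bgzip-compressed.

```shell
#!/bin/sh
# Sketch: report whether a VCF is sites-only by inspecting its #CHROM line.
set -eu

# Print the number of tab-separated columns on the #CHROM line.
vcf_column_count() {
  case "$1" in
    *.gz) gzip -dc "$1" ;;
    *)    cat "$1" ;;
  esac | awk -F '\t' '/^#CHROM/ { print NF; exit }'
}

# 8 columns means CHROM..INFO only; FORMAT plus samples start at column 9.
describe_panel() {
  n=$(vcf_column_count "$1")
  if [ -z "$n" ]; then
    echo "$1: no #CHROM line found" >&2
    return 1
  elif [ "$n" -le 8 ]; then
    echo "$1: sites-only ($n columns, no sample columns)"
  else
    echo "$1: $((n - 9)) sample column(s)"
  fi
}

# Report on a panel file if one is given as the first argument.
if [ "$#" -gt 0 ]; then
  describe_panel "$1"
fi
```

This only reads the file; consistent with the advice above, the panel itself is left unmodified.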
