Panel of Normals

Hi,

I am currently running Mutect2 both with and without a panel of normals, and I would like to know which one I should use. Is there any documentation on how a panel of normals is built? How are the variants in the panel selected from the normal samples?

Thanks in advance, Gufran

Answers

  • davidben, Boston (Member, Broadie, Dev)

    @gufran A generic pon is usually not so bad, and we provide some public ones in the GATK resource bucket. For example: gs://gatk-best-practices/somatic-hg38/1000g_pon.hg38.vcf.gz and gs://gatk-best-practices/somatic-b37/Mutect2-exome-panel.vcf. In fact, we use these in our validations to keep ourselves honest.

    If you make your own panel, you will want at least 20 samples. The more similar they are to your tumor samples, the better, but an imperfect match is much better than nothing.

  • Linda (Member)
    @davidben How can I access the GATK resource bucket if I don't have a valid Google account? I have tried many times, but I never receive the validation text from Google.
  • gufran (Member)

    But what I am asking is how you choose variants from the normal BAM files. I am suspicious of all the results I got, so I want to know how you make your pon.

  • davidben, Boston (Member, Broadie, Dev)

    @gufran We make a panel of normals using CreateSomaticPanelOfNormals. This is described in section III A of our documentation: https://github.com/broadinstitute/gatk/blob/master/docs/mutect/mutect.pdf. We also provide a WDL workflow: https://github.com/broadinstitute/gatk/blob/master/scripts/mutect2_wdl/mutect2_pon.wdl. Finally, a blog post on this workflow is also coming soon.
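
    As a rough illustration of the steps named in this thread (per-normal Mutect2 calls, then GenomicsDBImport, then CreateSomaticPanelOfNormals), here is a minimal Python sketch that only builds the command lines and does not run anything. The file names, the `pon_db` workspace name, and the helper names `pon_commands` and `render` are hypothetical. The flags follow the GATK 4.1.x tool documentation as far as I know, so please check them against `gatk <Tool> --help` for your installed version. The optional `min_sample_count` maps to the `--min-sample-count` argument of CreateSomaticPanelOfNormals.

```python
import shlex


def pon_commands(normal_bams, reference, intervals,
                 workspace="pon_db", pon_out="pon.vcf.gz",
                 min_sample_count=None):
    """Build gatk argv lists for a panel of normals (nothing is executed)."""
    commands = []
    normal_vcfs = []

    # Step 1: call each normal sample on its own with Mutect2.
    for index, bam in enumerate(normal_bams, start=1):
        vcf = f"normal{index}.vcf.gz"  # hypothetical output name
        normal_vcfs.append(vcf)
        commands.append(["gatk", "Mutect2", "-R", reference, "-I", bam,
                         "--max-mnp-distance", "0", "-O", vcf])

    # Step 2: collect the per-normal calls in a GenomicsDB workspace.
    import_cmd = ["gatk", "GenomicsDBImport", "-R", reference,
                  "-L", intervals,
                  "--genomicsdb-workspace-path", workspace]
    for vcf in normal_vcfs:
        import_cmd += ["-V", vcf]
    commands.append(import_cmd)

    # Step 3: build the panel from the workspace.
    create_cmd = ["gatk", "CreateSomaticPanelOfNormals", "-R", reference,
                  "-V", f"gendb://{workspace}", "-O", pon_out]
    if min_sample_count is not None:
        # Left unset, the tool's own default applies.
        create_cmd += ["--min-sample-count", str(min_sample_count)]
    commands.append(create_cmd)
    return commands


def render(commands):
    """Format the argv lists as shell-quoted lines for review or logging."""
    return "\n".join(shlex.join(cmd) for cmd in commands)
```

    Printing `render(pon_commands(["n1.bam", "n2.bam"], "ref.fasta", "targets.interval_list"))` gives four lines that can be reviewed before pasting them into a bash pipeline.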

  • davidben, Boston (Member, Broadie, Dev)

    @Linda you can also access the GATK resource bundle via FTP: https://software.broadinstitute.org/gatk/download/bundle.
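
    One more possibility, offered as an untested assumption rather than an official route: objects in a publicly readable Google Cloud Storage bucket can normally be fetched over plain HTTPS at `https://storage.googleapis.com/<bucket>/<object>` without signing in. The sketch below only rewrites a `gs://` path into that form. Whether the GATK bucket allows anonymous reads is something I have not verified, and `gs_to_https` is a hypothetical helper name.

```python
from urllib.parse import quote


def gs_to_https(gs_uri):
    """Rewrite gs://bucket/object as a storage.googleapis.com HTTPS URL.

    Assumption: the bucket permits anonymous reads; otherwise the
    resulting URL will answer with an authorization error.
    """
    prefix = "gs://"
    if not gs_uri.startswith(prefix):
        raise ValueError(f"not a gs:// URI: {gs_uri}")
    bucket, _, obj = gs_uri[len(prefix):].partition("/")
    if not bucket or not obj:
        raise ValueError(f"expected gs://bucket/object, got: {gs_uri}")
    # Percent-encode the object name but keep the path separators.
    return f"https://storage.googleapis.com/{bucket}/{quote(obj)}"
```

    The resulting URL can be handed to a browser, `wget`, or `curl`.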

  • manolis (Member)

    Hi @davidben, I want to run Mutect2 in tumor-only mode, but I will have only 2 normal WGS samples to create a pon. Are 2 better than nothing, or does a pon built from only 2 normal WGS samples (tumors and normals are from different people) not work? Many thanks

  • davidben, Boston (Member, Broadie, Dev)

    @manolis Two is better than nothing. I would probably set --min-sample-count 1 in CreateSomaticPanelOfNormals if I were you.

    That being said, I think you would be much better off using a larger, unmatched panel than a small, matched panel. Because FilterMutectCalls catches most technical artifacts with its other filters, the benefit of the panel is overwhelmingly due to catching mapping errors, which a generic panel can do reasonably well. We have some public ones in the GATK resource bundle / bucket, or you could make one with whatever WGS normals you have lying around.
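
    To make the tumor-only setup with a generic panel concrete, here is a minimal Python sketch that builds (but does not run) a Mutect2 call with `--panel-of-normals` followed by FilterMutectCalls. The file names, the output prefix, and the helper name `tumor_only_commands` are hypothetical. The optional germline resource is an assumption on my part and is not discussed in this thread. Please confirm the arguments against the documentation for your GATK version.

```python
def tumor_only_commands(tumor_bam, reference, pon,
                        germline_resource=None, prefix="tumor_only"):
    """Build gatk argv lists for tumor-only calling with a panel of normals."""
    unfiltered = f"{prefix}.unfiltered.vcf.gz"  # hypothetical output names
    filtered = f"{prefix}.filtered.vcf.gz"

    # No normal sample is given, so Mutect2 runs in tumor-only mode.
    call_cmd = ["gatk", "Mutect2", "-R", reference, "-I", tumor_bam,
                "--panel-of-normals", pon]
    if germline_resource is not None:
        # Optional population allele-frequency VCF (an assumption here).
        call_cmd += ["--germline-resource", germline_resource]
    call_cmd += ["-O", unfiltered]

    # The remaining filters are applied in a separate FilterMutectCalls step.
    filter_cmd = ["gatk", "FilterMutectCalls", "-R", reference,
                  "-V", unfiltered, "-O", filtered]
    return [call_cmd, filter_cmd]
```

    For example, `tumor_only_commands("tumor.bam", "ref.fasta", "1000g_pon.hg38.vcf.gz")` returns the two argv lists in the order they would be run.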

  • manolis (Member)

    Many thanks @davidben. I wanted to use my own pon to be sure that the data were produced with the same NGS protocol/machine, but I have too few samples. I will also try your file gs://gatk-best-practices/somatic-hg38/1000g_pon.hg38.vcf.gz. Best!

  • manolis (Member)

    gatk 4.1.1.0, bash pipeline, linux server

    Hi @davidben,

    When I create my own pon file (in this example it was created from WES data; I used CreateSomaticPanelOfNormals after GenomicsDBImport), I can see the sample names of my input files in the header:

    ##fileformat=VCFv4.2
    ##GATKCommandLine=<ID=CreateSomaticPanelOfNormals,CommandLine="CreateSomaticPanelOfNormals  ....
    ##INFO=<ID=BETA,Number=2,Type=Float,Description="Beta distribution parameters to fit artifact allele fractions">
    ##INFO=<ID=FRACTION,Number=1,Type=Float,Description="Fraction of samples exhibiting artifact">
    ##contig=<ID=chr1,length=248956422,assembly=38>
    ...
    ##contig=<ID=HLA-DRB1*16:02:01,length=11005,assembly=38>
    ##normal_sample=wes12928
    ...
    ##normal_sample=wesC597
    ##source=CreateSomaticPanelOfNormals
    #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
    

    I would also like to use your general pon file:

    gs://gatk-best-practices/somatic-hg38/1000g_pon.hg38.vcf.gz.

    ##fileformat=VCFv4.2
    ##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
    ....
    ##FORMAT=<ID=SA_POST_PROB,Number=3,Type=Float,Description="posterior probabilities of the presence of strand artifact">
    ##GATKCommandLine=<ID=Mutect2,CommandLine="Mutect2  --tumorSampleName HG02775 --output normalForPON.vcf.gz --input shlee-dev/1kg/exome_pon/CreateMutect2Pon1of2/218e3427-9fb6-4868-970d-038c6649138a/call-CramToBamAndIndex/HG02775.alt_bwamem_GRCh38DH.20150826.PJL.exome.bam ... ...
    ##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">
    ...
    ##INFO=<ID=TLOD,Number=A,Type=String,Description="Tumor LOD score">
    ##Mutect Version=2.1-beta
    ##contig=<ID=chr1,length=248956422,assembly=GRCh38>
    ...
    ##contig=<ID=HLA-DRB1*16:02:01,length=11005,assembly=GRCh38>
    ##filtering_status=Warning: unfiltered Mutect 2 calls.  Please run FilterMutectCalls to remove false positives.
    ##source=Mutect2
    ##tumor_sample=HG02775
    #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
    

    The input files/normal samples are not clear to me; in this case they are not reported in the header.
    Why is a "tumor sample" reported?
    It was created with Mutect Version=2.1-beta. Can I use it with gatk v4.1.1.0?
    Do I have to pre-process it before using it as a pon? If yes, is running "FilterMutectCalls v4.1.1.0" enough?

    Many thanks,
    best!
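
    For comparing two panels the way the headers are compared above, a small Python sketch like the following can pull the `##source`, `##normal_sample`, and `##tumor_sample` lines out of a VCF header. The helper names `header_samples` and `read_header_lines` are hypothetical, and the sketch assumes a gzip-compressed VCF whose meta lines have the plain `##key=value` shape shown in this post.

```python
import gzip


def header_samples(header_lines):
    """Collect source and sample meta lines from VCF header lines."""
    found = {"source": None, "normal_samples": [], "tumor_samples": []}
    for raw in header_lines:
        line = raw.strip()
        if line.startswith("#CHROM"):
            break  # column header line: the meta section is over
        if not line.startswith("##"):
            continue  # skip blank or elided lines
        key, _, value = line[2:].partition("=")
        if key == "source":
            found["source"] = value
        elif key == "normal_sample":
            found["normal_samples"].append(value)
        elif key == "tumor_sample":
            found["tumor_samples"].append(value)
    return found


def read_header_lines(path):
    """Yield the leading '#' lines of a gzip-compressed VCF file."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if not line.startswith("#"):
                break
            yield line
```

    Calling `header_samples(read_header_lines("pon.vcf.gz"))` on each panel makes the difference visible at a glance: one header lists `normal_sample` entries, the other only a `tumor_sample`.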

  • davidben, Boston (Member, Broadie, Dev)

    Why is a "tumor sample" reported?

    That's an old oversight but it won't affect the functionality of the pon.

    It was created with Mutect Version=2.1-beta. Can I use it with gatk v4.1.1.0?

    Yes, it will work.

    Do I have to pre-process it before using it as a pon? If yes, is running "FilterMutectCalls v4.1.1.0" enough?

    You do not need to (nor should you) preprocess it in any way.
