
Somatic mutation calling - PON vs VCF

I have a dog tumor sample without a matched normal. I know that a matched normal is recommended, but for this dataset it is not possible to obtain one.

  1. I see that in the GATK4 Mutect2 workflow, the arguments -normal, -pon, and --germline-resource are not mandatory. So, technically, I should be able to run Mutect2 with the tumor sample and any of the available resources that stand in for the normal (i.e. -normal, -pon, --germline-resource). Is this correct?

  2. I am planning to use the Broad's 435-dog SNP/INDEL VCF, in addition to the ENSEMBL variants, to filter germline mutations. Would this be a feasible approach in the absence of a matched normal?

  3. Although I have the VCF file for the 435 dogs, would it be helpful to create an additional PON from the same 435-dog data? I am not sure whether a PON built from the same data provides any benefit beyond the VCF, or whether generating the PON with GATK4 may improve calls compared to older versions.

  4. My tumor sample is from a Golden Retriever. If I create a PON, do you recommend using normals only from the same breed, or is it fine to mix breeds (i.e. a PON from the 435-dog data)?

Best Answer

  • shlee, Cambridge (Member, Broadie, Moderator)
    Accepted Answer

    Hi @sutturka,

    Two clarification questions:
    A. What are the ENSEMBL variants? Are these common population germline variants?
    B. Are the 435 SNP and INDELs germline variant calls?

    To answer your questions tentatively:
    1. Yes, you are able to run GATK4-Mutect2 with just the tumor sample in tumor-only mode, as outlined in section 2.
    2. Yes.
    3. Yes, definitely. Please run the 435 dog BAMs through GATK4-Mutect2 to create a Panel of Normals. It is important to capture sites of sequencing artifacts, which germline calling filters out, so that you do not mistake them for somatic variants. See https://software.broadinstitute.org/gatk/documentation/article?id=11127 for some background context.
    4. This is a great question. We know that dog breeds are inbred, that different breeds can be rather distinct, and that some breeds are more prone to cancer than others. However, how do you know how much Golden Retriever ancestry your particular sample has? Was this designation genetically determined or deduced from phenotype? Some comparative genomics may be helpful in determining the best course of action. To be on the safe side, I think having all 435 dogs in the PoN ensures common germline variation is excluded.
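
A minimal sketch of how the tumor-only call discussed in points 1 to 3 might be assembled, written in Python so the command can be inspected before it is run. The flags `-normal`, `-pon`, and `--germline-resource` come from the question; `-R`, `-I`, `-tumor`, and `-O` are assumed from GATK 4.0-era Mutect2 usage, and every file and sample name below is a hypothetical placeholder. Please check the flags against the documentation for the installed GATK version.

```python
import shlex
from typing import List, Optional


def mutect2_tumor_only_cmd(
    reference: str,
    tumor_bam: str,
    tumor_sample: str,
    output_vcf: str,
    pon_vcf: Optional[str] = None,
    germline_vcf: Optional[str] = None,
) -> List[str]:
    """Build a tumor-only Mutect2 command as an argument list.

    No -normal argument is added because there is no matched normal.
    The PoN and the germline resource are both optional, as in the question.
    """
    # Core arguments (assumed from GATK 4.0-era usage).
    cmd = [
        "gatk", "Mutect2",
        "-R", reference,
        "-I", tumor_bam,
        "-tumor", tumor_sample,
        "-O", output_vcf,
    ]
    # Panel of normals: sites of recurrent artifacts and common variation.
    if pon_vcf is not None:
        cmd += ["-pon", pon_vcf]
    # Population germline resource, e.g. the 435-dog VCF.
    if germline_vcf is not None:
        cmd += ["--germline-resource", germline_vcf]
    return cmd


if __name__ == "__main__":
    # Hypothetical file and sample names, for illustration only.
    cmd = mutect2_tumor_only_cmd(
        reference="canFam3.fa",
        tumor_bam="golden_tumor.bam",
        tumor_sample="golden_tumor",
        output_vcf="golden_tumor.somatic.vcf.gz",
        pon_vcf="dogs435_pon.vcf.gz",
        germline_vcf="dogs435_germline.vcf.gz",
    )
    # Print a shell-safe version of the command instead of running it.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The function only builds the argument list; passing it to `subprocess.run(cmd, check=True)` would execute it once GATK is on the PATH.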

Answers

  • Thank you for the answers. This is very helpful.

    The answer to both clarification questions is "Yes". The ENSEMBL and 435-dog data are both germline variant calls.

    1. We know the sample is from a Golden Retriever based on phenotype only. Your suggestion of a PON with all 435 dogs seems to be the optimal approach.
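
For reference, a rough sketch of how the 435-dog PON suggested in the accepted answer might be built. It assumes the GATK 4.0-era two-step workflow: each normal BAM is run through Mutect2 in tumor-only mode, then the per-sample VCFs are combined with `CreateSomaticPanelOfNormals` using `-vcfs`. Later GATK releases changed this workflow, so the flags here are assumptions to verify against the installed version, and all file and sample names are hypothetical.

```python
import shlex
from typing import List, Sequence


def normal_calls_cmd(
    reference: str, normal_bam: str, normal_sample: str, output_vcf: str
) -> List[str]:
    """Step 1: call one normal in tumor-only mode, so it is passed via -tumor."""
    return [
        "gatk", "Mutect2",
        "-R", reference,
        "-I", normal_bam,
        "-tumor", normal_sample,
        "-O", output_vcf,
    ]


def create_pon_cmd(normal_vcfs: Sequence[str], pon_vcf: str) -> List[str]:
    """Step 2: combine the per-normal VCFs into one panel of normals."""
    if not normal_vcfs:
        raise ValueError("at least one normal VCF is required")
    cmd = ["gatk", "CreateSomaticPanelOfNormals"]
    # One -vcfs argument per normal (GATK 4.0-era syntax, an assumption).
    for vcf in normal_vcfs:
        cmd += ["-vcfs", vcf]
    cmd += ["-O", pon_vcf]
    return cmd


if __name__ == "__main__":
    # Two hypothetical dogs stand in for the full set of 435.
    samples = ["dog001", "dog002"]
    per_normal_vcfs = []
    for sample in samples:
        vcf = f"{sample}.for_pon.vcf.gz"
        per_normal_vcfs.append(vcf)
        step1 = normal_calls_cmd("canFam3.fa", f"{sample}.bam", sample, vcf)
        print(" ".join(shlex.quote(arg) for arg in step1))
    step2 = create_pon_cmd(per_normal_vcfs, "dogs435_pon.vcf.gz")
    print(" ".join(shlex.quote(arg) for arg in step2))
```

Because the panel mixes breeds, as suggested above, the list of samples would simply include all 435 dogs rather than Golden Retrievers only.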