Somatic mutation calling - PON vs VCF

I have a dog tumor sample without a matched normal. I know that a matched normal is recommended, but for this dataset it is not possible to get one.

  1. I see that in the GATK4 Mutect2 workflow, the arguments -normal, -pon and --germline-resource are not mandatory. So technically I should be able to run Mutect2 with the tumor sample plus whichever of these normal-related resources I have available. Is this correct?

  2. I am planning to use the Broad's 435-dog SNP/indel VCF, in addition to the ENSEMBL variants, to filter out germline variants. Is this a feasible approach in the absence of a matched normal?

  3. Although I have the VCF file for the 435 dogs, would it help to create an additional PON from the same 435 dogs' data? I am not sure whether a PON built from the same data adds any benefit beyond the VCF, or whether a PON generated with GATK4 would give better calls than one from older versions.

  4. My tumor sample is from a Golden Retriever. If I create a PON, do you recommend using normals only from the same breed, or is it fine to mix breeds (i.e. a PON from the 435 dogs' data)?

Best Answer

  • shlee Cambridge admin
    Accepted Answer

    Hi @sutturka,

    Two clarification questions:
    A. What are the ENSEMBL variants? Are these common population germline variants?
    B. Are the 435 SNP and INDELs germline variant calls?

    To answer your questions tentatively:
    1. Yes, you can run GATK4 Mutect2 on just the tumor sample, in tumor-only mode, as outlined in section 2.
    2. Yes.
    3. Yes, definitely. Please run the 435 dog BAMs through GATK4 Mutect2 to create a Panel of Normals (PoN). The PoN captures sites of recurrent sequencing artifacts, which germline calling filters out and which are therefore absent from the germline VCF. You would not want to mistake these for somatic variants. See https://software.broadinstitute.org/gatk/documentation/article?id=11127 for some background context.
    4. This is a great question. Dog breeds are inbred, different breeds can be quite distinct, and some breeds are more prone to cancer than others. However, how do you know how much Golden Retriever ancestry your particular sample has? Was the breed determined genetically or deduced from phenotype? Some comparative genomics may help in deciding the best course of action. To be on the safe side, I think including all 435 dogs in the PoN helps ensure that common germline variation is excluded.
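
    As a rough illustration of points 1 to 3, here is a minimal Python sketch that assembles (but does not run) a tumor-only Mutect2 command line with an optional germline resource and PoN. The file names and the helper `mutect2_tumor_only_cmd` are hypothetical. The flags shown (`-R`, `-I`, `-tumor`, `--germline-resource`, `--panel-of-normals`, `-O`) follow GATK 4.0-era usage; flag requirements differ between GATK4 releases, so check them against the version you have installed. Note also that Mutect2 expects the germline resource to carry population allele frequencies (an `AF` INFO field), so a plain site list may need annotating first.

```python
import shlex
from typing import List, Optional


def mutect2_tumor_only_cmd(
    reference: str,
    tumor_bam: str,
    output_vcf: str,
    tumor_sample: Optional[str] = None,
    germline_resource: Optional[str] = None,
    panel_of_normals: Optional[str] = None,
) -> List[str]:
    """Build an argument list for a tumor-only Mutect2 run (nothing is executed)."""
    cmd = ["gatk", "Mutect2", "-R", reference, "-I", tumor_bam]
    if tumor_sample is not None:
        # GATK 4.0.x asks for the tumor sample name; newer releases may not need it.
        cmd += ["-tumor", tumor_sample]
    if germline_resource is not None:
        # Population germline VCF, e.g. derived from the 435-dog calls.
        cmd += ["--germline-resource", germline_resource]
    if panel_of_normals is not None:
        # PoN VCF built beforehand from normal BAMs.
        cmd += ["--panel-of-normals", panel_of_normals]
    cmd += ["-O", output_vcf]
    return cmd


# Hypothetical file and sample names, for illustration only.
cmd = mutect2_tumor_only_cmd(
    reference="canFam3.fa",
    tumor_bam="golden_tumor.bam",
    output_vcf="golden_tumor.somatic.vcf.gz",
    tumor_sample="golden_tumor",
    germline_resource="dogs435_germline_af.vcf.gz",
    panel_of_normals="dogs435_pon.vcf.gz",
)
print(" ".join(shlex.quote(part) for part in cmd))
```

    Leaving out `germline_resource` or `panel_of_normals` simply drops that flag, which mirrors the point that none of the normal-related arguments is mandatory.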

Answers

  • sutturka Member

    Thank you for the answers. This is very helpful.

    The answer to both of your clarification questions is "Yes". The ENSEMBL variants and the 435-dog data are both germline variant calls.

    1. We know the sample is from a Golden Retriever based on phenotype only. Your suggestion of a PON built from all 435 dogs seems to be the best approach.
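
    For anyone following along, building the PON from a set of normal BAMs could be sketched as below. This is only an outline under assumptions: it follows the GATK 4.0-era pattern of running Mutect2 in tumor-only mode on each normal and then combining the per-normal VCFs with `CreateSomaticPanelOfNormals -vcfs`. Later GATK4 releases changed this step (per-normal calls are imported with GenomicsDBImport and the panel is built from that workspace), so the exact arguments depend on your version. The sample names, paths and the helper `pon_commands` are hypothetical.

```python
from typing import Dict, List


def pon_commands(
    reference: str,
    normals: Dict[str, str],
    pon_vcf: str,
    workdir: str = "pon_calls",
) -> List[List[str]]:
    """Return the command lines for a GATK 4.0-style PON build (nothing is executed).

    `normals` maps each normal sample name to its BAM path.
    """
    commands: List[List[str]] = []
    per_normal_vcfs: List[str] = []

    # Step 1: call each normal on its own, as if it were a tumor-only sample.
    for sample, bam in sorted(normals.items()):
        out_vcf = f"{workdir}/{sample}.vcf.gz"
        commands.append(
            ["gatk", "Mutect2", "-R", reference, "-I", bam,
             "-tumor", sample, "-O", out_vcf]
        )
        per_normal_vcfs.append(out_vcf)

    # Step 2: collate the per-normal calls into a single panel.
    combine = ["gatk", "CreateSomaticPanelOfNormals"]
    for vcf in per_normal_vcfs:
        combine += ["-vcfs", vcf]
    combine += ["-O", pon_vcf]
    commands.append(combine)
    return commands


# Hypothetical inputs: two normals stand in for the full cohort.
cmds = pon_commands(
    reference="canFam3.fa",
    normals={"dog001": "bams/dog001.bam", "dog002": "bams/dog002.bam"},
    pon_vcf="dogs_pon.vcf.gz",
)
for c in cmds:
    print(" ".join(c))
```
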
  • Samir Member ✭✭

    Hi @sutturka, I was referred here by @shlee from the following discussion. I am working on canine glioma genomes. For somatic variant calling with the GATK4 workflow, specifically for creating a PON and filtering calls, I wonder if we could talk about how to get the Broad's 435-dog SNP/indel VCF (the link seems to have been archived recently and is no longer available).

    Thanks,
    Samir

  • sutturka Member

    Hi Samir,

    I got the link for the 435-dog data from this paper. However, it seems they have moved the data. I know the same group is in the process of updating the dataset to over 700 dogs, but I am unsure of the timeline. Unfortunately, I did not download the data before it was removed, so it seems we have to wait until the new data is uploaded. Meanwhile, I am using the DoGSD data to create the PON and VCF. Hope this helps.

    Thanks
    Sagar

  • Samir Member ✭✭

    Thanks @sutturka. I have emailed the VGB group at the Broad and am awaiting their reply. If you happen to have an old copy of the 435-dog SNP data, it would be a great help if there is a way to send it over.
