Mutect HC+PON mode

Where can I find a panel-of-normals VCF to run Mutect in HC+PON mode?
Thanks in advance.
Giovanni

Answers

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie admin

    Hi Giovanni,

    We don't provide data for the panel of normals. You can put together your own panel of normals depending on your study design and patient cohort, using data from e.g. the 1000 Genomes project. The idea of the panel of normals is to estimate the background of normal germline variation, so I'd recommend putting together a panel that matches the ethnicity of your subjects.
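
The pooling idea can be sketched in Python, assuming each normal has already been reduced to a list of (chromosome, position, alternate allele) sites. Keeping only sites seen in at least `min_samples` normals is one simple way to favor recurrent background over one-off calls; the `build_panel` function and its threshold are hypothetical illustrations, not the procedure any GATK tool uses.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

# Hypothetical site representation: (chromosome, position, alternate allele).
Site = Tuple[str, int, str]


def build_panel(normal_callsets: Iterable[Iterable[Site]], min_samples: int = 2) -> Set[Site]:
    """Return the sites observed in at least `min_samples` distinct normals.

    `min_samples` is an illustrative threshold, not a GATK default.
    """
    counts = Counter()
    for callset in normal_callsets:
        # Deduplicate within one normal so each sample votes at most once per site.
        counts.update(set(callset))
    return {site for site, n in counts.items() if n >= min_samples}


if __name__ == "__main__":
    normals = [
        [("chr1", 100, "A"), ("chr2", 200, "T")],
        [("chr1", 100, "A")],
        [("chr1", 100, "A"), ("chr3", 300, "G")],
    ]
    print(build_panel(normals, min_samples=2))  # {('chr1', 100, 'A')}
```

Raising `min_samples` shrinks the panel toward sites shared by many normals; setting it to 1 turns the panel into a plain union of all normal sites.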

  • irongraft · Member

    Hi Geraldine,
    I have not sequenced any normal samples so far, but I have a set of non-tumor samples from patients with clear evidence of Mendelian diseases, which were sequenced for that reason. These samples are Italian, as are the tumor samples. Could I consider them a pool of non-tumor (not strictly normal) samples? If not, would it suffice to filter out variants from EVS or 1000G, for example, from populations ethnically matching my samples, regardless of MAF?

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie admin

    I believe you could use your set of Mendelian disease samples as "normals" in this context, yes. Using variants from 1000G is also an option -- if you combine both approaches, you can boost your panel size.

  • ekanterakis · UK · Member

    Hello, I was wondering how the "normal" and "normal_panel" VCFs are treated differently. Could one, for example, use a panel of normals as a composite "normal" sample? How would that run differ from a normal + normal_panel run? Thank you.

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie admin

    Hi @ekanterakis,

    In both cases the purpose is to exclude germline variants from the somatic callset. Using a matched normal from the same individual gives you the exact genetic background against which you can call somatic events. In contrast, the panel of normals is more of an approximation of the individual's background, assuming that the panel is representative of the genetic population the individual belongs to. On top of that, a matched normal provides some additional information, such as how much relative power you have to discover variants in the read data for the tumor vs. the normal, which is used to adjust the confidence score of variants. You can't do that with a PON, since it is just a list of sites and does not include read data.

  • ekanterakis · UK · Member

    Thanks for the detailed answer, Geraldine. Given that the PON VCF is artificial, I assume the only information taken from it is the chromosome, position, alternate allele and perhaps the genotype. Is that correct? Thanks again.

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie admin

    That's right, except the genotype is not used. Note that sites that get tentatively rejected because they are in the PON can get "rescued" if they are also in COSMIC.
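
The matching rule described in this answer (only chromosome, position and alternate allele of each PON record are used, the genotype is ignored, and a PON hit can be rescued by COSMIC) can be sketched in Python. This is a minimal illustration assuming calls and PON records are plain tuples; `pon_sites` and `apply_pon` are hypothetical names, and this is not Mutect's actual code.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical representations: a call is (chrom, pos, alt); a PON record also
# carries a genotype string, which the filter deliberately ignores.
Site = Tuple[str, int, str]
PonRecord = Tuple[str, int, str, str]


def pon_sites(records: Iterable[PonRecord]) -> Set[Site]:
    """Reduce PON records to (chrom, pos, alt), dropping the genotype."""
    return {(chrom, pos, alt) for chrom, pos, alt, _genotype in records}


def apply_pon(calls: Iterable[Site], pon: Set[Site], cosmic: Set[Site]) -> List[Site]:
    """Reject calls found in the PON unless they are also in COSMIC."""
    kept = []
    for site in calls:
        if site in pon and site not in cosmic:
            continue  # treated as background variation and rejected
        kept.append(site)  # not in the PON, or rescued by COSMIC
    return kept


if __name__ == "__main__":
    pon = pon_sites([("chr1", 100, "A", "0/1"), ("chr2", 200, "T", "1/1")])
    calls = [("chr1", 100, "A"), ("chr1", 100, "C"), ("chr2", 200, "T")]
    cosmic = {("chr2", 200, "T")}
    print(apply_pon(calls, pon, cosmic))  # [('chr1', 100, 'C'), ('chr2', 200, 'T')]
```

Because the key includes the alternate allele, a different allele at a PON position (here `chr1:100 C`) is not rejected, and the two PON records with different genotypes are treated identically.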

  • ac67479 · Austin · Member

    Hello! I was wondering how many normal samples are needed for a solid panel of normals. In my case I am dealing with glioblastoma tumor samples. Should I try to find normal brain tissue, or rather any tissue from healthy patients that was sequenced the same way? And approximately how many of them would I need? Thanks in advance.

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie admin

    I believe the recommended number is around 40 samples, but the more, the better. The PON acts as a filter for technical artifacts, so the normals should have a technical profile as similar as possible to the case samples. This means that the way the samples are prepared and processed should be as similar as possible. At minimum the DNA library preparation and sequencing technology should be the same. Ideally the tissue would be the same too, to have the same sampling and tissue preservation techniques applied, but this is often not possible, and in many cases only blood normals are available. That is considered good enough if that's all you can find in sufficient quantities.

  • mine · Member

    Hello,

    I have ER+ breast cancer samples and uninvolved breast tissue adjacent to the ER+ tumors. Some of the uninvolved samples are matched, but it is not clearly identified which one is matched with which. I also have 5 non-cancer samples, but it is said that at least 40 samples are needed. I have 21 uninvolved samples. Can I use these uninvolved samples to create a panel of normals? It is not 40, but I think it may be enough. Or do you recommend using the non-cancer samples even though I have only 5? Could you help me, please?

  • Sheila · Broad Institute · Member, Broadie, Moderator admin

    @mine
    Hi,

    Some of uninvolved samples are matched but it is not identified clearly which one is matched with which one.

    Do you mean you have tumor samples and normal samples from the same patient, but you don't know which patient the samples came from? How many patient samples do you have?

    And also I have 5 non-cancer samples but it is said that at least 40 samples would be fine.

    What are these 5 non-cancer samples? Where did you get them from?

    I have 21 uninvolved samples. Can I use these uninvolved samples in order to create panel of normals?

    I am not sure what "uninvolved" means. Are these samples separate from your matched tumor and normal pairs?

    Thanks,
    Sheila

  • mine · Member
    edited May 2018

    Uninvolved samples are obtained from non-cancer breast tissue taken from patients who have cancer. The related uninvolved and cancer samples come from the same patients, but I don't know which uninvolved sample matches which cancer sample. Non-cancer samples are obtained from people who do not have cancer. There are 42 cancer, 31 uninvolved and 5 normal samples. I hope I have explained it clearly this time.

    It is also recommended that a matched normal sample be used with Mutect2. However, as I mentioned, I don't know which cancer sample matches which uninvolved sample.

    I got the RNA-seq data from NCBI.

  • Sheila · Broad Institute · Member, Broadie, Moderator admin

    @mine
    Hi,

    I don't know which involved sample matches with which cancer sample.

    Is there any way to find out this information? Matched tumor-normal analysis is crucial in our somatic SNV/indel workflow. It would be great if you could find this information out.

    If you cannot find out this information, you can do a tumor-only analysis. You can use all the normal samples as PoN samples. This tutorial has some extra information that may be helpful.

    -Sheila

  • mine · Member

    I know that Mutect2 is designed for somatic mutations in cancer samples. However, as I said, I don't have matched samples. Can I use HaplotypeCaller instead of Mutect2?

  • Sheila · Broad Institute · Member, Broadie, Moderator admin

    @mine
    Hi,

    You can use Mutect2 for unmatched tumor samples. Have a look at this tutorial.

    -Sheila

    P.S. You may also find this blog post helpful.

  • mine · Member

    Thank you so much.