MuTect2 PON CombineVariants

igor, New York, Member ✭✭

The MuTect2 documentation says:
"For full PON creation, call each of your normals separately in artifact detection mode. Then use CombineVariants to output only sites where a variant was seen in at least two samples"
However, if your panel has only two samples, requiring a variant in both of them is going to create a lot of false negatives. On the other hand, if you have 1000 samples, two may not be enough. Do you really mean a fixed threshold of two samples regardless of panel size, or should it be a fraction of the total samples?

Answers

  • igor, New York, Member ✭✭

    @Geraldine_VdAuwera said:
    The "two" recommendation is based on the size of panel we normally use in production, which I believe is around 50 samples. If your panels are very different in size, you should do some exploratory analysis to determine what adjustments might be necessary.

    Is there a good metric to evaluate that? Just based on personal experience, 2 out of 50 seems reasonable, but that's not very scientific. For example, if the SNP population frequency is only 1%, even 2 out of 50 is too stringent.
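
One rough way to put numbers on this question is a binomial calculation: given a population allele frequency, how likely is it that a real germline variant shows up in at least k of N normals? The sketch below is only an illustration under stated assumptions that are not part of the thread: Hardy-Weinberg proportions, independent samples, and perfect detection in every carrier. The function names are hypothetical.

```python
from math import comb


def carrier_probability(allele_freq):
    """Chance that one diploid sample carries at least one copy of the allele.

    Assumes Hardy-Weinberg proportions (an assumption made for this sketch).
    """
    return 1.0 - (1.0 - allele_freq) ** 2


def prob_seen_in_at_least(k, n_samples, per_sample_prob):
    """Binomial tail P(X >= k) for n_samples independent samples."""
    below_k = sum(
        comb(n_samples, i)
        * per_sample_prob ** i
        * (1.0 - per_sample_prob) ** (n_samples - i)
        for i in range(k)
    )
    return 1.0 - below_k


if __name__ == "__main__":
    q = carrier_probability(0.01)  # the 1% SNP from the question above
    for n in (2, 50, 1000):
        p = prob_seen_in_at_least(2, n, q)
        print(f"panel of {n:>4}: P(seen in >= 2 samples) = {p:.3f}")
```

Under these assumptions, a 1% SNP has roughly a one-in-four chance of appearing in at least 2 of 50 normals, which fits the concern that a fixed threshold of two misses many rare germline variants in a panel of that size. It says nothing about recurrent technical artifacts, whose per-sample rate is not known in advance.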

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    AFAIK this was derived empirically by analysts in the CGA group, but I'm not aware of any systematic guidelines for scaling to different-sized PONs. If it helps, the idea here is that the PON mainly aims to rule out systematic errors that can be found in multiple normal samples generated using the same technology. The basic assumption is that if you find the same variant in multiple samples, it is either a real variant that is shared in the population, or it is a recurring error due to flaws in the data generation process. Something found only in a single sample may be a random error, which we'll need to filter through other means.

    So my intuition is that scaling your PON upwards shouldn't require changing the "at least two samples" requirement, considering what the PON aims to do. Smaller PONs, however, will be underpowered, and there's not really a good solution to that aside from getting more samples for your PON.
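
The "seen in at least two samples" rule can be sketched in a few lines. This is not CombineVariants itself, only an illustration of the counting logic; the site representation (chrom, pos, ref, alt) and the function name `build_pon_sites` are hypothetical choices made for the example.

```python
from collections import Counter


def build_pon_sites(per_sample_calls, min_samples=2):
    """Return the sites called in at least `min_samples` normals.

    per_sample_calls: one iterable of (chrom, pos, ref, alt) tuples per normal.
    """
    counts = Counter()
    for calls in per_sample_calls:
        # set() ensures a site counts once per sample, even if listed twice
        counts.update(set(calls))
    return {site for site, n in counts.items() if n >= min_samples}


if __name__ == "__main__":
    # Toy call sets for three normals (made-up coordinates)
    normal_a = [("chr1", 1000, "A", "T"), ("chr2", 500, "G", "C")]
    normal_b = [("chr1", 1000, "A", "T")]
    normal_c = [("chr3", 42, "C", "A"), ("chr1", 1000, "A", "T")]
    print(build_pon_sites([normal_a, normal_b, normal_c]))
```

Keeping `min_samples` as a parameter makes it straightforward to explore other thresholds on panels much smaller or larger than about 50 samples, which is the kind of exploratory analysis suggested earlier in the thread.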
