
Training a filter for tumor-only samples on a truth set gathered from several tumor-normal analyses?

EADG (Kiel, Member ✭✭✭)


I have the following project setup: 150 matched tumor/normal samples and 100 tumor-only samples from the same entity.

I analyzed them all with Mutect2, using a PON that I built from the normal samples. While the 150 tumor/normal samples look fine in the analysis, the 100 tumor-only samples produce a lot of rubbish alongside the unknown "real" variants.

Now my problem: since the GATK workshop I attended in Cambridge last year, an idea has been swirling around my head, and I don't know whether it is great or completely dumb... Can I use the results from the 150 matched tumor/normal samples as a truth set and train my filters on the 150 tumors of this set, analyzed in tumor-only mode? I could then apply the trained filters to the 100 tumor-only samples instead of blunt filtering by AF etc.
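
A minimal sketch of what "training" could mean here, under loud assumptions: each tumor-only call carries one numeric score (any annotation; the name `score` is hypothetical), a call counts as true if the matched tumor/normal run of the same tumor also passed it, and the "filter" is a single cutoff chosen to maximize F1. None of this is a GATK API or a method endorsed by the GATK team; the function names and toy data are made up for illustration.

```python
from typing import Dict, List, Set, Tuple

# A variant is keyed by (chrom, pos, ref, alt); this key choice is an assumption.
Variant = Tuple[str, int, str, str]


def label_calls(tumor_only: Dict[Variant, float],
                truth: Set[Variant]) -> List[Tuple[float, bool]]:
    """Pair each tumor-only call's score with whether the matched run also passed it."""
    return [(score, key in truth) for key, score in tumor_only.items()]


def best_cutoff(labelled: List[Tuple[float, bool]]) -> Tuple[float, float]:
    """Return (cutoff, f1) for the rule 'keep calls with score >= cutoff' with the highest F1.

    Recall is computed only over truth variants that were called in tumor-only
    mode; truth variants missed entirely are not counted here.
    """
    total_true = sum(1 for _, is_true in labelled if is_true)
    best = (0.0, 0.0)  # (cutoff, f1)
    for cutoff in sorted({score for score, _ in labelled}):
        kept = [is_true for score, is_true in labelled if score >= cutoff]
        tp = sum(kept)
        if tp == 0:
            continue  # no true calls survive this cutoff
        precision = tp / len(kept)
        recall = tp / total_true
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best[1]:
            best = (cutoff, f1)
    return best


# Toy data (made up): passing calls from the matched run act as the truth set.
toy_truth = {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")}
# Tumor-only calls for the same tumor, each with a hypothetical score.
toy_tumor_only = {
    ("chr1", 100, "A", "T"): 0.30,
    ("chr2", 200, "G", "C"): 0.25,
    ("chr3", 300, "C", "T"): 0.05,
    ("chr4", 400, "T", "G"): 0.50,
}

cutoff, f1 = best_cutoff(label_calls(toy_tumor_only, toy_truth))
print(cutoff, f1)
```

A real attempt would likely combine several annotations in a classifier and hold out some of the 150 tumors for validation, so that the cutoff is not tuned and evaluated on the same samples.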

I hope somebody can give me some advice, or has already tried this and can share their experience. Otherwise, I will try it and report back ;)

Thanks in advance,



  • Sheila (Broad Institute; Member, Broadie ✭✭✭✭✭)

    Hi EADG,

    I don't think the team has done anything like this. How do you plan to "train" the filtering? Do you mean use the same filters from the matched set on the unmatched set? How did you set the filters for the matched set?


  • EADG (Kiel, Member ✭✭✭)

    Hi @Sheila,

    Thanks for your response. I drew a little picture to show what I am trying to do (I think my English is not sufficient to explain it the right way ;).

    I don't "filter" my raw variants from the matched analysis; I only kick the variants out which don't pass an internal filter of MuTect2, and I end up with just a few variants by doing only this (capture/panel).

    Greetings, EADG

  • Sheila (Broad Institute; Member, Broadie ✭✭✭✭✭)

    Hi EADG,

    Thanks for the picture. I am still not sure what you mean by "train/adjust". Am I right in thinking you are planning to use the variants from the tumor/normal run to filter the unmatched run output? What do you mean by "only kick the variants out which don't pass an internal filter of MuTect2"? In GATK4 (which I assume you are using), Mutect2 does not do any upfront filtering.

    In any case, I don't think our team has any experience doing this. It may be best to test it out and let us know how things go :smile:

