Training a filter on a truth set gathered from several tumor-normal analyses for tumor-only samples?

EADG, Kiel, Member ✭✭✭


My project setup is as follows: 150 matched tumor/normal samples and 100 tumor-only samples from the same entity.

I analyzed them all with Mutect2, using a PON that I built from the normal samples. While the 150 tumor/normal samples look fine in the analysis, the 100 tumor-only samples produce a lot of rubbish alongside the unknown "real" variants.

Now my problem: since my GATK workshop in Cambridge last year, an idea has been swirling around my head, and I don't know whether it is great or completely dumb... Can I use the results from the 150 matched tumor/normal samples as a truth set and train my filters on the tumor (only) samples of that set? Then I could apply the trained filters to the 100 tumor-only samples instead of blunt filtering by AF etc.
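As a rough illustration of the idea above (not a tested or endorsed workflow), the sketch below labels tumor-only calls from the 150 matched tumors as true or false depending on whether they appear in the matched tumor/normal result, and then picks cutoffs that best separate the two groups. The function names, the dict layout, and the use of `AF` and `DP` as the only features are assumptions made for the example; a real attempt would probably feed more annotations into a proper classifier.

```python
from itertools import product

def label_calls(tumor_only_calls, truth_keys):
    """Pair each tumor-only call with True/False.

    tumor_only_calls: list of dicts holding a 'key' (chrom, pos, ref, alt)
    plus numeric annotations (hypothetical layout).
    truth_keys: set of keys found in the matched tumor/normal analysis
    of the same tumor.
    """
    return [(call, call["key"] in truth_keys) for call in tumor_only_calls]

def f1_score(labeled, keep):
    """F1 of a candidate filter `keep(call) -> bool` on labeled calls."""
    tp = sum(1 for call, is_true in labeled if keep(call) and is_true)
    fp = sum(1 for call, is_true in labeled if keep(call) and not is_true)
    fn = sum(1 for call, is_true in labeled if not keep(call) and is_true)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def fit_thresholds(labeled, af_grid, dp_grid):
    """Grid-search minimum AF and DP cutoffs that maximise F1.

    Returns (best_f1, min_af, min_dp); ties keep the first combination tried.
    """
    best = (-1.0, None, None)
    for min_af, min_dp in product(af_grid, dp_grid):
        score = f1_score(
            labeled, lambda c: c["AF"] >= min_af and c["DP"] >= min_dp
        )
        if score > best[0]:
            best = (score, min_af, min_dp)
    return best
```

The cutoffs found on the 150 matched tumors would then be applied unchanged to the 100 tumor-only samples. Whether they transfer well is exactly the open question of this thread.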

I hope somebody can give me some advice, or has already tried this and can share their experience. Otherwise, I will try it and report back ;)

Thanks in advance,



  • Sheila, Broad Institute, Member, Broadie, Moderator admin

    Hi EADG,

    I don't think the team has done anything like this. How do you plan to "train" the filtering? Do you mean use the same filters from the matched set on the unmatched set? How did you set the filters for the matched set?


  • EADG, Kiel, Member ✭✭✭

    Hi @Sheila,

    thanks for your response. I drew a little picture to show what I am trying to do (I think my English is not sufficient to explain it properly ;).

    I don't "filter" my raw variants from the matched analysis; I only remove the variants that don't pass an internal filter of MuTect2, and this alone leaves me with just a few variants (capture/panel).
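One way to read the step described above is "keep only records whose FILTER column says PASS". The sketch below collects those records as a set of variant keys that could serve as the truth set. It assumes the matched tumor/normal VCF already has its FILTER column populated, and the function name is made up for the example.

```python
def pass_variant_keys(vcf_lines):
    """Return {(chrom, pos, ref, alt)} for records whose FILTER is PASS.

    vcf_lines: iterable of text lines from an uncompressed VCF.
    Multi-allelic records contribute one key per ALT allele.
    """
    keys = set()
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # Fixed VCF columns: CHROM POS ID REF ALT QUAL FILTER ...
        chrom, pos, _id, ref, alt, _qual, filt = fields[:7]
        if filt == "PASS":
            for allele in alt.split(","):
                keys.add((chrom, int(pos), ref, allele))
    return keys
```

For a file on disk this could be called as `pass_variant_keys(open("matched.vcf"))`, where `matched.vcf` is a placeholder name.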

    Greets EADG

  • Sheila, Broad Institute, Member, Broadie, Moderator admin

    Hi EADG,

    Thanks for the picture. I am still not sure what you mean by "train/adjust". Am I right in thinking you are planning to use the variants from the tumor/normal run to filter the unmatched run output? What do you mean by "only kick the variants out which don't pass an internal filter of MuTect2"? In GATK4 (which I assume you are using), Mutect2 does not do any upfront filtering.

    In any case, I don't think our team has any experience doing this. It may be best to test it out and let us know how things go :smile:

