
Training a Filter on Truthset gathered by several Tumor-Normal analyses for Tumor only samples?

EADG (Kiel), Member ✭✭✭


My project consists of 150 matched tumor/normal samples and 100 tumor-only samples from the same entity.

I analyzed them all with Mutect2, using a PON that I built from the normal samples. The 150 tumor/normal samples look fine in the analysis, but the 100 tumor-only samples produce a lot of rubbish alongside the unknown "real" variants.

Now my problem: since the GATK workshop I attended in Cambridge last year, an idea has been swirling around my head, and I don't know whether it is great or completely dumb. Can I use the results from the 150 matched tumor/normal samples as a truth set and train my filters on the 150 tumor samples of this set, treated as tumor-only? I could then apply the trained filters to the 100 tumor-only samples instead of blunt filtering by AF etc.
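
To make the idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `Call` type, the feature names `AF` and `DP`, the toy values, and the use of a single-threshold rule (a "decision stump") as the trained filter. It is not a GATK feature, and a real attempt would read the features from the VCFs and likely use a richer model. The sketch labels tumor-only calls from the 150 tumors by whether they also appear in the matched tumor/normal results, picks the feature and threshold that best separate the two groups, and applies that rule to calls from a new tumor-only sample.

```python
from typing import NamedTuple


class Call(NamedTuple):
    """One variant call with numeric features (hypothetical structure)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    features: dict  # e.g. {"AF": 0.31, "DP": 120}


def key(call):
    """Identify a variant by site and alleles."""
    return (call.chrom, call.pos, call.ref, call.alt)


def label_calls(tumor_only_calls, truth_keys):
    """Mark each tumor-only call as true if the matched analysis also found it."""
    return [(c, key(c) in truth_keys) for c in tumor_only_calls]


def f1_score(predicted, actual):
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def fit_stump(labeled, feature_names):
    """Search every feature, threshold, and direction; keep the best F1.

    Returns (f1, feature, threshold, keep_below).
    """
    actual = [is_true for _, is_true in labeled]
    best = None
    for name in feature_names:
        values = [c.features[name] for c, _ in labeled]
        for thr in sorted(set(values)):
            for keep_below in (True, False):
                predicted = [(v <= thr) if keep_below else (v >= thr) for v in values]
                score = f1_score(predicted, actual)
                if best is None or score > best[0]:
                    best = (score, name, thr, keep_below)
    return best


def apply_stump(calls, stump):
    """Keep only the calls that the trained rule accepts."""
    _, name, thr, keep_below = stump
    return [
        c for c in calls
        if (c.features[name] <= thr if keep_below else c.features[name] >= thr)
    ]


# Toy data (invented): variants found by the matched tumor/normal analysis.
truth_keys = {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C"), ("chr3", 300, "C", "T")}

# The same tumors called in tumor-only mode: true variants plus extra calls.
training_calls = [
    Call("chr1", 100, "A", "T", {"AF": 0.12, "DP": 80}),
    Call("chr1", 150, "G", "A", {"AF": 0.50, "DP": 95}),
    Call("chr2", 200, "G", "C", {"AF": 0.20, "DP": 110}),
    Call("chr2", 250, "T", "C", {"AF": 0.98, "DP": 70}),
    Call("chr3", 300, "C", "T", {"AF": 0.18, "DP": 90}),
    Call("chr3", 350, "A", "G", {"AF": 0.47, "DP": 100}),
]

stump = fit_stump(label_calls(training_calls, truth_keys), ["AF", "DP"])

# Calls from a sample that has no matched normal.
new_calls = [
    Call("chr5", 500, "G", "T", {"AF": 0.15, "DP": 85}),
    Call("chr5", 600, "C", "A", {"AF": 0.51, "DP": 92}),
]
kept = apply_stump(new_calls, stump)
```

The labels are only as good as the matched calls: a real variant that the tumor/normal analysis missed would be counted as a false positive here, so the trained rule inherits those misses.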

I hope somebody can give me some advice, or has already tried this and can share their experience. Otherwise, I will try it and report back ;)

Thanks in advance,



  • Sheila (Broad Institute), Member, Broadie admin

    Hi EADG,

    I don't think the team has done anything like this. How do you plan to "train" the filtering? Do you mean use the same filters from the matched set on the unmatched set? How did you set the filters for the matched set?


  • EADG (Kiel), Member ✭✭✭

    Hi @Sheila,

    Thanks for your response. I drew a little picture to show what I am trying to do (I think my English is not good enough to explain it properly ;).

    I don't "filter" my raw variants from the matched analysis; I only remove the variants that don't pass Mutect2's internal filter, and that alone leaves me with just a few variants (capture/panel).

    Greets EADG

  • Sheila (Broad Institute), Member, Broadie admin

    Hi EADG,

    Thanks for the picture. I am still not sure what you mean by "train/adjust". Am I right in thinking you are planning to use the variants from the tumor/normal run to filter the unmatched run output? What do you mean by "only kick the variants out which don't pass an internal filter of MuTect2"? In GATK4 (which I assume you are using), Mutect2 does not do any upfront filtering.

    In any case, I don't think our team has any experience doing this. It may be best to test it out and let us know how things go :smile:

