MuTect2 AD does not match AF

I am seeing mismatches between the allelic depths (AD) and the allele fractions (AF) reported for some of our variants. For example:

Field             Value
Chr               chrX
Start             152482917
End               152482917
Ref               T
Alt               C
POS               152482917
REF               T
ALT               C
QUAL              986.47
NORMAL.AD         79,0
NORMAL.AF         0.02
TUMOR.AD          436,443
FREQ              0.503
TUMOR.AD.TOTAL    879
NORMAL.AD.TOTAL   79

Any idea what is going on? In this case, the Normal sample AF is reported as "0.02", or 2%, but it also reports that the Normal sample has 0 alt reads. How can you have a variant frequency when there are no reads? I have seen this occurring as well for the tumor samples; how could you have a variant at all when there are no reported alternate reads for the variant?
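
For reference, the depth-based fraction implied by the AD fields can be checked with a short Python sketch. The helper name `naive_af` is hypothetical (it is not a GATK function), and it assumes AD is written as "ref,alt" for a single alternate allele:

```python
def naive_af(ad):
    """Alt fraction computed directly from an AD string such as '436,443' (ref,alt)."""
    ref, alt = (int(x) for x in ad.split(","))
    total = ref + alt
    # Guard against a site with no reads at all.
    return alt / total if total else float("nan")

tumor_af = naive_af("436,443")   # alt reads / total reads in the tumor
normal_af = naive_af("79,0")     # no alt reads in the normal
print(tumor_af, normal_af)
```

By this count the normal sample has a depth-based fraction of exactly 0, which is what makes the reported NORMAL.AF of 0.02 look inconsistent.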

Answers

  • steve1 Member

    Thanks David. We do not use recent releases. We are locked into GATK 3.8. Are there any guidelines on how we can deal with this?

  • davidben Boston Member, Broadie, Dev ✭✭✭

    Oh boy, 3.8. None of us were on the M2 team then, and M2 back then wasn't even beta yet. I would use a more recent release if at all possible. GATK 3.8 will probably cost you about three times as many false positives with worse sensitivity, too. The need for reproducibility may override that, of course, but it's worth knowing how big the difference is.

    Also, it may be more feasible than you would think to re-run all of your samples through GATK 4.1. M2 is about ten times as fast as in GATK 3 and the cost for the whole pipeline of a typical WGS pair is a dollar or two (in Google cloud fees) on Firecloud.

  • ashbig philadelphia Member

    hi @davidben,

    Could you point me to a resource that describes this behavior in depth? I just came across a situation where a known cancer driving mutation (in a reference standard) was called at the correct AF by the probabilistic method, despite AD depths being off by a solid margin. Any additional details beyond your comment are helpful.

  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator admin
    edited February 25

    Hi @ashbig

    We no longer support GATK3, so it would be better to use the latest GATK4.1. Also, most of the issues with GATK3 have been resolved in GATK4. Please get back to us if you come across this issue with GATK4.1.

  • davidben Boston Member, Broadie, Dev ✭✭✭

    @ashbig First, I wouldn't necessarily give too much credit to our method. There's a lot of statistical noise after all! But anyway, the probabilistic method is the Somatic Likelihoods Model section of our notes, here: https://github.com/broadinstitute/gatk/blob/f159ce555523ff368fa80bac535818508ad36be2/docs/mutect/mutect.pdf (I'm pointing you to a pull request that heavily edits the notes rather than the current master branch). The AF estimate is the mean of the mean-field Dirichlet posterior q(f). I regret that there's nothing more accessible to point you to, but conceptually the probabilistic model is just the vanilla computation involving ADs but where the ADs account for the uncertainty of the reads.
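
To make the idea of "ADs that account for the uncertainty of the reads" concrete, here is a toy Python sketch. It is an illustration under stated assumptions, not Mutect2's actual implementation: the function name `soft_af`, the per-read probabilities, and the flat Dirichlet pseudocounts of 1.0 are all made up for the example, and it is not expected to reproduce the 0.02 reported above. Each read contributes its probability of supporting the alt allele as a fractional count, and the AF is the posterior mean of the resulting Dirichlet (here a two-allele Beta).

```python
def soft_af(p_alt, prior_ref=1.0, prior_alt=1.0):
    """Posterior-mean alt fraction from per-read alt probabilities.

    p_alt: for each read, a hypothetical probability that it supports the alt allele.
    prior_ref, prior_alt: Dirichlet pseudocounts (assumed values, not GATK defaults).
    """
    soft_alt = sum(p_alt)              # fractional alt count
    soft_ref = len(p_alt) - soft_alt   # fractional ref count
    return (soft_alt + prior_alt) / (soft_ref + soft_alt + prior_ref + prior_alt)

# 79 reads that all look like ref, each with a small residual chance of being alt.
reads = [0.005] * 79

# Hard assignment: a read counts as alt only if alt is its more likely allele.
hard_alt = sum(p > 0.5 for p in reads)

af = soft_af(reads)
print(hard_alt, af)
```

With hard assignment every read is counted as ref, so the alt depth is 0, while the soft counts plus the prior give a small but nonzero fraction. That is the general way a nonzero AF can sit next to an AD with no alt reads.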

  • ashbig philadelphia Member
    edited March 8

    Sorry I did not see responses sooner!

    @bhanuGandham I actually am using GATK 4.1. In this case the AF calculated from depth was about 0.9%, and the probabilistic method called it at about 2%. It is an incredibly low-level variant, so I wasn't expecting much, but there was a difference. I'm happy to provide more details if you're curious.

    @davidben I understand, and I am thankful for what you did point me to! While I don't entirely understand the Dirichlet method, I do get that computing the probability of each read coming from a given allele changes the AF. I'll be analyzing how this performs against reference standards in the near future; I can update you on the results if you like.
