MuTect2 strand bias + TLOD clarification


I have a set of tumour samples and I would like to call variants using MuTect2 without matching normals, annotate using VEP and filter out known germline variants afterwards.

I ran tumor-only mode with downsampling turned off. A number of artefacts are being called, and I found at least one variant that looks real but was not called. I can think of two ways to improve the calling, hence my questions :)

1- Strand bias: How can I find information about strand bias? I am looking for details like those in the call.stats output of MuTect (i.e. LOD scores of the forward and reverse strands), but I have not been able to modify my code to include that information. I think some artefacts may be due to strand bias.

2- TLOD: This is where I got confused. Could you explain how MuTect2 calculates TLOD in the absence of a matched normal? I use the LOD scores to identify real calls. Most real variants have a very large TLOD compared with the other calls in the same sample. But in my set of samples, there was one variant that seems to be true and had a small TLOD. I started to think that MuTect2 needs something to act as a normal to generate a correct TLOD, but I am not sure.

This is what I ran:

Using hg38

gatk Mutect2 \
-R hg38 \
-I test.bam \
-L interval_list \
-O test.vcf \
-tumor test.bam \
--contamination-fraction-to-filter 0.0 \
--max-reads-per-alignment-start 0

Any comments would be highly appreciated.

Thank you


  • xiucz (Member)


    You can look at this thread about Strand bias.

    I am also interested in your questions and waiting for any more comments.

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)


    1) Have a look at this thread.

    2) I think the Mutect2 presentation here will help. Also, this doc may help as well.


  • davidben, Boston (Member, Broadie, Dev)

    @anabbi TLOD is calculated independently of the normal. The TLOD of an allele is the log likelihood ratio of two models: (i) the allele exists in the tumor sample; and (ii) the allele does not exist and any reads supporting it are sequencing errors. This calculation only uses tumor reads. That is, TLOD is a judgment about whether the variant exists in the tumor sample, but says nothing about whether it is germline or somatic. This latter judgment uses the NLOD, among other things.
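
To make the likelihood-ratio idea above concrete, here is a toy sketch in Python. It is not Mutect2's implementation: the function name `toy_tlod`, the fixed `allele_fraction`, and the single uniform `error_rate` are all simplifying assumptions made for illustration, and the real model is more involved. It only shows why a score built from tumor reads alone grows with the number and fraction of supporting reads.

```python
import math


def toy_tlod(ref_count, alt_count, allele_fraction, error_rate=0.001):
    """Toy log10 likelihood ratio: 'allele present' vs 'alt reads are errors'.

    Hypothetical simplification: every read has the same error rate and the
    allele fraction is fixed. This is NOT how Mutect2 computes TLOD.
    """
    # Model (i): the allele exists. A read shows the alt base if it comes from
    # the alt allele and is read correctly, or from ref and is miscalled.
    p_alt_present = (allele_fraction * (1 - error_rate)
                     + (1 - allele_fraction) * error_rate)
    # Model (ii): the allele does not exist, so alt bases arise only by error.
    p_alt_absent = error_rate

    def log10_likelihood(p_alt):
        # The binomial coefficient is the same in both models and cancels.
        return (alt_count * math.log10(p_alt)
                + ref_count * math.log10(1 - p_alt))

    return log10_likelihood(p_alt_present) - log10_likelihood(p_alt_absent)


# Only tumor read counts go in; no normal sample is involved.
clonal = toy_tlod(ref_count=50, alt_count=50, allele_fraction=0.5)
subclonal = toy_tlod(ref_count=97, alt_count=3, allele_fraction=0.03)
print(f"high allele fraction: {clonal:.1f}")
print(f"low allele fraction:  {subclonal:.1f}")
```

Under these assumptions, a variant supported by 3 of 100 reads scores far lower than one supported by 50 of 100, even though both score above zero. That is consistent with a real but low-allele-fraction variant having a small TLOD without any normal being involved.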

