[GATK 4.0.0.0] joint calling for Mutect2?

Hello,

I am interested in inferring clonal evolution using somatic variants called by Mutect2. One way to infer it is to track the VAF (variant allele fraction) of somatic variants across multiple time points and cluster them.
One challenge in using Mutect2 calls is that VAFs are hard to compute, especially for indels, because some variants are called by local assembly in only a subset of time points. Counting alleles at time points where the variant is not called is tricky. So I usually restrict the analysis to SNPs, which are easier to count. But some cohorts don't carry many somatic variants, and I believe joint calling would help there. Does that make sense?
Would you consider implementing joint calling for Mutect2, as HaplotypeCaller has?
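
For reference, the VAF being tracked is the fraction of reads that support the alternate allele. A minimal shell sketch, assuming the ref and alt read counts for a site are already in hand for each time point (the `vaf` helper and the counts are hypothetical, not Mutect2 output):

```shell
# vaf REF_COUNT ALT_COUNT: print alt / (ref + alt) to three decimals,
# or NA when no reads cover the site. Hypothetical helper for illustration.
vaf() {
    awk -v ref="$1" -v alt="$2" 'BEGIN {
        depth = ref + alt
        if (depth == 0) print "NA"
        else printf "%.3f\n", alt / depth
    }'
}

# Made-up counts for one variant at three time points.
vaf 90 10    # time point 1
vaf 60 40    # time point 2
vaf 0 0      # time point 3: no coverage
```

The hard part described above is obtaining comparable counts at time points where the variant was not called; this sketch only covers the arithmetic once the counts exist.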


(Image credit: https://github.com/chrisamiller/fishplot)

Answers

  • dayzcool, Member

    @Sheila and @shlee,
    Thank you so much for referring me to the informative article and discussion!
    I can't say I fully understand the technical difficulties, but I understand that it is nontrivial to implement and that joint somatic variant calling would have to differ from HaplotypeCaller's joint calling. I still think it might be worthwhile for Mutect2 to be able to call somatic variants jointly on multiple tumor samples from an individual. It would help track a person's somatic variants over time.

    By the way, the article mentioned that Mutect2 would run without a matched normal. I wonder whether Mutect2 now supports a tumor-only mode. I remember that no variant passed the filters in tumor-only mode in an older version. (On the other hand, I now think tumor-only calling with many false positives would be a privacy threat.)


    Image credit: Libbrecht lab

    Issue · GitHub, by shlee
    Issue Number: 4327
    State: closed
    Closed By: davidbenjamin
  • shlee, Cambridge, Member, Broadie, Moderator admin

    "I still think it might be worthwhile for Mutect2 to be able to call somatic variants jointly on multiple tumor samples from an individual."

    That's a great idea @dayzcool and I'll put in a feature request to that effect.

    Yes, Mutect2 has a tumor-only mode, which refers to analyzing a sample BAM (labeled with -tumor) without a matched normal BAM (labeled with -normal). The mode is available (i) to call on normal samples for panel-of-normals generation and (ii) for tumor analysis without a matched normal, in which case the expectation is that the germline resource will aid in filtering germline variants. The matched normal BAM is traditional, of course, in somatic calling. An alternative is to provide the matched normal's calls as a VCF using the germline resource argument. The privacy threat you mention is something we expect researchers to contend with. We just provide useful tool levers and general recommendations.
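
A tumor-only invocation along the lines described above might look like the sketch below. The file names and the sample name are hypothetical, and the flag spellings (`-tumor`, `--germline-resource`) are those of the GATK 4.0-era command line; check `gatk Mutect2 --help` for your version. The `run` wrapper only prints the command, so the sketch can be tried without GATK installed.

```shell
# Dry-run wrapper: prints the command instead of executing it.
# To actually run GATK, change the body to: "$@"
run() { echo "$@"; }

# Hypothetical inputs; replace with your own paths and sample name.
ref=ref.fasta
tumor_bam=tumor.bam
tumor_sample=tumor1               # SM tag in the tumor BAM header
germline=af-only-gnomad.vcf.gz    # germline resource VCF (assumed file name)

# Tumor-only mode: a -tumor sample and no -normal argument.
run gatk Mutect2 \
    -R "$ref" \
    -I "$tumor_bam" \
    -tumor "$tumor_sample" \
    --germline-resource "$germline" \
    -O tumor_only.vcf
```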

  • dayzcool, Member

    @shlee, thanks again for your kind help!!

  • dayzcool, Member

    Awesome - thank you for the update, @shlee!
    It is a great time for me to be able to use it, as I will work on clonal evolution.

    Happy Lunar New Year!!

  • shlee, Cambridge, Member, Broadie, Moderator admin

    Happy New Year to you too @dayzcool!

  • davidben, Boston, Member, Broadie, Dev ✭✭✭

    Thanks @shlee for getting the word out about this new feature and thanks @dayzcool for trying it out!

  • dayzcool, Member

    Thanks again, @shlee and @davidben!

    At first glance, I feel I need a user guide for the multi-sample calling feature. Some things aren't obvious to me. For instance, assuming that the multiple samples are specimens from one individual, how do multiple normal samples work? How does filtering change? And do you plan to implement it in the reference WDL pipeline? I would love to be able to run multi-sample calling in the mutect2.wdl pipeline. It's very possible I have missed docs or discussions, though.

  • davidben, Boston, Member, Broadie, Dev ✭✭✭

    @dayzcool Multiple normal samples are pooled together as, effectively, a single normal. In the vcf they are separate, but what I mean is that as far as somatic calls are concerned they are just aggregated.

    We made a reasonable effort to refactor the filtering algorithm to extend to multiple samples, but it is somewhat heuristic and is probably the most "beta" aspect of the new feature. Essentially it's a vote among tumor samples, weighted by alt read counts, as to whether an allele is somatic or not.
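
To make the "vote weighted by alt read counts" idea concrete, here is a toy sketch. It is not the FilterMutectCalls implementation: the input format, the per-sample 0/1 verdicts, and the 0.5 cutoff are all assumptions made for illustration.

```shell
# Toy weighted vote, NOT the actual GATK algorithm.
# Input lines (assumed format): sample_name alt_read_count verdict
# where verdict is 1 if that sample alone looks somatic, 0 otherwise.
weighted_vote() {
    awk '{
            total += $2                  # every sample adds its alt reads to the total weight
            if ($3 == 1) yes += $2       # "somatic" votes count with the same weight
         }
         END {
            if (total == 0) print "no_call"
            else if (yes / total > 0.5) print "somatic"
            else print "not_somatic"
         }'
}

# tumor1 (12 alt reads) votes somatic, tumor2 (3 alt reads) votes against:
# weighted fraction = 12 / 15 = 0.8, so the allele is kept.
printf 'tumor1 12 1\ntumor2 3 0\n' | weighted_vote
```

The point of weighting is that a sample with strong alt-read support outvotes one with only a few alt reads, which a simple majority vote would not capture.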

    We plan to make a multi-sample WDL, but it is not written yet.

    The command lines for calling and filtering are (leaving out the PoN and gnomAD arguments for brevity):

    gatk Mutect2 -R $ref \
        -I tumor1.bam -I tumor2.bam \
        -I normal1.bam -I normal2.bam \
        -normal normal1 -normal normal2 \
        -O calls.vcf
    
    gatk FilterMutectCalls -V calls.vcf -O filtered.vcf
    

    where normal1 is the sample name in the header of normal1.bam.
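
If you are unsure what the sample name is, it is the SM field of the @RG lines in the BAM header. A small sketch for pulling it out, assuming `samtools` is available for the real call; the `sample_names` helper is hypothetical, and the header text below is made up so the snippet runs on its own:

```shell
# Print the unique SM: values found in @RG header lines read from stdin.
sample_names() {
    grep '^@RG' | tr '\t' '\n' | sed -n 's/^SM://p' | sort -u
}

# Real usage (assumes samtools is installed):
#   samtools view -H normal1.bam | sample_names

# Self-contained demo on a made-up header.
header=$(printf '@HD\tVN:1.6\tSO:coordinate\n@RG\tID:rg1\tSM:normal1\tPL:ILLUMINA\n')
names=$(printf '%s\n' "$header" | sample_names)
echo "$names"
```

The value printed is what goes after -normal (or -tumor), not the BAM file name.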
