MuTect2 and smalls mpileup reads info seem to be very different?

naveen90naveen90 BCMMember
edited January 17 in Ask the GATK team

Hi - I used MuTect2 to call variants in multiple samples from one patient. For sites where a mutation was not detected in every sample, I wanted read information for the samples without a call, so I ran samtools mpileup at those sites. I noticed that a variant in a germline sample was reported as 24:0 ref:alt (MuTect2) and 104:25 (samtools). With the first set of counts I'd call it a somatic mutation, while with the second it would be a germline mutation. Why do we see this difference? Is there a way to make MuTect2 output read info for every sample when it detects a mutation in one of the samples from a patient?

Answers

  • naveen90naveen90 BCMMember

    sorry, I meant samtools, not smalls

  • naveen90naveen90 BCMMember

    Additionally, the ref:alt read counts in the base-recalibrated BAM (input) and the MuTect2 output BAM are very different. Recalibrated input BAM:

    MuTect2 output BAM:

  • SheilaSheila Broad InstituteMember, Broadie, Moderator

    @naveen90
    Hi,

    I suspect the Mutect2 read counts are lower because of filtering and reassembly. Have a look at this document for more information on the local reassembly.

    Also, I think this document will be helpful.

    -Sheila

  • naveen90naveen90 BCMMember

    Hi Sheila, thank you for the links.
    The counts being different makes sense for the tumor samples. However, for a few variants, MuTect2 assigns a PASS filter where the normal counts are (as I mentioned before) 24:0 (ref:alt), which suggests that it's a somatic mutation. When I check the input recalibrated BAM in IGV, though, the counts are 105:25 (ref:alt), which suggests that it's a germline variant. Is the local reassembly done for the matched normal sample as well? Mutations like this one concern me the most because the MuTect2 and IGV results are discordant. Could this be an erroneous MuTect2 variant call?

    Naveen

  • SheilaSheila Broad InstituteMember, Broadie, Moderator

    @naveen90
    Hi Naveen,

    Yes, the local reassembly is done on both the normal and tumor BAMs. Can you post the VCF record in question and IGV screenshots of the tumor and normal bamouts?

    Thanks,
    Sheila

  • naveen90naveen90 BCMMember

    tumor sample from bamout:

    Is it possible to get a BAM for the normal sample as well? Even if I am able to visualize the normal bamout file, it's likely to show the absence of a germline mutation. However, the input normal BAM suggests otherwise. This is the screenshot of the input normal BAM file:

    Merged VCF record for the site (the normal sample, shown in bold in the original post, is the last sample column, 0/0:24,0):

    chr1 12921539 . T G . PASS AC=4;AF=0.400;AN=10;DP=250;NLOD=7.22;N_ART_LOD=-1.412e+00;POP_AF=1.000e-03;set=filterInvariant-variant2-filterInvariant3-filterInvariant5 GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1:51,4:0.071:30,4:21,0:32:361,492:34:16:0.00,0.071,0.073:0.304,2.546e-03,0.693 ./. 0/1:13,4:0.251:5,4:8,0:30:367,489:35:20:0.00,0.232,0.235:0.392,5.007e-03,0.603 0/1:35,4:0.103:21,4:14,0:34:360,494:33:18:0.051,0.101,0.103:0.064,6.070e-03,0.930 0/1:29,5:0.157:22,4:7,1:30:360,513:28:31:0.00,0.152,0.147:0.300,4.228e-03,0.696 0/0:24,0:0.039:12,0:12,0:0:302,0:0:0
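
For readers who want the per-sample ref:alt counts from a record like the one above without reading the genotype columns by eye, here is a minimal Python sketch. It is only an illustration: the function name `allelic_depths` is made up for this example, it assumes the AD field is listed in the FORMAT column, and it only handles a single alternate allele.

```python
# Sketch: pull per-sample ref/alt depths (AD) out of one VCF data line.
# Assumption: columns are separated by whitespace, as in the pasted record
# (real VCF files use tabs; str.split() handles both).

RECORD = (
    "chr1 12921539 . T G . PASS "
    "AC=4;AF=0.400;AN=10;DP=250;NLOD=7.22;N_ART_LOD=-1.412e+00;POP_AF=1.000e-03;"
    "set=filterInvariant-variant2-filterInvariant3-filterInvariant5 "
    "GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB "
    "0/1:51,4:0.071:30,4:21,0:32:361,492:34:16:0.00,0.071,0.073:0.304,2.546e-03,0.693 "
    "./. "
    "0/1:13,4:0.251:5,4:8,0:30:367,489:35:20:0.00,0.232,0.235:0.392,5.007e-03,0.603 "
    "0/1:35,4:0.103:21,4:14,0:34:360,494:33:18:0.051,0.101,0.103:0.064,6.070e-03,0.930 "
    "0/1:29,5:0.157:22,4:7,1:30:360,513:28:31:0.00,0.152,0.147:0.300,4.228e-03,0.696 "
    "0/0:24,0:0.039:12,0:12,0:0:302,0:0:0"
)


def allelic_depths(vcf_line):
    """Return one (ref_count, alt_count) tuple per sample, or None for a no-call."""
    cols = vcf_line.split()
    format_keys = cols[8].split(":")      # column 9 is FORMAT
    ad_index = format_keys.index("AD")
    depths = []
    for sample in cols[9:]:               # sample columns start at column 10
        fields = sample.split(":")
        # A no-call such as "./." carries no AD value at all.
        if ad_index >= len(fields) or fields[ad_index] in (".", ""):
            depths.append(None)
            continue
        ref, alt = fields[ad_index].split(",")[:2]
        depths.append((int(ref), int(alt)))
    return depths


if __name__ == "__main__":
    for depth in allelic_depths(RECORD):
        print(depth)
```

The last tuple printed corresponds to the last sample column, which is the 24:0 normal discussed in this thread.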

  • SheilaSheila Broad InstituteMember, Broadie, Moderator

    @naveen90
    Hi Naveen,

    "is it possible to get a bam for the normal sample as well?"

    Actually, the bamout file contains both the tumor and normal reassembled reads. You can right click on the reads in IGV and choose "Color alignments by" > "sample".
    Can you confirm that the normal sample does indeed show the variant in some reads? You are using the latest version of GATK4, right? If not, can you check whether you get the same results with GATK4?

    For combining variants, did you simply combine all VCFs from Mutect2 runs so the normal is chosen from a priority file?

    Thanks,
    Sheila

  • naveen90naveen90 BCMMember

    This is the bamout file for the mutation of interest. It looks like both samples (tumor and normal) have variant reads. However, as you can see above, the MuTect2 output for the normal is 24:0, which suggests that it contains only wild-type reads.

    For merging the VCFs, I used GATK's CombineVariants. All the samples belong to the same patient, so they have the same matched normal.

    java -jar /volumes/seq/code/3rd_party/GATKv3.5/GenomeAnalysisTK.jar -T CombineVariants -o out.vcf -R hg19_all.fa --genotypemergeoption UNSORTED -V input1.vcf -V input2.vcf etc

    I don't think combining variants should be the issue as every input file has 24:0 as the read info for the normal sample.

  • SheilaSheila Broad InstituteMember, Broadie, Moderator

    @naveen90
    Hi Naveen,

    I am assuming the green reads are from the normal sample, and they do not contain the SNP. That is why the VCF shows 0 for the alt allele. Notice that only the blue reads (I am assuming those are the tumor reads) contain the orange SNP. That is why it is called as a somatic mutation. I suspect the local reassembly is the reason for the difference between Mutect2 and Samtools. Does this make sense? You can always try giving Samtools the bamout file for that region and seeing whether the counts change.

    -Sheila
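
To compare counts from the input BAM and the bamout at a single site, one option is to tally the read-bases column of the `samtools mpileup` text output for each file. The Python sketch below is an illustration only, not part of any GATK or Samtools tooling: the function name `count_ref_alt` and the example string are made up, and it assumes the standard pileup encoding (`.` and `,` for reference matches, letters for mismatches, `^` plus one mapping-quality character at a read start, `$` at a read end, and `+N`/`-N` followed by N bases for indels). Counts also depend on the filtering options passed to mpileup.

```python
# Sketch: count reference-matching and alt-supporting bases in the
# read-bases column (5th column) of one samtools mpileup output line.

def count_ref_alt(bases, alt):
    """Return (ref_count, alt_count) for a pileup read-bases string."""
    alt = alt.upper()
    ref_count = 0
    alt_count = 0
    i = 0
    while i < len(bases):
        ch = bases[i]
        if ch == "^":
            # Read start marker: skip '^' and the mapping-quality character.
            i += 2
            continue
        if ch in "+-":
            # Indel: '+' or '-', then a length, then that many bases.
            i += 1
            digits = ""
            while i < len(bases) and bases[i].isdigit():
                digits += bases[i]
                i += 1
            i += int(digits) if digits else 0
            continue
        if ch in ".,":
            ref_count += 1          # match on forward or reverse strand
        elif ch.upper() == alt:
            alt_count += 1          # mismatch equal to the alt allele
        # Anything else ('$', '*', other mismatches) is ignored here.
        i += 1
    return ref_count, alt_count


if __name__ == "__main__":
    # Made-up pileup string for illustration, not data from this thread.
    print(count_ref_alt("..,G^]g.-2AT,+1Gg$", "G"))  # -> (5, 3)
```

Running this on the pileup line for the same position from the recalibrated input BAM and from the bamout gives two ref:alt pairs that can be compared directly with the AD values in the VCF.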

  • naveen90naveen90 BCMMember


    I should have sorted by sample. Yes, it's clear from this image that Mutect2 calls it a somatic mutation. I also see why there are differences between MuTect2 and samtools. However, the one thing I'm still confused about is how reassembly would make a site that looks clearly germline (prior to MuTect2) somatic (after MuTect2). The image below is the normal sample prior to MuTect2. The same sample can be seen after MuTect2 (fig above):

    I don't seem to face this issue in all my mutation calls; it's seen in a small subset of mutations.

  • naveen90naveen90 BCMMember

    One more question: in fig 1 above, which shows multiple samples, green is the normal sample and blue is the tumor. What are the reads in red?

  • SheilaSheila Broad InstituteMember, Broadie, Moderator

    @naveen90
    Hi Naveen,

    The reassembly step is documented here.

    The red reads are the artificial haplotypes that the tool constructs. The document should help with that as well :smile:

    -Sheila

    P.S. You may find the hands-on tutorial in the presentations section helpful as well.
