MuTect2: complex variant expression

Lyon0804, Hangzhou, China (Member)

Hi, I am using GATK 3.7 MuTect2 to call somatic variants.
Here is the command:

java -jar GenomeAnalysisTK.jar -T MuTect2 -R hg19.fasta  -I:tumor tumor.bam -I:normal normal.bam --dbsnp dbsnp_138.hg19.vcf --cosmic commic_v79.vcf -o result.vcf

and I get a complex variant that is decomposed into two records:

chr17   37880996        .       T       TTTA
chr17   37880997        .       G       T

The alignment looks like this:

37880961  37880971  37880981  37880991     37881001  37881011  37881021         
GCGTACCCTTGTCCCCAGGAAGCATACGTGATGGCTG***GTGTGGGCTCCCCATATGTCTCCCGCCTTCTGGGCATCTG
.....................................   ........................................
.....................................***........................................
.....................................***........................................
.,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,***,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
.,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,***,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,..................................TTAT........................................
,,...................................***........................................
.....................................***........................................
... .................................***........................................
... ................................TTAT........................................
,,, .................................***........................................
,,, .................................***........................................
,,,..................................***........................................
,,,..................................***........................................
,,,,................................TTAT........................................
.... ................................***........................................
.... ................................***........................................
,,,, ...............................TTAT........................................
.....................................***........................................
...... ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,***,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
....... .............................***........................................
........ ...........................TTAT........................................

I think the two records should be merged into a single complex variant:

chr17   37880996        .       G       TTAT

Can MuTect2 do anything about this situation?

Answers

  • Sheila, Broad Institute (Member, Broadie, Moderator admin)

    @Lyon0804
    Hi,

    Can you please post IGV screenshots of the original BAM file and bamout file here?

    Thanks,
    Sheila

  • Lyon0804, Hangzhou, China (Member)

    mapping_view

    @Sheila here are the screenshots.
    I think it should be considered a deletion-insertion variant according to HGVS.

  • Sheila, Broad Institute (Member, Broadie, Moderator admin)

    @Lyon0804
    Hi,

    I am not sure what you mean by deletion-insertion? It looks like the output is correct with the insertion and SNP. However, the site looks like it is in a messy region. Is that the bamout you posted? Can you post zoomed out screenshots of the bamout (include ~300 bases before and after the site of interest)?

    Thanks,
    Sheila

  • Lyon0804, Hangzhou, China (Member)

    @Sheila
    Hi,
    I mean, the output should be "more correct" like

    chr17   37880996        .       G       TTAT
    

    instead of the current output like

    chr17   37880996        .       T       TTTA
    chr17   37880997        .       G       T
    

    "Deletion-insertion" is a variant type defined by HGVS.
    I think the first output is closer to the real variant, because the two changes occur in the same reads. The second output could be read as two variants that occur in different reads.

  • Sheila, Broad Institute (Member, Broadie, Moderator admin)

    @Lyon0804
    Hi,

    That would not be correct. The reference allele at position 37880996 is T not G. I think the confusing part is that there are two insertions in the BAM file, but MuTect2 is only calling one of them. The insertion you are referring to after the G SNP is not called in the VCF. Only the insertion before the G is called.

    I hope that helps.

    -Sheila

  • Lyon0804, Hangzhou, China (Member)

    @Sheila

    I'm sorry, I made a mistake. The first variant should be:

    Expression1:

    chr17   37880997       .       G       TTAT
    

    Sorry for my carelessness.

    The other is:

    Expression2:

    chr17   37880996        .       T       TTTA
    chr17   37880997        .       G       T
    

    All of the inserted and deleted bases are called; the only issue is how they are expressed.
    There is no such thing as "before G" or "after G": both expressions describe the same sequence change.

    // T at chr:37880996, G at chr17:37880997
    *Expression1* shows:
            TG
            TTTAT
    
    *Expression2* shows:
            T***G
            TTTAT
    

    But Expression2 may cause confusion. It looks as if some cells carry a SNP and other cells carry an insertion. In fact, the SNP and the insertion are detected in the same reads (the same cell). That is what I mean by "more correct".

    Sorry again for my carelessness, but I hope MuTect2 can do something about this.
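
The equivalence of the two expressions can be checked with a small sketch. It assumes only the two reference bases given above (T at chr17:37880996, G at chr17:37880997); the helper `apply_variants` is a hypothetical illustration, not part of GATK.

```python
def apply_variants(ref, ref_start, variants):
    """Apply non-overlapping (pos, ref_allele, alt_allele) records to a reference window.

    ref_start is the 1-based coordinate of ref[0]; pos is 1-based, as in a VCF.
    """
    out = []
    cursor = ref_start  # next reference coordinate not yet copied
    for pos, ref_allele, alt_allele in sorted(variants):
        offset = pos - ref_start
        if ref[offset:offset + len(ref_allele)] != ref_allele:
            raise ValueError(f"REF allele does not match the reference at {pos}")
        out.append(ref[cursor - ref_start:offset])  # untouched bases before the record
        out.append(alt_allele)
        cursor = pos + len(ref_allele)
    out.append(ref[cursor - ref_start:])  # untouched bases after the last record
    return "".join(out)


REF_START = 37880996
REF = "TG"  # T at 37880996, G at 37880997

expression1 = [(37880997, "G", "TTAT")]
expression2 = [(37880996, "T", "TTTA"), (37880997, "G", "T")]

hap1 = apply_variants(REF, REF_START, expression1)
hap2 = apply_variants(REF, REF_START, expression2)
print(hap1, hap2, hap1 == hap2)
```

Both expressions rebuild the same alternate sequence, so they differ only in representation, provided the two records of Expression2 are read as lying on the same haplotype.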

    Issue · Github, by Sheila. Issue Number: 2054. State: closed. Closed By: sooheelee.

  • Sheila, Broad Institute (Member, Broadie, Moderator admin)
    edited May 2017

    @Lyon0804
    Hi,

    I am having someone else on my team check. We will get back to you soon.

    -Sheila

  • shlee, Cambridge (Member, Broadie) ✭✭✭✭✭
    edited May 2017

    Hi @Lyon0804,

    I will help you as much as I can. My knowledge is tentative. From the existence of tools LeftAlignAndTrimVariants and LeftAlignIndels, you can see we make an effort to standardize how variants, particularly indels that can have ambiguous placement, are represented in a VCF.

    Given an indel can be placed equally correctly at various locations against the genome, the preference is to left-most align indels when possible. This is why your SNV gets shifted to the right.
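
As a rough illustration of what left alignment means, here is a minimal sketch of the usual trim-and-shift procedure on a made-up repeat sequence. The function `left_align` and the toy reference are hypothetical and are not the LeftAlignAndTrimVariants implementation.

```python
def left_align(ref_seq, pos, ref, alt):
    """Left-align and trim one biallelic record (assumes ref != alt).

    ref_seq is the reference as a string, pos is the 1-based VCF position.
    """
    changed = True
    while changed:
        changed = False
        # Drop a shared trailing base.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, extend both alleles one base to the left.
        if not ref or not alt:
            if pos == 1:
                raise ValueError("cannot extend past the start of the reference")
            pos -= 1
            base = ref_seq[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True
    # Drop shared leading bases, keeping at least one base in each allele.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# Toy reference with a CA repeat: 1 G, 2 C, 3 A, 4 C, 5 A, 6 C, 7 A, 8 T
toy_reference = "GCACACAT"
# A CA deletion written at the right end of the repeat...
shifted = left_align(toy_reference, 5, "ACA", "A")
print(shifted)  # ...moves to the leftmost equivalent position
```

Deleting any one CA unit of the repeat gives the same sequence, so the record can be written at several positions; the sketch picks the leftmost one.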

    As for the separation of the SNV and indel, I'm going to guess this is for historic reasons. There are downstream tools in somatic analyses that only take in SNVs and cannot use indel data (because passenger mutation rates will be different for SNVs versus indels). From what you've shown, it appears that GATK3 MuTect2 automatically separates SNVs and indels when possible, and I think this is to enable these types of downstream analyses.

    Your MuTect2 output should keep the phasing information to tie these two variants together in the same haplotype. Please let us know if this is not the case.
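
One way to check this is sketched below, assuming the VCF carries HaplotypeCaller-style physical phasing tags (PID and PGT) in the FORMAT column. The two records are made-up illustrations, not output from this thread, and the helper names are hypothetical.

```python
def phase_tags(vcf_line, sample_index=0):
    """Return (PID, PGT) for one sample of a tab-separated VCF record."""
    fields = vcf_line.rstrip("\n").split("\t")
    keys = fields[8].split(":")                    # FORMAT column
    values = fields[9 + sample_index].split(":")   # sample column
    sample = dict(zip(keys, values))
    return sample.get("PID"), sample.get("PGT")


def same_haplotype(line_a, line_b):
    """True if both records share a phase set (PID) and the same phased genotype (PGT)."""
    pid_a, pgt_a = phase_tags(line_a)
    pid_b, pgt_b = phase_tags(line_b)
    if pid_a in (None, ".") or pid_b in (None, "."):
        return False  # at least one record is unphased
    return pid_a == pid_b and pgt_a == pgt_b


# Hypothetical records: the tag values are invented for illustration.
insertion = "chr17\t37880996\t.\tT\tTTTA\t.\tPASS\t.\tGT:PGT:PID\t0/1:0|1:37880996_T_TTTA"
snv = "chr17\t37880997\t.\tG\tT\t.\tPASS\t.\tGT:PGT:PID\t0/1:0|1:37880996_T_TTTA"
unphased = "chr17\t37880997\t.\tG\tT\t.\tPASS\t.\tGT\t0/1"

print(same_haplotype(insertion, snv))
print(same_haplotype(insertion, unphased))
```

If the two records share a PID and have the same PGT, they can be treated as one event on the same haplotype even though they are written as separate lines.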

    We are gearing up to release GATK4 this summer. I believe the GATK4 MuTect2 has some improvements and I will ask if condensing variants in their variant representation is a feature.

  • shlee, Cambridge (Member, Broadie) ✭✭✭✭✭

    P.S. @Lyon0804 -- Your mapping result that you represent with dots, commas, asterisks and spaces--I'm interested in learning what this convention is called and what each element represents. The asterisks are space-filling gaps, this much I gather. Are the dots for forward read matches and the commas for reverse read matches? The spaces are gaps that demarcate separate reads?

  • shlee, Cambridge (Member, Broadie) ✭✭✭✭✭

    Just checked with our developer and @Lyon0804, you can see if the latest GATK4 MuTect2 represents your variant(s) differently. You can get the jar at https://github.com/broadinstitute/gatk-protected/releases/tag/1.0.0.0-alpha1.2.6.

  • Lyon0804, Hangzhou, China (Member)
    edited May 2017

    @shlee said:
    P.S. @Lyon0804 -- Your mapping result that you represent with dots, commas, asterisks and spaces--I'm interested in learning what this convention is called and what each element represents. The asterisks are space-filling gaps, this much I gather. Are the dots for forward read matches and the commas for reverse read matches? The spaces are gaps that demarcate separate reads?

    Yes, you're right. It comes from samtools tview; you can learn more at http://samtools.sourceforge.net/tview.shtml
    Thank you very much @shlee. I will try GATK4 MuTect2 later, and I am looking forward to the release version!
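
For readers unfamiliar with the tview layout, here is a small sketch of how the rows can be read. It assumes the conventions confirmed above (asterisks pad the insertion columns, dots and commas are matches, spaces mean no coverage). The rows are short excerpts around the insertion, and the function `insertion_support` is hypothetical.

```python
def insertion_support(ref_row, read_rows):
    """Count reads that carry bases in the padded (insertion) columns versus reads that do not."""
    pad_cols = [i for i, ch in enumerate(ref_row) if ch == "*"]
    with_insertion = 0
    without_insertion = 0
    for row in read_rows:
        cells = [row[i] for i in pad_cols if i < len(row)]
        if not cells or all(c == " " for c in cells):
            continue  # the read does not cover the insertion columns
        if any(c.isalpha() for c in cells):
            with_insertion += 1   # bases shown where the reference has padding
        else:
            without_insertion += 1  # padding only: no insertion in this read
    return with_insertion, without_insertion


# Excerpt of the reference row and four read rows around the insertion.
ref_row = "CTG***GT"
read_rows = [
    "...***..",
    ",,,***,,",
    "..TTAT..",
    "..TTAT..",
]

counts = insertion_support(ref_row, read_rows)
print(counts)
```

In the excerpt, the reads showing TTAT carry both the G>T change and the inserted bases in the same read, which is the point made earlier in the thread.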
