Heads up:
We’re moving the GATK website, docs and forum to a new platform. Read the full story and breakdown of key changes on this blog.
Notice:
If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!

Test-drive the GATK tools and Best Practices pipelines on Terra


Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.

Mutect2 does not call this variant

Hello GATK team,

I'm trying to do some benchmarking with Mutect2 and a synthetic dataset. Even if the results are very good in terms of precision (~99%), recall or sensitivity can be better (~80%). In order to enhance sensitivity, i try to investigate why some variants are not called and in particular this variant :

chr1:201030579:A>T

I guess, it's because this variant is too near from the end of the reads.
So i tried to add "--pir_median_threshold 0 --pir_mad_threshold 0" but the result is the same.

Do you have any idea about how to call this kind of variant ?

this synthetic dataset is used to simulate amplicon reads.

java/jre1.8.0_76/bin/java -XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=10 -Xmx16G -jar GenomeAnalysisTK-3.7/GenomeAnalysisTK.jar -T MuTect2 -R hg19.fa -I:synthetic_tumor.sort.region1.bam -I:synthetic_normal.sort.region1.bam --cosmic CosmicCodingMutsV75.hg19.sort.vcf -L test.bed -o mutect2.test.vcf -bamout bamout.bam -log mutect2.log

you will find enclosed :
the IGV snapshot
the tumor bam file in a zip file
the normal bam file in a zip file
the bed file in a zip file

Thank you for your help

Answers

  • EADGEADG KielMember ✭✭✭

    Hi @arnaud,

    the support team can help you at best when you can provide a screenshot from your bamout of this region with "color by sample" is turned on(@shlee :smiley: ) . Mostly you find the reason why your variant isnt called there. Otherwise you can take a look at this FAQ https://software.broadinstitute.org/gatk/documentation/article?id=1235

    Greetings EADG

  • arnaudarnaud Member

    Thank you for your reply

    Here another IGV snapshot with "color by sample" turn on

    According to the FAQ

    1. Generate the bamout and compare it to the input bam : Done
    2. Check the base qualities of the non-reference bases : base qualities are greater than 30
    3. Check the mapping qualities of the reads that support the non-reference allele(s) : mapping qualities are greater than 30
    4. Check how many alternate alleles are present :Done

    When we look at the bamout file we can see that the reads supporting the alternate allele are not present for example this reads

    NS500754:56:HCVLTAFXX:2:21206:14752:20124

    When i track it, i get

    Found NS500754:56:HCVLTAFXX:2:21206:14752:20124 1/2 49b aligned read. - Present in original active region
    Found NS500754:56:HCVLTAFXX:2:21206:14752:20124 2/2 49b aligned read. - Present in original active region
    Found NS500754:56:HCVLTAFXX:2:21206:14752:20124 1/2 49b aligned read. - Present in assembly active region
    Found NS500754:56:HCVLTAFXX:2:21206:14752:20124 2/2 49b aligned read. - Present in assembly active region
    Found NS500754:56:HCVLTAFXX:2:21206:14752:20124 1/2 49b aligned read. - Present in trimming result
    Found NS500754:56:HCVLTAFXX:2:21206:14752:20124 2/2 49b aligned read. - Present in trimming result

    I don't know why Mutect has filtered out these reads

    you will find enclosed :
    the IGV snapshot
    the tumor bam file in a zip file
    the normal bam file in a zip file
    the bed file in a zip file
    and the log file

  • shleeshlee CambridgeMember, Broadie ✭✭✭✭✭

    Hi @arnaud,

    Try reducing the --minDanglingBranchLength option from its default of 4 to three. You variant is within 4 bases of the right-end of the graph and so MuTect2 does not attempt recovery of what it considers a dangling branch.

    Branches in the graph must connect to the reference graph at the start and end. Those that do not are called dangling branches. In recovery, MuTect2/HaplotypeCaller will try adding additional sequence upstream or downstream of the variant allele to pad and close the graph, i.e. reconnect dangling heads. Again, the default minimum length of a dangling branch to attempt recovery is 4. Your synthetic amplicon data appears not to offer additional sequence for padding around the variant. And so you are left with changing the --minDanglingBranchLength option if you want to recover the variant at the end of the alignment block.

    Thanks @EADG for the FAQ link.

  • arnaudarnaud Member

    Thank you
    i did not mention it, but i already tried to change minDanglingBranchLength/minPruning parameters, but without succes.

  • SheilaSheila Broad InstituteMember, Broadie admin

    @arnaud
    Hi,

    If you look at the bamout file, the variant from the tumor original BAM file disappears. This means that after reassembly, the realigned reads do not support the variant. That is why the variant is not called in the final VCF.

    -Sheila

Sign In or Register to comment.