MuTect2 high insertion counts?

Fourie (Member)
edited May 2016 in Ask the GATK team

Hi Folks

I am using MuTect2 to analyze blood vs. FFPE tumor samples (breast cancer).

I am getting what I think are unusually high insertion:SNV ratios: between 2:1 and 3:1, i.e. two to three times as many insertions as SNVs.
The deletion:SNV ratio is between 0.1:1 and 0.25:1.

I was wondering if anyone else had experienced something similar or had any advice / comments?

Best regards,

Fourie
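
For anyone who wants to compare their own call sets against the ratios above, here is a minimal sketch of one way to tally insertions, deletions and SNVs from VCF text. It is only an illustration: the function names (`classify`, `count_variant_types`, `ratios`) and the `demo` records are made up for this example, and it classifies alleles purely by REF/ALT length, which may not match how the original poster counted.

```python
from collections import Counter


def classify(ref, alt):
    """Classify one REF/ALT pair by allele length (simple alleles only)."""
    if alt.startswith("<") or "[" in alt or "]" in alt or alt in ("*", "."):
        return "other"  # symbolic, breakend, spanning-deletion or missing alleles
    if len(ref) == 1 and len(alt) == 1:
        return "snv"
    if len(alt) > len(ref):
        return "insertion"
    if len(alt) < len(ref):
        return "deletion"
    return "other"  # e.g. multi-nucleotide substitutions


def count_variant_types(vcf_lines, passing_only=True):
    """Count variant types over an iterable of VCF text lines."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alts, filt = fields[3], fields[4], fields[6]
        if passing_only and filt not in ("PASS", "."):
            continue
        for alt in alts.split(","):
            counts[classify(ref, alt)] += 1
    return counts


def ratios(counts):
    """Return (insertion:SNV, deletion:SNV), or None if there are no SNVs."""
    snv = counts["snv"]
    if snv == 0:
        return None
    return counts["insertion"] / snv, counts["deletion"] / snv


# Made-up records, only to show the call pattern.
demo = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tG\t.\tPASS\t.",
    "1\t200\t.\tA\tAT\t.\tPASS\t.",
    "1\t300\t.\tC\tCGG\t.\tPASS\t.",
    "1\t400\t.\tTAC\tT\t.\tPASS\t.",
    "1\t500\t.\tG\tGA\t.\tsome_filter\t.",  # dropped when passing_only=True
]
demo_counts = count_variant_types(demo)
demo_ratios = ratios(demo_counts)
```

On a real file you would pass an open file handle instead of `demo`. Whether to count only PASS records is a judgment call; comparing the ratios with and without filtered calls may itself be informative.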

Issue · GitHub, by Sheila: issue number 885, state closed, closed by vdauwera.

Comments

  • Sheila (Broad Institute; Member, Broadie admin)

    @Fourie
    Hi Fourie,

    I think it will be easiest to diagnose what is going on if you can share some test data with us. Instructions are here.

    Thanks,
    Sheila

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Hi Fourie, nice to see you around!

    To elaborate a bit on Sheila's response: because this is a dataset-wide observation (as opposed to a discrete problem of the kind my team is better equipped to handle), we informally consulted a couple of friendly analysts from the cancer group. This didn't ring any bells as something they had seen previously, and they indicated that it's not something they would expect to see systematically, so it's very possible that this is an artifact pattern (possibly due to FFPE). But ultimately it's really hard to say anything with confidence without seeing the data, hence Sheila's request for some test data. However, there's not really anything we could do for you even if you could share the whole dataset, because this would have to go to an experienced analyst from the cancer group for formal review -- and at this time I'm afraid we don't have the resources to arrange anything like that. I'm hopeful, of course, that others in the community who have been using MuTect2 might chime in here.

  • kmhernan (Chicago, IL; Member)

    @Geraldine_VdAuwera I'm finding particularly high numbers of insertions, and particularly long ones, for only a subset of cancers from the TCGA. It seems to be related to soft-clipped reads. Still investigating.

  • kmhernan (Chicago, IL; Member)

    @Fourie could you look at some of those insertions in IGV with soft-clipped bases shown, and report what you see?

  • kmhernan (Chicago, IL; Member)

    OK, some updates on my end, @Sheila and @Geraldine_VdAuwera: it seems like this is mostly happening in WGA data from TCGA. I have analyzed all of the TCGA data, and the pattern of high-frequency, large insertions is mostly attributable to the whole-genome-amplified (WGA) samples. These seem to have lots of soft-clipped reads, and I believe they are leading to false/artifactual indels. Some manuscripts describe chimeras forming when neighboring amplicons randomly connect on the same chromosome (see: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4527218/ and http://bmcbiotechnol.biomedcentral.com/articles/10.1186/1472-6750-7-19). This is probably even more prevalent in the older libraries using this technique (REPLI-G from Qiagen are the problematic ones in TCGA). I'm not sure if that gives us enough precedent to use the flag to ignore soft-clipped reads. Do y'all have any suggestions? I can't figure out any way to filter out the indels after calling, because no information about soft-clipped reads is present in the VCF and many counting tools (e.g., bam-readcount, sambamba depth) ignore soft-clipped reads. Let me know your thoughts.
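
Since the VCF carries no soft-clip information, one workaround is to go back to the alignments at each called site. The following is a rough sketch, assuming you have SAM-format text lines for the region (for example from `samtools view`). It reads soft clips straight from the CIGAR string and reports them per read end, using the reverse-strand FLAG bit to decide which end is 3'. All function names and the `demo` reads are hypothetical and exist only for this illustration.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = frozenset("MDN=X")  # CIGAR operations that advance along the reference


def parse_cigar(cigar):
    """Turn a CIGAR string into a list of (length, operation) tuples."""
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def soft_clips(flag, cigar):
    """Return (five_prime_clip, three_prime_clip) lengths in read orientation."""
    ops = [(n, op) for n, op in parse_cigar(cigar) if op != "H"]
    if not ops:
        return 0, 0
    left = ops[0][0] if ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    if flag & 0x10:  # reverse strand: the leftmost aligned end is the read's 3' end
        return right, left
    return left, right


def summarize_site(sam_lines, chrom, pos):
    """Soft-clip summary for mapped reads whose aligned span covers chrom:pos (1-based)."""
    stats = {"reads": 0, "soft_clipped_reads": 0,
             "three_prime_clipped_reads": 0, "soft_clipped_bases": 0}
    for line in sam_lines:
        if line.startswith("@"):
            continue
        f = line.rstrip("\n").split("\t")
        flag, rname, start, cigar = int(f[1]), f[2], int(f[3]), f[5]
        if flag & 0x4 or rname != chrom or cigar == "*":
            continue
        ref_len = sum(n for n, op in parse_cigar(cigar) if op in REF_CONSUMING)
        if not start <= pos <= start + ref_len - 1:
            continue
        five, three = soft_clips(flag, cigar)
        stats["reads"] += 1
        stats["soft_clipped_reads"] += (five + three) > 0
        stats["three_prime_clipped_reads"] += three > 0
        stats["soft_clipped_bases"] += five + three
    return stats


def sam(qname, flag, pos, cigar):
    """Build a minimal made-up SAM record on chr1 (sequence and qualities omitted)."""
    return "\t".join([qname, str(flag), "chr1", str(pos), "60", cigar,
                      "*", "0", "0", "*", "*"])


demo = [sam("r1", 0, 100, "50M"), sam("r2", 0, 120, "30M20S"),
        sam("r3", 16, 130, "10S40M"), sam("r4", 0, 300, "50M")]
demo_stats = summarize_site(demo, "chr1", 140)
```

Note the limitation: a read is counted only if its aligned portion covers the site, so reads whose clipped tail overhangs the position without an aligned base there are missed. Widening the window around each indel would be one way to catch those.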

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Hmm, interesting. It's a difficult call to make because if you ignore the soft-clipped sequence, you'll lose the ability to call legitimate indels on the larger end of the size spectrum. There is one soft-clip-related case that can be addressed up front -- if the reads are clipped on both ends, there's a read filter that you can enable to ignore those -- but I assume those are probably not a majority of what you're seeing.

    Are these neighboring amplicons too close together to be able to identify chimeric read pairs based on abnormal insert size?
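
One way to explore the insert-size question is to look at the TLEN column of the SAM records and flag pairs that are far from the bulk of the distribution. The sketch below is only an assumption about how one might do that: it uses a median plus a multiple of the median absolute deviation (MAD) as the cutoff, and the function name, the `n_mads` parameter and the `demo` records are all invented for this example. If the chimeric junctions fall inside a single read, the pair may still have a normal insert size and would not be caught this way.

```python
from statistics import median


def insert_size_outliers(sam_lines, n_mads=5.0):
    """Return names of pairs with a mate on another contig or an unusually large |TLEN|."""
    records = []
    for line in sam_lines:
        if line.startswith("@"):
            continue
        f = line.rstrip("\n").split("\t")
        qname, flag, rname, rnext, tlen = f[0], int(f[1]), f[2], f[6], int(f[8])
        # Keep paired reads with both mates mapped, primary alignments only,
        # and only the first mate so that each pair is counted once.
        if not flag & 0x1 or flag & (0x4 | 0x8 | 0x100 | 0x800) or not flag & 0x40:
            continue
        same_contig = rnext == "=" or rnext == rname
        records.append((qname, same_contig, abs(tlen)))

    sizes = [t for _, same, t in records if same and t > 0]
    if not sizes:
        return []
    med = median(sizes)
    mad = median([abs(t - med) for t in sizes])
    cutoff = med + n_mads * max(mad, 1.0)  # floor the MAD so the cutoff is never degenerate
    return [q for q, same, t in records if not same or t > cutoff]


def sam(qname, flag, rnext, tlen):
    """Build a minimal made-up SAM record on chr1 (only the fields used above matter)."""
    return "\t".join([qname, str(flag), "chr1", "1000", "60", "100M",
                      rnext, "1200", str(tlen), "*", "*"])


demo = [
    sam("r1", 99, "=", 300), sam("r1", 147, "=", -300),  # second mate is skipped
    sam("r2", 99, "=", 310), sam("r3", 99, "=", 295),
    sam("r4", 99, "=", 305), sam("r5", 99, "=", 290),
    sam("r6", 99, "=", 5000),   # far larger than the rest
    sam("r7", 99, "chr2", 0),   # mate mapped to a different contig
]
demo_outliers = insert_size_outliers(demo)
```

The MAD-based cutoff is just one reasonable choice; a fixed threshold taken from the library's expected fragment size would work equally well for a first look.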

  • kmhernan (Chicago, IL; Member)

    Thanks @Geraldine_VdAuwera. I am not sure about the insert size comment; I will have to look into it more deeply. However, I am certainly still seeing some TCGA samples (not WGA-amplified) that have upwards of 80% insertions, and some with strangely large insertions as called by MuTect2. I may test that read filter, but most of these reads seem to have a soft-clipped chunk at the 3' end. Is there any way to get statistics about the number of supporting soft-clipped bases? That might help with filtering.

    Issue · GitHub, by Sheila: issue number 1274, state closed, closed by chandrans.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    I think the ClippingRankSum annotation might do the trick -- it should tell you whether the reads supporting the variant show a disproportionate amount of soft-clips compared to reads supporting the reference.
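
If your VCF does carry a `ClippingRankSum` value in the INFO column, a post-calling filter could be sketched roughly as below. This is an illustration under assumptions, not a recommended setting: the `max_abs` threshold of 3.0 is arbitrary, the function names and `demo` records are made up, and it is worth checking the annotation's documentation for your GATK version to confirm exactly which clipped bases it counts and whether your caller emits it.

```python
def info_value(info, key):
    """Return the raw string value of KEY=value in a VCF INFO field, or None."""
    for item in info.split(";"):
        name, sep, value = item.partition("=")
        if name == key and sep:
            return value
    return None


def flag_by_clipping_rank_sum(vcf_lines, max_abs=3.0):
    """List records whose |ClippingRankSum| exceeds max_abs (hypothetical threshold)."""
    flagged = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        raw = info_value(f[7], "ClippingRankSum")
        if raw is None:
            continue  # annotation absent for this record
        try:
            score = float(raw)
        except ValueError:
            continue  # non-numeric value such as "."
        if abs(score) > max_abs:
            flagged.append((f[0], int(f[1]), f[3], f[4], score))
    return flagged


# Made-up records, only to show the call pattern.
demo = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tAT\t.\tPASS\tClippingRankSum=0.5",
    "1\t200\t.\tA\tATTT\t.\tPASS\tClippingRankSum=-4.2;DP=30",
    "1\t300\t.\tC\tCG\t.\tPASS\tDP=10",
]
demo_flagged = flag_by_clipping_rank_sum(demo)
```

Plotting the distribution of the scores for insertions versus SNVs before picking any cutoff would probably be more informative than a fixed threshold.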

  • kmhernan (Chicago, IL; Member)

    @Geraldine_VdAuwera and @Fourie, we published a report about this on the GDC site: https://gdc.cancer.gov/about-gdc/scientific-reports/mutect2-insertion-artifacts. Recently a user informed us that they did a pretty extensive investigation of it across many cancers and have a paper under review, with a bioRxiv preprint here: http://biorxiv.org/content/early/2016/12/08/092163
