Getting insertion counts

brannona (Member)


For matched tumor and normal pairs, we can easily get insertion and deletion counts from the output of SomaticIndelDetector in GATK. However, when we run multiple samples from the same patient, a call is sometimes made in one sample but not in another, so we might not have the numbers for every sample for every indel event. We can get deletion counts from DepthOfCoverage in GATK, but retrieving insertion counts is trickier.

Do you have a suggestion for how to solve this problem in an automated (i.e., non-IGV) fashion?

Additionally, since DepthOfCoverage is being retired, what do you recommend we use for getting SNP and deletion counts?

Thank you

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Hi there,

    DepthOfCoverage is actually getting a reprieve: we won't retire it until DiagnoseTargets can completely take over the DoC functionality.

    Unfortunately we don't have experience with cancer / somatic mutations, so we can't really advise you on this topic. Perhaps someone in the user community can give you some pointers.

    Geraldine Van der Auwera, PhD

  • brannona (Member)

    I'm glad to hear that DoC will remain active for a while.
    My other question does not require any knowledge of cancer or somatic mutations, so I apologize for not being clearer. Reworded: is there a GATK tool that I can use to get counts of specific indels? (Something like BaseCounts or DoC, but for indels.)
    Thank you.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Do you mean counting in how many of the patient's samples a specific indel occurs? If so, I don't think we have a tool specifically for that, but you could call indels on the interval where the indel occurs and then use the variant manipulation tools to get the counts. Does that make sense?
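To make that counting step concrete, here is a minimal Python sketch that counts how many samples carry the ALT allele in a single multi-sample VCF record. It assumes an uncompressed, tab-separated VCF data line with a GT key in the FORMAT column; `count_carriers` is a hypothetical helper, not a GATK tool, and it can only see samples in which the indel was actually genotyped.

```python
# Hypothetical helper (not part of GATK): count carriers of one variant
# in a multi-sample VCF data line.
def count_carriers(vcf_line: str) -> int:
    """Return how many samples carry at least one ALT allele in their GT field."""
    fields = vcf_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")     # column 9 is FORMAT, e.g. "GT:AD"
    gt_index = format_keys.index("GT")     # assumes a GT key is present
    carriers = 0
    for sample in fields[9:]:              # columns 10+ hold per-sample values
        values = sample.split(":")
        if gt_index >= len(values):
            continue                       # trailing sample fields may be dropped
        alleles = values[gt_index].replace("|", "/").split("/")
        # "0" is the REF allele and "." is a missing call; anything else is ALT
        if any(a not in ("0", ".") for a in alleles):
            carriers += 1
    return carriers


record = "\t".join([
    "chr1", "1000", ".", "A", "AT", "50", "PASS", ".", "GT:AD",
    "0/1:10,3", "0/0:12,0", "./.:0,0", "1/1:0,8",
])
print(count_carriers(record))  # → 2
```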


  • brannona (Member)

    Hi Geraldine,
    I mean counting in how many of the reads in a BAM file (or sample) a specific indel occurs. The issue is that while the indel may be present in a sample, it may fall below the threshold at which UnifiedGenotyper would make a call. For example, if only 2 out of 634 reads carry the indel, UnifiedGenotyper would likely not call it, but we still need to retrieve that data.
    Thank you.
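A minimal Python sketch of this kind of read-level count, assuming each read is reduced to its 0-based alignment start and CIGAR string (with pysam these would be `AlignedSegment.reference_start` and `AlignedSegment.cigarstring`). The helper names and the position convention (an insertion is keyed to the reference base on its left, a deletion to its first deleted base) are assumptions for illustration, and no read filters (mapping quality, duplicates, etc.) are applied.

```python
import re

# Split a CIGAR string into (length, op) pairs, e.g. "10M2I10M".
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = {"M", "D", "N", "=", "X"}  # ops that advance along the reference


def indel_events(ref_start, cigar):
    """Yield (op, pos, length) for each insertion or deletion in one read.

    pos is 0-based: for "I" it is the reference base just left of the
    inserted bases; for "D" it is the first deleted reference base.
    """
    ref = ref_start
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op == "I":
            yield ("I", ref - 1, n)
        elif op == "D":
            yield ("D", ref, n)
        if op in REF_CONSUMING:
            ref += n


def ref_end(ref_start, cigar):
    """Exclusive 0-based end of the read's alignment on the reference."""
    return ref_start + sum(int(n) for n, op in CIGAR_RE.findall(cigar)
                           if op in REF_CONSUMING)


def count_support(reads, op, pos, length):
    """Return (reads carrying the indel, reads whose alignment covers pos)."""
    support = covering = 0
    for ref_start, cigar in reads:
        if ref_start <= pos < ref_end(ref_start, cigar):
            covering += 1
        if (op, pos, length) in indel_events(ref_start, cigar):
            support += 1
    return support, covering


reads = [(100, "10M2I10M"), (95, "15M2I5M"), (100, "20M"), (105, "4M3D10M")]
print(count_support(reads, "I", 109, 2))  # → (2, 4)
```

Because the count never goes through a caller, a 2-in-634 event is reported just like a high-frequency one. One caveat: in repetitive sequence, aligners may place the same indel at different positions in different reads, so matching on exact position is only reliable after indels have been left-aligned.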

  • brannona (Member)

    Thank you! I'll need to play around with the read filters a bit, but I think this will work.

  • alirezakj (Member)

    Hi Geraldine,

    In the DoC tool there is an option for counting bases (--printBaseCounts) and another for counting deletions (--includeDeletions), but there is nothing for counting insertions! I have some ultra-deep sequencing data and would like to count the bases, insertions and deletions at each position. Is it possible to do this in GATK, and if so, with which tool? In IGV, if you mouse over the coverage track at the top, it shows the counts of A, C, G, T, insertions and deletions for each base. I basically want to do the same thing IGV does, but for all regions, printed in the form of DoC output. Unfortunately, DoC does it all except the insertions!
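Outside GATK, the per-position table described here can be sketched in a few lines of Python, assuming each read is given as (0-based alignment start, CIGAR string, read sequence). Because inserted bases have no reference coordinate, this sketch credits each insertion event to the reference base immediately to its left. That is one common convention and an assumption here, not something DoC does. `pileup_counts` is a hypothetical helper, and it applies no base- or mapping-quality filters.

```python
import re
from collections import Counter, defaultdict

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def pileup_counts(reads):
    """Per-position counts of A/C/G/T, deletions ("DEL") and insertions ("INS").

    reads: iterable of (ref_start, cigar, seq) with 0-based ref_start.
    Assumption: each insertion is credited to the reference base on its left.
    """
    counts = defaultdict(Counter)
    for ref_start, cigar, seq in reads:
        ref, q = ref_start, 0                 # reference and read cursors
        for length, op in CIGAR_RE.findall(cigar):
            n = int(length)
            if op in ("M", "=", "X"):         # aligned bases: count the read base
                for i in range(n):
                    counts[ref + i][seq[q + i]] += 1
                ref += n
                q += n
            elif op == "I":                   # inserted bases: no reference position,
                counts[ref - 1]["INS"] += 1   # so record one event at the left base
                q += n
            elif op == "D":                   # deleted bases: reference only
                for i in range(n):
                    counts[ref + i]["DEL"] += 1
                ref += n
            elif op == "N":                   # skipped region (e.g. an intron)
                ref += n
            elif op == "S":                   # soft clip: read bases not aligned
                q += n
            # H and P consume neither the read sequence nor the reference
    return counts


reads = [(0, "3M1I2M", "ACGTTA"), (0, "2M1D3M", "ACGTA")]
for pos, c in sorted(pileup_counts(reads).items()):
    print(pos, dict(c))
```

Counting one "INS" per read rather than per inserted base keeps the column comparable to read depth; a per-base tally would simply add `n` instead of 1.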


  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Hi @alirezakj,

    Sorry for the late reply, I thought I had responded to this already.

    Basically, there is no way to do this for insertions: DoC counts everything relative to positions on the reference, and by definition inserted bases do not occupy any reference position.
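The point can be seen by mapping each read base to its reference coordinate from the CIGAR string. In this minimal Python sketch (a hypothetical helper, similar in spirit to pysam's `AlignedSegment.get_aligned_pairs()`), inserted and soft-clipped bases come out paired with `None`, so a per-reference-position counter such as DoC has no slot in which to tally them.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_pairs(ref_start, cigar):
    """List (read_offset, ref_pos) pairs; None marks the side with no position."""
    pairs = []
    ref, q = ref_start, 0
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in ("M", "=", "X"):      # base present on both read and reference
            pairs.extend((q + i, ref + i) for i in range(n))
            q += n
            ref += n
        elif op in ("I", "S"):         # read-only bases: no reference coordinate
            pairs.extend((q + i, None) for i in range(n))
            q += n
        elif op in ("D", "N"):         # reference-only positions: no read base
            pairs.extend((None, ref + i) for i in range(n))
            ref += n
        # H and P consume neither and produce no pairs
    return pairs


print(aligned_pairs(100, "3M2I2M"))
# → [(0, 100), (1, 101), (2, 102), (3, None), (4, None), (5, 103), (6, 104)]
```

Deletions, by contrast, appear as `(None, ref_pos)`: they have a reference position but no read base, which is why DoC can count them with --includeDeletions.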


