Notice:
If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!

Test-drive the GATK tools and Best Practices pipelines on Terra


Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.

Missing variant using RNA-seq best practices

RMuletRMulet BarcelonaMember

Hi,

I am doing variant calling on RNA-seq data processed according to the RNA-seq best practices guide published on this site. In general I am satisfied with the results, but I am missing one variant in a sample that should clearly be there -- not only do I see it on IGV but it's also been detected by targeted DNA sequencing.

When I use either HaplotypeCaller or Mutect2 in a BAM file prior to processing the variant of interest (see picture above, first track on IGV) is found without any issues. However, after STAR 2nd pass, Split'N'Trim and base recalibration (second track), both tools fail to do so; interestingly, other programs like bcftools mpileup + bcftools call do find this variant. I have also tried --disable-tool-default-read-filters and other permissive options in Mutect2, to no avail.

My personal feeling about this is that it has to do something with the clipping almost next to the variant. Indeed, running Mutect2 with various intermediate files reveals the variant stops being detected after the Split'N'Trim step, which introduces the hard clipping.

Is this the expected behaviour? It does not make much sense to me, obviously there will be clipping close to the end of the exon... How to deal with such cases?

Answers

  • bhanuGandhambhanuGandham Cambridge MAMember, Administrator, Broadie, Moderator admin
    edited June 14

    Hi @RMulet

    My guess is that most of the variants you don't see with Haplotypecaller is due to insufficient envidence for variants in that region. This could be due to low quality reads etc.
    Here is a document that explains how Haplotypecaller performs Local re-assembly and haplotype determination for variant calling. Also Haplotypecaller has parameters that can be tweaked to be more sensitive to calling variants which can be found in the tool documentation page (warning, this increases the chance of false positives). Another resource to see what Haplotypecaller is doing behind the scenes is this doc "(howto) Generate a "bamout file" showing how HaplotypeCaller has remapped sequence reads".

    Take a look at these docs it might give you an insight into what the problem here might be.

  • RMuletRMulet BarcelonaMember

    Thanks for the response. However, as I explained in my previous message, running Haplotype caller on the same BAM file prior to the Split'N'Trim step returns that variant. So unless that step messes up the quality of the RNA-seq reads somehow, I fail to see what's happening.

    In any case, I will generate a bamout file as you suggested, maybe that will give me some answers. As for the sensitivity of the variant calling, I already tried several options with little success.

Sign In or Register to comment.