Question about GATK4 SplitNCigarReads tool

Hi, I used the GATK SplitNCigarReads tool to process RNA-Seq data, which is said to reduce the false positive rate. The processed data was then used for SNP calling (with the variant calling tools in GATK). However, after annotating the SNP calls with a GTF file, only 20%~30% of the SNP sites are located in exonic regions. I am puzzled by this: ideally, most of the SNP sites should be located in exonic regions. Could you please help me solve this puzzle?


  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator admin

    Hi @hubs

    This forum deals with questions specific to GATK tools. Your question is more about your dataset, so it would be better suited to www.biostar.org or www.seqanswers.com

  • hubs Member
    I do not think it is caused by my dataset. It is just an RNA MeRIP-Seq INPUT sample, which can be regarded as RNA-Seq data. I simply used the RNA-Seq SNP calling method described in 'Calling variants in RNAseq' on this forum. However, the result is not satisfactory.
  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator admin


    You see this percentage because there are probably reads outside of the exons. You could filter out calls outside of the exons if you don't expect them, or not call over those regions in the first place.
    SplitNCigarReads doesn't move the alignments, it just splits them, so if your aligner put a bunch of reads in non-exonic regions, HaplotypeCaller (or Mutect) will call variants in those regions.
    I hope this helps.
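
The "filter things outside of the exon" option can be sketched in plain Python. This is only an illustrative sketch, not a GATK feature: the helper names `load_exons`, `is_exonic`, and `split_vcf` are hypothetical, and it assumes a tab-separated GTF with `exon` feature rows, an uncompressed VCF, and matching contig names in both files (for example `chr1` in both, not `chr1` versus `1`).

```python
import bisect
from collections import defaultdict


def load_exons(gtf_lines):
    """Collect merged, sorted exon intervals per contig from GTF lines."""
    raw = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5 or fields[2] != "exon":
            continue
        # GTF coordinates are 1-based and inclusive on both ends.
        raw[fields[0]].append((int(fields[3]), int(fields[4])))

    merged = {}
    for chrom, intervals in raw.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= out[-1][1] + 1:  # overlapping or adjacent exon
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = ([s for s, _ in out], [e for _, e in out])
    return merged


def is_exonic(exons, chrom, pos):
    """Return True if the 1-based position lies inside a merged exon."""
    if chrom not in exons:
        return False
    starts, ends = exons[chrom]
    # Index of the last interval whose start is <= pos.
    i = bisect.bisect_right(starts, pos) - 1
    return i >= 0 and pos <= ends[i]


def split_vcf(vcf_lines, exons):
    """Split VCF lines into (header + exonic records, non-exonic records)."""
    kept, dropped = [], []
    for line in vcf_lines:
        if line.startswith("#"):
            kept.append(line)  # keep header lines with the exonic records
            continue
        chrom, pos = line.split("\t", 2)[:2]
        (kept if is_exonic(exons, chrom, int(pos)) else dropped).append(line)
    return kept, dropped
```

Comparing the number of records in `kept` and `dropped` gives the exonic fraction the question asks about. For the other option, not calling over those regions at all, GATK's `-L` interval argument can restrict calling to a list of exon intervals.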
