BQSR for RNA-seq


I am performing BQSR on RNA-seq data for the purpose of SNP calling, and I have two questions:

  1. My organism does not have a known set of SNPs. I asked about this on the forum before and, following that advice, I am using a filtered set of SNPs as the knownSites input for BQSR. After filtering, some SNPs carry the tag 'SNPcluster'. Should I use this file as is, or should I filter it further so that only the PASS SNPs are retained?

  2. I performed SNP calling on only a subset of my BAM file because I was interested in certain chromosomes. Would it be fine to use this subset of SNPs to recalibrate only the corresponding subset of the BAM file rather than the entire BAM? I ask in case this could bias the final results.

Thanks in advance for your help!
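For the first question above, if keeping only the PASS records turns out to be the right choice, the filtering step itself is simple. The following is a minimal sketch, not a GATK tool: the function name `keep_pass_records` and the example records are hypothetical, and it assumes a plain-text, tab-delimited VCF in which column 7 is FILTER.

```python
def keep_pass_records(vcf_lines):
    """Yield VCF header lines and only those records whose FILTER is PASS."""
    for line in vcf_lines:
        # Header and column-name lines start with '#'; always keep them.
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # FILTER is the 7th column (index 6). Records tagged with a filter
        # name such as SNPcluster, or left unfiltered ('.'), are dropped.
        if len(fields) > 6 and fields[6] == "PASS":
            yield line


# Hypothetical example records, for illustration only.
example_vcf = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\n",
    "chr1\t105\t.\tC\tT\t40\tSNPcluster\t.\n",
    "chr1\t300\t.\tG\tA\t35\t.\t.\n",
]

kept = list(keep_pass_records(example_vcf))
for line in kept:
    print(line, end="")
```

The output of such a filter could then be written to a new VCF and supplied as the known-sites file. Whether '.' (no filters applied) should also be kept is a judgment call that this sketch does not settle.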


  • swong (Phoenix, AZ; Member)

    Hello, I am also performing BQSR on RNA-seq data for SNP calling. In the BQSR tutorial, two passes are performed to analyze covariation, generating a recal_data.table and a post_recal_data.table. Which one should I use when applying the recalibration to my RNA-seq data? I assume I should use the post_recal_data.table. Am I correct? Thank you so much!

  • swong (Phoenix, AZ; Member)

    Actually, let me be more specific. Is the post_recal_data.table used only for generating the before-and-after plot? It is not mentioned again in the BQSR tutorial. Thanks!

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)
    edited June 2015

    @swong That's right, the post recal table is only for plotting/QC purposes. The first recal table is what you should use to do the recalibration.
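To make the roles of the two tables concrete, here is a minimal sketch that only builds (and does not run) GATK 3-style command lines for the workflow discussed above. The function name `bqsr_commands`, the jar path, and all file names other than recal_data.table and post_recal_data.table are placeholder assumptions; check the flags against the GATK version you actually use.

```python
def bqsr_commands(reference, bam, known_sites, intervals=None,
                  gatk_jar="GenomeAnalysisTK.jar"):
    """Build GATK 3-style BQSR command lines as lists of arguments."""
    base = ["java", "-jar", gatk_jar]
    common = ["-R", reference]
    if intervals:
        # Optionally restrict every step to the same intervals (-L).
        common += ["-L", intervals]

    return {
        # Pass 1: model the covariates; this table drives the recalibration.
        "first_pass": base + ["-T", "BaseRecalibrator"] + common + [
            "-I", bam, "-knownSites", known_sites,
            "-o", "recal_data.table"],
        # Pass 2: re-model after on-the-fly recalibration; QC only.
        "second_pass": base + ["-T", "BaseRecalibrator"] + common + [
            "-I", bam, "-knownSites", known_sites,
            "-BQSR", "recal_data.table",
            "-o", "post_recal_data.table"],
        # Before/after plots are the only consumer of the second table.
        "plots": base + ["-T", "AnalyzeCovariates"] + common + [
            "-before", "recal_data.table",
            "-after", "post_recal_data.table",
            "-plots", "recalibration_plots.pdf"],
        # Apply the recalibration using the FIRST table.
        "apply": base + ["-T", "PrintReads"] + common + [
            "-I", bam, "-BQSR", "recal_data.table",
            "-o", "recal_reads.bam"],
    }


# Placeholder inputs, for illustration only.
cmds = bqsr_commands("reference.fa", "rnaseq_reads.bam", "filtered_snps.vcf")
for step, argv in cmds.items():
    print(step + ": " + " ".join(argv))
```

Note that post_recal_data.table appears only in the second pass and the plotting step, while the step that writes the recalibrated BAM reads recal_data.table.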

