BQSR with MuTect2: use it or not?

alaabadre, France, Member
edited June 2017 in Ask the GATK team


I've been reading threads on the forum about BQSR with MuTect2. I know it is part of the proposed Best Practices. However, the comments were mixed, and I can't find a clear conclusion on whether to use BQSR with MuTect2, since MuTect2 already takes base quality scores into account and those scores are exactly what BQSR adjusts. I am working on 18 human samples, matched normal and tumor, all exome-sequenced. I am using MuTect2 from the GATK 3.7 stable version. I generated results using the proposed pipeline here. I used the following inputs:

  • Mills_and_1000G_gold_standard.indels.hg19.sites.vcf
  • dbsnp_138.hg19.vcf
  • hg19_ref_genome.fa

Following this thread here for example, I am worried that potential true variants could be altered due to recalibration.

I also have another doubt about the BQSR thread: I just want to make sure that BQSR does NOT change the base call of the variant itself, and that it only assigns a lower base quality score when a base gets recalibrated.

I have analyzed the commands run by The Cancer Genome Atlas, and they actually use BQSR in their workflow. So finally, I would like to know whether it is safe to use BQSR with MuTect2. Is it better to supply multiple dbSNP files as known sites, so that potential variants are not counted as mismatches (for example, I have downloaded from NCBI all known human SNPs, a VCF file of about 57 GB)?

Thank you in advance!


Best Answers


  • alaabadre, France, Member

    @Geraldine_VdAuwera said:
    We do recommend running BQSR for cancer samples, yes. The BaseRecalibrator only re-evaluates base quality scores, and does not ever change the base calls themselves.

    If you're worried about high mutation rates you can include Cosmic as a known sites resource.

    @Geraldine_VdAuwera thank you for your reply. I have one more question: do you mean including the Cosmic file as a known sites resource during both the first and second steps of the BQSR recalibration? Thank you!

  • Geraldine_VdAuwera, Cambridge, MA, Member, Administrator, Broadie admin
    Accepted Answer

    Yes that's what I meant -- add them in addition to dbsnp. You're welcome!

  • cyriac, Member

    Hello. Unlike germline variants, somatic variants are sporadic across the genome and rarely recur at the same position. dbSNP is a database of positions where we are most likely to find germline events, and hence those positions are ignored by BQSR. But Cosmic is not a database of "positions where we are most likely to find somatic events". Recurrently mutated somatic "hotspots" in cancer are important, but they are the exception: most somatic mutations are spread out randomly across the genome, and BQSR would treat them as artifacts and re-evaluate their base quality scores. I do not think BQSR should be in the best practices for a somatic variant calling pipeline. Let us know otherwise.

    Maybe MuTect2 is smart enough to use the uncalibrated BQ scores that BQSR preserves in the BAM file, but most somatic variant callers will use the recalibrated scores and suffer from (hopefully minor) loss in sensitivity.
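
To make the recalibration steps discussed in this thread concrete, here is a minimal sketch (not an official pipeline) that only builds the GATK 3.x command lines as Python lists, with several known-sites files passed to BaseRecalibrator. The helper names, the jar path, `tumor.bam`, and the Cosmic file name `cosmic.hg19.vcf` are hypothetical placeholders; the other resource names are the ones listed in the question. The optional `--emit_original_quals` switch is, as far as I know, the GATK 3.x engine argument that keeps the original qualities in the `OQ` tag of the output BAM, so please check it against the documentation for your version.

```python
from typing import List, Sequence

GATK_JAR = "GenomeAnalysisTK.jar"  # assumed location of the GATK 3.x jar


def base_recalibrator_cmd(reference: str, bam: str,
                          known_sites: Sequence[str],
                          out_table: str) -> List[str]:
    """Build a GATK 3.x BaseRecalibrator command (first recalibration pass)."""
    if not known_sites:
        raise ValueError("BaseRecalibrator needs at least one known-sites file")
    cmd = ["java", "-jar", GATK_JAR, "-T", "BaseRecalibrator",
           "-R", reference, "-I", bam]
    # Each known-sites resource gets its own -knownSites argument.
    for vcf in known_sites:
        cmd += ["-knownSites", vcf]
    cmd += ["-o", out_table]
    return cmd


def print_reads_cmd(reference: str, bam: str, recal_table: str,
                    out_bam: str,
                    keep_original_quals: bool = False) -> List[str]:
    """Build a GATK 3.x PrintReads command that applies the recalibration."""
    cmd = ["java", "-jar", GATK_JAR, "-T", "PrintReads",
           "-R", reference, "-I", bam,
           "-BQSR", recal_table, "-o", out_bam]
    if keep_original_quals:
        # Assumption: this flag writes the pre-BQSR qualities to the OQ tag.
        cmd.append("--emit_original_quals")
    return cmd


known = [
    "dbsnp_138.hg19.vcf",
    "Mills_and_1000G_gold_standard.indels.hg19.sites.vcf",
    "cosmic.hg19.vcf",  # hypothetical file name for the Cosmic resource
]
recal = base_recalibrator_cmd("hg19_ref_genome.fa", "tumor.bam",
                              known, "recal.table")
apply_step = print_reads_cmd("hg19_ref_genome.fa", "tumor.bam",
                             "recal.table", "tumor.recal.bam",
                             keep_original_quals=True)
print(" ".join(recal))
print(" ".join(apply_step))
```

The lists can be handed to `subprocess.run(cmd, check=True)` once the paths are real. Only the quality scores are rewritten by the second command; the base calls in the BAM stay the same, which matches the answer quoted above.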
