BQSR with MuTect2: use it or not?

alaabadre France Member
edited June 2017 in Ask the GATK team

Hello,

I've been reading some threads on the forum about BQSR with MuTect2. I know it is proposed in the Best Practices workflow. However, the comments were mixed, and I can't find a clear conclusion on whether to use BQSR with MuTect2, given that MuTect2 already takes base quality scores into account and those scores are exactly what BQSR adjusts. I am working on 18 human samples, matched normal and tumor, which have been exome-sequenced. I am using MuTect2 from the GATK 3.7 stable release. I generated results using the proposed pipeline here, with the following inputs:

  • Mills_and_1000G_gold_standard.indels.hg19.sites.vcf
  • dbsnp_138.hg19.vcf
  • hg19_ref_genome.fa

Following this thread here for example, I am worried that potential true variants could be altered due to recalibration.

I also have another doubt from the BQSR thread: I just want to make sure that BQSR does NOT change the base of the variant itself, but only assigns it a low base quality score if it gets recalibrated.

I have analyzed the commands run by The Cancer Genome Atlas, and they actually use BQSR in their workflow. So finally, I would like to know whether it is safe to use BQSR with MuTect2. Is it better to supply multiple dbSNP resources to avoid mismatches at potential variant sites (for example, I have downloaded all known human SNPs from NCBI, a ~57 GB VCF file)?

Thank you in advance!

Answers

  • alaabadre France Member

    @Geraldine_VdAuwera said:
    We do recommend running BQSR for cancer samples, yes. The BaseRecalibrator only re-evaluates base quality scores, and does not ever change the base calls themselves.

    If you're worried about high mutation rates you can include Cosmic as a known sites resource.

    @Geraldine_VdAuwera thank you for your reply. I have one more question: do you mean including the Cosmic file as a known sites resource during both the first and second steps of the BQSR recalibration? Thank you!

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin
    Accepted Answer

    Yes that's what I meant -- add them in addition to dbsnp. You're welcome!
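
For the GATK 3.7 setup described in the question, adding Cosmic in addition to dbSNP comes down to repeating the `-knownSites` argument on both BaseRecalibrator passes. The following is a minimal sketch in Python that only assembles the two command lines. The helper `base_recalibrator_cmd`, the jar path, and the file names `tumor.bam`, `cosmic.hg19.vcf`, `recal.table`, and `post_recal.table` are hypothetical placeholders, not taken from the thread.

```python
from typing import List, Optional


def base_recalibrator_cmd(
    bam: str,
    reference: str,
    known_sites: List[str],
    out_table: str,
    first_pass_table: Optional[str] = None,
    gatk_jar: str = "GenomeAnalysisTK.jar",  # hypothetical path to the GATK 3.x jar
) -> List[str]:
    """Build a GATK 3.x BaseRecalibrator command as an argument list."""
    cmd = ["java", "-jar", gatk_jar, "-T", "BaseRecalibrator",
           "-R", reference, "-I", bam]
    # Every known-sites resource gets its own -knownSites argument.
    for vcf in known_sites:
        cmd += ["-knownSites", vcf]
    # The second pass reads the first-pass table through -BQSR.
    if first_pass_table is not None:
        cmd += ["-BQSR", first_pass_table]
    cmd += ["-o", out_table]
    return cmd


known = [
    "dbsnp_138.hg19.vcf",
    "Mills_and_1000G_gold_standard.indels.hg19.sites.vcf",
    "cosmic.hg19.vcf",  # hypothetical file name for the Cosmic resource
]

first_pass = base_recalibrator_cmd(
    "tumor.bam", "hg19_ref_genome.fa", known, "recal.table")
second_pass = base_recalibrator_cmd(
    "tumor.bam", "hg19_ref_genome.fa", known, "post_recal.table",
    first_pass_table="recal.table")

print(" ".join(first_pass))
print(" ".join(second_pass))
```

The same list of known sites is passed to both passes, so the before and after recalibration tables are computed against the same masked positions.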

  • cyriac Member

    Hello. Unlike germline variants, somatic variants are sporadic across the genome, and rarely recur at the same position. dbSNP is a database of positions where we are most likely to find germline events, and those positions are hence ignored by BQSR. But Cosmic is not a database of "positions where we are most likely to find somatic events". Recurrently mutated somatic "hotspots" in cancer are important, but they are the exception. Most somatic mutations are spread out randomly across the genome, so BQSR would treat them as artifacts and re-evaluate their base quality scores. I do not think BQSR should be in the best practices for a somatic variant calling pipeline. Let us know otherwise.

    Maybe MuTect2 is smart enough to use the uncalibrated BQ scores that BQSR preserves in the BAM file, but most somatic variant callers will use the recalibrated scores and suffer a (hopefully minor) loss in sensitivity.

  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator admin
    edited January 24

    Hey @cyriac

    You made a very good point, so I reached out to our dev team, and this is what they had to say:

    The user has a good point -- COSMIC is not adequate to cover a significant proportion of somatic variation and therefore most somatic variants will be counted as errors in BQSR.
    But now let's consider whether this matters.
    A typical tumor will have a somatic mutation rate of about one in a million sites.
    Even if these are all clonal hets and the tumor is pure, we get one extra "error" per two million bases.
    Given that a typical base quality score is about 30 (one error in a thousand bases), this difference is negligible compared to the benefit of BQSR.
    It would only start to become relevant at very high base qualities of 50+, and perhaps Maddy and/or Mark F could tell you whether BQSR is omitted in their consensus calling pipeline which has very high base qualities like this.
    Now you could also imagine an extremely active tumor having a mutation rate of one in ten thousand, but first this is still only relevant if your base qualities are around 40, which is atypical, and second, there is no way these are all clonal, so the great majority of these somatic mutation sites only have a few non-ref bases.
    So, in conclusion BQSR is recommended for tumor samples except perhaps with abnormally high base qualities not found in standard Illumina sequencing.
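
The arithmetic in the reply above can be checked with a short calculation. This is a rough sketch, not a model of what BaseRecalibrator does internally: it assumes that unmasked somatic sites simply add `mutation_rate * allele_fraction` mismatches per base on top of the true error rate, with the clonal het case from the reply as allele fraction 0.5. The function names are hypothetical.

```python
import math


def phred_to_error(q: float) -> float:
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Convert an error probability to a Phred quality score."""
    return -10 * math.log10(p)


def apparent_quality(q: float, mutation_rate: float,
                     allele_fraction: float = 0.5) -> float:
    """Quality implied when true somatic variants are counted as errors.

    Assumption: somatic sites contribute mutation_rate * allele_fraction
    extra mismatches per base, added to the sequencing error rate.
    """
    extra_mismatches = mutation_rate * allele_fraction
    return error_to_phred(phred_to_error(q) + extra_mismatches)


# (base quality, somatic mutation rate per site) pairs from the discussion
scenarios = [(30, 1e-6), (50, 1e-6), (30, 1e-4), (40, 1e-4)]

for q, rate in scenarios:
    shift = q - apparent_quality(q, rate)
    print(f"Q{q}, mutation rate {rate:g}: apparent quality drops by {shift:.3f}")
```

Under these assumptions, a rate of one in a million shifts Q30 by about 0.002 Phred units, while a rate of one in ten thousand shifts Q40 by about 1.8 units, which matches the reply's point that the effect only becomes relevant at unusually high base qualities.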

  • cyriac Member

    Thanks @bhanuGandham - the devs appear to acknowledge that the recall rate in hypermutated tumors may be affected, but they also make a good case that the effect is minimal. For reference, this concern arose ~5 years ago on biostars, and this should finally put it to rest.
