Base Recalibration with a different dbSNP build

Hi,
I wanted to clarify whether I can run the BaseRecalibrator step using dbSNP build 144 on BAM files that were already recalibrated with a different dbSNP build (138). I am not sure whether this is advisable, or whether it will drastically change variant calling. Any insight on this would be really helpful.
Looking forward to your response.
Thanks,
Best Answer
shlee
Hi @Jaspreet,
Although from years ago, it looks like this discussion could be relevant.
Remember that BaseRecalibrator uses the dbSNP file to mask sites that it should ignore. If you use a version of dbSNP that has more sites (v144 vs. v138), then sites that previously counted towards recalibration (those not covered by v138) are now masked by v144 and no longer counted. So less data goes into the calculations. However, common variant sites are only a fraction of the bases that we expect to match the reference. Remember also that BQSR looks for widespread, systematic errors, so the loss or gain of these sites should affect the final adjustments minimally. As for using BaseRecalibrator on an already recalibrated BAM file, I'm not sure of the impact. Perhaps not much will change with one or two additional iterations, but things could morph into something monstrous and unexpected if you iterate tens or hundreds of times over the same data. Or perhaps bases maintain their adjusted qualities. I'm not all that familiar with BQSR's inner workings, sorry. I'll keep this question in mind for any future updates to the BQSR documentation. In the meantime, if you test it out and come to an answer, we would love to hear about the results.
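To make the masking argument concrete, here is a minimal Python sketch. It is a toy model, not GATK code: the `phred` estimator with +1/+2 smoothing, the `empirical_quality` helper, and the two site lists are all assumptions made up for illustration. It only shows that a larger known-sites list leaves fewer sites to count, and that the estimated quality may move only slightly when the extra masked sites are a small fraction of the data.

```python
import math

def phred(errors, observations):
    # Simplified estimator with +1/+2 smoothing; an assumption for this
    # sketch, not GATK's exact calculation.
    return -10 * math.log10((errors + 1) / (observations + 2))

def empirical_quality(observations, known_sites):
    """Return (quality, sites_used) after masking known variant sites.

    observations maps position -> True if the base mismatches the reference.
    """
    kept = {pos: mm for pos, mm in observations.items() if pos not in known_sites}
    errors = sum(kept.values())  # unmasked mismatches are treated as errors
    return phred(errors, len(kept)), len(kept)

# Toy data: 1000 positions, one base observed at each.
observations = {pos: False for pos in range(1000)}
for pos in range(7, 1000, 100):   # 10 sequencing errors
    observations[pos] = True
observations[250] = True          # real variant, present in both hypothetical lists
observations[750] = True          # real variant, present only in the larger list

smaller_list = {250}                  # stands in for the older dbSNP build
larger_list = {250, 400, 600, 750}    # stands in for the newer dbSNP build

q_small, n_small = empirical_quality(observations, smaller_list)
q_large, n_large = empirical_quality(observations, larger_list)
print(f"smaller list: {n_small} sites used, Q = {q_small:.2f}")
print(f"larger list:  {n_large} sites used, Q = {q_large:.2f}")
```

In this toy data the larger list removes three more sites from the count, one of which was a real variant being miscounted as an error, so the estimate shifts by well under one Phred unit. Real data will behave differently depending on coverage and how many variant sites the older build misses.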
Answers
@Jaspreet
Hi,
I am not sure I understand. Do you want to run BaseRecalibrator on an already recalibrated BAM file? Or do you want to run BaseRecalibrator on the original BAM files with a different dbSNP?
Do not run BaseRecalibrator on already recalibrated BAM files.
I am not sure if there will be much of a difference if you use a different version of dbSNP. We have not tested it ourselves and cannot comment. Please let us know of your findings if there is anything interesting.
-Sheila
@Sheila
Hi Sheila,
Thank you for your response. Yes, I already have recalibrated BAM files (dbSNP138) but wanted to run BaseRecalibrator on these files again using dbSNP144. I can try using dbSNP144, but as per your post, running BaseRecalibrator again on a recalibrated file is not advisable.
I don't know whether this question is appropriate, but I just wanted to clarify: what could be the possible outcomes of using BaseRecalibrator on already recalibrated BAM files? According to my understanding, the recalibrated base qualities should not vary. I may be completely wrong, but if you could share your view, it would be really helpful.
Looking forward to your response.
Thanks,
Jaspreet
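The intuition in the question above (that a second pass should leave already recalibrated qualities unchanged) can be sketched in a toy model. This is not how GATK works internally: the `recalibrate` function below is hypothetical and uses the reported quality as its only covariate, whereas real BQSR also considers covariates such as read group, machine cycle, and sequence context. It shows only that, under this simplification, a second pass is a no-op. It says nothing definitive about real BAM files.

```python
import math
from collections import defaultdict

def recalibrate(bases):
    """One toy pass: replace each reported quality with the rounded
    empirical quality of all bases sharing that reported quality.
    Assumes every quality bin contains at least one error."""
    counts = defaultdict(lambda: [0, 0])   # reported Q -> [errors, total]
    for reported_q, is_error in bases:
        counts[reported_q][0] += is_error
        counts[reported_q][1] += 1
    new_q = {q: round(-10 * math.log10(errors / total))
             for q, (errors, total) in counts.items()}
    return [(new_q[q], is_error) for q, is_error in bases]

# Toy data: (reported quality, is_error) pairs.
# Bases reported as Q30 actually have a 1% error rate (true Q20);
# bases reported as Q25 actually have a 0.1% error rate (true Q30).
bases = ([(30, i < 100) for i in range(10000)]
         + [(25, i < 10) for i in range(10000)])

first_pass = recalibrate(bases)
second_pass = recalibrate(first_pass)

print("first pass changed qualities: ", first_pass != bases)
print("second pass changed qualities:", second_pass != first_pass)
```

With several interacting covariates, binning, and a different known-sites mask on the second pass, the real tool need not behave this cleanly, which is consistent with the advice above not to recalibrate twice.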
Hi @shlee,
Thank you for your valuable insight. I'll try it and share the results if something interesting comes up.