Base quality score recalibration makes all reads low quality?

Hello,
I am running the GATK pipeline on barley exome capture data. To verify that we are doing a good job realigning around indels and calling true variants, we are comparing our alignments against known variants from Sanger resequencing, viewing the BAM files in the latest version of Consed (version 26). However, we noticed that after base quality score recalibration, almost every base is marked as roughly Phred 20. Before recalibration, our scores were Phred 30 and above, and the alignments looked relatively clean.
Is there an option in the BaseRecalibrator tool that would affect this? How can we avoid tamping down our quality scores so aggressively? Thank you.
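For context on the scale of the drop described above: Phred scores map to error probabilities as p = 10^(-Q/10), so going from Q30 to Q20 implies a tenfold increase in the estimated error rate. A minimal sketch:

```python
# Phred quality Q corresponds to an error probability p = 10^(-Q/10).
def phred_to_error(q):
    """Convert a Phred quality score to its implied error probability."""
    return 10 ** (-q / 10)

print(phred_to_error(30))  # 0.001 -> 1 error in 1,000 bases
print(phred_to_error(20))  # 0.01  -> 1 error in 100 bases
```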
Best Answer
Geraldine_VdAuwera Cambridge, MA admin
Answers
It's hard to say; it depends on the data. What did the pre/post recalibration plots look like?
Thank you for the fast response. Attached are the plots generated from AnalyzeCovariates and our recalibration report. It looks like we have reads with reported qualities up to Phred40, but for some reason, they are being adjusted down to about Phred25.
Based on the plots there are two possibilities: either the base quality calls in your data are overestimated (the machine was too optimistic about its own accuracy) and therefore the recalibrator is correct to tamp down the qualities, or the set of known variants you are using is insufficient and a lot of true variation is getting counted as errors. What are you using as known variants?
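The second possibility can be made concrete with a toy calculation (the rates below are hypothetical, purely for illustration): BQSR estimates empirical quality from the mismatch rate at sites not in the known-sites set, so every uncatalogued true variant gets counted as a sequencing error and drags the empirical quality down.

```python
import math

def phred_to_p(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)

def p_to_phred(p):
    """Convert an error probability back to a Phred score."""
    return -10 * math.log10(p)

# Reported machine quality: Q40, i.e. 1 error in 10,000 bases.
reported_err = phred_to_p(40)  # 0.0001

# Suppose the known-sites list misses real variants occurring at
# roughly 1 in 1,000 bases (a hypothetical rate for illustration).
missed_variant_rate = 1e-3

# BQSR counts those mismatches as sequencing errors, so the
# empirical error rate it measures is inflated:
empirical_err = reported_err + missed_variant_rate

print(round(p_to_phred(empirical_err), 1))  # -> 29.6, well below the reported Q40
```

Even a modest amount of uncatalogued true variation is enough to pull a reported Q40 down toward the Q25-30 range seen in the plots.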
We only have a set of about 8,000-9,000 known variants from a SNP genotyping array, plus a handful of Sanger-resequenced loci. Could this be a problem? Would calling SNPs and using the high-confidence variants in a second round of BQSR help?
Yes, possibly -- it's definitely worth a shot to try and bootstrap a set of variants from your data in addition to the knowns you have.
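The bootstrapping approach suggested here can be sketched as follows, using current GATK4 command syntax (older GATK3 versions used the equivalent `-T BaseRecalibrator -knownSites` form). File names and the filter threshold are illustrative assumptions, not prescriptions:

```shell
# 1. Call variants on the un-recalibrated BAM.
gatk HaplotypeCaller \
    -R reference.fasta \
    -I sample.bam \
    -O raw_variants.vcf.gz

# 2. Keep only high-confidence calls to serve as a known-sites set.
#    (The hard-filter threshold below is illustrative; tune for your data.)
gatk VariantFiltration \
    -R reference.fasta \
    -V raw_variants.vcf.gz \
    --filter-name "lowQD" --filter-expression "QD < 2.0" \
    -O filtered_variants.vcf.gz
gatk SelectVariants \
    -R reference.fasta \
    -V filtered_variants.vcf.gz \
    --exclude-filtered \
    -O high_confidence.vcf.gz

# 3. Re-run BQSR with the bootstrapped calls added to the array/Sanger knowns.
gatk BaseRecalibrator \
    -R reference.fasta \
    -I sample.bam \
    --known-sites array_snps.vcf.gz \
    --known-sites high_confidence.vcf.gz \
    -O recal.table
gatk ApplyBQSR \
    -R reference.fasta \
    -I sample.bam \
    --bqsr-recal-file recal.table \
    -O sample.recal.bam

# 4. Repeat the call/recalibrate cycle until quality scores converge.
```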
Thank you, we will try that and report back here.