Errors about misencoded quality scores

Geraldine_VdAuwera, Cambridge, MA. Member, Administrator, Broadie admin
edited November 2015 in Solutions to Problems

The problem

You get an error like this:

SAM/BAM/CRAM file <filename> appears to be using the wrong encoding for quality scores

Why this happens

According to the SAM specification, quality scores are encoded with an offset of 33, so that Q0 == ASCII 33. However, in some datasets (including older Illumina data), the encoding starts at ASCII 64 instead. This is a problem because the GATK assumes that it can use the quality scores as they are. If they are in fact encoded on a different scale, our tools will estimate the quality of your data incorrectly, and your analysis results will be off.

To prevent this from happening, the GATK engine (since version 2.3) performs a sanity check of the quality score encodings. If they are not standard, it aborts the program run and outputs the error message shown above.
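To make the difference concrete, here is a minimal Python sketch (not GATK code) that decodes the same quality string under both offsets. The function name and the example string are made up for illustration.

```python
def decode_quals(qual_string, offset):
    """Convert an ASCII-encoded quality string into a list of Phred scores."""
    return [ord(ch) - offset for ch in qual_string]


# Hypothetical quality string, as it might appear in older Illumina (offset 64) data.
quals = "hhhgfB"

as_offset_64 = decode_quals(quals, 64)  # the scores the data was written with
as_offset_33 = decode_quals(quals, 33)  # what a reader assuming offset 33 would see

print(as_offset_64)  # [40, 40, 40, 39, 38, 2]
print(as_offset_33)  # [71, 71, 71, 70, 69, 33]
```

Every score read with the wrong offset comes out 31 points too high, which is why the qualities cannot be used as they are.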


If this happens to you, you'll need to run again with the flag --fix_misencoded_quality_scores / -fixMisencodedQuals. The engine will then simply subtract 31 (the difference between the two offsets, 64 and 33) from every quality score as it is read in, and proceed with the corrected values. Output files will include the corrected scores where applicable.
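The arithmetic of that correction can be sketched in a few lines of Python. This is only an illustration of the "subtract 31" idea, not the engine's actual implementation; the function name is hypothetical, and raising an error when a corrected value falls below ASCII 33 is a choice made for this sketch.

```python
def fix_misencoded(qual_string, shift=31):
    """Re-encode an offset-64 quality string as offset 33 by subtracting `shift`."""
    fixed = []
    for ch in qual_string:
        new_code = ord(ch) - shift
        # Assumption of this sketch: a result below ASCII 33 would be a negative
        # quality score, so the input was probably not offset-64 data at all.
        if new_code < 33:
            raise ValueError(f"{ch!r} is too low to be an offset-64 quality character")
        fixed.append(chr(new_code))
    return "".join(fixed)


print(fix_misencoded("hhhgfB"))  # IIIHG#
```

Applying the shift to data that is already correctly encoded would push scores below zero, so it is worth confirming the encoding of your input before using the flag.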

Related problems

In some cases the data contains a mix of encodings (which is likely to arise if you're passing in a lot of different files from different sources together), and the GATK can't automatically compensate for that. There is an argument you can use to override this check: -allowPotentiallyMisencodedQuals / --allow_potentially_misencoded_quality_scores; but you use it at your own risk. We strongly encourage you to check the encodings of your files rather than use this option.
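One rough way to check the encodings yourself is to look at the range of ASCII values in the quality strings. The sketch below is a heuristic, not what the GATK does internally: the function name and the cutoffs (59 and 74) are assumptions based on the commonly cited ranges for offset-33 and offset-64 data, and data with unusually high scores can fool it.

```python
def guess_encoding(qual_strings):
    """Guess the quality offset from the ASCII range seen in the quality strings."""
    codes = [ord(ch) for q in qual_strings for ch in q]
    if not codes:
        raise ValueError("no quality characters to inspect")
    lo, hi = min(codes), max(codes)
    low_chars = lo < 59    # below ';': not expected in offset-64 data
    high_chars = hi > 74   # above 'J': unusual for typical offset-33 short-read data
    if low_chars and high_chars:
        return "possibly mixed"
    if low_chars:
        return "offset 33"
    if high_chars:
        return "offset 64"
    return "ambiguous"


print(guess_encoding(["IIIHG#"]))            # offset 33
print(guess_encoding(["hhhgfB"]))            # offset 64
print(guess_encoding(["IIIHG#", "hhhgfB"]))  # possibly mixed
```

Running a check like this per file or per read group, rather than on everything pooled together, makes it easier to spot which input has the odd encoding.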



  • jlrflores Member ✭✭

    I am using FastQC to evaluate the results of -fixMisencodedQuals on my data. I have a dataset that, according to FastQC, has Illumina 1.5 encoding. After applying -fixMisencodedQuals, GATK proceeds normally, but the FastQC results look strange. The quality score distribution appears to be completely squashed, and it does not look as though a simple "subtract 31" was applied. Before/after pictures attached.

    The sample was sequenced on multiple technologies, and this dataset has a very high error rate after processing. I also have results for the same sequence data from an earlier version of GATK, and those results look better: the error rate is lower.

    Is there an alternative to "-fixMisencodedQuals"? Am I the only one observing this strange behavior?


  • Sheila, Broad Institute. Member, Broadie admin


    I am getting Geraldine's opinion before I post a final response here. We will get back to you soon.


  • Geraldine_VdAuwera, Cambridge, MA. Member, Administrator, Broadie admin

    How did you apply the encoding fix? As part of a separate command? Can you list any and all commands that were applied to this dataset?

    What kind of data is this?
