quality score error

shinken (Irapuato, Member)

Hi, I am using whole-genome sequence data from Sanger sequencing, and I am getting this error:

we encountered an extremely high quality score of 68

One possible solution is to use the flag -allowPotentiallyMisencodedQuals.

The only explanation of this flag that I found is: "Ignore warnings about base quality score encoding".

So if I use this flag, can I keep these "high quality score" bases for the analysis and SNP calling? That is what I want.

Thank you very much


Best Answer


  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    No, incorrectly encoded quality scores only make it seem like the bases have very high quality; the high values are not real. The problem is that some datasets use a different scale -- it's like they are counted in different units. Imagine my height is measured in centimeters, so I get a "score" of 170. Now if someone takes that score and thinks the unit is meters, they will think I am 170 meters tall. But I'm really not that tall! It is a measurement interpretation error, and it could cause serious problems later on (my tailor would make me huge shirts that would not fit me). So for your data, ignoring the problem is a bad idea. It is better to run with the flag that fixes misencoded quality scores (see documentation) so that you get an accurate estimate of the data quality. Otherwise you could get inaccurate results.

  • shinken (Irapuato, Member)

    Thank you very much, but it looks like this flag will "subtract 31 from every quality score as it is read in, and proceed with the corrected values". My data come from Sanger sequencing, not Illumina, so if I understand correctly I have Phred+33. Therefore, I don't need to correct these values. Is this right? Or what do you suggest?

    Thank you very much
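
The "subtract 31" in the quoted description is the gap between the two common ASCII offsets for quality characters, 64 - 33 = 31. As a minimal sketch (the `decode` helper and the sample string are hypothetical, for illustration only, and not part of GATK), the same quality characters give very different scores depending on the offset assumed:

```python
def decode(qual, offset):
    """Convert a FASTQ quality string to Phred scores for a given ASCII offset."""
    return [ord(ch) - offset for ch in qual]

# Hypothetical quality string, chosen only to illustrate the arithmetic.
qual = "Bbeh"

as_phred33 = decode(qual, 33)
as_phred64 = decode(qual, 64)

print(as_phred33)  # [33, 65, 68, 71]
print(as_phred64)  # [2, 34, 37, 40]

# Every score differs by exactly 64 - 33 = 31 between the two readings.
print([a - b for a, b in zip(as_phred33, as_phred64)])  # [31, 31, 31, 31]
```

Under these assumptions, the character `e` reads as 68 with offset 33 (the value in the error message) but as 37 with offset 64. Which reading is right depends on how the file was actually written, which the arithmetic alone cannot decide.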

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    The technology used for sequencing doesn't necessarily determine the quality score encoding. But anyway, if the program is detecting high values (starting at 64), then you most probably have the wrong encoding (relative to what GATK expects). If you're skeptical, you can run a QC tool to determine the range of base quality values present in your data. More details about this are described here:

    In short, I suggest applying the correction, because otherwise your data values will be interpreted incorrectly by GATK.
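
For readers who want to check the range of quality values themselves, here is a rough sketch of what such a check does. It is not how any particular QC tool is implemented; the function names and the thresholds in `guess_encoding` are assumptions for illustration, and it assumes plain 4-line FASTQ records with no wrapped lines:

```python
import io

def quality_ascii_range(handle):
    """Return (lowest, highest) ASCII code seen on FASTQ quality lines.

    Assumes plain 4-line FASTQ records (no wrapped sequence lines).
    """
    lo, hi = None, None
    for i, line in enumerate(handle):
        if i % 4 != 3:  # the 4th line of each record holds the qualities
            continue
        codes = [ord(c) for c in line.rstrip("\r\n")]
        if not codes:
            continue
        lo = min(codes) if lo is None else min(lo, min(codes))
        hi = max(codes) if hi is None else max(hi, max(codes))
    return lo, hi

def guess_encoding(lo, hi):
    """Heuristic guess only; the thresholds are illustrative assumptions."""
    if lo is None:
        return "no data"
    if lo < 59:   # characters below ';' do not occur with a +64 offset
        return "phred33"
    if hi > 74:   # above 'J', i.e. above Q41 if read with a +33 offset
        return "phred64 (likely)"
    return "ambiguous"

# Hypothetical in-memory FASTQ record, used in place of a real file.
sample = io.StringIO("@read1\nACGT\n+\n!5?I\n")
lo, hi = quality_ascii_range(sample)
print(lo, hi, guess_encoding(lo, hi))  # 33 73 phred33
```

On a real file, pass an open handle, for example `open("reads.fastq")`, where the file name is a placeholder. A guess based on the observed range is only a hint: a file containing nothing but high-quality Phred+33 bases can look like Phred+64, which is why the last branch returns "ambiguous".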

  • shinken (Irapuato, Member)

    Thank you very much for the answers. Following your recommendation to use a QC tool, I ran FastQC, and the program reports the encoding as Sanger / Illumina 1.9. So I am still unsure whether to apply the correction with the flag --fix_misencoded_quality_scores. Is it not possible to have high quality scores in Sanger sequencing with Phred+33? I ask because several of my bases have a quality score around 30, with some of them above 64.