Do I need to correct my base qualities or not?

Hi all,

I have recently upgraded to GATK v2.4-9-g532efad and am implementing a pipeline for indel realignment and base quality recalibration. I decided to use the -fixMisencodedQuals flag in RealignerTargetCreator but am getting the following error:

ERROR MESSAGE: Bad input: while fixing mis-encoded base qualities we encountered a read that was correctly encoded; we cannot handle such a mixture of reads so unfortunately the BAM must be fixed with some other tool

This implies that the qualities need to be fixed and there is no solution in GATK.

However, if I simply remove the -fixMisencodedQuals flag from my call, then the BAM files are processed without any error! So I'm a little confused about whether they need to be corrected. Are they valid for analysis?



Best Answer


  • MarkDunning (Member)

    This explanation makes sense. I believe the bam files that were failing were written on a different quality scale.

    Would it be possible to have this situation give a warning rather than an error? I want to be able to process many different bam files using the same GATK options and I'd like to be able to fix the misencoded bams if it is required, and allow bams that do not need fixing to pass through.

  • ebanks (Broad Institute; Member, Broadie, Dev)

    Hi Mark, sorry this is not possible. The situation you are describing is rather dangerous and I would highly recommend reprocessing the misencoded bams from scratch.

  • MarkDunning (Member)

    Hi Eric,

    I agree that the misencoded bams need to be fixed.

    However, my "error" was due to bams that had quality scores on the correct scale, but which I was attempting to fix using the -fixMisencodedQuals flag.

    I might not know in advance if the bams will require correction or not, so would prefer to use the fixMisencodedQuals flag all the time and have the program not crash if the correction is not required.


  • ebanks (Broad Institute; Member, Broadie, Dev)

    Unfortunately that's not possible.

  • jlrflores (Member)

    Is it still the case that GATK cannot determine for itself whether the quals in a bam file are misencoded?

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    It can determine it but cannot decide by itself what it should do. This is done on purpose to force users to be aware of the provenance and characteristics of their data, and make a deliberate decision about how to process the data. This can be handled programmatically in a pipeline script if desired, outside of GATK itself.
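A pipeline-side check of the kind described in the last reply could look roughly like the Python sketch below. It is only an illustration under stated assumptions, not something taken from GATK: the function names are hypothetical, and the ASCII thresholds (below 59 suggests Phred+33, above 74 suggests Phred+64) are a common rule of thumb for Illumina data that may not hold for every platform. The quality strings are assumed to have been sampled beforehand, for example from column 11 of `samtools view` output.

```python
def guess_quality_encoding(quality_strings):
    """Classify sampled quality strings as 'phred33', 'phred64', 'mixed' or 'unknown'.

    Heuristic thresholds (assumptions, not GATK's own logic):
      - any character with ASCII code < 59 cannot come from a +64 scale
      - any character with ASCII code > 74 ('J', i.e. Q41 at +33) suggests +64
    """
    codes = [ord(ch) for qual in quality_strings for ch in qual]
    if not codes:
        return "unknown"
    has_low = min(codes) < 59
    has_high = max(codes) > 74
    if has_low and has_high:
        return "mixed"
    if has_low:
        return "phred33"
    if has_high:
        return "phred64"
    # Everything sits in the overlapping range, so the sample is not decisive.
    return "unknown"


def extra_gatk_args(encoding):
    """Return the extra GATK arguments to use for a given encoding guess.

    Only 'phred64' adds -fixMisencodedQuals; anything that is not clearly
    one scale or the other is refused so that a person has to look at it.
    """
    if encoding == "phred64":
        return ["-fixMisencodedQuals"]
    if encoding == "phred33":
        return []
    raise ValueError(
        "quality encoding is %r; inspect this BAM by hand" % encoding
    )


if __name__ == "__main__":
    # Hypothetical sampled quality strings, for illustration only.
    sample = ["IIIIHHG#5", "FFFFEE,,+"]
    encoding = guess_quality_encoding(sample)
    print(encoding, extra_gatk_args(encoding))
```

Refusing to continue on a "mixed" or "unknown" result keeps the decision with the person running the pipeline, which is in line with the reasoning given above for why GATK does not make the choice itself.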

