DepthOfCoverage producing "extremely high quality score" error

igor, New York, Member
edited April 2013 in Ask the GATK team

I am running GATK DepthOfCoverage on a bunch of samples (sequenced using Illumina MiSeq). For one of the samples, I am getting the following error:

ERROR MESSAGE: SAM/BAM file SAMFileReader{<file.bam>} appears to be using the wrong encoding for quality scores: we encountered an extremely high quality score of 61; please see the GATK --help documentation for options related to this error

I found a suggestion that I should use the fixMisencodedQuals flag in this case. However, if I do that, "the engine will simply subtract 31 from every quality score as it is read in". That will suppress the error, but then all my quality scores will be incorrect.

Furthermore, the BAM file I am using was last recalibrated with GATK. Why is GATK producing files with invalid quality scores?

What should I do?

Any help would be much appreciated.

Answers

  • Geraldine_VdAuwera, Cambridge, MA, Member, Administrator, Broadie

    Hi Igor,

    It is unlikely that the GATK "produced" the bad quals. Most likely, Base Recalibrator just took the quals in your data as they were, and recalibrated (adjusted them upwards or downwards) as appropriate.

    Ideally you should check your original fastq files and see what the quals encoding is in there. If they're in the old (deprecated) encoding, then you can simply use the fixMisencodedQuals flag -- the resulting quals will all be correct. The new encoding is just like the old one but offset by 31. If not, let me know and I will try to help you figure out what's happening.
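To make the offset concrete, here is a minimal Python sketch of the arithmetic behind that fix. It assumes the old encoding stores a quality Q as the ASCII character Q+64 and the current (Sanger) encoding stores it as Q+33. The function name is illustrative and is not part of GATK.

```python
OLD_OFFSET = 64  # assumed ASCII offset of the old Illumina encoding
NEW_OFFSET = 33  # assumed ASCII offset of the Sanger / current encoding


def reencode_quals(qual_string):
    """Shift a Phred+64 quality string to Phred+33 (hypothetical helper)."""
    shift = OLD_OFFSET - NEW_OFFSET  # 64 - 33 = 31
    shifted = []
    for ch in qual_string:
        code = ord(ch) - shift
        if code < NEW_OFFSET:
            # A character this low cannot have come from a Phred+64 file.
            raise ValueError(f"{ch!r} is too low to be Phred+64")
        shifted.append(chr(code))
    return "".join(shifted)


print(reencode_quals("ggfg"))  # prints HHGH
```

The guard shows why the shift is only safe on data that really is in the old encoding: applied to a file already in the current encoding, it would push scores below the valid range.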

  • igor, New York, Member

    These are definitely the new encodings. I've never gotten this error before and it occurs for only one of the files in the batch (they were all processed the same way at the same time).

    With a different set of intervals, there is no error, so I think this might only be happening at very few positions for some reason.

  • Geraldine_VdAuwera, Cambridge, MA, Member, Administrator, Broadie

    Actually, let me qualify that -- newer versions of GATK shouldn't be producing bad quals, but some of the older ones did, so if your files were processed with an older version, then it is possible that it was the GATK's fault.

    For running DoC the quals don't matter so you can safely ignore the error message. In the docs you will find a flag to override the error.

  • igor, New York, Member
    edited April 2013

    I am using GATK 2.3-6 for both recalibration and DoC. I am surprised: if the qualities were problematic prior to realignment or recalibration, why didn't GATK complain then? That is why I assumed the faulty qualities were introduced by GATK.

    Which flag should I be using? Would it affect minimum base quality setting?

  • I got the same error while using RealignerTargetCreator in GATK v2.5-2. However, the FastQC output shows that the encoding of my BAM file is Illumina 1.5 and that the quality scores range from 0 to 40.

  • Carneiro, Charlestown, MA, Member

    Can you take a look at a read from your BAM file to see the qual encoding? It may be a bug in FastQC.

  • Here is a read from the fastq file, does it help in resolving that error?

                    @A807HEABXX:3:1:1345:2117#TTAGGCAT/1
                    GTTGCTGTCTTCCTGCTTGCATTCTGAGTCTGGCATCCTTTCTGTTCCTGGGCTTAACCATGCTTTTCCTGCCTTCAGACTTTTGTATCA
                    +
                    ggfgggggeefggggggdgcgedfdfafbdegefeeggggegggccggg\gdee`\acbcdaadaeebc^MKRYYc^dbcZccccO\]]a
    
  • Carneiro, Charlestown, MA, Member

    these are illumina new encodings. Probably a bug with FastQC.

  • hi, Mauricio, thanks for the reply! fastqc says the encoding is illumina 1.5, do you mean that is wrong? are those quality scores "extremely high"? Thanks!

  • Carneiro, Charlestown, MA, Member

    If I remember correctly, 1.5 uses the letter-encoded quality scores (ASCII offset 64; the older Solexa scale started at 59). Illumina reverted to the Sanger standard (offset 33) with 1.8. So yeah, you need to re-encode your quality scores before using this file with the GATK.
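One way to check an encoding by eye is to look at the ASCII range of the quality characters. The sketch below applies that heuristic to the read posted earlier in this thread. The thresholds are assumptions: characters below `;` (59) should not occur in a 64-based encoding, and characters above `J` (74) are not expected in raw Phred+33 data. The helper name is made up for illustration. A recalibrated BAM can legitimately carry higher Phred+33 scores, so treat the result as a hint only.

```python
def guess_offset(qual_string):
    """Guess the ASCII offset (33 or 64) of a quality string; None if unclear."""
    lowest = min(ord(c) for c in qual_string)
    highest = max(ord(c) for c in qual_string)
    if lowest < 59:
        return 33  # too low for any 64-based encoding
    if highest > 74:
        return 64  # higher than raw Phred+33 data is expected to reach
    return None  # range is compatible with both encodings


# Quality line of the read posted above (raw string keeps the backslashes).
read_quals = r"ggfgggggeefggggggdgcgedfdfafbdegefeeggggegggccggg\gdee`\acbcdaadaeebc^MKRYYc^dbcZccccO\]]a"

offset = guess_offset(read_quals)
scores = [ord(c) - offset for c in read_quals]
print(offset, min(scores), max(scores))  # prints 64 11 39
```

Read with offset 33, the same characters would decode to scores between 42 and 70, which is consistent with the "extremely high quality score" error.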

  • Carneiro, Charlestown, MA, Member

    the flag is -allowPotentiallyMisencodedQuals
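For reference, a DepthOfCoverage call with this flag might be assembled as in the Python sketch below. The jar name, file names and output prefix are placeholders, and the helper function is hypothetical. Only the tool name and the override flag come from this thread; the remaining arguments follow the usual GATK 2.x command-line layout and should be checked against the documentation for your version.

```python
import shlex


def depth_of_coverage_cmd(bam, reference, out_prefix, jar="GenomeAnalysisTK.jar"):
    """Build a GATK DepthOfCoverage command line (hypothetical helper)."""
    return [
        "java", "-jar", jar,
        "-T", "DepthOfCoverage",
        "-R", reference,   # placeholder reference FASTA
        "-I", bam,         # placeholder input BAM
        "-o", out_prefix,  # placeholder output prefix
        "-allowPotentiallyMisencodedQuals",  # override the quality-score check
    ]


cmd = depth_of_coverage_cmd("file.bam", "reference.fasta", "coverage")
# Print the command instead of running it, so it can be inspected first.
print(" ".join(shlex.quote(part) for part in cmd))
```

Unlike fixMisencodedQuals, this flag only silences the check and leaves the quality scores unchanged.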
