Trying to run BQSR with mm10 genome and getting "Bad input: while fixing mis-encoded base qualities"

jpreston, Eugene, OR (Member)


I've had good luck running BQSR on E. coli and C. remanei in the past, but mouse is giving me some trouble. The run works if I remove "--fix_misencoded_quality_scores", but I'm not sure how helpful that is, as I thought that was the point of running BQSR.
I have formatted the mm10 reference as suggested, with the chromosomes in the correct order and the chromosome labels as plain numbers without "chr", but I keep getting the same error. For the known-sites VCF, I just use a VCF called with LoFreq from the same, uncalibrated BAM file.

Thanks in advance for any help. Here is my input and error message:

$ java -jar /usr/local/packages/GATK/2.6-4/GenomeAnalysisTK.jar -T BaseRecalibrator -R /home13/jpreston/genomes/sorted_mm10/sorted_mm10_2.fa -I /home9/anniep/dnaseq/bowtie/242_normal_sorted.bam -knownSites /home9/anniep/dnaseq/bowtie/242_normal_sorted_KS.vcf --fix_misencoded_quality_scores -o /home9/anniep/dnaseq/bowtie/242_normal_sorted_recal.table
INFO 21:42:19,395 HelpFormatter - --------------------------------------------------------------------------------
INFO 21:42:19,397 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.6-4-g3e5ff60, Compiled 2013/06/24 14:48:56
INFO 21:42:19,397 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 21:42:19,397 HelpFormatter - For support and documentation go to
INFO 21:42:19,401 HelpFormatter - Program Args: -T BaseRecalibrator -R /home13/jpreston/genomes/sorted_mm10/sorted_mm10_2.fa -I /home9/anniep/dnaseq/bowtie/242_normal_sorted.bam -knownSites /home9/anniep/dnaseq/bowtie/242_normal_sorted_KS.vcf --fix_misencoded_quality_scores -o /home9/anniep/dnaseq/bowtie/242_normal_sorted_recal.table
INFO 21:42:19,401 HelpFormatter - Date/Time: 2016/04/10 21:42:19
INFO 21:42:19,401 HelpFormatter - --------------------------------------------------------------------------------
INFO 21:42:19,402 HelpFormatter - --------------------------------------------------------------------------------
INFO 21:42:19,413 ArgumentTypeDescriptor - Dynamically determined type of /home9/anniep/dnaseq/bowtie/242_normal_sorted_KS.vcf to be VCF
INFO 21:42:19,982 GenomeAnalysisEngine - Strictness is SILENT
INFO 21:42:20,069 GenomeAnalysisEngine - Downsampling Settings: No downsampling
INFO 21:42:20,076 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 21:42:20,134 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.06
INFO 21:42:20,168 RMDTrackBuilder - Loading Tribble index from disk for file /home9/anniep/dnaseq/bowtie/242_normal_sorted_KS.vcf
WARN 21:42:20,288 RMDTrackBuilder - Index file /home9/anniep/dnaseq/bowtie/242_normal_sorted_KS.vcf.idx is out of date (index older than input file), deleting and updating the index file
INFO 21:42:20,394 RMDTrackBuilder - Creating Tribble index in memory for file /home9/anniep/dnaseq/bowtie/242_normal_sorted_KS.vcf
INFO 21:42:21,123 RMDTrackBuilder - Writing Tribble index to disk for file /home9/anniep/dnaseq/bowtie/242_normal_sorted_KS.vcf.idx
INFO 21:42:29,567 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
INFO 21:42:29,571 GenomeAnalysisEngine - Done preparing for traversal
INFO 21:42:29,571 ProgressMeter - Location processed.reads runtime per.1M.reads completed total.runtime remaining
INFO 21:42:29,596 BaseRecalibrator - The covariates being used here:
INFO 21:42:29,596 BaseRecalibrator - ReadGroupCovariate
INFO 21:42:29,596 BaseRecalibrator - QualityScoreCovariate
INFO 21:42:29,596 BaseRecalibrator - ContextCovariate
INFO 21:42:29,597 ContextCovariate - Context sizes: base substitution model 2, indel substitution model 3
INFO 21:42:29,597 BaseRecalibrator - CycleCovariate
INFO 21:42:29,600 ReadShardBalancer$1 - Loading BAM index data
INFO 21:42:29,600 ReadShardBalancer$1 - Done loading BAM index data
WARN 21:42:30,549 RestStorageService - Error Response: PUT '/GATK_Run_Reports/' -- ResponseCode: 403, ResponseStatus: Forbidden, Request Headers: [Content-Length: 988, Content-MD5: d3m5ghuffx5ZqbXFYdU8eQ==, Content-Type: application/octet-stream, x-amz-meta-md5-hash: 7779b9821b9f7f1e59a9b5c561d53c79, Date: Mon, 11 Apr 2016 04:42:29 GMT, Authorization: AWS AKIAIMHBU7X642TCHQ2A:43FosbVvbsP2X/SqzsJ7PhY1w80=, User-Agent: JetS3t/0.8.1 (Linux/2.6.32-358.23.2.el6.x86_64; amd64; en; JVM 1.7.0_80), Host:, Expect: 100-continue], Response Headers: [x-amz-request-id: FE8C7B3005828024, x-amz-id-2: euWnPAZr6BuQp05KFLPCkdLpzrcwXMCEWm3Tlfjk2lGbgP89RENyKG387IdN+YcAXsa58zB7zzk=, Content-Type: application/xml, Transfer-Encoding: chunked, Date: Mon, 11 Apr 2016 04:42:29 GMT, Connection: close, Server: AmazonS3]

ERROR ------------------------------------------------------------------------------------------
ERROR A USER ERROR has occurred (version 2.6-4-g3e5ff60):
ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
ERROR Please do not post this error to the GATK forum
ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions
ERROR MESSAGE: Bad input: while fixing mis-encoded base qualities we encountered a read that was correctly encoded; we cannot handle such a mixture of reads so unfortunately the BAM must be fixed with some other tool
ERROR ------------------------------------------------------------------------------------------

Best Answer


  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)
    Accepted Answer

    That argument is not related to BQSR; it is meant to convert base qualities when the data were written with an older quality encoding scheme. You should definitely remove it from your command line.
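
To see why the flag fails on reads that are already correctly encoded, it can help to check which ASCII offset the base qualities use. The snippet below is a minimal heuristic sketch, not GATK's implementation: the function names are hypothetical, and the fixed shift of 31 is simply the difference between the two common offsets (64 and 33), assumed here for illustration. It guesses the offset from a handful of quality strings and shows that shifting an already Phred+33 read down would push low qualities below the valid range.

```python
# Heuristic sketch only; function names are hypothetical and this is not GATK code.

LOWEST_PLUS64_CODE = 59   # ';' is the lowest character any +64 encoding (Solexa) can emit
HIGHEST_PLUS33_CODE = 74  # 'J' is Q41 in Phred+33, the usual upper bound for that encoding


def guess_phred_offset(quality_strings):
    """Return 33, 64, or None (ambiguous) for an iterable of quality strings."""
    codes = [ord(ch) for qual in quality_strings for ch in qual]
    if not codes:
        return None
    if min(codes) < LOWEST_PLUS64_CODE:
        return 33  # characters this low cannot come from a +64 encoding
    if max(codes) > HIGHEST_PLUS33_CODE:
        return 64  # characters this high are not expected in Phred+33 data
    return None    # every character is valid under both encodings


def shift_to_phred33(quality_string, shift=31):
    """Shift a Phred+64 quality string down to Phred+33.

    Raises ValueError if any shifted character falls below '!' (ASCII 33),
    which means the input was not Phred+64 in the first place.
    """
    shifted = [ord(ch) - shift for ch in quality_string]
    if shifted and min(shifted) < 33:
        raise ValueError("quality string does not look like Phred+64")
    return "".join(chr(code) for code in shifted)


if __name__ == "__main__":
    print(guess_phred_offset(["IIII#5"]))   # contains '#', so Phred+33
    print(guess_phred_offset(["hhhh`eB"]))  # contains 'h', so Phred+64
    print(shift_to_phred33("hhh"))
```

The quality strings can be taken from column 11 of `samtools view` output for a few thousand reads. If the guess is 33, the data are already in the encoding GATK expects and no conversion flag is needed.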
