It looks like you're new here. If you want to get involved, click one of these buttons!
Here's what I'm running:
INFO 12:18:33,096 HelpFormatter - Program Args: -T BaseRecalibrator -I /home/sheenams/gatk_test/gatk2.2/H103.GATKinitialrmdup.srt.bam -
R /home/genetics/Genomes/gatk-bundle/human_g1k_v37.fasta -knownSites /home/genetics/Genomes/gatk-bundle/dbsnp_135.b37.vcf -cov ReadGroup
Covariate -cov QualityScoreCovariate -cov CycleCovariate -cov ContextCovariate -o /home/sheenams/gatk_test/gatk2.2/H103.recal_data.csv -
log /home/sheenams/gatk_test/gatk2.2/H103.gatk_log
Here's the error I'm getting
INFO 12:18:33,309 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 12:18:33,310 ProgressMeter - Location processed.reads runtime per.1M.reads completed total.runtime remaining
INFO 12:18:33,353 BaseRecalibrator - The covariates being used here:
INFO 12:18:33,353 BaseRecalibrator - ReadGroupCovariate
INFO 12:18:33,354 BaseRecalibrator - QualityScoreCovariate
INFO 12:18:33,354 BaseRecalibrator - ContextCovariate
INFO 12:18:33,354 ContextCovariate - Context sizes: base substitution model 2, indel substitution model 3
INFO 12:18:33,354 BaseRecalibrator - CycleCovariate
INFO 12:18:33,355 NestedIntegerArray - Creating NestedIntegerArray with dimensions [1, 3]
INFO 12:18:33,355 NestedIntegerArray - Pre-allocating first 2 dimensions
INFO 12:18:33,355 NestedIntegerArray - Done pre-allocating first 2 dimensions
INFO 12:18:33,356 NestedIntegerArray - Creating NestedIntegerArray with dimensions [1, 94, 3]
INFO 12:18:33,356 NestedIntegerArray - Pre-allocating first 2 dimensions
INFO 12:18:33,356 NestedIntegerArray - Done pre-allocating first 2 dimensions
INFO 12:18:33,356 NestedIntegerArray - Creating NestedIntegerArray with dimensions [1, 94, 1012, 3]
INFO 12:18:33,356 NestedIntegerArray - Pre-allocating first 2 dimensions
INFO 12:18:33,356 NestedIntegerArray - Done pre-allocating first 2 dimensions
INFO 12:18:33,356 NestedIntegerArray - Creating NestedIntegerArray with dimensions [1, 94, 2002, 3]
INFO 12:18:33,356 NestedIntegerArray - Pre-allocating first 2 dimensions
INFO 12:18:33,356 NestedIntegerArray - Done pre-allocating first 2 dimensions
INFO 12:18:36,198 GATKRunReport - Uploaded run statistics report to AWS S3
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace
org.broadinstitute.sting.utils.exceptions.ReviewedStingException: Array length mismatch detected. Malformed read?
at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.calculateFractionalErrorArray(BaseRecalibrator.java:371)
at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:246)
at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:112)
at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:203)
at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:191)
at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:287)
at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:252)
at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano.traverse(TraverseReadsNano.java:91)
at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano.traverse(TraverseReadsNano.java:55)
at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:83)
at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:281)
at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)
at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:93)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 2.2-3-gde33222):
##### ERROR
##### ERROR Please visit the wiki to see if this is a known problem
##### ERROR If not, please post the error, with stack trace, to the GATK forum
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Array length mismatch detected. Malformed read?
##### ERROR ------------------------------------------------------------------------------------------
I used Picards' ValidateSam script on my bam file, but it says its fine. How do I fix this error?
Thanks
ebanks
Posts: 478 mod
Hey everyone,
I've added some functionality to the GATK (which will be available in the 2.3 release) that automatically checks for bad encodings and allows the user to fix them on the fly. Just wanted to let you know.
Eric Banks, PhD -- Group Leader, Methods Development, MPG, Broad Institute of Harvard and MIT
Answers
Try using the malformed read filter:
http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_filters_MalformedReadFilter.html
Geraldine Van der Auwera, PhD
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •How do I actually implement that filter? I tried adding the following commands (one at a time) but they all give errors:
-filterMBQ--filter_mismatching_base_and_quals--read_filter MalformedRead- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •What error do you get?
Geraldine Van der Auwera, PhD
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Using the --read_filter MalformedRead:
Using -filterMQB or --filter_mismatching_base_and_quals:
I'm appending these options to the end of the original code I posted before. Like so:
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Oh, my bad, I forgot that the malformed read filter is applied by default by BaseRecalibrator. Can you isolate the reads that the program is choking on so we can have a closer look?
Geraldine Van der Auwera, PhD
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •How do I figure out which reads are causing it to choke? The print out above is all of the information the program provides.
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •It seems that it happens within the first 1M reads since you don't even get to the first line of time estimates. You can run with
-L 1:1-1000000to limit the run to that first interval, confirm that the problem is in that region, then try to narrow it down.Geraldine Van der Auwera, PhD
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •I narrowed it down to this read, but I don't know what is wrong with it or how to get the program to skip it. Any ideas?
HWI-ST873_0085:4:2102:19701:169876#CACCGG 99 1 710166 29 101M = 710334 264 GAGCTGGCCAGGCATGGTGGTTCACGCCTATAATTCCAACAGTTTGGGAGGCCGAGGCAGGCAGATAACTTGAGGTCAGGAATTCAAGACCAGCCTGGCCA [__eeeeefegfcfcggfafeggfhdggbegffhfhgXeca]Xa[effgbffdg
dgdabcaZPT_bcbdYYX]^]bccbbb_cbcc[``GWR[^] X0:i:1 X1:i:0 MD:Z:101 RG:Z:1 XG:i:0 AM:i:29 NM:i:0 SM:i:29 XM:i:0 XO:i:0 MQ:i:29 XT:A:UHWI-ST873_0085:4:2102:19701:169876#CACCGG 147 1 710334 29 5S96M = 710166 -264 CTAAAATCCCAACATTTTGGGGGGCGGGGGAAGGAAGATAATTTGAGGTAAGGAATTCAGGACCACCCTGGCAAACAGAGTAAAACCCTCTCTCGACTAAA BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBeR
b[JWJ[JOKQSJ^YJ MD:Z:6G2C4A1A3C1A2T0G2T4C1C0C6C4G2T0G0A5G3A0A1C4T2G0G6C0G4T6 RG:Z:1 XG:i:0 AM:i:29 NM:i:27 SM:i:29 XM:i:27 XO:i:0 MQ:i:29 XT:A:M- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Hi there,
Could you please upload the snippet of your bam file containing this read? Having it as a bam rather than just the read data makes it easier for us to debug.
Geraldine Van der Auwera, PhD
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •The forum won't allow me to attach the bam. Here is the dropbox link. This is a snippet of the bam with some reads on either side of the read in question. (710166 - 710166)
https://www.dropbox.com/s/03g9mi0e7y0runy/H103.srt.bam
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •I'm sorry, I meant to ask you to upload it to our FTP. Can you do that please, making sure to include your username in the filename? Thanks!
Here are the ftp details: http://www.broadinstitute.org/gatk/guide/article?id=1215
Geraldine Van der Auwera, PhD
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •I uploaded it as sheenams.bam
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Perfect, thank you. We'll have a look at this and get back to you.
Geraldine Van der Auwera, PhD
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •I get the exact same error message, the read triggering the error seems is at position 10141 on chr 1, added my bam file to the ftp ( Tom_MalformedRead.bam )
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Noted, thanks. We'll notify you when we have a status update on this problem.
Geraldine Van der Auwera, PhD
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Hi @sheenams and @Tom - I've tried running the BaseRecalibrator on your two bam files and it works just fine with no errors...
Eric Banks, PhD -- Group Leader, Methods Development, MPG, Broad Institute of Harvard and MIT
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Are you using different parameters or another version of GATK? Here is what I use:
Just run that again, and attached text file with the output/errors I still get from GATK.
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Ah, drat. I just embarrassingly managed to delete your bam file instead of copying it @Tom - coffee hasn't kicked in yet. Could I trouble you to upload it again? Many apologies.
Eric Banks, PhD -- Group Leader, Methods Development, MPG, Broad Institute of Harvard and MIT
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •No problem, I've just uploaded it again for you.
Don't know if this is important but my java version info is:
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •That's helpful, thanks. Yes, the issue is with your JDK not the bam file. For some reason we lost this line from the prerequisites documentation entry: "Please be aware that at this time we only support the Sun/Oracle Java JDK." We'll update the documentation accordingly, but it looks like you will need to switch to the Java SE Runtime Environment.
Eric Banks, PhD -- Group Leader, Methods Development, MPG, Broad Institute of Harvard and MIT
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •do you advise java 6 or 7 ?
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •just changed my jre to the linux version from the oracle/java website, java -version now is
But I still get the same error.
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Sorry Tom, we actually require java version 1.6; we're not supporting 1.7 yet.
Geraldine Van der Auwera, PhD
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •changed to 1.6, still no change, error still the same
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Noted -- sorry to make you jump through all these hoops. We'll have another look and get back to you with a status update.
Geraldine Van der Auwera, PhD
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Status update: we cannot reproduce the error on our end, so we aren't really going to be able to help you through this unfortunately...
Eric Banks, PhD -- Group Leader, Methods Development, MPG, Broad Institute of Harvard and MIT
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Would it be an issue then if I try to see if the base recalibration still works with older versions my bam files? but use the most recent tools again for further analysis (variant calling/Genotyper and VariantRecalibrator)?
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •I tried getting to find out more by debugging with the github source code on the line were the error is triggered:
In
BaseRecalibrator.javathe functioncalculateFractionalErrorArray( final int[] errorArray, final byte[] baqArray ) {has errorArray.length = 90and baqArray.length = 138
so for some reason it gets a baqArray.length diff then the read length of 90 and of course triggers the != length check.
Any clue were I should go look next? maybe its an issue with the ref genome I use? is that involved in setting the baqArray? if so what genome ref file did you use when you couldn't reproduce the error? The -R passed ref fasta file seems to be the only diff factor between me running GATK and giving an error and you not getting the error. (since java version is ok, GATK version is probably the same too, and the -knownSites files come straight of your ftp)
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Not sureTom, sheenams seemed to be using the reference from our bundle yet got the same error.
You can try a different version of just Base Recalibrator, yes. Try stepping back one sub-release at a time. Let us know how it goes.
Geraldine Van der Auwera, PhD
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Hi, I got the same error. I can run Base recalibration on the BAM files using GenomeAnalysisTK-2.1-11 but not with 2.2-5, 2.2-8 or 2.2-9 versions.
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •@henrik_stranneheim, can you tell us what was your command line? And could you also upload a snippet of your bam file? Maybe if we find out what you three have in common we can track this bug down.
Geraldine Van der Auwera, PhD
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •I'd like to ask everyone who's having this problem to please try the newly-released GATK 2.2-10. It contains a patch that might solve this problem (though we're not sure, since we can't yet reproduce it on our end).
David
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •No change for me with 2.2-10, ERROR:
When I look at the code where it reports the length mismatch, I did notice that the length of baqArray is probably not correct since if differs not only from errorArray length checked in the code but also from the actual read length. So maybe the value of baqArray is in error. But since since the baq stuff is handled in /sting/utils/baq/BAQ.java maybe the issue is not in the BaseRecalibrator.java specific code but with the utils code.
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Hi Tom,
Since we're unfortunately still not able to reproduce this on our end, could you please send us the actual contents of the baqArray and errorArray as reported by your debugger at the time the error occurs?
David
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Based on latest code from github repo for file org/broadinstitute/sting/gatk/walkers/bqsr/BaseRecalibrator.java Error occurs within this function
final double[] snpErrors = calculateFractionalErrorArray(isSNP, baqArray);because a length mismatch) These are the values for the read within the Long map function were the above function gets called:- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Hi all, I also experience the same problem. Here is my workflow: GSNAP > Picard Markduplicate > RealignerTargetCreator > IndelRealigner > BaseRecalibrator I am running java 1.6 and GenomeAnalysisTK-2.2-9-g54ae978 Below is the head of bam file generated by IndelRealigner:
@HD VN:1.0 GO:none SO:coordinate @SQ SN:1 LN:249250621 @SQ SN:2 LN:243199373 @SQ SN:3 LN:198022430 @SQ SN:4 LN:191154276 @SQ SN:5 LN:180915260 @SQ SN:6 LN:171115067 @SQ SN:7 LN:159138663 @SQ SN:8 LN:146364022 @SQ SN:9 LN:141213431 @SQ SN:10 LN:135534747 @SQ SN:11 LN:135006516 @SQ SN:12 LN:133851895 @SQ SN:13 LN:115169878 @SQ SN:14 LN:107349540 @SQ SN:15 LN:102531392 @SQ SN:16 LN:90354753 @SQ SN:17 LN:81195210 @SQ SN:18 LN:78077248 @SQ SN:19 LN:59128983 @SQ SN:20 LN:63025520 @SQ SN:21 LN:48129895 @SQ SN:22 LN:51304566 @SQ SN:X LN:155270560 @SQ SN:Y LN:59373566 @SQ SN:MT LN:16569 @RG ID:HeLa SM:HeLaSeq PL:Illumina @PG ID:GATK IndelRealigner VN:2.2-9-g54ae978 CL:knownAlleles=[] targetIntervals=/g/steinmetz/landry/helaseq/project/snvs/snvs_gatk_JL/2012.11.27/HeLa.dedup.RG.bam.chr.22.intervals LODThresholdForCleaning=5.0 consensusDeterminationModel=USE_READS entropyThreshold=0.15 maxReadsInMemory=150000 maxIsizeForMovement=3000 maxPositionalMoveAllowed=200 maxConsensuses=30 maxReadsForConsensuses=120 maxReadsForRealignment=20000 noOriginalAlignmentTags=false nWayOut=null generate_nWayOut_md5s=false check_early=false noPGTag=false keepPGTags=false indelsFileForDebugging=null statisticsFileForDebugging=null SNPsFileForDebugging=null HWI-ST169_0159:8:23:7337:157085#0 99 22 16049926 40 91M1I5M4S = 16050088 263 CAGCCCACGCAGCACGTTCGAGAAATCTCACTTGTGGCGGGGTTCCCAGCTGTTGCCATGCAGCCTACGGGAAGAGCACTGATAAGTCCCACGGACTACGA ffdfffffffffffffdfffffffffffffffffcffeffbBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB X2:i:0 MD:Z:76AT18 RG:Z:HeLa NH:i:1 NM:i:2 SM:i:40 MQ:i:40 XQ:i:40
For some reasons, Novalign alignments are going through fine. I also tried to run the dedup file (after Markduplicate removal step) and I got the same error (I don't have the original alignment unfortunately).
Thanks for your help. Best, Jonathan
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •sorry the copy paste of the sam file was not really a good idea!
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •I'm getting the same error with BaseRealigner (using files that used to work fine)
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Hi all,
We would love to fix this but we literally do not have a file contributed from the community that will reproduce the error at the Broad. Mike -- are you able to create a small file and an exact command line that reproduces the error? If so, please upload it to us and we'll fix ASAP
Best,
Mark
-- Mark A. DePristo, Ph.D. Co-Director, Medical and Population Genetics Broad Institute of MIT and Harvard
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Hi everyone,
I just wanted to add to Mark's comment. We are looking for a file that when run with the standard BaseRecalibrator command will reproduce the error you all are seeing. While it would be ideal if we could isolate this problem to a snippet of the file (so that uploading is easy), we would even be willing to test on the full bam. But we need to make sure that the following command line fails (i.e. produced the error) on whatever BAM it is that you upload:
Eric Banks, PhD -- Group Leader, Methods Development, MPG, Broad Institute of Harvard and MIT
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Hi, Thanks for your email. The command that I used is the following (with java version "1.6.0_18"): java -Xmx5g -jar GenomeAnalysisTK-2.2-9-g54ae978/GenomeAnalysisTK.jar -T BaseRecalibrator --intervals 22 -R jl.ref.fa -I jl.chr22.bam --knownSites dbSNP137.vcf -o chr.22.grp
The 2 files (jl.ref.fa and jl.chr22.bam) are uploaded on your ftp server. Let me know if you need anything else.
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Yes, please upload your bam index file plus the reference dict and index (fai) files. We want to make sure that we are using the exact same files as you are.
Eric Banks, PhD -- Group Leader, Methods Development, MPG, Broad Institute of Harvard and MIT
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Do you have any ideas what in the source code could explain the diff array length for the same read data? (as posted above) Think with that info it should at least be possible to see where the wrong data comes from, so I can further debug that for you, even if you can't reproduce it. Because as it seems to me, especially if an other user reports similar error when using BaseRealigner, that it might be something handled in /sting/utils/baq/BAQ.java maybe the issue for the wrong BAQ length.
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Ok, done!
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •After some testing, I realized that my input bam file contains illumina (ASCII 64-126) quality score (which failed during BaseRecalibrator). If I convert the quality score to sanger quality (ASCII 33-126) then the error disappear during BaseRecalibrator. Is that helpful?
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Thank you, @JL69. We are still completely unable to reproduce the error with your data for some reason. However, it does help to know that your bam file used the wrong encoding for quality scores. Some points for everyone who reported this bug: 1. Do your bams also use the wrong encoding? 2. Can you try using version 2.2-15 once it gets released (sometime later tonight)? We added a check for such cases. 3. Are any of you by chance registered for the GATK workshop next week? Thanks again
Eric Banks, PhD -- Group Leader, Methods Development, MPG, Broad Institute of Harvard and MIT
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Hi, 2. Version 2.2-15 did report a bug about quality encoding format. Do you know any conversion tool to re-encode the quality to sanger? 3. No, sorry Thanks
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •@JL69: No, but it doesn't mean there aren't any.
Eric Banks, PhD -- Group Leader, Methods Development, MPG, Broad Institute of Harvard and MIT
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •To reiterate what Eric has already said, if more people in this thread with this problem could try out version 2.2-15 and tell us whether they now get a message to the effect that "the BAM file appears to be using the wrong encoding for quality scores" (and provide us the actual text of the message), it would help us a lot to narrow this problem down.
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •I will repeat this with the new version over the weekend and report my results.
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •The text of the error that I got is the following:
ERROR ------------------------------------------------------------------------------------------
ERROR A USER ERROR has occurred (version 2.2-15-g9214b2f):
ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
ERROR Please do not post this error to the GATK forum
ERROR
ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: SAM/BAM file SAMFileReader{/g/steinmetz/landry/helaseq/project/bam/tmp/new.gsnap.dedup.bam} is malformed: we encountered an extremely high quality score (68) with BAQ correction factor of 4; the BAM file appears to be using the wrong encoding for quality scores
ERROR ------------------------------------------------------------------------------------------
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Hi Eric, Why exactly is Illumina coding the "wrong coding"? What is the correct coding? Should we convert all our Illumina files before running through GATK?
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •The SAM specification states that Q0s should be encoded as ASCII 33 and numbers should proceed upwards from there. Illumina encoding, however, starts at ASCII 64. What this means is that if you don't convert then e.g. your Q10s (90% of being correct) will be interpreted as Q41s (99.99% chance of being correct) which will drastically effect variant detection for the worse - you will see many, many false positives.
Just to be clear, "Illumina" BAM files do not necessarily use the wrong encoding; it's the raw Illumina output that needs to be converted - and the conversion needs to be made further upstream when creating the FASTQs. But this is a topic which is outside of our expertise, so you may want to consult the community if you have mis-encoded files.
Eric Banks, PhD -- Group Leader, Methods Development, MPG, Broad Institute of Harvard and MIT
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •I've never heard/seen of a program that will convert the scores from Illumina to Sanger when it is already in the SAM format. There is a plethora of programs that will convert them in the fastq format (a google search will point to a lot of conversations regarding that, just be sure that whatever program you choose was written for the proper Illumina output format (the old illumina/solexa format was slightly different and there are some conversion programs out there that were written based on the that).
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •A bit late perhaps, but for my data at least, all issues (regarding array length mismatch) seem to be resolved with the 2.3 version of GATK. Thx for your time.
- Spam
- Abuse
- Troll
1 • Off Topic Disagree Agree 1Like WTF •Hi, I have tried the 2.3 version of GATK BaseRecalibrator with the input of a set of bam files that include reads with quality scores in a mix of incompatible formats. Based on the statement in 2.3 version, I added the flag '-fixMisencodedQuals' in the command as some of the input bam files were in the Solexa/Illumina quality score format. However, the program was still aborted with the following error:
ERROR stack trace
java.lang.ArrayIndexOutOfBoundsException: -4 at org.broadinstitute.sting.utils.baq.BAQ.calcEpsilon(BAQ.java:158) at org.broadinstitute.sting.utils.baq.BAQ.hmm_glocal(BAQ.java:246) at org.broadinstitute.sting.utils.baq.BAQ.calcBAQFromHMM(BAQ.java:542) at org.broadinstitute.sting.utils.baq.BAQ.calcBAQFromHMM(BAQ.java:595) at org.broadinstitute.sting.utils.baq.BAQ.calcBAQFromHMM(BAQ.java:530) at org.broadinstitute.sting.utils.baq.BAQ.baqRead(BAQ.java:663) at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.calculateBAQArray(BaseRecalibrator.java:428) at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:243) at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:112) at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:203) at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:191) at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler$MapReduceJob.run(NanoScheduler.java:468) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:679)
ERROR ------------------------------------------------------------------------------------------
I guess the program cannot read the quality score in standard format (Sanger) when adding the flag '-fixMisencodedQuals'. Is there any solution dealing with bam files that include reads with quality scores in a mix of incompatible formats?
Thanks a lot in advance!
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •That's right, at this time we can't handle bam files that contain a mix of scoring schemes. You'll need to convert them to have all the same format. Unfortunately we don't offer a solution to that.
Geraldine Van der Auwera, PhD
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Error persists for me on the data indepenently of the aligner (LifeScope and NovoalignCS):
with argument "--allow_potentially_misencoded_quality_scores":
Trying "-fixMisencodedQuals":
--> BQSR does not work on these files.
See also this post: gatkforums.broadinstitute.org/discussion/2230/allow_potentially_misencoded_quality_scores-and-wrong-encoding-for-quality-scores-error
What is the highest allowed value for the Quality Score according to GATK?
Wikipedia:
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •I did capping of the quality score with:
samtools calmd -Abr aln.bam ref.fa > aln.baq.bamstill:
Now 63 - 33 = 30 how can this be to high?
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Hi Bernt,
This shouldn't still be happening with 2.4-7. Could you please upload a snippet of your original bam file?
Geraldine Van der Auwera, PhD
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Hey Geraldine,
I uploaded chr12 of the dataset into my folder on the FTP server. I did this mapping with NovoalignCS because i thought errors would be introduced by ABIs LifeScope mapper, but it seems to be a general problem with the combination SOLiD GATK on these particular exomes.
Also the error discribed here: gatkforums.broadinstitute.org/discussion/2230/allow_potentially_misencoded_quality_scores-and-wrong-encoding-for-quality-scores-error still persists. I now suppose it is because the reverse reads these exomes are to short (10-15bp) and GATK does not handle these somehow, thereby calling to less varians (3000 per exome with 120M reads, while samtools mpileup and freebays call as expected). I will later upload a sampledataset of one of these exomes too and start a new discussion on this.
Thank you for your time!
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Thanks Bernt, we'll have a look at this and let you know when we have a solution.
Geraldine Van der Auwera, PhD
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Hi BerntPopp,
It looks like your issue has nothing to do with the topic of this thread (array length mismatch), so I will restrict comments to the other thread you opened.
Eric Banks, PhD -- Group Leader, Methods Development, MPG, Broad Institute of Harvard and MIT
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •