vcf4
Dear GATK Team,
I am running a pipeline on several high-coverage human individuals that have been mapped with bwa and processed with samtools, Picard and GATK. The BAM files pass ValidateSam from Picard, but when I run the BQSR step some of them fail with a Malformed read error (using -filterMBQ does not help in this case). I tracked the error down to BAM files that end with a paired-end read whose mate maps at the beginning of the contig (in my case human mtDNA).
E.g., a file ending like this will make it crash:
readX 177 MT 16558 37 7S2M2I10M80S = 294 -16176 GACCTGTGATCC...
readY 177 MT 16558 37 7S2M2I10M80S = 238 -16232 GACCTGTGATCC...
readZ 113 MT 16558 37 7S2M2I10M80S = 273 -16197 GACCTGTGATCC...
[END]
whereas a file ending like this won't crash:
readX 83 MT 16469 60 101M = 16246 -324 TGGGGGTAGCTAAAGTGAAC...
readY 147 MT 16469 60 101M = 16267 -303 TGGGGGTAGCTAAAGTGA...
readZ 147 MT 16469 60 101M = 16193 -377 TGGGGGTAGCTAAAGTGAAC...
[END]
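The difference between the two examples can be made concrete with a small check. This is a minimal sketch, not GATK's logic: it assumes the relevant property is how far a read's clipped bases would extend past the end of the contig, which is only the poster's hypothesis in this thread. The helper names (`unclipped_span`, `overhang`) are hypothetical, and the contig length 16569 is taken from the MT:16001-16569 region mentioned later in the thread.

```python
import re

# One CIGAR element: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that consume reference positions (per the SAM specification).
REF_CONSUMING = set("MDN=X")


def unclipped_span(pos, cigar):
    """Return (unclipped_start, aligned_end, unclipped_end), 1-based inclusive.

    Assumes the CIGAR contains at least one non-clipping operation.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
    aligned_end = pos + ref_len - 1

    # Clipped bases before the first aligned operation.
    lead = 0
    for n, op in ops:
        if op not in "SH":
            break
        lead += n

    # Clipped bases after the last aligned operation.
    trail = 0
    for n, op in reversed(ops):
        if op not in "SH":
            break
        trail += n

    return pos - lead, aligned_end, aligned_end + trail


def overhang(pos, cigar, contig_length):
    """Number of clipped bases that would fall past the end of the contig."""
    _, _, unclipped_end = unclipped_span(pos, cigar)
    return max(0, unclipped_end - contig_length)


MT_LENGTH = 16569  # assumption: contig length, from the MT:16001-16569 region

print(overhang(16558, "7S2M2I10M80S", MT_LENGTH))  # crashing example
print(overhang(16469, "101M", MT_LENGTH))          # non-crashing example
```

Under these assumptions both kinds of read end their alignment on the last base of MT (position 16569), but only the soft-clipped reads carry 80 trailing bases that would extend beyond it.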
I am running GATK v2.3-9-ge5ebf34, but the same error occurs using GATK v-2.2-3 (my previous version). I can genotype the files using UnifiedGenotyper without any problem as well.
This is the error:
org.broadinstitute.sting.utils.exceptions.ReviewedStingException: Array length mismatch detected. Malformed read?
    at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.calculateFractionalErrorArray(BaseRecalibrator.java:380)
    at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:246)
    at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:112)
    at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:203)
    at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:191)
    at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler$MapReduceJob.run(NanoScheduler.java:468)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Cheers,
Simon
Comments
Looking at it again, it could just as well be due to the soft-clipping of the reads.
Cheers,
Simon

Hi Simon,
The tool shouldn't be freaking out over these. Could you please upload a bug report so we can take a closer look?
bug report instructions
Geraldine Van der Auwera, PhD

Hi, just out of curiosity, in your analysis did you happen to run local realignment with GATK? I'm asking because I am wondering whether you need to do something to fix the mate-pair information (the SAM flags) in your BAM file. I think the local realignment process in GATK does that; otherwise I think you would need to run FixMateInformation from the Picard suite. Just a guess.
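For reference, the SAM flags in the example reads from the original post can be decoded with a few lines of Python. This is an illustrative sketch only; the bit meanings follow the SAM specification, while the short names in the table and the `decode_flag` helper are hypothetical labels chosen here.

```python
# FLAG bits as defined in the SAM specification; names are informal labels.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


for flag in (177, 113, 83, 147):
    print(flag, decode_flag(flag))
```

Decoded this way, the reads in the crashing example (flags 177 and 113) do not have the proper-pair bit set and have both mates on the reverse strand, whereas the reads in the non-crashing example (83 and 147) are flagged as properly paired. Whether that is related to the error is not established in this thread.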

Hi Kurt, Yes, I did local realignment using GATK, and did it for all files - only some were giving me these problems.
Geraldine: I will upload a bug report.
Cheers

Dear Geraldine,
I was not able to upload my tarball to your FTP server; it wouldn't let me write there, even with the right upload credentials :)
Instead I shared the tar file via Dropbox; I hope that works for you. It contains the region "MT:16001-16569" for a sample that gives the error, plus a version in which I removed the offending reads (the last ones, with the soft-clipping in the CIGAR), which runs without problems.
https://www.dropbox.com/s/w9erfkifo52q4k4/gatk_bqsr_simon_bug.tar.gz
Best,
Simon

Thanks Simon, we can work with that. We seem to be having some issues with the FTP server today.
I'll let you know what we come up with. Thanks for reporting the bug!
Geraldine Van der Auwera, PhD

Great - thank you very much!

Hi Simon,
It looks like you have removed your files so that we cannot access them.
Eric Banks, PhD -- Group Leader, Methods Development, MPG, Broad Institute of Harvard and MIT

Sorry, I thought you had downloaded it already. It is back.
Cheers,
Simon

Okay, thanks for the excellent test files. I have added a fix for this problem that will be available in the next release (2.4), which will come out in the next week or two.
Eric Banks, PhD -- Group Leader, Methods Development, MPG, Broad Institute of Harvard and MIT

OK, thank you very much for fixing it. As a workaround I excluded the last 1 kb of the mtDNA from BQSR; since I have more than 500 million reads for every sample, it probably won't make much of a difference.
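The interval covered by that workaround can be computed as below. This is a rough sketch under stated assumptions: the thread does not say how the region was excluded, the `tail_interval` helper is hypothetical, and the contig length 16569 comes from the MT:16001-16569 region mentioned earlier. GATK's engine has an exclude-intervals argument (`-XL`) that accepts interval strings in this `contig:start-stop` form, but whether that is what was used here is an assumption.

```python
def tail_interval(contig, contig_length, tail_bp=1000):
    """Return a 1-based, inclusive interval string for the last tail_bp bases."""
    if tail_bp <= 0:
        raise ValueError("tail_bp must be positive")
    # Clamp to 1 in case the contig is shorter than the requested tail.
    start = max(1, contig_length - tail_bp + 1)
    return f"{contig}:{start}-{contig_length}"


print(tail_interval("MT", 16569))  # MT:15570-16569
```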