# BQSR fails giving Malformed read

Posts: 6Member

Dear GATK Team,

I am running a pipeline on several high-coverage human individuals that have been mapped with bwa and processed with samtools, Picard and GATK. The BAM files pass Picard's ValidateSam, but when I run the BQSR step some of them fail with a Malformed read error (using -filterMBQ does not help in this case). I tracked the error down to BAM files that end with a paired-end read whose mate maps at the beginning of the contig (in my case human mtDNA).

Eg, this will make it crash:

readX 177 MT 16558 37 7S2M2I10M80S = 294 -16176 GACCTGTGATCC...
readY 177 MT 16558 37 7S2M2I10M80S = 238 -16232 GACCTGTGATCC...
readZ 113 MT 16558 37 7S2M2I10M80S = 273 -16197 GACCTGTGATCC...
[END]

whereas a file ending like this won't crash:

readX 83 MT 16469 60 101M = 16246 -324 TGGGGGTAGCTAAAGTGAAC...
readY 147 MT 16469 60 101M = 16267 -303 TGGGGGTAGCTAAAGTGA...
readZ 147 MT 16469 60 101M = 16193 -377 TGGGGGTAGCTAAAGTGAAC...
[END]

I am running GATK v2.3-9-ge5ebf34, but the same error occurs using GATK v-2.2-3 (my previous version). I can genotype the files using UnifiedGenotyper without any problem as well.

This is the error:

##### ERROR stack trace

org.broadinstitute.sting.utils.exceptions.ReviewedStingException: Array length mismatch detected. Malformed read?
at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.calculateFractionalErrorArray(BaseRecalibrator.java:380)
at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:246)
at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:112)
at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:203)
at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:191)
at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler$MapReduceJob.run(NanoScheduler.java:468)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

Cheers,

Simon

## Comments

• Posts: 6Member

Looking at it again, the error could just as well be due to the soft-clipping of the reads.

Cheers,

Simon
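To make the soft-clipping observation concrete, here is a minimal Python sketch that summarises the CIGAR strings of the two example reads quoted above. The function name `cigar_summary` is hypothetical, and the contig length of 16569 is an assumption taken from the "MT:16001-16569" region mentioned later in this thread. It is only an illustration of how the two read types differ, not a description of what BaseRecalibrator does internally.

```python
import re

# One CIGAR operation: a length followed by an operator letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

MT_LENGTH = 16569  # assumed contig length (from the MT:16001-16569 region)


def cigar_summary(pos, cigar, contig_len=MT_LENGTH):
    """Summarise an alignment given its 1-based POS and CIGAR string."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Operators that consume reference bases / read bases (SAM spec).
    ref_len = sum(n for n, op in ops if op in "MDN=X")
    read_len = sum(n for n, op in ops if op in "MIS=X")
    # Ignore hard clips when looking for soft clips at the read ends.
    inner = [(n, op) for n, op in ops if op != "H"]
    lead_clip = inner[0][0] if inner and inner[0][1] == "S" else 0
    trail_clip = inner[-1][0] if len(inner) > 1 and inner[-1][1] == "S" else 0
    ref_end = pos + ref_len - 1
    return {
        "ref_end": ref_end,
        "read_len": read_len,
        "lead_clip": lead_clip,
        "trail_clip": trail_clip,
        # Bases that would fall past the contig end if the clip were aligned.
        "overhang": max(0, ref_end + trail_clip - contig_len),
    }


crashing = cigar_summary(16558, "7S2M2I10M80S")
passing = cigar_summary(16469, "101M")
print(crashing)
print(passing)
```

Under these assumptions both read types end exactly at position 16569, but only the crashing reads carry an 80-base trailing soft clip that would extend past the end of the contig.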

• Posts: 7,364Administrator, GATK Developer admin
edited January 2013

Hi Simon,

The tool shouldn't be freaking out over these. Could you please upload a bug report so we can take a closer look?

bug report instructions

Geraldine Van der Auwera, PhD

• Posts: 188Member ✭✭✭

Hi, just out of curiosity, did you happen to run local realignment with GATK in your analysis? I'm asking because I wonder whether you need to do something to fix the mate-pair information (the SAM flags) in your BAM file. I think the local realignment process in GATK does that; otherwise I think you would need to run FixMateInformation from the Picard suite. Just a guess.

• Posts: 6Member

Hi Kurt,
Yes, I did local realignment using GATK, and did it for all files - only some were giving me these problems.

Geraldine: I will upload a bug report.

Cheers

• Posts: 6Member

Dear Geraldine,

I was not able to upload my tarball to your FTP server; it wouldn't let me write there, even with the right upload credentials.

Instead I shared the tar file via Dropbox; I hope that works for you. It contains "MT:16001-16569" for a sample that gives the error, plus a version where I removed the offending reads (the last ones, with the soft-clipping in the CIGAR), which runs without problems.

https://www.dropbox.com/s/w9erfkifo52q4k4/gatk_bqsr_simon_bug.tar.gz

Best,

Simon

• Posts: 7,364Administrator, GATK Developer admin

Thanks Simon, we can work with that. We seem to be having some issues with the FTP server today.

I'll let you know what we come up with. Thanks for reporting the bug!

Geraldine Van der Auwera, PhD

• Posts: 6Member

Great - thank you very much!

• Broad InstitutePosts: 684Member, Administrator, GATK Developer, Broadie, Moderator, DSDE Member, GP Member admin

Hi Simon,

It looks like you have removed your files so that we cannot access them.

Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT

• Posts: 6Member

Sorry, I thought you had downloaded it already. It is back.

Cheers,

Simon

• Broad InstitutePosts: 684Member, Administrator, GATK Developer, Broadie, Moderator, DSDE Member, GP Member admin

Okay, thanks for the excellent test files. I have added a fix for this problem that will be available for the next release (2.4) - which will come out in the next week or two.

Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT

• Posts: 6Member

OK, thank you very much for fixing it. As a workaround I excluded the last 1 kb of the mtDNA from BQSR; since I have >500 million reads for all samples, it probably won't make much of a difference.
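For anyone wanting to reproduce that workaround, a minimal Python sketch of building the interval string follows. The helper name `interval_excluding_tail` is hypothetical, the contig length of 16569 is assumed from the "MT:16001-16569" region above, and the exact interval used in the original run is not stated in this thread. The resulting `contig:start-stop` string is in the format accepted by the GATK `-L` argument; intervals for the other contigs would need to be supplied as well.

```python
def interval_excluding_tail(contig, contig_len, tail):
    """Return a 'contig:start-stop' interval that skips the last `tail` bases."""
    if not 0 <= tail < contig_len:
        raise ValueError("tail must be >= 0 and smaller than the contig length")
    return f"{contig}:1-{contig_len - tail}"


# Assumed values: MT is 16569 bp long and the last 1 kb is excluded.
mt_interval = interval_excluding_tail("MT", 16569, 1000)
print(mt_interval)  # MT:1-15569
```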