Problems with TopHat BAM file in GATK, although Bowtie2 and BWA BAM files run fine

celiaschunter Member
edited August 2015 in Ask the GATK team

Hi. I am trying to use GATK with RNA-seq data. I have used it successfully before, but always with a transcriptome reference, and it works fine with BAM files produced by Bowtie2 as well as BWA. Now that we have a genome assembly I am mapping with TopHat, and for some reason GATK just doesn't like the TopHat BAM file.

I reran a small test with BWA, Bowtie2 on its own, and TopHat. The BWA and Bowtie2 BAM files are accepted (after sorting, marking duplicates and adding a read group). With the TopHat one, however, after the step 'Done initializing BAM readers: total time 0.86' GATK just prints the sequence IDs of the reference genome and then says ##### ERROR ------

Here are the same paired-end reads mapped with each of the three mappers. All three BAM files are processed identically after mapping, with the same commands (Picard sort, MarkDuplicates, add read group):
TOPHAT:
MG00HS18:605:HKW37ADXX:1:1101:4371:1990 99 orpheusFish_scaffold1168 185548 50 71M = 185685 235 AAGACACATGAAGCCCTTTGACCTTTCAGGCTGGGAGCGCTTGATGTGCACCGCAGGTCGTCCTGGGGTGG [email protected]?B8AAADCC?B,38 MD:Z:0G4G65 PG:Z:MarkDuplicates RG:Z:31-1_C3 XG:i:0 NH:i:1 NM:i:2 XM:i:2 XN:i:0 XO:i:0 AS:i:-10 YT:Z:UU
MG00HS18:605:HKW37ADXX:1:1101:4371:1990 147 orpheusFish_scaffold1168 185685 50 98M = 185548 -235 TTGATAAGAACAACGCAGGAAATGAACCTCACCTGAAAATAACGTGGACTAAATTAGTGATTATTTTATTGCTTTGCAAACACATCTTCCAGCTTTAT [email protected]>FEFEC:[email protected]@[email protected]DD MD:Z:98 PG:Z:MarkDuplicates RG:Z:31-1_C3 XG:i:0 NH:i:1 NM:i:0 XM:i:0 XN:i:0 XO:i:0 AS:i:0 YT:Z:UU

BOWTIE2:
MG00HS18:605:HKW37ADXX:1:1101:4371:1990 99 orpheusFish_scaffold1168 185548 42 71M = 185685 235 AAGACACATGAAGCCCTTTGACCTTTCAGGCTGGGAGCGCTTGATGTGCACCGCAGGTCGTCCTGGGGTGG [email protected]?B8AAADCC?B,38 MD:Z:0G4G65 PG:Z:MarkDuplicates RG:Z:31-1_C3 XG:i:0 NM:i:2 XM:i:2 XN:i:0 XO:i:0 AS:i:-10 YS:i:0 YT:Z:CP
MG00HS18:605:HKW37ADXX:1:1101:4371:1990 147 orpheusFish_scaffold1168 185685 42 98M = 185548 -235 TTGATAAGAACAACGCAGGAAATGAACCTCACCTGAAAATAACGTGGACTAAATTAGTGATTATTTTATTGCTTTGCAAACACATCTTCCAGCTTTAT [email protected]>FEFEC:[email protected]@[email protected]DD MD:Z:98 PG:Z:MarkDuplicates RG:Z:31-1_C3 XG:i:0 NM:i:0 XM:i:0 XN:i:0 XO:i:0 AS:i:0 YS:i:-10 YT:Z:CP

BWA:
MG00HS18:605:HKW37ADXX:1:1101:4371:1990 99 orpheusFish_scaffold1168 185548 60 71M = 185685 235 AAGACACATGAAGCCCTTTGACCTTTCAGGCTGGGAGCGCTTGATGTGCACCGCAGGTCGTCCTGGGGTGG [email protected]?B8AAADCC?B,38 X0:i:1 X1:i:0 MD:Z:0G4G65 PG:Z:MarkDuplicates RG:Z:31-1_C3 XG:i:0 AM:i:37 NM:i:2 SM:i:37 XM:i:2 XO:i:0 XT:A:U
MG00HS18:605:HKW37ADXX:1:1101:4371:1990 147 orpheusFish_scaffold1168 185685 60 98M = 185548 -235 TTGATAAGAACAACGCAGGAAATGAACCTCACCTGAAAATAACGTGGACTAAATTAGTGATTATTTTATTGCTTTGCAAACACATCTTCCAGCTTTAT [email protected]>FEFEC:[email protected]@[email protected]DD X0:i:1 X1:i:0 MD:Z:98 PG:Z:MarkDuplicates RG:Z:31-1_C3 XG:i:0 AM:i:37 NM:i:0 SM:i:37 XM:i:0 XO:i:0 XT:A:U
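To make the differences between the three first-mate records above easier to see, here is a minimal Python sketch (an illustration only, not part of any GATK or Picard workflow) that splits each record into its mandatory SAM columns and its optional tags, then reports each mapper's MAPQ and the tags found only in that mapper's record. The records are copied from above with SEQ and QUAL replaced by `*` (the SAM placeholder for an absent field), since those columns are the same for all three mappers; the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# The 11 mandatory SAM columns, in order (SAM format specification).
MANDATORY = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
             "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]


def parse_sam_line(line: str) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split one SAM record into its mandatory columns and its optional TAG:TYPE:VALUE fields."""
    # Real SAM is tab-separated; the records pasted in the post use spaces, so fall back
    # to whitespace splitting when there is no tab (safe here: no tag value contains spaces).
    cols = line.rstrip("\n").split("\t") if "\t" in line else line.split()
    fields = dict(zip(MANDATORY, cols[:11]))
    tags: Dict[str, str] = {}
    for opt in cols[11:]:
        name, typ, value = opt.split(":", 2)
        tags[name] = f"{typ}:{value}"
    return fields, tags


def mapq_by_mapper(records: Dict[str, str]) -> Dict[str, int]:
    """MAPQ (column 5) of each mapper's record."""
    return {name: int(parse_sam_line(line)[0]["MAPQ"]) for name, line in records.items()}


def unique_tags(records: Dict[str, str]) -> Dict[str, List[str]]:
    """Optional tags present in one mapper's record but in none of the others."""
    tag_sets = {name: set(parse_sam_line(line)[1]) for name, line in records.items()}
    result: Dict[str, List[str]] = {}
    for name, tags in tag_sets.items():
        others = set().union(*(t for n, t in tag_sets.items() if n != name))
        result[name] = sorted(tags - others)
    return result


# First mate (FLAG 99) from each mapper, copied from the post with SEQ and QUAL set to "*".
LEAD = ("MG00HS18:605:HKW37ADXX:1:1101:4371:1990 99 orpheusFish_scaffold1168 "
        "185548 {mapq} 71M = 185685 235 * * ")
records = {
    "TopHat": LEAD.format(mapq=50) + "MD:Z:0G4G65 PG:Z:MarkDuplicates RG:Z:31-1_C3 XG:i:0 "
              "NH:i:1 NM:i:2 XM:i:2 XN:i:0 XO:i:0 AS:i:-10 YT:Z:UU",
    "Bowtie2": LEAD.format(mapq=42) + "MD:Z:0G4G65 PG:Z:MarkDuplicates RG:Z:31-1_C3 XG:i:0 "
               "NM:i:2 XM:i:2 XN:i:0 XO:i:0 AS:i:-10 YS:i:0 YT:Z:CP",
    "BWA": LEAD.format(mapq=60) + "X0:i:1 X1:i:0 MD:Z:0G4G65 PG:Z:MarkDuplicates "
           "RG:Z:31-1_C3 XG:i:0 AM:i:37 NM:i:2 SM:i:37 XM:i:2 XO:i:0 XT:A:U",
}

if __name__ == "__main__":
    print(mapq_by_mapper(records))
    for mapper, only in unique_tags(records).items():
        print(mapper, "only:", only)
```

On these records it reports MAPQ 50 (TopHat), 42 (Bowtie2) and 60 (BWA), and `NH` as the only tag unique to the TopHat record. It only lists differences; it does not show which one, if any, GATK objects to.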

This is part of what is printed with the TopHat BAM file, no matter which GATK tool I run; the other two BAM files work fine with the same reference:
orpheusFish_scaffold30395, orpheusFish_scaffold30396, orpheusFish_scaffold30397, orpheusFish_scaffold30398, orpheusFish_scaffold30399, orpheusFish_scaffold30400, orpheusFish_scaffold30401, orpheusFish_scaffold30402, orpheusFish_scaffold30403, orpheusFish_scaffold30404, orpheusFish_scaffold30405, orpheusFish_scaffold30406, orpheusFish_scaffold30407, orpheusFish_scaffold30408, orpheusFish_scaffold30409, orpheusFish_scaffold30410, orpheusFish_scaffold30411, orpheusFish_scaffold30412, orpheusFish_scaffold30413, orpheusFish_scaffold30414]

ERROR ------------------------------------------------------------------------------------------
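The output above ends in a bracketed list of contig names. Assuming (this is not confirmed in the post) that it comes from GATK's check that the reads and the reference agree on contig names, lengths and order, one quick diagnostic is to compare the @SQ lines of the TopHat BAM header, saved with `samtools view -H`, against the reference's sequence dictionary (.dict), which also describes contigs with @SQ lines. The Python sketch below does that; the function names, file names and toy headers are hypothetical.

```python
import sys
from typing import Dict, List, Tuple


def sq_contigs(header: str) -> List[Tuple[str, int]]:
    """(SN, LN) for every @SQ line, in file order (SAM header text or a .dict file)."""
    contigs = []
    for line in header.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Split each TAG:VALUE field on the first colon only (UR values contain colons).
        tags = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        contigs.append((tags["SN"], int(tags["LN"])))
    return contigs


def compare_contigs(bam_header: str, ref_dict: str) -> Dict[str, object]:
    """Report contigs missing from either side, length mismatches, and order agreement."""
    bam = sq_contigs(bam_header)
    ref = sq_contigs(ref_dict)
    bam_len, ref_len = dict(bam), dict(ref)
    shared_ref_order = [n for n, _ in ref if n in bam_len]
    shared_bam_order = [n for n, _ in bam if n in ref_len]
    return {
        "only_in_bam": [n for n, _ in bam if n not in ref_len],
        "only_in_reference": [n for n, _ in ref if n not in bam_len],
        "length_mismatch": [n for n in shared_ref_order if bam_len[n] != ref_len[n]],
        # True when the contigs both files share appear in the same relative order.
        "same_order": shared_bam_order == shared_ref_order,
    }


# Hypothetical toy headers (made-up contig names and lengths), only to show the output shape.
TOY_REF_DICT = ("@HD\tVN:1.0\n@SQ\tSN:scaffold1\tLN:100\n"
                "@SQ\tSN:scaffold2\tLN:200\n@SQ\tSN:scaffold10\tLN:300\n")
TOY_BAM_HEADER = ("@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:scaffold1\tLN:100\n"
                  "@SQ\tSN:scaffold10\tLN:300\n@SQ\tSN:scaffold2\tLN:200\n")

if __name__ == "__main__":
    if len(sys.argv) == 3:  # e.g. python compare_contigs.py tophat_header.sam reference.dict
        with open(sys.argv[1]) as bam_fh, open(sys.argv[2]) as dict_fh:
            print(compare_contigs(bam_fh.read(), dict_fh.read()))
    else:
        print(compare_contigs(TOY_BAM_HEADER, TOY_REF_DICT))
```

Running it on the Bowtie2 or BWA header as well would show whether the TopHat header is the odd one out.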

With Bowtie2 and BWA everything runs fine, so what's wrong with the TopHat BAM file?
I am hoping not to have to rerun everything, as I have a large number of samples and sequences.

Thank you!!

BTW, all three pass Picard's ValidateSamFile, so I am not sure what it is that GATK doesn't like about the TopHat BAM file.
