Problems with tophat bam file, but bowtie2 and bwa run fine in GATK

celiaschunter Member
edited August 2015 in Ask the GATK team

Hi. I am trying to use GATK with RNA-seq data. I have used it successfully before, but always with a transcriptome reference. It works fine with BAM files produced by bowtie2 as well as bwa. Now that we have a genome assembly, I have been using Tophat, and for some reason GATK just doesn't like the resulting BAM file.

I reran a small test with bwa, bowtie2 and Tophat. The bwa and bowtie2 BAM files are accepted (after sorting, deduplicating and adding a read group), but with the Tophat one, after the step 'Done initializing BAM readers: total time 0.86', GATK just prints the IDs of the reference genome and then says ##### ERROR ------

Here are the same paired-end reads mapped with the different mappers. All three are treated the same way after mapping, with the same commands (Picard sort, MarkDuplicates, add RG).
TOPHAT:
MG00HS18:605:HKW37ADXX:1:1101:4371:1990 99 orpheusFish_scaffold1168 185548 50 71M = 185685 235 AAGACACATGAAGCCCTTTGACCTTTCAGGCTGGGAGCGCTTGATGTGCACCGCAGGTCGTCCTGGGGTGG [email protected]?B8AAADCC?B,38 MD:Z:0G4G65 PG:Z:MarkDuplicates RG:Z:31-1_C3 XG:i:0 NH:i:1 NM:i:2 XM:i:2 XN:i:0 XO:i:0 AS:i:-10 YT:Z:UU
MG00HS18:605:HKW37ADXX:1:1101:4371:1990 147 orpheusFish_scaffold1168 185685 50 98M = 185548 -235 TTGATAAGAACAACGCAGGAAATGAACCTCACCTGAAAATAACGTGGACTAAATTAGTGATTATTTTATTGCTTTGCAAACACATCTTCCAGCTTTAT [email protected]>FEFEC:[email protected]@[email protected]DD MD:Z:98 PG:Z:MarkDuplicates RG:Z:31-1_C3 XG:i:0 NH:i:1 NM:i:0 XM:i:0 XN:i:0 XO:i:0 AS:i:0 YT:Z:UU

BOWTIE2:
MG00HS18:605:HKW37ADXX:1:1101:4371:1990 99 orpheusFish_scaffold1168 185548 42 71M = 185685 235 AAGACACATGAAGCCCTTTGACCTTTCAGGCTGGGAGCGCTTGATGTGCACCGCAGGTCGTCCTGGGGTGG [email protected]?B8AAADCC?B,38 MD:Z:0G4G65 PG:Z:MarkDuplicates RG:Z:31-1_C3 XG:i:0 NM:i:2 XM:i:2 XN:i:0 XO:i:0 AS:i:-10 YS:i:0 YT:Z:CP
MG00HS18:605:HKW37ADXX:1:1101:4371:1990 147 orpheusFish_scaffold1168 185685 42 98M = 185548 -235 TTGATAAGAACAACGCAGGAAATGAACCTCACCTGAAAATAACGTGGACTAAATTAGTGATTATTTTATTGCTTTGCAAACACATCTTCCAGCTTTAT [email protected]>FEFEC:[email protected]@[email protected]DD MD:Z:98 PG:Z:MarkDuplicates RG:Z:31-1_C3 XG:i:0 NM:i:0 XM:i:0 XN:i:0 XO:i:0 AS:i:0 YS:i:-10 YT:Z:CP

BWA:
MG00HS18:605:HKW37ADXX:1:1101:4371:1990 99 orpheusFish_scaffold1168 185548 60 71M = 185685 235 AAGACACATGAAGCCCTTTGACCTTTCAGGCTGGGAGCGCTTGATGTGCACCGCAGGTCGTCCTGGGGTGG [email protected]?B8AAADCC?B,38 X0:i:1 X1:i:0 MD:Z:0G4G65 PG:Z:MarkDuplicates RG:Z:31-1_C3 XG:i:0 AM:i:37 NM:i:2 SM:i:37 XM:i:2 XO:i:0 XT:A:U
MG00HS18:605:HKW37ADXX:1:1101:4371:1990 147 orpheusFish_scaffold1168 185685 60 98M = 185548 -235 TTGATAAGAACAACGCAGGAAATGAACCTCACCTGAAAATAACGTGGACTAAATTAGTGATTATTTTATTGCTTTGCAAACACATCTTCCAGCTTTAT [email protected]>FEFEC:[email protected]@[email protected]DD X0:i:1 X1:i:0 MD:Z:98 PG:Z:MarkDuplicates RG:Z:31-1_C3 XG:i:0 AM:i:37 NM:i:0 SM:i:37 XM:i:0 XO:i:0 XT:A:U
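
To make the differences between the three records easier to see, here is a minimal Python sketch that parses the first read of each pair and compares MAPQ and the optional tags. The read name `read1` and the `*` placeholders for SEQ and QUAL are assumptions made for brevity; every other field is copied from the records above. It only shows what differs between the aligners' output and is not a diagnosis of the GATK error.

```python
# Sketch: compare MAPQ and optional tags across the three aligners' records.
# "read1" and the "*" SEQ/QUAL placeholders are stand-ins (assumptions);
# the remaining fields come from the first read of each pair in the post.

def parse_sam_line(line):
    """Return (MAPQ, {tag: value}) for one SAM alignment line."""
    fields = line.split()  # real SAM is tab-delimited; whitespace works here
    mapq = int(fields[4])  # column 5 of the 11 mandatory SAM columns
    tags = {}
    for item in fields[11:]:  # optional fields look like TAG:TYPE:VALUE
        tag, _type, value = item.split(":", 2)
        tags[tag] = value
    return mapq, tags

PREFIX = "read1 99 orpheusFish_scaffold1168 185548 {mapq} 71M = 185685 235 * * "

records = {
    "tophat": PREFIX.format(mapq=50)
    + "MD:Z:0G4G65 PG:Z:MarkDuplicates RG:Z:31-1_C3 XG:i:0 NH:i:1 NM:i:2 "
      "XM:i:2 XN:i:0 XO:i:0 AS:i:-10 YT:Z:UU",
    "bowtie2": PREFIX.format(mapq=42)
    + "MD:Z:0G4G65 PG:Z:MarkDuplicates RG:Z:31-1_C3 XG:i:0 NM:i:2 XM:i:2 "
      "XN:i:0 XO:i:0 AS:i:-10 YS:i:0 YT:Z:CP",
    "bwa": PREFIX.format(mapq=60)
    + "X0:i:1 X1:i:0 MD:Z:0G4G65 PG:Z:MarkDuplicates RG:Z:31-1_C3 XG:i:0 "
      "AM:i:37 NM:i:2 SM:i:37 XM:i:2 XO:i:0 XT:A:U",
}

parsed = {name: parse_sam_line(line) for name, line in records.items()}
mapqs = {name: mapq for name, (mapq, _tags) in parsed.items()}
tag_names = {name: set(tags) for name, (_mapq, tags) in parsed.items()}

# Tags that appear in one aligner's record and in neither of the others.
only_in = {
    name: names - set().union(
        *(other for n, other in tag_names.items() if n != name)
    )
    for name, names in tag_names.items()
}

for name in records:
    print(name, "MAPQ:", mapqs[name], "unique tags:", sorted(only_in[name]))
```

On these records the only tag unique to the Tophat alignment is NH, and the MAPQ values differ between the mappers (50, 42 and 60). Neither is necessarily the cause of the error; comparing the BAM headers of the three files in the same way may be a useful next step.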

This is one line of what is printed, no matter what I use in GATK with the Tophat BAM file. The other two BAM files work just fine with the same reference.
orpheusFish_scaffold30395, orpheusFish_scaffold30396, orpheusFish_scaffold30397, orpheusFish_scaffold30398, orpheusFish_scaffold30399, orpheusFish_scaffold30400, orpheusFish_scaffold30401, orpheusFish_scaffold30402, orpheusFish_scaffold30403, orpheusFish_scaffold30404, orpheusFish_scaffold30405, orpheusFish_scaffold30406, orpheusFish_scaffold30407, orpheusFish_scaffold30408, orpheusFish_scaffold30409, orpheusFish_scaffold30410, orpheusFish_scaffold30411, orpheusFish_scaffold30412, orpheusFish_scaffold30413, orpheusFish_scaffold30414]

ERROR ------------------------------------------------------------------------------------------

With bowtie2 and bwa it all runs fine. What's wrong with the Tophat BAM file?
I am hoping not to have to rerun everything, as I have a large number of samples and sequences.

Thank you!!

BTW, they all pass Picard's ValidateSam, so I am not sure what it is that GATK doesn't like about the Tophat BAM file.
