Question: RNAseqC: WARNING: Transcript has no coverage

jcm136 (NIAID/NIH; Member)

Hello,

I am working with a paired-end data set of rhesus macaque reads aligned with STAR against a current rhesus .fasta reference and .gtf annotation. I am interested in doing some quality control measures on my .bam file with RNAseqC to determine numbers of reads that map to non-genic, exonic, or intronic regions.

For now, I'm bypassing the command line and running the tool on the web-based GenePattern interface offered by the Broad Institute. For those familiar with it, RNAseqC requires a number of inputs, all of which I believe I have provided:

- indexed .bam with read groups created with Picard tools
- .gtf file
- indexed .fasta
- dictionary of .fasta created with Picard tools

Unfortunately, I don't have any accompanying 18S rRNA information for rhesus to provide.

The program seems to complete successfully; however, for all of my transcripts I get this message in the output:

WARNING: Transcript has no coverage: ACTB_transcript_01

This is peculiar to me, because if I load the .bam, .gtf, and .fasta in IGV, there are clearly an abundant number of reads mapping to the ACTB gene.

I am wondering why RNAseqC is not picking up any reads in exons, or anywhere really. If anyone has encountered similar issues, any advice would be greatly appreciated. From what I have seen, the chromosomes are named the same in both the .gtf and .bam files.

Thank you!

For what it's worth, I'm getting the following ERRORs in the output log when running the GATK depth of coverage analysis:

Running GATK Depth of Coverage Analysis ....
Arguments: -T DepthOfCoverage -R ref_input/MacaM_Rhesus_Genome_v7.fasta -I bam_input/JB37_sorted_ReadGroup.bam -o .//JB37_sorted_ReadGroup.bam/lowexpr//perBaseDoC.out -L .//JB37_sorted_ReadGroup.bam/lowexpr/intervals.list -l ERROR
Arguments Array: [-T, DepthOfCoverage, -R, ref_input/MacaM_Rhesus_Genome_v7.fasta, -I, bam_input/JB37_sorted_ReadGroup.bam, -o, .//JB37_sorted_ReadGroup.bam/lowexpr//perBaseDoC.out, -L, .//JB37_sorted_ReadGroup.bam/lowexpr/intervals.list, -l, ERROR]
GATK command result code: 0
Depth of Coverage run time: 1 min
... GATK Depth of Coverage Analysis DONE
Running GATK Depth of Coverage Analysis ....
Arguments: -T DepthOfCoverage -R ref_input/MacaM_Rhesus_Genome_v7.fasta -I bam_input/JB37_sorted_ReadGroup.bam -o .//JB37_sorted_ReadGroup.bam/medexpr//perBaseDoC.out -L .//JB37_sorted_ReadGroup.bam/medexpr/intervals.list -l ERROR
Arguments Array: [-T, DepthOfCoverage, -R, ref_input/MacaM_Rhesus_Genome_v7.fasta, -I, bam_input/JB37_sorted_ReadGroup.bam, -o, .//JB37_sorted_ReadGroup.bam/medexpr//perBaseDoC.out, -L, .//JB37_sorted_ReadGroup.bam/medexpr/intervals.list, -l, ERROR]
GATK command result code: 0
Depth of Coverage run time: 2 min
... GATK Depth of Coverage Analysis DONE
Running GATK Depth of Coverage Analysis ....
Arguments: -T DepthOfCoverage -R ref_input/MacaM_Rhesus_Genome_v7.fasta -I bam_input/JB37_sorted_ReadGroup.bam -o .//JB37_sorted_ReadGroup.bam/highexpr//perBaseDoC.out -L .//JB37_sorted_ReadGroup.bam/highexpr/intervals.list -l ERROR
Arguments Array: [-T, DepthOfCoverage, -R, ref_input/MacaM_Rhesus_Genome_v7.fasta, -I, bam_input/JB37_sorted_ReadGroup.bam, -o, .//JB37_sorted_ReadGroup.bam/highexpr//perBaseDoC.out, -L, .//JB37_sorted_ReadGroup.bam/highexpr/intervals.list, -l, ERROR]
GATK command result code: 0
Depth of Coverage run time: 7 min
... GATK Depth of Coverage Analysis DONE

Answers

  • Sheila (Broad Institute; Member, Broadie admin)

    @jcm136
    Hi,

    I think you will need to ask the CGA help team for help with RNA-SeQC.

    However, for the GATK tool, can you please post the entire log output you get?

    Thanks,
    Sheila

  • Sheila (Broad Institute; Member, Broadie admin)

    @jcm136
    Hi again,

    Which version of Java are you using for RNAseqC? I think you may still need to use Java 1.7 for that tool, while GATK uses Java 1.8.

    -Sheila

  • jcm136 (NIAID/NIH; Member)

    Hi Sheila... sorry for the delayed response. I hope it hasn't hindered the troubleshooting. Thank you so much for your reply.

    I was indeed using Java 1.8. So are you saying that the Picard tool operations on the .bam (CreateSequenceDictionary, AddOrReplaceReadGroups) all need to be done with Java 1.7?

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)
    Recent versions of Picard (2.x) use Java 1.8 as well. RNAseqC is different because it is developed by another group and relies on an older version of the GATK engine.
  • jcm136 (NIAID/NIH; Member)

    Thank you Geraldine...I have attached the log.out file which I hope might be helpful. According to this there are a number of errors when RNAseqC tries to process the .bam file.

    So using a 1.x Picard version and Java 1.7 may be helpful here?

    Issue · GitHub, filed by Sheila: issue number 1426, state closed, closed by vdauwera.
  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Hi @jcm136, I don't think what you're seeing is due to the Java version. The program seems to be running correctly except for not finding coverage across any of the transcripts. There are two likely reasons why this might happen. It could be that the GTF file of transcripts is not formatted in a way that the program can parse correctly. Or, more likely, the transcripts are parsed correctly but the contig (chromosome) names do not match between the GTF file and the FASTA file. For example, in humans there are two reference builds that are essentially the same, but chromosomes are called "1,2,3,..." in one and "chr1,chr2,chr3,..." in the other. GATK has some safeguards to detect mismatches like that, but RNAseqC may not check for this. You should check the files and see whether the naming of the chromosomes is consistent between them. If that's not the problem, you should ask the developers of RNAseqC. We don't work on that program, so we can't provide support for it beyond the general advice I just gave you. Good luck!
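
One way to run the contig-name check suggested above is a small script that collects the sequence names from the FASTA headers and from column 1 of the GTF, then reports any that appear in only one file. This is a minimal sketch, not part of RNAseqC or GATK: the function names are hypothetical, and the toy data in the usage section stands in for the real files. It assumes an uncompressed FASTA whose contig name is the text between ">" and the first whitespace, and a tab-delimited GTF.

```python
import io


def fasta_contigs(handle):
    """Return contig names from FASTA header lines (text after '>' up to the first whitespace)."""
    names = []
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.append(fields[0])
    return names


def gtf_contigs(handle):
    """Return the set of names found in column 1 (seqname) of a GTF, skipping comments and blank lines."""
    names = set()
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split("\t", 1)[0].strip())
    return names


def compare_contigs(fasta_handle, gtf_handle):
    """Report which contig names are shared and which appear in only one of the two files."""
    fasta = set(fasta_contigs(fasta_handle))
    gtf = gtf_contigs(gtf_handle)
    return {
        "shared": sorted(fasta & gtf),
        "gtf_only": sorted(gtf - fasta),
        "fasta_only": sorted(fasta - gtf),
    }


if __name__ == "__main__":
    # Toy in-memory data (hypothetical); replace with open("your.fasta") and open("your.gtf").
    toy_fasta = io.StringIO(">chr1 toy contig\nACGT\n>chr2\nACGT\n")
    toy_gtf = io.StringIO(
        '1\tsrc\texon\t1\t4\t.\t+\t.\tgene_id "A";\n'
        'chr2\tsrc\texon\t1\t4\t.\t+\t.\tgene_id "B";\n'
    )
    for key, names in compare_contigs(toy_fasta, toy_gtf).items():
        print(key, names)
```

Any name listed under `gtf_only` belongs to transcripts that cannot be matched to a reference contig, which would be consistent with every transcript being reported as having no coverage. The same comparison could be extended to the `SN:` fields of the `@SQ` lines in the Picard-generated .dict file and the BAM header.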
