DepthOfCoverage calculateCoverageOverGenes missing genes

rshill, Member
edited October 2014 in Ask the GATK team

I've been learning how to use DepthOfCoverage to calculate coverage across genes. I noticed that some genes were covered by the BAM and present in both the interval_list file and the geneList file, but were not reported in the gene_summary table. Most of these were non-coding RNAs. On reviewing the geneList file, I noticed that for the non-coding RNAs the coding region start position is 1 base higher than the coding region end position (which also puts the coding region start position higher than the transcription end position). I adjusted the file so that the coding region is identical to the transcribed region for the non-coding RNAs, and this resolved the issue for most of the genes. The remaining genes that are still not reported in the gene_summary table all appear to overlap with exons or UTRs from other genes. My questions are:

  1. Is it the case that if two exons overlap, only one will be reported on, or is something else going on?
  2. What regions of the gene is the tool reporting on, the whole transcribed region, or just the coding region?
  3. Am I safe in changing the coding regions of non-coding RNAs to equal the transcribed region for the coverage analysis?
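For reference, the adjustment to the geneList file described above can be sketched roughly as follows. This is a minimal sketch, not a GATK utility: the function name `fix_gene_list` is hypothetical, and the column positions are an assumption based on the UCSC refGene table layout (bin, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, ...). Check them against your own file before using it.

```python
# Assumed 0-based column positions, following the UCSC refGene layout:
# bin, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, ...
TX_START, TX_END, CDS_START, CDS_END = 4, 5, 6, 7


def fix_gene_list(lines):
    """Return tab-delimited lines where any record with cdsStart > cdsEnd
    has its coding region set equal to its transcribed region."""
    fixed = []
    for line in lines:
        line = line.rstrip("\n")
        fields = line.split("\t")
        # Pass through comment/header lines and rows that are too short.
        if line.startswith("#") or len(fields) <= CDS_END:
            fixed.append(line)
            continue
        if int(fields[CDS_START]) > int(fields[CDS_END]):
            fields[CDS_START] = fields[TX_START]
            fields[CDS_END] = fields[TX_END]
        fixed.append("\t".join(fields))
    return fixed


if __name__ == "__main__":
    # Made-up example records: one non-coding RNA, one coding gene.
    example = [
        "0\tNR_000001\tchr1\t+\t100\t500\t501\t500\t1\t100,\t500,",
        "0\tNM_000002\tchr1\t+\t1000\t2000\t1100\t1900\t1\t1000,\t2000,",
    ]
    for record in fix_gene_list(example):
        print(record)
```

Records whose coding region is already well-formed are left unchanged, so only the non-coding RNA entries are rewritten.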

Thank you.

Post edited by Geraldine_VdAuwera on

Answers

  • Geraldine_VdAuwera, Cambridge, MA. Member, Administrator, Broadie admin

    By default, the GATK engine merges overlapping intervals into a single interval. In any case, I would recommend using the DiagnoseTargets tool rather than DepthOfCoverage, because it is more appropriate for the analysis you are trying to run.
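To illustrate what merging overlaps means in practice, here is a minimal sketch of the general technique. It is not the GATK engine's actual code; the function name `merge_overlapping` is hypothetical, intervals are assumed to be inclusive `(start, end)` pairs on a single contig, and only intervals that share at least one base are merged.

```python
def merge_overlapping(intervals):
    """Merge inclusive (start, end) intervals on one contig that share
    at least one base; returns a sorted list of merged intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it if needed.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    # Made-up exons from two different genes; the first two overlap.
    exons = [(100, 200), (150, 300), (400, 500)]
    print(merge_overlapping(exons))
```

Once two exons from different genes are collapsed like this, they are no longer separate intervals, which is consistent with the behavior described in the question.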
