Depth of Coverage - reported only first exon

MarcelaD Member
edited May 2013 in Ask the GATK team

Hi there,

This is my interval_list:

chr1 762095 762275 LINC00115|NR_024321
chr1 762280 762414 LINC00115|NR_024321
chr1 762420 762565 LINC00115|NR_024321
chr1 777259 777349 LOC643837
chr1 777391 777481 LOC643837
chr1 777482 777642 LOC643837

chr1 783061 783151 LOC643837
chr1 792270 792446 LOC643837
chr1 861266 861496 NM_152486|SAMD11
chr1 865582 865787 NM_152486|SAMD11
chr1 866331 866507 NM_152486|SAMD11

And this is the output from the sample_interval_summary:

chr1:762095-762275 ...
chr1:762280-762414 ...
chr1:762420-762565 ...
chr1:777259-777349 ...
chr1:783061-783151 ...
chr1:792270-792446 ...
chr1:861266-861496 ...
chr1:865582-865787 ...
chr1:866331-866507 ...

Why am I missing two exons (chr1:777391-777481 and chr1:777482-777642)?
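One way to pin down exactly which requested intervals are absent is to diff the interval list against the first column of the summary. The following is only a minimal sketch: it assumes both files are whitespace-separated, that the interval list has `chrom start end` in its first three columns, and that the summary's first column is the `chrom:start-end` key. The function names are hypothetical helpers, not part of GATK.

```python
def parse_interval_list(lines):
    """Turn 'chrom start end [label]' rows into 'chrom:start-end' keys."""
    keys = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed rows
        chrom, start, end = fields[:3]
        keys.append(f"{chrom}:{start}-{end}")
    return keys


def missing_intervals(interval_lines, summary_lines):
    """Return requested intervals that have no row in the summary."""
    # Assumption: the first whitespace-separated token of each summary
    # row is the interval key; a header row simply never matches.
    reported = {line.split()[0] for line in summary_lines if line.strip()}
    return [k for k in parse_interval_list(interval_lines) if k not in reported]


# Data copied from the post above.
intervals = """\
chr1 762095 762275 LINC00115|NR_024321
chr1 762280 762414 LINC00115|NR_024321
chr1 762420 762565 LINC00115|NR_024321
chr1 777259 777349 LOC643837
chr1 777391 777481 LOC643837
chr1 777482 777642 LOC643837
chr1 783061 783151 LOC643837
chr1 792270 792446 LOC643837
chr1 861266 861496 NM_152486|SAMD11
chr1 865582 865787 NM_152486|SAMD11
chr1 866331 866507 NM_152486|SAMD11
"""

summary = """\
chr1:762095-762275 ...
chr1:762280-762414 ...
chr1:762420-762565 ...
chr1:777259-777349 ...
chr1:783061-783151 ...
chr1:792270-792446 ...
chr1:861266-861496 ...
chr1:865582-865787 ...
chr1:866331-866507 ...
"""

missing = missing_intervals(intervals.splitlines(), summary.splitlines())
print(missing)  # ['chr1:777391-777481', 'chr1:777482-777642']
```

This only identifies which intervals were dropped; it says nothing about why DepthOfCoverage omitted them.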

This is my command:

java -Xmx32g -jar /local/apps/gatk/2.5-2-gf57256b/GenomeAnalysisTK.jar \
    -I sample.bam -R .../genome.fa -T DepthOfCoverage -o jtn \
    -geneList hg19.tsv -L exons.list \
    --omitDepthOutputAtEachBase --includeDeletions \
    --interval_merging OVERLAPPING_ONLY -l INFO

Thanks for your input!

/M

Answers

  • Geraldine_VdAuwera, Cambridge, MA; Member, Administrator, Broadie admin

    Hi Marcela,

    Please see the discussion here: http://gatkforums.broadinstitute.org/discussion/1831/depth-of-coverage-only-first-gene-summary-output

    I believe this is the same problem and can be solved the same way.

  • MarcelaD Member

    Thanks for your quick answer!

    The issue is that I do use a geneList file (hg19.tsv)

    -geneList

    666 LINC00115 chr1 + 762095 762565 762095 762565 3 762095,762280,762420, 762275,762414,762565, 0 |chr1:762095-762565|LINC00115|NR_024321| cmpl cmpl 0,0,0,
    666 LOC643837 chr1 + 777259 777642 777259 777642 3 777259,777391,777482, 777349,777481,777642, 0 |chr1:777259-777642|LOC643837|NR_015368|NR_047519|NR_047520|NR_047521|NR_047522|NR_047523|NR_047524|NR_047525|NR_047526| cmpl cmpl 0,0,0,
    666 LOC643837 chr1 + 783061 792446 783061 792446 2 783061,792270, 783151,792446, 0 |chr1:783061-792446|LOC643837|NR_015368|NR_047519|NR_047520|NR_047521|NR_047522|NR_047523|NR_047524|NR_047525| cmpl cmpl 0,0,
    666 NM_152486 chr1 + 861266 879593 861266 879593 13 861266,865582,866331,871064,874367,874612,876485,877519,877806,878173,878532,878657,879125, 861496,865787,866507,871262,874575,874816,876719,877733,878088,878465,878652,878777,879593, 0 |chr1:861266-879593|NM_152486|SAMD11| cmpl cmpl 0,0,0,0,0,0,0,0,0,0,0,0,0,

    So I do have both a list of genes (hg19.tsv) and a list of exons, i.e. the interval list (exons.list).
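To check which exons each geneList record declares, the exon start and end columns can be paired up per row. This is a rough sketch that assumes the rows follow a UCSC refGene-style column order (bin, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, ...) and are whitespace-separated. `exons_from_refseq_row` is a hypothetical helper, and the coordinates are printed exactly as they appear in the file, with no 0-based/1-based conversion.

```python
def exons_from_refseq_row(row):
    """Return (name, ['chrom:start-end', ...]) for one geneList record."""
    fields = row.split()
    name, chrom = fields[1], fields[2]
    # exonStarts and exonEnds are comma-separated with a trailing comma,
    # so empty pieces are dropped before pairing them up.
    starts = [s for s in fields[9].split(",") if s]
    ends = [e for e in fields[10].split(",") if e]
    return name, [f"{chrom}:{s}-{e}" for s, e in zip(starts, ends)]


# First LOC643837 record, copied from the geneList above.
row = (
    "666 LOC643837 chr1 + 777259 777642 777259 777642 3 "
    "777259,777391,777482, 777349,777481,777642, 0 "
    "|chr1:777259-777642|LOC643837|NR_015368|NR_047519|NR_047520|NR_047521|"
    "NR_047522|NR_047523|NR_047524|NR_047525|NR_047526| cmpl cmpl 0,0,0,"
)

name, exons = exons_from_refseq_row(row)
print(name, exons)
# LOC643837 ['chr1:777259-777349', 'chr1:777391-777481', 'chr1:777482-777642']
```

For this record the geneList declares three exons, all of which are also present in exons.list.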

    And it only happens now and then; for instance, for LINC00115 I do get the coverage for each exon.

    Thanks!

  • Carneiro, Charlestown, MA; Member admin

    Did this solve your problem? I'm afraid I didn't understand your answer.

  • MarcelaD Member

    Hi,

    Sorry if I didn't explain myself clearly; let me try again.

    This is my interval_list (-L) or exons:

    chr1 762095 762275 LINC00115|NR_024321
    chr1 762280 762414 LINC00115|NR_024321
    chr1 762420 762565 LINC00115|NR_024321
    chr1 777259 777349 LOC643837
    chr1 777391 777481 LOC643837
    chr1 777482 777642 LOC643837
    chr1 783061 783151 LOC643837
    chr1 792270 792446 LOC643837
    chr1 861266 861496 NM_152486|SAMD11
    chr1 865582 865787 NM_152486|SAMD11
    chr1 866331 866507 NM_152486|SAMD11

    Together with my -geneList (see above), I would expect 5 lines in the sample_interval_summary for LOC643837. Instead I get 3: one for the first transcript (its last 2 exons are missing) and 2 for the second transcript (which is the correct output):

    chr1:762095-762275 ...
    chr1:762280-762414 ...
    chr1:762420-762565 ...
    **chr1:777259-777349** ...
    **chr1:783061-783151** ...
    **chr1:792270-792446** ...
    chr1:861266-861496 ...
    chr1:865582-865787 ...
    chr1:866331-866507 ...

    Why is that?

    Thanks again
    /M
