What does the output of DepthOfCoverage mean?

I have been looking for a good discussion of how to calculate the average coverage of exome sequencing data after alignment. DepthOfCoverage seems to be a good tool for this, but I am unable to understand what all of its output files mean.

My aim is to calculate the average coverage (x), or summary statistics of the depth of coverage, for 7 exome sequencing samples after alignment.

To do that, I followed these steps:

  1. Create a file called input_bam.list that lists the BAM files with their full paths,
    e.g.
    /home/test/Desktop/bam1.bam
    /home/test/Desktop/bam2.bam
    /home/test/Desktop/bam3.bam

  2. We have BED files with the target regions and chromosomes,
    with the header columns
    chr start stop name

  3. I also created refGene files using the UCSC Table Browser
    http://genome.ucsc.edu/cgi-bin/hgTables?command=start and restricted the regions using the BED file

and sorted the file using the following command:
sort -nk3 -nk5 hgTables.txt > genes_refgene_sorted.txt

  4. Running the following command:

    java -jar ./../GATK/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T DepthOfCoverage -I input_bam.list -o file_base_name_withbedfile --outputFormat table -R humangenome/ucsc/ucsc.hg19.fasta -L Regions.bed -geneList genes_refgene_sorted.txt -dt NONE

**Error**

MESSAGE: Input file must have contiguous chromosomes. Saw feature chr22:19510547-19512860 followed later by chr18:19993564-19997878 and then chr22:22113947-22221970, for input source: Desktop/genes_refgene_sorted.txt

Please suggest whether I should sort the file with a different command.

If I use the command without the refGene file:

java -jar ./../GATK/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T DepthOfCoverage -I input_bam.list -o file_base_name_withbedfile --outputFormat table -R humangenome/ucsc/ucsc.hg19.fasta -L Regions.bed

I get the following output files:

file_base_name_withbedfile.sample_cumulative_coverage_counts
file_base_name_withbedfile.sample_cumulative_coverage_proportions
file_base_name_withbedfile.sample_interval_statistics
file_base_name_withbedfile.sample_interval_summary
file_base_name_withbedfile.sample_statistics
file_base_name_withbedfile.sample_summary

I don't understand which output file is best for answering my question about depth.

In the last output file, file_base_name_withbedfile.sample_summary,
the output looks like this:
sample_id total mean granular_third_quartile granular_median granular_first_quartile %_bases_above_15
test 1162396121 1775.69 500 500 343 91.7
Total 1162396121 1775.69 N/A N/A N/A

I don't understand what to make of it, or why there are N/A values.
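
For a per-sample average, the `mean` column of the .sample_summary file is the likeliest candidate: as far as I can tell it is `total` (the per-base depths summed over all bases in the intervals) divided by the number of bases. Below is a minimal sketch for pulling that column out. It assumes a whitespace-delimited table with the header shown above and sample names without spaces; `read_sample_summary` is a hypothetical helper name, not part of GATK.

```python
import io


def read_sample_summary(handle):
    """Return {sample_id: mean depth} from a DepthOfCoverage .sample_summary table.

    Assumption: columns are whitespace-delimited and the header matches the
    one shown in this thread (sample_id, total, mean, ...).
    """
    header = handle.readline().split()
    means = {}
    for line in handle:
        fields = line.split()
        if not fields:
            continue
        # zip() stops at the shorter sequence, so the "Total" row, which has
        # fewer fields than the header, is handled without special-casing.
        row = dict(zip(header, fields))
        means[row["sample_id"]] = float(row["mean"])
    return means


# The table from the post above, used as in-memory example input.
example = io.StringIO(
    "sample_id total mean granular_third_quartile granular_median "
    "granular_first_quartile %_bases_above_15\n"
    "test 1162396121 1775.69 500 500 343 91.7\n"
    "Total 1162396121 1775.69 N/A N/A N/A\n"
)
print(read_sample_summary(example))
```

For a real file, pass `open("file_base_name_withbedfile.sample_summary")` instead of the in-memory example.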

In the file file_base_name_withbedfile.sample_interval_summary
the output looks like the following. I don't understand what to make of it, apart from the total coverage over the 3 BAM files for each location. I take it to mean that there are 6638920 reads (or nucleotides) in total across the 3 BAM files (for example) at that particular location. What do the test_granular_Q values mean? Which column should I use to calculate the average coverage, so that I can state that the exomes have x coverage after alignment?

Target total_coverage average_coverage test_total_cvg test_mean_cvg test_granular_Q1 test_granular_median test_granular_Q3 test_%_above_15
chr1:1716462-1719040 6638920 2574.22 6638920 2574.22 >500 >500 >500 100.0
chr1:1719110-1720851 4192130 2406.50 4192130 2406.50 >500 >500 >500 91.8
chr1:1721604-1722165 1011309 1799.48 1011309 1799.48 >500 >500 >500 99.3
chr1:1724574-1725729 3912540 3384.55 3912540 3384.55 >500 >500 >500 99.9
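
In the rows above, average_coverage equals total_coverage divided by the interval length (for the first target, 6638920 / 2579 ≈ 2574.22), so total_coverage looks like a sum of per-base depths rather than a read count. One way to get a single figure across all targets is a length-weighted mean: sum total_coverage over the rows and divide by the summed interval lengths. The sketch below does that under the assumption that targets are written as `chr:start-end` with inclusive 1-based coordinates and that columns are whitespace-delimited; the function names are hypothetical.

```python
import io
import re

# Assumed target format: "chr1:1716462-1719040" (a single base may be "chr1:100").
TARGET = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)(?:-(?P<end>\d+))?$")


def interval_length(target):
    """Length in bases of an inclusive, 1-based interval string."""
    match = TARGET.match(target)
    if match is None:
        raise ValueError(f"unrecognised target: {target!r}")
    start = int(match["start"])
    end = int(match["end"]) if match["end"] else start
    return end - start + 1


def weighted_mean_coverage(handle):
    """Sum of total_coverage divided by the total number of target bases."""
    header = handle.readline().split()
    target_col = header.index("Target")
    total_col = header.index("total_coverage")
    depth_sum = 0
    base_count = 0
    for line in handle:
        fields = line.split()
        if not fields:
            continue
        depth_sum += int(fields[total_col])
        base_count += interval_length(fields[target_col])
    return depth_sum / base_count


# The four rows from the post above, used as in-memory example input.
example = io.StringIO(
    "Target total_coverage average_coverage test_total_cvg test_mean_cvg "
    "test_granular_Q1 test_granular_median test_granular_Q3 test_%_above_15\n"
    "chr1:1716462-1719040 6638920 2574.22 6638920 2574.22 >500 >500 >500 100.0\n"
    "chr1:1719110-1720851 4192130 2406.50 4192130 2406.50 >500 >500 >500 91.8\n"
    "chr1:1721604-1722165 1011309 1799.48 1011309 1799.48 >500 >500 >500 99.3\n"
    "chr1:1724574-1725729 3912540 3384.55 3912540 3384.55 >500 >500 >500 99.9\n"
)
print(round(weighted_mean_coverage(example), 2))
```

Weighting by length matters because a plain average of the average_coverage column would give a 562-base target the same influence as a 2579-base one.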

If this is a redundant question, could anyone direct me to the right discussion for understanding the output?

Thanks in advance.

Answers

  • Sheila (Broad Institute; Member, Broadie, admin)

    @nikkinath
    Hi,

    We recommend using DiagnoseTargets for exome coverage analysis. You might find it more useful than DepthOfCoverage for your purpose.

    As for the error you are getting, it occurs because the input gene list has to be sorted by chromosome, so that all intervals from the same chromosome are contiguous. Your file has an interval from chromosome 18 in between two intervals from chromosome 22. You need to make sure the chromosomes are in order.

    -Sheila
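
A likely cause of the mixed chromosomes is that `sort -nk3 -nk5` compares the chromosome column numerically; names such as chr22 and chr18 all evaluate to 0, so the rows end up ordered by start position only. The sketch below is one way to regroup the rows. It assumes a tab-delimited refGene table with the chromosome in column 3 and the transcript start in column 5 (the same columns the original sort command targets), and it orders chromosomes by a caller-supplied contig list. Ordering by the reference is a conservative choice here; the error message itself only asks for contiguous chromosomes. `sort_refgene` and `contigs_from_fai` are hypothetical helper names.

```python
def contigs_from_fai(path):
    """Contig names in reference order: the first column of a FASTA index (.fai)."""
    with open(path) as handle:
        return [line.split("\t")[0] for line in handle if line.strip()]


def sort_refgene(lines, contig_order, chrom_col=2, start_col=4):
    """Sort refGene rows by contig order, then by numeric start position.

    Column indices are 0-based and assumed from the usual refGene layout
    (bin, name, chrom, strand, txStart, ...). Lines starting with '#' are
    treated as header lines and kept at the top.
    """
    rank = {name: i for i, name in enumerate(contig_order)}
    header = [line for line in lines if line.startswith("#")]
    rows = [line for line in lines if line.strip() and not line.startswith("#")]

    def key(line):
        fields = line.rstrip("\n").split("\t")
        chrom = fields[chrom_col]
        # Contigs missing from contig_order sort last, grouped by name.
        return (rank.get(chrom, len(rank)), chrom, int(fields[start_col]))

    return header + sorted(rows, key=key)


# Hypothetical rows mimicking the order reported in the error message.
rows = [
    "1\tgeneA\tchr22\t+\t19510546\t19512860\n",
    "2\tgeneB\tchr18\t+\t19993563\t19997878\n",
    "3\tgeneC\tchr22\t+\t22113946\t22221970\n",
]
for line in sort_refgene(rows, ["chr18", "chr22"]):
    print(line, end="")
```

In practice the contig list could come from the reference index, for example `contigs_from_fai("humangenome/ucsc/ucsc.hg19.fasta.fai")`, assuming the .fai file sits next to the FASTA used with -R.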

  • nikkinath (Germany; Member)

    Thanks @Sheila for your post. I will try sorting the gene list file by chromosome. I would still like to understand what the output of DepthOfCoverage means. If there is a relevant post or discussion, could you please direct me to it?

    Thanks
    Neetika

  • Sheila (Broad Institute; Member, Broadie, admin)

    @nikkinath
    Hi Neetika,

    We don't have much documentation on DepthOfCoverage, but have a look at the tool documentation and this article. If you have specific questions about the outputs, I can answer them here.

    -Sheila

  • sespiritu (Toronto, ON, Canada; Member)

    @Sheila
    Hi Sheila,

    I have a few things I want to clarify:
    1. _summary files: stats "aggregated over all bases" -- does this include N bases?
    2. Are reads marked as duplicated included in the counts?

    Thanks!

  • Sheila (Broad Institute; Member, Broadie, admin)

    @sespiritu
    Hi,

    I don't think N bases are counted. However, if you would like to include sites where the reference is N, you can use --includeRefNSites.

    Duplicate reads are not included. If you would like to include duplicate reads, you can disable the read filter. Have a look at the engine arguments for how to do so.

    -Sheila
