

moran (The Broad; Member; Posts: 17)


I'm trying to calculate depth of coverage for entire contigs for multiple samples. I ran the following command:
java -Xmx2g -jar GenomeAnalysisTK.jar \
-R Ecoli/Ecoli.allSubTypes.fasta \
-T DepthOfCoverage \
-o ../Ecoli/all.tmpOut \
-I Ecoli/bamlist.list \
-geneList Ecoli/Ecoli.refSeq

Here, Ecoli.refSeq is a refSeq file that I tried to generate with one line per contig.

I was expecting the output to be a matrix with the contigs as rows and the samples as columns.

Instead, I got a file that looks like this:
Locus Total_Depth Average_Depth_sample Depth_for_sample1
gi|312944605|gb|CP001855.1|:1 0 0.00 0
gi|312944605|gb|CP001855.1|:2 0 0.00 0
gi|312944605|gb|CP001855.1|:3 0 0.00 0
gi|312944605|gb|CP001855.1|:4 0 0.00 0
gi|312944605|gb|CP001855.1|:5 0 0.00 0
gi|312944605|gb|CP001855.1|:6 0 0.00 0
gi|312944605|gb|CP001855.1|:7 0 0.00 0
gi|312944605|gb|CP001855.1|:8 0 0.00 0
gi|312944605|gb|CP001855.1|:9 0 0.00 0

where each base is a row, right?

What am I doing wrong?

Thanks! Moran.


Best Answer


  • moran (The Broad; Member; Posts: 17)

    Thanks for the very fast answer! Do you also have an example of a bam list file? From the outputs, I think it's treating all my bams as a single file...

    Thanks again!

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie; Posts: 11,421)

    The GATK always processes the content of all bams in a bam list together as if the data came from a single file. I do believe DoC reports results partitioned by sample by default, but they will all be in a single file per output type. They should be identified by sample in the summary table. If that's not the case, can you please post a few lines from the table so I can see what you're getting?

    Geraldine Van der Auwera, PhD
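
On the bam list example requested above: in GATK 3, a bam list is a plain-text file whose name ends in .list and that contains one BAM path per line. Samples are told apart by the SM field of the @RG header lines, so BAMs whose read groups share an SM value are reported as one sample. The following is a minimal sketch, not part of GATK: the function names are hypothetical, and the header text is assumed to be what `samtools view -H file.bam` prints.

```python
def bam_list_text(bam_paths):
    """Return the contents of a bam list: one BAM path per line."""
    return "".join(f"{path}\n" for path in bam_paths)


def sample_names_from_header(header_text):
    """Return the set of SM values found in the @RG lines of a SAM header."""
    samples = set()
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        # @RG fields are tab-separated TAG:VALUE pairs.
        for field in line.split("\t")[1:]:
            if field.startswith("SM:"):
                samples.add(field[3:])
    return samples
```

Writing the returned text to a file such as bamlist.list and checking that each BAM yields a distinct SM value is a quick way to confirm that samples will be reported separately.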

  • moran (The Broad; Member; Posts: 17)

    I've fixed the sample in the header tag, and it works great now.

    But now I have a question about the content. In the mean coverage statistics, is the value normalized by the total number of mapped reads for each sample?

    Also, can I define additional statistics to be calculated per interval per sample? For example, the percent of the interval covered.

    Thanks for all the prompt replies!

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie; Posts: 11,421)

    Ah, good to hear.

    As I recall, DoC doesn't do any normalization; if it did, you'd get different values depending on whether you ran samples alone or together, which would be bad in my opinion.

    All the statistics that can currently be calculated are listed in the technical doc for the tool. If you are interested in statistics that are not available, you can always modify the tool yourself; we are always happy to look at a patch to include user contributions in the codebase.

    That said, you may want to check out DiagnoseTargets first, which provides a lot of statistics about intervals that DoC doesn't. Maybe it will have what you want.

    Geraldine Van der Auwera, PhD
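
For the "percent of the interval covered" statistic asked about above, one option outside the tool is to derive it from the per-locus table shown in the original question. This is only a sketch under assumptions: the function name and the `min_depth` parameter are hypothetical, the columns are assumed to be whitespace- or tab-separated, and per-sample columns are assumed to be named `Depth_for_<sample>` as in that output.

```python
from collections import defaultdict


def percent_covered(lines, min_depth=1):
    """Percent of loci per (contig, sample) with depth >= min_depth."""
    rows = iter(lines)
    header = next(rows).split()
    # Map column index -> sample name for every Depth_for_<sample> column.
    sample_cols = [
        (i, name[len("Depth_for_"):])
        for i, name in enumerate(header)
        if name.startswith("Depth_for_")
    ]
    covered = defaultdict(int)
    total = defaultdict(int)
    for line in rows:
        fields = line.split()
        if not fields:
            continue
        # Locus looks like "contig:position"; contig names may contain ':' or '|'.
        contig = fields[0].rsplit(":", 1)[0]
        for i, sample in sample_cols:
            key = (contig, sample)
            total[key] += 1
            if int(fields[i]) >= min_depth:
                covered[key] += 1
    return {key: 100.0 * covered[key] / total[key] for key in total}
```

Passing an open file handle for the per-locus output as `lines` returns a dictionary keyed by (contig, sample). Because every locus is read, this can be slow for large references.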

  • moran (The Broad; Member; Posts: 17)

    Great, I will check it now.

    One more question:
    Is there a way to define an interval that contains multiple contigs? I'm working with bacteria and have many contigs per genome, so I would like to summarize coverage per genome. (I've aligned my reads to a reference sequence that contains multiple genomes concatenated.)


  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie; Posts: 11,421)

    Not directly, no. You'll need to calculate that from the per-contig summary table. I would recommend writing a script to process the table.

    Geraldine Van der Auwera, PhD
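
A script of the kind suggested above could look roughly like the following. It is a sketch under stated assumptions, not an official recipe: it assumes a tab-delimited per-interval summary with a `Target` column formatted as `contig:start-end` and one `<sample>_total_cvg` column per sample, and it assumes a user-supplied `contig_to_genome` dictionary (hypothetical) that assigns each contig to a genome. The function name is also hypothetical.

```python
import csv
from collections import defaultdict


def per_genome_mean_coverage(summary_lines, contig_to_genome):
    """Combine per-contig totals into mean coverage per genome and sample."""
    reader = csv.DictReader(summary_lines, delimiter="\t")
    suffix = "_total_cvg"
    samples = [c[:-len(suffix)] for c in reader.fieldnames if c.endswith(suffix)]
    bases = defaultdict(int)                          # genome -> bases spanned
    coverage = defaultdict(lambda: defaultdict(int))  # genome -> sample -> summed depth
    for row in reader:
        # Split on the last ':' because contig names may contain ':' or '|'.
        contig, span = row["Target"].rsplit(":", 1)
        start, _, end = span.partition("-")
        end = end or start                            # single-base interval
        genome = contig_to_genome[contig]
        bases[genome] += int(end) - int(start) + 1
        for sample in samples:
            coverage[genome][sample] += int(row[sample + suffix])
    return {
        genome: {s: coverage[genome][s] / bases[genome] for s in samples}
        for genome in bases
    }
```

Summing total coverage and dividing by the summed interval length weights each contig by its size, which a plain average of the per-contig means would not do.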
