The current GATK version is 3.7-0

moran (The Broad, Member)


I'm trying to calculate depth of coverage over entire contigs for multiple samples. I ran the following command:
java -Xmx2g -jar GenomeAnalysisTK.jar \
-R Ecoli/Ecoli.allSubTypes.fasta \
-T DepthOfCoverage \
-o ../Ecoli/all.tmpOut \
-I Ecoli/bamlist.list \
-geneList Ecoli/Ecoli.refSeq

Here I tried to generate a refSeq file with one line per contig.

I was expecting to have the output be in the form of a matrix with the various contigs as rows and the samples as columns.

Instead I got a file that looks like this:
Locus Total_Depth Average_Depth_sample Depth_for_sample1
gi|312944605|gb|CP001855.1|:1 0 0.00 0
gi|312944605|gb|CP001855.1|:2 0 0.00 0
gi|312944605|gb|CP001855.1|:3 0 0.00 0
gi|312944605|gb|CP001855.1|:4 0 0.00 0
gi|312944605|gb|CP001855.1|:5 0 0.00 0
gi|312944605|gb|CP001855.1|:6 0 0.00 0
gi|312944605|gb|CP001855.1|:7 0 0.00 0
gi|312944605|gb|CP001855.1|:8 0 0.00 0
gi|312944605|gb|CP001855.1|:9 0 0.00 0

where each base is a row, right?

What am I doing wrong?

Thanks! Moran.


Best Answer


  • moran (The Broad, Member)

    Thanks for the very fast answer! Do you also have an example of a bam list file? From the outputs, I think it's treating all my bams as a single file...

    Thanks again!

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    The GATK always processes the content of all bams in a bam list together as if the data came from a single file. I do believe DoC reports results partitioned by sample by default, but they will all be in a single file per output type. They should be identified by sample in the summary table. If that's not the case, can you please post a few lines from the table so I can see what you're getting?
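
For reference, a bam list is a plain text file with one BAM path per line, conventionally saved with a .list extension. Samples are told apart by the SM field on each BAM's @RG header line, so BAMs that share one SM value end up reported as a single sample. Below is a minimal Python sketch, not an official GATK utility: the function names and the directory layout are hypothetical, and the header text is assumed to have been extracted already (for example with samtools view -H).

```python
from pathlib import Path


def write_bam_list(bam_dir, list_path):
    """Write one BAM path per line (sorted) and return the paths written."""
    bams = sorted(str(p) for p in Path(bam_dir).glob("*.bam"))
    Path(list_path).write_text("".join(b + "\n" for b in bams))
    return bams


def sample_names(header_text):
    """Return the set of SM values found on the @RG lines of a SAM header."""
    names = set()
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        # @RG fields are tab-separated TAG:VALUE pairs.
        for field in line.split("\t")[1:]:
            if field.startswith("SM:"):
                names.add(field[3:])
    return names
```

If sample_names returns the same single value for every BAM in the list, the per-sample columns will collapse into one, which matches the symptom described above.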

  • moran (The Broad, Member)

    I've fixed the sample in the header tag, and it works great now.

    But now I have a question about the content. Is the mean coverage statistic normalized by the total number of mapped reads for each sample?

    Also, can I define additional statistics to be calculated per interval per sample? For example, the percent of the interval covered.

    Thanks for all the prompt replies!

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Ah, good to hear.

    As I recall, DoC doesn't do any normalization; if it did, you'd get different values depending on whether you ran samples alone or together, which would be bad in my opinion.

    All the statistics that can currently be calculated are listed in the technical doc for the tool. If you are interested in statistics that are not available, you can always modify the tool yourself; we are always happy to look at a patch to include user contributions in the codebase.

    That said, you may want to check out DiagnoseTargets first, which provides a lot of statistics about intervals that DoC doesn't. Maybe it will have what you want.
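
As an alternative to modifying the tool, the percent-covered statistic asked about above can be derived from the per-locus output shown in the original post. The Python sketch below assumes the file is tab-delimited with a Locus column followed by one Depth_for_<sample> column per sample; the function name and the min_depth threshold are illustrative, not part of GATK. It only counts positions that actually appear in the file.

```python
import csv
from collections import defaultdict


def fraction_covered(handle, min_depth=1):
    """Return {contig: {sample: fraction of listed loci with depth >= min_depth}}."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    prefix = "Depth_for_"
    # Map column index -> sample name for every per-sample depth column.
    sample_cols = {i: name[len(prefix):]
                   for i, name in enumerate(header) if name.startswith(prefix)}
    covered = defaultdict(lambda: defaultdict(int))
    total = defaultdict(int)
    for row in reader:
        if not row:
            continue
        # Locus looks like "contig:position"; contig names may contain "|".
        contig = row[0].rsplit(":", 1)[0]
        total[contig] += 1
        for i, sample in sample_cols.items():
            if int(row[i]) >= min_depth:
                covered[contig][sample] += 1
    return {contig: {s: covered[contig][s] / n for s in sample_cols.values()}
            for contig, n in total.items()}
```

The result is effectively the contig-by-sample matrix from the original question: one entry per contig, with one value per sample.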

  • moran (The Broad, Member)

    Great, will check it now.

    One more question:
    Is there a way to define an interval that contains multiple contigs? I'm working with bacteria, and I have many contigs per genome, and I would like to summarize this per genome. (I've aligned my reads to a reference sequence that contains multiple genomes concatenated).


  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Not directly, no. You'll need to calculate that from the per-contig summary table. I would recommend writing a script to process the table.
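
A script along those lines might look like the Python sketch below. It rests on assumptions that should be checked against the actual output: a tab-delimited per-interval summary with a Target column of the form contig:start-end and one <sample>_total_cvg column per sample. The contig_to_genome mapping is hypothetical and has to be supplied by the user.

```python
import csv
from collections import defaultdict


def per_genome_mean(handle, contig_to_genome):
    """Return {genome: {sample: mean depth over all contigs of that genome}}."""
    reader = csv.DictReader(handle, delimiter="\t")
    suffix = "_total_cvg"
    samples = [c[:-len(suffix)] for c in reader.fieldnames if c.endswith(suffix)]
    bases = defaultdict(int)                      # genome -> total bases
    cvg = defaultdict(lambda: defaultdict(int))   # genome -> sample -> summed depth
    for row in reader:
        # Split on the last ":" because contig names may contain other punctuation.
        contig, span = row["Target"].rsplit(":", 1)
        start, end = (int(x) for x in span.split("-"))
        genome = contig_to_genome[contig]
        bases[genome] += end - start + 1
        for s in samples:
            cvg[genome][s] += int(row[s + suffix])
    return {g: {s: cvg[g][s] / n for s in samples} for g, n in bases.items()}
```

Summing total coverage and dividing by total length gives a length-weighted mean, so a short contig does not count as much as a long one; averaging the per-contig means directly would weight them equally.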
