
blueskypy Posts: 266 ✭✭
edited June 2013 in Ask the GATK team

In the output grp file,

#:GATKTable:BaseCoverageDistribution:A simplified GATK table report
Coverage  Count    Filtered
       0  2859049   2932784
       1   856997    837791
       2   288587    276253
       3    95618     91703

what's the meaning of the three columns?



  • Geraldine_VdAuwera Cambridge, MA Posts: 11,743 admin

    I'll have the tool author (@Carneiro) confirm, but as I recall:

    1. The first column, Coverage, is the depth of coverage corresponding to the bin (i.e., the first line is the set of loci that are covered by 0 reads, the second is the set covered by 1 read, etc.);

    2. The second, Count, is the number of loci in the bin (without any filtering);

    3. The third, Filtered, is the number of loci in the bin after applying quality filtering to exclude bad reads.

    Geraldine Van der Auwera, PhD

  • blueskypy Posts: 266 ✭✭
    edited June 2013

    Hi Geraldine,
    Thanks for the quick response! Two questions:

    1. If the coverage is 2, can I interpret it as the so-called 2x coverage?

    2. By your explanation, Count should be greater than Filtered; so why is Count < Filtered at coverage 0?

  • blueskypy Posts: 266 ✭✭

    Well, I found Count < Filtered at some other coverages as well. For example:
    23 7910 8119

  • Geraldine_VdAuwera Cambridge, MA Posts: 11,743 admin
    1. The 2x-style expression is a convention to express the overall coverage of a dataset, so I'm not sure it's appropriate to use it in this context. If you do use it, make sure to communicate clearly what you mean, to avoid any unfortunate misunderstandings.

    2. Hmm, I may have misremembered. Based on the tech doc it looks like it might be the count of filtered reads (not including good reads). I'll ask @Carneiro to confirm, but I reckon that makes sense. But if so, there are an awful lot of low-quality reads in your data, at least based on the low-value bins you posted...

    Geraldine Van der Auwera, PhD

  • blueskypy Posts: 266 ✭✭

    Thanks so much, Geraldine! Please confirm!

  • avidLearner Posts: 9

    Hi Geraldine. Did you get a chance to confirm @blueskypy's second question, i.e. whether the Filtered column includes the good reads? I just ran this tool, and for most rows the values in the third column are higher.

  • Geraldine_VdAuwera Cambridge, MA Posts: 11,743 admin

    Hi @avidLearner,

    It seems I forgot to report back, so here it is.

    Here "filtered" doesn't refer to the count of filtered reads. This output gives two separate coverage distributions (to be plotted as a histogram), one for data that passes internal filters for minimum_mapping_quality and minimum_base_quality, and another one for the data that gets filtered out on those criteria. The first column is the bin, which represents the amount of coverage, and the second and third columns are the numbers of positions in the analysis where coverage was equal to that amount, one for the "passing filters" distribution, and one for the "failing filters" distribution. There is no column that gives total count. I'll see if we can change the column names to avoid confusion.

    Geraldine Van der Auwera, PhD
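
For anyone who wants to look at the two distributions side by side, here is a minimal Python sketch that parses the snippet from the original question into two separate histograms, one per column. The helper names (`parse_distributions`, `fractions`) are made up for illustration and are not part of GATK. The sketch also assumes the whitespace-separated three-column layout shown in the question. The answer above does not spell out which column holds the "passing filters" distribution and which the "failing filters" one, so the code keeps the original column names rather than guessing.

```python
from typing import Dict, Tuple

# Snippet copied from the question above; a real report has many more bins.
REPORT = """\
#:GATKTable:BaseCoverageDistribution:A simplified GATK table report
Coverage  Count    Filtered
       0  2859049   2932784
       1   856997    837791
       2   288587    276253
       3    95618     91703
"""


def parse_distributions(text: str) -> Tuple[Dict[int, int], Dict[int, int]]:
    """Return (count_by_coverage, filtered_by_coverage) from a report snippet.

    Hypothetical helper: assumes three whitespace-separated integer columns.
    """
    count: Dict[int, int] = {}
    filtered: Dict[int, int] = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, '#' metadata lines, and the column header.
        if len(fields) != 3 or line.startswith("#") or not fields[0].isdigit():
            continue
        coverage, n_count, n_filtered = (int(f) for f in fields)
        count[coverage] = n_count
        filtered[coverage] = n_filtered
    return count, filtered


def fractions(hist: Dict[int, int]) -> Dict[int, float]:
    """Normalise one histogram so its bins sum to 1 (over the bins given)."""
    total = sum(hist.values())
    return {cov: n / total for cov, n in hist.items()}


count, filtered = parse_distributions(REPORT)
count_frac = fractions(count)
filtered_frac = fractions(filtered)

# The two columns are independent distributions, so neither has to be
# larger than the other in any given bin.
for cov in sorted(count):
    print(f"{cov:>3}  Count={count[cov]:>8} ({count_frac[cov]:.3f})  "
          f"Filtered={filtered[cov]:>8} ({filtered_frac[cov]:.3f})")
```

Because each column is normalised on its own, the fractions only describe the four bins pasted in the question; run it on the full report to get meaningful proportions.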

  • avidLearner Posts: 9

    Thanks for the clarification @Geraldine_VdAuwera.
