The current GATK version is 3.7-0
Examples: Monday, today, last week, Mar 26, 3/26/04

Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

Get notifications!


You can opt in to receive email notifications, for example when your questions get answered or when there are new announcements, by following the instructions given here.

Did you remember to?


1. Search using the upper-right search box, e.g. using the error message.
2. Try the latest version of tools.
3. Include tool and Java versions.
4. Tell us whether you are following GATK Best Practices.
5. Include relevant details, e.g. platform, DNA- or RNA-Seq, WES (+capture kit) or WGS (PCR-free or PCR+), paired- or single-end, read length, expected average coverage, somatic data, etc.
6. For tool errors, include the error stacktrace as well as the exact command.
7. For format issues, include the result of running ValidateSamFile for BAMs or ValidateVariants for VCFs.
8. For weird results, include an illustrative example, e.g. attach IGV screenshots according to Article#5484.
9. For a seeming variant that is uncalled, include results of following Article#1235.

Did we ask for a bug report?


Then follow instructions in Article#1894.

Formatting tip!


Wrap blocks of code, error messages and BAM/VCF snippets--especially content with hashes (#)--with lines with three backticks ( ``` ) each to make a code block as demonstrated here.

Jump to another community
Picard 2.9.0 is now available. Download and read release notes here.
GATK 3.7 is here! Be sure to read the Version Highlights and optionally the full Release Notes.

Picard (2.1.0): the CollectHsMetrics discrepancy

FlorianTFlorianT ParisMember Posts: 4

Hi everyone,
I'm having trouble understanding some results using CollectHSMetrics from
Picard's latest version (2.1.0). It seems like the values i'm getting for
MEAN_BAIT_COVERAGE and MEAN_TARGET_COVERAGE are incoherent,
because from the documentation these values indicate:

  • MEAN_BAIT_COVERAGE: The mean coverage of all baits in the experiment.
  • MEAN_TARGET_COVERAGE: The mean coverage of targets that received at least coverage depth = 2 at one base.

In my case, when I use CollectHSMetrics, I provide the same interval_list file
for both BAIT and TARGET intervals. So I should always have a MEAN_TARGET_COVERAGE
value equal to or higher than the MEAN_BAIT_COVERAGE value (equal if coverage
depth is >= 2 everywhere, and higher if some bp get a coverage of 0 or 1).
But of all the samples I've tested so far, all of them have a MEAN_TARGET_COVERAGE
value lower than the MEAN_BAIT_COVERAGE value.

Can someone help me figure this out?

BTW, I'm using this command:

$JAVA -jar $PICARD CollectHsMetrics \
    I=$NAME.sorted.bam \
    O=CouvertureHSmetrics.txt \
    BAIT_INTERVALS=$INTERVAL_LIST \
    TARGET_INTERVALS=$INTERVAL_LIST \
    R=$REF_GEN_PATH/GRCh37.fa \
    PER_TARGET_COVERAGE=CouverturePerTargetHSmetrics.txt

(with $INTERVAL_LIST being the same file for BAIT and TARGET)

Thanks

Best Answer

Answers

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie Posts: 11,734 admin

    Hi there, it sounds like you're using this tool in a way that it's not designed for. The coverage calculations are done quite differently for baits and for targets because they're trying to model effects in a very specific experimental design (capture using hybrid selection). If you're looking for coverage analysis of targets, and are not interested in evaluating performance of a bait set, then you should either focus on the targets results or use tools like GATK DepthOfCoverage or DisgnoseTargets that are more generic and don't make assumptions about the experimental design.

    Geraldine Van der Auwera, PhD

  • FlorianTFlorianT ParisMember Posts: 4

    Thank you for your answer,
    Well I don't have the intervals for baits, I just have one file for targets. And I've been using Picard a few times before (with the same interval file for both baits and targets). And I never had a mean_bait_coverage higher than the mean_target_coverage, and if I trust the documentation of Picard, this should indeed never happen in this case (where targets and baits are equal).
    And it seems like this issue appeared with the last versions of Picard (2.1.0), so I'm wondering what does the mean_target_coverage stand for in this new version.

    I checked the result using GATK DepthOfCoverage, and it matches the mean_bait_coverage value I'm obtaining with Picard, so that's a relief but not really an answer, I'm still really curious about what is being computed as the mean_target_coverage in Picard.

    Issue · Github
    by Sheila

    Issue Number
    704
    State
    closed
    Last Updated
    Assignee
    Array
    Milestone
    Array
    Closed By
    vdauwera
  • SheilaSheila Broad InstituteMember, Broadie, Moderator Posts: 4,855 admin

    @FlorianT
    Hi,

    So you are saying the results from 2.1.0 do not match the results from earlier versions? Have you run on the same files you did with earlier versions of Picard? I will ask our Picard expert to have a look.

    Thanks,
    Sheila

  • FlorianTFlorianT ParisMember Posts: 4

    Hi,
    With earlier versions of Picard, I can only use CalculateHsMetrics since CollectHsMetrics was introduced in v2.1.0 if I'm not mistaking. But the metric's names are the same for both functions so I'm guessing they should provide the same information. So I tried CalculateHsMetrics from Picard 2.0.1 and 2.1.0 (on the same dataset) and here are the results:

    Picard 2.0.1 (CalculateHsMetrics)
    MEAN_BAIT_COVERAGE=141.54202
    MEAN_TARGET_COVERAGE=141.54202

    Picard 2.1.0 (CalculateHsMetrics)
    MEAN_BAIT_COVERAGE=142.328618
    MEAN_TARGET_COVERAGE=141.54202

    Picard 2.1.0 (CollectHsMetrics)
    MEAN_BAIT_COVERAGE=142.328618
    MEAN_TARGET_COVERAGE=139.853543

    For reference, GATK DepthOfCoverage's mean = 142.33

    So in the v2.0.1, the results are equals, which is coherent because the interval lists files are the same for both bait and target, and there isn't any bp with a coverage depth lower than 2. And according to the documentation on HsMetrics, if there were some bp with a coverage depth lower than 2, then the value of MEAN_TARGET_COVERAGE should be higher than the MEAN_BAIT_COVERAGE value.
    So apparently the algorithm evolved since Picard 2.1.0, but I can't find any information on what is being computed for these values in this last version.

    Thanks for the help

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie Posts: 11,734 admin

    Hi @FlorianT, you're correct that some changes have been made. We don't have this fully documented, but here are the notes that the developers logged with the code changes:

    • Targeted metrics now filters out low quality bases and enforces a minimum mapping quality, currently both set to the previous defaults.
    • Adding a median target coverage metric and the percentage of bases at 1x or greater to the targeted metrics.
    • Do not count overlapping bases in targeted metrics. To do this, we need to clone the SAMRecord, so adding an option to allow side effects when computing the targeted metrics. Also adding an option to not clip overlapping reads when computing targeted metrics.
    • Fixes a bug where we were not counting bases from supplemental records in targeted metrics. Supplemental records should have only been ignored when counting metrics related to the number of records (not bases).
    • Some metrics were computed a minimum coverage of two, which no longer applies with high-quality capture technologies. Updated to consider all depths, including zero.
    • Adding tests for CollectHsMetrics: overlapping reads, minimum base quality, and minimum mapping quality.
    • Adding fields to per target metrics: max_normalized_coverage, min_coverage, max_coverage, and pct_target_bases_0x

    I can also tell you that based on a superficial reading of the code, it looks like bait coverage can take into account reads that are very close to a bait despite not overlapping it, whereas that is not done for target coverage. I think this may account for the difference you're seeing.

    Geraldine Van der Auwera, PhD

  • stevegroenewoldstevegroenewold Iowa CityMember Posts: 1

    I've also noted that in addition to the low MAPQ and low BASEQ reads/bases, duplicate reads are also included in the MEAN_BAIT_COVERAGE calculation, but excluded from the MEAN_TARGET_COVERAGE calculation.

Sign In or Register to comment.