We've moved!
This site is now read-only. You can find our new documentation site and support forum for posting questions here.
Be sure to read our welcome blog!

Picard (2.1.0): the CollectHsMetrics discrepancy

Hi everyone,
I'm having trouble understanding some results using CollectHSMetrics from
Picard's latest version (2.1.0). It seems like the values i'm getting for
because from the documentation these values indicate:

  • MEAN_BAIT_COVERAGE: The mean coverage of all baits in the experiment.
  • MEAN_TARGET_COVERAGE: The mean coverage of targets that received at least coverage depth = 2 at one base.

In my case, when I use CollectHSMetrics, I provide the same interval_list file
for both BAIT and TARGET intervals. So I should always have a MEAN_TARGET_COVERAGE
value equal to or higher than the MEAN_BAIT_COVERAGE value (equal if coverage
depth is >= 2 everywhere, and higher if some bp get a coverage of 0 or 1).
But of all the samples I've tested so far, all of them have a MEAN_TARGET_COVERAGE
value lower than the MEAN_BAIT_COVERAGE value.

Can someone help me figure this out?

BTW, I'm using this command:

$JAVA -jar $PICARD CollectHsMetrics \
    I=$NAME.sorted.bam \
    O=CouvertureHSmetrics.txt \
    R=$REF_GEN_PATH/GRCh37.fa \

(with $INTERVAL_LIST being the same file for BAIT and TARGET)


Best Answer


  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    Hi there, it sounds like you're using this tool in a way that it's not designed for. The coverage calculations are done quite differently for baits and for targets because they're trying to model effects in a very specific experimental design (capture using hybrid selection). If you're looking for coverage analysis of targets, and are not interested in evaluating performance of a bait set, then you should either focus on the targets results or use tools like GATK DepthOfCoverage or DisgnoseTargets that are more generic and don't make assumptions about the experimental design.

  • FlorianTFlorianT ParisMember

    Thank you for your answer,
    Well I don't have the intervals for baits, I just have one file for targets. And I've been using Picard a few times before (with the same interval file for both baits and targets). And I never had a mean_bait_coverage higher than the mean_target_coverage, and if I trust the documentation of Picard, this should indeed never happen in this case (where targets and baits are equal).
    And it seems like this issue appeared with the last versions of Picard (2.1.0), so I'm wondering what does the mean_target_coverage stand for in this new version.

    I checked the result using GATK DepthOfCoverage, and it matches the mean_bait_coverage value I'm obtaining with Picard, so that's a relief but not really an answer, I'm still really curious about what is being computed as the mean_target_coverage in Picard.

    Issue · Github
    by Sheila

    Issue Number
    Last Updated
    Closed By
  • SheilaSheila Broad InstituteMember, Broadie ✭✭✭✭✭


    So you are saying the results from 2.1.0 do not match the results from earlier versions? Have you run on the same files you did with earlier versions of Picard? I will ask our Picard expert to have a look.


  • FlorianTFlorianT ParisMember

    With earlier versions of Picard, I can only use CalculateHsMetrics since CollectHsMetrics was introduced in v2.1.0 if I'm not mistaking. But the metric's names are the same for both functions so I'm guessing they should provide the same information. So I tried CalculateHsMetrics from Picard 2.0.1 and 2.1.0 (on the same dataset) and here are the results:

    Picard 2.0.1 (CalculateHsMetrics)

    Picard 2.1.0 (CalculateHsMetrics)

    Picard 2.1.0 (CollectHsMetrics)

    For reference, GATK DepthOfCoverage's mean = 142.33

    So in the v2.0.1, the results are equals, which is coherent because the interval lists files are the same for both bait and target, and there isn't any bp with a coverage depth lower than 2. And according to the documentation on HsMetrics, if there were some bp with a coverage depth lower than 2, then the value of MEAN_TARGET_COVERAGE should be higher than the MEAN_BAIT_COVERAGE value.
    So apparently the algorithm evolved since Picard 2.1.0, but I can't find any information on what is being computed for these values in this last version.

    Thanks for the help

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    Hi @FlorianT, you're correct that some changes have been made. We don't have this fully documented, but here are the notes that the developers logged with the code changes:

    • Targeted metrics now filters out low quality bases and enforces a minimum mapping quality, currently both set to the previous defaults.
    • Adding a median target coverage metric and the percentage of bases at 1x or greater to the targeted metrics.
    • Do not count overlapping bases in targeted metrics. To do this, we need to clone the SAMRecord, so adding an option to allow side effects when computing the targeted metrics. Also adding an option to not clip overlapping reads when computing targeted metrics.
    • Fixes a bug where we were not counting bases from supplemental records in targeted metrics. Supplemental records should have only been ignored when counting metrics related to the number of records (not bases).
    • Some metrics were computed a minimum coverage of two, which no longer applies with high-quality capture technologies. Updated to consider all depths, including zero.
    • Adding tests for CollectHsMetrics: overlapping reads, minimum base quality, and minimum mapping quality.
    • Adding fields to per target metrics: max_normalized_coverage, min_coverage, max_coverage, and pct_target_bases_0x

    I can also tell you that based on a superficial reading of the code, it looks like bait coverage can take into account reads that are very close to a bait despite not overlapping it, whereas that is not done for target coverage. I think this may account for the difference you're seeing.

  • stevegroenewoldstevegroenewold Iowa CityMember

    I've also noted that in addition to the low MAPQ and low BASEQ reads/bases, duplicate reads are also included in the MEAN_BAIT_COVERAGE calculation, but excluded from the MEAN_TARGET_COVERAGE calculation.

Sign In or Register to comment.