Different coverage metrics between GATK and PICARD

asaki · China · Member
edited November 2017 in Ask the GATK team

Hi all,

I want to evaluate the panel performance by calculating the target coverage.

Both GATK and Picard make this calculation convenient. However, I observed slight differences between their results.

Also, it seems that Picard cannot report the percentage of target bases above a predefined coverage threshold, the way GATK does with the -ct option. Is there an equivalent option?
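For reference, the per-threshold summary that -ct produces boils down to a simple count over per-base depths. The Python sketch below illustrates the idea. The function name `pct_bases_above` and the toy depths are hypothetical, and it assumes a base counts toward threshold t when its depth is >= t, which may not match the exact comparison GATK uses.

```python
from typing import Dict, Iterable, List


def pct_bases_above(depths: Iterable[int], thresholds: List[int]) -> Dict[int, float]:
    """Return, for each threshold t, the percentage of target bases with depth >= t.

    Assumption: ">=" is the comparison; check the documentation of your
    GATK version for the exact rule behind -ct.
    """
    depths = list(depths)
    if not depths:
        raise ValueError("no target bases supplied")
    n = len(depths)
    return {t: 100.0 * sum(1 for d in depths if d >= t) / n for t in thresholds}


# Hypothetical per-base depths over a 10-base target (toy numbers, not real data).
toy_depths = [0, 1, 2, 5, 10, 12, 30, 31, 55, 120]
print(pct_bases_above(toy_depths, [1, 2, 10, 30, 50, 100]))
# {1: 90.0, 2: 80.0, 10: 60.0, 30: 40.0, 50: 20.0, 100: 10.0}
```

Any set of cutoffs can be passed in, which is what makes the -ct style of reporting convenient.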

COMMANDS:

java -jar /bioinfo/software/bin/GenomeAnalysisTK.jar -T DepthOfCoverage -R /bioinfo/data/iGenomes/genome.fa -o gatk -I ../4D_S1_dedup.bam -ct 1 -ct 2 -ct 10 -ct 30 -ct 50 -ct 100 -L ../../../Agilent_ClearSeq_Inherited.v1.bed

java -jar /bioinfo/software/packages/picard-2.9.2/picard.jar CollectHsMetrics I=../4D_S1_dedup.bam O=picard.metrics R=/bioinfo/data/iGenomes/genome.fa BI=../agilent.intervals TI=../agilent.intervals MQ=0 Q=0

agilent.intervals was generated with Picard:

java -jar /bioinfo/software/packages/picard-2.9.2/picard.jar BedToIntervalList I=Agilent_ClearSeq_Inherited.v1.bed O=Agilent_ClearSeq_Inherited.v1.interval SD=/bioinfo/data/iGenomes/genome.dict 

Answers

  • shlee · Cambridge · Member, Broadie, Moderator

    Hi again @asaki,

    Yes, the tools are different as they were designed with different applications in mind.

    One thing CollectHsMetrics does is take base quality and mapping quality into account, but I see you have already set both thresholds to 0. CollectHsMetrics also applies a coverage cap, which defaults to 200, to its theoretical sensitivity calculations, but this shouldn't affect coverage counts.
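    As a toy illustration of these two points (not Picard's actual implementation), the Python sketch below counts per-position depth only from reads and bases that pass minimum mapping-quality and base-quality thresholds, and separately folds depths above a cap into a single histogram bin. The read representation, the function names, and the folding behaviour of the cap are assumptions made for illustration. With both thresholds at 0, as in the MQ=0 Q=0 command above, nothing is excluded, and the cap only touches the histogram, not the raw counts.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical, simplified read model: (mapping_quality, [(ref_pos, base_quality), ...]).
Read = Tuple[int, List[Tuple[int, int]]]


def depth_per_position(reads: List[Read], min_mq: int = 0, min_bq: int = 0) -> Counter:
    """Count a base toward depth only if its read passes min_mq and the base passes min_bq."""
    depth: Counter = Counter()
    for mapq, bases in reads:
        if mapq < min_mq:
            continue  # the whole read is ignored
        for pos, bq in bases:
            if bq >= min_bq:
                depth[pos] += 1
    return depth


def capped_histogram(depths: List[int], cap: int = 200) -> Dict[int, int]:
    """Histogram of depths where any depth above `cap` is folded into the `cap` bin."""
    return dict(Counter(min(d, cap) for d in depths))


reads = [(60, [(1, 30), (2, 30)]), (0, [(1, 30), (2, 5)]), (15, [(2, 10)])]
print(depth_per_position(reads))                        # MQ=0 Q=0: nothing excluded
print(depth_per_position(reads, min_mq=20, min_bq=20))  # stricter thresholds drop reads/bases
print(capped_histogram([5, 150, 250, 900]))             # 250 and 900 land in the 200 bin
```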

    As for DepthOfCoverage, remember the GATK engine applies filters upfront, before the tool sees any of the reads. In this case, these filters are:

    MalformedReadFilter
    BadCigarFilter
    UnmappedReadFilter
    NotPrimaryAlignmentFilter
    FailsVendorQualityCheckFilter
    DuplicateReadFilter
    

    I think this may partly explain the differences you are seeing. You can disable these filters and check whether the metrics become more concordant.
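    Four of the filters listed above correspond to standard SAM FLAG bits, so their effect can be approximated by testing those bits. The Python sketch below is a simplification, not GATK code: it covers only the flag-based filters (MalformedReadFilter and BadCigarFilter inspect other read properties and are left out), the bit assigned to each filter is an assumption based on the filter names, and each read is attributed to the first filter it fails. It shows how disabling a filter such as DuplicateReadFilter lets more reads contribute to depth.

```python
from collections import Counter
from typing import FrozenSet, Iterable, Optional

# SAM FLAG bits, as defined in the SAM specification.
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_QC_FAIL = 0x200
FLAG_DUPLICATE = 0x400

# Assumed, simplified mapping from GATK engine filter names to the flag bit each checks.
FLAG_FILTERS = {
    "UnmappedReadFilter": FLAG_UNMAPPED,
    "NotPrimaryAlignmentFilter": FLAG_SECONDARY,
    "FailsVendorQualityCheckFilter": FLAG_QC_FAIL,
    "DuplicateReadFilter": FLAG_DUPLICATE,
}


def first_failing_filter(flag: int, disabled: FrozenSet[str] = frozenset()) -> Optional[str]:
    """Name of the first enabled filter whose flag bit is set, or None if the read passes."""
    for name, bit in FLAG_FILTERS.items():
        if name not in disabled and flag & bit:
            return name
    return None


def tally(flags: Iterable[int], disabled: FrozenSet[str] = frozenset()) -> Counter:
    """Attribute each read to one failing filter (or 'passed'), like a filter report."""
    counts: Counter = Counter()
    for flag in flags:
        counts[first_failing_filter(flag, disabled) or "passed"] += 1
    return counts


flags = [0, 99, 1024, 1107, 1280, 4, 512]  # toy FLAG values
print(tally(flags))
print(tally(flags, disabled=frozenset({"DuplicateReadFilter"})))
```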

  • asaki · China · Member

    @shlee,
    Thanks for the quick response.

    I understand that GATK and Picard may apply different strategies, which can lead to different results.

    As you mentioned, GATK walkers may discard some reads that fail the predefined quality filters.
    If I am right, this should lead to slightly lower target coverage in GATK than in Picard, yet the figure showed the opposite.

    The only explanation I can think of is a difference in how each tool defines the threshold for on-target reads.

    Nevertheless, I will use GATK to calculate coverage, since it conveniently reports results for any cutoff I define.

  • asaki · China · Member

    @shlee,

    Just an update:

    As you mentioned, several read filters are automatically applied in GATK DepthOfCoverage.
    These filters removed 4.95% of the reads:

    INFO  14:26:39,725 MicroScheduler - 159168 reads were filtered out during the traversal out of approximately 3215381 total reads (4.95%)
    INFO  14:26:39,725 MicroScheduler -   -> 0 reads (0.00% of total) failing BadCigarFilter
    INFO  14:26:39,726 MicroScheduler -   -> 155144 reads (4.83% of total) failing DuplicateReadFilter
    INFO  14:26:39,726 MicroScheduler -   -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter
    INFO  14:26:39,726 MicroScheduler -   -> 0 reads (0.00% of total) failing MalformedReadFilter
    INFO  14:26:39,726 MicroScheduler -   -> 1061 reads (0.03% of total) failing NotPrimaryAlignmentFilter
    INFO  14:26:39,727 MicroScheduler -   -> 2963 reads (0.09% of total) failing UnmappedReadFilter
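    The per-filter counts in this log add up exactly to the reported total (155144 + 1061 + 2963 = 159168), which is consistent with each filtered read being counted under a single filter, and duplicates account for almost all of it. The Python sketch below parses these MicroScheduler lines and recomputes the percentages. The regular expressions assume the exact line format quoted above and may need adjusting for other GATK versions; `summarize` is a hypothetical helper name.

```python
import re

# The MicroScheduler lines quoted above, copied verbatim.
LOG = """\
INFO  14:26:39,725 MicroScheduler - 159168 reads were filtered out during the traversal out of approximately 3215381 total reads (4.95%)
INFO  14:26:39,725 MicroScheduler -   -> 0 reads (0.00% of total) failing BadCigarFilter
INFO  14:26:39,726 MicroScheduler -   -> 155144 reads (4.83% of total) failing DuplicateReadFilter
INFO  14:26:39,726 MicroScheduler -   -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter
INFO  14:26:39,726 MicroScheduler -   -> 0 reads (0.00% of total) failing MalformedReadFilter
INFO  14:26:39,726 MicroScheduler -   -> 1061 reads (0.03% of total) failing NotPrimaryAlignmentFilter
INFO  14:26:39,727 MicroScheduler -   -> 2963 reads (0.09% of total) failing UnmappedReadFilter
"""

TOTAL_RE = re.compile(
    r"(\d+) reads were filtered out during the traversal out of approximately (\d+) total reads"
)
FILTER_RE = re.compile(r"->\s+(\d+) reads \([\d.]+% of total\) failing (\w+)")


def summarize(log: str):
    """Return (filtered, total, {filter_name: count}) parsed from the log text."""
    m = TOTAL_RE.search(log)
    if m is None:
        raise ValueError("summary line not found")
    filtered, total = int(m.group(1)), int(m.group(2))
    per_filter = {name: int(n) for n, name in FILTER_RE.findall(log)}
    return filtered, total, per_filter


filtered, total, per_filter = summarize(LOG)
print(f"sum of per-filter counts = {sum(per_filter.values())}, reported = {filtered}")
for name, n in sorted(per_filter.items(), key=lambda kv: -kv[1]):
    print(f"{name:32s} {n:7d}  {100 * n / total:.2f}%")
```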
    

    After disabling the filters, the percentage of target bases covered increased as expected:

    sample_id  total      mean   granular_third_quartile  granular_median  granular_first_quartile  %_bases_above_10  %_bases_above_30  %_bases_above_50  %_bases_above_100
    4D_S1      384528783  35.42  47                       31               19                       90.9              50.4              20.8              1.7
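    To compare these columns against Picard's output, the summary can be loaded programmatically. The Python sketch below assumes the rows are tab-delimited, as DepthOfCoverage writes its sample summary, and that mean equals total divided by the number of target bases, so total / mean gives only an approximate target size (mean is rounded). `parse_summary` is a hypothetical helper name.

```python
import csv
import io

# The summary rows quoted above, assumed to be tab-delimited.
SUMMARY = (
    "sample_id\ttotal\tmean\tgranular_third_quartile\tgranular_median\t"
    "granular_first_quartile\t%_bases_above_10\t%_bases_above_30\t"
    "%_bases_above_50\t%_bases_above_100\n"
    "4D_S1\t384528783\t35.42\t47\t31\t19\t90.9\t50.4\t20.8\t1.7\n"
)

PREFIX = "%_bases_above_"


def parse_summary(text: str):
    """Return one dict per sample with its %_bases_above_* values and approximate target size."""
    rows = []
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        pct_above = {int(k[len(PREFIX):]): float(v) for k, v in row.items() if k.startswith(PREFIX)}
        rows.append({
            "sample_id": row["sample_id"],
            "pct_above": pct_above,
            # Assumption: mean = total / number of target bases.
            "approx_target_bases": int(row["total"]) / float(row["mean"]),
        })
    return rows


for r in parse_summary(SUMMARY):
    print(r["sample_id"], r["pct_above"], f"~{r['approx_target_bases'] / 1e6:.2f} Mb of target")
```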
    
  • shlee · Cambridge · Member, Broadie, Moderator

    Hi @asaki,

    Thanks for the follow-up. So GATK counts more depth than Picard, and with the engine filters disabled, GATK counts even more. You have chosen to keep using GATK for its convenient -ct option. One last thing I'd like to point out: if by panel you mean a targeted exome kit, we actually recommend DiagnoseTargets. This is discussed in Article#40.

  • asaki · China · Member

    @shlee,

    Thanks. DiagnoseTargets is a great tool; however, it does not offer a direct summary of target coverage.
