CollectHsMetrics errors with IllegalStateException: Could not find percentile: 0.2

I'm running CollectHsMetrics with a new interval list and get the following error:

Exception in thread "main" java.lang.IllegalStateException: Could not find percentile: 0.2
at htsjdk.samtools.util.Histogram.getPercentile(Histogram.java:327)
at picard.analysis.directed.TargetMetricsCollector$PerUnitTargetMetricCollector.calculateTargetCoverageMetrics(TargetMetricsCollector.java:688)
at picard.analysis.directed.TargetMetricsCollector$PerUnitTargetMetricCollector.finish(TargetMetricsCollector.java:626)
at picard.metrics.MultiLevelCollector$AllReadsDistributor.finish(MultiLevelCollector.java:208)
at picard.metrics.MultiLevelCollector.finish(MultiLevelCollector.java:324)
at picard.analysis.directed.CollectTargetedMetrics.doWork(CollectTargetedMetrics.java:153)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:268)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:98)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:108)

Java version = 1.8.0_20

I ran ValidateSamFile on my input BAM, and got the following:

11:59:41.844 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/ifs/rcgroups/clindsley/programs/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Sat Feb 23 11:59:41 EST 2019] ValidateSamFile INPUT=/ifs/rcgroups/clindsley/chrisg/oAML/feb21/bam/00282155.x.sort.dedup.realign.recal.bam MODE=SUMMARY MAX_OUTPUT=100 IGNORE_WARNINGS=false VALIDATE_INDEX=true INDEX_VALIDATION_STRINGENCY=EXHAUSTIVE IS_BISULFITE_SEQUENCED=false MAX_OPEN_TEMP_FILES=8000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Sat Feb 23 11:59:41 EST 2019] Executing as [email protected] on Linux 2.6.32-358.2.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_20-b26; Deflater: Intel; Inflater: Intel; Picard version: 2.13.2-SNAPSHOT

I ran IntervalListTools on my interval list and got the following:

[Sun Feb 24 01:19:50 EST 2019] IntervalListTools INPUT=[/ifs/rcgroups/clindsley/chrisg/oAML/feb21/oAML_covered_hg19_picard_baits.interval_list] OUTPUT=/dev/null PADDING=0 UNIQUE=false SORT=true ACTION=CONCAT SCATTER_COUNT=1 INCLUDE_FILTERED=false BREAK_BANDS_AT_MULTIPLES_OF=0 SUBDIVISION_MODE=INTERVAL_SUBDIVISION INVERT=false VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Sun Feb 24 01:19:50 EST 2019] Executing as [email protected] on Linux 2.6.32-358.2.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_20-b26; Deflater: Intel; Inflater: Intel; Picard version: 2.13.2-SNAPSHOT
INFO 2019-02-24 01:19:50 IntervalListTools Produced 1715 intervals totalling 495567 unique bases.
[Sun Feb 24 01:19:50 EST 2019] picard.util.IntervalListTools done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=378011648

(Of note, I got similar output for my targets interval list, just not showing it here.)

I have looked for this error online and don't see much that applies to me. The only refs I saw to it were folks with empty interval lists, which is not my problem.

One other piece of info that might be helpful - the problem seems to be the header. I took the current interval list and used it to run CollectHsMetrics on a different BAM with a different header, and it worked fine. I then took an old interval list that I have used successfully for a different project (as recently as last week), combined it with the header for the current project, and got this error. However, I was able to run IntervalListTools on the current interval list (with the problematic header) and got non-zero output, as above (1715 intervals etc). So I'm stumped.

I run CollectHsMetrics all the time and have never run into this problem, so any thoughts would be much appreciated!

thanks
Chris

Answers

  • bhanuGandham (Cambridge, MA; Member, Administrator, Broadie, Moderator)

    Hi @cgibson

    Would you please post the entire error log from running CollectHsMetrics and ValidateSamFile?

  • cgibson (Member)
    Sorry, the output I posted from ValidateSamFile is the entirety of the output from the command line to the end - there is nothing else.

    For CollectHsMetrics, I thought what I posted was the entire error log, but here is the full log starting from the beginning of the task in our pipeline:

    START task_collectHSMetrics
    Fri Feb 22 08:34:15 EST 2019

    picard_bait_file is /ifs/rcgroups/clindsley/chrisg/oAML/feb21/oAML_covered_hg19_picard_baits.bed
    Fri Feb 22 08:34:15 EST 2019
    cat: /ifs/rcgroups/clindsley/chrisg/oAML/feb21/oAML_covered_hg19.bed: No such file or directory
    cat: /ifs/rcgroups/clindsley/chrisg/donorchip/feb11/oAML_covered_hg19.bed: No such file or directory
    08:34:17.875 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/ifs/rcgroups/clindsley/programs/picard.jar!/com/intel/gkl/native/libgkl_compression.so
    [Fri Feb 22 08:34:17 EST 2019] CollectHsMetrics BAIT_INTERVALS=[/ifs/rcgroups/clindsley/chrisg/oAML/feb21/oAML_covered_hg19_picard_baits.bed] BAIT_SET_NAME=oAML_covered_hg19.bed TARGET_INTERVALS=[/ifs/rcgroups/clindsley/chrisg/donorchip/feb11/oAML_covered_hg19_picard_targets.bed] INPUT=/ifs/rcgroups/clindsley/chrisg/oAML/feb21/bam/00282155.x.sort.dedup.realign.recal.bam OUTPUT=/ifs/rcgroups/clindsley/chrisg/oAML/feb21/coverage/00282155.x.summary_coverage METRIC_ACCUMULATION_LEVEL=[ALL_READS] PER_TARGET_COVERAGE=/ifs/rcgroups/clindsley/chrisg/oAML/feb21/coverage/00282155.x.bait_coverage REFERENCE_SEQUENCE=/ifs/rcgroups/clindsley/fa/Homo_sapiens_assembly19.fasta NEAR_DISTANCE=250 MINIMUM_MAPPING_QUALITY=20 MINIMUM_BASE_QUALITY=20 CLIP_OVERLAPPING_READS=true COVERAGE_CAP=200 SAMPLE_SIZE=10000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
    [Fri Feb 22 08:34:17 EST 2019] Executing as [email protected] on Linux 2.6.32-642.13.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_20-b26; Deflater: Intel; Inflater: Intel; Picard version: 2.13.2-SNAPSHOT
    INFO 2019-02-22 08:34:25 CollectHsMetrics Processed 1,000,000 records. Elapsed time: 00:00:06s. Time for last 1,000,000: 6s. Last read position: 3:37,626,900
    INFO 2019-02-22 08:34:30 CollectHsMetrics Processed 2,000,000 records. Elapsed time: 00:00:12s. Time for last 1,000,000: 5s. Last read position: 4:106,156,031
    INFO 2019-02-22 08:34:36 CollectHsMetrics Processed 3,000,000 records. Elapsed time: 00:00:18s. Time for last 1,000,000: 5s. Last read position: 6:29,894,104
    INFO 2019-02-22 08:34:42 CollectHsMetrics Processed 4,000,000 records. Elapsed time: 00:00:24s. Time for last 1,000,000: 5s. Last read position: 7:66,459,281
    INFO 2019-02-22 08:34:48 CollectHsMetrics Processed 5,000,000 records. Elapsed time: 00:00:29s. Time for last 1,000,000: 5s. Last read position: 9:6,669,026
    INFO 2019-02-22 08:34:54 CollectHsMetrics Processed 6,000,000 records. Elapsed time: 00:00:35s. Time for last 1,000,000: 5s. Last read position: 11:118,370,627
    INFO 2019-02-22 08:35:00 CollectHsMetrics Processed 7,000,000 records. Elapsed time: 00:00:41s. Time for last 1,000,000: 6s. Last read position: 13:32,914,202
    INFO 2019-02-22 08:35:06 CollectHsMetrics Processed 8,000,000 records. Elapsed time: 00:00:47s. Time for last 1,000,000: 5s. Last read position: 16:3,900,640
    INFO 2019-02-22 08:35:12 CollectHsMetrics Processed 9,000,000 records. Elapsed time: 00:00:53s. Time for last 1,000,000: 5s. Last read position: 17:1,564,064
    INFO 2019-02-22 08:35:17 CollectHsMetrics Processed 10,000,000 records. Elapsed time: 00:00:59s. Time for last 1,000,000: 5s. Last read position: 17:40,856,984
    INFO 2019-02-22 08:35:23 CollectHsMetrics Processed 11,000,000 records. Elapsed time: 00:01:05s. Time for last 1,000,000: 5s. Last read position: 20:62,320,917
    INFO 2019-02-22 08:35:29 CollectHsMetrics Processed 12,000,000 records. Elapsed time: 00:01:11s. Time for last 1,000,000: 5s. Last read position: X:76,763,885
    [Fri Feb 22 08:35:32 EST 2019] picard.analysis.directed.CollectHsMetrics done. Elapsed time: 1.25 minutes.
    Runtime.totalMemory()=663748608

    Exception in thread "main" java.lang.IllegalStateException: Could not find percentile: 0.2
    at htsjdk.samtools.util.Histogram.getPercentile(Histogram.java:327)
    at picard.analysis.directed.TargetMetricsCollector$PerUnitTargetMetricCollector.calculateTargetCoverageMetrics(TargetMetricsCollector.java:688)
    at picard.analysis.directed.TargetMetricsCollector$PerUnitTargetMetricCollector.finish(TargetMetricsCollector.java:626)
    at picard.metrics.MultiLevelCollector$AllReadsDistributor.finish(MultiLevelCollector.java:208)
    at picard.metrics.MultiLevelCollector.finish(MultiLevelCollector.java:324)
    at picard.analysis.directed.CollectTargetedMetrics.doWork(CollectTargetedMetrics.java:153)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:268)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:98)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:108)

    Please let me know if you need any more information. Since my first post, I tried recreating the interval file with BedToIntervalList (using my test input BAM as the reference), but the interval list looks identical and I get the same error.

    thanks
    Chris
  • cgibson (Member)
    Hang on, sorry - I just noticed an incorrect file path in the error log that I hadn't noticed before. Let me try fixing it and rerun the task. I'll update here if it works.

    thanks
    Chris
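
For readers hitting the same situation: the clue in the log above was the pair of `cat: ...: No such file or directory` lines printed before Picard started. A minimal sketch of scanning a pipeline log for that kind of message might look like the following. The function name `find_missing_paths` and the regular expression are illustrative assumptions, not part of Picard or GATK, and the pattern only covers the message format seen in this thread.

```python
import re

# Matches the "missing file" message seen in the log above, e.g.
#   cat: /path/to/file.bed: No such file or directory
# Assumption: the tool name is a single word followed by ": ".
MISSING_FILE = re.compile(
    r"^\s*(?P<tool>[\w.-]+): (?P<path>.+): No such file or directory\s*$"
)


def find_missing_paths(log_text):
    """Return paths reported as missing in a log, in order, without duplicates."""
    missing = []
    for line in log_text.splitlines():
        match = MISSING_FILE.match(line)
        if match and match.group("path") not in missing:
            missing.append(match.group("path"))
    return missing
```

Running this over a captured log before reading the Java stack trace can surface a bad path early, since the exception itself does not name the file at fault.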
  • bhanuGandhambhanuGandham Cambridge MAMember, Administrator, Broadie, Moderator admin

    @cgibson

    Sounds good. Let us know!

  • cgibson (Member)
    Accepted Answer
    OK, problem solved - I had provided a path to an incomplete targets interval list that contained only a header. So in fact I did have an empty interval list, as did the others online who posted about this problem. Should have read my log more carefully - sorry!
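
A header-only interval list like the one described in the accepted answer can be caught before running CollectHsMetrics. The sketch below assumes the usual Picard interval list layout, where header lines start with `@` and every other non-blank line is one interval. The helper name `count_intervals` is hypothetical and not a Picard or htsjdk function.

```python
def count_intervals(path):
    """Count interval records in a Picard-style interval list.

    Header lines (starting with '@') and blank lines are skipped, so a
    file that contains only a header yields 0.
    """
    count = 0
    with open(path) as handle:
        for line in handle:
            if line.strip() and not line.startswith("@"):
                count += 1
    return count
```

A pipeline step could call this on both the bait and the target interval lists and stop with a clear message when either returns 0, instead of letting the run end in the `Could not find percentile` exception.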