CollectHsMetrics fails with IllegalStateException: Could not find percentile: 0.2

I'm running CollectHsMetrics with a new interval list and get the following error:

Exception in thread "main" java.lang.IllegalStateException: Could not find percentile: 0.2
at htsjdk.samtools.util.Histogram.getPercentile(Histogram.java:327)
at picard.analysis.directed.TargetMetricsCollector$PerUnitTargetMetricCollector.calculateTargetCoverageMetrics(TargetMetricsCollector.java:688)
at picard.analysis.directed.TargetMetricsCollector$PerUnitTargetMetricCollector.finish(TargetMetricsCollector.java:626)
at picard.metrics.MultiLevelCollector$AllReadsDistributor.finish(MultiLevelCollector.java:208)
at picard.metrics.MultiLevelCollector.finish(MultiLevelCollector.java:324)
at picard.analysis.directed.CollectTargetedMetrics.doWork(CollectTargetedMetrics.java:153)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:268)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:98)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:108)

Java version = 1.8.0_20

I ran ValidateSamFile on my input BAM, and got the following:

11:59:41.844 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/ifs/rcgroups/clindsley/programs/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Sat Feb 23 11:59:41 EST 2019] ValidateSamFile INPUT=/ifs/rcgroups/clindsley/chrisg/oAML/feb21/bam/00282155.x.sort.dedup.realign.recal.bam MODE=SUMMARY MAX_OUTPUT=100 IGNORE_WARNINGS=false VALIDATE_INDEX=true INDEX_VALIDATION_STRINGENCY=EXHAUSTIVE IS_BISULFITE_SEQUENCED=false MAX_OPEN_TEMP_FILES=8000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Sat Feb 23 11:59:41 EST 2019] Executing as [email protected] on Linux 2.6.32-358.2.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_20-b26; Deflater: Intel; Inflater: Intel; Picard version: 2.13.2-SNAPSHOT

I ran IntervalListTools on my interval list and got the following:

[Sun Feb 24 01:19:50 EST 2019] IntervalListTools INPUT=[/ifs/rcgroups/clindsley/chrisg/oAML/feb21/oAML_covered_hg19_picard_baits.interval_list] OUTPUT=/dev/null PADDING=0 UNIQUE=false SORT=true ACTION=CONCAT SCATTER_COUNT=1 INCLUDE_FILTERED=false BREAK_BANDS_AT_MULTIPLES_OF=0 SUBDIVISION_MODE=INTERVAL_SUBDIVISION INVERT=false VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Sun Feb 24 01:19:50 EST 2019] Executing as [email protected] on Linux 2.6.32-358.2.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_20-b26; Deflater: Intel; Inflater: Intel; Picard version: 2.13.2-SNAPSHOT
INFO 2019-02-24 01:19:50 IntervalListTools Produced 1715 intervals totalling 495567 unique bases.
[Sun Feb 24 01:19:50 EST 2019] picard.util.IntervalListTools done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=378011648

(I got similar output for my targets interval list; I'm just not showing it here.)
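IntervalListTools confirms that one list has content, but every interval list the pipeline passes to CollectHsMetrics (baits and targets) needs the same check. Below is a minimal Java sketch, assuming the usual Picard interval_list layout: SAM-style header lines beginning with `@`, then tab-separated contig, 1-based start, inclusive end, strand, and name. `IntervalListSummary` and `summarize` are hypothetical names, not Picard API, and the counts are in the spirit of the IntervalListTools summary rather than a guaranteed match.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper, not part of Picard: summarizes an interval_list body. */
public class IntervalListSummary {

    /** Returns {intervalCount, uniqueBases} for the lines of an interval_list file. */
    public static long[] summarize(List<String> lines) {
        List<String[]> intervals = new ArrayList<>();
        for (String line : lines) {
            // Header lines (@HD, @SQ, ...) start with '@'; blank lines are ignored.
            if (line.isEmpty() || line.startsWith("@")) continue;
            // Body columns: contig, 1-based start, inclusive end, strand, name.
            intervals.add(line.split("\t"));
        }
        // Sort by contig, then start, so overlapping intervals become adjacent.
        intervals.sort(Comparator.<String[], String>comparing(f -> f[0])
                .thenComparingLong(f -> Long.parseLong(f[1])));
        long uniqueBases = 0;
        String contig = null;
        long runStart = 0, runEnd = 0;
        for (String[] f : intervals) {
            long start = Long.parseLong(f[1]);
            long end = Long.parseLong(f[2]);
            if (f[0].equals(contig) && start <= runEnd + 1) {
                // Overlapping or abutting: extend the current merged run.
                runEnd = Math.max(runEnd, end);
            } else {
                // New run: count the finished one (closed interval, so +1).
                if (contig != null) uniqueBases += runEnd - runStart + 1;
                contig = f[0];
                runStart = start;
                runEnd = end;
            }
        }
        if (contig != null) uniqueBases += runEnd - runStart + 1;
        return new long[] {intervals.size(), uniqueBases};
    }

    public static void main(String[] args) throws IOException {
        long[] s = summarize(Files.readAllLines(Paths.get(args[0])));
        System.out.printf("%d intervals, %d unique bases%n", s[0], s[1]);
        if (s[0] == 0) {
            // A header-only file parses cleanly but has nothing to measure.
            System.err.println("No intervals found: is this file header-only?");
            System.exit(1);
        }
    }
}
```

On Java 11 or later it can be run directly, e.g. `java IntervalListSummary.java my.interval_list`. A file containing only a header reports 0 intervals and exits with a non-zero status, which makes it easy to use as a guard step in a pipeline.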

I searched for this error online and found little that applies to my case. The only references I found were from people with empty interval lists, which is not my problem.

One more piece of information that might help: the problem seems to be the header. I used the current interval list to run CollectHsMetrics on a different BAM with a different header, and it worked fine. I then took an old interval list that I used successfully for a different project (as recently as last week), combined it with the header for the current project, and got this error. However, I was able to run IntervalListTools on the current interval list (with the problematic header) and got non-zero output, as shown above (1715 intervals, etc.). So I'm stumped.

I run CollectHsMetrics all the time and have never run into this problem, so any thoughts would be much appreciated!

thanks
Chris

Answers

  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator admin

    Hi @cgibson

    Would you please post the entire error logs from your CollectHsMetrics and ValidateSamFile runs?

  • cgibson Member
    Sorry, the ValidateSamFile output I posted is the complete output, from the command line to the end; there is nothing else.

    For CollectHsMetrics, I thought I had posted the entire error log, but here is the full log, starting from the beginning of the task in our pipeline:

    START task_collectHSMetrics
    Fri Feb 22 08:34:15 EST 2019

    picard_bait_file is /ifs/rcgroups/clindsley/chrisg/oAML/feb21/oAML_covered_hg19_picard_baits.bed
    Fri Feb 22 08:34:15 EST 2019
    cat: /ifs/rcgroups/clindsley/chrisg/oAML/feb21/oAML_covered_hg19.bed: No such file or directory
    cat: /ifs/rcgroups/clindsley/chrisg/donorchip/feb11/oAML_covered_hg19.bed: No such file or directory
    08:34:17.875 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/ifs/rcgroups/clindsley/programs/picard.jar!/com/intel/gkl/native/libgkl_compression.so
    [Fri Feb 22 08:34:17 EST 2019] CollectHsMetrics BAIT_INTERVALS=[/ifs/rcgroups/clindsley/chrisg/oAML/feb21/oAML_covered_hg19_picard_baits.bed] BAIT_SET_NAME=oAML_covered_hg19.bed TARGET_INTERVALS=[/ifs/rcgroups/clindsley/chrisg/donorchip/feb11/oAML_covered_hg19_picard_targets.bed] INPUT=/ifs/rcgroups/clindsley/chrisg/oAML/feb21/bam/00282155.x.sort.dedup.realign.recal.bam OUTPUT=/ifs/rcgroups/clindsley/chrisg/oAML/feb21/coverage/00282155.x.summary_coverage METRIC_ACCUMULATION_LEVEL=[ALL_READS] PER_TARGET_COVERAGE=/ifs/rcgroups/clindsley/chrisg/oAML/feb21/coverage/00282155.x.bait_coverage REFERENCE_SEQUENCE=/ifs/rcgroups/clindsley/fa/Homo_sapiens_assembly19.fasta NEAR_DISTANCE=250 MINIMUM_MAPPING_QUALITY=20 MINIMUM_BASE_QUALITY=20 CLIP_OVERLAPPING_READS=true COVERAGE_CAP=200 SAMPLE_SIZE=10000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
    [Fri Feb 22 08:34:17 EST 2019] Executing as [email protected] on Linux 2.6.32-642.13.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_20-b26; Deflater: Intel; Inflater: Intel; Picard version: 2.13.2-SNAPSHOT
    INFO 2019-02-22 08:34:25 CollectHsMetrics Processed 1,000,000 records. Elapsed time: 00:00:06s. Time for last 1,000,000: 6s. Last read position: 3:37,626,900
    INFO 2019-02-22 08:34:30 CollectHsMetrics Processed 2,000,000 records. Elapsed time: 00:00:12s. Time for last 1,000,000: 5s. Last read position: 4:106,156,031
    INFO 2019-02-22 08:34:36 CollectHsMetrics Processed 3,000,000 records. Elapsed time: 00:00:18s. Time for last 1,000,000: 5s. Last read position: 6:29,894,104
    INFO 2019-02-22 08:34:42 CollectHsMetrics Processed 4,000,000 records. Elapsed time: 00:00:24s. Time for last 1,000,000: 5s. Last read position: 7:66,459,281
    INFO 2019-02-22 08:34:48 CollectHsMetrics Processed 5,000,000 records. Elapsed time: 00:00:29s. Time for last 1,000,000: 5s. Last read position: 9:6,669,026
    INFO 2019-02-22 08:34:54 CollectHsMetrics Processed 6,000,000 records. Elapsed time: 00:00:35s. Time for last 1,000,000: 5s. Last read position: 11:118,370,627
    INFO 2019-02-22 08:35:00 CollectHsMetrics Processed 7,000,000 records. Elapsed time: 00:00:41s. Time for last 1,000,000: 6s. Last read position: 13:32,914,202
    INFO 2019-02-22 08:35:06 CollectHsMetrics Processed 8,000,000 records. Elapsed time: 00:00:47s. Time for last 1,000,000: 5s. Last read position: 16:3,900,640
    INFO 2019-02-22 08:35:12 CollectHsMetrics Processed 9,000,000 records. Elapsed time: 00:00:53s. Time for last 1,000,000: 5s. Last read position: 17:1,564,064
    INFO 2019-02-22 08:35:17 CollectHsMetrics Processed 10,000,000 records. Elapsed time: 00:00:59s. Time for last 1,000,000: 5s. Last read position: 17:40,856,984
    INFO 2019-02-22 08:35:23 CollectHsMetrics Processed 11,000,000 records. Elapsed time: 00:01:05s. Time for last 1,000,000: 5s. Last read position: 20:62,320,917
    INFO 2019-02-22 08:35:29 CollectHsMetrics Processed 12,000,000 records. Elapsed time: 00:01:11s. Time for last 1,000,000: 5s. Last read position: X:76,763,885
    [Fri Feb 22 08:35:32 EST 2019] picard.analysis.directed.CollectHsMetrics done. Elapsed time: 1.25 minutes.
    Runtime.totalMemory()=663748608

    Exception in thread "main" java.lang.IllegalStateException: Could not find percentile: 0.2
    at htsjdk.samtools.util.Histogram.getPercentile(Histogram.java:327)
    at picard.analysis.directed.TargetMetricsCollector$PerUnitTargetMetricCollector.calculateTargetCoverageMetrics(TargetMetricsCollector.java:688)
    at picard.analysis.directed.TargetMetricsCollector$PerUnitTargetMetricCollector.finish(TargetMetricsCollector.java:626)
    at picard.metrics.MultiLevelCollector$AllReadsDistributor.finish(MultiLevelCollector.java:208)
    at picard.metrics.MultiLevelCollector.finish(MultiLevelCollector.java:324)
    at picard.analysis.directed.CollectTargetedMetrics.doWork(CollectTargetedMetrics.java:153)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:268)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:98)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:108)

    Please let me know if you need any more information. Since my first post, I tried recreating the interval file with BedToIntervalList (using my test input BAM as the reference), but the resulting interval list looks identical and I get the same error.

    thanks
    Chris
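    BedToIntervalList mainly has to shift coordinates: BED records are 0-based and half-open, while interval_list records are 1-based and closed, so the start gains one and the end stays numerically the same. The Java sketch below shows only that per-record conversion. `BedToIntervalSketch` is a hypothetical class, not the Picard source; its fallback name (`interval_N`) and `+` default strand are assumptions, and it skips the header (sequence dictionary) that the real tool writes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch, not the BedToIntervalList source: BED body -> interval_list body. */
public class BedToIntervalSketch {

    /** Converts one BED record to an interval_list body line. */
    public static String convert(String bedLine, int index) {
        String[] f = bedLine.split("\t");
        long bedStart = Long.parseLong(f[1]); // BED: 0-based, inclusive
        long bedEnd = Long.parseLong(f[2]);   // BED: 0-based, exclusive
        long start = bedStart + 1;            // interval_list: 1-based, inclusive
        long end = bedEnd;                    // interval_list: 1-based, inclusive (same number)
        // Assumed defaults when the optional BED name (col 4) or strand (col 6) is missing.
        String name = f.length > 3 ? f[3] : "interval_" + index;
        String strand = f.length > 5 && f[5].equals("-") ? "-" : "+";
        return f[0] + "\t" + start + "\t" + end + "\t" + strand + "\t" + name;
    }

    /** Converts all data lines, skipping blank, comment, track, and browser lines. */
    public static List<String> convertAll(List<String> bedLines) {
        List<String> out = new ArrayList<>();
        int i = 0;
        for (String line : bedLines) {
            if (line.isEmpty() || line.startsWith("#")
                    || line.startsWith("track") || line.startsWith("browser")) continue;
            out.add(convert(line, ++i));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> bed = List.of("1\t99\t199\tgeneA", "track name=x", "2\t0\t50");
        convertAll(bed).forEach(System.out::println);
    }
}
```

    Because only the start moves, a conversion that looks "identical" apart from start positions being one higher is expected; it says nothing about whether the pipeline is pointing at that file.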
  • cgibson Member
    Hang on, sorry: I just spotted an incorrect file path in the error log. Let me fix it and rerun the task. I'll update here if it works.

    thanks
    Chris
  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator admin

    @cgibson

    Sounds good. Let us know!

  • cgibson Member
    Accepted Answer
    OK, problem solved: I had provided a path to an incomplete targets interval list that contained only a header. So I did have an empty interval list after all, like the others who posted about this problem online. I should have read my log more carefully. Sorry!
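    Why an empty targets list surfaces as this particular exception: the stack trace shows `Histogram.getPercentile` being called from `calculateTargetCoverageMetrics`, with 0.2 as the requested percentile. A percentile lookup walks the histogram's cumulative counts until it reaches the requested fraction, so a histogram with no entries can never satisfy it. The Java sketch below illustrates that failure mode only; `SimpleHistogram` is a hypothetical class written for this thread, not htsjdk's `Histogram`.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical, simplified histogram; not htsjdk's Histogram class. */
public class SimpleHistogram {
    // Maps a coverage value (bin id) to how many observations had that coverage.
    private final TreeMap<Integer, Long> bins = new TreeMap<>();

    public void increment(int coverage) {
        bins.merge(coverage, 1L, Long::sum);
    }

    /** Smallest bin id whose cumulative fraction reaches the percentile (0 < p <= 1). */
    public int getPercentile(double percentile) {
        if (percentile <= 0 || percentile > 1) {
            throw new IllegalArgumentException("percentile must be in (0, 1]");
        }
        long total = 0;
        for (long count : bins.values()) total += count;
        long soFar = 0;
        for (Map.Entry<Integer, Long> e : bins.entrySet()) {
            soFar += e.getValue();
            if ((double) soFar / total >= percentile) return e.getKey();
        }
        // With a valid percentile this is only reached when the histogram is empty.
        throw new IllegalStateException("Could not find percentile: " + percentile);
    }

    public static void main(String[] args) {
        SimpleHistogram covered = new SimpleHistogram();
        for (int c : new int[] {10, 20, 30, 40, 50}) covered.increment(c);
        System.out.println(covered.getPercentile(0.2));

        SimpleHistogram empty = new SimpleHistogram(); // no targets, so no coverage data
        try {
            empty.getPercentile(0.2);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

    In this sketch the populated histogram returns 10 for the 0.2 percentile, while the empty one throws with the same message seen in the log, which is why a header-only interval list and this error tend to go together.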