Picard CalculateHsMetrics for targeted region larger than 2147483647bp.
I recently tried to obtain coverage metrics for a whole genome sequencing project (regular WGS, no hybrid selection), using Picard CalculateHsMetrics. Command line:
java -Xmx30g -XX:ParallelGCThreads=4 -XX:ConcGCThreads=4 -jar picard.jar CalculateHsMetrics I=something.bam O=something.HS_metrics BAIT_INTERVALS=genome.interval_list TARGET_INTERVALS=genome.interval_list VALIDATION_STRINGENCY=SILENT METRIC_ACCUMULATION_LEVEL=SAMPLE
This spends two hours calculating, and then fails in picard/analysis/directed/TargetMetricsCollector.java line 423:
final short[] depths = new short[(int) this.metrics.TARGET_TERRITORY]; // may not use entire array
Unfortunately, in this case this.metrics.TARGET_TERRITORY is the whole human genome, which is larger than Integer.MAX_VALUE (2147483647). The cast to int wraps around to a negative number, causing a java.lang.NegativeArraySizeException.
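To illustrate the wrap-around, here is a minimal standalone sketch. It is not Picard code: the class name and the territory value of 3,100,000,000 are assumptions, chosen only as a rough stand-in for a whole-genome interval list.

```java
public class TerritoryOverflow {
    // Narrowing a long to an int keeps only the low 32 bits,
    // so values above Integer.MAX_VALUE can come out negative.
    static int narrowed(long territory) {
        return (int) territory;
    }

    public static void main(String[] args) {
        long territory = 3_100_000_000L; // assumed, roughly human-genome sized
        int size = narrowed(territory);
        System.out.println("Integer.MAX_VALUE = " + Integer.MAX_VALUE);
        System.out.println("(int) territory   = " + size);
        try {
            short[] depths = new short[size];
            System.out.println("allocated " + depths.length + " entries");
        } catch (NegativeArraySizeException e) {
            // Same exception type as the one reported above.
            System.out.println("NegativeArraySizeException: " + e.getMessage());
        }
    }
}
```

Not every oversized value fails this way: a territory between 2^32 and 2^32 + 2^31 - 1 would wrap to a small positive number and allocate an array that is silently too short.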
Now, the size of a Java array is limited because arrays are indexed by an int, not a long. One possible fix is to split the depths array into multiple sub-arrays.
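A rough sketch of what I mean by sub-arrays is below. This is my own illustration, not something that exists in Picard: ChunkedShortArray and its chunkBits parameter are hypothetical names. It stores the values in fixed-size chunks and addresses them with a long index.

```java
public class ChunkedShortArray {
    private final int chunkBits;   // each chunk holds 2^chunkBits entries
    private final int chunkMask;
    private final long length;
    private final short[][] chunks;

    public ChunkedShortArray(long length, int chunkBits) {
        if (length < 0) throw new IllegalArgumentException("negative length");
        if (chunkBits < 1 || chunkBits > 30) throw new IllegalArgumentException("chunkBits must be 1..30");
        this.chunkBits = chunkBits;
        this.chunkMask = (1 << chunkBits) - 1;
        this.length = length;
        long chunkSize = 1L << chunkBits;
        int chunkCount = (int) ((length + chunkSize - 1) >>> chunkBits);
        this.chunks = new short[chunkCount][];
        for (int i = 0; i < chunkCount; i++) {
            // The last chunk may be shorter than the others.
            long remaining = length - ((long) i << chunkBits);
            chunks[i] = new short[(int) Math.min(chunkSize, remaining)];
        }
    }

    public long length() { return length; }

    public short get(long index) {
        checkIndex(index);
        return chunks[(int) (index >>> chunkBits)][(int) (index & chunkMask)];
    }

    public void set(long index, short value) {
        checkIndex(index);
        chunks[(int) (index >>> chunkBits)][(int) (index & chunkMask)] = value;
    }

    private void checkIndex(long index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + length);
        }
    }
}
```

With something like new ChunkedShortArray(territory, 26), each chunk would hold about 64 million shorts. The total memory is still two bytes per target base, so a whole human genome would need roughly 6 GB of heap for this array alone.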
Are there any plans to implement a fix to remove this limitation on CalculateHsMetrics? I'm only looking for mean read depth and coverage at 20X. Currently, I'm running two separate Picard runs with two halves of the genome, and then combining the results, which is a little bit of a shame.
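For anyone using the same workaround, the combining step can be sketched roughly as follows. It assumes the two interval lists do not overlap, and that the per-run values are read by hand from the TARGET_TERRITORY, MEAN_TARGET_COVERAGE and PCT_TARGET_BASES_20X columns of each metrics file. The class and method names are hypothetical, and the numbers in main are made up for illustration, not real results.

```java
public class CombineHalves {
    // Territory-weighted average of a per-run metric. This is valid for
    // per-base averages and fractions (mean coverage, fraction of bases
    // at >= 20X) when the runs cover disjoint sets of bases.
    static double weightedMean(long[] territories, double[] values) {
        if (territories.length != values.length) {
            throw new IllegalArgumentException("mismatched array lengths");
        }
        double weightedSum = 0.0;
        long totalTerritory = 0L;
        for (int i = 0; i < values.length; i++) {
            weightedSum += territories[i] * values[i];
            totalTerritory += territories[i];
        }
        if (totalTerritory == 0L) {
            throw new IllegalArgumentException("total territory is zero");
        }
        return weightedSum / totalTerritory;
    }

    public static void main(String[] args) {
        // Illustrative values only.
        long[] territory = {1_600_000_000L, 1_500_000_000L};
        double[] meanCoverage = {31.2, 29.8};
        double[] pct20x = {0.93, 0.91};
        System.out.println("combined mean coverage: " + weightedMean(territory, meanCoverage));
        System.out.println("combined fraction at 20X: " + weightedMean(territory, pct20x));
    }
}
```

Other metrics, such as medians or fold-enrichment values, cannot be combined by a simple weighted average like this.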