HaplotypeCaller 4.beta.6 gVCF performance
Hi, ever since the 4.beta.4 release, I've noticed a significant increase in the memory requirements and execution time of HaplotypeCaller in gVCF mode. I tested the 4.beta.2 and 4.beta.6 versions of HaplotypeCaller on a NA12878 BAM aligned with BWA 0.7.13, at approximately 30x coverage. 4.beta.2 finished in roughly 5h using 2GB of memory, while 4.beta.6 took roughly 30h and needed 15GB. With less memory, 4.beta.6 failed with an out-of-memory exception.
Both versions were run with the same settings (--interval_set_rule UNION --genotyping_mode DISCOVERY --createOutputVariantIndex --emitRefConfidence GVCF) and parallelized over intervals from a custom BED file.
From my understanding of the release notes, versions from 4.beta.4 onwards include a bug fix that corrects the results of HaplotypeCaller in gVCF mode. Is this performance difference expected?