If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We appreciate your help!
Test-drive the GATK tools and Best Practices pipelines on Terra
Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
HaplotypeCaller 4.beta.6 gVCF performance
Hi, ever since the 4.beta.4 release, I've noticed a significant increase in the memory requirements and execution time of HaplotypeCaller in gVCF mode. I tested the 4.beta.2 and 4.beta.6 version of HaplotypeCaller with a NA12878 BAM, aligned with BWA 0.7.13 with approximately 30x coverage. 4.beta.2 completed after roughly 5h with 2GB of memory, while 4.beta.6 completed after roughly 30h with 15GB of memory. 4.beta.6 failed with an out of memory exception when given less memory.
Both versions were ran with the same settings (--interval_set_rule UNION --genotyping_mode DISCOVERY --createOutputVariantIndex --emitRefConfidence GVCF) and parallelized on intervals from a custom BED file.
From my understanding of the release notes, the versions from 4.beta.4 onwards have a bug fix that corrects the results of HaplotypeCaller in gVCF mode. Is the performance difference to be expected?