What causes BaseRecalibratorSpark to run for a long time and end up failing with memory errors?
Hi, GATK team,
I am testing BaseRecalibrator in GATK 4.5 beta. When running in LOCAL mode, it finishes pretty fast. However, when I run BaseRecalibratorSpark in SPARK mode, it runs for a long time and eventually fails with memory errors like:
'java.lang.OutOfMemoryError: GC overhead limit exceeded'
When I look at the stdout of the executors, it contains many messages like this:
14:17:19.753 INFO KnownSitesCache - Number of variants read: 37000001
I tested HaplotypeCallerSpark on the same SPARK cluster, and it finishes pretty quickly too.