Test-drive the GATK tools and Best Practices pipelines on Terra
Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
Many simultaneous GATKs taking down Lustre storage
We have a cluster pointing at a Lustre scratch storage system on a DDN device. Over the last couple weeks, we've noticed a whole bunch of nodes going down when people were running hundreds of simultaneous GATK jobs that were reading/writing to/from that scratch system. It occurred with at least two users in different groups running newish versions of GATK (possibly 2.6-5 or 2.7-4, but we're still analyzing things - and it looks like one user had a nightly from January 20.) It looks like some of the runs with with -nct 12 and others had no -nct option. Obviously this would cause lots of I/O, but we haven't seen similar crashes from other programs doing heavy I/O on our system. When small batches of these same jobs are rerun, they seem to finish OK, so it's probably not a function of the particular input data.
Unfortunately, a search for lustre on the forum just finds a bunch of messages with /lustre file paths. So can anyone tell me if there are reports of this kind of error with GATK?
Harvard Medical School