I've been trying to run joint-discovery-gatk4 (copied from workspace help-gatk/Germline-SNPs-Indels-GATK4-b37) on a sample set of 254 WGS BAMs and have been experiencing many failures with subtle error messages. In my first few attempts, the failure messages indicated that task call-ImportGVCFs ran out of disk space, so I gradually increased the "medium disk" size from 200 GB to 3 TB, at which point joint-discovery-gatk4 no longer failed in task call-ImportGVCFs, but it did fail in a later task.
By subtle, I mean that the job monitor page indicates that the job failed, but shows no failure messages or clues other than the Submission ID and the scary total cost. A search of the Submission ID output bucket revealed hundreds of shards for tasks call-HardFilterAndMakeSitesOnlyVcf and call-ImportGVCFs that may or may not have failed. Is there an easier way to determine what went wrong than clicking through or downloading hundreds of shard log files and looking for failure messages? So far my scan of the log files has not revealed any failure messages.
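To make the question concrete, here is a minimal sketch of the kind of automated scan that would replace clicking through shard logs by hand. It assumes the shard logs have already been copied from the output bucket into a local directory (for example with gsutil). The function name `scan_logs`, the keyword list, and the file suffixes are all illustrative assumptions, not part of FireCloud or the workflow.

```python
import re
from pathlib import Path

# Assumed failure signatures (illustrative only); extend as needed.
FAILURE_PATTERN = re.compile(
    r"error|exception|killed|no space left on device", re.IGNORECASE
)


def scan_logs(root, pattern=FAILURE_PATTERN, suffixes=(".log", ".txt")):
    """Return (path, line_number, line) for every matching line under root.

    root is a local directory holding downloaded shard logs; the suffixes
    are a guess at how the log files are named.
    """
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        # errors="replace" keeps the scan going if a log has odd bytes.
        with path.open(errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if pattern.search(line):
                    hits.append((str(path), number, line.rstrip()))
    return hits
```

Calling `scan_logs("shard_logs")` on a hypothetical local copy returns one tuple per suspicious line, so hundreds of shards reduce to a short list of files worth opening. A keyword match is only a hint: a shard can fail without logging any of these words, which may be what is happening here.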
Since it isn't clear what went wrong, it isn't clear whether this should be posted to the Firehose or GATK forum.