HaplotypeCaller (gvcf mode) on whole genome vs chromosome by chromosome
I'm currently running my first real use of GATK. I was worried about running HaplotypeCaller on whole genomes, given some of the reports I've seen on these forums about how long it can take. In contrast, I was pleasantly surprised to find that with the current GATK it is proceeding well (~7 day estimate on dog WGS). But it seems it could be much faster if I divided it up by chromosome with the -L flag.
I see that the advice is to not use the -L flag for whole genome analysis. But the wording in that link seems open: the flag is not necessary, but if it would help efficiency it might be worthwhile.
I've found a related question on the forums here, but it seems the discrepancy discussed in that thread is suspected to be due to downsampling and not actually the result of a chromosome-by-chromosome use of HaplotypeCaller.
Again, I'm content with a ~7 day run time in order to take proper care of our data. I wouldn't want to sacrifice power or accuracy for a shorter runtime, but if there is really no trade-off, a chromosomal approach would be even better. So I'm curious if there is a downside to partitioning the HaplotypeCaller step by chromosome?
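For concreteness, here is a minimal sketch of the kind of per-chromosome split I have in mind. It only builds the command lines and does not run anything. It assumes GATK4-style arguments (`gatk HaplotypeCaller` with `-R`, `-I`, `-L`, `-ERC GVCF`, `-O`); older GATK versions use a different invocation. The file names, the `chr1`...`chr38`/`chrX` contig names, and the helper `per_contig_commands` are all hypothetical placeholders; the real contig names would come from the reference's sequence dictionary.

```python
from typing import List


def per_contig_commands(reference: str, bam: str,
                        contigs: List[str], out_prefix: str) -> List[List[str]]:
    """Build one HaplotypeCaller command per contig.

    Assumes GATK4-style arguments; adjust for other GATK versions.
    """
    commands = []
    for contig in contigs:
        commands.append([
            "gatk", "HaplotypeCaller",
            "-R", reference,                          # reference FASTA
            "-I", bam,                                # input alignments
            "-L", contig,                             # restrict calling to this contig
            "-ERC", "GVCF",                           # emit reference confidence (gVCF mode)
            "-O", f"{out_prefix}.{contig}.g.vcf.gz",  # one gVCF per contig
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical dog contig names: 38 autosomes plus X.
    contigs = [f"chr{i}" for i in range(1, 39)] + ["chrX"]
    for cmd in per_contig_commands("reference.fasta", "sample.bam", contigs, "sample"):
        print(" ".join(cmd))
```

Each printed command could then be submitted as a separate job. Any contig left out of the list (for example unplaced scaffolds or the mitochondrial sequence) would not be called at all, so the list needs to cover everything of interest.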