How can I run PathSeqPipelineSpark on a "normal" machine (even just a laptop) with multiple CPU cores?
Thanks for your interest in using PathSeq. PathSeqPipelineSpark (and in fact any GATK Spark tool) can be run on your local machine by omitting the Spark arguments; see the first Usage example in the tool documentation. If you want to control how many CPU cores are used, you can specify it like this:
--spark-runner LOCAL --spark-master local[4]
This would use 4 cores. For more information, see the GATK documentation on Spark tools (note that this doc cites the GATK4-beta --sparkMaster argument instead of the --spark-master argument used in the new GATK4 release).
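As a rough illustration of the Spark master setting described above, here is a minimal Python sketch that assembles (but does not run) a local PathSeqPipelineSpark command. It assumes the gatk launcher is on your PATH. The tool-specific arguments (--input input_reads.bam) are placeholders to be replaced with the ones from the tool's Usage example. The helper names and the choice of leaving one core free are illustrative only and are not part of GATK. In Spark's local mode, local[N] runs N worker threads, while local[*] uses every available core.

```python
import os
import shlex


def spark_master(cores=None):
    """Return a Spark local-mode master string such as 'local[4]'.

    With cores=None, fall back to 'local[*]', which tells Spark to use
    every core it can see.
    """
    if cores is None:
        return "local[*]"
    if cores < 1:
        raise ValueError("cores must be a positive integer")
    return f"local[{cores}]"


def build_pathseq_command(tool_args, cores=None):
    """Assemble (but do not run) a local PathSeqPipelineSpark command line.

    tool_args is a placeholder: pass the tool-specific arguments copied
    from the Usage example in the tool documentation.
    """
    return (
        ["gatk", "PathSeqPipelineSpark"]
        + list(tool_args)
        + ["--spark-runner", "LOCAL", "--spark-master", spark_master(cores)]
    )


if __name__ == "__main__":
    # Illustrative choice: leave one core free for the rest of the system.
    cores = max(1, (os.cpu_count() or 1) - 1)
    cmd = build_pathseq_command(["--input", "input_reads.bam"], cores=cores)
    # shlex.join quotes 'local[N]' so shells such as zsh do not treat the
    # brackets as a glob pattern.
    print(shlex.join(cmd))
```

Passing cores=None produces local[*]. Quoting the master string is a small but useful detail when pasting the printed command into a shell that expands square brackets.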
Note that you will need the reference files built from the host and pathogen references. Pre-built references are available for download on the GATK Resource Bundle FTP server in /bundle/beta/PathSeq.
Additionally, a WDL is now available in the master branch on GitHub in /scripts/pathseq/WDL, along with a README file that describes in more detail how it works.
You see this error because Mutect2 is not a Spark tool, so you cannot use the Spark options with it.
No, there is no single built-in way to parallelize Mutect2. Parallelization is only available for certain tools, because for the others it generates errors due to the way their algorithms are designed. Hence we are being very cautious and provide separate Spark tools only for the ones that can be parallelized safely.