How do I run PathSeqPipelineSpark on a local machine?

How can I run PathSeqPipelineSpark on a "normal" machine (even just a laptop) with multiple CPU cores?

Answers

  • markw (Member, Broadie, Moderator, Dev admin)
    edited January 2018

    Hello Yinga,

    Thanks for your interest in using PathSeq. PathSeqPipelineSpark (and in fact any GATK Spark tool) can be run on your local machine by omitting the Spark arguments. See the first usage example in the tool documentation here. If you want to specify how many CPU cores to use, you can do so like this:

    ./gatk PathSeqPipelineSpark \
      ... \
      -- \
      --spark-runner LOCAL --spark-master 'local[4]'
    

    would use 4 cores. For more information, see this document (note that it cites the GATK4-beta --sparkMaster argument rather than the --spark-master argument used in the GATK4 release).
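    As a minimal sketch for choosing the core count at run time, assuming bash on a Linux machine where nproc (GNU coreutils) reports the available cores, a small helper can build the --spark-master value. The function name local_master is hypothetical and not part of GATK; Spark itself also accepts local[*], which uses all logical cores.

    ```shell
    # Hypothetical helper (not part of GATK): print a Spark master URL that runs
    # locally with the given number of cores. With no argument it uses every
    # core reported by nproc (GNU coreutils); "*" is passed through as local[*].
    local_master() {
      local cores="${1:-$(nproc)}"
      # Accept only a positive integer or a literal "*"
      if [[ "$cores" != "*" && ! "$cores" =~ ^[1-9][0-9]*$ ]]; then
        echo "invalid core count: $cores" >&2
        return 1
      fi
      printf 'local[%s]\n' "$cores"
    }
    ```

    For example, --spark-master "$(local_master 4)" passes local[4], and --spark-master "$(local_master)" uses every core nproc reports. Keeping the value quoted stops the shell from treating the brackets as a glob pattern.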

    Note that you will need the reference files built from the host and pathogen references. Pre-built references are available for download from the GATK Resource Bundle FTP server in /bundle/beta/PathSeq.

    Additionally, a WDL is now available in the master branch on GitHub in /scripts/pathseq/WDL, along with a README that describes how it works in more detail.

  • jorgez (Member)
    Hello Markw,

    I am trying to follow your instructions for selecting the number of local cores, but I get the following error:

    A USER ERROR has occurred: spark-master is not a recognized option

    In my case, I am running Mutect2 (GATK v4.0.10.0) rather than PathSeqPipelineSpark, like this:

    ```
    gatk Mutect2 \
    -R GRCh38_full_analysis_set_plus_decoy_hla.fa \
    --tumor-sample HCC1143_tumor \
    --input hcc1143_N_subset50K.bam \
    --input hcc1143_T_subset50K.bam \
    --output mutect2.vcf \
    -- --spark-runner LOCAL --spark-master local[1]
    ```

    Any help selecting the number of local cores would be appreciated.

    Thanks
    Jorge
  • bhanuGandham (Member, Administrator, Broadie, Moderator admin)

    Hi @jorgez

    You see this error because Mutect2 is not a Spark tool, so the Spark options do not apply to it.
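    Because the Spark options are only recognized by Spark tools, a script that runs several GATK tools can add them selectively. This is a minimal sketch, assuming bash and assuming that a tool name ending in "Spark" identifies a Spark tool (true for PathSeqPipelineSpark, but a naming-convention heuristic rather than a guarantee); run_gatk_local is a hypothetical helper, not part of GATK.

    ```shell
    # Hypothetical wrapper (not part of GATK): run a GATK tool, adding the local
    # Spark arguments only when the tool name ends in "Spark" (GATK's naming
    # convention for its Spark tools). Assumes the gatk launcher is on PATH.
    # Usage: run_gatk_local TOOL CORES [tool arguments...]
    run_gatk_local() {
      local tool="$1" cores="$2"
      shift 2
      if [[ "$tool" == *Spark ]]; then
        gatk "$tool" "$@" -- --spark-runner LOCAL --spark-master "local[${cores}]"
      else
        # Non-Spark tools such as Mutect2 reject the Spark options, so omit them
        # (CORES is ignored in this case)
        gatk "$tool" "$@"
      fi
    }
    ```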

  • jorgez (Member)
    Hi,

    Thanks so much for letting me know.

    Is there, then, a built-in way to parallelise Mutect2?

    Jorge
  • bhanuGandham (Member, Administrator, Broadie, Moderator admin)

    Hi @jorgez

    No, there is no single built-in way to parallelize Mutect2. Only certain tools are parallelized, because for the others the way their algorithms are designed leads to errors when they run in parallel. Hence we are being very cautious and provide separate Spark tools only for the tools where this works.
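    Although there is no single built-in option, a common workaround is scatter-gather: run Mutect2 separately on non-overlapping intervals with -L, in parallel, and then merge the per-interval VCFs with MergeVcfs. This is a minimal sketch under stated assumptions, not an official recipe: it assumes bash and that gatk is on PATH, and it reuses the file and sample names from the command above; scatter_mutect2 and the interval names are hypothetical.

    ```shell
    # Hypothetical scatter-gather sketch (not a built-in GATK feature): run one
    # Mutect2 job per interval in the background, then merge the per-interval
    # VCFs with MergeVcfs. Assumes the gatk launcher is on PATH; the reference,
    # sample and BAM names are copied from the command above.
    # Usage: scatter_mutect2 OUT.vcf INTERVAL [INTERVAL...]
    scatter_mutect2() {
      local out="$1"
      shift
      local -a pids=()
      local -a merge_args=()
      local iv part pid
      for iv in "$@"; do
        part="${out%.vcf}.${iv}.vcf"
        gatk Mutect2 \
          -R GRCh38_full_analysis_set_plus_decoy_hla.fa \
          --tumor-sample HCC1143_tumor \
          --input hcc1143_N_subset50K.bam \
          --input hcc1143_T_subset50K.bam \
          -L "$iv" \
          --output "$part" &
        pids+=("$!")
        merge_args+=(-I "$part")
      done
      # Wait for each job; stop before merging if any interval failed
      for pid in "${pids[@]}"; do
        wait "$pid" || return 1
      done
      gatk MergeVcfs "${merge_args[@]}" -O "$out"
    }
    ```

    For example, scatter_mutect2 mutect2.vcf chr1 chr2 chr3 starts three Mutect2 jobs at once, so pass no more intervals than you have cores to spare. In later GATK 4 releases, Mutect2 also writes a .stats file next to each output, and these must be combined with MergeMutectStats before running FilterMutectCalls on the merged VCF.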

  • jorgez (Member)

    Hi,

    I understand.

    Thanks
    Jorge
