Update: July 26, 2019
This section of the forum is now closed; we are working on a new support model for WDL that we will share here shortly. For Cromwell-specific issues, see the Cromwell docs and post questions on GitHub.

modifying the processing-for-variant-discovery...inputs.json

Dear GATK Team,

I just successfully converted my fastq files to uBAM and wanted to run data pre-processing locally using the uBAM-to-analysis-ready-BAM universal pipeline.
I realized that I have to call the WDL file with the JSON file as input.

I have some questions about editing the json file:

I. How should the flowcell_unmapped_bams_list be formatted? I don't have access to the example "NA12878_24RG_small.txt".

II. I was able to download "gs://gatk-legacy-bundles/b37/human_g1k_v37_decoy.fasta" from the Resource Bundle under the Download section, but I could not find the accompanying files ending in human_g1k_v37_decoy.fasta.sa, .amb, .bwt, .ann, and .pac in the Resource Bundle on the homepage.

III. I was able to set the paths for picard and gatk, but what is gotc? ("PreProcessingForVariantDiscovery_GATK4.gotc_path": "/usr/gitc/")

IV. There are options like GATK4.agg_small_disk, medium disk, and large disk. Where do I specify which size the program should use?

V. I installed GATK locally using gatkcondaenv.yml, without needing Docker. Can I run the whole pipeline without Docker? The JSON file has some options under DOCKERS.

Thanks for your help

zmk

Answers

  • Geraldine_VdAuwera (Member, Administrator, Broadie; Cambridge, MA)

    Hi @zmk, sorry for the late response. I assume you're looking at https://github.com/gatk-workflows/gatk4-data-processing ?

    1. The list of ubams is just one filepath per line (an example is sketched after this list). You can find it here: https://console.cloud.google.com/storage/browser/gatk-test-data/wgs_ubam/NA12878_24RG

    2. Those are BWA index files; you can generate them with bwa index (see the command below).

    3. "gotc" is short for "genomes on the cloud", the name of our internal project for genome processing. Currently the example json uses the docker container from that project.

    4. Those options control resource allocation when running on cloud platforms. If you're running locally you don't need to specify them; you can either leave them as-is (they will be ignored) or remove them from the json.

    5. Yes, you should be able to run it without docker, though you may need to adapt some of the paths to executables based on your local setup (see the json sketch below).
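
    For point 1, a minimal sketch of what the flowcell_unmapped_bams_list file might contain, one uBAM path per line (the file names and bucket paths here are placeholders; local filesystem paths also work when running locally):

        gs://my-bucket/ubams/NA12878_flowcell1_lane1.unmapped.bam
        gs://my-bucket/ubams/NA12878_flowcell1_lane2.unmapped.bam
        /data/ubams/NA12878_flowcell2_lane1.unmapped.bam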
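
    For point 2, the index can be generated with a single command, assuming bwa is installed and on your PATH (older bwa versions may need the -a bwtsw option for genome-sized references):

        # writes human_g1k_v37_decoy.fasta.{amb,ann,bwt,pac,sa} next to the reference
        bwa index human_g1k_v37_decoy.fasta

    If you also need the .fai and .dict companion files, samtools faidx and Picard CreateSequenceDictionary will create those.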
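
    For points 3 and 5, a sketch of how the tool-path entries in the inputs json might look when the tools are installed locally rather than taken from the gotc container (the key names should match whatever your copy of the json already contains; the local paths below are placeholders):

        "PreProcessingForVariantDiscovery_GATK4.gotc_path": "/opt/genomics-tools/",
        "PreProcessingForVariantDiscovery_GATK4.gatk_path": "/opt/gatk-4/gatk",
        "PreProcessingForVariantDiscovery_GATK4.picard_path": "/opt/picard/picard.jar",

    gotc_path should point at a directory containing the bundled tools (bwa, samtools, etc.), mirroring /usr/gitc/ inside the container. If you are not using docker at all, you will likely also need to drop the docker attributes from the runtime blocks in the wdl, as discussed further down this thread.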

  • Angry_Panda (Member)

    Dear GATK Team,

    I am using GATK4 with the same best practice: https://github.com/gatk-workflows/gatk4-data-processing
    Thanks for the link to the ubams list.

    I am trying to run it on a remote server provided by my company, and I cannot use Docker on it.
    I already ran it successfully locally with Docker, but with the remote server I am totally lost. There are so many Docker-related settings in the .json and .wdl files, such as "##_COMMENT5": "DOCKERS" in the JSON file and the runtime sections in the WDL file, and I don't know how to change them.

    Do you have any teaching material, or even better a script, for this?
    Thanks for your help.

    BR, Angry_Panda

  • jgentry (Member, Broadie, Dev)

    Hi @Angry_Panda - would you be able to use uDocker? If so, we can show you how to configure Cromwell to do this.

  • Angry_Panda (Member)

    @jgentry said:
    Hi @Angry_Panda - would you be able to use uDocker? If so, we can show you how to configure Cromwell to do this.

    Oh @jgentry, I think I can use uDocker. Thanks very much, and sorry for my late reply; I just noticed that my question got answered. I'm impressed by your team's efficiency.
    Can you teach me how to do it?
    Thank you very much.
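
    For anyone else stuck at this point, a rough sketch of the kind of Cromwell backend configuration this refers to, assuming udocker is installed and on the PATH (this follows the general pattern of Cromwell's configurable-backend docs and is a starting point rather than a tested config):

        # cromwell.conf (excerpt) - hypothetical local backend that launches containers via udocker
        backend {
          default = Local
          providers {
            Local {
              actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
              config {
                run-in-background = true
                # hand the task script to udocker instead of "docker run"
                submit-docker = """
                  udocker run -v ${cwd}:${docker_cwd} ${docker} ${job_shell} ${docker_script}
                """
              }
            }
          }
        }

    Cromwell then needs to be started with -Dconfig.file=/path/to/cromwell.conf so that it picks this configuration up.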

  • YannLG (Member)
    Hi,
    Has a solution been found or posted somewhere for replacing Docker with Singularity in the WDL workflow?

    To give a bit more details:
    I saw the post in discussion/11897/support-for-singularity where someone posted a full configuration for running Cromwell with Singularity. However, I wonder whether there could be a simpler option. Given that Singularity seems to easily support pulling Docker containers, it may be possible to simply replace the "docker run" calls with "singularity run" or exec.
    An example can be found at learningpatterns.me/posts-output/2018-04-05-gatk-singularity-docker-job-array/, but it does not use the WDL workflow and simply runs GATK within the container.

    For my particular application I am interested in running the "gatk4-data-processing" workflow, which apparently pulls three different containers during processing (more precisely, one each time a task runs, I think). Which part of the WDL should I modify to replace "docker run" with "singularity run" when the workflow is run by Cromwell, if that's possible?

    Thank you for your help,
    Yann
  • oliversinn (Member; Guatemala)
    I'm running the workflow locally. I'm having problems setting my program paths. Where should I place them, and what should I put there?
  • YannLG (Member)
    Hi,

    To answer my own question: as mentioned, I followed the insights from discussion/11897/support-for-singularity
    and also the Cromwell configuration documentation, which I had not seen before posting:
    cromwell.readthedocs.io/en/stable/backends/SLURM/
    The main trick is to replace gatk_cmd and the other commands with a Singularity call such as 'singularity run -B /my_path/GATK4.simg gatk', and to comment out or remove all docker attributes from the runtime sections of the tasks defined in the .wdl file (a sketch follows below).
    Two options are then possible:
    - submit Cromwell on the main node and let it submit jobs to the scheduler via the SLURM config
    - submit Cromwell on a compute node and let it manage jobs on that node.
    I think the first option might result in Cromwell being killed if long-running jobs like Cromwell are not allowed on the main node.

    Yann
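
    To make the trick Yann describes concrete, in the inputs json the gatk entry might become something like this (the image and bind paths are placeholders):

        "PreProcessingForVariantDiscovery_GATK4.gatk_path": "singularity exec -B /my_data /my_path/GATK4.simg gatk",

    and in each task of the .wdl the runtime block simply loses its docker attribute, for example:

        runtime {
          # docker: docker_image   <- commented out / removed for Singularity runs
          memory: "8 GB"
          cpu: 1
        }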