modifying the processing-for-variant-discovery...inputs.json

Dear GATK Team,

I have just successfully converted my FASTQ files to uBAM and want to run the data preprocessing locally, using the universal uBAM to analysis-ready BAM pipeline.
I realized that I have to call the WDL file with the JSON file as an input.

I have some questions about editing the json file:

I. How should the flowcell_unmapped_bams_list be formatted? I have no access to the example "NA12878_24RG_small.txt".

II. I was able to download “gs://gatk-legacy-bundles/b37/human_g1k_v37_decoy.fasta” from the Resource Bundle under the Download section. However, I could not find the companion files (human_g1k_v37_decoy.fasta.sa, .amb, .bwt, .ann and .pac) in the Resource Bundle on the homepage.

III. I was able to set the paths for picard and gatk, but what is gotc? ("PreProcessingForVariantDiscovery_GATK4.gotc_path": "/usr/gitc/")

IV. There are options like GATK4.agg_small_disk, as well as medium disk and large disk. Where do I choose which disk size the program should use?

V. I installed gatk locally using gatkcondaenv.yml, without Docker. Can I run the whole pipeline without Docker? The JSON file has a section of options called DOCKERS.

Thanks for your help

zmk

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Hi @zmk, sorry for the late response. I assume you're looking at https://github.com/gatk-workflows/gatk4-data-processing ?

    1. The list of ubams is just one filepath per line. You can find it here: https://console.cloud.google.com/storage/browser/gatk-test-data/wgs_ubam/NA12878_24RG

    2. Those are BWA index files; you can generate them with bwa index.

    3. "gotc" is short for "genomes on the cloud", the name of our internal project for genome processing. Currently the example json uses the docker container from that project.

    4. Those options control allocation of resources when run on cloud platforms; if you're running locally you don't need to specify them; you can either leave them as is (and they will be ignored) or you can remove them from the json.

    5. Yes, you should be able to run it without docker, though you may need to adapt some of the paths to executables based on your local setup.
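
As a rough illustration of answers 1, 2 and 4, here is a minimal Python sketch of the local preparation steps: writing the uBAM list with one filepath per line, checking which BWA index files still need to be generated with bwa index, and dropping the disk-size options from the inputs JSON. The function names, the "*.bam" file pattern and the rule that cloud-only keys end in "_disk" are assumptions made for this example; they are not part of the workflow itself.

```python
from pathlib import Path

# Suffixes of the BWA index files named in the question; `bwa index` creates them.
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def write_ubam_list(ubam_dir, list_path, pattern="*.bam"):
    """Write one absolute uBAM path per line and return the paths written.

    The "*.bam" pattern is an assumption; adjust it to your file naming.
    """
    paths = sorted(str(p.resolve()) for p in Path(ubam_dir).glob(pattern))
    Path(list_path).write_text("".join(p + "\n" for p in paths))
    return paths


def missing_bwa_index_files(fasta_path):
    """Return the expected BWA index files that do not exist next to the FASTA."""
    fasta_path = str(fasta_path)
    return [fasta_path + suffix
            for suffix in BWA_INDEX_SUFFIXES
            if not Path(fasta_path + suffix).exists()]


def drop_disk_size_keys(inputs):
    """Return a copy of the inputs dict without keys ending in '_disk'.

    Assumption: the cloud-only disk options (e.g. agg_small_disk) all end in
    '_disk'. Leaving them in place is also fine for local runs.
    """
    return {key: value for key, value in inputs.items()
            if not key.endswith("_disk")}
```

A typical use would be to call write_ubam_list on the directory holding your uBAMs, point flowcell_unmapped_bams_list in the JSON at the resulting text file, and run bwa index on the reference FASTA if missing_bwa_index_files returns anything.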

  • Angry_Panda (Member)

    Dear GATK Team,

    I am using GATK4 with the same Best Practices workflow: https://github.com/gatk-workflows/gatk4-data-processing
    Thanks for the download source for the ubams list.

    I am trying to run it on a remote server provided by my company, and I cannot use docker on it.
    I have already run it successfully on my local machine with docker, but on the remote server I am totally lost. There are so many docker-related settings in the .json and .wdl files, such as "##_COMMENT5": "DOCKERS" in the json file and the runtime sections in the wdl file. I don't know how to change them.

    Do you have any tutorial material or, even better, an example script for this?
    Thanks for your help.

    BR, Angry_Panda

  • jgentry (Member, Broadie, Dev)

    Hi @Angry_Panda - would you be able to use uDocker? If so, we can show you how to configure Cromwell to do this.

  • Angry_Panda (Member)

    Oh @jgentry, I think I can use uDocker. Thanks very much, and sorry for my late reply; I only just noticed that my question had been answered. I am impressed by how efficient your team is.
    Can you show me how to configure Cromwell for it?
    Thank you very much.
