
How do I run GNU parallel in WDL?

I need to run this in WDL:

command {
    seq 0 4 | \
    parallel --eta --halt 2 \
    python /home/bin/make_examples.zip \
    --mode calling \
    --ref ${ReferenceFasta} \
    --reads ${InputBam} \
    --examples ${Examples}.tfrecord@4.gz \
    --regions "1:1-90,010,000" \
    --task {}
}
output {
    File ExamplesOutput1 = "${Examples}.tfrecord1.gz"
    File ExamplesOutput2 = "${Examples}.tfrecord2.gz"
    File ExamplesOutput3 = "${Examples}.tfrecord3.gz"
    File ExamplesOutput4 = "${Examples}.tfrecord4.gz"
}

Without parallelization, and leaving out the WDL part, it would be run like this:
python bin/make_examples.zip --mode calling --ref reference.fasta --reads input.bam --examples output.tfrecord.gz

The problem, as far as I can tell, is the {} in "--task {}". The error message is:
ERROR: Finished parsing without consuming all tokens.
output {
^

Is it possible to reconcile this conflict between parallel and WDL?

Best Answers

  • oskarvoskarv BergenMember
    Accepted Answer

    It worked. Here's the updated code in case anyone else has the same issue:

    command<<<
    bash <<CODE
    seq 0 3 | \
    parallel --eta --halt 2 \
    python /home/bin/make_examples.zip \
    --mode calling \
    --ref ${ReferenceFasta} \
    --reads ${InputBam} \
    --examples ${Examples}.tfrecord@4.gz \
    --regions "1:1-90,010,000" \
    --task {}
    CODE
    >>>
    output {
        Array[File] ExamplesOutput1 = glob("${Examples}.tfrecord-*.gz")
    }
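
A note on the change from `seq 0 4` to `seq 0 3` in the code above: the `@4` in the `--examples` value appears to request four shards, and the task indices start at 0, so `seq` has to emit 0 through 3 rather than 0 through 4. The sketch below only prints the `--task` arguments such a run would need. It does not call GNU parallel or make_examples, and `task_args` is a hypothetical helper name used purely for illustration.

```shell
#!/usr/bin/env bash
# Sketch: list the --task values needed for a given number of shards.
# task_args is a hypothetical helper, not part of parallel, WDL, or make_examples.
task_args() {
    local shards="$1"
    # seq 0 (shards - 1) yields one zero-based task index per shard.
    seq 0 "$((shards - 1))" | while read -r task; do
        # GNU parallel would substitute each index for {} in "--task {}".
        echo "--task ${task}"
    done
}

# Four shards, matching the @4 in the --examples argument.
task_args 4
```

With four shards this lists tasks 0 through 3, which is why `seq 0 4` in the original question would have started a fifth task.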
    
