Two Arrays as input - scatter over one, use all of the other in one call

Hi,

I have 2 Arrays: samples and contigs. I used HaplotypeCaller to call all samples and SelectVariants to split my g.vcfs by contig. Now I want to use CombineGVCFs to combine the g.vcfs of each contig. The problem is that I don't understand how to work with more than one array at the same time. My idea was:

scatter (contig in contigs) {
        call mergeCalls {
            input:
                samples = samples,
                contig = contig,
                reference = reference
        }
}

task mergeCalls {
    Array[String] samples
    String contig
    String reference

    command <<<

        /opt/gatk/4.0.4.0/gatk CombineGVCFs \
            -R ${reference} \
            -V ${sep="_${contig}.g.vcf -V " samples}_${contig}.g.vcf \
            -O ${contig}_merged.g.vcf

    >>>
}

The -V line should expand to something like this:
-V S1_1.g.vcf -V S2_1.g.vcf -V S3_1.g.vcf

WDL seems unable to work with variables inside the "sep=" part. Is there another way to do it?

Best regards and Thanks,
Daniel

Answers

  • dbecker, Munich, Member ✭✭

    Thanks @EADG,

    I'm looking forward to seeing how my performance will change. Is there a manual somewhere about such non-WDL solutions? I had a hard time figuring out where to put { }, for example. The variable I introduce in the bash loop does not seem to be allowed to have those brackets, which makes string concatenation quite difficult.
    Or even better... will there be a WDL way to do operations like these in the future?

    Best,
    Daniel

  • EADG, Kiel, Member ✭✭✭

    Hi @dbecker,

    You are welcome. Sorry, I only wrote a quick answer without testing the solution myself. If you change the code as follows, it should work.

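    Array[String] samples
    String contig
    String reference

    command <<<
        # Build one "-V <sample>_<contig>.g.vcf" argument per sample in bash.
        variantString=""
        contigString=${contig}
        for sample in ${sep=' ' samples}; do
            # Close and reopen the quotes after $sample; a bare $sample_
            # would be read by bash as a variable named "sample_".
            variant="-V $sample""_$contigString.g.vcf "
            variantString=$variantString$variant
        done
        /opt/gatk/4.0.4.0/gatk CombineGVCFs \
            -R ${reference} \
            $variantString \
            -O ${contig}_merged.g.vcf
    >>>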
    

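    The reason why {} does not work in the bash loop has to do with how WDL/Cromwell processes the script: ${...} inside the command block is treated as a WDL placeholder and resolved before bash ever sees it, so a bash-only variable written as ${sample} is looked up as a WDL variable.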
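
Since the braces are off limits for bash variables here, the concatenation can be done by closing and reopening the quotes instead. The following is only a minimal sketch in plain bash, outside of WDL: the values of `samples` and `contigString` are made-up stand-ins for what Cromwell would substitute for `${sep=' ' samples}` and `${contig}`.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for the values Cromwell would substitute into the script.
samples="S1 S2 S3"
contigString="1"

variantString=""
for sample in $samples; do
    # "$sample""_..." ends the variable name right after "sample".
    # Writing $sample_ instead would make bash look up a variable called
    # "sample_", which is unset, so the sample name would silently vanish.
    variant="-V $sample""_$contigString.g.vcf "
    variantString=$variantString$variant
done

# Each sample contributes one "-V <sample>_<contig>.g.vcf " chunk.
echo "$variantString"
```

With these example values the loop builds `-V S1_1.g.vcf -V S2_1.g.vcf -V S3_1.g.vcf` (followed by a trailing space), which matches the expansion asked for in the question. The `.` after `$contigString` already ends that variable name, so no extra quoting is needed there.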

    I am sure that this solution will cause no, or at most a minor, performance issue.

    As for the other questions, @ChrisL might have something to say ;)

    Greets EADG
