Two Arrays as input - scatter over one, use all of the other in one call

Hi,

I have two arrays: samples and contigs. I used HaplotypeCaller to call all samples and SelectVariants to split my g.vcfs by contig. Now I want to use CombineGVCFs to combine the g.vcfs of each contig. The problem is that I don't understand how to work with more than one array at the same time. My idea was:

scatter (contig in contigs) {
        call mergeCalls {
            input:
                samples = samples,
                contig = contig,
                reference = reference
        }
}

task mergeCalls {
    Array[String] samples
    String contig
    String reference

    command <<<

        /opt/gatk/4.0.4.0/gatk CombineGVCFs \
            -R ${reference} \
            -V ${sep="_${contig}.g.vcf -V " samples}_${contig}.g.vcf \
            -O ${contig}_merged.g.vcf

    >>>
}

The -V line should expand to something like this:
-V S1_1.g.vcf -V S2_1.g.vcf -V S3_1.g.vcf

WDL seems unable to work with variables inside the "sep=" part. Is there another way to do it?

Best regards and Thanks,
Daniel

Answers

  • dbecker (Munich, Member)

    Thanks @EADG,

    I'm looking forward to seeing how my performance will change. Is there a manual somewhere about such non-WDL solutions? I had a hard time figuring out where to put { }, for example. The variable I introduce in the bash loop does not seem to be allowed to have those brackets, which makes string concatenation quite difficult.
    Or even better: will there be a WDL way to do operations like these in the future?

    Best,
    Daniel

  • EADG (Kiel, Member)

    Hi @dbecker,

    You are welcome. Sorry, I only wrote a quick answer without testing the solution myself. If you change the code as follows, it should work.

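    Array[String] samples
    String contig
    String reference
    command <<<
        # Build the "-V <sample>_<contig>.g.vcf" arguments in bash.
        variantString=""
        contigString=${contig}
        for sample in ${sep=' ' samples}; do
            # Close the quotes after $sample; otherwise bash reads $sample_ as the variable name.
            variant="-V $sample""_$contigString.g.vcf "
            variantString=$variantString$variant
        done
        /opt/gatk/4.0.4.0/gatk CombineGVCFs \
            -R ${reference} \
            $variantString \
            -O ${contig}_merged.g.vcf
    >>>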
    

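    The reason the {} do not work in the bash loop has to do with how WDL/Cromwell processes the command block: as far as I know, ${...} is read as a WDL placeholder before bash ever sees the script, so bash variables inside the command have to be written without braces.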
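
    The quoting in the loop can be tried outside of WDL. Below is a minimal sketch in plain bash, assuming three hypothetical samples (S1, S2, S3) and contig 1 in place of the values Cromwell would substitute. It also shows what goes wrong when $sample is followed directly by an underscore.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for the values WDL would substitute into the command.
samples="S1 S2 S3"   # what ${sep=' ' samples} would expand to
contigString="1"     # what ${contig} would expand to

variantString=""
for sample in $samples; do
    # Closing the quotes right after $sample ends the variable name there,
    # so no braces are needed to separate it from the underscore.
    variant="-V $sample""_$contigString.g.vcf "
    variantString=$variantString$variant
done

# Pitfall: without the break, bash looks up a variable called "sample_",
# which is unset, so the sample name silently disappears.
broken="-V $sample_$contigString.g.vcf"

printf '%s\n' "$variantString"
printf '%s\n' "$broken"
```

    The first line printed is the full -V list (with a trailing space, as in the loop above); the second line is missing the sample name, which is why the quotes are closed before the underscore.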

    I am sure that you will have no, or at most a small, performance issue with this solution.

    For the other things, @ChrisL might say some words ;)

    Greets EADG
