BQSR on RNA-seq data: reference index file not found

Hi,
I am using GATK 4.0.3 to discover variants in sea bass. When I run BaseRecalibrator on its own with a single file, it works. But when I run it from a WDL script that uses scatter-gather, BaseRecalibrator does not find the reference FASTA index file, even though the folder contains both the FASTA file and its index. My script is adapted from the GATK best practices for RNA-seq (https://software.broadinstitute.org/gatk/documentation/article?id=4067).
Do you have any idea why it fails when it runs inside a WDL script with scatter-gather?
Best regards,
EliseG
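
A likely cause, offered as a hedged sketch: when Cromwell runs a WDL task, only the files declared as `File` inputs of that task are localized into its execution directory. In the workflow posted in this thread, `refIndex` and `refDict` are declared at the workflow level but never passed to any task, so BaseRecalibrator receives `labrax.fasta` without `labrax.fasta.fai` and `labrax.dict` beside it, and GATK 4 needs both to open a FASTA reference. Outside WDL the tool reads the FASTA from its original folder, where both companion files sit, which would explain why the standalone run works. The sketch below (WDL draft-2 syntax, matching the posted script) declares the index and dictionary as task inputs. `RefIndex`, `RefDict` and `BamSorted` are names chosen here for illustration, and the BAM and its index are typed as single `File` inputs because each scatter iteration passes one sample.

```wdl
# Sketch only: the reference index and dictionary are declared as task inputs
# so that Cromwell localizes them together with the FASTA.
# RefIndex, RefDict and BamSorted are illustrative names, not from the original script.

workflow DataCleanupGATK {
  File gatk
  File inputSamplesFile
  Array[Array[File]] inputSamples = read_tsv(inputSamplesFile)
  File refFasta
  File refIndex
  File refDict
  File variationSites

  scatter (sample in inputSamples) {
    call BaseRecalibrator {
      input:
        sampleName=sample[0],
        GATK=gatk,
        RefFasta=refFasta,
        RefIndex=refIndex,
        RefDict=refDict,
        VariationSites=variationSites,
        BamSorted=sample[1],
        BamIndex=sample[2]
    }
  }
}

task BaseRecalibrator {
  File GATK
  File RefFasta
  # Not used in the command line; declared only so the .fai is localized next to RefFasta
  File RefIndex
  # Same for the sequence dictionary (.dict)
  File RefDict
  String sampleName
  File BamSorted
  File BamIndex
  File VariationSites

  command {
    java -jar ${GATK} \
      BaseRecalibrator \
      -R ${RefFasta} \
      -I ${BamSorted} \
      --known-sites ${VariationSites} \
      -O ${sampleName}_marked_duplicates_sorted_recal_data.table
  }

  output {
    File BaseRecal = "${sampleName}_marked_duplicates_sorted_recal_data.table"
  }
}
```

The inputs JSON posted in the thread can be reused unchanged, since it already supplies `DataCleanupGATK.refIndex` and `DataCleanupGATK.refDict`.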

Answers

  • Sheila (Broad Institute, Moderator)

    @EliseG
    Hi EliseG,

    Can you post your inputs JSON and your WDL?

    Thanks,
    Sheila

  • EliseG

    Hi Sheila,
    Of course. This is my JSON file:
    {
    "DataCleanupGATK.refFasta": "/home/egueret/Stage_UM_ISEM/Donnees_CRECHE/ref/labrax.fasta",
    "DataCleanupGATK.variationSites": "/home/egueret/Stage_UM_ISEM/Donnees_CRECHE/inputs/Final_list_57907_SNPs.recode.vcf",
    "DataCleanupGATK.inputSamplesFile": "/home/egueret/Stage_UM_ISEM/Donnees_CRECHE/inputs/inputTSV_gatk.txt",
    "DataCleanupGATK.refIndex": "/home/egueret/Stage_UM_ISEM/Donnees_CRECHE/ref/labrax.fasta.fai",
    "DataCleanupGATK.refDict": "/home/egueret/Stage_UM_ISEM/Donnees_CRECHE/ref/labrax.dict",
    "DataCleanupGATK.gatk": "/home/egueret/tools/gatk-4.0.3.0/gatk.jar"
    }

    And this is my WDL:

    workflow DataCleanupGATK {

      File gatk
      File inputSamplesFile
      Array[Array[File]] inputSamples = read_tsv(inputSamplesFile)
      File refFasta
      File refIndex
      File refDict
      File variationSites

      scatter (sample in inputSamples) {
        call BaseRecalibrator {
          input:
            sampleName=sample[0],
            RefFasta=refFasta,
            GATK=gatk,
            VariationSites=variationSites,
            BamSorteds=sample[1],
            BamIndex=sample[2]
        }
        call ApplyBQSR {
          input:
            sampleName=sample[0],
            RefFasta=refFasta,
            BaseRecals=BaseRecalibrator.BaseRecal,
            BamSorteds=sample[1],
            BamIndex=sample[2],
            GATK=gatk
        }
        call AnalyseCovariate {
          input:
            sampleName=sample[0],
            BaseRecals=BaseRecalibrator.BaseRecal,
            GATK=gatk
        }
      }
    }

    task BaseRecalibrator {
      File GATK
      File RefFasta
      String sampleName
      Array[File] BamSorteds
      Array[File] BamIndex
      File VariationSites

      command {
        java -jar ${GATK} \
          BaseRecalibrator \
          -I ${sep=" -I " BamSorteds} \
          -R ${RefFasta} \
          -OBI true \
          --known-sites ${VariationSites} \
          -O ${sampleName}_marked_duplicates_sorted_recal_data.table
      }
      output {
        File BaseRecal = "${sampleName}_marked_duplicates_sorted_recal_data.table"
      }
    }

    task ApplyBQSR {
      File GATK
      File RefFasta
      String sampleName
      Array[File] BamSorteds
      Array[File] BamIndex
      Array[File] BaseRecals

      command {
        java -jar ${GATK} \
          ApplyBQSR \
          -R ${RefFasta} \
          -I ${sep=" -I " BamSorteds} \
          --bqsr-recal-file ${sep=" --bqsr-recal-file " BaseRecals} \
          -O ${sampleName}_marked_duplicates_sorted_recalibrated.bam
      }
      output {
        File BamRecal = "${sampleName}_marked_duplicates_sorted_recalibrated.bam"
      }
    }

    task AnalyseCovariate {
      File GATK
      Array[File] BaseRecals
      String sampleName

      command {
        java -jar ${GATK} \
          AnalyzeCovariates \
          -bqsr ${sep=" -bqsr " BaseRecals} \
          -plots ${sampleName}_recalibration.pdf \
          -csv ${sampleName}_recalibration.csv
      }
      output {
        File Plot = "${sampleName}_recalibration.pdf"
        File Csv = "${sampleName}_recalibration.csv"
      }
    }

  • EliseG

    It's not really readable, sorry.

    Elise

  • EliseG

    Hi,
    That worked!
    Thank you very much @shlee and @jsoto for your help.
    Best regards,
    EliseG
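
ApplyBQSR in the posted script also reads the reference but declares neither the `.fai` nor the `.dict` file. Assuming Cromwell localizes only the files a task declares as `File` inputs, it would most likely fail on the missing index in the same way once BaseRecalibrator succeeds. Below is a hedged sketch of the task with both companion files declared, in the same draft-2 syntax. `RefIndex`, `RefDict`, `BamSorted` and `BaseRecal` are illustrative names, and single-file types are used because each scatter iteration handles one sample and one recalibration table.

```wdl
# Sketch only: ApplyBQSR with the reference companion files declared as inputs.
# RefIndex, RefDict, BamSorted and BaseRecal are illustrative names.

task ApplyBQSR {
  File GATK
  File RefFasta
  # Declared only so that labrax.fasta.fai is localized next to RefFasta
  File RefIndex
  # Declared only so that labrax.dict is localized next to RefFasta
  File RefDict
  String sampleName
  File BamSorted
  File BamIndex
  # One recalibration table per sample within the scatter
  File BaseRecal

  command {
    java -jar ${GATK} \
      ApplyBQSR \
      -R ${RefFasta} \
      -I ${BamSorted} \
      --bqsr-recal-file ${BaseRecal} \
      -O ${sampleName}_marked_duplicates_sorted_recalibrated.bam
  }

  output {
    File BamRecal = "${sampleName}_marked_duplicates_sorted_recalibrated.bam"
  }
}
```

Inside the scatter, the matching call would pass `RefFasta=refFasta`, `RefIndex=refIndex`, `RefDict=refDict`, `BamSorted=sample[1]`, `BamIndex=sample[2]` and `BaseRecal=BaseRecalibrator.BaseRecal`, together with `sampleName` and `GATK` as before.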
