Error during bqsr step.

Hi there, got another problem when doing bqsr step.
It says my BAM is not indexed, but I've tried to run the bqsr step on terminal with the same files without any problems.
My code is below.
Any input is appreciated!

`workflow OUWES_singlecase {
File bwa
File picard
File gatk
File RefFasta
File Refdict
File RefIndex
File Refbwt
File Refsa
File Refpac
File Refamb
File Refann
File dbsnp
File genomeindels
File inputBAML1R1
File inputBAML1R2
File inputBAML2R1
File inputBAML2R2
File inputBAML3R1
File inputBAML3R2
File inputBAML4R1
File inputBAML4R2
String sampleName

Step.1 Alignment

call bwamem as L1{
input:
    bwa=bwa,
    RefFasta=RefFasta,
    RefIndex=RefIndex,
    Refbwt=Refbwt,
    Refsa=Refsa,
    Refpac=Refpac,
    Refamb=Refamb,
    Refann=Refann,
    inputBAM1=inputBAML1R1,
    inputBAM2=inputBAML1R2,
    lane="L1",
    sampleName=sampleName
    }

call bwamem as L2{
input:
    bwa=bwa,
    RefFasta=RefFasta,
    RefIndex=RefIndex,
    Refbwt=Refbwt,
    Refsa=Refsa,
    Refpac=Refpac,
    Refamb=Refamb,
    Refann=Refann,
    inputBAM1=inputBAML2R1,
    inputBAM2=inputBAML2R2,
    lane="L2",
    sampleName=sampleName
    }

call bwamem as L3{
input:
    bwa=bwa,
    RefFasta=RefFasta,
    RefIndex=RefIndex,
    Refbwt=Refbwt,
    Refsa=Refsa,
    Refpac=Refpac,
    Refamb=Refamb,
    Refann=Refann,
    inputBAM1=inputBAML3R1,
    inputBAM2=inputBAML3R2,
    lane="L3",
    sampleName=sampleName
    }

call bwamem as L4{
input:
    bwa=bwa,
    RefFasta=RefFasta,
    RefIndex=RefIndex,
    Refbwt=Refbwt,
    Refsa=Refsa,
    Refpac=Refpac,
    Refamb=Refamb,
    Refann=Refann,
    inputBAM1=inputBAML4R1,
    inputBAM2=inputBAML4R2,
    lane="L4",
    sampleName=sampleName
    }

Step2.Merge multiple lane data into one SAM

call mergesam {
    input:
        picard=picard,
        L1SAM=L1.rawsam,
        L2SAM=L2.rawsam,
        L3SAM=L3.rawsam,
        L4SAM=L4.rawsam,
        sampleName=sampleName
        }

Step3.Sort SAM by coordinate, convert to BAM

call sortSAMtoBAM {
    input:
        picard=picard,
        inputSAM=mergesam.merged_file,
        sampleName=sampleName
        }

Step4.Mark Duplicates

call markduplicate {
    input:
        picard=picard,
        inputBAM=sortSAMtoBAM.sortedbam,
        sampleName=sampleName
        }

Step5.build BAM index

call buildbamindex {
    input:
        picard=picard,
        inputBAM=markduplicate.dedupbam,
        sampleName=sampleName
        }

Step6.1BQSR_1

call bqsr_1 {
    input:
        gatk=gatk,
        inputBAM=markduplicate.dedupbam,
        RefFasta=RefFasta,
        knownSite1=dbsnp,
        knownSite2=genomeindels,
        sampleName=sampleName,
        Refdict=Refdict,
        RefIndex=RefIndex,
        bamindex=buildbamindex.bamindex
        }

}

task bqsr_1{
File gatk
File inputBAM
String sampleName
File knownSite1
File knownSite2
File RefFasta
File Refdict
File RefIndex
File bamindex

command{
    java -jar ${gatk} -T BaseRecalibrator \
    -R ${RefFasta} \
    -I ${inputBAM} \
    -knownSites ${knownSite1} \
    -knownSites ${knownSite2} \
    -o ${sampleName}_recal_data.table
    }
output{
    File bqsr1table = "${sampleName}_recal_data.table"
    }

}

task buildbamindex {
File picard
File inputBAM
String sampleName

command{
    java -jar ${picard} BuildBamIndex \
    I=${inputBAM} \
    O=${sampleName}_dedup.bam.bai
    }
output{
    File bamindex = "${sampleName}_dedup.bam.bai"
    }

}

task markduplicate {
File picard
File inputBAM
String sampleName

command{
    java -jar ${picard} MarkDuplicates \
    I=${inputBAM} \
    O=${sampleName}_dedup.bam \
    M=${sampleName}_metrics.txt
    }
output {
    File dedupbam = "${sampleName}_dedup.bam"
    }

}

task sortSAMtoBAM {
File picard
File inputSAM
String sampleName

command{
    java -jar ${picard} SortSam \
    I=${inputSAM} \
    O=${sampleName}_sorted.bam \
    SORT_ORDER=coordinate
    }
output {
    File sortedbam = "${sampleName}_sorted.bam"
    }

}

task bwamem {
File bwa
File RefFasta
File RefIndex
File Refbwt
File Refsa
File Refpac
File Refamb
File Refann
File inputBAM1
File inputBAM2
String lane
String sampleName
command {
${bwa} mem -M -R '@RG\tID:${sampleName}${lane}\tSM:${sampleName}\tLB:test\tPL:ILLUMINA' -t 8 ${RefFasta} ${inputBAM1} ${inputBAM2} > ${sampleName}${lane}.sam
}
output {
File rawsam = "${sampleName}_${lane}.sam"
}

}

task mergesam {
File picard
File L1SAM
File L2SAM
File L3SAM
File L4SAM
String sampleName
command {
java -jar ${picard} MergeSamFiles \
I=${L1SAM} \
I=${L2SAM} \
I=${L3SAM} \
I=${L4SAM} \
O=${sampleName}_merged.sam
}

output {
    File merged_file = "${sampleName}_merged.sam"
    }

}`

Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/home/cytolab/Software/WDL/Sand/cromwell-executions/OUWES_singlecase/72680509-732b-45ed-b0ea-6467ef3dfeb2/call-bqsr_1/execution/tmp.kCZg6g
INFO 15:53:33,969 HelpFormatter - ----------------------------------------------------------------------------------
INFO 15:53:33,970 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.8-0-ge9d806836, Compiled 2017/07/28 21:26:50
INFO 15:53:33,971 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
INFO 15:53:33,971 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk
INFO 15:53:33,971 HelpFormatter - [Mon Sep 25 15:53:33 CDT 2017] Executing on Linux 4.10.0-35-generic amd64
INFO 15:53:33,971 HelpFormatter - OpenJDK 64-Bit Server VM 1.8.0_131-8u131-b11-2ubuntu1.16.04.3-b11
INFO 15:53:33,973 HelpFormatter - Program Args: -T BaseRecalibrator -R /home/cytolab/Software/WDL/Sand/cromwell-executions/OUWES_singlecase/72680509-732b-45ed-b0ea-6467ef3dfeb2/call-bqsr_1/inputs/home/cytolab/Software/WDL/Sand/ucsc.hg19.fasta -I /home/cytolab/Software/WDL/Sand/cromwell-executions/OUWES_singlecase/72680509-732b-45ed-b0ea-6467ef3dfeb2/call-bqsr_1/inputs/home/cytolab/Software/WDL/Sand/cromwell-executions/OUWES_singlecase/72680509-732b-45ed-b0ea-6467ef3dfeb2/call-markduplicate/execution/170148DB1_dedup.bam -knownSites /home/cytolab/Software/WDL/Sand/cromwell-executions/OUWES_singlecase/72680509-732b-45ed-b0ea-6467ef3dfeb2/call-bqsr_1/inputs/home/cytolab/Software/GATK/reference/dbsnp_138.hg19.vcf -knownSites /home/cytolab/Software/WDL/Sand/cromwell-executions/OUWES_singlecase/72680509-732b-45ed-b0ea-6467ef3dfeb2/call-bqsr_1/inputs/home/cytolab/Software/GATK/reference/1000G_phase1.indels.hg19.sites.vcf -o 170148DB1_recal_data.table
INFO 15:53:33,974 HelpFormatter - Executing as [email protected] on Linux 4.10.0-35-generic amd64; OpenJDK 64-Bit Server VM 1.8.0_131-8u131-b11-2ubuntu1.16.04.3-b11.
INFO 15:53:33,975 HelpFormatter - Date/Time: 2017/09/25 15:53:33
INFO 15:53:33,975 HelpFormatter - ----------------------------------------------------------------------------------
INFO 15:53:33,975 HelpFormatter - ----------------------------------------------------------------------------------
ERROR StatusLogger Unable to create class org.apache.logging.log4j.core.impl.Log4jContextFactory specified in jar:file:/home/cytolab/Software/WDL/Sand/cromwell-executions/OUWES_singlecase/72680509-732b-45ed-b0ea-6467ef3dfeb2/call-bqsr_1/inputs/home/cytolab/Software/GATK/GenomeAnalysisTK.jar!/META-INF/log4j-provider.properties
ERROR StatusLogger Log4j2 could not find a logging implementation. Please add log4j-core to the classpath. Using SimpleLogger to log to the console...
INFO 15:53:34,063 GenomeAnalysisEngine - Deflater: IntelDeflater
INFO 15:53:34,064 GenomeAnalysisEngine - Inflater: IntelInflater
INFO 15:53:34,064 GenomeAnalysisEngine - Strictness is SILENT
INFO 15:53:34,145 GenomeAnalysisEngine - Downsampling Settings: No downsampling
INFO 15:53:34,148 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 15:53:34,164 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.02
INFO 15:57:00,547 RMDTrackBuilder - Writing Tribble index to disk for file /home/cytolab/Software/WDL/Sand/cromwell-executions/OUWES_singlecase/72680509-732b-45ed-b0ea-6467ef3dfeb2/call-bqsr_1/inputs/home/cytolab/Software/GATK/reference/dbsnp_138.hg19.vcf.idx
INFO 15:58:03,182 RMDTrackBuilder - Writing Tribble index to disk for file /home/cytolab/Software/WDL/Sand/cromwell-executions/OUWES_singlecase/72680509-732b-45ed-b0ea-6467ef3dfeb2/call-bqsr_1/inputs/home/cytolab/Software/GATK/reference/1000G_phase1.indels.hg19.sites.vcf.idx
INFO 15:58:08,982 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files

ERROR ------------------------------------------------------------------------------------------
ERROR A USER ERROR has occurred (version 3.8-0-ge9d806836):
ERROR
ERROR This means that one or more arguments or inputs in your command are incorrect.
ERROR The error message below tells you what is the problem.
ERROR
ERROR If the problem is an invalid argument, please check the online documentation guide
ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
ERROR
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions https://software.broadinstitute.org/gatk
ERROR
ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
ERROR
ERROR MESSAGE: Invalid command line: Cannot process the provided BAM/CRAM file(s) because they were not indexed. The GATK does offer limited processing of unindexed BAM/CRAMs in --unsafe mode, but this feature is unsupported -- use it at your own risk!
ERROR ------------------------------------------------------------------------------------------
Tagged:

Best Answers

  • RuchiRuchi admin
    Accepted Answer

    Hey @wenfu_li,

    The GATK error is claiming the input bam isn't indexed, but you have a task buildbamindex which creates the index. When the bam and bam index are localized for task bqsr_1, they exist in separate directories.
    When the the bam and bam index in bqsr_1 are saved in different directories as they are the outputs from two different tasks:

    • Bam index path: .../call-buildBamIndex/execution/${sampleName}_dedup.bam.bai
    • Bam path: .../call-markduplicate/execution/${sampleName}_dedup.bam1

    I believe you need the bam and bam index to be present in the same directory. You could try tweaking bqsr_1's command and linking the bam and bam index to be under the same dir:

    command{
    ln ${inputBam} input_bam
    ln ${bamindex} input_bam_index
    
        java -jar ${gatk} -T BaseRecalibrator \
        -R ${RefFasta} \
        -I input_bam \
        -knownSites ${knownSite1} \
        -knownSites ${knownSite2} \
        -o ${sampleName}_recal_data.table
        }
    output{
        File bqsr1table = "${sampleName}_recal_data.table"
        }
    

    Hope this works!

  • RuchiRuchi admin
    Accepted Answer

    hey @wenfu_li,

    So ln is a bash command to make soft-links, more docs on that here. Are you looking to find docs on best practices for writing tasks/workflows in WDL or docs for managing files using bash?

Answers

  • RuchiRuchi Member, Broadie, Moderator, Dev admin
    Accepted Answer

    Hey @wenfu_li,

    The GATK error is claiming the input bam isn't indexed, but you have a task buildbamindex which creates the index. When the bam and bam index are localized for task bqsr_1, they exist in separate directories.
    When the the bam and bam index in bqsr_1 are saved in different directories as they are the outputs from two different tasks:

    • Bam index path: .../call-buildBamIndex/execution/${sampleName}_dedup.bam.bai
    • Bam path: .../call-markduplicate/execution/${sampleName}_dedup.bam1

    I believe you need the bam and bam index to be present in the same directory. You could try tweaking bqsr_1's command and linking the bam and bam index to be under the same dir:

    command{
    ln ${inputBam} input_bam
    ln ${bamindex} input_bam_index
    
        java -jar ${gatk} -T BaseRecalibrator \
        -R ${RefFasta} \
        -I input_bam \
        -knownSites ${knownSite1} \
        -knownSites ${knownSite2} \
        -o ${sampleName}_recal_data.table
        }
    output{
        File bqsr1table = "${sampleName}_recal_data.table"
        }
    

    Hope this works!

  • Thanks @Ruchi, yes i use ln ${inputBam} input.bam ln ${bamindex} input.bam.bai and it looks working now.
    Btw is there a document i could read to know the command like this "ln" i could use?
    Thanks!

  • RuchiRuchi Member, Broadie, Moderator, Dev admin
    Accepted Answer

    hey @wenfu_li,

    So ln is a bash command to make soft-links, more docs on that here. Are you looking to find docs on best practices for writing tasks/workflows in WDL or docs for managing files using bash?

  • @Ruchi , Thanks for the link.
    I think i understand now it is bash command which also been using under terminal, never thought about that but it is very helpful.

Sign In or Register to comment.