No plots generated by the BaseRecalibrator walker

alexpenson (Posts: 4; Member)

I cannot produce BQSR plots, although I can open the grp file with gsa.read.gatkreport.

Here's the command:

java -Xmx1g -jar $shares/GenomeAnalysisTK-2.3-6-gebbba25/GenomeAnalysisTK.jar \
-T BaseRecalibrator \
-I ./0.reorder.bam \
-R $shares/ftp.broadinstitute.org/bundle/2.3/hg19/ucsc.hg19.fasta \
-knownSites $shares/ftp.broadinstitute.org/bundle/2.3/hg19/dbsnp_137.hg19.vcf \
-BQSR ./0.reorder.bam.recal.grp \
-o ./0.reorder.bam.post_recal.grp \
--plot_pdf_file ./0.reorder.bam.post_recal.grp.pdf \
-L chr1:1-1000 \
-l DEBUG \
--intermediate_csv_file ./0.reorder.bam.post_recal.grp.csv

##### ERROR stack trace 
java.lang.NullPointerException
        at org.broadinstitute.sting.utils.Utils.join(Utils.java:286)
        at org.broadinstitute.sting.utils.recalibration.RecalUtils.writeCSV(RecalUtils.java:450)
        at org.broadinstitute.sting.utils.recalibration.RecalUtils.generateRecalibrationPlot(RecalUtils.java:394)
        at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.generatePlots(BaseRecalibrator.java:474)
        at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.onTraversalDone(BaseRecalibrator.java:464)
        at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.onTraversalDone(BaseRecalibrator.java:112)
        at org.broadinstitute.sting.gatk.executive.Accumulator$StandardAccumulator.finishTraversal(Accumulator.java:129)
        at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:97)
        at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:281)
        at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:237)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:147)
        at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:91)

It looks like the csv file is not being produced.

Thanks!

Answers

  • Geraldine_VdAuwera (Posts: 7,836; Administrator, GATK Dev admin)

    All the data needed for the plots should be in the gatkreport; what do you see in that file? Have you tried running without specifying the intermediate csv file?

    Geraldine Van der Auwera, PhD

  • alexpenson (Posts: 4; Member)

    The grp file looks good. I tried to use BQSR.R to make the plot, but it seems to require a csv file.
    If I specify the intermediate csv file, then it contains only the header line. If not, then it is not produced.
    Thanks
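
A quick way to confirm the "header only" symptom described above is to count the lines after the first one. This is a minimal sketch: the helper name `csv_data_rows` is made up for illustration, and it assumes the intermediate file is ordinary text with one header line.

```shell
#!/bin/sh
# Count data rows (non-blank lines after the header) in a CSV file.
# Prints 0 when the file is missing or holds only a header line.
# csv_data_rows is a hypothetical helper, not part of GATK.
csv_data_rows() {
    if [ ! -f "$1" ]; then
        echo 0
        return
    fi
    # NR > 1 skips the header; NF > 0 skips blank lines.
    awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }' "$1"
}

# Example, using the path from the command at the top of the thread:
# csv_data_rows ./0.reorder.bam.post_recal.grp.csv
```

A result of 0 means the plotting script has nothing to draw, whatever the grp file contains.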

  • Geraldine_VdAuwera (Posts: 7,836; Administrator, GATK Dev admin)

    What error do you get when you try using BQSR.R to make the plot?

    Geraldine Van der Auwera, PhD

  • alexpenson (Posts: 4; Member)

    After making some changes for ggplot2 v0.9.3:
    opts( -> theme( and theme_ -> element_

    I get:

    Error in distributeGraphRows(list(a, b, c), c(1, 1, 1)) :
      object 'a' not found

    It looks like a, b and c are created based on the lines in the csv file, and I don't have any.

    Thanks
    Alex
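
The two renames mentioned in this post (`opts(` to `theme(`, and `theme_` to `element_`) can be applied mechanically. The sketch below is one possible way to do it, not the edit the poster actually made: the function name `update_ggplot2_calls` is hypothetical, and it deliberately renames only the element constructors `theme_text`, `theme_blank`, `theme_line` and `theme_rect`, because complete themes such as `theme_bw()` keep their names in newer ggplot2.

```shell
#!/bin/sh
# Rewrite old-style ggplot2 calls read from stdin to the newer names:
#   opts(                -> theme(
#   theme_text( and co.  -> element_text( and co.
# update_ggplot2_calls is a hypothetical helper for illustration.
update_ggplot2_calls() {
    sed -E \
        -e 's/(^|[^A-Za-z0-9_.])opts\(/\1theme(/g' \
        -e 's/theme_(text|blank|line|rect)\(/element_\1(/g'
}

# Example: write an updated copy and review the differences by hand.
# update_ggplot2_calls < BQSR.R > BQSR.updated.R
# diff BQSR.R BQSR.updated.R
```

The first pattern only matches `opts(` when it is not part of a longer identifier, so a name like `myopts(` is left alone. These renames do not address the `object 'a' not found` error, which comes from the empty csv file.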

  • Geraldine_VdAuwera (Posts: 7,836; Administrator, GATK Dev admin)

    OK, that makes sense. Can you please run your original command again with -l DEBUG but without specifying the intermediate csv filename, then post the console output? I need to know whether the csv file gets created properly with the internal default and, if not, at exactly what point it fails.

    Geraldine Van der Auwera, PhD

  • alexpenson (Posts: 4; Member)

    Here is the output for the command above minus --intermediate_csv_file ./0.reorder.bam.post_recal.grp.csv
    Thanks!

    Attachment: BQSR_output.txt (txt, 97K)

  • Geraldine_VdAuwera (Posts: 7,836; Administrator, GATK Dev admin)
    Answer ✓

    Alright, what's failing is the method that creates the data lines to be written to the csv file. It's an extremely simple operation, so there must be something wrong with your data. Since the interval you're running on is extremely short, there may not be any valid data at all in the part of the table that fails to write. Try running again from the first step of recalibration with a much longer interval, e.g. 20:10000000-20000000.

    Geraldine Van der Auwera, PhD
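
Rerunning "from the first step" with a longer interval could look like the sketch below. It reuses only the flags from the command at the top of the thread. The function name `bqsr_two_pass` and the default paths are placeholders, and the interval is written as `chr20:...` on the assumption that the reference is the same UCSC hg19 FASTA used in the original command, which names its contigs with a `chr` prefix. The function prints the commands instead of running them, so they can be checked first.

```shell
#!/bin/sh
# Print the two BaseRecalibrator passes for a given interval (dry run).
# GATK_JAR, BAM, REF and KNOWN are placeholders; point them at your own
# files. Paths are assumed to contain no spaces.
GATK_JAR=${GATK_JAR:-GenomeAnalysisTK.jar}
BAM=${BAM:-./0.reorder.bam}
REF=${REF:-ucsc.hg19.fasta}
KNOWN=${KNOWN:-dbsnp_137.hg19.vcf}

# bqsr_two_pass is a hypothetical helper, not part of GATK.
bqsr_two_pass() {
    interval=$1
    common="java -Xmx1g -jar $GATK_JAR -T BaseRecalibrator -I $BAM -R $REF -knownSites $KNOWN -L $interval"
    # Pass 1: build the recalibration table from scratch.
    echo "$common -o $BAM.recal.grp"
    # Pass 2: feed the first table back in with -BQSR and request the plots.
    echo "$common -BQSR $BAM.recal.grp -o $BAM.post_recal.grp --plot_pdf_file $BAM.post_recal.grp.pdf"
}

# bqsr_two_pass chr20:10000000-20000000        # inspect the commands
# bqsr_two_pass chr20:10000000-20000000 | sh   # run them
```

Both passes use the same interval, so the second table is computed over the same reads as the first.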
