Bug Bulletin: The recent 3.2 release fixes many issues. If you run into a problem, please try the latest version before posting a bug report, as your problem may already have been solved.

No plots generated by the BaseRecalibrator walker

PeteHaitchPeteHaitch Posts: 19Member
edited August 2012 in Ask the GATK team

I am using GATK v2 (GenomeAnalysisTK-2.0-0-g4c0ffd4) and was trying out the new BaseRecalibrator walker. According to this post the BaseRecalibrator should output "A PDF file containing quality control plots showing the patterns of recalibration of the data", however I do not have any such file. Both the BaseRecalibrator and PrintReads steps of the BQSR pipeline appear to have worked as I have a recalibrated BAM file and the accompanying GATKReport but I would like to be able to view plots of the recalibration process (and preferably have these generated automatically by the recalibration pipeline).

Can you please help? Thanks

Post edited by PeteHaitch on

Best Answer

  • rpoplinrpoplin Posts: 121 mod
    Answer ✓

    The BQSR.R script should be embedded inside the jar that was downloaded from that link. It gets pulled out and called with Rscript from within the BaseRecalibration walker during runtime. In order to call the Rscript directly, like in the command line I posted, you have to download the source from GitHub to get the source of the R script.

Answers

  • rpoplinrpoplin Posts: 121GATK Developer mod

    Hi there,

    The most common cause of this issue is not having Rscript in the environment path. Can you run "which Rscript" and see what it says. If that looks fine then there may be a warning message in the log output of the BaseRecalibrator run that might help us to debug.

    Cheers,

  • PeteHaitchPeteHaitch Posts: 19Member
    edited October 2012

    I can confirm that Rscript is in my path. I'll need to re-run BaseRecalibrator in order to get the log as I foolishly didn't save it to file. I'll post that once I have done so.

    Cheers

    unix303 507 % echo $PATH
    /usr/local/bioinf/bin:/cluster/home/users/lab0605/bioinformatics/bahlolab_db/annovar:/usr/local/bioinf/bin:/usr/local/bioinfsoftware/root/current/bin:/usr/local/bioinfsoftware/polyphen/current/bin/:/usr/local/bioinfsoftware/perl/current/bin/:/usr/local/bioinfsoftware/DiversityEstimates/current/bin:/usr/local/bioinfsoftware/DiversityEstimates/current/Scripts:/usr/local/bioinfsoftware/AmpliconNoise/current/bin:/usr/local/bioinfsoftware/AmpliconNoise/current/Scripts:/usr/lib64/qt-3.3/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/home/users/lab0605/hickey/../bin/Linux:/home/users/lab0605/hickey/../bin:/usr/local/bioinfsoftware/vaast/current/bin:/usr/local/bioinfsoftware/vaast/current/bin/vaast_tools
    
    unix303 506 % which Rscript
    /usr/local/bioinf/bin/Rscript
    
    Post edited by Geraldine_VdAuwera on
  • ebanksebanks Posts: 678GATK Developer mod

    Note that you can just run the tool on a tiny interval (e.g. 1:1-1000) if you want to regenerate the error without waiting for it to rerun over the whole genome.

    Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT

  • PeteHaitchPeteHaitch Posts: 19Member

    I've rerun BaseRecalibrator with the `-l DEBUG' flag. Here is (I think) the relevant output:

    INFO  08:20:48,388 BaseRecalibrator - Generating recalibration plots... 
    DEBUG 08:20:48,827 RScriptExecutor - Executing: 
    DEBUG 08:20:48,827 RScriptExecutor -   Rscript 
    DEBUG 08:20:48,827 RScriptExecutor -   -e 
    DEBUG 08:20:48,828 RScriptExecutor -   tempLibDir = '/tmp/Rlib.2011861167067641148';source('/tmp/BQSR.1384038019643928030.R'); 
    DEBUG 08:20:48,829 RScriptExecutor -   /export/share/disk501/lab0605/hickey/Philippe_Bouillet/IGL00607-613/BAM/374_recal_data.grp.csv 
    DEBUG 08:20:48,829 RScriptExecutor -   /export/share/disk501/lab0605/hickey/Philippe_Bouillet/IGL00607-613/BAM/374_recal_data.grp.csv.pdf 
    Loading required package: methods
    Error in continuous_scale(c("y", "ymin", "ymax", "yend", "yintercept",  : 
      unused argument(s) (formatter = "comma")
    Calls: source ... eval -> eval -> scale_y_continuous -> continuous_scale
    In addition: Warning messages:
    1: NAs introduced by coercion 
    2: NAs introduced by coercion 
    Execution halted
    DEBUG 08:20:53,553 RScriptExecutor - Result: 1 
    WARN  08:20:53,553 RScriptExecutor - RScript exited with 1

    I can send the full log file if required.

    Cheers

  • rpoplinrpoplin Posts: 121GATK Developer mod

    Hi there,

    It looks like the data being sent to the plotting script is somehow malformed. Can you post the output of this file:

    /export/share/disk501/lab0605/hickey/Philippe_Bouillet/IGL00607-613/BAM/374_recal_data.grp.csv

    Even the first 20 lines or so would be helpful to debug.

    Out of curiosity did you happen to run with Eric's trick of -L 1:1-1000 or is this error also from running over a larger region?

    Thanks!

  • PeteHaitchPeteHaitch Posts: 19Member
    edited August 2012

    Hi Ryan,

    Hmm, the plot thickens. I re-ran BaseRecalibrator and still did not get a pdf plot, however the error message changed. Both attempts to use the BaseRecalibrator were using the entire BAM (i.e. not using Eric's trick).

    Here is my command line:

    gatk -T BaseRecalibrator -I realigned_374.bam -R /export/share/disk501/lab0605/Bioinformatics/databases/bahlolab_db/mm10/mm10.fasta --knownSites /export/share/disk501/lab0605/Bioinformatics/databases/bahlolab_db/mm10/annot/20110602-final-snps.vcf.gz -o 374_recal_data.grp -l DEBUG -log BQSR_374.out -k

    I'm not able to attach the csv file to my post ("Uploaded file type is not allowed"). Instead I've uploaded it to Dropbox https://www.dropbox.com/s/ombr1wgosksiuz2/374_recal_data.grp.csv

    The full log file is here as well https://www.dropbox.com/s/51l595pqnmy6k6c/BQSR_374.out

    Also, the -k for BaseRecalibrator is not a documented at the command line, although it is mentioned at http://www.broadinstitute.org/gatk/guide/topic?name=methods-and-workflows

    Cheers

    Post edited by rpoplin on
  • rpoplinrpoplin Posts: 121GATK Developer mod
    edited August 2012

    The input csv file looks fine and it doesn't look like there were any errors reported this time in your log file:

    INFO  16:21:57,814 BaseRecalibrator - Generating recalibration plots... 
    DEBUG 16:21:58,348 RScriptExecutor - Executing: 
    DEBUG 16:21:58,349 RScriptExecutor -   Rscript 
    DEBUG 16:21:58,349 RScriptExecutor -   -e 
    DEBUG 16:21:58,350 RScriptExecutor -   tempLibDir = '/tmp/Rlib.1049277187388468345';source('/tmp/BQSR.3375557281675659007.R'); 
    DEBUG 16:21:58,350 RScriptExecutor -   /export/share/disk501/lab0605/hickey/Philippe_Bouillet/IGL00607-613/BAM/374_recal_data.grp.csv 
    DEBUG 16:21:58,351 RScriptExecutor -   /export/share/disk501/lab0605/hickey/Philippe_Bouillet/IGL00607-613/BAM/374_recal_data.grp.csv.pdf 
    DEBUG 16:22:02,286 RScriptExecutor - Result: 1 
    WARN  16:22:02,287 RScriptExecutor - RScript exited with 1

    The pdf file should have showed up at /export/share/disk501/lab0605/hickey/Philippe_Bouillet/IGL00607-613/BAM/374_recal_data.grp.csv.pdf

    I think the problem must lie in some difference in the R setup between our systems. What if you execute the Rscript directly with this command:

    Rscript $GATK/public/R/scripts/org/broadinstitute/sting/gatk/walkers/bqsr/BQSR.R /export/share/disk501/lab0605/hickey/Philippe_Bouillet/IGL00607-613/BAM/374_recal_data.grp.csv /export/share/disk501/lab0605/hickey/Philippe_Bouillet/IGL00607-613/BAM/374_recal_data.grp.csv.pdf

    Can you please post any command line output from that. When I run that command here locally using the csv you posted on dropbox I get the pdf (attached).

    Thanks,

    pdf
    pdf
    374_recal_data.grp.csv.pdf
    442K
    Post edited by rpoplin on
  • PeteHaitchPeteHaitch Posts: 19Member

    Thanks for the pdf, Ryan.

    I think I've figured out the problem. I am using GATK v2.0 (Beta) (v2.0-39) downloaded from http://www.broadinstitute.org/gatk/download and this does not include the BQSR.R script. As far as I can tell, BQSR.R is only included in GATK-lite source that is available via GitHub.

    I'm a little confused as to why BQSR.R is only available in the GitHub source and not in the directly downloadable GATK v2.0. Can you please explain whether I am overlooking something or the reason for this?

    Thanks

  • PeteHaitchPeteHaitch Posts: 19Member
    edited August 2012

    I've found the problem and it's due to changes in ggplot2 for version >=0.9 and a missing grid package in R. Specifically, in BQSR.R the following is called:

    scale_y_continuous(formatter="comma")

    This will work for version < 0.9 of ggplot2 but not for versions >= 0.9; the current version of ggplot2 on CRAN is 0.9.1 and is the version I have installed. Full details of changes in ggplot2 for v >= 0.9 can be found at cloud.github.com/downloads/hadley/ggplot2/guide-col.pdf.

    The required code for v >= 0.9 of ggplot2 is as follows:

    # Preamble
    library(scales) # This package is required in order to use comma_format(). It contains some of the functionality previously in ggplot2.
    library(grid) # Required to use grid.layout (at least in R v2.15.1)
    # ...
    # scale_y_continuous(formatter="comma") replaced by scale_y_continuous(labels = comma_format()) in the following line:
    d <- p + geom_histogram(aes(fill=Recalibration,weight=Observations),alpha=0.6,binwidth=1,position="identity") + scale_fill_manual(values=c("maroon1","blue")) + facet_grid(.~EventType) + scale_y_continuous(labels = comma_format())
    

    With those alterations made to BQSR.R I can produce the pdf figure manually.

    Post edited by PeteHaitch on
  • Mark_DePristoMark_DePristo Posts: 153Administrator, GATK Developer admin

    Thanks! I pushed a fix for this -- along with an expanded BQSR.R script -- to unstable, and it'll come along with the upcoming 2.1 release on Monday

    -- Mark A. DePristo, Ph.D. Co-Director, Medical and Population Genetics Broad Institute of MIT and Harvard

  • PeteHaitchPeteHaitch Posts: 19Member

    No problem, Mark. One further improvement would be in the labelling of the x-axis tick marks for the "context covariate plots" - the three base suffixes are basically illegible in the plots I've seen.

  • meharmehar Posts: 33Member

    @Mark_DePristo said: Thanks! I pushed a fix for this -- along with an expanded BQSR.R script -- to unstable, and it'll come along with the upcoming 2.1 release on Monday

    Hi,

    I am using "GenomeAnalysisTK-2.1-12-ga99c19d" version and still not able to produce plots. Below is the command i have used,

    "java -jar $GATK_dir/GenomeAnalysisTK.jar -T BaseRecalibrator -I duprem.realigned.fixed.bam -R $Reference -knownSites $dbSNP_vcf/canFam2_snp131.vcf -o recal_data.grp"

    Any suugestions!!!

  • ivernarivernar Posts: 4Member

    Hi there,

    I am also trying to get the pdf with the plots... I am using the version: v2.1-8-g5efb575 I did both step of the BaseRecalibrator and the PrintReads, I get a recalibrated BAM file and the gatkReport, but the "A PDF file containing quality control plots showing the patterns of recalibration of the data" is missing...

    Can you please help?

    The commands I used:

    java -Xmx4g -jar /home/matteo/Desktop/programs/pipelines/gatk/GenomeAnalysisTK-2.1/GenomeAnalysisTK.jar -T BaseRecalibrator -R ncbi_S288c.fasta -I output/$1.realigned.bam -o output/$1.recal_data.grp

    java -Xmx4g -jar /home/matteo/Desktop/programs/pipelines/gatk/GenomeAnalysisTK-2.1/GenomeAnalysisTK.jar -T BaseRecalibrator -R ncbi_S288c.fasta -I output/$1.realigned.bam -BQSR output/$1.recal_data.grp -o output/$1.grp

    java -Xmx4g -jar /home/matteo/Desktop/programs/pipelines/gatk/GenomeAnalysisTK-2.1/GenomeAnalysisTK.jar -T PrintReads -R ncbi_S288c.fasta -I output/$1.realigned.bam -BQSR output/$1.grp -o output/$1.bam

  • ivernarivernar Posts: 4Member

    I forgot to say that the BQSR.R is present embedded inside the jar

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 5,902Administrator, GATK Developer admin

    @ivernar and @mehar, did you see any message at all in the stack trace about the Rscript failing, or is it just silently not generating the plots?

    Geraldine Van der Auwera, PhD

  • ivernarivernar Posts: 4Member

    Hi, I didnt see any message about the Rscript, I will run it with debug option and I'll let you know...

  • ivernarivernar Posts: 4Member

    sorry, I found it: ScriptExecutor - RScript exited with 1. Run with -l DEBUG for more info. so, I will run it with debug option and I'll let you know...

  • PeteHaitchPeteHaitch Posts: 19Member

    @ivernar - This sounds like the same error I encountered. I'm still unable to produce the BQSR plots automatically despite changes made from v2.0 to v2.1. My fix is to add the -k flag to my call to BaseRecalibrator (to retain the temporary csv file) and then run a modified version of the BQSR.r script over the csv file. I require a modified version of BQSR.r due to differences between my R configuration and the configuration/versions assumed by GATK.

    Here's an example call:

    GATK -T BaseRecalibrator -I ${SAMPLE_ID}.realigned.bam -R ${REF} -knownSites ${SNPS_VCF} -o ${SAMPLE_ID}.recal_data.grp -log ${SAMPLE_ID}.BQSR.log 
    Rscript BQSR_modified_by_PH.r ${SAMPLE_ID}_recal_data.grp.csv ${SAMPLE_ID}_recal_data.grp.csv.pdf

    I can't attach my BQSR_modified_by_PH.r script, however the changes are documented in my earlier comment.

  • meharmehar Posts: 33Member
    edited November 2012

    HI Geraldine_VdAuwera,

    Several packages were unavailable initially and i have installed them manually. The latest error is" there is no package called 'gsalib". I tried to install 'gsalib' but could not find the source for R-2.15.1. Any fix for this?

    Here are the few suspicious line from the error report of the BaseRecalibrator component,

    perl: warning: Falling back to the standard locale ("C").
    gdata: read.xls support for 'XLS' (Excel 97-2004) files ENABLED.
    
    gdata: Unable to load perl libaries needed by read.xls()
    gdata: to support 'XLSX' (Excel 2007+) files.
    
    gdata: Run the function 'installXLSXsupport()'
    gdata: to automatically download and install the perl
    gdata: libaries needed to support Excel XLS and XLSX formats.
    
    Attaching package: 'gdata'
    
    The following object(s) are masked from 'package:stats':
    
        nobs
    
    The following object(s) are masked from 'package:utils':
    
        object.size
    
    Loading required package: caTools
    Loading required package: bitops
    Loading required package: grid
    Loading required package: KernSmooth
    KernSmooth 2.23 loaded
    Copyright M. P. Wand 1997-2009
    Loading required package: MASS
    
    Attaching package: 'gplots'
    
    The following object(s) are masked from 'package:stats':
    
        lowess
    
    Loading required package: plyr
    
    Attaching package: 'reshape'
    
    The following object(s) are masked from 'package:plyr':
    
        rename, round_any
    
    Error in library(gsalib) : there is no package called 'gsalib'
    Calls: source -> withVisible -> eval -> eval -> library
    Execution halted.
    
    Post edited by Geraldine_VdAuwera on
  • meharmehar Posts: 33Member
    edited November 2012

    Geraldine_VdAuwera

    @Geraldine_VdAuwera said: ivernar and mehar, did you see any message at all in the stack trace about the Rscript failing, or is it just silently not generating the plots? HI Geraldine_VdAuwera,

    Several packages were unavailable initially and i have installed them manually. The latest error is" there is no package called 'gsalib". I tried to install 'gsalib' but could not find the source for R-2.15.1. Any fix for this?

    Here are the few suspicious line from the error report of the BaseRecalibrator component,

    perl: warning: Falling back to the standard locale ("C"). gdata: read.xls support for 'XLS' (Excel 97-2004) files ENABLED.
    
    gdata: Unable to load perl libaries needed by read.xls() gdata: to support 'XLSX' (Excel 2007+) files.
    
    gdata: Run the function 'installXLSXsupport()' gdata: to automatically download and install the perl gdata: libaries needed to support Excel XLS and XLSX formats.
    
    Attaching package: 'gdata'
    
    The following object(s) are masked from 'package:stats':
    
    nobs
    The following object(s) are masked from 'package:utils':
    
    object.size
    Loading required package: caTools Loading required package: bitops Loading required package: grid Loading required package: KernSmooth KernSmooth 2.23 loaded Copyright M. P. Wand 1997-2009 Loading required package: MASS
    
    Attaching package: 'gplots'
    
    The following object(s) are masked from 'package:stats':
    
    lowess
    Loading required package: plyr
    
    Attaching package: 'reshape'
    
    The following object(s) are masked from 'package:plyr':
    
    rename, round_any
    Error in library(gsalib) : there is no package called 'gsalib' Calls: source -> withVisible -> eval -> eval -> library Execution halted.
    
    Post edited by Geraldine_VdAuwera on
  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 5,902Administrator, GATK Developer admin

    We're not sure why some of you are still having trouble getting the rscripts to work automatically. Unfortunately it seems to be something that varies with your R environment config, which I can't help you with.

    However I can tell you that if you're trying to call gsalib independently of the GATK, you'll need to download the GATK source from github (see the Downloads page) and get gsalib from there. See this article for more details:

    http://gatkforums.broadinstitute.org/discussion/1244

    Geraldine Van der Auwera, PhD

  • mlcunhamlcunha Posts: 8Member
    edited November 2012

    Hi! I am running the GATK v2.2-3 and I get my GATK Report but I am not able to get any pdf file from the BaseRecalibrator walker. I've also rerun BaseRecalibrator with the `-l DEBUG' flag. Here is the output that I think is relevant (are there duplicate row names?!):

    Loading required package: methods
    Loading required package: gtools
    Loading required package: gdata
    gdata: read.xls support for 'XLS' (Excel 97-2004) files ENABLED.
    
    gdata: read.xls support for 'XLSX' (Excel 2007+) files ENABLED.
    
    Attaching package: 'gdata'
    
    The following object is masked from 'package:stats':
    
        nobs
    
    The following object is masked from 'package:utils':
    
        object.size
    
    Loading required package: caTools
    Loading required package: bitops
    Loading required package: grid
    Loading required package: KernSmooth
    KernSmooth 2.23 loaded
    Copyright M. P. Wand 1997-2009
    Loading required package: MASS
    
    Attaching package: 'gplots'
    
    The following object is masked from 'package:stats':
    
        lowess
    
    Loading required package: plyr
    
    Attaching package: 'reshape'
    
    The following object is masked from 'package:plyr':
    
        rename, round_any
    
    Error in read.table(file = file, header = header, sep = sep, quote = quote,  : 
      duplicate 'row.names' are not allowed
    Calls: source ... withVisible -> eval -> eval -> read.csv -> read.table
    Execution halted
    

    Here is my command line (this is when I rerun BaseRecalibrator to get the pdf with the the original qualities along with the recalibrated qualities):

    java -jar GenomeAnalysisTK.jar -I mySample.bam -BQSR mySample_first.grp  -R refHuman/human_g1k_v37.fasta -T BaseRecalibrator -o mySample_second.grp --plot_pdf_file mySample_first_second.grp.pdf  -knownSites:dbSNP,VCF dbsnp_137.b37.vcf -nct 12 -l DEBUG
    

    Do you think is there a problem with my Bam file? When I validate my file using the Picard tools (ValidateSamFile.jar) I got only the following output:

    ERROR: Record 44764671, Read name FCD0HJ5ACXX:4:1103:18118:188026#, MAPQ should be 0 for unmapped read.
    ERROR: Record 44990092, Read name FCD0HJ5ACXX:4:2204:12662:139015#, MAPQ should be 0 for unmapped read.
    ERROR: Record 45463864, Read name FCD0HJ5ACXX:4:1107:8307:101631#, MAPQ should be 0 for unmapped read.
    

    Do you think this is causing the problem? I have no idea what to do to get the pdf files since in the current version I cannot use the -k flag to get the csv file.

    Thanks for any help!

    Post edited by Geraldine_VdAuwera on
  • ebanksebanks Posts: 678GATK Developer mod

    You need to enable the creation of the pdf - it isn't done by default anymore. See the documentation for more details.

    Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT

  • mlcunhamlcunha Posts: 8Member

    Thanks for the quick answer. It means that I shall add to my command line the flag --emit_original_quals true. Is it correct?

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 5,902Administrator, GATK Developer admin
  • mlcunhamlcunha Posts: 8Member
    edited November 2012

    But that is already in my command line. I posted it before but I will copy here again:

    java -jar GenomeAnalysisTK.jar -I mySample.bam -BQSR mySample_first.grp -R refHuman/human_g1k_v37.fasta -T BaseRecalibrator -o mySample_second.grp --plot_pdf_file mySample_first_second.grp.pdf -knownSites:dbSNP,VCF dbsnp_137.b37.vcf -nct 12 -l DEBUG
    
    Post edited by Geraldine_VdAuwera on
  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 5,902Administrator, GATK Developer admin

    Hmm, I missed that, sorry. Your command line looks fine, I guess it's the Rscript causing the problem. Not sure what's going on there. Have you tried loading the .grp file into R with gsalib?

    Geraldine Van der Auwera, PhD

  • mlcunhamlcunha Posts: 8Member
    edited November 2012

    Yes, it works when I load the .grpfile into R.

    > library(gsalib)   
    > summary(d)
    >d=gsa.read.gatkreport("mySample_first.grp")
         Length Class      Mode
        Arguments   2      data.frame list
        Quantized   3      data.frame list
        RecalTable0 6      data.frame list
        RecalTable1 6      data.frame list
        RecalTable2 8      data.frame list
    
    Post edited by Geraldine_VdAuwera on
  • mlcunhamlcunha Posts: 8Member

    I do not know if there is any problem with the GATK report because the .bam files I got after I did the quality score recalibration are smaller (I remember that in the GATK 2.1 version the sizes of the same files increased almost to the double). Just to give an example, sample1_realgn.bam (after running the tools RealignerTargetCreatorand IndelRealigner) was about 3.3Gb. After I run the tools BaseRecalibratorand PrintReads the new bam file is about 1.5Gb. If I run ReduceReads on the last file, I get a file with about 1.4Gb. Am I doing something wrong?

  • mlcunhamlcunha Posts: 8Member

    Regarding the last post, I rerun the command line for BaseRecalibrator and PrintReads and I got the files with almost the double size as I expected (from 3.3Gb to 5.8Gb after BaseRecalibrator and PrintReads, and then 1.4Gb after ReduceReads for the sample I gave as example). There was a problem with the server I was using but I do not know how (and if) it did affect my jobs... However I still have the same problem with plots. I still not get them but I am still able to load my .grp file in R as I previously mentioned.

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 5,902Administrator, GATK Developer admin

    Ah, that confirms what I thought -- the bam file size is probably not relevant to your original problem. The debug message indicates that the R script failed, so that's most likely to be it. The question is why did the script fail. I'm consulting the team, we'll try to find the answer.

    Geraldine Van der Auwera, PhD

  • DaliaDalia Posts: 8Member

    Hi, I am also not able to generate plots with GATK2.2-3. I can load my .grp file into R with gsalib. However, running BaseRecalibrator, I get following error:

    INFO  13:39:40,097 BaseRecalibrator - Generating recalibration plots... 
    DEBUG 13:39:40,161 RScriptExecutor - Executing: 
    DEBUG 13:39:40,161 RScriptExecutor -   Rscript 
    DEBUG 13:39:40,161 RScriptExecutor -   -e 
    DEBUG 13:39:40,162 RScriptExecutor -   tempLibDir = '/scratch/pbs.27501.ax3/tmp/Rlib.7344000831173984327';source('/scratch/pbs.27501.ax3/tmp/BQSR.4899331575441606758.R'); 
    DEBUG 13:39:40,162 RScriptExecutor -   /scratch/pbs.27501.ax3/tmp/BQSR705839790581939999.csv 
    DEBUG 13:39:40,162 RScriptExecutor -   /ax3-cgi/analysis/1SF.recalibration_report.grp 
    DEBUG 13:39:40,163 RScriptExecutor -   /ax3-cgi/analysis/1SF.recalibration.pdf 
    WARNING: ignoring environment value of R_HOME
    Loading required package: methods
    Loading required package: gtools
    Loading required package: gdata
    gdata: read.xls support for 'XLS' (Excel 97-2004) files ENABLED.
    
    gdata: read.xls support for 'XLSX' (Excel 2007+) files ENABLED.
    
    Attaching package: ‘gdata’
    
    The following object(s) are masked from ‘package:stats’:
    
        nobs
    
    The following object(s) are masked from ‘package:utils’:
    
        object.size
    
    Loading required package: caTools
    Loading required package: bitops
    Loading required package: grid
    Loading required package: KernSmooth
    KernSmooth 2.23 loaded
    Copyright M. P. Wand 1997-2009
    Loading required package: MASS
    
    Attaching package: ‘gplots’
    
    The following object(s) are masked from ‘package:stats’:
    
        lowess
    
    Loading required package: plyr
    
    Attaching package: ‘reshape’
    
    The following object(s) are masked from ‘package:plyr’:
    
        rename, round_any
    
    Error in read.table(file = file, header = header, sep = sep, quote = quote,  : 
      more columns than column names
    Calls: source ... eval.with.vis -> eval.with.vis -> read.csv -> read.table
    Execution halted
    DEBUG 13:39:42,574 RScriptExecutor - Result: 1 
    WARN  13:39:42,575 RScriptExecutor - RScript exited with 1
    

    Many thanks for the advice!

    Dalia

  • dkvitekdkvitek Posts: 2Member
    edited November 2012

    I am getting the exact same error as Dalia, and I've tried both GATK 2.2-3 and 2.2-4. It seems the error is with reading the csv file:

    Error in read.table(file = file, header = header, sep = sep, quote = quote,  : 
    more columns than column names
    Calls: source ... eval.with.vis -> eval.with.vis -> read.csv -> read.table
    

    When I run the csv file manually through the BQSR.R script from the GATK-Lite clone from Github, it works perfectly:

    Rscript $GATK/BQSR.R BQSR8622570015259924177.csv SC5314.sf.dedup.realigned.grp SC5314.sf.dedup.realigned.csv.pdf
    

    Perhaps there is a difference between the BQSR.R script inside the full version of GATK2 vs GATK-Lite?

    Note: Since GATK 2.2-3 and 2.2-4 don't allow the -k option for BaseRecalibrator to retain the temporary csv file, I just wrote a script that checks the temp directory every second for a csv file and moves it to a different directory before it gets automatically deleted.

    Post edited by Geraldine_VdAuwera on
  • ebanksebanks Posts: 678GATK Developer mod

    There is a "hidden" argument that allows you to specify a permanent location for the csv: --intermediate_csv_file. We can make it un-hidden in the next release if that's helpful.

    Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT

  • dkvitekdkvitek Posts: 2Member

    Thanks for the tip Eric, making that argument visible might be helpful for debugging these sorts of issues. Upon further investigation, it looks like my csv file might be malformed rather than the R script having an issue. I'm having trouble attaching a file to my post, but here are the first few lines of the csv file:

    ReadGroup,CovariateValue,CovariateName,EventType,Observations,Errors,EmpiricalQuality,AverageReportedQuality,Accuracy,Recalibration
    622RY.1,CAC,Context,Base Insertion,13250864.00,857.001559,41.89,45.00,-3.11,ORIGINAL
    622RY.1,CAC,Context,Base Deletion,13250864.00,1,796.287276,38.68,45.00,-6.32,ORIGINAL
    622RY.1,AA,Context,Base Substitution,148573581.00,424,808.187498,25.44,26.56,-1.12,ORIGINAL
    622RY.1,AAA,Context,Base Insertion,51228232.00,8,346.105600,37.88,45.00,-7.12,ORIGINAL
    622RY.1,AAA,Context,Base Deletion,51228232.00,25,254.920299,33.07,45.00,-11.93,ORIGINAL
    

    There are 10 items in the header, 10 in the first row but 11 in the second row. Looking through the rest of the file, it looks like most rows have 11 items. For the rows that have 10 items, it looks like the column that is currently called "Errors" is missing. These are just my observations; I have never looked at these covariate csv files before so I'm not sure exactly what they are supposed to look like. Could there possibly be a bug in the code that generates these csv files?

  • ebanksebanks Posts: 678GATK Developer mod

    Excellent detective work! You have uncovered a major bug in the latest release of the code. What is happening is that for the Errors column the number is printed out with commas when great than 999 (which is bad in a comma-delimited file). I will push in a fix for this ASAP and it should be available in a few hours.

    Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT

  • shahr2shahr2 Posts: 1Member

    Has this Issue of creating a plot from BaseRecalibrator been solved??

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 5,902Administrator, GATK Developer admin

    @shahr2, several different issues have been addressed and resolved in this thread. If you are having trouble getting plots generated, see if any of these apply to you and if so, apply the corresponding solution (a common one is upgrading to the latest version of GATK). If your problem is not resolved here, feel free to post a new question, with details (error messages if applicable, etc). However, be advised that in many cases the problem is the user's R configuration, which is not something we can provide support for.

    Geraldine Van der Auwera, PhD

  • NielsNiels Posts: 2Member

    It appears to be that a new version of ggplot2 is released: 0.9.3 After I read this thread I managed to get he plots generated. However, in the debug it states the following:

    'opts' is deprecated. Use 'theme' instead. (Deprecated; last used in version 0.9.1)
    'theme_blank' is deprecated. Use 'element_blank' instead. (Deprecated; last used in version 0.9.1)
    'theme_blank' is deprecated. Use 'element_blank' instead. (Deprecated; last used in version 0.9.1)
    'theme_blank' is deprecated. Use 'element_blank' instead. (Deprecated; last used in version 0.9.1)
    'theme_blank' is deprecated. Use 'element_blank' instead. (Deprecated; last used in version 0.9.1)
    'opts' is deprecated. Use 'theme' instead. (Deprecated; last used in version 0.9.1)
    theme_text is deprecated. Use 'element_text' instead. (Deprecated; last used in version 0.9.1)
    'opts' is deprecated. Use 'theme' instead. (Deprecated; last used in version 0.9.1)
    theme_text is deprecated. Use 'element_text' instead. (Deprecated; last used in version 0.9.1)
    'opts' is deprecated. Use 'theme' instead. (Deprecated; last used in version 0.9.1)
    theme_text is deprecated. Use 'element_text' instead. (Deprecated; last used in version 0.9.1)
    'opts' is deprecated. Use 'theme' instead. (Deprecated; last used in version 0.9.1)
    theme_text is deprecated. Use 'element_text' instead. (Deprecated; last used in version 0.9.1)
    Warning messages:
    1: NAs introduced by coercion 
    2: NAs introduced by coercion 
    

    Is this an issue in the perfomance how the plots are generated? does this case any differences? As i cannot compare it to any previous versions of ggplot2.

    Thanks in advance.

    Niels

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 5,902Administrator, GATK Developer admin

    This should not affect the generation of the plots.

    Geraldine Van der Auwera, PhD

  • silentiosilentio Posts: 4Member
    edited January 2013

    I have encountered the same problem and solved the issue by installing the missing R packages indicated by running -l DEBUG flag. To summary, in order to generate pdf files, the following R packages have to be installed:

    • bitops
    • caTools
    • colorspace
    • gdata
    • ggplot2
    • gplots
    • gtools
    • RColorBrewer
    • reshape

    Hope this can help.

    Post edited by Geraldine_VdAuwera on
  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 5,902Administrator, GATK Developer admin

    Thanks for sharing your experience, @silentio

    Geraldine Van der Auwera, PhD

  • adalgeiradalgeir Posts: 10Member

    Thank you for the above information. I've installed the above packages (within R by '> install.packages(c("bitops"))' etc) but I can't figure out how to get the pdf. I'm running GATK v2.3-9-ge5ebf34 (not from queue). Do I get the following steps right?

    I first run the BaseRecalibrator to get a grp-output (paths simplified here):

    java -Xmx4g -jar GenomeAnalysisTK.jar -T BaseRecalibrator -I myfile.realigned.bam -R path/to/hg19.fa -knownSites path/to/dbsnp_137.hg19.vcf -knownSites path/to/Mills_and_1000G_gold_standard.indels.hg19.vcf -knownSites path/to/1000G_phase1.indels.hg19.vcf -o myfile.recal_data.grp

    And then I run this command again with the -BQSR part (and a different -o filename):

    java -Xmx4g -jar GenomeAnalysisTK.jar -T BaseRecalibrator -I myfile.realigned.bam -R path/to/hg19.fa -knownSites path/to/dbsnp_137.hg19.vcf -knownSites path/to/Mills_and_1000G_gold_standard.indels.hg19.vcf -knownSites path/to/1000G_phase1.indels.hg19.vcf -BQSR myfile.recal_data.grp -o myfile2.recal_data.grp

    The second step only adds the second grp file, almost identical to the first one.

    There seems to be no problem with the PrintReads step afterwards, it's just that it would be nice also to have the plots.

    Could this have something to do with how I installed the GATK? I downloaded the v2.3-9 (not Lite) and just mouse clicked the bz2-file. Should I have used some ant command(s) ?

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 5,902Administrator, GATK Developer admin

    Yes, you need to get the source download from the repository, build gsalib (do ant gsalib) (it's an R library we use to read in gatkreports, aka .grp files) and install it into your R environment. Sorry if this is unclear from the docs; we're trying to think of a better way to provide this library in future.

    Geraldine Van der Auwera, PhD

  • adalgeiradalgeir Posts: 10Member

    Thanks Geraldine. I'm sorry to load you with more questions but I'm not too familiar with computer science language so could you tell me if I understood your instructions correctly? I did the following:

    I visited the Downloads page. According to your explanations on http://gatkforums.broadinstitute.org/discussion/1244 there should've been further instructions on the Downloads page but I didn't find them so I guessed how to continue. The Downloads page has a link (under GATK Lite) to Github, opening the Commit History page. I found that if I mouse-clicked "Tags" close to the upper right corner, it opened a list of code links. I selected the top "Source code (tar.gz)" (version 2.3) and moved the download to my GATK folder, where I double-clicked it and continued with the unix commands:

    cd path/to/GATK/gatk-2.3

    ant gsalib

    ant

    It seemed to install normally, only warning that the walker FindCoveredIntervals was currently undocumented.

    Does this mean that I've installed the latest full version GATK (not GATK Lite)? The java ... -h command doesn't reveal a version like "v2.3-9-ge5ebf34" but only writes "vexported" instead.

    I'm indeed uncertain if I did this right. Should I have used more ant commands? Long ago when I ran earlier versions I could download and install updates within my path/to/GATK/destination_directory by using the unix commands "git pull; ant clean; ant". It seems I can't do this any longer. Should I have included "ant clean" when I now installed 2.3? Or more / other ant commands?

    Please tell if you see something I should have done simpler or more correctly. I fear that I even selected a completely wrong code link path from the Downloads page.

    With great appreciation for your tremendous work with developing gatk, making it available and helping inexperienced people like me. Thanks!

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 5,902Administrator, GATK Developer admin

    You're on the right track.

    Those commands don't really "install" anything, they just compile some program files. If you didn't overwrite the GenomeAnalysisTK.jar file that you already had, then you should be able to invoke that version just as you did before.

    What you still need to do is add gsalib to your R environment. To do that, go to the directory where you did ant gsalib, start R and do install.packages("gsalib"). After that you'll be able to load the gsalib library manually if you want to process gatkreports yourself, and the GATK will be able to call the necessary rscripts to generate plots of base recalibration.

    If you want to be able to update the GATK using git pull etc. when we release a new version, you need to "clone the repository" -- there should be a button to do that on the github page. Right now this is only possible for GATK Lite, but starting next week when we release version 2.4, you'll be able to do it for the full GATK (as long as you use GATK for non-commercial work).

    Geraldine Van der Auwera, PhD

  • adalgeiradalgeir Posts: 10Member

    Great! Although I didn't have luck with install.packages("gsalib"). It always directs to a Cran mirror and then says "package ‘gsalib’ is not available (for R version 2.15.2)". I tried from within the directory you suggested (path/to/GATK/gatk-2.3, where I had done ant gsalib), and also from an R subdirectory further down (path/to/GATK/gatk-2.3/public/R/src/org/broadinstitute/sting/utils/R/gsalib/R - did that make sense?). I also tried library(gsalib) in line with your explanations on http://gatkforums.broadinstitute.org/discussion/1244 and R accepted that from within the R-subdirectory (and also from elsewhere if I made a .Rprofile reference to that subdirectory). I'm still stuck with the gsalib installation but I guess this is a problem with my knowledge of R rather than gatk and I can't really expect you to spend time helping me with that. Just if you see a very simple solution ...

    But I look forward to trying the full 2.4 next week. And it's good to know I can move forward (with or without the gsalib) using the GenomeAnalysisTK.jar file I had before. Thanks again for your explanation and help!

  • chapmanbchapmanb Posts: 19Member

    To install the R gsalib library, you can do:

    sudo R CMD INSTALL --preclean public/R/src/org/broadinstitute/sting/utils/R/gsalib
    

    this is the equivalent of what ant gsalib does. That takes care of installation into R libraries. You can confirm by doing a library(gsalib) from an interactive R session. Hope this helps.

    Brad Chapman, Bioinformatics Core at Harvard School of Public Health

  • adalgeiradalgeir Posts: 10Member

    Yes it helped! Now R can run library(gsalib) from any directory. Thanks!

  • adalgeiradalgeir Posts: 10Member

    A little addition on my attempt to use the gsalib: I seem to be able to view .grp files from within R (after >library(gsalib)) since if I copy the example from http://gatkforums.broadinstitute.org/discussion/1244 into a "test.grp" file and do > d = gsa.read.gatkreport('test.grp') and > summary(d) it perfectly displays the example table. (Just to mention: the filename had to be shown within ' ' rather than "").

    But the v2.3-9-ge5ebf34 BaseRecalibrator still makes a .grp file without pdf-tables. Maybe I misunderstood how it should work (or it's confined to the Sting/Lite)? And if I try (as with the test.grp file) to look at my BaseRecalibrator .grp file in R, it results in a 'colnames' error' (...'names' attribute [6] must be the same length as the vector [1]). Well, perhaps that's an inappropriate twist to the discussions in this thread, and maybe it's also irrelevant since 2.4 is on its way soon.

    One more question however: Should I try adding BQSR.R, cf. the answer by rpoplin at the top of this page? If so, is there a particular path I should make and copy it to (related to my location of GenomeAnalysisTK.jar after opening the .bz2 download)?

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 5,902Administrator, GATK Developer admin

    I just checked the command lines you posted earlier; it looks like you're not specifying plot filenames. If you don't specify them the tool doesn't generate the plots. See the technical documentation to find which arguments control the plots.

    Geraldine Van der Auwera, PhD

  • MattBMattB Posts: 19Member
    edited January 2013

    Just to get back on this last post by adalgeir I've downloaded the latest version 2.3-9 I have all the R dependencies installed including gsalib. I too also see the same error traced with -l DEBUG:

    Error in 'colnames<-'('*tmp*', value = c("ReadGroup", "EventType", "EmpiricalQuality",  : 
      'names' attribute [6] must be the same length as the vector [1]
    Calls: source ... finishTable -> .gsa.assignGATKTableToEnvironment -> colnames<-
    Execution halted 
    

    I'm running with my own data here:

    java -Xmx28g -jar ~/GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -T BaseRecalibrator -nct 9 -l DEBUG -I
    lane_1_sorted.bam -L chr1:1-2000 -R /data/GATK_bundle/hg19/ucsc.hg19.fasta -knownSites dbsnp_137.hg19.vcf -knownSites
    Mills_and_1000G_gold_standard.indels.hg19.vcf -knownSites 1000G_phase1.indels.hg19.vcf -o TEST_Lane_1_recal_data.grp 
    --plot_pdf_file TEST_lane_1_recal.pdf --intermediate_csv_file TEST_lane_1.cvs
    

    Is there a bug in the R code? Or have I done something wrong?

    Post edited by MattB on
  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 5,902Administrator, GATK Developer admin

    This is an error that pops up occasionally for some people and we're not sure why. Can you make sure that you have the latest version of gsalib installed? That doesn't get updated when you update the GATK itself.

    Geraldine Van der Auwera, PhD

  • MattBMattB Posts: 19Member
    edited January 2013

    The version of gsalib I'm using is one I built using ant from the GATK source on github just this week i.e. it should be current.

    Post edited by MattB on
  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 5,902Administrator, GATK Developer admin

    Matt, is this occurring when the script gets run by BaseRecalibrator, or when you try to run it yourself?

    Geraldine Van der Auwera, PhD

  • adalgeiradalgeir Posts: 10Member

    Thanks again Geraldine and apologies for not noticing the need of -plot specification. Now I've reached to the same stage as Matt. When I run BaseRecalibrator (the full version download of GenomeAnalysisTK.jar), it's exactly like Matt describes (the same error when run with -l DEBUG). I'll wait and see how this thread continues (I have to leave my office for a while). Thanks, both of you.

  • MattBMattB Posts: 19Member
    edited January 2013

    It's occurring when BaseRecalibrator is calling Rscipt.

    Post edited by MattB on
  • MattBMattB Posts: 19Member
    edited February 2013

    Right so removing -L chr1:1-2000 looks to have fixed my issue, the plots have been plotted by BaseRecalibrator calling Rscript, I wonder if there was insufficient data to produce them form such a small run.

    Post edited by MattB on
  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 5,902Administrator, GATK Developer admin

    Hi Matt,

    That's right, if you are running on a very small amount of data you end up with some empty tables in the gatkreport and so the script fails when it tries to process them. Sorry I didn't notice the -L in your command line.

    Geraldine Van der Auwera, PhD

  • murilogborgesmurilogborges Posts: 1Member

    Hello

    I had the same problems posted before, and decided to conglomerate very good tips posted by all the users above, in order to solve problems while dealing with the .pdf files generation, sorry if I forget mentioning someone.

    First I downloaded the Source Code:

    wget 'https://github.com/broadgsa/gatk/archive/master.zip' Then unzip master.zip Enter the gatk-master folder, execute

    cd /path/to/gatk-master
    ant gsalib 
    

    I wish you've got a "BUILD SUCCESSFUL" after that. Now let's enter the R environment, in order to install some key packages, as @silentio mentioned:

    R
    >install.packages("bitops")
    >install.packages("caTools")
    >install.packages("colorspace")
    >install.packages("gtools")
    >install.packages("gplots")
    >install.packages("gdata")
    >install.packages("ggplot2")
    >install.packages("RColorBrewer")
    >install.packages("reshape")
    

    now let's take a look at BaseRecalibrator:

    java -jar GenomeAnalysisTK.jar -T BaseRecalibrator -R genome.fa -I file.bam -knownSites sites.vcf -o recal.grp -intermediate recal.grp.csv

    the -intermediate option is an important part of the process, as noticed by @ebanks. With this intermediate .csv file, we can call the Rscript to generate the plots, as @dkvitek did:

    Rscript path/to/gatk-master/public/R/scripts/org/broadinstitute/sting/utils/recalibration/BQSR.R recal.grp.csv recal.grp recal.grp.pdf

    after some logs, you should have you .pdf file. You can take a look at your .pdf file by executing evince recal.grp.pdf Run the post recalibration providing -BQSR and once again run the Rscript

    Done! Maybe some dependent R packages will be needed, but R will ask you to install them.

    Thanks!

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 5,902Administrator, GATK Developer admin

    @murilogborges, thanks for putting this together!

    As a caveat, please note that once you've configured your R environment as described, the GATK should be able to run the Rscripts for you, so you shouldn't need to run them manually nor keep intermediate files. However for those users who cannot seem to get that to work, this method should be a workable substitute.

    Geraldine Van der Auwera, PhD

  • KlausNZKlausNZ Posts: 28Member

    Hi Geraldine, I'm getting similar errors and have tracked it down to the ggplot library. Before I wake up my cluster administrator (I don't have rights to install R libs), can you please confirm that this is still a problem with v 2.4-9. If yes, do you have pointers as to which libraries and which R version needs to be installed? Many thanks in advance!

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 5,902Administrator, GATK Developer admin

    Hi Klaus,

    This hasn't changed, so you'll need to wake up your admin, sorry! The R version shouldn't be very important -- fyi I have 2.13.1 on my machine. You'll need to install ggplot2 and its dependencies, which can be obtained from CRAN in the usual way, and our own library, gsalib, which at the moment is distributed with the source code on Github. You'll need to download the source and build GATK (ant clean dist), and the gsalib will be in one of the resource directories created by the build. I realize this is rather inconvenient but we are planning to put gsalib in CRAN fairly soon to make it all easier.

    Geraldine Van der Auwera, PhD

  • drashu_11drashu_11 Posts: 11Member

    Hi I have used the following commandline for Baserecalibration in my cluster

    $ java -Xmx30g -jar GenomeAnalysisTK.jar -T BaseRecalibrator --default_platform illumina -I Sample_realigned.bam -R ./reference/genome.fa -knownSites snps_indels.vcf -knownSites indels.vcf -o Sample_rcal.grp
    -plots Sample_rcal.pdf -S LENIENT

    after analysis I have the Sample_rcal.grp file but no plots Sample_rcal.pdf is generated In my cluster R enviroment there is no gsalib, gpplot2,gplots libraries so, I installed all required pakages on my local windows R environment and I can read the GATK report as follows-

    >library(gsalib)
    
    > d=gsa.read.gatkreport("Sample_rcal.grp")
    
    > summary(d)
    
                Length Class      Mode
    Arguments   2      data.frame list
    Quantized   3      data.frame list
    RecalTable0 6      data.frame list
    RecalTable1 6      data.frame list
    RecalTable2 8      data.frame list
    

    But I need to generate the plot file , How can I do this on my own machine ?

    Any suggestion

    Thanks

    Ashutsoh

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 5,902Administrator, GATK Developer admin
    edited April 2013

    If you want to run the script that generates the plots on your own machine, you will need to add the -intermediate argument to your BaseRecalibrator command in order to save the intermediate .csv file (otherwise it is deleted as a temp file). Then, copy that file to your local machine. You will need also to get the GATK source from Github because you need the BQSR.R script. Once you have that, you adapt this command with the appropriate filenames and paths, and run it to generate the plots:

    Rscript $GATK/public/R/scripts/org/broadinstitute/sting/gatk/walkers/bqsr/BQSR.R intermediate_file.csv output_plot.pdf
    
    Post edited by Geraldine_VdAuwera on

    Geraldine Van der Auwera, PhD

  • smk_84smk_84 Posts: 59Member
    edited June 2013

    I am also not able to generate the bqsr plot. Here is the relevant debug output

    DEBUG 16:33:38,952 QualQuantizer - updated intervals: [QQ:0-9, QQ:10-12, QQ:13-14, QQ:15-15, QQ:16-16, QQ:17-17, QQ:18-18, QQ:19-19, QQ:20-20, QQ:21-21, QQ:22-22, QQ:23-29, QQ:30-30, QQ:31-31, QQ:32-32, QQ:33-93] 
    INFO  16:33:38,953 BaseRecalibrator - Writing recalibration report... 
    INFO  16:33:41,182 BaseRecalibrator - ...done! 
    INFO  16:33:41,183 BaseRecalibrator - Generating recalibration plots... 
    DEBUG 16:33:41,183 NestedIntegerArray - Creating NestedIntegerArray with dimensions [1, 5, 1012, 3] 
    DEBUG 16:33:41,183 NestedIntegerArray - Pre-allocating first 2 dimensions 
    DEBUG 16:33:41,183 NestedIntegerArray - Done pre-allocating first 2 dimensions 
    DEBUG 16:33:41,262 RScriptExecutor - Executing: 
    DEBUG 16:33:41,262 RScriptExecutor -   Rscript 
    DEBUG 16:33:41,262 RScriptExecutor -   -e 
    DEBUG 16:33:41,263 RScriptExecutor -   tempLibDir = '/tmp/Rlib.2327386667045653425';source('/tmp/BQSR.6943530625249500071.R'); 
    DEBUG 16:33:41,263 RScriptExecutor -   /tmp/BQSR5693155957981819144.csv 
    DEBUG 16:33:41,263 RScriptExecutor -   /share/data/skhan/genome1_25_soyb/HN001/test_bqsr.grp 
    DEBUG 16:33:41,263 RScriptExecutor -   /share/data/skhan/genome1_25_soyb/HN001/test_bqsr.pdf 
    DEBUG 16:33:44,144 RScriptExecutor - Result: 1 
    WARN  16:33:44,144 RScriptExecutor - RScript exited with 1 
    INFO  16:33:44,145 BaseRecalibrator - Processed: 3913 reads 
    INFO  16:33:44,148 ProgressMeter -            done        3.92e+03    7.0 s       30.2 m     99.3%         7.0 s     0.0 s 
    INFO  16:33:44,148 ProgressMeter - Total runtime 7.10 secs, 0.12 min, 0.00 hours 
    INFO  16:33:44,149 MicroScheduler - 1643 reads were filtered out during traversal out of 5561 total (29.55%) 
    INFO  16:33:44,149 MicroScheduler -   -> 425 reads (7.64% of total) failing DuplicateReadFilter 
    INFO  16:33:44,149 MicroScheduler -   -> 1203 reads (21.63% of total) failing MappingQualityZeroFilter 
    INFO  16:33:44,150 MicroScheduler -   -> 15 reads (0.27% of total) failing NotPrimaryAlignmentFilter 
    DEBUG 16:33:44,153 GATKRunReport - Aggregating data for run report 
    DEBUG 16:33:44,613 GATKRunReport - Posting report of type STANDARD 
    DEBUG 16:33:44,615 GATKRunReport - Generating GATK report to AWS S3 with key TzWc7LoRH2GTCLSCKnfs3YFwXgS3edx2.report.xml.gz 
    INFO  16:33:45,263 GATKRunReport - Uploaded run statistics report to AWS S3 
    DEBUG 16:33:45,264 GATKRunReport - Uploaded to AWS: S3Object [key=TzWc7LoRH2GTCLSCKnfs3YFwXgS3edx2.report.xml.gz, bucket=<Unknown>, lastModified=Mon Jun 24 16:33:46 CDT 2013, dataInputStream=null, Metadata={ETag="ad272e580f30fbce0c3629f
    b6974b17f", Date=Mon Jun 24 16:33:46 CDT 2013, Content-Length=411, id-2=g5TacGPGNiZCpywuXaQZEwuUZoorzRV1pakmfJRuZBRPmfMdf4WlJ12tmnkdAfKi, request-id=80E37BAB3D7499B4, Content-MD5=rScuWA8w+84MNin7aXSxfw==, Content-Type=application/octet-
    stream}] 
    
    Post edited by Geraldine_VdAuwera on
  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 5,902Administrator, GATK Developer admin

    Hi @smk_84,

    I'm not sure why the plotting didn't work for you here. Did you generate plots successfully previously? And what version are you using?

    Geraldine Van der Auwera, PhD

  • smk_84smk_84 Posts: 59Member

    I did actually but it was on another server. I am using version 2.5-2

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 5,902Administrator, GATK Developer admin

    Hmm. Your local environment may not be correctly set up on the new server. You could try running again with the option to keep the intermediate csv file, and run the BQSR.R script (which you can get from the source code repository on github) directly on the csv file. If it doesn't work it should at least yield more informative error message.

    Geraldine Van der Auwera, PhD

  • beezonebeezone Posts: 8Member

    Why there's no RMSE value in generated plots?...@Geraldine

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 5,902Administrator, GATK Developer admin

    Hi @beezone,

    I don't understand your question, can you please clarify?

    Geraldine Van der Auwera, PhD

  • beezonebeezone Posts: 8Member

    Hello @ Geraldine, I am able to generate BQSR plot successfully using BaseRecalibrator and got some really nice recalibration plots such plots for Reported quality vs Empirical quality score. But what I am looking for is RMSE value of such plot, which is not shown/present in plot that I have generated. Please, have a look at this thread (http://gatkforums.broadinstitute.org/discussion/44/base-quality-score-recalibration-bqsr), I hope this will clarify about RMSE value that I am talking about.

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 5,902Administrator, GATK Developer admin

    Oh, I see what you mean now. The publicly available R script that is used for plotting as part of the standard pipeline doesn't include calculation of the RMSE, sorry. The plots with RMSE value that are shown in that thread are produced with internal scripts. I'll see if we can make those public but I can't promise anything.

    Geraldine Van der Auwera, PhD

  • EugenieEugenie Posts: 13Member

    Hello Geraldine. Sorry for the strange issue. I don't think it worths a separate discussion and it's quite minor, but maybe somehow related to this topic. I am also trying to get plots but what I am curious about: why the error which occurs while Rscript is running is not reported in the log file (logging level : INFO). So when checking run on the way I see:

    ... INFO 14:55:54,431 AnalyzeCovariates - Generating csv file 'sample1.BQSR_test.csv' INFO 14:55:54,869 AnalyzeCovariates - Generating plots file 'sample1.BQSR_plots_test.pdf' INFO 14:55:56,688 GATKRunReport - Uploaded run statistics report to AWS S3

    ERROR ------------------------------------------------------------------------------------------
    ERROR stack trace

    org.broadinstitute.sting.utils.R.RScriptExecutorException: RScript exited with 1. Run with -l DEBUG for more info. at org.broadinstitute.sting.utils.R.RScriptExecutor.exec(RScriptExecutor.java:174) at org.broadinstitute.sting.utils.recalibration.RecalUtils.generatePlots(RecalUtils.java:548) at org.broadinstitute.sting.gatk.walkers.bqsr.AnalyzeCovariates.generatePlots(AnalyzeCovariates.java:380) at org.broadinstitute.sting.gatk.walkers.bqsr.AnalyzeCovariates.initialize(AnalyzeCovariates.java:394) at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:83) at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:313) at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:121) at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:248) at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:155) at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:107)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 3.1-1-g07a4bf8):
    ERROR
    ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: RScript exited with 1. Run with -l DEBUG for more info.
    ERROR ------------------------------------------------------------------------------------------

    But the log file ends with just:

    ... INFO 14:55:54,431 AnalyzeCovariates - Generating csv file 'sample1.BQSR_test.csv' INFO 14:55:54,869 AnalyzeCovariates - Generating plots file 'sample1.BQSR_plots_test.pdf' INFO 14:55:56,688 GATKRunReport - Uploaded run statistics report to AWS S3

    Do you think there is a way to get a full report in log file? Otherwise it looks a bit confusing when checking afterwards... Maybe it's something special about running Rscript and not GATK walker?

    Thank you!

  • SheilaSheila Broad InstitutePosts: 295Member, GATK Developer, Broadie, Moderator admin

    @Eugenie

    Hi,

    You can try setting the logging level to ERROR to see if that helps.

    -Sheila

Sign In or Register to comment.