The current GATK version is 3.7-0
Examples: Monday, today, last week, Mar 26, 3/26/04

Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

Powered by Vanilla. Made with Bootstrap.
GATK 3.7 is here! Be sure to read the Version Highlights and optionally the full Release Notes.
Register now for the upcoming GATK Best Practices workshop, Feb 20-22 in Leuven, Belgium. Open to all comers! More info and signup at http://bit.ly/2i4mGxz

What is the GATKReport file format?

Geraldine_VdAuweraGeraldine_VdAuwera Administrator, Dev Posts: 11,015 admin
edited August 2015 in FAQs

A GATKReport is simply a text document that contains well-formatted, easy to read representation of some tabular data. Many GATK tools output their results as GATKReports, so it's important to understand how they are formatted and how you can use them in further analyses.

Here's a simple example:

#:GATKReport.v1.0:2
#:GATKTable:true:2:9:%.18E:%.15f:;
#:GATKTable:ErrorRatePerCycle:The error rate per sequenced position in the reads
cycle  errorrate.61PA8.7         qualavg.61PA8.7                                         
0      7.451835696110506E-3      25.474613284804366                                      
1      2.362777171937477E-3      29.844949954504095                                      
2      9.087604507451836E-4      32.875909752547310
3      5.452562704471102E-4      34.498999090081895                                      
4      9.087604507451836E-4      35.148316651501370                                       
5      5.452562704471102E-4      36.072234352256190                                       
6      5.452562704471102E-4      36.121724890829700                                        
7      5.452562704471102E-4      36.191048034934500                                        
8      5.452562704471102E-4      36.003457059679770                                       

#:GATKTable:false:2:3:%s:%c:;
#:GATKTable:TableName:Description
key    column
1:1000  T 
1:1001  A 
1:1002  C 

This report contains two individual GATK report tables. Every table begins with a header for its metadata and then a header for its name and description. The next row contains the column names followed by the data.

We provide an R library called gsalib that allows you to load GATKReport files into R for further analysis. Here are four simple steps to getting gsalib, installing it and loading a report.

1. Start R (or open RStudio)

$ R

R version 2.11.0 (2010-04-22)
Copyright (C) 2010 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

2. Get the gsalib library from CRAN

The gsalib library is available on the Comprehensive R Archive Network, so you can just do:

> install.packages("gsalib") 

From within R (we use RStudio for convenience).

In some cases you need to explicitly tell R where to find the library; you can do this as follows:

$ cat .Rprofile 
.libPaths("/path/to/Sting/R/")

3. Load the gsalib library

> library(gsalib)

4. Finally, load the GATKReport file and have fun

> d = gsa.read.gatkreport("/path/to/my.gatkreport")
> summary(d)
              Length Class      Mode
CountVariants 27     data.frame list
CompOverlap   13     data.frame list
Post edited by Geraldine_VdAuwera on

Geraldine Van der Auwera, PhD

Tagged:

Comments

Sign In or Register to comment.