How to import GATK reports into Python/pandas DataFrames

myourshaw, University of Colorado, Member
edited March 2018 in Ask the GATK team

Answering my own question: I searched around and could not find a Python equivalent of gsalib, the R library that enables analysis in R of GATK reports produced by VariantEval and other tools, so I ported the R gsalib to a Python module. Its GatkReport object reads a report file (v0 and v1 formats are supported) and loads each table into a pandas DataFrame.

Install with: pip install gsalib

Example usage:

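from gsalib import GatkReport

# Read the report file; each table in it is loaded into a pandas DataFrame.
report = GatkReport('/path/to/gsalib/test/test_v1.0_gatkreport.table')
# Look up a single table by its name.
table = report.tables['ExampleTable']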
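
For anyone curious what loading a report involves, here is a rough standard-library sketch of pulling tables out of a v1-style report. This is not gsalib's actual implementation. The sample text, the parse_report helper, and the layout it assumes (two "#:GATKTable:" lines per table, then a header row, then whitespace-separated rows ending at a blank line) are assumptions for illustration. Values that contain spaces would break the simple split used here.

```python
# Hypothetical sketch, NOT gsalib's code. Assumes a v1-style report layout:
# two "#:GATKTable:" lines per table, a header row, whitespace-separated
# data rows, and a blank line closing each table.

SAMPLE_REPORT = """\
#:GATKReport.v1.1:1
#:GATKTable:3:2:%s:%d:%.2f:;
#:GATKTable:ExampleTable:A made-up table for illustration
Sample  Count  Ratio
s1      10     0.50
s2      25     1.25

"""


def parse_report(text):
    """Return {table_name: list of row dicts}; all values stay strings."""
    tables = {}
    lines = iter(text.splitlines())
    for line in lines:
        # Skip everything except the name/description line of a table.
        # The column-format line also starts with "#:GATKTable:" but ends in ":;".
        if not line.startswith("#:GATKTable:") or line.endswith(":;"):
            continue
        name = line.split(":")[2]
        header = next(lines).split()
        rows = []
        for row in lines:
            if not row.strip():
                break  # a blank line closes the table
            rows.append(dict(zip(header, row.split())))
        tables[name] = rows
    return tables


tables = parse_report(SAMPLE_REPORT)
print(tables["ExampleTable"])
```

Each list of row dicts could then be handed to pandas.DataFrame(rows) to get a DataFrame, with the string columns converted to numeric types as needed.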
