How to import GATK reports into Python/pandas DataFrames
Answering my own question. I searched around and could not find a Python equivalent of the gsalib R library, which lets you analyze in R the GATK reports produced by VariantEval and other tools. So I ported the R gsalib to a Python module. The
GatkReport object reads a report file (v0 and v1 formats are supported) and loads each table into a pandas DataFrame.
Install it with pip:

    pip install gsalib
Load a report and access a table by name:

    from gsalib import GatkReport

    report = GatkReport('/path/to/gsalib/test/test_v1.0_gatkreport.table')
    table = report.tables['ExampleTable']
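Since each table is an ordinary pandas DataFrame, the usual pandas operations apply from there. Here is a minimal sketch of what that might look like. To keep it runnable without a report file, it builds a stand-in DataFrame in place of `report.tables['ExampleTable']`; the column names (`Sample`, `Novelty`, `nVariants`) and the values are made up for illustration and are not taken from any real GATK report.

```python
import pandas as pd

# Hypothetical stand-in for report.tables['ExampleTable'].
# Column names and values are invented for this example.
table = pd.DataFrame({
    "Sample": ["s1", "s1", "s2", "s2"],
    "Novelty": ["known", "novel", "known", "novel"],
    "nVariants": [120, 30, 95, 22],
})

# Keep only the rows for novel variants.
novel = table[table["Novelty"] == "novel"]

# Total variant count per sample.
per_sample = table.groupby("Sample")["nVariants"].sum()

print(novel)
print(per_sample)
```

With a real report you would replace the stand-in with the `table` loaded above and use whatever columns that report actually contains (`table.columns` lists them).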