
GATK4 results for PrecisionFDA Consistency challenge data

Dear team,

I've run GATK 4.0.0 using Cromwell (30.2) and the WDLs at https://github.com/gatk-workflows/gatk4-data-processing and https://github.com/gatk-workflows/gatk4-germline-snps-indels. I already had BWA-aligned, deduplicated BAMs, so I modified "processing-for-variant-discovery-gatk4.wdl" to start from BQSR, but otherwise used the published WDLs with minimal modifications.

The results for the public PrecisionFDA datasets (https://precision.fda.gov/) are interesting. Recall and precision were great for the Truth challenge datasets (HiSeq2500, PCR-free, ~50x), but not for the Consistency challenge datasets (HiSeqX, PCR+, ~30x). In particular, for indels from the Consistency challenge datasets, recall and precision were far worse than the GATK3 results available for these datasets: ~92% and ~79%, respectively, for the Garvan dataset and ~89% and 83% for the HLI dataset after VQSR filtration.
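For readers comparing numbers, recall and precision here are assumed to follow the usual definitions from true-positive (TP), false-positive (FP), and false-negative (FN) call counts. The sketch below is only an illustration of that arithmetic: the function name `recall_precision` and the counts are hypothetical and are not taken from the challenge data or from any comparison tool's output.

```python
def recall_precision(tp, fp, fn):
    """Return (recall, precision) from TP, FP, and FN call counts."""
    # Recall: fraction of truth-set variants that were called.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Precision: fraction of called variants that are in the truth set.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision


# Hypothetical indel counts, for illustration only.
recall, precision = recall_precision(tp=900, fp=100, fn=100)
print(f"recall={recall:.2%} precision={precision:.2%}")
```

Both denominators are guarded so that an empty call set or an empty truth set yields 0.0 rather than a division error.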

Do these numbers match what you normally get for PCR+ HiSeq X WGS datasets with depths of ~35x? If not, are there any parameters that I need to change?

Also, I think it would be very helpful to the community if the team made its GATK4 results publicly available for these popular public datasets.

Best,

Sangtae

Issue · Github
by Sheila

Issue Number: 2901
State: open