GATK4 results for PrecisionFDA Consistency challenge data

Dear team,

I've run GATK 4.0.0 using Cromwell (30.2) and the WDLs at https://github.com/gatk-workflows/gatk4-data-processing and https://github.com/gatk-workflows/gatk4-germline-snps-indels. I already had BWA-aligned, deduplicated BAMs, so I modified "processing-for-variant-discovery-gatk4.wdl" to start from BQSR, but otherwise used the published WDLs with minimal modifications.

The results for the public PrecisionFDA datasets (https://precision.fda.gov/) are interesting. Recall and precision were great for the Truth challenge datasets (HiSeq2500, PCR-free, ~50x), but not for the Consistency challenge datasets (HiSeqX, PCR+, ~30x). For indels in particular, recall and precision on the Consistency challenge datasets were far worse than the GATK3 results available for these datasets: after VQSR filtration, I got ~92% recall and ~79% precision for the Garvan dataset, and ~89% recall and 83% precision for the HLI dataset.
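
For reference, the recall and precision figures above are assumed to follow the standard definitions: recall = TP / (TP + FN) and precision = TP / (TP + FP), where TP, FP and FN are counts of true-positive, false-positive and false-negative calls against the truth set. A minimal Python sketch of that calculation follows; the function name and the counts are hypothetical placeholders, not values from the challenge data.

```python
from typing import Tuple


def recall_precision(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Return (recall, precision) from TP, FP and FN call counts."""
    if tp + fn == 0 or tp + fp == 0:
        raise ValueError("recall or precision is undefined for these counts")
    recall = tp / (tp + fn)      # fraction of truth variants that were called
    precision = tp / (tp + fp)   # fraction of called variants that are true
    return recall, precision


# Hypothetical counts, for illustration only (not PrecisionFDA results).
recall, precision = recall_precision(tp=90, fp=10, fn=30)
print(f"recall={recall:.2%} precision={precision:.2%}")
```

With these made-up counts the script prints recall=75.00% precision=90.00%.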

Do these numbers match what you normally get for PCR+ HiSeq X WGS datasets with depths ~35x? If not, are there any parameters that I need to change?

Also, I think it would be very helpful to the community if the team made its GATK4 results publicly available for these popular public datasets.

Best,

Sangtae

Issue · Github (by Sheila)
Issue Number: 2901
State: closed
Closed By: vdauwera