Great workshop in Huntington.
Would you please give me a link to that training set you mentioned?
It helps a lot. That is what I was looking for.
Thank you very much!
Which training set are you referring to? I will ask Geraldine to get back to you.
She mentioned a set of BAM/VCF files with specific known variants and annotations (maybe phenotypes).
My goal is to take these BAM files, run variant calling and filtering on them, and reach the same or similar results.
Thank you again for all the workshop teaching!
I am confirming with Geraldine and will get back to you ASAP.
Hi @CesarDuarte, lovely to see you on the forum post-workshop!
The test data sets we make available are here: https://console.cloud.google.com/storage/browser/gatk-test-data
They're organized by file type, not by testing set as such. I'm in talks with the team that specializes in evaluations to see if we can provide some more structured resources, but I don't know yet when that will bear fruit. In the meantime, though, you can check out a preprint they recently deposited in bioRxiv that describes an approach I think is very promising, and I think there are links to the data they used: https://www.biorxiv.org/content/early/2017/11/22/223297
Finally, you should also check out the NIST Genome in a Bottle project, as well as Blue Collar Bioinformatics, a group out of the Harvard School of Public Health that does a lot of good work in the variant caller comparison space.
I hope this helps!
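For the goal described earlier in the thread (calling variants from the BAM files and checking whether the results match the known variants), here is a minimal Python sketch of what such a comparison might look like. It is only an illustration: the function names `load_variants` and `compare_calls` and the toy records are made up for this example, and it assumes that exact matching on chromosome, position, REF and ALT is good enough. Real evaluations usually need variant normalization and genotype-aware matching, which dedicated comparison tools handle more carefully.

```python
import io


def load_variants(handle):
    """Collect (chrom, pos, ref, alt) keys from tab-separated VCF text lines."""
    variants = set()
    for line in handle:
        # Skip blank lines and VCF header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        # Split multi-allelic records into one key per ALT allele.
        for alt in alts.split(","):
            variants.add((chrom, pos, ref, alt))
    return variants


def compare_calls(truth, calls):
    """Count exact matches between two variant sets and derive precision/recall."""
    tp = len(truth & calls)   # called and in the truth set
    fp = len(calls - truth)   # called but not in the truth set
    fn = len(truth - calls)   # in the truth set but not called
    precision = tp / (tp + fp) if calls else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall}


# Toy records for illustration only (not real data).
TRUTH_VCF = (
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr20\t1000\t.\tA\tG\t.\tPASS\t.\n"
    "chr20\t2000\t.\tC\tT\t.\tPASS\t.\n"
    "chr20\t3000\t.\tG\tA,C\t.\tPASS\t.\n"
)
CALLS_VCF = (
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr20\t1000\t.\tA\tG\t50\tPASS\t.\n"
    "chr20\t2500\t.\tT\tC\t30\tPASS\t.\n"
    "chr20\t3000\t.\tG\tA\t60\tPASS\t.\n"
)

truth = load_variants(io.StringIO(TRUTH_VCF))
calls = load_variants(io.StringIO(CALLS_VCF))
result = compare_calls(truth, calls)
print(result)
```

With real files you would pass open file handles for the uncompressed truth and call-set VCFs instead of the in-memory strings. Reading bgzipped VCFs would need `gzip.open(path, "rt")` or a VCF library.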