Non-human truth and training resources
I'm working on a bird whole genome and I need to set up truth and training resources. In the documentation it says the following and I just want to confirm that I understand this correctly:
"If you are working with non-human genomes, you will need to find or generate at least truth and training resource datasets with properties corresponding to those described below. To generate your own resource set, one idea is to first do an initial round of SNP calling and only use those SNPs which have the highest quality scores. These sites which have the most confidence are probably real and could be used as truth data to help disambiguate the rest of the variants in the call set."
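To make the quoted suggestion concrete, here is a minimal sketch of the "keep only the highest-scoring SNPs" step, written in plain Python against the text of a VCF file. The function names, the `top_fraction` parameter and its 10% default are all hypothetical choices for illustration; the documentation does not give a cutoff, so any threshold would have to be chosen and checked for the dataset at hand.

```python
def is_snp(ref, alt):
    """Return True if REF and every ALT allele are single nucleotides."""
    bases = {"A", "C", "G", "T"}
    return ref in bases and all(allele in bases for allele in alt.split(","))


def select_high_confidence_snps(vcf_lines, top_fraction=0.1):
    """Keep header lines plus the SNP records with the highest QUAL values.

    top_fraction is an assumed, arbitrary cutoff (top 10% by default),
    not a value recommended by the GATK documentation.
    """
    header, records = [], []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            header.append(line)
            continue
        fields = line.split("\t")
        ref, alt, qual = fields[3], fields[4], fields[5]
        # Skip records without a QUAL value and anything that is not a SNP.
        if qual == "." or not is_snp(ref.upper(), alt.upper()):
            continue
        records.append((float(qual), line))

    if not records:
        return header

    # Number of records to keep, always at least one.
    keep = max(1, int(len(records) * top_fraction))
    cutoff = sorted((q for q, _ in records), reverse=True)[keep - 1]
    # Records tied at the cutoff are kept, and file order is preserved.
    return header + [line for q, line in records if q >= cutoff]
```

Reading the lines of the raw call set into this function and writing the returned lines to a new file would give a smaller VCF containing only the top-scoring SNPs.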
Am I right that the resource argument would then look like this:
-resource:raw_variants,known=false,training=true,truth=true,prior=15.0 raw_variants.vcf
So I would use the raw variant VCF file that I created earlier in the "Call variants" step as both my truth and my training resource?
Thanks a lot!