Non-human truth and training resources

henriettevdz South Africa Member

Good day,

I'm working on a bird whole genome and I need to set up truth and training resources. In the documentation it says the following and I just want to confirm that I understand this correctly:

"If you are working with non-human genomes, you will need to find or generate at least truth and training resource datasets with properties corresponding to those described below. To generate your own resource set, one idea is to first do an initial round of SNP calling and only use those SNPs which have the highest quality scores. These sites which have the most confidence are probably real and could be used as truth data to help disambiguate the rest of the variants in the call set."

Do I understand it correctly that in the arguments I will be using the following:
-resource:raw_variants,known=false,training=true,truth=true,prior=15.0 raw_variants.vcf

So I would use the raw variants VCF file that I created earlier in the "Call variants" step as my truth and training resource?

Thanks a lot!
Henriette
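
The quoted documentation suggests keeping only the highest-scoring SNPs from an initial calling round, rather than the full raw call set. A minimal sketch of that filtering step in Python follows. The function name `select_high_confidence_snps`, the `min_qual` cutoff of 100, and the demo records are all assumptions for illustration; the documentation does not specify a threshold, and a sensible one depends on the QUAL distribution of your own data.

```python
def select_high_confidence_snps(vcf_lines, min_qual=100.0):
    """Return header lines plus SNP records whose QUAL is at least min_qual.

    vcf_lines: iterable of tab-delimited VCF text lines.
    min_qual:  hypothetical cutoff; choose it from your own QUAL distribution.
    """
    kept = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            # Keep all header lines so the output is still a valid VCF.
            kept.append(line)
            continue
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # malformed record, skip it
        ref, alt, qual = fields[3].upper(), fields[4].upper(), fields[5]
        if qual == ".":
            continue  # no quality score available
        # A SNP here means a single-base REF and single-base ALT allele(s).
        is_snp = len(ref) == 1 and all(
            len(a) == 1 and a in "ACGT" for a in alt.split(",")
        )
        if is_snp and float(qual) >= min_qual:
            kept.append(line)
    return kept


# Made-up demo records: one high-quality SNP, one low-quality SNP, one indel.
demo = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t950.3\t.\t.",
    "chr1\t200\t.\tC\tT\t35.1\t.\t.",
    "chr1\t300\t.\tAT\tA\t1200.0\t.\t.",
]
result = select_high_confidence_snps(demo, min_qual=100.0)
for record in result:
    print(record)
```

Under this reading, the filtered records would be written to a new VCF, and that file, not the full raw_variants.vcf, would be the one supplied in the -resource argument.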

Issue · Github
by Sheila

Issue Number: 1108
State: closed
Closed By: vdauwera
