Test-drive the GATK tools and Best Practices pipelines on Terra


Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.

VariantEval Stratifications based on FORMAT Fields

LouisBLouisB Broad InstituteMember, Broadie, Dev ✭✭

I'd like to be able to perform stratifications in a multi sample vcf, by values that are in the format fields. Almost all of the existing stratifications are based on site specific information rather than sample specific ones. One stratification in particular that I would like to perform is by ReadDepth. I would like to be able to differentiate for instance, all samples with ReadDepth greater than 20. This works in single sample vcfs, but it produces strange results in ones with multiple samples, since each VariantContext contains multiple genotypes.

Melting my vcfs and reporting multiple lines for each position seems possible, but ugly. Splitting vcfs so that each sample is in it's own vcf is also possible and ugly. What is the recommended method for dealing with this sort of stratification?

Best Answer

Answers

Sign In or Register to comment.