Exclude variants not genotyped in x out of y samples
I apologise if this has been answered already; I haven't managed to find the answer (I promise I tried). I have applied hard filters to my HC-SNP data set, as recommended by GATK as a starting point. However, I also want to filter by depth (I think this is no longer a popular option, but surely variants with good coverage are still more reliable than variants with very low coverage?).
Ideally, what I want is a filtered VCF file containing only variants that are genotyped, at a minimum depth, in the majority of my samples. I think this is what the CoveredByNSampleSites walker used to do, but it seems this walker has been removed?
So instead I worked around this using VariantFiltration at the sample level: --genotypeFilterExpression 'DP < 10' --genotypeFilterName DP-10 --setFilteredGtToNocall, so that every sample with a depth of less than 10 at a given variant was set to no-call (./.). This seemed to work fine.
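For readers who want to see what that genotype-level step amounts to, here is a rough Python sketch. It is not GATK's implementation, only an approximation under stated assumptions: the input is a single tab-separated VCF data line, the FORMAT column lists both GT and DP, and a sample whose DP is missing is left untouched. The function name mask_low_depth and the min_dp parameter are made up for this illustration.

```python
def mask_low_depth(line, min_dp=10):
    """Set GT to ./. for every sample whose DP is below min_dp.

    Hypothetical helper for illustration only; it does not add a
    genotype filter tag the way a real filtering tool might.
    """
    fields = line.rstrip("\n").split("\t")
    fmt = fields[8].split(":")  # FORMAT is the 9th VCF column
    if "GT" not in fmt or "DP" not in fmt:
        return "\t".join(fields)  # nothing to evaluate
    gt_i, dp_i = fmt.index("GT"), fmt.index("DP")
    for i in range(9, len(fields)):  # sample columns start at index 9
        parts = fields[i].split(":")
        # Trailing FORMAT fields may be omitted, so check the length first.
        # Assumption: a missing DP (".") is left as it is.
        if dp_i < len(parts) and parts[dp_i].isdigit() and int(parts[dp_i]) < min_dp:
            parts[gt_i] = "./."
            fields[i] = ":".join(parts)
    return "\t".join(fields)
```

Calling mask_low_depth on a line returns the same line with low-depth genotypes replaced by ./., while all other columns stay unchanged.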
Now I want to exclude variants for which a large proportion of my samples have no genotype (./.). For this, I used:
SelectVariants --maxFilteredGenotypes 10.
However, when I check how many samples have no genotype using VariantsToTable -F NO-CALL, I still have many variants where 20-40 of my samples have no genotype! The --maxFilteredGenotypes option itself seems to work, as I definitely have far fewer variants with many missing genotypes than before. I believe the remaining ones slip through because their missing genotypes weren't 'filtered' by my depth filter, but were simply never genotyped by HaplotypeCaller in the first place. I guess to exclude these, I would need an option along the lines of "--maxNOCALL" instead of --maxFilteredGenotypes, but as far as I can tell, this does not exist?
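To make the intended filter concrete, here is a minimal Python sketch of "drop variants with more than N no-call genotypes", regardless of whether the no-call came from a filter or from the caller. It is only an illustration under assumptions: plain tab-separated VCF text with GT present in FORMAT, and a genotype counted as no-call only when every allele is "." (so a half-call such as ./1 is not counted). The names count_nocalls, filter_by_nocall and max_nocall are hypothetical, not GATK options.

```python
def count_nocalls(line):
    """Count samples on one VCF data line whose genotype is fully missing."""
    fields = line.rstrip("\n").split("\t")
    gt_i = fields[8].split(":").index("GT")  # position of GT within FORMAT
    n = 0
    for sample in fields[9:]:
        gt = sample.split(":")[gt_i]
        # Treat phased (|) and unphased (/) separators alike.
        alleles = gt.replace("|", "/").split("/")
        if all(a == "." for a in alleles):
            n += 1
    return n


def filter_by_nocall(lines, max_nocall):
    """Yield header lines, plus variants with at most max_nocall no-calls."""
    for line in lines:
        if line.startswith("#") or count_nocalls(line) <= max_nocall:
            yield line
```

With an uncompressed VCF open as a file object, something like `for line in filter_by_nocall(handle, 10): out.write(line)` would keep only variants with ten or fewer missing genotypes.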
I am sure there must be a way to do this and I simply haven't found it?
Many thanks for any pointers