If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We appreciate your help!
Test-drive the GATK tools and Best Practices pipelines on Terra
Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
SelectVariants - Removing variants present in a sample reality check
I have a very small set non-model organism genomes, and have called SNPs following the Best Practices with
HaplotypeCaller. One of the genomes that I included is a re-sequenced version of the reference ancestral genome.
Surprisingly, all of the SNPs that were identified appeared to be present in the newly sequenced ancestor track as well; in fact looking at the joint-called VCF contig by contig I couldn't find a single SNP that wasn't present in the ancestor. A few of these make sense - for example looking at the bam alignment shows that some of the SNPs are almost exactly 50% split between two alternative bases, indicating that the reference may be heterozygous at that position. But before I accept that there are literally 0 new SNPs found I want to verify this with filtering; I wanted to select only variants that were not also called in the ancestor.
To do this, I created a new VCF containing only the ancestor, and then selected variants discordant from that ancestral VCF:
# First get only the ancestor java -jar $GATK -T SelectVariants \ -R $reference \ -V jointcalls.vcf \ -sn Ancestor \ -o jointcalls_AncestorOnly.vcf # Then get all variants that don't agree with ancestor (are in discordance) java -jar $GATK -T SelectVariants \ -R $reference \ -V jointcalls.vcf \ --discordance jointcalls_AncestorOnly.vcf \ -o jointcalls_NoAncestor.vcf
When I visualize this output w/ IGV and summarize it with VariantEval, both indicate that the file is empty - there are 0 calls. I think the interpretation here would be that literally every call made in any of the input genomes was also made in the ancestor. However, without even a single positive example to cross check, it's hard to trust that I didn't do something wrong o misunderstand the filtering.
Is this interpretation correct? Any sanity check would be appreciated. Thanks,