Filtering of heterozygotes only
I need help, please!
I'm working on lovebirds and trying to identify SNPs that can be included in a parentage verification panel. The reference genome is that of the offspring, and I have mapped its parents' reads to this reference to identify SNPs. I want to identify only those SNPs where the mother and father are both heterozygous, which would imply that all four grandparents also had a polymorphism at that site.
I did hard filtering in two steps.
First, as the Best Practices guidelines suggest:
QD < 2 || FS > 60 || MQ < 40 || MQRankSum < -12.5 || ReadPosRankSum < -8.0
Then, to filter in the heterozygotes:
QD > 2 || FS < 10 || MQ > 50 || MQRankSum > -5.1 || ReadPosRankSum < -8.0
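These expressions use the syntax of GATK's VariantFiltration --filter-expression, where a record that matches the expression gets flagged in the FILTER column. Because the clauses are joined with ||, one matching clause is enough to flag a record. The minimal Python sketch below mimics that logic for the first (Best Practices) expression only. It assumes the INFO annotations have already been parsed into a dict, and it treats a missing annotation as not triggering the filter; that rule is an assumption of this sketch, not a statement of GATK's exact behavior. The helper names fails_best_practices and parse_info are hypothetical.

```python
# Minimal sketch: evaluate the Best Practices SNP hard-filter expression
# against one record's INFO annotations. Assumptions (not GATK's code):
# annotations arrive as a dict, and absent annotations never trigger a clause.
from typing import Mapping, Optional


def fails_best_practices(info: Mapping[str, Optional[float]]) -> bool:
    """True if any clause of
    QD<2 || FS>60 || MQ<40 || MQRankSum<-12.5 || ReadPosRankSum<-8.0
    holds for this record (i.e. the record would be flagged)."""
    clauses = [
        ("QD", lambda v: v < 2.0),
        ("FS", lambda v: v > 60.0),
        ("MQ", lambda v: v < 40.0),
        ("MQRankSum", lambda v: v < -12.5),
        ("ReadPosRankSum", lambda v: v < -8.0),
    ]
    for key, test in clauses:
        value = info.get(key)
        if value is not None and test(value):  # skip absent annotations
            return True
    return False


def parse_info(info_field: str) -> dict:
    """Parse single-valued numeric KEY=VALUE pairs from a VCF INFO column."""
    out = {}
    for item in info_field.split(";"):
        if "=" in item:  # flags such as DB carry no value and are ignored
            key, value = item.split("=", 1)
            try:
                out[key] = float(value)
            except ValueError:
                pass  # non-numeric or multi-valued (e.g. AC=1,2) entries skipped
    return out
```

For example, fails_best_practices(parse_info("QD=15.3;FS=2.1;MQ=59.8")) returns False, while changing QD to 1.5 makes it return True.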
The mother is more heterozygous than the father: I get roughly 1.9 million raw SNPs for her versus 1.2 million for the father. After filtering, there are of course far fewer.
I then combined the genotypes of the two parents and repeated the process.
The results I get, for both sets of filtering parameters and for both the combined and the separate genotypes, are not bad, but I want to keep only those SNPs where both the mother and the father are heterozygous. Checking the results in IGV, it seems that only about 1 in every 10-20 SNPs that passed the filters meets this criterion. However, I cannot see any difference in the annotations, quality, or anything else that I could use to filter these further. I went through them manually and selected the ones I wanted, but the selected subset had no obvious features in common that would let me separate it from the rest.
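Since the criterion here is on genotypes rather than on site-level annotations such as QD or FS, it can be checked directly from the GT field of each sample. The minimal sketch below reads plain-text VCF lines with the Python standard library and keeps only sites where every sample has a fully called heterozygous diploid genotype. The function names is_het and all_het_sites, and the sample names mother and father in the usage example, are hypothetical; the sketch assumes one VCF record per line with the FORMAT column in position 9.

```python
# Minimal sketch: keep only VCF sites where every sample is heterozygous.
# Assumes diploid genotypes and a plain-text (uncompressed) VCF.
from typing import Iterable, Iterator


def is_het(gt: str) -> bool:
    """True for calls like 0/1, 1|0, 1/2; False for homozygous or missing."""
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2 or "." in alleles:
        return False
    return alleles[0] != alleles[1]


def all_het_sites(vcf_lines: Iterable[str]) -> Iterator[str]:
    """Yield data lines whose GT is heterozygous in every sample column."""
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        cols = line.rstrip("\n").split("\t")
        fmt = cols[8].split(":")
        if "GT" not in fmt:
            continue
        gt_index = fmt.index("GT")
        samples = cols[9:]
        gts = []
        for sample in samples:
            fields = sample.split(":")
            gts.append(fields[gt_index] if gt_index < len(fields) else ".")
        if samples and all(is_het(gt) for gt in gts):
            yield line
```

For example, in a two-sample VCF with sample columns mother and father, a site genotyped 0/1 and 0/1 is kept, while sites genotyped 0/1 and 1/1, or 0|1 and ./., are dropped.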
So my questions are:
1. Is there any way to select only those SNPs that are heterozygous in all individuals, other than going through them manually?
2. Some of the highest-quality SNPs are called heterozygous, but fewer than 20% of the reads carry the alternative allele. Should I keep these, or should I prefer lower-quality SNPs with a higher proportion of alternative-allele reads (e.g. around 50%)?
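One way to put numbers on the trade-off in question 2 is to compute each call's allele balance (the fraction of ALT reads) from the per-sample AD field, and to ask how likely that few ALT reads would be if the site were a true diploid heterozygote, where about half the reads are expected to carry each allele. The minimal sketch below does this with a binomial model. Treating reads as independent draws with probability 0.5 is a simplifying assumption, the sketch handles biallelic AD values only, and the names allele_balance and het_lower_tail are hypothetical. It does not choose a cutoff.

```python
# Minimal sketch: allele balance from AD = (ref, alt) and the binomial
# lower-tail probability of seeing <= alt ALT reads at this depth under a
# true-heterozygote model (expected ALT fraction 0.5). Assumes biallelic
# sites and independent reads, which real data only approximate.
from math import comb
from typing import Sequence


def allele_balance(ad: Sequence[int]) -> float:
    """Fraction of ALT reads among REF+ALT reads for AD = (ref, alt)."""
    ref, alt = ad[0], ad[1]
    depth = ref + alt
    if depth == 0:
        raise ValueError("no informative reads at this site")
    return alt / depth


def het_lower_tail(alt: int, depth: int, p: float = 0.5) -> float:
    """P(X <= alt) for X ~ Binomial(depth, p)."""
    return sum(
        comb(depth, k) * p**k * (1 - p) ** (depth - k) for k in range(alt + 1)
    )
```

For example, a heterozygous call with AD = 16,4 has an allele balance of 0.2, and het_lower_tail(4, 20) is about 0.006, whereas 8 ALT reads out of 20 gives about 0.25. A low value says the observed read split is unusual for a clean heterozygote at that depth; it does not by itself say why.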
Thanks a lot!