Calculation of HaplotypeScore
I conducted a targeted resequencing study of a 1.5 Mbp region in 48 individuals. After following the data-processing procedure recommended by the GATK team, I called SNPs with UnifiedGenotyper in multi-sample mode. VariantRecalibrator did not work because there were too few SNPs, so I am considering hard filtering of the raw SNPs instead. I tried the hard-filtering thresholds recommended by the GATK team in the Best Practices (QD < 2.0, MQ < 40.0, FS > 60.0, HaplotypeScore > 13.0, MQRankSum < -12.5, ReadPosRankSum < -8.0). Surprisingly, one third of the raw SNPs were filtered out by the HaplotypeScore threshold.
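For reference, the hard-filter rule quoted above works as an OR: a site is filtered if any single condition holds, so one annotation (here HaplotypeScore) can remove a site even when all the others look fine. The following is a minimal Python sketch of that logic, not GATK's VariantFiltration. The function name `failed_filters` and the choice to treat a missing annotation as passing are assumptions for illustration, and the annotation values in the example are hypothetical.

```python
# Best Practices hard-filter thresholds quoted in the question.
# A site is filtered if ANY one of these conditions is true.
THRESHOLDS = {
    "QD": ("<", 2.0),
    "MQ": ("<", 40.0),
    "FS": (">", 60.0),
    "HaplotypeScore": (">", 13.0),
    "MQRankSum": ("<", -12.5),
    "ReadPosRankSum": ("<", -8.0),
}


def failed_filters(annotations):
    """Return the names of the hard filters that a site fails.

    Assumption (hypothetical helper): an annotation that is absent from
    the site is treated as passing rather than failing.
    """
    failed = []
    for name, (op, limit) in THRESHOLDS.items():
        value = annotations.get(name)
        if value is None:
            continue  # missing annotation: treated as passing in this sketch
        if (op == "<" and value < limit) or (op == ">" and value > limit):
            failed.append(name)
    return failed


# Hypothetical site: every annotation passes except HaplotypeScore.
site = {"QD": 15.2, "MQ": 59.8, "FS": 1.3, "HaplotypeScore": 21.4,
        "MQRankSum": 0.4, "ReadPosRankSum": -1.1}
print(failed_filters(site))  # ['HaplotypeScore']
```

Listing every failed filter per site, rather than a single pass/fail flag, makes it easy to count how many sites are removed by HaplotypeScore alone.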
The average coverage depth of our data is very high (about 200x per individual), so I am wondering whether the value of HaplotypeScore depends on coverage depth.
Could you give me the equations for HaplotypeScore, MQRankSum, and ReadPosRankSum, and tell me whether these statistics depend on the depth of coverage or on other measurements?
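As far as I understand the GATK documentation, MQRankSum and ReadPosRankSum are z-scores from a rank-sum test comparing alt-supporting reads with ref-supporting reads (on mapping quality and on position within the read, respectively). The sketch below is a generic Mann-Whitney U z-score with a normal approximation and no tie correction, written to show how such a score can scale with depth; it is not GATK's implementation, and the names `average_ranks` and `rank_sum_z` and the per-read values are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_z(alt, ref):
    """Mann-Whitney U z-score (normal approximation, no tie correction).

    Negative when alt-supporting reads tend to have smaller values than
    ref-supporting reads. Returns None if either group is empty.
    """
    n1, n2 = len(alt), len(ref)
    if n1 == 0 or n2 == 0:
        return None
    ranks = average_ranks(list(alt) + list(ref))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # U statistic for the alt group
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u1 - mean) / sd


# Hypothetical mapping qualities; the same per-read distribution at 1x and 10x depth.
alt_mq, ref_mq = [40, 50, 60], [55, 60, 60]
z1 = rank_sum_z(alt_mq, ref_mq)
z10 = rank_sum_z(alt_mq * 10, ref_mq * 10)
print(f"{z1:.2f} {z10:.2f}")  # -1.09 -3.70
```

With the per-read distributions held fixed, multiplying the read count by 10 moves z from about -1.09 to about -3.70, so a real difference between alt and ref reads produces a larger |z| at higher depth, whereas with no real difference z stays centered on zero.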