Interpreting ExcessHet INFO field
First, thanks for the wonderful help and the clarity of the explanations on this website.
I am considering filtering my VCF files on ExcessHet to keep only markers that follow HWE. I understood the ExcessHet value to be the probability, under HWE, of observing at least as many heterozygotes as were actually observed. But looking at my data, I get sites like this:
Scaffold_100 316384 . A C 1085470 PASS AC=55;AF=0.724;AN=76;BaseQRankSum=0.764;ClippingRankSum=0.00;DP=36981;ExcessHet=0.0000;FS=0.000;InbreedingCoeff=0.9342;MLEAC=55;MLEAF=0.724;MQ=57.15;MQRankSum=0.771;QD=29.50;ReadPosRankSum=0.118;SOR=1.517 GT:AD:DP:GQ:PL 1/1:1,120:121:99:4815,322,0 1/1:1,594:595:99:24888,1748,0 1/1:0,789:789:99:33158,2371,0 1/1:4,461:465:99:19229,1157,0 1/1:2,106:108:99:4322,245,0 1/1:9,279:288:99:11278,484,0 1/1:1,265:266:99:10808,754,0 1/1:8,246:254:99:10149,462,0 1/1:5,293:298:99:12072,726,0 1/1:1,734:735:99:30363,2167,0 1/1:9,302:311:99:12455,568,0
Here ExcessHet is 0, but the site is monomorphic for the alternate allele. Under the null hypothesis of HWE there should be no heterozygotes, which is exactly what is observed, so shouldn't I get a high p-value?
Did I misinterpret something there?
As a result I am not so sure how to apply the filter...
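To make my reasoning concrete, here is a minimal Python sketch of the test as I understand it: a one-sided exact test that sums the HWE probabilities of every heterozygote count at least as large as the observed one, given the allele counts. The function names (`log_hwe_prob`, `excess_het_pvalue`, `phred`) are my own and are not GATK's; the actual GATK implementation may differ in its details. The last helper converts a p-value to the Phred scale (-10 log10 p), which would apply if the annotation is Phred-scaled, as the ExcessHet description line in the VCF header would indicate.

```python
import math


def log_hwe_prob(n_het, n_samples, n_minor):
    """Log probability of n_het heterozygotes under HWE, given the allele counts."""
    n_hom_minor = (n_minor - n_het) // 2
    n_hom_major = n_samples - n_het - n_hom_minor
    n_major = 2 * n_samples - n_minor

    def lf(k):
        # log(k!) via the log-gamma function
        return math.lgamma(k + 1)

    return (n_het * math.log(2) + lf(n_samples)
            - lf(n_hom_minor) - lf(n_het) - lf(n_hom_major)
            + lf(n_minor) + lf(n_major) - lf(2 * n_samples))


def excess_het_pvalue(n_hom_ref, n_het, n_hom_alt):
    """P(at least n_het heterozygotes | allele counts) under HWE."""
    n_samples = n_hom_ref + n_het + n_hom_alt
    n_minor = min(2 * n_hom_ref + n_het, 2 * n_hom_alt + n_het)
    # The heterozygote count always has the same parity as the minor allele count.
    probs = {h: math.exp(log_hwe_prob(h, n_samples, n_minor))
             for h in range(n_minor % 2, n_minor + 1, 2)}
    total = sum(probs.values())
    tail = sum(p for h, p in probs.items() if h >= n_het)
    return min(1.0, tail / total)


def phred(p):
    """Phred-scale a p-value (assumption: the annotation is reported this way)."""
    return max(0.0, -10.0 * math.log10(p))


# Hypothetical counts: the 11 genotypes shown above, all homozygous alternate.
p_no_het = excess_het_pvalue(n_hom_ref=0, n_het=0, n_hom_alt=11)
# Hypothetical opposite extreme: 10 samples, all heterozygous.
p_all_het = excess_het_pvalue(n_hom_ref=0, n_het=10, n_hom_alt=0)
print(p_no_het, phred(p_no_het))
print(p_all_het, phred(p_all_het))
```

With no heterozygotes the p-value is 1, as I expected, and its Phred-scaled value is 0. So an ExcessHet of 0.0000 would be consistent with my expectation only if the field is a Phred-scaled p-value rather than the raw probability.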