Interpreting ExcessHet INFO field
First, thanks for the wonderful help and the clarity of the explanations on this website.
I am considering applying a filter on ExcessHet to my VCF files so that I keep only markers that follow HWE. I understood the ExcessHet value to be the probability of getting the same number of heterozygotes as observed, or more, under HW conditions. But looking at my data, I get sites like this:
Scaffold_100 316384 . A C 1085470 PASS AC=55;AF=0.724;AN=76;BaseQRankSum=0.764;ClippingRankSum=0.00;DP=36981;ExcessHet=0.0000;FS=0.000;InbreedingCoeff=0.9342;MLEAC=55;MLEAF=0.724;MQ=57.15;MQRankSum=0.771;QD=29.50;ReadPosRankSum=0.118;SOR=1.517 GT:AD:DP:GQ:PL 1/1:1,120:121:99:4815,322,0 1/1:1,594:595:99:24888,1748,0 1/1:0,789:789:99:33158,2371,0 1/1:4,461:465:99:19229,1157,0 1/1:2,106:108:99:4322,245,0 1/1:9,279:288:99:11278,484,0 1/1:1,265:266:99:10808,754,0 1/1:8,246:254:99:10149,462,0 1/1:5,293:298:99:12072,726,0 1/1:1,734:735:99:30363,2167,0 1/1:9,302:311:99:12455,568,0
Here ExcessHet is 0, but the site is monomorphic for the alternate allele. Under the null hypothesis of HWE there should be no heterozygotes, which is what is observed, so shouldn't I get a high p-value?
Did I misinterpret something there?
As a result I am not so sure how to apply the filter...
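To make the expectation concrete, here is a minimal Python sketch of a one-sided exact test for excess heterozygotes, conditioning on the observed allele counts. It is only an illustration of the reasoning above, not GATK's actual code, and the function names are hypothetical. It uses only the 11 genotypes visible in the record (all 1/1), not the full sample set implied by AN=76. The last function assumes the reported value is Phred-scaled (-10 log10 p), which is how the ExcessHet INFO header describes it; under that assumption a p-value of 1 maps to 0.

```python
from math import exp, lgamma, log, log10


def log_factorial(n):
    """Natural log of n!, via the log-gamma function."""
    return lgamma(n + 1)


def het_log_prob(n_het, n_samples, n_minor):
    """Log probability of n_het heterozygotes under HWE, given the allele counts."""
    n_major = 2 * n_samples - n_minor
    n_hom_minor = (n_minor - n_het) // 2
    n_hom_major = (n_major - n_het) // 2
    return (
        n_het * log(2)
        + log_factorial(n_samples)
        - log_factorial(n_hom_minor)
        - log_factorial(n_het)
        - log_factorial(n_hom_major)
        + log_factorial(n_minor)
        + log_factorial(n_major)
        - log_factorial(2 * n_samples)
    )


def excess_het_p_value(n_hom_ref, n_het, n_hom_alt):
    """P(at least as many heterozygotes as observed) under HWE."""
    n_samples = n_hom_ref + n_het + n_hom_alt
    n_minor = min(2 * n_hom_ref + n_het, 2 * n_hom_alt + n_het)
    # Feasible het counts have the same parity as the minor allele count.
    probs = {
        h: exp(het_log_prob(h, n_samples, n_minor))
        for h in range(n_minor % 2, n_minor + 1, 2)
    }
    total = sum(probs.values())  # ~1; dividing guards against rounding error
    return sum(p for h, p in probs.items() if h >= n_het) / total


def phred_scale(p):
    """Assumed reporting scale: -10 * log10(p), with -0.0 normalised to 0.0."""
    return abs(-10.0 * log10(p))


# The 11 genotypes shown above: 0 hom-ref, 0 het, 11 hom-alt.
p = excess_het_p_value(0, 0, 11)
print(p, phred_scale(p))
```

For the visible genotypes the p-value comes out as 1, which is the high p-value expected under HWE, and its Phred-scaled form is 0. Read that way, a small ExcessHet would indicate no evidence of excess heterozygotes, and a filter would target large values instead.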