This site is now read-only. You can find our new documentation site and support forum for posting questions here.
Be sure to read our welcome blog!
Allele balance filtering
I am getting a number of incorrect/questionable genotype calls after proceeding through the "Best Practices" guidelines. These incorrect calls are resulting in a lot of false positives in my subsequent association analysis. After discussing these issues with a couple of people, I have been advised to try setting an AB filter of 0.2. There are a few references to an AB filter on the GATK site but none (that I can find) that come out and state the exact commands needed to create this filter. I tried using VariantFiltration:
java -Xmx10g -jar /shared/bin/gatk/GenomeAnalysisTK.jar -T VariantFiltration -R /mnt/DATA/Cores/hiseq2000/annie/reference/Homo_sapiens/GRCh37/b37/human_g1k_v37.fasta --variant july2017.biallelic.noindel.recode.vcf --filterExpression "AB < 0.2" --filterName ABfilter -o july2017.biallelic.noindel.filtered.vcf
This just resulted in numerous warnings of "Interpreter - ![0,2]: 'AB < 0.2;' undefined variable AB". From reading other forums, I believe that I somehow have to get the "AB" variable into my vcf before I can filter on the value. I have no idea how to do that and at what step in the best practices it should have been done.
The main issue seems to be true ref/ref genotypes being called heterozygous. We have sanger sequenced a dozen or so and they are all ref/ref.
6 32557529 rs1059547 C G 35212 PASS . GT:AD:DP:FT:GQ:PGT:PID:PL:PP 0/1:132,30:162:PASS:99:.:.:943,0,5348:944,0,5353 0/1:129,38:167:PASS:99:.:.:1149,0,5328:1150,0,5333 0/0:120,9:129:lowGQ:1:0|1:32557529_C_G:0,2,5038:0,1,5040 0/1:120,25:145:PASS:99:.:.:520,0,5041:521,0,5046 0/1:148,33:181:PASS:99:.:.:836,0,5970:837,0,5975 0/0:148,12:160:PASS:50:.:.:0,51,6315:0,50,6319 0/1:163,22:185:PASS:99:0|1:32557529_C_G:367,0,6746:368,0,6751 0/1:64,10:74:PASS:99:.:.:221,0,2741:222,0,2746 0/1:115,25:140:PASS:99:.:.:549,0,4823:550,0,4828 0/0:163,0:163:PASS:99:.:.:0,120,1800:0,119,1804 0/1:88,8:96:lowGQ:10:.:.:9,0,3596:10,0,3601 6 57398118 rs147643173 G T 32652.9 PASS . GT:AD:DP:FT:GQ:PGT:PID:PL:PP 0/1:54,21:75:PASS:99:0|1:57398117_T_TA:498,0,2153:500,0,2156 0/1:62,27:89:PASS:99:0|1:57398117_T_TA:935,0,2406:937,0,2409 0/1:34,8:42:PASS:99:0|1:57398117_T_TA:213,0,1398:215,0,1401 0/1:58,0:58:lowGQ:2:.:.:0,0,1045:2,0,1048 0/1:28,16:44:PASS:99:0|1:57398117_T_TA:588,0,1103:590,0,1106 0/1:33,17:50:PASS:99:0|1:57398117_T_TA:534,0,1336:536,0,1339 0/1:26,5:31:PASS:99:0|1:57398117_T_TA:132,0,1050:134,0,1053 0/1:35,14:49:PASS:99:0|1:57398117_T_TA:482,0,1421:484,0,1424 0/1:50,10:60:PASS:99:0|1:57398117_T_TA:185,0,2171:187,0,2174 0/1:40,22:62:PASS:99:0|1:57398117_T_TA:768,0,1600:770,0,1603 0/1:61,19:80:PASS:99:0|1:57398117_T_TA:595,0,2540:597,0,2543
Ideally I would like to apply something on a per site/per sample basis that would set any heterozygous call that does not meet AB threshold of 0.2 to be called as ./. I don't know if this is even possible. I am open to any other suggestion. I know I can set the ones with the lowGQ flag to ./. if I wanted but not all of the problem sites indicate lowGQ. For reference I am using GATK Version=3.4-46-gbc02625.