PID FORMAT field in GenotypeGVCFs output
I'd be grateful for your help in understanding how GATK add phase information (PID field) in GenotypeGVCFs.
Here's an example of two lines from a VCF I'm getting for running GenotypeGVCFs on a set of samples:
chr1 8864083 . T C 462.27 . AC=4;AF=0.500;AN=8;BaseQRankSum=-4.950e-01;ClippingRankSum=0.406;DP=74;FS=0.000;MLEAC=4;MLEAF=0.500;MQ=60.00;MQRankSum=0.306;QD=6.42;ReadPosRankSum=1.54;SOR=0.674 GT:AD:DP:GQ:PL 0/1:22,12:34:99:205,0,518 0/1:10,4:14:60:60,0,220 0/1:3,3:6:71:71,0,72 0/1:10,8:18:99:158,0,238 ./.:2,0:2
chr1 8864426 . C T 5302.72 . AC=5;AF=0.500;AN=10;BaseQRankSum=0.815;ClippingRankSum=-6.270e-01;DP=316;FS=7.664;MLEAC=5;MLEAF=0.500;MQ=60.00;MQRankSum=-6.130e-01;QD=17.05;ReadPosRankSum=-1.240e-01;SOR=0.777 GT:AD:DP:GQ:PGT:PID:PL 0/1:24,66:90:99:0|1:8864426_C_T:1596,0,649 0/1:20,62:82:99:.:.:1502,0,536 0/1:24,37:61:99:0|1:8864426_C_T:921,0,647 0/1:22,41:63:99:0|1:8864426_C_T:1014,0,717 0/1:2,13:15:74:0|1:8864426_C_T:304,0,74
What's the interpretation for the phase information of the last sample in chr1:8864426? in position chr1:8864083 its genotype is not reported but in the proceeding position chr1:8864426 it is phased 0|1 WRT position chr1:8864083.
If I subsequently run SelectVariants for that sample, with --excludeNonVariants this phase information is retained but clearly looses context. So another question is if there is a way to change the PID field when running SelectVariants or VariantFiltration in case the position it is phased with is filtered?
Thanks a lot