Why is the Inbreeding Coefficient not always displayed in a VCF file?
Using GATK v3.5-0, I've generated VCF files for a group of 223 whole genomes, one file per interval of the reference genome. I created GVCF files using HaplotypeCaller in GVCF mode, then used CombineGVCFs to merge them into cohorts, which I passed to GenotypeGVCFs. Curiously, not every variant in the output VCF files has an InbreedingCoeff value, and this affects both SNPs and indels. Below are two records from one of the output VCF files: the first has no InbreedingCoeff annotation, whilst the second does.
NW_014444451.1 5724 . A C 29058.86 . AC=166;AF=0.382;AN=434;BaseQRankSum=0.322;ClippingRankSum=0.244;DP=1982;ExcessHet=0.0000;FS=0.000;MLEAC=172;MLEAF=0.396;MQ=59.39;MQRankSum=0.278;QD=31.55;ReadPosRankSum=0.00;SOR=0.711 GT:AD:DP:GQ:PGT:PID:PL 0/0:4,0:4:0:.:.:0,0,56 0/1:3,4:7:99:0|1:5714_G_C:159,0,829
NW_014444451.1 5731 . TG T 28376.05 . AC=167;AF=0.383;AN=436;BaseQRankSum=0.387;ClippingRankSum=0.00;DP=1963;ExcessHet=0.0000;FS=0.000;InbreedingCoeff=0.3346;MLEAC=174;MLEAF=0.399;MQ=59.60;MQRankSum=-5.300e-02;QD=31.49;ReadPosRankSum=0.00;SOR=0.743 GT:AD:DP:GQ:PGT:PID:PL 0/0:5,0:5:0:.:.:0,0,79 0/1:2,4:6:99:0|1:5714_G_C:200,0,694
I plan to use hard filters to generate a list of high-quality (HQ) variants that I can then feed into BQSR and VQSR in a second run-through of the GATK Best Practices, and I was going to include InbreedingCoeff among the filter criteria. Could someone please explain why not all variant sites have an InbreedingCoeff value, and whether this will affect my filtering?
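In case it helps to quantify the problem, here is a minimal sketch of how the number of records with and without the annotation could be counted before settling on filter thresholds. It is plain Python with no GATK involvement, and the helper names `parse_info` and `count_annotation` are made up for illustration. It assumes standard VCF columns with INFO as the eighth field.

```python
def parse_info(info_field):
    """Split a VCF INFO column into a dict; flag-style keys map to True."""
    info = {}
    for item in info_field.split(";"):
        if not item or item == ".":
            continue  # empty INFO column or stray separator
        key, sep, value = item.partition("=")
        info[key] = value if sep else True
    return info


def count_annotation(vcf_lines, key="InbreedingCoeff"):
    """Return (records_with_key, records_without_key) for the data lines.

    Header lines (starting with '#') and blank lines are skipped.
    Assumes whitespace-separated columns with INFO in column 8.
    """
    present = 0
    missing = 0
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 8:
            continue  # not a complete VCF record
        if key in parse_info(fields[7]):
            present += 1
        else:
            missing += 1
    return present, missing
```

Passing an open file handle for one of the per-interval VCF files to `count_annotation` would return the two counts for that file, which shows how many sites a filter on InbreedingCoeff would be unable to evaluate.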