Why is the Inbreeding Coefficient not always displayed in a VCF file?
Using GATK v3.5-0, I've generated VCF files for a group of 223 whole genomes, one file per interval of the reference genome. I created GVCF files with HaplotypeCaller in GVCF mode, used CombineGVCFs to merge them into cohorts, and then ran GenotypeGVCFs on those cohorts. Curiously, not every variant called in the output VCF has an Inbreeding Coefficient (InbreedingCoeff) value attributed to it, and this happens with both SNPs and indels. Below are two lines from one of the output VCF files (truncated after the first two sample columns): the top entry contains no InbreedingCoeff value, whilst the lower entry does.
NW_014444451.1 5724 . A C 29058.86 . AC=166;AF=0.382;AN=434;BaseQRankSum=0.322;ClippingRankSum=0.244;DP=1982;ExcessHet=0.0000;FS=0.000;MLEAC=172;MLEAF=0.396;MQ=59.39;MQRankSum=0.278;QD=31.55;ReadPosRankSum=0.00;SOR=0.711 GT:AD:DP:GQ:PGT:PID:PL 0/0:4,0:4:0:.:.:0,0,56 0/1:3,4:7:99:0|1:5714_G_C:159,0,829
NW_014444451.1 5731 . TG T 28376.05 . AC=167;AF=0.383;AN=436;BaseQRankSum=0.387;ClippingRankSum=0.00;DP=1963;ExcessHet=0.0000;FS=0.000;InbreedingCoeff=0.3346;MLEAC=174;MLEAF=0.399;MQ=59.60;MQRankSum=-5.300e-02;QD=31.49;ReadPosRankSum=0.00;SOR=0.743 GT:AD:DP:GQ:PGT:PID:PL 0/0:5,0:5:0:.:.:0,0,79 0/1:2,4:6:99:0|1:5714_G_C:200,0,694
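To see how often the annotation is absent across a whole output file, one option is to parse the INFO column (the eighth column) directly. The following is a minimal Python sketch, assuming an uncompressed VCF whose lines are available as strings; `parse_info`, `sites_missing` and `RECORDS` are hypothetical names for illustration, not part of GATK. It reuses the two records quoted above.

```python
from typing import Dict, Iterable, List, Tuple

def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO column into key -> value (flag keys map to "")."""
    out: Dict[str, str] = {}
    if info == ".":  # "." means the INFO column is empty
        return out
    for item in info.split(";"):
        key, _, value = item.partition("=")
        out[key] = value
    return out

def sites_missing(lines: Iterable[str], key: str) -> List[Tuple[str, int]]:
    """Return (CHROM, POS) for every data record whose INFO lacks `key`."""
    missing: List[Tuple[str, int]] = []
    for line in lines:
        if not line.strip() or line.startswith("#"):  # skip blanks and headers
            continue
        cols = line.split()  # real VCFs are tab-delimited; split() also handles spaces
        chrom, pos, info = cols[0], int(cols[1]), cols[7]
        if key not in parse_info(info):
            missing.append((chrom, pos))
    return missing

# The two records from the post (sample columns truncated, as quoted).
RECORDS = [
    "NW_014444451.1 5724 . A C 29058.86 . AC=166;AF=0.382;AN=434;BaseQRankSum=0.322;ClippingRankSum=0.244;DP=1982;ExcessHet=0.0000;FS=0.000;MLEAC=172;MLEAF=0.396;MQ=59.39;MQRankSum=0.278;QD=31.55;ReadPosRankSum=0.00;SOR=0.711 GT:AD:DP:GQ:PGT:PID:PL 0/0:4,0:4:0:.:.:0,0,56 0/1:3,4:7:99:0|1:5714_G_C:159,0,829",
    "NW_014444451.1 5731 . TG T 28376.05 . AC=167;AF=0.383;AN=436;BaseQRankSum=0.387;ClippingRankSum=0.00;DP=1963;ExcessHet=0.0000;FS=0.000;InbreedingCoeff=0.3346;MLEAC=174;MLEAF=0.399;MQ=59.60;MQRankSum=-5.300e-02;QD=31.49;ReadPosRankSum=0.00;SOR=0.743 GT:AD:DP:GQ:PGT:PID:PL 0/0:5,0:5:0:.:.:0,0,79 0/1:2,4:6:99:0|1:5714_G_C:200,0,694",
]

print(sites_missing(RECORDS, "InbreedingCoeff"))  # [('NW_014444451.1', 5724)]
```

For a real file, the same function can be given an open file handle, e.g. `sites_missing(open("cohort.vcf"), "InbreedingCoeff")`, where the path is a placeholder; comparing the length of the result with the total number of records gives the fraction of sites without the annotation.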
I plan to use hard filters to generate a list of high-quality (HQ) variants that I can then feed into BQSR and VQSR in a second pass through the GATK Best Practices, and I was going to include InbreedingCoeff among those filters. Could someone explain why not all variant sites have an InbreedingCoeff value, and whether this will affect my filtering, please?
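Whatever the reason for the missing values, any hard filter that uses InbreedingCoeff has to decide what happens at sites that lack it. A minimal Python sketch of making that decision explicit, assuming INFO strings like the ones quoted above; the threshold, the filter names and the `missing_fails` switch are hypothetical choices for illustration only. GATK's VariantFiltration has its own documented rule for annotations missing from a record, so check that rule rather than relying on this sketch.

```python
from typing import Optional

# Hypothetical cut-off for illustration; take real thresholds from the
# GATK hard-filtering documentation for your own data.
INBREEDING_MIN = -0.8

def inbreeding_filter(info: str, missing_fails: bool = False) -> Optional[str]:
    """Return a (hypothetical) filter name if the site fails, or None if it passes.

    `missing_fails` states the policy for sites with no InbreedingCoeff:
    False lets them pass this test, True filters them out.
    """
    # partition("=") gives (key, "=", value); [::2] keeps (key, value)
    fields = dict(item.partition("=")[::2] for item in info.split(";"))
    raw = fields.get("InbreedingCoeff")
    if raw is None:
        return "InbreedingCoeffMissing" if missing_fails else None
    return "LowInbreedingCoeff" if float(raw) < INBREEDING_MIN else None

print(inbreeding_filter("AC=167;InbreedingCoeff=0.3346;QD=31.49"))  # None
print(inbreeding_filter("AC=166;ExcessHet=0.0000;QD=31.55", missing_fails=True))  # InbreedingCoeffMissing
```

With `missing_fails=False`, the first record quoted above would pass this test only because it has no value to test; with `True` it would be dropped from the HQ list. Either way, the choice changes which sites reach BQSR and VQSR.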