Force GATK HaplotypeCaller/GenotypeGVCFs to report genotypes at a whitelist of sites, even if WT?
We have a project generating WGS and WES data. We have nearly 1000 samples and currently perform one round of HaplotypeCaller/GenotypeGVCFs on the WGS data and one on the WES data, producing two VCFs. We then run CombineVariants on these to make our final VCF.
This merge creates an inconvenient problem. If a site has coverage in one VCF but all subjects are WT (homozygous reference), the site is omitted from that VCF. After CombineVariants, there is therefore no way to distinguish actual 'No Data' from 'all wild-type'.
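To make the ambiguity concrete, here is a minimal Python sketch of the situation. It is a toy model, not GATK code: the helper names `variant_only` and `merge`, the sample names and the positions are all made up for illustration, and the merge simply assumes that samples absent at a site are filled in as `./.`.

```python
# Toy model (hypothetical, not GATK): each call set maps
# (chrom, pos) -> {sample: genotype} for sites that had coverage.

def variant_only(calls):
    """Drop sites where every sample is homozygous reference (0/0),
    mimicking a VCF that only lists variant sites."""
    return {site: gts for site, gts in calls.items()
            if any(gt != "0/0" for gt in gts.values())}

def merge(a, a_samples, b, b_samples):
    """Union of sites; samples with no record at a site get './.'
    (an assumption about how the merge fills missing genotypes)."""
    merged = {}
    for site in sorted(set(a) | set(b)):
        row = {s: "./." for s in a_samples + b_samples}
        row.update(a.get(site, {}))
        row.update(b.get(site, {}))
        merged[site] = row
    return merged

# WGS covers all three sites; WES covers chr1:100 and chr1:200 only.
wgs_raw = {("chr1", 100): {"S1": "0/1"},
           ("chr1", 200): {"S1": "0/0"},
           ("chr1", 300): {"S1": "0/1"}}
wes_raw = {("chr1", 100): {"S2": "0/0"},   # covered, all WT
           ("chr1", 200): {"S2": "0/1"}}   # chr1:300 has no WES coverage

merged = merge(variant_only(wgs_raw), ["S1"],
               variant_only(wes_raw), ["S2"])

for site, row in merged.items():
    print(site, row)
```

In the merged result, S2 is `./.` at chr1:100 (covered, wild-type) and also at chr1:300 (no coverage), so the two cases can no longer be told apart.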
For our purposes, we have a whitelist of sites where we would like to force genotypes to be reported, even when every sample is WT. Is there a mechanism in the GATK tools to do this? I assume it would need to happen in HaplotypeCaller/GenotypeGVCFs, since that information is lost after this point.