VariantRecalibrator: calling variants per individual vs. jointly
We are working on a small targeted capture. We first called variants on a sample-by-sample basis and then ran VariantRecalibrator. This did not give us great results, but we did see decent separation in the 2D plots for some features.
We have now redone the variant calling, this time on all samples jointly. When we run VariantRecalibrator, we get very poor separation in the plots for the different features/INFO variables.
This made us wonder about the differences between single-sample and multi-sample variant calling. In single-sample calling, the values written to the INFO field are specific to that one sample, whereas in a multi-sample VCF file they are a sum or average over all samples. Wouldn't this affect VariantRecalibrator, since the sample-specific INFO values for the variant sample(s) are "lost" when averaged over all samples, most of which do not carry the variant?
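To make the concern concrete, here is a toy sketch of the dilution effect we have in mind. It assumes a site-level annotation that is simply a depth-weighted mean of a per-sample value (mean mapping quality here) across all samples. The `SampleSite` type, `pooled_mean_mapq` function, and all numbers are hypothetical; real GATK annotations such as MQ or QD are not necessarily computed this way.

```python
from dataclasses import dataclass


@dataclass
class SampleSite:
    """Hypothetical per-sample summary at a single site."""
    name: str
    depth: int        # reads covering the site in this sample
    mean_mapq: float  # mean mapping quality of those reads


def pooled_mean_mapq(samples: list[SampleSite]) -> float:
    """Depth-weighted mean over all samples.

    A simplified, hypothetical site-level annotation, not GATK's formula.
    """
    total_depth = sum(s.depth for s in samples)
    if total_depth == 0:
        raise ValueError("no reads at this site")
    return sum(s.depth * s.mean_mapq for s in samples) / total_depth


# One carrier with suspicious (low) mapping quality at the site...
carrier = SampleSite("carrier", depth=20, mean_mapq=30.0)
# ...and nine non-carriers with clean reads at the same site.
non_carriers = [SampleSite(f"ref{i}", depth=20, mean_mapq=60.0) for i in range(9)]

single = pooled_mean_mapq([carrier])                 # carrier called alone
joint = pooled_mean_mapq([carrier] + non_carriers)   # all samples called together
print(f"single-sample: {single:.1f}, joint: {joint:.1f}")
# prints "single-sample: 30.0, joint: 57.0"
```

Under this assumption, the low value that would flag the call as suspicious in single-sample mode is pulled toward the clean non-carrier value in the joint call.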
In particular, isn't there a loss of information in the INFO field values (which are critical to VariantRecalibrator) when one does multi-sample variant calling?
Could this explain the change in separation we see between single-sample and multi-sample variant calling?