VQSR background GVCFs for new exome library

Hello GATK team,

Hope you are all doing well.

We have been running VQSR on our exome samples without problems so far, but we are now in a bit of a bind. All of our exome samples to date were captured with Agilent's Medical Exome kit v1, so the GVCF files we created for VQSR only have data for the genomic regions covered by that kit. We are now looking into upgrading from v1 to v2, and the v2 kit adds 1.6 Mb of new, non-contiguous target regions that are not covered in v1. From past experience, I know that the VQSR step fails if the GVCFs used for VQSR do not have enough data at a variant site. So I cannot simply use the v1 GVCF files as background when running VQSR on data generated with the v2 kit, because of the extra 1.6 Mb of new regions in v2.
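To make the gap concrete: the v2-only regions are whatever remains after subtracting the v1 targets from the v2 targets. Below is a minimal sketch of that subtraction in plain Python, assuming both kits' targets are available as standard BED lines (0-based, half-open). The function names and the toy coordinates are made up for illustration and are not taken from either kit; in practice a tool such as `bedtools subtract` does the same job.

```python
from collections import defaultdict


def parse_bed(lines):
    """Group BED intervals by chromosome, then sort and merge overlaps."""
    by_chrom = defaultdict(list)
    for line in lines:
        # Skip blank lines and common BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        by_chrom[chrom].append((int(start), int(end)))
    merged = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        out = []
        for s, e in ivs:
            if out and s <= out[-1][1]:
                # Overlapping or adjacent: extend the previous interval.
                out[-1] = (out[-1][0], max(out[-1][1], e))
            else:
                out.append((s, e))
        merged[chrom] = out
    return merged


def subtract(a, b):
    """Return the parts of `a` not covered by `b` (both from parse_bed)."""
    result = {}
    for chrom, a_ivs in a.items():
        b_ivs = b.get(chrom, [])
        out = []
        j = 0
        for s, e in a_ivs:
            cur = s
            # Drop b intervals that end before this a interval starts.
            while j < len(b_ivs) and b_ivs[j][1] <= s:
                j += 1
            k = j
            while k < len(b_ivs) and b_ivs[k][0] < e:
                bs, be = b_ivs[k]
                if bs > cur:
                    out.append((cur, bs))  # uncovered gap before this b interval
                cur = max(cur, be)
                k += 1
            if cur < e:
                out.append((cur, e))  # uncovered tail of the a interval
        if out:
            result[chrom] = out
    return result


def total_bases(ivs_by_chrom):
    """Sum the lengths of all intervals."""
    return sum(e - s for ivs in ivs_by_chrom.values() for s, e in ivs)


# Toy coordinates for illustration only (not real kit targets).
v1_lines = ["chr1\t100\t200", "chr1\t300\t400"]
v2_lines = ["chr1\t50\t250", "chr1\t300\t400", "chr2\t10\t20"]

v2_only = subtract(parse_bed(v2_lines), parse_bed(v1_lines))
print(v2_only)
print(total_bases(v2_only))
```

With the real v1 and v2 BED files read in place of the toy lists, `total_bases(v2_only)` should come out near the 1.6 Mb mentioned above, and `v2_only` lists exactly the sites where the v1 GVCFs would have no data.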

So, what are my options for generating or gathering GVCFs to run VQSR on samples sequenced with the v2 kit? Do we have to sequence at least 30 samples on the v2 kit before performing VQSR, or can I just re-create the GVCFs for the v1 samples using the v2 BED file (which would leave that 1.6 Mb region without data)?


Best Answer


