Combining variants from different WES capture types
I've searched the GATK forum without success for the following topic. I have a set of WES samples (around 110 in total), all of them from a specific population. The aim of the project is to study population genetic variation. All samples have been processed with GATK 4.1.2. The issue is that I have two subsets of samples, each generated with a different capture technology.
I'm not sure how to proceed with studying variants across the whole set, since we want to reduce the batch effect as much as possible. So far I've run the following: gVCF files were generated for each sample, and then joint genotyping was applied to all gVCF files (GenomicsDBImport and GenotypeGVCFs). I'm not sure whether this approach is the best one (it is the same as assuming a single capture technology). For GenomicsDBImport, the intervals used were all chromosomes, although another option would be to build the database using only the regions given by the intersection of the two capture BEDs.
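To make the "intersection of the two capture BEDs" concrete, here is a minimal sketch in plain Python of how the shared target regions could be computed. The function names (`parse_bed`, `intersect_beds`, `to_bed_lines`) are my own and not part of GATK, and the sketch assumes standard BED coordinates (0-based, half-open) with the same contig naming in both files. A dedicated tool such as bedtools would do the same job.

```python
from collections import defaultdict


def parse_bed(lines):
    """Parse BED lines into {chrom: [(start, end), ...]}, sorted and merged."""
    by_chrom = defaultdict(list)
    for line in lines:
        line = line.strip()
        # Skip blank lines and optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        by_chrom[chrom].append((int(start), int(end)))

    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [intervals[0]]
        for start, end in intervals[1:]:
            last_start, last_end = out[-1]
            if start <= last_end:
                # Overlapping or adjacent targets: extend the previous one.
                out[-1] = (last_start, max(last_end, end))
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged


def intersect_beds(a, b):
    """Return regions covered by both interval sets (output of parse_bed)."""
    result = {}
    for chrom in sorted(a.keys() & b.keys()):
        ia, ib = a[chrom], b[chrom]
        i = j = 0
        out = []
        # Two-pointer sweep over the two sorted, non-overlapping lists.
        while i < len(ia) and j < len(ib):
            start = max(ia[i][0], ib[j][0])
            end = min(ia[i][1], ib[j][1])
            if start < end:
                out.append((start, end))
            # Advance whichever interval ends first.
            if ia[i][1] < ib[j][1]:
                i += 1
            else:
                j += 1
        if out:
            result[chrom] = out
    return result


def to_bed_lines(regions):
    """Format {chrom: [(start, end), ...]} as tab-separated BED lines."""
    return [
        f"{chrom}\t{start}\t{end}"
        for chrom in sorted(regions)
        for start, end in regions[chrom]
    ]
```

The lines returned by `to_bed_lines` could be written to a file and passed as the interval list (`-L`) to GenomicsDBImport and GenotypeGVCFs, so that only regions targeted by both kits are genotyped. Whether to pad the intervals first is a separate decision that this sketch does not cover.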
Another approach would be to perform joint variant calling separately for each subset and then combine the results somehow (I'm not sure how), again using the intersection of the capture BEDs, but maybe this would introduce a worse batch effect.