Running VQSR on a large dataset
We have a large cohort of more than 3,000 30x WGS samples, processed following the Best Practices with GATK 4.0.
To joint-call variants for the cohort, we combined the gVCFs and performed genotyping on each chromosome separately (with the CombineGVCFs and GenotypeGVCFs tools). After that, the variants from all chromosomes were merged into a single VCF.
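For reference, the per-chromosome workflow looks roughly like the sketch below. It only builds and prints the GATK4 command lines and runs nothing. The file names, the contig names, and the use of MergeVcfs for the final merge are illustrative assumptions, not our exact setup.

```python
from typing import List

# Assumed contig names; adjust to match the reference in use.
CHROMOSOMES = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]


def combine_cmd(ref: str, gvcfs: List[str], chrom: str) -> List[str]:
    """Build a CombineGVCFs command restricted to one chromosome."""
    cmd = ["gatk", "CombineGVCFs", "-R", ref, "-L", chrom]
    for gvcf in gvcfs:
        cmd += ["-V", gvcf]  # one -V per per-sample gVCF
    return cmd + ["-O", f"combined.{chrom}.g.vcf.gz"]


def genotype_cmd(ref: str, chrom: str) -> List[str]:
    """Build a GenotypeGVCFs command for the combined gVCF of one chromosome."""
    return [
        "gatk", "GenotypeGVCFs", "-R", ref, "-L", chrom,
        "-V", f"combined.{chrom}.g.vcf.gz",
        "-O", f"genotyped.{chrom}.vcf.gz",
    ]


def merge_cmd(chroms: List[str], out: str) -> List[str]:
    """Build a MergeVcfs command joining the per-chromosome VCFs."""
    cmd = ["gatk", "MergeVcfs"]
    for chrom in chroms:
        cmd += ["-I", f"genotyped.{chrom}.vcf.gz"]
    return cmd + ["-O", out]


if __name__ == "__main__":
    ref = "reference.fasta"                           # placeholder path
    gvcfs = ["sample1.g.vcf.gz", "sample2.g.vcf.gz"]  # placeholder inputs
    for chrom in CHROMOSOMES:
        print(" ".join(combine_cmd(ref, gvcfs, chrom)))
        print(" ".join(genotype_cmd(ref, chrom)))
    print(" ".join(merge_cmd(CHROMOSOMES, "cohort.vcf.gz")))
```

The merged `cohort.vcf.gz` is the file we are trying to run VQSR on.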
We now need to filter the variants with VQSR. However, it requires too much memory and time. I read the previous post, "Speeding up VQSR for 2000+ WGS samples", in which @Geraldine_VdAuwera mentioned that the GATK team was working on the problem. Is there a solution for GATK4 now?