Minimising Batch Effects in VariantRecalibration
I've checked the Best Practices and documentation for the correct way to run VQSR without hitting the dreaded "No data found" error. The recurring advice is not to run single-sample VCFs, but to run all samples together as one multisample VCF.
What, then, is the correct way to account for batch effects when using this kind of input with the VariantRecalibrator tool? Is this something I should be worried about?
Say I use a multisample VCF with N=100 as input, and then run another instance where the input multisample VCF has N=80. Are there inherent dangers from batch effects between the two runs?
Would you recommend running each sample in a batch with the 1000G samples as the multisample VCF, i.e. N=1001?
Any other thoughts or ideas?