Does the number of sites used for training increase with the number of resources?
I have three resources that I could use for training VariantRecalibrator. They correspond to variants discovered by WGS in other studies, so I want to use them as Non-true sites training resources.
The True sites training resource corresponds to sites found using a genotyping array.
My intuition is that using the three (Non-true sites) training resources would provide more data points to build the recalibration model.
Also, it was previously explained to me that only a subset (2.5 million) of the variants in the training resources is used for training.
Are those 2.5 million sites sampled from each training resource, so that if I use three (Non-true sites) training resources, there will be 7.5 million sites to train the recalibration model (plus the sites in the True sites training resource)?
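
To make the two possible readings concrete, here is a small arithmetic sketch in Python. It is not GATK code and does not say which behaviour VariantRecalibrator actually implements. The function names and the resource sizes are made up for illustration, and the sketch ignores sites shared between resources.

```python
# Hypothetical illustration only: not GATK code, and the sizes are invented.
CAP = 2_500_000  # the subset size mentioned in the question


def total_if_cap_per_resource(resource_sizes, cap=CAP):
    """Reading A: each training resource contributes at most `cap` sites."""
    return sum(min(n, cap) for n in resource_sizes)


def total_if_cap_overall(resource_sizes, cap=CAP):
    """Reading B: sites from all resources are pooled, then capped at `cap`."""
    return min(sum(resource_sizes), cap)


# Three made-up Non-true sites resources, each larger than the cap.
sizes = [4_000_000, 3_000_000, 5_000_000]

print(total_if_cap_per_resource(sizes))  # 7500000
print(total_if_cap_overall(sizes))       # 2500000
```

Under reading A, adding resources increases the number of training sites; under reading B, it only changes which sites can be drawn once the cap is reached.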