
Using a SNP database of known variants from RAD-seq for BQSR on whole-genome data

Is it a bad idea to use a set of known variants called from RAD-seq data as the known-variants input to BQSR for whole-genome resequencing data? From the tool description, my understanding is that any novel variation in the whole-genome data, i.e. anything not in the known-variant set, is treated as a sequencing error when BQSR models the association between errors and covariates such as genomic context and machine cycle, and that BQSR then adjusts quality scores based on these models. So if these variants are not in fact sequencing errors, and are also not associated with any of the putative error covariates, will their quality scores remain unaffected? Or will these novel variants have their quality scores downgraded simply because they are assumed to be errors, even if they show no association with the covariates?
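To make the question concrete, here is a rough sketch of how I picture the recalibration for a single covariate bin. It is only my simplified mental model, not GATK's actual implementation: the function name, the +1/+2 smoothing, and all the counts are assumptions chosen for illustration.

```python
import math


def empirical_quality(mismatches, observations):
    """Phred-scaled empirical quality for one covariate bin.

    Simplified model: smoothed mismatch rate converted to a Phred score.
    The +1/+2 smoothing is an assumption, not taken from the tool docs.
    """
    error_rate = (mismatches + 1) / (observations + 2)
    return -10 * math.log10(error_rate)


# Hypothetical bin (e.g. one reported quality, one cycle, one context).
observations = 1_000_000          # bases observed in this bin
true_errors = 1_000               # genuine sequencing errors in this bin
unmasked_variant_bases = 1_000    # real variants missing from the known set

# All real variants masked: only genuine errors count as mismatches.
q_masked = empirical_quality(true_errors, observations)

# Incomplete known set: real variants are also counted as mismatches.
q_unmasked = empirical_quality(true_errors + unmasked_variant_bases, observations)

print(f"empirical Q, variants masked:   {q_masked:.2f}")
print(f"empirical Q, variants unmasked: {q_unmasked:.2f}")
```

If this picture is roughly right, an unmasked variant is not singled out for a lower score. Instead it adds to the mismatch count of whatever bin it falls in, which lowers the recalibrated quality of every base in that bin. The effect would then depend on how many unmasked variant bases there are relative to genuine errors.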

Thanks in advance for your advice.

