This site is now read-only. You can find our new documentation site and support forum for posting questions here.
Be sure to read our welcome blog!
Base recalibration and variant calling(with joint genotyping) for a non-model organism
In case we do not have a database of known sites for a non model organism, what is the best strategy to do the base recalibration and variant calling with joint genotyping when we have several samples ?
According to the GATK best practices, when we do not have a data base of known sites, it is recommended to run HaplotypeCaller and hard filter the SNPs then use the resultings SNPs for the base recalibration until convergence. As I have many samples (31), I have to do this on all the samples. However, as, after the base recalibration, I want to do the final snp calling with joint genotyping, I think this approach is going to be really time consuming...
So, I'm wondering whether I can execute an initial snp calling (with HaplotypeCaller) on each sample, then joint genotyping them to obtain a global snp data that I will manually filter. This joint and filtered snp database could then be used for the base recalibration and a second joint genotyping. I think that it will be faster this way. However, I'm not sure whether there are caveats in this approach and I would like to have your advices.
Thank you very much in advance.