Targeted Sequencing: appropriate to use BaseRecalibrator (BQSR) on 150M bases over small intervals?
I know the best practices document is on its way, but I recently came across this fairly new page on how to use the -L argument and was surprised to see this:
Small Targeted Experiments
The same guidelines as for whole exome analysis apply except you do not run BQSR on small datasets.
I knew that VQSR could not be used on small targeted experiments, but didn't know that BQSR should not be used. The Base Quality Score Recalibration page includes the note:
A critical determinant of the quality of the recalibration is the number of observed bases and mismatches in each bin. The system will not work well on a small number of aligned reads. We usually expect well in excess of 100M bases from a next-generation DNA sequencer per read group. 1B bases yields significantly better results.
If I have 150M bases of data over a set of small target intervals, does that count as a small dataset for which I should avoid using BQSR? What about 1B bases, again over a small set of intervals? What are the best practices in this case?
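To make the numbers concrete, here is a rough sketch of how I would compare per-read-group base counts against the 100M figure quoted above. The read counts, read length, and function names are made up for illustration and are not from GATK; the only value taken from the documentation is the 100M bases per read group.

```python
# Figure quoted on the BQSR page: "well in excess of 100M bases ... per read group".
MIN_BASES_PER_READ_GROUP = 100_000_000


def bases_per_read_group(read_counts, read_length):
    """Return total sequenced bases per read group (reads x read length)."""
    return {rg: n * read_length for rg, n in read_counts.items()}


def below_threshold(base_counts, threshold=MIN_BASES_PER_READ_GROUP):
    """Return the read groups that do not exceed the threshold, sorted by name."""
    return sorted(rg for rg, bases in base_counts.items() if bases <= threshold)


# Hypothetical read counts per read group, assuming 100 bp reads.
read_counts = {"rg1": 1_500_000, "rg2": 500_000}
totals = bases_per_read_group(read_counts, read_length=100)

print(totals)
print(below_threshold(totals))
```

Since the quoted figure is per read group, a dataset with 150M bases in total could still fall short if those bases are split across several read groups.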
Thanks for your help!