Targeted Sequencing: appropriate to use BaseRecalibrator (BQSR) on 150M bases over small intervals?
I know the best practices document is on its way, but I recently came across this fairly new page on how to use the -L argument and was surprised to see this:
Small Targeted Experiments
The same guidelines as for whole exome analysis apply except you do not run BQSR on small datasets.
I knew that VQSR could not be used on small targeted experiments, but didn't know that BQSR should not be used. The Base Quality Score Recalibration page includes the note:
A critical determinant of the quality of the recalibration is the number of observed bases and mismatches in each bin. The system will not work well on a small number of aligned reads. We usually expect well in excess of 100M bases from a next-generation DNA sequencer per read group. 1B bases yields significantly better results.
If I have 150M bases of data over a set of small target intervals, does that count as a small dataset for which I should avoid using BQSR? What about 1B bases, again over a small set of intervals? What are the best practices in this case?
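To make the question concrete, here is a rough sketch of how the two figures from the quoted note (100M and 1B bases per read group) could be turned into a simple check. This is not a GATK rule or API: the function name and the three labels are hypothetical, and treating anything between 100M and 1B as "borderline" is only my reading of "well in excess of 100M".

```python
# Thresholds taken from the BQSR documentation note quoted above.
MIN_BASES = 100_000_000       # "well in excess of 100M bases ... per read group"
BETTER_BASES = 1_000_000_000  # "1B bases yields significantly better results"


def classify_read_group(total_bases: int) -> str:
    """Label a read group by its aligned base count.

    The labels are illustrative assumptions, not GATK terminology.
    """
    if total_bases >= BETTER_BASES:
        return "comfortable"
    if total_bases > MIN_BASES:
        return "borderline"
    return "too_small"


if __name__ == "__main__":
    # The two cases asked about in this post.
    for bases in (150_000_000, 1_000_000_000):
        print(f"{bases:,} bases -> {classify_read_group(bases)}")
```

The check only counts bases. It says nothing about whether concentrating those bases on a small set of intervals changes the answer, which is the part I am unsure about.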
Thanks for your help!