Default to reference base
I'm working with some previously sequenced and analyzed data, and I am curious whether there is a bias toward the reference base. Specifically, about 3 years ago, our group sequenced ~40 genomes of C. albicans patient isolates. These isolates were sequenced at relatively low coverage (~25x), and validation of variant sites with Sequenom revealed some problems in SNP calling.
My question is this: does GATK (specifically older versions of GATK, from about 2011) default to, or prefer, the reference base in regions of low coverage? More generally, are there any circumstances under which GATK would be biased toward reporting the reference base rather than (1) a variant base or (2) an indeterminate base?
Thanks in advance for your help!