Default to reference base


I'm working with some previously sequenced and analyzed data, and I am curious whether there is a bias toward the reference base. Specifically, about 3 years ago, our group sequenced ~40 genomes of C. albicans patient isolates. These isolates were sequenced at relatively low coverage (~25x), and validation of variant sites with Sequenom revealed some problems in SNP calling.

My question is this: does GATK (specifically older versions of GATK, from about 2011) default to the reference base, or prefer the reference base, in areas of low coverage? More generally, are there any circumstances under which GATK would be biased towards reporting the reference base as opposed to either (1) a variant base or (2) an indeterminate base?

Thanks in advance for your help!

Chris Ford
