Default to reference base


I'm working with some previously sequenced and analyzed data, and I am curious whether there is a bias toward the reference base. Specifically, about 3 years ago, our group sequenced ~40 genomes of C. albicans patient isolates. These isolates were sequenced at relatively low coverage (~25x), and validation of variant sites with Sequenom revealed some problems in SNP calling.

My question is this: does GATK (specifically older versions, from about 2011) default to the reference base, or prefer the reference base, in areas of low coverage? More generally, are there any circumstances under which GATK would be biased towards reporting the reference base as opposed to either (1) a variant base or (2) an indeterminate base?

Thanks in advance for your help!

Chris Ford
