Using HaplotypeCaller to detect low-level mosaics

We are interested in calling variants on a sample that may contain a small amount of mosaicism. For instance, if there are 300 reads over a particular base, and six of those reads contain an SNV or InDel, then we would like a call to be made. Currently, the six reads are filtered out by HaplotypeCaller, and do not even appear in the GVCF output.

The decision by HaplotypeCaller appears to be based on the reasonable assumption that a variant in a diploid organism must be homozygous reference, heterozygous with a 50:50 ratio, or homozygous alternate. The probability of a site being in each of these three states can therefore be calculated from the sequencing error rate, as estimated by the base quality score (for the homozygous options), and from the binomial distribution (for the heterozygous option). Under these assumptions, an allele depth of 294,6 has a very low probability for all three options: six errors at the same position are unlikely, and so is an imbalance of 294,6 when each read is drawn with probability 0.5.
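To make the reasoning above concrete, here is a minimal sketch of the simplified three-genotype model as I understand it. It is not HaplotypeCaller's actual implementation (which works from per-read likelihoods); the function names and the flat `error_rate` of 0.001 are assumptions chosen purely for illustration.

```python
import math

def log10_binom_pmf(k, n, p):
    """log10 probability of k successes in n trials, for 0 < p < 1."""
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return (log_coeff + k * math.log(p) + (n - k) * math.log(1 - p)) / math.log(10)

def diploid_log10_likelihoods(ref_reads, alt_reads, error_rate):
    """Likelihood of the observed allele depth under each diploid genotype.

    Assumption: every read has the same error rate, and the expected
    fraction of alt-supporting reads is fixed by the genotype.
    """
    n = ref_reads + alt_reads
    expected_alt_fraction = {
        "0/0": error_rate,      # alt reads can only be errors
        "0/1": 0.5,             # strict 50:50 allele balance
        "1/1": 1 - error_rate,  # ref reads can only be errors
    }
    return {
        genotype: log10_binom_pmf(alt_reads, n, fraction)
        for genotype, fraction in expected_alt_fraction.items()
    }

# Hypothetical example from the post: 294 ref reads, 6 alt reads.
likelihoods = diploid_log10_likelihoods(294, 6, error_rate=0.001)
for genotype, value in likelihoods.items():
    print(genotype, round(value, 2))
```

With these assumed numbers, the homozygous reference hypothesis comes out at roughly 10^-6 and the heterozygous hypothesis at roughly 10^-78, so none of the three fits well and homozygous reference is simply the least bad.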

We would very much like an option to relax the probability calculation for heterozygous variants, so that the allele balance is no longer assumed to be 50:50. This would increase the chance that a variant is called when the imbalance is very strong, as a 294,6 allele depth would no longer be a low-probability event.
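As a rough illustration of what I mean by relaxing the 50:50 assumption, the sketch below scans a grid of candidate allele fractions and keeps the one that best explains the reads. This is a hypothetical model, not an existing HaplotypeCaller option; `best_alt_fraction` and the grid resolution `steps` are names I have made up for the example.

```python
import math

def log10_binom_pmf(k, n, p):
    """log10 probability of k successes in n trials, for 0 < p < 1."""
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return (log_coeff + k * math.log(p) + (n - k) * math.log(1 - p)) / math.log(10)

def best_alt_fraction(ref_reads, alt_reads, steps=1000):
    """Return the alt allele fraction on a grid that maximises the likelihood."""
    n = ref_reads + alt_reads
    # Candidate fractions strictly between 0 and 1, in increments of 1/steps.
    candidates = [i / steps for i in range(1, steps)]
    return max(candidates, key=lambda f: log10_binom_pmf(alt_reads, n, f))

# Hypothetical example from the post: 294 ref reads, 6 alt reads.
fraction = best_alt_fraction(294, 6)
relaxed = log10_binom_pmf(6, 300, fraction)   # free allele fraction
balanced = log10_binom_pmf(6, 300, 0.5)       # strict 50:50 heterozygote
print(fraction, round(relaxed, 2), round(balanced, 2))
```

A free allele fraction will always fit at least as well as a fixed one, so a real caller would presumably need some prior or penalty on it to avoid calling every cluster of sequencing errors as a mosaic variant.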

For extra bonus points, HaplotypeCaller could mark in the VCF file when a variant is more likely to be due to mosaicism than to 50:50 heterozygosity.


    Can you confirm the 6 reads with the SNV or Indel have good mapping quality scores and good base quality scores at the site?

    I think this thread will answer your question:


    Another option may be to use a somatic caller like MuTect.

    The six reads were hypothetical, so no, I can't confirm that. However, I can confirm that the kind of (human) variants we will be looking for will have these characteristics, and will be supported by real reads with good mapping quality and base quality.

    I understand the idea of raising the ploidy as described in the other article. While this may partly solve the problem, it is not an ideal solution: I'm looking for mosaics with a content of 2% and upwards, which would correspond to a ploidy of 50, but I cannot predict the actual mosaic content in advance. I would be looking for an option to alter the way that HaplotypeCaller determines the probabilities that it uses to make the call.

    Geraldine, thanks for suggesting MuTect - I shall have a look.

