Incorrect Heterozygous Calls

Hello!
I'm pretty new to this so pardon me if this is formatted incorrectly or I'm missing something obvious.
It looks like I have the same problem as: http://gatkforums.broadinstitute.org/gatk/discussion/2319/haplotypecaller-incorrectly-making-heterozygous-calls-again but their problems seemed to be fixed simply by updating, and I'm using the current version of GATK (3.5.0). Any idea what I can do to fix this?

When processing this, I mostly used the GATK Best Practices. However, I did use joint calling instead of the mixed method, since I figured with only 93 samples, for a handful of genes, it wouldn't benefit too heavily from more efficient computation, and plugging the whole BAM in at once seemed easier.

Here's a shot from IGV of a portion of one of my samples. The circled SNPs are the ones in question. For all 3, all of the reads in the BAM are for the alternate, but the VCF produced by HaplotypeCaller has them as heterozygous (and passing filters). It seems to be pretty consistent in which SNPs it messes up. On the other hand, some samples are fine, and some are not, but I don't see a pattern as to why some work and some don't. I'd be happy to provide any other information that would help resolve this.

Thank you for any help you can provide,
Jessica Patnode

image

Answers

  • Hmm. I notice that for each amplicon you're sequencing, IGV displays a dashed line on the right side, and the problematic calls are on the left side. I wonder if the reads aren't soft-clipped on the end, and HC is (erroneously, in this case) using those soft-clipped ends in the genotype call? Running with bamout over this region would, I think, pretty quickly tell you if I'm right or wrong in that assessment.

    One other comment - the Best Practices are written for whole genome or hybridization capture experiments. I don't know one way or the other what you did, but I would warn you that you probably can't use those recommendations directly for an amplicon experiment (Haloplex, TSCA, or whatever brand name this is).

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Have a look at this doc; it addresses missing variants, but most of the recommendations (and debugging techniques) are applicable to genotypes that seem wrong as well. Make sure also to turn on display of soft clips in IGV when troubleshooting this sort of problem.

  • kap269 (NAU; Member)

    Thank you both! Looking at the bamout in IGV it does look like it's something to do with the soft clipping. Some reads were realigned to have the soft clips overlapping with the sites with the alternate, leading to a heterozygous call. However, the CIGAR for those reads has them flagged as soft clipped, so shouldn't HaplotypeCaller know to ignore them? Do I need to cut them out?

    I should note we have good Sanger sequence data for this region for a few samples (not all unfortunately). In the Sanger set, these het calls are homo alt.

    image

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    HaplotypeCaller uses the soft-clipped sequence on purpose because soft clips are often signs of a large insertion event that the mapper wasn't able to deal with. You could disable this behavior by using -dontUseSoftClippedBases (but check the HC doc in case I got the spelling or capitalization wrong). Be aware though that those soft clips could be meaningful, and that you may find your power to discover indels diminished.
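
For anyone debugging a similar case, here is a rough Python sketch of how one might check whether a given site falls inside the soft-clipped ends of a read, using only the read's POS and CIGAR fields from the BAM. The function names (`soft_clip_spans`, `site_in_soft_clip`) are made up for illustration and are not part of GATK or any library. It also assumes the clipped bases are simply laid out on the reference next to the aligned part of the read, which is a simplification: HaplotypeCaller reassembles reads, so treat the result as a hint about which reads to inspect in the bamout, not as what HC actually does.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that consume reference bases.
REF_CONSUMING = set("MDN=X")


def soft_clip_spans(pos, cigar):
    """Return (left, right) reference spans covered by soft clips.

    pos is the 1-based leftmost aligned position (SAM POS field).
    Each span is a 1-based inclusive (start, end) tuple, or None if
    the read has no soft clip on that side.
    Assumption: clipped bases are projected directly onto the
    reference, ignoring any indels they might contain.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips can sit outside soft clips; they carry no bases, so drop them.
    core = [(n, op) for n, op in ops if op != "H"]
    ref_len = sum(n for n, op in core if op in REF_CONSUMING)
    aligned_end = pos + ref_len - 1

    left = right = None
    if core and core[0][1] == "S":
        left = (pos - core[0][0], pos - 1)
    if len(core) > 1 and core[-1][1] == "S":
        right = (aligned_end + 1, aligned_end + core[-1][0])
    return left, right


def site_in_soft_clip(pos, cigar, site):
    """True if the 1-based reference position `site` lies in a soft clip."""
    return any(
        span is not None and span[0] <= site <= span[1]
        for span in soft_clip_spans(pos, cigar)
    )
```

As a usage example, a read starting at position 100 with CIGAR `5S20M3S` would be reported as having clipped bases over positions 95-99 and 120-122, so a suspicious SNP at 97 would be flagged while one at 110 would not. Reads flagged this way are the ones worth looking at with soft clips displayed in IGV.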
