It looks like you're new here. If you want to get involved, click one of these buttons!
The title kind of explains the situation, but basically I've got a SNP that shows up in IGV that I would call homozygous that the Unified Genotyper has labeled as heterozygous. The total read depth is 35, 32 of which were called as a SNP (A-->T), 2 were called the reference base (A), and one read contained a G. I went through your article describing why a SNP visible in IGV might not get called, and none of those five questions explained this situation. I didn't alter the --hets option at all either. Any help you might be able to offer would be greatly appreciated.
Geraldine_VdAuwera
Posts: 2,238 admin
OK, that makes sense. You need to update to upgrade to a more recent version.
What's happening is that you probably have some low-grade contamination in your sequence data (those non-ref reads). The UG can't just dismiss them, so it calls the site as heterozygous, but if you look at the genotype quality (GQ) I bet you'll find that it's really low -- which means that the UG knows it's a crappy call.
In version 2.2 we added a filter that downsamples per allele, which has the net effect of filtering out low-grade contamination (see the release highlights for 2.2 for more details). This feature is turned on by default, so if you run again with the latest version, those contaminating reads should get downsampled and the UG will be able to make the hom var call that you want to see.
In future, keep in mind that you should look at the GQ before getting too focused on the actual genotype.
Good luck!
Geraldine Van der Auwera, PhD
Answers
Hi there, can you tell me what version you are using and what is your command line?
Geraldine Van der Auwera, PhD
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •I'm using v2.1-11-g13c0244, and here's my command line usage (I'm using it via Galaxy wrapper, not sure if that makes a difference):
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Thanks for the help, I've made the upgrade and am waiting to see if this resolves the issue. Regardless, it seems like I'd want to check the GQ for possible het var calls, is there a score below which you would say, "Hmm, that seems questionable"?
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •It's all relative to the quality of your dataset. One way to empirically derive a threshold is to plot the distribution of GQs for your callset and look in closer detail at a few subsets of calls along the distribution.
Geraldine Van der Auwera, PhD
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •That makes sense. Thanks for the suggestion, I'll give it a shot.
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •