
Why didn't the Unified Genotyper call my SNP? I can see it right there in IGV!

Geraldine_VdAuwera | Posts: 2,239 | Administrator, GSA Official Member, admin

Just because something looks like a SNP in IGV doesn't mean that it is of high quality. We are extremely confident in the Unified Genotyper's genotype likelihood calculations (especially for SNPs), so before you post this issue in our support forum, please do a little investigation on your own first.
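To build some intuition for why a visible mismatch may not produce a call, here is a minimal Python sketch of a simplified diploid genotype likelihood model. It is a textbook-style approximation, not the Unified Genotyper's actual implementation, and all function names are hypothetical. It shows how a few low-quality non-reference bases barely move the likelihoods, while the same bases at high quality favor a heterozygous genotype.

```python
import math

# Simplified diploid SNP genotype likelihood model (illustrative sketch only,
# NOT the Unified Genotyper's exact implementation). A base b with Phred
# quality Q has error probability e = 10^(-Q/10). For a single allele a:
#   P(b | a) = 1 - e   if b == a
#   P(b | a) = e / 3   otherwise (error spread evenly over the other 3 bases)
# For a diploid genotype (a1, a2): P(b | G) = 0.5 * P(b | a1) + 0.5 * P(b | a2)


def phred_to_error(q: int) -> float:
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def base_likelihood(base: str, allele: str, q: int) -> float:
    """P(observed base | true allele) under the simple error model above."""
    e = phred_to_error(q)
    return 1 - e if base == allele else e / 3


def genotype_log10_likelihood(pileup, genotype) -> float:
    """Sum of log10 P(b | G) over (base, quality) pairs in the pileup."""
    a1, a2 = genotype
    total = 0.0
    for base, q in pileup:
        p = 0.5 * base_likelihood(base, a1, q) + 0.5 * base_likelihood(base, a2, q)
        total += math.log10(p)
    return total


def best_genotype(pileup, ref: str, alt: str):
    """Return the most likely genotype and the log10 likelihood of each one."""
    genotypes = [(ref, ref), (ref, alt), (alt, alt)]
    scores = {g: genotype_log10_likelihood(pileup, g) for g in genotypes}
    return max(scores, key=scores.get), scores


# Ten Q30 reference bases plus three alternate bases at low vs. high quality.
low_q_alt = [("A", 30)] * 10 + [("G", 5)] * 3
high_q_alt = [("A", 30)] * 10 + [("G", 30)] * 3
print(best_genotype(low_q_alt, "A", "G")[0])
print(best_genotype(high_q_alt, "A", "G")[0])
```

With ten Q30 reference bases, three Q5 alternate bases leave (ref, ref) as the most likely genotype; raising those same three alternate bases to Q30 makes (ref, alt) the most likely one.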

To diagnose what is happening, take a look at the pileup of bases at the position in question. It is very important to look at the underlying data here, not just at the IGV display.
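Tools such as `samtools mpileup` will print the pileup for you; the sketch below only illustrates the idea of tallying which bases cover a position. It assumes a hypothetical, minimal read representation, `(start, sequence)` with a 0-based start and an ungapped alignment (no indels or clipping), so that the example stays short. Real BAM pileups must also account for CIGAR operations.

```python
from collections import Counter

# Hypothetical minimal read representation: (start, sequence), with a 0-based
# start and an ungapped alignment (no indels or clipping). This is an
# assumption to keep the sketch short; real pileups must handle CIGAR strings.


def pileup_counts(reads, position):
    """Count the bases that cover a 0-based reference position."""
    counts = Counter()
    for start, seq in reads:
        offset = position - start
        if 0 <= offset < len(seq):
            counts[seq[offset]] += 1
    return counts


reads = [(0, "ACGT"), (1, "CGTA"), (2, "GAAC")]
print(pileup_counts(reads, 3))
```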

Here is a checklist of questions you should ask yourself:

  • How many overlapping deletions are there at the position?

The genotyper ignores sites where there are too many overlapping deletions. This threshold can be set using the --max_deletion_fraction argument (see the UG's documentation page for its default value), but be aware that increasing it could affect the reliability of your results.
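A minimal sketch of the deletion check, assuming a hypothetical pileup where each element is either a base or `*` for a read that carries a deletion at this position (the placeholder `samtools mpileup` uses). The `max_deletion_fraction` parameter mirrors the idea behind the UG argument, but the threshold values below are illustrative rather than the UG default, and the strict `>` comparison is an assumption.

```python
# Hypothetical pileup representation: each element is a base character or "*"
# for a read with a deletion at this position. Thresholds used here are
# illustrative, not the UG default; the strict ">" comparison is an assumption.

DELETION = "*"


def deletion_fraction(pileup) -> float:
    """Fraction of pileup elements that are deletions."""
    if not pileup:
        return 0.0
    return sum(1 for element in pileup if element == DELETION) / len(pileup)


def site_skipped_for_deletions(pileup, max_deletion_fraction: float) -> bool:
    """True if the share of overlapping deletions exceeds the threshold."""
    return deletion_fraction(pileup) > max_deletion_fraction


pileup = list("AAGA*")  # 1 deletion out of 5 reads = 0.2
print(deletion_fraction(pileup), site_skipped_for_deletions(pileup, 0.05))
```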

  • What do the base qualities look like for the non-reference bases?

Remember that there is a minimum base quality threshold and that low base qualities mean that the sequencer assigned a low confidence to that base. If your would-be SNP is only supported by low-confidence bases, it is probably a false positive.

Keep in mind that the depth reported in the VCF is the unfiltered depth. You may think you have good coverage at that site, but the Unified Genotyper ignores bases that don't look good, so the actual coverage seen by the UG may be lower than you think.
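To see how far the filtered depth can drop below the unfiltered depth, here is a small sketch. The pileup is a hypothetical list of `(base, phred_quality)` tuples, and `min_base_quality` is a made-up parameter name standing in for the UG's minimum base quality threshold; check the UG documentation for the real argument and its default. Keeping bases with quality `>=` the threshold is also an assumption.

```python
# Sketch: unfiltered depth (what the VCF reports) vs. depth after a minimum
# base quality filter. min_base_quality is a hypothetical parameter name;
# the ">=" comparison is an assumption, not confirmed UG behavior.


def depth_summary(pileup, ref: str, min_base_quality: int) -> dict:
    """pileup: list of (base, phred_quality) tuples at one position."""
    kept = [(b, q) for b, q in pileup if q >= min_base_quality]
    alt_kept = sum(1 for b, _ in kept if b != ref)
    return {
        "unfiltered_depth": len(pileup),
        "filtered_depth": len(kept),
        "alt_bases_kept": alt_kept,
    }


# Six good reference bases and four low-quality non-reference bases.
pileup = [("A", 30)] * 6 + [("G", 8)] * 4
print(depth_summary(pileup, ref="A", min_base_quality=17))
```

Here the site looks like 10x coverage with four G's, but once the low-quality bases are dropped, none of the non-reference support is left.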

  • What do the mapping qualities look like for the reads with the non-reference bases?

A base's quality is capped by the mapping quality of its read, because a low mapping quality means that the aligner had little confidence that the read is mapped to the correct location in the genome. You may be seeing mismatches because the read doesn't belong there: you may be looking at the sequence of some other locus in the genome!

Keep in mind also that reads with mapping quality 255 ("unknown") are ignored.
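The two mapping-quality rules above can be sketched as follows, assuming a hypothetical pileup in which each entry carries `(base, base_quality, mapping_quality)`. Reads with MAPQ 255 are dropped, and every remaining base's quality is capped at its read's mapping quality.

```python
# Sketch of the two rules described above, assuming each pileup entry is a
# hypothetical (base, base_quality, mapping_quality) tuple:
#   - reads with MAPQ 255 ("unknown") are dropped entirely;
#   - each remaining base quality is capped at its read's mapping quality.

MAPQ_UNKNOWN = 255


def effective_qualities(pileup):
    """Return (base, effective_quality) pairs after applying both rules."""
    result = []
    for base, base_q, mapq in pileup:
        if mapq == MAPQ_UNKNOWN:
            continue  # ignored: no usable mapping quality
        result.append((base, min(base_q, mapq)))
    return result


pileup = [("A", 30, 60), ("G", 35, 3), ("T", 40, 255)]
print(effective_qualities(pileup))
```

The G looks like a confident Q35 base in isolation, but because its read has MAPQ 3, it effectively counts as a Q3 base.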

  • Are there a lot of alternate alleles?

By default the UG will only consider a limited number of alternate alleles. This value can be set using the --max_alternate_alleles argument (see the UG's documentation page for its default value). Note, however, that genotyping sites with many alternate alleles is both CPU- and memory-intensive, and the cost scales exponentially with the number of alternate alleles. Unless there is a good reason to change the default value, we strongly recommend that you leave this parameter alone.
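To get a feel for how quickly the search space grows, here is a purely combinatorial illustration. It does not reproduce how the UG performs its calculations. It counts the possible genotypes for one diploid sample and the number of ways to split the chromosomes of a hypothetical 10-sample diploid cohort among the alleles ("stars and bars").

```python
from math import comb

# Illustration of combinatorial growth only; this is NOT the UG's algorithm.


def genotypes_per_sample(n_alleles: int, ploidy: int = 2) -> int:
    """Unordered genotypes for one sample: C(n_alleles + ploidy - 1, ploidy)."""
    return comb(n_alleles + ploidy - 1, ploidy)


def allele_count_configurations(n_alleles: int, n_samples: int, ploidy: int = 2) -> int:
    """Ways to distribute all n_samples * ploidy chromosomes among n_alleles
    (stars and bars): C(chromosomes + n_alleles - 1, n_alleles - 1)."""
    chromosomes = n_samples * ploidy
    return comb(chromosomes + n_alleles - 1, n_alleles - 1)


for alts in range(1, 7):
    n = alts + 1  # the reference allele plus the alternate alleles
    print(alts, genotypes_per_sample(n), allele_count_configurations(n, n_samples=10))
```

For 10 diploid samples, the number of allele-count configurations goes from 21 with one alternate allele to 230,230 with six.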

  • Are you working with SOLiD data?

SOLiD alignments tend to have reference bias, and it can be severe in some cases. Do the SOLiD reads have a lot of mismatches (no-calls count as mismatches) around the site? If so, you are probably seeing false positives.
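The check suggested above can be sketched as a mismatch count in a window around the candidate site, with no-calls counted as mismatches. Several details are assumptions: the read is ungapped, no-calls are encoded as `N` or `.`, the window is a half-width in bases, and the candidate site itself is excluded from the count.

```python
# Sketch: count read/reference mismatches near a candidate site, counting
# no-calls as mismatches. Assumptions: ungapped alignment, no-calls encoded
# as "N" or ".", window given as a half-width, candidate site excluded.

NO_CALLS = {"N", "."}


def mismatches_near_site(read_seq, read_start, ref_seq, site, half_width):
    """Count mismatches within half_width bases of site (0-based coordinates)."""
    count = 0
    for pos in range(site - half_width, site + half_width + 1):
        if pos == site:
            continue  # skip the candidate SNP position itself
        offset = pos - read_start
        if not (0 <= offset < len(read_seq)) or not (0 <= pos < len(ref_seq)):
            continue  # position not covered by the read or outside the reference
        base = read_seq[offset]
        if base in NO_CALLS or base != ref_seq[pos]:
            count += 1
    return count


ref = "ACGTACGTAC"
read = "ACTTTCGNAC"  # mismatch at 2, candidate at 4, no-call at 7
print(mismatches_near_site(read, 0, ref, site=4, half_width=3))
```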


Geraldine Van der Auwera, PhD

Comments

  • nikmal | Posts: 8 | Member

    Thank you for this useful information!

    I have a question, however: if I view a BAM file with recalibrated base qualities in IGV, does IGV report the recalibrated score or the score before recalibration? That is, the "Base phred quality" score that appears in the info box that pops up when you hover over a locus/read.

  • Geraldine_VdAuwera | Posts: 2,239 | Administrator, GSA Official Member, admin

    IGV will report the recalibrated qualities, because they are written in place of the originals. You can still see the original qualities in the read information if you set the appropriate flag to keep the original quals when you ran the second step of recalibration; in that case they are written to the OQ tag.

    Geraldine Van der Auwera, PhD
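If the original qualities were kept in the OQ tag, you can compare them with the recalibrated ones base by base. Both the SAM QUAL field and the OQ tag store Phred scores as ASCII characters offset by 33, so you can take the two strings straight from a `samtools view` line. The helper names below are hypothetical.

```python
# Sketch: decode Phred+33 quality strings (SAM QUAL field and the OQ tag use
# the same encoding) and list the bases whose score changed in recalibration.


def decode_phred33(qual_string: str):
    """Convert an ASCII Phred+33 string to a list of integer scores."""
    return [ord(c) - 33 for c in qual_string]


def quality_changes(recalibrated_qual: str, oq_tag: str):
    """Return (position, original, recalibrated) for every changed base."""
    new = decode_phred33(recalibrated_qual)
    old = decode_phred33(oq_tag)
    if len(new) != len(old):
        raise ValueError("QUAL and OQ must have the same length")
    return [(i, o, n) for i, (o, n) in enumerate(zip(old, new)) if o != n]


print(decode_phred33("I5+"))
print(quality_changes("I5+", "II+"))
```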
