
Unified Genotyper reports high QUAL at DP=1 invariant sites

I've noticed that when ploidy is set to one, the UnifiedGenotyper reports fairly high QUAL scores at sites where the depth of coverage is a single read and that read supports the reference allele. A VCF that exemplifies this is at:

/gsap/assembly_repository/interim/snpcall/B318/G32577/G32577_gatk_236x_4281_v1/filtered.vcf

which includes:

Pf3D7_01_v3 7045 . T . 32.95 PASS AC=0;AF=0.00;AN=2;DP=1;MQ=17.00;MQ0=0 GT:DP 0/0:1
Pf3D7_01_v3 7046 . G . 8.91 LowConfidence;LowQual AC=0;AF=0.00;AN=2;DP=1;MQ=17.00;MQ0=0 GT:DP 0/0:1

This behavior is somewhat surprising. I realize that the QUAL threshold can simply be raised to filter these sites, but I would expect the QUAL score to be comparable to that of variant sites with the same coverage.

Another example is /home/unix/emoss/vcf/individual/deprecated/haploid/SenT001.08.vcf.gz, viewable with
zcat /home/unix/emoss/vcf/individual/deprecated/haploid/SenT001.08.vcf.gz | grep DP=1\; | grep PASS | head

Here again, the variable sites with one read get a failing score, while the invariant sites pass.
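To quantify the pattern rather than eyeball the `grep` output, the DP=1 records can be tallied by site type and filter status. The following is only a minimal sketch: the function name `tally_dp1_sites` is hypothetical, it assumes whitespace-separated VCF records with `DP` in the INFO column, and it treats an ALT of `.` as an invariant site. The two input records are the ones quoted from the first VCF above.

```python
from collections import Counter


def tally_dp1_sites(vcf_lines):
    """Count DP=1 records by (site type, filter status).

    Assumes standard VCF column order: CHROM POS ID REF ALT QUAL FILTER INFO ...
    """
    counts = Counter()
    for line in vcf_lines:
        # Skip blank lines and header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        alt, filt, info = fields[4], fields[6], fields[7]
        # Parse key=value pairs from INFO; flag-style keys are ignored.
        info_map = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
        if info_map.get("DP") != "1":
            continue
        site = "invariant" if alt == "." else "variant"
        status = "PASS" if filt == "PASS" else "filtered"
        counts[(site, status)] += 1
    return counts


# The two records quoted earlier in this post.
records = [
    "Pf3D7_01_v3 7045 . T . 32.95 PASS AC=0;AF=0.00;AN=2;DP=1;MQ=17.00;MQ0=0 GT:DP 0/0:1",
    "Pf3D7_01_v3 7046 . G . 8.91 LowConfidence;LowQual AC=0;AF=0.00;AN=2;DP=1;MQ=17.00;MQ0=0 GT:DP 0/0:1",
]

for key, n in sorted(tally_dp1_sites(records).items()):
    print(key, n)
```

On a full VCF, a large count under ("invariant", "PASS") next to a small one under ("variant", "PASS") would reflect the asymmetry described here.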

These were generated with the latest GATK.

Thanks for reading! This isn't a pressing issue to me now that I've identified it, but it has in the past led to some erroneous metrics of genome coverage and sequencing quality.

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Hmm, that is odd. When you say latest version, do you mean the latest release (2.4-9) or the nightly build? We're about to release 2.5 and we've made quite a lot of changes, so it would be interesting to see whether these calls are affected in any way.

  • emoss (Member)

    I meant the latest release. At the moment my approach is to call diploid, which doesn't have this problem, and then remove heterozygotes.
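The second half of that workaround (removing heterozygotes after a diploid call) could be sketched as a small post-processing filter. This is an illustration under stated assumptions, not the script actually used: the function name `drop_heterozygotes` is hypothetical, the VCF is assumed to be single-sample with `GT` listed in the FORMAT column, and the second example record is invented purely to show a heterozygous call.

```python
def drop_heterozygotes(vcf_lines):
    """Return the VCF lines whose single sample is not heterozygous.

    Assumes one sample column (column 10) and a FORMAT column that includes GT.
    Header lines are passed through unchanged.
    """
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            kept.append(line)
            continue
        fields = line.split()
        fmt_keys = fields[8].split(":")
        sample_values = fields[9].split(":")
        gt = sample_values[fmt_keys.index("GT")]
        # Treat phased (|) and unphased (/) genotypes the same way.
        alleles = gt.replace("|", "/").split("/")
        # A call is heterozygous when it carries more than one distinct allele.
        if len(set(alleles)) > 1:
            continue
        kept.append(line)
    return kept


example = [
    # Record quoted in the original question (homozygous reference).
    "Pf3D7_01_v3 7045 . T . 32.95 PASS AC=0;AF=0.00;AN=2;DP=1;MQ=17.00;MQ0=0 GT:DP 0/0:1",
    # Made-up heterozygous record, for illustration only.
    "Pf3D7_01_v3 7100 . A G 50.00 PASS AC=1;AF=0.50;AN=2;DP=10 GT:DP 0/1:10",
]

for line in drop_heterozygotes(example):
    print(line)
```

Only the 0/0 record survives the filter. A no-call genotype such as `./.` is kept by this sketch, since both alleles are the same placeholder.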
