GATK assigns a GT in contrast to AD

dcittaro Posts: 31 Member

Dear Team,
I was looking at a VCF file produced with UnifiedGenotyper (2.4.9). It is a multisample call and, for a limited number of calls, I have genotypes that say the exact opposite of the AD field, as in these two cases:

GT:AD:DP:GQ:PL  1/1:10,1:11:3:24,3,0


GT:AD:DP:GQ:PL  1/1:18,1:19:3:22,3,0

In the first case I have ten reads supporting the reference allele, one read supporting the alternate, and the genotype is 1/1. This happens at ~200 sites per sample in my dataset. I also checked the reverse and found <100 sites where the genotype is called 0/0 and the AD suggests 1/1 or (more frequently) 0/1. This seems to happen at sites where the number of variant samples is low (no more than 3 samples in a set of ~50 samples), and it puzzles me a lot.
Can you comment on why this is happening?
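
For readers who want to reproduce the count of such sites, here is a minimal sketch in plain Python of one way to flag a sample column whose homozygous GT disagrees with its AD. The function names and the `min_fraction` cutoff are hypothetical and chosen for illustration; they are not part of GATK or of the original poster's workflow.

```python
def parse_sample(format_str, sample_str):
    """Pair FORMAT keys with the per-sample values, e.g. GT -> '1/1'."""
    return dict(zip(format_str.split(":"), sample_str.split(":")))


def is_discordant(format_str, sample_str, min_fraction=0.8):
    """Return True when a homozygous call disagrees with most of the AD reads.

    min_fraction is a hypothetical cutoff: the share of reads that must
    support alleles other than the called one before the call is flagged.
    """
    fields = parse_sample(format_str, sample_str)
    gt = fields.get("GT", "./.")
    ad = fields.get("AD", ".")
    if "." in gt or "." in ad:
        return False  # missing genotype or depths: nothing to compare
    alleles = gt.replace("|", "/").split("/")
    depths = [int(x) for x in ad.split(",")]
    total = sum(depths)
    if total == 0 or len(set(alleles)) != 1:
        return False  # only homozygous calls with coverage are checked
    called = int(alleles[0])
    if called >= len(depths):
        return False  # AD does not list the called allele
    other_fraction = 1 - depths[called] / total
    return other_fraction >= min_fraction


# The two records quoted in the post are both flagged.
print(is_discordant("GT:AD:DP:GQ:PL", "1/1:10,1:11:3:24,3,0"))
print(is_discordant("GT:AD:DP:GQ:PL", "1/1:18,1:19:3:22,3,0"))
```

Heterozygous calls are deliberately skipped here, because deciding when a 0/1 call contradicts AD needs a different rule.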



  • pdexheimer Posts: 501 Member, GATK Dev, DSDE Dev mod

    Do the site-level annotations support these being real variants? My expectation is that most of these would be removed by VQSR…

    I suspect that most of these reads are of fairly mediocre quality. Notice the PL annotations: in both of these sites, the heterozygous case is nearly as likely as the hom alt (and the hom ref case isn't all that far behind). I think this is just a case of really ambiguous data, and the caller just picks as well as it can.
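
To put numbers on how close those genotypes are, the sketch below converts the PL values from the two records into normalized probabilities. It assumes a flat prior over the three genotypes, which is a simplification for illustration and not a description of what UnifiedGenotyper does internally. The function names are hypothetical.

```python
def pl_to_probabilities(pl):
    """Turn Phred-scaled likelihoods into probabilities that sum to 1.

    Assumes a flat prior over genotypes (an illustrative simplification).
    """
    relative = [10 ** (-p / 10) for p in pl]
    total = sum(relative)
    return [r / total for r in relative]


def genotype_quality(pl, cap=99):
    """GQ as the gap between the two smallest PL values, capped at 99."""
    ordered = sorted(pl)
    return min(ordered[1] - ordered[0], cap)


# PL values are ordered hom ref, het, hom alt for a biallelic site.
for pl in ([24, 3, 0], [22, 3, 0]):
    hom_ref, het, hom_alt = pl_to_probabilities(pl)
    print(pl, round(hom_ref, 3), round(het, 3), round(hom_alt, 3),
          "GQ:", genotype_quality(pl))
```

For PL 24,3,0 this gives roughly 0.66 for hom alt against 0.33 for het, and the computed GQ of 3 matches the GQ field in both records: the 1/1 call is only about twice as likely as 0/1.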

  • dcittaro Posts: 31 Member

    There are "PASS" sites after VQSR that suffer from the same issue...

  • pdexheimer Posts: 501 Member, GATK Dev, DSDE Dev mod

    I still think it's a read/base quality issue: a lot of low-quality reads supporting one allele versus a single high-quality read supporting the other
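
The effect of read quality on the call can be illustrated with a toy diploid likelihood model. This is a textbook-style simplification, not GATK's implementation: each read is treated as independent, a base with error probability e matches its true allele with probability 1 - e, and a miscall lands on a given other base with probability e / 3. The function names are hypothetical, and whether the reference reads at these sites were actually left out of the likelihood is an assumption the thread does not confirm.

```python
import math


def genotype_log10_likelihoods(reads):
    """Return log10 likelihoods for [hom ref, het, hom alt].

    reads is a list of (is_alt, base_quality) pairs; base_quality must be > 0.
    """
    result = []
    for alt_copies in (0, 1, 2):
        alt_weight = alt_copies / 2  # chance a read was drawn from the alt allele
        log_likelihood = 0.0
        for is_alt, quality in reads:
            error = 10 ** (-quality / 10)
            p_if_alt = (1 - error) if is_alt else error / 3
            p_if_ref = error / 3 if is_alt else (1 - error)
            log_likelihood += math.log10(
                alt_weight * p_if_alt + (1 - alt_weight) * p_if_ref)
        result.append(log_likelihood)
    return result


def phred_likelihoods(reads):
    """Scale the likelihoods to PL: 0 for the best genotype, larger is worse."""
    logs = genotype_log10_likelihoods(reads)
    best = max(logs)
    return [round(-10 * (value - best)) for value in logs]


# Only the single alternate read informs the likelihood.
print(phred_likelihoods([(True, 19)]))

# All eleven reads counted at equal quality: ten reference, one alternate.
print(phred_likelihoods([(False, 20)] * 10 + [(True, 20)]))
```

Under this toy model a lone alternate read of base quality 19 gives PL 24,3,0, the same values as the first record, while counting all eleven reads at equal quality makes hom ref the best genotype. That is at least consistent with the reference reads contributing little or nothing to the likelihood at these sites, even though they are counted in AD.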

  • dcittaro Posts: 31 Member

    I guess you are right. The problem is that GATK is using all the reads supporting the site to make the call (after BQSR): the sum of AD is equal to DP. I may check each site manually to see what's happening, though.
