GATK assigns a GT in contrast to AD

dcittaro Member Posts: 31

Dear Team,
I was looking at a VCF file produced with UnifiedGenotyper (2.4.9). It is a multisample call and, for a limited number of calls, the genotypes say the exact opposite of the AD field, as in these two cases:

GT:AD:DP:GQ:PL  1/1:10,1:11:3:24,3,0


GT:AD:DP:GQ:PL  1/1:18,1:19:3:22,3,0

In the first case I have 10 reads supporting the reference allele and 1 read supporting the alternate, yet the genotype is 1/1. This happens at ~200 sites per sample in my dataset. I also checked the other way around and found <100 sites where the genotype is called 0/0 while the AD suggests 1/1 or (more frequently) 0/1. It seems to happen at sites where the number of variant samples is low (no more than 3 samples in a set of ~50 samples), and it is puzzling me a lot.
Can you comment on why this is happening?
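
For anyone who wants to count such sites in their own data, here is a minimal sketch of the check described above. The helper names (`parse_sample`, `is_discordant`) and the `min_fraction` cutoff of 0.8 are assumptions made for illustration and are not part of GATK. It only looks at homozygous calls and flags them when most of the AD reads support a different allele.

```python
def parse_sample(format_str, sample_str):
    """Pair FORMAT keys with one sample's values, e.g. GT:AD:DP:GQ:PL."""
    return dict(zip(format_str.split(":"), sample_str.split(":")))


def is_discordant(format_str, sample_str, min_fraction=0.8):
    """Return True when a homozygous GT is contradicted by the AD counts.

    min_fraction is a hypothetical cutoff: the share of AD reads that must
    support alleles other than the called one.
    """
    fields = parse_sample(format_str, sample_str)
    gt = fields.get("GT", ".").replace("|", "/").split("/")
    if "." in gt or len(set(gt)) != 1:
        return False  # skip no-calls and heterozygous genotypes

    ad_raw = fields.get("AD", ".")
    if "." in ad_raw:
        return False  # AD missing for this sample
    ad = [int(x) for x in ad_raw.split(",")]

    called = int(gt[0])
    total = sum(ad)
    if total == 0 or called >= len(ad):
        return False
    other = total - ad[called]  # reads supporting any other allele
    return other / total >= min_fraction


if __name__ == "__main__":
    fmt = "GT:AD:DP:GQ:PL"
    for sample in ("1/1:10,1:11:3:24,3,0", "1/1:18,1:19:3:22,3,0"):
        print(sample, is_discordant(fmt, sample))
```

Both records quoted above are flagged with this cutoff, since 10 of 11 and 18 of 19 reads support the reference allele while the call is 1/1.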



  • pdexheimer Member, Dev Posts: 543

    Do the site-level annotations support these being real variants? My expectation is that most of these would be removed by VQSR…

    I suspect that most of these reads are of fairly mediocre quality. Notice the PL annotations - in both of these sites, the heterozygous case is nearly as likely as the hom alt (and the hom ref case isn't all that far behind). I think this is just a case of really ambiguous data, and the caller just picks as well as it can.

  • dcittaro Member Posts: 31

    There are "PASS" sites after VQSR that suffer from the same issue...

  • pdexheimer Member, Dev Posts: 543

    I still think it's a read/base quality issue - a lot of low-quality bases supporting one allele versus a single high-quality base supporting the other

  • dcittaro Member Posts: 31

    I guess you are right. The problem is that GATK is using all the reads supporting the site to make the call (after BQSR): the sum of AD is equal to DP. I may check each site manually to see what's happening, though
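
To make the point about the PL values concrete, here is a small sketch that turns a PL triplet back into relative genotype probabilities. It assumes the usual reading of PL as Phred-scaled, normalized genotype likelihoods (PL = -10 * log10(L), with the best genotype set to 0) and a flat prior over the three genotypes. The function name `pl_to_probabilities` is made up for this example; this is not how GATK itself computes its final call.

```python
def pl_to_probabilities(pl):
    """Convert Phred-scaled likelihoods to probabilities that sum to 1.

    Assumes a flat prior over genotypes, which is a simplification.
    """
    likelihoods = [10 ** (-p / 10) for p in pl]
    total = sum(likelihoods)
    return [lk / total for lk in likelihoods]


if __name__ == "__main__":
    # PL order for a biallelic site: hom ref (0/0), het (0/1), hom alt (1/1)
    labels = ("0/0", "0/1", "1/1")
    for pl in ([24, 3, 0], [22, 3, 0]):
        probs = pl_to_probabilities(pl)
        print(pl, {g: round(p, 3) for g, p in zip(labels, probs)})
```

For PL 24,3,0 this gives roughly 0.003, 0.333 and 0.664 for 0/0, 0/1 and 1/1. Under these assumptions the 1/1 call is only about twice as likely as 0/1, which fits the low GQ of 3 reported in both records.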
