GATK assigns a GT in contrast to AD

dcittaro Member Posts: 31

Dear Team,
I was looking at a VCF file produced with UnifiedGenotyper (2.4.9). It is a multisample call and, for a limited number of calls, the genotype says the exact opposite of the AD field, as in this case:

GT:AD:DP:GQ:PL  1/1:10,1:11:3:24,3,0

or

GT:AD:DP:GQ:PL  1/1:18,1:19:3:22,3,0

In the first case I have ten reads supporting the reference allele and one read supporting the alternate allele, yet the genotype is 1/1. This happens at ~200 sites per sample in my dataset. Checking the other way around, I found <100 sites where the genotype is called 0/0 but the AD suggests 1/1 or (more frequently) 0/1. It seems to happen at sites where few samples carry the variant (no more than 3 samples in a set of ~50 samples), and it puzzles me a lot.
Can you give me a comment on why this is happening?
Thanks

d
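For anyone who wants to count such sites themselves, here is a minimal sketch in Python. It only looks at the FORMAT and sample columns of a record; the function names and the 0.2 cutoff are assumptions made for illustration, not anything GATK provides.

```python
import re

def parse_sample(format_str, sample_str):
    """Pair FORMAT keys with one sample's values, e.g. GT -> '1/1'."""
    return dict(zip(format_str.split(":"), sample_str.split(":")))

def hom_call_contradicts_ad(fields, max_called_fraction=0.2):
    """Return True when a homozygous GT is backed by only a small share of AD.

    max_called_fraction is an arbitrary illustrative cutoff (hypothetical).
    Heterozygous and missing genotypes are never flagged by this sketch.
    """
    alleles = re.split(r"[/|]", fields.get("GT", "."))
    if "." in alleles or len(set(alleles)) != 1:
        return False
    ad = fields.get("AD", ".")
    if "." in ad:
        return False
    depths = [int(x) for x in ad.split(",")]
    total = sum(depths)
    if total == 0:
        return False
    called = int(alleles[0])  # index of the called allele (0 = ref, 1 = first alt)
    return depths[called] / total < max_called_fraction

fields = parse_sample("GT:AD:DP:GQ:PL", "1/1:10,1:11:3:24,3,0")
print(hom_call_contradicts_ad(fields))
```

The same check flags the reverse situation (0/0 called with most of the AD on the alternate allele), since it compares the called allele's index against its own depth.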

Answers

  • pdexheimer Member, Dev Posts: 544 ✭✭✭✭

    Do the site-level annotations support these being real variants? My expectation is that most of these would be removed by VQSR…

    I suspect that most of these reads are of fairly mediocre quality. Notice the PL annotations: at both of these sites, the heterozygous genotype is nearly as likely as the hom alt (and the hom ref genotype isn't all that far behind). I think this is just a case of really ambiguous data, and the caller picks as well as it can.
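To put numbers on how ambiguous these PLs are, the sketch below converts them back to relative genotype likelihoods. It assumes the usual VCF convention that PL is a phred-scaled likelihood normalized so the best genotype has PL 0, and it treats the three genotypes as equally likely a priori, which is a simplification. The function name is hypothetical.

```python
def pl_to_relative_likelihoods(pls):
    """Convert phred-scaled PLs to likelihoods normalized to sum to 1.

    Assumes PL = -10 * log10(L / L_best) and a flat prior over genotypes.
    """
    raw = [10 ** (-pl / 10) for pl in pls]
    total = sum(raw)
    return [x / total for x in raw]

# PLs from the first record, in the order 0/0, 0/1, 1/1
for genotype, p in zip(("0/0", "0/1", "1/1"), pl_to_relative_likelihoods([24, 3, 0])):
    print(genotype, round(p, 3))
```

Under these assumptions the hom alt genotype gets roughly two thirds of the weight and the heterozygous genotype roughly one third, which fits the reported GQ of 3 (the gap between the best and second-best PL).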

  • dcittaro Member Posts: 31

    There are "PASS" sites after VQSR that suffer from the same issue...

  • pdexheimer Member, Dev Posts: 544 ✭✭✭✭

    I still think it's a read/base quality issue: a lot of low-quality reads supporting one allele versus a single high-quality read supporting the other
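A toy calculation can make the base-quality argument concrete. The sketch below is not GATK's implementation. It assumes independent bases, an error probability of 10^(-Q/10) split evenly over the three other bases, and that bases under a minimum quality are ignored when computing likelihoods. The quality values used are invented for illustration; the thread does not report the real ones.

```python
import math

def base_likelihood(is_alt, alt_fraction, qual):
    """P(observed base | genotype) for a diploid genotype with the given alt fraction."""
    e = 10 ** (-qual / 10)                  # phred-scaled error probability
    p_if_alt = 1 - e if is_alt else e / 3   # true allele is alt
    p_if_ref = e / 3 if is_alt else 1 - e   # true allele is ref
    return alt_fraction * p_if_alt + (1 - alt_fraction) * p_if_ref

def toy_pls(bases, min_qual=0):
    """Rounded PLs for 0/0, 0/1, 1/1 from a list of (is_alt, qual) pairs.

    Bases with qual < min_qual are skipped (an assumed filtering step).
    """
    log_l = []
    for alt_fraction in (0.0, 0.5, 1.0):
        total = 0.0
        for is_alt, qual in bases:
            if qual >= min_qual:
                total += math.log10(base_likelihood(is_alt, alt_fraction, qual))
        log_l.append(total)
    best = max(log_l)
    return [round(-10 * (x - best)) for x in log_l]

# Hypothetical pileup: ten reference bases at Q10 and one alternate base at Q19
pileup = [(False, 10)] * 10 + [(True, 19)]
print(toy_pls(pileup, min_qual=17))  # only the alternate base is used
print(toy_pls(pileup))               # every base is used
```

With the Q17 cutoff, the single alternate base alone yields PLs of 24,3,0, the same values as in the first record, and a lone Q17 alternate base yields 22,3,0 as in the second. This suggests, without proving it, that the reference-supporting reads contributed little or nothing to those likelihoods. With no cutoff, the same pileup favours 0/0.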

  • dcittaro Member Posts: 31

    I guess you are right. The problem is that GATK is using all the reads supporting the site to make the call (after BQSR): the sum of AD is equal to DP. I may check each site manually to see what's happening, though.
