GATK sample genotype AD for alternative alleles
I have a question regarding the interpretation of AD for alternative alleles. I called variants and subsequently performed joint genotyping, which left me with a VCF file. Before using the SNPs further, I want to perform hard filtering. The samples I genotyped are haploid, but I observed a couple of SNPs that were called as either ref or alt yet have AD support for more than one allele. Looking into them, it seems that these are regions that have either been collapsed in the reference genome assembly or been recently duplicated, leading to 'heterozygous' read mapping. I therefore also want to filter on AD, so that I only keep per-sample genotypes whose reads support either ref or alt, not both. However, I also have cases where AD lists more values than there are alleles in the record, for example:
Chr1 402667 . C T 10334.83 PASS AC=10;AF=0.270;AN=37;DP=1408;FS=0.000;GQ_MEAN=906.38;GQ_STDDEV=623.77;MLEAC=10;MLEAF=0.270;MQ=60.00;MQ0=0;NCC=0;QD=27.42;SOR=1.521 GT:AD:DP:GQ:PL 0:3,0:3:99:0,119 0:30,0:30:99:0,1080 0:98,0:98:99:0,1800 0:5,0:5:99:0,135 1:0,2,33:35:99:1144,0 ........... (see last genotype call).
When I checked the read mappings in IGV, the C->T SNP is supported by 33 reads and an alternative C->G by two reads. Why does the AD field show the depth for C->T as the third value? Based on the VCF header, I assumed that the alleles (and their depths) are given in order, i.e. ref (C), alt1 (T), and so on.
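To make the filter I have in mind concrete, here is a minimal Python sketch of the per-sample check. It is only an illustration: the function names and the `max_other_fraction` threshold are my own hypothetical choices, not anything from GATK, and it assumes a haploid GT plus the AD ordering I expected (ref, alt1, alt2, ...).

```python
def parse_sample(format_keys, sample_field):
    """Map FORMAT keys (e.g. 'GT:AD:DP:GQ:PL') onto one sample column."""
    return dict(zip(format_keys.split(":"), sample_field.split(":")))


def ad_supports_call(format_keys, sample_field, max_other_fraction=0.0):
    """Return True if reads NOT on the called allele are at most
    max_other_fraction of all AD reads.

    Assumptions (hypothetical helper, not a GATK function):
    - GT is haploid, i.e. a single allele index such as '0' or '1'
    - AD is ordered ref, alt1, alt2, ... matching the GT allele indices
    """
    fields = parse_sample(format_keys, sample_field)
    gt = fields.get("GT", ".")
    ad = fields.get("AD", ".")
    if not gt.isdigit() or ad == ".":
        return False  # no call, non-haploid GT, or no depth information
    depths = [int(x) if x != "." else 0 for x in ad.split(",")]
    total = sum(depths)
    called = int(gt)
    if total == 0 or called >= len(depths):
        return False
    other = total - depths[called]
    return other / total <= max_other_fraction


fmt = "GT:AD:DP:GQ:PL"
print(ad_supports_call(fmt, "0:30,0:30:99:0,1080"))    # clean ref call
print(ad_supports_call(fmt, "1:0,2,33:35:99:1144,0"))  # the puzzling call
```

Under that ordering assumption the last genotype in my example would be rejected, because index 1 of AD holds only 2 of the 35 reads. That is exactly why I need to understand how the AD values map to alleles before filtering on them.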
Thanks a lot