GATK sample genotype AD for alternative alleles

MichaelSeidlMichaelSeidl WageningenMember
edited March 2018 in Ask the GATK team

Hi all,

I have a question regarding the interpretation of AD for alternative alleles. I called variants and subsequently performed joint genotyping, leaving me with a VCF file. Before using the SNPs further, I want to perform hard filtering. The samples I genotyped are haploid, but I observed a couple of SNPs that have been called as either ref or alt, yet have AD values for multiple alleles. Looking into them, it seems that these are regions that either have been collapsed in the reference genome assembly or have been recently duplicated, thus leading to a 'heterozygous' read mapping. I therefore also want to filter on the AD values, so that I only keep per-sample genotypes whose reads support either ref or alt. However, I also have cases where I observed more AD values than alleles, for example:

Chr1    402667  .   C   T   10334.83    PASS    AC=10;AF=0.270;AN=37;DP=1408;FS=0.000;GQ_MEAN=906.38;GQ_STDDEV=623.77;MLEAC=10;MLEAF=0.270;MQ=60.00;MQ0=0;NCC=0;QD=27.42;SOR=1.521  GT:AD:DP:GQ:PL  0:3,0:3:99:0,119    0:30,0:30:99:0,1080 0:98,0:98:99:0,1800 0:5,0:5:99:0,135    1:0,2,33:35:99:1144,0 ........... (see last genotype call). 

When I check the read mappings in IGV, the C->T SNP is supported by 33 reads and an alternative C->G by two reads. Why does the AD field show the depth for C->T as the third value? Based on the VCF header, I assumed the alleles (and their depths) are given in order, i.e. ref (C), alt1 (T), and so on.
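For anyone who wants to screen a callset for this pattern, here is a minimal Python sketch of the check described above. It is not part of GATK; the function names `parse_sample` and `check_ad` are made up for illustration. It assumes haploid genotypes (a single allele index in GT) and that the sample field carries numeric AD values. It reports whether AD has exactly one value per allele (REF plus each ALT) and what fraction of the AD reads fall at the index named by GT.

```python
def parse_sample(format_keys, sample_field):
    """Map FORMAT keys (e.g. 'GT:AD:DP:GQ:PL') to one sample's values."""
    return dict(zip(format_keys.split(":"), sample_field.split(":")))


def check_ad(alt_field, format_keys, sample_field):
    """Return (ad_count_ok, support) for one haploid sample.

    ad_count_ok: True if AD has one value per allele (REF + each ALT).
    support: fraction of AD reads at the AD index named by GT, or None
             if GT is missing, out of range, or the total AD is zero.
    Assumes AD is present and contains only integers.
    """
    n_alleles = 1 + len(alt_field.split(","))
    sample = parse_sample(format_keys, sample_field)
    ad = [int(x) for x in sample["AD"].split(",")]
    ad_count_ok = len(ad) == n_alleles

    gt = sample["GT"]
    total = sum(ad)
    if not gt.isdigit() or int(gt) >= len(ad) or total == 0:
        return ad_count_ok, None
    return ad_count_ok, ad[int(gt)] / total


if __name__ == "__main__":
    fmt = "GT:AD:DP:GQ:PL"
    # Two of the genotypes from the record above (ALT column is "T").
    for field in ("0:30,0:30:99:0,1080", "1:0,2,33:35:99:1144,0"):
        print(field, check_ad("T", fmt, field))
```

On the record above, the second genotype fails the count check because it has three AD values for two alleles, and only 2 of its 35 AD reads sit at the index that GT points to. A threshold on the returned fraction could serve as the per-sample filter; the cutoff itself is a judgment call.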

Thanks a lot
Michael


Answers

  • shleeshlee CambridgeMember, Broadie ✭✭✭✭✭

    Hi @MichaelSeidl,

    You are analyzing haploid samples and you have a callset representing multiple haploid samples in a cohort. What surprises you is a site with one ALT allele but three allele-specific values at the sample level:

    Chr1    402667  .   C   T ...  GT:AD:DP:GQ:PL  ...  1:0,2,33:35:99:1144,0
    

    I suspect this is some sort of bug.

    Can you tell us more about how you processed the samples to get this callset? Specifically, can you share with us the HaplotypeCaller command you ran (+/-GenotypeGVCFs command if in GVCF mode)? What version of GATK? Also, we are likely to ask for a bug report (link to instructions on left), so you should anticipate having to prep some small data for us to recapitulate what you see.

  • MichaelSeidlMichaelSeidl WageningenMember

    Hi @Shlee,
    Indeed, two things surprise me. First, there are three allele-specific values. I could understand this if, e.g., one of the samples had a C->T and another had a C->A. Second, and even more surprising to me, the 2nd alt allele has 33 reads while the 1st alt allele, which in my eyes should be T, has 2. If I look at IGV, the SNP is a C->T SNP supported by 33 reads.

    The procedure I did is briefly:
    1) map the reads with BWA (add read groups, mark duplicates)
    2) HaplotypeCaller (v3.3.0) -R $reference -ploidy 1 --input_file $bam_in --emitRefConfidence GVCF --variant_index_type LINEAR --variant_index_parameter 128000 -o $vcf_out
    3) GenotypeGVCFs (v3.3.0) -R $reference --variant vcf.list -maxAltAlleles 2

    Subsequently, I filtered for SNPs and applied hard filters (QD, MQ, ReadPosRankSum, MQRankSum, BaseQRankSum, window)...

    I hope that helps...

    Cheers
    Michael

  • shleeshlee CambridgeMember, Broadie ✭✭✭✭✭

    Hi @MichaelSeidl,

    So you are using GATK v3.3. Can you try your analysis with GATK v3.7 and see if you get the same odd result? Thanks.

  • MichaelSeidlMichaelSeidl WageningenMember

    Dear @shlee
    I will (will take some time), and I will come back to you - thanks
    Michael

  • shleeshlee CambridgeMember, Broadie ✭✭✭✭✭

    Let us know how it goes @MichaelSeidl.

  • MichaelSeidlMichaelSeidl WageningenMember

    Dear @shlee,

    I have now used v3.8-1, and the problem seems to be solved (at least in the cases I checked). Thus, there does indeed seem to be a bug in GATK v3.3.

    Thanks
    M

  • shleeshlee CambridgeMember, Broadie ✭✭✭✭✭

    Great to hear updating to v3.8 solved the issue @MichaelSeidl.
