Phased Heterozygous SNP

vincitygialam Member
edited October 2018 in Ask the GATK team

Dear all,

I am having difficulty understanding the genotypes of phased SNPs. Here I have a SNP where only one read carries the reference allele and 12 reads carry the alternate allele (AD=1,12), yet it is called as a heterozygous SNP.

 chr15  8485088 .   G   T   4936.33 PASS     
 BaseQRankSum=1.82;ClippingRankSum=0;ExcessHet=0;FS=2.399;InbreedingCoeff=0.721;
 MQ=60;MQRankSum=0;QD=32.86;ReadPosRankSum=0.267;SOR=1.167;
 DP=10789;AF=0.013;MLEAC=13;MLEAF=0.012;AN=1300;AC=28    
GT:AD:DP:GQ:PGT:PID:PL  0/1:1,12:13:3:0|1:8485088_G_T:485,0,3

The genotype for a single sample from a multi-sample VCF is shown here. Could someone shed light on why this genotype is called heterozygous when only one read carries the reference allele? I expected a homozygous SNP call. Is this a bug, or am I missing something? Also, IGV does not show the reference read. (GATK version 3.7-0-gcfedb67)
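
As a reading aid for the record above, here is a minimal Python sketch that pairs the FORMAT keys with the sample values and pulls out the allele depths. The helper name `parse_sample` is made up for illustration and is not part of GATK or any VCF library; the only assumption is that the FORMAT and sample columns are colon-separated, as in the record shown.

```python
def parse_sample(format_keys: str, sample_values: str) -> dict:
    """Pair colon-separated FORMAT keys with one sample's values."""
    keys = format_keys.split(":")
    values = sample_values.split(":")
    # Trailing fields may be omitted in a sample column; pad with "." (missing).
    values += ["."] * (len(keys) - len(values))
    return dict(zip(keys, values))


fields = parse_sample(
    "GT:AD:DP:GQ:PGT:PID:PL",
    "0/1:1,12:13:3:0|1:8485088_G_T:485,0,3",
)

# AD lists read depth per allele: reference first, then the alternate allele.
ref_depth, alt_depth = (int(x) for x in fields["AD"].split(","))
alt_fraction = alt_depth / (ref_depth + alt_depth)

print(fields["GT"], ref_depth, alt_depth, round(alt_fraction, 3))
# prints: 0/1 1 12 0.923
```

So the sample has 1 reference read and 12 alternate reads out of 13 informative reads, with the unphased GT reported as 0/1 and the phased PGT as 0|1.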

Post edited by shlee on

Answers

  • shlee Cambridge Member, Broadie ✭✭✭✭✭
    edited October 2018

    Hi @vincitygialam,

    Let's consider the scores for your genotype, where 1 read supports the ref allele and 12 reads support the alt allele (AD=1,12).

    GT:AD:DP:GQ:PGT:PID:PL 0/1:1,12:13:3:0|1:8485088_G_T:485,0,3

    Here the genotype quality (GQ) is 3, a very low score. PL values are Phred-scaled and normalized so that the most likely genotype has a PL of 0. The PLs indicate that ref-ref 0/0 is the least likely genotype with a PL of 485, ref-alt 0/1 is the most likely with a PL of 0, and alt-alt 1/1 is the next most likely with a PL of 3. In other words, the genotype is close to a toss-up between ref-alt 0/1 and alt-alt 1/1. The tool is systematic and reports the genotype with the lowest PL. It is up to you to review and refine low-GQ genotypes. Here's a tutorial that may be of interest: https://software.broadinstitute.org/gatk/documentation/article?id=12350

    In this type of scenario, more read depth for your sample would have helped, because the same proportional skew in read distribution becomes far less likely for a true het site as depth increases. At a depth of about 13x, a 1:12 skew remains plausible for a heterozygous site. Think of a run of coin tosses. The skew becomes even more likely if your sample preparation involves PCR, e.g. as would be the case in targeted exomes.

    This kind of sensitivity in a caller, to even a single read that differs, is in fact preferable. I would reconsider the expectation of a homozygous-alt call, given that there is read evidence to the contrary (the presence of a read supporting the ref allele).

    I recommend additional genotype refinement, e.g. with CalculateGenotypePosteriors, if you have population or family priors you can use.
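
The PL and coin-toss reasoning in the answer above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not GATK's implementation: the function names are made up, the normalized likelihoods assume a flat prior over the three genotypes, and the read-skew calculation assumes each read independently samples the ref allele with probability 0.5, which ignores PCR duplication, mapping bias and sequencing error.

```python
from math import comb


def gq_from_pl(pl):
    """GQ as the gap between the lowest and second-lowest PL, capped at 99."""
    best, second = sorted(pl)[:2]
    return min(second - best, 99)


def normalized_likelihoods(pl):
    """Turn Phred-scaled PLs into likelihoods that sum to 1 (flat prior)."""
    raw = [10 ** (-p / 10) for p in pl]
    total = sum(raw)
    return [r / total for r in raw]


def prob_at_most_k_ref(n_reads, k_ref, p_ref=0.5):
    """Binomial probability of k_ref or fewer ref reads among n_reads
    at a true het site, under the idealized independent-reads model."""
    return sum(
        comb(n_reads, k) * p_ref**k * (1 - p_ref) ** (n_reads - k)
        for k in range(k_ref + 1)
    )


pl = [485, 0, 3]  # order: 0/0, 0/1, 1/1
print(gq_from_pl(pl))                                     # prints: 3
print([round(x, 3) for x in normalized_likelihoods(pl)])  # prints: [0.0, 0.666, 0.334]
print(prob_at_most_k_ref(13, 1))                          # prints: 0.001708984375
```

Read this way, a PL gap of 3 corresponds to roughly 2:1 odds for 0/1 over 1/1, which is why the GQ is so low. The binomial figure is only the idealized coin-toss number; as noted in the answer, PCR makes such skew more likely, and this simple model does not capture that.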

  • Tomliu Member
    edited July 9
    Hi @shlee ,
    I have a small question about an unphased site like this one:
    GT:AD:DP:GQ:PGT:PID:PL:PS ./.:9,0:9:.:.:.:0,0,0
    Why does this site have no GT? Could it be caused by repetitive sequence in the genome? The GATK version is 4.1.0.0.
    Thanks in advance!
  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator

    Hi @Tomliu

    No-call genotypes (./.) occur when the tool does not have enough information to make a proper genotype call. This could be due to low coverage, poor-quality data, or messy regions in general.
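
For the record Tomliu posted, a small Python sketch can flag the no-call and show that its PL values do not favor any genotype. This is only an interpretation of the record as text, with made-up helper names; it does not reproduce how GATK decides to emit a no-call.

```python
def is_no_call(gt: str) -> bool:
    """True when every allele in the GT string is missing (".")."""
    alleles = gt.replace("|", "/").split("/")
    return all(allele == "." for allele in alleles)


def pl_is_uninformative(pl: str) -> bool:
    """True when PL is missing or all PL values are identical,
    meaning no genotype is favored over another."""
    if pl == ".":
        return True
    return len(set(pl.split(","))) == 1


keys = "GT:AD:DP:GQ:PGT:PID:PL:PS".split(":")
values = "./.:9,0:9:.:.:.:0,0,0".split(":")
# zip stops at the shorter list, so the omitted trailing PS field is skipped.
fields = dict(zip(keys, values))

print(is_no_call(fields["GT"]))                   # prints: True
print(pl_is_uninformative(fields.get("PL", ".")))  # prints: True
```

Here the PL of 0,0,0 gives all three genotypes the same likelihood, so there is nothing to distinguish them even though AD reports 9 reference reads.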
