Phased Heterozygous SNP

mehar Member ✭✭

Dear all,

I am having difficulty understanding the genotypes of phased SNPs. Here I have a SNP where only one read carries the reference allele and 11 reads carry the alternate allele, yet it is called as a heterozygous SNP.

 chr15  8485088 .   G   T   4936.33 PASS     
 BaseQRankSum=1.82;ClippingRankSum=0;ExcessHet=0;FS=2.399;InbreedingCoeff=0.721;
 MQ=60;MQRankSum=0;QD=32.86;ReadPosRankSum=0.267;SOR=1.167;
 DP=10789;AF=0.013;MLEAC=13;MLEAF=0.012;AN=1300;AC=28    
GT:AD:DP:GQ:PGT:PID:PL  0/1:1,12:13:3:0|1:8485088_G_T:485,0,3

The genotype shown here is for a single sample from a multi-sample VCF. Could someone shed light on why the genotype is called heterozygous when only one read has the reference allele? I would have expected a homozygous variant call. Is this a bug, or am I missing something? Also, IGV does not show the reference read. (GATK version 3.7-0-gcfedb67)
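
For readers following along, the FORMAT and sample columns of the record above can be unpacked field by field. The sketch below is only an illustration: the helper name `decode_sample` is made up here, and it assumes a biallelic site, where the VCF specification orders the PL values as 0/0, 0/1, 1/1. It also assumes GATK's usual convention that GQ is the gap between the lowest and second-lowest PL, capped at 99.

```python
def decode_sample(format_str, sample_str):
    """Pair each FORMAT key with the matching value from the sample column."""
    return dict(zip(format_str.split(":"), sample_str.split(":")))


# FORMAT and sample columns copied from the record in the question.
fields = decode_sample(
    "GT:AD:DP:GQ:PGT:PID:PL",
    "0/1:1,12:13:3:0|1:8485088_G_T:485,0,3",
)

# AD lists the read depth per allele: reference first, then alternate.
ref_depth, alt_depth = (int(x) for x in fields["AD"].split(","))

# PL holds phred-scaled genotype likelihoods, normalized so the best is 0.
pl = [int(x) for x in fields["PL"].split(",")]
pl_by_gt = dict(zip(["0/0", "0/1", "1/1"], pl))

# GQ as the difference between the two smallest PLs (capped at 99).
best, runner_up = sorted(pl)[:2]
gq_from_pl = min(runner_up - best, 99)

# A PL gap of 3 means het is only about 10**0.3 (roughly 2x) more likely
# than hom var under the caller's model.
het_vs_homvar_ratio = 10 ** (pl_by_gt["1/1"] / 10)

print(pl_by_gt)
print("GQ recomputed from PL:", gq_from_pl)
print("het vs hom-var likelihood ratio: %.2f" % het_vs_homvar_ratio)
```

Read this way, the call is 0/1 by a narrow margin over 1/1, while 0/0 (PL 485) is effectively ruled out.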

Answers

  • Sheila Broad Institute Member, Broadie, Moderator admin

    @mehar
    Hi,

    The GQ is pretty low, indicating the tool was not sure if the genotype is het or hom var. Can you post IGV screenshots of the BAM file and bamout file at that position? Please include ~300 bases before and after the site.

    Thanks,
    Sheila

  • mehar Member ✭✭
    edited June 2018

    Hi,

    Here is the IGV screenshot. The top track is from the bam file and the lower track is from the bamout file.

    zoomed bam file:

    zoomed bamout file:

    Does this help? Let me know if you need more info.

  • Sheila Broad Institute Member, Broadie, Moderator admin

    @mehar

    Hi,

    It looks like two of the Ts are low quality, but can you check the other base qualities? Can you tell me what they are?

    Thanks,
    Sheila

  • mehar Member ✭✭
    edited June 2018

    Hi,

    The base quality scores are in the range of 28-33 for the ALT allele. There are 3 T's with Q28, 4 T's with Q30, 5 T's with Q31 and 1 T with Q33. For the 2 reference alleles, one base has Q13 and the other has Q33. It still doesn't seem like the base quality scores for the ALT allele are low enough to explain a het call. Let me know if you can see something that could cause the observed behaviour.

  • Sheila Broad Institute Member, Broadie, Moderator admin

    @mehar
    Hi,

    In this case, the ref base with Q33 must have contributed to the het call. In general, het is slightly favored over hom var or hom ref if there is evidence for both alleles. You may try running the Genotype Refinement Workflow to see if that changes the call. Or, adding more samples may help.

    -Sheila
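
To make the last point concrete, here is a rough sketch of how a couple of reference reads can tip a call toward het. It is a textbook per-base diploid likelihood model with flat priors, not HaplotypeCaller's actual method (which reassembles haplotypes and scores reads with a pair-HMM), so the numbers will not match the PLs in the VCF. The pileup uses the base counts and qualities mehar reported from IGV; the function names are made up for this example.

```python
import math


def read_likelihood(base_is_alt, qual, alt_copies):
    """P(observed base | genotype with alt_copies ALT alleles out of 2)."""
    err = 10 ** (-qual / 10)      # phred quality -> error probability
    p_match = 1 - err             # base read correctly
    p_mismatch = err / 3          # misread as one specific other base
    p_from_alt = p_match if base_is_alt else p_mismatch
    p_from_ref = p_mismatch if base_is_alt else p_match
    frac_alt = alt_copies / 2     # chance the read came from an ALT chromosome
    return frac_alt * p_from_alt + (1 - frac_alt) * p_from_ref


def genotype_pls(pileup):
    """Return phred-scaled likelihoods for 0/0, 0/1, 1/1 (best one is 0)."""
    log10_lik = [
        sum(math.log10(read_likelihood(is_alt, q, k)) for is_alt, q in pileup)
        for k in (0, 1, 2)
    ]
    best = max(log10_lik)
    return [round(-10 * (l - best)) for l in log10_lik]


# (base_is_alt, base quality) pairs as reported in the thread:
# 13 T's (ALT) and 2 reference bases, one Q13 and one Q33.
pileup = (
    [(True, 28)] * 3 + [(True, 30)] * 4 + [(True, 31)] * 5 + [(True, 33)]
    + [(False, 13), (False, 33)]
)

pls_all_reads = genotype_pls(pileup)
# Same pileup with the Q33 reference base dropped.
pls_without_q33_ref = genotype_pls(pileup[:-1])

print("PL (0/0, 0/1, 1/1) with all reads:      ", pls_all_reads)
print("PL (0/0, 0/1, 1/1) without Q33 ref base:", pls_without_q33_ref)
```

In this toy model, every read costs a het genotype about a factor of 2, whereas a high-quality reference base costs a hom-var genotype a factor of several thousand. With both reference bases present the model prefers 0/1; dropping the Q33 reference base shifts the preference to 1/1.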
