Overall non-diploid ratio (OND)

laurap · Los Angeles · Member

The formula and the description given for the OND annotation seem to contradict each other (see: https://www.broadinstitute.org/gatk/guide/tooldocs/org_broadinstitute_gatk_tools_walkers_annotator_AlleleBalance.php). The formula implies that a true diploid variant would have a non-allele count of zero and therefore an OND of 1. However, the description says that "reads that support something other than the genotyped alleles (called "non-alleles") will be counted in the OND tag, which represents the overall fraction of data that diverges from the diploid hypothesis", which suggests that a higher fraction is more divergent from diploid. Can you please clarify (e.g., confirm that the true formula should be 1-alleles/(alleles+non-alleles) and that an ideal diploid variant would have an OND of zero)?
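
To make the two readings concrete, here is a minimal Python sketch. It is not taken from the GATK source; the function names and the read counts are hypothetical and only illustrate how the documented formula, alleles/(alleles+non-alleles), and the proposed one, 1-alleles/(alleles+non-alleles), behave for the same site.

```python
def ond_documented_formula(alleles, non_alleles):
    """Formula as written in the tool documentation (hypothetical helper name)."""
    total = alleles + non_alleles
    if total == 0:
        raise ValueError("no reads to compute a ratio from")
    return alleles / total


def ond_described_formula(alleles, non_alleles):
    """Formula implied by the description text: fraction of reads that
    diverge from the diploid hypothesis (hypothetical helper name)."""
    total = alleles + non_alleles
    if total == 0:
        raise ValueError("no reads to compute a ratio from")
    return 1 - alleles / total


# Ideal diploid site: every read supports a genotyped allele.
print(ond_documented_formula(100, 0), ond_described_formula(100, 0))

# Made-up site where a quarter of the reads support something else.
print(ond_documented_formula(75, 25), ond_described_formula(75, 25))
```

Under the documented formula the ideal diploid site scores 1, while under the proposed formula it scores 0 and the value grows as more reads support non-alleles, which is what the description text says.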

Additionally, we have noticed a lot of missing OND values at sites that are neither multiallelic nor indels. Can you explain when and why these values may be missing?

Thanks so much!

Issue · Github (by Sheila): Issue Number 818, State: open

Answers

  • Sheila · Broad Institute · Member, Broadie, Moderator, admin

    @laurap
    Hi,

    Can you tell me the exact command you ran to get the OND annotation? Can you also post a record that shows the OND annotated and a record where it is not annotated?

    As for the OND description, let me check with the team about how to make it clearer. You are correct: as the equation shows, the OND value should be 1 for a "true diploid". Because the document and the VCF header show the same equation, I think the document wording is incorrect.

    -Sheila

  • leekaiinthesky · Member
    edited April 2016

    Hi Sheila,

    I think an OND of 0 is a true diploid, and the text wording in the documentation is correct. But the formula in the documentation [alleles/(alleles+non-alleles)] is incorrect, and it should be [1 - alleles/(alleles+non-alleles)] as Laura mentioned. Please check with your team to confirm that this is correct.

    As for missing OND values, see the record below. OND is missing from the annotation, and we expect an OND of 0 based on manual calculation from the samples in our VCF. Please confirm that whenever OND is missing, we can assume an OND of 0. I don't think there are any ONDs of 0 in the file, which would be consistent with this idea.

    ABHom is also missing from this record, which is confusing since there are HomRef genotypes at this locus, and we have many other examples where ABHom is present when only HomRef genotypes are present. Can you please clarify why sometimes ABHom is missing? I don't think we can safely assume the missing values are 0 in this case.

    Thanks,
    Lee-kai (colleague of @laurap)

    22 16050115 . G A 135.06 PASS ABHet=0.602;AC=3;AF=0.0005353;AN=4600;BaseQRankSum=2.4;DP=18518;ExcessHet=3.0126;FS=0;InbreedingCoeff=-0.0175;MLEAC=3;MLEAF=0.0006399;MQ=37.01;MQ0=0;MQRankSum=1.65;NDA=1;QD=5;ReadPosRankSum=0.729;SOR=0.317;VQSLOD=-5.353;VariantType=SNP;culprit=MQ;cytoBand=22q11.1;Func=intergenic;Gene=NONE,LA16c-4G1.3;GeneDetail=NONE,12042;genomicSuperDups=0.99355,chr14:19600000;FATHMM_c=0.03524;FATHMM_nc=0.00021;1000g2015aug_all=0.00638978;CSQ=A|intergenic_variant|MODIFIER|||||||||||||||rs587755077|||SNV||||||||||||||||A:0.0064|||||||||||||||||||||||||||| GT:AB:AD:DP:GQ:PL 0/0:.:7,0:7:21:0,21,230 0/0:.:21,0:21:61:0,61,659 0/0:.:7,0:7:21:0,21,226 0/0:.:8,0:8:24:0,24,229 0/0:.:7,0:7:21:0,21,185 0/0:.:13,0:13:39:0,39,351 0/0:.:15,0:15:45:0,45,474 0/0:.:8,0:8:24:0,24,250 0/0:.:2,0:2:6:0,6,64 0/0:.:11,0:11:22:0,22,292 0/0:.:7,0:7:21:0,21,211 0/0:.:7,0:7:21:0,21,220 0/0:.:7,0:7:21:0,21,225 0/0:.:7,0:7:21:0,21,227 0/0:.:7,0:7:21:0,21,226 0/0:.:7,0:7:21:0,21,229 0/0:.:7,0:7:21:0,21,223

    [truncated for post length]

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie, admin
    edited April 2016

    Hi @laurap and @leekaiinthesky, we've confirmed that the equations representing the allele balance calculations were confusing and, in the case of OND, inconsistent with both the description text and the code (the equation is wrong, at least if I'm reading the code correctly). We're going to get this fixed in the near future.

    Regarding what conditions might cause these annotations not to be emitted:

    1. ABHom and ABHet are both weighted by the genotype quality of the samples they are derived from, and if there is insufficient confidence in the underlying genotypes, these annotations are not emitted.

    2. OND is only emitted if it is > 0, where OND = 0 corresponds to the case of ideal diploid variants.
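
For downstream filtering, the two conditions above could be handled along these lines. This is a minimal sketch, not GATK code: the helper names are hypothetical, and it assumes the INFO column is available as a plain semicolon-separated string. A missing OND is read as 0, while a missing ABHet or ABHom is kept as None, since the thread gives no basis for treating those as 0.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; flag-style keys map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def allele_balance_from_info(info):
    """Read OND, ABHet and ABHom from an INFO string (hypothetical helper)."""
    fields = parse_info(info)
    return {
        # OND is only emitted when > 0, so absence is interpreted as 0.
        "OND": float(fields["OND"]) if "OND" in fields else 0.0,
        # ABHet/ABHom are omitted when genotype confidence is insufficient,
        # so absence is reported as None rather than guessed.
        "ABHet": float(fields["ABHet"]) if "ABHet" in fields else None,
        "ABHom": float(fields["ABHom"]) if "ABHom" in fields else None,
    }


# First few INFO fields of the record posted earlier in this thread.
print(allele_balance_from_info("ABHet=0.602;AC=3;AF=0.0005353;AN=4600"))
```

For that record the sketch yields an OND of 0.0, the ABHet value from the file, and no ABHom value.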
