
Difficulty interpreting VariantEval output

I used the VariantEval tool to evaluate my 33 samples against a reference dataset across 23 chromosomes. I expected nHomVar + nHets to equal nSNPs, but it does not. I don't understand why nHets is so large. Any thoughts or clues? Thanks.

allSample nVariantLoci nSNPs nInsertions nDeletions nHets nHomRef nHomVar nSingletons hetHomRatio
all_chr1 51144 47364 1719 2061 152157 672 33997 22615 4.48
all_chr2 37491 34738 1246 1507 102477 257 19140 18172 5.35
all_chr3 35139 32591 1163 1385 98830 386 18342 16502 5.39
all_chr4 26399 24419 864 1116 71321 198 14864 12911 4.80
all_chr5 28058 25933 958 1167 76321 290 14099 13391 5.41
all_chr6 31222 28940 1009 1273 82483 243 16410 15084 5.03
all_chr7 28351 26331 907 1113 80464 221 14682 13448 5.48
all_chr8 22551 20968 701 882 63569 224 13134 10650 4.84
all_chr9 22478 20987 679 812 60268 198 11341 11055 5.31
all_chr10 27740 25691 924 1125 79158 182 15035 12875 5.26
all_chr11 24012 22309 762 941 66762 197 14378 11521 4.64
all_chr12 25239 23310 832 1097 71793 218 15187 11800 4.73
all_chr13 12494 11547 388 559 34506 113 7156 5945 4.82
all_chr14 15825 14640 534 651 42887 123 9137 7689 4.69
all_chr15 17554 16249 585 720 49245 133 10203 8329 4.83
all_chr16 22966 21529 656 781 63713 165 13000 10825 4.90
all_chr17 23983 22285 768 930 71715 236 15978 10776 4.49
all_chr18 9741 8935 374 432 25755 93 4743 4793 5.43
all_chr19 26262 24448 792 1022 81650 359 17700 11176 4.61
all_chr20 13068 12133 431 504 37924 155 7633 6197 4.97
all_chr21 7722 7220 212 290 20928 124 4497 3807 4.65
all_chr22 14151 13188 406 557 42289 156 8814 6180 4.80
all_chr23 5330 4869 221 240 10552 92 5177 2772 2.04

Answers

  • Sheila · Broad Institute · Member, Broadie admin

    @mahyarhey
    Hi,

    Can you tell me the exact command you ran? When I tried this on test data, nVariantLoci = nHets + nHomVar. nSNPs does not have to equal nHets + nHomVar, because some variant sites are not SNPs.

    -Sheila
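As a rough illustration of the point about non-SNP sites, the sketch below takes three rows copied from the table in the question and checks that nSNPs + nInsertions + nDeletions adds up to nVariantLoci. The helper name `site_type_total` is made up for this example, and this is only arithmetic on the posted numbers, not a description of how VariantEval computes them.

```python
# Rows copied from the "all" table in the question (a subset, for brevity).
rows = {
    "all_chr1": dict(nVariantLoci=51144, nSNPs=47364, nInsertions=1719,
                     nDeletions=2061, nHets=152157, nHomRef=672, nHomVar=33997),
    "all_chr2": dict(nVariantLoci=37491, nSNPs=34738, nInsertions=1246,
                     nDeletions=1507, nHets=102477, nHomRef=257, nHomVar=19140),
    "all_chr23": dict(nVariantLoci=5330, nSNPs=4869, nInsertions=221,
                      nDeletions=240, nHets=10552, nHomRef=92, nHomVar=5177),
}


def site_type_total(row):
    """Sum the per-type site counts (hypothetical helper for this sketch)."""
    return row["nSNPs"] + row["nInsertions"] + row["nDeletions"]


for name, row in rows.items():
    by_type = site_type_total(row)
    genotypes = row["nHets"] + row["nHomVar"]
    print(f"{name}: SNPs+ins+del={by_type}, nVariantLoci={row['nVariantLoci']}, "
          f"nHets+nHomVar={genotypes}")
```

In these rows the site-type counts account for every variant locus, while nHets + nHomVar is several times larger than nVariantLoci, which suggests the genotype columns are counted on a different basis than the site columns.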

  • mahyarhey · Boston · Member

    Hi Sheila, for example, take a look at all_chr1: 51,144 variants are called and 33,997 are HOM_VAR, so I would expect the rest to be HET. But the number of HET variants is very large (152,157), which exceeds the total number of variants. Why?

  • Sheila · Broad Institute · Member, Broadie admin

    @mahyarhey
    Hi,

    When I test this on some of my own test files, I see that nVariantLoci gives the number of sites that are variant. nHets gives the number of heterozygous genotypes, counted per sample, across all of those nVariantLoci, and nHomVar gives the number of homozygous-variant genotypes counted the same way. So with 33 samples, nHets can be much larger than nVariantLoci. What I don't understand in your case is why nHets + nHomRef + nHomVar does not equal nVariantLoci * 33. I suspect the remaining genotypes may be no-calls.

    Can you test this on a small section of your VCF? For example, I tested on only 2 variant sites. You can then count manually whether the totals are correct in the output.

    -Sheila
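One way to do the manual count suggested above is a small script that tallies genotypes over a few VCF records. The sketch below is plain Python and is not VariantEval's implementation; the function names and the toy two-site, three-sample VCF are invented for illustration, and it assumes tab-separated records with a GT entry in the FORMAT column.

```python
import re
from collections import Counter


def classify_genotype(gt):
    """Classify one GT string such as '0/1', '1|1', or './.'."""
    alleles = re.split(r"[/|]", gt)
    if any(a == "." for a in alleles):
        return "noCall"
    if all(a == "0" for a in alleles):
        return "homRef"
    if len(set(alleles)) == 1:
        return "homVar"
    return "het"


def tally_genotypes(vcf_lines):
    """Count sites and per-sample genotype classes over VCF text lines."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        gt_index = fields[8].split(":").index("GT")
        counts["sites"] += 1
        for sample in fields[9:]:
            counts[classify_genotype(sample.split(":")[gt_index])] += 1
    return counts


# Hypothetical toy data: 2 variant sites, 3 samples.
toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3",
    "1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t0/1\t1/1",
    "1\t200\t.\tC\tT\t50\tPASS\t.\tGT\t0/1\t./.\t0/0",
]

counts = tally_genotypes(toy_vcf)
print(dict(counts))
```

In the toy data there are 3 heterozygous genotypes over only 2 sites, so a genotype-level count can exceed the site count, and the four genotype classes together add up to sites times samples only when no-calls are included. For comparison, all_chr1 in the question has 51144 * 33 = 1,687,752 possible genotypes, while nHets + nHomRef + nHomVar = 186,826.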

  • mahyarhey · Boston · Member

    Hi Sheila, let me explain it more simply. Please see the attached file. This is just for chr1 across 33 samples. Everything seems OK for each individual sample, but in the "all" row only nHets is out of range. I can't find any explanation for why nHets is so large. For each sample, nHets + nHomRef + nHomVar = nVariantLoci, but this does not hold for "all". Why?

  • mahyarhey · Boston · Member

    Thank you, Sheila, for the info.
