sensitivity

mahyarhey (Boston; Member)

After I ran --evaluationvariant, I got some sensitivity values greater than 1 (e.g. 1.02, 1.59, 1.76, 1.42).
I expected sensitivity to range from 0 to 1. How should I interpret sensitivity values greater than 1? Does that make sense?

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    I'm not sure how to answer that without knowing anything about the command you ran, what data you're running on, etc. You realize we can't read your mind, right?

  • mahyarhey (Boston; Member)

    Hi Geraldine,
    I used the following script for evaluation. I ran it 22 times, each time with a different chromosome as the reference data.

    java -Xms4096m -Xmx4096m -jar /GenomeAnalysisTK-3.2-2/GenomeAnalysisTK.jar \
        -R ucsc.hg19.fasta \
        -T VariantEval \
        --eval:set1 Final_All33_samples.vcf \
        --comp chr1_imputed_edit.vcf \
        -ST Sample -noEV -EV CountVariants -EV TiTvVariantEvaluator -EV ValidationReport -EV CompOverlap \
        -o eval-all33samples-Chr1.gatkreport

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    OK, that's a good start. Can you also describe what Final_All33_samples.vcf and chr1_imputed_edit.vcf contain?

  • mahyarhey (Boston; Member)

    I have these elements in my "Final_All33_samples.vcf" which is merged from 33 output files of HaplotypeCaller:

    CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample1 sample2 sample3 ...................... sample33

    and in my "chr1_imputed_edit.vcf" I have the imputed genotypes of those 33 samples, obtained using the "HumanOMNI2.5" reference data, as follows:

    CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample1 sample2 sample3 ...................... sample33

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Can you tell us more about how you did the imputation? What program did you use? The reason I'm asking is because we need to understand what might have been added or modified in your callset.

    It could also be helpful if you post the table of results.

  • mahyarhey (Boston; Member)

    We used the Impute2 software for imputation. Below is one of my outputs, from the comparison with chr1. Two of the sensitivity values are greater than 1. Why?

    Sample nEvalVariants nVariantsAtComp nConcordant concordantRate nVariantLoci nSNPs nInsertions nDeletions hetHomRatio tiTvRatio sensitivity specificity
    Sample1 39286 3650 3587 98.27 39286 36479 1344 1463 5.71 2.46 0.44 100.00
    Sample2 52114 4794 4695 97.93 52114 48652 1642 1820 4.57 2.46 0.57 100.00
    Sample3 52749 4827 4724 97.87 52749 49264 1630 1855 4.14 2.47 0.57 100.00
    Sample4 52706 4897 4816 98.35 52706 49300 1627 1779 4.14 2.44 0.59 100.00
    Sample5 32447 3122 3070 98.33 32447 30431 928 1088 3.41 2.60 0.37 100.00
    Sample6 37469 3152 3093 98.13 37469 35111 1049 1309 4.64 2.43 0.38 100.00
    Sample7 102958 8000 7809 97.61 102958 95596 3483 3879 5.10 2.39 0.95 99.99
    Sample8 55281 5035 4945 98.21 55281 51457 1765 2059 4.49 2.47 0.60 100.00
    Sample9 58136 5104 4963 97.24 58136 54295 1735 2106 4.20 2.45 0.60 99.99
    Sample10 62771 5558 5456 98.16 62771 58514 2004 2253 4.10 2.42 0.66 100.00
    Sample11 66956 5783 5664 97.94 66956 62598 2032 2326 4.17 2.40 0.69 99.99
    Sample12 58138 4730 4621 97.70 58138 54178 1807 2153 4.18 2.40 0.56 99.99
    Sample13 44054 4354 4278 98.25 44054 41441 1207 1406 4.03 2.54 0.52 100.00
    Sample14 52467 4756 4627 97.29 52467 49423 1385 1659 4.39 2.49 0.56 99.99
    Sample15 56291 5246 5153 98.23 56291 52779 1614 1898 4.25 2.49 0.63 100.00
    Sample16 56257 4988 4870 97.63 56257 52757 1513 1987 4.68 2.45 0.59 99.99
    Sample17 59806 5367 5210 97.07 59806 55823 1798 2185 8.86 2.49 0.63 99.99
    Sample18 77128 6910 6774 98.03 77128 71982 2270 2876 16.64 2.46 1.02 99.99
    Sample19 53035 4815 4717 97.96 53035 49725 1526 1784 4.64 2.54 0.57 100.00
    Sample20 51547 4615 4523 98.01 51547 48461 1409 1677 4.75 2.45 0.55 100.00
    Sample21 43811 3988 3901 97.82 43811 41189 1175 1447 4.88 2.54 0.47 100.00
    Sample22 46768 4563 4473 98.03 46768 43957 1274 1537 4.31 2.52 0.54 100.00
    Sample23 49765 4481 4393 98.04 49765 46358 1591 1816 4.87 2.47 0.53 100.00
    Sample24 58376 5368 5239 97.60 58376 54588 1697 2091 5.01 2.50 0.64 99.99
    Sample25 51632 4842 4711 97.29 51632 48438 1449 1745 4.52 2.52 0.57 99.99
    Sample26 52708 4691 4589 97.83 52708 49320 1532 1856 4.29 2.50 0.56 100.00
    Sample27 96171 7409 7235 97.65 96171 90227 2685 3259 5.78 2.42 1.28 99.99
    Sample28 55894 4963 4839 97.50 55894 52592 1516 1786 4.38 2.50 0.59 99.99
    Sample29 57601 5058 4953 97.92 57601 54180 1462 1959 5.09 2.39 0.60 99.99
    Sample30 69534 5800 5687 98.05 69534 65244 1933 2357 4.76 2.39 0.69 99.99
    Sample31 57333 5070 4947 97.57 57333 54014 1462 1857 4.60 2.52 0.60 99.99
    Sample32 61675 5416 5289 97.66 61675 58023 1723 1929 4.37 2.48 0.64 99.99
    Sample33 58980 5280 5178 98.07 58980 55283 1570 2127 4.40 2.45 0.63 100.00
    all 517739 40264 37868 94.05 517739 479896 16864 20979 4.79 2.38 4.60 99.88

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Thanks, this is helpful -- I should have asked for the table earlier :)

    What's happening here is that you're comparing the calls for chromosome 1 to all the calls for all the chromosomes, so the sensitivity result is extremely low: below 1% in most cases, except for those two samples where you have slightly more than 1%. The sensitivity is expressed as a percentage (x out of 100), not a fraction (between 0 and 1). I had forgotten that myself, but seeing the table jogged my memory. We'll try to document units more clearly in the output.

    Try adding -L chr1 to your command to restrict your analysis to just that one chromosome and you will get the actual sensitivity values per chromosome.
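
As a quick illustration of the unit point made above, here is a minimal Python sketch that rescales a few of the sensitivity values from the posted table from percent to a 0-to-1 fraction. The helper name `percent_to_fraction` is made up for this example and is not part of GATK.

```python
def percent_to_fraction(percent):
    """Convert a value reported in percent (0-100) to a fraction (0-1)."""
    return percent / 100.0


# Sensitivity values copied from the posted table, read as percentages.
reported = {"Sample7": 0.95, "Sample18": 1.02, "Sample27": 1.28}

# The same values on the 0-to-1 scale the original poster expected.
fractions = {name: percent_to_fraction(value) for name, value in reported.items()}

for name, fraction in fractions.items():
    print(f"{name}: {reported[name]}% is {fraction:.4f} as a fraction")
```

Read this way, a reported 1.02 means roughly one percent, not a sensitivity above 100%.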

  • mahyarhey (Boston; Member)

    Hi Geraldine,
    Thanks for your comment!
    However, the results did not change after I added "-L chr1" to compare the callset with imputed_chr1. I think that when these two files are compared, only the SNPs they share on chr1 can be matched to each other, because the matching criterion is the chromosome. Am I right?

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Hmm, actually, looking at the numbers again, I think you're right. My point stands that the sensitivity is given as a percentage, not a fraction. But then I wonder why your sensitivity is so low. I assume it has something to do with the output of Impute2, which I'm not familiar with.
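
The reasoning in the last two posts can be sketched with a toy example. It assumes sites are matched on the (chromosome, position) pair, which is a simplification and not VariantEval's actual implementation. All coordinates are invented for illustration.

```python
def site_keys(sites):
    """Build the set of (chromosome, position) keys used for matching."""
    return {(chrom, pos) for chrom, pos in sites}


# Hypothetical eval callset spanning several chromosomes.
eval_sites = [("chr1", 100), ("chr1", 200), ("chr2", 100), ("chr3", 500)]
# Hypothetical comp callset that only contains chr1 sites.
comp_sites = [("chr1", 100), ("chr1", 300)]

# Overlap without any interval restriction.
overlap_all = site_keys(eval_sites) & site_keys(comp_sites)

# Overlap after restricting the eval callset to chr1.
eval_chr1 = [site for site in eval_sites if site[0] == "chr1"]
overlap_chr1 = site_keys(eval_chr1) & site_keys(comp_sites)

print(sorted(overlap_all), sorted(overlap_chr1))
```

Under that assumption, a chr2 site at position 100 never matches a chr1 site at position 100, so restricting to chr1 leaves the set of overlapping sites unchanged.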

  • tommycarstensen (United Kingdom; Member ✭✭✭)

    @mahyarhey Can you post the first record from each of your two VCF files being compared? Can you also post a record count for one chromosome (e.g. chr20) for each of your VCF files? What software did you use to convert from .gen.gz to .vcf.gz? Did you apply a genotype probability and SNP call rate threshold in the process? Is your imputed data also build hg19?
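
One way to get the per-chromosome record counts requested above is a short Python sketch like the following. It assumes a standard tab-delimited VCF with the chromosome in the first column, and the function name `count_records_per_chrom` is made up for this example.

```python
import gzip
from collections import Counter


def count_records_per_chrom(vcf_path):
    """Count data records per chromosome in a plain or gzip-compressed VCF."""
    opener = gzip.open if vcf_path.endswith(".gz") else open
    counts = Counter()
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            # Skip meta-information lines, the header line, and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            chrom = line.split("\t", 1)[0]
            counts[chrom] += 1
    return counts
```

For example, `count_records_per_chrom("Final_All33_samples.vcf")["chr20"]` would give the chr20 record count for the callset named earlier in the thread.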

  • mahyarhey (Boston; Member)

    Hi Tommy,
    Attached, please find all the information which you requested regarding my issue. Thanks

  • tommycarstensen (United Kingdom; Member ✭✭✭)

    @mahyarhey Thanks. I'm still not sure what is going on. As a fellow user I want to.

    @mahyarhey said on May18:
    I have these elements in my "Final_All33_samples.vcf" which is merged from 33 output files of HaplotypeCaller:

    I noticed lots of missing calls in your HC output. How did you merge your HC output files? Preferably you should be doing joint calling for your 33 samples in the default DISCOVERY mode. If you only have 33 samples, then you also don't need to run in GVCF mode. It might be better to use UnifiedGenotyper (or samtools or FreeBayes or Platypus), if you are dealing with low coverage data.

    I noticed that the first record in your imputed VCF is monomorphic. I'm not sure how GATK handles that. It's probably not an issue. If you want to be sure, you can remove your monomorphic sites with SelectVariants (or bcftools).

    I'm sorry that it's not obvious to me what the problem is. Perhaps @Geraldine_VdAuwera or @Sheila can easily spot the problem after your latest post. Their midichlorian count is greater than mine.
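
To inspect how many records are monomorphic before deciding whether to filter them, a small Python check can help. This is only a sketch and is not a replacement for SelectVariants or bcftools. It assumes a record is monomorphic when all called genotypes carry the same single allele, which may differ from how those tools define it. The function names are made up for this example.

```python
import re


def called_alleles(vcf_line):
    """Return the set of allele indices called across all samples of one VCF record."""
    fields = vcf_line.rstrip("\n").split("\t")
    # Column 9 (index 8) is FORMAT; sample columns follow it.
    gt_index = fields[8].split(":").index("GT")
    alleles = set()
    for sample in fields[9:]:
        parts = sample.split(":")
        if gt_index >= len(parts):
            continue
        # Genotypes are separated by "/" (unphased) or "|" (phased).
        for allele in re.split(r"[/|]", parts[gt_index]):
            if allele != ".":  # "." marks a missing call
                alleles.add(allele)
    return alleles


def is_monomorphic(vcf_line):
    """True if at most one distinct allele is called across all samples."""
    return len(called_alleles(vcf_line)) <= 1
```

Applying `is_monomorphic` to each data line of the imputed VCF gives a rough count of such sites.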

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    @mahyarhey, it sounds like you're not following our best practices. Are you not filtering your callset at all? If you post your complete workflow from A to Z (in words, not a script!) we can take a look at it and comment, but otherwise I'm afraid you're going to be on your own.
