Could anyone help me with two questions about comparing my VCF file to a gold standard?
1. Are sites that are present in my VCF but absent from the gold standard ignored?
2. Are sites that are absent from my VCF but present in the gold standard assumed to be 0/0?


Best Answers


  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)



    1) They are ignored.

    2) They are ignored.

    We do not compare records that are missing in either set.
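The rule in this answer (only records present in both sets are compared) can be sketched as a set intersection over site keys. This is a toy illustration, not GATK code: the `comparable_sites` helper, the `(chrom, pos)` keys, and the genotype values are all hypothetical.

```python
def comparable_sites(eval_calls, comp_calls):
    """Return the sites present in BOTH call sets, sorted by (chrom, pos)."""
    return sorted(set(eval_calls) & set(comp_calls))


# Hypothetical toy data: (chrom, pos) -> genotype string
eval_calls = {("chr1", 100): "0/1", ("chr1", 200): "1/1", ("chr1", 300): "0/1"}
comp_calls = {("chr1", 100): "0/1", ("chr1", 300): "1/1", ("chr1", 400): "0/1"}

shared = comparable_sites(eval_calls, comp_calls)
for site in shared:
    # Only these sites would be compared; chr1:200 and chr1:400 are skipped.
    print(site, eval_calls[site], comp_calls[site])
```

Under this reading, chr1:200 (eval only) and chr1:400 (gold standard only) never enter the comparison at all.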


  • blueskypy (Member ✭✭)

    Thanks Sheila! But I'm confused. My VCF file does not contain non-variant sites. If those non-variant sites are ignored, will the value of HOM_REF_HOM_REF be 0?

  • blueskypy (Member ✭✭)
    edited October 2014

    If I use -comp v1.vcf -eval v2.vcf -L v3.bed, will ONLY sites present in ALL three files be used in the comparison?

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie, admin)

    Hi @blueskypy,

    That's right, the first two because you can only compare sites for which all information is available, and the third because that sets hard limits on the scope of the analysis.

    Craig/Appistry will happily take any further questions you may have about this and other topics (please see my private message from earlier). Thanks!
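The three-way restriction described in this reply (present in the comp file, present in the eval file, and inside the -L intervals) can be sketched as follows. This is an illustrative toy, not how GATK implements interval handling: the helper names and data are hypothetical. It assumes the standard conventions that VCF positions are 1-based and BED intervals are 0-based and half-open.

```python
def in_intervals(chrom, pos, intervals):
    """True if the 1-based VCF position falls inside any BED interval.

    BED intervals are 0-based, half-open [start, end), so a 1-based
    position p is covered when start < p <= end.
    """
    return any(c == chrom and start < pos <= end for c, start, end in intervals)


def sites_in_scope(eval_sites, comp_sites, intervals):
    """Sites present in both call sets AND inside the interval list."""
    shared = set(eval_sites) & set(comp_sites)
    return sorted(s for s in shared if in_intervals(s[0], s[1], intervals))


# Hypothetical stand-ins for v1.vcf (comp), v2.vcf (eval) and v3.bed (-L)
comp_sites = {("chr1", 100), ("chr1", 250), ("chr1", 300)}
eval_sites = {("chr1", 100), ("chr1", 300), ("chr1", 500)}
intervals = [("chr1", 50, 150), ("chr1", 400, 600)]

scope = sites_in_scope(eval_sites, comp_sites, intervals)
print(scope)
```

Here chr1:300 is in both call sets but outside the intervals, and chr1:500 is inside an interval but missing from the comp set, so only chr1:100 remains.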

  • blueskypy (Member ✭✭)

    Hi Geraldine,
    Sorry, I didn't notice the message! I'll direct future questions to Craig. However, may I raise the following point, since I think it would benefit other users too?

    GenotypeGVCFs only outputs variant sites by default, and it may not actually work properly with -allSites. As a result: 1) the sensitivity computed from such a VCF file will be falsely high, since HOM_REF_HET and HOM_REF_HOM_VAR are 0; and 2) even when the same gold standard is used, the sensitivities from different input files are not comparable, because the denominators are different.

    Is my understanding correct?
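The concern above can be made concrete with a small worked example. This is only a sketch of one reading of the argument: it uses a simplified sensitivity, TP / (TP + FN) over gold-standard variant sites, which is an assumption and not necessarily the formula the GATK tools use. The `sensitivity` function, its `missing_as_hom_ref` switch, and the data are hypothetical.

```python
def is_variant(gt):
    """Treat any genotype other than 0/0 as a variant call (toy rule)."""
    return gt != "0/0"


def sensitivity(gold, eval_calls, missing_as_hom_ref=False):
    """TP / (TP + FN) over gold-standard variant sites.

    With missing_as_hom_ref=False, gold sites absent from eval_calls are
    skipped (not compared). With True, they are treated as 0/0 and so
    count as missed variants.
    """
    tp = fn = 0
    for site, gold_gt in gold.items():
        if not is_variant(gold_gt):
            continue
        if site in eval_calls:
            eval_gt = eval_calls[site]
        elif missing_as_hom_ref:
            eval_gt = "0/0"
        else:
            continue  # site is absent from the eval set, so it is skipped
        if is_variant(eval_gt):
            tp += 1
        else:
            fn += 1
    return tp / (tp + fn) if (tp + fn) else float("nan")


# Hypothetical gold standard with four variant sites (keyed by position)
gold = {1: "0/1", 2: "1/1", 3: "0/1", 4: "0/1"}
# Two variant-only call sets that recover different subsets of the gold sites
eval_a = {1: "0/1", 2: "1/1"}
eval_b = {1: "0/1", 2: "1/1", 3: "0/1"}

print(sensitivity(gold, eval_a), sensitivity(gold, eval_b))
print(sensitivity(gold, eval_a, True), sensitivity(gold, eval_b, True))
```

When missing sites are skipped, both call sets score 1.0 even though their denominators are 2 and 3; when missing sites are treated as 0/0, the scores become 0.5 and 0.75 over a common denominator of 4.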

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie, admin)

    Thanks :)

    Current issues notwithstanding, GenotypeGVCFs should/will work properly with -allSites. Remind me, what problem have you encountered with -allSites?

  • blueskypy (Member ✭✭)

    Sorry to come back to this question! I wonder whether the sites in set 2 of my original question are in fact NOT ignored, but are instead counted as UNAVAILABLE and included in the denominator when computing sensitivity?
