VQSR with two data sets

shinken (Irapuato), Member


I would like to build the SNP recalibration model with two data sets:

1- SNPs derived from WGS and filtered
2- SNPs derived from SNP array chip data

The problem is that the SNPs from the chip data carry no coverage or other annotation information. They have only the genotype and the position. Of course I could convert this data to a VCF, but it would contain nothing more. Is it still possible to use this data to run VQSR?

Best Wishes,




  • bshifaw Member, Broadie, Moderator admin

    Not sure. I'm guessing it wouldn't work, since the tool uses the annotations from the sequencing data to create a VQSLOD:

    "The approach taken by variant quality score recalibration is to develop a continuous, covarying estimate of the relationship between SNP call annotations (such as QD, MQ, and ReadPosRankSum) and the probability that a SNP is a true genetic variant versus a sequencing or data processing artifact."

    mentioned here

    But I'll confirm with my team.

  • bshifaw Member, Broadie, Moderator admin


    So the SNP chip data would not be sufficient to generate a recalibration model. VQSR works by looking at the annotations and quality scores of existing variants that have been called by HaplotypeCaller and clustering them to determine where to "draw the line" for filtering, so to speak. The SNP chip data itself is not useful because it doesn't contain any of those annotations.
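
To make the point above concrete, here is a minimal Python sketch that checks whether a VCF record carries the annotations named in the quoted passage (QD, MQ, ReadPosRankSum). It is not part of GATK: the function names and both example records are hypothetical, and the values in them are invented purely for illustration. It assumes a chip genotype converted to VCF would have an empty INFO column (`.`), as described in the question.

```python
# Annotations quoted in the thread as examples of what VQSR models.
REQUIRED_ANNOTATIONS = ("QD", "MQ", "ReadPosRankSum")


def parse_info(info_field):
    """Parse a VCF INFO column into a dict; '.' means no annotations."""
    if info_field in (".", ""):
        return {}
    parsed = {}
    for entry in info_field.split(";"):
        key, _, value = entry.partition("=")
        parsed[key] = value  # flag-type entries get an empty string
    return parsed


def missing_annotations(vcf_line, required=REQUIRED_ANNOTATIONS):
    """Return the required annotations absent from one VCF data line."""
    columns = vcf_line.rstrip("\n").split("\t")
    info = parse_info(columns[7])  # INFO is the 8th fixed VCF column
    return [name for name in required if name not in info]


# Hypothetical records: all values are invented for illustration only.
chip_record = "chr1\t10177\t.\tA\tC\t.\t.\t.\tGT\t0/1"
wgs_record = (
    "chr1\t10177\t.\tA\tC\t812.77\t.\t"
    "DP=35;MQ=60.00;QD=23.22;ReadPosRankSum=0.42\tGT\t0/1"
)

print(missing_annotations(chip_record))  # ['QD', 'MQ', 'ReadPosRankSum']
print(missing_annotations(wgs_record))   # []
```

The chip-derived record has nothing in its INFO column for the model to cluster on, whereas the sequencing-derived record does.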
