VQSR with two data sets

shinken (Irapuato), Member


I would like to build the SNP recalibration model with two data sets:

1- SNPs derived from WGS and filtered
2- SNPs derived from SNP array chip data

The problem is that the SNPs from the chip data carry no annotations such as coverage; they contain only the genotype and the position. I could of course convert this data to a VCF, but it would hold no further information. Is it still possible to use this data to run VQSR?

Best Wishes,




  • bshifaw (moon), Member, Broadie, Moderator, admin

    I'm not sure, but I'm guessing it wouldn't work, since the tool uses the annotations from the sequence data to compute a VQSLOD score:

    "The approach taken by variant quality score recalibration is to develop a continuous, covarying estimate of the relationship between SNP call annotations (such as QD, MQ, and ReadPosRankSum) and the probability that a SNP is a true genetic variant versus a sequencing or data processing artifact."

    mentioned here

    But I'll confirm with my team.

  • bshifaw (moon), Member, Broadie, Moderator, admin


    So the SNP chip data would not be sufficient to generate a recalibration model. VQSR works by looking at the annotations and quality scores of existing variants that have been called by HaplotypeCaller and clustering them to determine where to "draw the line" for filtering, so to speak. The SNP chip data by itself is not useful for this because it doesn't contain any of those annotations.
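
To make the point about missing annotations concrete, here is a minimal sketch (not part of GATK) that checks whether a VCF record's INFO column carries the annotations named in the quote above. The function names and both example records are hypothetical, made up purely for illustration, and the annotation list is an assumption taken from that quote rather than a fixed requirement of the tool.

```python
# Annotation names taken from the documentation quote in this thread;
# treating exactly these three as "required" is an assumption for illustration.
REQUIRED_ANNOTATIONS = ("QD", "MQ", "ReadPosRankSum")


def parse_info(info_field):
    """Parse a VCF INFO column into a dict; flag-style entries map to True."""
    if info_field in ("", "."):
        return {}
    info = {}
    for entry in info_field.split(";"):
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True
    return info


def missing_annotations(vcf_line, required=REQUIRED_ANNOTATIONS):
    """Return the required annotations absent from one tab-separated VCF record."""
    fields = vcf_line.rstrip("\n").split("\t")
    info = parse_info(fields[7])  # INFO is the 8th VCF column
    return [name for name in required if name not in info]


# Hypothetical records: one as a sequencing-based caller might annotate it,
# one converted from chip data with only position and genotype available.
wgs_record = "chr1\t10177\t.\tA\tC\t523.6\t.\tQD=14.2;MQ=60.0;ReadPosRankSum=0.43\tGT:DP\t0/1:37"
chip_record = "chr1\t10177\t.\tA\tC\t.\t.\t.\tGT\t0/1"

print(missing_annotations(wgs_record))
print(missing_annotations(chip_record))
```

For the made-up WGS record the list of missing annotations is empty, while for the chip-derived record all three annotations are reported missing, which is why such a file gives the model nothing to cluster on.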
