dbSNP file in UnifiedGenotyper

Homa Posts: 25 Member


I am using the UnifiedGenotyper tool in GATK.
I don't know exactly how I should create the dbSNP file it needs. Thank you very much for any help.




  • Geraldine_VdAuwera Posts: 9,971 Administrator, Dev admin

    I'm sorry, I don't understand your question. The dbsnp file is not required by the UnifiedGenotyper but it can be used if you want. Are you trying to use it and it's not working? Or is your problem something else?

    Geraldine Van der Auwera, PhD
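
For readers who do want to pass a dbSNP file, here is a minimal sketch of how the command line might be assembled, assuming a GATK 3.x style invocation. The helper `unified_genotyper_cmd` and all file names are hypothetical; `--dbsnp` is simply left out when no file is given, since the argument is optional.

```python
def unified_genotyper_cmd(reference, bam, out_vcf, dbsnp=None,
                          gatk_jar="GenomeAnalysisTK.jar"):
    """Build an argument list for a UnifiedGenotyper run (GATK 3.x style, assumed)."""
    cmd = [
        "java", "-jar", gatk_jar,
        "-T", "UnifiedGenotyper",
        "-R", reference,   # reference FASTA
        "-I", bam,         # input BAM
        "-o", out_vcf,     # output VCF
    ]
    # dbSNP is optional: it only annotates the output, so add it on request.
    if dbsnp is not None:
        cmd += ["--dbsnp", dbsnp]
    return cmd


# Hypothetical file names, for illustration only.
cmd = unified_genotyper_cmd("ref.fasta", "sample.bam", "calls.vcf",
                            dbsnp="dbsnp.vcf")
print(" ".join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` on a machine where GATK is installed.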

  • namsyvo University of Memphis Posts: 5 Member
    edited April 2014

    I want to confirm one point related to this topic. As I understand from the GATK UnifiedGenotyper documentation, dbSNP files are not used in any way to verify the called SNPs during SNP calling. Is that correct?

    --dbsnp / -D
    dbSNP file
    rsIDs from this file are used to populate the ID column of the output. Also, the DB INFO flag will be set when appropriate. dbSNP is not used in any way for the calculations themselves.
    --dbsnp binds reference ordered data. This argument supports ROD files of the following types: BCF2, VCF, VCF3


  • Geraldine_VdAuwera Posts: 9,971 Administrator, Dev admin

    Hi Nam, they are used only for annotating the rsID if a variant has been reported previously. They are not used in any way to decide whether variants should be emitted.

    Geraldine Van der Auwera, PhD
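
The annotation-only role of dbSNP described above can be illustrated with a toy sketch. This is not GATK's code: the record layout, the exact-match lookup on (chrom, pos, ref, alt), and the rsID shown are all assumptions made for illustration. The point is only that the ID and the DB flag are filled in after the call exists, and the quality is left untouched.

```python
def annotate_with_dbsnp(calls, dbsnp):
    """Fill in ID and DB for already-made calls; never alter or drop a call.

    calls: list of dicts with keys chrom, pos, ref, alt, qual (toy layout).
    dbsnp: dict mapping (chrom, pos, ref, alt) -> rsID (toy lookup).
    """
    annotated = []
    for call in calls:
        key = (call["chrom"], call["pos"], call["ref"], call["alt"])
        rsid = dbsnp.get(key)
        record = dict(call)                        # copy; qual is carried over as-is
        record["id"] = rsid if rsid is not None else "."
        record["db"] = rsid is not None            # DB flag set only for known variants
        annotated.append(record)
    return annotated


# Made-up example data.
calls = [
    {"chrom": "1", "pos": 1000, "ref": "A", "alt": "G", "qual": 52.3},
    {"chrom": "1", "pos": 2000, "ref": "C", "alt": "T", "qual": 31.0},
]
dbsnp = {("1", 1000, "A", "G"): "rs0000001"}       # hypothetical rsID

for rec in annotate_with_dbsnp(calls, dbsnp):
    print(rec["pos"], rec["id"], rec["db"], rec["qual"])
```

Both calls come out with their original quality; the one absent from the lookup keeps `.` in the ID field and is still emitted.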

  • namsyvo University of Memphis Posts: 5 Member

    Thank you for your previous answer. I have one more question. If I provide GATK with a VCF file through the --alleles option (with genotyping_mode set to GENOTYPE_GIVEN_ALLELES), is this information used in calculating the SNP calls, in determining their qualities, or in any other part of the SNP calling process?

  • Sheila Broad Institute Posts: 3,226 Member, Broadie, Moderator, Dev admin



    Normally our variant caller tools find candidate alleles in the data. When you provide a -alleles file, the variant caller will instead use the alleles specified for that position in the file (including the reference allele). If a sample does not have any of the alternate alleles specified at the position, it will simply be genotyped as homozygous reference. The quality scores of a variant site are calculated in the same way as when no -alleles file is specified.

    The usual rules for whether to emit a call or not (if no sample is variant at the site, don't emit a call unless -allSites is specified) still apply.

    I hope this helps.
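
As a rough sketch of the two points in this answer, the snippet below shows the extra arguments for genotyping given alleles (GATK 3.x style, assumed) and a toy version of the emission rule. The helper names, the file names, and the genotype encoding (tuples of allele indices, 0 = reference) are hypothetical, and `should_emit` is a simplification rather than GATK's actual logic.

```python
def given_alleles_args(alleles_vcf):
    """Extra arguments for genotyping only the alleles listed in a VCF."""
    return ["--genotyping_mode", "GENOTYPE_GIVEN_ALLELES",
            "--alleles", alleles_vcf]


def should_emit(genotypes, all_sites=False):
    """Toy emission rule: emit if any sample carries a non-reference allele,
    or if emitting all sites was requested.

    genotypes: one tuple of allele indices per sample, 0 = reference.
    """
    any_variant = any(allele != 0 for gt in genotypes for allele in gt)
    return any_variant or all_sites


# Hypothetical file names, for illustration only.
base_cmd = ["java", "-jar", "GenomeAnalysisTK.jar", "-T", "UnifiedGenotyper",
            "-R", "ref.fasta", "-I", "sample.bam", "-o", "calls.vcf"]
cmd = base_cmd + given_alleles_args("alleles.vcf")
print(" ".join(cmd))

print(should_emit([(0, 0), (0, 0)]))                  # every sample hom-ref
print(should_emit([(0, 0), (0, 1)]))                  # one heterozygous sample
print(should_emit([(0, 0), (0, 0)], all_sites=True))  # forced emission
```

The three `should_emit` calls print `False`, `True`, and `True`: a site where every sample is homozygous reference is only written out when all sites are requested.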

