HaplotypeCaller --dbsnp

blueskypy, Member, Posts: 261
edited June 2013 in Ask the GATK team

The doc says "dbSNP is not used in any way for the calculations themselves. --dbsnp binds reference ordered data". Does this mean that whether a locus is called as a variant is not influenced by whether that variant is present in dbSNP? And what does "--dbsnp binds reference ordered data" mean?

Also, why isn't there an --indel option?

Best Answers


  • blueskypy, Member, Posts: 261

    Thanks, Geraldine, for the explanation! But intuitively, wouldn't confirmation of a variant call by dbSNP increase the confidence in that call? If so, why not use dbSNP to help decide on the call?

    Also, do you mean I could add the following to HaplotypeCaller as well?

    --dbsnp Mills_and_1000G_gold_standard.indels.b37.vcf --dbsnp 1000G_phase1.indels.b37.vcf

  • blueskypy, Member, Posts: 261

    Thanks so much, Geraldine! Have a great weekend!

  • everestial007, Greensboro, Member, Posts: 13

    Is it not possible to give the --dbsnp argument twice?
    java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R lyrata_genome.fa -I realigned_readsMA605.bam --dbsnp filtered_indelsMA605.vcf --dbsnp filtered_snpsMA605.vcf --genotyping_mode DISCOVERY -stand_emit_conf 30 -stand_call_conf 30 -o raw02_variantsMA605.vcf

    I get an error; part of the output is:

    ERROR MESSAGE: Argument 'dbsnp' has too many values: [org.broadinstitute.gatk.utils.commandline.ArgumentMatchStringValue@412ff43d, org.broadinstitute.gatk.utils.commandline.ArgumentMatchStringValue@334bf23a].

    But when I provide the --dbsnp argument only once (either --dbsnp filtered_indelsMA605.vcf or --dbsnp filtered_snpsMA605.vcf), it runs.
    It is important to use the -L flag with BaseRecalibrator. But how important is it to provide the -L flag when using HaplotypeCaller (during BQSR bootstrapping)?

    Thanks in advance !

  • Sheila, Broad Institute, Member, GATK Dev, Broadie, Moderator, DSDE Dev admin, Posts: 2,402

    You cannot use the --dbsnp argument more than once, as you have discovered. However, you can combine your two dbSNP files using CombineVariants: https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_variantutils_CombineVariants.php

    Have a look at this article for more information on using -L: http://gatkforums.broadinstitute.org/discussion/4133/when-should-i-use-l-to-pass-in-a-list-of-intervals
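
The combine-then-call approach described in the answer above might look roughly like the sketch below. It is not an official recipe: the input file names and HaplotypeCaller options are copied from the question, while `combined_known_sitesMA605.vcf` is a made-up output name. The script only prints the two commands unless `RUN=1` is set. If both input VCFs carry the same sample name, CombineVariants may also need a `-genotypeMergeOptions` setting; check the CombineVariants documentation linked above before relying on this.

```shell
#!/bin/sh
# Sketch only: merge two call sets, then pass the single merged file to --dbsnp.
# Input names come from the question above; COMBINED is a hypothetical name.
set -eu

GATK_JAR="GenomeAnalysisTK.jar"
REF="lyrata_genome.fa"
COMBINED="combined_known_sitesMA605.vcf"   # hypothetical output name

# Step 1: merge the SNP and indel VCFs into one file.
combine_cmd="java -jar $GATK_JAR -T CombineVariants -R $REF \
--variant filtered_snpsMA605.vcf \
--variant filtered_indelsMA605.vcf \
-o $COMBINED"

# Step 2: run HaplotypeCaller with exactly one --dbsnp argument.
call_cmd="java -jar $GATK_JAR -T HaplotypeCaller -R $REF \
-I realigned_readsMA605.bam \
--dbsnp $COMBINED \
--genotyping_mode DISCOVERY -stand_emit_conf 30 -stand_call_conf 30 \
-o raw02_variantsMA605.vcf"

# Dry run by default: print the commands. Set RUN=1 to execute them.
if [ "${RUN:-0}" = "1" ]; then
  # Unquoted on purpose so the string splits into words (no spaces in file names).
  $combine_cmd
  $call_cmd
else
  echo "$combine_cmd"
  echo "$call_cmd"
fi
```

Keeping the merge as a separate step means HaplotypeCaller still sees a single --dbsnp file, which avoids the "too many values" error quoted earlier in the thread.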


  • everestial007, Greensboro, Member, Posts: 13

    Thank you, Sheila!
