The current GATK version is 3.6-0

# HaplotypeCaller --dbsnp

Member (261 posts), edited June 2013

The doc says "dbSNP is not used in any way for the calculations themselves. --dbsnp binds reference ordered data". Does this mean that the decision about whether a locus is a variant is not influenced by whether that variant is present in dbSNP? And what does "--dbsnp binds reference ordered data" mean?

Also why isn't there a --indel option?
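The GATK documentation for this argument also states that rsIDs from the --dbsnp file are used to populate the ID column of the output, and that the DB INFO flag is set when appropriate. A minimal Python sketch of that kind of after-the-fact annotation, assuming the calls and known sites are already in memory, shows why it cannot change which sites are called: the lookup runs on finished calls and only fills in labels. The `Call` class, the `annotate_ids` helper and the exact (chrom, pos, ref, alt) matching are hypothetical simplifications, not GATK's implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical key for a known variant: (chrom, pos, ref, alt).
SiteKey = Tuple[str, int, str, str]


@dataclass
class Call:
    chrom: str
    pos: int
    ref: str
    alt: str
    qual: float
    id: str = "."    # VCF ID column; "." means no identifier
    db: bool = False  # stands in for the DB INFO flag


def annotate_ids(calls: List[Call], known: Dict[SiteKey, str]) -> List[Call]:
    """Fill in ID and set the DB flag for calls that match a known site.

    The calls themselves (which sites, which alleles, what QUAL) are taken
    as given: the known-sites lookup never adds, removes or rescores a call.
    """
    for call in calls:
        rsid = known.get((call.chrom, call.pos, call.ref, call.alt))
        if rsid is not None:
            call.id = rsid
            call.db = True
    return calls


if __name__ == "__main__":
    # Hypothetical known-sites table standing in for a dbSNP VCF.
    known_sites = {("20", 14370, "G", "A"): "rs6054257"}
    # Hypothetical calls; the second one is not in the known-sites table.
    calls = [Call("20", 14370, "G", "A", 29.0), Call("20", 15000, "C", "T", 45.0)]
    for c in annotate_ids(calls, known_sites):
        print(c.chrom, c.pos, c.id, c.qual, "DB" if c.db else "")
```

Both calls come out with the same QUAL they went in with; only the one that matches a known site gains an ID and the DB flag.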


• Member (261 posts)

Thanks, Geraldine, for the explanation! But intuitively, wouldn't confirming a variant call against dbSNP increase our confidence in that call? If so, why not use dbSNP to help make the decision on that call?

Also, do you mean I could add the following to HaplotypeCaller?

    --dbsnp Mills_and_1000G_gold_standard.indels.b37.vcf --dbsnp 1000G_phase1.indels.b37.vcf

• Member (261 posts)

Thanks so much, Geraldine! Have a great weekend!

• Member from Greensboro (62 posts)

Is it not possible to give the --dbsnp argument twice, as in this command?

    java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R lyrata_genome.fa -I realigned_readsMA605.bam --dbsnp filtered_indelsMA605.vcf --dbsnp filtered_snpsMA605.vcf --genotyping_mode DISCOVERY -stand_emit_conf 30 -stand_call_conf 30 -o raw02_variantsMA605.vcf

I am receiving an error; part of the output is:

But when I provide the --dbsnp argument only once (either --dbsnp filtered_indelsMA605.vcf or --dbsnp filtered_snpsMA605.vcf), it runs.

It is important to use the -L flag with BaseRecalibrator. But how important is it to provide the -L flag when running HaplotypeCaller (while doing -BQSR bootstrapping)?

You cannot use the --dbsnp argument more than once, as you have discovered. However, you can combine your two dbSNP files into one using CombineVariants: https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_variantutils_CombineVariants.php

-Sheila
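The two-step workaround described above can be sketched in Python as building two GATK 3 command lines: CombineVariants merges the two files, and HaplotypeCaller then receives the merged file through a single --dbsnp. This sketch only assembles argument lists. The helper functions and the merged file name `combined_knownMA605.vcf` are hypothetical, the remaining HaplotypeCaller arguments are copied from the command in the question, and the CombineVariants documentation linked above describes further merge options (for example, how genotypes for the same sample in several inputs are handled) that you may need.

```python
from typing import List

# Path to the GATK 3.x jar; assumed to be in the working directory.
GATK_JAR = "GenomeAnalysisTK.jar"


def combine_variants_cmd(reference: str, inputs: List[str], output: str) -> List[str]:
    """Build a CombineVariants command that merges several VCFs into one file."""
    cmd = ["java", "-jar", GATK_JAR, "-T", "CombineVariants", "-R", reference]
    for vcf in inputs:
        cmd += ["--variant", vcf]  # one --variant per input file
    cmd += ["-o", output]
    return cmd


def haplotype_caller_cmd(reference: str, bam: str, dbsnp: str, output: str) -> List[str]:
    """Build a HaplotypeCaller command that passes exactly one --dbsnp file."""
    return [
        "java", "-jar", GATK_JAR, "-T", "HaplotypeCaller",
        "-R", reference, "-I", bam,
        "--dbsnp", dbsnp,
        "--genotyping_mode", "DISCOVERY",
        "-stand_emit_conf", "30", "-stand_call_conf", "30",
        "-o", output,
    ]


if __name__ == "__main__":
    ref = "lyrata_genome.fa"
    combined = "combined_knownMA605.vcf"  # hypothetical name for the merged file
    step1 = combine_variants_cmd(
        ref, ["filtered_indelsMA605.vcf", "filtered_snpsMA605.vcf"], combined)
    step2 = haplotype_caller_cmd(
        ref, "realigned_readsMA605.bam", combined, "raw02_variantsMA605.vcf")
    for cmd in (step1, step2):
        print(" ".join(cmd))
```

To actually run the two steps in order, each list can be passed to `subprocess.run(cmd, check=True)`, which stops with an exception if the first command fails.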

• Member from Greensboro (62 posts)

Thank you, Sheila!

• Member from University of Memphis (5 posts)

Hi, I have a question about how to interpret information in the dbSNP file that is passed to HaplotypeCaller. Let's say I have this line in a VCF file:

    #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA00002 NA00003
    20 14370 rs6054257 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,51 1/1:43:5:.,.

So how is the value 0.5 in AF=0.5 (allele frequency) calculated and what does it mean? Can you give me a specific example so that I can understand it fully and clearly? Thank you.

@namsyvo
Hi,

The AF field gives the allele frequency of each alternate allele. In your example above, there is one alternate allele (A). AF = 0.5 means that the A allele makes up 50% of the alleles in the called genotypes. Your 3 samples have the genotypes G/G, G/A and A/A: that is 6 called alleles, 3 of which are A, so AF = 3/6 = 0.5.

I hope this helps!

-Sheila
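The calculation described above (count the ALT alleles among all called alleles, i.e. AC/AN) can be reproduced with a short Python sketch that parses the example record from the question. It assumes a single ALT allele and whitespace-separated columns (real VCF files are tab-separated, which `str.split()` also handles); the helper names are hypothetical.

```python
import re
from typing import List, Tuple


def allele_counts(genotypes: List[str]) -> Tuple[int, int]:
    """Return (AC, AN) for the first ALT allele from a list of GT strings.

    AC = number of called alleles equal to allele index 1 (the first ALT);
    AN = total number of called alleles ('.' no-calls are skipped).
    """
    ac = an = 0
    for gt in genotypes:
        for allele in re.split(r"[|/]", gt):
            if allele == ".":
                continue
            an += 1
            if allele == "1":
                ac += 1
    return ac, an


def genotype_af(genotypes: List[str]) -> float:
    """Allele frequency of the first ALT allele in the called genotypes."""
    ac, an = allele_counts(genotypes)
    return ac / an if an else float("nan")


if __name__ == "__main__":
    record = ("20 14370 rs6054257 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 "
              "GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,51 1/1:43:5:.,.")
    fields = record.split()
    fmt_keys = fields[8].split(":")      # FORMAT column: GT:GQ:DP:HQ
    gt_index = fmt_keys.index("GT")
    gts = [sample.split(":")[gt_index] for sample in fields[9:]]
    print(gts)                 # ['0|0', '1|0', '1/1']
    print(allele_counts(gts))  # (3, 6)
    print(genotype_af(gts))    # 0.5
```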

• Member from University of Memphis (5 posts)

Thank you, @Sheila, for your quick answer. I saw this statement in the VCF v4.2 documentation:
"AF : allele frequency for each ALT allele in the same order as listed: use this when estimated from primary data, not called genotypes".

One more question: is there any difference between the genotypes 1|0 and 0|1? Sometimes I see 1|0 and sometimes 0|1. For example, for the second sample in my previous example, could I write the genotype as 0|1 instead of 1|0?

Thank you.

#### GitHub issue 443 · December 2015 by Sheila

State: closed (closed by vdauwera)

Regarding the definition of AF, ours may not entirely match the definition provided by the VCF spec. I think the spec recommends using AF to express allele fraction in the read data, whereas we use it to express the frequency in called genotypes. This may be a violation of the intent of the spec, if you take a strict reading of it.
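To illustrate the distinction, the Python sketch below computes two quantities for one site: a frequency from called genotypes (AC/AN, the approach described here for GATK) and an allele fraction pooled from read counts, closer to the "primary data" wording in the spec. The AD (allelic depth, ref and alt reads per sample) values are hypothetical, chosen only so each sample's total matches its FORMAT DP in the example record, which itself has no AD field. Pooling reads across samples is just one possible read-based estimate, not how any particular tool defines it.

```python
from typing import Sequence


def af_from_genotypes(genotypes: Sequence[str]) -> float:
    """Fraction of called alleles that are the first ALT (index 1), i.e. AC/AN."""
    alleles = [a for gt in genotypes
               for a in gt.replace("|", "/").split("/") if a != "."]
    return alleles.count("1") / len(alleles)


def fraction_from_reads(allelic_depths: Sequence[Sequence[int]]) -> float:
    """Pooled fraction of reads supporting ALT, from per-sample [ref, alt] depths."""
    alt_reads = sum(ad[1] for ad in allelic_depths)
    total_reads = sum(sum(ad) for ad in allelic_depths)
    return alt_reads / total_reads


if __name__ == "__main__":
    gts = ["0|0", "1|0", "1/1"]
    # Hypothetical [ref, alt] read counts per sample; totals match DP = 1, 8, 5.
    ads = [[1, 0], [6, 2], [1, 4]]
    print(af_from_genotypes(gts))              # 0.5
    print(round(fraction_from_reads(ads), 3))  # 0.429
```

The two numbers differ because the genotype-based value counts each called allele once, while the read-based value weights each sample by how many reads it contributed.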

Your second question comes down to the different ways of writing heterozygous genotypes that have been phased. The order of the 0 and the 1 (which represent specific alleles) signifies how those alleles are phased, either within a pedigree or relative to co-location on physical haplotypes. You can't swap them without changing the meaning the notation carries.

Geraldine Van der Auwera, PhD
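A minimal Python sketch of the phasing point: in the VCF spec, "|" marks a phased genotype and "/" an unphased one, so allele order carries meaning only with "|". The `parse_gt` and `same_genotype` helpers are hypothetical and handle only simple GT strings; treating a phased and an unphased genotype as different statements is a design choice made here, not a rule from the spec.

```python
import re
from typing import Tuple


def parse_gt(gt: str) -> Tuple[Tuple[str, ...], bool]:
    """Split a VCF GT string into allele indices and report whether it is phased.

    Simplification: a genotype counts as phased only if it uses '|' and never '/'.
    """
    alleles = tuple(re.split(r"[|/]", gt))
    phased = "|" in gt and "/" not in gt
    return alleles, phased


def same_genotype(a: str, b: str) -> bool:
    """Compare two GT strings.

    Phased genotypes are compared in order, because swapping the alleles
    changes which haplotype carries the ALT allele. Unphased genotypes carry
    no order, so they are compared as unordered collections of alleles.
    """
    alleles_a, phased_a = parse_gt(a)
    alleles_b, phased_b = parse_gt(b)
    if phased_a and phased_b:
        return alleles_a == alleles_b
    if not phased_a and not phased_b:
        return sorted(alleles_a) == sorted(alleles_b)
    return False  # one phased, one unphased: treated as different statements


if __name__ == "__main__":
    print(same_genotype("1|0", "0|1"))  # False: ALT sits on a different haplotype
    print(same_genotype("1/0", "0/1"))  # True: unphased, order carries no meaning
```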