Human b37 Indels/dbsnp annotation versions for IndelRealigner & BaseRecalibrator

tonytony Member Posts: 4
edited July 2012 in Ask the GATK team

Dear GATK Team,

I have recently downloaded the GATK Bundle to get the human reference genome and its associated annotations.

After the mapping step on my lane BAM files, I am planning to use IndelRealigner and BaseRecalibrator as explained in the "Best Practices v4".

I am always confused about which annotation file I should use for my analysis.

For the indel realignment, one has to set the '--known' switch in the command-line arguments of RealignerTargetCreator to indicate known indel sites:

--known:indels,vcf Mills_and_1000G_gold_standard.indels.b37.sites.vcf
--known:dbsnp,vcf dbsnp_135.b37.vcf

But in the annotations folder, there is also 'dbsnp_135.b37.excluding_sites_after_129.vcf' for dbSNP (the version from before the 1000 Genomes data). Depending on which one I use, the target interval files are quite different. So I am wondering which one should be used in my case? Or is there any other factor that should guide the choice?
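To put a number on how different the two target interval files are, a minimal sketch along these lines could be used. It assumes the interval files hold one interval per line in the `contig:start-stop` form (or `contig:pos` for a single position); the function names are hypothetical and not part of GATK, and only exact interval matches are counted, not partial overlaps.

```python
def parse_interval(line):
    """Parse 'contig:start-stop' or 'contig:pos' into (contig, start, stop)."""
    # rpartition keeps any ':' inside the contig name on the contig side
    contig, _, span = line.strip().rpartition(":")
    start, _, stop = span.partition("-")
    # A single position has no '-', so stop falls back to start
    return contig, int(start), int(stop or start)


def load_intervals(lines):
    """Build a set of intervals from an iterable of lines, skipping blanks."""
    return {parse_interval(line) for line in lines if line.strip()}


def compare_intervals(lines_a, lines_b):
    """Count intervals shared by both inputs and those unique to each."""
    a = load_intervals(lines_a)
    b = load_intervals(lines_b)
    return {
        "shared": len(a & b),
        "only_a": len(a - b),
        "only_b": len(b - a),
    }
```

Since open file handles iterate line by line, they can be passed directly, e.g. `compare_intervals(open("full_dbsnp.intervals"), open("pre129_dbsnp.intervals"))`, where both file names are placeholders.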

I have a similar dilemma with base recalibration: should I pass "dbsnp_135.b37.vcf" or "dbsnp_135.b37.excluding_sites_after_129.vcf" to the '-knownSites' switch?

Thanks a lot,
Best,

Anthony
