Human b37 Indels/dbsnp annotation versions for IndelRealigner & BaseRecalibrator

tonytony Posts: 3 Member
edited July 2012 in Ask the GATK team

Dear GATK Team,

I have recently downloaded the GATK Bundle to get the human reference genome and its associated annotations.

After the mapping step on my lane BAM files, I am planning to use IndelRealigner and BaseRecalibrator as explained in the "Best Practices v4".

I am always confused about which annotation file I should use for my analysis.

For the indel realignment, in the command-line arguments of RealignerTargetCreator, one has to set the '--known' switch to indicate known indel sites:

--known:indels,vcf Mills_and_1000G_gold_standard.indels.b37.sites.vcf --known:dbsnp,vcf dbsnp_135.b37.vcf
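
To make the setup concrete, here is a minimal sketch of how those switches would sit in a full RealignerTargetCreator command line. It assumes the GATK 2.x-style invocation (`java -jar GenomeAnalysisTK.jar -T ...`); the helper `build_target_creator_cmd` is hypothetical, and the jar, reference, BAM and output file names are placeholders rather than values from my actual run. The command is only assembled and printed, not executed.

```python
# Hypothetical helper: assembles (but does not run) a RealignerTargetCreator
# command line. Jar, reference, BAM and output names are placeholders.
def build_target_creator_cmd(reference, bam, known_vcfs, out_intervals,
                             jar="GenomeAnalysisTK.jar"):
    cmd = ["java", "-jar", jar,
           "-T", "RealignerTargetCreator",
           "-R", reference,
           "-I", bam,
           "-o", out_intervals]
    for tag, path in known_vcfs:
        # Tagged form used in the post: --known:<name>,vcf <file>
        cmd += ["--known:%s,vcf" % tag, path]
    return cmd


cmd = build_target_creator_cmd(
    "human_g1k_v37.fasta",            # placeholder reference name
    "lane1.bam",                      # placeholder lane-level BAM
    [("indels", "Mills_and_1000G_gold_standard.indels.b37.sites.vcf"),
     ("dbsnp", "dbsnp_135.b37.vcf")],
    "lane1.realigner.intervals")      # placeholder output name
print(" ".join(cmd))
```

Swapping the second entry for 'dbsnp_135.b37.excluding_sites_after_129.vcf' gives the alternative run I am asking about.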

But in the annotations folder, you can also find 'dbsnp_135.b37.excluding_sites_after_129.vcf' for dbSNP (the version from before 1000 Genomes). Depending on which one I use, the target interval files are quite different. So I am really wondering which one should be used in my case? Or is there any other factor that could guide me to the better choice?
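
One rough way to quantify how different two target interval files are is to compare them as sets. The sketch below assumes each file holds one interval per line in the form `contig:start-stop` (or `contig:pos` for a single base); the function name and the toy intervals are made up for illustration and are not taken from my real output. It only counts exact matches, so intervals that merely overlap are reported as different.

```python
def parse_intervals(lines):
    """Parse 'contig:start-stop' or 'contig:pos' lines into (contig, start, stop) tuples."""
    intervals = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the last ':' so contig names containing ':' still work.
        contig, _, span = line.rpartition(":")
        start, _, stop = span.partition("-")
        intervals.add((contig, int(start), int(stop or start)))
    return intervals


# Toy stand-ins for the two interval files (hypothetical content).
with_full_dbsnp = parse_intervals(["1:100-120", "1:500", "2:40-60"])
with_pre129_dbsnp = parse_intervals(["1:100-120", "2:45-60"])

shared = with_full_dbsnp & with_pre129_dbsnp
only_full = with_full_dbsnp - with_pre129_dbsnp
only_pre129 = with_pre129_dbsnp - with_full_dbsnp
print(len(shared), len(only_full), len(only_pre129))
```

On real files, the lists passed to `parse_intervals` would be replaced by the lines read from each interval file.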

I have a similar dilemma with base recalibration: should "dbsnp_135.b37.vcf" or "dbsnp_135.b37.excluding_sites_after_129.vcf" go in the '-knownSites' switch?

Thanks a lot, Best,

Anthony
