Dear GATK Team,
I have recently downloaded the GATK Bundle to get the human reference genome and its associated annotations.
After the mapping step on my per-lane BAM files, I plan to run IndelRealigner and BaseRecalibrator as described in the "Best Practices v4".
I am always confused about which annotation file I should use for my analysis.
For the indel realignment, one has to set the '--known' switch in the command-line arguments of RealignerTargetCreator to indicate known indel sites:
--known:indels,vcf Mills_and_1000G_gold_standard.indels.b37.sites.vcf --known:dbsnp,vcf dbsnp_135.b37.vcf
But in the annotations folder, you can also find 'dbsnp_135.b37.excluding_sites_after_129.vcf' for dbSNP (the version from before the 1000 Genomes data). Depending on which one I use, the target interval files are quite different. So I am wondering which one should be used in my case. Or is there any other factor that could guide me to the better choice?
I have a similar dilemma with base recalibration: should I pass "dbsnp_135.b37.vcf" or "dbsnp_135.b37.excluding_sites_after_129.vcf" to the '-knownSites' switch?
Thanks a lot, Best,
Anthony
Answers
Actually, I found my answers in the FAQ, here:
http://gatk.vanillaforums.com/discussion/1247/what-should-i-use-as-known-variantssites-for-running-tool-x
Is it possible to simply delete this thread?