The current GATK version is 3.7-0
Examples: Monday, today, last week, Mar 26, 3/26/04

#### Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

You can opt in to receive email notifications, for example when your questions get answered or when there are new announcements, by following the instructions given here.

#### ☞ Did you remember to?

1. Search using the upper-right search box, e.g. using the error message.
3. Include tool and Java versions.
4. Tell us whether you are following GATK Best Practices.
5. Include relevant details, e.g. platform, DNA- or RNA-Seq, WES (+capture kit) or WGS (PCR-free or PCR+), paired- or single-end, read length, expected average coverage, somatic data, etc.
6. For tool errors, include the error stacktrace as well as the exact command.
7. For format issues, include the result of running ValidateSamFile for BAMs or ValidateVariants for VCFs.
8. For weird results, include an illustrative example, e.g. attach IGV screenshots according to Article#5484.
9. For a seeming variant that is uncalled, include results of following Article#1235.

#### ☞ Formatting tip!

Wrap blocks of code, error messages and BAM/VCF snippets--especially content with hashes (#)--with lines with three backticks ( ` ) each to make a code block as demonstrated here.

GATK 3.7 is here! Be sure to read the Version Highlights and optionally the full Release Notes.

# Known indel/SNP databases for Indel-based realignment

aminziaMember

Dear GATK team,

Would you please clarify that, based on your experience or the logic used in the realignment algorithm, which option between using dbSNP, 1K gold standard (mills...), or "no known dbase" might result in a more accurate set of indels in the Indel-based realignment stage (speed and efficiency is not my concern).

Based on the documentation I found on your site, the "known" variants are used to identify "intervals" of interest to then perform re-alignment around indels. So, it makes sense to me to use as many number of indels as possible (even if they are unreliable and garbage such as many of those found in dbSNP) in addition to those more accurate calls found in 1K gold-standard datasets for choosing the intervals. After all, that increases he number of indel regions to be investigated and therefore potentially increase the accuracy. Depending on your algorithm logic, also, it seems that providing no known dbase would increase the chance of investigating more candidates of mis-alignment and therefore improving the accuracy.

But if your logic uses the "known" indel sets to just "not" perform the realignment and ignore those candidates around known sites, it makes sense to use the more accurate set such as 1K gold standard.

Please let me know what you suggest.

Thank you
Regards
Amin Zia

Tagged: