Hi, for both IndelRealigner and RealignerTargetCreator, there is an option for known indel sites, as below:
-known /path/to/indels.vcf
However, the bundle file collection, such as the one for hg19, contains several VCF files:
1000G_indels_for_realignment.hg19.vcf
1000G_omni2.5.hg19.sites.vcf
1000G_omni2.5.hg19.vcf
dbsnp_132.hg19.excluding_sites_after_129.vcf
dbsnp_132.hg19.vcf
hapmap_3.3.hg19.sites.vcf
hapmap_3.3.hg19.vcf
indels_mills_devine.hg19.sites.vcf
indels_mills_devine.hg19.vcf
NA12878.HiSeq.WGS.bwa.cleaned.raw.subset.hg19.sites.vcf
NA12878.HiSeq.WGS.bwa.cleaned.raw.subset.hg19.vcf
Among them, just based on the names, 1000G_indels_for_realignment.hg19.vcf and indels_mills_devine.hg19.sites.vcf look like the files to use for IndelRealigner/RealignerTargetCreator. Could you clarify the exact files for this purpose?
With the old version, I used 1000G_phase1.indels.hg19.vcf and Mills_and_1000G_gold_standard.indels.hg19.sites.vcf. I compared the new and old files, and they are quite different now.
Thanks
Mike
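For readers following along, here is a minimal sketch of how the `-known` option from the question might be passed to both tools, assuming a GATK 2.x-style command line. The jar name `GenomeAnalysisTK.jar`, the helper name, and all paths are placeholders, not anything confirmed in this thread, and the two VCF names in the usage example are only the candidates from the question, not a recommendation.

```python
def build_realigner_commands(reference, bam, known_vcfs, intervals, realigned_bam,
                             jar="GenomeAnalysisTK.jar"):
    """Build argument lists for the two realignment steps (hypothetical helper).

    Each known-indel file gets its own -known flag. The jar name is an
    assumption; adjust it to match the local installation.
    """
    known_args = []
    for vcf in known_vcfs:
        known_args += ["-known", vcf]

    base = ["java", "-jar", jar]
    # Step 1: find intervals that may need realignment.
    target_creator = (base + ["-T", "RealignerTargetCreator",
                              "-R", reference, "-I", bam]
                      + known_args + ["-o", intervals])
    # Step 2: realign reads over those intervals, using the same known sites.
    indel_realigner = (base + ["-T", "IndelRealigner",
                               "-R", reference, "-I", bam,
                               "-targetIntervals", intervals]
                       + known_args + ["-o", realigned_bam])
    return target_creator, indel_realigner


if __name__ == "__main__":
    # Placeholder paths; the VCF names are the two candidates from the question.
    creator, realigner = build_realigner_commands(
        "ucsc.hg19.fasta", "sample.bam",
        ["1000G_indels_for_realignment.hg19.vcf",
         "indels_mills_devine.hg19.sites.vcf"],
        "sample.intervals", "sample.realigned.bam")
    print(" ".join(creator))
    print(" ".join(realigner))
```

The argument lists can be handed to `subprocess.run` as-is; building them as lists avoids shell-quoting problems with paths.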
Geraldine_VdAuwera
Hi Mike,
You're correct that we have not yet updated the file names, but the differences are fairly minor. All you really need to know is the difference between the .vcf and the .sites.vcf files: the .vcf files contain the full callset info including genotypes, while the .sites.vcf files don't contain the genotypes, only the variant sites info. The point of having sites-only files is that they're smaller. For most purposes, such as realignment, you only need the sites information, but you can choose to use either file.
Geraldine Van der Auwera, PhD
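As an illustration of the .vcf versus .sites.vcf distinction described above, here is a minimal sketch that checks whether a VCF file is sites-only by inspecting its `#CHROM` header line. It relies on the standard VCF layout (eight fixed columns ending at INFO, with FORMAT and sample columns present only when genotypes are included). The function name is made up for this example and is not part of GATK.

```python
import gzip


def is_sites_only(vcf_path):
    """Return True if the VCF header declares no FORMAT/sample columns."""
    opener = gzip.open if vcf_path.endswith(".gz") else open
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                columns = line.rstrip("\n").split("\t")
                # CHROM..INFO are the 8 fixed columns; files with genotypes
                # add a FORMAT column followed by one column per sample.
                return len(columns) <= 8
            if not line.startswith("#"):
                break  # reached data records without seeing a column header
    raise ValueError("no #CHROM header line found in " + vcf_path)
```

Calling `is_sites_only("indels_mills_devine.hg19.sites.vcf")` on a local copy of a bundle file would be one way to confirm which variant of a file is on hand; the path here is only an example.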
Answers
See this FAQ article:
http://www.broadinstitute.org/gatk/guide/article?id=1247
Geraldine Van der Auwera, PhD
Hi, Geraldine:
Thanks for the input! However, the article does not seem to have been updated for GATK v2.0 or newer. For example, the article says that for realignment we should use:
which is exactly what I used with the old version, as described in my original post above. But in the bundle for the new version, those files are gone, or at least their names have changed. I copied and pasted the files in the new bundle again, as below:
I think for realignment I should use 1000G_indels_for_realignment.hg19.vcf, but what about indels_mills_devine.hg19.sites.vcf and indels_mills_devine.hg19.vcf? Which one should I use for realignment?
Thanks again
Mike
Thanks a lot for the great detailed info, Geraldine! Appreciated! Mike
Hi, Geraldine:
Sorry, I just realized that your web page actually does cover the new version. Our own installation mixed up the new and old versions, which was caused by our installation staff. Sorry about the confusion; your web page is fine.
Thanks anyway for the info!
Best
Mike