Training data for VQSR step


I want to run VQSR on my exome data. I have finished data pre-processing and joint genotyping, and now I want to move on to VQSR, starting with its first step, VariantRecalibrator. I noticed that I don't have the training sets that the tool takes as input. Where can I download the VCF files I need to run this command? This is what I've tried so far:

HapMap (Link)
I opened the HapMap website and found a download link for allocated SNPs. Is this the file I need? The files are in XML format, split per chromosome, and based on hg35. Do I need to join these XML files and then convert them to VCF using GATK? Will the different genome build cause problems (I use hg38)?

1000 Genomes (Link)
Here I found VCF files, also split per chromosome. I think I just need to join them, don't I? Could you suggest how to join them properly?
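To make clear what I mean by "joining", here is a minimal Python sketch. It assumes the per-chromosome VCFs are plain text or gzip-compressed, share the same sample columns, and are passed in the desired chromosome order. The function names `open_vcf` and `concat_vcfs` are hypothetical, and a dedicated tool such as `bcftools concat` is probably the more robust choice for real data.

```python
import gzip
from pathlib import Path


def open_vcf(path):
    """Open a plain-text or gzip-compressed VCF in text mode."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


def concat_vcfs(inputs, output):
    """Concatenate per-chromosome VCFs into one uncompressed VCF.

    Assumptions (not checked here):
      * every input has identical sample columns;
      * `inputs` is already in the desired chromosome order.

    Only the header of the first file is kept, so header lines that
    appear only in later files (for example their ##contig lines)
    are NOT carried over. Returns the number of records written.
    """
    n_records = 0
    with open(output, "wt") as out:
        for index, path in enumerate(inputs):
            with open_vcf(path) as handle:
                for line in handle:
                    if line.startswith("#"):
                        # Header line: keep it only from the first file.
                        if index == 0:
                            out.write(line)
                        continue
                    out.write(line)
                    n_records += 1
    return n_records
```

For example, `concat_vcfs(["chr1.vcf.gz", "chr2.vcf.gz"], "all.vcf")` (file names are placeholders) would write the header of the first file followed by the records of both files. The result is uncompressed and unindexed, so it would still need to be compressed and indexed before GATK can use it as a resource.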

I don't know where I can get this file.

I think I already have this file. I used it during the GATK pre-processing step. It is the same file, right?

Thank you for your help.



