Known sites for indel realignment and BQSR in hg38 bundle

SvyatoslavSidorov, St. Petersburg, Russia, Member

Dear GATK team,

I'd like to learn which files from the hg38 bundle I should use for indel realignment and BQSR. (I have read the manual on this topic, but I would like to be sure.)

1) Am I right that for indel realignment I should use Mills_and_1000G_gold_standard.indels.hg38.vcf and 1000G_phase1.snps.high_confidence.hg38.vcf.gz?

2) Am I right that for BQSR I should use Mills_and_1000G_gold_standard.indels.hg38.vcf, 1000G_phase1.snps.high_confidence.hg38.vcf.gz, and dbsnp_144.hg38.vcf?

3) Are there any other files with known sites I should use for indel realignment and BQSR?

Best Answer


  • SvyatoslavSidorov, St. Petersburg, Russia, Member

    Dear Sheila, thank you for your quick reply!

  • Nanda, Canada, Member

    Dear Sheila and Svyatoslav,

    I have a question here. Svyatoslav is asking about indel realignment, and in his question he proposes using the following VCF files:
    1) Mills_and_1000G_gold_standard.indels.hg38.vcf
    2) 1000G_phase1.snps.high_confidence.hg38.vcf

    But the second file is not for indels. It appears to contain the 1000G phase 1 high-confidence SNPs. Am I missing something here?

    Thanks in advance.

  • Sheila, Broad Institute, Member, Broadie, Moderator, admin


    I think there is some confusion with the naming, and we also need to update the article to refer to the hg38 resources. The 1000G_phase1.snps.high_confidence.hg38.vcf file also contains some indels that are useful for Indel Realignment.

    Please also note that it is no longer necessary to run the Indel Realignment step. Have a look at this blog post.


  • oss10, Member

    I am trying to perform base recalibration on STAR 2-pass aligned BAM files that I want to use for variant calling. I am using the UCSC hg19 reference. For the hg19 known sites, I took the dbSNP, 1000G, and Mills VCFs from the GATK bundle.

    When I use these VCFs as known sites in the base recalibration step, I get an error about a contig order mismatch with the reference genome. I have tried re-downloading the VCFs and lifting them over with CrossMap and the UCSC liftOver tool. I also tried sorting the VCFs and removing the index after sorting (as recommended in various threads about contig order mismatches), but nothing works.

    If I carry on without the base recalibration step, I still get VCFs from HaplotypeCaller, so everything in the pipeline seems to be fine except the base recalibration step.

    I also tried the GRCh38 reference, but I got the same contig mismatch between the dbSNP VCF and the reference during the base recalibration step.

    Could you please help me? Am I doing something wrong?
  • bhanuGandham, Cambridge MA, Member, Administrator, Broadie, Moderator, admin

    Hi @oss10

    Please post the exact command you are using, the GATK version, and the entire error log.
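
For anyone hitting the same contig order mismatch, a quick way to see where a known-sites VCF and the reference disagree is to compare the contig names in the VCF header with those in the reference's `.fai` index. The sketch below is only an illustration, not GATK's own validation: the function names are hypothetical, and it assumes the VCF header carries `##contig` lines and that the reference was indexed with `samtools faidx`.

```python
import re

# Matches VCF meta-information lines such as ##contig=<ID=chr1,length=249250621>
CONTIG_RE = re.compile(r"^##contig=<.*?\bID=([^,>]+)")


def vcf_contigs(lines):
    """Return contig IDs in the order they appear in the VCF header."""
    names = []
    for line in lines:
        if not line.startswith("##"):
            break  # meta-information lines end at the #CHROM line
        match = CONTIG_RE.match(line)
        if match:
            names.append(match.group(1))
    return names


def fai_contigs(lines):
    """Return contig names from a .fai index (first tab-separated column)."""
    return [line.split("\t")[0].strip() for line in lines if line.strip()]


def compare_order(ref, vcf):
    """Report VCF contigs absent from the reference and whether the
    contigs present in both lists appear in the same relative order."""
    ref_set, vcf_set = set(ref), set(vcf)
    missing = [c for c in vcf if c not in ref_set]
    shared_ref = [c for c in ref if c in vcf_set]
    shared_vcf = [c for c in vcf if c in ref_set]
    return {
        "missing_from_reference": missing,
        "same_order": shared_ref == shared_vcf,
    }


if __name__ == "__main__":
    # Toy inputs: the reference lists chr1 before chrM, the VCF the reverse.
    ref = fai_contigs(["chr1\t249250621\t6\t50\t51\n", "chrM\t16571\t7\t50\t51\n"])
    vcf = vcf_contigs([
        "##fileformat=VCFv4.2\n",
        "##contig=<ID=chrM,length=16571>\n",
        "##contig=<ID=chr1,length=249250621>\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    ])
    print(compare_order(ref, vcf))
```

With real files, the same functions can be fed `open("reference.fa.fai")` and, for a compressed VCF, `gzip.open("known_sites.vcf.gz", "rt")` (both file names are placeholders). Contigs listed under `missing_from_reference` usually point to a naming difference such as `1` versus `chr1`, while `same_order` being false points to a sorting difference.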
