
Known sites for indel realignment and BQSR in hg38 bundle

SvyatoslavSidorov (St. Petersburg, Russia; Member)

Dear GATK team,

I'd like to know which files from the hg38 bundle I should use for indel realignment and BQSR. (I have read the manual on this topic, https://broadinstitute.org/gatk/guide/article?id=1247, but I would like to be sure.)

1) Am I right that for indel realignment I should use Mills_and_1000G_gold_standard.indels.hg38.vcf and 1000G_phase1.snps.high_confidence.hg38.vcf.gz?

2) Am I right that for BQSR I should use Mills_and_1000G_gold_standard.indels.hg38.vcf, 1000G_phase1.snps.high_confidence.hg38.vcf.gz, and dbsnp_144.hg38.vcf?

3) Are there any other files with known sites I should use for indel realignment and BQSR?

Best Answer


  • SvyatoslavSidorov (St. Petersburg, Russia; Member)

    Dear Sheila, thank you for your quick reply!

  • Nanda (Canada; Member)

    Dear Sheila and Svyatoslav,

    I have a question here. Svyatoslav is asking about indel realignment, and in his question he proposes using the following VCF files:
    1) Mills_and_1000G_gold_standard.indels.hg38.vcf
    2) 1000G_phase1.snps.high_confidence.hg38.vcf

    But the second file is not for indels. It appears to contain 1000G phase 1 high-confidence SNPs. Am I missing something here?

    Thanks in advance.

  • Sheila (Broad Institute; Member, Broadie, admin)


    I think there is some confusion with the naming, and we also need to update the article to refer to hg38 resources. The 1000G_phase1.snps.high_confidence.hg38.vcf will also contain some indels that are useful for Indel Realignment.

    Please also note that it is no longer necessary to run the Indel Realignment step. Have a look at this blog post.


  • oss10 (Member)

    I am trying to perform base recalibration on STAR 2-pass aligned BAM files that I want to use for variant calling. I am using the UCSC hg19 reference. For the hg19 known sites, I got the dbSNP, 1000G and Mills VCFs from the GATK bundle.

    When I use these VCFs as known-sites input in the base recalibration step, I get a contig order mismatch error against the reference genome. I have tried re-downloading the VCFs and performing liftover with CrossMap and the UCSC liftOver tool. I also tried sorting the VCFs and removing the index after sorting (as recommended in various threads about contig order mismatches), but nothing works.

    If I skip the base recalibration step, I still get VCFs from HaplotypeCaller, so everything in the pipeline seems to be fine except the base recalibration step.

    I also tried the GRCh38 reference, but I got the same contig mismatch between the dbSNP VCF and the reference during base recalibration.

    Could you please help me? Am I doing something wrong?
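
For readers who hit the same error, a quick way to narrow it down is to compare the contig names declared in the known-sites VCF with those in the reference index. The sketch below is only an illustration and not a GATK tool: it assumes the reference has a samtools-style `.fai` index and that the VCF header carries `##contig` lines, and the function names are hypothetical. A naming difference (for example `chr1` versus `1`) or a different sort order would show up in its output, although this thread does not establish which of these, if either, was the cause here.

```python
import gzip
import re

CONTIG_LINE = re.compile(r"##contig=<.*?ID=([^,>]+)")


def open_text(path):
    """Open a plain or gzip/bgzip-compressed text file for reading."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path, "rt")


def vcf_header_contigs(vcf_path):
    """Return contig IDs in the order they are declared in the VCF header."""
    contigs = []
    with open_text(vcf_path) as handle:
        for line in handle:
            if not line.startswith("##"):
                break  # header meta lines are over
            match = CONTIG_LINE.match(line)
            if match:
                contigs.append(match.group(1))
    return contigs


def fai_contigs(fai_path):
    """Return contig names in the order listed in a .fai index (first column)."""
    with open(fai_path) as handle:
        return [line.split("\t", 1)[0] for line in handle if line.strip()]


def compare_contigs(vcf_path, fai_path):
    """Report VCF contigs absent from the reference and whether shared ones keep its order."""
    rank = {name: i for i, name in enumerate(fai_contigs(fai_path))}
    vcf = vcf_header_contigs(vcf_path)
    missing = [c for c in vcf if c not in rank]
    shared = [c for c in vcf if c in rank]
    same_order = shared == sorted(shared, key=rank.__getitem__)
    return {"missing_from_reference": missing, "same_order": same_order}
```

Calling `compare_contigs("known_sites.vcf", "reference.fa.fai")` (both file names are placeholders) returns a small dictionary. A non-empty `missing_from_reference` list suggests a naming mismatch, while `same_order: False` suggests the VCF needs to be re-sorted to the reference order.
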
  • bhanuGandham (Cambridge MA; Member, Administrator, Broadie, Moderator, admin)

    Hi @oss10

    Please post the exact command you are using, the GATK version, and the entire error log.

  • @Sheila said:
    The 1000G_phase1.snps.high_confidence.hg38.vcf will also contain some indels that are useful for Indel Realignment.

    Hi @Sheila, maybe something has changed in the documentation in the meantime, but at the moment the 1000G_phase1.snps.high_confidence.hg38.vcf file contains no indels (I checked with vcftools, keeping only indels).
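
The check described above can also be approximated without vcftools. The following is a minimal sketch, assuming a standard tab-delimited VCF and treating a record as an indel when any plain ALT allele differs in length from REF. The function names are hypothetical, and the classification is a simplification that may not match vcftools exactly in edge cases such as symbolic alleles.

```python
import gzip


def record_type(ref, alts):
    """Classify one record as 'snp', 'indel' or 'other' from its REF and ALT alleles."""
    # Ignore missing, spanning-deletion, symbolic and breakend alleles.
    plain = [
        a for a in alts
        if a not in (".", "*") and not a.startswith("<") and "[" not in a and "]" not in a
    ]
    if not plain:
        return "other"
    if any(len(a) != len(ref) for a in plain):
        return "indel"
    if len(ref) == 1:
        return "snp"
    return "other"  # e.g. multi-nucleotide substitutions


def count_record_types(vcf_path):
    """Count SNP, indel and other records in a plain or gzip/bgzip-compressed VCF."""
    counts = {"snp": 0, "indel": 0, "other": 0}
    opener = gzip.open if str(vcf_path).endswith(".gz") else open
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip header and blank lines
            fields = line.rstrip("\n").split("\t")
            counts[record_type(fields[3], fields[4].split(","))] += 1
    return counts
```

Running `count_record_types` on the file in question and finding an indel count of zero would be consistent with the observation above.
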

  • Tiffany_at_Broad (Cambridge, MA; Member, Administrator, Broadie, Moderator, admin)

    Hi @andresguarahino, Sheila's post is a bit old.
    Try looking for what you need in here. Can you clarify what you are trying to do so that I can be more helpful?

  • @Tiffany_at_Broad, I wrote the post for the benefit of other users. I don't have any questions about it, thank you!
