What's the difference between b37 and hg19 resources?

Hi, all. I have questions on resource bundles.
Are the 'hg19' bundle files just liftovers of the 'b37' bundle files with UCSC-style names? If so, why are some variants present in only one version and not the other? For example, the variant rs34872315 (on chr1) is in the b37 version of dbsnp137.excluding_sites_after_129.vcf, but not in the hg19 version. At first, I thought this was due to differences in the reference genome (the VCF files in each bundle are matched to the accompanying reference sequence). But reference chromosome 1 is the same in both bundles. Can you help me understand the difference between the b37 and hg19 resource bundles?
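One way to run this comparison across a whole file, rather than one rsID at a time, is to key each record on its position and ID after mapping contig names to a common style. The sketch below is a minimal illustration, not a GATK tool: it assumes plain or gzip-compressed VCF text and that the only naming difference is the UCSC `chr` prefix (with `chrM` mapped to `MT`); the function names are hypothetical. Positions on the mitochondrial contig are not comparable between the two builds, because the mitochondrial sequences differ.

```python
import gzip
from typing import Iterable, Set, Tuple

# (contig, 1-based position, ID) for one VCF record
VariantKey = Tuple[str, int, str]


def normalize_contig(name: str) -> str:
    """Map UCSC-style names (chr1, chrM) to b37-style names (1, MT)."""
    if name == "chrM":
        return "MT"
    return name[3:] if name.startswith("chr") else name


def variant_keys(lines: Iterable[str]) -> Set[VariantKey]:
    """Collect normalized keys for every data line, skipping header lines."""
    keys: Set[VariantKey] = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, vid = line.rstrip("\n").split("\t")[:3]
        keys.add((normalize_contig(chrom), int(pos), vid))
    return keys


def open_vcf(path: str):
    """Open a plain or gzip-compressed VCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)
```

Usage, with hypothetical file names: `with open_vcf("b37.vcf.gz") as a, open_vcf("hg19.vcf.gz") as b: only_in_b37 = variant_keys(a) - variant_keys(b)`. Set differences in both directions list the records present in only one version.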


Answers

  • Geraldine_VdAuwera (Cambridge, MA) Member, Administrator, Broadie, admin

    Hi there,

    The two versions of the reference genome are not exactly the same. There are a few differences, for example some bases that are flipped between strands. That is why we have liftover chain files to convert between the two versions. As a result, a few variants may be filtered out in one version but not in the other, but this should affect only a tiny proportion of variants.

  • ihlee Member

    Yes, I expected a few differences in the variant sets between one version and the other. But then, why are the accompanying reference sequence files the same? I diffed the two FASTA files chromosome by chromosome, and the only differences were in chromosomes M and Y.
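A chromosome-by-chromosome diff like this one can be done with a per-contig checksum instead of holding whole chromosomes in memory. This is a minimal sketch, assuming plain-text FASTA input and that the two files differ in naming only by the UCSC `chr` prefix (and `chrM` vs `MT`); the helper names are hypothetical. Case is ignored so that soft-masked (lowercase) bases compare equal, similar in spirit to the uppercase MD5 used for the `M5` tag in SAM headers.

```python
import hashlib
from typing import Dict, Iterable, List


def normalize_contig(name: str) -> str:
    """Map UCSC-style names (chr1, chrM) to b37-style names (1, MT)."""
    if name == "chrM":
        return "MT"
    return name[3:] if name.startswith("chr") else name


def sequence_digests(lines: Iterable[str]) -> Dict[str, str]:
    """Return {contig name: MD5 of its uppercased sequence} for a FASTA stream."""
    digests: Dict[str, str] = {}
    name = None
    h = hashlib.md5()
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                digests[name] = h.hexdigest()
            name = line[1:].split()[0]  # keep only the first word of the header
            h = hashlib.md5()
        elif line and name is not None:
            h.update(line.upper().encode("ascii"))
    if name is not None:
        digests[name] = h.hexdigest()
    return digests


def differing_contigs(a: Dict[str, str], b: Dict[str, str]) -> List[str]:
    """Contigs present in both files (after renaming) whose sequences differ."""
    a_norm = {normalize_contig(k): v for k, v in a.items()}
    b_norm = {normalize_contig(k): v for k, v in b.items()}
    return sorted(c for c in a_norm.keys() & b_norm.keys() if a_norm[c] != b_norm[c])
```

Because the digest is updated line by line, line wrapping does not affect the result; only the sequence content and case-insensitive bases matter.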

  • Hi, I couldn't find the hg19Tob37 chain file in the bundle resources. Could you please let me know its updated location? Thanks.

  • Geraldine_VdAuwera (Cambridge, MA) Member, Administrator, Broadie, admin

    Everything we provide should be in the resource bundle; if it's not in there, we don't have it. Did you look in the bundle on our FTP, or just in the small resources directory in the downloaded package?

  • mr_dave (Maryland) Member

    For those searching in 2017: the liftover chain files are not in the bundle, but you can find them at
    ftp://ftp.broadinstitute.org/Liftover_Chain_Files

  • nandan Member

    Hi mr_dave,

    I was trying to get the liftover chain files to convert my VCF files, generated by mapping to the hg19 reference, to b37 so that I can use the GATK tools for annotation etc.

    However, the FTP link provided above does not seem to work (ftp://ftp.broadinstitute.org/Liftover_Chain_Files).
    I would appreciate any assistance.

    regards,

    Nandan
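Given ihlee's observation above that only chromosomes M and Y differed between the two bundle FASTAs, one hedged alternative for the remaining primary chromosomes is to rename contigs rather than lift coordinates over. The sketch below assumes that observation holds for your files (verify it first), handles only chr1 to chr22, chrX and chrY, always drops chrM because hg19 chrM and b37 MT are different sequences, and drops chrY by default. All other contigs (for example the `_random` and unplaced ones) are dropped rather than guessed at. The function names are hypothetical, and other header lines such as `##reference` pass through unchanged. GATK also checks contig order against the reference sequence dictionary, so compare the output header with the b37 `.dict` before use.

```python
import re
from typing import AbstractSet, Iterable, Iterator, Optional

# Primary hg19 contigs that can be renamed to b37 style by dropping "chr".
# chrM is deliberately absent: hg19 chrM and b37 MT are different sequences.
PRIMARY = {f"chr{i}" for i in range(1, 23)} | {"chrX", "chrY"}

CONTIG_ID = re.compile(r"^##contig=<ID=([^,>]+)")


def to_b37(name: str, skip: AbstractSet[str]) -> Optional[str]:
    """Return the b37 name for a primary hg19 contig, or None to drop it."""
    if name in skip or name not in PRIMARY:
        return None
    return name[3:]


def rename_vcf(lines: Iterable[str],
               skip: AbstractSet[str] = frozenset({"chrY"})) -> Iterator[str]:
    """Yield VCF lines with hg19 contig names rewritten to b37 names."""
    for line in lines:
        if not line.strip():
            continue
        if line.startswith("##contig="):
            m = CONTIG_ID.match(line)
            new = to_b37(m.group(1), skip) if m else None
            if new is not None:
                yield line[:m.start(1)] + new + line[m.end(1):]
            continue  # contig lines for dropped contigs are removed
        if line.startswith("#"):
            yield line  # other meta-information lines and the #CHROM line
            continue
        chrom, rest = line.split("\t", 1)
        new = to_b37(chrom, skip)
        if new is not None:
            yield new + "\t" + rest
```

Pass `skip=frozenset()` to keep chrY once you have confirmed it matches in your own reference files.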

  • shlee (Cambridge) Member, Broadie ✭✭✭✭✭

    Hi @nandan,

    You can find liftover chain files on the UCSC website.
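To show what a chain file actually encodes, here is a minimal, hedged sketch of single-position liftover. It follows the UCSC chain format (a `chain` header line, then `size dt dq` lines and a final `size` line), uses 0-based coordinates, and simply returns the first chain that covers the position. The real liftOver tool also handles intervals, multiple mappings and a minimum-match threshold, so use it (or a maintained library) for actual data. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Block:
    t_start: int  # 0-based start on the source (target) assembly
    q_start: int  # 0-based start on the destination (query), on the query strand
    size: int


@dataclass
class Chain:
    t_name: str
    q_name: str
    q_size: int
    q_strand: str
    blocks: List[Block]


def parse_chains(text: str) -> List[Chain]:
    """Parse UCSC chain-format text into aligned, gap-free blocks."""
    chains: List[Chain] = []
    current: Optional[Chain] = None
    t_pos = q_pos = 0
    for line in text.splitlines():
        fields = line.split()
        if not fields or line.startswith("#"):
            continue
        if fields[0] == "chain":
            # chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
            current = Chain(fields[2], fields[7], int(fields[8]), fields[9], [])
            chains.append(current)
            t_pos, q_pos = int(fields[5]), int(fields[10])
            continue
        if current is None:
            raise ValueError("alignment data before any chain header")
        size = int(fields[0])
        current.blocks.append(Block(t_pos, q_pos, size))
        if len(fields) == 3:
            # dt/dq are the gaps between this block and the next one
            t_pos += size + int(fields[1])
            q_pos += size + int(fields[2])
    return chains


def lift_position(chains: List[Chain], chrom: str, pos: int) -> Optional[Tuple[str, int, str]]:
    """Map a 0-based position; return (contig, pos, strand), or None if unmapped."""
    for chain in chains:
        if chain.t_name != chrom:
            continue
        for b in chain.blocks:
            if b.t_start <= pos < b.t_start + b.size:
                q = b.q_start + (pos - b.t_start)
                if chain.q_strand == "-":
                    # "-" strand coordinates count from the end of the query contig
                    q = chain.q_size - 1 - q
                return chain.q_name, q, chain.q_strand
    return None
```

Positions that fall in a gap between blocks return `None`, which is the same situation in which liftOver reports a record as unmapped.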

  • kianalee (Tucson, Arizona, United States) Member
    Hi,

    I am trying to obtain an hg19 interval list for WGS.

    I downloaded the available WGS interval list for hg38, resources_broad_hg38_v0_wgs_calling_regions.hg38.interval_list, and converted it into BED format:

    sed 's/:\|-/\t/gi' resources_broad_hg38_v0_wgs_calling_regions.hg38.interval_list > resources_broad_hg38_v0_wgs_calling_regions.hg38.bed

    I then uploaded this to the UCSC liftOver tool, but it reported that the format was unsupported, even though BED is the format it asks for.

    Are there any other ways to obtain an hg19 interval list for WGS?
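A Picard-style `.interval_list` is not a list of `contig:start-end` strings. It starts with SAM-style `@HD`/`@SQ` header lines, followed by tab-separated contig, 1-based start, 1-based inclusive end, strand and name columns. Those header lines (and any `-` strand characters, which the `sed` above would turn into tabs) are a likely reason the upload was rejected. A minimal conversion sketch, assuming that layout, drops the header and shifts starts to BED's 0-based, half-open convention; the function name is hypothetical. Picard also provides an `IntervalListToBed` tool, which may be preferable if your installation includes it.

```python
from typing import Iterable, Iterator


def interval_list_to_bed(lines: Iterable[str]) -> Iterator[str]:
    """Convert Picard interval_list lines to 3-column BED lines."""
    for line in lines:
        if line.startswith("@") or not line.strip():
            continue  # skip the SAM-style header and blank lines
        contig, start, end = line.rstrip("\n").split("\t")[:3]
        # interval_list: 1-based, end inclusive; BED: 0-based start, end exclusive
        yield f"{contig}\t{int(start) - 1}\t{int(end)}\n"
```

Usage, with hypothetical file names: `with open("calling_regions.interval_list") as src, open("calling_regions.bed", "w") as dst: dst.writelines(interval_list_to_bed(src))`.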
  • SChaluvadi Member, Broadie, Moderator, admin

    Have you checked the location https://console.cloud.google.com/storage/browser/broad-references/hg19/v0/ ? It might contain what you need, but please perform your own checks to confirm that it is indeed the content you need.

  • 2904359495 Member
    edited August 31

    It would be more helpful if you could supply the hg19 germline resource and contamination VCFs directly.
    A lot of people have asked about converting to hg19, because most people in clinical settings use hg19 rather than hg38, while you provide everything for hg38 very well.
    If you provided these directly, a lot of these questions would not come up. Just my own suggestion; thanks a lot.
    @SChaluvadi

  • Elo_777 (Leuven) Member
    Hello GATK community,

    I have the exact same problem as @kianalee.
    I downloaded the wgs_calling_regions.v1.interval_list from https://console.cloud.google.com/storage/browser/broad-references/hg19/v0/ as mentioned by @SChaluvadi, but I am not sure it is the correct one.

    My interval lists for exome sequencing are way longer than this file (roughly 200,000 lines vs. 714).

    So I am not sure which interval_list to use for deep-coverage WGS. From what I have read, you need to provide an interval list (which basically encompasses the whole genome) to run IndelRealigner, and since I am using UnifiedGenotyper, I need to run IndelRealigner first.

    If you have any suggestions, I would appreciate them.

    Elodie
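Line counts are a poor proxy for how much sequence an interval list covers: an exome target list is made of many short capture intervals, while a WGS calling-regions list can span the genome with a few hundred long ones. Summing interval lengths gives a more direct comparison. This is a minimal sketch, assuming the Picard interval_list layout (SAM-style `@` header, then contig, 1-based start, inclusive end) and non-overlapping intervals, since overlaps would be double-counted; the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def summarize_interval_list(lines: Iterable[str]) -> Tuple[int, int, Dict[str, int]]:
    """Return (number of intervals, total bases, bases per contig)."""
    count = 0
    per_contig: Dict[str, int] = defaultdict(int)
    for line in lines:
        if line.startswith("@") or not line.strip():
            continue  # skip the SAM-style header and blank lines
        contig, start, end = line.rstrip("\n").split("\t")[:3]
        per_contig[contig] += int(end) - int(start) + 1  # end is inclusive
        count += 1
    return count, sum(per_contig.values()), dict(per_contig)
```

Running this on both the exome list and the WGS list shows the total bases each one covers, which is the quantity that matters when choosing intervals for whole-genome processing.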
  • bhanuGandham (Cambridge, MA) Member, Administrator, Broadie, Moderator, admin

    Hi,

    The GATK support team is currently primarily focusing on resolving questions about GATK tool specific errors or abnormal results from the GATK tools. For all other questions, such as this one, we are building a backlog to work through when we have the capacity.

    Please continue to post your questions because we will be mining them for improvements to documentation, resources, and the tools.

    We cannot guarantee a reply; however, we ask other community members to help out if they know the answer.

    For more information:

    https://software.broadinstitute.org/gatk/blog?id=24419

    https://gatkforums.broadinstitute.org/gatk/discussion/24417/what-types-of-questions-will-the-gatk-frontline-team-answer/p1?new=1
