What's the difference between b37 and hg19 resources?

Hi, all. I have questions on resource bundles.
Are the 'hg19' bundle files just a liftover of the 'b37' bundle files to UCSC-style naming? If so, why do some variants appear in only one version and not the other? For example, the variant rs34872315 (on chr1) is in the b37 version of dbsnp137.excluding_sites_after_129.vcf, but not in the hg19 version. At first, I thought this was because of differences in the reference genome (the VCF files in each bundle are matched to the accompanying reference sequence). But the reference chromosome 1 is the same in both bundles. Can you help me understand the difference between the b37 and hg19 resource bundles?

Answers

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    Hi there,

    The two versions of the reference genomes are not exactly the same. There are a few differences, for example some bases that are flipped between strands. That is why we have liftover chain files to convert between the two versions. So there may be a few variants that are filtered out in one version relative to the other. But this should affect only a tiny proportion of variants.

  • ihlee (Member)

    Yes, I expected a few differences in the variant sets between one version and the other. But then why are the accompanying reference sequence files the same? I diffed the two FASTA files chromosome by chromosome, and the only differences were in chromosomes M and Y.
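
A chromosome-by-chromosome comparison like the one described above can be sketched in Python by hashing each contig's sequence and comparing the digests. This is only a minimal sketch: the function names (`contig_checksums`, `normalise`, `differing_contigs`) are hypothetical, the name mapping assumes the only naming differences are the `chr` prefix and `chrM` versus `MT`, and upper-casing the sequence is an assumption that soft-masking should be ignored.

```python
import hashlib
import io


def contig_checksums(handle):
    """Return {contig_name: md5 of the upper-cased sequence} for a FASTA stream."""
    checksums = {}
    name, digest = None, None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                checksums[name] = digest.hexdigest()
            # Keep only the first word of the header as the contig name.
            name = line[1:].split()[0]
            digest = hashlib.md5()
        elif name is not None:
            # Assumption: case (soft-masking) is not a meaningful difference.
            digest.update(line.upper().encode("ascii"))
    if name is not None:
        checksums[name] = digest.hexdigest()
    return checksums


def normalise(name):
    """Assumed mapping from UCSC-style to b37-style names: chr1 -> 1, chrM -> MT."""
    name = name[3:] if name.startswith("chr") else name
    return "MT" if name == "M" else name


def differing_contigs(fasta_a, fasta_b):
    """List contigs present in both files whose sequences differ."""
    a = {normalise(k): v for k, v in contig_checksums(fasta_a).items()}
    b = {normalise(k): v for k, v in contig_checksums(fasta_b).items()}
    return sorted(k for k in a.keys() & b.keys() if a[k] != b[k])


# Toy data standing in for the two bundle references.
b37_like = io.StringIO(">1 toy\nACGT\nacgt\n>MT\nGATTACA\n")
hg19_like = io.StringIO(">chr1\nACGTACGT\n>chrM\nGATTTCA\n")
print(differing_contigs(b37_like, hg19_like))
```

For real references, pass open file handles in place of the `io.StringIO` toy data. Contigs that exist in only one of the files are not reported by this sketch.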

  • Hi, I couldn't find the hg19Tob37 chain file in the bundle resources. Could you please let me know its updated location? Thanks.

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    Everything we provide should be in the resource bundle; if it's not in there, we don't have it. Did you look in the bundle on our FTP, or just in the small resources directory in the downloaded package?

  • mr_dave, Maryland (Member)

    For those searching in 2017: the liftover chain files are not in the bundle. You can find them at
    ftp://ftp.broadinstitute.org/Liftover_Chain_Files

  • nandan (Member)

    Hi mr_dave,

    I was trying to get the liftover chain files to convert my VCF files, which were generated by mapping to the hg19 reference, to b37 so that I can use the GATK tools for annotation etc.

    However, the FTP link provided above (ftp://ftp.broadinstitute.org/Liftover_Chain_Files) does not seem to work.
    I would appreciate any assistance.

    regards,

    Nandan

  • shlee, Cambridge (Member, Broadie) ✭✭✭✭✭

    Hi @nandan,

    You can find liftover chain files at the UCSC website.

  • kianalee, Tucson, Arizona, United States (Member)

    Hi,

    I am trying to acquire an hg19 interval list for WGS.

    I downloaded the available WGS interval list for hg38 resources_broad_hg38_v0_wgs_calling_regions.hg38.interval_list and converted it into BED format:

    sed 's/:\|-/\t/gi' resources_broad_hg38_v0_wgs_calling_regions.hg38.interval_list > resources_broad_hg38_v0_wgs_calling_regions.hg38.bed

    I then uploaded this to the UCSC liftover tool, but it reported the format as unsupported, even though it asks for BED format.

    Are there any other ways to acquire an hg19 interval list for WGS?
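
One possible reason for the rejection is that a Picard-style interval_list is not a plain list of `chr:start-end` strings: it starts with a SAM-style header (lines beginning with `@`) and then has tab-separated columns for contig, start, end, strand and name, with 1-based inclusive coordinates, whereas BED is 0-based and end-exclusive. The Python sketch below assumes the file follows that layout; `interval_list_to_bed` is a hypothetical helper name, not part of GATK or Picard.

```python
def interval_list_to_bed(lines):
    """Yield BED lines (0-based, end-exclusive) from interval_list lines.

    Assumes the Picard layout: '@' header lines, then tab-separated
    contig, start, end, strand, name with 1-based inclusive coordinates.
    """
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and the SAM-style header (@HD, @SQ, ...).
        if not line or line.startswith("@"):
            continue
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        strand = fields[3] if len(fields) > 3 else "+"
        name = fields[4] if len(fields) > 4 else "."
        # Shift the start by one; the end stays the same because an
        # inclusive 1-based end equals an exclusive 0-based end.
        yield "\t".join([chrom, str(start - 1), str(end), name, "0", strand])


# Toy input standing in for the downloaded interval list.
toy = [
    "@HD\tVN:1.5\n",
    "@SQ\tSN:chr1\tLN:1000\n",
    "chr1\t101\t200\t+\tregion1\n",
]
for bed_line in interval_list_to_bed(toy):
    print(bed_line)
```

For a real file, iterate over an open file handle and write each yielded line to a `.bed` file. After lifting the BED over, the coordinates would need the reverse shift (add one to the start) and a matching header before they can be used as an interval_list again.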