What's the difference between b37 and hg19 resources?

ihlee

Hi, all. I have some questions about the resource bundles.
Are the 'hg19' bundle files simply the 'b37' bundle files lifted over to UCSC style? If so, why are some variants present in only one version and not the other? For example, the variant rs34872315 (on chr1) is in the b37 version of dbsnp137.excluding_sites_after_129.vcf but not in the hg19 version. At first I thought this was due to differences between the reference genomes (the VCF files in each bundle are matched to the accompanying reference sequences), but chromosome 1 of the reference is identical in both bundles. Can you help me understand the difference between the b37 and hg19 resource bundles?
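One way to find which rsIDs appear in only one version of a bundle VCF is a short script like the following. This is a minimal Python sketch, not a GATK tool: it assumes plain or gzip/bgzip-compressed VCF files and compares only the ID column, so sites are matched by rsID regardless of contig naming (chr1 versus 1). The function names are hypothetical.

```python
import gzip
from typing import Iterable, Set, TextIO, Tuple


def open_vcf(path: str) -> TextIO:
    """Open a plain or gzip/bgzip-compressed VCF as text (chosen by the .gz suffix)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def rsids_from_vcf_lines(lines: Iterable[str]) -> Set[str]:
    """Collect IDs from column 3 of a VCF, skipping header lines and missing IDs ('.')."""
    ids: Set[str] = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        # The ID column may hold several IDs separated by ';'.
        for vid in fields[2].split(";"):
            if vid and vid != ".":
                ids.add(vid)
    return ids


def compare_rsids(b37_lines: Iterable[str], hg19_lines: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (IDs only in the b37 file, IDs only in the hg19 file)."""
    b37 = rsids_from_vcf_lines(b37_lines)
    hg19 = rsids_from_vcf_lines(hg19_lines)
    return b37 - hg19, hg19 - b37


# Example usage (the paths are placeholders for local copies of the two dbSNP files):
# with open_vcf("path/to/b37/dbsnp.vcf") as a, open_vcf("path/to/hg19/dbsnp.vcf") as b:
#     only_b37, only_hg19 = compare_rsids(a, b)
#     print(len(only_b37), len(only_hg19), "rs34872315" in only_b37)
```

Matching on rsID rather than on (contig, position) sidesteps the naming difference between the two builds, but it will not catch sites whose position changed while the ID stayed the same.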



  Geraldine_VdAuwera

    Hi there,

    The two versions of the reference genome are not exactly the same. There are a few differences, for example some bases that are flipped between strands, which is why we have liftover chain files to convert between the two versions. As a result, a few variants may be filtered out of one version but not the other. This should affect only a tiny proportion of variants.
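To illustrate how a chain file maps coordinates from one assembly version to another, here is a minimal sketch of a UCSC-format chain parser and single-position lift. It assumes the standard UCSC chain layout (a `chain` header line followed by `size dt dq` block lines, with the source assembly as "target" and the destination as "query", all 0-based and half-open). It is not the implementation used by UCSC liftOver or Picard LiftoverVcf, and it does not check reference alleles; for real data, use an established tool. A position that falls in a gap returns `None`, which is one way a site can end up in only one version.

```python
from typing import Iterable, List, Optional, Tuple


class Chain:
    """One 'chain' record from a UCSC chain file (0-based, half-open coordinates)."""

    def __init__(self, header: List[str]):
        # header: chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
        self.t_name, self.t_start, self.t_end = header[2], int(header[5]), int(header[6])
        self.q_name, self.q_size = header[7], int(header[8])
        self.q_strand, self.q_start = header[9], int(header[10])
        self.blocks: List[Tuple[int, int, int]] = []  # (block size, target gap, query gap)


def parse_chains(lines: Iterable[str]) -> List[Chain]:
    """Parse chain records; the last block line of each chain has only a size."""
    chains: List[Chain] = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        if fields[0] == "chain":
            chains.append(Chain(fields))
        elif len(fields) == 3:
            chains[-1].blocks.append((int(fields[0]), int(fields[1]), int(fields[2])))
        else:
            chains[-1].blocks.append((int(fields[0]), 0, 0))
    return chains


def lift_position(chains: List[Chain], chrom: str, pos: int) -> Optional[Tuple[str, int]]:
    """Map a 0-based source position to (query contig, 0-based position), or None if unmapped."""
    for c in chains:
        if c.t_name != chrom or not (c.t_start <= pos < c.t_end):
            continue
        t, q = c.t_start, c.q_start
        for size, dt, dq in c.blocks:
            if t <= pos < t + size:
                q_pos = q + (pos - t)
                if c.q_strand == "-":
                    # Query coordinates on '-' chains count from the end of the query contig.
                    q_pos = c.q_size - 1 - q_pos
                return c.q_name, q_pos
            t += size + dt
            q += size + dq
    return None
```

VCF positions are 1-based, so a caller would subtract 1 before `lift_position` and add 1 to the result.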

    

  ihlee

    Yes, I expected a few differences between the variant sets of the two versions. But then why are the accompanying reference sequence files the same? I compared the two FASTA files chromosome by chromosome with diff, and the only differences were in chromosomes M and Y.
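A chromosome-by-chromosome comparison like the one described above can be sketched in Python. This is a minimal sketch under stated assumptions: contig names are normalized by stripping the UCSC "chr" prefix and mapping M to MT (the b37 naming), only contigs whose normalized names match are compared (unplaced scaffolds are named differently in the two builds and are skipped), and bases are upper-cased by default so that soft-masking is not reported as a difference. Each contig is reduced to an MD5 digest while streaming, so the genome never has to be held in memory. The function names are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List


def normalize_contig(name: str) -> str:
    """Map UCSC-style names (chr1, chrM) to b37-style names (1, MT); others pass through."""
    if name.startswith("chr"):
        name = name[3:]
    return "MT" if name == "M" else name


def contig_digests(lines: Iterable[str], ignore_case: bool = True) -> Dict[str, str]:
    """Stream a FASTA and return {normalized contig name: MD5 of its sequence}."""
    digests: Dict[str, str] = {}
    name = None
    h = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                digests[name] = h.hexdigest()
            name = normalize_contig(line[1:].split()[0])
            h = hashlib.md5()
        elif h is not None:
            # Line width does not matter: sequence lines are hashed as one continuous string.
            h.update((line.upper() if ignore_case else line).encode("ascii"))
    if name is not None:
        digests[name] = h.hexdigest()
    return digests


def differing_contigs(b37_lines: Iterable[str], hg19_lines: Iterable[str], ignore_case: bool = True) -> List[str]:
    """Return the sorted names of contigs present in both files whose sequences differ."""
    a = contig_digests(b37_lines, ignore_case)
    b = contig_digests(hg19_lines, ignore_case)
    return [c for c in sorted(set(a) & set(b)) if a[c] != b[c]]
```

With real files, pass open file handles for the two FASTA files; contigs present in only one file are not reported.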

  trptyrphe

    Hi, I couldn't find the hg19Tob37 chain file in the resource bundle. Could you please tell me where it is now located? Thanks.

  Geraldine_VdAuwera

    Everything we provide should be in the resource bundle; if it's not in there, we don't have it. Did you look in the bundle on our FTP, or just in the small resources directory in the downloaded package?

    

