hg38 version

Hello!

I found the hg38 reference FASTA (Homo_sapiens_assembly38.fasta) in your cloud bundle (https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0/), and I would like to know its version number.

Thank you in advance

Best

Answers

  • kokyriakidis Member
    edited October 2018

    Hi @shlee,

    1) Can I use the latest patch release, GRCh38.p12, instead of the reference file in the bundle?
    2) The dbSNP151 VCF file is based on the GRCh38.p7 reference. Should I use that particular patch release, or can I use GRCh38.p12?
    3) If I use a newer reference file (GRCh38.p12), can I still use the same files from the bundle for the VQSR step (training, truth, etc.)?

  • shlee Cambridge Member, Broadie, Moderator admin

    Hi @kokyriakidis,

    It is completely up to you which reference release you use in your analyses. If the sequences represented by the patches are important to your research, then you ought to use them. I've discussed the pros and cons of using patched releases in another thread that I cannot find at the moment. Please search the forum with your question; hopefully you will have better luck finding that discussion.

    In general, if these patches are not important to your research, then you may find it much easier to use just the major release. The reason behind this is that alignments to references with a different make-up of sequences are not comparable or not easy to make comparable. So use of resources out there, e.g. dbSNP, that may not cover such regions or cover them as they are represented on the primary assembly or a major release, will become difficult for certain pipelines that rely on such resources. Unless you are willing to think through every step and the impact of including patches, you may find it easier to use the major release version.
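
To illustrate what a "different make-up of sequences" can mean in practice, here is a minimal Python sketch that compares the contig lists of two reference builds using the contents of their FASTA index (.fai) files. The toy index contents and the contig name `patch_fix_1` are made up for the example and do not come from any real release. Real distributions may also name the same chromosome differently (for example `chr1` versus an accession), in which case the names would need to be mapped before comparing.

```python
def read_fai(text):
    """Parse .fai-formatted text (tab-separated, name and length first)
    into a dict of {contig_name: length}."""
    contigs = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        contigs[fields[0]] = int(fields[1])
    return contigs


def compare_contigs(major, patched):
    """Compare two {contig_name: length} dicts and report how they differ."""
    return {
        # Present in both builds with identical length
        "unchanged": sorted(
            n for n in major if n in patched and major[n] == patched[n]
        ),
        # Present in both builds but with a different length
        "length_changed": sorted(
            n for n in major if n in patched and major[n] != patched[n]
        ),
        # Extra contigs that only the patched build contains
        "only_in_patched": sorted(n for n in patched if n not in major),
        # Contigs that the patched build is missing
        "only_in_major": sorted(n for n in major if n not in patched),
    }


# Toy index contents (hypothetical names, lengths, and offsets)
major_fai = "chr1\t1000\t6\t60\t61\nchr2\t800\t1030\t60\t61\n"
patched_fai = major_fai + "patch_fix_1\t200\t1850\t60\t61\n"

result = compare_contigs(read_fai(major_fai), read_fai(patched_fai))
for category, names in result.items():
    print(category, names)
```

With real files, the two strings would be read from the `.fai` files that sit next to each FASTA. Contigs listed under `only_in_patched` are the ones that resources built against the major release would not cover.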

  • Hi @shlee

    Patches do not change the chromosomal coordinates, right? They only add information. GRCh38.p12 has the latest and most accurate information, right? This is the case for dbSNP151 too. So wouldn't it be best to do all my research with the latest patch?

    I didn't understand what you meant in this part. Could you explain it a bit more?
    "The reason behind this is that alignments to references with a different make-up of sequences are not comparable or not easy to make comparable. So use of resources out there, e.g. dbSNP, that may not cover such regions or cover them as they are represented on the primary assembly or a major release, will become difficult for certain pipelines that rely on such resources."

  • shlee Cambridge Member, Broadie, Moderator admin

    Hi @kokyriakidis,

    No, the patches do not change the primary assembly's chromosomal coordinates, because they are added as separate contigs in the minor releases. Remember that there are two different types of patches: fix and novel. Depending on how you align reads, reads that map to a patch region may become secondary alignments because they now align to multiple locations in the reference. GATK tools exclude secondary alignments from analysis. There are newer ways of aligning that allow alternate-contig awareness, which you would want to consider. See here and here.
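
As a rough way to see how many records an alignment marks as secondary, the following Python sketch counts records whose SAM FLAG has the secondary bit (0x100) set. The SAM records and the contig name `patch_fix_1` are toy values invented for the example, and the helper names are hypothetical. It only shows how the flag is read and is not a description of how any GATK tool filters reads internally.

```python
SECONDARY = 0x100  # SAM FLAG bit that marks a secondary alignment


def is_secondary(flag):
    """Return True if the SAM FLAG value has the secondary bit set."""
    return bool(flag & SECONDARY)


def count_secondary(sam_text):
    """Return (total_records, secondary_records) for SAM-formatted text."""
    total = 0
    secondary = 0
    for line in sam_text.splitlines():
        if not line or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])  # FLAG is the second column
        total += 1
        if is_secondary(flag):
            secondary += 1
    return total, secondary


# Toy SAM text: one header line and three alignment records (made up)
toy_sam = "\n".join([
    "@HD\tVN:1.6\tSO:unsorted",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r1\t256\tpatch_fix_1\t10\t0\t4M\t*\t0\t0\t*\t*",
    "r2\t16\tchr2\t50\t60\t4M\t*\t0\t0\tTTGA\tIIII",
])

total, secondary = count_secondary(toy_sam)
print(f"{secondary} of {total} records are secondary alignments")
```

Comparing this count between an alignment to the major release and an alignment to a patched release gives a first impression of how many reads are affected by the extra contigs.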
