The current GATK version is 3.6-0

What's in the resource bundle and how can I get it?

Posted by Geraldine_VdAuwera (Administrator, Dev admin), edited August 17 in FAQs

NOTE: we are currently working on overhauling the bundle to 1) add support for Hg38 and 2) document the provenance of the resource files more fully.


1. Accessing the bundle

See the Resource Bundle page. In a nutshell, the bundle is distributed through a Google Cloud bucket and an FTP server. The cloud bucket provides only the Hg38 resources; everything else is available only through the FTP server.


2. GRCh38/Hg38 Resources: the soon-to-be Standard Set

  • In the cloud
    All the resource files needed for Best Practices short variant discovery in whole-genome sequencing (WGS) data. Exome files are coming soon. Detailed documentation is in progress (ToW Aug 2016); stay tuned for announcements.

  • On the FTP
    A beta version of the bundle that contains some extra files compared to the cloud bucket but is not officially supported. It will mirror the cloud bundle once that project is completed.


All resources below this are available only on the FTP server, not on the cloud.


3. b37 Resources: the Standard Data Set pending completion of the Hg38 bundle

  • Reference sequence (standard 1000 Genomes fasta) along with fai and dict files

  • dbSNP in VCF. This includes two files:

    • A recent dbSNP release (build 138)
    • The same file subset to only sites discovered in or before dbSNPBuildID 129, which excludes the impact of the 1000 Genomes project and is useful for evaluating the dbSNP rate and Ti/Tv values at novel sites.
  • HapMap genotypes and sites VCFs

  • OMNI 2.5 genotypes for 1000 Genomes samples and sites VCFs
  • The current best set of known indels to be used for local realignment (note that we don't use dbSNP for this anymore); use both files:

    • 1000G_phase1.indels.b37.vcf (currently from the 1000 Genomes Phase I indel calls)
    • Mills_and_1000G_gold_standard.indels.b37.sites.vcf
  • The latest set from 1000G phase 3 (v4) for genotype refinement: 1000G_phase3_v4_20130502.sites.vcf

  • A large-scale standard single sample BAM file for testing:

    • NA12878.HiSeq.WGS.bwa.cleaned.recal.b37.20.bam containing ~64x reads of NA12878 on chromosome 20
    • A callset produced by running UnifiedGenotyper on the dataset above. Note that this resource is out of date and does not represent the results of our Best Practices. This will be updated in the near future.
  • The Broad's custom exome targets list: Broad.human.exome.b37.interval_list (note that you should always use the exome targets list that is appropriate for your data, which typically depends on the prep kit that was used, and should be available from the kit manufacturer's website)

Additionally, these files all have supplementary indices, statistics, and other QC data available.
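The dbSNP build 129 subset and the novel-site Ti/Tv evaluation mentioned in the b37 list can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how the bundle files were actually produced: it assumes a plain-text, sites-only VCF whose INFO column carries a `dbSNPBuildID=<n>` key, treats a call as novel when its (CHROM, POS) is absent from the subset, and counts transitions (A/G, C/T) versus transversions while skipping indels, MNPs and multi-allelic records. All function names are hypothetical helpers.

```python
from typing import Iterable, Iterator, Optional, Set, Tuple

# Transitions are purine-to-purine (A/G) or pyrimidine-to-pyrimidine (C/T) changes.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def info_value(info: str, key: str) -> Optional[str]:
    """Return the value of `key` in a semicolon-separated VCF INFO field."""
    for entry in info.split(";"):
        name, _, value = entry.partition("=")
        if name == key:
            return value
    return None


def subset_by_build(lines: Iterable[str], max_build: int = 129) -> Iterator[str]:
    """Yield header lines plus records whose dbSNPBuildID is <= max_build (hypothetical helper)."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        info = line.rstrip("\n").split("\t")[7]
        build = info_value(info, "dbSNPBuildID")
        if build is not None and int(build) <= max_build:
            yield line


def site_keys(lines: Iterable[str]) -> Set[Tuple[str, int]]:
    """Collect (CHROM, POS) for every non-header VCF record."""
    return {
        (fields[0], int(fields[1]))
        for fields in (line.split("\t") for line in lines if not line.startswith("#"))
    }


def ti_tv(snvs: Iterable[Tuple[str, str]]) -> float:
    """Transition/transversion ratio over (REF, ALT) pairs; non-SNVs are skipped."""
    ti = tv = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        if len(ref) != 1 or len(alt) != 1 or ref == alt or not {ref, alt} <= set("ACGT"):
            continue  # indels, MNPs, multi-allelic ALTs such as "G,T", and N bases
        if frozenset((ref, alt)) in TRANSITIONS:
            ti += 1
        else:
            tv += 1
    return ti / tv if tv else float("inf")


def novel_ti_tv(calls: Iterable[Tuple[str, int, str, str]], known: Set[Tuple[str, int]]) -> float:
    """Ti/Tv over calls (CHROM, POS, REF, ALT) whose site is absent from `known`."""
    return ti_tv((ref, alt) for chrom, pos, ref, alt in calls if (chrom, pos) not in known)
```

For example, `known = site_keys(subset_by_build(open(path)))` would build the novel-site filter from a locally downloaded, uncompressed dbSNP VCF, where `path` is a placeholder. Holding every dbSNP site in a Python set is memory-hungry for a whole-genome file; real evaluation tools stream coordinate-sorted files instead, so treat this only as an explanation of the logic.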


4. hg19 Resources: lifted over from b37

Includes the UCSC-style hg19 reference along with all lifted over VCF files.
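UCSC-style naming means the hg19 contigs carry a `chr` prefix (`chr1`, `chrX`), whereas b37 uses bare names (`1`, `X`). A minimal sketch of the name mapping for the primary chromosomes follows; `ucsc_to_b37` and `b37_to_ucsc` are hypothetical helpers, not something shipped with the bundle. They deliberately refuse `chrM`/`MT`, because the hg19 mitochondrial sequence is not the same as the b37 `MT` sequence, and they do not attempt the unplaced and unlocalized contigs, so renaming alone is no substitute for using the files built for each reference.

```python
import re
from typing import Optional

# Primary UCSC hg19 chromosomes: chr1..chr22, chrX, chrY.
_PRIMARY = re.compile(r"chr([1-9]|1[0-9]|2[0-2]|X|Y)")


def ucsc_to_b37(contig: str) -> Optional[str]:
    """Return the b37 name for a primary hg19 chromosome, or None if there is no safe 1:1 rename."""
    match = _PRIMARY.fullmatch(contig)
    return match.group(1) if match else None


def b37_to_ucsc(contig: str) -> Optional[str]:
    """Inverse mapping, restricted to the same primary chromosomes."""
    return "chr" + contig if _PRIMARY.fullmatch("chr" + contig) else None
```

Returning `None` rather than guessing makes the caller decide explicitly what to do with contigs that have no equivalent under a simple rename.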


5. hg18 Resources: lifted over from b37

Includes the UCSC-style hg18 reference along with all lifted over VCF files. The refGene track and BAM files are not available. We only provide data files for this genome build that can be lifted over "easily" from our master b37 repository. We apologize for any inconvenience this may cause.

Also includes a chain file to lift over to b37.
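A chain file maps coordinates between assemblies. In the UCSC chain format, each chain starts with a header `chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id`, followed by `size dt dq` lines (an ungapped aligned block, then the gap on the target and query sides) and a final line holding only `size`; coordinates are 0-based. For liftover, the target (t) side is the assembly you convert from and the query (q) side is the one you convert to. The sketch below maps a single position through one chain and only handles plus-strand query chains; the names are hypothetical. For real VCFs, use an established tool such as Picard LiftoverVcf or UCSC liftOver, which choose among many chains and handle strand flips and reference-allele changes.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Chain:
    t_name: str   # contig in the assembly you lift from
    t_start: int  # 0-based start on the target
    q_name: str   # contig in the assembly you lift to
    q_strand: str
    q_start: int  # 0-based start on the query
    blocks: List[Tuple[int, int, int]] = field(default_factory=list)  # (size, dt, dq)


def parse_chain(text: str) -> Chain:
    """Parse one chain: a header line, then 'size dt dq' lines and a final 'size' line."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    header = rows[0]
    if header[0] != "chain" or len(header) < 12:
        raise ValueError("expected a chain header line")
    chain = Chain(header[2], int(header[5]), header[7], header[9], int(header[10]))
    for row in rows[1:]:
        size = int(row[0])
        dt, dq = (int(row[1]), int(row[2])) if len(row) == 3 else (0, 0)
        chain.blocks.append((size, dt, dq))
    return chain


def lift_position(chain: Chain, contig: str, pos: int) -> Optional[Tuple[str, int]]:
    """Map a 0-based target position to the query, or None if it falls outside aligned blocks."""
    if contig != chain.t_name:
        return None
    if chain.q_strand != "+":
        raise NotImplementedError("minus-strand query chains are not handled in this sketch")
    t, q = chain.t_start, chain.q_start
    for size, dt, dq in chain.blocks:
        if t <= pos < t + size:
            return chain.q_name, q + (pos - t)
        t += size + dt  # skip the gap on the target side
        q += size + dq  # and the corresponding gap on the query side
    return None
```

Positions that fall in a gap between blocks return `None`, which mirrors why some records cannot be lifted over "easily" and are dropped from the lifted-over files.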


6. b36 Resources: lifted over from b37

Includes the 1000 Genomes pilot b36 formatted reference sequence (human_b36_both.fasta) along with all lifted over VCF files. The refGene track and BAM files are not available. We only provide data files for this genome build that can be lifted over "easily" from our master b37 repository. We apologize for any inconvenience this may cause.

Also includes a chain file to lift over to b37.


Geraldine Van der Auwera, PhD

Related GitHub issue: 1070 (open), by Geraldine_VdAuwera