I read this in the GATK documentation:
Human sequence
If you are using human data, your reads must be aligned to one of the official b3x (e.g. b36, b37) or hg1x (e.g. hg18, hg19) references. The contig ordering in the reference you used must exactly match that of one of the official references' canonical orderings. These are defined by historical karyotyping of largest to smallest chromosomes, followed by the X, Y, and MT for the b3x references; the order is thus 1, 2, 3, ..., 10, 11, 12, ... 20, 21, 22, X, Y, MT. The hg1x references differ in that the chromosome names are prefixed with "chr" and chrM appears first instead of last. The GATK will detect misordered contigs (for example, lexicographically sorted) and throw an error.
As I understand it, this says the reference order must be: chr1, chr2, chr3, ... chr22, chrX, chrY, chrM. But after downloading the whole bundle from GATK's FTP, I checked its reference, and the contigs are in this order:
>chrM
>chr1
>chr2
>chr3
>chr4
>chr5
>chr6
>chr7
>chr8
>chr9
>chr10
>chr11
>chr12
>chr13
>chr14
>chr15
>chr16
>chr17
>chr18
>chr19
>chr20
>chr21
>chr22
>chrX
>chrY
...
So, is this contradictory?
Geraldine_VdAuwera
Hi there,
Our bundle includes both the b37 and hg19 references. The article mainly describes the b37 version, which we work with most. You seem to be talking about the hg19 version. What you're seeing is described in this line of the article:
The hg1x references differ in that the chromosome names are prefixed with "chr" and chrM appears first instead of last.
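If you want to check a reference's contig order yourself, here is a minimal Python sketch based only on the two orderings described in the article. The function names (`canonical_order`, `fasta_contigs`, `matches_canonical`) and the style labels "b37" and "hg19" are made up for illustration; this is not how the GATK performs its own check. It compares only the leading primary contigs, on the assumption that any additional contigs follow them.

```python
def canonical_order(style):
    """Return the primary contig order described in the article (illustrative helper)."""
    autosomes = [str(i) for i in range(1, 23)]
    if style == "b37":
        # b3x: 1..22, then X, Y, MT
        return autosomes + ["X", "Y", "MT"]
    if style == "hg19":
        # hg1x: names prefixed with "chr", and chrM first instead of last
        return ["chrM"] + ["chr" + name for name in autosomes + ["X", "Y"]]
    raise ValueError("unknown style: " + style)


def fasta_contigs(lines):
    """Collect contig names from FASTA header lines, in file order."""
    names = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                # The name is the first whitespace-delimited token after '>'
                names.append(fields[0])
    return names


def matches_canonical(contigs, style):
    """True if the leading contigs equal the canonical primary order for the style."""
    expected = canonical_order(style)
    return contigs[:len(expected)] == expected
```

For example, with a hypothetical file path, `matches_canonical(fasta_contigs(open("reference.fasta")), "hg19")` should be true for a header order like the one posted above, where chrM comes first.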