Unaligned sequence for the human genome

jwhite (MEEI, Member)
edited April 2013 in Ask the GATK team

Hello,
The UCSC downloads site for the human genome sequence includes unaligned sequence data (both "random" and "chrUn"). For completeness, we include these sequences in our indexed reference genome. In order to use GATK, we have to re-sort the chromosome order and reindex the genome. However, we are unsure where to place the unaligned sequences in the chromosome order to make GATK happy.

How have these additional sequences been handled at the Broad?

Joe White
MEEI

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Hi Joe,

    As far as the GATK is concerned, we just require that the numbered chromosomes be ordered numerically rather than alphabetically (chr1 < chr2 < chr10) for human references, and that the reference dictionary be "compatible" with the dictionaries for the reads. Compatible means that there are common contigs, and that these common contigs have the same lengths, occur in the same relative order, and (if intervals are used) occur at the same indices in the dictionaries.

    We have some sequences with additional contigs and typically those are just appended after the "standard" chromosomes.
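A minimal sketch of the compatibility rules described above, for anyone who wants to check two dictionaries by hand. This is not GATK's own implementation; the function name `dictionary_problems`, the `require_same_indices` flag, and the toy contig lengths are all assumptions made for illustration.

```python
from typing import List, Tuple

Contig = Tuple[str, int]  # (contig name, contig length)


def dictionary_problems(ref: List[Contig], reads: List[Contig],
                        require_same_indices: bool = False) -> List[str]:
    """Return a list of incompatibilities; an empty list means 'compatible'.

    Hypothetical helper that mirrors the rules stated in the post:
    common contigs must exist, have equal lengths, appear in the same
    relative order, and (optionally) sit at the same indices.
    """
    ref_len = dict(ref)
    reads_len = dict(reads)

    # Contigs present in both dictionaries, in reference order.
    common = [name for name, _ in ref if name in reads_len]
    if not common:
        return ["no common contigs"]

    problems = []
    for name in common:
        if ref_len[name] != reads_len[name]:
            problems.append(
                f"length mismatch for {name}: "
                f"{ref_len[name]} vs {reads_len[name]}")

    # The same contigs, in the order the reads dictionary lists them.
    reads_common = [name for name, _ in reads if name in ref_len]
    if common != reads_common:
        problems.append("common contigs are in a different relative order")

    if require_same_indices:
        ref_idx = {name: i for i, (name, _) in enumerate(ref)}
        reads_idx = {name: i for i, (name, _) in enumerate(reads)}
        for name in common:
            if ref_idx[name] != reads_idx[name]:
                problems.append(
                    f"{name} is at index {ref_idx[name]} in the reference "
                    f"but {reads_idx[name]} in the reads")
    return problems


# Toy dictionaries; the lengths are made up, not real chromosome sizes.
REF = [("chr1", 1000), ("chr2", 900), ("chr10", 500), ("chrUn_12345", 50)]
READS = [("chr1", 1000), ("chr2", 900), ("chr10", 500)]
print(dictionary_problems(REF, READS))
```

Under these rules, extra contigs appended after the "standard" chromosomes in the reference do not cause a problem, because only the contigs common to both dictionaries are compared.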

  • jwhite (MEEI, Member)

    So the order should be something like:

    chr1.fa chr2.fa chr2_random.fa chr3.fa ... chrM.fa chrX.fa chrY.fa chrUn_12345.fa ...

    Correct?
    (chrM: mitochondrial)

    How do we know the dictionary for the reads? If the reads have been aligned against a genome, then the dictionary should be the same, right?

    Joe
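For reference, the order proposed in this post can be produced with a small sort key. This is only a sketch: the function name `contig_sort_key`, the rank table, and the placement of chrM before chrX are assumptions taken from the example order in the post, not a GATK requirement.

```python
import re

# Ranks for non-numeric chromosomes, following the order in the post:
# numbered chromosomes first, then chrM, chrX, chrY, then chrUn_*.
SPECIAL_RANK = {"M": 23, "X": 24, "Y": 25, "Un": 26}


def contig_sort_key(name: str):
    """Sort key: (chromosome rank, 0 for the main contig / 1 for extras, suffix)."""
    match = re.match(r"chr([^_]+)(?:_(.*))?$", name)
    if match is None:
        # Names that do not look like chr* go last, alphabetically.
        return (99, 1, name)
    base, suffix = match.group(1), match.group(2)
    rank = int(base) if base.isdigit() else SPECIAL_RANK.get(base, 98)
    return (rank, 0 if suffix is None else 1, suffix or "")


names = ["chrUn_12345", "chrY", "chr10", "chr2_random", "chrM",
         "chr2", "chrX", "chr1", "chr3"]
print(sorted(names, key=contig_sort_key))
```

A plain alphabetical sort would put chr10 before chr2, which is exactly what the chr1 < chr2 < chr10 requirement rules out; the numeric rank avoids that.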

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Yes, that order looks good. For the reads, IIRC you can look it up in the BAM header. Reads aligned to a reference should have the same dictionary as that reference, yes.
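As a rough illustration of looking the dictionary up in the header: the contigs are listed in the header's `@SQ` lines, each with an `SN` (name) and `LN` (length) field, in dictionary order. The sketch below parses header text such as the output of `samtools view -H reads.bam`; the file name `reads.bam`, the function name `parse_sq_lines`, and the example header contents are hypothetical.

```python
from typing import List, Tuple


def parse_sq_lines(header_text: str) -> List[Tuple[str, int]]:
    """Extract (contig name, length) pairs from the @SQ lines of a SAM/BAM header."""
    contigs = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs after the record type.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        contigs.append((fields["SN"], int(fields["LN"])))
    return contigs


# Made-up header text; the lengths are toy values, not real chromosome sizes.
EXAMPLE_HEADER = (
    "@HD\tVN:1.4\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:1000\n"
    "@SQ\tSN:chr2\tLN:900\n"
    "@SQ\tSN:chr10\tLN:500\n"
)
for name, length in parse_sq_lines(EXAMPLE_HEADER):
    print(name, length)
```

The order of the returned list is the order of the `@SQ` lines, which is what needs to agree with the reference dictionary.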
