Errors about BAM files not being ordered properly

delangel Posts: 71 GATK Dev mod
edited May 15 in Common Problems

This error occurs when, for example, a collaborator gives you a BAM that's derived from what was originally the same reference as the one you are using, but for whatever reason the contigs are not sorted in the same order. The GATK can be particular about the ordering of a BAM file, so it will fail with an error in this case.

So what do you do? You use a Picard tool called ReorderSam to, well, reorder your BAM file.

Here's an example usage where we reorder a BAM file that was sorted lexicographically so that the output will be another BAM, but this time sorted karyotypically:

java -jar picard.jar ReorderSam \
    I=lexicographic.bam \
    O=karyotypic.bam \
    REFERENCE=Homo_sapiens_assembly18.karyotypic.fasta

This tool requires that you have a correctly sorted version of the reference sequence you used to align your reads. Be aware that this tool will drop reads that don't have equivalent contigs in the new reference (potentially bad, but maybe not). If contigs have the same name in the BAM and the new reference, this tool assumes that the alignment of the read in the new BAM is the same. This is not a liftover tool!

This tool is part of the Picard package.
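
Before reordering, it can help to confirm that the contig order really differs between the BAM and the reference. The following is a minimal sketch, not part of Picard or the GATK: the function names contig_order and same_contig_order are hypothetical, and it assumes you have saved the BAM header to a text file (for example with samtools view -H) and have the reference's .dict file, both of which list contigs as @SQ lines.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Picard or GATK).

# Print contig names in the order they appear in the @SQ lines of a
# SAM-style header file (a .dict file or a saved BAM header).
contig_order() {
    awk -F'\t' '$1 == "@SQ" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SN:/) print substr($i, 4)
    }' "$1"
}

# Report whether two header files list the same contigs in the same order.
same_contig_order() {
    if [ "$(contig_order "$1")" = "$(contig_order "$2")" ]; then
        echo "same order"
    else
        echo "different order"
    fi
}

# Example usage with placeholder file names, assuming samtools is installed:
#   samtools view -H input.bam > bam_header.txt
#   same_contig_order bam_header.txt reference.dict
```

If this prints "different order", the BAM is a candidate for ReorderSam. Note that it also prints "different order" when the two files contain different sets of contigs, which ReorderSam alone may not resolve.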

Post edited by Geraldine_VdAuwera on

Comments

  • mehaffeymg Posts: 2 Member

    Hello, I am running picard-1.79 ReorderSam. The BAM file was created with a contig order that GATK does not accept. I have another .fa file with a matching .dict file that I use all the time for the GATK pipeline. However, when I try to reorder these BAM files, which contain additional contigs, the ReorderSam tool reorders chrM-chrY fine and then throws this exception:
    Exception in thread "main" net.sf.picard.PicardException: New reference sequence does not contain a matching contig for chr6_ssto_hap7
    I thought these reads were supposed to be discarded by the ReorderSam tool. Suggestions?

  • Geraldine_VdAuwera Posts: 8,019 Administrator, GATK Dev admin

    I would suggest asking the Picard team, who are responsible for supporting this tool. We can't help you with this, sorry.

    Geraldine Van der Auwera, PhD

  • Zaki Posts: 4 Member

    @Geraldine_VdAuwera Dear Geraldine, could I ask: once a .bam has been realigned and recalibrated with the GATK tools (Base Recalibration), is it safe to re-apply SortSam & ReorderSam on the .bam file?

    Would re-applying SortSam & ReorderSam to a recalibrated .bam undo any changes?

    Many thanks

  • Geraldine_VdAuwera Posts: 8,019 Administrator, GATK Dev admin

    Hi @Zaki,

    I'm not sure why you would want to do that, but it would not undo the recalibration, no.

    Geraldine Van der Auwera, PhD

  • YubinTse Boston Posts: 4 Member

    We are now using hg19. How can I get a reference similar to REFERENCE=Homo_sapiens_assembly18.karyotypic.fasta?

  • YubinTse Boston Posts: 4 Member

    @YubinTse said:
    We are now using hg19. How can I get a reference similar to REFERENCE=Homo_sapiens_assembly18.karyotypic.fasta?

    Or is hg19.fa karyotypic or lexicographic?

  • Geraldine_VdAuwera Posts: 8,019 Administrator, GATK Dev admin

    @YubinTse

    Do you mean you're looking for the hg19 reference file? If so, we provide it in our resource bundle (see FAQs for details on how to get it).

    Geraldine Van der Auwera, PhD

  • YubinTse Boston Posts: 4 Member

    @Geraldine_VdAuwera said:
    @YubinTse

    Do you mean you're looking for the hg19 reference file? If so, we provide it in our resource bundle (see FAQs for details on how to get it).

    Well, I got the hg19 reference from UCSC. However, after I used ReorderSam, I found that the output is still lexicographic instead of karyotypic, like this:
    INFO 2014-07-29 19:08:38 ReorderSam Reordering read contig chr1 [index=0] to => ref contig chr1 [index=0]
    INFO 2014-07-29 19:08:38 ReorderSam Reordering read contig chr10 [index=1] to => ref contig chr10 [index=1]
    INFO 2014-07-29 19:08:38 ReorderSam Reordering read contig chr11 [index=2] to => ref contig chr11 [index=2]

    I don't know why.
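
A log like the one above, where chr10 directly follows chr1, suggests that the reference itself lists its contigs lexicographically, so ReorderSam has nothing to change. As a rough sketch, the check can be automated by looking at whether contig 10 comes before contig 2. The function name guess_contig_ordering is hypothetical, and the test is only a heuristic: "numeric" means chr2 precedes chr10, not that the order matches any particular GATK bundle.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads contig names, one per line, on stdin and
# prints "lexicographic", "numeric" or "unknown".
guess_contig_ordering() {
    awk '
        { name = $1; sub(/^chr/, "", name) }      # accept "chr2" and "2"
        name == "2"  && !pos2  { pos2 = NR }       # first line naming contig 2
        name == "10" && !pos10 { pos10 = NR }      # first line naming contig 10
        END {
            if (!pos2 || !pos10)   print "unknown"
            else if (pos10 < pos2) print "lexicographic"
            else                   print "numeric"
        }'
}

# Example usage with a placeholder file name; the first column of a
# .fai index holds the contig names in file order:
#   cut -f1 hg19.fa.fai | guess_contig_ordering
```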

  • Hasani Germany Posts: 22 Member

    Hi,

    I'm facing a problem with GATK: it does not accept my sorted BAM. I did the following:

    (1) Add group ID:

    java -jar $PICARDPATH/AddOrReplaceReadGroups.jar \
    I=$infile \
    O=$o_picard/$outfile.grID.bam \
    LB=anything \
    PL=anything \
    PU=anything \
    SM=anything

    (2) Reorder and Index bam file:

    java -jar $PICARDPATH/ReorderSam.jar \
    I= $o_picard/$outfile.grID.bam \
    O= $o_picard/$outfile.sorted.grID.bam
    samtools index $o_picard/$outfile.sorted.grID.bam

    (3) Call GATK:

    java -Xmx4g -jar $GATKPATH/GenomeAnalysisTK.jar \
    -T RealignerTargetCreator \
    -R $Ref \
    -I $o_picard/$outfile.sorted.grID.bam \
    -o $o_realign/list

    I'm not sure why I still get the error MESSAGE: Lexicographically sorted human genome sequence detected in reference.

    Neither sorting with samtools nor reordering is working! The .dict has already been generated for the reference.

    Could you please help?

    Thanks!

  • Geraldine_VdAuwera Posts: 8,019 Administrator, GATK Dev admin

    @Hasani, your ReorderSam command is not reordering anything. You're not providing it with the reference that you want to reorder against.

    Geraldine Van der Auwera, PhD

  • YubinTse Boston Posts: 4 Member

    @Geraldine_VdAuwera
    Hi,
    I downloaded hg19 again, and this time it works! Yes, it seems the hg19.fa on UCSC's site is not right. Thank you.

  • Geraldine_VdAuwera Posts: 8,019 Administrator, GATK Dev admin

    @Hasani, look at the manual page for ReorderSam here: http://picard.sourceforge.net/command-line-overview.shtml#ReorderSam

    When you write the ReorderSam command, you need to include the REFERENCE= argument. Give it the reference you want to use with GATK (the one you downloaded from our bundle).

    Geraldine Van der Auwera, PhD
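
To illustrate the advice above, here is a small sketch of a wrapper that refuses to build a ReorderSam command unless a reference is supplied. The function name reorder_cmd is hypothetical. The ReorderSam.jar layout and the $PICARDPATH variable are taken from the commands posted in this thread, and the function only prints the command so that it can be inspected before it is run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a ReorderSam command line for the given
# input BAM ($1), output BAM ($2) and reference FASTA ($3).
# Fails when the reference is missing, because ReorderSam takes the
# target contig order from it.
reorder_cmd() {
    if [ -z "${3:-}" ]; then
        echo "error: REFERENCE is required" >&2
        return 1
    fi
    printf 'java -jar %s/ReorderSam.jar I=%s O=%s REFERENCE=%s\n' \
        "${PICARDPATH:-.}" "$1" "$2" "$3"
}

# Example usage with placeholder file names:
#   reorder_cmd sample.grID.bam sample.reordered.grID.bam ucsc.hg19.fasta
```

Note that there is no space after each "=" in the printed command; with a space, the shell passes the value as a separate argument.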

  • Hasani Germany Posts: 22 Member

    Thank you! The -REFERENCE parameter was deleted accidentally while I was writing this thread but it is there in my script!

  • Geraldine_VdAuwera Posts: 8,019 Administrator, GATK Dev admin

    OK, so now please write the full command you ran, with the actual file names (not $ref), and the output of the command.

    Geraldine Van der Auwera, PhD

  • Hasani Germany Posts: 22 Member

    Hello again,

    First, many thanks for helping. I finally figured out what caused the sorting error.

    I first sorted the reads using samtools and then filtered out those that do not overlap certain genes. The sorting method samtools uses is not recognized by GATK [1], and Picard reordering did not help.

    Therefore, I first sorted the reads using Picard and then ran the above pipeline, which finally ran problem-free.

    I'm using the ucsc.hg19 provided in bundle/2.8/hg19/ as my -REFERENCE; the next step will be trying my own downloaded ucsc.hg19, for which I created the .dict and .fai.

    Thanks again!


    [1] https://www.biostars.org/p/8392/

  • lcollopy Posts: 2 Member

    I am having an issue because my hg19.fa was not karyotypic so I downloaded a new version from the GATK FTP and re-indexed it. It looks in order to me but I am getting the same error telling me that:

    Lexicographically sorted human genome sequence detected in reads

    How can I sort my reference file? I have managed to sort my .bam file but can't do the same for the reference.

  • Sheila Broad Institute Posts: 1,397 Member, GATK Dev, Broadie, Moderator, DSDE Dev admin