The current GATK version is 3.4-0

# Errors about BAM files not being ordered properly

Posts: 71 GATK Dev mod
edited May 15

This error occurs when, for example, a collaborator gives you a BAM that's derived from what was originally the same reference as the one you are using, but for whatever reason the contigs are not sorted in the same order. The GATK can be particular about the ordering of a BAM file, so it will fail with an error in this case.

So what do you do? You use a Picard tool called ReorderSam to, well, reorder your BAM file.

Here's an example usage where we reorder a BAM file that was sorted lexicographically so that the output will be another BAM, but this time sorted karyotypically:

java -jar picard.jar ReorderSam \
I=lexicographic.bam \
O=karyotypic.bam \
REFERENCE=Homo_sapiens_assembly18.karyotypic.fasta
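
To make the two orderings concrete, here is a minimal shell sketch. It assumes GNU `sort` (the `-V` flag is a GNU extension) and uses a made-up list of contig names restricted to numbered chromosomes. It only illustrates the difference between the orderings and is not a way to produce the contig order of a real reference.

```shell
# Made-up contig names, numbered chromosomes only.
contigs=$(printf '%s\n' chr2 chr10 chr1 chr11)

# Lexicographic: names are compared character by character,
# so chr10 sorts before chr2.
lexicographic=$(printf '%s\n' "$contigs" | LC_ALL=C sort | paste -s -d ' ' -)

# Karyotypic (for numbered chromosomes): numeric order, so chr2 sorts before chr10.
# Assumes GNU sort, whose -V flag compares embedded numbers numerically.
karyotypic=$(printf '%s\n' "$contigs" | LC_ALL=C sort -V | paste -s -d ' ' -)

echo "lexicographic: $lexicographic"
echo "karyotypic:    $karyotypic"
```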


This tool requires that you have a correctly sorted version of the reference sequence you used to align your reads. Be aware that this tool will drop reads that don't have equivalent contigs in the new reference (potentially bad, but maybe not). If contigs have the same name in the BAM and the new reference, this tool assumes that the alignment of the read in the new BAM is the same. This is not a liftover tool!
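
Because reads on contigs that are missing from the new reference are affected, it can help to list those contigs before reordering. The following is a rough sketch under stated assumptions: the two headers are made-up excerpts, and `contig_names` is a hypothetical helper, not part of Picard or samtools. In practice the header text could come from `samtools view -H` on the BAM and from the reference's .dict file.

```shell
# Print the contig names (SN: fields) of the @SQ lines read from stdin.
contig_names() {
  awk -F'\t' '$1 == "@SQ" { for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4) }'
}

# Made-up header excerpts standing in for a BAM header and a reference .dict.
bam_header=$(printf '@SQ\tSN:chr1\tLN:100\n@SQ\tSN:chr6_ssto_hap7\tLN:50\n@SQ\tSN:chr2\tLN:90\n')
ref_dict=$(printf '@HD\tVN:1.0\n@SQ\tSN:chr2\tLN:90\n@SQ\tSN:chr1\tLN:100\n')

# Write the reference contig names to a temporary file for grep -f.
ref_names=$(mktemp)
printf '%s\n' "$ref_dict" | contig_names > "$ref_names"

# Contigs named in the BAM header that have no exact match in the reference.
missing=$(printf '%s\n' "$bam_header" | contig_names | grep -vxF -f "$ref_names" || true)
rm -f "$ref_names"

echo "contigs absent from the new reference: ${missing:-none}"
```

The comparison is by exact name only, which mirrors the point above that matching names are assumed to mean matching sequences.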

This tool is part of the Picard package.

Post edited by Geraldine_VdAuwera

• Posts: 2 Member

Hello, I am running picard-1.79 ReorderSam. The BAM file was created with a contig order that GATK does not accept. I have another .fa file with a matching .dict file that I use all the time for the GATK pipeline. However, when I try to reorder these BAM files, which contain additional contigs, the ReorderSam tool reorders chrM-chrY fine and then throws this exception:
Exception in thread "main" net.sf.picard.PicardException: New reference sequence does not contain a matching contig for chr6_ssto_hap7
I thought these reads were supposed to be discarded by the ReorderSam tool. Suggestions?

I would suggest asking the Picard team, who are responsible for supporting this tool. We can't help you with this, sorry.

Geraldine Van der Auwera, PhD

• Posts: 4 Member

@Geraldine_VdAuwera Dear Geraldine, could I ask: once a .bam has been realigned and recalibrated with the GATK tools (Base Recalibration), is it safe to re-apply SortSam and ReorderSam to the .bam file?

Would re-applying SortSam and ReorderSam to a recalibrated .bam undo any changes?

Many thanks

Hi @Zaki,

I'm not sure why you would want to do that, but it would not undo the recalibration, no.

Geraldine Van der Auwera, PhD

• Boston Posts: 4 Member

Now that we are using hg19, how can I get data similar to REFERENCE=Homo_sapiens_assembly18.karyotypic.fasta?

• Boston Posts: 4 Member

@YubinTse said:
Now that we are using hg19, how can I get data similar to REFERENCE=Homo_sapiens_assembly18.karyotypic.fasta?

Or is hg19.fa karyotypic or lexicographic?

@YubinTse

Do you mean you're looking for the hg19 reference file? If so, we provide it in our resource bundle (see FAQs for details on how to get it).

Geraldine Van der Auwera, PhD

• Boston Posts: 4 Member

@Geraldine_VdAuwera said:
@YubinTse

Do you mean you're looking for the hg19 reference file? If so, we provide it in our resource bundle (see FAQs for details on how to get it).

Well, I got the hg19 reference from UCSC. However, after I used ReorderSam, I found it is also lexicographic instead of karyotypic, like this:
INFO 2014-07-29 19:08:38 ReorderSam Reordering read contig chr1 [index=0] to => ref contig chr1 [index=0]
INFO 2014-07-29 19:08:38 ReorderSam Reordering read contig chr10 [index=1] to => ref contig chr10 [index=1]
INFO 2014-07-29 19:08:38 ReorderSam Reordering read contig chr11 [index=2] to => ref contig chr11 [index=2]

I don't know why.
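
A quick way to tell whether a contig list is in lexicographic order is to compare it with its own sorted copy. The sketch below uses made-up contig lists, and `contig_order` is a hypothetical helper. For a real reference, the names could be taken from the first column of its .fai index, for example with `cut -f1 hg19.fa.fai` (a hypothetical file name).

```shell
# Report whether the contig names on stdin (one per line, in file order)
# are already in lexicographic order.
contig_order() {
  names=$(cat)
  sorted=$(printf '%s\n' "$names" | LC_ALL=C sort)
  if [ "$names" = "$sorted" ]; then
    echo "lexicographic"
  else
    echo "not lexicographic"
  fi
}

# Made-up contig lists: the first follows the order seen in the log above,
# the second puts chr2 before chr10.
first_order=$(printf '%s\n' chr1 chr10 chr11 chr2 | contig_order)
second_order=$(printf '%s\n' chr1 chr2 chr10 chr11 | contig_order)

echo "first list:  $first_order"
echo "second list: $second_order"
```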

• Germany Posts: 22 Member

Hi,

I'm facing a problem with GATK: it does not accept my sorted BAM. I did the following:

(1) Add read groups:

java -jar $PICARDPATH/AddOrReplaceReadGroups.jar \
I=$infile \
O=$o_picard/$outfile.grID.bam \
LB=anything \
PL=anything \
PU=anything \
SM=anything

(2) Reorder and Index bam file:

java -jar $PICARDPATH/ReorderSam.jar \
I=$o_picard/$outfile.grID.bam \
O=$o_picard/$outfile.sorted.grID.bam

samtools index $o_picard/$outfile.sorted.grID.bam

(3) Call GATK:

java -Xmx4g -jar $GATKPATH/GenomeAnalysisTK.jar \
-T RealignerTargetCreator \
-R $Ref \
-I $o_picard/$outfile.sorted.grID.bam \
-o $o_realign/list

I'm not sure why I still get the error MESSAGE: Lexicographically sorted human genome sequence detected in reference.

Neither sorting with samtools nor reordering is working! The .dict file has already been generated for the reference.

Thanks!

@Hasani, your ReorderSam command is not reordering anything. You're not providing it with the reference that you want to reorder against.

Geraldine Van der Auwera, PhD

• Boston Posts: 4 Member

@Geraldine_VdAuwera
Hi,
I downloaded hg19 again, and this time it works! Yes, it seems the hg19.fa on UCSC's site is not right. Thank you.

@Hasani, look at the manual page for ReorderSam here: http://picard.sourceforge.net/command-line-overview.shtml#ReorderSam

When you write the ReorderSam command, you need to include the REFERENCE= argument. Give it the reference you want to use with GATK (the one you downloaded from our bundle).

Geraldine Van der Auwera, PhD
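
For illustration, a ReorderSam invocation that does pass a reference could look like the sketch below. All paths and file names are placeholders, not taken from the posts above, and the per-tool `ReorderSam.jar` layout follows the commands quoted earlier in this thread. The script only assembles and prints the command so it can be inspected first.

```shell
# Placeholder paths and file names; substitute your own.
PICARDPATH=/path/to/picard
in_bam=sample.grID.bam
out_bam=sample.reordered.grID.bam
ref=ucsc.hg19.fasta   # the reference you intend to use with GATK

# Assemble the command as positional parameters.
set -- java -jar "$PICARDPATH/ReorderSam.jar" \
  "I=$in_bam" \
  "O=$out_bam" \
  "REFERENCE=$ref"

# Print the command for inspection. To actually run it, execute: "$@"
cmd="$*"
echo "$cmd"
```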

• Germany Posts: 22 Member

Thank you! The -REFERENCE parameter was deleted accidentally while I was writing this post, but it is there in my script!

OK, so now please write the full command you ran, with the actual file names (not $ref), and the output of the command.

Geraldine Van der Auwera, PhD

• Germany Posts: 22 Member

Hello again,

First, many thanks for helping. I finally figured out what caused the sorting error.

I first sorted the reads using samtools and then filtered out those that do not overlap certain genes. The sorting method samtools uses is not recognizable by GATK [1], and Picard reordering did not help.

Therefore, I sorted the reads first using Picard and then ran the above pipeline, which finally ran problem-free.

I'm using the ucsc.hg19 provided in bundle/2.8/hg19/ as my -REFERENCE; the next step will be trying my own downloaded ucsc.hg19, for which I created the dict and fai files.

Thanks again!
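
Before handing a BAM to GATK, one way to check that it agrees with the reference on contig order is to compare the @SQ lines of both headers in sequence. This is a minimal sketch with made-up headers; `contigs_in_order` is a hypothetical helper. Real input could come from `samtools view -H` on the BAM and from the reference's .dict file.

```shell
# Print the contig names (SN: fields) of the @SQ lines on stdin, in header order.
contigs_in_order() {
  awk -F'\t' '$1 == "@SQ" { for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4) }'
}

# Made-up headers: same contigs, different order.
bam_header=$(printf '@SQ\tSN:chr1\tLN:100\n@SQ\tSN:chr10\tLN:80\n@SQ\tSN:chr2\tLN:90\n')
ref_dict=$(printf '@SQ\tSN:chr1\tLN:100\n@SQ\tSN:chr2\tLN:90\n@SQ\tSN:chr10\tLN:80\n')

bam_order=$(printf '%s\n' "$bam_header" | contigs_in_order)
ref_order=$(printf '%s\n' "$ref_dict" | contigs_in_order)

# The two lists must match line for line to count as the same order.
if [ "$bam_order" = "$ref_order" ]; then
  verdict="same contig order"
else
  verdict="contig order differs"
fi
echo "$verdict"
```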

• Posts: 2 Member

I am having an issue because my hg19.fa was not karyotypic, so I downloaded a new version from the GATK FTP and re-indexed it. It looks in order to me, but I am still getting the same error, telling me that:

Lexicographically sorted human genome sequence detected in reads

How can I sort my reference file? I have managed to sort my .bam file but can't do the same for the reference.