Companion Utilities: ReorderSam

delangel Posts: 71 GATK Developer mod
edited March 2013 in Methods and Workflows

ReorderSam

The GATK can be particular about the ordering of a BAM file. If you find yourself in the not uncommon situation of having created or received BAM files sorted in a bad order, you can use the tool ReorderSam to generate a new BAM file where the reads have been reordered to match a well-ordered reference file.

java -jar picard/ReorderSam.jar I= lexicographic.bam O= karyotypic.bam REFERENCE= Homo_sapiens_assembly18.karyotypic.fasta
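A quick way to tell which order a contig list is in is to look at where chr10 falls relative to chr2: a lexicographic sort puts chr10 before chr2, while the karyotypic order puts it after. The shell sketch below checks only that. The function name `chr10_before_chr2` is made up for illustration, it assumes UCSC-style `chr` contig names, and it is a rough heuristic, not necessarily the check GATK performs.

```shell
# Hypothetical helper: reads contig names (one per line) on stdin and
# succeeds (exit 0) only if chr10 is listed before chr2.
chr10_before_chr2() {
  awk '
    $0 == "chr10" && !ten { ten = NR }   # remember the first line naming chr10
    $0 == "chr2"  && !two { two = NR }   # remember the first line naming chr2
    END { exit !(ten && two && ten < two) }
  '
}

# Example with two hard-coded contig lists:
if printf 'chr1\nchr10\nchr11\nchr2\n' | chr10_before_chr2; then
  echo "lexicographic"
fi
if ! printf 'chrM\nchr1\nchr2\nchr10\n' | chr10_before_chr2; then
  echo "not lexicographic"
fi
```

For a real reference, the contig names could be taken from the first column of its index, for example `cut -f1 reference.fasta.fai | chr10_before_chr2` (the file name here is a placeholder).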

This tool requires that you have a correctly sorted version of the reference sequence you used to align your reads. It will drop reads that don't have equivalent contigs in the new reference (potentially bad, but maybe not). If a contig has the same name in the BAM and in the new reference, the tool assumes that the alignment of the read in the new BAM is the same. This is not a liftover tool!
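Because contigs are matched by name, it can help to list up front which contigs in the BAM have no counterpart in the new reference. The sketch below is one way to do that from the `@SQ` header lines; the function names `contig_names` and `missing_contigs` are invented for this example, and it assumes tab-separated SAM-style header text such as a .dict file or the output of `samtools view -H`.

```shell
# Hypothetical helper: print contig names, in file order, from
# SAM-header-style text on stdin (the SN: field of each @SQ line).
contig_names() {
  awk -F'\t' '$1 == "@SQ" {
    for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4)
  }'
}

# Hypothetical helper: print names that appear in the first list but not
# in the second. Both arguments are files with one contig name per line.
missing_contigs() {
  grep -F -x -v -f "$2" "$1"
}

# Possible usage (file names are placeholders):
#   samtools view -H input.bam | contig_names > bam_contigs.txt
#   contig_names < new_reference.dict > ref_contigs.txt
#   missing_contigs bam_contigs.txt ref_contigs.txt
```

Any name printed by `missing_contigs` is a contig whose reads have no equivalent in the new reference.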

The tool, though once in the GATK, is now part of the Picard package.

Post edited by Geraldine_VdAuwera

Comments

  • mehaffeymg Posts: 2 Member

    Hello, I am running picard-1.79 ReorderSam. The BAM file was created in an order that GATK does not accept. I have another .fa file with a matching .dict file that I use all the time for the GATK pipeline. However, when I try to reorder these BAM files, which contain additional contigs, ReorderSam reorders chrM-chrY fine and then throws this exception:
    Exception in thread "main" net.sf.picard.PicardException: New reference sequence does not contain a matching contig for chr6_ssto_hap7
    I thought these reads were supposed to be discarded by the ReorderSam tool. Suggestions?

  • Geraldine_VdAuwera Posts: 7,122 Administrator, GATK Developer admin

    I would suggest asking the Picard team, who are responsible for supporting this tool. We can't help you with this, sorry.

    Geraldine Van der Auwera, PhD

  • Zaki Posts: 4 Member

    @Geraldine_VdAuwera Dear Geraldine, could I ask: once a .bam has been realigned and recalibrated with the GATK tools (Base Recalibration), is it safe to re-apply SortSam and ReorderSam to the .bam file?

    Would re-applying SortSam and ReorderSam to a recalibrated .bam undo any changes?

    Many thanks

  • Geraldine_VdAuwera Posts: 7,122 Administrator, GATK Developer admin

    Hi @Zaki,

    I'm not sure why you would want to do that, but it would not undo the recalibration, no.

    Geraldine Van der Auwera, PhD

  • YubinTse Boston Posts: 4 Member

    Now we are using hg19. How can I get data similar to REFERENCE= Homo_sapiens_assembly18.karyotypic.fasta?

  • YubinTse Boston Posts: 4 Member

    @YubinTse said:
    Now we are using hg19. How can I get data similar to REFERENCE= Homo_sapiens_assembly18.karyotypic.fasta?

    Or is hg19.fa karyotypic or lexicographic?

  • Geraldine_VdAuwera Posts: 7,122 Administrator, GATK Developer admin

    @YubinTse

    Do you mean you're looking for the hg19 reference file? If so, we provide it in our resource bundle (see FAQs for details on how to get it).

    Geraldine Van der Auwera, PhD

  • YubinTse Boston Posts: 4 Member

    @Geraldine_VdAuwera said:
    @YubinTse

    Do you mean you're looking for the hg19 reference file? If so, we provide it in our resource bundle (see FAQs for details on how to get it).

    Well, I got the hg19 reference from UCSC. However, after running ReorderSam, I found that the output is still lexicographic instead of karyotypic, like this:
    INFO 2014-07-29 19:08:38 ReorderSam Reordering read contig chr1 [index=0] to => ref contig chr1 [index=0]
    INFO 2014-07-29 19:08:38 ReorderSam Reordering read contig chr10 [index=1] to => ref contig chr10 [index=1]
    INFO 2014-07-29 19:08:38 ReorderSam Reordering read contig chr11 [index=2] to => ref contig chr11 [index=2]

    I don't know why.

  • Hasani Germany Posts: 20 Member

    Hi,

    I'm having a problem with GATK: it does not accept my sorted BAM. I did the following:

    (1) Add group ID:

    java -jar $PICARDPATH/AddOrReplaceReadGroups.jar \
    I=$infile \
    O=$o_picard/$outfile.grID.bam \
    LB=anything \
    PL=anything \
    PU=anything \
    SM=anything

    (2) Reorder and index the BAM file:

    java -jar $PICARDPATH/ReorderSam.jar \
    I= $o_picard/$outfile.grID.bam \
    O= $o_picard/$outfile.sorted.grID.bam
    samtools index $o_picard/$outfile.sorted.grID.bam

    (3) Call GATK:

    java -Xmx4g -jar $GATKPATH/GenomeAnalysisTK.jar \
    -T RealignerTargetCreator \
    -R $Ref \
    -I $o_picard/$outfile.sorted.grID.bam \
    -o $o_realign/list

    I'm not sure why I still get the error MESSAGE: Lexicographically sorted human genome sequence detected in reference.

    Neither sorting with samtools nor reordering works! The .dict file has already been generated for the reference.

    Could you please help?

    Thanks!

  • Geraldine_VdAuwera Posts: 7,122 Administrator, GATK Developer admin

    @Hasani, your ReorderSam command is not reordering anything. You're not providing it with the reference that you want to reorder against.

    Geraldine Van der Auwera, PhD

  • YubinTse Boston Posts: 4 Member

    @Geraldine_VdAuwera
    Hi,
    I downloaded hg19 again, and this time it works! Yes, it seems the hg19.fa I got from UCSC was not right. Thank you.

  • Geraldine_VdAuwera Posts: 7,122 Administrator, GATK Developer admin

    @Hasani, look at the manual page for ReorderSam here: http://picard.sourceforge.net/command-line-overview.shtml#ReorderSam

    When you write the ReorderSam command, you need to include the REFERENCE= argument. Give it the reference you want to use with GATK (the one you downloaded from our bundle).

    Geraldine Van der Auwera, PhD

  • Hasani Germany Posts: 20 Member

    Thank you! The REFERENCE= parameter was deleted accidentally while I was writing this post, but it is there in my script!

  • Geraldine_VdAuwera Posts: 7,122 Administrator, GATK Developer admin

    OK, so now please write the full command you ran, with the actual file names (not $ref), and the output of the command.

    Geraldine Van der Auwera, PhD

  • Hasani Germany Posts: 20 Member

    Hello again,

    First, many thanks for helping. I finally figured out what caused the sorting error.

    I first sorted the reads using samtools and then filtered out those that do not overlap certain genes. The sort order samtools uses is not recognized by GATK [1], and reordering with Picard did not help.

    Therefore, I sorted the reads with Picard first and then ran the above pipeline, which finally went through without problems.

    I'm using the ucsc.hg19 provided in bundle/2.8/hg19/ as my REFERENCE=; the next step will be to try my own downloaded ucsc.hg19, for which I created the dict and fai.

    Thanks again!


    [1] https://www.biostars.org/p/8392/

  • lcollopy Posts: 2 Member

    I am having an issue because my hg19.fa was not karyotypic so I downloaded a new version from the GATK FTP and re-indexed it. It looks in order to me but I am getting the same error telling me that:

    Lexicographically sorted human genome sequence detected in reads

    How can I sort my reference file? I have managed to sort my .bam file but can't do the same for the reference.

  • Sheila Broad Institute Posts: 1,001 Member, GATK Developer, Broadie, Moderator admin