Why does SamReader/SamLocusIterator generate inconsistent results on Ubuntu 14 but not Ubuntu 12?

Hi,

I’ve been trying to track down an issue that cropped up when we were validating our pipeline on a newer system. We have a test that produces different output each time it’s run (it seems to cycle randomly between five different outputs), but only on Ubuntu 14. The same test produces the same result every time on Ubuntu 12 and on a Mac OS X desktop.

I was able to create an isolated test using only SamReader and SamLocusIterator that simply iterates over the BAM and writes out every locus to a text file. This test exhibits the same behavior, i.e. its output cycles between five different versions.

The test code is here:

http://paste.openstack.org/show/412884/

Note that this will generate a lot of output, so it’s best to run with a small BAM file. My test file of 170M produces an 800M output file.
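One way to confirm how many distinct outputs a set of repeated runs produced, without comparing 800M files by eye, is to hash each output file and group the files by digest. The following is only a minimal sketch using the Java standard library; the class name `OutputGrouper` and the file names passed on the command line are illustrative assumptions and are not part of the linked test code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: groups output files from repeated runs by SHA-256 digest.
class OutputGrouper {

    // Streams the file through SHA-256 so a large output never has to fit in memory.
    static String sha256Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        byte[] buffer = new byte[1 << 16];
        try (InputStream in = Files.newInputStream(file)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Maps each digest to the files that share it; one map entry per distinct output.
    static Map<String, List<Path>> groupByDigest(List<Path> files)
            throws IOException, NoSuchAlgorithmException {
        Map<String, List<Path>> groups = new TreeMap<>();
        for (Path file : files) {
            String key = sha256Hex(file);
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(file);
        }
        return groups;
    }

    public static void main(String[] args) throws Exception {
        List<Path> files = new ArrayList<>();
        for (String arg : args) {
            files.add(Paths.get(arg));
        }
        Map<String, List<Path>> groups = groupByDigest(files);
        System.out.println(groups.size() + " distinct output(s) across " + files.size() + " run(s)");
        for (Map.Entry<String, List<Path>> entry : groups.entrySet()) {
            System.out.println(entry.getKey().substring(0, 12) + "  " + entry.getValue());
        }
    }
}
```

After compiling with `javac OutputGrouper.java`, it could be run as `java OutputGrouper run1.out run2.out run3.out` (hypothetical file names), once per batch of repeated runs on each operating system.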

I tested using JDK 8u51 on all machines with the latest Picard tools (1.138), although the problem originally showed up with a much earlier version of Picard tools (1.93) and JDK 7.

I’m not a bioinformatician so I don’t really know what the expected behavior is, although I couldn’t find the source of this variation by looking at the SAM tools source. If this is expected behavior that is OK, but I’d like confirmation of it before considering that the validation is complete. Better yet would be a way to disable or control this behavior to allow for consistent test results.

Thanks,

Phil Shapiro

Answers

  • Sheila, Broad Institute (Member, Broadie, Moderator)

    @Phil Shapiro
    Hi Phil Shapiro,

    Can you please describe what exactly the 5 different outputs are? Maybe you can post snippets of the 5 different outputs where they differ?

    Thanks,
    Sheila

  • Phil Shapiro, Cambridge (Member)

    I'm happy to, although without the original BAM and interval list files I'm not sure how helpful it will be. We were able to reproduce it using a few different BAM files so it should be reproducible as long as you have an Ubuntu 14 install to try it on.

    /tmp/test $ cmp raw_itertor_data.out.1 raw_itertor_data.out.2
    raw_itertor_data.out.1 raw_itertor_data.out.2 differ: byte 147945346, line 10185157
    /tmp/test $ awk 'NR>10185155{print};NR==10185160{exit};' raw_itertor_data.out.1
    chr3:382591
    chr3:382592
    chr3:382593
    chr3:382594
    chr3:382595
    /tmp/test $ awk 'NR>10185155{print};NR==10185160{exit};' raw_itertor_data.out.2
    chr3:382591
    chr3:382720
    chr3:382721
    chr3:382722
    chr3:382723
    /tmp/test $ cmp raw_itertor_data.out.1 raw_itertor_data.out.3
    raw_itertor_data.out.1 raw_itertor_data.out.3 differ: byte 147944578, line 10185093
    /tmp/test $ cmp raw_itertor_data.out.1 raw_itertor_data.out.4
    raw_itertor_data.out.1 raw_itertor_data.out.4 differ: byte 147981406, line 10188162
    /tmp/test $ cmp raw_itertor_data.out.1 raw_itertor_data.out.5
    raw_itertor_data.out.1 raw_itertor_data.out.5 differ: byte 147939848, line 10184699
    /tmp/test $

  • Sheila, Broad Institute (Member, Broadie, Moderator)

    @Phil Shapiro
    Hi,

    I am not sure what is going on here, but I think I will have the Picard team look at this. Can you submit a bug report? Instructions are here: http://gatkforums.broadinstitute.org/discussion/1894/how-do-i-submit-a-detailed-bug-report

    Thanks,
    Sheila

  • Phil Shapiro, Cambridge (Member)

    I quickly ran into a lack of knowledge on my part when trying to create a small BAM file. I was trying to use GATK PrintReads and got an error that a reference (fasta) file was required, but I don't have one for my data file. The error I got was "Walker requires a reference but none was provided."

    Is this file something I can generate from the BAM? Sorry for the basic question; as I said before, I'm not a bioinformatician and I don't really know much about these tools or file formats.

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)

    Phil, please see the FAQs about input requirements (in the Guide section). There are some basic things you will need to learn in order to use these tools, otherwise things won't make sense and you'll run into a lot of these issues.
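For the comparison step shown earlier in the thread (cmp to find the first difference, then awk to print the surrounding lines), the same check can be sketched as a single pass in Java. This is a rough illustration using only the standard library, not something posted by the participants; the class name `FirstDifference` and its method name are hypothetical, and it assumes the locus files are plain ASCII/UTF-8 text with one locus per line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Objects;

// Hypothetical helper: reports where two locus dumps first diverge.
class FirstDifference {

    // Returns the 1-based number of the first line that differs, or -1 if all lines match.
    // Lines are compared without their terminators, so a missing final newline is not
    // reported as a difference.
    static long firstDifferingLine(Path left, Path right) throws IOException {
        try (BufferedReader a = Files.newBufferedReader(left, StandardCharsets.UTF_8);
             BufferedReader b = Files.newBufferedReader(right, StandardCharsets.UTF_8)) {
            long lineNumber = 0;
            while (true) {
                String x = a.readLine();
                String y = b.readLine();
                lineNumber++;
                if (x == null && y == null) {
                    return -1; // both files ended together with no mismatch
                }
                if (!Objects.equals(x, y)) {
                    return lineNumber; // differing content, or one file ended early
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        long line = firstDifferingLine(Paths.get(args[0]), Paths.get(args[1]));
        if (line < 0) {
            System.out.println("no differing lines");
        } else {
            System.out.println("first difference at line " + line);
        }
    }
}
```

Reading both files in lockstep keeps memory use constant regardless of file size, which matters for outputs in the hundreds of megabytes.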
