Why does SamReader/SamLocusIterator generate inconsistent results on Ubuntu 14 but not Ubuntu 12?

Phil Shapiro (Cambridge; Member)


I’ve been trying to track down an issue that cropped up when we were validating our pipeline on a newer system. We have a test that produces different output each time it’s run (it seems to cycle randomly between five different outputs), but only on Ubuntu 14. The same test produces the same result every time on Ubuntu 12 and on a Mac OS X desktop.

I was able to create an isolated test using only SamReader and SamLocusIterator that simply iterates over the BAM and writes out every locus to a text file. The output file exhibits the same behavior, i.e. it cycles between five different outputs.

The test code is here:


Note that this will generate a lot of output, so it’s best to run with a small BAM file. My test file of 170M produces an 800M output file.
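
Because the output files are large, comparing checksums is a cheap way to count how many distinct outputs a set of runs actually produced. The following is a minimal sketch, not part of the original test: the helper name `count_distinct_outputs` is hypothetical, and it assumes each run's output has already been saved to its own file.

```shell
# count_distinct_outputs FILE...
# Prints how many distinct file contents occur among the given files,
# judged by POSIX cksum (CRC plus byte count). The function name is
# hypothetical and used for illustration only.
count_distinct_outputs() {
    for f in "$@"; do
        # Read from stdin so cksum prints no file name, only "CRC SIZE".
        cksum < "$f"
    done | sort -u | wc -l | tr -d ' '
}
```

For example, `count_distinct_outputs raw_itertor_data.out.*` would print 1 if every run produced the same file, and a larger number otherwise.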

I tested using JDK 8u51 on all machines with the latest Picard tools (1.138), although the problem originally showed up with a much earlier version of Picard tools (1.93) and JDK 7.

I’m not a bioinformatician, so I don’t really know what the expected behavior is, and I couldn’t find the source of this variation by looking at the SAM tools source. If this is expected behavior that is OK, but I’d like confirmation of it before considering the validation complete. Better yet would be a way to disable or control this behavior to allow for consistent test results.


Phil Shapiro



  • Sheila (Broad Institute; Member, Broadie admin)

    @Phil Shapiro
    Hi Phil Shapiro,

    Can you please describe what exactly the 5 different outputs are? Maybe you can post snippets of the 5 different outputs where they differ?


  • Phil Shapiro (Cambridge; Member)

    I'm happy to, although without the original BAM and interval list files I'm not sure how helpful it will be. We were able to reproduce it using a few different BAM files so it should be reproducible as long as you have an Ubuntu 14 install to try it on.

        /tmp/test $ cmp raw_itertor_data.out.1 raw_itertor_data.out.2
        raw_itertor_data.out.1 raw_itertor_data.out.2 differ: byte 147945346, line 10185157
        /tmp/test $ awk 'NR>10185155{print};NR==10185160{exit};' raw_itertor_data.out.1
        chr3:382591
        chr3:382592
        chr3:382593
        chr3:382594
        chr3:382595
        /tmp/test $ awk 'NR>10185155{print};NR==10185160{exit};' raw_itertor_data.out.2
        chr3:382591
        chr3:382720
        chr3:382721
        chr3:382722
        chr3:382723
        /tmp/test $ cmp raw_itertor_data.out.1 raw_itertor_data.out.3
        raw_itertor_data.out.1 raw_itertor_data.out.3 differ: byte 147944578, line 10185093
        /tmp/test $ cmp raw_itertor_data.out.1 raw_itertor_data.out.4
        raw_itertor_data.out.1 raw_itertor_data.out.4 differ: byte 147981406, line 10188162
        /tmp/test $ cmp raw_itertor_data.out.1 raw_itertor_data.out.5
        raw_itertor_data.out.1 raw_itertor_data.out.5 differ: byte 147939848, line 10184699

  • Sheila (Broad Institute; Member, Broadie admin)

    @Phil Shapiro

    I am not sure what is going on here, but I think I will have the Picard team look at this. Can you submit a bug report? Instructions are here: http://gatkforums.broadinstitute.org/discussion/1894/how-do-i-submit-a-detailed-bug-report


  • Phil Shapiro (Cambridge; Member)

    I quickly ran into a lack of knowledge on my part when trying to create a small BAM file. I was trying to use GATK PrintReads and got an error that a reference (fasta) file was required, but I don't have one for my data file. The error I got was "Walker requires a reference but none was provided."

    Is this file something I can generate from the BAM? Sorry that I don't know much about these tools or file formats; as I said before, I'm not a bioinformatician.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Phil, please see the FAQs about input requirements (in the Guide section). There are some basic things you will need to learn in order to use these tools, otherwise things won't make sense and you'll run into a lot of these issues.
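
For anyone repeating the comparison posted earlier in the thread, the cmp and awk steps can be wrapped in one helper so the line numbers don't have to be copied by hand. This is only a sketch: the function name `first_diff_context` and its optional context argument are hypothetical, and it assumes cmp reports the first difference in the usual "differ: byte N, line M" form.

```shell
# first_diff_context FILE1 FILE2 [CONTEXT]
# Prints the number of the first differing line, then CONTEXT lines
# (default 2) on either side of it from each file.
# Returns 1 if the files are identical, 0 otherwise.
# The function name is hypothetical and used for illustration only.
first_diff_context() {
    file1=$1
    file2=$2
    context=${3:-2}
    if cmp -s "$file1" "$file2"; then
        echo "files are identical"
        return 1
    fi
    # cmp reports "... differ: byte N, line M"; keep only M.
    line=$(cmp "$file1" "$file2" 2>/dev/null |
        sed -n 's/.* line \([0-9][0-9]*\)$/\1/p')
    if [ -z "$line" ]; then
        # cmp reports EOF on stderr when one file is a prefix of the other.
        echo "one file is a prefix of the other"
        return 0
    fi
    echo "first difference at line $line"
    for f in "$file1" "$file2"; do
        echo "== $f"
        # Print the window of lines, then stop reading the (large) file.
        awk -v from=$((line - context)) -v to=$((line + context)) \
            'NR >= from { print } NR == to { exit }' "$f"
    done
}
```

A call such as `first_diff_context raw_itertor_data.out.1 raw_itertor_data.out.2` would then show the loci around the first point where the two runs diverge.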
