Test-drive the GATK tools and Best Practices pipelines on Terra
Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
Why does SamReader/SamLocusIterator generate inconsistent results on Ubuntu 14 but not Ubuntu 12?
I’ve been trying to track down an issue that cropped up when we were validating our pipeline on a newer system. We have a test that produces different output each time it’s run (it seems to cycle randomly between five different outputs), but only on ubuntu 14. The same test produces the same result every time on ubuntu 12, and when run on a Mac OS X desktop.
I was able to create an isolated test using only SamReader and SamLocusIterator that simply iterates over the BAM and writes out every locus to a text file. This file exhibits the same behavior, e.g. it cycles between five different outputs.
The test code is here:
Note that this will generate a lot of output, so it’s best to run with a small BAM file. My test file of 170M produces an 800M output file.
I tested using JDK 8u51 on all machines, and am using the latest picard tools (1.138) although it originally showed up using a much earlier version of picard tools (1.93) and JDK 7.
I’m not a bioinformatician so I don’t really know what the expected behavior is, although I couldn’t find the source of this variation by looking at the SAM tools source. If this is expected behavior that is OK, but I’d like confirmation of it before considering that the validation is complete. Better yet would be a way to disable or control this behavior to allow for consistent test results.