Read the number of expected bins( 65600) but still had more elements in file with IlluminaBasecallsToFastq

rdubin, Albert Einstein College of Medicine, Member

Has this issue (previously reported at https://gatkforums.broadinstitute.org/gatk/discussion/10168/picard-illuminabasecallstosam-clocs-file-issue-more-elements-than-expected ) been fixed? I have just tried IlluminaBasecallsToFastq in picard tools 2.15.0 (Java 1.8.0_20) and got the same error: Read the number of expected bins( 65600) but still had more elements in file. However, with picard tools 1.119 (Java 1.7.0_67) there is no problem and IlluminaBasecallsToFastq completes without issue. In both cases I use NUM_PROCESSORS=4. Any ideas? Should I simply upgrade to the newest version of picard tools?
Thanks.
Rob Dubin

Answers

  • shlee, Cambridge, Member, Broadie

    Hi @rdubin,

    This appears to be a bug. Can you test this with Picard v2.17.1, which is the latest release? If you still get the same error, then would you mind helping us out by submitting a bug report following instructions at left? And let us know if such a bug report would be unwieldy. Thanks.

  • shlee, Cambridge, Member, Broadie

    Hi @rdubin,

    Just to let you know, we have a developer on hand ready to test your data on our side to see what the issue may be. So please follow the instructions at https://software.broadinstitute.org/gatk/guide/article?id=1894 to upload some test data for us. Thank you.

  • rdubin, Albert Einstein College of Medicine, Member

    I'll first try installing picard v2.17.1 and try that out to see if the issue is resolved. Thanks. -Robert

  • rdubin, Albert Einstein College of Medicine, Member

    I tested picard tools v2.17.1 on the same sequencing run as before, and I saw the same error described above. However, I then tested v2.17.1 on two other, unrelated sequencing runs, and it worked just fine, with no errors at all. It appears that the error described above is restricted to a single sequencing run. I'm not sure why that is, since the run that generated an error with v2.17.1 and v2.15.0 ran fine (no errors) with v1.119. -Rob

  • Geraldine_VdAuwera, Cambridge, MA, Member, Administrator, Broadie

    Hi Robert, FYI the latest version of GATK (4.0) includes a copy of the Picard tools.

  • shlee, Cambridge, Member, Broadie

    Hi @rdubin,

    So something specific to your data causes the tool to error with v2.x but not v1.x. It's really hard to say what is going on without some of your data for us to examine. Again, would it be possible to get a small piece of your data that reproduces the tool behavior? We have instructions for submitting to our FTP site at https://software.broadinstitute.org/gatk/guide/article?id=1894.

  • rdubin, Albert Einstein College of Medicine, Member

    I'm still sometimes observing a problem with IlluminaBasecallsToFastq. On some, but NOT all, sequencing runs, I observe this error: Exception in thread "pool-2-thread-4" picard.PicardException: Read the number of expected bins( 65600) but still had more elements in file( /home/svc_wasp/wasp-home/data/180403_SN7001401_0474_ACACV7ACXX/Data/Intensities/L003/s_3_1104.clocs)

    For most of my sequencing runs I do NOT observe this problem. When it does occur, re-running IlluminaBasecallsToFastq on the same run gives exactly the same error, so the problem is reproducible on a problematic run. On such a run I see the error with picard 2.17.1 and also with the picard tools located within GATK 4.0 (both compiled with Java 1.8).

    However, I can properly process the problematic sequencing run with picard 1.119 (compiled with Java 1.7), and also with Illumina's bcl2fastq2 tool, indicating to me that the files in the so-called "problematic" sequencing run are really OK. Also, when we examine the supposed problem file (in this case, L003/s_3_1104.clocs), we see no reason to suspect that it has any issues. The only thing I can think of is that picard 2.17.1 and GATK 4 were compiled with Java 1.8.

    To reiterate, this issue with picard v2.17.1 IlluminaBasecallsToFastq does NOT appear with every sequencing run we process; many runs process just fine with v2.17.1, and to obtain data for problem runs we just revert to v1.119. Any suggestions would be greatly appreciated. (I don't really know how I could provide you with files so that you could reproduce this error, as it is a sequencing-run issue rather than a single-file issue.) Thanks again, Robert Dubin
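
One way to test the claim that the named .clocs file is intact, independently of Picard, is to walk its bytes and compare them against its own header. The sketch below is hedged: it assumes a layout that is not stated anywhere in this thread (a 5-byte header made of one version byte plus a 4-byte little-endian bin count, then one count byte per bin followed by 2 bytes per cluster), and `check_clocs` is a hypothetical helper, not part of Picard.

```python
import struct

# Assumed layout (not confirmed in this thread):
# 1 version byte + 4-byte little-endian unsigned bin count.
HEADER_SIZE = 5


def check_clocs(data: bytes) -> dict:
    """Walk the bytes of a .clocs file and compare them with its header."""
    if len(data) < HEADER_SIZE:
        raise ValueError("file is shorter than the assumed 5-byte header")
    version = data[0]
    (expected_bins,) = struct.unpack_from("<I", data, 1)

    pos = HEADER_SIZE
    bins_read = 0
    clusters = 0
    while bins_read < expected_bins and pos < len(data):
        n = data[pos]      # one unsigned byte: number of clusters in this bin
        pos += 1 + 2 * n   # assumed: each cluster occupies 2 bytes (x, y)
        clusters += n
        bins_read += 1

    return {
        "version": version,
        "expected_bins": expected_bins,
        "bins_read": bins_read,
        "clusters": clusters,
        # Bytes left over after the declared number of bins has been read.
        "trailing_bytes": max(len(data) - pos, 0),
        # True if the data ended before all declared bins were complete.
        "truncated": pos > len(data) or bins_read < expected_bins,
    }
```

It could be called as `check_clocs(open("s_3_1104.clocs", "rb").read())`. Under the assumed layout, a nonzero `trailing_bytes` would match the situation the error message describes, while zero trailing bytes on a file that Picard still rejects would point away from the file itself.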

  • rdubin, Albert Einstein College of Medicine, Member

    Hi @shlee,

    We've tried the latest version, 2.18.2, and still no luck; we get the same error with versions 2.3.0, 2.1.1, and 1.141 (all of these versions are compiled with Java 1.8). The only thing that works here is version 1.119 with Java 1.7. So, as you instructed, I uploaded a piece of the data to your FTP server, which should be enough to show you the exception, along with the command and the full output log in txt files. The file names all start with "rdubin_bug_report". Please take a look and let us know if you need any more information.

    Thanks!

    Issue · Github, by Sheila: Issue Number 3052; State: closed; Closed By: chandrans

  • Sheila, Broad Institute, Member, Broadie, Moderator

    @rdubin
    Hi Robert,

    Thanks. I will take a look soon.

    -Sheila

  • rdubin, Albert Einstein College of Medicine, Member

    Hi Sheila,
    Were you and your team able to take a look at this issue? If so, were you able to re-create the problem using the files we sent?
    Thanks,
    robert dubin

  • Sheila, Broad Institute, Member, Broadie, Moderator

    @rdubin
    Hi Robert,

    Sorry, I have been a bit behind on bugs. We are trying a new system out, and it is working now. I will allot some time to bugs tomorrow. I should get back to you this week.

    -Sheila

  • rdubin, Albert Einstein College of Medicine, Member

    Hi Sheila,
    I realize things are hectic. Any chance you may be able to determine whether you can re-create the problem we observed using the files we sent to you?
    Thank you for your help.
    Rob

  • Sheila, Broad Institute, Member, Broadie, Moderator

    @rdubin
    Hi Rob,

    I am testing your files right now :smile: I will get back to you asap. Thanks for posting again.

    -Sheila

  • Sheila, Broad Institute, Member, Broadie, Moderator

    @rdubin
    Hi again Rob,

    Great news :smile: This has been fixed in version 2.18.4.

    If you look at the release notes, it looks like in 2.18.3 the ability to process a single tile was added. This must have been causing your original issue.

    -Sheila

  • rdubin, Albert Einstein College of Medicine, Member

    Hi Sheila,
    We just downloaded version 2.18.4 and we still observe the issue that we reported, using the files and the commands that we provided to you. Is it possible that you misunderstood our issue? The basic issue is NOT the ability to process a single tile. We merely used that ability to work with a single tile in order to minimize the set of files we needed to send you to re-create the problem. Recall that with the files we provided in the rdubin_bug_report on your FTP server, we see no error using v1.119 and we see the error in versions 2.18.2 and 2.17.1 (and now also in version 2.18.4). Were you able to demonstrate NO ERROR with v1.119 when processing the entire sequence run? Were you able to demonstrate the error that we described with versions 2.18.2, 2.18.3, and 2.17.1? Were you able to demonstrate NO ERROR in 2.18.4?
    Thanks for your continued help.
    Rob

  • Sheila, Broad Institute, Member, Broadie, Moderator

    @rdubin
    Hi Rob,

    I did not get an error with 2.18.4, but after testing 2.18.2, I got no error either.

    I copied and pasted your command from the command.txt and that gives no errors. I also copied and pasted the command from your report_log.txt and that gives no errors either, for both versions.

    I am not sure what I could be missing?

    -Sheila

  • rdubin, Albert Einstein College of Medicine, Member

    Hi Sheila,
    This is very much unexpected. I do not understand why you are not observing the error that we observe. Do you know what version of java you use? Can you think of a reason for this difference?
    -Thanks, Rob

  • Sheila, Broad Institute, Member, Broadie, Moderator

    @rdubin
    Hi Rob,

    I used JAVA 8. Are you using the same? I am not sure why you are having issues. Can you try running your exact command you sent me again on the test file? Is it possible the test file does not error, but your larger file does?

    Thanks,
    Sheila

  • rdubin, Albert Einstein College of Medicine, Member

    Hi Sheila,

    Yes, we see the error in the test file that we sent to you AND when we run the entire flow cell/sequence run (I assume that when you say the larger file you mean the entire sequence run). We see the same exact error with the test file or when processing the entire sequence run.

    In fact, oddly enough, we just saw a similar issue today with a clocs file on a new sequence run; again, this problem is very rare for us, but it sometimes appears. Most runs do NOT produce this error, but once in a while it appears. (We see it only with new versions of picard, not with 1.119. Indeed, we saw today's new error when we used picard 2.17.1, but when we re-ran that new sequencing run with picard 1.119, there was no problem at all.)

    As far as versions of java, when we test picard tools on a laptop, it's java 1.8.0_25 and on our cluster we use java 1.8.0_20. (However, on the cluster, when we run picard 1.119, it's java 7 version 1.7.0_67.)

    -Rob

  • Hi Sheila,

    I'm a co-worker of Rob's and have been trying to figure out this weird picard problem. I ran picard in various environments to see if we could reproduce the error. It seems that after upgrading Java to the latest 1.8.0_171, all the picard versions we tried (2.17.1 and 2.18.6) work with no error. Therefore I think it's due to a bug in the earlier version of Java that was fixed by some update between Java 1.8.0_20 and 1.8.0_171. I hope this information is helpful to you.

    Thanks for your help!

  • rdubin, Albert Einstein College of Medicine, Member

    Hi Sheila,
    I can confirm my co-worker AJ's observation dated June 5. When we compile and run picard tools 2.17.1 with Java 1.8.0_20 we observe the error described above when run against the test dataset. However, when we compile the same picard tools, 2.17.1, with Java 1.8.0_171 and run it on the exact same data set, we see no error. We are not sure why this is.
    Thanks for your help.
    -Rob Dubin
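
Since the only variable that changed the outcome here was the Java update number, a small pre-flight check on the version string may be useful before running Picard 2.x. The following is a minimal sketch: `parse_java_version` and `at_least_known_good` are hypothetical helpers, and the 1.8.0_171 threshold is simply the version reported to work in this thread, not a confirmed fix point (the thread does not establish which update resolved the problem).

```python
import re


def parse_java_version(text: str) -> tuple:
    """Turn a legacy '1.8.0_171'-style string into a comparable tuple of ints."""
    m = re.search(r"(\d+)\.(\d+)\.(\d+)(?:_(\d+))?", text)
    if m is None:
        raise ValueError(f"no legacy Java version found in {text!r}")
    major, minor, patch, update = m.groups()
    # A missing update suffix (plain '1.8.0') is treated as update 0.
    return (int(major), int(minor), int(patch), int(update or 0))


# Assumption: 1.8.0_171 is used only because it was reported to work above.
KNOWN_GOOD = parse_java_version("1.8.0_171")


def at_least_known_good(text: str) -> bool:
    """True if the version in `text` is 1.8.0_171 or newer."""
    return parse_java_version(text) >= KNOWN_GOOD
```

For example, `at_least_known_good("1.8.0_20")` is false for the cluster Java that showed the error, while the first line printed by `java -version` on the upgraded machine would pass. The check is only meaningful for Picard 2.x; Picard 1.119 was reported to work under Java 1.7.0_67.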

  • Sheila, Broad Institute, Member, Broadie, Moderator

    @rdubin @aj_jing
    Hi,

    Thank you both for digging into this. Very odd, as I checked my Java version and it just says 1.8.0. Honestly, I don't think the team is going to do anything about this, as you found a way to get it to work. I hope you can proceed easily from now on. Perhaps this post will help too.

    -Sheila
