Random java.io.FileNotFoundException version 2.7-2-g6bda569

Hi All
We are running into some random weirdness when running jobs under SGE with GATK version 2.7-2-g6bda569. With pretty much all GATK tools, but mostly IndelRealigner and UnifiedGenotyper, we often get the following error:

ERROR MESSAGE: Couldn't read file /scratch/project/pipelines/novorecal.bam because java.io.FileNotFoundException: /scratch/project/pipelines/novorecal.bam (No such file or directory)

This also happens for supplied reference genomes and VCF files: the GATK tool can't find them.

These "missing" files do exist, and have often even been created by the previous tool/step in the pipeline.

When we re-run the pipeline on a failed sample, it works. So we end up having to re-run our pipeline on the same set of samples multiple times, which is becoming very frustrating. The errors seem to be random; I can't find any pattern and, as I mentioned, re-running a failed run works without a hitch.

Has anyone experienced this? And if so, any recommendations?

Please help

Steve

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Hi Steve,

    This sounds like a quirk of your platform. There are a few things you can check to troubleshoot it. First, assuming you're using network mounts, confirm that /scratch is actually available over NFS on every node that runs your jobs, and if so, find out at what point the mounts appear. You can also try running ls on each input file path before executing the GATK command at each step. Queue has (or at some point had) an option called "wait for parts before gather", because newly created files sometimes take a few seconds to become available on the filesystem. Considering you said re-running always works right off the bat, that may well be what's happening here.
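
A minimal sketch of the "check the file before running GATK" idea, written in Python for illustration. It is not part of GATK or Queue: the helper name wait_for_file, the timeout and polling interval, and the non-empty-file check are all assumptions. It polls until the input file is visible on the current node, listing the parent directory on each pass, which on some NFS setups may prompt the client to refresh its cached directory entries.

```python
import os
import sys
import time


def wait_for_file(path, timeout=60.0, interval=2.0):
    """Return True once `path` is a visible, non-empty file; False on timeout.

    Hypothetical helper: the default timeout and interval are arbitrary
    and should be tuned to how long files take to appear on your cluster.
    """
    deadline = time.monotonic() + timeout
    parent = os.path.dirname(path) or "."
    while True:
        try:
            # Equivalent of running `ls` on the directory before the check.
            os.listdir(parent)
            # Assumption: a BAM, VCF or reference file should never be empty.
            if os.path.isfile(path) and os.path.getsize(path) > 0:
                return True
        except OSError:
            # Directory or file not visible yet; keep polling.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # Usage: python wait_for_file.py FILE [FILE ...]
    # Exits non-zero if any file is still missing after the timeout,
    # so a job script can stop before launching the GATK step.
    missing = [p for p in sys.argv[1:] if not wait_for_file(p)]
    for p in missing:
        print("still not visible: " + p, file=sys.stderr)
    sys.exit(1 if missing else 0)
```

In a job script, this could be called with the BAM, reference and VCF paths just before each GATK command. If the failures stop, that would support the delayed-visibility explanation; if files are still missing after a generous timeout, the mount itself is the more likely culprit.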
