drchriscole
Posts: 9Member ✭
Hi,
I had a seemingly random RUNTIME ERROR previously (mentioned on this thread http://gatkforums.broadinstitute.org/discussion/1860/runtime-error-in-baserecalibrator-version-2-2-8-g99996f2), but I'm seeing this more and more for UnifiedGenotyper runs.
It seems that every 6-10 runs I get one or two that fail as below. Re-running seems to work correctly, but it is annoying to find that a 12-hour job has died at random. Could this please be investigated?
java.lang.NullPointerException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.hasQueuedPredecessors(AbstractQueuedSynchronizer.java:1453)
    at java.util.concurrent.locks.ReentrantLock$FairSync.tryAcquire(ReentrantLock.java:240)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1136)
    at java.util.concurrent.locks.ReentrantLock$FairSync.lock(ReentrantLock.java:229)
    at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
    at java.util.concurrent.PriorityBlockingQueue.peek(PriorityBlockingQueue.java:286)
    at org.broadinstitute.sting.utils.nanoScheduler.Reducer.reduceNextValueInQueue(Reducer.java:89)
    at org.broadinstitute.sting.utils.nanoScheduler.Reducer.reduceAsMuchAsPossible(Reducer.java:120)
    at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler$MapReduceJob.run(NanoScheduler.java:510)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)
My commandline is this: java -Xmx8g -jar GenomeAnalysisTKLite.jar -nct 4 -T UnifiedGenotyper --genotype_likelihoods_model BOTH --genotyping_mode DISCOVERY -R fastafile -I infile -o outfile --dbsnp dbsnp_137.b37.vcf -stand_call_conf 30 -stand_emit_conf 30
This usually coincides with errors in the call-home code, although I don't know whether it's cause or effect.
INFO 16:41:33,533 HttpMethodDirector - I/O exception (javax.net.ssl.SSLException) caught when processing request: java.lang.RuntimeException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
INFO 16:41:33,534 HttpMethodDirector - Retrying request
INFO 16:41:33,716 HttpMethodDirector - I/O exception (javax.net.ssl.SSLException) caught when processing request: java.lang.RuntimeException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
INFO 16:41:33,716 HttpMethodDirector - Retrying request
INFO 16:41:33,897 HttpMethodDirector - I/O exception (javax.net.ssl.SSLException) caught when processing request: java.lang.RuntimeException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
INFO 16:41:33,897 HttpMethodDirector - Retrying request
INFO 16:41:34,077 HttpMethodDirector - I/O exception (javax.net.ssl.SSLException) caught when processing request: java.lang.RuntimeException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
INFO 16:41:34,085 HttpMethodDirector - Retrying request
INFO 16:41:34,267 HttpMethodDirector - I/O exception (javax.net.ssl.SSLException) caught when processing request: java.lang.RuntimeException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
INFO 16:41:34,267 HttpMethodDirector - Retrying request
Answers
Can you tell me what OS and version of java you're using?
Geraldine Van der Auwera, PhD
Linux CentOS5.5 and Java 1.6.0_32
Hmm, nothing strange there. Unfortunately we can only take a closer look at this if you isolate a snippet of BAM file that reliably reproduces the error. Could you do that? Otherwise it may be that your platform simply does not handle nct properly. If so, you'll need to take a look at the other parallelism options we offer (see the new docs posted yesterday).
Geraldine Van der Auwera, PhD
That's the problem: it's not consistent. If I rerun problematic jobs they usually complete fine. However, from running this on many samples, about 1 in 6 jobs fails the first time.
I see; unfortunately we can't investigate it if we can't reproduce the error locally. Have a look at the other parallelism options here:
http://www.broadinstitute.org/gatk/guide/article?id=1975
You may want to look into using Queue. You can use scatter-gather to parallelize your jobs, and Queue will automatically retry failed jobs.
Also, note that we've changed the implementation of nct that will be in release 2.4, so when it's out you should try nct again and see whether the issue persists or not.
Geraldine Van der Auwera, PhD
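Since re-running a failed job usually succeeds, a stopgap outside Queue is to wrap the command in a small retry loop. The following is only a minimal sketch in Python, not how Queue itself retries jobs. It assumes that a failed GATK run exits with a non-zero status and that it is safe to overwrite the partial output on the next attempt. The function name `run_with_retries` and its parameters are hypothetical.

```python
import subprocess
import sys
import time


def run_with_retries(cmd, max_attempts=3, delay_seconds=0.0):
    """Run cmd (a list of arguments) until it exits with status 0.

    Returns the 1-based number of the attempt that succeeded.
    Raises ValueError for an empty command and RuntimeError if
    every attempt exits with a non-zero status.
    """
    if not cmd:
        raise ValueError("cmd must contain at least the program name")
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return attempt
        # Report the failure on stderr so it is visible in the job log.
        print(
            f"attempt {attempt}/{max_attempts} failed "
            f"with exit code {result.returncode}",
            file=sys.stderr,
        )
        if attempt < max_attempts:
            time.sleep(delay_seconds)
    raise RuntimeError(f"command failed after {max_attempts} attempts")
```

It would be called with the same arguments as the original command line, split into a list, for example `run_with_retries(["java", "-Xmx8g", "-jar", "GenomeAnalysisTKLite.jar", "-nct", "4", "-T", "UnifiedGenotyper", ...])` with the remaining arguments filled in. This only hides the intermittent failure; it does not address whatever causes it.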
Thanks.
Any idea when 2.4 will be released?
Our usual release cycle is about 6 weeks, and we just released 2.3 this week, so I wouldn't expect it before early February. I realize that's quite a while away when you're trying to get things done, but unfortunately the change is too big for a mere patch. There's also no guarantee that it will solve your problem. I hope the other parallelism options will help!
Geraldine Van der Auwera, PhD