Clock drift error

I'm trying to use genotypeGCFs on 350 small gVCF files (from bacterial genomes) and I'm getting this "clock drift" error:

INFO 13:02:46,006 ProgressMeter - chr1:44001 0.0 3.0 h 15250.3 w 2.0% 6.3 d 6.2 d
INFO 13:06:30,421 ProgressMeter - chr1:44001 0.0 3.0 h 15250.3 w 2.0% 6.4 d 6.3 d
INFO 14:38:28,521 ProgressMeter - chr1:44001 0.0 3.1 h 15250.3 w 2.0% 6.6 d 6.5 d
WARN 16:24:14,924 SimpleTimer - Clock drift of -1,425,286,806,691,525,163 - -1,425,286,778,358,860,637 = 28,332,664,526 nanoseconds detected, vs. max allowable drift of 5,000,000,000. Assuming checkpoint/restart event.
WARN 18:23:08,307 SimpleTimer - Clock drift of -1,425,286,778,358,827,793 - -1,425,286,806,691,525,163 = 28,332,697,370 nanoseconds detected, vs. max allowable drift of 5,000,000,000. Assuming checkpoint/restart event.
INFO 22:55:24,502 ProgressMeter - chr1:44001 0.0 0.0 s 0.0 s 2.0% 0.0 s 0.0 s
INFO 02:46:17,060 ProgressMeter - chr1:44001 0.0 14.1 h 15250.3 w 2.0% 4.3 w 4.2 w

Any ideas what that is?

Tagged:

Answers

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    Hm. Are you using multithreading? (either -nt or -nct in your command line)

  • I'm using -nt 1 (so basically one thread). I found with nt higher it was using too much memory (mind you that was in a different case, with 12 Whole Genome gVCFs). These current gVCFs are up to about 9MB

  • Also.. every time it mentions clock drift, the time to completion leaps up:

    INFO 13:06:30,421 ProgressMeter - chr1:44001 0.0 3.0 h 15250.3 w 2.0% 6.4 d 6.3 d
    INFO 14:38:28,521 ProgressMeter - chr1:44001 0.0 3.1 h 15250.3 w 2.0% 6.6 d 6.5 d
    WARN 16:24:14,924 SimpleTimer - Clock drift of -1,425,286,806,691,525,163 - -1,425,286,778,358,860,637 = 28,332,664,526 nanoseconds detected, vs. max allowable drift of 5,000,000,000. Assuming checkpoint/restart event.
    WARN 18:23:08,307 SimpleTimer - Clock drift of -1,425,286,778,358,827,793 - -1,425,286,806,691,525,163 = 28,332,697,370 nanoseconds detected, vs. max allowable drift of 5,000,000,000. Assuming checkpoint/restart event.
    INFO 22:55:24,502 ProgressMeter - chr1:44001 0.0 0.0 s 0.0 s 2.0% 0.0 s 0.0 s
    INFO 02:46:17,060 ProgressMeter - chr1:44001 0.0 14.1 h 15250.3 w 2.0% 4.3 w 4.2 w
    INFO 05:25:41,021 ProgressMeter - chr1:44001 0.0 17.7 h 15250.3 w 2.0% 5.4 w 5.3 w
    INFO 09:09:02,166 ProgressMeter - chr1:44001 0.0 20.1 h 15250.3 w 2.0% 6.1 w 6.0 w
    INFO 13:12:56,610 ProgressMeter - chr1:44001 0.0 24.1 h 15250.3 w 2.0% 7.4 w 7.2 w
    INFO 16:48:53,884 ProgressMeter - chr1:44001 0.0 28.3 h 15250.3 w 2.0% 8.6 w 8.5 w
    WARN 19:04:18,511 SimpleTimer - Clock drift of -1,425,286,804,341,420,017 - -1,425,286,778,358,336,828 = 25,983,083,189 nanoseconds detected, vs. max allowable drift of 5,000,000,000. Assuming checkpoint/restart event.
    WARN 21:17:24,787 SimpleTimer - Clock drift of -1,425,286,778,358,492,210 - -1,425,286,804,341,420,017 = 25,982,927,807 nanoseconds detected, vs. max allowable drift of 5,000,000,000. Assuming checkpoint/restart event.
    INFO 01:12:20,861 ProgressMeter - chr1:44001 0.0 0.0 s 0.0 s 2.0% 0.0 s 0.0 s
    INFO 05:19:36,876 ProgressMeter - chr1:44001 0.0 40.1 h 15250.3 w 2.0% 12.2 w 12.0 w
    INFO 10:14:20,374 ProgressMeter - chr1:44001 0.0 44.0 h 15250.3 w 2.0% 13.4 w 13.1 w
    WARN 11:32:21,432 SimpleTimer - Clock drift of -1,425,286,805,643,879,470 - -1,425,286,778,358,732,118 = 27,285,147,352 nanoseconds detected, vs. max allowable drift of 5,000,000,000. Assuming checkpoint/restart event.
    WARN 13:39:21,772 SimpleTimer - Clock drift of -1,425,286,778,358,152,527 - -1,425,286,805,643,879,470 = 27,285,726,943 nanoseconds detected, vs. max allowable drift of 5,000,000,000. Assuming checkpoint/restart event.
    INFO 19:11:50,065 ProgressMeter - chr1:44001 0.0 0.0 s 0.0 s 2.0% 0.0 s 0.0 s
    INFO 00:38:05,516 ProgressMeter - chr1:44001 0.0 58.6 h 15250.3 w 2.0% 17.9 w 17.5 w
    WARN 02:07:55,808 SimpleTimer - Clock drift of -1,425,286,804,670,199,571 - -1,425,286,778,358,789,097 = 26,311,410,474 nanoseconds detected, vs. max allowable drift of 5,000,000,000. Assuming checkpoint/restart event.
    WARN 04:45:09,216 SimpleTimer - Clock drift of -1,425,286,778,358,841,693 - -1,425,286,804,670,199,571 = 26,311,357,878 nanoseconds detected, vs. max allowable drift of 5,000,000,000. Assuming checkpoint/restart event.
    INFO 09:18:18,974 ProgressMeter - chr1:44001 0.0 0.0 s 0.0 s 2.0% 0.0 s 0.0 s
    INFO 13:49:21,288 ProgressMeter - chr1:44001 0.0 72.4 h 15250.3 w 2.0% 22.1 w 21.6 w
    INFO 16:35:06,931 ProgressMeter - chr1:44001 0.0 76.2 h 15250.3 w 2.0% 23.2 w 22.8 w

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    This sounds like an issue with your computing platform. I would recommend asking your IT support to have a look.

  • I've done that, but they can't find any problem at our end.

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin
    edited March 2015

    Well, on the bright side I don't think this is something you need to worry about as long as you use only one thread. From the SimpleTimer code documentation comments:

    A useful simple system for timing code with nano second resolution
     *
     * Note that this code is not thread-safe.  If you have a single timer
     * being started and stopped by multiple threads you will need to protect the
     * calls to avoid meaningless results of having multiple starts and stops
     * called sequentially.
     *
     * This timer has been modified to provide better semantics for dealing with
     * system-level checkpoint and restarting. Such events can cause the internal JVM
     * clock to be reset, breaking timings based upon it. Whilst this is difficult to
     * counter without getting explicit notice of checkpoint events, we try to moderate
     * the symptoms through tracking the offset between the system clock and the JVM clock.
     * If this offset grows drastically (greater than CLOCK_DRIFT), we infer a JVM restart
     * and reset the timer.
    

    In the genotyper, the timer is only used to log the time, it has no effect on the calculations. So I believe it's safe for you to ignore these warnings.

  • SkyWarriorSkyWarrior TurkeyMember ✭✭✭
    edited July 2017

    How about PrintReads?
    I am getting a similar warning during PrintReads stage after BQSR. I am using -nct8. My drifts do not seem to be as drastic as the ones above and I only get once however do you think this would affect the results?

    Thanks.

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @SkyWarrior
    Hi,

    Are you using the latest version? What happens if you don't use multi-threading? I think you should be fine if you don't get an error message, just the WARN messages.

    -Sheila

  • SkyWarriorSkyWarrior TurkeyMember ✭✭✭

    This was in version 3.70. Seems that even those warnings are gone with version 3.80 may be switching to native inflater and deflater libraries reduced the overhead coming from java threads. I am still using 4 threads -nct 4 but seems that those messages are long gone after 3.80.

    I was using my homebrew solution for collection level parallelism (I made up that sorry. I just run 4 workflows at once instead of running one workflow at a time. This is orchestrated by nested bash scripts of my own mostly) for faster exome analysis and these messages showed up at that time with version 3.7.

    Those messages were only warning messages and I noticed that if the drift is too high then timer is reset but results don't seem to be affected by that.

    The system that I use is a i9 7900x 128GB memory. I use 4 threads per workflow.

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @SkyWarrior
    Hi,

    Okay. Glad to hear 3.8 works better.

    -Sheila

Sign In or Register to comment.