
GenotypeGVCFs remaining runtime estimate incorrect

tommycarstensen United Kingdom Member ✭✭✭

I have noticed that when using the GenotypeGVCFs walker (version 3.2), the remaining runtime estimate is very poor, whereas the estimate from CombineGVCFs and other walkers is very accurate. This is not a critical bug; it is more of a feature enhancement. I have also noticed that the "completed" percentage is incorrect: it does not start at 0%. In fact, it has stayed constant while walking over 2 million base pairs of 2000 samples in 2 hours. I am not using multithreading. It is not that important to me, but I thought I would let you know.

Answers

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    I see we're in for a flurry of posts before you head out on your vacation ;)

    Would you say this incorrectness is new in 3.2 or is it just something you happened to notice now?

  • tommycarstensen United Kingdom Member ✭✭✭
    edited July 2014

    Just trying to set up a pipeline in less than a week to keep the CPUs busy while I'm gone ;) The incorrectness could have been present in 3.0 and 3.1 too. I haven't checked, but will do so.

    Here are the head and tail of the log:

    INFO  22:09:41,775 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING] 
    INFO  22:09:41,775 ProgressMeter -                 | processed |    time |    per 1M |           |   total | remaining 
    INFO  22:09:41,776 ProgressMeter -        Location |     sites | elapsed |     sites | completed | runtime |   runtime 
    INFO  22:10:11,800 ProgressMeter -        20:65601         0.0    30.0 s      49.6 w       86.7%    34.0 s       4.0 s 
    INFO  22:10:41,863 ProgressMeter -        20:73601         0.0    60.0 s      99.4 w       86.7%    69.0 s       9.0 s 
    INFO  08:07:11,798 ProgressMeter -      20:9792601   9000000.0    10.0 h      66.4 m       87.0%    11.5 h      89.6 m 
    INFO  08:07:41,808 ProgressMeter -      20:9801401   9000000.0    10.0 h      66.4 m       87.0%    11.5 h      89.7 m 
    
  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    Ah, fair enough :)

    OK, so it seems this is indeed a thing (not that I doubted you). The reason this is happening is that the process starts out very slowly, so the estimate is overly long to start with; then, as it proceeds, the estimate gets more accurate and the number decreases. Over a small number of intervals you don't see this because the averaging effect doesn't have a chance to kick in, but over a longer run you should. We'll see if we can get the estimate to be better from the start, but to be honest this is going to be pretty low priority.
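The averaging effect described here can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the ProgressMeter's actual implementation: it assumes the estimate extrapolates the average rate observed since the start of the run, and the function names, rates, and workload size are all hypothetical. A slow warm-up followed by a faster steady state makes the early estimates too long, and they converge toward the true remaining time as the run proceeds.

```python
# Toy illustration with hypothetical numbers; NOT GATK's ProgressMeter code.
# Assumption: the remaining-runtime estimate extrapolates the *average* rate
# observed since the start of the run.


def estimate_remaining(done: float, total: float, elapsed: float) -> float:
    """Hours left if the average rate observed so far continues unchanged."""
    average_rate = done / elapsed
    return (total - done) / average_rate


def simulate(total=65.0, warmup_hours=5, warmup_rate=1.0, steady_rate=3.0):
    """Process `total` units: slowly for `warmup_hours`, then at `steady_rate`.

    Returns a list of (elapsed_hours, estimated_remaining, actual_remaining),
    one entry per simulated hour before the run finishes.
    """
    progress = []  # (elapsed, done) after each simulated hour
    done, elapsed = 0.0, 0
    while done < total:
        rate = warmup_rate if elapsed < warmup_hours else steady_rate
        done = min(total, done + rate)
        elapsed += 1
        progress.append((elapsed, done))

    finish = progress[-1][0]  # total runtime, known only in hindsight
    return [
        (t, estimate_remaining(d, total, t), finish - t)
        for t, d in progress
        if d < total
    ]


if __name__ == "__main__":
    for t, est, actual in simulate():
        print(f"{t:>3} h elapsed: estimate {est:6.1f} h, actual {actual:3d} h")
```

With these made-up numbers the first estimate is 64 h against 24 h actually remaining, and the gap narrows to roughly 1.2 h versus 1 h near the end. Averaging over the whole run is what lets early slowness linger in the estimate; a window over recent progress would react faster, at the cost of a noisier figure.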

  • tommycarstensen United Kingdom Member ✭✭✭

    Geraldine, this is extremely low priority, as I said at the beginning of the thread. Thanks for taking the time to look into it.

    While most GATK walkers take a while to calculate the remaining runtime accurately, that is not quite the problem with GenotypeGVCFs. My process has been running for nearly 48 hours and the remaining runtime is still wrong. I am running on human chromosome 20 (63 Mbp). If it took ~48 hours to do ~48 Mbp, then ~15 hours should remain, but the current estimate is 6 hours. After ~10 hours and ~10 Mbp, the estimated remaining runtime was 90 minutes. Here is the latest estimate, after 45.5 h:

    INFO  22:09:41,775 ProgressMeter -                 | processed |    time |    per 1M |           |   total | remaining 
    INFO  22:09:41,776 ProgressMeter -        Location |     sites | elapsed |     sites | completed | runtime |   runtime 
    INFO  19:42:24,353 ProgressMeter -     20:48567501       4.8E7    45.5 h      56.9 m       88.2%    51.6 h       6.1 h
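The back-of-the-envelope check in this post is a linear extrapolation: assume the remaining base pairs are processed at the same average speed as those already done. A minimal sketch, assuming constant throughput across chromosome 20 and treating its length as ~63 Mbp; the function name `remaining_hours` is hypothetical, and this is not a description of how the ProgressMeter itself computes its figure.

```python
def remaining_hours(elapsed_h: float, done_mbp: float, total_mbp: float) -> float:
    """Hours left if the rest of the interval runs at the average speed so far."""
    hours_per_mbp = elapsed_h / done_mbp  # average cost of 1 Mbp so far
    return (total_mbp - done_mbp) * hours_per_mbp


# Rough figures from the post (chromosome 20 taken as ~63 Mbp).
print(remaining_hours(48, 48, 63))  # ~48 h for ~48 Mbp -> 15.0
print(remaining_hours(10, 10, 63))  # ~10 h for ~10 Mbp -> 53.0
# Latest log line: 45.5 h elapsed at position 20:48567501.
print(round(remaining_hours(45.5, 48.567501, 63), 1))  # 13.5
```

On the same constant-throughput assumption, roughly 13.5 h should remain at the 45.5 h mark, against the 6.1 h reported in the log line, and about 53 h rather than 90 minutes after the first ~10 h.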
    