GenotypeGVCFs run time increases as time passes

Hi,

I'm genotyping a single individual with GenotypeGVCFs and writing output for all sites, variant and non-variant. The input g.vcf.gz file is 7.2 Gb, and the problem below appears once the file has reached roughly 3.5 Gb.

This is the command I'm using:

java -jar GenomeAnalysisTK.jar -T GenotypeGVCFs -nt 2 --includeNonVariantSites -R $ref --variant SRR9457_26_27_28_raw_variants_allsites.g.vcf.gz -o SRR_26_27_28.genotped.vcf_v2.gz

It works fine at first: an estimated run time is reported at the start and the remaining time decreases as the run proceeds. Once it reaches about 45%, however, the estimated time starts to increase:

INFO  14:32:10,599 ProgressMeter - NW_009244866.1:408001   5.20130924E8    20.5 h       2.4 m       45.2%    45.3 h      24.8 h 
INFO  14:33:32,751 ProgressMeter - NW_009244866.1:475301   5.20130924E8    20.5 h       2.4 m       45.2%    45.3 h      24.8 h 
INFO  14:34:53,693 ProgressMeter - NW_009244870.1:60501   5.21135395E8    20.5 h       2.4 m       45.2%    45.3 h      24.8 h 
INFO  14:36:13,805 ProgressMeter - NW_009244870.1:127201   5.21135395E8    20.5 h       2.4 m       45.3%    45.4 h      24.8 h 
INFO  14:37:36,894 ProgressMeter - NW_009244870.1:193601   5.21135395E8    20.6 h       2.4 m       45.3%    45.4 h      24.9 h 
INFO  14:38:59,544 ProgressMeter - NW_009244870.1:260501   5.21135395E8    20.6 h       2.4 m       45.3%    45.5 h      24.9 h 
INFO  14:40:22,072 ProgressMeter - NW_009244870.1:326901   5.21135395E8    20.6 h       2.4 m       45.3%    45.5 h      24.9 h 
INFO  14:41:45,723 ProgressMeter - NW_009244870.1:393401   5.21135395E8    20.6 h       2.4 m       45.3%    45.6 h      24.9 h 
INFO  14:43:10,946 ProgressMeter - NW_009244871.1:1001   5.22021688E8    20.7 h       2.4 m       45.3%    45.6 h      25.0 h 
INFO  14:45:06,608 ProgressMeter - NW_009244871.1:1001   5.22021688E8    20.7 h       2.4 m       45.3%    45.7 h      25.0 h 
INFO  14:46:34,759 ProgressMeter - NW_009244871.1:1001   5.22021688E8    20.7 h       2.4 m       45.3%    45.7 h      25.0 h 
INFO  14:48:27,197 ProgressMeter - NW_009244871.1:1001   5.22021688E8    20.7 h       2.4 m       45.3%    45.8 h      25.1 h 
INFO  14:49:51,726 ProgressMeter - NW_009244871.1:1001   5.22021688E8    20.8 h       2.4 m       45.3%    45.9 h      25.1 h 
INFO  14:51:17,616 ProgressMeter - NW_009244871.1:1001   5.22021688E8    20.8 h       2.4 m       45.3%    45.9 h      25.1 h 
INFO  14:52:43,099 ProgressMeter - NW_009244871.1:1001   5.22021688E8    20.8 h       2.4 m       45.3%    46.0 h      25.1 h 
INFO  14:54:09,153 ProgressMeter - NW_009244871.1:1001   5.22021688E8    20.8 h       2.4 m       45.3%    46.0 h      25.2 h 
INFO  14:55:31,464 ProgressMeter - NW_009244871.1:1001   5.22021688E8    20.9 h       2.4 m       45.3%    46.1 h      25.2 h 
INFO  14:57:04,755 ProgressMeter - NW_009244871.1:1001   5.22021688E8    20.9 h       2.4 m       45.3%    46.1 h      25.2 h 
INFO  14:58:32,594 ProgressMeter - NW_009244871.1:1001   5.22021688E8    20.9 h       2.4 m       45.3%    46.2 h      25.3 h 

I wonder what could cause this issue?

Best,
Homa
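
Editor's note: the estimate in the log above looks like a simple extrapolation, elapsed time divided by the fraction completed (20.5 h / 0.452 ≈ 45.4 h, which matches the logged 45.3 h up to rounding). The sketch below checks that reading. It assumes the ProgressMeter columns are location, processed sites, elapsed, time per 1M sites, completed, total runtime and remaining; that column order is an assumption, not something stated in the post, and the function names are made up for illustration.

```python
import re

# Assumed column order (not confirmed by the post): location, processed sites,
# elapsed, time per 1M sites, completed, total runtime, remaining.
# Only the "h" / "m" units seen in the log above are handled.
LINE_RE = re.compile(
    r"ProgressMeter - \s*(?P<location>\S+)\s+(?P<sites>\S+)\s+"
    r"(?P<elapsed>[\d.]+) h\s+(?P<per_million>[\d.]+) m\s+"
    r"(?P<completed>[\d.]+)%\s+(?P<total>[\d.]+) h\s+(?P<remaining>[\d.]+) h"
)


def parse_progress(line):
    """Split one ProgressMeter line into named numeric fields."""
    match = LINE_RE.search(line)
    if match is None:
        raise ValueError("not a ProgressMeter line in the expected layout")
    return {
        "location": match["location"],
        "sites": float(match["sites"]),
        "elapsed_h": float(match["elapsed"]),
        "per_million_min": float(match["per_million"]),
        "completed_pct": float(match["completed"]),
        "total_h": float(match["total"]),
        "remaining_h": float(match["remaining"]),
    }


def extrapolated_total_hours(elapsed_h, completed_pct):
    """Total runtime if the rest of the job ran at the average rate so far."""
    return elapsed_h / (completed_pct / 100.0)


if __name__ == "__main__":
    # First and last lines of the log quoted above.
    log = [
        "INFO  14:32:10,599 ProgressMeter - NW_009244866.1:408001   5.20130924E8    20.5 h       2.4 m       45.2%    45.3 h      24.8 h ",
        "INFO  14:58:32,594 ProgressMeter - NW_009244871.1:1001   5.22021688E8    20.9 h       2.4 m       45.3%    46.2 h      25.3 h ",
    ]
    for line in log:
        row = parse_progress(line)
        estimate = extrapolated_total_hours(row["elapsed_h"], row["completed_pct"])
        print(f"{row['location']}: logged {row['total_h']} h, recomputed {estimate:.1f} h")
```

If that reading is right, the estimate has to climb whenever elapsed time keeps growing while the completed percentage barely moves, which is what the log shows between 14:32 and 14:58.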

Answers

  • SkyWarrior, Turkey, Member ✭✭✭
    edited April 2018

    What is your OS?

  • Homap, Member
    edited April 2018

    I'm running it on a Linux cluster and have given it 2 nodes.

  • Sheila, Broad Institute, Member, Broadie, Moderator, admin

    @Homap
    Hi,

    Can you try adding --useNewAFCalculator to your command? People have reported that it helps.

    -Sheila

  • Homap, Member

    @Sheila

    Thanks! I just added it and started a new run. I will let you know the outcome!

  • Homap, Member

    Hi @Sheila

    The problem is still there. This is the command I'm using, this time giving it 1 node:

    java -Xmx22g -jar GenomeAnalysisTK.jar -T GenotypeGVCFs --useNewAFCalculator --includeNonVariantSites -R $ref --variant SRR9457_26_27_28_raw_variants_allsites.g.vcf.gz -o SRR_26_27_28.genotped.vcf_v6.gz
    

    Once it reaches around 30%, the estimated time starts to increase, as you can see below. This run started with an estimated total run time of 29 hours, but look at what it has become now:

    INFO  09:52:28,343 ProgressMeter - NW_009244913.1:1679201   5.64211942E8    41.2 h       4.4 m       49.0%    84.2 h      43.0 h 
    INFO  09:53:56,601 ProgressMeter - NW_009244913.1:1719001   5.64211942E8    41.2 h       4.4 m       49.0%    84.2 h      43.0 h 
    INFO  09:54:59,143 ProgressMeter - NW_009244913.1:1758301   5.64211942E8    41.2 h       4.4 m       49.0%    84.2 h      43.0 h 
    INFO  09:57:02,540 ProgressMeter - NW_009244913.1:1837301   5.64211942E8    41.3 h       4.4 m       49.0%    84.3 h      43.0 h 
    INFO  09:58:30,721 ProgressMeter - NW_009244913.1:1876501   5.64211942E8    41.3 h       4.4 m       49.0%    84.3 h      43.0 h 
    INFO  09:59:35,176 ProgressMeter - NW_009244913.1:1915801   5.64211942E8    41.3 h       4.4 m       49.0%    84.4 h      43.0 h 
    INFO  10:00:38,398 ProgressMeter - NW_009244913.1:1955001   5.64211942E8    41.3 h       4.4 m       49.0%    84.4 h      43.1 h 
    INFO  10:01:48,825 ProgressMeter - NW_009244913.1:1994001   5.64211942E8    41.4 h       4.4 m       49.0%    84.4 h      43.1 h 
    INFO  10:02:51,225 ProgressMeter - NW_009244913.1:2033101   5.65211942E8    41.4 h       4.4 m       49.0%    84.5 h      43.1 h 
    INFO  10:04:28,172 ProgressMeter - NW_009244913.1:2073201   5.65211942E8    41.4 h       4.4 m       49.0%    84.5 h      43.1 h 
    INFO  10:06:24,176 ProgressMeter - NW_009244913.1:2151401   5.65211942E8    41.4 h       4.4 m       49.0%    84.6 h      43.1 h 
    INFO  10:07:25,670 ProgressMeter - NW_009244913.1:2190501   5.65211942E8    41.4 h       4.4 m       49.0%    84.6 h      43.1 h 
    INFO  10:09:17,045 ProgressMeter - NW_009244913.1:2268201   5.65211942E8    41.5 h       4.4 m       49.0%    84.6 h      43.2 h 
    INFO  10:10:42,703 ProgressMeter - NW_009244913.1:2306501   5.65211942E8    41.5 h       4.4 m       49.0%    84.7 h      43.2 h 
    INFO  10:11:52,081 ProgressMeter - NW_009244913.1:2344801   5.65211942E8    41.5 h       4.4 m       49.0%    84.7 h      43.2 h 
    INFO  10:12:57,097 ProgressMeter - NW_009244913.1:2384101   5.65211942E8    41.5 h       4.4 m       49.0%    84.8 h      43.2 h 
    

    I'm out of ideas. Could it be because I'm outputting all sites?

    Thanks,
    Homa
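
Editor's note: one way to see the slowdown directly is to compare the cumulative average in the log (4.4 m per 1M sites) with the rate over the most recent lines only. The sketch below takes two lines on the same contig and divides the wall-clock time between them by the number of positions covered. It assumes that every reference position counts as one processed site and that both lines fall on the same day; both are assumptions, and the function names are hypothetical.

```python
from datetime import datetime


def parse_time_and_locus(line):
    """Return (timestamp, contig, position) from one ProgressMeter line."""
    fields = line.split()
    # Expected layout: INFO  HH:MM:SS,mmm  ProgressMeter  -  contig:position  ...
    stamp = datetime.strptime(fields[1], "%H:%M:%S,%f")
    contig, position = fields[4].rsplit(":", 1)
    return stamp, contig, int(position)


def minutes_per_million_positions(first_line, last_line):
    """Wall-clock minutes per 1M positions between two lines on one contig."""
    t0, contig0, pos0 = parse_time_and_locus(first_line)
    t1, contig1, pos1 = parse_time_and_locus(last_line)
    if contig0 != contig1 or pos1 <= pos0:
        raise ValueError("need two lines on the same contig with increasing positions")
    minutes = (t1 - t0).total_seconds() / 60.0
    return minutes / ((pos1 - pos0) / 1e6)


if __name__ == "__main__":
    # First and last lines of the log quoted above.
    first = "INFO  09:52:28,343 ProgressMeter - NW_009244913.1:1679201   5.64211942E8    41.2 h       4.4 m       49.0%    84.2 h      43.0 h "
    last = "INFO  10:12:57,097 ProgressMeter - NW_009244913.1:2384101   5.65211942E8    41.5 h       4.4 m       49.0%    84.8 h      43.2 h "
    rate = minutes_per_million_positions(first, last)
    print(f"recent rate: {rate:.1f} min per 1M positions (cumulative average in log: 4.4 m)")
```

On these two lines the recent rate works out to roughly 29 minutes per million positions, several times slower than the 4.4 m cumulative average, so under these assumptions the run really is slowing down rather than the estimate merely being noisy.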

  • Homap, Member

    @Sheila

    I removed --includeNonVariantSites and it is working as expected. I had included the non-variant sites because I want to filter them on coverage too, and GenotypeGVCFs also reports a genotype quality for them. But I guess I can only filter on read depth, since GenotypeGVCFs never seems to finish when run on all sites.

    Homa
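
Editor's note: a minimal sketch of the depth-only filtering mentioned above, working on plain VCF text lines. It assumes a single-sample file whose FORMAT column carries a DP key, which should be verified against the actual file. The thresholds and the toy records are made up for illustration, and the helper names are hypothetical.

```python
def sample_depth(vcf_line):
    """Return the first sample's DP as an int, or None if it is missing."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return None  # no FORMAT / sample columns
    keys = fields[8].split(":")
    values = fields[9].split(":")
    if "DP" not in keys:
        return None
    index = keys.index("DP")
    if index >= len(values) or values[index] in ("", "."):
        return None
    return int(values[index])


def filter_by_depth(lines, min_dp, max_dp):
    """Yield header lines unchanged and records with min_dp <= DP <= max_dp."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        depth = sample_depth(line)
        if depth is not None and min_dp <= depth <= max_dp:
            yield line


if __name__ == "__main__":
    # Toy records for illustration only; not taken from the poster's data.
    toy = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1",
        "chr1\t100\t.\tA\t.\t.\t.\t.\tGT:DP\t0/0:12",
        "chr1\t101\t.\tC\t.\t.\t.\t.\tGT:DP\t0/0:3",
        "chr1\t102\t.\tG\tT\t50\t.\t.\tGT:AD:DP:GQ\t0/1:5,6:11:99",
    ]
    for kept in filter_by_depth(toy, min_dp=5, max_dp=40):
        print(kept)
```

For a real file the same generator could be fed from `gzip.open(path, "rt")`; sensible depth bounds depend on the sample's average coverage.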

  • Sheila, Broad Institute, Member, Broadie, Moderator, admin

    @Homap
    Hi Homa,

    Yes, including non-variant sites is pretty compute heavy. Perhaps you can try running on the cloud and seeing if that helps? We are giving away free credits to try out FireCloud :smiley: https://software.broadinstitute.org/firecloud/documentation/freecredits.php

    -Sheila
