
GenotypeGVCFs on pooled data running out of memory despite providing 512 GB to Java

Dear GATK staff,

I am doing SNP calling with GATK 3.8 on whole-genome sequences of 12 pools (50 diploid individuals in each pool, genome size ~900 Mbp) of a non-model organism with a scaffolded reference genome (114K fragments now stitched into 94 super-scaffolds).
First, I ran HaplotypeCaller in -ERC GVCF mode for each pool sample and super-scaffold separately (i.e. scattered by scaffold). An example of the command used for "sample1" on "Super_Scaffold8" is shown below. I chose ploidy 10 as this seems to be the maximum that works for other Pool-seq users, and I raised the maximum number of PL values to 10000 because my first runs produced warnings that the default maximum of 100 was too low:

java -Djava.io.tmpdir=/path/tmp/ -XX:ParallelGCThreads=1 -Dsamjdk.use_async_io=true -Dsamjdk.buffer_size=4194304 -Xmx8g -jar /path/GATK/3.8.0/GenomeAnalysisTK.jar \
-T HaplotypeCaller \
-R /path/Stitched_Ref_genome/ref.fasta \
-I /path/BAMs-SG/sample1.PoolSeq.sorted.MarkDup.RG.bam \
-L Super_Scaffold8 \
-ERC GVCF \
-ploidy 10 \
-mbq 20 \
-minPruning 5 \
-maxNumPLValues 10000 \
--read_filter OverclippedRead \
-o /path/GATK_results/sample1.Super_Scaffold8.raw.g.vcf
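Since the per-scaffold runs are independent, the scatter can be scripted. This is only an illustrative sketch: sample names and paths are placeholders, only three samples are looped for brevity, and job submission is left to whatever scheduler is available.

```shell
# Sketch: build one HaplotypeCaller command per sample x super-scaffold.
# All paths/names are placeholders; submit each line to your job scheduler.
jobs=()
for sample in sample1 sample2 sample3; do    # ...through sample12 in the real run
  for n in $(seq 1 94); do
    jobs+=("java -Xmx8g -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R ref.fasta -I ${sample}.bam -L Super_Scaffold${n} -ERC GVCF -ploidy 10 -mbq 20 -minPruning 5 -maxNumPLValues 10000 -o ${sample}.Super_Scaffold${n}.raw.g.vcf")
  done
done
printf '%s\n' "${jobs[@]}" > hc_jobs.txt    # 3 samples x 94 scaffolds = 282 lines
```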

Second, I obtained a single gVCF file per pool sample by merging the gVCF files of the 94 super-scaffolds of that sample.
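For this merge step, GATK 3 ships a CatVariants utility for concatenating per-interval (g)VCFs of the same sample. The sketch below only assembles the command line (all paths are placeholders) and assumes the inputs are listed in reference order, which -assumeSorted requires:

```shell
# Sketch: concatenate one sample's 94 per-scaffold gVCFs with CatVariants
# (note it is invoked via -cp, not -jar; placeholder paths throughout).
sample=sample1
merge_cmd="java -cp GenomeAnalysisTK.jar org.broadinstitute.gatk.tools.CatVariants -R ref.fasta"
for n in $(seq 1 94); do
  merge_cmd+=" -V ${sample}.Super_Scaffold${n}.raw.g.vcf"
done
merge_cmd+=" -out ${sample}.94.Super_Scaffolds.raw.g.vcf -assumeSorted"
echo "$merge_cmd"
```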
Third, I ran GenotypeGVCFs on the cohort of 12 gVCFs for each super-scaffold separately (i.e. scattered by scaffold), setting the maximum number of alternate alleles to 3, enabling the new QUAL model (-newQual), and setting the maximum number of PL values to 700000 (the largest number of PLs observed in test runs that warned that -maxNumPLValues 10000 was too low). An example of the command used for the 12 gVCFs on "Super_Scaffold8" is shown below (-Xmx18g):

java -Djava.io.tmpdir=/path/tmp/ -XX:ParallelGCThreads=1 -Dsamjdk.use_async_io=true -Dsamjdk.buffer_size=4194304 -Xmx18g -jar /path/GATK/3.8.0/GenomeAnalysisTK.jar \
-T GenotypeGVCFs \
-R /path/Stitched_Ref_genome/ref.fasta \
-V /path/GVCFs/sample1.94.Super_Scaffolds.raw.g.vcf \
-V /path/GVCFs/sample2.94.Super_Scaffolds.raw.g.vcf \
-V /path/GVCFs/sample3.94.Super_Scaffolds.raw.g.vcf \
-V /path/GVCFs/sample4.94.Super_Scaffolds.raw.g.vcf \
-V /path/GVCFs/sample5.94.Super_Scaffolds.raw.g.vcf \
-V /path/GVCFs/sample6.94.Super_Scaffolds.raw.g.vcf \
-V /path/GVCFs/sample7.94.Super_Scaffolds.raw.g.vcf \
-V /path/GVCFs/sample8.94.Super_Scaffolds.raw.g.vcf \
-V /path/GVCFs/sample9.94.Super_Scaffolds.raw.g.vcf \
-V /path/GVCFs/sample10.94.Super_Scaffolds.raw.g.vcf \
-V /path/GVCFs/sample11.94.Super_Scaffolds.raw.g.vcf \
-V /path/GVCFs/sample12.94.Super_Scaffolds.raw.g.vcf \
-L Super_Scaffold8 \
-maxAltAlleles 3 \
-newQual \
-maxNumPLValues 700000 \
-o /path/GATK_results/12pops/12.pops.Super_Scaffold8.raw.SNPs-indels.vcf

I got this ERROR message:

INFO  17:15:24,838 HelpFormatter - ---------------------------------------------------------------------------------- 
INFO  17:15:24,842 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.8-0-ge9d806836, Compiled 2017/07/28 21:26:50 
INFO  17:15:24,843 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute 
INFO  17:15:24,843 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk 
INFO  17:15:24,843 HelpFormatter - [Thu Aug 24 17:15:24 EDT 2017] Executing on Linux 2.6.32-642.6.2.el6.x86_64 amd64 
INFO  17:15:24,843 HelpFormatter - Java HotSpot(TM) 64-Bit Server VM 1.8.0_74-b02 
INFO  17:15:24,849 HelpFormatter - Program Args: -T GenotypeGVCFs -R /path/ref.fasta -V /path/GVCFs/sample1.94.Super_Scaffolds.raw.g.vcf -V /path/GVCFs/sample2.Super_Scaffolds.raw.g.vcf -V /path/GVCFs/sample3.94.Super_Scaffolds.raw.g.vcf -V /path/GVCFs/sample4.Super_Scaffolds.raw.g.vcf -V /path/GVCFs/sample5.Super_Scaffolds.raw.g.vcf -V /path/GVCFs/sample6.94.Super_Scaffolds.raw.g.vcf -V /path/GVCFs/sample7.94.Super_Scaffolds.raw.g.vcf -V /path/GVCFs/sample8.94.Super_Scaffolds.raw.g.vcf -V /path/GVCFs/sample9.94.Super_Scaffolds.raw.g.vcf -V /path/GVCFs/sample10.94.Super_Scaffolds.raw.g.vcf -V /path/GVCFs/sample11.Super_Scaffolds.raw.g.vcf -V /path/GVCFs/sample12.Super_Scaffolds.raw.g.vcf -L Super_Scaffold8 -maxAltAlleles 3 -newQual -maxNumPLValues 10000 -o /path/GATK_results/12pops/12.pops.Super_Scaffold8.raw.SNPs-indels.vcf 
INFO  17:15:24,859 HelpFormatter - Executing as xxxxxxx on Linux 2.6.32-642.6.2.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_74-b02. 
INFO  17:15:24,860 HelpFormatter - Date/Time: 2017/08/24 17:15:24 
INFO  17:15:24,860 HelpFormatter - ---------------------------------------------------------------------------------- 
INFO  17:15:24,860 HelpFormatter - ---------------------------------------------------------------------------------- 
ERROR StatusLogger Unable to create class org.apache.logging.log4j.core.impl.Log4jContextFactory specified in jar:file:/cvmfs/path/GATK/3.8.0/GenomeAnalysisTK.jar!/META-INF/log4j-provider.properties
ERROR StatusLogger Log4j2 could not find a logging implementation. Please add log4j-core to the classpath. Using SimpleLogger to log to the console...
INFO  17:15:25,100 GenomeAnalysisEngine - Deflater: JdkDeflater 
INFO  17:15:25,100 GenomeAnalysisEngine - Inflater: JdkInflater 
INFO  17:15:25,101 GenomeAnalysisEngine - Strictness is SILENT 
INFO  17:15:25,272 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000 
INFO  17:16:18,502 IntervalUtils - Processing 12508256 bp from intervals 
INFO  17:16:18,615 GenomeAnalysisEngine - Preparing for traversal 
INFO  17:16:18,616 GenomeAnalysisEngine - Done preparing for traversal 
INFO  17:16:18,617 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING] 
INFO  17:16:18,617 ProgressMeter -                 | processed |    time |    per 1M |           |   total | remaining 
INFO  17:16:18,618 ProgressMeter -        Location |     sites | elapsed |     sites | completed | runtime |   runtime 
WARN  17:16:20,705 StrandBiasTest - StrandBiasBySample annotation exists in input VCF header. Attempting to use StrandBiasBySample values to calculate strand bias annotation values. If no sample has the SB genotype annotation, annotation may still fail. 
WARN  17:16:20,707 StrandBiasTest - StrandBiasBySample annotation exists in input VCF header. Attempting to use StrandBiasBySample values to calculate strand bias annotation values. If no sample has the SB genotype annotation, annotation may still fail. 
INFO  17:16:20,707 GenotypeGVCFs - Notice that the -ploidy parameter is ignored in GenotypeGVCFs tool as this is automatically determined by the input variant files 
WARN  17:16:21,841 HaplotypeScore - Annotation will not be calculated, must be called from UnifiedGenotyper, not GenotypeGVCFs 
INFO  17:16:48,638 ProgressMeter - Super_Scaffold8:197901         0.0    30.0 s      49.6 w        1.6%    31.6 m      31.1 m 
INFO  17:17:18,640 ProgressMeter - Super_Scaffold8:198001         0.0    60.0 s      99.2 w        1.6%    63.2 m      62.2 m 
INFO  17:17:48,642 ProgressMeter - Super_Scaffold8:198001         0.0    90.0 s     148.9 w        1.6%    94.8 m      93.3 m 
INFO  17:18:18,643 ProgressMeter - Super_Scaffold8:198001         0.0   120.0 s     198.5 w        1.6%     2.1 h       2.1 h 
INFO  17:18:48,644 ProgressMeter - Super_Scaffold8:198001         0.0     2.5 m     248.1 w        1.6%     2.6 h       2.6 h 
INFO  17:19:18,646 ProgressMeter - Super_Scaffold8:198001         0.0     3.0 m     297.7 w        1.6%     3.2 h       3.1 h 
INFO  17:19:48,647 ProgressMeter - Super_Scaffold8:198001         0.0     3.5 m     347.3 w        1.6%     3.7 h       3.6 h 
INFO  17:20:18,649 ProgressMeter - Super_Scaffold8:198001         0.0     4.0 m     396.9 w        1.6%     4.2 h       4.1 h 
INFO  17:21:02,832 ProgressMeter - Super_Scaffold8:198001         0.0     4.7 m     469.9 w        1.6%     5.0 h       4.9 h 
INFO  17:21:34,999 ProgressMeter - Super_Scaffold8:198001         0.0     5.3 m     523.1 w        1.6%     5.5 h       5.5 h 
INFO  17:22:07,210 ProgressMeter - Super_Scaffold8:198001         0.0     5.8 m     576.4 w        1.6%     6.1 h       6.0 h 
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 3.8-0-ge9d806836): 
##### ERROR
##### ERROR This means that one or more arguments or inputs in your command are incorrect.
##### ERROR The error message below tells you what is the problem.
##### ERROR
##### ERROR If the problem is an invalid argument, please check the online documentation guide
##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
##### ERROR
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
##### ERROR
##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
##### ERROR
##### ERROR MESSAGE: An error occurred because you did not provide enough memory to run this program. You can use the -Xmx argument (before the -jar argument) to adjust the maximum heap size provided to Java. Note that this is a JVM argument, not a GATK argument.
##### ERROR ------------------------------------------------------------------------------------------

Following the instructions in the error message, I set -Xmx to 32g and ran the program again, then tried 256g and finally 512g, always getting the same error. I understand this is a Java memory issue rather than a GATK bug; however, I don't have access to more than 512 GB of RAM.

Thus, I was wondering if you could please advise me on how to reduce GATK's memory demand when running GenotypeGVCFs, ideally without compromising the sensitivity of SNP calling on pooled data (e.g. I would prefer to keep ploidy at 10).

Which parameters would you recommend changing, and at which step (HaplotypeCaller or GenotypeGVCFs)? With pooled data I am mostly interested in obtaining read counts per allele, not genotypes (as these are not individuals). I considered reducing -maxNumPLValues back to 10000, but then some PLs would not be calculated during GenotypeGVCFs, and I am not sure how that would affect SNP calling.
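For background on why the PL count explodes (a generic combinatorial sketch, not GATK code): for ploidy P and A alleles at a site (reference plus alternates), the number of distinct genotypes, and hence of PL values per sample, is the multiset coefficient C(P + A - 1, P):

```shell
# Number of distinct genotypes (= PL values) for ploidy P and A alleles:
# C(P + A - 1, P), computed iteratively; each step is exact integer arithmetic.
ngenotypes() {
  local P=$1 A=$2 n=1 k
  for ((k = 1; k <= P; k++)); do
    n=$(( n * (A - 1 + k) / k ))
  done
  echo "$n"
}

ngenotypes 10 4    # ploidy 10, ref + 3 alt alleles  -> 286
ngenotypes 10 13   # ploidy 10, ref + 12 alt alleles -> 646646
```

At ploidy 10 the count passes 600000 per sample once a site reaches about 12 alternate alleles, which is consistent with the ~700000 cap that the test runs required.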

Thanks very much for any help!

Answers

  • Hi @Sheila
    Thanks very much for your prompt response. I will keep an eye on that thread to find out when the issue is fixed.

    I was wondering whether I will need to re-run the whole analysis (HaplotypeCaller and GenotypeGVCFs) with the nightly build of GATK 3.8 that includes the fix, or whether I can just repeat the GenotypeGVCFs step (using as input the gVCF files generated with the stable version of GATK 3.8).

    Also, would you recommend conducting this analysis with GATK 3.8, or should I switch to GATK4 now? My concern is that two bugs related to GenotypeGVCFs in GATK4 are still under review. I need results ASAP but would prefer to use the most stable GATK version to date.

    Thanks very much for your help!

  • Sheila (Broad Institute, Moderator)

    @apfuentes
    Hi,

    You would just have to re-run the GenotypeGVCFs step. There should be no major changes that make 3.8 stable and the 3.8 nightly incompatible :smile:

    GATK4 is still in beta, although we are recommending people test it out. If you need the results ASAP, you should stick with GATK3 as the stable non-beta version.

    -Sheila

  • @Sheila
    Thanks for your response!

    I was wondering if I should also repeat the HaplotypeCaller step, because I obtained the same error message there that I got with GenotypeGVCFs:

    ERROR StatusLogger Unable to create class org.apache.logging.log4j.core.impl.Log4jContextFactory specified in jar:file:/cvmfs/bioinformatics.usherbrooke.ca/GATK/3.8.0/GenomeAnalysisTK.jar!/META-INF/log4j-p
    ERROR StatusLogger Log4j2 could not find a logging implementation. Please add log4j-core to the classpath. Using SimpleLogger to log to the console...
    ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console.
    

    However, after each HaplotypeCaller run I did obtain a gVCF file and the message "Done. ---------------------------" in STDOUT. Can I therefore assume the gVCF files obtained despite the StatusLogger error messages are fine?

    Thanks!
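    One way to sanity-check a finished gVCF, assuming GATK 3's ValidateVariants walker (paths are placeholders; the sketch only assembles the command, it does not run it):

```shell
# Sketch: build a ValidateVariants command for one per-scaffold gVCF.
# GenomeAnalysisTK.jar, ref.fasta, and the gVCF path are placeholders.
vcf=sample1.Super_Scaffold8.raw.g.vcf
check_cmd="java -Xmx4g -jar GenomeAnalysisTK.jar -T ValidateVariants -R ref.fasta -V ${vcf}"
echo "$check_cmd"
```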

  • Thanks so much for working on the fix. Much appreciated!

  • Dear @Geraldine_VdAuwera,
    Just wondering, has the fix for the logger error been implemented in the latest nightly build?

    Thanks

  • Dear @Sheila and @Geraldine_VdAuwera

    I am still having the memory problem when running GenotypeGVCFs on Pool-seq data (error message below), despite using the GATK 3.8 nightly build from September 12, 2017 (2017-09-12-g315c945), which includes the fixes for the hanging and logger issues (as mentioned here):

    INFO  11:06:16,543 HelpFormatter - Executing as afuentes@cp0405.m on Linux 2.6.32-642.6.2.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_74-b02. 
    INFO  11:06:16,544 HelpFormatter - Date/Time: 2017/09/18 11:06:16 
    INFO  11:06:16,544 HelpFormatter - --------------------------------------------------------------------------------------------- 
    INFO  11:06:16,544 HelpFormatter - --------------------------------------------------------------------------------------------- 
    INFO  11:06:16,768 NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/path/GATK/2017_09_12/GenomeAnalysisTK.jar!/com/intel/gkl/native/libgkl_compression.so 
    INFO  11:06:16,801 GenomeAnalysisEngine - Deflater: IntelDeflater 
    INFO  11:06:16,802 GenomeAnalysisEngine - Inflater: IntelInflater 
    INFO  11:06:16,802 GenomeAnalysisEngine - Strictness is SILENT 
    INFO  11:06:17,043 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000 
    INFO  11:07:08,975 IntervalUtils - Processing 12289464 bp from intervals 
    INFO  11:07:14,788 GenomeAnalysisEngine - Preparing for traversal 
    INFO  11:07:14,789 GenomeAnalysisEngine - Done preparing for traversal 
    INFO  11:07:14,790 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING] 
    INFO  11:07:14,791 ProgressMeter -                 | processed |    time |    per 1M |           |   total | remaining 
    INFO  11:07:14,791 ProgressMeter -        Location |     sites | elapsed |     sites | completed | runtime |   runtime 
    WARN  11:07:14,994 StrandBiasTest - StrandBiasBySample annotation exists in input VCF header. Attempting to use StrandBiasBySample values to calculate strand bias annotation values. If no sample has the SB genotype annotation, annotation may still fail. 
    WARN  11:07:14,995 StrandBiasTest - StrandBiasBySample annotation exists in input VCF header. Attempting to use StrandBiasBySample values to calculate strand bias annotation values. If no sample has the SB genotype annotation, annotation may still fail. 
    INFO  11:07:14,995 GenotypeGVCFs - Notice that the -ploidy parameter is ignored in GenotypeGVCFs tool as this is automatically determined by the input variant files 
    WARN  11:07:17,491 HaplotypeScore - Annotation will not be calculated, must be called from UnifiedGenotyper, not GenotypeGVCFs 
    INFO  11:07:44,796 ProgressMeter - Super_Scaffold61:154401         0.0    30.0 s      49.6 w        1.3%    39.8 m      39.3 m 
    INFO  11:08:14,801 ProgressMeter - Super_Scaffold61:298601         0.0    60.0 s      99.2 w        2.4%    41.2 m      40.2 m 
    INFO  11:08:44,802 ProgressMeter - Super_Scaffold61:434301         0.0    90.0 s     148.8 w        3.5%    42.4 m      40.9 m 
    INFO  11:09:14,803 ProgressMeter - Super_Scaffold61:496101         0.0   120.0 s     198.4 w        4.0%    49.5 m      47.5 m 
    INFO  11:09:44,805 ProgressMeter - Super_Scaffold61:654801         0.0     2.5 m     248.0 w        5.3%    46.9 m      44.4 m 
    INFO  11:10:14,806 ProgressMeter - Super_Scaffold61:675901         0.0     3.0 m     297.6 w        5.5%    54.5 m      51.5 m 
    INFO  11:10:44,807 ProgressMeter - Super_Scaffold61:745601         0.0     3.5 m     347.3 w        6.1%    57.7 m      54.2 m 
    INFO  11:11:14,861 ProgressMeter - Super_Scaffold61:862401         0.0     4.0 m     396.9 w        7.0%    57.0 m      53.0 m 
    INFO  11:11:44,866 ProgressMeter - Super_Scaffold61:1007501   1000000.0     4.5 m       4.5 m        8.2%    54.9 m      50.4 m 
    INFO  11:12:14,867 ProgressMeter - Super_Scaffold61:1237401   1000000.0     5.0 m       5.0 m       10.1%    49.7 m      44.7 m 
    INFO  11:12:44,868 ProgressMeter - Super_Scaffold61:1263801   1000000.0     5.5 m       5.5 m       10.3%    53.5 m      48.0 m 
    INFO  11:13:14,869 ProgressMeter - Super_Scaffold61:1331101   1000000.0     6.0 m       6.0 m       10.8%    55.4 m      49.4 m 
    INFO  11:13:44,871 ProgressMeter - Super_Scaffold61:1491801   1000000.0     6.5 m       6.5 m       12.1%    53.5 m      47.0 m 
    INFO  11:14:14,872 ProgressMeter - Super_Scaffold61:1695101   1000000.0     7.0 m       7.0 m       13.8%    50.7 m      43.7 m 
    INFO  11:14:44,873 ProgressMeter - Super_Scaffold61:1935201   1000000.0     7.5 m       7.5 m       15.7%    47.6 m      40.1 m 
    [... ProgressMeter lines trimmed: position held at Super_Scaffold61:1935201 from 8.0 m to 16.1 m elapsed ...]
    INFO  11:23:48,036 ProgressMeter - Super_Scaffold61:1939301   1000000.0    16.6 m      16.6 m       15.8%   104.9 m      88.3 m 
    INFO  11:24:21,910 ProgressMeter - Super_Scaffold61:2034101   2000000.0    17.1 m       8.6 m       16.6%   103.4 m      86.3 m 
    INFO  11:24:51,911 ProgressMeter - Super_Scaffold61:2034101   2000000.0    17.6 m       8.8 m       16.6%   106.4 m      88.8 m 
    INFO  11:25:21,912 ProgressMeter - Super_Scaffold61:2139201   2000000.0    18.1 m       9.1 m       17.4%   104.1 m      86.0 m 
    INFO  11:25:51,913 ProgressMeter - Super_Scaffold61:2206201   2000000.0    18.6 m       9.3 m       18.0%   103.7 m      85.1 m 
    INFO  11:26:21,914 ProgressMeter - Super_Scaffold61:2206201   2000000.0    19.1 m       9.6 m       18.0%   106.5 m      87.4 m 
    INFO  11:26:51,916 ProgressMeter - Super_Scaffold61:2225101   2000000.0    19.6 m       9.8 m       18.1%   108.3 m      88.7 m 
    INFO  11:27:21,918 ProgressMeter - Super_Scaffold61:2266301   2000000.0    20.1 m      10.1 m       18.4%   109.1 m      89.0 m 
    INFO  11:27:51,919 ProgressMeter - Super_Scaffold61:2357901   2000000.0    20.6 m      10.3 m       19.2%   107.5 m      86.8 m 
    INFO  11:28:21,920 ProgressMeter - Super_Scaffold61:2376201   2000000.0    21.1 m      10.6 m       19.3%   109.2 m      88.1 m 
    INFO  11:28:51,922 ProgressMeter - Super_Scaffold61:2429501   2000000.0    21.6 m      10.8 m       19.8%   109.3 m      87.7 m 
    [... ProgressMeter lines trimmed: position held at Super_Scaffold61:2429501 from 22.1 m to 43.2 m elapsed ...]
    INFO  11:50:56,668 ProgressMeter - Super_Scaffold61:2447501   2000000.0    43.7 m      21.8 m       19.9%     3.7 h       2.9 h 
    INFO  11:51:26,670 ProgressMeter - Super_Scaffold61:2623601   2000000.0    44.2 m      22.1 m       21.3%     3.4 h       2.7 h 
    INFO  11:51:56,672 ProgressMeter - Super_Scaffold61:2733801   2000000.0    44.7 m      22.3 m       22.2%     3.3 h       2.6 h 
    INFO  11:52:26,673 ProgressMeter - Super_Scaffold61:2902001   2000000.0    45.2 m      22.6 m       23.6%     3.2 h       2.4 h 
    INFO  11:52:56,675 ProgressMeter - Super_Scaffold61:2944101   2000000.0    45.7 m      22.8 m       24.0%     3.2 h       2.4 h 
    INFO  11:53:26,676 ProgressMeter - Super_Scaffold61:3147901   3000000.0    46.2 m      15.4 m       25.6%     3.0 h       2.2 h 
    INFO  11:53:56,848 ProgressMeter - Super_Scaffold61:3298801   3000000.0    46.7 m      15.6 m       26.8%     2.9 h       2.1 h 
    INFO  11:54:26,850 ProgressMeter - Super_Scaffold61:3412701   3000000.0    47.2 m      15.7 m       27.8%     2.8 h       2.0 h 
    INFO  11:54:56,852 ProgressMeter - Super_Scaffold61:3473601   3000000.0    47.7 m      15.9 m       28.3%     2.8 h       2.0 h 
    INFO  11:55:26,853 ProgressMeter - Super_Scaffold61:3523501   3000000.0    48.2 m      16.1 m       28.7%     2.8 h     119.9 m 
    INFO  11:55:56,855 ProgressMeter - Super_Scaffold61:3637101   3000000.0    48.7 m      16.2 m       29.6%     2.7 h     115.9 m 
    INFO  11:56:26,856 ProgressMeter - Super_Scaffold61:3684101   3000000.0    49.2 m      16.4 m       30.0%     2.7 h     114.9 m 
    INFO  11:56:56,858 ProgressMeter - Super_Scaffold61:3736001   3000000.0    49.7 m      16.6 m       30.4%     2.7 h     113.8 m 
    INFO  11:57:26,859 ProgressMeter - Super_Scaffold61:3861401   3000000.0    50.2 m      16.7 m       31.4%     2.7 h     109.6 m 
    INFO  11:57:56,861 ProgressMeter - Super_Scaffold61:3999901   3000000.0    50.7 m      16.9 m       32.5%     2.6 h     105.1 m 
    INFO  11:58:26,863 ProgressMeter - Super_Scaffold61:4101201   4000000.0    51.2 m      12.8 m       33.4%     2.6 h     102.2 m 
    INFO  11:58:56,864 ProgressMeter - Super_Scaffold61:4136501   4000000.0    51.7 m      12.9 m       33.7%     2.6 h     101.9 m 
    INFO  11:59:26,866 ProgressMeter - Super_Scaffold61:4170001   4000000.0    52.2 m      13.1 m       33.9%     2.6 h     101.6 m 
    INFO  11:59:56,868 ProgressMeter - Super_Scaffold61:4274201   4000000.0    52.7 m      13.2 m       34.8%     2.5 h      98.8 m 
    INFO  12:00:26,870 ProgressMeter - Super_Scaffold61:4336701   4000000.0    53.2 m      13.3 m       35.3%     2.5 h      97.6 m 
    INFO  12:00:56,871 ProgressMeter - Super_Scaffold61:4384601   4000000.0    53.7 m      13.4 m       35.7%     2.5 h      96.8 m 
    INFO  12:01:26,872 ProgressMeter - Super_Scaffold61:4384601   4000000.0    54.2 m      13.6 m       35.7%     2.5 h      97.7 m 
    INFO  12:01:56,874 ProgressMeter - Super_Scaffold61:4445401   4000000.0    54.7 m      13.7 m       36.2%     2.5 h      96.5 m 
    INFO  12:02:26,875 ProgressMeter - Super_Scaffold61:4511201   4000000.0    55.2 m      13.8 m       36.7%     2.5 h      95.2 m 
    INFO  12:02:56,876 ProgressMeter - Super_Scaffold61:4673001   4000000.0    55.7 m      13.9 m       38.0%     2.4 h      90.8 m 
    INFO  12:03:26,878 ProgressMeter - Super_Scaffold61:4794601   4000000.0    56.2 m      14.1 m       39.0%     2.4 h      87.9 m 
    INFO  12:03:56,879 ProgressMeter - Super_Scaffold61:4888901   4000000.0    56.7 m      14.2 m       39.8%     2.4 h      85.8 m 
    INFO  12:04:26,880 ProgressMeter - Super_Scaffold61:4899201   4000000.0    57.2 m      14.3 m       39.9%     2.4 h      86.3 m 
    INFO  12:04:56,882 ProgressMeter - Super_Scaffold61:5044401   5000000.0    57.7 m      11.5 m       41.0%     2.3 h      82.9 m 
    INFO  12:05:26,884 ProgressMeter - Super_Scaffold61:5194301   5000000.0    58.2 m      11.6 m       42.3%     2.3 h      79.5 m 
    INFO  12:05:56,885 ProgressMeter - Super_Scaffold61:5211001   5000000.0    58.7 m      11.7 m       42.4%     2.3 h      79.7 m 
    INFO  12:06:26,887 ProgressMeter - Super_Scaffold61:5305901   5000000.0    59.2 m      11.8 m       43.2%     2.3 h      77.9 m 
    INFO  12:06:56,888 ProgressMeter - Super_Scaffold61:5418001   5000000.0    59.7 m      11.9 m       44.1%     2.3 h      75.7 m 
    INFO  12:07:26,890 ProgressMeter - Super_Scaffold61:5500001   5000000.0    60.2 m      12.0 m       44.8%     2.2 h      74.3 m 
    INFO  12:07:56,891 ProgressMeter - Super_Scaffold61:5611701   5000000.0    60.7 m      12.1 m       45.7%     2.2 h      72.2 m 
    INFO  12:08:26,893 ProgressMeter - Super_Scaffold61:5611701   5000000.0    61.2 m      12.2 m       45.7%     2.2 h      72.8 m 
    INFO  12:08:56,894 ProgressMeter - Super_Scaffold61:5611701   5000000.0    61.7 m      12.3 m       45.7%     2.3 h      73.4 m 
    [... 87 similar ProgressMeter lines snipped: position stayed at Super_Scaffold61:5611701 from 12:09 to 12:53 ...]
    INFO  12:53:20,185 ProgressMeter - Super_Scaffold61:5611701   5000000.0   106.1 m      21.2 m       45.7%     3.9 h       2.1 h 
    WARN  12:53:39,333 AFCalculator - Maximum allowed number of PLs (4000000) exceeded for sample 7iA at Super_Scaffold61:5611797-5611797 with 20030010 possible genotypes. No PLs will be output for these genotypes (which may cause incorrect results in subsequent analyses) unless the --max_num_PL_values argument is increased accordingly. Unless the DEBUG logging level is used, this warning message is output just once per run and further warnings are suppressed. 
    INFO  12:53:50,186 ProgressMeter - Super_Scaffold61:5611801   5000000.0   106.6 m      21.3 m       45.7%     3.9 h       2.1 h 
    INFO  12:54:20,187 ProgressMeter - Super_Scaffold61:5611801   5000000.0   107.1 m      21.4 m       45.7%     3.9 h       2.1 h 
    INFO  12:54:50,188 ProgressMeter - Super_Scaffold61:5628601   5000000.0   107.6 m      21.5 m       45.8%     3.9 h       2.1 h 
    INFO  12:55:20,189 ProgressMeter - Super_Scaffold61:5772501   5000000.0   108.1 m      21.6 m       47.0%     3.8 h       2.0 h 
    INFO  12:55:50,190 ProgressMeter - Super_Scaffold61:5851801   5000000.0   108.6 m      21.7 m       47.6%     3.8 h     119.5 m 
    INFO  12:56:20,191 ProgressMeter - Super_Scaffold61:5929501   5000000.0   109.1 m      21.8 m       48.2%     3.8 h     117.0 m 
    INFO  12:56:50,192 ProgressMeter - Super_Scaffold61:6090201   6000000.0   109.6 m      18.3 m       49.6%     3.7 h     111.5 m 
    INFO  12:57:20,193 ProgressMeter - Super_Scaffold61:6137001   6000000.0   110.1 m      18.3 m       49.9%     3.7 h     110.4 m 
    INFO  12:57:50,194 ProgressMeter - Super_Scaffold61:6155101   6000000.0   110.6 m      18.4 m       50.1%     3.7 h     110.2 m 
    INFO  12:58:20,196 ProgressMeter - Super_Scaffold61:6245801   6000000.0   111.1 m      18.5 m       50.8%     3.6 h     107.5 m 
    INFO  12:58:50,197 ProgressMeter - Super_Scaffold61:6357101   6000000.0   111.6 m      18.6 m       51.7%     3.6 h     104.1 m 
    INFO  12:59:20,198 ProgressMeter - Super_Scaffold61:6440001   6000000.0   112.1 m      18.7 m       52.4%     3.6 h     101.8 m 
    INFO  12:59:50,199 ProgressMeter - Super_Scaffold61:6577201   6000000.0   112.6 m      18.8 m       53.5%     3.5 h      97.8 m 
    INFO  13:00:20,200 ProgressMeter - Super_Scaffold61:6597501   6000000.0   113.1 m      18.8 m       53.7%     3.5 h      97.6 m 
    INFO  13:00:50,201 ProgressMeter - Super_Scaffold61:6597501   6000000.0   113.6 m      18.9 m       53.7%     3.5 h      98.0 m 
    INFO  13:01:20,202 ProgressMeter - Super_Scaffold61:6597501   6000000.0   114.1 m      19.0 m       53.7%     3.5 h      98.4 m 
    INFO  13:01:50,203 ProgressMeter - Super_Scaffold61:6597501   6000000.0   114.6 m      19.1 m       53.7%     3.6 h      98.9 m 
    INFO  13:02:20,204 ProgressMeter - Super_Scaffold61:6597501   6000000.0   115.1 m      19.2 m       53.7%     3.6 h      99.3 m 
    INFO  13:02:50,205 ProgressMeter - Super_Scaffold61:6597501   6000000.0   115.6 m      19.3 m       53.7%     3.6 h      99.7 m 
    INFO  13:03:20,207 ProgressMeter - Super_Scaffold61:6597501   6000000.0   116.1 m      19.3 m       53.7%     3.6 h     100.2 m 
    INFO  13:03:50,208 ProgressMeter - Super_Scaffold61:6597501   6000000.0   116.6 m      19.4 m       53.7%     3.6 h     100.6 m 
    INFO  13:04:20,209 ProgressMeter - Super_Scaffold61:6735601   6000000.0   117.1 m      19.5 m       54.8%     3.6 h      96.5 m 
    INFO  13:04:50,211 ProgressMeter - Super_Scaffold61:6909701   6000000.0   117.6 m      19.6 m       56.2%     3.5 h      91.5 m 
    INFO  13:05:20,212 ProgressMeter - Super_Scaffold61:6909701   6000000.0   118.1 m      19.7 m       56.2%     3.5 h      91.9 m 
    INFO  13:05:50,213 ProgressMeter - Super_Scaffold61:6909701   6000000.0   118.6 m      19.8 m       56.2%     3.5 h      92.3 m 
    INFO  13:06:20,214 ProgressMeter - Super_Scaffold61:6909701   6000000.0   119.1 m      19.8 m       56.2%     3.5 h      92.7 m 
    INFO  13:06:50,214 ProgressMeter - Super_Scaffold61:6909701   6000000.0   119.6 m      19.9 m       56.2%     3.5 h      93.1 m 
    INFO  13:07:20,215 ProgressMeter - Super_Scaffold61:6909701   6000000.0     2.0 h      20.0 m       56.2%     3.6 h      93.5 m 
    INFO  13:07:50,216 ProgressMeter - Super_Scaffold61:6909701   6000000.0     2.0 h      20.1 m       56.2%     3.6 h      93.9 m 
    INFO  13:08:21,457 ProgressMeter - Super_Scaffold61:6909701   6000000.0     2.0 h      20.2 m       56.2%     3.6 h      94.3 m 
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A USER ERROR has occurred (version nightly-2017-09-12-g315c945): 
    ##### ERROR
    ##### ERROR This means that one or more arguments or inputs in your command are incorrect.
    ##### ERROR The error message below tells you what is the problem.
    ##### ERROR
    ##### ERROR If the problem is an invalid argument, please check the online documentation guide
    ##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
    ##### ERROR
    ##### ERROR Visit our website and forum for extensive documentation and answers to 
    ##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
    ##### ERROR
    ##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
    ##### ERROR
    ##### ERROR MESSAGE: An error occurred because you did not provide enough memory to run this program. You can use the -Xmx argument (before the -jar argument) to adjust the maximum heap size provided to Java. Note that this is a JVM argument, not a GATK argument.
    ##### ERROR ------------------------------------------------------------------------------------------
    ----------------------------------------
    

    Before, using the standard 3.8 release, 15 out of 94 super-scaffolds failed; with the nightly build, 9 of those 15 problematic super-scaffolds still fail. That is an improvement, but I cannot figure out what is causing such high memory usage. Any ideas?

    From STDOUT I noticed that the number of possible genotypes reported in the warning was very high (20,030,010), well above the -maxNumPLValues setting. This is the command used for one of the super-scaffolds:

    java -Djava.io.tmpdir=/path/tmp/ -XX:ParallelGCThreads=1 -Dsamjdk.use_async_io=true -Dsamjdk.buffer_size=4194304 -Xmx256g -jar /path/GATK/2017_09_12/GenomeAnalysisTK.jar \
    -T GenotypeGVCFs \
    -R /path/Stitched_Ref_genome/genome.fasta \
    -V /path/GVCFs/7iA.94.Super_Scaffolds.raw.g.vcf \
    -V /path/GVCFs/BDO.94.Super_Scaffolds.raw.g.vcf \
    -V /path/GVCFs/Bla.94.Super_Scaffolds.raw.g.vcf \
    -V /path/GVCFs/Lab.94.Super_Scaffolds.raw.g.vcf \
    -V /path/GVCFs/M514.94.Super_Scaffolds.raw.g.vcf \
    -V /path/GVCFs/Mir.94.Super_Scaffolds.raw.g.vcf \
    -V /path/GVCFs/Mus.94.Super_Scaffolds.raw.g.vcf \
    -V /path/GVCFs/NDF.94.Super_Scaffolds.raw.g.vcf \
    -V /path/GVCFs/NDS.94.Super_Scaffolds.raw.g.vcf \
    -V /path/GVCFs/ScB.94.Super_Scaffolds.raw.g.vcf \
    -V /path/GVCFs/Ste.94.Super_Scaffolds.raw.g.vcf \
    -L Super_Scaffold0 \
    -maxAltAlleles 3 \
    -newQual \
    -maxNumPLValues 4000000 \
    -o /path/GATK_results/11.pops.Super_Scaffold0.raw.SNPs-indels.vcf
    
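    Side note: the 20,030,010 in the WARN line can be reproduced from the genotype-count formula choose(p + a - 1, a - 1) for ploidy p and a total alleles; it corresponds to a site with 20 alleles at ploidy 10. A quick sanity check in Python (a hypothetical snippet of mine, not part of the pipeline):

```python
from math import comb

def num_genotypes(ploidy, num_alleles):
    """Possible unphased genotypes for one sample: choose(p + a - 1, a - 1)."""
    return comb(ploidy + num_alleles - 1, num_alleles - 1)

# The warning's "20030010 possible genotypes" matches 20 alleles at ploidy 10:
print(num_genotypes(10, 20))  # 20030010
```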

    Thanks for your help.

  • Sheila (Broad Institute), Member, Broadie, Moderator

    @apfuentes
    Hi,

    Can you confirm if this happens with a subset of GVCFs or only with all of them? Have you tried using CombineGVCFs before running GenotypeGVCFs?

    Thanks,
    Sheila

  • @Sheila
    Thanks a lot for your recommendations! I performed the suggested tests; here are the results:

    1) I ran GenotypeGVCFs on each scaffold, excluding 1 of the 11 GVCF files at a time, for a total of 10 GVCFs per run; I used the same Java settings and arguments as in the Sept 18 post (-maxAltAlleles 3 -newQual), except -maxNumPLValues 30000000. For the first scaffold I used -Xmx256g, but the largest memory usage was ~26 GB (as shown below), so for the other runs this argument was lowered to -Xmx32g.

    Resources:  cput=01:08:03,mem=14530600kb,vmem=280478208kb,walltime=01:07:55
    Resources:  cput=01:03:13,mem=14696068kb,vmem=280478208kb,walltime=01:03:03
    Resources:  cput=01:03:25,mem=10215756kb,vmem=280478208kb,walltime=01:02:57
    Resources:  cput=01:00:55,mem=12851356kb,vmem=280478208kb,walltime=01:00:35
    Resources:  cput=01:08:44,mem=13565140kb,vmem=280478208kb,walltime=01:08:20
    Resources:  cput=00:54:50,mem=10251544kb,vmem=280478208kb,walltime=00:54:48
    Resources:  cput=01:36:24,mem=18604812kb,vmem=280478208kb,walltime=01:36:15
    Resources:  cput=01:33:38,mem=18162824kb,vmem=280478208kb,walltime=01:33:12
    Resources:  cput=00:55:51,mem=16145228kb,vmem=280478208kb,walltime=00:55:33
    Resources:  cput=01:55:08,mem=25885124kb,vmem=280478208kb,walltime=01:54:52
    Resources:  cput=01:01:08,mem=15229552kb,vmem=280478208kb,walltime=01:00:52
    

    This table shows for which scaffolds the run was successful when removing one GVCF file at a time:

    Scaffold f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11
    0 x x x
    18 x x x x
    20 x x
    44 x x x x x
    50 x x x x
    53
    56 x
    61 x x x
    68 x x x

    File f8 seems to be one of the more problematic ones, as for most scaffolds the run finished when this file was excluded. However, different files were problematic for different scaffolds, from which I conclude the problem is more scaffold-specific than file-specific. In other words, removing one or a few files probably won't solve the memory issue.

    2) I did not consider CombineGVCFs initially, as it is usually recommended when more than 200 GVCFs need to be analyzed, and I only have 11 files.

    Anyway, I first tried combining all 11 files, but ran out of memory even when using -Xmx512g. Then I combined two cohorts, one of 5 GVCFs and one of 6 (-Xmx256g); CombineGVCFs succeeded in both cases. I then ran GenotypeGVCFs on those 2 combined files (-V cohort1.g.vcf -V cohort2.g.vcf) with -Xmx216g and -Xmx512g, but neither worked.

    What else can I do?

    Plan A: Exclude problematic regions from GenotypeGVCFs. For this, I would first look at the BAM files and identify the problematic region range (scaffold name, start and end base), then run GenotypeGVCFs with the -XL argument to exclude the problematic interval (format: <chr>:<start>-<stop>).
    - Would it be necessary to add the padding argument (-ip) in this case?

    Plan B: Repeat everything (HaplotypeCaller and GenotypeGVCFs) using a lower ploidy. I have pools of 50 individuals, so the actual ploidy is 100 (diploid species). Ploidy 10 is giving these memory problems, so maybe ploidy 5 would work. However, lowering the ploidy will reduce sensitivity for low-frequency alleles...
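    To make that sensitivity trade-off concrete: a pooled genotype at ploidy p can only represent within-pool allele frequencies on a grid of 1/p steps, so ploidy 5 coarsens the grid from 10% to 20% steps, far from the 1% resolution of the true ploidy of 100. A hypothetical illustration:

```python
def representable_freqs(ploidy):
    """Allele frequencies a single pooled genotype call can express."""
    return [i / ploidy for i in range(ploidy + 1)]

print(representable_freqs(5))   # 20% steps
print(representable_freqs(10))  # 10% steps
```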

    Any other idea?

    Thanks very much for any help!

  • Sheila (Broad Institute), Member, Broadie, Moderator

    @apfuentes
    Hi,

    It would be best if you try Plan A. You can look for sites with a crazy number of PLs and exclude those sites; those are likely the messiest and most compute-intensive ones. However, you should also check in IGV that you are not throwing out interesting areas that may be important to your research. As for padding: yes, you should include it, so you "quarantine" the difficult regions.
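    For building such an exclusion list, a sketch like this could work (file name and site list are hypothetical placeholders); it pads each problematic site and writes GATK-style <chr>:<start>-<stop> intervals that can be passed with -XL:

```python
# Hypothetical sketch: turn problematic sites into padded exclusion intervals.
problem_sites = [
    ("Super_Scaffold61", 5611797),  # e.g. the site from the AFCalculator warning
]
PADDING = 100  # plays the same role as -ip 100

def padded_intervals(sites, padding):
    intervals = []
    for scaffold, pos in sites:
        start = max(1, pos - padding)  # interval coordinates are 1-based
        intervals.append(f"{scaffold}:{start}-{pos + padding}")
    return intervals

with open("exclude.intervals", "w") as out:
    out.write("\n".join(padded_intervals(problem_sites, PADDING)) + "\n")
```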

    Another thing to try is reducing the -maxNumPLValues and --max_genotype_count.

    I hope this helps.

    -Sheila

    P.S. Thanks for the nice chart and methodical testing :smile:

  • @Sheila
    Hi,
    Thanks a ton for your prompt response and valuable recommendations.

    A few additional questions:

    a) I am now running HaplotypeCaller on one file (I chose f8 because of the previous test) for the 9 challenging scaffolds to obtain the corresponding bamout. My plan is to look at the bamout as well as the BAM file in IGV.

    Considering that HaplotypeCaller may skip regions of high complexity (like the ones I am dealing with) for efficiency reasons, I first tried to obtain the bamout while forcing the program to fully process those regions, using the arguments --forceActive --disableOptimizations --dontTrimActiveRegions. For reference, this is the command I used:

    java -Djava.io.tmpdir=/path/tmp/ -XX:ParallelGCThreads=1 -Dsamjdk.use_async_io=true -Dsamjdk.buffer_size=4194304 -Xmx32g -jar /path/GATK/3.8.0/GenomeAnalysisTK.jar \
    -T HaplotypeCaller \
    -R /path/Stitched_Ref_genome/ref.fasta \
    -I /path/aln.bam \
    -L Super_Scaffold0 \
    -L Super_Scaffold18 \
    -L Super_Scaffold20 \
    -L Super_Scaffold44 \
    -L Super_Scaffold50 \
    -L Super_Scaffold53 \
    -L Super_Scaffold56 \
    -L Super_Scaffold61 \
    -L Super_Scaffold68 \
    -ERC GVCF \
    -ploidy 10 \
    -mbq 20 \
    -minPruning 5 \
    -maxNumPLValues 10000 \
    --read_filter OverclippedRead \
    --forceActive --disableOptimizations --dontTrimActiveRegions \
    -o /path/f8.9.Super_Scaffolds.forced.raw.g.vcf \
    -bamout /path/bamout.forced.f8.9.Super_Scaffolds.raw.bam
    

    However, the run crashes. Any idea why? Am I missing something? I tried changing arguments without luck, so for now I am obtaining the standard bamout using the very same arguments as the original HaplotypeCaller run, just adding the -bamout flag... I hope that is OK...

    b. To identify sites with a large number of PLs, should I just look at the annotations in the gVCF file?

    c. As for padding, what would be the best strategy for choosing a reasonable number? I have seen people using -ip 100, but I would like to make a more informed decision. Would it be OK to base it on the bamouts? For example, picking a number of bases away from the complex region where the assembly looks good (e.g., reads with high mappability and quality scores), would that be fine?

    Thanks.

    PS: Hey, no problem. Other people have saved me time by posting their own tests. I hope can do the same for others. Thanks to you guys for providing this level of support through the forums, I really appreciate it! :)

  • @Sheila
    Hi,
    I am inspecting the bamout of file f8 for the 9 challenging scaffolds. The pics below show a zoomed view of the top part of the bamout around the base where GenotypeGVCFs stops (the last base reported by the progress meter in STDOUT). Reads are colored by tag -> HC.

    I was expecting a messy alignment, lots of indels, extremely high coverage, or something like that... but I don't really see highly complex regions. Some bases fall within a repetitive sequence, but the reads seem to align well there. Am I missing something?

    [IGV screenshots omitted]
    Any idea on how to troubleshoot this better and hopefully get GenotypeGVCFs to work on those scaffolds?

    • In my last post I forgot to mention that I used such a large number for -maxNumPLValues (30,000,000) because previous GenotypeGVCFs runs showed a WARN message indicating that the number of PLs was larger than the initial value I set (10,000); I then checked the maximum value across all the files, and it was 29,000,000. Do you think setting --maxNumPLValues to 100,000 and --max_genotype_count to 100 would reduce the memory required by GenotypeGVCFs?

    Thanks for any help.

  • Hi again,
    I ran GenotypeGVCFs with these arguments, including all 11 GVCFs, for each scaffold separately:

    -L Super_Scaffold0 \
    -maxAltAlleles 3 \
    -newQual \
    -maxNumPLValues 100 \
    -maxGT 1000 \
    

    ...it did not work either; it stops at the very same region as before.

  • @Geraldine_VdAuwera
    Hi,
    OK, thanks a lot for the heads-up. Would it be possible to mention that in the documentation? It would be helpful for others in a similar situation :).

    I am going to repeat the HaplotypeCaller run to limit the number of PLs and genotypes (and hopefully reduce the memory requirements of GenotypeGVCFs). A few questions before starting the massive run:

    1. Is it OK if I ignore the WARN messages indicating there are regions with a larger number of PLs than the default of 100? Would that affect the SNP calling of subsequent regions?

    Most likely, regions with a large number of PLs are messy and won't be informative anyway...

    2. I am mostly interested in point mutations, which have 4 possible bases, so I was wondering: would using -maxAltAlleles 3, limiting the program to the REF allele (e.g. A) and 3 alternative alleles (T, C, G), affect SNP calling or realignment around indels?

    Thanks!!

    Issue · GitHub
    by Sheila

    Issue Number: 2533
    State: open
  • Hi,
    Sorry for the multiple posts. I would like to rephrase question 2 of my last post:

    How should I choose an appropriate --max_genotype_count value for a given dataset? Should this selection be informed by the formula described by Shee here?

    For example, for my dataset the real ploidy is 100, but I am using 10 because this number seems to work for other GATK 3.8 users, and the combination of high ploidy and many alt alleles would require too much memory. The maximum number of genotypes for ploidy 10 and max. alt alleles 3 (4 total alleles) is then:

    NUG(p,a) = choose(p+a-1, a-1) = choose(10+4-1, 4-1) = choose(13, 3) = 286
    
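    This count grows quickly with -maxAltAlleles; a quick hypothetical check (the helper below is mine, not a GATK tool): with 3 alternates (4 total alleles), ploidy 10 gives 286 genotypes, but with 6 alternates it is already 8008.

```python
from math import comb

def nug(ploidy, total_alleles):
    # choose(p + a - 1, a - 1): genotypes for ploidy p and a total alleles
    return comb(ploidy + total_alleles - 1, total_alleles - 1)

for max_alt in (3, 6):
    total_alleles = max_alt + 1  # REF plus the alternates
    print(max_alt, nug(10, total_alleles))
```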

    Would this value be OK for my case, considering I am mostly interested in SNPs?

    Might setting such a stringent number of genotypes bias variant calling or the treatment of indels as alternative alleles?

    How many PLs should be printed in the output when I am only interested in SNPs? The same as the max. number of genotypes (286 in the example)?

    Thanks for your help.

    Sheila (Broad Institute), Member, Broadie, Moderator
    edited September 2017

    @apfuentes
    Hi,

    Alright, it looks like you already got started, but I will answer your original questions here:

    However the run crashes. Any idea why? Am I missing out something? I tried changing arguments without luck, so for now I am just obtaining the standard bamout using the very same arguments of the original HaplotypeCaller run but adding -bamout flag...I hope those are OK...

    It is probably crashing because the compute is just too much. I would start by checking which sites are messy/have too many alternate alleles in the VCF. Then, you can check in IGV and make sure you are not missing any valuable information in those sites you wish to exclude.

    To identify sites with a large amount of PLs, should I just look at the annotations in the gVCF file?

    Yes, you can look at either the PL field or the alternate alleles column. The more alternate alleles there are, the messier the region/site. You can simplify this by selecting only for multiallelic sites with SelectVariants.
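    As a rough programmatic alternative (a hypothetical sketch; SelectVariants is the supported route), one can scan the ALT column of the VCF for sites with many alternate alleles:

```python
def multiallelic_sites(vcf_path, min_alt_alleles=3):
    """Yield (chrom, pos, n_alt) for sites with many ALT alleles."""
    with open(vcf_path) as vcf:
        for line in vcf:
            if line.startswith("#"):  # skip header lines
                continue
            fields = line.rstrip("\n").split("\t")
            chrom, pos, alt = fields[0], fields[1], fields[4]
            n_alt = 0 if alt == "." else len(alt.split(","))
            if n_alt >= min_alt_alleles:
                yield chrom, int(pos), n_alt
```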

    As per padding, what would be the best strategy to choose a reasonable number?

    I think if a site is very messy, the whole region around it is usually messy too (due to bad mapping or artifacts). I would start with 100; if you find in your analysis that that is too much or too little, you can tweak it.

    I hope this helps. I will get to your other questions soon :smiley:

    -Sheila

    Sheila (Broad Institute), Member, Broadie, Moderator
    edited September 2017

    @apfuentes
    Hi again,

    For those tricky repeat regions, you can try setting some extra arguments mentioned here under "7. Try fiddling with graph arguments (ADVANCED)".

    Also, before your start the massive run, can you try just running on a snippet of the region (~300 bases before and after the sites of interest)?

    Thanks,
    Sheila

  • @Sheila,
    Hi,
    Thanks a lot for responding my original questions and for your recommendations.

    I decided to abandon the plan of excluding problematic regions, as I did not observe a clear pattern that applies to all of them. Instead, I ran HaplotypeCaller for each of the 9 challenging scaffolds using two different sets of values ("relaxed" and "stringent") for the arguments -max_genotype_count and -maxAltAlleles, on the assumption that it is OK if no variants are called at regions with huge numbers of PLs, since these are generally messy and most likely not informative. I then ran GenotypeGVCFs per scaffold on the cohort of 11 GVCFs. I am glad to report that most runs finished successfully, except for one scaffold in the "relaxed" set, even though I assigned 256 GB of RAM to that job.

    Now I face the dilemma of deciding which set of values for these arguments is more appropriate for my dataset. Would you please take a look at the questions posted on Sep 22 and 25?

    Thanks!

  • @Sheila
    Hi,
    Thanks for your prompt response!

    Yes and yes. The parameters used in the HaplotypeCaller runs are shown below (-Xmx 8g, one scaffold at a time for all 11 BAM files):

    "Relaxed" run

    -ploidy 10
    --max_genotype_count 1024
    --maxAltAlleles 6
    -maxNumPLValues 100
    

    "Stringent" run

    -ploidy 10
    --max_genotype_count 286
    --maxAltAlleles 3
    -maxNumPLValues 100
    

    Then I ran GenotypeGVCFs one scaffold at a time on the 11 gVCFs using -Xmx 32g.

    Considering that the "stringent" settings worked fine for all scaffolds, that I am mostly interested in detecting SNPs, that the mutation rate in this species is low, and that I will most likely find lots of biallelic SNPs and only a few triallelic or tetra-allelic ones (i.e. few genotypes expected), I plan to use the settings below for my actual runs:

    -ploidy 10
    --max_genotype_count 286
    --maxAltAlleles 6
    -maxNumPLValues 100
    

    In the very first HaplotypeCaller run I set -maxNumPLValues 10000 because some regions had 8000-9000 PLs. However, this number of PLs produced the memory issue I encountered when running GenotypeGVCFs. If regions with too many PLs are messy and not informative, I don't mind the program skipping over them.
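    Plugged into a full per-scaffold HaplotypeCaller invocation, the planned settings above might look like the sketch below (paths reuse the placeholders from the earlier commands):

    ```shell
    # Sketch of the planned per-scaffold run with the chosen thresholds.
    java -Xmx8g -jar /path/GATK/3.8.0/GenomeAnalysisTK.jar \
    -T HaplotypeCaller \
    -R /path/Stitched_Ref_genome/ref.fasta \
    -I /path/BAMs-SG/sample1.PoolSeq.sorted.MarkDup.RG.bam \
    -L Super_Scaffold8 \
    -ERC GVCF \
    -ploidy 10 \
    --max_genotype_count 286 \
    --maxAltAlleles 6 \
    -maxNumPLValues 100 \
    -o /path/GATK_results/sample1.Super_Scaffold8.raw.g.vcf
    ```
    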

    As far as I understand, -maxNumPLValues only limits the PLs printed in the gVCF file, not the actual PLs calculated internally by the algorithm for all the genotypes found, is that right? If so, would it be OK to keep it simple and leave this argument at 100?

    Thanks for your advice :)

  • Sheila · Broad Institute · Member, Broadie, Moderator

    @apfuentes
    Hi,

    Yes, the argument only limits the number of PLs output to the GVCF. This is more for saving compute in the GenotypeGVCFs step rather than in the HaplotypeCaller step. You can try limiting it to 100, but you can also try a higher number if you wish. If you are interested in experimenting (which it seems like you are :smile: ) I would try playing with the thresholds until you find ones that give you the best sensitivity in both HaplotypeCaller and GenotypeGVCFs. Try the settings you mentioned for GenotypeGVCFs, and if those work, try being a little less stringent.

    Good luck and let us know how things go.

    -Sheila

  • Thanks a lot to you both for all your recommendations :) I really appreciate it! Sure, I will keep you posted.

  • Hi,
    I just wanted to share that excluding a repetitive region starting at the base where HaplotypeCaller used to crash was sufficient to fix the problem. I identified the start and end positions in the BAMOUT files and excluded this region with the -XL argument in the HaplotypeCaller run. The job then required only 32 GB of RAM.
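    For anyone following along, the fix can be sketched as follows; the coordinates of the repetitive region are hypothetical, standing in for the start/end positions read off the BAMOUT file:

    ```shell
    # Exclude the repetitive region (hypothetical coordinates) that caused the crash.
    java -Xmx32g -jar /path/GATK/3.8.0/GenomeAnalysisTK.jar \
    -T HaplotypeCaller \
    -R /path/Stitched_Ref_genome/ref.fasta \
    -I /path/BAMs-SG/sample1.PoolSeq.sorted.MarkDup.RG.bam \
    -L Super_Scaffold8 \
    -XL Super_Scaffold8:1500000-1502500 \
    -ERC GVCF \
    -ploidy 10 \
    -o /path/GATK_results/sample1.Super_Scaffold8.raw.g.vcf
    ```
    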

    Thanks a ton for all your support troubleshooting the run <3. I have learned tons from your advice :)

  • Sheila · Broad Institute · Member, Broadie, Moderator

    @apfuentes
    Hi,

    Wonderful news!! Thank you for sharing your solution :smile:

    -Sheila
