Problem running GenotypeGVCFs for large all male cohort in chromosome X

cedlund (Member)
edited July 2015 in Ask the GATK team

Hi,

I'm attempting to run GenotypeGVCFs on a cohort of ~4200 human male samples with targeted sequencing. I'm following the current DNA-Seq guidelines for cohort genotyping, with GATK v3.4-0. For each sample, I ran HaplotypeCaller separately for diploid and haploid (i.e., chrX non-PAR) regions, specifying --ploidy 1 for the haploid regions, then combined the resulting two GVCFs with CombineGVCFs. I then combined the per-sample GVCFs into groups of 64 samples using CombineGVCFs. Finally, I ran GenotypeGVCFs with all samples separately for groups of ~100 small target intervals (baits). Every group of target intervals ran fine without error in about 4 hours with ~5GB of RAM, except for the non-PAR chrX regions, which were haploid for all samples.
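A minimal sketch of the per-sample workflow described above. It assumes GATK 3.x command-line conventions (`-T`, `-R`, `-I`, `-L`, `-o`, `--emitRefConfidence GVCF`, `-ploidy`, `--variant`). The jar path, reference, BAM names, sample IDs and interval-file names are hypothetical placeholders. The code only builds argument lists and does not run GATK.

```python
from typing import List, Sequence

# Hypothetical placeholders, not taken from the original post.
GATK_JAR = "GenomeAnalysisTK.jar"
REF = "ref.fa"


def haplotype_caller_cmd(bam: str, intervals: str, ploidy: int, out: str) -> List[str]:
    """HaplotypeCaller in GVCF mode on one interval set at one ploidy."""
    return [
        "java", "-jar", GATK_JAR, "-T", "HaplotypeCaller",
        "-R", REF, "-I", bam,
        "--emitRefConfidence", "GVCF",
        "-ploidy", str(ploidy),  # 1 for chrX non-PAR in males, 2 elsewhere
        "-L", intervals,
        "-o", out,
    ]


def combine_gvcfs_cmd(gvcfs: Sequence[str], out: str) -> List[str]:
    """CombineGVCFs over any number of input GVCFs (one --variant per file)."""
    cmd = ["java", "-jar", GATK_JAR, "-T", "CombineGVCFs", "-R", REF]
    for gvcf in gvcfs:
        cmd += ["--variant", gvcf]
    return cmd + ["-o", out]


def chunk(items: Sequence[str], size: int) -> List[List[str]]:
    """Consecutive groups of at most `size` items (e.g. 64 samples per group)."""
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


if __name__ == "__main__":
    sample = "S0001"  # hypothetical sample ID
    haploid = haplotype_caller_cmd(f"{sample}.bam", "chrX_nonPAR.intervals", 1, f"{sample}.hap.g.vcf")
    diploid = haplotype_caller_cmd(f"{sample}.bam", "diploid.intervals", 2, f"{sample}.dip.g.vcf")
    merged = combine_gvcfs_cmd([f"{sample}.dip.g.vcf", f"{sample}.hap.g.vcf"], f"{sample}.g.vcf")
    print(" ".join(haploid))
    print(" ".join(merged))

    samples = [f"S{i:04d}.g.vcf" for i in range(1, 4201)]  # ~4200 samples
    groups = chunk(samples, 64)
    print(len(groups), "groups; last group has", len(groups[-1]), "samples")
```

With 4200 samples and groups of 64, `chunk` yields 66 groups, the last of which holds 40 samples.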

For the haploid regions, GATK hangs on the very first base, with memory usage slowly increasing until it eventually runs out of memory and exits. The estimated runtime keeps growing while the reported position never advances; the last run exited after 12 hours without making any progress. This happens no matter how much memory I specify (up to 128 GB).

Interestingly, a PAR region of chromosome X run with --ploidy 2 in HaplotypeCaller worked with no problem.

The GVCF files input to GenotypeGVCFs are uncompressed and were indexed by CombineGVCFs.

I'm using default settings for GenotypeGVCFs, except for the following:


--standard_min_confidence_threshold_for_calling 20
--standard_min_confidence_threshold_for_emitting 10
--dbsnp dbSNP142.20150416.GRCh37.for-GATK.chr1-MT.vcf.gz

I tried running GenotypeGVCFs with the latest v3.4-46 release, but the same problem occurred.

Below is example output:

INFO 10:57:51,803 HelpFormatter - ---------------------------------------------------------------------------------
INFO 10:57:51,810 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.4-46-gbc02625, Compiled 2015/07/09 17:38:12
INFO 10:57:51,811 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 10:57:51,811 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 10:57:51,818 HelpFormatter - Program Args: -T GenotypeGVCFs -R /tmp/12715944.hpc-pbs.hpcc.usc.edu/hs37m.fa --dbsnp dbSNP142.20150416.GRCh37.for-GATK.chr1-MT.vcf.gz --standard_min_confidence_threshold_for_calling 20 --standard_min_confidence_threshold_for_emitting 10
[LONG LIST OF VARIANT FILES OMITTED]
--out gatk.hc.combined.genotyped.chunk117.vcf.gz -L split_117.intervals --log_to_file gatk.hc.combined.genotyped.chunk117.log
INFO 10:57:51,824 HelpFormatter - Executing as [email protected] on Linux 2.6.32-504.12.2.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_51-b13.
INFO 10:57:51,825 HelpFormatter - Date/Time: 2015/07/17 10:57:51
INFO 10:57:51,826 HelpFormatter - ---------------------------------------------------------------------------------
INFO 10:57:51,826 HelpFormatter - ---------------------------------------------------------------------------------
INFO 10:57:56,331 GenomeAnalysisEngine - Strictness is SILENT
INFO 10:57:56,671 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
INFO 10:59:03,550 IntervalUtils - Processing 154370 bp from intervals
WARN 10:59:03,615 IndexDictionaryUtils - Track dbsnp doesn't have a sequence dictionary built in, skipping dictionary validation
INFO 10:59:03,766 GenomeAnalysisEngine - Preparing for traversal
INFO 10:59:03,768 GenomeAnalysisEngine - Done preparing for traversal
INFO 10:59:03,768 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 10:59:03,769 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 10:59:03,770 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime
INFO 10:59:04,283 GenotypeGVCFs - Notice that the -ploidy parameter is ignored in GenotypeGVCFs tool as this is automatically determined by the input variant files
INFO 10:59:57,328 ProgressMeter - X:51033766 216.0 53.0 s 68.9 h 0.2% 7.2 h 7.2 h
INFO 11:00:27,330 ProgressMeter - X:51035366 216.0 83.0 s 4.5 d 1.2% 111.5 m 110.1 m
INFO 11:00:57,353 ProgressMeter - X:51035366 216.0 113.0 s 6.1 d 1.2% 2.5 h 2.5 h
INFO 11:01:27,354 ProgressMeter - X:51035366 216.0 2.4 m 7.7 d 1.2% 3.2 h 3.2 h
INFO 11:01:57,356 ProgressMeter - X:51035366 216.0 2.9 m 9.3 d 1.2% 3.9 h 3.8 h
INFO 11:02:27,358 ProgressMeter - X:51035366 216.0 3.4 m 10.9 d 1.2% 4.5 h 4.5 h
INFO 11:02:57,882 ProgressMeter - X:51035366 216.0 3.9 m 12.5 d 1.2% 5.2 h 5.2 h
INFO 11:03:27,884 ProgressMeter - X:51035366 216.0 4.4 m 14.2 d 1.2% 5.9 h 5.8 h
INFO 11:03:57,885 ProgressMeter - X:51035366 216.0 4.9 m 15.8 d 1.2% 6.6 h 6.5 h
INFO 11:04:27,887 ProgressMeter - X:51035366 216.0 5.4 m 17.4 d 1.2% 7.3 h 7.2 h
INFO 11:04:58,976 ProgressMeter - X:51035366 216.0 5.9 m 19.0 d 1.2% 7.9 h 7.8 h
INFO 11:05:28,977 ProgressMeter - X:51035366 216.0 6.4 m 2.9 w 1.2% 8.6 h 8.5 h
INFO 11:05:58,979 ProgressMeter - X:51035366 216.0 6.9 m 3.2 w 1.2% 9.3 h 9.2 h
INFO 11:06:28,981 ProgressMeter - X:51035366 216.0 7.4 m 3.4 w 1.2% 10.0 h 9.8 h
INFO 11:06:58,982 ProgressMeter - X:51035366 216.0 7.9 m 3.6 w 1.2% 10.6 h 10.5 h
INFO 11:07:28,984 ProgressMeter - X:51035366 216.0 8.4 m 3.9 w 1.2% 11.3 h 11.2 h
INFO 11:07:58,986 ProgressMeter - X:51035366 216.0 8.9 m 4.1 w 1.2% 12.0 h 11.8 h
INFO 11:08:30,497 ProgressMeter - X:51035366 216.0 9.4 m 4.3 w 1.2% 12.7 h 12.5 h
INFO 11:09:30,568 ProgressMeter - X:51035366 216.0 10.4 m 4.8 w 1.2% 14.0 h 13.8 h
INFO 11:10:32,779 ProgressMeter - X:51035366 216.0 11.5 m 5.3 w 1.2% 15.4 h 15.2 h
INFO 11:11:33,479 ProgressMeter - X:51035366 216.0 12.5 m 5.7 w 1.2% 16.8 h 16.6 h
INFO 11:12:35,360 ProgressMeter - X:51035366 216.0 13.5 m 6.2 w 1.2% 18.2 h 17.9 h
INFO 11:13:35,445 ProgressMeter - X:51035366 216.0 14.5 m 6.7 w 1.2% 19.5 h 19.3 h
INFO 11:14:39,689 ProgressMeter - X:51035366 216.0 15.6 m 7.2 w 1.2% 20.9 h 20.7 h
INFO 11:15:40,505 ProgressMeter - X:51035366 216.0 16.6 m 7.6 w 1.2% 22.3 h 22.0 h
INFO 11:16:41,140 ProgressMeter - X:51035366 216.0 17.6 m 8.1 w 1.2% 23.7 h 23.4 h
INFO 11:17:41,956 ProgressMeter - X:51035366 216.0 18.6 m 8.6 w 1.2% 25.0 h 24.7 h
INFO 11:18:41,958 ProgressMeter - X:51035366 216.0 19.6 m 9.0 w 1.2% 26.4 h 26.0 h
INFO 11:19:44,493 ProgressMeter - X:51035366 216.0 20.7 m 9.5 w 1.2% 27.8 h 27.4 h
INFO 11:20:49,749 ProgressMeter - X:51035366 216.0 21.8 m 10.0 w 1.2% 29.2 h 28.8 h
INFO 11:21:53,414 ProgressMeter - X:51035366 216.0 22.8 m 10.5 w 1.2% 30.6 h 30.3 h
INFO 11:22:58,174 ProgressMeter - X:51035366 216.0 23.9 m 11.0 w 1.2% 32.1 h 31.7 h
INFO 11:24:01,211 ProgressMeter - X:51035366 216.0 25.0 m 11.5 w 1.2% 33.5 h 33.1 h
INFO 11:25:05,051 ProgressMeter - X:51035366 216.0 26.0 m 12.0 w 1.2% 34.9 h 34.5 h
INFO 11:26:07,782 ProgressMeter - X:51035366 216.0 27.1 m 12.4 w 1.2% 36.3 h 35.9 h
INFO 11:27:10,933 ProgressMeter - X:51035366 216.0 28.1 m 12.9 w 1.2% 37.8 h 37.3 h
INFO 11:28:20,854 ProgressMeter - X:51035366 216.0 29.3 m 13.5 w 1.2% 39.3 h 38.8 h
INFO 11:29:28,165 ProgressMeter - X:51035366 216.0 30.4 m 14.0 w 1.2% 40.8 h 40.3 h
INFO 11:30:28,575 ProgressMeter - X:51035366 216.0 31.4 m 14.4 w 1.2% 42.2 h 41.6 h
INFO 11:31:36,673 ProgressMeter - X:51035366 216.0 32.5 m 14.9 w 1.2% 43.7 h 43.1 h
INFO 11:32:45,497 ProgressMeter - X:51035366 216.0 33.7 m 15.5 w 1.2% 45.2 h 44.7 h
INFO 11:33:49,205 ProgressMeter - X:51035366 216.0 34.8 m 16.0 w 1.2% 46.7 h 46.1 h
INFO 11:34:49,226 ProgressMeter - X:51035366 216.0 35.8 m 16.4 w 1.2% 48.0 h 47.4 h
INFO 11:35:54,571 ProgressMeter - X:51035366 216.0 36.8 m 16.9 w 1.2% 49.5 h 48.8 h
INFO 11:36:59,402 ProgressMeter - X:51035366 216.0 37.9 m 17.4 w 1.2% 50.9 h 50.3 h
INFO 11:38:03,427 ProgressMeter - X:51035366 216.0 39.0 m 17.9 w 1.2% 52.3 h 51.7 h
INFO 11:39:12,036 ProgressMeter - X:51035366 216.0 40.1 m 18.4 w 1.2% 53.9 h 53.2 h
INFO 11:40:15,472 ProgressMeter - X:51035366 216.0 41.2 m 18.9 w 1.2% 55.3 h 54.6 h
INFO 11:41:22,184 ProgressMeter - X:51035366 216.0 42.3 m 19.4 w 1.2% 56.8 h 56.1 h
INFO 11:42:24,992 ProgressMeter - X:51035366 216.0 43.4 m 19.9 w 1.2% 58.2 h 57.5 h
INFO 11:43:30,745 ProgressMeter - X:51035366 216.0 44.4 m 20.4 w 1.2% 59.7 h 58.9 h
INFO 11:44:41,392 ProgressMeter - X:51035366 216.0 45.6 m 21.0 w 1.2% 61.3 h 60.5 h
INFO 11:45:51,136 ProgressMeter - X:51035366 216.0 46.8 m 21.5 w 1.2% 62.8 h 62.0 h
INFO 11:46:59,056 ProgressMeter - X:51035366 216.0 47.9 m 22.0 w 1.2% 64.3 h 63.5 h
INFO 11:48:09,266 ProgressMeter - X:51035366 216.0 49.1 m 22.5 w 1.2% 65.9 h 65.1 h
INFO 11:49:16,701 ProgressMeter - X:51035366 216.0 50.2 m 23.1 w 1.2% 67.4 h 66.6 h
INFO 11:50:24,150 ProgressMeter - X:51035366 216.0 51.3 m 23.6 w 1.2% 68.9 h 68.1 h
INFO 11:51:31,883 ProgressMeter - X:51035366 216.0 52.5 m 24.1 w 1.2% 70.5 h 69.6 h
INFO 11:52:40,234 ProgressMeter - X:51035366 216.0 53.6 m 24.6 w 1.2% 72.0 h 71.1 h
INFO 11:53:46,785 ProgressMeter - X:51035366 216.0 54.7 m 25.1 w 1.2% 73.5 h 72.6 h
INFO 11:54:53,194 ProgressMeter - X:51035366 216.0 55.8 m 25.6 w 1.2% 75.0 h 74.0 h
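In the log above, the reported location sits at X:51035366 while the elapsed time and the runtime estimate keep climbing. The following small sketch spots that pattern automatically in a GATK 3 log. The regular expression is written against the ProgressMeter lines shown here, and the repeat threshold is an arbitrary assumption.

```python
import re
from typing import Iterable, List, Optional

# Matches the location column of ProgressMeter lines such as
# "INFO 11:00:27,330 ProgressMeter - X:51035366 216.0 83.0 s ...".
# Header lines ("[INITIALIZATION COMPLETE...", "| processed |", "Location |") do not match.
PROGRESS_RE = re.compile(r"ProgressMeter - (\S+:\d+)\s")


def progress_locations(lines: Iterable[str]) -> List[str]:
    """Locations reported by ProgressMeter, in log order."""
    return [m.group(1) for m in map(PROGRESS_RE.search, lines) if m]


def stalled_location(lines: Iterable[str], min_repeats: int = 10) -> Optional[str]:
    """First location reported at least `min_repeats` times in a row, or None."""
    current, run = None, 0
    for loc in progress_locations(lines):
        run = run + 1 if loc == current else 1
        current = loc
        if run >= min_repeats:
            return current
    return None
```

Applied to the lines of a log like the one above (for example, an open file handle for the `--log_to_file` output), `stalled_location` should report `X:51035366`. Comparing the "processed sites" column, which stays at 216.0, would be an equally simple signal.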

Here is an example of the GVCF for 3 samples in one of the problem haploid regions:

X 51035345 . A <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:7:99:4:0,120 .:2:78:2:0,78 .:9:99:3:0,112
X 51035346 . A <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:7:99:4:0,120 .:2:45:2:0,45 .:9:99:3:0,112
X 51035347 . C <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:7:99:4:0,120 .:2:45:2:0,45 .:9:99:3:0,112
X 51035348 . A <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:7:99:4:0,120 .:2:45:2:0,45 .:9:99:3:0,112
X 51035349 . G <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:7:99:4:0,120 .:2:45:2:0,45 .:9:99:3:0,112
X 51035350 . G <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:7:99:4:0,120 .:2:45:2:0,45 .:9:99:3:0,112
X 51035351 . G <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:7:99:4:0,120 .:2:45:2:0,45 .:9:99:3:0,112
X 51035352 . A <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:7:99:4:0,120 .:2:45:2:0,45 .:9:99:3:0,112
X 51035353 . A <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:7:99:4:0,120 .:2:45:2:0,45 .:9:99:3:0,112
X 51035354 . T <NON_REF> . . END=51035355 GT:DP:GQ:MIN_DP:PL .:7:99:4:0,120 .:2:45:2:0,45 .:9:99:3:0,112
X 51035356 . C <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:7:99:4:0,120 .:1:41:1:0,41 .:9:99:3:0,112
X 51035357 . A <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:7:99:4:0,120 .:1:40:1:0,40 .:9:99:3:0,112
X 51035358 . G <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:7:99:4:0,120 .:1:39:1:0,39 .:9:99:3:0,112
X 51035359 . C <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:7:99:4:0,120 .:1:38:1:0,38 .:9:99:3:0,112
X 51035360 . C <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:7:99:4:0,120 .:1:41:1:0,41 .:9:99:3:0,112
X 51035361 . A <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:7:99:4:0,120 .:1:39:1:0,39 .:9:99:3:0,112
X 51035362 . T <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:7:99:4:0,120 .:1:38:1:0,38 .:9:99:3:0,112
X 51035363 . T <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:7:99:4:0,120 .:1:38:1:0,38 .:9:99:3:0,112
X 51035364 . T <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:4:90:4:0,90 .:1:38:1:0,38 .:9:99:3:0,112
X 51035365 . C <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:4:90:4:0,90 .:2:72:2:0,72 .:9:99:3:0,112
X 51035366 . C <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:4:90:4:0,90 .:2:72:2:0,72 .:9:99:3:0,112
X 51035367 . T <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:4:90:4:0,90 .:2:72:2:0,72 .:9:99:3:0,112
X 51035368 . G <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:4:90:4:0,90 .:2:72:2:0,72 .:9:99:3:0,112
X 51035369 . A <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:4:90:4:0,90 .:2:72:2:0,72 .:9:99:3:0,112
X 51035370 . T <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:4:90:4:0,90 .:2:72:2:0,72 .:9:99:3:0,112
X 51035371 . A C,<NON_REF> . . DP=246;MQ=60.00 GT:AD:DP:GQ:MIN_DP:PL:SB .:0,4,0:4:99:.:126,0,126:0,0,1,3 .:.:2:72:2:0,72,72 .:.:9:99:3:0,112,112
X 51035372 . A <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:4:90:4:0,90 .:2:72:2:0,72 .:9:99:3:0,112
X 51035373 . C <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:4:90:4:0,90 .:2:72:2:0,72 .:9:99:3:0,112
X 51035374 . A <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:2:45:2:0,45 .:2:72:2:0,72 .:9:99:3:0,112
X 51035375 . A <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:2:45:2:0,45 .:2:72:2:0,72 .:9:99:3:0,112
X 51035376 . T <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:2:45:2:0,45 .:2:54:2:0,54 .:9:99:3:0,112
X 51035377 . T <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:2:45:2:0,45 .:2:71:2:0,71 .:9:99:3:0,112
X 51035378 . T <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:18:99:3:0,105 .:2:71:2:0,71 .:9:99:3:0,112
X 51035379 . T <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:18:99:3:0,105 .:2:81:2:0,81 .:9:99:3:0,112
X 51035380 . G <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:18:99:3:0,105 .:2:78:2:0,78 .:9:99:3:0,112
X 51035381 . A <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:18:99:3:0,105 .:2:45:2:0,45 .:9:99:3:0,112
X 51035382 . T <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:18:99:3:0,105 .:2:45:2:0,45 .:9:99:3:0,112

Any help is greatly appreciated. Please let me know if you need any other information.

Kindest regards,
Chris

Answers

  • tommycarstensen (United Kingdom; Member ✭✭✭)

    +1. I have experienced the same problem with version 3.2 or 3.3 of UG (UnifiedGenotyper) and GenotypeGVCFs.

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @cedlund
    Hi Chris,

    Sorry for the late response. Can you tell me whether GenotypeGVCFs always hangs at the same position? There are some areas of chromosome X that are messy and therefore cause problems in genotyping. We have a list of these regions that will hopefully be made public soon.

    -Sheila

  • cedlund (Member)

    Hi Sheila,

    Thank you for your reply. I'm running some additional tests to help answer your question and will get back to you soon.

    -Chris

  • cedlund (Member)

    Hi Sheila,

    Again, thank you for your reply. My data has 512 short intervals in chrX (non-PAR). The average interval length is 1.9 kb. I previously split these up into 5 batches of ~100 intervals each to process in parallel, and the problem was exhibited for all 5 batches. To investigate the answer to your question, I re-processed the chrX data in 103 batches of ~5 intervals each.
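Splitting the 512 non-PAR intervals into either a fixed number of batches (5) or batches of about 5 intervals each (103 batches) is a simple partitioning step. The following sketch keeps interval order and balances batch sizes. The interval strings are hypothetical stand-ins for the poster's targets. The one-`chrom:start-end`-per-line text is assumed to follow the GATK .intervals style accepted by `-L`.

```python
from typing import List, Sequence


def split_evenly(intervals: Sequence[str], n_batches: int) -> List[List[str]]:
    """Split into n_batches contiguous batches whose sizes differ by at most one."""
    q, r = divmod(len(intervals), n_batches)
    batches, start = [], 0
    for i in range(n_batches):
        size = q + 1 if i < r else q  # the first r batches take one extra interval
        batches.append(list(intervals[start:start + size]))
        start += size
    return batches


def to_intervals_text(batch: Sequence[str]) -> str:
    """One interval per line, as in a GATK .intervals file."""
    return "\n".join(batch) + "\n"


if __name__ == "__main__":
    # Hypothetical stand-ins for the 512 chrX non-PAR targets (~1.9 kb each).
    targets = [f"X:{1_000_000 + 10_000 * i}-{1_001_900 + 10_000 * i}" for i in range(512)]
    coarse = split_evenly(targets, 5)
    fine = split_evenly(targets, 103)
    print([len(b) for b in coarse])
    print(min(map(len, fine)), max(map(len, fine)))
```

With 512 intervals, 5 batches come out as 103, 103, 102, 102 and 102 intervals, and 103 batches hold 4 or 5 intervals each (100 batches of 5 and 3 of 4).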

    Of the 103 chrX batches, 12 finished successfully, taking anywhere from 1 minute to 7 hours. Some of the run times for the batches that worked are surprisingly long, given that I was able to run all diploid batches of 100 intervals (20 times as large) successfully in about 4 hours each.

    Another 82 chrX batches hang on a single position and run out of memory after anywhere from 35 minutes to 19 hours (run with -Xmx10g). The remaining 9 chrX batches are still running after 20 hours, and are stuck on a single position. I'm sure they will eventually run out of memory and exit.

    So it seems that not all positions in chrX exhibit the issue, but the majority do. However, I suspect this has less to do with chromosome X and more to do with using '--ploidy 1' with HaplotypeCaller: the PAR region of chrX run with '--ploidy 2' worked with no issue.

    My best,
    Chris

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @cedlund
    Hi Chris,

    Thank you for going into such detail about this issue; it helps us a lot. It seems like the majority of the batches are hanging on some messy spot. However, you are probably correct that the issue is being exacerbated by the -ploidy 1 bug. I have a bug report from another user on the -ploidy 1 issue; once I process that, I will try to move it up in priority. Unfortunately, the developers are quite busy with other work and bugs, but I will let you know once a fix is in.

    -Sheila

  • cedlund (Member)

    Hi Sheila,

    Glad to hear my testing will help you work on a fix. Please let me know if there's anything else I can do to help.

    Do you happen to have a link to the existing --ploidy 1 issue in a bug tracker that I can follow?

    Thank you,
    Chris

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @cedlund
    Hi Chris,

    Sure. I still have to put in the bug report, but you can follow this thread for more information: http://gatkforums.broadinstitute.org/discussion/comment/22487#Comment_22487
    The Github link is there as well.

    -Sheila

  • cedlund (Member)

    Thank you, Sheila.

  • vkutschera (Sweden; Member)

    Hi,

    I have a similar problem: I'm also following the current DNA-Seq guidelines for joint genotyping in ERC GVCF mode using GATK v3.4-0. I'm analyzing a whole genome resequencing data set of ca. 120 individuals (birds). For each sample, I successfully ran HaplotypeCaller separately for diploid (=autosomes, Z chromosome in males) and haploid (=Z chromosome in females, mtDNA) regions, specifying --ploidy 1 for the haploid regions, resulting in ca. 360 g.vcf files.

    Then I used GenotypeGVCFs to genotype the autosomal scaffolds, the Z chromosomal scaffolds (males and females together) and the mtDNA separately from each other, splitting the data into the different regions using -L and without specifying --ploidy. (I stopped an earlier run in which I tried to genotype everything together, because the runtime estimate was about 25 days.) I genotyped the autosomal scaffold data successfully in 5 batches of ca. 150-250 Mb using -L; each batch finished after ca. 9 h with -Xmx112g. Analyses of the Z chromosomal scaffolds and the mtDNA, however, got stuck at a particular position. For the Z chromosomal scaffolds I used -Xmx112g, and for the mtDNA -Xmx56g. In both cases, memory usage slowly increased until both runs eventually ran out of memory and exited with the message "There was a failure because you did not provide enough memory to run this program. See the -Xmx JVM argument to adjust the maximum heap size provided to Java".

    Next I tried running GenotypeGVCFs without -nt, then with a lower value for -maxAltAlleles, and finally using CombineGVCFs to combine the g.vcf files into one batch per data set before running GenotypeGVCFs (always processing the mtDNA and Z chromosomal scaffolds separately). I also tried genotyping the mtDNA with -Xmx112g. Nothing helped: no matter what I tried, GenotypeGVCFs got stuck at roughly the same position and ran out of memory.

    Interestingly, neither of the regions where the analyses got stuck is particularly diverse: many sites are non-variant, and variant sites are mostly diallelic (max. 3 alleles, some indels).

    Since the autosomal scaffolds, for which I used the same parameters (except --ploidy 2 in HaplotypeCaller), did not cause any problems in GenotypeGVCFs, and I found this forum post here, I thought that the ploidy might have caused the problem. So I ran GenotypeGVCFs on the Z chromosomal scaffolds of diploid males (ca. 90 g.vcf files) and haploid females (ca. 30 g.vcf files) separately from each other, using the exact same parameters and memory allocations (-Xmx112g, without -nt). Indeed, the males finished successfully within ca. 10h, but the females got stuck again at a similar position as in previous runs, again with increasing memory usage and eventually running out of memory.
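The diagnostic split described above genotypes diploid males and haploid females separately on the Z scaffolds. This amounts to grouping GVCFs by the ploidy they were called with and building one GenotypeGVCFs run per group. The following is a sketch under GATK 3.x conventions. The file names, reference, and interval list are hypothetical. The split only isolates the haploid samples; in the case reported here, the haploid run still did not finish.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

# Hypothetical placeholders.
GATK_JAR = "GenomeAnalysisTK.jar"
REF = "ref.fa"


def group_by_ploidy(gvcf_ploidy: Dict[str, int]) -> Dict[int, List[str]]:
    """Map ploidy -> sorted list of GVCFs that were called at that ploidy."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for gvcf, ploidy in sorted(gvcf_ploidy.items()):
        groups[ploidy].append(gvcf)
    return dict(groups)


def genotype_gvcfs_cmd(gvcfs: Sequence[str], intervals: str, out: str,
                       heap: str = "112g") -> List[str]:
    """One GenotypeGVCFs run over the given GVCFs and interval list."""
    cmd = ["java", f"-Xmx{heap}", "-jar", GATK_JAR, "-T", "GenotypeGVCFs", "-R", REF]
    for gvcf in gvcfs:
        cmd += ["--variant", gvcf]
    return cmd + ["-L", intervals, "-o", out]


if __name__ == "__main__":
    calls = {"male01.Z.g.vcf": 2, "male02.Z.g.vcf": 2, "female01.Z.g.vcf": 1}
    for ploidy, files in sorted(group_by_ploidy(calls).items()):
        cmd = genotype_gvcfs_cmd(files, "Z_scaffolds.intervals", f"Z.ploidy{ploidy}.vcf")
        print(" ".join(cmd))
```

No `-ploidy` is passed to GenotypeGVCFs here, since the log earlier in this thread notes that it is ignored by that tool and determined from the inputs.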

    So, to me it looks as if setting --ploidy to 1 in HaplotypeCaller has indeed caused the problem. What do you think? Is there any news regarding the bug report you mentioned above, or do you have any additional suggestions?

    Thanks a lot in advance!

    Best regards,
    Verena

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @vkutschera
    Hi Verena,

    Yes, it turns out there is a different piece of code used for genotyping diploids and other ploidies. One of our team members is working on it now. I hope to have an update for you all soon.

    -Sheila

  • vkutschera (Sweden; Member)

    Hi Sheila, thanks for this information!

  • jmandell (San Francisco, CA; Member)

    I would just like to chime in and say that I'm experiencing the same problem. On my smaller cohort, I was able to get output from GenotypeGVCFs using --max_alternate_alleles 3, but on my larger cohorts (400+ individuals) even this failed. I'm glad to hear that a fix is coming soon.

  • rfraser (Guelph, Ontario; Member)

    +1. Using GenotypeGVCFs across 21 pooled samples of varying ploidies results in the same error. It seems to work OK when run on one sample at a time. I'll keep an eye on this thread; looking forward to the updates.
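The last two reports both involve settings with a large genotype space: many alternate alleles, or pooled samples with high ploidy. For n alleles at ploidy p, the number of unordered genotypes is the multiset count C(n + p - 1, p), so it grows quickly with either parameter. This is one reason capping alternate alleles can make a run more tractable. The sketch below is illustrative only; the ploidies and allele counts in the loop are example values. It is not offered as the diagnosis of the hang, which this thread links to the separate code path used for non-diploid genotyping.

```python
from math import comb


def n_genotypes(n_alleles: int, ploidy: int) -> int:
    """Unordered genotypes = multisets of size `ploidy` drawn from `n_alleles` alleles."""
    return comb(n_alleles + ploidy - 1, ploidy)


if __name__ == "__main__":
    # n_alleles counts REF plus the alternate alleles considered at a site.
    for ploidy in (1, 2, 10, 20):
        row = [n_genotypes(alt + 1, ploidy) for alt in (1, 3, 6)]
        print(f"ploidy {ploidy:>2}: alt=1,3,6 -> {row}")
```

For a haploid sample with REF plus one alternate this gives 2, matching the two PL values in the GVCF records earlier in the thread. At ploidy 20 the count is 1,771 with 3 alternate alleles and 230,230 with 6.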
