GenotypeGVCFs gets stuck

blueskypy Member ✭✭
edited September 2015 in Ask the GATK team

Hi,

I have a human WES project of ~700 patients that was processed with GATK v3.1. I'd now like to reprocess the sex chromosomes with v3.4 using the correct ploidy.

Here are my steps.

For each male:
HaplotypeCaller -L X -L Y -L MT -ploidy 1 -I boy.i.bam -o boy.i.XYMT.g.vcf.gz

For each female:
HaplotypeCaller -L X -I girl.i.bam -o girl.i.X.g.vcf.gz
HaplotypeCaller -L MT -ploidy 1 -I girl.i.bam -o girl.i.MT.g.vcf.gz

Then:
CombineGVCFs -v boys.XYMT.gvcf.list -o boys.XYMT.g.vcf
CombineGVCFs -v girls.X.gvcf.list -o girls.X.g.vcf
CombineGVCFs -v girls.MT.gvcf.list -o girls.MT.g.vcf
GenotypeGVCFs -v boys.XYMT.g.vcf -v girls.X.g.vcf -v girls.MT.g.vcf -o all.XYMT.vcf.gz

The problem is that GenotypeGVCFs has been running for 12 hours and is using 95G of vmem, yet all.XYMT.vcf.gz is only 382K so far. Part of the log is below. It seems to be stuck at X:180201, but I checked X:180201 in boys.XYMT.g.vcf and girls.X.g.vcf and both look fine.

INFO 00:20:11,428 HelpFormatter - -----------------------------------------------------------------------------------------
INFO 00:20:11,430 HelpFormatter - The Genome Analysis Toolkit (GATK) v2015.1-3.4.0-1-ga5ca3fc, Compiled 2015/06/07 19:28:31
INFO 00:20:11,430 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 00:20:11,430 HelpFormatter - For support and documentation go to http://gatkdocs.appistry.com/
INFO 00:20:11,433 HelpFormatter - Program Args: -T GenotypeGVCFs -R broad_bundle_v2.8/human_g1k_v37.fasta --variant all.XYMT.list -o all.XYMT.raw.var.vcf.gz
INFO 00:20:11,439 HelpFormatter - Executing on Linux 2.6.32-504.16.2.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_75-b13.
INFO 00:20:11,439 HelpFormatter - Date/Time: 2015/09/15 00:20:11
INFO 00:20:11,439 HelpFormatter - -----------------------------------------------------------------------------------------
INFO 00:20:11,439 HelpFormatter - -----------------------------------------------------------------------------------------
INFO 00:20:12,219 GenomeAnalysisEngine - Strictness is SILENT
INFO 00:20:12,493 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
WARN 00:20:12,778 IndexDictionaryUtils - Track variant doesn't have a sequence dictionary built in, skipping dictionary validation
WARN 00:20:12,778 IndexDictionaryUtils - Track variant2 doesn't have a sequence dictionary built in, skipping dictionary validation
WARN 00:20:12,778 IndexDictionaryUtils - Track variant3 doesn't have a sequence dictionary built in, skipping dictionary validation
INFO 00:20:13,008 GenomeAnalysisEngine - Preparing for traversal
INFO 00:20:13,021 GenomeAnalysisEngine - Done preparing for traversal
INFO 00:20:13,022 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 00:20:13,022 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 00:20:13,022 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime
INFO 00:20:13,223 GenotypeGVCFs - Notice that the -ploidy parameter is ignored in GenotypeGVCFs tool as this is automatically determined by the input variant files
INFO 00:20:43,025 ProgressMeter - X:80301 0.0 30.0 s 49.6 w 92.9% 32.0 s 2.0 s
INFO 00:21:13,026 ProgressMeter - X:158401 0.0 60.0 s 99.2 w 92.9% 64.0 s 4.0 s
INFO 00:21:43,028 ProgressMeter - X:174401 0.0 90.0 s 148.8 w 92.9% 96.0 s 6.0 s
INFO 00:22:13,029 ProgressMeter - X:180201 0.0 120.0 s 198.4 w 92.9% 2.2 m 9.0 s
INFO 00:22:43,031 ProgressMeter - X:180201 0.0 2.5 m 248.0 w 92.9% 2.7 m 11.0 s

..............skip lines here

INFO 12:32:37,786 ProgressMeter - X:180201 0.0 12.2 h 15250.3 w 92.9% 13.1 h 56.1 m
INFO 12:33:41,510 ProgressMeter - X:180201 0.0 12.2 h 15250.3 w 92.9% 13.2 h 56.2 m
INFO 12:35:19,787 ProgressMeter - X:180201 0.0 12.3 h 15250.3 w 92.9% 13.2 h 56.3 m
INFO 12:36:19,788 ProgressMeter - X:180201 0.0 12.3 h 15250.3 w 92.9% 13.2 h 56.4 m
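
The stall shows up in the log as a ProgressMeter location that stops advancing at X:180201 while the elapsed time keeps growing. A minimal Python sketch for flagging this automatically, assuming the GATK 3 ProgressMeter line layout seen in this log; the function name `stalled_location` and the `min_repeats` threshold are hypothetical and not part of GATK:

```python
import re

# Matches GATK 3 ProgressMeter report lines such as
# "INFO 00:22:13,029 ProgressMeter - X:180201 0.0 120.0 s ..."
# Header lines ("[INITIALIZATION COMPLETE...", "Location | sites ...") do not match.
PROGRESS_RE = re.compile(r"ProgressMeter - (\S+:\d+)\s")


def stalled_location(log_lines, min_repeats=3):
    """Return the location if the last `min_repeats` progress reports
    all show the same location, otherwise None."""
    locations = []
    for line in log_lines:
        match = PROGRESS_RE.search(line)
        if match:
            locations.append(match.group(1))
    if len(locations) < min_repeats:
        return None
    tail = locations[-min_repeats:]
    return tail[0] if len(set(tail)) == 1 else None


# Lines copied from the log in this post.
sample = """\
INFO 00:21:43,028 ProgressMeter - X:174401 0.0 90.0 s 148.8 w 92.9% 96.0 s 6.0 s
INFO 00:22:13,029 ProgressMeter - X:180201 0.0 120.0 s 198.4 w 92.9% 2.2 m 9.0 s
INFO 00:22:43,031 ProgressMeter - X:180201 0.0 2.5 m 248.0 w 92.9% 2.7 m 11.0 s
INFO 12:32:37,786 ProgressMeter - X:180201 0.0 12.2 h 15250.3 w 92.9% 13.1 h 56.1 m
""".splitlines()

print(stalled_location(sample))  # prints X:180201
```

A repeated location only says that the traversal is not moving; it does not say why. Here the later out-of-memory error points at heap exhaustion rather than a malformed record at that position.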

Answers

  • blueskypy Member ✭✭

    The program just crashed after running for 17 hours with vmem at 95G. Here is part of the error message:

    INFO 17:24:49,527 ProgressMeter - X:180201 0.0 17.1 h 15250.3 w 92.9% 18.4 h 78.4 m
    INFO 17:26:06,172 ProgressMeter - X:180201 0.0 17.1 h 15250.3 w 92.9% 18.4 h 78.5 m

    ERROR ------------------------------------------------------------------------------------------
    ERROR A USER ERROR has occurred (version 2015.1-3.4.0-1-ga5ca3fc):
    ERROR
    ERROR This means that one or more arguments or inputs in your command are incorrect.
    ERROR The error message below tells you what is the problem.
    ERROR
    ERROR If the problem is an invalid argument, please check the online documentation guide
    ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
    ERROR
    ERROR Visit our website for extensive documentation and answers to
    ERROR commonly asked questions http://gatkdocs.appistry.com/
    ERROR
    ERROR MESSAGE: There was a failure because you did not provide enough memory to run this program. See the -Xmx JVM argument to adjust the maximum heap size provided to Java

    ERROR ------------------------------------------------------------------------------------------

    Could anyone please help? I am on a tight timeline for this task.

  • Sheila Broad Institute Member, Broadie, Moderator admin

    @blueskypy
    Hi,

    You can always try providing more memory, but unfortunately this is a known issue and the developers are working on a fix. See http://gatkforums.broadinstitute.org/discussion/5585/genotypegvcfs-running-out-of-memory-with-50-samples#latest

    -Sheila
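
The Program Args line in the log shows no explicit heap setting, and the error message points at the `-Xmx` JVM argument. As a rough sketch of what "providing more memory" could look like, the Python helper below builds a GATK 3 command line with an explicit heap limit. The helper name `genotype_gvcfs_command`, the jar location `GenomeAnalysisTK.jar`, and the 100 GB figure are assumptions for illustration; the tool arguments are the ones that appear in the log. Per the reply above, a larger heap may not resolve the underlying issue.

```python
def genotype_gvcfs_command(heap_gb, reference, variants, output,
                           jar="GenomeAnalysisTK.jar"):
    """Build an argv list for a GATK 3 GenotypeGVCFs run with an explicit
    JVM heap limit. The jar path is an assumption; adjust to your install."""
    if heap_gb <= 0:
        raise ValueError("heap_gb must be positive")
    # JVM options such as -Xmx must come before -jar, otherwise they are
    # passed to the program instead of the JVM.
    cmd = ["java", f"-Xmx{heap_gb}g", "-jar", jar,
           "-T", "GenotypeGVCFs", "-R", reference]
    for variant in variants:
        cmd += ["--variant", variant]
    cmd += ["-o", output]
    return cmd


# Same inputs as the Program Args line in the log; 100 GB is a hypothetical
# value and must fit within the memory the node or scheduler actually allows.
cmd = genotype_gvcfs_command(
    100,
    "broad_bundle_v2.8/human_g1k_v37.fasta",
    ["all.XYMT.list"],
    "all.XYMT.raw.var.vcf.gz",
)
print(" ".join(cmd))
```

Returning a list rather than a single string lets the result be passed directly to `subprocess.run(cmd, check=True)` without shell quoting concerns.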
