
GenotypeGVCFs hanging?

I'm trying to call genotypes on ~160 S. cerevisiae genomes by joint calling. When I tried to do it on the whole genome with a single command, it ran out of memory (even with 48G provided). Now I'm doing it one chromosome at a time:

gatk-4.0.11.0/gatk --java-options "-Xmx48G" GenotypeGVCFs -R Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fasta -V combined_variants.vcf.gz -O called_genotypes_II.vcf.gz -ploidy 1 -L II

It appears to make progress for the first 25 minutes, but I've had no console activity for the last 90 minutes. The CPU monitor shows only periodic spikes, not sustained activity. I even tried running it overnight, with exactly the same behavior, i.e. 25 minutes of console updates and then nothing. Chromosome I worked fine, but II never completes. Are there other similar reports?

Answers

  • bhanuGandham (Cambridge MA; Member, Administrator, Broadie, Moderator)

    Hi @Gavin_Sherlock

    I am looking into this issue and will have an update for you by next week. Given the holiday week, we are backed up on our end, but I will definitely get to this by next week.

    Regards
    Bhanu

  • Gavin_Sherlock (Stanford; Member)

    Thanks - I changed --max-alternate-alleles to 2, rather than the default of 6, and that made it run fairly rapidly. I understand I may lose some sites, but these are evolved clones, where I don't expect the same nucleotide to be mutated to different alternate alleles very often, so that's probably OK.
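
For anyone hitting the same stall, the workaround described above might look roughly like the sketch below: the original command with --max-alternate-alleles 2 added, run once per chromosome. The file names and flags are taken from the posts in this thread; the DRY_RUN switch and the build_cmd helper are hypothetical conveniences, and the chromosome list only contains the two names mentioned here, so it would need extending to match the contig names in your reference.

```shell
#!/bin/sh
# Sketch only: per-chromosome GenotypeGVCFs with a lower alternate-allele cap.
set -eu

GATK="${GATK:-gatk-4.0.11.0/gatk}"   # path as given in the original post
REF="Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fasta"
GVCF="combined_variants.vcf.gz"
DRY_RUN="${DRY_RUN:-1}"              # hypothetical switch: 1 = print commands, 0 = run them

# Hypothetical helper: print the full command line for one chromosome.
build_cmd() {
  printf '%s --java-options "-Xmx48G" GenotypeGVCFs -R %s -V %s -O called_genotypes_%s.vcf.gz -ploidy 1 -L %s --max-alternate-alleles 2\n' \
    "$GATK" "$REF" "$GVCF" "$1" "$1"
}

# Only the chromosomes named in this thread; extend the list for the full genome.
for chrom in I II; do
  cmd=$(build_cmd "$chrom")
  if [ "$DRY_RUN" = "1" ]; then
    echo "$cmd"      # show what would be run
  else
    eval "$cmd"      # actually run GATK
  fi
done
```

Run as-is, it only prints the two commands; setting DRY_RUN=0 executes them. As noted above, capping alternate alleles at 2 may drop sites with more alternate alleles, so whether the trade-off is acceptable depends on the dataset.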
