GenotypeGVCFs 8 weeks runtime

esticca (New York, Member)

I'm working on the last step of our lab's well-established variant calling pipeline: running GATK GenotypeGVCFs on 4392 whole-exome-sequenced individuals. I haven't had problems with this step in the past, but on this latest run the job was killed on the supercomputer cluster for using too much memory. Now, even with 16 threads and 64 GB of memory allocated, the log file predicts nearly 8 weeks of runtime remaining! I am using GATK 3.3 with the following arguments:

-T GenotypeGVCFs -R /projects/resources/Homo_sapiens_assembly19.fasta --variant /projects/combinedgvcfs/combined_gvcfs.list --dbsnp /projects/resources/gatk_bundle/dbsnp_138.b37.vcf -o /04_15_2016/genotype_gvcfs/04_15_2016_raw.vcf -log /04_15_2016/genotype_gvcfs/04_15_2016_raw.log -L /projects/resources/bed.and.interval.files/b37_refseqplus50_clean.bed -nt 16 --max_alternate_alleles 6

If anyone has any ideas, please let me know; in the past this step took no more than 72 hours to run to completion. I'm happy to provide any additional information that would help.

Answers

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @esticca
    Hi,

    Did you try combining your GVCFs with CombineGVCFs first? You can combine 10-20 GVCFs at a time and then run GenotypeGVCFs on the combined GVCFs.

    -Sheila
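
The batching suggested above can be sketched as a small Python helper that splits a list of per-sample GVCFs into groups and builds one CombineGVCFs command per group. This is only an illustration under assumptions: the jar path, file names, and batch size below are hypothetical, and the only tool arguments used are `-T`, `-R`, `--variant`, and `-o`. Check them against the documentation for your GATK version before running anything.

```python
import shlex
from typing import List, Sequence


def chunk_gvcfs(paths: Sequence[str], batch_size: int) -> List[List[str]]:
    """Split GVCF paths into consecutive batches of at most batch_size files."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [list(paths[i:i + batch_size]) for i in range(0, len(paths), batch_size)]


def build_combine_command(reference: str, batch: Sequence[str], output: str,
                          jar: str = "GenomeAnalysisTK.jar") -> List[str]:
    """Build a GATK 3-style CombineGVCFs argument list for one batch.

    The jar location is an assumption; adjust it to your installation.
    """
    args = ["java", "-jar", jar, "-T", "CombineGVCFs", "-R", reference]
    for path in batch:
        args += ["--variant", path]
    args += ["-o", output]
    return args


if __name__ == "__main__":
    # Hypothetical input: 45 per-sample GVCFs, combined 20 at a time.
    samples = [f"sample_{i:04d}.g.vcf" for i in range(45)]
    for n, batch in enumerate(chunk_gvcfs(samples, 20), start=1):
        cmd = build_combine_command("Homo_sapiens_assembly19.fasta", batch,
                                    f"combined_batch_{n:02d}.g.vcf")
        # Print only the start of each command; the full list can be long.
        print(shlex.join(cmd[:9]), "...")
```

The combined files produced this way would then be passed to GenotypeGVCFs with one `--variant` argument each, or via a `.list` file as in the original command.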

  • esticca (New York, Member)

    Hi Sheila,

    I should have specified this, but yes: the 4392 samples were combined into 29 GVCFs with no more than 200 individuals per GVCF, as per CombineGVCFs standards. I combined the samples by sequencing run, which is why 29 GVCFs are being passed to GenotypeGVCFs.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie, admin)

    Can you try running on subsets of the GVCFs to see if you can pinpoint the problem to a particular set of files?
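
One way to organize the subset runs suggested above is a simple bisection over the list of combined GVCFs. The sketch below is hypothetical: `is_problematic` is a caller-supplied function (not part of GATK) that runs GenotypeGVCFs on the given subset, ideally over a small interval list, and returns True if the run shows the memory or runtime problem. The search assumes that a single file is responsible and that any subset containing it reproduces the issue. If the cause is the total sample count rather than one file, this approach will not isolate anything.

```python
from typing import Callable, Optional, Sequence


def find_problem_gvcf(gvcfs: Sequence[str],
                      is_problematic: Callable[[Sequence[str]], bool]) -> Optional[str]:
    """Bisect a list of GVCFs to locate a single file that triggers the problem.

    Returns None if the full set does not reproduce the problem.
    """
    if not gvcfs:
        raise ValueError("gvcfs must not be empty")
    if not is_problematic(gvcfs):
        return None
    candidates = list(gvcfs)
    while len(candidates) > 1:
        mid = len(candidates) // 2
        left = candidates[:mid]
        if is_problematic(left):
            candidates = left
        else:
            # Single-culprit assumption: if the left half is clean,
            # the culprit must be in the right half.
            candidates = candidates[mid:]
    return candidates[0]


if __name__ == "__main__":
    # Hypothetical file names standing in for the 29 per-run combined GVCFs.
    files = [f"run_{i:02d}.g.vcf" for i in range(1, 30)]
    # Stand-in predicate for demonstration; a real one would launch GATK.
    culprit = find_problem_gvcf(files, lambda subset: "run_17.g.vcf" in subset)
    print(culprit)
```

With 29 input files, this needs one run on the full set plus about five subset runs, rather than one run per file.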
