Test-drive the GATK tools and Best Practices pipelines on Terra
Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
GenotypeGVCFs: running out of memory with >50 samples
I used the HaplotypeCaller with "-dcov 500 --emitRefConfidence GVCF --variant_index_type LINEAR --variant_index_parameter 128000" to produce 60 gvcf files, that worked fine.
However, GenotypeGVCFs gets stuck on a position and runs out of memory after about 24hours, even when I allocate 240Gb. Testing a short region of 60kb does not help. Here was my command line:
software/jre1.7.0_25/bin/java -Xmx240g -jar GenomeAnalysisTK-3.3-0/GenomeAnalysisTK.jar -T GenotypeGVCFs -R Reference.fasta -L chrom14:2240000-2300000 --variant 60samples_gvcf.list -o output.vcf
If I split my list of 60 gvcf files into two lists of 30 samples each, GenotypeGVCFs works fine for both batches within 15 minutes (~10Gb of memory).
I tested with 47 samples, it took 8 hours (31gb of memory) for a 60kb region. Once I use more than ~55 samples, it takes forever and crashes.
Any help will be much appreciated!