Unusually long runtime of Unified Genotyper?
I recently started my PhD and I'm working with large Illumina datasets (250-300 million HiSeq 150 bp paired-end reads) of pooled samples (10-12 genomes per pool). After alignment against a reference genome, indel realignment and duplicate marking, I started variant calling with the Unified Genotyper (command below). The estimated runtime is 6+ weeks per pool. The senior bioinformatics scientist in my working group told me this is unusually long and that she has never seen such a runtime, even with projects of similar size.
Now to my question: is this runtime due to my UG settings, the size of my pools, or simply expected for UG, or did I make a crucial mistake?
Some technical properties:
ARS1 is the newest and "best" goat reference genome: 29 chromosomes and 29,000 unplaced scaffolds, 2.9 Gb in length.
I am working on an HPC cluster with SGE as the batch system. Depending on the node, there is ~252 GB of RAM.
java -Djava.io.tmpdir=tmp -jar GenomeAnalysisTK/3.7/bin/GenomeAnalysisTK.jar -T UnifiedGenotyper -nt 8 -nct 4 -glm SNP -stand_call_conf 20 -ploidy 24 -out_mode EMIT_VARIANTS_ONLY -R ASR1.fa -I INPUTFILE -o OUTPUTFILE >>LOGFILE 2>&1
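One thing I could try in the meantime is splitting the run by contig with -L so that the pieces can run as separate cluster jobs. The following is only an untested sketch: it assumes a samtools-style index (ASR1.fa.fai) sits next to the reference, the function name print_ug_commands and the per-contig output names are made up for illustration, and it only prints the commands rather than running or submitting them.

```shell
#!/usr/bin/env bash
# Sketch: print one UnifiedGenotyper command per contig listed in a .fai index.
# All settings are copied from the command above; only -L and a per-contig
# output name (hypothetical naming) are added. Nothing is executed here.
print_ug_commands() {
    local fai="$1"   # FASTA index, e.g. ASR1.fa.fai (assumed to exist)
    local contig
    # The first tab-separated column of a .fai file is the contig name.
    while IFS=$'\t' read -r contig _ || [ -n "$contig" ]; do
        [ -n "$contig" ] || continue
        printf '%s\n' "java -Djava.io.tmpdir=tmp -jar GenomeAnalysisTK/3.7/bin/GenomeAnalysisTK.jar -T UnifiedGenotyper -nt 8 -nct 4 -glm SNP -stand_call_conf 20 -ploidy 24 -out_mode EMIT_VARIANTS_ONLY -R ASR1.fa -I INPUTFILE -L ${contig} -o OUTPUTFILE.${contig}.vcf"
    done < "$fai"
}
```

Calling print_ug_commands ASR1.fa.fai would give one line per contig. With 29,000 unplaced scaffolds that is a lot of tiny jobs, so the scaffolds would probably need to be grouped somehow, and the per-contig VCFs would have to be merged afterwards.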
Hopefully someone can help me.