
Unusually long runtime of Unified Genotyper?

Dear GATK team,

I recently started my PhD and am working with large Illumina datasets (250-300 million HiSeq 150 bp paired-end reads) of pooled samples (10-12 genomes per pool). After aligning the reads against a reference genome, realigning around indels and marking duplicates, I started variant calling with the Unified Genotyper (command below). The estimated runtime is more than 6 weeks per pool. When I talked to the senior bioinformatician in my working group, she said this is unusually long; she has never had such a runtime, even on projects of similar size.
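For context on the -ploidy setting: with pooled diploid individuals, the pool ploidy is the number of genomes times two, so 12 genomes give 24 (a 10-genome pool would correspond to 20). The number of distinct genotypes that can be considered at a site is the number of multisets of size `ploidy` over the candidate alleles, C(ploidy + alleles - 1, alleles - 1), and it grows quickly with ploidy. The Python sketch below only does this arithmetic, assuming diploid goats; it is not a model of the Unified Genotyper's internal cost, just an illustration of why a ploidy-24 run may be much slower than a diploid one.

```python
from math import comb


def pool_ploidy(genomes_per_pool: int, sample_ploidy: int = 2) -> int:
    """Ploidy for a pool, assuming each pooled individual is diploid."""
    return genomes_per_pool * sample_ploidy


def genotype_count(ploidy: int, n_alleles: int) -> int:
    """Number of unordered genotypes: multisets of size `ploidy` drawn from `n_alleles` alleles."""
    return comb(ploidy + n_alleles - 1, n_alleles - 1)


if __name__ == "__main__":
    for genomes in (10, 12):
        p = pool_ploidy(genomes)
        print(f"{genomes} genomes -> ploidy {p}:",
              [genotype_count(p, a) for a in (2, 3, 4)])
    print("single diploid sample:", [genotype_count(2, a) for a in (2, 3, 4)])
```

For ploidy 24 this gives 25, 325 and 2925 genotypes for 2, 3 and 4 alleles, compared with 3, 6 and 10 for a single diploid sample.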

Now to my question: is this runtime caused by my UG settings or by the size of my pools, is it to be expected with the UG, or did I make a crucial mistake?

Some technical properties:

reference genome

ARS1 is the newest and "best" goat reference genome: 29 chromosomes and 29,000 unplaced scaffolds, 2.9 Gb in length.
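With 29 chromosomes plus tens of thousands of unplaced scaffolds, one common way to shorten wall-clock time on a cluster is to scatter the calling over many smaller runs, each restricted with GATK's -L option to a subset of contigs, and merge the per-run VCFs afterwards. The Python sketch below packs contigs into bins of roughly equal total length and writes one `.list` file of contig names per bin. It assumes a samtools-style `.fai` index of the reference exists; the bin count, output directory and `scatter_NNN.list` naming are hypothetical choices, not anything from the original setup.

```python
import csv
import heapq
from pathlib import Path


def read_fai(fai_path):
    """Yield (contig, length) pairs from a samtools-style .fai index (name, length, ...)."""
    with open(fai_path) as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if row:
                yield row[0], int(row[1])


def balance_contigs(contigs, n_bins):
    """Greedy longest-first packing: each contig goes to the bin with the smallest total."""
    # Heap entries are (total_length, bin_index, members); the unique index breaks ties.
    heap = [(0, i, []) for i in range(n_bins)]
    heapq.heapify(heap)
    for name, length in sorted(contigs, key=lambda c: c[1], reverse=True):
        total, idx, members = heapq.heappop(heap)
        members.append(name)
        heapq.heappush(heap, (total + length, idx, members))
    # Return the bins in index order.
    return [members for _, _, members in sorted(heap, key=lambda b: b[1])]


def write_interval_lists(bins, out_dir, prefix="scatter"):
    """Write one file per non-empty bin, one contig name per line."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    non_empty = [members for members in bins if members]
    for i, members in enumerate(non_empty, start=1):
        path = out / f"{prefix}_{i:03d}.list"
        path.write_text("\n".join(members) + "\n")
        paths.append(path)
    return paths
```

For example, `balance_contigs(list(read_fai("ASR1.fa.fai")), 50)` followed by `write_interval_lists(bins, "intervals")` would produce 50 interval lists; packing the many short scaffolds together avoids submitting thousands of tiny jobs.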

working environment

I am working on an HPC cluster with SGE as the batch system. Depending on the node, ~252 GB of RAM is available.


java -Djava.io.tmpdir=tmp -jar GenomeAnalysisTK/3.7/bin/GenomeAnalysisTK.jar \
    -T UnifiedGenotyper -nt 8 -nct 4 -glm SNP -stand_call_conf 20 -ploidy 24 \
    -out_mode EMIT_VARIANTS_ONLY -R ASR1.fa -I INPUTFILE -o OUTPUTFILE \
    >>LOGFILE 2>&1
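If the calling were scattered by interval, each SGE array task could run the same Unified Genotyper arguments as the command above with an added -L for its own interval list. The Python sketch below assembles that command for the task number SGE exposes in the SGE_TASK_ID environment variable. The BAM name, the `intervals/` and `calls/` directories and the `scatter_NNN.list` naming are hypothetical placeholders. It also reports nt * nct, since in GATK 3 the total thread count is the product of the two, so -nt 8 -nct 4 needs 32 cores in the slot request.

```python
import os
import shlex

GATK_JAR = "GenomeAnalysisTK/3.7/bin/GenomeAnalysisTK.jar"  # path from the original command
REFERENCE = "ASR1.fa"  # reference file name as written in the original command


def build_ug_command(input_bam, interval_list, output_vcf, nt=8, nct=4, ploidy=24):
    """Original UnifiedGenotyper arguments, restricted to one interval list via -L."""
    return [
        "java", "-Djava.io.tmpdir=tmp", "-jar", GATK_JAR,
        "-T", "UnifiedGenotyper",
        "-nt", str(nt), "-nct", str(nct),
        "-glm", "SNP",
        "-stand_call_conf", "20",
        "-ploidy", str(ploidy),
        "-out_mode", "EMIT_VARIANTS_ONLY",
        "-R", REFERENCE,
        "-I", input_bam,
        "-L", interval_list,
        "-o", output_vcf,
    ]


def total_threads(nt, nct):
    """GATK 3 runs nt data threads, each with nct CPU threads."""
    return nt * nct


if __name__ == "__main__":
    task = int(os.environ.get("SGE_TASK_ID", "1"))  # set by SGE for array jobs
    interval_list = f"intervals/scatter_{task:03d}.list"  # hypothetical naming
    cmd = build_ug_command("pool.bam", interval_list, f"calls/pool_{task:03d}.vcf")
    print(shlex.join(cmd))
    print("threads to request:", total_threads(8, 4))
```

Keeping ploidy as a parameter makes it easy to pass 20 for the 10-genome pools instead of reusing 24.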

Hopefully someone can help me.
