HaplotypeCaller is taking a long time to start

Hello GATK team!
I'm running HaplotypeCaller on two different BAM files at the same time (193 Mb and 804 Mb) that are coordinate-sorted and include read group information. The reference (for which I created an index and a dictionary) is the same for both files and is 1.8 Gb.
I started my HaplotypeCaller runs 3 days ago (with default values) and they are not progressing. Specifically, they are both still at this stage:
INFO 15:26:00,474 HelpFormatter - ----------------------------------------------------------------------------------
INFO 15:26:00,475 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.6-0-g89b7209, Compiled 2016/06/01 22:27:29
INFO 15:26:00,475 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
INFO 15:26:00,476 HelpFormatter - For support and documentation go to https://www.broadinstitute.org/gatk
INFO 15:26:00,476 HelpFormatter - [Wed Mar 29 15:26:00 CEST 2017] Executing on Linux 2.6.32-642.6.1.el6.x86_64 amd64
INFO 15:26:00,476 HelpFormatter - OpenJDK 64-Bit Server VM 1.8.0_111-b15 JdkDeflater
INFO 15:26:00,478 HelpFormatter - Program Args: -I mergedbamfiles_Wdup.bam -R ../reference/GCA_000715055.1_mussel1.0_genomic.fasta -T HaplotypeCaller -o mytilus_Greference_Wdup.vcf -stand_emit_conf 30 -stand_call_conf 30
INFO 15:26:00,481 HelpFormatter - Executing as [email protected] on Linux 2.6.32-642.6.1.el6.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_111-b15.
INFO 15:26:00,481 HelpFormatter - Date/Time: 2017/03/29 15:26:00
INFO 15:26:00,481 HelpFormatter - ----------------------------------------------------------------------------------
INFO 15:26:00,481 HelpFormatter - ----------------------------------------------------------------------------------
INFO 15:26:00,494 GenomeAnalysisEngine - Strictness is SILENT
Any idea why it is not proceeding? (I am not using the latest GATK version because I'm currently trying to reproduce an analysis made with an older version.)
Thanks a lot,
David
Best Answer
Geraldine_VdAuwera Cambridge, MA admin
Is your reference fully assembled or is it a draft with lots of contigs? That would explain what you're seeing.
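A quick way to check is to count the header lines in the reference FASTA, since each contig is one record. The following is only a minimal sketch; the helper names count_contigs and count_contigs_in_file are hypothetical, and the path in the usage note is the one from the command line above.

```python
def count_contigs(lines):
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def count_contigs_in_file(path):
    """Hypothetical convenience wrapper: count contigs in a FASTA file on disk."""
    with open(path) as handle:
        return count_contigs(handle)
```

For example, count_contigs_in_file("../reference/GCA_000715055.1_mussel1.0_genomic.fasta") would return the number of contigs in the reference used here. A draft assembly with a very large number of contigs would be consistent with the slow startup described above.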
Answers
Hi Geraldine,
Yes, it's a draft with lots of contigs. I guess I will simply have to be patient.
Thank you for your answer.
David
Note that there are strategies you can use to work around this problem; one is to combine contigs into pseudocontigs, with stretches of Ns between them (at least as long as the read length), to reduce the number of separate contigs in the reference.
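The pseudocontig idea could be sketched roughly as follows. This is not a GATK tool, just an illustration in plain Python; the function names, the pseudo_N naming scheme, and the defaults (1000 contigs per pseudocontig, 500 Ns as spacer) are all assumptions. The spacer should be at least as long as your reads.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the contig name.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def build_pseudocontigs(records, contigs_per_pseudo=1000, spacer_len=500):
    """Join contigs into pseudocontigs separated by runs of N.

    Returns (pseudocontigs, offsets): a list of (pseudo_name, sequence) and a
    dict mapping each original contig name to (pseudo_name, 0-based start).
    Both defaults are assumed values, not recommendations.
    """
    spacer = "N" * spacer_len
    pseudocontigs, offsets = [], {}
    parts, position, current_name = [], 0, None
    for index, (name, seq) in enumerate(records):
        pseudo_name = "pseudo_%d" % (index // contigs_per_pseudo + 1)
        # Start a new pseudocontig once the current one is full.
        if index % contigs_per_pseudo == 0 and parts:
            pseudocontigs.append((current_name, "".join(parts)))
            parts, position = [], 0
        # Put a spacer between contigs, but not before the first one.
        if parts:
            parts.append(spacer)
            position += spacer_len
        offsets[name] = (pseudo_name, position)
        parts.append(seq)
        position += len(seq)
        current_name = pseudo_name
    if parts:
        pseudocontigs.append((current_name, "".join(parts)))
    return pseudocontigs, offsets
```

The offsets table is kept so that positions called on a pseudocontig can be translated back to the original contig coordinates afterwards. Keep in mind that the reads would need to be aligned to the new pseudocontig reference, and a new index and dictionary created for it, before running HaplotypeCaller.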