# UnifiedGenotyper gets hung waiting for file lock

Member Posts: 6
edited August 2012

We have an SGE environment where we run our GATK jobs. Lately we have had a few cases of UnifiedGenotyper hanging while trying to read our previously created .fai and .dict reference files. On one occasion, 6 jobs on different nodes all hung for about 5 days before finally emitting the FSLockWithShared WARNING messages and then continuing just fine. We haven't been able to find out why this is happening. Do you know of anything that would cause UnifiedGenotyper to stall while trying to read those files, only to throw the warning messages 5 days later?

We looked into the possibility of other users/jobs having an exclusive lock on those reference files, but we had other UnifiedGenotyper jobs run fine during that same time period without these problems.

We also haven't been able to find any hardware problems yet. The jobs with this problem were on multiple nodes, so I thought I would see if you have any ideas.

Using strace we were able to find that all 6 of the UnifiedGenotyper processes were hung in the same spot:

$ strace -p 22924
Process 22924 attached - interrupt to quit
futex(0x41c319d0, FUTEX_WAIT, 22925, NULL <unfinished ...>
$ strace -p 22925
Process 22925 attached - interrupt to quit
write(1, "WARN  12:39:09,312 FSLockWithSha"..., 158) = 158
write(1, "INFO  12:39:09,364 ReferenceData"..., 116) = 116
write(1, "INFO  12:39:09,364 ReferenceData"..., 89) = 89
open("/reference/sequence/human/ncbi/37.1/allchr.fa.fai", O_RDONLY) = 16
fstat(16, {st_mode=S_IFREG|0770, st_size=2997, ...}) = 0
fstat(16, {st_mode=S_IFREG|0770, st_size=2997, ...}) = 0
fcntl(16, F_SETLK, {type=F_RDLCK, whence=SEEK_SET, start=0, len=9223372036854775807} <unfinished ...>
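In the strace above, thread 22925 is stuck inside a single `fcntl(..., F_SETLK, {type=F_RDLCK, ...})` call: a request for a shared read lock over the whole file which, being `F_SETLK` rather than `F_SETLKW`, should return immediately instead of waiting. The Java sketch below makes the same kind of request so it can be run by hand against the `.fai` on the affected mount. `SharedLockProbe` is a hypothetical name and this is not GATK's code; it only assumes GATK does something comparable. On Linux, `FileChannel.tryLock(..., true)` is typically implemented with a non-blocking `fcntl` read lock, so if this probe also hangs, the filesystem's lock handling (for example an unresponsive network lock service) is a more likely culprit than GATK itself.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper, not GATK code: ask for a shared (read) lock on a whole file without waiting. */
public class SharedLockProbe {

    /**
     * Returns true if a shared lock covering the whole file is granted immediately,
     * false if another process already holds a conflicting (exclusive) lock.
     * tryLock() never waits for a conflicting lock to be released, but the underlying
     * system call can still stall if the filesystem's lock service does not answer.
     */
    public static boolean tryShared(Path file) throws IOException {
        // Shared locks require a channel opened for reading.
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // position 0, size Long.MAX_VALUE, shared = true: a read lock over the whole file
            FileLock lock = ch.tryLock(0L, Long.MAX_VALUE, true);
            if (lock == null) {
                return false;
            }
            lock.release();
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java SharedLockProbe <file>");
            return;
        }
        Path file = Paths.get(args[0]);
        System.out.println(file + ": shared lock granted = " + tryShared(file));
    }
}
```

For example, `java SharedLockProbe /reference/sequence/human/ncbi/37.1/allchr.fa.fai` run on one of the affected nodes.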


Here is the log file for one of the jobs:

Thu Jul 19 09:42:53 CDT 2012
INFO  09:42:58,047 HelpFormatter - --------------------------------------------------------------------------------
INFO  09:42:58,048 HelpFormatter - The Genome Analysis Toolkit (GATK) v1.6-7-g2be5704, Compiled 2012/05/25 16:27:30
INFO  09:42:58,048 HelpFormatter - For support, please view our support site at http://getsatisfaction.com/gsa
INFO  09:42:58,049 HelpFormatter - Program Args: -R /reference/sequence/human/ncbi/37.1/allchr.fa -et NO_ET -K /projects/bsi/bictools/apps/alignment/GenomeAnalysisTK/1.6-7-g2be5704//key -T UnifiedGenotyper --output_mode EMIT_ALL_SITES --min_base_quality_score 20 -nt 4 --max_alternate_alleles 5 -glm BOTH -L /data2/bsi/target.bed -I /data2/bsi/H-001.chr22-sorted.bam --out /data2/bsi/variants.chr22.raw.all.vcf
INFO  09:42:58,050 HelpFormatter - Date/Time: 2012/07/19 09:42:58
INFO  09:42:58,050 HelpFormatter - --------------------------------------------------------------------------------
INFO  09:42:58,050 HelpFormatter - --------------------------------------------------------------------------------
INFO  09:42:58,166 GenomeAnalysisEngine - Strictness is SILENT
WARN  12:39:09,312 FSLockWithShared - WARNING: Unable to lock file /reference/sequence/human/ncbi/37.1/allchr.dict: Protocol family not supported.
INFO  12:39:09,364 ReferenceDataSource - Unable to create a lock on dictionary file: Protocol family not supported
INFO  12:39:09,364 ReferenceDataSource - Treating existing dictionary file as complete.
WARN  12:39:12,527 FSLockWithShared - WARNING: Unable to lock file /reference/sequence/human/ncbi/37.1/allchr.fa.fai: Protocol family not supported.
INFO  12:39:12,528 ReferenceDataSource - Unable to create a lock on index file: Protocol family not supported
INFO  12:39:12,528 ReferenceDataSource - Treating existing index file as complete.
INFO  12:39:12,663 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO  12:39:12,954 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.29
INFO  12:39:13,108 MicroScheduler - Running the GATK in parallel mode with 4 concurrent threads
WARN  12:39:13,864 UnifiedGenotyper - WARNING: note that the EMIT_ALL_SITES option is intended only for point mutations (SNPs) in DISCOVERY mode or generally when running in GENOTYPE_GIVEN_ALLELES mode; it will by no means produce a comprehensive set of indels in DISCOVERY mode
INFO  12:39:14,361 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO  12:39:14,568 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.21
INFO  12:39:14,569 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO  12:39:14,627 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.06
INFO  12:39:14,628 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO  12:39:14,686 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.06
INFO  12:39:15,123 TraversalEngine - [INITIALIZATION COMPLETE; TRAVERSAL STARTING]
...
Tue Jul 24 12:47:32 CDT 2012


• Posts: 1

I have the same problem, and I hope someone can answer this as soon as possible. Thank you!

• Member Posts: 2

Same problem here.

@yluo, what version of GATK are you using?

Geraldine Van der Auwera, PhD

• Member Posts: 33

I get a similar error and am not sure what it means:
ERROR MESSAGE: Timeout of 30000 milliseconds was reached while trying to acquire a lock on file /scratch/vyellapantula/MMRF_vcf/MMRF_xxxx_1_PB_Whole_C2_KAS5U_L00653_merged.vcf.idx. Since the GATK uses non-blocking lock acquisition calls that are not supposed to wait, this implies a problem with the file locking support in your operating system.

@vyellapa, that refers to how the GATK interacts with your operating system when it wants to access a file. In some rare cases the protocols used by the GATK and by the operating system are not compatible. If you are using an older version of GATK, try upgrading to a newer version. If that doesn't work, you may need to work with a different OS or different computer.

Geraldine Van der Auwera, PhD
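The error says GATK uses non-blocking lock calls that are not supposed to wait, yet a 30000 ms timeout still fired. One way this can happen, consistent with the strace in the first post, is that the lock system call itself never returns. The sketch below shows the general pattern of bounding such a call by running it on a worker thread and waiting at most a fixed time for the answer. `TimedLockAttempt` and `Outcome` are hypothetical names, and the 30000 ms value is only taken from the error message, not from GATK's source.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch, not GATK code: put a hard deadline on a lock attempt that should not wait anyway. */
public class TimedLockAttempt {

    public enum Outcome { GRANTED, HELD_ELSEWHERE, TIMED_OUT }

    public static Outcome attempt(Path file, long timeoutMillis) throws IOException, InterruptedException {
        // Daemon worker so a thread stuck inside the lock system call cannot keep the JVM alive.
        ExecutorService exec = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "lock-attempt");
            t.setDaemon(true);
            return t;
        });
        try {
            Future<Boolean> task = exec.submit(() -> {
                try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                    FileLock lock = ch.tryLock(0L, Long.MAX_VALUE, true); // expected to return at once
                    if (lock == null) {
                        return false;
                    }
                    lock.release();
                    return true;
                }
            });
            try {
                return task.get(timeoutMillis, TimeUnit.MILLISECONDS) ? Outcome.GRANTED : Outcome.HELD_ELSEWHERE;
            } catch (TimeoutException e) {
                task.cancel(true); // interrupting may not free a thread blocked in the kernel
                return Outcome.TIMED_OUT;
            } catch (ExecutionException e) {
                // Pass I/O errors from the lock call itself back to the caller.
                Throwable cause = e.getCause();
                if (cause instanceof IOException) {
                    throw (IOException) cause;
                }
                throw new IOException(cause);
            }
        } finally {
            exec.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java TimedLockAttempt <file>");
            return;
        }
        // 30000 ms mirrors the timeout quoted in the GATK error message.
        System.out.println(attempt(Paths.get(args[0]), 30000L));
    }
}
```

`TIMED_OUT` on a file that no other process is using would point at the operating system or filesystem rather than at lock contention.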

• Member Posts: 33
edited August 2013

Geraldine, this problem started after I upgraded to the latest release, which I downloaded yesterday. My system is currently running CentOS 6.2. Is there a particular flavor of Linux that GATK recommends?
Thank you - Teja

• Member Posts: 33

The error turned out to be caused by an improper VCF header, and fixing the header fixed the problem. I should also mention that the error occurred with VariantAnnotator. In the command below, "MMRF_xxxx_1_PB_Whole_C2_KAS5U_L00653_merged.vcf" was the improper VCF.

java -Xmx12g -jar ~/local/bin/GenomeAnalysisTK.jar -R /scratch/tgenref/pipeline_v0.3/genome_fasta/hs37d5.fa -T VariantAnnotator -nt 12 -o MMRF_xxxx_1_PB_Whole_C2_KAS5U_L00653_merged_allDB.vcf --variant MMRF_xxxx_1_PB_Whole_C2_KAS5U_L00653_merged.vcf --dbsnp dbsnp_137.b37.vcf --comp:NHLBI /scratch/ref/pipeline_v0.3/nhlbi/ESP6500SI-V2_snps_indels.vcf --comp:1000G /scratch/ref/pipeline_v0.3/gatk_bundle_2.5/b37/1000G_phase1.snps.high_confidence.b37.vcf --comp:COSMIC /scratch/tgenref/pipeline_v0.3/cosmic/CosmicCodingMuts_v66_20130725_sorted.vcf
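Since a malformed header was the culprit here, it can be cheap to rule that out before a long run. The sketch below checks only the most basic header requirements of the VCF specification: the first line starts with `##fileformat=VCFv`, and the `##` meta-information lines are followed by a `#CHROM` line whose first eight tab-separated columns are the fixed ones. `VcfHeaderCheck` is a hypothetical name; it does not validate `##INFO`/`##FORMAT` definitions, does not read bgzipped files, and it is not known which part of Teja's header was wrong.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-flight check for the most basic VCF header requirements (plain-text VCF only). */
public class VcfHeaderCheck {

    private static final String[] FIXED_COLUMNS = {"#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"};

    /** Returns the problems found; an empty list means these basic checks passed. */
    public static List<String> check(Path vcf) throws IOException {
        List<String> problems = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(vcf, StandardCharsets.UTF_8)) {
            String line = in.readLine();
            if (line == null || !line.startsWith("##fileformat=VCFv")) {
                problems.add("first line is not ##fileformat=VCFv...");
            }
            // Skip all ## meta-information lines.
            while (line != null && line.startsWith("##")) {
                line = in.readLine();
            }
            if (line == null || !line.startsWith("#CHROM")) {
                problems.add("missing #CHROM header line after the ## lines");
                return problems;
            }
            String[] cols = line.split("\t");
            for (int i = 0; i < FIXED_COLUMNS.length; i++) {
                if (i >= cols.length || !cols[i].equals(FIXED_COLUMNS[i])) {
                    problems.add("header column " + (i + 1) + " should be " + FIXED_COLUMNS[i]);
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            System.out.println(arg + ": " + check(Paths.get(arg)));
        }
    }
}
```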

Thanks for reporting your solution, Teja. I'm glad this turned out to be a trivial issue after all.

Geraldine Van der Auwera, PhD

• Member Posts: 10

We've also used GATK in an SGE environment for more than 2 years now. We recently upgraded to 3.1-1-g07a4bf8, and since then the following new error message occurs regularly when running IndelRealigner or HaplotypeCaller with VCFs from the GATK resource bundle:

##### ERROR MESSAGE: Timeout of 30000 milliseconds was reached while trying to acquire a lock on file [CLIPPED]/1000G_omni2.5.b37.vcf.idx. Since the GATK uses non-blocking lock acquisition calls that are not supposed to wait, this implies a problem with the file locking support in your operating system.

In an attempt to bypass the problem, we already provide each process with a private copy of the vcf.idx file, so at least that file should never be locked by another process when a job tries to access it. The VCF files themselves are not copied (too large) but linked, so they are still shared between processes. Yet the error still points at the vcf.idx file?

Is this problem related to the issue highlighted in the Bug bulletin in the header of the GATK support forum: "we have identified a bug that affects indexing when producing gzipped VCFs" ? And will this likewise be solved in the upcoming 3.2 release?

This is the IndelRealigner command that triggered the file lock error:
[CLIPPED]/jdk/1.7.0/bin/java -Xmx8G -XX:-UsePerfData -XX:-UseParallelGC -Djava.io.tmpdir=[CLIPPED]/tmp -jar [CLIPPED]/apps/gatk/3.1.1/GenomeAnalysisTK.jar -et NO_ET -K [CLIPPED].key -nt 1 -nct 1 -L 8:1-146364022 -I [CLIPPED].bam -R [CLIPPED]/gatk/broad_bundle_b37_v2.2/human_g1k_v37.fasta -T IndelRealigner -targetIntervals [CLIPPED].intervals -o [CLIPPED].bam -known [CLIPPED]/tmp/8/Mills_and_1000G_gold_standard.indels.b37.vcf -known [CLIPPED]/tmp/8/1000G_omni2.5.b37.vcf --filter_mismatching_base_and_quals
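The per-job workaround lucdh describes (shared VCF, private index) can be sketched as follows. It assumes, as in the `-known [CLIPPED]/tmp/8/...` arguments above, that the index is found next to the VCF as `<name>.vcf.idx`. `PrivateIndexStaging` is a hypothetical name, and copying with `COPY_ATTRIBUTES` rests on the assumption that a tool may compare the index's modification time with the VCF's. As the thread goes on to show, this alone did not stop the timeouts, which fits the later diagnosis of an OS-level locking problem rather than contention.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

/** Hypothetical sketch of the per-job workaround: symlink the VCF, give the job its own .idx copy. */
public class PrivateIndexStaging {

    /** Stages vcf into jobDir and returns the staged VCF path to pass to -known, --dbsnp, etc. */
    public static Path stage(Path vcf, Path jobDir) throws IOException {
        Files.createDirectories(jobDir);
        Path name = vcf.getFileName();
        Path linkedVcf = jobDir.resolve(name);
        Path sourceIdx = vcf.resolveSibling(name + ".idx");
        Path privateIdx = jobDir.resolve(name + ".idx");

        // The VCF is large, so it is only linked and stays shared between jobs.
        if (!Files.exists(linkedVcf, LinkOption.NOFOLLOW_LINKS)) {
            Files.createSymbolicLink(linkedVcf, vcf.toAbsolutePath());
        }
        // The index is small, so each job gets its own copy; keep the original timestamps.
        Files.copy(sourceIdx, privateIdx, StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.COPY_ATTRIBUTES);
        return linkedVcf;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java PrivateIndexStaging <bundle.vcf> <jobDir>");
            return;
        }
        System.out.println(stage(Paths.get(args[0]), Paths.get(args[1])));
    }
}
```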

Hi @lucdh, I don't think this is related to the indexing bug. That bug only affects the format used when writing out new index files, whereas in the job you're running with that command, GATK only reads the idx file; no writing is involved. I'll ask the software engineers if they have any idea why this is suddenly happening. Could you try a run on the same files with an older version? That would confirm that the problem is linked to GATK 3.1, as opposed to your servers coincidentally having developed an issue recently.

Geraldine Van der Auwera, PhD

• Member Posts: 10

Hi Geraldine, thanks for forwarding our question to the GATK engineers. From my perspective it does look like the issue is related to the GATK version. We are testing GATK 3.1.1 as part of updating our analysis pipeline. In parallel, we are still running the bulk of our analyses with our well-established GATK 2.4.9-based pipeline on the same servers, accessing the same idx files. So far we have only seen the file lock problem pop up occasionally, and only with GATK 3.1.1. We switched from UnifiedGenotyper to HaplotypeCaller, but otherwise the steps (e.g. IndelRealigner) and options from GATK 2.4.9 are kept in GATK 3.1.1.
--luc

• USA Member Posts: 4

Hi,

I'm having a similar problem when using the GenotypeGVCFs tool (GenomeAnalysisTK-3.1-1) on my institution's computing facilities, which run CentOS 6.5. For every file I received this warning:

WARN 10:11:45,161 FSLockWithShared$LockAcquisitionTask - WARNING: Unable to lock file /staging/dh3/perezaa/gVCF4recal/51x35_001_m.gVCF.idx because an IOException occurred with message: Function not implemented.
INFO 10:11:45,163 RMDTrackBuilder - Could not acquire a shared lock on index file /staging/dh3/perezaa/gVCF4recal/51x35_001_m.gVCF.idx, falling back to using an in-memory index for this GATK run.

and the job neither finished within 24 hours nor produced any output.

Interestingly, when I ran some of the gVCF files on my own laptop (same GATK version, Ubuntu 12.04) it worked smoothly.

Maybe something related to permissions?
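Since the same GATK version worked on a laptop but not on the cluster, one quick check is whether the directory holding the gVCFs accepts lock requests at all. The sketch below creates a scratch file in that directory and asks for an exclusive, non-blocking lock on it. `LockSupportProbe` is a hypothetical name and not part of GATK. A plain permission problem would typically fail earlier, when the scratch file is created, while an unsupported-lock error such as the "Function not implemented" above would be expected to surface from the lock call. The probe does not guard against a lock call that hangs.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

/** Hypothetical diagnostic, not part of GATK: does this directory's filesystem accept lock requests? */
public class LockSupportProbe {

    /** Returns "ok", "held elsewhere", or "failed: <message>" for a scratch file created in dir. */
    public static String probe(Path dir) {
        Path scratch = null;
        try {
            scratch = Files.createTempFile(dir, "lockprobe", ".tmp"); // needs write permission on dir
            try (FileChannel ch = FileChannel.open(scratch, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                FileLock lock = ch.tryLock(); // exclusive lock on the whole file, non-blocking
                if (lock == null) {
                    return "held elsewhere";
                }
                lock.release();
                return "ok";
            }
        } catch (IOException e) {
            return "failed: " + e.getMessage();
        } finally {
            if (scratch != null) {
                try {
                    Files.deleteIfExists(scratch);
                } catch (IOException ignored) {
                    // best-effort cleanup of the scratch file
                }
            }
        }
    }

    public static void main(String[] args) {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(dir.toAbsolutePath() + ": " + probe(dir));
    }
}
```

Running it once on the laptop and once against `/staging/dh3/perezaa/gVCF4recal` on the cluster would show whether the two filesystems behave differently.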

Sorry for the delayed response. It turns out that this is a problem with OS-level file locking support in some environments. We ran into this at the Broad, which is why the devs added the check. There is a hidden argument called --disableAutoIndexCreationAndLockingWhenReadingRods that disables index auto-creation and related file locking when reading vcfs. If all index files are pre-existing, and no concurrent processes will ever update any of the indices, it should be safe to use this argument.

Geraldine Van der Auwera, PhD
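Geraldine's first safety condition (all index files already exist) can be checked before a run. The sketch below lists the VCFs whose `<name>.idx` is missing or older than the VCF itself; the second condition, that no concurrent process will update the indices, cannot be checked from the files and remains up to the pipeline. `IndexPreflight` is a hypothetical name. It only looks for Tribble-style `.idx` files next to uncompressed VCFs, not `.tbi` indices for bgzipped files. Note that later posts in this thread show the command-line spelling `--disable_auto_index_creation_and_locking_when_reading_rods`; `disableAutoIndexCreationAndLockingWhenReadingRods` is the variable name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-flight check: are all indices present before disabling auto-indexing and locking? */
public class IndexPreflight {

    /** Returns the VCFs whose "<name>.idx" is missing or older than the VCF itself. */
    public static List<Path> staleOrMissing(List<Path> vcfs) throws IOException {
        List<Path> bad = new ArrayList<>();
        for (Path vcf : vcfs) {
            Path idx = vcf.resolveSibling(vcf.getFileName() + ".idx");
            if (!Files.isRegularFile(idx)
                    || Files.getLastModifiedTime(idx).compareTo(Files.getLastModifiedTime(vcf)) < 0) {
                bad.add(vcf);
            }
        }
        return bad;
    }

    public static void main(String[] args) throws IOException {
        List<Path> vcfs = new ArrayList<>();
        for (String arg : args) {
            vcfs.add(Paths.get(arg));
        }
        List<Path> bad = staleOrMissing(vcfs);
        if (bad.isEmpty()) {
            System.out.println("All indices present and not older than their VCFs");
        } else {
            System.out.println("Create or refresh indices first for: " + bad);
        }
    }
}
```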

• USA Member Posts: 4

Thank you for the response Geraldine.

Unfortunately, it is not working for me. I tried it with the GenotypeGVCFs tool (both GenomeAnalysisTK-3.1-1 and GenomeAnalysisTK-nightly-2014-05-19-g090253a), and an error message saying that the argument isn't defined is generated.

• USA Member Posts: 4

Thanks Geraldine. That completely solved the problem for me.

• Member Posts: 10

Hi Geraldine, it looks like you solved the problem for me as well.
thanks!

• Member Posts: 7

This solved a problem I was having when running multiple MuTect 1.1.7 jobs in parallel using one cosmic.vcf(.idx) file.

• Member Posts: 15
edited July 2015

I was initially having this problem, but adding the argument resolved the file locking errors. I just noticed that my grid has only 1024 open file handles available. I guess this is my problem? It has worked, but I am guessing that depends on the status of our grid.
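To see how close a GATK-like JVM gets to a 1024-handle limit, the JVM can report its own open and maximum file descriptor counts. With `-nt`, the log in the first post shows the BAM readers being initialized again for each thread, so descriptor use plausibly grows with the thread count. `FdLimitCheck` is a hypothetical name; it relies on `com.sun.management.UnixOperatingSystemMXBean`, which HotSpot/OpenJDK provide on Unix-like systems. The maximum reported is the limit as seen by the JVM, which may differ from the shell's `ulimit -n` because the JVM can raise its own soft limit at startup.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

/** Hypothetical check, not part of GATK: how close is this JVM to its open-file limit? */
public class FdLimitCheck {

    /** Returns {open, max} descriptor counts, or null if this JVM does not expose them. */
    public static long[] descriptorCounts() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        // UnixOperatingSystemMXBean is a HotSpot/OpenJDK extension available on Unix-like systems.
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix = (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {unix.getOpenFileDescriptorCount(), unix.getMaxFileDescriptorCount()};
        }
        return null;
    }

    public static void main(String[] args) {
        long[] counts = descriptorCounts();
        if (counts == null) {
            System.out.println("File descriptor counts are not available on this JVM");
        } else {
            System.out.println("open=" + counts[0] + " max=" + counts[1]);
        }
    }
}
```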

• Member Posts: 15

@Sheila
Apparently getting more file handles was going to be difficult, but I did discover that using fewer threads with more memory gave me acceptable performance. I think more memory is better than more threads. Thank you.

@nchuang
Thank you for posting your finding.

• St. Louis Member Posts: 28
edited October 2015

Any advice on how to add the --disable_auto_index_creation_and_locking_when_reading_rods option when using Queue?

I've tried adding it as a command-line option when running Java, but it comes back as an unrecognized option:
org.broadinstitute.gatk.utils.commandline.InvalidArgumentException:
Argument with name 'disable_auto_index_creation_and_locking_when_reading_rods' isn't defined.

I tried adding it to my .scala file (using the variable name disableAutoIndexCreationAndLockingWhenReadingRods), both in the CommonArguments and within the script as an argument to GenotypeGVCFs, but that produces a "not a member" error:
ERROR 10:45:42,287 QScriptManager - GenotypeGVCFsScatter.scala:23: value disableAutoIndexCreationAndLockingWhenReadingRods is not a member of GenotypeGVCFs.this.CommonArguments
ERROR 10:45:42,289 QScriptManager - this.disableAutoIndexCreationAndLockingWhenReadingRods = true

I have a feeling I just need to import something else at the top of the .scala file, but I have no idea what it would be.

EDIT: Never mind, I realized I have to use the shortName, not the variable name, when working with CommonArguments. So it's:
this.disable_auto_index_creation_and_locking_when_reading_rods = true