UnifiedGenotyper gets hung waiting for file lock

jaredme Posts: 2 Member
edited August 2012 in Ask the GATK team

We run our GATK jobs in an SGE environment. Lately we have had a few cases of UnifiedGenotyper hanging while trying to read our previously created .fai and .dict reference files. On one occasion, 6 jobs on different nodes all hung for about 5 days before finally printing the FSLockWithShared - WARNING messages and then continuing on just fine. We haven't been able to find out why this is happening. Do you know of anything that would cause UnifiedGenotyper to stall on those files before ultimately emitting the warning messages 5 days later?

We looked into the possibility of other users/jobs having an exclusive lock on those reference files, but we had other UnifiedGenotyper jobs run fine during that same time period without these problems.

We also haven't been able to find any hardware problems yet. The jobs with this problem were on multiple nodes, so I thought I would see if you have any ideas.

Thanks for your help!

Using strace we were able to find that all 6 of the UnifiedGenotyper processes were hung in the same spot:

$ strace -p 22924
Process 22924 attached - interrupt to quit
futex(0x41c319d0, FUTEX_WAIT, 22925, NULL <unfinished ...>

$ strace -p 22925
Process 22925 attached - interrupt to quit
write(1, "WARN  12:39:09,312 FSLockWithSha"..., 158) = 158
write(1, "INFO  12:39:09,364 ReferenceData"..., 116) = 116
write(1, "INFO  12:39:09,364 ReferenceData"..., 89) = 89
open("/reference/sequence/human/ncbi/37.1/allchr.fa.fai", O_RDONLY) = 16
fstat(16, {st_mode=S_IFREG|0770, st_size=2997, ...}) = 0
fstat(16, {st_mode=S_IFREG|0770, st_size=2997, ...}) = 0
fcntl(16, F_SETLK, {type=F_RDLCK, whence=SEEK_SET, start=0, len=9223372036854775807} <unfinished ...>
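
The last call in the trace is a non-blocking request for a shared (read) lock on the whole .fai file, and it never returns. A minimal sketch of one way to reproduce that single call outside GATK follows, assuming Python 3 is available on the affected node. `probe_shared_lock` and the child script are hypothetical helpers, not part of GATK. The attempt runs in a child process so that the caller can give up after a timeout instead of hanging with it.

```python
import subprocess
import sys

# Child script: request a non-blocking shared (read) lock on the whole file.
# fcntl.lockf with LOCK_SH | LOCK_NB issues fcntl(fd, F_SETLK, {F_RDLCK, ...}),
# the same kind of request that is left unfinished in the strace output.
_CHILD = r"""
import fcntl, sys
with open(sys.argv[1], "rb") as handle:
    try:
        fcntl.lockf(handle, fcntl.LOCK_SH | fcntl.LOCK_NB)
    except OSError as exc:
        print("error: " + str(exc))
        sys.exit(1)
    fcntl.lockf(handle, fcntl.LOCK_UN)
print("ok")
"""


def probe_shared_lock(path, timeout_seconds=30.0):
    """Return 'ok', 'error: ...' or 'timeout' for a shared-lock attempt on path."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", _CHILD, path],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # The lock request did not come back within the time limit.
        return "timeout"
    output = result.stdout.strip()
    # An empty stdout means the child failed before locking (e.g. missing file).
    return output if output else "error: " + result.stderr.strip()


if __name__ == "__main__" and len(sys.argv) > 1:
    print(probe_shared_lock(sys.argv[1]))
```

Running this against /reference/sequence/human/ncbi/37.1/allchr.fa.fai on a node where jobs hang and on one where they run normally may show whether the file system, rather than GATK, is stalling the lock request. If the child is stuck in uninterruptible I/O, the kill that follows the timeout may itself not return.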

Here is the log file for one of the jobs:

Thu Jul 19 09:42:53 CDT 2012
INFO  09:42:58,047 HelpFormatter - --------------------------------------------------------------------------------
INFO  09:42:58,048 HelpFormatter - The Genome Analysis Toolkit (GATK) v1.6-7-g2be5704, Compiled 2012/05/25 16:27:30
INFO  09:42:58,048 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO  09:42:58,048 HelpFormatter - Please view our documentation at http://www.broadinstitute.org/gsa/wiki
INFO  09:42:58,048 HelpFormatter - For support, please view our support site at http://getsatisfaction.com/gsa
INFO  09:42:58,049 HelpFormatter - Program Args: -R /reference/sequence/human/ncbi/37.1/allchr.fa -et NO_ET -K /projects/bsi/bictools/apps/alignment/GenomeAnalysisTK/1.6-7-g2be5704//key -T UnifiedGenotyper --output_mode EMIT_ALL_SITES --min_base_quality_score 20 -nt 4 --max_alternate_alleles 5 -glm BOTH -L /data2/bsi/target.bed -I /data2/bsi/H-001.chr22-sorted.bam --out /data2/bsi/variants.chr22.raw.all.vcf
INFO  09:42:58,050 HelpFormatter - Date/Time: 2012/07/19 09:42:58
INFO  09:42:58,050 HelpFormatter - --------------------------------------------------------------------------------
INFO  09:42:58,050 HelpFormatter - --------------------------------------------------------------------------------
INFO  09:42:58,166 GenomeAnalysisEngine - Strictness is SILENT
WARN  12:39:09,312 FSLockWithShared - WARNING: Unable to lock file /reference/sequence/human/ncbi/37.1/allchr.dict: Protocol family not supported.
INFO  12:39:09,364 ReferenceDataSource - Unable to create a lock on dictionary file: Protocol family not supported
INFO  12:39:09,364 ReferenceDataSource - Treating existing dictionary file as complete.
WARN  12:39:12,527 FSLockWithShared - WARNING: Unable to lock file /reference/sequence/human/ncbi/37.1/allchr.fa.fai: Protocol family not supported.
INFO  12:39:12,528 ReferenceDataSource - Unable to create a lock on index file: Protocol family not supported
INFO  12:39:12,528 ReferenceDataSource - Treating existing index file as complete.
INFO  12:39:12,663 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO  12:39:12,954 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.29
INFO  12:39:13,108 MicroScheduler - Running the GATK in parallel mode with 4 concurrent threads
WARN  12:39:13,864 UnifiedGenotyper - WARNING: note that the EMIT_ALL_SITES option is intended only for point mutations (SNPs) in DISCOVERY mode or generally when running in GENOTYPE_GIVEN_ALLELES mode; it will by no means produce a comprehensive set of indels in DISCOVERY mode
INFO  12:39:14,361 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO  12:39:14,568 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.21
INFO  12:39:14,569 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO  12:39:14,627 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.06
INFO  12:39:14,628 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO  12:39:14,686 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.06
INFO  12:39:15,123 TraversalEngine - [INITIALIZATION COMPLETE; TRAVERSAL STARTING]
...
Tue Jul 24 12:47:32 CDT 2012

Answers

  • yuanzi Posts: 1

    I am running into the same problem and hope someone can answer it as soon as possible. Thank you!

  • Geraldine_VdAuwera Posts: 6,176 Administrator, GATK Developer

    @yluo, what version of GATK are you using?

    Geraldine Van der Auwera, PhD

  • vyellapa Posts: 29 Member

    I get a similar error and am not sure what it is about:

    ERROR MESSAGE: Timeout of 30000 milliseconds was reached while trying to acquire a lock on file /scratch/vyellapantula/MMRF_vcf/MMRF_xxxx_1_PB_Whole_C2_KAS5U_L00653_merged.vcf.idx. Since the GATK uses non-blocking lock acquisition calls that are not supposed to wait, this implies a problem with the file locking support in your operating system.

  • Geraldine_VdAuwera Posts: 6,176 Administrator, GATK Developer

    @vyellapa, that refers to how the GATK interacts with your operating system when it wants to access a file. In some rare cases the protocols used by the GATK and by the operating system are not compatible. If you are using an older version of GATK, try upgrading to a newer version. If that doesn't work, you may need to work with a different OS or different computer.

    Geraldine Van der Auwera, PhD

  • vyellapa Posts: 29 Member
    edited August 2013

    Geraldine, this problem started after I upgraded to the latest release, which I downloaded yesterday. My system is currently running CentOS 6.2. Is there a particular flavor of Linux that GATK recommends? Thank you - Teja

  • vyellapa Posts: 29 Member

    The error turned out to be caused by a malformed VCF header, and fixing the header resolved the problem. I should also mention that the error occurred with VariantAnnotator. In the example command below, "MMRF_xxxx_1_PB_Whole_C2_KAS5U_L00653_merged.vcf" was the malformed VCF.

    java -Xmx12g -jar ~/local/bin/GenomeAnalysisTK.jar -R /scratch/tgenref/pipeline_v0.3/genome_fasta/hs37d5.fa -T VariantAnnotator -nt 12 -o MMRF_xxxx_1_PB_Whole_C2_KAS5U_L00653_merged_allDB.vcf --variant MMRF_xxxx_1_PB_Whole_C2_KAS5U_L00653_merged.vcf --dbsnp dbsnp_137.b37.vcf --comp:NHLBI /scratch/ref/pipeline_v0.3/nhlbi/ESP6500SI-V2_snps_indels.vcf --comp:1000G /scratch/ref/pipeline_v0.3/gatk_bundle_2.5/b37/1000G_phase1.snps.high_confidence.b37.vcf --comp:COSMIC /scratch/tgenref/pipeline_v0.3/cosmic/CosmicCodingMuts_v66_20130725_sorted.vcf

  • Geraldine_VdAuwera Posts: 6,176 Administrator, GATK Developer

    Thanks for reporting your solution, Teja. I'm glad this turned out to be a trivial issue after all.

    Geraldine Van der Auwera, PhD

  • lucdh Posts: 10 Member

    We've also used GATK in an SGE environment for more than 2 years now. We recently upgraded to 3.1-1-g07a4bf8, and since then the following new error message occurs regularly when running IndelRealigner or HaplotypeCaller with VCFs from the GATK resource bundle:

    ERROR MESSAGE: Timeout of 30000 milliseconds was reached while trying to acquire a lock on file [CLIPPED]/1000G_omni2.5.b37.vcf.idx. Since the GATK uses non-blocking lock acquisition calls that are not supposed to wait, this implies a problem with the file locking support in your operating system.

    In an attempt to bypass the problem, we already provided each process with a private copy of the vcf.idx file. So at least this file should never be locked before the process tries to access it? The vcf files are not copied (too large), but linked, so they are still shared between processes. However the problem seems to be the vcf.idx file?

    Is this problem related to the issue highlighted in the Bug bulletin in the header of the GATK support forum: "we have identified a bug that affects indexing when producing gzipped VCFs" ? And will this likewise be solved in the upcoming 3.2 release?

    This is the IndelRealigner command that triggered the file lock error:

    [CLIPPED]/jdk/1.7.0/bin/java -Xmx8G -XX:-UsePerfData -XX:-UseParallelGC -Djava.io.tmpdir=[CLIPPED]/tmp -jar [CLIPPED]/apps/gatk/3.1.1/GenomeAnalysisTK.jar -et NO_ET -K [CLIPPED].key -nt 1 -nct 1 -L 8:1-146364022 -I [CLIPPED].bam -R [CLIPPED]/gatk/broad_bundle_b37_v2.2/human_g1k_v37.fasta -T IndelRealigner -targetIntervals [CLIPPED].intervals -o [CLIPPED].bam -known [CLIPPED]/tmp/8/Mills_and_1000G_gold_standard.indels.b37.vcf -known [CLIPPED]/tmp/8/1000G_omni2.5.b37.vcf --filter_mismatching_base_and_quals

  • Geraldine_VdAuwera Posts: 6,176 Administrator, GATK Developer

    Hi @lucdh, I don't think this would be related to the indexing bug. That bug only affects the format used for writing out new index files, whereas in the job you're running with that command, GATK only reads the idx file; no writing is involved. I'll ask the software engineers if they have any idea why this is suddenly happening. Could you maybe try a run on the same files with an older version? This is just to make sure that it is linked to GATK 3.1, as opposed to your servers coincidentally having developed an issue recently.

    Geraldine Van der Auwera, PhD

  • lucdh Posts: 10 Member

    Hi Geraldine, thanks for forwarding our question to the GATK engineers. From my perspective it does look like the issue is related to the GATK version. We are testing GATK 3.1.1 in the context of updating our analysis pipeline. In parallel we are still running the bulk of our analyses with the well-established GATK 2.4.9 based pipeline on the same servers, accessing the same idx files. So far we have only seen the file lock problem pop up occasionally with GATK 3.1.1. We switched from UnifiedGenotyper to HaplotypeCaller, but otherwise the steps (e.g. IndelRealigner) and options from GATK 2.4.9 are kept in GATK 3.1.1. --luc

  • albertoap USA Posts: 4 Member

    Hi,

    I'm having a similar problem when using the GenotypeGVCFs tool (GenomeAnalysisTK-3.1-1) on the computing facilities of my institution, which run CentOS 6.5. For all the files I received this warning:

    WARN 10:11:45,161 FSLockWithShared$LockAcquisitionTask - WARNING: Unable to lock file /staging/dh3/perezaa/gVCF4recal/51x35_001_m.gVCF.idx because an IOException occurred with message: Function not implemented.
    INFO 10:11:45,163 RMDTrackBuilder - Could not acquire a shared lock on index file /staging/dh3/perezaa/gVCF4recal/51x35_001_m.gVCF.idx, falling back to using an in-memory index for this GATK run.

    and the job had not finished after 24 hours, nor had it produced any output.

    Interestingly, when I ran some of the gVCF files on my own laptop (same GATK version, Ubuntu 12.04) it worked smoothly.

    Maybe something related to permissions?

  • Geraldine_VdAuwera Posts: 6,176 Administrator, GATK Developer

    @lucdh and @albertoap

    Sorry for the delayed response. It turns out that this is a problem with OS-level file locking support in some environments. We ran into this at the Broad, which is why the devs added the check. There is a hidden argument called --disableAutoIndexCreationAndLockingWhenReadingRods that disables index auto-creation and related file locking when reading vcfs. If all index files are pre-existing, and no concurrent processes will ever update any of the indices, it should be safe to use this argument.

    Geraldine Van der Auwera, PhD

  • albertoap USA Posts: 4 Member

    Thank you for the response Geraldine.

    Unfortunately, it is not working for me. I tried the GenotypeGVCFs tool with both GenomeAnalysisTK-3.1-1 and GenomeAnalysisTK-nightly-2014-05-19-g090253a, and each produces an error message saying that the argument isn't defined.

  • Geraldine_VdAuwera Posts: 6,176 Administrator, GATK Developer

    Ack, sorry @albertoap, I gave you the variable name instead of the argument name. Please try again using --disable_auto_index_creation_and_locking_when_reading_rods; that should do the trick.

    Geraldine Van der Auwera, PhD
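
The precondition Geraldine gives for this argument (every index file already exists and nothing will update it during the run) can be partly checked before jobs are launched. Below is a rough sketch in Python, under assumptions not confirmed in this thread: that each index sits next to its VCF with an `.idx` suffix, as in the paths quoted above, and that an index older than its VCF counts as stale. `index_problems` is a hypothetical helper, not a GATK tool.

```python
import os


def index_problems(vcf_paths, index_suffix=".idx"):
    """Return (vcf, reason) pairs for VCFs whose index is missing or looks stale.

    Assumption: the index lives at <vcf> + index_suffix, and an index with an
    older modification time than its VCF is treated as out of date.
    """
    problems = []
    for vcf in vcf_paths:
        index = vcf + index_suffix
        if not os.path.isfile(vcf):
            problems.append((vcf, "VCF missing"))
        elif not os.path.isfile(index):
            problems.append((vcf, "index missing"))
        elif os.path.getmtime(index) < os.path.getmtime(vcf):
            problems.append((vcf, "index older than VCF"))
    return problems
```

An empty result for the `-known`, `--dbsnp` or `--comp` files of a job suggests that --disable_auto_index_creation_and_locking_when_reading_rods can be added to its command line. The check cannot show that no other process will rewrite an index later, so that part of the precondition still has to be guaranteed by how the pipeline is set up.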

  • albertoap USA Posts: 4 Member

    Thanks Geraldine. That completely solved the problem for me.

  • lucdh Posts: 10 Member

    Hi Geraldine, it looks like you solved the problem for me as well. Thanks!
