CNVDiscovery Stage 7 errors

Dear all

We have run into a problem while running CNVDiscovery: errors occur during Stage 7. Here is an example of the type of error we see:

ERROR 00:13:56,742 FunctionEdge - Error: 'java' '-Xmx2048m' '-XX:+UseParallelOldGC' '-XX:ParallelGCThreads=4' '-XX:GCTimeLimit=50' '-XX:GCHeapFreeLimit=10' '-Djava.io.tmpdir=/home/lcottino/attempt8/ALL/.queue/tmp' '-cp' '/opt/exp_soft/bioinf/svtoolkit/lib/SVToolkit.jar:/opt/exp_soft/bioinf/svtoolkit/lib/gatk/GenomeAnalysisTK.jar:/opt/exp_soft/bioinf/svtoolkit/lib/gatk/Queue.jar' '-cp' '/opt/exp_soft/bioinf/svtoolkit/lib/SVToolkit.jar:/opt/exp_soft/bioinf/svtoolkit/lib/gatk/GenomeAnalysisTK.jar:/opt/exp_soft/bioinf/svtoolkit/lib/gatk/Queue.jar' 'org.broadinstitute.sv.genotyping.RefineCNVBoundaries' '-I' '/home/lcottino/attempt8/ALL/Run1/cnv_stage6/seq_7/seq_7.merged_headers.bam' '-O' '/home/lcottino/attempt8/ALL/Run1/cnv_stage7/seq_7/P0134/seq_7.merged.brig.vcf' '-R' '/home/lcottino/references/Homo_sapiens_assembly19/Homo_sapiens_assembly19.fasta' '-md' '/home/lcottino/attempt8/ALL/output_metadata_directory' '-configFile' '/opt/exp_soft/bioinf/svtoolkit/conf/genstrip_parameters.txt' '-P' 'depth.readCountCacheIgnoreGenomeMask:true' '-genomeMaskFile' '/home/lcottino/references/Homo_sapiens_assembly19/Homo_sapiens_assembly19.svmask.fasta' '-genomeMaskFile' '/home/lcottino/references/Homo_sapiens_assembly19/Homo_sapiens_assembly19.lcmask.fasta' '-genderMapFile' '/home/lcottino/attempt8/genderMaps/gender.map' '-ploidyMapFile' '/home/lcottino/references/Homo_sapiens_assembly19/Homo_sapiens_assembly19.ploidymap.txt' '-vcf' 'Run1/cnv_stage4/seq_7/seq_7.merged.genotypes.vcf.gz' '-site' 'Run1/cnv_stage7/seq_7/P0134.sites.list' '-boundaryPrecision' '100' '-minimumRefinedLength' '500' '-maximumReferenceGapLength' '1000'
ERROR 00:13:56,781 FunctionEdge - Contents of /home/lcottino/attempt8/ALL/Run1/cnv_stage7/seq_7/logs/CNVDiscoveryStage7-135.out:

However, the VCF output is produced and looks good, and the CNVDiscoveryStage7-135 log reports that the job finished without errors. Furthermore, if we run the same command manually, it completes without any problem.

We tried rerunning it to see whether it would correct itself, but the same error occurred.

Stage 7 also seems to overwhelm the cluster, with thousands of jobs being submitted. Is this expected?

We'd be very grateful for any advice on trying to work out where the problem is.

Kind regards,
Laura

Answers

  • bhandsaker (Member, Broadie, Moderator, admin)

    My guess would be that it is some interaction between Queue and your cluster software, for example job status getting lost because of the large number of jobs. If the failures are not reproducible (i.e. different jobs fail each time), I think this is even more likely.

    One workaround is to manually touch the relevant .*.done files if you are sure the job succeeded. This tells Queue not to rerun those steps, so you can keep going.
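A minimal Python sketch of that workaround, under stated assumptions: it assumes Queue marks a finished job with a hidden file named .<output name>.done next to the output (confirm this against a step that already completed in your run directory), and it treats a non-empty *.brig.vcf as a succeeded job, which is only a proxy for actually checking the output and its log. If a job has several outputs, Queue may expect a marker for each one. The function names done_marker and mark_finished are hypothetical. By default it only reports which markers it would create.

```python
from __future__ import annotations

from pathlib import Path


def done_marker(output: Path) -> Path:
    """Return the assumed Queue marker path for an output file.

    ASSUMPTION: a finished job is recorded as a hidden file named
    ".<output name>.done" in the same directory as the output.
    """
    return output.with_name("." + output.name + ".done")


def mark_finished(stage_dir: Path, pattern: str, dry_run: bool = True) -> list[Path]:
    """Create missing .done markers for non-empty outputs matching pattern.

    With dry_run=True nothing is written; the markers that would be
    created are returned so they can be reviewed first.
    """
    markers = []
    for output in sorted(stage_dir.rglob(pattern)):
        # Skip missing or empty outputs: those jobs may genuinely have failed.
        if not output.is_file() or output.stat().st_size == 0:
            continue
        marker = done_marker(output)
        if marker.exists():
            continue  # Queue already considers this job done.
        if not dry_run:
            marker.touch()
        markers.append(marker)
    return markers
```

For example, mark_finished(Path("Run1/cnv_stage7"), "*.brig.vcf") lists the markers it would create; once you have checked that list against the job logs, call it again with dry_run=False and restart the Queue script.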

    Stage 7 is the longest stage and it does submit a lot of jobs. I think you can limit the number of concurrent jobs submitted by passing -maxConcurrentRun N. Note that in stage 7 each chromosome is run as a separate job, which then submits additional jobs, and it is at this leaf level that you want -maxConcurrentRun N to apply. So when you specify N, it will really run up to 24*N jobs at once. If you can use this to get Queue to throttle back job submissions, it may also reduce the number of failures you are seeing.
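The 24*N arithmetic can be sketched as follows to pick N for a given cluster limit. This assumes one parent job per chromosome (24, as stated above) and that every parent honors the same -maxConcurrentRun value; slot_budget is a hypothetical number you would take from your scheduler's per-user limit. The sketch ignores the slots the 24 parent jobs themselves may occupy.

```python
def leaf_jobs_ceiling(max_concurrent_run: int, n_parents: int = 24) -> int:
    """Upper bound on concurrent leaf jobs when each parent applies the limit."""
    return n_parents * max_concurrent_run


def choose_max_concurrent_run(slot_budget: int, n_parents: int = 24) -> int:
    """Largest N whose n_parents * N ceiling fits in slot_budget (minimum 1).

    With a budget below n_parents, N=1 is returned and the ceiling
    (n_parents jobs) still exceeds the budget.
    """
    return max(1, slot_budget // n_parents)
```

For instance, a budget of 500 concurrent jobs gives N = 500 // 24 = 20, i.e. a ceiling of 24 * 20 = 480 leaf jobs.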
