No discovery.vcf when vars change

Will_Gilks (University of Sussex, UK)

Hi,

As suggested in the title, I successfully ran the deletion-calling pipeline to completion on 20 individuals, across 2 million contiguous autosomal bases, for deletions 100-10,000 bp in length.

However, when I run the same script modified to include 40 individuals, 9 million bases, and deletions 100-100,000 bp in length, the process fails, seemingly because no discovery genotypes file is generated for it to work on.

Initial logic suggests the problem is caused by a difference in the inputs: maybe one of the additional individuals is corrupting the process with bad data, or a memory limit is reached when looking for deletions across such a wide size range. I've double-checked the variables.

The point in the log at which the two runs diverge is in the Discovery QGraph stage: the successful run proceeds to FunctionEdge - Starting: 'java' ..., whereas the unsuccessful run just seems to write a job report, which contains only #:GATKReport.v1.1:0. Also, the gender report is good quality. Below are sections from the logs. Any help would be much appreciated.

Cheers,

Will

(Successful job) :
INFO 21:10:33,052 HelpFormatter - Date/Time: 2016/01/18 21:10:33
INFO 21:10:33,052 HelpFormatter - ----------------------------------------------------------------------
INFO 21:10:33,053 HelpFormatter - ----------------------------------------------------------------------
INFO 21:10:33,067 QCommandLine - Scripting SVDiscovery
INFO 21:10:33,498 QCommandLine - Added 2 functions
INFO 21:10:33,498 QGraph - Generating graph.
INFO 21:10:33,512 QGraph - Running jobs.
INFO 21:10:33,740 FunctionEdge - Starting: 'java' '-Xmx4096m' '-XX:+UseParallelOldGC' '-XX:ParallelGCThreads=4' '-XX:GCTimeLimit=50' '-XX:GCHeapFreeLimit=10' '-Djava.io.tmpdir=/lustre/scratch/bioenv/wg39/LHm_analysis/genotyping/CNVs/tmpdir' '-cp'

(Unsuccessful job) :
INFO 02:11:51,565 HelpFormatter - Date/Time: 2016/01/21 02:11:51
INFO 02:11:51,565 HelpFormatter - ----------------------------------------------------------------------
INFO 02:11:51,565 HelpFormatter - ----------------------------------------------------------------------
INFO 02:11:51,576 QCommandLine - Scripting SVDiscovery
INFO 02:11:53,150 QCommandLine - Added 2 functions
INFO 02:11:53,151 QGraph - Generating graph.
INFO 02:11:53,165 QGraph - Running jobs.
INFO 02:11:54,753 QGraph - 0 Pend, 0 Run, 0 Fail, 2 Done
INFO 02:11:54,755 QCommandLine - Writing final jobs report...
INFO 02:11:54,756 QJobsReporter - Writing JobLogging GATKReport to file /lustre/scratch/bioenv/wg39/LHm_analysis/genotyping/CNVs/SVDiscovery.jobreport.txt
INFO 02:11:54,800 QJobsReporter - Plotting JobLogging GATKReport to file /lustre/scratch/bioenv/wg39/LHm_analysis/genotyping/CNVs/SVDiscovery.jobreport.pdf
WARN 02:11:56,452 RScriptExecutor - RScript exited with 1. Run with -l DEBUG for more info.
INFO 02:11:56,456 QCommandLine - Script completed successfully with 2 total jobs
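
For anyone comparing two Queue logs like these, the point of divergence can also be located mechanically. The following is only a rough Python sketch, not part of GATK or Queue: the function names (split_entries, entry_key, first_divergence) are hypothetical, and it assumes every log entry starts with a level (INFO, WARN, ERROR or DEBUG) followed by an HH:MM:SS,mmm timestamp, as in the excerpts above. Timestamps are ignored and digits in the messages are normalised so that differing dates and counts do not register as a divergence.

```python
import re

# Assumption: each entry starts with a level and an HH:MM:SS,mmm timestamp,
# as in the Queue log excerpts above. The lookahead lets us split without
# consuming the entry header.
ENTRY_START = re.compile(
    r"(?=\b(?:INFO|WARN|ERROR|DEBUG)\s+\d{2}:\d{2}:\d{2},\d{3}\s)"
)
ENTRY = re.compile(
    r"^(?P<level>\w+)\s+(?P<time>\S+)\s+(?P<source>\S+)\s+-\s+(?P<message>.*)$",
    re.DOTALL,
)


def split_entries(log_text):
    """Split a log (even one flattened onto a single line) into entries."""
    return [part.strip() for part in ENTRY_START.split(log_text) if part.strip()]


def entry_key(entry):
    """Return (source, message) with the timestamp dropped and digits masked."""
    match = ENTRY.match(entry)
    if match is None:
        return ("", entry)
    return (match["source"], re.sub(r"\d+", "N", match["message"]))


def first_divergence(entries_a, entries_b):
    """Return the index of the first entry that differs, or None if identical."""
    for index, (a, b) in enumerate(zip(entries_a, entries_b)):
        if entry_key(a) != entry_key(b):
            return index
    if len(entries_a) != len(entries_b):
        return min(len(entries_a), len(entries_b))
    return None


if __name__ == "__main__":
    # Abbreviated entries taken from the two logs above.
    good = ("INFO 21:10:33,512 QGraph - Running jobs. "
            "INFO 21:10:33,740 FunctionEdge - Starting: 'java' '-Xmx4096m'")
    bad = ("INFO 02:11:53,165 QGraph - Running jobs. "
           "INFO 02:11:54,753 QGraph - 0 Pend, 0 Run, 0 Fail, 2 Done")
    index = first_divergence(split_entries(good), split_entries(bad))
    print("First differing entry index:", index)
    print("  run A:", split_entries(good)[index])
    print("  run B:", split_entries(bad)[index])
```

Masking digits is a deliberate simplification: it keeps dates and job counts from producing false differences, but it would also hide a divergence that consists only of a changed number.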
