How do you know when SVPreprocess has successfully finished?

dfermin US, Member

I apologize in advance if this is a stupid question.

I'm coming from the world of gotcloud, where a file called all.done is created for each step of the pipeline when that step finishes. So if a pipeline crashes, I can re-execute the program and it will pick up where it left off.

I've been running SVPreprocess for over a week now. It's crashed a few times, usually due to memory limits but I've overcome those. At no point did I clear out the tmp folders or files.

I can't tell whether it ever finished completely, in which case I can move on to the next step in the CNV pipeline, or whether I'm starting from scratch each time.

Is there a file or a specific message I should look for?



  • bhandsaker Member, Broadie ✭✭✭✭

    SVPreprocess uses Queue, which allows restarting similar to gotcloud. You can look for .*.done files, which show that each step is done. Rerunning should not redo already completed work.

    You should also see in the main output from SVPreprocess how many jobs have been run, how many have succeeded, how many failed, and how many are left to run.

    If you are using CRAM files, preprocessing takes around 6-8 hours per sample, depending on the speed of your compute node and assuming I/O is not a bottleneck. After each sample is preprocessed individually, there is by default a merging step that makes the dataset more efficient for downstream pipelines.

    The merging step can be expensive depending on the number of samples in your dataset. If you have thousands of samples, you might consider merging in batches of 1000.

  • dfermin US, Member

    Okay, I see the done files in the logs directory.


    There doesn't seem to be a final "all.done" file though.
    Is there a specific *.done file that indicates everything went well?
    The standard output of SVPreprocess says 1179 total jobs ran.

    I get one warning at the end of the standard output, related to RScript:

    INFO  11:31:12,892 QGraph - Writing incremental jobs reports...
    INFO  11:31:12,892 QJobsReporter - Writing JobLogging GATKReport to file /home/dfermin/genomeStrip.gf127/SVPreprocess.jobreport.txt
    INFO  11:31:13,078 FunctionEdge - Starting:  'Rscript'  '/home/dfermin/apps/genomeSTRIP/R/metadata/plot_chr_vs_chr_readdepth.R'  '/home/dfermin/genomeStrip.gf127/metadata/'  '/home/dfermin/genomeStrip.gf127/metadata/chrY_vs_chrX.pdf'  'seq_Y vs. seq_X Read Depth'  'DOSAGE_X'  'DOSAGE_Y'
    INFO  11:31:13,078 FunctionEdge - Output written to /home/dfermin/genomeStrip.gf127/logs/SVPreprocess-1178.out
    INFO  11:31:13,081 QGraph - 0 Pend, 1 Run, 0 Fail, 1178 Done
    INFO  11:31:43,045 FunctionEdge - Done:  'Rscript'  '/home/dfermin/apps/genomeSTRIP/R/metadata/plot_chr_vs_chr_readdepth.R'  '/home/dfermin/genomeStrip.gf127/metadata/'  '/home/dfermin/genomeStrip.gf127/metadata/chrY_vs_chrX.pdf'  'seq_Y vs. seq_X Read Depth'  'DOSAGE_X'  'DOSAGE_Y'
    INFO  11:31:43,045 QGraph - Writing incremental jobs reports...
    INFO  11:31:43,045 QJobsReporter - Writing JobLogging GATKReport to file /home/dfermin/genomeStrip.gf127/SVPreprocess.jobreport.txt
    INFO  11:31:43,209 QGraph - 0 Pend, 0 Run, 0 Fail, 1179 Done
    INFO  11:31:43,213 QCommandLine - Writing final jobs report...
    INFO  11:31:43,213 QJobsReporter - Writing JobLogging GATKReport to file /home/dfermin/genomeStrip.gf127/SVPreprocess.jobreport.txt
    INFO  11:31:43,309 QJobsReporter - Plotting JobLogging GATKReport to file /home/dfermin/genomeStrip.gf127/SVPreprocess.jobreport.pdf
    WARN  11:31:44,742 RScriptExecutor - RScript exited with 1. Run with -l DEBUG for more info.
    INFO  11:31:44,749 QCommandLine - Script completed successfully with 1179 total jobs
    Done. There were 1 WARN messages, the first 1 are repeated below.
    WARN  11:31:44,742 RScriptExecutor - RScript exited with 1. Run with -l DEBUG for more info.
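
For anyone who wants to automate the two checks discussed in this thread, here is a minimal Python sketch. It rests on two assumptions: the log lines have exactly the format pasted above (the "QGraph - N Pend, N Run, N Fail, N Done" status lines and the "Script completed successfully with N total jobs" line), and the .*.done marker files sit together in one directory such as the logs directory. The function names summarize_queue_log and count_done_markers are made up for this example and are not part of Queue or Genome STRiP. The sketch does not try to interpret the RScript warning.

```python
import fnmatch
import os
import re

# Patterns copied from the log format shown in this thread (an assumption:
# other Queue versions may word these lines differently).
STATUS_RE = re.compile(r"QGraph - (\d+) Pend, (\d+) Run, (\d+) Fail, (\d+) Done")
SUCCESS_RE = re.compile(
    r"QCommandLine - Script completed successfully with (\d+) total jobs"
)


def summarize_queue_log(log_text):
    """Return the last reported job counts and whether the success line appeared."""
    summary = {
        "pend": None,
        "run": None,
        "fail": None,
        "done": None,
        "completed": False,
        "total_jobs": None,
    }
    for line in log_text.splitlines():
        status = STATUS_RE.search(line)
        if status:
            # Later status lines overwrite earlier ones, so the final
            # values reflect the last state reported in the log.
            pend, run, fail, done = (int(n) for n in status.groups())
            summary.update(pend=pend, run=run, fail=fail, done=done)
        success = SUCCESS_RE.search(line)
        if success:
            summary["completed"] = True
            summary["total_jobs"] = int(success.group(1))
    return summary


def count_done_markers(directory):
    """Count files in one directory whose names match the .*.done pattern."""
    return sum(
        1
        for name in os.listdir(directory)
        if fnmatch.fnmatchcase(name, ".*.done")
    )
```

To use it, read the saved standard output of SVPreprocess into a string and pass it to summarize_queue_log; a run that looks like the one above would report zero pending, running, and failed jobs and completed set to True. count_done_markers takes the path of the directory holding the marker files.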