
How do I pre-assign the multiple regions I want to process in each pipeline?

Dear Genome STRiP users,

I intend to process only part of a chromosome in each pipeline rather than the whole sequence. I know SVPreprocess has an -L flag, even though it is not listed in the documentation (http://software.broadinstitute.org/software/genomestrip/org_broadinstitute_sv_qscript_SVPreprocess.html), but I do not know how to pre-assign multiple regions with it. More precisely, if I want to process 1-20000 and 25000-40000 of chr16, how should I set -L?

Similarly, I am not sure whether I can pre-assign regions in SVDiscovery and SVGenotyper; if so, how can I pre-assign multiple regions in these two pipelines?

Also, I know SVCNVDiscovery has an -intervalList flag that takes a .list file of intervals, but how do I specify multiple regions in it: do I put them on separate lines, or separate them with commas?
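GATK-style tools conventionally accept intervals as `contig:start-end` strings (1-based, inclusive), either as repeated -L arguments or as a .list file with one interval per line. Assuming Genome STRiP's -L and SVCNVDiscovery's -intervalList follow that convention (this thread does not confirm it), the two chr16 regions from the question could be produced with the minimal Python sketch below. All helper names are hypothetical.

```python
# Hypothetical helpers: format regions as GATK-style "contig:start-end" strings
# (1-based, inclusive), either for a one-interval-per-line .list file or as
# repeated -L arguments. Whether each Genome STRiP tool accepts both forms is
# an assumption to check against its documentation.
from pathlib import Path
from typing import Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (contig, start, end), 1-based inclusive


def format_interval(contig: str, start: int, end: int) -> str:
    """Render one interval as 'contig:start-end', rejecting empty or inverted ranges."""
    if start < 1 or end < start:
        raise ValueError(f"invalid interval {contig}:{start}-{end}")
    return f"{contig}:{start}-{end}"


def interval_list_text(intervals: Iterable[Interval]) -> str:
    """One interval per line, newline-terminated (the assumed .list convention)."""
    return "".join(format_interval(c, s, e) + "\n" for c, s, e in intervals)


def write_interval_list(path: str, intervals: Iterable[Interval]) -> None:
    """Write the interval list to disk for a file-valued interval argument."""
    Path(path).write_text(interval_list_text(intervals))


def repeated_l_args(intervals: Iterable[Interval]) -> List[str]:
    """Alternative form: one -L flag per interval on the command line."""
    args: List[str] = []
    for c, s, e in intervals:
        args += ["-L", format_interval(c, s, e)]
    return args
```

With `regions = [("chr16", 1, 20000), ("chr16", 25000, 40000)]`, `write_interval_list("regions.list", regions)` writes two lines (the file name is hypothetical), and `repeated_l_args(regions)` returns `['-L', 'chr16:1-20000', '-L', 'chr16:25000-40000']`.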

A further question: I am not sure whether the following two cases are equivalent:
1. Running SVPreprocess with -L chr16:1-50000, and then running SVCNVDiscovery with -intervalList chr16:20000-40000;
2. Running SVPreprocess with -L chr16:20000-40000, and then running SVCNVDiscovery with -intervalList chr16:20000-40000.

May I have your suggestions on these questions? Thank you in advance.

Best regards,
Wusheng

Answers

  • TerryMulhern Member
    edited February 2019

    I thought you needed to first create multiple dataset directories and then use the -md argument, as described in the SVPreprocess argument details and pipeline logic. I've been using the GotCloud/GenomeSTRiP pipeline because it supports SLURM and MOSIX cluster environments (University of Michigan projects).

  • bhandsaker Member, Broadie ✭✭✭✭

    Using -L during preprocessing is generally discouraged. Preprocessing gathers genome-wide statistics that are then used by the downstream pipelines. Gathering statistics over a single chromosome (or a single region) is something we don't typically do, and I can't vouch for the results.

    Once you have run preprocessing, you can run the deletion pipeline or CNV pipeline on different regions using -L and split up the work that way. This will give more consistent results.
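The approach above (preprocess genome-wide, then split the deletion or CNV pipeline runs by region with -L) could be scripted roughly as follows. This is a minimal sketch, assuming contig lengths are read from a samtools .fai index of the reference (tab-separated, column 1 = contig name, column 2 = length) and that each chunk becomes the -L value for one job. The function names and chunk size are hypothetical, and the sketch does nothing about events that straddle chunk boundaries.

```python
# Hypothetical sketch: after genome-wide preprocessing, cut each contig into
# fixed-size chunks so the deletion or CNV pipeline can be run once per chunk
# with -L. Contig lengths come from a samtools .fai index (column 1 = name,
# column 2 = length). Events straddling a chunk boundary are not handled here.
from typing import Dict, List


def parse_fai(text: str) -> Dict[str, int]:
    """Map contig name -> length from the contents of a .fai index."""
    lengths: Dict[str, int] = {}
    for line in text.splitlines():
        if line.strip():
            fields = line.split("\t")
            lengths[fields[0]] = int(fields[1])
    return lengths


def split_contig(contig: str, length: int, chunk_size: int) -> List[str]:
    """Split 1..length into consecutive 1-based inclusive 'contig:start-end' chunks."""
    if length < 1 or chunk_size < 1:
        raise ValueError("length and chunk_size must be positive")
    chunks: List[str] = []
    start = 1
    while start <= length:
        end = min(start + chunk_size - 1, length)
        chunks.append(f"{contig}:{start}-{end}")
        start = end + 1
    return chunks
```

For a toy contig of 250,000 bp, `split_contig("chr16", 250000, 100000)` returns `['chr16:1-100000', 'chr16:100001-200000', 'chr16:200001-250000']`; each string would be passed as the -L value of a separate downstream run.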
