How do I specify the multiple regions I want to process in each pipeline?

Dear Genome STRiP users,

I intend to process only part of a chromosome in each pipeline rather than the whole sequence. I know SVPreprocess accepts an -L flag, even though it is not listed in the documentation (http://software.broadinstitute.org/software/genomestrip/org_broadinstitute_sv_qscript_SVPreprocess.html), but I do not know how to specify multiple regions with it. For example, if I want to process chr16:1-20000 and chr16:25000-40000, how should I set -L?

Similarly, I am not sure whether I can restrict SVDiscovery and SVGenotyper to specific regions; if so, how do I specify multiple regions in these two pipelines?

In addition, I know SVCNVDiscovery has an -intervalList flag that takes a .list file, but how do I specify multiple regions in that file? For example, should they go on separate lines or be separated by commas?
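For reference, GATK-style `.list` interval files conventionally hold one interval per line in `contig:start-stop` form (1-based, inclusive), with no commas between entries. Assuming SVCNVDiscovery's `-intervalList` reads the same format (an assumption; this thread does not confirm it), a file for the two example regions could be generated as in the minimal Python sketch below. The helper `write_interval_list` and the file name `chr16_regions.list` are hypothetical.

```python
from pathlib import Path


def write_interval_list(intervals, path):
    """Write (contig, start, stop) tuples to a GATK-style .list file.

    Assumes the consuming tool expects one 'contig:start-stop' interval per
    line (1-based, inclusive); this helper is hypothetical, not part of
    Genome STRiP.
    """
    lines = []
    for contig, start, stop in intervals:
        # Reject coordinates that cannot form a valid 1-based interval.
        if start < 1 or stop < start:
            raise ValueError(f"invalid interval {contig}:{start}-{stop}")
        lines.append(f"{contig}:{start}-{stop}")
    # One interval per line, newline-terminated, no commas.
    Path(path).write_text("\n".join(lines) + "\n")
    return lines


if __name__ == "__main__":
    # The two chr16 regions from the question; the file name is hypothetical.
    regions = [("chr16", 1, 20000), ("chr16", 25000, 40000)]
    for line in write_interval_list(regions, "chr16_regions.list"):
        print(line)
```

Writing the file programmatically mainly guards against malformed coordinates; for two regions, typing the two lines by hand works just as well.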

I am also not sure whether the following two cases are equivalent:
1. Running SVPreprocess with -L chr16:1-50000, and then running SVCNVDiscovery with -intervalList chr16:20000-40000;
2. Running SVPreprocess with -L chr16:20000-40000, and then running SVCNVDiscovery with -intervalList chr16:20000-40000.

I would appreciate any suggestions on these questions. Thank you in advance.

Best regards,
Wusheng

Answers

  • TerryMulhern Member
    edited February 27

    I thought you first need to create multiple dataset directories and then use the -md argument, as described in the SVPreprocess argument details and pipeline logic. I've been using the GotCloud/GenomeSTRiP pipeline because it supports SLURM and MOSIX cluster environments (University of Michigan projects).

  • bhandsaker Member, Broadie, Moderator admin

    We generally discourage using -L during preprocessing. Preprocessing gathers genome-wide statistics that are then used in downstream pipelines. Gathering statistics over only one chromosome (or one region) is something we don't typically do, and I can't vouch for the results.

    Once you have run preprocessing, you can run the deletion pipeline or the CNV pipeline on different regions using -L and split up the work that way. This gives more consistent results, because every region-restricted run then uses the same genome-wide statistics from preprocessing.
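The split-after-preprocessing approach can be sketched in Python as below. The sketch assumes the usual GATK convention that each interval is passed as its own `-L contig:start-stop` pair (repeating `-L` for several intervals); whether every Genome STRiP pipeline forwards repeated `-L` arguments is an assumption, not something confirmed here. The helpers `split_region` and `interval_args` are hypothetical, and the rest of each pipeline command line is deliberately left out.

```python
def split_region(contig, start, stop, chunk_size):
    """Split contig:start-stop (1-based, inclusive) into consecutive,
    non-overlapping chunks of at most chunk_size bp."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    chunks = []
    pos = start
    while pos <= stop:
        end = min(pos + chunk_size - 1, stop)
        chunks.append(f"{contig}:{pos}-{end}")
        pos = end + 1
    return chunks


def interval_args(intervals):
    """Build repeated '-L <interval>' arguments, one pair per interval."""
    args = []
    for interval in intervals:
        args += ["-L", interval]
    return args


if __name__ == "__main__":
    # Hypothetical example: after genome-wide preprocessing, give each
    # downstream run (deletion or CNV pipeline) its own slice of chr16.
    for chunk in split_region("chr16", 1, 40000, 20000):
        print(interval_args([chunk]))
```

Each printed argument list would be appended to whatever deletion- or CNV-pipeline command you already use, giving one independent run per chunk, while all runs share the same preprocessing output.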
