A Gene Pattern Question

Hi, I was referred here but I'm not sure if this is the right place to ask. Any comments will be appreciated.
I am planning to run the GenePattern module "CopyNumberInferencePipeline" on 240 tumor .CEL files and a roughly equal number of normal .CEL files. Since the module documentation specifically states that the maximum number of CEL files that can be processed is 200, my strategy is to run the samples in batches. However, the noise reduction step relies on the normal samples, so it would be best to run all samples together, which exceeds the stated maximum. Any advice? If there is someone I can talk to about this, please point me to them.
Comments
Hi @ayjoon,
Please direct GenePattern questions to gp-help (at) broadinstitute (dot) org.
And actually, the CopyNumberInferencePipeline module documentation states
I would suggest two solutions. First, you can try your 240 files on the public server. Second, you can see if the tool is available on GenePattern @ Indiana University, which has more compute resources. Information on GenePattern @ Indiana University is here and the actual site is here.
Thank you for the response, shlee! In fact, I was referred here through gp-help. The section you quoted is exactly where my concern came from. I interpret it as meaning NOT to exceed 200 CEL files on the public server. I can only use the public server because one of the components of the pipeline is not available outside the Broad server (permission issue). Also, I went to Indiana University first, but because of the same permission issue they lack part of the module and referred me to the Broad.
@ayjoon, the key word is estimated. It's a bit funny that gp-help should refer you to the GATK forum. I would suggest you ask them whether there is a Docker version of GenePattern (with a binary version of the tool) that would then allow you to run your analyses in the cloud.
Hi all,
Figured we could save @ayjoon some round trips by converging here.
@Geraldine_VdAuwera, yes, that's the crux. We don't have any nodes with enough memory to run this job, and I'm not sure if simply providing more memory would address the issue, or if there needs to be some optimization and/or threading as well. That is, I do not know how this thing scales.
We have not set this up in Docker (we're getting there), but regardless, this particular pipeline makes use of some human data which can't leave the Broad, so it's stuck here.
Are there any former CGA folks still kicking around over there (Gordon?) in FireCloud who could help with answering this? (They have shut down the forum that used to support these sorts of questions, and provided no alternative source for assistance.) Basically we need to know the following:
@ayjoon - thanks for bearing with us, and apologies for bouncing you between our help forums. We are actually only down the hall from each other, but work on separate teams, so this can happen.
To that end, @Geraldine_VdAuwera, I'm happy to stop by and see if we can't sort this out together or find the folks who can.
@bahill, I would suggest you ask @esalinas on the FireCloud team.
Yeah, Eddie is probably the one to ask about this. We should discuss over coffee some time, @bahill, but it's unlikely my team can do much to help with this stuff.
Let's start with Gordon as he provided some help with the construction of the GenePattern pipeline. I'll ask Gordon to respond on this forum.
There is a copy number analysis workflow available on FireCloud. It is a component of our suite of somatic mutation calling workflows. Please see http://gatkforums.broadinstitute.org/firecloud/discussion/7512/broad-mutation-calling-best-practice-workflows#latest, and in particular the subsection on Broad Mutation Calling Copy Number Workflow.
@birger Does that workflow cover SNP6 data, specifically, .CEL files (not WES)?
@ayjoon,
For CNV calling on SNP6 data, try out GISTIC or ABSOLUTE.