Cluster Analysis (CNMF)

jneff, Boston (Member, Broadie, Moderator, admin)
edited October 2016 in Cancer Genome Analysis

Overview

The Cluster Analysis tool calculates clusters using a consensus non-negative matrix factorization (NMF) clustering method. The pipeline performs the following steps:

  1. Convert the input data set to a non-negative matrix by column rank normalization.

  2. Classify samples into consensus clusters.

  3. Determine differentially expressed focal events for each subtype.
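As a rough illustration of steps 1 and 2, the sketch below rank-normalizes each column of a small matrix and builds a consensus matrix from several clustering runs. It is not the pipeline's code: the function names, the average-rank handling of ties, and the assumption that columns are samples are all choices made here for illustration.

```python
def rank_normalize_columns(matrix):
    """Replace each value with its 1-based rank within its column.

    Ties share the average of the ranks they span (an assumption of this
    sketch), so every output value is >= 1 and the matrix is non-negative.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    ranked = [[0.0] * n_cols for _ in range(n_rows)]
    for c in range(n_cols):
        # Row indices ordered by the value they hold in column c.
        order = sorted(range(n_rows), key=lambda r: matrix[r][c])
        i = 0
        while i < n_rows:
            j = i
            # Extend j over the run of values tied with position i.
            while j + 1 < n_rows and matrix[order[j + 1]][c] == matrix[order[i]][c]:
                j += 1
            avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
            for k in range(i, j + 1):
                ranked[order[k]][c] = avg_rank
            i = j + 1
    return ranked


def consensus_matrix(label_runs):
    """Fraction of runs in which each pair of samples lands in the same cluster."""
    n = len(label_runs[0])
    counts = [[0] * n for _ in range(n)]
    for labels in label_runs:
        for a in range(n):
            for b in range(n):
                if labels[a] == labels[b]:
                    counts[a][b] += 1
    runs = len(label_runs)
    return [[counts[a][b] / runs for b in range(n)] for a in range(n)]


# Hypothetical data: 3 regions (rows) by 2 samples (columns).
data = [[-1.2, 0.5], [0.0, 0.5], [2.3, -0.7]]
print(rank_normalize_columns(data))

# Hypothetical cluster labels for 3 samples from 3 clustering runs.
print(consensus_matrix([[0, 0, 1], [0, 1, 1], [0, 0, 1]]))
```

Consensus values near 1 mean two samples are clustered together in nearly every run, and values near 0 mean they almost never are, which is what a consensus plot visualizes.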

How does Cluster Analysis work?

Non-negative matrix factorization (NMF) is an unsupervised learning algorithm that has been shown to identify molecular patterns when applied to gene expression data. Rather than separating gene clusters based on distance computation, NMF detects context-dependent patterns of gene expression in complex biological systems.
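To make the factorization concrete, here is a minimal pure-Python sketch of NMF using multiplicative updates for the squared-error objective. The objective, the iteration count, the random initialization, and the rule of assigning each sample to its largest coefficient are assumptions of this sketch and may differ from what the pipeline does.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, k, n_iter=200, seed=0, eps=1e-9):
    """Factor non-negative V (features x samples) into W (features x k) and H (k x samples)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random starting point.
    w = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return w, h


def assign_clusters(h):
    """Assign each sample (column of H) to the factor with the largest coefficient."""
    k, m = len(h), len(h[0])
    return [max(range(k), key=lambda a: h[a][j]) for j in range(m)]


# Hypothetical 4 features x 4 samples with two obvious sample groups.
v = [[5, 5, 0, 0], [4, 6, 0, 0], [0, 0, 7, 8], [0, 0, 6, 9]]
w, h = nmf(v, k=2)
print(assign_clusters(h))
```

Because the result depends on the random starting point, a consensus approach repeats the factorization with different seeds and records how often each pair of samples is assigned to the same cluster.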

Inputs

All Lesions File (Copy Number Data File): The all lesions file is from the GISTIC pipeline and summarizes the results from the GISTIC run. It contains data about the significant regions of amplification and deletion as well as which samples are amplified or deleted in each of these regions. The identified regions are listed down the first column, and the samples are listed across the first row.
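The layout described above (regions down the first column, samples across the first row) could be loaded along these lines. This is a sketch under assumptions: it treats the file as tab-delimited with one name column followed only by numeric sample columns, and the region and sample names in the example are made up. An actual GISTIC all lesions file may carry additional annotation columns that would need to be skipped first.

```python
import csv
import io


def read_region_by_sample_table(handle, delimiter="\t"):
    """Read a table with region names in column 1 and sample names in row 1.

    Returns (regions, samples, values), where values[i][j] is the number
    for region i in sample j.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    samples = header[1:]  # first header cell labels the region column
    regions, values = [], []
    for row in reader:
        if not row:
            continue  # skip blank lines
        regions.append(row[0])
        values.append([float(x) for x in row[1:]])
    return regions, samples, values


# Hypothetical file contents, for illustration only.
example = "Region\tS1\tS2\nAmp Peak 1\t1\t0\nDel Peak 1\t0\t2\n"
regions, samples, values = read_region_by_sample_table(io.StringIO(example))
print(regions, samples, values)
```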

Outputs

Images

-Clustering plots

-Consensus plots

-Gene heat maps

-Gene heat maps for top genes

-Cormatrix

Files

-Markers

-Expression

-Coefficient

-Membership

Reports

-Nozzle HTML reports

-rData

How to run Cluster Analysis in FireCloud

  1. Clone the workspace broad-firecloud-tutorials/ClusterAnalysisCNMF_V1_Tutorial.

  2. In your cloned workspace, navigate to the Method Configurations tab and select ClusterAnalysisCNMF.

  3. Click Launch Analysis.

  4. Filter the entity type to sample_set, then click Launch.

  5. In the Monitor tab, view the status of your analysis. Initially, the status displays Submitted. The expected runtime is 15-30 minutes.

  6. When the status displays Done, click on ACC (sample_set).

  7. Click on Outputs: Show, then select output files to view the results of this analysis.

  8. You can also view results, including Nozzle HTML reports and graphical plots, by viewing attributes in the Data tab.

References

  • Brunet, J.P., Tamayo, P., Golub, T.R. & Mesirov, J.P., Metagenes and molecular pattern discovery using matrix factorization, Proc Natl Acad Sci U S A 101(12):4164-9 (2004)

  • Rousseeuw, P.J., Silhouettes: A graphical aid to the interpretation and validation of cluster analysis, J. Comput. Appl. Math. 20:53-65 (1987)

  • Broad GenePattern: NMFConsensus

  • R silhouette function (cluster package)
