Support Vector Machine (SVM) Model

What is it? A support vector machine (SVM) determines the boundary that best separates the different groups from each other using a subset of variables (e.g., biomarkers) in multi-dimensional space. The boundary is a hyperplane, and the subset of data points closest to it (called support vectors) have the…
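To make the hyperplane and support vectors concrete, here is a minimal sketch of a linear SVM in plain Python. It assumes two hypothetical biomarkers and a small, linearly separable toy dataset, and it trains by full-batch subgradient descent on the regularized hinge loss rather than with the dedicated solvers used by libraries such as LIBSVM or scikit-learn. The function names and the hyperparameters `lam`, `lr` and `epochs` are illustrative choices, not part of any particular package.

```python
import math

# Hypothetical toy data: two biomarker values per sample, labels +1 / -1.
X = [(2, 3), (3, 3), (3, 4), (4, 5), (0, 0), (1, 0), (0, 1), (1, 1)]
y = [1, 1, 1, 1, -1, -1, -1, -1]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=5000):
    """Minimize lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b))) by subgradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0                     # the bias is not penalized
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # inside the margin or misclassified
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Which side of the hyperplane w.x + b = 0 the sample falls on."""
    return 1 if dot(w, x) + b >= 0 else -1


def distance_to_hyperplane(w, b, x):
    return abs(dot(w, x) + b) / math.sqrt(dot(w, w))


w, b = train_linear_svm(X, y)
# Samples nearest the boundary play the role of support vectors.
nearest = sorted(X, key=lambda x: distance_to_hyperplane(w, b, x))[:2]
print("w =", w, "b =", b)
print("closest samples:", nearest)
```

In an exact SVM solution only the support vectors determine the hyperplane; moving any other sample slightly (without crossing the margin) leaves the boundary unchanged. This iterative sketch only approximates that solution.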

Linear Discriminant Analysis (LDA) Model

What is it? Linear discriminant analysis (LDA) separates samples into two or more classes by finding the direction that maximizes the distance between class means relative to the variance within each class. LDA can also be used to reduce the dimensionality of the data. When is it used? This analysis is used when there are many variables to consider…
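The two-class case of Fisher's linear discriminant shows how class-mean separation is weighed against within-class variance. The sketch below, in plain Python with hypothetical two-biomarker data, computes the direction w = S_W^-1 (m1 - m0), where S_W is the pooled within-class scatter matrix and m0, m1 are the class means, then classifies by thresholding the projection halfway between the projected class means (an equal-priors assumption). Projecting onto w also collapses the two variables into one score, which is the dimension-reduction use. Multi-class LDA generalizes this with a between-class scatter matrix and is not shown.

```python
# Hypothetical two-biomarker measurements for two classes.
class0 = [(1, 2), (2, 3), (3, 3), (2, 1)]
class1 = [(6, 5), (7, 7), (8, 6), (7, 5)]


def mean_vector(rows):
    return [sum(r[j] for r in rows) / len(rows) for j in range(len(rows[0]))]


def fisher_lda_2d(c0, c1):
    """Two-class Fisher discriminant for 2-D data: w = S_W^-1 (m1 - m0)."""
    m0, m1 = mean_vector(c0), mean_vector(c1)
    # Pooled within-class scatter matrix S_W (summed over both classes).
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows, m in ((c0, m0), (c1, m1)):
        for r in rows:
            dev = (r[0] - m[0], r[1] - m[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += dev[i] * dev[j]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    diff = (m1[0] - m0[0], m1[1] - m0[1])
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Threshold halfway between the projected class means (equal priors).
    threshold = sum(wi * (a + c) / 2 for wi, a, c in zip(w, m0, m1))
    return w, threshold


def project(w, x):
    """1-D discriminant score: the reduced-dimension representation of x."""
    return w[0] * x[0] + w[1] * x[1]


def classify(w, threshold, x):
    return 1 if project(w, x) > threshold else 0


w, threshold = fisher_lda_2d(class0, class1)
print("direction:", w, "threshold:", threshold)
print("scores:", [round(project(w, x), 2) for x in class0 + class1])
```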

PCA (Principal Component Analysis)

What is it? Principal component analysis (PCA) transforms high-dimensional data into a lower-dimensional structure to improve data presentation, pattern recognition, and analysis. PCA identifies the directions (combinations of the measured variables) along which measurements (e.g., expression of specific proteins) vary the most across all samples. It does not separate the different groups from…
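A minimal sketch of finding the first principal component in plain Python, assuming a small hypothetical table of two protein measurements across four samples. It centers the data, forms the sample covariance matrix, and uses power iteration to find the direction of largest variance; power iteration assumes the top eigenvalue is strictly the largest and that the starting vector is not orthogonal to its eigenvector. Group labels are never used, which is why PCA by itself does not separate groups. Real analyses would typically call an SVD routine such as `numpy.linalg.svd` instead.

```python
import math

# Hypothetical data: rows are samples, columns are two protein levels.
X = [(2.0, 0.0), (-2.0, 0.0), (0.0, 1.0), (0.0, -1.0)]


def pca_first_component(X, iters=200):
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in X]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    v = [1.0 + j for j in range(d)]
    for _ in range(iters):
        u = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in u))
        v = [x / norm for x in u]
    # Rayleigh quotient: variance captured along v.
    var_pc1 = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    total_var = sum(cov[i][i] for i in range(d))
    # Each sample's coordinate along the first component.
    scores = [sum(r[j] * v[j] for j in range(d)) for r in centered]
    return v, var_pc1 / total_var, scores


pc1, ratio, scores = pca_first_component(X)
print("PC1:", pc1, "fraction of variance:", ratio)
print("scores:", scores)
```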

Hierarchical Clustering

What is it? Hierarchical clustering characterizes how similar (or dissimilar) the samples are based on overall patterns of measurements. For example, the samples may be patients, and the overall patterns may be derived from the expression of numerous proteins. Hierarchical clustering analyzes the similarity in a binary fashion starting…
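The sketch below assumes the common agglomerative (bottom-up) variant: every sample starts as its own cluster, and the two closest clusters are merged repeatedly, producing the binary tree usually drawn as a dendrogram. Euclidean distance and average linkage are illustrative choices (other distances and linkages are common), the samples and labels are hypothetical, and the brute-force search suits only small inputs; `scipy.cluster.hierarchy.linkage` is a library alternative.

```python
import math

# Hypothetical samples (e.g., patients) described by two protein levels.
labels = ["A", "B", "C", "D", "E"]
X = [(0, 0), (0, 1), (5, 5), (5, 7), (20, 20)]


def average_linkage(X, labels):
    """Agglomerative clustering; returns (members, merge distance) per merge."""
    clusters = {i: [i] for i in range(len(X))}  # cluster id -> sample indices
    next_id = len(X)
    merges = []
    while len(clusters) > 1:
        ids = sorted(clusters)
        best = None
        for p in range(len(ids)):
            for q in range(p + 1, len(ids)):
                a, b = clusters[ids[p]], clusters[ids[q]]
                # Average linkage: mean distance over all cross-cluster pairs.
                d = sum(math.dist(X[i], X[j]) for i in a for j in b) / (len(a) * len(b))
                if best is None or d < best[0]:
                    best = (d, ids[p], ids[q])
        d, p_id, q_id = best
        clusters[next_id] = clusters.pop(p_id) + clusters.pop(q_id)
        merges.append((sorted(labels[i] for i in clusters[next_id]), d))
        next_id += 1
    return merges


merges = average_linkage(X, labels)
for members, dist in merges:
    print(f"merge at {dist:.2f}: {members}")
```

Reading the merges in order gives the dendrogram from the leaves upward: tightly similar samples join first at small distances, and outliers join last.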

SAM (Significance Analysis of Microarrays)

What is it? SAM is a method for analyzing large-scale gene or protein expression data, such as those collected with microarrays. It addresses the multiple-testing problem in such data: in a microarray experiment measuring 10,000 proteins, a p-value cut-off of 0.01 would flag about 100 proteins as significant by chance alone. Therefore, SAM…
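The figure of 100 chance hits is the expected count m × α = 10,000 × 0.01. The sketch below checks it by simulation, relying on the fact that a continuous test's p-value is uniformly distributed when the null hypothesis is true. It also shows the SAM-style relative difference d = (mean1 - mean2) / (s + s0), where s is the pooled standard error of the difference and s0 is a small positive constant. SAM chooses s0 from the data and estimates the false discovery rate by permuting sample labels; neither step is shown here, and the fixed `s0` values and the function names are hypothetical.

```python
import math
import random


def count_chance_hits(n_features=10_000, alpha=0.01, seed=0):
    """Simulate features with no true effect and count p-values below alpha."""
    rng = random.Random(seed)
    # Under the null hypothesis a continuous test's p-value is uniform on [0, 1].
    return sum(1 for _ in range(n_features) if rng.random() < alpha)


def sam_d_statistic(group1, group2, s0):
    """SAM-style relative difference d = (mean1 - mean2) / (s + s0)."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = sum(group1) / n1, sum(group2) / n2
    ss = sum((x - m1) ** 2 for x in group1) + sum((x - m2) ** 2 for x in group2)
    a = (1 / n1 + 1 / n2) / (n1 + n2 - 2)
    s = math.sqrt(a * ss)  # pooled standard error of the mean difference
    return (m1 - m2) / (s + s0)


print("expected chance hits:", 10_000 * 0.01)
print("simulated chance hits:", count_chance_hits())
# A hypothetical protein with a tiny but very consistent group difference:
low_var_1, low_var_2 = [1.00, 1.01, 1.02], [1.10, 1.11, 1.12]
print("s0 = 0   :", sam_d_statistic(low_var_1, low_var_2, s0=0.0))  # plain t-statistic
print("s0 = 0.1 :", sam_d_statistic(low_var_1, low_var_2, s0=0.1))  # damped
```

With s0 = 0 the statistic reduces to the ordinary pooled-variance two-sample t-statistic, so a protein whose variance happens to be very small can score extremely high; a positive s0 keeps such proteins from dominating the ranking.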