Dimension Reduction

Principal component analysis (PCA) and singular value decomposition (SVD) are commonly used dimensionality reduction approaches in exploratory data analysis (EDA) and machine learning. Both are classical linear dimensionality reduction methods: they look for linear combinations of the features in the original high-dimensional data matrix in order to construct a meaningful lower-dimensional representation of the dataset.
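
As a quick check of how the two are connected, the principal component loadings returned by prcomp() match the right singular vectors returned by svd() on the centred and scaled data, up to sign. This is a minimal sketch on a small simulated matrix, not part of the original notes:

set.seed(1)
mat = matrix(rnorm(40), nrow = 10)            # small simulated 10 x 4 data matrix
svd1 = svd(scale(mat))                        # SVD of the centred and scaled matrix
pca1 = prcomp(mat, scale. = TRUE)             # PCA on the same data
round(abs(svd1$v) - abs(pca1$rotation), 10)   # all zeros: loadings equal right singular vectors up to sign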

Illustrating clusters

set.seed(96)
# simulate 12 points that fall into three clusters of four points each
x = rnorm(12, mean = rep(1:3, each = 4), sd = 0.2)
y = rnorm(12, mean = rep(c(1, 2, 1), each = 4), sd = 0.2)
plot(x, y, col = "orange", pch = 19, cex = 1.5)
# label each point with its index, offset slightly from the plotting symbol
text(x + 0.05, y + 0.05, labels = as.character(1:12))
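
A heatmap of the same simulated data runs a hierarchical clustering on the rows and columns and reorders them, which makes the groups visible. A short sketch using base R's heatmap(); the data frame name here is an illustration, not from the original code:

dataFrame = data.frame(x = x, y = y)
heatmap(as.matrix(dataFrame))   # hierarchical clustering of rows and columns, displayed as an image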

Missing values

It is important to deal with missing values before applying SVD, since svd() produces an error if the data contain any missing values.

One way to handle missing values is the impute.knn() function from the impute package (available from Bioconductor), which fills in each missing value using its k nearest neighbours.

Consider the following dataset:

x = rnorm(12, mean = rep(1:3, each = 4), sd = 0.2)
y = rnorm(12, mean = rep(c(1, 2, 1), each = 4), sd = 0.2)
data = data.frame(x, y)
# convert to a matrix and shuffle the rows
dataAsMatrix = as.matrix(data)[sample(1:12), ]
# display the matrix as an image, with the first row of the matrix at the top
image(t(dataAsMatrix)[, nrow(dataAsMatrix):1], yaxt = "n")
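
If some entries of this matrix were set to NA, svd(scale(dataAsMatrix)) would fail. Below is a sketch of the impute-then-decompose workflow, assuming the impute package from Bioconductor is installed; the number of knocked-out entries is arbitrary and only for illustration:

library(impute)
dataMatrix2 = dataAsMatrix
dataMatrix2[sample(1:24, size = 5)] = NA     # knock out a few entries at random (illustrative)
# svd(scale(dataMatrix2)) would now throw an error because of the NAs
dataMatrix2 = impute.knn(dataMatrix2)$data   # fill each NA using its k nearest neighbours
svd1 = svd(scale(dataMatrix2))               # SVD now runs on the completed matrix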