This document explains PCA/clustering related plotting using {ggplot2} and {ggfortify}.
First, install ggfortify from CRAN.
install.packages('ggfortify')
{ggfortify} lets {ggplot2} know how to interpret PCA objects. After loading {ggfortify}, you can use the ggplot2::autoplot function for stats::prcomp and stats::princomp objects.
library(ggfortify)
df <- iris[c(1, 2, 3, 4)]
autoplot(prcomp(df))
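To see what the object passed to autoplot contains, it can help to inspect the prcomp result directly with base R. The following is a minimal sketch; the names pca and var_explained are illustrative and not part of {ggfortify}.

```r
# PCA on the four numeric iris columns, as in the example above
df <- iris[c(1, 2, 3, 4)]
pca <- prcomp(df)

# Proportion of total variance captured by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
round(var_explained, 4)

# Scores of the first few observations on the first two components
head(pca$x[, 1:2])
```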
The data passed to PCA should contain only numeric values. If you want to colour the points by a non-numeric column of the original data, pass the original data using the data keyword and then specify the column name with the colour keyword. Use help(autoplot.prcomp) (or help(autoplot.*) for any other objects) to check the available options.
autoplot(prcomp(df), data = iris, colour = 'Species')
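Because autoplot builds on {ggplot2}, the value it returns for a prcomp object can be stored and extended with ordinary ggplot2 layers. A small sketch, assuming {ggfortify} and {ggplot2} are installed; the title text and the names p and p_custom are only illustrative.

```r
library(ggplot2)
library(ggfortify)

df <- iris[c(1, 2, 3, 4)]

# Store the plot instead of printing it immediately
p <- autoplot(prcomp(df), data = iris, colour = 'Species')

# Add standard ggplot2 components to the stored plot
p_custom <- p + ggtitle('PCA of iris measurements') + theme_minimal()
p_custom
```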
Passing label = TRUE draws a label for each data point using its row name.
autoplot(prcomp(df), data = iris, colour = 'Species', label = TRUE, label.size = 3)
Passing shape = FALSE draws the plot without points. In this case, label is turned on unless otherwise specified.
autoplot(prcomp(df), data = iris, colour = 'Species', shape = FALSE, label.size = 3)
Passing loadings = TRUE draws eigenvectors.
autoplot(prcomp(df), data = iris, colour = 'Species', loadings = TRUE)
You can attach eigenvector labels and change some options.
autoplot(prcomp(df), data = iris, colour = 'Species',
         loadings = TRUE, loadings.colour = 'blue',
         loadings.label = TRUE, loadings.label.size = 3)
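The eigenvectors themselves are stored in the rotation component of a prcomp object, so they can be checked numerically with base R. This is only an inspection sketch; the arrows in the plot may be rescaled for display, so their lengths need not match these raw values.

```r
pca <- prcomp(iris[c(1, 2, 3, 4)])

# One row per original variable, one column per principal component
pca$rotation[, 1:2]

# Each eigenvector has unit length
colSums(pca$rotation^2)
```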
{ggfortify} supports stats::factanal objects in the same manner as PCA. The available options are the same as for PCA.
Important: you must specify the scores option when calling factanal so that scores are calculated (the default is scores = 'none', which leaves the scores component NULL). Otherwise, plotting will fail.
d.factanal <- factanal(state.x77, factors = 3, scores = 'regression')
autoplot(d.factanal, data = state.x77, colour = 'Income')
autoplot(d.factanal, label = TRUE, label.size = 3,
         loadings = TRUE, loadings.label = TRUE, loadings.label.size = 3)
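The requirement to request scores can be verified with base R alone. The sketch below fits the same model with and without the scores argument; fa_default and fa_scored are illustrative names.

```r
# Without the scores argument, no scores are stored in the result
fa_default <- factanal(state.x77, factors = 3)
is.null(fa_default$scores)

# With scores = 'regression', one row per observation and one column per factor
fa_scored <- factanal(state.x77, factors = 3, scores = 'regression')
dim(fa_scored$scores)
```

Checking is.null(fit$scores) before plotting is one way to catch a missing scores argument early.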
{ggfortify} supports the stats::kmeans class. You must explicitly pass the original data to the autoplot function via the data keyword, because a kmeans object doesn't store the original data. The result is automatically coloured by cluster.
set.seed(1)
autoplot(kmeans(USArrests, 3), data = USArrests)
autoplot(kmeans(USArrests, 3), data = USArrests, label = TRUE, label.size = 3)
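The reason the data keyword is needed can be seen by listing the components of a kmeans result in base R. A minimal sketch; km is an illustrative name.

```r
set.seed(1)
km <- kmeans(USArrests, 3)

# Components stored in the fitted object
names(km)

# The original observations are not among them
'data' %in% names(km)

# Only the cluster assignment of each row is kept
table(km$cluster)
```

Since kmeans starts from random centres, fitting once and reusing the stored object (for example, passing km to autoplot together with data = USArrests) keeps the cluster assignments consistent between plots.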
{ggfortify} supports the cluster::clara, cluster::fanny and cluster::pam classes. Because these objects contain the original data as one of their components, there is no need to pass the original data explicitly.
library(cluster)
autoplot(clara(iris[-5], 3))
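As a quick check of this, the fitted objects from {cluster} can be inspected for their data component. The sketch assumes the default keep.data behaviour of these functions; the fit_* names are illustrative.

```r
library(cluster)

fit_clara <- clara(iris[-5], 3)
fit_fanny <- fanny(iris[-5], 3)
fit_pam   <- pam(iris[-5], 3)

# Each object keeps a copy of the input data
dim(fit_clara$data)
dim(fit_fanny$data)
dim(fit_pam$data)
```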
Specifying frame = TRUE in autoplot for stats::kmeans and cluster::* objects draws a convex hull around each cluster.
autoplot(fanny(iris[-5], 3), frame = TRUE)
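To illustrate what a per-cluster convex hull is, the following base R sketch computes one by hand from the first two principal components and a k-means clustering. It is a conceptual illustration under those assumptions, not the code {ggfortify} uses internally; scores, clusters and hulls are illustrative names.

```r
set.seed(1)
df <- iris[-5]

# Two-dimensional coordinates and a cluster label for each observation
scores <- as.data.frame(prcomp(df)$x[, 1:2])
clusters <- kmeans(df, 3)$cluster

# For each cluster, keep only the points that lie on its convex hull
hulls <- lapply(split(scores, clusters), function(pts) {
  pts[chull(pts$PC1, pts$PC2), ]
})

# Number of hull vertices per cluster
sapply(hulls, nrow)
```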
If you want a probability ellipse, ggplot2 1.0.0 or later is required. Via the frame.type option, specify any value supported by the type keyword of ggplot2::stat_ellipse.
autoplot(pam(iris[-5], 3), frame = TRUE, frame.type = 'norm')
The {lfda} package supports a set of Local Fisher Discriminant Analysis methods. You can use autoplot to plot the analysis result in the same manner as PCA.
Thanks to Yuan Tang, the author of the {lfda} package, for the kind contribution.
library(lfda)
# Local Fisher Discriminant Analysis (LFDA)
model <- lfda(iris[-5], iris[, 5], 4, metric="plain")
autoplot(model, data = iris, frame = TRUE, frame.colour = 'Species')
# Kernel Local Fisher Discriminant Analysis (KLFDA)
model <- klfda(kmatrixGauss(iris[-5]), iris[, 5], 4, metric="plain")
autoplot(model, data = iris, frame = TRUE, frame.colour = 'Species')
NOTE: For the iris data set, the relationships between the classes are not non-linear. Kernel Local Fisher Discriminant Analysis is aimed only at capturing non-linear relationships, especially when there are many classes. In this case, the visualization of the iris data set is poor because klfda is too powerful a method for linear relationships. If klfda is used for this kind of data, the model will very likely overfit the transformed data set in later classification or clustering tasks.
# Semi-supervised Local Fisher Discriminant Analysis (SELF)
model <- self(iris[-5], iris[, 5], beta = 0.1, r = 3, metric="plain")
autoplot(model, data = iris, frame = TRUE, frame.colour = 'Species')