Question: How do I carry out cluster analysis in R?

Data

We’ll use the “palmerpenguins” package (https://allisonhorst.github.io/palmerpenguins/) to address this question. If you have not done so before, install the package with install.packages("palmerpenguins"). Then call library(palmerpenguins) and load the data with data(penguins).

#install.packages("ggplot2")
library(ggplot2)

#install.packages("palmerpenguins")
library(palmerpenguins)

data(penguins)
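
Before clustering, it can help to check which columns are numeric and where values are missing, since distance calculations only make sense on numeric columns. The lines below are a short sketch using base R functions only; the object name na_counts is illustrative and not part of the original analysis.

```r
library(palmerpenguins)
data(penguins)

# Column types: species, island and sex are factors; the rest are numeric
str(penguins)

# Number of missing values per column
na_counts <- colSums(is.na(penguins))
na_counts
```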

Carry out cluster analysis in R

panel.cor <- function(x, y, digits = 2, prefix = "", cex.cor, ...)
{
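    usr <- par("usr"); on.exit(par(usr = usr))  # restore the plot coordinates on exit
    par(usr = c(0, 1, 0, 1))
    # penguins contains missing values, so use only complete pairs of observations
    r <- abs(cor(x, y, use = "complete.obs"))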
    txt <- format(c(r, 0.123456789), digits = digits)[1]
    txt <- paste0(prefix, txt)
    if(missing(cex.cor)) cex.cor <- 0.8/strwidth(txt)
    text(0.5, 0.5, txt, cex = cex.cor * r)
}

# Scatterplot matrix: smoothed scatterplots below the diagonal,
# absolute correlations above it
plot(penguins, upper.panel = panel.cor,
     panel = panel.smooth)

# Convert the tibble to a plain data frame so that row names can be set,
# then label each penguin by species and row number (the column is `species`, lowercase)
penguins <- as.data.frame(penguins)
row.names(penguins) <- paste(penguins$species, 1:nrow(penguins), sep = ".")
# Keep a random half of the rows, sampled without replacement
i <- 1:nrow(penguins)
i <- sample(i, length(i) / 2, replace = FALSE)
penguins_sub <- penguins[i, ]

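# Cluster on the four numeric measurements only: dist() cannot use factor
# columns such as species, and rows with missing measurements are dropped
dist_euc <- dist(na.omit(penguins_sub[, c("bill_length_mm", "bill_depth_mm",
                                          "flipper_length_mm", "body_mass_g")]),
                 method = "euclidean")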
clust_euc <- hclust(dist_euc)

plot(clust_euc, hang = -1, cex = 0.5)
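
Once the dendrogram is drawn, a common next step is to cut it into a fixed number of groups and check how those groups line up with the known species. The following is a minimal sketch rather than part of the original analysis. It assumes three clusters (one per species), standardises the four measurement columns with scale() so that body mass in grams does not dominate the distances, and uses illustrative names (num_cols, peng, clust, groups) that do not appear above.

```r
library(palmerpenguins)

# Measurement columns used for clustering (an assumption; other choices are possible)
num_cols <- c("bill_length_mm", "bill_depth_mm",
              "flipper_length_mm", "body_mass_g")

# Drop penguins with missing measurements, keeping species for comparison
peng <- as.data.frame(penguins)
peng <- peng[complete.cases(peng[, num_cols]), ]

# Standardise each column, then cluster on Euclidean distances
clust <- hclust(dist(scale(peng[, num_cols]), method = "euclidean"))

# Cut the tree into three groups (assumed: one per species)
groups <- cutree(clust, k = 3)

# Cross-tabulate cluster membership against species
table(cluster = groups, species = peng$species)
```

If the clusters mostly follow species, each row of the table will be dominated by a single column. Trying a different k, or a different linkage through the method argument of hclust(), shows how sensitive the grouping is to these choices.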