::include_graphics("img/clustering.png") knitr
Multivariate Analysis Lecture 16/17: Cluster Analysis
Outline
Introduction
Model-free cluster analysis
K-Means
Model-based cluster analysis
Gaussian mixture model
Choose the number of clusters
Introduction
Introduction to Cluster Analysis
Exploratory
Aims to understand/learn the unknown substructure of multivariate data
Seeks rules to group data
- Large between-group difference
- Small within-group difference
Cluster Analysis vs Classification Analysis
- Cluster analysis
- Data are unlabeled
- The number of clusters is unknown
- “Unsupervised” learning
- Classification
- The labels for training data are known
- The number of classes is known
- Want to allocate new observations, whose labels are unknown, to one of the known classes
- “Supervised” learning
Clustering Methods
- Model-free:
- Hierarchical clustering: based on similarity measures
- Nonhierarchical clustering: e.g., K-means
- Model-based clustering
K-Means
Introduction
- The idea goes back to Hugo Steinhaus (1956); the name was first used by James MacQueen (1967)
- K-means is a model-free clustering method
- It is a type of nonhierarchical clustering
- Assign each observation to the cluster with the nearest centroid
- “Nearest” is usually defined in terms of Euclidean distance
- The goal is to minimize within-cluster variances
- We need to decide whether or not to standardize observations
- There are variations such as more robust versions (k-medians, k-medoids) and model-based (Gaussian mixture)
K-Means vs K-Nearest-Neighbors (KNN)
- Both are non-parametric
- K-means: unsupervised
- KNN: supervised
- K:
- the number of neighbors in KNN
- the number of clusters in K-Means
How Does K-means Work
- We assume the number of classes, denoted by \(K\), is known
- Let \(\mathbf X_1, \cdots, \mathbf X_n\) denote \(n\) observations in \(\mathbb R^p\).
- Let \(C_1, \cdots, C_K\) denote the sets of indices of the observations assigned to each cluster
- Want to find \((C_1, \cdots, C_K)\) to minimize the objective function
\[\underset{C_1, \cdots, C_K}{\min} \left\{ \sum_{k=1}^K \frac{1}{|C_k|}\sum_{(i,i')\in C_k} D^2(\mathbf X_i, \mathbf X_{i'}) \right\}\]
How Does K-means Work
The objective function is equivalent (up to a constant factor) to \[\underset{Z_1, \cdots, Z_n}{\min} \left\{ \sum_{i=1}^n D^2(\mathbf X_i, \bar {\mathbf X}_{Z_i}) \right\}\] where \(Z_i\) is the cluster label of observation \(i\) and \(\bar {\mathbf X}_{Z_i}\) is the centroid of the cluster to which observation \(i\) is assigned.
One widely used choice of \(D\) is the Euclidean distance, i.e., \[D^2(x,y)=||x-y||^2=(x-y)^T (x-y)\]
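As a quick illustration (an added sketch, not part of the original slides), the lines below evaluate this centroid-based objective in R for an arbitrary assignment of the iris observations; the variable names (membership, centroids, obj) are made up for this example.

# Total within-cluster sum of squared Euclidean distances to the centroids
X <- as.matrix(iris[, -5])
membership <- rep(1:3, length.out = nrow(X))                        # an arbitrary assignment
centroids <- apply(X, 2, function(v) tapply(v, membership, mean))   # 3-by-4 matrix of cluster means
obj <- sum((X - centroids[membership, ])^2)                         # value of the objective
obj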
K-Means: Algorithm
- Step 0: Standardize the data if appropriate
- Step 1: Partition the observations into K initial clusters. Alternatively, we can initialize K centroids
- Step 2: Assign each observation to the cluster with the nearest centroid
- Step 3: Recalculate the centroids
- Step 4: Repeat Steps 2-3 until the assignments no longer change
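A minimal sketch of Steps 1-4 in plain R (illustrative only, not the implementation behind the figures below; the function and variable names are made up, and empty clusters are not handled):

my_kmeans <- function(X, K, max_iter = 100) {
  X <- as.matrix(X)
  # Step 1: initialize the centroids with K randomly chosen observations
  centroids <- X[sample(nrow(X), K), , drop = FALSE]
  assignment <- rep(0, nrow(X))
  for (iter in 1:max_iter) {
    # Step 2: assign each observation to the nearest centroid (squared Euclidean distance)
    d <- sapply(1:K, function(k) colSums((t(X) - centroids[k, ])^2))
    new_assignment <- apply(d, 1, which.min)
    # Step 4: stop when the assignments no longer change
    if (all(new_assignment == assignment)) break
    assignment <- new_assignment
    # Step 3: recalculate each centroid as the mean of its cluster
    for (k in 1:K)
      centroids[k, ] <- colMeans(X[assignment == k, , drop = FALSE])
  }
  list(cluster = assignment, centers = centroids)
}
# e.g., my_kmeans(iris[, -5], K = 3)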
Figures of K-means of Iris
Cluster analysis using the R function “kmeans”:
obj = kmeans(data.frame(iris[,-5]), centers = 3)
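The fitted object can then be inspected, for example (the component names are those listed in the kmeans output shown later):

obj$centers                       # cluster centroids
obj$tot.withinss                  # total within-cluster sum of squares
table(obj$cluster, iris$Species)  # compare the clusters with the known species labels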
An Animated Figure of Clusters of Iris Data
Code
##################################
## Define Squared Euclidean Distance
distxy = function(x, y) { return(sum((x - y)^2)) }

# prepare the scatter plots for an animated gif
dir_out = "C:/Users/yu790/Desktop/Desktop/teaching-multivariate/img/Iris_kmean/scatter"
dir.create(dir_out, recursive = TRUE, showWarnings = FALSE)

mydata = iris[, -5]
# mydata = scale(mydata)
n = dim(mydata)[1]
K = 3
distance = matrix(0, n, K)

# initialize
set.seed(1)
cluster01 = rmultinom(n, size = 1, prob = c(0.3, 0.3, 0.4))
cluster01.new = rmultinom(n, size = 1, prob = c(0.3, 0.3, 0.4))
cluster123 = apply(cluster01.new, 2, which.max)

i = 0
while (sum(cluster01 != cluster01.new))
{
  i = i + 1
  if (i < 10) {name = paste('000', i, 'plot.png', sep = '')}
  if (i < 100 && i >= 10) {name = paste('00', i, 'plot.png', sep = '')}
  if (i >= 100) {name = paste('0', i, 'plot.png', sep = '')}
  png(filename = paste(dir_out, "/", name, sep = ""))
  plot(mydata[, 3], mydata[, 4], type = "n", xlab = "Petal L", ylab = "Petal W",
       main = paste("K-mean Clustering (Iris Data): iteration=", i))
  for (k in 1:K)
    points(mydata[cluster123 == k, 3], mydata[cluster123 == k, 4], col = k)
  dev.off()

  centroids = as.matrix(cluster01.new) %*% as.matrix(mydata) / rowSums(cluster01.new)
  ## to avoid label switching
  centroids = centroids[order(centroids[, 1]), ]
  cluster01 = cluster01.new

  for (k in 1:K)
    distance[, k] = apply(mydata, 1, distxy, centroids[k, ])
  cluster123 = apply(distance, 1, which.min)
  # convert the assignment to dummy variables
  cluster01.new = matrix(0, K, n)
  for (k in 1:K)
    cluster01.new[k, cluster123 == k] = 1

  print(i)
  print(table(cluster123))
}
[1] 1
cluster123
1 2 3
22 49 79
[1] 2
cluster123
1 2 3
50 30 70
[1] 3
cluster123
1 2 3
50 36 64
[1] 4
cluster123
1 2 3
50 40 60
[1] 5
cluster123
1 2 3
50 45 55
[1] 6
cluster123
1 2 3
50 49 51
[1] 7
cluster123
1 2 3
50 54 46
[1] 8
cluster123
1 2 3
50 57 43
[1] 9
cluster123
1 2 3
50 60 40
[1] 10
cluster123
1 2 3
50 61 39
[1] 11
cluster123
1 2 3
50 61 39
Code
## list file names and read in
library(magick)   # provides image_read, image_join, image_animate, image_write
imgs <- list.files(dir_out, full.names = TRUE)
img_list <- lapply(imgs, image_read)

## join the images together
img_joined <- image_join(img_list)

## animate at 2 frames per second
img_animated <- image_animate(img_joined, fps = 2)

## view animated image
img_animated
Code
## save to disk
image_write(image = img_animated,
path = "img/iris_kmeans_animated_scatter.gif")
Cluster Memberships of Iris Data (K-Means)
::include_graphics("img/Iris_kmeans_cluster_memberships_steps1to9.png") knitr
An Animated Figure of Cluster Memberships of Iris K-Means
Code
## Define Squared Euclidean Distance
distxy = function(x, y) { return(sum((x - y)^2)) }

# prepare the plots of cluster assignments for an animated gif
dir_out = "C:/Users/yu790/Desktop/Desktop/teaching-multivariate/img/Iris_kmean/assignment"
dir.create(dir_out, recursive = TRUE, showWarnings = FALSE)

mydata = iris[, -5]
# mydata = scale(mydata)  # if we would like to use standardized data
n = dim(mydata)[1]
K = 3
distance = matrix(0, n, K)

# initialize clusters
set.seed(1)
cluster01 = rmultinom(n, size = 1, prob = c(0.3, 0.3, 0.4))
cluster01.new = rmultinom(n, size = 1, prob = c(0.3, 0.3, 0.4))
cluster123 = apply(cluster01.new, 2, which.max)
# cluster01.new is 3-by-150

i = 0
while (sum(cluster01 != cluster01.new))
{
  i = i + 1
  if (i < 10) {name = paste('000', i, 'plot.png', sep = '')}
  if (i < 100 && i >= 10) {name = paste('00', i, 'plot.png', sep = '')}
  if (i >= 100) {name = paste('0', i, 'plot.png', sep = '')}
  png(filename = paste(dir_out, "/", name, sep = ""))
  tmp = cluster01.new
  colnames(tmp) = 1:n
  barplot(tmp, ylim = c(0, 1), col = c(1:3),
          main = paste("K-mean Clustering (Iris Data): iteration=", i),
          xlab = "Iris Species: Setosa, Versicolor, Virginica")
  dev.off()

  centroids = as.matrix(cluster01.new) %*% as.matrix(mydata) / rowSums(cluster01.new)
  ## to avoid label switching
  centroids = centroids[order(centroids[, 1]), ]
  cluster01 = cluster01.new

  for (k in 1:K)
    distance[, k] = apply(mydata, 1, distxy, centroids[k, ])
  cluster123 = apply(distance, 1, which.min)  # distance is 150-by-3
  # convert the assignment to dummy variables
  cluster01.new = matrix(0, K, n)
  for (k in 1:K)
    cluster01.new[k, cluster123 == k] = 1
  print(i)
  print(table(cluster123))
}
[1] 1
cluster123
1 2 3
22 49 79
[1] 2
cluster123
1 2 3
50 30 70
[1] 3
cluster123
1 2 3
50 36 64
[1] 4
cluster123
1 2 3
50 40 60
[1] 5
cluster123
1 2 3
50 45 55
[1] 6
cluster123
1 2 3
50 49 51
[1] 7
cluster123
1 2 3
50 54 46
[1] 8
cluster123
1 2 3
50 57 43
[1] 9
cluster123
1 2 3
50 60 40
[1] 10
cluster123
1 2 3
50 61 39
[1] 11
cluster123
1 2 3
50 61 39
Code
## list file names and read in
imgs <- list.files(dir_out, full.names = TRUE)
img_list <- lapply(imgs, image_read)

## join the images together
img_joined <- image_join(img_list)

## animate at 2 frames per second
img_animated <- image_animate(img_joined, fps = 2)

## view animated image
img_animated
Code
## save to disk
image_write(image = img_animated,
path = "img/iris_kmeans_animated_assignment.gif")
Remarks
The algorithm is guaranteed not to increase the objective function at any iteration
It might converge to a local minimum
Remedy: use several different random initial values (e.g., via the nstart argument of kmeans, as shown below)
Different Initial Values
::include_graphics("img/Iris_kmeans_InitialValues.png") knitr
Different Initial Values
kmeans(data.frame(iris[,-5]),centers=3, nstart=5)
K-means clustering with 3 clusters of sizes 62, 38, 50
Cluster means:
Sepal.Length Sepal.Width Petal.Length Petal.Width
1 5.901613 2.748387 4.393548 1.433871
2 6.850000 3.073684 5.742105 2.071053
3 5.006000 3.428000 1.462000 0.246000
Clustering vector:
[1] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[38] 3 3 3 3 3 3 3 3 3 3 3 3 3 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[75] 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 2 2 2 1 2 2 2 2
[112] 2 2 1 1 2 2 2 2 1 2 1 2 1 2 2 1 1 2 2 2 2 2 1 2 2 2 2 1 2 2 2 1 2 2 2 1 2
[149] 2 1
Within cluster sum of squares by cluster:
[1] 39.82097 23.87947 15.15100
(between_SS / total_SS = 88.4 %)
Available components:
[1] "cluster" "centers" "totss" "withinss" "tot.withinss"
[6] "betweenss" "size" "iter" "ifault"
Model-based
Mixture Model
- Consider a random variable \(X\)
- We say it follows a mixture of \(K\) distributions if its pdf/pmf can be written as a weighted sum of \(K\) component pdfs/pmfs: \[f_{Mix}(x)=\sum_{k=1}^K p_k f_k(x)\]
- The weights \(p_k, k=1, \cdots, K\) are nonnegative numbers and they add up to 1
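For intuition (an added sketch, not part of the original slides), data from a mixture can be simulated by first drawing the component label with probabilities \(p_k\) and then drawing from the selected component; the parameter values below are made up for illustration.

set.seed(1)
p <- c(0.3, 0.7)                                 # mixing weights (sum to 1)
mu <- c(0, 3); sigma <- c(1, 1)                  # component means and sds
z <- sample(1:2, 500, replace = TRUE, prob = p)  # unobserved component labels
x <- rnorm(500, mean = mu[z], sd = sigma[z])     # observed data
hist(x, breaks = 30, main = "Sample from a two-component normal mixture")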
Example: Gaussian Mixture Models
- Let \(f_k(x)\) be the pdf of \(N(\boldsymbol \mu_k, \boldsymbol\Sigma_k)\)
- We say \(X\) follows a Gaussian mixture model if its pdf is
\[\begin{aligned} f_{Mix}(x|\boldsymbol \mu_1, \boldsymbol \Sigma_1, \cdots, \boldsymbol \mu_K, \boldsymbol \Sigma_K, p_1, \cdots, p_K)\\ =\sum_{k=1}^K p_k \frac{1}{(2\pi)^{p/2}|\boldsymbol\Sigma_k|^{1/2}}\exp\left(-\frac{1}{2} (x-\boldsymbol\mu_k)^T \boldsymbol\Sigma_k^{-1} (x-\boldsymbol \mu_k)\right) \end{aligned}\]
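As a small sketch (assuming the mvtnorm package is available; the parameter values and the point x are made up), the mixture density above can be evaluated directly:

library(mvtnorm)                                   # for dmvnorm()
p <- c(0.4, 0.6)                                   # mixing weights
mu1 <- c(0, 0); Sigma1 <- diag(2)
mu2 <- c(3, 3); Sigma2 <- matrix(c(1, 0.5, 0.5, 1), 2, 2)
x <- c(1, 1)
p[1] * dmvnorm(x, mean = mu1, sigma = Sigma1) +
  p[2] * dmvnorm(x, mean = mu2, sigma = Sigma2)    # value of f_Mix at x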
Examples: Mixtures of Two Univariate Normals
Code
# Load required libraries
library(ggplot2)
library(gridExtra)

# Set the parameters
mean1 <- 1
var1  <- 0.25
mean0 <- 0
var0  <- 0.25

# Generate data points
x  <- seq(-2, 3, length.out = 1000)
y1 <- dnorm(x, mean = mean1, sd = sqrt(var1))
y0 <- dnorm(x, mean = mean0, sd = sqrt(var0))
ymix_70 <- 0.7 * y1 + 0.3 * y0
ymix_eq <- 0.5 * y1 + 0.5 * y0
ymix_20 <- 0.2 * y1 + 0.8 * y0

# Create a data frame for plotting
df <- data.frame(x = x, y1 = y1, y0 = y0, ymix_70 = ymix_70,
                 ymix_eq = ymix_eq, ymix_20 = ymix_20)

# Create the plots
plot1 <- ggplot(df, aes(x)) +
  geom_line(aes(y = y1, color = "Distribution 1"), size = 1) +
  geom_line(aes(y = y0, color = "Distribution 0"), size = 1) +
  geom_line(aes(y = ymix_eq, color = "Mixture"), size = 1) +
  scale_color_manual(values = c("Distribution 1" = "blue", "Distribution 0" = "green", "Mixture" = "red")) +
  labs(x = "x", y = "Density", title = "Mixture of Two Normals: 0.5 and 0.5") +
  theme_minimal()
plot2 <- ggplot(df, aes(x)) +
  geom_line(aes(y = y1, color = "Distribution 1"), size = 1) +
  geom_line(aes(y = y0, color = "Distribution 0"), size = 1) +
  geom_line(aes(y = ymix_70, color = "Mixture"), size = 1) +
  scale_color_manual(values = c("Distribution 1" = "blue", "Distribution 0" = "green", "Mixture" = "red")) +
  labs(x = "x", y = "Density", title = "Mixture of Two Normals: 0.7 and 0.3") +
  theme_minimal()
plot3 <- ggplot(df, aes(x)) +
  geom_line(aes(y = y1, color = "Distribution 1"), size = 1) +
  geom_line(aes(y = y0, color = "Distribution 0"), size = 1) +
  geom_line(aes(y = ymix_20, color = "Mixture"), size = 1) +
  scale_color_manual(values = c("Distribution 1" = "blue", "Distribution 0" = "green", "Mixture" = "red")) +
  labs(x = "x", y = "Density", title = "Mixture of Two Normals: 0.2 and 0.8") +
  theme_minimal()

# Display the plots
grid.arrange(plot1, plot2, plot3, ncol = 1, nrow = 3)
Gaussian Mixture Model
The likelihood function based on \(n\) observations is \[L(\boldsymbol \theta)=\prod_{i=1}^n f_{Mix}(x_i|\boldsymbol\theta)\] where \(\boldsymbol\theta\) represents all the parameters.
There is no closed-form solution for the maximum likelihood estimate (MLE)
The expectation-maximization (EM) algorithm can be used to estimate parameters
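Since different runs of the algorithm are compared through the observed-data likelihood (as noted later under initialization), a small helper for the univariate two-component case can be sketched as follows; the function name and parameterization are illustrative, with p denoting the weight of the second component as in the EM code further below.

# Observed-data log-likelihood of a univariate two-component Gaussian mixture
loglik_mix <- function(x, p, mu, sd) {
  sum(log((1 - p) * dnorm(x, mu[1], sd[1]) + p * dnorm(x, mu[2], sd[2])))
}
# e.g., loglik_mix(x, p = 0.5, mu = c(1, 3), sd = c(1, 1)) for an observed vector x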
The Expectation Maximization (EM) Algorithm
- Formally outlined by Dempster, Laird, and Rubin (1977) in “Maximum likelihood from incomplete data via the EM algorithm”
- Observed data: \(y^O\)
- Unobserved data: \(y^U\)
- Complete data: \(y^C=(y^O, y^U)\)
- Parameter \(\boldsymbol\theta\)
The EM Algorithm
Initialize parameters: set \(\boldsymbol\theta=\boldsymbol\theta^{(0)}\)
For \(t=0, 1, 2, \ldots\) until convergence:
- E-step: Compute
\[Q(\boldsymbol\theta| \boldsymbol\theta^{(t)}) = E[\log f(y^C|\boldsymbol\theta)\mid y^O, \boldsymbol\theta^{(t)}]\]
In most situations this step amounts to computing conditional expectations
- M-step: Update the parameters
\[\boldsymbol\theta^{(t+1)}=\underset{\boldsymbol\theta}{\arg\max}\, Q(\boldsymbol\theta| \boldsymbol\theta^{(t)})\]
The EM Algorithm
- Ascent property
- The M step ensures the algorithm improves \(Q(\boldsymbol\theta| \boldsymbol\theta^{(t)})\)
- It can be shown that improving \(Q(\boldsymbol\theta| \boldsymbol\theta^{(t)})\) implies improving \(L(\boldsymbol\theta)=f(y^O|\boldsymbol\theta)\) (the observed-data likelihood)
- The algorithm may converge only to a local maximum
- Choose multiple sets of initial values
Example of EM Algorithm
- Observed data
- x=(0.37 1.18 0.16 2.60 1.33 0.18 1.49 1.74 3.58 2.69 4.51 3.39 2.38 0.79 4.12 2.96 2.98 3.94 3.82 3.59)
- Assume \(K=2\)
- Parameters: \(\mu_1, \mu_0, \sigma_1, \sigma_0, p\)
- Unobserved data: \(z=(\cdots)\), the class membership, a vector of binary variables
Initialization
- Parameters need to be initialized
- Different initial values might lead to different MLEs
- Use different initial values and check which one gives the largest likelihood
E-Step
::include_graphics("img/EM1.png") knitr
E-Step
::include_graphics("img/EM2.png") knitr
M-Step
::include_graphics("img/EM3.png") knitr
Code the EM in R
Code
##### gif animation for simulated data. Gaussian Mixture model, EM algorithm
="C:/Users/yu790/Desktop/Desktop/teaching-multivariate/img/EM"
dir_outdir.create(dir_out, recursive = TRUE, showWarnings = FALSE)
set.seed(1);
=c(rnorm(8,1), rnorm(12,3))
x=length(x)
n=rbinom(n,1,0.5)
z=c(1:n)[!z]
k0=setdiff(1:n, k0)
k1=mean(z)
p.old=c(mean(x[k0]), mean(x[k1]) )
u.old=c(sd(x[k0]), sd(x[k1]) )
sd.old=p.old
p.new=u.old
u.new=sd.old
sd.new
=0
iwhile(i==0 || max(abs(u.new-u.old))>0.0001)
{=p.new
p.old=u.new
u.old=sd.new
sd.old
##### E-Step
=p.old*dnorm(x, u.old[2], sd.old[2]) /
ez11-p.new)*dnorm(x, u.old[1], sd.old[1]) + p.old*dnorm(x, u.old[2], sd.old[2]))
((=1-ez1
ez0
##### M-step
=mean(ez1)
p.new=c( weighted.mean(x, ez0), weighted.mean(x, ez1))
u.new=c( sqrt(weighted.mean((x-u.new[1])^2, ez0)) ,
sd.newsqrt(weighted.mean((x-u.new[2])^2, ez1)))
=i+1
iif(i==1) cat("\t", "p \t", "u0 \t", "u1 \n")
cat("iteration", i, "\n")
print(c(p.new, u.new))
if (i < 10) {name = paste('000',i,'plot.png',sep='')}
if (i < 100 && i >= 10) {name = paste('00',i,'plot.png', sep='')}
if (i >= 100) {name = paste('0', i,'plot.png', sep='')}
png(filename=paste(dir_out,"/", name, sep=""))
=rbind(ez1,1-ez1)
tmpcolnames(tmp)=1:n
barplot(tmp, ylim=c(0,1),col=c(2,4),main=paste("Estimated Prob from EM: iteration=", i),ylab="prob")
dev.off()
}
p u0 u1
iteration 1
[1] 0.5486808 2.7438535 2.0998914
iteration 2
[1] 0.5499383 2.7797870 2.0719564
iteration 3
[1] 0.5510853 2.8159372 2.0439817
iteration 4
[1] 0.5523403 2.8560647 2.0132132
iteration 5
[1] 0.5538187 2.9025178 1.9780384
iteration 6
[1] 0.5555898 2.9574019 1.9370844
iteration 7
[1] 0.5576711 3.0228999 1.8889413
iteration 8
[1] 0.5599612 3.1009544 1.8322405
iteration 9
[1] 0.5620851 3.1919255 1.7661598
iteration 10
[1] 0.5631783 3.2917851 1.6914726
iteration 11
[1] 0.5618823 3.3892693 1.6117699
iteration 12
[1] 0.5571285 3.4680534 1.5339762
iteration 13
[1] 0.5496536 3.5180095 1.4667436
iteration 14
[1] 0.5410559 3.5420960 1.4137166
iteration 15
[1] 0.531741 3.548049 1.371190
iteration 16
[1] 0.5218244 3.5432627 1.3342075
iteration 17
[1] 0.5114847 3.5329410 1.2994096
iteration 18
[1] 0.5008803 3.5199750 1.2650429
iteration 19
[1] 0.4901394 3.5057700 1.2304049
iteration 20
[1] 0.4793735 3.4909783 1.1953685
iteration 21
[1] 0.4686951 3.4759362 1.1601186
iteration 22
[1] 0.458235 3.460886 1.125049
iteration 23
[1] 0.4481546 3.4460795 1.0907414
iteration 24
[1] 0.4386503 3.4318119 1.0579664
iteration 25
[1] 0.4299456 3.4184276 1.0276513
iteration 26
[1] 0.422263 3.406285 1.000768
iteration 27
[1] 0.4157754 3.3956927 0.9781162
iteration 28
[1] 0.4105534 3.3868395 0.9600769
iteration 29
[1] 0.4065404 3.3797458 0.9464773
iteration 30
[1] 0.4035762 3.3742769 0.9366876
iteration 31
[1] 0.4014519 3.3701973 0.9298717
iteration 32
[1] 0.3999615 3.3672344 0.9252231
iteration 33
[1] 0.3989304 3.3651268 0.9220866
iteration 34
[1] 0.3982234 3.3636510 0.9199797
iteration 35
[1] 0.3977415 3.3626292 0.9185661
iteration 36
[1] 0.3974142 3.3619276 0.9176173
iteration 37
[1] 0.3971925 3.3614485 0.9169798
iteration 38
[1] 0.3970425 3.3611226 0.9165512
iteration 39
[1] 0.3969411 3.3609016 0.9162627
iteration 40
[1] 0.3968727 3.3607519 0.9160685
iteration 41
[1] 0.3968265 3.3606507 0.9159376
iteration 42
[1] 0.3967953 3.3605824 0.9158494
Code
## list file names and read in
imgs <- list.files(dir_out, full.names = TRUE)
img_list <- lapply(imgs, image_read)

## join the images together
img_joined <- image_join(img_list)

## animate at 2 frames per second
img_animated <- image_animate(img_joined, fps = 2)

## view animated image
img_animated
Code
## save to disk
image_write(image = img_animated,
path = "img/EM.gif")
Gaussian Mixture For Iris Data
Code
##### gif animation for iris data. Gaussian Mixture model, EM algorithm
### two species: Setosa, Versicolor. One variable: petal length
library(MASS)
="C:/Users/yu790/Desktop/Desktop/teaching-multivariate/img/Iris_GMixture"
dir_outdir.create(dir_out, recursive = TRUE, showWarnings = FALSE)
set.seed(1);
=c(iris3[,3,1],iris3[,3,2])
x=length(x)
n=rbinom(n,1,0.5)
z=c(1:n)[!z]
k0=setdiff(1:n, k0)
k1=mean(z)
p.old=c(mean(x[k0]), mean(x[k1]) )
u.old=c(sd(x[k0]), sd(x[k1]) )
sd.old=p.old
p.new=u.old
u.new=sd.old
sd.new
=0
iwhile(i==0 || max(abs(u.new-u.old))>0.0001)
{=p.new
p.old=u.new
u.old=sd.new
sd.old
#e-step
=p.old*dnorm(x, u.old[2], sd.old[2]) /
ez11-p.new)*dnorm(x, u.old[1], sd.old[1]) + p.old*dnorm(x, u.old[2], sd.old[2]))
((=1-ez1
ez0
#m-step
=mean(ez1)
p.new=c( weighted.mean(x, ez0), weighted.mean(x, ez1))
u.new=c( sqrt(weighted.mean((x-u.new[1])^2, ez0)) ,
sd.newsqrt(weighted.mean((x-u.new[2])^2, ez1)))
=i+1
iif(i==1) cat("\t", "p \t", "u0 \t", "u1 \n")
cat("iteration", i, "\n")
print(c(p.new, u.new))
if (i < 10) {name = paste('000',i,'plot.png',sep='')}
if (i < 100 && i >= 10) {name = paste('00',i,'plot.png', sep='')}
if (i >= 100) {name = paste('0', i,'plot.png', sep='')}
png(filename=paste(dir_out,"/", name, sep=""))
=rbind(ez1,1-ez1)
tmpcolnames(tmp)=1:n
barplot(tmp, ylim=c(0,1),col=c(2,4),main=paste("Estimated Prob (Iris Data): iteration=", i),ylab="prob")
dev.off()
}
p u0 u1
iteration 1
[1] 0.4787883 3.1563362 2.5394953
iteration 2
[1] 0.4782697 3.1680802 2.5260153
iteration 3
[1] 0.4778071 3.1804369 2.5118890
iteration 4
[1] 0.4772961 3.1940672 2.4962462
iteration 5
[1] 0.4767149 3.2092686 2.4787092
iteration 6
[1] 0.4760469 3.2263743 2.4588568
iteration 7
[1] 0.4752706 3.2458171 2.4361371
iteration 8
[1] 0.4743576 3.2681770 2.4098015
iteration 9
[1] 0.4732687 3.2942509 2.3788071
iteration 10
[1] 0.4719481 3.3251648 2.3416565
iteration 11
[1] 0.4703152 3.3625623 2.2961238
iteration 12
[1] 0.4682516 3.4089313 2.2387669
iteration 13
[1] 0.465584 3.468194 2.164038
iteration 14
[1] 0.4620888 3.5467770 2.0626964
iteration 15
[1] 0.4576889 3.6553706 1.9197582
iteration 16
[1] 0.4540255 3.8101438 1.7196361
iteration 17
[1] 0.462824 4.022454 1.512960
iteration 18
[1] 0.4940374 4.2282204 1.4607771
iteration 19
[1] 0.4999789 4.2598997 1.4619823
iteration 20
[1] 0.4999995 4.2599976 1.4619996
Code
## list file names and read in
imgs <- list.files(dir_out, full.names = TRUE)
img_list <- lapply(imgs, image_read)

## join the images together
img_joined <- image_join(img_list)

## animate at 2 frames per second
img_animated <- image_animate(img_joined, fps = 2)

## view animated image
img_animated
Code
## save to disk
image_write(image = img_animated,
path = "img/iris_GMixture.gif")
Choose the Optimal Number of Clusters
- Model-based method: introduce a penalty term for \(K\)
- AIC: maximize
\[AIC=2\ln L_{max}-2\left(K\frac{1}{2}(p+1)(p+2)-1 \right)\]
- BIC: maximize \[BIC=2\ln L_{max}-2\ln n\left(K\frac{1}{2}(p+1)(p+2)-1 \right)\]
- Here \(K\frac{1}{2}(p+1)(p+2)-1\) is the number of free parameters in a \(K\)-component Gaussian mixture in \(p\) dimensions (\(K-1\) weights, \(Kp\) means, and \(Kp(p+1)/2\) covariance entries)
- Model-free:
- Elbow method (a small sketch follows this list)
- Silhouette analysis (Peter J. Rousseeuw (1987). “Silhouettes: a Graphical Aid to the Interpretation and Validation of Cluster Analysis”. Journal of Computational and Applied Mathematics. 20: 53–65. Available in the “cluster” package)
- Gap statistic method (Tibshirani, R., Walther, G. and Hastie, T., 2001. Estimating the number of clusters in a data set via the gap statistic. JRSSB, 63: 411-423. Also available in the “cluster” package)
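A small sketch of the elbow method for k-means (silhouette analysis and the gap statistic are available through the silhouette() and clusGap() functions of the “cluster” package mentioned above):

# Elbow method: plot the total within-cluster sum of squares against K
# and look for the "elbow" where the decrease levels off
wss <- sapply(1:8, function(k) kmeans(iris[, -5], centers = k, nstart = 10)$tot.withinss)
plot(1:8, wss, type = "b", xlab = "Number of clusters K", ylab = "Total within-cluster SS")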
Compare Cluster Results
Rand index
Adjusted Rand index
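As a small added sketch, the Rand index (the proportion of observation pairs on which two partitions agree, i.e., both in the same cluster or both in different clusters) can be computed directly from two label vectors; the adjusted Rand index corrects for chance agreement and is available, e.g., as adjustedRandIndex() in the mclust package.

rand_index <- function(a, b) {
  pairs  <- combn(length(a), 2)                # all pairs of observations
  same_a <- a[pairs[1, ]] == a[pairs[2, ]]     # together in partition a?
  same_b <- b[pairs[1, ]] == b[pairs[2, ]]     # together in partition b?
  mean(same_a == same_b)                       # proportion of agreeing pairs
}
# e.g., rand_index(kmeans(iris[, -5], 3, nstart = 5)$cluster, as.integer(iris$Species))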