#1. Expectation-Maximization (EM) Clustering Algorithm
EM clustering groups data points by fitting a mixture of probability distributions to the data with the Expectation-Maximization algorithm; with Gaussian components it is commonly called the Gaussian mixture model.
The Expectation-Maximization (EM) algorithm is an iterative method for finding maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models that depend on unobserved latent variables. The algorithm alternates between an expectation (E) step, which computes the expected log-likelihood under the current parameter estimates, and a maximization (M) step, which updates the parameters to maximize that expected log-likelihood. The updated parameter estimates are then used to re-evaluate the distribution of the latent variables in the next E step.
The EM algorithm is used in many statistical applications, including normal mixture models and missing data problems. When applied to a mixture model, the EM algorithm provides a maximum likelihood estimate of the parameters of a mixture of probability distributions. The EM algorithm is used in bioinformatics, computational biology, engineering, finance, genomics, machine learning, medicine, physics, and social science.
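To make the E and M steps concrete, here is a minimal sketch of EM for a two-component univariate Gaussian mixture in base R. It is illustrative only and not part of the original analysis; the simulated data, component count, and starting values are assumptions.
# Minimal EM sketch for a 2-component 1-D Gaussian mixture (illustrative only)
set.seed(1)
x <- c(rnorm(150, mean = 0, sd = 1), rnorm(150, mean = 5, sd = 1.5))  # simulated data
mu <- c(-1, 1); sigma <- c(1, 1); pi_k <- c(0.5, 0.5)                 # starting values
for (iter in 1:100) {
  # E step: posterior probability (responsibility) of each component for each point
  d1 <- pi_k[1] * dnorm(x, mu[1], sigma[1])
  d2 <- pi_k[2] * dnorm(x, mu[2], sigma[2])
  r2 <- d2 / (d1 + d2)
  r1 <- 1 - r2
  # M step: re-estimate mixing proportions, means and standard deviations
  pi_k  <- c(mean(r1), mean(r2))
  mu    <- c(sum(r1 * x) / sum(r1), sum(r2 * x) / sum(r2))
  sigma <- c(sqrt(sum(r1 * (x - mu[1])^2) / sum(r1)),
             sqrt(sum(r2 * (x - mu[2])^2) / sum(r2)))
}
round(rbind(pi_k, mu, sigma), 3)  # estimates should land near (0.5, 0.5), (0, 5), (1, 1.5)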
library(mclust) # For EM clustering
## Package 'mclust' version 6.1.1
## Type 'citation("mclust")' for citing this R package in publications.
library(ggplot2) # For visualization
library(factoextra) # For cluster visualization
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
#load data
data <- read.csv("M3_House_Worth.csv")
attach(data)
head(data)
## HousePrice StoreArea BasementArea LawnArea HouseNetWorth
## 1 138800 29.9 75 11.223911 Low
## 2 155000 44.0 504 9.689869 Medium
## 3 152000 46.2 493 10.192613 Medium
## 4 160000 46.2 510 6.817316 Medium
## 5 226000 48.7 445 10.916215 Medium
## 6 275000 56.4 1148 9.000686 High
summary(data)
## HousePrice StoreArea BasementArea LawnArea
## Min. : 39300 Min. : 1.80 Min. : 0.0 Min. : 6.214
## 1st Qu.:115000 1st Qu.: 27.00 1st Qu.: 0.0 1st Qu.: 9.212
## Median :173950 Median : 47.60 Median : 402.5 Median : 9.923
## Mean :213355 Mean : 48.31 Mean : 573.0 Mean : 9.914
## 3rd Qu.:294058 3rd Qu.: 67.30 3rd Qu.:1107.0 3rd Qu.:10.488
## Max. :755000 Max. :122.00 Max. :2188.0 Max. :21.539
## HouseNetWorth
## Length:316
## Class :character
## Mode :character
##
##
##
str(data)
## 'data.frame': 316 obs. of 5 variables:
## $ HousePrice : int 138800 155000 152000 160000 226000 275000 215000 392000 325000 151000 ...
## $ StoreArea : num 29.9 44 46.2 46.2 48.7 56.4 47.1 56.7 84 49.2 ...
## $ BasementArea : int 75 504 493 510 445 1148 380 945 1572 506 ...
## $ LawnArea : num 11.22 9.69 10.19 6.82 10.92 ...
## $ HouseNetWorth: chr "Low" "Medium" "Medium" "Medium" ...
dim(data)
## [1] 316 5
# Select numerical features for clustering
data1 <- data[,c(1,2,3,4)]
head(data1)
## HousePrice StoreArea BasementArea LawnArea
## 1 138800 29.9 75 11.223911
## 2 155000 44.0 504 9.689869
## 3 152000 46.2 493 10.192613
## 4 160000 46.2 510 6.817316
## 5 226000 48.7 445 10.916215
## 6 275000 56.4 1148 9.000686
# Normalize the data
data1 <- scale(data1)
head(data1)
## HousePrice StoreArea BasementArea LawnArea
## [1,] -0.6086554 -0.74477444 -0.8827550 0.8408859
## [2,] -0.4764016 -0.17444291 -0.1223336 -0.1435064
## [3,] -0.5008930 -0.08545501 -0.1418316 0.1791037
## [4,] -0.4355825 -0.08545501 -0.1116983 -1.9868181
## [5,] 0.1032292 0.01566760 -0.2269137 0.6434377
## [6,] 0.5032561 0.32712525 1.0191848 -0.5857539
We’ll now perform EM clustering on the normalized data using the Mclust() function from the mclust package. The function automatically determines the optimal number of clusters based on the Bayesian Information Criterion (BIC).
# Perform EM clustering
set.seed(42)
em_model <- Mclust(data1)
# View BIC values
print(em_model$BIC)
## Bayesian Information Criterion (BIC):
## EII VII EEI VEI EVI VVI EEE
## 1 -3611.849 -3611.849 -3629.116 -3629.116 -3629.116 -3629.116 -2642.520
## 2 -2901.321 -2769.193 -2735.068 -2603.683 -2722.601 -2587.465 -2526.189
## 3 -2683.474 -2632.697 -2692.100 -2329.741 -2569.688 -2222.906 -2477.199
## 4 -2655.087 -2368.734 -2503.781 -2214.208 -2387.337 -2041.275 -2377.722
## 5 -2602.538 -2211.217 -2508.157 -2104.288 NA NA -2425.645
## 6 -2448.605 -2229.172 -2391.230 -2087.035 NA NA -2417.365
## 7 -2477.337 -2180.705 -2420.009 -2089.881 NA NA -2446.145
## 8 -2493.642 -2171.836 -2448.090 -2093.633 NA NA -2474.503
## 9 -2522.203 -2167.241 -2476.869 -2089.756 NA NA -2503.289
## VEE EVE VVE EEV VEV EVV VVV
## 1 -2642.520 -2642.520 -2642.520 -2642.520 -2642.520 -2642.520 -2642.520
## 2 -2334.228 -2443.045 -2367.020 -2443.993 -2301.208 -2409.614 -2316.940
## 3 -2278.269 -2343.338 -2288.086 -2345.279 -2200.319 -2314.356 -2226.943
## 4 -2201.483 -2282.390 -2180.795 -2302.190 -2061.420 -2269.319 -2063.955
## 5 -2081.083 -2292.995 -2057.572 -2305.715 -2048.567 NA -2076.457
## 6 -2077.642 -2291.765 -2091.458 -2263.600 -2101.814 -2284.109 -2144.805
## 7 -2083.939 -2338.048 -2060.719 -2274.790 -2037.341 NA NA
## 8 -2081.882 -2225.683 -2097.764 -2355.057 -2037.243 NA NA
## 9 -2080.264 -2284.620 -2101.035 -2414.578 -2090.775 NA NA
##
## Top 3 models based on the BIC criterion:
## VEV,8 VEV,7 VVI,4
## -2037.243 -2037.341 -2041.275
Observation: mclust reports BIC on the scale 2 × log-likelihood − k × log(n), where k is the number of estimated parameters and n is the number of observations, so larger (less negative) BIC values indicate a better model. Here the best model by this criterion is VEV with 8 components.
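As a sanity check, mclust's BIC for the selected model can be reproduced from the log-likelihood, the parameter count (df), and n reported by summary(em_model) further below. This is a small arithmetic check, not part of the original output.
# mclust convention: BIC = 2*log-likelihood - df*log(n), so larger (less negative) is better
2 * (-736.5901) - 98 * log(316)   # reproduces the reported BIC of about -2037.243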
Interpreting the BIC plot:
* X-axis: Number of clusters
* EII: Spherical, equal volume
* VII: Spherical, unequal volume
* EEI: Diagonal, equal volume/shape
* VEI: Diagonal, varying volume
* EVI: Diagonal, equal volume, varying shape
* VVI: Diagonal, varying volume/shape
# Extract the number of clusters
cat("number of cluaters:",em_model$G, "\n")
## number of cluaters: 8
#Classification visualization
plot(em_model, what = "classification")
Observation: The classification plot shows how observations are assigned to the eight clusters, with one colour per cluster in each pairwise scatter of the four features.
# Visualize the clusters
fviz_cluster(list(data = data1, cluster = em_model$classification), geom = "point")
#Get the optimal number of clusters
# Visualize BIC and get the optimal number of clusters
plot(em_model, what = "BIC", ylim = range(em_model$BIC, na.rm = TRUE))
Observation: The BIC plot shows that the VEV model with 8 components achieves the highest BIC. The cluster assignments are stored in em_model$classification.
# Uncertainty plot
plot(em_model, what = "uncertainty")
Observation: The uncertainty plot shows how uncertain the model is about each assignment; points with higher uncertainty are less confidently assigned to their cluster.
# Cluster profiles
summary(em_model)
## ----------------------------------------------------
## Gaussian finite mixture model fitted by EM algorithm
## ----------------------------------------------------
##
## Mclust VEV (ellipsoidal, equal shape) model with 8 components:
##
## log-likelihood n df BIC ICL
## -736.5901 316 98 -2037.243 -2069.621
##
## Clustering table:
## 1 2 3 4 5 6 7 8
## 47 42 14 66 26 70 11 40
Observation: The model summary reports the selected parameterization (VEV with 8 components), the log-likelihood, BIC and ICL, and the number of observations in each cluster. Per-cluster means and covariances are not shown by default; see the snippet below.
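The per-component parameters can be requested explicitly from the same summary method (a short optional check).
# Show estimated mixing proportions, means and covariances for each component
summary(em_model, parameters = TRUE)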
# Alternatively, try a different combination of features (e.g. HousePrice and BasementArea)
data2 <- data[,c(1, 3)]
head(data2,3)
## HousePrice BasementArea
## 1 138800 75
## 2 155000 504
## 3 152000 493
# Force a specific number of clusters
em_model_3 <- Mclust(data1, G = 3)
# Visualize the clusters
fviz_cluster(list(data = data1, cluster = em_model_3$classification), geom = "point")
# Visualize BIC and get the optimal number of clusters
plot(em_model_3, what = "BIC", ylim = range(em_model_3$BIC, na.rm = TRUE))
Observation: With G = 3, the model is forced to fit three components. The cluster assignments are stored in em_model_3$classification.
# Assign cluster names
cluster_names <- c("Low priced", "Medium priced", "High priced")
em_model_3$classification <- factor(em_model_3$classification, labels = cluster_names)
# Assign the labels back to the original data to check the model's performance against HouseNetWorth
data$cluster <- em_model_3$classification
head(data)
## HousePrice StoreArea BasementArea LawnArea HouseNetWorth cluster
## 1 138800 29.9 75 11.223911 Low Low priced
## 2 155000 44.0 504 9.689869 Medium Medium priced
## 3 152000 46.2 493 10.192613 Medium Medium priced
## 4 160000 46.2 510 6.817316 Medium High priced
## 5 226000 48.7 445 10.916215 Medium Medium priced
## 6 275000 56.4 1148 9.000686 High High priced
Observation: The data points have been assigned to the three EM clusters, labelled "Low priced," "Medium priced," and "High priced." The head of the data suggests the labels broadly track HouseNetWorth, though not perfectly (row 4 is Medium net worth but labelled High priced); a cross-tabulation, sketched below, is a quick way to check the mapping.
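The following small validation sketch (not in the original output) compares the EM cluster labels with the recorded HouseNetWorth categories and the average house price per cluster.
# Compare EM cluster labels with the recorded HouseNetWorth categories
table(data$HouseNetWorth, data$cluster)
# Average house price per cluster, to confirm the Low/Medium/High ordering
aggregate(HousePrice ~ cluster, data = data, FUN = mean)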
#2. K-Means Clustering Algorithm
K-means clustering partitions data points into k groups by assigning each point to the nearest cluster centroid and then recomputing the centroids, repeating until the assignments stop changing.
The K-means algorithm is a method of vector quantization, originally from signal processing, that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean (cluster centers or cluster centroid), serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.
The K-means algorithm is used in many applications, including data mining, machine learning, pattern recognition, image analysis, bioinformatics, and anomaly detection.
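The assignment/update loop at the heart of K-means (Lloyd's algorithm) can be written in a few lines of base R. The sketch below is illustrative only; it uses random starting centroids rather than the kmeans() defaults.
# Minimal K-means (Lloyd's algorithm) sketch for a numeric matrix X and k clusters
lloyd_kmeans <- function(X, k, n_iter = 20) {
  X <- as.matrix(X)
  centers <- X[sample(nrow(X), k), , drop = FALSE]       # random initial centroids
  for (i in 1:n_iter) {
    # Assignment step: squared Euclidean distance from every point to every centroid
    d <- sapply(1:k, function(j) colSums((t(X) - centers[j, ])^2))
    cluster <- max.col(-d)                               # index of the nearest centroid
    # Update step: move each centroid to the mean of the points assigned to it
    for (j in 1:k) {
      if (any(cluster == j)) centers[j, ] <- colMeans(X[cluster == j, , drop = FALSE])
    }
  }
  list(cluster = cluster, centers = centers)
}
# Example (illustrative): three clusters on the scaled numeric features
# lloyd_kmeans(scale(data[, 1:4]), k = 3)$centers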
#load data
data2 <- read.csv("M3_House_Worth.csv")
attach(data2)
## The following objects are masked from data:
##
## BasementArea, HouseNetWorth, HousePrice, LawnArea, StoreArea
(head(data2,3))
## HousePrice StoreArea BasementArea LawnArea HouseNetWorth
## 1 138800 29.9 75 11.223911 Low
## 2 155000 44.0 504 9.689869 Medium
## 3 152000 46.2 493 10.192613 Medium
# Select numerical features for clustering
data3 <- data2[,c(1,2,3,4)]
head(data3,3)
## HousePrice StoreArea BasementArea LawnArea
## 1 138800 29.9 75 11.223911
## 2 155000 44.0 504 9.689869
## 3 152000 46.2 493 10.192613
# Normalize the data
data3_scale <- scale(data3)
head(data3_scale,3)
## HousePrice StoreArea BasementArea LawnArea
## [1,] -0.6086554 -0.74477444 -0.8827550 0.8408859
## [2,] -0.4764016 -0.17444291 -0.1223336 -0.1435064
## [3,] -0.5008930 -0.08545501 -0.1418316 0.1791037
Before running K-means we need to choose the number of clusters. We use three methods to determine a suitable k: the elbow method, the silhouette method, and the gap statistic.
# Elbow Method
#install.packages("factoextra")
library(factoextra)
#Using elbow method
fviz_nbclust(data3_scale[,1:4], kmeans, method = "wss") + theme_minimal()
Observation: The elbow method suggests that the optimal number of clusters is 2. We'll use the Silhouette method and the Gap statistic to confirm this.
# Silhouette Method
fviz_nbclust(data3_scale[,1:4], kmeans, method = "silhouette") + theme_minimal()
Observation: The Silhouette method suggests that the optimal number of
clusters is 2. We’ll now use the Gap statistic to confirm this.
# Gap Statistic
fviz_nbclust(data3_scale[,1:4], kmeans, method = "gap_stat") + theme_minimal()
Observation: The Gap statistic suggests that the optimal number of
clusters is 2. We’ll now perform K-means clustering with 2 clusters
using the kmeans() function.
# Perform K-means clustering
set.seed(42)
kmeans_model <- kmeans(data3_scale, centers = 2, nstart = 25, iter.max = 10)
summary(kmeans_model)
## Length Class Mode
## cluster 316 -none- numeric
## centers 8 -none- numeric
## totss 1 -none- numeric
## withinss 2 -none- numeric
## tot.withinss 1 -none- numeric
## betweenss 1 -none- numeric
## size 2 -none- numeric
## iter 1 -none- numeric
## ifault 1 -none- numeric
Observation: The K-means model has been fitted with 2 clusters. summary() on a kmeans object lists its components (cluster assignments, centers, within-cluster sums of squares, and sizes) rather than their values; the centers and sizes themselves can be printed directly, as shown below.
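A small optional check: the centroid coordinates (on the scaled features) and the cluster sizes are stored in the model object.
# Cluster centroids on the standardized scale, and number of points per cluster
kmeans_model$centers
kmeans_model$size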
# Classification visualization
fviz_cluster(kmeans_model, data = data3_scale, geom = "point")
Observation: The cluster plot projects the data onto the first two principal components and colours each point by its assigned cluster.
# Visualize the clusters
library(cluster)
clusplot(data3_scale, kmeans_model$cluster,
color = TRUE, shade = TRUE, labels = 2)
Observation: clusplot() displays the clusters on the first two principal components, with shading and ellipses marking each cluster.
# Create the visualization
library(factoextra)
fviz_cluster(kmeans_model, data = data3_scale,
ellipse.type = "convex", # Adds convex hulls
repel = TRUE) # Avoids label overlapping
#K-means with k = 3
set.seed(42)
kmeans_model2 <- kmeans(data3_scale, centers = 3, nstart = 25, iter.max = 10)
#summary(kmeans_model2)
# Classification visualization using fviz
fviz_cluster(kmeans_model2, data = data3_scale, geom = "point")
# Visualize the clusters using clusplot
library(cluster)
clusplot(data3_scale, kmeans_model2$cluster,
color = TRUE, shade = TRUE, labels = 2)
# Create the visualization using convex
library(factoextra)
fviz_cluster(kmeans_model2, data = data3_scale,
ellipse.type = "convex", # Adds convex hulls
repel = TRUE) # Avoids label overlapping
# Assign the three clusters to the data; they can then be mapped to low, medium and high price tiers
# Add cluster assignments to original data
data2$cluster <- kmeans_model2$cluster
head(data2)
## HousePrice StoreArea BasementArea LawnArea HouseNetWorth cluster
## 1 138800 29.9 75 11.223911 Low 2
## 2 155000 44.0 504 9.689869 Medium 2
## 3 152000 46.2 493 10.192613 Medium 2
## 4 160000 46.2 510 6.817316 Medium 2
## 5 226000 48.7 445 10.916215 Medium 2
## 6 275000 56.4 1148 9.000686 High 1
Observation: The data points have been assigned to the three K-means clusters. Note that kmeans() returns arbitrary integer labels (1-3), so the "Low priced," "Medium priced," and "High priced" names still need to be attached explicitly, as sketched below.
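One reasonable way to attach the price-tier names (an assumption, not part of the original output) is to rank the clusters by their mean HousePrice and relabel accordingly.
# Order clusters by average house price, then relabel accordingly
avg_price <- tapply(data2$HousePrice, data2$cluster, mean)
tier <- c("Low priced", "Medium priced", "High priced")[rank(avg_price)]  # cheapest cluster first
data2$cluster_name <- tier[data2$cluster]
head(data2[, c("HousePrice", "HouseNetWorth", "cluster", "cluster_name")])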
#3. K-Medians Clustering
K-medians clustering partitions data points into k groups, but uses the median (rather than the mean) of each cluster as its centre, which makes it more robust to outliers.
The K-medians algorithm is a method of vector quantization, originally from signal processing, that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest median (the cluster centre), typically measured with the Manhattan (L1) distance. This induces a nearest-centre partitioning of the data space.
The K-medians algorithm is used in many applications, including data mining, machine learning, pattern recognition, image analysis, bioinformatics, and anomaly detection.
We use pam() (Partitioning Around Medoids) from the cluster package with the Manhattan metric as a robust, medoid-based stand-in for K-medians.
#load data
data4 <- read.csv("M3_House_Worth.csv")
attach(data4)
## The following objects are masked from data2:
##
## BasementArea, HouseNetWorth, HousePrice, LawnArea, StoreArea
## The following objects are masked from data:
##
## BasementArea, HouseNetWorth, HousePrice, LawnArea, StoreArea
(head(data4,3))
## HousePrice StoreArea BasementArea LawnArea HouseNetWorth
## 1 138800 29.9 75 11.223911 Low
## 2 155000 44.0 504 9.689869 Medium
## 3 152000 46.2 493 10.192613 Medium
# Select numerical features for clustering
data5 <- data4[,c(1,2,3,4)]
head(data5,3)
## HousePrice StoreArea BasementArea LawnArea
## 1 138800 29.9 75 11.223911
## 2 155000 44.0 504 9.689869
## 3 152000 46.2 493 10.192613
# Normalize the data
data5_scale <- scale(data5)
head(data5_scale,3)
## HousePrice StoreArea BasementArea LawnArea
## [1,] -0.6086554 -0.74477444 -0.8827550 0.8408859
## [2,] -0.4764016 -0.17444291 -0.1223336 -0.1435064
## [3,] -0.5008930 -0.08545501 -0.1418316 0.1791037
# Perform K-medians clustering
library(cluster)
set.seed(42)
kmedians <- pam(data5_scale, k = 3, metric = "manhattan")
summary(kmedians)
## Medoids:
## ID HousePrice StoreArea BasementArea LawnArea
## [1,] 229 -0.9294933 -0.886346097 -1.0156958 -0.31325763
## [2,] 121 -0.3294529 0.003532891 -0.3066782 -0.08886448
## [3,] 226 0.7563343 1.168465385 1.1432629 0.28049931
## Clustering vector:
## [1] 1 2 2 2 2 3 2 3 3 2 1 1 1 1 3 1 1 3 2 1 1 3 3 3 1 2 3 1 2 1 1 2 2 3 3 3 2
## [38] 3 1 1 3 1 3 1 2 1 1 3 2 3 1 1 1 3 1 3 3 3 3 1 1 2 3 1 2 3 2 1 3 1 3 1 1 3
## [75] 1 1 2 2 2 1 3 2 1 3 3 3 3 1 3 1 1 2 1 3 3 1 1 2 3 3 1 2 2 2 2 1 2 3 3 3 3
## [112] 1 3 2 1 1 3 1 1 2 2 1 2 3 1 3 1 1 3 2 1 2 3 3 1 1 3 1 3 3 3 3 2 3 3 1 1 1
## [149] 2 3 1 1 1 3 1 2 1 3 3 1 3 3 1 1 3 2 1 1 1 2 1 1 3 1 1 2 3 3 2 2 1 2 3 3 3
## [186] 3 3 1 1 1 1 2 3 2 3 1 3 1 1 2 3 3 3 2 2 2 3 3 3 1 2 3 1 1 1 2 1 1 3 1 3 1
## [223] 1 3 1 3 1 2 1 1 3 1 1 3 3 3 3 3 1 3 2 1 1 1 1 3 3 3 3 2 3 2 1 3 2 1 1 3 1
## [260] 3 1 3 3 1 3 1 1 1 3 2 3 3 3 3 3 2 2 1 1 1 1 1 3 1 3 3 1 1 3 1 3 1 3 2 1 1
## [297] 3 1 3 1 1 1 3 3 3 1 3 2 3 1 3 1 3 1 3 1
## Objective function:
## build swap
## 1.412042 1.412042
##
## Numerical information per cluster:
## size max_diss av_diss diameter separation
## [1,] 131 3.485703 1.038048 5.520093 0.5241868
## [2,] 61 3.800782 1.090812 5.910857 0.5241868
## [3,] 124 8.822645 1.965172 12.406079 0.5748856
##
## Isolated clusters:
## L-clusters: character(0)
## L*-clusters: character(0)
##
## Silhouette plot information:
## cluster neighbor sil_width
## 268 1 2 0.6617098809
## 239 1 2 0.6611502208
## 222 1 2 0.6581987461
## 164 1 2 0.6580703576
## 244 1 2 0.6574009050
## 168 1 2 0.6563846141
## 44 1 2 0.6563813039
## 73 1 2 0.6517139431
## 40 1 2 0.6506182890
## 163 1 2 0.6463300080
## 112 1 2 0.6449818469
## 93 1 2 0.6448695394
## 296 1 2 0.6429913750
## 215 1 2 0.6412140757
## 169 1 2 0.6393082312
## 261 1 2 0.6377599978
## 229 1 2 0.6366882554
## 188 1 2 0.6338175664
## 281 1 2 0.6319547454
## 175 1 2 0.6294187291
## 138 1 2 0.6274242277
## 213 1 2 0.6270310117
## 198 1 2 0.6248666083
## 28 1 2 0.6221336148
## 253 1 2 0.6215665824
## 298 1 2 0.6206858851
## 91 1 2 0.6178702367
## 127 1 2 0.6178186897
## 30 1 2 0.6178176540
## 17 1 2 0.6149639502
## 122 1 2 0.6131761067
## 53 1 2 0.6129373717
## 11 1 2 0.6055084140
## 232 1 2 0.6051892024
## 152 1 2 0.6050403879
## 115 1 2 0.5996537972
## 128 1 2 0.5978830907
## 233 1 2 0.5944367121
## 174 1 2 0.5943716467
## 227 1 2 0.5932870486
## 214 1 2 0.5923095843
## 280 1 2 0.5919813933
## 151 1 2 0.5905115762
## 257 1 2 0.5879307753
## 39 1 2 0.5874369242
## 167 1 2 0.5808857425
## 210 1 2 0.5797096324
## 292 1 2 0.5792947954
## 16 1 2 0.5783110686
## 243 1 2 0.5782383288
## 13 1 2 0.5780570405
## 256 1 2 0.5773281600
## 223 1 2 0.5771579467
## 278 1 2 0.5769840968
## 181 1 2 0.5759846669
## 76 1 2 0.5758338652
## 131 1 2 0.5753044249
## 302 1 2 0.5742310792
## 42 1 2 0.5731983895
## 284 1 2 0.5727700516
## 21 1 2 0.5709101719
## 118 1 2 0.5632381640
## 153 1 2 0.5615522173
## 196 1 2 0.5580645164
## 266 1 2 0.5566408653
## 61 1 2 0.5550615390
## 199 1 2 0.5541610831
## 119 1 2 0.5507968659
## 259 1 2 0.5489072396
## 64 1 2 0.5487564615
## 47 1 2 0.5482356731
## 31 1 2 0.5433772540
## 230 1 2 0.5369451575
## 125 1 2 0.5256326724
## 160 1 2 0.5237546315
## 55 1 2 0.5185763994
## 51 1 2 0.5147971567
## 136 1 2 0.5130994405
## 20 1 2 0.5072491791
## 171 1 2 0.5031999938
## 267 1 2 0.4950950489
## 287 1 2 0.4930125977
## 68 1 2 0.4895160005
## 46 1 2 0.4883337108
## 301 1 2 0.4866386726
## 288 1 2 0.4822921216
## 90 1 2 0.4785870942
## 72 1 2 0.4744828611
## 96 1 2 0.4736438184
## 242 1 2 0.4698331772
## 220 1 2 0.4688567866
## 218 1 2 0.4679865685
## 314 1 2 0.4639638547
## 75 1 2 0.4630952175
## 245 1 2 0.4628192486
## 172 1 2 0.4622140450
## 290 1 2 0.4576214292
## 25 1 2 0.4449071920
## 191 1 2 0.4413977393
## 148 1 2 0.4374031733
## 190 1 2 0.4303362007
## 97 1 2 0.4196395924
## 80 1 2 0.4133797149
## 70 1 2 0.4120505277
## 295 1 2 0.4109259940
## 157 1 2 0.4105675997
## 310 1 2 0.4083390456
## 316 1 2 0.4025960423
## 189 1 2 0.3910579249
## 147 1 2 0.3693183473
## 88 1 2 0.3681025756
## 146 1 2 0.3661469171
## 101 1 2 0.3611287546
## 52 1 2 0.3464682341
## 116 1 2 0.3407345029
## 14 1 2 0.3358821858
## 106 1 2 0.3293899604
## 217 1 2 0.3079177144
## 83 1 2 0.2974697010
## 282 1 2 0.2735922022
## 279 1 2 0.2711273010
## 155 1 2 0.2449151186
## 135 1 2 0.2255088411
## 264 1 2 0.2210306593
## 312 1 2 0.2162382362
## 306 1 2 0.2125157702
## 1 1 2 0.1999397278
## 300 1 2 0.1774370261
## 225 1 2 0.1389991288
## 60 1 2 0.1381827977
## 12 1 2 0.0790465011
## 294 2 1 0.6194821601
## 276 2 1 0.6140001286
## 65 2 1 0.6126896969
## 211 2 1 0.6096018286
## 176 2 1 0.6078874840
## 130 2 1 0.6062683799
## 216 2 1 0.6000051831
## 7 2 1 0.5937277743
## 5 2 1 0.5901120124
## 182 2 1 0.5894467553
## 121 2 1 0.5884166048
## 200 2 1 0.5855228217
## 250 2 1 0.5766583197
## 123 2 1 0.5733467983
## 45 2 1 0.5714570256
## 228 2 1 0.5676345405
## 205 2 1 0.5575192656
## 3 2 1 0.5572578756
## 194 2 1 0.5511206272
## 270 2 1 0.5420470483
## 156 2 1 0.5383816410
## 170 2 1 0.5325074009
## 192 2 1 0.5291847630
## 105 2 1 0.5274995904
## 103 2 1 0.5233875644
## 10 2 1 0.5215091721
## 2 2 1 0.5171843694
## 107 2 1 0.5113015434
## 37 2 1 0.5073181425
## 179 2 1 0.5056460660
## 149 2 1 0.4997497286
## 102 2 1 0.4987230860
## 78 2 1 0.4823894585
## 82 2 1 0.4712982825
## 166 2 1 0.4675951229
## 143 2 1 0.4656947153
## 120 2 1 0.4654995356
## 49 2 1 0.4529787344
## 241 2 1 0.4515657343
## 277 2 1 0.4380118575
## 67 2 1 0.4166839642
## 98 2 1 0.3779468374
## 308 2 3 0.3744536843
## 92 2 1 0.3280351990
## 62 2 1 0.3151938688
## 114 2 1 0.3090971894
## 77 2 1 0.3066664508
## 104 2 1 0.3056337633
## 33 2 1 0.3027156388
## 204 2 1 0.3000038898
## 255 2 3 0.2931695061
## 4 2 1 0.2903282487
## 180 2 1 0.2853244514
## 252 2 3 0.2761636957
## 32 2 1 0.2711459542
## 19 2 1 0.2703009074
## 26 2 3 0.2381279748
## 206 2 3 0.2187413315
## 79 2 3 0.1257787759
## 132 2 1 0.0879460701
## 29 2 1 0.0394093160
## 142 3 2 0.5732539277
## 56 3 2 0.5692458124
## 27 3 2 0.5675090472
## 238 3 2 0.5634592282
## 108 3 2 0.5622317506
## 87 3 2 0.5597019957
## 291 3 2 0.5581524834
## 50 3 2 0.5564390720
## 9 3 2 0.5555684790
## 18 3 2 0.5519395993
## 95 3 2 0.5484154020
## 209 3 2 0.5480396596
## 263 3 2 0.5470309090
## 81 3 2 0.5468708443
## 265 3 2 0.5460011841
## 99 3 2 0.5454551156
## 41 3 2 0.5447195693
## 111 3 2 0.5403554148
## 297 3 2 0.5380927848
## 158 3 2 0.5361525243
## 240 3 2 0.5306644723
## 226 3 2 0.5300532604
## 74 3 2 0.5276690279
## 69 3 2 0.5251986993
## 246 3 2 0.5230122443
## 235 3 2 0.5212522646
## 249 3 2 0.5103423734
## 262 3 2 0.5005259940
## 117 3 2 0.4998802845
## 269 3 2 0.4962539343
## 129 3 2 0.4951520106
## 219 3 2 0.4918313317
## 24 3 2 0.4910705753
## 307 3 2 0.4874921809
## 203 3 2 0.4866578119
## 236 3 2 0.4863175960
## 273 3 2 0.4831820303
## 274 3 2 0.4776584697
## 289 3 2 0.4751891968
## 311 3 2 0.4744295076
## 260 3 2 0.4725343895
## 86 3 2 0.4682725334
## 161 3 2 0.4677919524
## 186 3 2 0.4669798055
## 195 3 2 0.4663240815
## 231 3 2 0.4649726033
## 137 3 2 0.4647214319
## 224 3 2 0.4614711517
## 150 3 2 0.4578736328
## 184 3 2 0.4546288695
## 234 3 2 0.4446393199
## 66 3 2 0.4422713473
## 133 3 2 0.4382107770
## 187 3 2 0.4284327842
## 247 3 2 0.4257551774
## 258 3 2 0.4223164576
## 43 3 2 0.4210585195
## 154 3 2 0.4203459486
## 299 3 2 0.4139226176
## 140 3 2 0.4135778600
## 84 3 2 0.4118414085
## 35 3 2 0.4108832297
## 59 3 2 0.4081823365
## 177 3 2 0.4070440840
## 165 3 2 0.3974722561
## 237 3 2 0.3963710405
## 110 3 2 0.3940630201
## 71 3 2 0.3930854629
## 57 3 2 0.3901753756
## 303 3 2 0.3857788051
## 109 3 2 0.3837549196
## 185 3 2 0.3780713350
## 286 3 2 0.3753449834
## 315 3 2 0.3727446895
## 309 3 2 0.3702672492
## 134 3 2 0.3542099903
## 202 3 2 0.3512761352
## 304 3 2 0.3475358053
## 173 3 2 0.3430783451
## 23 3 2 0.3417085857
## 248 3 2 0.3401941808
## 159 3 2 0.3394109232
## 38 3 2 0.3378700513
## 58 3 2 0.3250503006
## 207 3 2 0.3248106388
## 144 3 2 0.3236126804
## 54 3 2 0.3178829903
## 141 3 2 0.3135990171
## 22 3 2 0.3134218685
## 283 3 2 0.2875329379
## 113 3 2 0.2583325040
## 36 3 2 0.2530923054
## 15 3 2 0.2508491901
## 313 3 2 0.2492584689
## 272 3 2 0.1993447380
## 8 3 2 0.1885170428
## 305 3 2 0.1720941073
## 100 3 2 0.1675335876
## 89 3 2 0.1657144354
## 124 3 2 0.1438096909
## 275 3 2 0.1383467678
## 34 3 2 0.1337208857
## 178 3 2 0.1324638650
## 212 3 2 0.1160429932
## 63 3 2 0.1017507453
## 126 3 2 0.1016448844
## 271 3 2 0.0982372161
## 293 3 2 0.0921486417
## 197 3 2 0.0904958495
## 6 3 2 0.0827873125
## 48 3 2 0.0726827093
## 139 3 2 -0.0005316276
## 193 3 2 -0.0017411301
## 201 3 2 -0.0111769463
## 208 3 2 -0.0148978862
## 94 3 2 -0.0163081130
## 285 3 2 -0.0246584193
## 251 3 2 -0.0366183369
## 254 3 2 -0.0575974951
## 85 3 2 -0.1020291014
## 183 3 2 -0.1109709107
## 162 3 2 -0.1113224469
## 145 3 2 -0.1289099725
## 221 3 2 -0.1562342434
## Average silhouette width per cluster:
## [1] 0.5076544 0.4484344 0.3488767
## Average silhouette width of total data set:
## [1] 0.4339175
##
## Available components:
## [1] "medoids" "id.med" "clustering" "objective" "isolation"
## [6] "clusinfo" "silinfo" "diss" "call" "data"
# Classification visualization
fviz_cluster(kmedians, data = data5_scale, geom = "point")
Observation: K-medians does not partition this data as cleanly as K-means or EM clustering; the clusters are not well separated, and the classification plot shows overlapping points.
K-medians (via PAM) is still worth considering because:
* It is more robust to outliers than K-means (medians/medoids instead of means).
* It works naturally with the Manhattan distance (L1 norm) instead of the Euclidean (L2) norm.
* PAM can operate on any dissimilarity matrix, so it extends to data types that plain K-means cannot handle.
#4. DBSCAN Clustering
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a clustering algorithm that groups together data points that are closely packed (dense regions) while marking points in low-density regions as outliers. It is based on the concepts of density reachability and density connectivity.
The DBSCAN algorithm is used in many applications, including data mining, machine learning, pattern recognition, image analysis, bioinformatics, and anomaly detection.
#load data
data6 <- read.csv("M3_House_Worth.csv")
attach(data6)
## The following objects are masked from data4:
##
## BasementArea, HouseNetWorth, HousePrice, LawnArea, StoreArea
## The following objects are masked from data2:
##
## BasementArea, HouseNetWorth, HousePrice, LawnArea, StoreArea
## The following objects are masked from data:
##
## BasementArea, HouseNetWorth, HousePrice, LawnArea, StoreArea
(head(data6,3))
## HousePrice StoreArea BasementArea LawnArea HouseNetWorth
## 1 138800 29.9 75 11.223911 Low
## 2 155000 44.0 504 9.689869 Medium
## 3 152000 46.2 493 10.192613 Medium
# Select numerical features for clustering
data7 <- data6[,c(1,2,3,4)]
head(data7,3)
## HousePrice StoreArea BasementArea LawnArea
## 1 138800 29.9 75 11.223911
## 2 155000 44.0 504 9.689869
## 3 152000 46.2 493 10.192613
# Normalize the data
data7_scale <- scale(data7)
head(data7_scale)
## HousePrice StoreArea BasementArea LawnArea
## [1,] -0.6086554 -0.74477444 -0.8827550 0.8408859
## [2,] -0.4764016 -0.17444291 -0.1223336 -0.1435064
## [3,] -0.5008930 -0.08545501 -0.1418316 0.1791037
## [4,] -0.4355825 -0.08545501 -0.1116983 -1.9868181
## [5,] 0.1032292 0.01566760 -0.2269137 0.6434377
## [6,] 0.5032561 0.32712525 1.0191848 -0.5857539
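A common way to choose eps before running DBSCAN is the k-distance plot: sort each point's distance to its MinPts-th nearest neighbour and look for a knee. The sketch below uses base R only and is illustrative; the eps = 0.5 reference line simply echoes the value used in the dbscan() call that follows.
# k-distance plot: each point's distance to its 5th nearest neighbour, sorted
d_mat <- as.matrix(dist(data7_scale))
knn5 <- apply(d_mat, 1, function(row) sort(row)[6])   # 6th smallest; the first is the point itself (0)
plot(sort(knn5), type = "l",
     xlab = "Points sorted by 5-NN distance", ylab = "5-NN distance",
     main = "k-distance plot for choosing eps")
abline(h = 0.5, lty = 2)                              # eps value used below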
# Perform DBSCAN clustering
library(fpc)
set.seed(42)
dbscan_model <- dbscan(data7_scale, eps = 0.5, MinPts = 5)
summary(dbscan_model)
## Length Class Mode
## cluster 316 -none- numeric
## eps 1 -none- numeric
## MinPts 1 -none- numeric
## isseed 316 -none- logical
# Classification visualization
fviz_cluster(dbscan_model, data = data7_scale, geom = "point")
Observation: With eps = 0.5 and MinPts = 5, DBSCAN finds several dense clusters and labels the remaining low-density points as noise (cluster 0). The classification plot shows the clusters together with the noise points.
# Assign cluster labels to the original data
data6$cluster <- dbscan_model$cluster
head(data6)
## HousePrice StoreArea BasementArea LawnArea HouseNetWorth cluster
## 1 138800 29.9 75 11.223911 Low 2
## 2 155000 44.0 504 9.689869 Medium 1
## 3 152000 46.2 493 10.192613 Medium 1
## 4 160000 46.2 510 6.817316 Medium 4
## 5 226000 48.7 445 10.916215 Medium 1
## 6 275000 56.4 1148 9.000686 High 0
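A small optional check (not in the original output): the exact number of clusters and noise points can be read off the cluster vector; label 0 denotes noise.
# Count points per DBSCAN cluster; label 0 is noise
table(dbscan_model$cluster)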
Why use DBSCAN?
* It works well with non-linear data and clusters of complex shape.
* It handles noise and outliers explicitly (they are labelled as cluster 0).
* It does not require the number of clusters as an input.
#5. Identify/Describe at least three other clustering methods.
Hierarchical Clustering: Hierarchical clustering builds a hierarchy of clusters by either successively merging (agglomerative) or splitting (divisive) groups of points based on their pairwise distance or similarity. It is widely used in data mining, machine learning, pattern recognition, image analysis, and bioinformatics.
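For completeness, a minimal hierarchical clustering sketch on the same scaled features; the Ward linkage and the cut at k = 3 are illustrative choices, not part of the original analysis.
# Hierarchical clustering on the scaled house features (illustrative)
hc <- hclust(dist(data7_scale), method = "ward.D2")
plot(hc, labels = FALSE, main = "Ward hierarchical clustering")  # dendrogram
hc_clusters <- cutree(hc, k = 3)                                 # cut the tree into 3 groups
table(hc_clusters)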
Spectral Clustering: Spectral clustering groups data points using the eigenvectors of a similarity (affinity) matrix; the spectral embedding makes it possible to separate clusters with complex, non-convex shapes. It is widely used in data mining, machine learning, pattern recognition, and image analysis.
# Perform Spectral clustering
#load data
data7 <- read.csv("M3_House_Worth.csv")
attach(data7)
## The following objects are masked from data6:
##
## BasementArea, HouseNetWorth, HousePrice, LawnArea, StoreArea
## The following objects are masked from data4:
##
## BasementArea, HouseNetWorth, HousePrice, LawnArea, StoreArea
## The following objects are masked from data2:
##
## BasementArea, HouseNetWorth, HousePrice, LawnArea, StoreArea
## The following objects are masked from data:
##
## BasementArea, HouseNetWorth, HousePrice, LawnArea, StoreArea
(head(data7,3))
## HousePrice StoreArea BasementArea LawnArea HouseNetWorth
## 1 138800 29.9 75 11.223911 Low
## 2 155000 44.0 504 9.689869 Medium
## 3 152000 46.2 493 10.192613 Medium
library(cluster)
# Select numerical features for clustering
data7 <- data7[,c(1,2,3,4)]
head(data7,3)
## HousePrice StoreArea BasementArea LawnArea
## 1 138800 29.9 75 11.223911
## 2 155000 44.0 504 9.689869
## 3 152000 46.2 493 10.192613
# Normalize the data
data8_scale <- scale(data7)
library(kernlab)
##
## Attaching package: 'kernlab'
## The following object is masked from 'package:ggplot2':
##
## alpha
set.seed(42)
spectral_model <- specc(data8_scale, centers = 3)
# Visualize the clusters on the scaled features
plot(data8_scale, col = spectral_model, pch = 19, main = "Spectral Clustering")
Observation: The spectral clustering algorithm has produced 3 clusters from the spectral embedding of the similarity matrix; the pairwise scatter plots show the cluster assignments on the scaled features.
We can tune the model to produce two clusters instead.
#Tune the model
sigma_est <- sigest(data8_scale, frac = 0.1)[2] # Estimate sigma
spectral_model <- specc(data8_scale, centers = 2, kernel = "rbfdot", sigma = sigma_est)
summary(spectral_model)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 1.000 2.000 1.585 2.000 2.000
#visualize clusters
plot(data8_scale, col = spectral_model, pch = 19, main = "Spectral Clustering")
We now have 2 distinct clusters from the spectral clustering algorithm; the plot shows how the observations split across the two groups.
Spectral clustering is most useful when the data has complex shapes (e.g., nested circles) and traditional methods such as K-means fail. It is less suitable when the data is very high-dimensional (reduce with PCA first) or when computational efficiency is critical, since it requires an eigendecomposition of the similarity matrix.
# Assign cluster names (the label order follows the arbitrary cluster index; the
# head() output below suggests the names come out reversed relative to actual prices,
# so the mapping should be checked against per-cluster mean HousePrice)
cluster_names <- c("Low priced", "High priced")
spectral_model <- factor(spectral_model, labels = cluster_names)
# Add cluster assignments to original data
data7$cluster <- spectral_model
head(data7)
## HousePrice StoreArea BasementArea LawnArea cluster
## 1 138800 29.9 75 11.223911 High priced
## 2 155000 44.0 504 9.689869 High priced
## 3 152000 46.2 493 10.192613 High priced
## 4 160000 46.2 510 6.817316 High priced
## 5 226000 48.7 445 10.916215 High priced
## 6 275000 56.4 1148 9.000686 Low priced
Affinity Propagation: Affinity propagation clusters data by exchanging "responsibility" and "availability" messages between points until a set of exemplars (representative data points) and their clusters emerges; the number of clusters is not fixed in advance but is controlled by a preference parameter.
# Perform Affinity Propagation clustering
#load data
data9 <- read.csv("M3_House_Worth.csv")
attach(data9)
## The following objects are masked from data7:
##
## BasementArea, HouseNetWorth, HousePrice, LawnArea, StoreArea
## The following objects are masked from data6:
##
## BasementArea, HouseNetWorth, HousePrice, LawnArea, StoreArea
## The following objects are masked from data4:
##
## BasementArea, HouseNetWorth, HousePrice, LawnArea, StoreArea
## The following objects are masked from data2:
##
## BasementArea, HouseNetWorth, HousePrice, LawnArea, StoreArea
## The following objects are masked from data:
##
## BasementArea, HouseNetWorth, HousePrice, LawnArea, StoreArea
(head(data9,3))
## HousePrice StoreArea BasementArea LawnArea HouseNetWorth
## 1 138800 29.9 75 11.223911 Low
## 2 155000 44.0 504 9.689869 Medium
## 3 152000 46.2 493 10.192613 Medium
# Select numerical features for clustering
data9 <- data9[,c(1,2,3,4)]
head(data9,3)
## HousePrice StoreArea BasementArea LawnArea
## 1 138800 29.9 75 11.223911
## 2 155000 44.0 504 9.689869
## 3 152000 46.2 493 10.192613
# Normalize the data
data10_scale <- scale(data9)
library(apcluster)
##
## Attaching package: 'apcluster'
## The following object is masked from 'package:stats':
##
## heatmap
set.seed(42)
# Perform Affinity Propagation clustering
model7 <- apcluster(negDistMat(r = 2), data10_scale, q = 0.5)
# Summarize the clustering result
summary(model7)
## Length Class Mode
## 18 APResult S4
clusters <- model7@clusters # List of clusters
print(clusters)
## [[1]]
## 6 8 26 34 36 145 178 197 208 272
## 6 8 26 34 36 145 178 197 208 272
##
## [[2]]
## 4 19 32 33 62 92 300
## 4 19 32 33 62 92 300
##
## [[3]]
## 35 58 66 69 86 109 117 126 129 134 141 144 159 161 185 186 203 231 236 246
## 35 58 66 69 86 109 117 126 129 134 141 144 159 161 185 186 203 231 236 246
## 248 249 251 254 260 263 273 274 286 289 293 311 313
## 248 249 251 254 260 263 273 274 286 289 293 311 313
##
## [[4]]
## 27 43 50 56 74 87 95 99 108 111 137 142 219 224 238 240 291 297 303
## 27 43 50 56 74 87 95 99 108 111 137 142 219 224 238 240 291 297 303
##
## [[5]]
## 63 94 204 283
## 63 94 204 283
##
## [[6]]
## 100
## 100
##
## [[7]]
## 2 7 10 29 37 49 65 67 77 78 82 98 102 104 105 107 121 123 149 156
## 2 7 10 29 37 49 65 67 77 78 82 98 102 104 105 107 121 123 149 156
## 179 180 192 216 228 241 270 276 277 294
## 179 180 192 216 228 241 270 276 277 294
##
## [[8]]
## 3 5 45 103 114 120 130 143 166 170 176 182 194 200 205 211 250
## 3 5 45 103 114 120 130 143 166 170 176 182 194 200 205 211 250
##
## [[9]]
## 12 60 83 132 155 264 306 312
## 12 60 83 132 155 264 306 312
##
## [[10]]
## 21 25 28 39 40 42 44 51 52 53 64 73 93 96 122 125 136 152 160 163
## 21 25 28 39 40 42 44 51 52 53 64 73 93 96 122 125 136 152 160 163
## 164 168 169 171 175 188 190 196 198 218 220 229 239 244 259 261 267 268 280 281
## 164 168 169 171 175 188 190 196 198 218 220 229 239 244 259 261 267 268 280 281
## 284 298
## 284 298
##
## [[11]]
## 13 16 31 61 76 112 119 138 174 213 215 222 223 253 256 292 302
## 13 16 31 61 76 112 119 138 174 213 215 222 223 253 256 292 302
##
## [[12]]
## 140 150 234
## 140 150 234
##
## [[13]]
## 84 165 237 299
## 84 165 237 299
##
## [[14]]
## 1 11 14 17 20 30 46 47 55 72 75 80 88 91 97 101 106 115 118 127
## 1 11 14 17 20 30 46 47 55 72 75 80 88 91 97 101 106 115 118 127
## 128 131 135 146 148 151 153 157 167 172 181 189 191 199 210 214 217 225 227 230
## 128 131 135 146 148 151 153 157 167 172 181 189 191 199 210 214 217 225 227 230
## 232 233 242 243 257 266 278 279 282 287 295 296 301 310 314 316
## 232 233 242 243 257 266 278 279 282 287 295 296 301 310 314 316
##
## [[15]]
## 9 15 18 24 41 54 71 81 113 133 154 158 173 177 184 187 195 207 209 226
## 9 15 18 24 41 54 71 81 113 133 154 158 173 177 184 187 195 207 209 226
## 235 247 258 262 265 269 304 307 315
## 235 247 258 262 265 269 304 307 315
##
## [[16]]
## 68 70 90 116 147 245 288 290
## 68 70 90 116 147 245 288 290
##
## [[17]]
## 22 23 38 48 79 85 89 110 124 139 162 183 193 201 202 206 212 221 252 255
## 22 23 38 48 79 85 89 110 124 139 162 183 193 201 202 206 212 221 252 255
## 271 275 285 305 308
## 271 275 285 305 308
##
## [[18]]
## 57 59 309
## 57 59 309
exemplars <- model7@exemplars # Indices of exemplars (centers)
print(exemplars)
## 6 32 86 87 94 100 121 130 155 164 213 234 237 243 262 288 305 309
## 6 32 86 87 94 100 121 130 155 164 213 234 237 243 262 288 305 309
n_clusters <- length(clusters) # Number of clusters
print(n_clusters)
## [1] 18
#Convert to Vector Format
cluster_labels <- rep(0, nrow(data10_scale))
for (i in 1:length(clusters)) {
cluster_labels[clusters[[i]]] <- i
}
print(i)
## [1] 18
#PCA for Dimensionality Reduction (if data has >2 features)
pca_result <- prcomp(data10_scale, scale. = TRUE)
pca_data <- as.data.frame(pca_result$x[, 1:2]) # First 2 PCs
pca_data$Cluster <- as.factor(cluster_labels)
#Plot
library(ggplot2)
ggplot(pca_data, aes(x = PC1, y = PC2, color = Cluster)) +
geom_point(size = 3, alpha = 0.7) +
ggtitle("Affinity Propagation Clustering") +
theme_minimal()
Observation: With q = 0.5, affinity propagation selects 18 exemplars and therefore 18 fairly fine-grained clusters based on the negative squared-distance similarities; the PCA plot shows how these clusters lie in the space of the first two principal components.
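If a specific number of clusters is wanted, the apcluster package also provides apclusterK(), which searches for a preference that yields (approximately) K clusters. A brief sketch, assuming the same similarity function; K = 3 is an illustrative choice.
# Affinity propagation constrained to about 3 clusters (illustrative)
model7_k3 <- apclusterK(negDistMat(r = 2), data10_scale, K = 3)
length(model7_k3@clusters)   # should report 3 clusters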
#6. Gradient Operator and Gradient Descent
The gradient operator is a mathematical operator that represents the rate of change of a function at a given point. It is a vector that points in the direction of the steepest ascent of the function. The gradient operator is denoted by the symbol ∇ (nabla) and is used to calculate the gradient of a function.
The gradient descent algorithm is an optimization algorithm used to minimize a function by iteratively moving in the direction of the negative gradient of the function. The algorithm starts at an initial point and takes steps in the direction of the negative gradient until it reaches a local minimum. The gradient descent algorithm is used in many machine learning algorithms, including linear regression, logistic regression, and neural networks.
The gradient descent algorithm can be summarized as follows:
Initialize the parameters (weights) of the model.
Calculate the gradient of the loss function with respect to the parameters.
Update the parameters by moving in the direction of the negative gradient.
Repeat steps 2 and 3 until the algorithm converges to a local minimum.
The gradient descent algorithm is used to optimize the parameters of a model by minimizing the loss function. It is an iterative algorithm that requires tuning of hyperparameters such as the learning rate and the number of iterations.
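As a concrete illustration of the four steps, here is a minimal sketch minimizing the one-variable function f(w) = (w - 3)^2, whose gradient is 2(w - 3); the starting point, learning rate and iteration count are illustrative choices.
# Gradient descent on f(w) = (w - 3)^2; the minimum is at w = 3
w <- 0            # step 1: initialize the parameter
alpha <- 0.1      # learning rate
for (i in 1:100) {
  grad <- 2 * (w - 3)       # step 2: gradient of the loss at the current w
  w <- w - alpha * grad     # step 3: move against the gradient
}                           # step 4: repeat until (approximate) convergence
w                           # close to 3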
A loss function is a mathematical function that measures the difference between the predicted values of a model and the actual values of the data. It is used to quantify the error or loss of a model and is an essential component of machine learning algorithms. The loss function is used to optimize the parameters of a model by minimizing the error between the predicted and actual values.
The loss function is important in machine learning for the following reasons:
Optimization: The loss function is used to optimize the parameters of a model by minimizing the error between the predicted and actual values. It guides the learning process of the model and helps improve its performance.
Evaluation: The loss function is used to evaluate the performance of a model. A lower loss value indicates better performance, while a higher loss value indicates poorer performance.
Generalization: The loss function helps prevent overfitting by penalizing complex models that perform well on the training data but poorly on unseen data. It encourages the model to generalize well to new data.
Interpretability: The loss function provides insights into the behavior of the model and the quality of its predictions. It helps identify areas for improvement and guides the selection of hyperparameters.
Comparison: The loss function allows for the comparison of different models and algorithms. It provides a standardized metric for evaluating the performance of models and selecting the best one for a given task.
Overall, the loss function is a critical component of machine learning algorithms and plays a key role in optimizing, evaluating, and improving the performance of models.
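For example, the mean squared error (MSE) used throughout this section can be written as a one-line function and evaluated on predictions (a small illustrative helper, not part of the original code).
# Mean squared error between observed and predicted values
mse <- function(y, y_pred) mean((y - y_pred)^2)
mse(c(1, 2, 3), c(1.1, 1.9, 3.2))   # small loss: predictions are close to the targets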
# Generate random data
set.seed(42)
x <- 1:100
y <- 2 * x + rnorm(100, mean = 0, sd = 10)
# Plot the data
plot(x, y, main = "Random Data", xlab = "X", ylab = "Y")
# Gradient descent to estimate the slope of y ~ w * x
# Initialize the parameters
w <- 0 # Slope
# Set the learning rate
alpha <- 0.0001
# Set the number of iterations
n_iter <- 100
# Gradient Descent Function
gradient_descent <- function(x, y, alpha, n_iter) {
# Initialize the parameters
w <- 0 # Slope
# Perform gradient descent
for (i in 1:n_iter) {
# Calculate the predicted values
y_pred <- w * x
# Calculate the error
error <- y_pred - y
# Calculate the gradient
gradient <- sum(error * x)
# Update the parameters
w <- w - (alpha * gradient)
}
return(w)
}
# Perform gradient descent
w <- gradient_descent(x, y, alpha = 0.0001, n_iter = 100)
print(w)
## [1] -8.617865e+151
Observation: The printed value (about -8.6 × 10^151) shows that this run of gradient descent diverged rather than converged: the gradient is the raw sum over all 100 points and, combined with this learning rate, each update overshoots so the slope blows up. Scaling the gradient by the number of observations (or using a much smaller learning rate) stabilizes the updates, as the sketch below shows; compare the result with the lm() fit that follows.
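A small corrected sketch: the only changes from the code above are averaging the gradient over length(x) points and keeping the same learning rate, which makes the update stable so the slope settles near the true value of 2.
# Gradient descent with the gradient averaged over the n data points
gradient_descent_scaled <- function(x, y, alpha, n_iter) {
  w <- 0
  for (i in 1:n_iter) {
    error <- w * x - y
    gradient <- sum(error * x) / length(x)   # average gradient instead of the raw sum
    w <- w - alpha * gradient
  }
  return(w)
}
gradient_descent_scaled(x, y, alpha = 0.0001, n_iter = 100)   # roughly 2, matching lm() below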
# Create a linear model with lm() for comparison
model <- lm(y ~ x)
summary(model)
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -30.195 -6.618 0.809 6.527 22.264
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.68935 2.10869 0.327 0.744
## x 1.99279 0.03625 54.971 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.46 on 98 degrees of freedom
## Multiple R-squared: 0.9686, Adjusted R-squared: 0.9683
## F-statistic: 3022 on 1 and 98 DF, p-value: < 2.2e-16
# Plot the data and the linear model
plot(x, y, main = "Linear Regression", xlab = "X", ylab = "Y")
abline(model, col = "red")
Observation: The linear regression model fitted with lm() minimizes the residual sum of squares; the estimated slope (about 1.99) is close to the true value of 2, and the red line shows the fitted best-fit line.
Batch Gradient Descent: Batch gradient descent computes the gradient of the loss function with respect to the parameters using the entire training dataset. It updates the parameters by taking a step in the direction of the negative gradient of the loss function. Batch gradient descent is computationally expensive for large datasets, but with a suitable learning rate it converges to the global minimum for convex loss functions (and to a local minimum otherwise).
Stochastic Gradient Descent: Stochastic gradient descent computes the gradient of the loss function using a single (randomly chosen) data point at a time and updates the parameters after each one, making each update much cheaper than a full batch update. The noisy updates cause it to oscillate around the minimum rather than settle exactly, but this noise can also help it escape shallow local minima.
Mini-Batch Gradient Descent: Mini-batch gradient descent computes the gradient of the loss function with respect to the parameters using a small batch of data points. It updates the parameters after each mini-batch, striking a balance between batch and stochastic gradient descent. Mini-batch gradient descent is commonly used in practice for training deep learning models.
Momentum Gradient Descent: Momentum gradient descent is an extension of gradient descent that adds a momentum term to the update rule. The momentum term accelerates the convergence of the algorithm by accumulating the gradients of previous steps. Momentum gradient descent helps overcome local minima and oscillations in the loss function.
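To make the last two ideas concrete, here is a minimal sketch (reusing the x and y simulated earlier; the batch size, learning rate and momentum coefficient are illustrative choices) that combines mini-batch updates with a momentum term for the slope-only model y ≈ w·x.
# Mini-batch gradient descent with momentum for the slope of y ~ w * x
set.seed(42)
w <- 0; v <- 0                 # parameter and its velocity (momentum) term
alpha <- 1e-4                  # learning rate
beta <- 0.9                    # momentum coefficient
batch_size <- 20
n <- length(x)
for (epoch in 1:50) {
  idx <- sample(n)             # shuffle the data each epoch
  for (start in seq(1, n, by = batch_size)) {
    batch <- idx[start:min(start + batch_size - 1, n)]
    error <- w * x[batch] - y[batch]
    grad  <- mean(error * x[batch])         # mini-batch estimate of the gradient
    v     <- beta * v + (1 - beta) * grad   # accumulate past gradients (momentum)
    w     <- w - alpha * v
  }
}
w   # settles near the true slope of 2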
Adagrad: Adagrad is an adaptive learning rate optimization algorithm that scales the learning rate of each parameter based on the historical gradients. It adapts the learning rate for each parameter, allowing for faster convergence and better performance on sparse data.
RMSprop: RMSprop is an adaptive learning rate optimization algorithm that divides the learning rate by the root mean square of the historical gradients. It normalizes the learning rate for each parameter, preventing the learning rate from becoming too small or too large.
Adam: Adam (Adaptive Moment Estimation) is an adaptive learning rate optimization algorithm that combines the benefits of momentum and RMSprop. It computes the adaptive learning rate for each parameter based on the first and second moments of the gradients. Adam is widely used in deep learning for its fast convergence and robust performance.
Nesterov Accelerated Gradient (NAG): Nesterov Accelerated Gradient is an extension of momentum gradient descent that calculates the gradient at the lookahead point instead of the current point. It helps reduce oscillations and overshooting in the loss function, leading to faster convergence and better performance.
AdaDelta: AdaDelta is an adaptive learning rate optimization algorithm that eliminates the need for a learning rate hyperparameter. It uses the root mean square of the historical gradients to adapt the learning rate for each parameter. AdaDelta is robust to noisy gradients and converges faster than traditional optimization algorithms.
Nadam: Nadam (Nesterov-accelerated Adaptive Moment Estimation) is an extension of Adam that combines the benefits of Nesterov momentum and RMSprop. It calculates the adaptive learning rate for each parameter based on the first and second moments of the gradients. Nadam is known for its fast convergence and robust performance on a wide range of datasets.
In this assignment, we have explored the Expectation-Maximization (EM) clustering algorithm and its application to clustering data points into groups, alongside the K-means clustering algorithm. We have also applied K-medians (via PAM), DBSCAN, spectral clustering, and affinity propagation to the same house-worth dataset, described hierarchical clustering as a further alternative, and visualized the resulting clusters with various plots.
We have also described the gradient operator and the gradient descent algorithm in machine learning. We implemented gradient descent for a simple linear model on simulated data, saw how an unscaled gradient and too-large step size cause it to diverge, and compared the corrected fit with the lm() result. We have discussed the importance of loss functions in machine learning and their role in optimizing, evaluating, and improving the performance of models.
Finally, we have described various types of gradient descent algorithms, including Batch Gradient Descent, Stochastic Gradient Descent, Mini-Batch Gradient Descent, Momentum Gradient Descent, Adagrad, RMSprop, Adam, Nesterov Accelerated Gradient, AdaDelta, and Nadam. We have discussed the characteristics and applications of each algorithm and their role in optimizing the parameters of machine learning models.
Overall, this assignment has provided a comprehensive overview of clustering algorithms, gradient descent, and loss functions in machine learning. It has demonstrated the practical application of these concepts in data analysis and model optimization.