#1. Expectation-Maximization (EM) Clustering Algorithm
EM clustering groups data points by fitting a mixture of probability distributions to the data with the Expectation-Maximization algorithm; with Gaussian components it is commonly called the Gaussian mixture model.
The Expectation-Maximization (EM) algorithm is an iterative method for finding maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models that depend on unobserved latent variables. The algorithm alternates between an expectation (E) step, which computes the expected log-likelihood under the current parameter estimates, and a maximization (M) step, which updates the parameters to maximize that expected log-likelihood. The updated parameter estimates are then used to re-evaluate the distribution of the latent variables in the next E step.
The EM algorithm is used in many statistical applications, including normal mixture models and missing data problems. When applied to a mixture model, the EM algorithm provides a maximum likelihood estimate of the parameters of a mixture of probability distributions. The EM algorithm is used in bioinformatics, computational biology, engineering, finance, genomics, machine learning, medicine, physics, and social science.
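To make the E and M steps concrete, here is a minimal sketch of EM for a two-component univariate Gaussian mixture in base R. It is illustrative only and not part of the original analysis; the simulated data, component count, and starting values are assumptions.
# Minimal EM sketch for a 2-component 1-D Gaussian mixture (illustrative only)
set.seed(1)
x <- c(rnorm(150, mean = 0, sd = 1), rnorm(150, mean = 5, sd = 1.5))  # simulated data
mu <- c(-1, 1); sigma <- c(1, 1); pi_k <- c(0.5, 0.5)                 # starting values
for (iter in 1:100) {
  # E step: posterior probability (responsibility) of each component for each point
  d1 <- pi_k[1] * dnorm(x, mu[1], sigma[1])
  d2 <- pi_k[2] * dnorm(x, mu[2], sigma[2])
  r2 <- d2 / (d1 + d2)
  r1 <- 1 - r2
  # M step: re-estimate mixing proportions, means and standard deviations
  pi_k  <- c(mean(r1), mean(r2))
  mu    <- c(sum(r1 * x) / sum(r1), sum(r2 * x) / sum(r2))
  sigma <- c(sqrt(sum(r1 * (x - mu[1])^2) / sum(r1)),
             sqrt(sum(r2 * (x - mu[2])^2) / sum(r2)))
}
round(rbind(pi_k, mu, sigma), 3)  # estimates should land near (0.5, 0.5), (0, 5), (1, 1.5)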
library(mclust) # For EM clustering
## Package 'mclust' version 6.1.1
## Type 'citation("mclust")' for citing this R package in publications.
library(ggplot2) # For visualization
library(factoextra) # For cluster visualization
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
#load data
data <- read.csv("M3_House_Worth.csv")
attach(data)
head(data)
## HousePrice StoreArea BasementArea LawnArea HouseNetWorth
## 1 138800 29.9 75 11.223911 Low
## 2 155000 44.0 504 9.689869 Medium
## 3 152000 46.2 493 10.192613 Medium
## 4 160000 46.2 510 6.817316 Medium
## 5 226000 48.7 445 10.916215 Medium
## 6 275000 56.4 1148 9.000686 High
summary(data)
## HousePrice StoreArea BasementArea LawnArea
## Min. : 39300 Min. : 1.80 Min. : 0.0 Min. : 6.214
## 1st Qu.:115000 1st Qu.: 27.00 1st Qu.: 0.0 1st Qu.: 9.212
## Median :173950 Median : 47.60 Median : 402.5 Median : 9.923
## Mean :213355 Mean : 48.31 Mean : 573.0 Mean : 9.914
## 3rd Qu.:294058 3rd Qu.: 67.30 3rd Qu.:1107.0 3rd Qu.:10.488
## Max. :755000 Max. :122.00 Max. :2188.0 Max. :21.539
## HouseNetWorth
## Length:316
## Class :character
## Mode :character
##
##
##
str(data)
## 'data.frame': 316 obs. of 5 variables:
## $ HousePrice : int 138800 155000 152000 160000 226000 275000 215000 392000 325000 151000 ...
## $ StoreArea : num 29.9 44 46.2 46.2 48.7 56.4 47.1 56.7 84 49.2 ...
## $ BasementArea : int 75 504 493 510 445 1148 380 945 1572 506 ...
## $ LawnArea : num 11.22 9.69 10.19 6.82 10.92 ...
## $ HouseNetWorth: chr "Low" "Medium" "Medium" "Medium" ...
dim(data)
## [1] 316 5
# Select numerical features for clustering
data1 <- data[,c(1,2,3,4)]
head(data1)
## HousePrice StoreArea BasementArea LawnArea
## 1 138800 29.9 75 11.223911
## 2 155000 44.0 504 9.689869
## 3 152000 46.2 493 10.192613
## 4 160000 46.2 510 6.817316
## 5 226000 48.7 445 10.916215
## 6 275000 56.4 1148 9.000686
# Normalize the data
data1 <- scale(data1)
head(data1)
## HousePrice StoreArea BasementArea LawnArea
## [1,] -0.6086554 -0.74477444 -0.8827550 0.8408859
## [2,] -0.4764016 -0.17444291 -0.1223336 -0.1435064
## [3,] -0.5008930 -0.08545501 -0.1418316 0.1791037
## [4,] -0.4355825 -0.08545501 -0.1116983 -1.9868181
## [5,] 0.1032292 0.01566760 -0.2269137 0.6434377
## [6,] 0.5032561 0.32712525 1.0191848 -0.5857539
We’ll now perform EM clustering on the normalized data using the Mclust() function from the mclust package. The function automatically determines the optimal number of clusters based on the Bayesian Information Criterion (BIC).
# Perform EM clustering
set.seed(42)
em_model <- Mclust(data1)
# View BIC values
print(em_model$BIC)
## Bayesian Information Criterion (BIC):
## EII VII EEI VEI EVI VVI EEE
## 1 -3611.849 -3611.849 -3629.116 -3629.116 -3629.116 -3629.116 -2642.520
## 2 -2901.321 -2769.193 -2735.068 -2603.683 -2722.601 -2587.465 -2526.189
## 3 -2683.474 -2632.697 -2692.100 -2329.741 -2569.688 -2222.906 -2477.199
## 4 -2655.087 -2368.734 -2503.781 -2214.208 -2387.337 -2041.275 -2377.722
## 5 -2602.538 -2211.217 -2508.157 -2104.288 NA NA -2425.645
## 6 -2448.605 -2229.172 -2391.230 -2087.035 NA NA -2417.365
## 7 -2477.337 -2180.705 -2420.009 -2089.881 NA NA -2446.145
## 8 -2493.642 -2171.836 -2448.090 -2093.633 NA NA -2474.503
## 9 -2522.203 -2167.241 -2476.869 -2089.756 NA NA -2503.289
## VEE EVE VVE EEV VEV EVV VVV
## 1 -2642.520 -2642.520 -2642.520 -2642.520 -2642.520 -2642.520 -2642.520
## 2 -2334.228 -2443.045 -2367.020 -2443.993 -2301.208 -2409.614 -2316.940
## 3 -2278.269 -2343.338 -2288.086 -2345.279 -2200.319 -2314.356 -2226.943
## 4 -2201.483 -2282.390 -2180.795 -2302.190 -2061.420 -2269.319 -2063.955
## 5 -2081.083 -2292.995 -2057.572 -2305.715 -2048.567 NA -2076.457
## 6 -2077.642 -2291.765 -2091.458 -2263.600 -2101.814 -2284.109 -2144.805
## 7 -2083.939 -2338.048 -2060.719 -2274.790 -2037.341 NA NA
## 8 -2081.882 -2225.683 -2097.764 -2355.057 -2037.243 NA NA
## 9 -2080.264 -2284.620 -2101.035 -2414.578 -2090.775 NA NA
##
## Top 3 models based on the BIC criterion:
## VEV,8 VEV,7 VVI,4
## -2037.243 -2037.341 -2041.275
Observation: mclust reports BIC on the scale 2 × log-likelihood − k × log(n), where k is the number of estimated parameters and n is the number of observations, so larger (less negative) BIC values indicate a better model. Here the best model by this criterion is VEV with 8 components.
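As a sanity check, mclust's BIC for the selected model can be reproduced from the log-likelihood, the parameter count (df), and n reported by summary(em_model) further below. This is a small arithmetic check, not part of the original output.
# mclust convention: BIC = 2*log-likelihood - df*log(n), so larger (less negative) is better
2 * (-736.5901) - 98 * log(316)   # reproduces the reported BIC of about -2037.243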
Interpreting the BIC plot:
* X-axis: Number of clusters
* EII: Spherical, equal volume
* VII: Spherical, unequal volume
* EEI: Diagonal, equal volume/shape
* VEI: Diagonal, varying volume
* EVI: Diagonal, equal volume, varying shape
* VVI: Diagonal, varying volume/shape
# Extract the number of clusters
cat("number of cluaters:",em_model$G, "\n")
## number of cluaters: 8
#Classification visualization
plot(em_model, what = "classification")
Observation: The classification plot shows how observations are assigned to the eight clusters, with one colour per cluster in each pairwise scatter of the four features.
# Visualize the clusters
fviz_cluster(list(data = data1, cluster = em_model$classification), geom = "point")
#Get the optimal number of clusters
# Visualize BIC and get the optimal number of clusters
plot(em_model, what = "BIC", ylim = range(em_model$BIC, na.rm = TRUE))
Observation: The BIC plot shows that the VEV model with 8 components achieves the highest BIC. The cluster assignments are stored in em_model$classification.
# Uncertainty plot
plot(em_model, what = "uncertainty")
Observation: The uncertainty plot shows how uncertain the model is about each assignment; points with higher uncertainty are less confidently assigned to their cluster.
# Cluster profiles
summary(em_model)
## ----------------------------------------------------
## Gaussian finite mixture model fitted by EM algorithm
## ----------------------------------------------------
##
## Mclust VEV (ellipsoidal, equal shape) model with 8 components:
##
## log-likelihood n df BIC ICL
## -736.5901 316 98 -2037.243 -2069.621
##
## Clustering table:
## 1 2 3 4 5 6 7 8
## 47 42 14 66 26 70 11 40
Observation: The model summary reports the selected parameterization (VEV with 8 components), the log-likelihood, BIC and ICL, and the number of observations in each cluster. Per-cluster means and covariances are not shown by default; see the snippet below.
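The per-component parameters can be requested explicitly from the same summary method (a short optional check).
# Show estimated mixing proportions, means and covariances for each component
summary(em_model, parameters = TRUE)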
# Alternatively, try a different combination of features (e.g. HousePrice and BasementArea)
data2 <- data[,c(1, 3)]
head(data2,3)
## HousePrice BasementArea
## 1 138800 75
## 2 155000 504
## 3 152000 493
# Force a specific number of clusters
em_model_3 <- Mclust(data1, G = 3)
# Visualize the clusters
fviz_cluster(list(data = data1, cluster = em_model_3$classification), geom = "point")
# Visualize BIC and get the optimal number of clusters
plot(em_model_3, what = "BIC", ylim = range(em_model_3$BIC, na.rm = TRUE))
Observation: With G = 3, the model is forced to fit three components. The cluster assignments are stored in em_model_3$classification.
# Assign cluster names
cluster_names <- c("Low priced", "Medium priced", "High priced")
em_model_3$classification <- factor(em_model_3$classification, labels = cluster_names)
# Assign the labels back to the original data to check the model's performance against HouseNetWorth
data$cluster <- em_model_3$classification
head(data)
## HousePrice StoreArea BasementArea LawnArea HouseNetWorth cluster
## 1 138800 29.9 75 11.223911 Low Low priced
## 2 155000 44.0 504 9.689869 Medium Medium priced
## 3 152000 46.2 493 10.192613 Medium Medium priced
## 4 160000 46.2 510 6.817316 Medium High priced
## 5 226000 48.7 445 10.916215 Medium Medium priced
## 6 275000 56.4 1148 9.000686 High High priced
Observation: The data points have been assigned to the three EM clusters, labelled "Low priced," "Medium priced," and "High priced." The head of the data suggests the labels broadly track HouseNetWorth, though not perfectly (row 4 is Medium net worth but labelled High priced); a cross-tabulation, sketched below, is a quick way to check the mapping.
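The following small validation sketch (not in the original output) compares the EM cluster labels with the recorded HouseNetWorth categories and the average house price per cluster.
# Compare EM cluster labels with the recorded HouseNetWorth categories
table(data$HouseNetWorth, data$cluster)
# Average house price per cluster, to confirm the Low/Medium/High ordering
aggregate(HousePrice ~ cluster, data = data, FUN = mean)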
#2. K-Means Clustering Algorithm
K-means clustering partitions data points into k groups by assigning each point to the nearest cluster centroid and then recomputing the centroids, repeating until the assignments stop changing.
The K-means algorithm is a method of vector quantization, originally from signal processing, that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean (cluster centers or cluster centroid), serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.
The K-means algorithm is used in many applications, including data mining, machine learning, pattern recognition, image analysis, bioinformatics, and anomaly detection.
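The assignment/update loop at the heart of K-means (Lloyd's algorithm) can be written in a few lines of base R. The sketch below is illustrative only; it uses random starting centroids rather than the kmeans() defaults.
# Minimal K-means (Lloyd's algorithm) sketch for a numeric matrix X and k clusters
lloyd_kmeans <- function(X, k, n_iter = 20) {
  X <- as.matrix(X)
  centers <- X[sample(nrow(X), k), , drop = FALSE]       # random initial centroids
  for (i in 1:n_iter) {
    # Assignment step: squared Euclidean distance from every point to every centroid
    d <- sapply(1:k, function(j) colSums((t(X) - centers[j, ])^2))
    cluster <- max.col(-d)                               # index of the nearest centroid
    # Update step: move each centroid to the mean of the points assigned to it
    for (j in 1:k) {
      if (any(cluster == j)) centers[j, ] <- colMeans(X[cluster == j, , drop = FALSE])
    }
  }
  list(cluster = cluster, centers = centers)
}
# Example (illustrative): three clusters on the scaled numeric features
# lloyd_kmeans(scale(data[, 1:4]), k = 3)$centers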
#load data
data2 <- read.csv("M3_House_Worth.csv")
attach(data2)
## The following objects are masked from data:
##
## BasementArea, HouseNetWorth, HousePrice, LawnArea, StoreArea
(head(data2,3))
## HousePrice StoreArea BasementArea LawnArea HouseNetWorth
## 1 138800 29.9 75 11.223911 Low
## 2 155000 44.0 504 9.689869 Medium
## 3 152000 46.2 493 10.192613 Medium
# Select numerical features for clustering
data3 <- data2[,c(1,2,3,4)]
head(data3,3)
## HousePrice StoreArea BasementArea LawnArea
## 1 138800 29.9 75 11.223911
## 2 155000 44.0 504 9.689869
## 3 152000 46.2 493 10.192613
# Normalize the data
data3_scale <- scale(data3)
head(data3_scale,3)
## HousePrice StoreArea BasementArea LawnArea
## [1,] -0.6086554 -0.74477444 -0.8827550 0.8408859
## [2,] -0.4764016 -0.17444291 -0.1223336 -0.1435064
## [3,] -0.5008930 -0.08545501 -0.1418316 0.1791037
Before running K-means we need to choose the number of clusters. We use three methods to determine a suitable k: the elbow method, the silhouette method, and the gap statistic.
# Elbow Method
#install.packages("factoextra")
library(factoextra)
#Using elbow method
fviz_nbclust(data3_scale[,1:4], kmeans, method = "wss") + theme_minimal()
Observation: The elbow method suggests that the optimal number of clusters is 2. We'll use the Silhouette method and the Gap statistic to confirm this.
# Silhouette Method
fviz_nbclust(data3_scale[,1:4], kmeans, method = "silhouette") + theme_minimal()
Observation: The Silhouette method suggests that the optimal number of
clusters is 2. We’ll now use the Gap statistic to confirm this.
# Gap Statistic
fviz_nbclust(data3_scale[,1:4], kmeans, method = "gap_stat") + theme_minimal()
Observation: The Gap statistic suggests that the optimal number of
clusters is 2. We’ll now perform K-means clustering with 2 clusters
using the kmeans() function.
# Perform K-means clustering
set.seed(42)
kmeans_model <- kmeans(data3_scale, centers = 2, nstart = 25, iter.max = 10)
summary(kmeans_model)
## Length Class Mode
## cluster 316 -none- numeric
## centers 8 -none- numeric
## totss 1 -none- numeric
## withinss 2 -none- numeric
## tot.withinss 1 -none- numeric
## betweenss 1 -none- numeric
## size 2 -none- numeric
## iter 1 -none- numeric
## ifault 1 -none- numeric
Observation: The K-means model has been fitted with 2 clusters. summary() on a kmeans object lists its components (cluster assignments, centers, within-cluster sums of squares, and sizes) rather than their values; the centers and sizes themselves can be printed directly, as shown below.
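A small optional check: the centroid coordinates (on the scaled features) and the cluster sizes are stored in the model object.
# Cluster centroids on the standardized scale, and number of points per cluster
kmeans_model$centers
kmeans_model$size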
# Classification visualization
fviz_cluster(kmeans_model, data = data3_scale, geom = "point")
Observation: The cluster plot projects the data onto the first two principal components and colours each point by its assigned cluster.
# Visualize the clusters
library(cluster)
clusplot(data3_scale, kmeans_model$cluster,
color = TRUE, shade = TRUE, labels = 2)
Observation: clusplot() displays the clusters on the first two principal components, with shading and ellipses marking each cluster.
# Create the visualization
library(factoextra)
fviz_cluster(kmeans_model, data = data3_scale,
ellipse.type = "convex", # Adds convex hulls
repel = TRUE) # Avoids label overlapping
#K-means with k = 3
set.seed(42)
kmeans_model2 <- kmeans(data3_scale, centers = 3, nstart = 25, iter.max = 10)
#summary(kmeans_model2)
# Classification visualization using fviz
fviz_cluster(kmeans_model2, data = data3_scale, geom = "point")
# Visualize the clusters using clusplot
library(cluster)
clusplot(data3_scale, kmeans_model2$cluster,
color = TRUE, shade = TRUE, labels = 2)
# Create the visualization using convex
library(factoextra)
fviz_cluster(kmeans_model2, data = data3_scale,
ellipse.type = "convex", # Adds convex hulls
repel = TRUE) # Avoids label overlapping
# Assign the three clusters to the data; they can then be mapped to low, medium and high price tiers
# Add cluster assignments to original data
data2$cluster <- kmeans_model2$cluster
head(data2)
## HousePrice StoreArea BasementArea LawnArea HouseNetWorth cluster
## 1 138800 29.9 75 11.223911 Low 2
## 2 155000 44.0 504 9.689869 Medium 2
## 3 152000 46.2 493 10.192613 Medium 2
## 4 160000 46.2 510 6.817316 Medium 2
## 5 226000 48.7 445 10.916215 Medium 2
## 6 275000 56.4 1148 9.000686 High 1
Observation: The data points have been assigned to the three K-means clusters. Note that kmeans() returns arbitrary integer labels (1-3), so the "Low priced," "Medium priced," and "High priced" names still need to be attached explicitly, as sketched below.
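One reasonable way to attach the price-tier names (an assumption, not part of the original output) is to rank the clusters by their mean HousePrice and relabel accordingly.
# Order clusters by average house price, then relabel accordingly
avg_price <- tapply(data2$HousePrice, data2$cluster, mean)
tier <- c("Low priced", "Medium priced", "High priced")[rank(avg_price)]  # cheapest cluster first
data2$cluster_name <- tier[data2$cluster]
head(data2[, c("HousePrice", "HouseNetWorth", "cluster", "cluster_name")])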
#3. K-Medians Clustering
K-medians clustering partitions data points into k groups, but uses the median (rather than the mean) of each cluster as its centre, which makes it more robust to outliers.
The K-medians algorithm is a method of vector quantization, originally from signal processing, that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest median (the cluster centre), typically measured with the Manhattan (L1) distance. This induces a nearest-centre partitioning of the data space.
The K-medians algorithm is used in many applications, including data mining, machine learning, pattern recognition, image analysis, bioinformatics, and anomaly detection.
We use pam() (Partitioning Around Medoids) from the cluster package with the Manhattan metric as a robust, medoid-based stand-in for K-medians.
#load data
data4 <- read.csv("M3_House_Worth.csv")
attach(data4)
## The following objects are masked from data2:
##
## BasementArea, HouseNetWorth, HousePrice, LawnArea, StoreArea
## The following objects are masked from data:
##
## BasementArea, HouseNetWorth, HousePrice, LawnArea, StoreArea
(head(data4,3))
## HousePrice StoreArea BasementArea LawnArea HouseNetWorth
## 1 138800 29.9 75 11.223911 Low
## 2 155000 44.0 504 9.689869 Medium
## 3 152000 46.2 493 10.192613 Medium
# Select numerical features for clustering
data5 <- data4[,c(1,2,3,4)]
head(data5,3)
## HousePrice StoreArea BasementArea LawnArea
## 1 138800 29.9 75 11.223911
## 2 155000 44.0 504 9.689869
## 3 152000 46.2 493 10.192613
# Normalize the data
data5_scale <- scale(data5)
head(data5_scale,3)
## HousePrice StoreArea BasementArea LawnArea
## [1,] -0.6086554 -0.74477444 -0.8827550 0.8408859
## [2,] -0.4764016 -0.17444291 -0.1223336 -0.1435064
## [3,] -0.5008930 -0.08545501 -0.1418316 0.1791037
# Perform K-medians clustering
library(cluster)
set.seed(42)
kmedians <- pam(data5_scale, k = 3, metric = "manhattan")
summary(kmedians)
## Medoids:
## ID HousePrice StoreArea BasementArea LawnArea
## [1,] 229 -0.9294933 -0.886346097 -1.0156958 -0.31325763
## [2,] 121 -0.3294529 0.003532891 -0.3066782 -0.08886448
## [3,] 226 0.7563343 1.168465385 1.1432629 0.28049931
## Clustering vector:
## [1] 1 2 2 2 2 3 2 3 3 2 1 1 1 1 3 1 1 3 2 1 1 3 3 3 1 2 3 1 2 1 1 2 2 3 3 3 2
## [38] 3 1 1 3 1 3 1 2 1 1 3 2 3 1 1 1 3 1 3 3 3 3 1 1 2 3 1 2 3 2 1 3 1 3 1 1 3
## [75] 1 1 2 2 2 1 3 2 1 3 3 3 3 1 3 1 1 2 1 3 3 1 1 2 3 3 1 2 2 2 2 1 2 3 3 3 3
## [112] 1 3 2 1 1 3 1 1 2 2 1 2 3 1 3 1 1 3 2 1 2 3 3 1 1 3 1 3 3 3 3 2 3 3 1 1 1
## [149] 2 3 1 1 1 3 1 2 1 3 3 1 3 3 1 1 3 2 1 1 1 2 1 1 3 1 1 2 3 3 2 2 1 2 3 3 3
## [186] 3 3 1 1 1 1 2 3 2 3 1 3 1 1 2 3 3 3 2 2 2 3 3 3 1 2 3 1 1 1 2 1 1 3 1 3 1
## [223] 1 3 1 3 1 2 1 1 3 1 1 3 3 3 3 3 1 3 2 1 1 1 1 3 3 3 3 2 3 2 1 3 2 1 1 3 1
## [260] 3 1 3 3 1 3 1 1 1 3 2 3 3 3 3 3 2 2 1 1 1 1 1 3 1 3 3 1 1 3 1 3 1 3 2 1 1
## [297] 3 1 3 1 1 1 3 3 3 1 3 2 3 1 3 1 3 1 3 1
## Objective function:
## build swap
## 1.412042 1.412042
##
## Numerical information per cluster:
## size max_diss av_diss diameter separation
## [1,] 131 3.485703 1.038048 5.520093 0.5241868
## [2,] 61 3.800782 1.090812 5.910857 0.5241868
## [3,] 124 8.822645 1.965172 12.406079 0.5748856
##
## Isolated clusters:
## L-clusters: character(0)
## L*-clusters: character(0)
##
## Silhouette plot information:
## cluster neighbor sil_width
## 268 1 2 0.6617098809
## 239 1 2 0.6611502208
## 222 1 2 0.6581987461
## 164 1 2 0.6580703576
## 244 1 2 0.6574009050
## 168 1 2 0.6563846141
## 44 1 2 0.6563813039
## 73 1 2 0.6517139431
## 40 1 2 0.6506182890
## 163 1 2 0.6463300080
## 112 1 2 0.6449818469
## 93 1 2 0.6448695394
## 296 1 2 0.6429913750
## 215 1 2 0.6412140757
## 169 1 2 0.6393082312
## 261 1 2 0.6377599978
## 229 1 2 0.6366882554
## 188 1 2 0.6338175664
## 281 1 2 0.6319547454
## 175 1 2 0.6294187291
## 138 1 2 0.6274242277
## 213 1 2 0.6270310117
## 198 1 2 0.6248666083
## 28 1 2 0.6221336148
## 253 1 2 0.6215665824
## 298 1 2 0.6206858851
## 91 1 2 0.6178702367
## 127 1 2 0.6178186897
## 30 1 2 0.6178176540
## 17 1 2 0.6149639502
## 122 1 2 0.6131761067
## 53 1 2 0.6129373717
## 11 1 2 0.6055084140
## 232 1 2 0.6051892024
## 152 1 2 0.6050403879
## 115 1 2 0.5996537972
## 128 1 2 0.5978830907
## 233 1 2 0.5944367121
## 174 1 2 0.5943716467
## 227 1 2 0.5932870486
## 214 1 2 0.5923095843
## 280 1 2 0.5919813933
## 151 1 2 0.5905115762
## 257 1 2 0.5879307753
## 39 1 2 0.5874369242
## 167 1 2 0.5808857425
## 210 1 2 0.5797096324
## 292 1 2 0.5792947954
## 16 1 2 0.5783110686
## 243 1 2 0.5782383288
## 13 1 2 0.5780570405
## 256 1 2 0.5773281600
## 223 1 2 0.5771579467
## 278 1 2 0.5769840968
## 181 1 2 0.5759846669
## 76 1 2 0.5758338652
## 131 1 2 0.5753044249
## 302 1 2 0.5742310792
## 42 1 2 0.5731983895
## 284 1 2 0.5727700516
## 21 1 2 0.5709101719
## 118 1 2 0.5632381640
## 153 1 2 0.5615522173
## 196 1 2 0.5580645164
## 266 1 2 0.5566408653
## 61 1 2 0.5550615390
## 199 1 2 0.5541610831
## 119 1 2 0.5507968659
## 259 1 2 0.5489072396
## 64 1 2 0.5487564615
## 47 1 2 0.5482356731
## 31 1 2 0.5433772540
## 230 1 2 0.5369451575
## 125 1 2 0.5256326724
## 160 1 2 0.5237546315
## 55 1 2 0.5185763994
## 51 1 2 0.5147971567
## 136 1 2 0.5130994405
## 20 1 2 0.5072491791
## 171 1 2 0.5031999938
## 267 1 2 0.4950950489
## 287 1 2 0.4930125977
## 68 1 2 0.4895160005
## 46 1 2 0.4883337108
## 301 1 2 0.4866386726
## 288 1 2 0.4822921216
## 90 1 2 0.4785870942
## 72 1 2 0.4744828611
## 96 1 2 0.4736438184
## 242 1 2 0.4698331772
## 220 1 2 0.4688567866
## 218 1 2 0.4679865685
## 314 1 2 0.4639638547
## 75 1 2 0.4630952175
## 245 1 2 0.4628192486
## 172 1 2 0.4622140450
## 290 1 2 0.4576214292
## 25 1 2 0.4449071920
## 191 1 2 0.4413977393
## 148 1 2 0.4374031733
## 190 1 2 0.4303362007
## 97 1 2 0.4196395924
## 80 1 2 0.4133797149
## 70 1 2 0.4120505277
## 295 1 2 0.4109259940
## 157 1 2 0.4105675997
## 310 1 2 0.4083390456
## 316 1 2 0.4025960423
## 189 1 2 0.3910579249
## 147 1 2 0.3693183473
## 88 1 2 0.3681025756
## 146 1 2 0.3661469171
## 101 1 2 0.3611287546
## 52 1 2 0.3464682341
## 116 1 2 0.3407345029
## 14 1 2 0.3358821858
## 106 1 2 0.3293899604
## 217 1 2 0.3079177144
## 83 1 2 0.2974697010
## 282 1 2 0.2735922022
## 279 1 2 0.2711273010
## 155 1 2 0.2449151186
## 135 1 2 0.2255088411
## 264 1 2 0.2210306593
## 312 1 2 0.2162382362
## 306 1 2 0.2125157702
## 1 1 2 0.1999397278
## 300 1 2 0.1774370261
## 225 1 2 0.1389991288
## 60 1 2 0.1381827977
## 12 1 2 0.0790465011
## 294 2 1 0.6194821601
## 276 2 1 0.6140001286
## 65 2 1 0.6126896969
## 211 2 1 0.6096018286
## 176 2 1 0.6078874840
## 130 2 1 0.6062683799
## 216 2 1 0.6000051831
## 7 2 1 0.5937277743
## 5 2 1 0.5901120124
## 182 2 1 0.5894467553
## 121 2 1 0.5884166048
## 200 2 1 0.5855228217
## 250 2 1 0.5766583197
## 123 2 1 0.5733467983
## 45 2 1 0.5714570256
## 228 2 1 0.5676345405
## 205 2 1 0.5575192656
## 3 2 1 0.5572578756
## 194 2 1 0.5511206272
## 270 2 1 0.5420470483
## 156 2 1 0.5383816410
## 170 2 1 0.5325074009
## 192 2 1 0.5291847630
## 105 2 1 0.5274995904
## 103 2 1 0.5233875644
## 10 2 1 0.5215091721
## 2 2 1 0.5171843694
## 107 2 1 0.5113015434
## 37 2 1 0.5073181425
## 179 2 1 0.5056460660
## 149 2 1 0.4997497286
## 102 2 1 0.4987230860
## 78 2 1 0.4823894585
## 82 2 1 0.4712982825
## 166 2 1 0.4675951229
## 143 2 1 0.4656947153
## 120 2 1 0.4654995356
## 49 2 1 0.4529787344
## 241 2 1 0.4515657343
## 277 2 1 0.4380118575
## 67 2 1 0.4166839642
## 98 2 1 0.3779468374
## 308 2 3 0.3744536843
## 92 2 1 0.3280351990
## 62 2 1 0.3151938688
## 114 2 1 0.3090971894
## 77 2 1 0.3066664508
## 104 2 1 0.3056337633
## 33 2 1 0.3027156388
## 204 2 1 0.3000038898
## 255 2 3 0.2931695061
## 4 2 1 0.2903282487
## 180 2 1 0.2853244514
## 252 2 3 0.2761636957
## 32 2 1 0.2711459542
## 19 2 1 0.2703009074
## 26 2 3 0.2381279748
## 206 2 3 0.2187413315
## 79 2 3 0.1257787759
## 132 2 1 0.0879460701
## 29 2 1 0.0394093160
## 142 3 2 0.5732539277
## 56 3 2 0.5692458124
## 27 3 2 0.5675090472
## 238 3 2 0.5634592282
## 108 3 2 0.5622317506
## 87 3 2 0.5597019957
## 291 3 2 0.5581524834
## 50 3 2 0.5564390720
## 9 3 2 0.5555684790
## 18 3 2 0.5519395993
## 95 3 2 0.5484154020
## 209 3 2 0.5480396596
## 263 3 2 0.5470309090
## 81 3 2 0.5468708443
## 265 3 2 0.5460011841
## 99 3 2 0.5454551156
## 41 3 2 0.5447195693
## 111 3 2 0.5403554148
## 297 3 2 0.5380927848
## 158 3 2 0.5361525243
## 240 3 2 0.5306644723
## 226 3 2 0.5300532604
## 74 3 2 0.5276690279
## 69 3 2 0.5251986993
## 246 3 2 0.5230122443
## 235 3 2 0.5212522646
## 249 3 2 0.5103423734
## 262 3 2 0.5005259940
## 117 3 2 0.4998802845
## 269 3 2 0.4962539343
## 129 3 2 0.4951520106
## 219 3 2 0.4918313317
## 24 3 2 0.4910705753
## 307 3 2 0.4874921809
## 203 3 2 0.4866578119
## 236 3 2 0.4863175960
## 273 3 2 0.4831820303
## 274 3 2 0.4776584697
## 289 3 2 0.4751891968
## 311 3 2 0.4744295076
## 260 3 2 0.4725343895
## 86 3 2 0.4682725334
## 161 3 2 0.4677919524
## 186 3 2 0.4669798055
## 195 3 2 0.4663240815
## 231 3 2 0.4649726033
## 137 3 2 0.4647214319
## 224 3 2 0.4614711517
## 150 3 2 0.4578736328
## 184 3 2 0.4546288695
## 234 3 2 0.4446393199
## 66 3 2 0.4422713473
## 133 3 2 0.4382107770
## 187 3 2 0.4284327842
## 247 3 2 0.4257551774
## 258 3 2 0.4223164576
## 43 3 2 0.4210585195
## 154 3 2 0.4203459486
## 299 3 2 0.4139226176
## 140 3 2 0.4135778600
## 84 3 2 0.4118414085
## 35 3 2 0.4108832297
## 59 3 2 0.4081823365
## 177 3 2 0.4070440840
## 165 3 2 0.3974722561
## 237 3 2 0.3963710405
## 110 3 2 0.3940630201
## 71 3 2 0.3930854629
## 57 3 2 0.3901753756
## 303 3 2 0.3857788051
## 109 3 2 0.3837549196
## 185 3 2 0.3780713350
## 286 3 2 0.3753449834
## 315 3 2 0.3727446895
## 309 3 2 0.3702672492
## 134 3 2 0.3542099903
## 202 3 2 0.3512761352
## 304 3 2 0.3475358053
## 173 3 2 0.3430783451
## 23 3 2 0.3417085857
## 248 3 2 0.3401941808
## 159 3 2 0.3394109232
## 38 3 2 0.3378700513
## 58 3 2 0.3250503006
## 207 3 2 0.3248106388
## 144 3 2 0.3236126804
## 54 3 2 0.3178829903
## 141 3 2 0.3135990171
## 22 3 2 0.3134218685
## 283 3 2 0.2875329379
## 113 3 2 0.2583325040
## 36 3 2 0.2530923054
## 15 3 2 0.2508491901
## 313 3 2 0.2492584689
## 272 3 2 0.1993447380
## 8 3 2 0.1885170428
## 305 3 2 0.1720941073
## 100 3 2 0.1675335876
## 89 3 2 0.1657144354
## 124 3 2 0.1438096909
## 275 3 2 0.1383467678
## 34 3 2 0.1337208857
## 178 3 2 0.1324638650
## 212 3 2 0.1160429932
## 63 3 2 0.1017507453
## 126 3 2 0.1016448844
## 271 3 2 0.0982372161
## 293 3 2 0.0921486417
## 197 3 2 0.0904958495
## 6 3 2 0.0827873125
## 48 3 2 0.0726827093
## 139 3 2 -0.0005316276
## 193 3 2 -0.0017411301
## 201 3 2 -0.0111769463
## 208 3 2 -0.0148978862
## 94 3 2 -0.0163081130
## 285 3 2 -0.0246584193
## 251 3 2 -0.0366183369
## 254 3 2 -0.0575974951
## 85 3 2 -0.1020291014
## 183 3 2 -0.1109709107
## 162 3 2 -0.1113224469
## 145 3 2 -0.1289099725
## 221 3 2 -0.1562342434
## Average silhouette width per cluster:
## [1] 0.5076544 0.4484344 0.3488767
## Average silhouette width of total data set:
## [1] 0.4339175
##
## Available components:
## [1] "medoids" "id.med" "clustering" "objective" "isolation"
## [6] "clusinfo" "silinfo" "diss" "call" "data"
# Classification visualization
fviz_cluster(kmedians, data = data5_scale, geom = "point")
Observation: K-medians does not partition this data as cleanly as K-means or EM clustering; the clusters are not well separated, and the classification plot shows overlapping points.
K-medians (via PAM) is still worth considering because:
* It is more robust to outliers than K-means (medians/medoids instead of means).
* It works naturally with the Manhattan distance (L1 norm) instead of the Euclidean (L2) norm.
* PAM can operate on any dissimilarity matrix, so it extends to data types that plain K-means cannot handle.
#4. DBSCAN Clustering
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a clustering algorithm that groups together data points that are closely packed (dense regions) while marking points in low-density regions as outliers. It is based on the concepts of density reachability and density connectivity.
The DBSCAN algorithm is used in many applications, including data mining, machine learning, pattern recognition, image analysis, bioinformatics, and anomaly detection.
#load data
data6 <- read.csv("M3_House_Worth.csv")
attach(data6)
## The following objects are masked from data4:
##
## BasementArea, HouseNetWorth, HousePrice, LawnArea, StoreArea
## The following objects are masked from data2:
##
## BasementArea, HouseNetWorth, HousePrice, LawnArea, StoreArea
## The following objects are masked from data:
##
## BasementArea, HouseNetWorth, HousePrice, LawnArea, StoreArea
(head(data6,3))
## HousePrice StoreArea BasementArea LawnArea HouseNetWorth
## 1 138800 29.9 75 11.223911 Low
## 2 155000 44.0 504 9.689869 Medium
## 3 152000 46.2 493 10.192613 Medium
# Select numerical features for clustering
data7 <- data6[,c(1,2,3,4)]
head(data7,3)
## HousePrice StoreArea BasementArea LawnArea
## 1 138800 29.9 75 11.223911
## 2 155000 44.0 504 9.689869
## 3 152000 46.2 493 10.192613
# Normalize the data
data7_scale <- scale(data7)
head(data7_scale)
## HousePrice StoreArea BasementArea LawnArea
## [1,] -0.6086554 -0.74477444 -0.8827550 0.8408859
## [2,] -0.4764016 -0.17444291 -0.1223336 -0.1435064
## [3,] -0.5008930 -0.08545501 -0.1418316 0.1791037
## [4,] -0.4355825 -0.08545501 -0.1116983 -1.9868181
## [5,] 0.1032292 0.01566760 -0.2269137 0.6434377
## [6,] 0.5032561 0.32712525 1.0191848 -0.5857539
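A common way to choose eps before running DBSCAN is the k-distance plot: sort each point's distance to its MinPts-th nearest neighbour and look for a knee. The sketch below uses base R only and is illustrative; the eps = 0.5 reference line simply echoes the value used in the dbscan() call that follows.
# k-distance plot: each point's distance to its 5th nearest neighbour, sorted
d_mat <- as.matrix(dist(data7_scale))
knn5 <- apply(d_mat, 1, function(row) sort(row)[6])   # 6th smallest; the first is the point itself (0)
plot(sort(knn5), type = "l",
     xlab = "Points sorted by 5-NN distance", ylab = "5-NN distance",
     main = "k-distance plot for choosing eps")
abline(h = 0.5, lty = 2)                              # eps value used below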
# Perform DBSCAN clustering
library(fpc)
set.seed(42)
dbscan_model <- dbscan(data7_scale, eps = 0.5, MinPts = 5)
summary(dbscan_model)
## Length Class Mode
## cluster 316 -none- numeric
## eps 1 -none- numeric
## MinPts 1 -none- numeric
## isseed 316 -none- logical
# Classification visualization
fviz_cluster(dbscan_model, data = data7_scale, geom = "point")
Observation: With eps = 0.5 and MinPts = 5, DBSCAN finds several dense clusters and labels the remaining low-density points as noise (cluster 0). The classification plot shows the clusters together with the noise points.
# Assign cluster labels to the original data
data6$cluster <- dbscan_model$cluster
head(data6)
## HousePrice StoreArea BasementArea LawnArea HouseNetWorth cluster
## 1 138800 29.9 75 11.223911 Low 2
## 2 155000 44.0 504 9.689869 Medium 1
## 3 152000 46.2 493 10.192613 Medium 1
## 4 160000 46.2 510 6.817316 Medium 4
## 5 226000 48.7 445 10.916215 Medium 1
## 6 275000 56.4 1148 9.000686 High 0
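A small optional check (not in the original output): the exact number of clusters and noise points can be read off the cluster vector; label 0 denotes noise.
# Count points per DBSCAN cluster; label 0 is noise
table(dbscan_model$cluster)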
Why use DBSCAN?
* It works well with non-linear data and clusters of complex shape.
* It handles noise and outliers explicitly (they are labelled as cluster 0).
* It does not require the number of clusters as an input.
#5. Identify/Describe at least three other clustering methods.
Hierarchical Clustering: Hierarchical clustering builds a hierarchy of clusters by either successively merging (agglomerative) or splitting (divisive) groups of points based on their pairwise distance or similarity. It is widely used in data mining, machine learning, pattern recognition, image analysis, and bioinformatics.
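For completeness, a minimal hierarchical clustering sketch on the same scaled features; the Ward linkage and the cut at k = 3 are illustrative choices, not part of the original analysis.
# Hierarchical clustering on the scaled house features (illustrative)
hc <- hclust(dist(data7_scale), method = "ward.D2")
plot(hc, labels = FALSE, main = "Ward hierarchical clustering")  # dendrogram
hc_clusters <- cutree(hc, k = 3)                                 # cut the tree into 3 groups
table(hc_clusters)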
Spectral Clustering: Spectral clustering groups data points using the eigenvectors of a similarity (affinity) matrix; the spectral embedding makes it possible to separate clusters with complex, non-convex shapes. It is widely used in data mining, machine learning, pattern recognition, and image analysis.
# Perform Spectral clustering
#load data
data7 <- read.csv("M3_House_Worth.csv")
attach(data7)
## The following objects are masked from data6:
##
## BasementArea, HouseNetWorth, HousePrice, LawnArea, StoreArea
## The following objects are masked from data4:
##
## BasementArea, HouseNetWorth, HousePrice, LawnArea, StoreArea
## The following objects are masked from data2:
##
## BasementArea, HouseNetWorth, HousePrice, LawnArea, StoreArea
## The following objects are masked from data:
##
## BasementArea, HouseNetWorth, HousePrice, LawnArea, StoreArea
(head(data7,3))
## HousePrice StoreArea BasementArea LawnArea HouseNetWorth
## 1 138800 29.9 75 11.223911 Low
## 2 155000 44.0 504 9.689869 Medium
## 3 152000 46.2 493 10.192613 Medium
library(cluster)
# Select numerical features for clustering
data7 <- data7[,c(1,2,3,4)]
head(data7,3)
## HousePrice StoreArea BasementArea LawnArea
## 1 138800 29.9 75 11.223911
## 2 155000 44.0 504 9.689869
## 3 152000 46.2 493 10.192613
# Normalize the data
data8_scale <- scale(data7)
library(kernlab)
##
## Attaching package: 'kernlab'
## The following object is masked from 'package:ggplot2':
##
## alpha
set.seed(42)
spectral_model <- specc(data8_scale, centers = 3)
# Visualize the clusters on the scaled features
plot(data8_scale, col = spectral_model, pch = 19, main = "Spectral Clustering")
Observation: The spectral clustering algorithm has produced 3 clusters from the spectral embedding of the similarity matrix; the pairwise scatter plots show the cluster assignments on the scaled features.
We can tune the model to produce two clusters instead.
#Tune the model
sigma_est <- sigest(data8_scale, frac = 0.1)[2] # Estimate sigma
spectral_model <- specc(data8_scale, centers = 2, kernel = "rbfdot", sigma = sigma_est)
summary(spectral_model)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 1.000 2.000 1.585 2.000 2.000
#visualize clusters
plot(data8_scale, col = spectral_model, pch = 19, main = "Spectral Clustering")
We now have 2 distinct clusters from the spectral clustering algorithm; the plot shows how the observations split across the two groups.
Spectral clustering is most useful when the data has complex shapes (e.g., nested circles) and traditional methods such as K-means fail. It is less suitable when the data is very high-dimensional (reduce with PCA first) or when computational efficiency is critical, since it requires an eigendecomposition of the similarity matrix.
# Assign cluster names (the label order follows the arbitrary cluster index; the
# head() output below suggests the names come out reversed relative to actual prices,
# so the mapping should be checked against per-cluster mean HousePrice)
cluster_names <- c("Low priced", "High priced")
spectral_model <- factor(spectral_model, labels = cluster_names)
# Add cluster assignments to original data
data7$cluster <- spectral_model
head(data7)
## HousePrice StoreArea BasementArea LawnArea cluster
## 1 138800 29.9 75 11.223911 High priced
## 2 155000 44.0 504 9.689869 High priced
## 3 152000 46.2 493 10.192613 High priced
## 4 160000 46.2 510 6.817316 High priced
## 5 226000 48.7 445 10.916215 High priced
## 6 275000 56.4 1148 9.000686 Low priced
Affinity Propagation: Affinity propagation clusters data by exchanging "responsibility" and "availability" messages between points until a set of exemplars (representative data points) and their clusters emerges; the number of clusters is not fixed in advance but is controlled by a preference parameter.
# Perform Affinity Propagation clustering
#load data
data9 <- read.csv("M3_House_Worth.csv")
attach(data9)
## The following objects are masked from data7:
##
## BasementArea, HouseNetWorth, HousePrice, LawnArea, StoreArea
## The following objects are masked from data6:
##
## BasementArea, HouseNetWorth, HousePrice, LawnArea, StoreArea
## The following objects are masked from data4:
##
## BasementArea, HouseNetWorth, HousePrice, LawnArea, StoreArea
## The following objects are masked from data2:
##
## BasementArea, HouseNetWorth, HousePrice, LawnArea, StoreArea
## The following objects are masked from data:
##
## BasementArea, HouseNetWorth, HousePrice, LawnArea, StoreArea
(head(data9,3))
## HousePrice StoreArea BasementArea LawnArea HouseNetWorth
## 1 138800 29.9 75 11.223911 Low
## 2 155000 44.0 504 9.689869 Medium
## 3 152000 46.2 493 10.192613 Medium
# Select numerical features for clustering
data9 <- data9[,c(1,2,3,4)]
head(data9,3)
## HousePrice StoreArea BasementArea LawnArea
## 1 138800 29.9 75 11.223911
## 2 155000 44.0 504 9.689869
## 3 152000 46.2 493 10.192613
# Normalize the data
data10_scale <- scale(data9)
library(apcluster)
##
## Attaching package: 'apcluster'
## The following object is masked from 'package:stats':
##
## heatmap
set.seed(42)
# Perform Affinity Propagation clustering
model7 <- apcluster(negDistMat(r = 2), data10_scale, q = 0.5)
# Summarize the clustering result
summary(model7)
## Length Class Mode
## 18 APResult S4
clusters <- model7@clusters # List of clusters
print(clusters)
## [[1]]
## 6 8 26 34 36 145 178 197 208 272
## 6 8 26 34 36 145 178 197 208 272
##
## [[2]]
## 4 19 32 33 62 92 300
## 4 19 32 33 62 92 300
##
## [[3]]
## 35 58 66 69 86 109 117 126 129 134 141 144 159 161 185 186 203 231 236 246
## 35 58 66 69 86 109 117 126 129 134 141 144 159 161 185 186 203 231 236 246
## 248 249 251 254 260 263 273 274 286 289 293 311 313
## 248 249 251 254 260 263 273 274 286 289 293 311 313
##
## [[4]]
## 27 43 50 56 74 87 95 99 108 111 137 142 219 224 238 240 291 297 303
## 27 43 50 56 74 87 95 99 108 111 137 142 219 224 238 240 291 297 303
##
## [[5]]
## 63 94 204 283
## 63 94 204 283
##
## [[6]]
## 100
## 100
##
## [[7]]
## 2 7 10 29 37 49 65 67 77 78 82 98 102 104 105 107 121 123 149 156
## 2 7 10 29 37 49 65 67 77 78 82 98 102 104 105 107 121 123 149 156
## 179 180 192 216 228 241 270 276 277 294
## 179 180 192 216 228 241 270 276 277 294
##
## [[8]]
## 3 5 45 103 114 120 130 143 166 170 176 182 194 200 205 211 250
## 3 5 45 103 114 120 130 143 166 170 176 182 194 200 205 211 250
##
## [[9]]
## 12 60 83 132 155 264 306 312
## 12 60 83 132 155 264 306 312
##
## [[10]]
## 21 25 28 39 40 42 44 51 52 53 64 73 93 96 122 125 136 152 160 163
## 21 25 28 39 40 42 44 51 52 53 64 73 93 96 122 125 136 152 160 163
## 164 168 169 171 175 188 190 196 198 218 220 229 239 244 259 261 267 268 280 281
## 164 168 169 171 175 188 190 196 198 218 220 229 239 244 259 261 267 268 280 281
## 284 298
## 284 298
##
## [[11]]
## 13 16 31 61 76 112 119 138 174 213 215 222 223 253 256 292 302
## 13 16 31 61 76 112 119 138 174 213 215 222 223 253 256 292 302
##
## [[12]]
## 140 150 234
## 140 150 234
##
## [[13]]
## 84 165 237 299
## 84 165 237 299
##
## [[14]]
## 1 11 14 17 20 30 46 47 55 72 75 80 88 91 97 101 106 115 118 127
## 1 11 14 17 20 30 46 47 55 72 75 80 88 91 97 101 106 115 118 127
## 128 131 135 146 148 151 153 157 167 172 181 189 191 199 210 214 217 225 227 230
## 128 131 135 146 148 151 153 157 167 172 181 189 191 199 210 214 217 225 227 230
## 232 233 242 243 257 266 278 279 282 287 295 296 301 310 314 316
## 232 233 242 243 257 266 278 279 282 287 295 296 301 310 314 316
##
## [[15]]
## 9 15 18 24 41 54 71 81 113 133 154 158 173 177 184 187 195 207 209 226
## 9 15 18 24 41 54 71 81 113 133 154 158 173 177 184 187 195 207 209 226
## 235 247 258 262 265 269 304 307 315
## 235 247 258 262 265 269 304 307 315
##
## [[16]]
## 68 70 90 116 147 245 288 290
## 68 70 90 116 147 245 288 290
##
## [[17]]
## 22 23 38 48 79 85 89 110 124 139 162 183 193 201 202 206 212 221 252 255
## 22 23 38 48 79 85 89 110 124 139 162 183 193 201 202 206 212 221 252 255
## 271 275 285 305 308
## 271 275 285 305 308
##
## [[18]]
## 57 59 309
## 57 59 309
exemplars <- model7@exemplars # Indices of exemplars (centers)
print(exemplars)
## 6 32 86 87 94 100 121 130 155 164 213 234 237 243 262 288 305 309
## 6 32 86 87 94 100 121 130 155 164 213 234 237 243 262 288 305 309
n_clusters <- length(clusters) # Number of clusters
print(n_clusters)
## [1] 18
#Convert to Vector Format
cluster_labels <- rep(0, nrow(data10_scale))
for (i in 1:length(clusters)) {
cluster_labels[clusters[[i]]] <- i
}
print(i)
## [1] 18
#PCA for Dimensionality Reduction (if data has >2 features)
pca_result <- prcomp(data10_scale, scale. = TRUE)
pca_data <- as.data.frame(pca_result$x[, 1:2]) # First 2 PCs
pca_data$Cluster <- as.factor(cluster_labels)
#Plot
library(ggplot2)
ggplot(pca_data, aes(x = PC1, y = PC2, color = Cluster)) +
geom_point(size = 3, alpha = 0.7) +
ggtitle("Affinity Propagation Clustering") +
theme_minimal()
Observation: With q = 0.5, affinity propagation selects 18 exemplars and therefore 18 fairly fine-grained clusters based on the negative squared-distance similarities; the PCA plot shows how these clusters lie in the space of the first two principal components.
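If a specific number of clusters is wanted, the apcluster package also provides apclusterK(), which searches for a preference that yields (approximately) K clusters. A brief sketch, assuming the same similarity function; K = 3 is an illustrative choice.
# Affinity propagation constrained to about 3 clusters (illustrative)
model7_k3 <- apclusterK(negDistMat(r = 2), data10_scale, K = 3)
length(model7_k3@clusters)   # should report 3 clusters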
#6. Gradient Operator and Gradient Descent
The gradient operator is a mathematical operator that represents the rate of change of a function at a given point. It is a vector that points in the direction of the steepest ascent of the function. The gradient operator is denoted by the symbol ∇ (nabla) and is used to calculate the gradient of a function.
The gradient descent algorithm is an optimization algorithm used to minimize a function by iteratively moving in the direction of the negative gradient of the function. The algorithm starts at an initial point and takes steps in the direction of the negative gradient until it reaches a local minimum. The gradient descent algorithm is used in many machine learning algorithms, including linear regression, logistic regression, and neural networks.
The gradient descent algorithm can be summarized as follows:
Initialize the parameters (weights) of the model.
Calculate the gradient of the loss function with respect to the parameters.
Update the parameters by moving in the direction of the negative gradient.
Repeat steps 2 and 3 until the algorithm converges to a local minimum.
The gradient descent algorithm is used to optimize the parameters of a model by minimizing the loss function. It is an iterative algorithm that requires tuning of hyperparameters such as the learning rate and the number of iterations.
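As a concrete illustration of the four steps, here is a minimal sketch minimizing the one-variable function f(w) = (w - 3)^2, whose gradient is 2(w - 3); the starting point, learning rate and iteration count are illustrative choices.
# Gradient descent on f(w) = (w - 3)^2; the minimum is at w = 3
w <- 0            # step 1: initialize the parameter
alpha <- 0.1      # learning rate
for (i in 1:100) {
  grad <- 2 * (w - 3)       # step 2: gradient of the loss at the current w
  w <- w - alpha * grad     # step 3: move against the gradient
}                           # step 4: repeat until (approximate) convergence
w                           # close to 3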
A loss function is a mathematical function that measures the difference between the predicted values of a model and the actual values of the data. It is used to quantify the error or loss of a model and is an essential component of machine learning algorithms. The loss function is used to optimize the parameters of a model by minimizing the error between the predicted and actual values.
The loss function is important in machine learning for the following reasons:
Optimization: The loss function is used to optimize the parameters of a model by minimizing the error between the predicted and actual values. It guides the learning process of the model and helps improve its performance.
Evaluation: The loss function is used to evaluate the performance of a model. A lower loss value indicates better performance, while a higher loss value indicates poorer performance.
Generalization: The loss function helps prevent overfitting by penalizing complex models that perform well on the training data but poorly on unseen data. It encourages the model to generalize well to new data.
Interpretability: The loss function provides insights into the behavior of the model and the quality of its predictions. It helps identify areas for improvement and guides the selection of hyperparameters.
Comparison: The loss function allows for the comparison of different models and algorithms. It provides a standardized metric for evaluating the performance of models and selecting the best one for a given task.
Overall, the loss function is a critical component of machine learning algorithms and plays a key role in optimizing, evaluating, and improving the performance of models.
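For example, the mean squared error (MSE) used throughout this section can be written as a one-line function and evaluated on predictions (a small illustrative helper, not part of the original code).
# Mean squared error between observed and predicted values
mse <- function(y, y_pred) mean((y - y_pred)^2)
mse(c(1, 2, 3), c(1.1, 1.9, 3.2))   # small loss: predictions are close to the targets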
# Generate random data
set.seed(42)
x <- 1:100
y <- 2 * x + rnorm(100, mean = 0, sd = 10)
# Plot the data
plot(x, y, main = "Random Data", xlab = "X", ylab = "Y")
# Gradient descent to estimate the slope of y ~ w * x
# Initialize the parameters
w <- 0 # Slope
# Set the learning rate
alpha <- 0.0001
# Set the number of iterations
n_iter <- 100
# Gradient Descent Function
gradient_descent <- function(x, y, alpha, n_iter) {
# Initialize the parameters
w <- 0 # Slope
# Perform gradient descent
for (i in 1:n_iter) {
# Calculate the predicted values
y_pred <- w * x
# Calculate the error
error <- y_pred - y
# Calculate the gradient
gradient <- sum(error * x)
# Update the parameters
w <- w - (alpha * gradient)
}
return(w)
}
# Perform gradient descent
w <- gradient_descent(x, y, alpha = 0.0001, n_iter = 100)
print(w)
## [1] -8.617865e+151
Observation: The printed value (about -8.6 × 10^151) shows that this run of gradient descent diverged rather than converged: the gradient is the raw sum over all 100 points and, combined with this learning rate, each update overshoots so the slope blows up. Scaling the gradient by the number of observations (or using a much smaller learning rate) stabilizes the updates, as the sketch below shows; compare the result with the lm() fit that follows.
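A small corrected sketch: the only changes from the code above are averaging the gradient over length(x) points and keeping the same learning rate, which makes the update stable so the slope settles near the true value of 2.
# Gradient descent with the gradient averaged over the n data points
gradient_descent_scaled <- function(x, y, alpha, n_iter) {
  w <- 0
  for (i in 1:n_iter) {
    error <- w * x - y
    gradient <- sum(error * x) / length(x)   # average gradient instead of the raw sum
    w <- w - alpha * gradient
  }
  return(w)
}
gradient_descent_scaled(x, y, alpha = 0.0001, n_iter = 100)   # roughly 2, matching lm() below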
# Create a linear model with lm() for comparison
model <- lm(y ~ x)
summary(model)
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -30.195 -6.618 0.809 6.527 22.264
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.68935 2.10869 0.327 0.744
## x 1.99279 0.03625 54.971 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.46 on 98 degrees of freedom
## Multiple R-squared: 0.9686, Adjusted R-squared: 0.9683
## F-statistic: 3022 on 1 and 98 DF, p-value: < 2.2e-16
# Plot the data and the linear model
plot(x, y, main = "Linear Regression", xlab = "X", ylab = "Y")
abline(model, col = "red")
Observation: The linear regression model fitted with lm() minimizes the residual sum of squares; the estimated slope (about 1.99) is close to the true value of 2, and the red line shows the fitted best-fit line.
Batch Gradient Descent: Batch gradient descent computes the gradient of the loss function with respect to the parameters using the entire training dataset. It updates the parameters by taking a step in the direction of the negative gradient of the loss function. Batch gradient descent is computationally expensive for large datasets, but with a suitable learning rate it converges to the global minimum for convex loss functions (and to a local minimum otherwise).
Stochastic Gradient Descent: Stochastic gradient descent computes the gradient of the loss function using a single (randomly chosen) data point at a time and updates the parameters after each one, making each update much cheaper than a full batch update. The noisy updates cause it to oscillate around the minimum rather than settle exactly, but this noise can also help it escape shallow local minima.
Mini-Batch Gradient Descent: Mini-batch gradient descent computes the gradient of the loss function with respect to the parameters using a small batch of data points. It updates the parameters after each mini-batch, striking a balance between batch and stochastic gradient descent. Mini-batch gradient descent is commonly used in practice for training deep learning models.
Momentum Gradient Descent: Momentum gradient descent is an extension of gradient descent that adds a momentum term to the update rule. The momentum term accelerates the convergence of the algorithm by accumulating the gradients of previous steps. Momentum gradient descent helps overcome local minima and oscillations in the loss function.
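To make the last two ideas concrete, here is a minimal sketch (reusing the x and y simulated earlier; the batch size, learning rate and momentum coefficient are illustrative choices) that combines mini-batch updates with a momentum term for the slope-only model y ≈ w·x.
# Mini-batch gradient descent with momentum for the slope of y ~ w * x
set.seed(42)
w <- 0; v <- 0                 # parameter and its velocity (momentum) term
alpha <- 1e-4                  # learning rate
beta <- 0.9                    # momentum coefficient
batch_size <- 20
n <- length(x)
for (epoch in 1:50) {
  idx <- sample(n)             # shuffle the data each epoch
  for (start in seq(1, n, by = batch_size)) {
    batch <- idx[start:min(start + batch_size - 1, n)]
    error <- w * x[batch] - y[batch]
    grad  <- mean(error * x[batch])         # mini-batch estimate of the gradient
    v     <- beta * v + (1 - beta) * grad   # accumulate past gradients (momentum)
    w     <- w - alpha * v
  }
}
w   # settles near the true slope of 2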
Adagrad: Adagrad is an adaptive learning rate optimization algorithm that scales the learning rate of each parameter based on the historical gradients. It adapts the learning rate for each parameter, allowing for faster convergence and better performance on sparse data.
RMSprop: RMSprop is an adaptive learning rate optimization algorithm that divides the learning rate by the root mean square of the historical gradients. It normalizes the learning rate for each parameter, preventing the learning rate from becoming too small or too large.
Adam: Adam (Adaptive Moment Estimation) is an adaptive learning rate optimization algorithm that combines the benefits of momentum and RMSprop. It computes the adaptive learning rate for each parameter based on the first and second moments of the gradients. Adam is widely used in deep learning for its fast convergence and robust performance.
Nesterov Accelerated Gradient (NAG): Nesterov Accelerated Gradient is an extension of momentum gradient descent that calculates the gradient at the lookahead point instead of the current point. It helps reduce oscillations and overshooting in the loss function, leading to faster convergence and better performance.
AdaDelta: AdaDelta is an adaptive learning rate optimization algorithm that eliminates the need for a learning rate hyperparameter. It uses the root mean square of the historical gradients to adapt the learning rate for each parameter. AdaDelta is robust to noisy gradients and converges faster than traditional optimization algorithms.
Nadam: Nadam (Nesterov-accelerated Adaptive Moment Estimation) is an extension of Adam that combines the benefits of Nesterov momentum and RMSprop. It calculates the adaptive learning rate for each parameter based on the first and second moments of the gradients. Nadam is known for its fast convergence and robust performance on a wide range of datasets.
In this assignment, we have explored the Expectation-Maximization (EM) clustering algorithm and its application to clustering data points into groups, alongside the K-means clustering algorithm. We have also applied K-medians (via PAM), DBSCAN, spectral clustering, and affinity propagation to the same house-worth dataset, described hierarchical clustering as a further alternative, and visualized the resulting clusters with various plots.
We have also described the gradient operator and the gradient descent algorithm in machine learning. We implemented gradient descent for a simple linear model on simulated data, saw how an unscaled gradient and too-large step size cause it to diverge, and compared the corrected fit with the lm() result. We have discussed the importance of loss functions in machine learning and their role in optimizing, evaluating, and improving the performance of models.
Finally, we have described various types of gradient descent algorithms, including Batch Gradient Descent, Stochastic Gradient Descent, Mini-Batch Gradient Descent, Momentum Gradient Descent, Adagrad, RMSprop, Adam, Nesterov Accelerated Gradient, AdaDelta, and Nadam. We have discussed the characteristics and applications of each algorithm and their role in optimizing the parameters of machine learning models.
Overall, this assignment has provided a comprehensive overview of clustering algorithms, gradient descent, and loss functions in machine learning. It has demonstrated the practical application of these concepts in data analysis and model optimization.