Self-organizing maps (SOMs) are a neural network architecture that clusters high-dimensional data vectors according to a similarity measure. The clusters are arranged in a low-dimensional topology, usually a grid, that preserves the neighbourhood relations present in the high-dimensional data. So objects assigned to the same cluster are similar to each other, as in any cluster analysis. In addition, objects in nearby clusters are expected to be more similar than objects in more distant clusters.
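To make the clustering idea concrete, here is a minimal online SOM in base R. It is an illustrative sketch, not the algorithm GrowingSOM implements. It assumes Euclidean distance, a fixed rectangular grid, a Gaussian neighbourhood, and a learning rate and radius that both decay linearly. The function name `train_som` is hypothetical.

Each step does two things:
- It finds the best-matching unit (BMU) for one randomly chosen observation.
- It pulls every node towards that observation, weighted by how close the node is to the BMU on the grid.

This weighting is what makes neighbouring nodes end up representing similar data.

```r
# Minimal online SOM sketch (illustrative only, not GrowingSOM's implementation).
# Assumptions: Euclidean distance, fixed rectangular grid, Gaussian neighbourhood,
# linearly decaying learning rate and neighbourhood radius.
train_som <- function(data, xdim = 3, ydim = 3, iterations = 1000,
                      alpha0 = 0.5, radius0 = max(xdim, ydim) / 2) {
  data <- as.matrix(data)
  # grid coordinates of each node in the low-dimensional map
  grid <- as.matrix(expand.grid(x = seq_len(xdim), y = seq_len(ydim)))
  n_nodes <- nrow(grid)
  # initialise the codebook vectors with randomly chosen observations
  codes <- data[sample(nrow(data), n_nodes, replace = TRUE), , drop = FALSE]
  for (step in seq_len(iterations)) {
    frac <- 1 - (step - 1) / iterations
    alpha <- alpha0 * frac                # learning rate decays linearly
    radius <- max(radius0 * frac, 0.5)    # neighbourhood radius shrinks
    x <- data[sample(nrow(data), 1), ]
    # best-matching unit: node whose codebook vector is closest to x
    bmu <- which.min(colSums((t(codes) - x)^2))
    # Gaussian neighbourhood weight of every node, based on grid distance to the BMU
    h <- exp(-colSums((t(grid) - grid[bmu, ])^2) / (2 * radius^2))
    # move every node towards x, scaled by alpha * h
    target <- matrix(x, n_nodes, ncol(data), byrow = TRUE)
    codes <- codes + alpha * h * (target - codes)
  }
  list(codes = codes, grid = grid)
}

set.seed(1)
som <- train_som(iris[, 1:4])
dim(som$codes)  # 9 nodes x 4 variables
```

Each update is a convex combination of a node's old codebook vector and an observation, because `alpha * h` never exceeds 0.5. As a result, the codebook vectors always stay within the range of the data.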
A Growing SOM (GSOM) avoids having to choose a suitable map size in advance: the map starts small and adds nodes during training. The spread factor controls how much the map grows. From the literature, I understood that there are several ways to grow a SOM. One is to readjust the positions of the existing neurons relative to the best-matching unit (BMU), i.e. the neuron closest to the current input. The other is to create new neurons and place them at suitable locations.
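In the original GSOM formulation, the spread factor SF is turned into a growth threshold GT = -D × ln(SF), where D is the number of input variables. A node grows new neighbours once its accumulated quantisation error exceeds GT.

Because ln(SF) approaches 0 as SF approaches 1, a higher spread factor gives a lower threshold. Nodes then cross it sooner and the map grows larger. A small sketch of that arithmetic follows. It assumes GrowingSOM follows this formula, which has not been checked against the package source. It also assumes that only the 7 input columns count towards D for the Auto-mpg model. The helper names are hypothetical.

```r
# Growth threshold from the original GSOM formulation, assumed here rather than
# read from the GrowingSOM source: GT = -D * ln(SF), where D is the number of
# input variables and 0 < SF < 1.
growth_threshold <- function(D, spread_factor) {
  stopifnot(spread_factor > 0, spread_factor < 1)
  -D * log(spread_factor)
}

# Hypothetical helper: a boundary node grows new neighbours once the
# quantisation error it has accumulated exceeds GT.
should_grow <- function(accumulated_error, gt) accumulated_error > gt

growth_threshold(4, 0.8)  # Iris, 4 inputs, SF = 0.8: about 0.893
growth_threshold(7, 0.9)  # Auto-mpg, 7 inputs, SF = 0.9: about 0.738
```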
Neighbourhood relations are usually defined on a two-dimensional grid of either rectangular (square) or hexagonal cells. I have implemented both below with the GrowingSOM package in R: a rectangular grid for the Iris data and a hexagonal grid for the Auto-mpg data.
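The practical difference between the two layouts is how many immediate neighbours a node has. In the original GSOM, a node on a rectangular grid has up to four neighbours (up, down, left, right). A hexagonal cell has six.

The sketch below stores hexagonal cells in "odd-r" offset coordinates, where odd rows are shifted half a cell to the right. This is an assumed convention for illustration only; GrowingSOM may index its hexagonal grid differently. Both function names are hypothetical.

```r
# Immediate neighbours of grid cell (x, y) on a rectangular grid: 4-neighbourhood.
rect_neighbours <- function(x, y) {
  offsets <- rbind(c(1, 0), c(-1, 0), c(0, 1), c(0, -1))
  cbind(x = x + offsets[, 1], y = y + offsets[, 2])
}

# Immediate neighbours on a hexagonal grid in "odd-r" offset coordinates
# (odd rows shifted right): 6 neighbours, whose diagonal offsets depend on row parity.
hex_neighbours <- function(x, y) {
  if (y %% 2 == 0) {
    offsets <- rbind(c(1, 0), c(-1, 0), c(-1, 1), c(0, 1), c(-1, -1), c(0, -1))
  } else {
    offsets <- rbind(c(1, 0), c(-1, 0), c(0, 1), c(1, 1), c(0, -1), c(1, -1))
  }
  cbind(x = x + offsets[, 1], y = y + offsets[, 2])
}

rect_neighbours(2, 2)  # four cells
hex_neighbours(2, 2)   # six cells
```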
Iris Dataset
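Consists of 150 observations of 5 variables: four flower measurements in cm (sepal length, sepal width, petal length, petal width) and the species. Only the four measurements are used for training. A summary is shown below.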
Auto-mpg Dataset
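Consists of 392 observations of 9 variables. The first 8 (mpg, cylinders, displacement, horsepower, weight, acceleration, model year, origin) are used below. mpg is the prediction target and the other 7 are inputs.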
library(GrowingSOM)
data("iris")
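# Random 100/50 train/test split. No seed is set, so the split (and the summaries below)
# change between runs; call set.seed() first for reproducible results.
s = sample(1:150, 100)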
summary(iris) # 150 observations of 5 variables
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
##        Species
##  setosa    :50
##  versicolor:50
##  virginica :50
train_set = iris[s,1:4]
test_set = iris[-s,1:4]
summary(train_set) # 100 observations
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width
##  Min.   :4.400   Min.   :2.000   Min.   :1.200   Min.   :0.100
##  1st Qu.:5.100   1st Qu.:2.700   1st Qu.:1.575   1st Qu.:0.275
##  Median :5.800   Median :3.000   Median :4.200   Median :1.300
##  Mean   :5.805   Mean   :3.019   Mean   :3.751   Mean   :1.166
##  3rd Qu.:6.300   3rd Qu.:3.300   3rd Qu.:5.000   3rd Qu.:1.800
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
summary(test_set) # 50 observations
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width
##  Min.   :4.300   Min.   :2.400   Min.   :1.000   Min.   :0.100
##  1st Qu.:5.125   1st Qu.:2.900   1st Qu.:1.600   1st Qu.:0.400
##  Median :5.750   Median :3.100   Median :4.500   Median :1.400
##  Mean   :5.920   Mean   :3.134   Mean   :3.772   Mean   :1.266
##  3rd Qu.:6.575   3rd Qu.:3.400   3rd Qu.:5.375   3rd Qu.:2.000
##  Max.   :7.700   Max.   :3.900   Max.   :6.700   Max.   :2.500
# Train an unsupervised GSOM on a rectangular grid (spread factor 0.8)
gsom_map <- train.gsom(train_set, spreadFactor=0.8, nhood="rect")
## ..................................................
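# Diagnostic plots
plot(gsom_map, type = "training")   # training progress
plot(gsom_map, type = "count")      # observations mapped to each node
plot(gsom_map, type = "distance")   # distance map of the trained nodes
par(mfrow = c(2, 2))                # 2 x 2 layout: one panel per input variable
plot(gsom_map, type = "property")
par(mfrow = c(1, 1))                # reset the layout before the next plots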
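A count map summarises how many observations land on each node. A related statistic is the mean quantisation error, the average distance from each observation to its BMU.

Below is a minimal sketch of both, given a matrix of codebook vectors with one row per node. The function `map_summary` is hypothetical, and so is the toy three-node "codebook" built from the per-species means of iris. Extracting the real codebook from a trained `gsom_map` object is not shown, because that object's structure is not covered here.

```r
# Map observations to their best-matching units and summarise the map.
# `codes` is a numeric matrix with one codebook vector per row (hypothetical input).
map_summary <- function(data, codes) {
  data <- as.matrix(data)
  codes <- as.matrix(codes)
  # squared Euclidean distance: rows = observations, columns = nodes
  d2 <- outer(rowSums(data^2), rowSums(codes^2), "+") - 2 * data %*% t(codes)
  d2[d2 < 0] <- 0                          # guard against tiny negative rounding errors
  bmu <- max.col(-d2, ties.method = "first")
  list(
    bmu = bmu,
    counts = tabulate(bmu, nbins = nrow(codes)),   # hits per node, as in a count map
    quantisation_error = mean(sqrt(d2[cbind(seq_along(bmu), bmu)]))
  )
}

# Toy three-node codebook: the per-species means of the four iris measurements
codes <- aggregate(iris[, 1:4], by = list(iris$Species), FUN = mean)[, -1]
res <- map_summary(iris[, 1:4], codes)
res$counts
res$quantisation_error
```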
data("auto_mpg")
s = sample(1:392, 300)              # random 300/92 train/test split (no seed set)
train_set = auto_mpg[s,1:8]         # first 8 columns: mpg plus the 7 inputs
summary(train_set)
##       mpg          cylinders    displacement     horsepower
##  Min.   : 9.00   Min.   :3.0   Min.   : 68.0   Min.   : 46.00
##  1st Qu.:17.00   1st Qu.:4.0   1st Qu.:105.0   1st Qu.: 76.75
##  Median :22.00   Median :4.0   Median :151.0   Median : 95.00
##  Mean   :23.42   Mean   :5.5   Mean   :197.2   Mean   :105.20
##  3rd Qu.:29.00   3rd Qu.:8.0   3rd Qu.:302.0   3rd Qu.:129.25
##  Max.   :46.60   Max.   :8.0   Max.   :455.0   Max.   :230.00
##      weight      acceleration     model year        origin
##  Min.   :1649   Min.   : 8.00   Min.   :70.00   Min.   :1.000
##  1st Qu.:2232   1st Qu.:13.50   1st Qu.:73.00   1st Qu.:1.000
##  Median :2845   Median :15.50   Median :76.00   Median :1.000
##  Mean   :2998   Mean   :15.56   Mean   :76.03   Mean   :1.567
##  3rd Qu.:3615   3rd Qu.:17.23   3rd Qu.:79.00   3rd Qu.:2.000
##  Max.   :5140   Max.   :24.80   Max.   :82.00   Max.   :3.000
test_set = auto_mpg[-s,1:8]         # remaining 92 rows
summary(test_set)
##       mpg          cylinders     displacement     horsepower
##  Min.   :10.00   Min.   :4.00   Min.   : 71.0   Min.   : 60.0
##  1st Qu.:17.57   1st Qu.:4.00   1st Qu.: 98.0   1st Qu.: 75.0
##  Median :24.00   Median :4.00   Median :140.0   Median : 91.0
##  Mean   :23.54   Mean   :5.38   Mean   :185.4   Mean   :102.1
##  3rd Qu.:29.80   3rd Qu.:6.00   3rd Qu.:250.0   3rd Qu.:115.2
##  Max.   :38.10   Max.   :8.00   Max.   :429.0   Max.   :215.0
##      weight      acceleration     model year        origin
##  Min.   :1613   Min.   :11.00   Min.   :70.00   Min.   :1.000
##  1st Qu.:2209   1st Qu.:14.00   1st Qu.:73.00   1st Qu.:1.000
##  Median :2659   Median :15.40   Median :76.00   Median :1.000
##  Mean   :2912   Mean   :15.48   Mean   :75.82   Mean   :1.609
##  3rd Qu.:3590   3rd Qu.:16.52   3rd Qu.:79.00   3rd Qu.:2.000
##  Max.   :4952   Max.   :21.00   Max.   :82.00   Max.   :3.000
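The Auto-mpg inputs are on very different scales: weight is in the thousands, while cylinders lies between 3 and 8. In an unscaled Euclidean distance, weight would dominate every BMU search.

Whether `train_xy.gsom` normalises its inputs internally is not checked here. If it does not, one option is min-max scaling fitted on the training rows and reused unchanged for the test rows, so that the test data does not influence the scaling. The helper names below are hypothetical.

```r
# Min-max scaling: fit the per-column min/max on training data only.
fit_minmax <- function(df) {
  list(min = vapply(df, min, numeric(1)), max = vapply(df, max, numeric(1)))
}

# Apply previously fitted min/max to any data frame with the same columns.
apply_minmax <- function(df, params) {
  rng <- params$max - params$min
  rng[rng == 0] <- 1                       # constant column: avoid division by zero
  scaled <- sweep(sweep(as.matrix(df), 2, params$min, "-"), 2, rng, "/")
  as.data.frame(scaled)
}

# Usage on iris: fit on the first 100 rows, apply to both parts.
# For the Auto-mpg inputs one would pass train_set[, 2:8] and test_set[, 2:8].
params <- fit_minmax(iris[1:100, 1:4])
train_scaled <- apply_minmax(iris[1:100, 1:4], params)
test_scaled <- apply_minmax(iris[101:150, 1:4], params)
```

Test values can fall slightly outside [0, 1] when they exceed the training range. That is expected with this approach.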
# Train a supervised GSOM on a hexagonal grid: columns 2-8 are the inputs, mpg (column 1) is the target
gsom_map <- train_xy.gsom(train_set[,2:8], train_set[,1], spreadFactor = 0.9, nhood="hex")
## ..................................................
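plot(gsom_map, type = "training")   # training progress
plot(gsom_map, type = "predict")    # target (mpg) values learned by the nodes
plot(gsom_map, type = "distance")   # distance map of the trained nodes
par(mfrow = c(3, 3))                # 3 x 3 layout for the per-variable panels
plot(gsom_map, type = "property")
par(mfrow = c(1, 1))                # reset the layout
# Predict mpg for the test set from its 7 input columns
gsom_predictions = predict.gsom(gsom_map, test_set[,2:8])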
plot(gsom_predictions, type = "predict")
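The plot gives a visual impression, but a numeric error makes runs with different spread factors or grid types easier to compare. Below is a sketch using root-mean-square error (RMSE) and mean absolute error (MAE), with a mean-only baseline for reference.

`pred_mpg` is a hypothetical vector of the numeric predictions extracted from `gsom_predictions`. Where those values live inside that object is not shown in this post.

```r
# Regression error metrics for numeric predictions.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
mae  <- function(actual, predicted) mean(abs(actual - predicted))

rmse(c(20, 25, 30), c(22, 24, 30))  # sqrt(5/3), about 1.291
mae(c(20, 25, 30), c(22, 24, 30))   # 1

# Hypothetical usage with the objects from this post:
# rmse(test_set[, 1], pred_mpg)
# Baseline for comparison: always predict the training mean of mpg.
# rmse(test_set[, 1], rep(mean(train_set[, 1]), nrow(test_set)))
```

A GSOM that does not clearly beat the mean baseline is not adding much predictive value.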