SOM

Self-organizing maps (SOMs) are a specific neural network architecture that clusters high-dimensional data vectors according to a similarity measure. The clusters are arranged in a low-dimensional topology (usually a grid structure) that preserves the neighbourhood relations existing in the high-dimensional data. Thus, as in any cluster analysis, objects assigned to the same cluster are similar to each other; in addition, objects in nearby clusters are expected to be more similar than objects in more distant clusters.
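To make the idea of "nearby clusters stay similar" concrete, here is a minimal base-R sketch of a single SOM training step. It is my own illustration, not the internals of any package: the helper names `find_bmu` and `som_update`, the 3 x 3 grid, the Gaussian neighbourhood and the values of `alpha` and `radius` are all assumptions chosen for the example.

```r
# Minimal SOM update step in base R (illustrative; not the GrowingSOM internals).
set.seed(1)

# A 3 x 3 rectangular grid of nodes, each with a 4-dimensional weight vector.
grid    <- expand.grid(x = 1:3, y = 1:3)            # node positions on the map
weights <- matrix(runif(nrow(grid) * 4), ncol = 4)  # one row of weights per node

# Return the row index of the best-matching unit (smallest Euclidean distance).
find_bmu <- function(weights, input) {
  d2 <- rowSums(sweep(weights, 2, input)^2)
  which.min(d2)
}

# Move the BMU and its map neighbours towards the input vector.
som_update <- function(weights, grid, input, alpha = 0.5, radius = 1) {
  bmu <- find_bmu(weights, input)
  # Distance of every node to the BMU, measured on the map (not in data space).
  map_dist <- sqrt((grid$x - grid$x[bmu])^2 + (grid$y - grid$y[bmu])^2)
  # Gaussian neighbourhood: 1 at the BMU, decaying with map distance.
  h <- exp(-map_dist^2 / (2 * radius^2))
  # sweep() gives (weights - input) per row; each row is pulled towards the input
  # in proportion to its neighbourhood value.
  weights - alpha * h * sweep(weights, 2, input)
}

input   <- c(5.1, 3.5, 1.4, 0.2)  # first iris observation
bmu     <- find_bmu(weights, input)
updated <- som_update(weights, grid, input)
```

Because the neighbours of the BMU are updated too (just less strongly), nodes that sit close together on the map end up with similar weight vectors, which is where the topology preservation comes from.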

Understanding GSOM:

A growing SOM (GSOM) addresses the problem of choosing a suitable map size for a SOM. The spread factor is the parameter that controls how much the map grows. From the literature, I understood that there are multiple ways of growing a SOM: you can either readjust the positions of the existing neurons around the BMU (best-matching unit), or create new neurons and assign them to suitable locations.
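As a rough illustration of how the spread factor steers growth: in the original GSOM formulation (Alahakoon et al.), the spread factor SF is turned into a growth threshold GT = -D * ln(SF), where D is the number of input dimensions, and a boundary node spawns new nodes once its accumulated error exceeds GT. The sketch below only evaluates that formula. I am assuming, not asserting, that the GrowingSOM package uses the same rule internally, and `should_grow` is a hypothetical helper I made up for the example.

```r
# Growth threshold from the original GSOM formulation: GT = -D * ln(SF),
# where D is the number of input dimensions and SF is the spread factor in (0, 1).
growth_threshold <- function(n_dims, spread_factor) {
  stopifnot(spread_factor > 0, spread_factor < 1)
  -n_dims * log(spread_factor)
}

# Thresholds for the two settings used in this post (assumed mapping:
# 4 iris features with SF = 0.8, 7 auto-mpg predictors with SF = 0.9).
gt_iris <- growth_threshold(4, 0.8)
gt_auto <- growth_threshold(7, 0.9)

# Hypothetical helper: a node grows once its accumulated error exceeds GT.
should_grow <- function(accumulated_error, gt) accumulated_error > gt
```

A larger spread factor gives a smaller threshold, so nodes hit it sooner and the map spreads out more.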

Usually, two-dimensional arrangements of squares/rectangles or hexagons are used to define the neighborhood relations. I have implemented both below using the GrowingSOM package in R.
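The practical difference between the two layouts is the number of direct neighbours a node has: four on a rectangular grid and six on a hexagonal one. The small sketch below shows this with a hypothetical `neighbours` helper. The axial coordinate convention for the hexagonal case is my own assumption and may not match how GrowingSOM stores node positions.

```r
# Direct neighbours of a node at map position (x, y); illustrative only.
neighbours <- function(x, y, nhood = c("rect", "hex")) {
  nhood <- match.arg(nhood)
  if (nhood == "rect") {
    # Rectangular grid: left, right, up, down.
    offsets <- rbind(c(1, 0), c(-1, 0), c(0, 1), c(0, -1))
  } else {
    # Hexagonal grid in axial coordinates (an assumed convention): six neighbours.
    offsets <- rbind(c(1, 0), c(-1, 0), c(0, 1), c(0, -1), c(1, -1), c(-1, 1))
  }
  data.frame(x = x + offsets[, 1], y = y + offsets[, 2])
}

rect_nb <- neighbours(0, 0, "rect")
hex_nb  <- neighbours(0, 0, "hex")
```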

Input Data:

Iris Dataset

Consists of 150 observations and 5 variables. Explained in detail below.

Auto-mpg Dataset

Consists of 392 observations and 9 variables.

Code and Predictions:

  library(GrowingSOM)
  data("iris")
  s = sample(1:150, 100) # 100 random row indices for training; the other 50 rows form the test set
  
  summary(iris) # 150 observations of 5 variables
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
##        Species  
##  setosa    :50  
##  versicolor:50  
##  virginica :50  
##                 
##                 
## 
  train_set = iris[s,1:4] 
  test_set = iris[-s,1:4]
  
  summary(train_set) # 100 observations
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.400   Min.   :2.000   Min.   :1.200   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.700   1st Qu.:1.575   1st Qu.:0.275  
##  Median :5.800   Median :3.000   Median :4.200   Median :1.300  
##  Mean   :5.805   Mean   :3.019   Mean   :3.751   Mean   :1.166  
##  3rd Qu.:6.300   3rd Qu.:3.300   3rd Qu.:5.000   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
  summary(test_set)
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.400   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.125   1st Qu.:2.900   1st Qu.:1.600   1st Qu.:0.400  
##  Median :5.750   Median :3.100   Median :4.500   Median :1.400  
##  Mean   :5.920   Mean   :3.134   Mean   :3.772   Mean   :1.266  
##  3rd Qu.:6.575   3rd Qu.:3.400   3rd Qu.:5.375   3rd Qu.:2.000  
##  Max.   :7.700   Max.   :3.900   Max.   :6.700   Max.   :2.500
  # Train GrowingSOM
  gsom_map <- train.gsom(train_set, spreadFactor=0.8, nhood="rect")
## ..................................................
  # Some Plots
  plot(gsom_map, type = "training")

  plot(gsom_map, type="count")

  plot(gsom_map, type = "distance")

  par(mfrow=c(2,2))
  plot(gsom_map, type="property")

Using the auto_mpg dataset and creating a hexagonal grid

  data("auto_mpg")
  s = sample(1:392, 300) # 300 random row indices for training; the other 92 rows form the test set

  train_set = auto_mpg[s,1:8]
  summary(train_set)
##       mpg          cylinders    displacement     horsepower    
##  Min.   : 9.00   Min.   :3.0   Min.   : 68.0   Min.   : 46.00  
##  1st Qu.:17.00   1st Qu.:4.0   1st Qu.:105.0   1st Qu.: 76.75  
##  Median :22.00   Median :4.0   Median :151.0   Median : 95.00  
##  Mean   :23.42   Mean   :5.5   Mean   :197.2   Mean   :105.20  
##  3rd Qu.:29.00   3rd Qu.:8.0   3rd Qu.:302.0   3rd Qu.:129.25  
##  Max.   :46.60   Max.   :8.0   Max.   :455.0   Max.   :230.00  
##      weight      acceleration     model year        origin     
##  Min.   :1649   Min.   : 8.00   Min.   :70.00   Min.   :1.000  
##  1st Qu.:2232   1st Qu.:13.50   1st Qu.:73.00   1st Qu.:1.000  
##  Median :2845   Median :15.50   Median :76.00   Median :1.000  
##  Mean   :2998   Mean   :15.56   Mean   :76.03   Mean   :1.567  
##  3rd Qu.:3615   3rd Qu.:17.23   3rd Qu.:79.00   3rd Qu.:2.000  
##  Max.   :5140   Max.   :24.80   Max.   :82.00   Max.   :3.000
  test_set = auto_mpg[-s,1:8]
  summary(test_set)
##       mpg          cylinders     displacement     horsepower   
##  Min.   :10.00   Min.   :4.00   Min.   : 71.0   Min.   : 60.0  
##  1st Qu.:17.57   1st Qu.:4.00   1st Qu.: 98.0   1st Qu.: 75.0  
##  Median :24.00   Median :4.00   Median :140.0   Median : 91.0  
##  Mean   :23.54   Mean   :5.38   Mean   :185.4   Mean   :102.1  
##  3rd Qu.:29.80   3rd Qu.:6.00   3rd Qu.:250.0   3rd Qu.:115.2  
##  Max.   :38.10   Max.   :8.00   Max.   :429.0   Max.   :215.0  
##      weight      acceleration     model year        origin     
##  Min.   :1613   Min.   :11.00   Min.   :70.00   Min.   :1.000  
##  1st Qu.:2209   1st Qu.:14.00   1st Qu.:73.00   1st Qu.:1.000  
##  Median :2659   Median :15.40   Median :76.00   Median :1.000  
##  Mean   :2912   Mean   :15.48   Mean   :75.82   Mean   :1.609  
##  3rd Qu.:3590   3rd Qu.:16.52   3rd Qu.:79.00   3rd Qu.:2.000  
##  Max.   :4952   Max.   :21.00   Max.   :82.00   Max.   :3.000
  # Train GSOM model (hexagonal grid): columns 2-8 are the predictors, column 1 (mpg) is the target
  gsom_map <- train_xy.gsom(train_set[,2:8], train_set[,1], spreadFactor = 0.9, nhood="hex")
## ..................................................

Output/Predictions

  plot(gsom_map, type = "training")

  plot(gsom_map, type = "predict")

  plot(gsom_map, type = "distance")

  par(mfrow=c(3,3))
  plot(gsom_map, type="property")
  
  # Predict mpg for the test set
  gsom_predictions = predict.gsom(gsom_map, test_set[,2:8])
  
  plot(gsom_predictions, type = "predict")