R&D Visual Analytics

Introduction

The purpose of this R&D project is to develop an innovative analytics visual model that conveys insight to the end user in the tackling of large scale data problems focused on Internet based applications. This R&D project began a couple of years ago based on years of experience in developing custom machine learning algorithms to solve problems in the area of border security and law enforcement as well as in the financial and scientific domains.

Much research has gone into several different lines of investigation namely review of different machine learning algorithms, large scale data problems with a focus on data sparsity and rare/anomaly events, big data infrastructure, and recent developments in visual analytics.

What follows is the application of this R&D to a production grade Internet dataset. Extensive preliminary analysis and analytics has gone into undertstanding the problem space and feature engineering development.

Internet Domain

Search advertising has been one of the major revenue sources of the Internet industry for years. A key technology behind search advertising is to predict the click-through rate (CTR) of ads, as the economic model behind search advertising requires CTR values to rank ads and to price click.

Training instances derived from session logs from the Tencent proprietary search engine, sosa.com was provided to us. The original purpose of this data set was to accurately predict the CTR of ads in the testing instances. However our objective is to use this data set for our R&D project which has the goal of developing innovative visual analytics model that end users can leverage to derive business insight. Our target domain are in the area of Internet based services including Internet of Things.

Data Set

The training data set consists of 150 Million queries records for 23 Million search users, for a total of 10 Gb of data. Multiple queries with the same properties and their outputs were rolled up into one record in the training data. Expanded to individual queries, this produced 235 million training records.

Each query and its output were described by 10 variables as follows:

The IDs above were all hash-mapped to integers. Each of the queryID, keywordID, titleID, and descriptionID was associated with a set of keywords, which were also hash- mapped to integers and provided in four extra data files, to provide detailed description of the query and advertisement, respectively.

If a user could not be identified, 0 was assigned to the userID (30% of data). In addition, The gender (Male=1, Female=2, and Unknown=0) and age ((0,12]=1, (12,18]=2, (18, 24]=3, (24,30]=4, (30,40]=5, and (40+,]=6) of each userID were provided in another data file. Two variables described the response of a user to an advertisement: the number of clicks and the number of impressions. For each user, query and the resulted ad, the number of impressions indicated the times that the ad was displayed to the user. The number of clicks was the times that the user clicked the ad. Consequently, the CTR was the ratio between the numbers of clicks and impressions. The average CTR of the training data was 0.0387.

Data Set Transformation

The data set was reduced as the purpose of the project was not the development of an optimal analytical model but the development of an innovative visual analytics data product with the requirement to run hundreds of simulations quickly.

The data set reduction algorithm was based on least user usage of the search based on the number of search queries issued by a user. The training data set was randomly split between training 70%, validation 15% and test set 15%.

Feature Engineering

15 derived feature tables were created.

Each Derived Feature table has the following general table format:

Machine Learning Algorithms

Over the last 18 months we have evaluated a number of machine learning algorithms based on the type of problem space (clustering, classification, anomaly detection, prediction) and analytics approaches.

Our focus is on ensemble learning which combines several algorithms based on Self Organizing Map (SOM), Particle Swarm Optimization, Neural Network, Random Forest, KMeans, and Social Network Graph Based Analytics (SNA).

This R code focuses on SOM, kMeans, and Neural Network. Other R code will focus on SOM-SNA and on Random Forest.

Codebook

All dataset reduction and transformations are available in a separate extensive codebook file. The PostgreSQL database was used to store all datasets. Almost all data transformations were conducted in the database and the R programming language was used for the analytics and visualization. Information is also exported to Gephi for graphing visualization.

  library(RPostgreSQL)
## Loading required package: DBI
  library(kohonen)
## Loading required package: class
## Loading required package: MASS
  library(neuralnet)
## Loading required package: grid
  library(ROCR)
## Loading required package: gplots
## KernSmooth 2.23 loaded
## Copyright M. P. Wand 1997-2009
## 
## Attaching package: 'gplots'
## 
## The following object is masked from 'package:stats':
## 
##     lowess
## 
## 
## Attaching package: 'ROCR'
## 
## The following object is masked from 'package:neuralnet':
## 
##     prediction
  library(RColorBrewer)
  library(png)
#Function to create the polygon for each hexagon of custom SOM
Hexagon <- function (x, y, unitcell = 1, col = col) {
  polygon(c(x, x, x + unitcell/2, x + unitcell, x + unitcell, 
            x + unitcell/2), c(y + unitcell * 0.125, y + unitcell * 
                               0.875, y + unitcell * 1.125, y + unitcell * 0.875, 
                               y + unitcell * 0.125, y - unitcell * 0.125), 
          col = col, border=NA)
}#function
  coolBlueHotRed <- function(n, alpha = 1) {
    rainbow(n, end=4/6, alpha=alpha)[n:1]
  }

  pretty_palette <- c("#1f77b4","#ff7f0e","#2ca02c",
                      "#d62728","#9467bd","#8c564b","#e377c2")

Loads the database driver and connect to database with credentials. R code is set to echo=FALSE.

  ## Training: Size can vary from 500K to 5.5M 
  advertisingtable <- paste("select * from search_instance LIMIT 750000 OFFSET 0")
  ## Validation
  ##advertisingtable <- paste("select * from search_instance LIMIT 1500000 OFFSET 5500000")

  ## Testing
  ##advertisingtable <- paste("select * from search_instance LIMIT 1500000 OFFSET 7000000")

  rs <- dbSendQuery(con, advertisingtable)

  ## fetch all elements from the result set
  result_set <- fetch(rs,n=-1)

  ## We started we are large number of input features and have reduced based on our analysis.
  data_train <- result_set[,c("position","adid_low","adid_medium","adid_high","adid_age_low","adid_age_medium","adid_age_high","adid_gender_low","adid_gender_medium","adid_gender_high","advertiserid_low","advertiserid_medium","advertiserid_high","advertiserid_age_low","advertiserid_age_medium","advertiserid_age_high","advertiserid_gender_low","advertiserid_gender_medium","advertiserid_gender_high", "male","female","unknowngender","age0to12","age12to18","age18to24","age24to30","age30to40","age40plus","clicked", "notclicked")]

  ## Scaling is not required as this has already been done in the database
  #data_train_matrix <- as.matrix(scale(data_train))

  data_train_matrix <- as.matrix(data_train)

Several hundred simulations were conducted using differenct SOM parameters including custom parameters.

  ## Create the SOM Grid - you generally have to specify the size of the 
  ## training grid prior to training the SOM.
  som_grid <- somgrid(xdim = 7, ydim=7, topo="hexagonal")
  som1total <- 49
  som1row <-7
  som1col <-7

  ## Finally, train the SOM, options for the number of iterations,
  ## the learning rates, and the neighbourhood are available
  set.seed(21121962)
  som_model <- som(subset(data_train_matrix, select=-c(clicked,notclicked)), 
                   grid=som_grid, 
                   rlen=7, 
                   alpha=c(0.05,0.01), 
                   keep.data = TRUE, 
                   n.hood="circular")
  ## Training progress. This shows the variation between the weights of the nodes and 
  ## the cases presented to it. Overtime each individual nodes weight should closely 
  ## match its winning cases. This also shows how many iterations are required before 
  ## the mean distance is minimized. This can be used to determine the optimal size of
  ## of the SOM. If it is too small it may have a hard time to convergence to a minimum.
  plot(som_model, type="changes",main="Training Progress")

plot of chunk SOMTrainingPlot

  ## The SOM allows to visualise the count of how many cases are mapped to each 
  ## node on the map. This metric can be used as a measure of map quality – ideally the 
  ## sample distribution is relatively uniform. Large values in some map areas suggests 
  ## that a larger map would be benificial. If increasing the map does not change this
  ## then it may suggest a large cluster of cases. 
  plot(som_model, type="count", palette.name= coolBlueHotRed,main="Counts Plot")

plot of chunk SOMNodeCountPlot

  ## Often referred to as the “U-Matrix”, this visualisation is of the distance 
  ## between each node and its neighbours. Typically viewed with a grayscale palette, 
  ## areas of low neighbour distance indicate groups of nodes that are similar. 
  ## Areas with large distances indicate the nodes are much more dissimilar – 
  ## and indicate natural boundaries between node clusters. 
  ## The U-Matrix can be used to identify clusters within the SOM map.
  plot(som_model, type="dist.neighbours",palette.name= coolBlueHotRed, main="Neighbour Distance Plot")

plot of chunk SOMUPlot

  ## Shows the mean distance of objects mapped to a unit to the codebook vector of that unit. 
  ## The smaller the distances, the better the objects are represented by the codebook vectors.
  plot(som_model, type="quality",palette.name= coolBlueHotRed, main="Winning Node Inter Distance Plot")

plot of chunk SOMInterPlot

  ## The node weight vectors, or “codes”, are made up of normalised values of the 
  ## original variables used to generate the SOM. Each node’s weight vector is 
  ## representative / similar of the samples mapped to that node. 
  ## By visualising the weight vectors across the map, we can see patterns 
  ## in the distribution of samples and variables. 
  ## The default visualisation of the weight vectors is a “fan diagram”, 
  ## where individual fan representations of the magnitude of each variable 
  ## in the weight vector is shown for each node.
  ##plot(som_model, type="codes",palette.name = rainbow)

The reason that at this point we try and plot the potential clusters using KMeans is that there is little point in doing further detailed analysis if the map is not a quality map. The previous plots may have provided some insight, however this plot will provide final validation.

  ## Look at elbow point which tells your the number of clusters
  mydata <- som_model$codes 
  wss <- (nrow(mydata)-1)*sum(apply(mydata,2,var)) 
  for (i in 2:15) {
    wss[i] <- sum(kmeans(mydata, centers=i)$withinss)
  }
  plot(wss)

plot of chunk SOMNumberofClusterPlot

  ## Plot cluster boundaries based on results found from previous plot(wss)
  ## use hierarchical clustering to cluster the codebook vectors
  som_cluster <- cutree(hclust(dist(som_model$codes)), 2)
  # plot these results:
  plot(som_model, type="mapping", bgcol = pretty_palette[som_cluster], main = "Clusters") 
  add.cluster.boundaries(som_model, som_cluster)

plot of chunk SOMClustersPlot12

  ## Plot cluster boundaries based on results found from previous plot(wss)
  ## use hierarchical clustering to cluster the codebook vectors
  som_cluster <- cutree(hclust(dist(som_model$codes)), 3)
  ## plot these results:
  plot(som_model, type="mapping", bgcol = pretty_palette[som_cluster], main = "Clusters") 
  add.cluster.boundaries(som_model, som_cluster)

plot of chunk SOMClustersPlot13

  ## Plot cluster boundaries based on results found from previous plot(wss)
  ## use hierarchical clustering to cluster the codebook vectors
  som_cluster <- cutree(hclust(dist(som_model$codes)), 4)
  ## plot these results:
  plot(som_model, type="mapping", bgcol = pretty_palette[som_cluster], main = "Clusters") 
  add.cluster.boundaries(som_model, som_cluster)

plot of chunk SOMClustersPlot14

  ## Heatmaps are perhaps the most important visualisation possible for Self-Organising Maps.
  ## The use of a weight space view as in that tries to view all dimensions on the 
  ## one diagram is unsuitable for a high-dimensional (>7 variable) SOM. 
  ## A SOM heatmap allows the visualisation of the distribution of a single variable 
  ## across the map. Typically, a SOM investigative process involves the creation of 
  ## multiple heatmaps, and then the comparison of these heatmaps to identify 
  ## interesting areas on the map. 
  ## It is important to remember that the individual sample positions do not move 
  ## from one visualisation to another, the map is simply coloured by different variables.
  ## The default Kohonen heatmap is created by using the type “heatmap”, and then providing
  ## one of the variables from the set of node weights. 
plot(som_model, type = "property", property = som_model$codes[,1], palette.name=coolBlueHotRed,main=names(data_train)[1])

plot of chunk SOMHeatMapPlot10

plot(som_model, type = "property", property = som_model$codes[,2], palette.name=coolBlueHotRed,main=names(data_train)[2])

plot of chunk SOMHeatMapPlot10

 plot(som_model, type = "property", property = som_model$codes[,3], palette.name=coolBlueHotRed,main=names(data_train)[3])

plot of chunk SOMHeatMapPlot10

plot(som_model, type = "property", property = som_model$codes[,4], palette.name=coolBlueHotRed,main=names(data_train)[4])

plot of chunk SOMHeatMapPlot10

plot(som_model, type = "property", property = som_model$codes[,5], palette.name=coolBlueHotRed,main=names(data_train)[5])

plot of chunk SOMHeatMapPlot10

plot(som_model, type = "property", property = som_model$codes[,6], palette.name=coolBlueHotRed,main=names(data_train)[6])

plot of chunk SOMHeatMapPlot10

plot(som_model, type = "property", property = som_model$codes[,7], palette.name=coolBlueHotRed,main=names(data_train)[7])

plot of chunk SOMHeatMapPlot10

plot(som_model, type = "property", property = som_model$codes[,8], palette.name=coolBlueHotRed,main=names(data_train)[8])

plot of chunk SOMHeatMapPlot10

plot(som_model, type = "property", property = som_model$codes[,9], palette.name=coolBlueHotRed,main=names(data_train)[9])

plot of chunk SOMHeatMapPlot10

plot(som_model, type = "property", property = som_model$codes[,10], palette.name=coolBlueHotRed,main=names(data_train)[10])

plot of chunk SOMHeatMapPlot10

plot(som_model, type = "property", property = som_model$codes[,11], palette.name=coolBlueHotRed,main=names(data_train)[11])

plot of chunk SOMHeatMapPlot11

plot(som_model, type = "property", property = som_model$codes[,12], palette.name=coolBlueHotRed,main=names(data_train)[12])

plot of chunk SOMHeatMapPlot11

plot(som_model, type = "property", property = som_model$codes[,13], palette.name=coolBlueHotRed,main=names(data_train)[13])

plot of chunk SOMHeatMapPlot11

plot(som_model, type = "property", property = som_model$codes[,14], palette.name=coolBlueHotRed,main=names(data_train)[14])

plot of chunk SOMHeatMapPlot11

plot(som_model, type = "property", property = som_model$codes[,15], palette.name=coolBlueHotRed,main=names(data_train)[15])

plot of chunk SOMHeatMapPlot11

plot(som_model, type = "property", property = som_model$codes[,16], palette.name=coolBlueHotRed,main=names(data_train)[16])

plot of chunk SOMHeatMapPlot11

 plot(som_model, type = "property", property = som_model$codes[,17], palette.name=coolBlueHotRed,main=names(data_train)[17])

plot of chunk SOMHeatMapPlot11

plot(som_model, type = "property", property = som_model$codes[,18], palette.name=coolBlueHotRed,main=names(data_train)[18])

plot of chunk SOMHeatMapPlot11

plot(som_model, type = "property", property = som_model$codes[,19], palette.name=coolBlueHotRed,main=names(data_train)[19])

plot of chunk SOMHeatMapPlot11

plot(som_model, type = "property", property = som_model$codes[,20], palette.name=coolBlueHotRed,main=names(data_train)[20])

plot of chunk SOMHeatMapPlot11

plot(som_model, type = "property", property = som_model$codes[,21], palette.name=coolBlueHotRed,main=names(data_train)[21])

plot of chunk SOMHeatMapPlot12

plot(som_model, type = "property", property = som_model$codes[,22], palette.name=coolBlueHotRed,main=names(data_train)[22])

plot of chunk SOMHeatMapPlot12

plot(som_model, type = "property", property = som_model$codes[,21], palette.name=coolBlueHotRed,main=names(data_train)[23])

plot of chunk SOMHeatMapPlot12

plot(som_model, type = "property", property = som_model$codes[,22], palette.name=coolBlueHotRed,main=names(data_train)[24])

plot of chunk SOMHeatMapPlot12

plot(som_model, type = "property", property = som_model$codes[,21], palette.name=coolBlueHotRed,main=names(data_train)[25])

plot of chunk SOMHeatMapPlot12

plot(som_model, type = "property", property = som_model$codes[,22], palette.name=coolBlueHotRed,main=names(data_train)[26])

plot of chunk SOMHeatMapPlot12

plot(som_model, type = "property", property = som_model$codes[,21], palette.name=coolBlueHotRed,main=names(data_train)[27])

plot of chunk SOMHeatMapPlot12

plot(som_model, type = "property", property = som_model$codes[,22], palette.name=coolBlueHotRed,main=names(data_train)[28])

plot of chunk SOMHeatMapPlot12

Neural Network and Neural Network + SOM Prediction

We have trained our SOM with the desired features and applied KMeans for our clustering. We will use two models namely one model which uses a Feed Forward Neural Network only and one model which uses a combination of the trained SOM + Feed Forward Neural Network.

Both of these were tested extensively and did not perform well due to the low click rate and the high sparsity of the data. This was to be expected given that this problem falls into the rare event/anomaly detection.

neuralnetmodel <- neuralnet(clicked + notclicked ~ male + female + unknowngender + age0to12 +
age12to18 + age18to24 + age24to30 + age30to40 + age40plus,  data_train, hidden = 7, rep=1, lifesign = "minimal", threshold = 100.0,act.fct="tanh")
## hidden: 7    thresh: 100    rep: 1/1    steps:      90   error: 8926.13212   time: 57.84 secs
plot(neuralnetmodel)

plot of chunk neuralplot1

## Mapping of training dataset onto the SOM; meaning this gives the winning node and its
## distance to the winning mode. We used both as input to the Neural Network. 
mapping <- map(som_model,data_train_matrix)

neuralnet_input <- as.data.frame(cbind(mapping$unit.classif, mapping$distances,data_train$clicked,data_train$notclicked))
colnames(neuralnet_input) <- c("winning", "distance", "clicked", "notclicked")

set.seed(23999)
neuralnetmodel <- neuralnet(clicked + notclicked~ winning + distance, neuralnet_input, hidden = 7, rep=1, lifesign = "full", threshold = 100.0,act.fct="tanh")
## hidden: 7    thresh: 100    rep: 1/1    steps:     111   error: 8944.53974   time: 59.51 secs
plot(neuralnetmodel)

## Testing the Neural Network 
data_test <- subset(neuralnet_input, select = c("winning","distance"))
neuralnetwork_results <- compute(neuralnetmodel, data_test)

neuralnetwork_accuracy <- data.frame(data_train$click,data_train$notclicked,round(neuralnetwork_results$net.result))
sum(neuralnetwork_accuracy$data_train.clicked)
## [1] 0
sum(neuralnetwork_accuracy$data_train.X2)
## [1] 0

plot of chunk neuralplot2

Rare Event/Anomaly Dectection using SOM + Genetic Algorithm + SNA

We will use several innovative features that we recently developed. SOM Zooming works as follows: -SOM is trained on all input features except clicked/notclicked -Various SOM plots are analyzed to ensure we have a quality map -SOM map is shown with highlighted clicked areas -User selects area of interest and all cases associated with this area are selected as well as only a subset of variables that contributed most to those cases. -A new SOM map is developed on this subset. -This process continues until the end user is satisfied that the clicked cases are easily discernable and predictable also resulting in selecting the key variables. -Subsequent to this the validation and testing set are submitted to all maps and each data point is plotted on each map. -Performance metrics are established on training, validation, and testing set.

## Mapping of training dataset of clicked observations onto the SOM. We want to see where on the SOM the clicked observations are with the idea that we will subset those training cases around those nodes as well as select only the high contributing variables to train a second and subsequent SOM.

## Select observations that have been clicked
data_train_clicked <- subset(data_train, data_train$clicked == 1,select=-c(clicked,notclicked)) 
data_train_matrix <- as.matrix(data_train_clicked)

## See where observation maps onto the SOM
trainingmappingclicked <- map(som_model, data_train_matrix)

## Collection of unique winning SOM nodes that we will use latter 
## to select non-clicked observations. 
winingsom1 <- unique(trainingmappingclicked$unit.classif) 

## See where observation maps onto the SOM 
data_train_matrix <- as.matrix(data_train)
trainingmapping <- map(som_model,subset(data_train_matrix, select=-c(clicked,notclicked)))

## Add new column of winning SOM, distance from winning node and then subselect and merge with data_train
## ROC data structure 
data_train$winningsom <- trainingmapping$unit.classif
data_train$distance <- trainingmapping$distances
data_roc1 <- subset(data_train, !(data_train$winningsom %in% winingsom1))
data_roc1 <- cbind(predictedclicked= 0, data_roc1)
data_roc1 <- subset(data_roc1, select=c(clicked,predictedclicked))
data_train <- subset(data_train, data_train$winningsom %in% winingsom1)

data_train_matrix <- as.matrix(data_train)
  ## Create the SOM Grid - you generally have to specify the size of the 
  ## training grid prior to training the SOM. Hexagonal and Circular 
  ## topologies are possible. 
  som_grid2 <- somgrid(xdim = 20, ydim=20, topo="hexagonal")
  som2total <- 400
  som2row <-20
  som2col <-20

  ## Finally, train the SOM, options for the number of iterations,
  ## the learning rates, and the neighbourhood are available
  set.seed(16564646)
  som_model2 <- som(subset(data_train_matrix, select=-c(clicked,notclicked,winningsom,distance)), 
                   grid=som_grid2, 
                   rlen=7, 
                   alpha=c(0.05,0.01), 
                   keep.data = TRUE, 
                   n.hood="circular")
  ## Training progress. This shows the variation between the weights of the nodes and 
  ## the cases presented to it. Overtime each individual nodes weight should closely 
  ## match its winning cases. This also shows how many iterations are required before 
  ## the mean distance is minimized. This can be used to determine the optimal size of
  ## of the SOM. If it is too small it may have a hard time to convergence to a minimum.
  plot(som_model2, type="changes",main="Training Progress")

plot of chunk SOMTrainingPlot2

  ## The SOM allows to visualise the count of how many cases are mapped to each 
  ## node on the map. This metric can be used as a measure of map quality – ideally the 
  ## sample distribution is relatively uniform. Large values in some map areas suggests 
  ## that a larger map would be benificial. If increasing the map does not change this
  ## then it may suggest a large cluster of cases. 
  plot(som_model2, type="count", palette.name= coolBlueHotRed,main="Counts Plot")

plot of chunk SOMNodeCountPlot2

  ## Often referred to as the “U-Matrix”, this visualisation is of the distance 
  ## between each node and its neighbours. Typically viewed with a grayscale palette, 
  ## areas of low neighbour distance indicate groups of nodes that are similar. 
  ## Areas with large distances indicate the nodes are much more dissimilar – 
  ## and indicate natural boundaries between node clusters. 
  ## The U-Matrix can be used to identify clusters within the SOM map.
  plot(som_model2, type="dist.neighbours",palette.name= coolBlueHotRed, main="Neighbour Distance Plot")

plot of chunk SOMUPlot2

  ## Shows the mean distance of objects mapped to a unit to the codebook vector of that unit. 
  ## The smaller the distances, the better the objects are represented by the codebook vectors.
  plot(som_model2, type="quality",palette.name= coolBlueHotRed, main="Winning Node Inter Distance Plot")

plot of chunk SOMInterPlot2

  ## The node weight vectors, or “codes”, are made up of normalised values of the 
  ## original variables used to generate the SOM. Each node’s weight vector is 
  ## representative / similar of the samples mapped to that node. 
  ## By visualising the weight vectors across the map, we can see patterns 
  ## in the distribution of samples and variables. 
  ## The default visualisation of the weight vectors is a “fan diagram”, 
  ## where individual fan representations of the magnitude of each variable 
  ## in the weight vector is shown for each node.
  ##plot(som_model2, type="codes",palette.name = rainbow)

The reason that at this point we try and plot the potential clusters using KMeans is that there is little point in doing further detailed analysis if the map is not a quality map. The previous plots may have provided some insight, however this plot will provide final validation.

  ## Look at elbow point which tells your the number of clusters
  mydata <- som_model2$codes 
  wss <- (nrow(mydata)-1)*sum(apply(mydata,2,var)) 
  for (i in 2:15) {
    wss[i] <- sum(kmeans(mydata, centers=i)$withinss)
  }
  plot(wss)

plot of chunk SOMNumberofClusterPlot2

  ## Plot cluster boundaries based on results found from previous plot(wss)
  ## use hierarchical clustering to cluster the codebook vectors
  som_cluster <- cutree(hclust(dist(som_model2$codes)), 2)
  # plot these results:
  plot(som_model2, type="mapping", bgcol = pretty_palette[som_cluster], main = "Clusters") 
  add.cluster.boundaries(som_model2, som_cluster)

plot of chunk SOMClustersPlot22

  ## Plot cluster boundaries based on results found from previous plot(wss)
  ## use hierarchical clustering to cluster the codebook vectors
  som_cluster <- cutree(hclust(dist(som_model2$codes)), 3)
  # plot these results:
  plot(som_model2, type="mapping", bgcol = pretty_palette[som_cluster], main = "Clusters") 
  add.cluster.boundaries(som_model2, som_cluster)

plot of chunk SOMClustersPlot23

  ## Plot cluster boundaries based on results found from previous plot(wss)
  ## use hierarchical clustering to cluster the codebook vectors
  som_cluster <- cutree(hclust(dist(som_model2$codes)), 4)
  # plot these results:
  plot(som_model2, type="mapping", bgcol = pretty_palette[som_cluster], main = "Clusters") 
  add.cluster.boundaries(som_model2, som_cluster)

plot of chunk SOMClustersPlot24

  ## Heatmaps are perhaps the most important visualisation possible for Self-Organising Maps.
  ## The use of a weight space view as in that tries to view all dimensions on the 
  ## one diagram is unsuitable for a high-dimensional (>7 variable) SOM. 
  ## A SOM heatmap allows the visualisation of the distribution of a single variable 
  ## across the map. Typically, a SOM investigative process involves the creation of 
  ## multiple heatmaps, and then the comparison of these heatmaps to identify 
  ## interesting areas on the map. 
  ## It is important to remember that the individual sample positions do not move 
  ## from one visualisation to another, the map is simply coloured by different variables.
  ## The default Kohonen heatmap is created by using the type “heatmap”, and then providing
  ## one of the variables from the set of node weights. 
plot(som_model2, type = "property", property = som_model2$codes[,1], palette.name=coolBlueHotRed,main=names(data_train)[1])

plot of chunk SOMHeatMapPlot20

plot(som_model2, type = "property", property = som_model2$codes[,2], palette.name=coolBlueHotRed,main=names(data_train)[2])
## Error: 'breaks' are not unique

plot of chunk SOMHeatMapPlot20

 plot(som_model2, type = "property", property = som_model2$codes[,3], palette.name=coolBlueHotRed,main=names(data_train)[3])

plot of chunk SOMHeatMapPlot20

plot(som_model2, type = "property", property = som_model2$codes[,4], palette.name=coolBlueHotRed,main=names(data_train)[4])

plot of chunk SOMHeatMapPlot20

plot(som_model2, type = "property", property = som_model2$codes[,5], palette.name=coolBlueHotRed,main=names(data_train)[5])

plot of chunk SOMHeatMapPlot20

plot(som_model2, type = "property", property = som_model2$codes[,6], palette.name=coolBlueHotRed,main=names(data_train)[6])

plot of chunk SOMHeatMapPlot20

plot(som_model2, type = "property", property = som_model2$codes[,7], palette.name=coolBlueHotRed,main=names(data_train)[7])

plot of chunk SOMHeatMapPlot20

plot(som_model2, type = "property", property = som_model2$codes[,8], palette.name=coolBlueHotRed,main=names(data_train)[8])

plot of chunk SOMHeatMapPlot20

plot(som_model2, type = "property", property = som_model2$codes[,9], palette.name=coolBlueHotRed,main=names(data_train)[9])

plot of chunk SOMHeatMapPlot20

plot(som_model2, type = "property", property = som_model2$codes[,10], palette.name=coolBlueHotRed,main=names(data_train)[10])

plot of chunk SOMHeatMapPlot20

plot(som_model2, type = "property", property = som_model2$codes[,11], palette.name=coolBlueHotRed,main=names(data_train)[11])
## Error: 'breaks' are not unique

plot of chunk SOMHeatMapPlot21

plot(som_model2, type = "property", property = som_model2$codes[,12], palette.name=coolBlueHotRed,main=names(data_train)[12])

plot of chunk SOMHeatMapPlot21

plot(som_model2, type = "property", property = som_model2$codes[,13], palette.name=coolBlueHotRed,main=names(data_train)[13])

plot of chunk SOMHeatMapPlot21

plot(som_model2, type = "property", property = som_model2$codes[,14], palette.name=coolBlueHotRed,main=names(data_train)[14])

plot of chunk SOMHeatMapPlot21

plot(som_model2, type = "property", property = som_model2$codes[,15], palette.name=coolBlueHotRed,main=names(data_train)[15])

plot of chunk SOMHeatMapPlot21

plot(som_model2, type = "property", property = som_model2$codes[,16], palette.name=coolBlueHotRed,main=names(data_train)[16])

plot of chunk SOMHeatMapPlot21

plot(som_model2, type = "property", property = som_model2$codes[,17], palette.name=coolBlueHotRed,main=names(data_train)[17])

plot of chunk SOMHeatMapPlot21

plot(som_model2, type = "property", property = som_model2$codes[,18], palette.name=coolBlueHotRed,main=names(data_train)[18])

plot of chunk SOMHeatMapPlot21

plot(som_model2, type = "property", property = som_model2$codes[,19], palette.name=coolBlueHotRed,main=names(data_train)[19])

plot of chunk SOMHeatMapPlot21

plot(som_model2, type = "property", property = som_model2$codes[,20], palette.name=coolBlueHotRed,main=names(data_train)[20])

plot of chunk SOMHeatMapPlot21

plot(som_model2, type = "property", property = som_model2$codes[,21], palette.name=coolBlueHotRed,main=names(data_train)[21])

plot of chunk SOMHeatMapPlot22

plot(som_model2, type = "property", property = som_model2$codes[,22], palette.name=coolBlueHotRed,main=names(data_train)[22])

plot of chunk SOMHeatMapPlot22

plot(som_model2, type = "property", property = som_model2$codes[,21], palette.name=coolBlueHotRed,main=names(data_train)[23])

plot of chunk SOMHeatMapPlot22

plot(som_model2, type = "property", property = som_model2$codes[,22], palette.name=coolBlueHotRed,main=names(data_train)[24])

plot of chunk SOMHeatMapPlot22

plot(som_model2, type = "property", property = som_model2$codes[,21], palette.name=coolBlueHotRed,main=names(data_train)[25])

plot of chunk SOMHeatMapPlot22

plot(som_model2, type = "property", property = som_model2$codes[,22], palette.name=coolBlueHotRed,main=names(data_train)[26])

plot of chunk SOMHeatMapPlot22

plot(som_model2, type = "property", property = som_model2$codes[,21], palette.name=coolBlueHotRed,main=names(data_train)[27])

plot of chunk SOMHeatMapPlot22

plot(som_model2, type = "property", property = som_model2$codes[,22], palette.name=coolBlueHotRed,main=names(data_train)[28])

plot of chunk SOMHeatMapPlot22

## Select observations that have been clicked
data_train_clicked <- subset(data_train, data_train$clicked == 1,select=-c(clicked,notclicked,winningsom,distance))

data_train_matrix <- as.matrix(data_train_clicked)

## See where observation maps onto the SOM
trainingmappingclicked <- map(som_model2, data_train_matrix)

## To be used later to plot clicked map
trainingsom2clicked <- trainingmappingclicked

## To be used later to plot non-clicked map
data_train_notclicked <- subset(data_train, data_train$notclicked == 1,select=-c(clicked,notclicked,winningsom,distance))
trainingsom2notclicked <- map(som_model2, as.matrix(data_train_notclicked))


## Collection of unique winning SOM nodes that we will use latter 
## to select non-clicked observations. 
winingsom2 <- unique(trainingmappingclicked$unit.classif) 

## See where observation maps onto the SOM 
data_train_matrix <- as.matrix(data_train)
trainingmapping <- map(som_model2,subset(data_train_matrix, select=-c(clicked,notclicked,winningsom,distance)))

## Subselect and merge with data_train
data_train$winningsom <- trainingmapping$unit.classif
data_train$distance <- trainingmapping$distances

## ROC data structure 
data_roc2 <- subset(data_train, !(data_train$winningsom %in% winingsom2))
data_roc2 <-cbind(predictedclicked= 0, data_roc2)
data_roc2 <- subset(data_roc2, select=c(clicked,predictedclicked))

data_train <- subset(data_train, data_train$winningsom %in% winingsom2) 
data_train_matrix <- as.matrix(data_train)
  # Create the SOM Grid - you generally have to specify the size of the 
  # training grid prior to training the SOM. Hexagonal and Circular 
  # topologies are possible. 
  som_grid3 <- somgrid(xdim = 5, ydim=5, topo="hexagonal")
  som3total <- 25
  som3row <-5
  som3col <-5

  # Finally, train the SOM, options for the number of iterations,
  # the learning rates, and the neighbourhood are available
  set.seed(1334435)
  som_model3 <- som(subset(data_train_matrix, select=-c(clicked,notclicked,winningsom,distance)), 
                   grid=som_grid3, 
                   rlen=7, 
                   alpha=c(0.05,0.01), 
                   keep.data = TRUE, 
                   n.hood="circular")
  ## Training progress. This shows the variation between the weights of the nodes and 
  ## the cases presented to it. Overtime each individual nodes weight should closely 
  ## match its winning cases. This also shows how many iterations are required before 
  ## the mean distance is minimized. This can be used to determine the optimal size of
  ## of the SOM. If it is too small it may have a hard time to convergence to a minimum.
  plot(som_model3, type="changes",main="Training Progress")

plot of chunk SOMTrainingPlot3

  ## The SOM allows to visualise the count of how many cases are mapped to each 
  ## node on the map. This metric can be used as a measure of map quality – ideally the 
  ## sample distribution is relatively uniform. Large values in some map areas suggests 
  ## that a larger map would be benificial. If increasing the map does not change this
  ## then it may suggest a large cluster of cases. 
  plot(som_model3, type="count", palette.name= coolBlueHotRed,main="Counts Plot")

plot of chunk SOMNodeCountPlot3

  ## Often referred to as the “U-Matrix”, this visualisation is of the distance 
  ## between each node and its neighbours. Typically viewed with a grayscale palette, 
  ## areas of low neighbour distance indicate groups of nodes that are similar. 
  ## Areas with large distances indicate the nodes are much more dissimilar – 
  ## and indicate natural boundaries between node clusters. 
  ## The U-Matrix can be used to identify clusters within the SOM map.
  plot(som_model3, type="dist.neighbours",palette.name= coolBlueHotRed, main="Neighbour Distance Plot")

plot of chunk SOMUPlot3

  ## Shows the mean distance of objects mapped to a unit to the codebook vector of that unit. 
  ## The smaller the distances, the better the objects are represented by the codebook vectors.
  plot(som_model3, type="quality",palette.name= coolBlueHotRed, main="Winning Node Inter Distance Plot")

plot of chunk SOMInterPlot3

  ## The node weight vectors, or “codes”, are made up of normalised values of the 
  ## original variables used to generate the SOM. Each node’s weight vector is 
  ## representative / similar of the samples mapped to that node. 
  ## By visualising the weight vectors across the map, we can see patterns 
  ## in the distribution of samples and variables. 
  ## The default visualisation of the weight vectors is a “fan diagram”, 
  ## where individual fan representations of the magnitude of each variable 
  ## in the weight vector is shown for each node.
  ##plot(som_model3, type="codes",palette.name = rainbow)

The reason that at this point we try and plot the potential clusters using KMeans is that there is little point in doing further detailed analysis if the map is not a quality map. The previous plots may have provided some insight, however this plot will provide final validation.

  ## Look at elbow point which tells your the number of clusters
  mydata <- som_model3$codes 
  wss <- (nrow(mydata)-1)*sum(apply(mydata,2,var)) 
  for (i in 2:15) {
    wss[i] <- sum(kmeans(mydata, centers=i)$withinss)
  }
  plot(wss)

plot of chunk SOMNumberofClusterPlot3

  ## Plot cluster boundaries based on results found from previous plot(wss)
  ## use hierarchical clustering to cluster the codebook vectors
  som_cluster <- cutree(hclust(dist(som_model3$codes)), 2)
  # plot these results:
  plot(som_model3, type="mapping", bgcol = pretty_palette[som_cluster], main = "Clusters") 
  add.cluster.boundaries(som_model3, som_cluster)

plot of chunk SOMClustersPlot32

  ## Plot cluster boundaries based on results found from previous plot(wss)
  ## use hierarchical clustering to cluster the codebook vectors
  som_cluster <- cutree(hclust(dist(som_model3$codes)), 3)
  # plot these results:
  plot(som_model3, type="mapping", bgcol = pretty_palette[som_cluster], main = "Clusters") 
  add.cluster.boundaries(som_model3, som_cluster)

plot of chunk SOMClustersPlot33

  ## Plot cluster boundaries based on results found from previous plot(wss)
  ## use hierarchical clustering to cluster the codebook vectors
  som_cluster <- cutree(hclust(dist(som_model3$codes)), 4)
  # plot these results:
  plot(som_model3, type="mapping", bgcol = pretty_palette[som_cluster], main = "Clusters") 
  add.cluster.boundaries(som_model3, som_cluster)

plot of chunk SOMClustersPlot34

  ## Heatmaps are perhaps the most important visualisation possible for Self-Organising Maps.
  ## The use of a weight space view as in that tries to view all dimensions on the 
  ## one diagram is unsuitable for a high-dimensional (>7 variable) SOM. 
  ## A SOM heatmap allows the visualisation of the distribution of a single variable 
  ## across the map. Typically, a SOM investigative process involves the creation of 
  ## multiple heatmaps, and then the comparison of these heatmaps to identify 
  ## interesting areas on the map. 
  ## It is important to remember that the individual sample positions do not move 
  ## from one visualisation to another, the map is simply coloured by different variables.
  ## The default Kohonen heatmap is created by using the type “heatmap”, and then providing
  ## one of the variables from the set of node weights. 
plot(som_model3, type = "property", property = som_model3$codes[,1], palette.name=coolBlueHotRed,main=names(data_train)[1])

plot of chunk SOMHeatMapPlot30

plot(som_model3, type = "property", property = som_model3$codes[,2], palette.name=coolBlueHotRed,main=names(data_train)[2])
## Error: 'breaks' are not unique

plot of chunk SOMHeatMapPlot30

 plot(som_model3, type = "property", property = som_model3$codes[,3], palette.name=coolBlueHotRed,main=names(data_train)[3])

plot of chunk SOMHeatMapPlot30

plot(som_model3, type = "property", property = som_model3$codes[,4], palette.name=coolBlueHotRed,main=names(data_train)[4])

plot of chunk SOMHeatMapPlot30

plot(som_model3, type = "property", property = som_model3$codes[,5], palette.name=coolBlueHotRed,main=names(data_train)[5])

plot of chunk SOMHeatMapPlot30

plot(som_model3, type = "property", property = som_model3$codes[,6], palette.name=coolBlueHotRed,main=names(data_train)[6])

plot of chunk SOMHeatMapPlot30

plot(som_model3, type = "property", property = som_model3$codes[,7], palette.name=coolBlueHotRed,main=names(data_train)[7])

plot of chunk SOMHeatMapPlot30

plot(som_model3, type = "property", property = som_model3$codes[,8], palette.name=coolBlueHotRed,main=names(data_train)[8])

plot of chunk SOMHeatMapPlot30

plot(som_model3, type = "property", property = som_model3$codes[,9], palette.name=coolBlueHotRed,main=names(data_train)[9])

plot of chunk SOMHeatMapPlot30

plot(som_model3, type = "property", property = som_model3$codes[,10], palette.name=coolBlueHotRed,main=names(data_train)[10])

plot of chunk SOMHeatMapPlot30

plot(som_model3, type = "property", property = som_model3$codes[,11], palette.name=coolBlueHotRed,main=names(data_train)[11])
## Error: 'breaks' are not unique

plot of chunk SOMHeatMapPlot31

plot(som_model3, type = "property", property = som_model3$codes[,12], palette.name=coolBlueHotRed,main=names(data_train)[12])

plot of chunk SOMHeatMapPlot31

plot(som_model3, type = "property", property = som_model3$codes[,13], palette.name=coolBlueHotRed,main=names(data_train)[13])

plot of chunk SOMHeatMapPlot31

plot(som_model3, type = "property", property = som_model3$codes[,14], palette.name=coolBlueHotRed,main=names(data_train)[14])
## Error: 'breaks' are not unique

plot of chunk SOMHeatMapPlot31

plot(som_model3, type = "property", property = som_model3$codes[,15], palette.name=coolBlueHotRed,main=names(data_train)[15])

plot of chunk SOMHeatMapPlot31

plot(som_model3, type = "property", property = som_model3$codes[,16], palette.name=coolBlueHotRed,main=names(data_train)[16])

plot of chunk SOMHeatMapPlot31

 plot(som_model3, type = "property", property = som_model3$codes[,17], palette.name=coolBlueHotRed,main=names(data_train)[17])

plot of chunk SOMHeatMapPlot31

plot(som_model3, type = "property", property = som_model3$codes[,18], palette.name=coolBlueHotRed,main=names(data_train)[18])

plot of chunk SOMHeatMapPlot31

  plot(som_model3, type = "property", property = som_model3$codes[,19], palette.name=coolBlueHotRed,main=names(data_train)[19])

plot of chunk SOMHeatMapPlot31

plot(som_model3, type = "property", property = som_model3$codes[,20], palette.name=coolBlueHotRed,main=names(data_train)[20])

plot of chunk SOMHeatMapPlot31

plot(som_model3, type = "property", property = som_model3$codes[,21], palette.name=coolBlueHotRed,main=names(data_train)[21])

plot of chunk SOMHeatMapPlot32

plot(som_model3, type = "property", property = som_model3$codes[,22], palette.name=coolBlueHotRed,main=names(data_train)[22])

plot of chunk SOMHeatMapPlot32

plot(som_model3, type = "property", property = som_model3$codes[,21], palette.name=coolBlueHotRed,main=names(data_train)[23])

plot of chunk SOMHeatMapPlot32

plot(som_model3, type = "property", property = som_model3$codes[,22], palette.name=coolBlueHotRed,main=names(data_train)[24])

plot of chunk SOMHeatMapPlot32

plot(som_model3, type = "property", property = som_model3$codes[,21], palette.name=coolBlueHotRed,main=names(data_train)[25])

plot of chunk SOMHeatMapPlot32

plot(som_model3, type = "property", property = som_model3$codes[,22], palette.name=coolBlueHotRed,main=names(data_train)[26])

plot of chunk SOMHeatMapPlot32

plot(som_model3, type = "property", property = som_model3$codes[,21], palette.name=coolBlueHotRed,main=names(data_train)[27])

plot of chunk SOMHeatMapPlot32

plot(som_model3, type = "property", property = som_model3$codes[,22], palette.name=coolBlueHotRed,main=names(data_train)[28])

plot of chunk SOMHeatMapPlot32

## Select observations that have been clicked
data_train_clicked <- subset(data_train, data_train$clicked == 1,select=-c(clicked,notclicked,winningsom,distance)) 
data_train_matrix <- as.matrix(data_train_clicked)

## See where observation maps onto the SOM
trainingmappingclicked <- map(som_model3, data_train_matrix)

## To be used later to plot clicked map
trainingsom3clicked <- trainingmappingclicked

## Collection of unique winning SOM nodes that we will use latter 
## to select non-clicked observations. 
winingsom3 <- unique(trainingmappingclicked$unit.classif) 

data_roc3 <- subset(data_train, !(data_train$winningsom %in% winingsom3))
data_roc3 <- cbind(predictedclicked= 0, data_roc3)
data_roc3 <- subset(data_roc3, select=c(clicked,predictedclicked))

## Basically for now what ever is left we assign the SOM node as clicked.
data_roc4 <- subset(data_train,(data_train$winningsom %in% winingsom3))
data_roc4 <-cbind(predictedclicked= 1, data_roc4)
data_roc4 <- subset(data_roc4, select=c(clicked,predictedclicked))

## Basic frequency table and barplot sorted descending
sorted_table <-table(trainingmappingclicked$unit.classif)
barplot(sorted_table [order(sorted_table,decreasing=TRUE)])

plot of chunk final SOM Analysis

sorted_table [order(sorted_table,decreasing=TRUE)]
## 
##    5   25   20   10   24   15    4   14    2   21    1   22   23    6    3 
## 2460 1315 1137  486  459  410  300  287  262  253  202  177  163  162  148 
##    7   16   13    8   17   19   18    9   12   11 
##  126  124   90   89   83   83   78   69   51   23
## Select observations that have been not clicked
data_train_notclicked <- subset(data_train, data_train$notclicked == 1, select=-c(clicked,notclicked,winningsom,distance)) 
data_train_matrix<- as.matrix(data_train_notclicked)

## See where observation maps onto the SOM 
## Also re-use to plot non-clicked map 
trainingmappingnotclicked <- map(som_model3,data_train_matrix)

## Basic frequency table and barplot sorted descending
sorted_table <-table(trainingmappingnotclicked$unit.classif)
barplot(sorted_table [order(sorted_table,decreasing=TRUE)])

plot of chunk final SOM Analysis

sorted_table [order(sorted_table,decreasing=TRUE)]
## 
##    21     5    23     2     1    22    16    20     6     7    24    25 
## 29269 27819 17875 17539 17449 17301 14759 14162 13164 12908 12344 11627 
##    15    18    17     4    14    10     8    13     3    19    12    11 
## 10141  9946  8798  8764  6590  5688  5674  5450  5075  4974  4717  4529 
##     9 
##  4315
  ## Validation
  advertisingtable <- paste("select * from search_instance LIMIT 1000000 OFFSET 5000000")

  rs <- dbSendQuery(con, advertisingtable)

  ## fetch all elements from the result set
  result_set <- fetch(rs,n=-1)

  data_validate <- result_set[,c("position","adid_low","adid_medium","adid_high","adid_age_low","adid_age_medium","adid_age_high","adid_gender_low","adid_gender_medium","adid_gender_high","advertiserid_low","advertiserid_medium","advertiserid_high","advertiserid_age_low","advertiserid_age_medium","advertiserid_age_high","advertiserid_gender_low","advertiserid_gender_medium","advertiserid_gender_high", "male","female","unknowngender","age0to12","age12to18","age18to24","age24to30","age30to40","age40plus","clicked", "notclicked")]

## Select observations that have been clicked
data_validate_clicked <- subset(data_validate, data_validate$clicked == 1,select=-c(clicked,notclicked))

data_validate_matrix <- as.matrix(data_validate_clicked)

## See where observation maps onto the SOM 1, 2 and 3
validatemappingclickedsom1 <- map(som_model, data_validate_matrix)
validatemappingclickedsom2 <- map(som_model2, data_validate_matrix)
validatemappingclickedsom3 <- map(som_model3, data_validate_matrix)

## To be used later to plot non-clicked map SOM 1, 2, and 3
data_validate_notclicked <- subset(data_validate, data_validate$notclicked == 1,select=-c(clicked,notclicked))
validatemappingnotclickedsom1 <- map(som_model, as.matrix(data_validate_notclicked))
validatemappingnotclickedsom2 <- map(som_model2, as.matrix(data_validate_notclicked))
validatemappingnotclickedsom3 <- map(som_model3, as.matrix(data_validate_notclicked))

Visual Part I

Show where clicked and not-clicked aggregate on the SOM for the second and third SOM

library(fields) #to use designer.colors
## Loading required package: spam
## Spam version 0.41-0 (2014-02-26) is loaded.
## Type 'help( Spam)' or 'demo( spam)' for a short introduction 
## and overview of this package.
## Help for individual functions is also obtained by adding the
## suffix '.spam' to the function name, e.g. 'help( chol.spam)'.
## 
## Attaching package: 'spam'
## 
## The following objects are masked from 'package:base':
## 
##     backsolve, forwardsolve
## 
## Loading required package: maps
## 
## Attaching package: 'maps'
## 
## The following object is masked from 'package:kohonen':
## 
##     map
##-----------------------------
somvisual <- runif(400, 0, 0)

#####
#trainingsom2clicked 
#####

##Original testsomvisualdataframe <- as.data.frame(table(trainingmappingclicked$unit.classif),stringsAsFactors=FALSE)
somvisualdataframe <- as.data.frame(table(trainingsom2clicked$unit.classif))
for (i in 1:nrow(somvisualdataframe) ) {
  somvisual[as.numeric(somvisualdataframe$Var1[i])] <- as.numeric(somvisualdataframe$Freq[i])
}

somvisualmatrix <- as.matrix(array(somvisual,dim=c(20,20)))

##Start with a matrix that would be the numerical representation of you heatmap
##Here [1,1] will become the lower left node (1st row, 1st column), 
##[1,2] will become the node to the right
##[2,1] will be the first node to the left in the second row
##So visually you work your way from bottom left to top right
x <- as.vector(somvisualmatrix)

##Number of rows and columns of your SOM
SOM_Rows <- dim(somvisualmatrix)[1]
SOM_Columns <- dim(somvisualmatrix)[2]

##To make room for the legend
par(mar = c(0.4, 2, 2, 7))

##Initiate the plot window but do show any axes or points on the plot
plot(0, 0, type = "n", axes = FALSE, xlim=c(0, SOM_Columns), 
     ylim=c(0, SOM_Rows), xlab="", ylab= "", main="SOM 2 Clicked Observations",asp=1)

##Create the color palette 
##designer.colors to interpolate 50 colors between 
##the maxmimum number of allowed values in Brewer 
ColRamp <- rev(designer.colors(n=50, col=brewer.pal(9, "Spectral")))

##Make a vector with length(ColRamp) number of bins between the minimum and 
##maximum value of x. 
##Next match each point from x with one of the colors in ColorRamp
ColorCode <- rep("#FFFFFF", length(x)) #default is all white
Bins <- seq(min(x, na.rm=T), max(x, na.rm=T), length=length(ColRamp))
for (i in 1:length(x))
    if (!is.na(x[i])) ColorCode[i] <- ColRamp[which.min(abs(Bins-x[i]))] 

##Actual plotting of hexagonal polygons on map
offset <- 0.5 #offset for the hexagons when moving up a row
for (row in 1:SOM_Rows) {
  for (column in 0:(SOM_Columns - 1)) 
     Hexagon(column + offset, row - 1, col = ColorCode[row + SOM_Rows * column])
  offset <- ifelse(offset, 0, 0.5)
}

##Add legend to the right
image.plot(legend.only=TRUE, col=ColRamp, zlim=c(min(x, na.rm=T), max(x, na.rm=T)))

plot of chunk visual

##-----------------------------


#####
#trainingsom2notclicked 
#####
somvisual <- runif(400, 0, 0)

##Original testsomvisualdataframe <- as.data.frame(table(trainingmappingclicked$unit.classif),stringsAsFactors=FALSE)
somvisualdataframe <- as.data.frame(table(trainingsom2notclicked$unit.classif))
for (i in 1:nrow(somvisualdataframe) ) {
  somvisual[as.numeric(somvisualdataframe$Var1[i])] <- as.numeric(somvisualdataframe$Freq[i])
}

somvisualmatrix <- as.matrix(array(somvisual,dim=c(20,20)))

##Start with a matrix that would be the numerical representation of you heatmap
##Here [1,1] will become the lower left node (1st row, 1st column), 
##[1,2] will become the node to the right
##[2,1] will be the first node to the left in the second row
##So visually you work your way from bottom left to top right
x <- as.vector(somvisualmatrix)

##Number of rows and columns of your SOM
SOM_Rows <- dim(somvisualmatrix)[1]
SOM_Columns <- dim(somvisualmatrix)[2]

##To make room for the legend
par(mar = c(0.4, 2, 2, 7))

##Initiate the plot window but do show any axes or points on the plot
plot(0, 0, type = "n", axes = FALSE, xlim=c(0, SOM_Columns), 
     ylim=c(0, SOM_Rows), xlab="", ylab= "", main="SOM 2 Not-Clicked Observations",asp=1)

##Create the color palette 
##designer.colors to interpolate 50 colors between 
##the maxmimum number of allowed values in Brewer 
ColRamp <- rev(designer.colors(n=50, col=brewer.pal(9, "Spectral")))

##Make a vector with length(ColRamp) number of bins between the minimum and 
##maximum value of x. 
##Next match each point from x with one of the colors in ColorRamp
ColorCode <- rep("#FFFFFF", length(x)) #default is all white
Bins <- seq(min(x, na.rm=T), max(x, na.rm=T), length=length(ColRamp))
for (i in 1:length(x))
    if (!is.na(x[i])) ColorCode[i] <- ColRamp[which.min(abs(Bins-x[i]))] 

##Actual plotting of hexagonal polygons on map
offset <- 0.5 #offset for the hexagons when moving up a row
for (row in 1:SOM_Rows) {
  for (column in 0:(SOM_Columns - 1)) 
     Hexagon(column + offset, row - 1, col = ColorCode[row + SOM_Rows * column])
  offset <- ifelse(offset, 0, 0.5)
}

##Add legend to the right
image.plot(legend.only=TRUE, col=ColRamp, zlim=c(min(x, na.rm=T), max(x, na.rm=T)))

plot of chunk visual

##-----------------------------


#####
#trainingsom3clicked
#####
somvisual <- runif(25, 0, 0)

##Original testsomvisualdataframe <- as.data.frame(table(trainingmappingclicked$unit.classif),stringsAsFactors=FALSE)
somvisualdataframe <- as.data.frame(table(trainingsom3clicked$unit.classif))
for (i in 1:nrow(somvisualdataframe) ) {
  somvisual[as.numeric(somvisualdataframe$Var1[i])] <- as.numeric(somvisualdataframe$Freq[i])
}

somvisualmatrix <- as.matrix(array(somvisual,dim=c(5,5)))

##Start with a matrix that would be the numerical representation of you heatmap
##Here [1,1] will become the lower left node (1st row, 1st column), 
##[1,2] will become the node to the right
##[2,1] will be the first node to the left in the second row
##So visually you work your way from bottom left to top right
x <- as.vector(somvisualmatrix)

##Number of rows and columns of your SOM
SOM_Rows <- dim(somvisualmatrix)[1]
SOM_Columns <- dim(somvisualmatrix)[2]

##To make room for the legend
par(mar = c(0.4, 2, 2, 7))

##Initiate the plot window but do show any axes or points on the plot
plot(0, 0, type = "n", axes = FALSE, xlim=c(0, SOM_Columns), 
     ylim=c(0, SOM_Rows), xlab="", ylab= "", main="SOM 3 Clicked Observations",asp=1)

##Create the color palette 
##designer.colors to interpolate 50 colors between 
##the maxmimum number of allowed values in Brewer 
ColRamp <- rev(designer.colors(n=50, col=brewer.pal(9, "Spectral")))

##Make a vector with length(ColRamp) number of bins between the minimum and 
##maximum value of x. 
##Next match each point from x with one of the colors in ColorRamp
ColorCode <- rep("#FFFFFF", length(x)) #default is all white
Bins <- seq(min(x, na.rm=T), max(x, na.rm=T), length=length(ColRamp))
for (i in 1:length(x))
    if (!is.na(x[i])) ColorCode[i] <- ColRamp[which.min(abs(Bins-x[i]))] 

##Actual plotting of hexagonal polygons on map
offset <- 0.5 #offset for the hexagons when moving up a row
for (row in 1:SOM_Rows) {
  for (column in 0:(SOM_Columns - 1)) 
     Hexagon(column + offset, row - 1, col = ColorCode[row + SOM_Rows * column])
  offset <- ifelse(offset, 0, 0.5)
}

##Add legend to the right
image.plot(legend.only=TRUE, col=ColRamp, zlim=c(min(x, na.rm=T), max(x, na.rm=T)))

plot of chunk visual

##-----------------------------



#####
#trainingmappingnotclicked
#####
somvisual <- runif(25, 0, 0)

##Original testsomvisualdataframe <- as.data.frame(table(trainingmappingclicked$unit.classif),stringsAsFactors=FALSE)
somvisualdataframe <- as.data.frame(table(trainingmappingnotclicked$unit.classif))
for (i in 1:nrow(somvisualdataframe) ) {
  somvisual[as.numeric(somvisualdataframe$Var1[i])] <- as.numeric(somvisualdataframe$Freq[i])
}

somvisualmatrix <- as.matrix(array(somvisual,dim=c(5,5)))

##Start with a matrix that would be the numerical representation of you heatmap
##Here [1,1] will become the lower left node (1st row, 1st column), 
##[1,2] will become the node to the right
##[2,1] will be the first node to the left in the second row
##So visually you work your way from bottom left to top right
x <- as.vector(somvisualmatrix)

##Number of rows and columns of your SOM
SOM_Rows <- dim(somvisualmatrix)[1]
SOM_Columns <- dim(somvisualmatrix)[2]

##To make room for the legend
par(mar = c(0.4, 2, 2, 7))

##Initiate the plot window but do show any axes or points on the plot
plot(0, 0, type = "n", axes = FALSE, xlim=c(0, SOM_Columns), 
     ylim=c(0, SOM_Rows), xlab="", ylab= "", main="SOM 3 Non-Clicked Observations",asp=1)

##Create the color palette 
##designer.colors to interpolate 50 colors between 
##the maxmimum number of allowed values in Brewer 
ColRamp <- rev(designer.colors(n=50, col=brewer.pal(9, "Spectral")))

##Make a vector with length(ColRamp) number of bins between the minimum and 
##maximum value of x. 
##Next match each point from x with one of the colors in ColorRamp
ColorCode <- rep("#FFFFFF", length(x)) #default is all white
Bins <- seq(min(x, na.rm=T), max(x, na.rm=T), length=length(ColRamp))
for (i in 1:length(x))
    if (!is.na(x[i])) ColorCode[i] <- ColRamp[which.min(abs(Bins-x[i]))] 

##Actual plotting of hexagonal polygons on map
offset <- 0.5 #offset for the hexagons when moving up a row
for (row in 1:SOM_Rows) {
  for (column in 0:(SOM_Columns - 1)) 
     Hexagon(column + offset, row - 1, col = ColorCode[row + SOM_Rows * column])
  offset <- ifelse(offset, 0, 0.5)
}

##Add legend to the right
image.plot(legend.only=TRUE, col=ColRamp, zlim=c(min(x, na.rm=T), max(x, na.rm=T)))

plot of chunk visual

##-----------------------------


#####
#validatemappingclicked SOM1
#####
somvisual <- runif(49, 0, 0)
somvisualdataframe <- as.data.frame(table(validatemappingclickedsom1$unit.classif))
for (i in 1:nrow(somvisualdataframe) ) {
  somvisual[as.numeric(somvisualdataframe$Var1[i])] <- as.numeric(somvisualdataframe$Freq[i])
}

somvisualmatrix <- as.matrix(array(somvisual,dim=c(7,7)))

##Start with a matrix that would be the numerical representation of you heatmap
##Here [1,1] will become the lower left node (1st row, 1st column), 
##[1,2] will become the node to the right
##[2,1] will be the first node to the left in the second row
##So visually you work your way from bottom left to top right
x <- as.vector(somvisualmatrix)

##Number of rows and columns of your SOM
SOM_Rows <- dim(somvisualmatrix)[1]
SOM_Columns <- dim(somvisualmatrix)[2]

##To make room for the legend
par(mar = c(0.4, 2, 2, 7))

##Initiate the plot window but do show any axes or points on the plot
plot(0, 0, type = "n", axes = FALSE, xlim=c(0, SOM_Columns), 
     ylim=c(0, SOM_Rows), xlab="", ylab= "", main="SOM 1 Clicked Validate Observations",asp=1)

##Create the color palette 
##designer.colors to interpolate 50 colors between 
##the maxmimum number of allowed values in Brewer 
ColRamp <- rev(designer.colors(n=50, col=brewer.pal(9, "Spectral")))

##Make a vector with length(ColRamp) number of bins between the minimum and 
##maximum value of x. 
##Next match each point from x with one of the colors in ColorRamp
ColorCode <- rep("#FFFFFF", length(x)) #default is all white
Bins <- seq(min(x, na.rm=T), max(x, na.rm=T), length=length(ColRamp))
for (i in 1:length(x))
    if (!is.na(x[i])) ColorCode[i] <- ColRamp[which.min(abs(Bins-x[i]))] 

##Actual plotting of hexagonal polygons on map
offset <- 0.5 #offset for the hexagons when moving up a row
for (row in 1:SOM_Rows) {
  for (column in 0:(SOM_Columns - 1)) 
     Hexagon(column + offset, row - 1, col = ColorCode[row + SOM_Rows * column])
  offset <- ifelse(offset, 0, 0.5)
}

##Add legend to the right
image.plot(legend.only=TRUE, col=ColRamp, zlim=c(min(x, na.rm=T), max(x, na.rm=T)))

plot of chunk visual

##-----------------------------



#####
#validatemappingnotclicked SOM 1
#####
somvisual <- runif(49, 0, 0)

somvisualdataframe <- as.data.frame(table(validatemappingnotclickedsom1$unit.classif))
for (i in 1:nrow(somvisualdataframe) ) {
  somvisual[as.numeric(somvisualdataframe$Var1[i])] <- as.numeric(somvisualdataframe$Freq[i])
}

somvisualmatrix <- as.matrix(array(somvisual,dim=c(7,7)))

##Start with a matrix that would be the numerical representation of you heatmap
##Here [1,1] will become the lower left node (1st row, 1st column), 
##[1,2] will become the node to the right
##[2,1] will be the first node to the left in the second row
##So visually you work your way from bottom left to top right
x <- as.vector(somvisualmatrix)

##Number of rows and columns of your SOM
SOM_Rows <- dim(somvisualmatrix)[1]
SOM_Columns <- dim(somvisualmatrix)[2]

##To make room for the legend
par(mar = c(0.4, 2, 2, 7))

##Initiate the plot window but do show any axes or points on the plot
plot(0, 0, type = "n", axes = FALSE, xlim=c(0, SOM_Columns), 
     ylim=c(0, SOM_Rows), xlab="", ylab= "", main="SOM 1 Non-Clicked Validate Observations",asp=1)

##Create the color palette 
##designer.colors to interpolate 50 colors between 
##the maxmimum number of allowed values in Brewer 
ColRamp <- rev(designer.colors(n=50, col=brewer.pal(9, "Spectral")))

##Make a vector with length(ColRamp) number of bins between the minimum and 
##maximum value of x. 
##Next match each point from x with one of the colors in ColorRamp
ColorCode <- rep("#FFFFFF", length(x)) #default is all white
Bins <- seq(min(x, na.rm=T), max(x, na.rm=T), length=length(ColRamp))
for (i in 1:length(x))
    if (!is.na(x[i])) ColorCode[i] <- ColRamp[which.min(abs(Bins-x[i]))] 

##Actual plotting of hexagonal polygons on map
offset <- 0.5 #offset for the hexagons when moving up a row
for (row in 1:SOM_Rows) {
  for (column in 0:(SOM_Columns - 1)) 
     Hexagon(column + offset, row - 1, col = ColorCode[row + SOM_Rows * column])
  offset <- ifelse(offset, 0, 0.5)
}

##Add legend to the right
image.plot(legend.only=TRUE, col=ColRamp, zlim=c(min(x, na.rm=T), max(x, na.rm=T)))

plot of chunk visual

##-----------------------------



#####
#validatemappingclicked SOM 2
#####
somvisual <- runif(400, 0, 0)

somvisualdataframe <- as.data.frame(table(validatemappingclickedsom2$unit.classif))
for (i in 1:nrow(somvisualdataframe) ) {
  somvisual[as.numeric(somvisualdataframe$Var1[i])] <- as.numeric(somvisualdataframe$Freq[i])
}

somvisualmatrix <- as.matrix(array(somvisual,dim=c(20,20)))

##Start with a matrix that would be the numerical representation of you heatmap
##Here [1,1] will become the lower left node (1st row, 1st column), 
##[1,2] will become the node to the right
##[2,1] will be the first node to the left in the second row
##So visually you work your way from bottom left to top right
x <- as.vector(somvisualmatrix)

##Number of rows and columns of your SOM
SOM_Rows <- dim(somvisualmatrix)[1]
SOM_Columns <- dim(somvisualmatrix)[2]

##To make room for the legend
par(mar = c(0.4, 2, 2, 7))

##Initiate the plot window but do show any axes or points on the plot
plot(0, 0, type = "n", axes = FALSE, xlim=c(0, SOM_Columns), 
     ylim=c(0, SOM_Rows), xlab="", ylab= "", main="SOM 2 Clicked Validate Observations",asp=1)

##Create the color palette 
##designer.colors to interpolate 50 colors between 
##the maxmimum number of allowed values in Brewer 
ColRamp <- rev(designer.colors(n=50, col=brewer.pal(9, "Spectral")))

##Make a vector with length(ColRamp) number of bins between the minimum and 
##maximum value of x. 
##Next match each point from x with one of the colors in ColorRamp
ColorCode <- rep("#FFFFFF", length(x)) #default is all white
Bins <- seq(min(x, na.rm=T), max(x, na.rm=T), length=length(ColRamp))
for (i in 1:length(x))
    if (!is.na(x[i])) ColorCode[i] <- ColRamp[which.min(abs(Bins-x[i]))] 

##Actual plotting of hexagonal polygons on map
offset <- 0.5 #offset for the hexagons when moving up a row
for (row in 1:SOM_Rows) {
  for (column in 0:(SOM_Columns - 1)) 
     Hexagon(column + offset, row - 1, col = ColorCode[row + SOM_Rows * column])
  offset <- ifelse(offset, 0, 0.5)
}

##Add legend to the right
image.plot(legend.only=TRUE, col=ColRamp, zlim=c(min(x, na.rm=T), max(x, na.rm=T)))

plot of chunk visual

##-----------------------------



#####
#validatemappingnotclicked SOM 2
#####
somvisual <- runif(400, 0, 0)

somvisualdataframe <- as.data.frame(table(validatemappingnotclickedsom2$unit.classif))
for (i in 1:nrow(somvisualdataframe) ) {
  somvisual[as.numeric(somvisualdataframe$Var1[i])] <- as.numeric(somvisualdataframe$Freq[i])
}

somvisualmatrix <- as.matrix(array(somvisual,dim=c(20,20)))

##Start with a matrix that would be the numerical representation of you heatmap
##Here [1,1] will become the lower left node (1st row, 1st column), 
##[1,2] will become the node to the right
##[2,1] will be the first node to the left in the second row
##So visually you work your way from bottom left to top right
x <- as.vector(somvisualmatrix)

##Number of rows and columns of your SOM
SOM_Rows <- dim(somvisualmatrix)[1]
SOM_Columns <- dim(somvisualmatrix)[2]

##To make room for the legend
par(mar = c(0.4, 2, 2, 7))

##Initiate the plot window but do show any axes or points on the plot
plot(0, 0, type = "n", axes = FALSE, xlim=c(0, SOM_Columns), 
     ylim=c(0, SOM_Rows), xlab="", ylab= "", main="SOM 2 Non-Clicked Validate Observations",asp=1)

##Create the color palette 
##designer.colors to interpolate 50 colors between 
##the maxmimum number of allowed values in Brewer 
ColRamp <- rev(designer.colors(n=50, col=brewer.pal(9, "Spectral")))

##Make a vector with length(ColRamp) number of bins between the minimum and 
##maximum value of x. 
##Next match each point from x with one of the colors in ColorRamp
ColorCode <- rep("#FFFFFF", length(x)) #default is all white
Bins <- seq(min(x, na.rm=T), max(x, na.rm=T), length=length(ColRamp))
for (i in 1:length(x))
    if (!is.na(x[i])) ColorCode[i] <- ColRamp[which.min(abs(Bins-x[i]))] 

##Actual plotting of hexagonal polygons on map
offset <- 0.5 #offset for the hexagons when moving up a row
for (row in 1:SOM_Rows) {
  for (column in 0:(SOM_Columns - 1)) 
     Hexagon(column + offset, row - 1, col = ColorCode[row + SOM_Rows * column])
  offset <- ifelse(offset, 0, 0.5)
}

##Add legend to the right
image.plot(legend.only=TRUE, col=ColRamp, zlim=c(min(x, na.rm=T), max(x, na.rm=T)))

plot of chunk visual

##-----------------------------




#####
#validatemappingclicked SOM 3
#####
somvisual <- runif(25, 0, 0)

somvisualdataframe <- as.data.frame(table(validatemappingclickedsom3$unit.classif))
for (i in 1:nrow(somvisualdataframe) ) {
  somvisual[as.numeric(somvisualdataframe$Var1[i])] <- as.numeric(somvisualdataframe$Freq[i])
}

somvisualmatrix <- as.matrix(array(somvisual,dim=c(5,5)))

##Start with a matrix that would be the numerical representation of you heatmap
##Here [1,1] will become the lower left node (1st row, 1st column), 
##[1,2] will become the node to the right
##[2,1] will be the first node to the left in the second row
##So visually you work your way from bottom left to top right
x <- as.vector(somvisualmatrix)

##Number of rows and columns of your SOM
SOM_Rows <- dim(somvisualmatrix)[1]
SOM_Columns <- dim(somvisualmatrix)[2]

##To make room for the legend
par(mar = c(0.4, 2, 2, 7))

##Initiate the plot window but do show any axes or points on the plot
plot(0, 0, type = "n", axes = FALSE, xlim=c(0, SOM_Columns), 
     ylim=c(0, SOM_Rows), xlab="", ylab= "", main="SOM 3 Clicked Validate Observations",asp=1)

##Create the color palette 
##designer.colors to interpolate 50 colors between 
##the maxmimum number of allowed values in Brewer 
ColRamp <- rev(designer.colors(n=50, col=brewer.pal(9, "Spectral")))

##Make a vector with length(ColRamp) number of bins between the minimum and 
##maximum value of x. 
##Next match each point from x with one of the colors in ColorRamp
ColorCode <- rep("#FFFFFF", length(x)) #default is all white
Bins <- seq(min(x, na.rm=T), max(x, na.rm=T), length=length(ColRamp))
for (i in 1:length(x))
    if (!is.na(x[i])) ColorCode[i] <- ColRamp[which.min(abs(Bins-x[i]))] 

##Actual plotting of hexagonal polygons on map
offset <- 0.5 #offset for the hexagons when moving up a row
for (row in 1:SOM_Rows) {
  for (column in 0:(SOM_Columns - 1)) 
     Hexagon(column + offset, row - 1, col = ColorCode[row + SOM_Rows * column])
  offset <- ifelse(offset, 0, 0.5)
}

##Add legend to the right
image.plot(legend.only=TRUE, col=ColRamp, zlim=c(min(x, na.rm=T), max(x, na.rm=T)))

plot of chunk visual

##-----------------------------



#####
#validatemappingnotclicked SOM 3
#####
somvisual <- runif(25, 0, 0)

somvisualdataframe <- as.data.frame(table(validatemappingnotclickedsom3$unit.classif))
for (i in 1:nrow(somvisualdataframe) ) {
  somvisual[as.numeric(somvisualdataframe$Var1[i])] <- as.numeric(somvisualdataframe$Freq[i])
}

somvisualmatrix <- as.matrix(array(somvisual,dim=c(5,5)))

##Start with a matrix that would be the numerical representation of you heatmap
##Here [1,1] will become the lower left node (1st row, 1st column), 
##[1,2] will become the node to the right
##[2,1] will be the first node to the left in the second row
##So visually you work your way from bottom left to top right
x <- as.vector(somvisualmatrix)

##Number of rows and columns of your SOM
SOM_Rows <- dim(somvisualmatrix)[1]
SOM_Columns <- dim(somvisualmatrix)[2]

##To make room for the legend
par(mar = c(0.4, 2, 2, 7))

##Initiate the plot window but do show any axes or points on the plot
plot(0, 0, type = "n", axes = FALSE, xlim=c(0, SOM_Columns), 
     ylim=c(0, SOM_Rows), xlab="", ylab= "", main="SOM 3 Non-Clicked Validate Observations",asp=1)

##Create the color palette 
##designer.colors to interpolate 50 colors between 
##the maxmimum number of allowed values in Brewer 
ColRamp <- rev(designer.colors(n=50, col=brewer.pal(9, "Spectral")))

##Make a vector with length(ColRamp) number of bins between the minimum and 
##maximum value of x. 
##Next match each point from x with one of the colors in ColorRamp
ColorCode <- rep("#FFFFFF", length(x)) #default is all white
Bins <- seq(min(x, na.rm=T), max(x, na.rm=T), length=length(ColRamp))
for (i in 1:length(x))
    if (!is.na(x[i])) ColorCode[i] <- ColRamp[which.min(abs(Bins-x[i]))] 

##Actual plotting of hexagonal polygons on map
offset <- 0.5 #offset for the hexagons when moving up a row
for (row in 1:SOM_Rows) {
  for (column in 0:(SOM_Columns - 1)) 
     Hexagon(column + offset, row - 1, col = ColorCode[row + SOM_Rows * column])
  offset <- ifelse(offset, 0, 0.5)
}

##Add legend to the right
image.plot(legend.only=TRUE, col=ColRamp, zlim=c(min(x, na.rm=T), max(x, na.rm=T)))

plot of chunk visual

##-----------------------------
roc_merged <-rbind(data_roc1,data_roc2)
roc_merged <-rbind(roc_merged,data_roc3)
roc_merged <-rbind(roc_merged,data_roc4)

##For a Particle Swarm Optimization we will calculate ourselves a single value for optimization function
prediction <-roc_merged$predictedclicked
observation <- roc_merged$clicked  
pred<-prediction(prediction,observation)
plot(performance(pred,'tpr','fpr'),main='SOM Training ROC')

plot of chunk ROC Metrics

Conclusion

Overall the SOM's performed well. The second SOM derived from the winning clicked nodes from the first SOM was very interesting. We were very pleased with the visualization that showed that low clicked observation do naturally cluster together. The challenge is that within those clusters there are still a fairly large number of non-clicked observations. We will investigate the use of Social Network Analytics and Particle Sworm Optimization on the second SOM. This will be the focus of the another R code.

The third SOM did not add any value and on the contrary its performance was worse.

The Neural Network stacked against the first SOM did not perform well given the very low click rate and the performance measure which provided very little incentive to differenciate between clicked and non-clicked observations. This was to expected on a very challenging anomaly detection problem.

We will not pursue this line of investigation any further. We will focus on investigating the use of Random Forest algorithm and the SOM-Graph Analytics using Gephi.

detach(package:fields, unload=TRUE)
detach(package:kohonen, unload=TRUE)
library(kohonen)
## 
## Attaching package: 'kohonen'
## 
## The following object is masked from 'package:maps':
## 
##     map
  ## Closes the connection
  dbDisconnect(con)

  ## Frees all the resources on the driver
  dbUnloadDriver(drv)