The purpose of this R&D project is to develop an innovative analytics visual model that conveys insight to the end user in the tackling of large scale data problems focused on Internet based applications. This R&D project began a few years ago based on years of experience in developing custom machine learning algorithms to solve problems in the area of border security and law enforcement as well as in the financial and scientific domains.
Much research has gone into several different lines of investigation namely review of different machine learning algorithms, large scale data problems with a focus on data sparsity and rare/anomaly events, big data infrastructure, and recent developments in visual analytics.
What follows is the application of this R&D to a production grade Internet dataset. Extensive preliminary analysis and analytics has gone into undertstanding the problem space and feature engineering development.
Search advertising has been one of the major revenue sources of the Internet industry for years. A key technology behind search advertising is to predict the click-through rate (CTR) of ads, as the economic model behind search advertising requires CTR values to rank ads and to price click.
Training instances derived from session logs from the Tencent proprietary search engine, sosa.com was provided to us. The original purpose of this data set was to accurately predict the CTR of ads in the testing instances. However our objective is to use this data set for our R&D project which has the goal of developing innovative visual analytics model that end users can leverage to derive business insight. Our target domain are in the area of Internet based services including Internet of Things.
The training data set consists of 150 Million queries records for 23 Million search users, for a total of 10 Gb of data. Multiple queries with the same properties and their outputs were rolled up into one record in the training data. Expanded to individual queries, this produced 235 million training records.
Each query and its output were described by 10 variables as follows:
The IDs above were all hash-mapped to integers. Each of the queryID, keywordID, titleID, and descriptionID was associated with a set of keywords, which were also hash- mapped to integers and provided in four extra data files, to provide detailed description of the query and advertisement, respectively.
If a user could not be identified, 0 was assigned to the userID (30% of data). In addition, The gender (Male=1, Female=2, and Unknown=0) and age ((0,12]=1, (12,18]=2, (18, 24]=3, (24,30]=4, (30,40]=5, and (40+,]=6) of each userID were provided in another data file. Two variables described the response of a user to an advertisement: the number of clicks and the number of impressions. For each user, query and the resulted ad, the number of impressions indicated the times that the ad was displayed to the user. The number of clicks was the times that the user clicked the ad. Consequently, the CTR was the ratio between the numbers of clicks and impressions. The average CTR of the training data was 0.0387.
The data set was reduced as the purpose of the project was not the development of an optimal analytical model but the development of an innovative visual analytics data product with the requirement to run hundreds of simulations quickly.
The data set reduction algorithm was based on least user usage of the search based on the number of search queries issued by a user. The training data set was randomly split between training 70%, validation 15% and test set 15%.
15 derived feature tables were created.
Each Derived Feature table has the following general table format:
Over the last 18 months we have evaluated a number of machine learning algorithms based on the type of problem space (clustering, classification, anomaly detection, prediction) and analytics approaches.
Our focus is on ensemble learning which combines several algorithms based on Self Organizing Map (SOM), Genetic Algorithm, Particle Swarm, Neural Network, Random Forest, KMeans, and Social Network Graph Based Analytics (SNA).
All dataset reduction and transformations are available in a separate extensive codebook file. The PostgreSQL database was used to store all datasets. Almost all data transformations were conducted in the database and the R programming language was used for the analytics and visualization. Information is also exported to Gephi for graphing visualization.
library(RPostgreSQL)
## Loading required package: DBI
library(kohonen)
## Loading required package: class
## Loading required package: MASS
library(RColorBrewer)
library(scales)
library(rgexf)
## Loading required package: XML
## Loading required package: Rook
## Loading required package: tools
##
## Attaching package: 'tools'
##
## The following object is masked from 'package:XML':
##
## toHTML
##
## Loading required package: brew
## Loading required package: igraph
#Function to create the polygon for each hexagon of custom SOM
Hexagon <- function (x, y, unitcell = 1, col = col) {
polygon(c(x, x, x + unitcell/2, x + unitcell, x + unitcell,
x + unitcell/2), c(y + unitcell * 0.125, y + unitcell *
0.875, y + unitcell * 1.125, y + unitcell * 0.875,
y + unitcell * 0.125, y - unitcell * 0.125),
col = col, border=NA)
}
coolBlueHotRed <- function(n, alpha = 1) {
rainbow(n, end=4/6, alpha=alpha)[n:1]
}
pretty_palette <- c("#1f77b4","#ff7f0e","#2ca02c",
"#d62728","#9467bd","#8c564b","#e377c2")
Load the database driver and connect to database with credentials. R code is set to echo=FALSE.
## Training
advertisingtable <- paste("select * from search_instance LIMIT 2500000 OFFSET 0")
## Validation
##advertisingtable <- paste("select * from search_instance LIMIT 1500000 OFFSET 5500000")
## Testing
##advertisingtable <- paste("select * from search_instance LIMIT 1500000 OFFSET 7000000")
rs <- dbSendQuery(con, advertisingtable)
## fetch all elements from the result set
result_set <- fetch(rs,n=-1)
## We started we are large number of input features and have reduced based on our analysis.
data_train <- result_set[,c("position","adid_low","adid_medium","adid_high","adid_age_low","adid_age_medium","adid_age_high","adid_gender_low","adid_gender_medium","adid_gender_high","advertiserid_low","advertiserid_medium","advertiserid_high","advertiserid_age_low","advertiserid_age_medium","advertiserid_age_high","advertiserid_gender_low","advertiserid_gender_medium","advertiserid_gender_high", "male","female","unknowngender","age0to12","age12to18","age18to24","age24to30","age30to40","age40plus","clicked", "notclicked")]
data_train_matrix <- as.matrix(data_train)
Several hundred simulations were conducted using differenct SOM parameters including custom parameters.
# Create the SOM Grid - you generally have to specify the size of the
# training grid prior to training the SOM.
som_grid <- somgrid(xdim = 7, ydim=7, topo="hexagonal")
som1total <- 49
som1row <-7
som1col <-7
# Finally, train the SOM, options for the number of iterations,
# the learning rates, and the neighbourhood are available
set.seed(21121962)
som_model <- som(subset(data_train_matrix, select=-c(clicked,notclicked)),
grid=som_grid,
rlen=7,
alpha=c(0.05,0.01),
keep.data = TRUE,
n.hood="circular")
## Training progress. This shows the variation between the weights of the nodes and
## the cases presented to it. Overtime each individual nodes weight should closely
## match its winning cases. This also shows how many iterations are required before
## the mean distance is minimized. This can be used to determine the optimal size of
## of the SOM. If it is too small it may have a hard time to convergence to a minimum.
plot(som_model, type="changes",main="Training Progress")
## The SOM allows to visualise the count of how many cases are mapped to each
## node on the map. This metric can be used as a measure of map quality – ideally the
## sample distribution is relatively uniform. Large values in some map areas suggests
## that a larger map would be benificial. If increasing the map does not change this
## then it may suggest a large cluster of cases.
plot(som_model, type="count", palette.name= coolBlueHotRed,main="Counts Plot")
## Often referred to as the “U-Matrix”, this visualisation is of the distance
## between each node and its neighbours. Typically viewed with a grayscale palette,
## areas of low neighbour distance indicate groups of nodes that are similar.
## Areas with large distances indicate the nodes are much more dissimilar –
## and indicate natural boundaries between node clusters.
## The U-Matrix can be used to identify clusters within the SOM map.
plot(som_model, type="dist.neighbours",palette.name= coolBlueHotRed, main="Neighbour Distance Plot")
## Shows the mean distance of objects mapped to a unit to the codebook vector of that unit.
## The smaller the distances, the better the objects are represented by the codebook vectors.
plot(som_model, type="quality",palette.name= coolBlueHotRed, main="Winning Node Inter Distance Plot")
## The node weight vectors, or “codes”, are made up of normalised values of the
## original variables used to generate the SOM. Each node’s weight vector is
## representative / similar of the samples mapped to that node.
## By visualising the weight vectors across the map, we can see patterns
## in the distribution of samples and variables.
## The default visualisation of the weight vectors is a “fan diagram”,
## where individual fan representations of the magnitude of each variable
## in the weight vector is shown for each node.
##plot(som_model, type="codes",palette.name = rainbow)
The reason that at this point we try and plot the potential clusters using KMeans is that there is little point in doing further detailed analysis if the map is not a quality map. The previous plots may have provided some insight, however this plot will provide final validation.
## Look at elbow point which tells your the number of clusters
mydata <- som_model$codes
wss <- (nrow(mydata)-1)*sum(apply(mydata,2,var))
for (i in 2:15) {
wss[i] <- sum(kmeans(mydata, centers=i)$withinss)
}
plot(wss)
## Plot cluster boundaries based on results found from previous plot(wss)
## use hierarchical clustering to cluster the codebook vectors
som_cluster <- cutree(hclust(dist(som_model$codes)), 2)
# plot these results:
plot(som_model, type="mapping", bgcol = pretty_palette[som_cluster], main = " 2 Clusters")
add.cluster.boundaries(som_model, som_cluster)
## Plot cluster boundaries based on results found from previous plot(wss)
## use hierarchical clustering to cluster the codebook vectors
som_cluster <- cutree(hclust(dist(som_model$codes)), 3)
# plot these results:
plot(som_model, type="mapping", bgcol = pretty_palette[som_cluster], main = "3 Clusters")
add.cluster.boundaries(som_model, som_cluster)
## Plot cluster boundaries based on results found from previous plot(wss)
## use hierarchical clustering to cluster the codebook vectors
som_cluster <- cutree(hclust(dist(som_model$codes)), 4)
# plot these results:
plot(som_model, type="mapping", bgcol = pretty_palette[som_cluster], main = "4 Clusters")
add.cluster.boundaries(som_model, som_cluster)
We will use several innovative features that we recently developed. SOM Zooming works as follows: -SOM is trained on all input features except clicked/notclicked -Various SOM plots are analyzed to ensure we have a quality map -SOM map is shown with highlighted clicked areas -User selects area of interest and all cases associated with this area are selected as well as only a subset of variables that contributed most to those cases. -A new SOM map is developed on this subset. -This process continues until the end user is satisfied that the clicked cases are easily discernable and predictable also resulting in selecting the key variables. -Subsequent to this the validation and testing set are submitted to all maps and each data point is plotted on each map.
## Mapping of training dataset of clicked observations onto the SOM. We want to see where on the SOM the clicked observations are with the idea that we will subset those training cases around those nodes as well as select only the high contributing variables to train a second.
## Select observations that have been clicked
data_train_clicked <- subset(data_train, data_train$clicked == 1,select=-c(clicked,notclicked))
data_train_matrix <- as.matrix(data_train_clicked)
## See where observation maps onto the SOM
trainingmappingclicked <- map(som_model, data_train_matrix)
## Collection of unique winning SOM nodes that we will use latter
## to select non-clicked observations.
winingsom1 <- unique(trainingmappingclicked$unit.classif)
## See where observation maps onto the SOM
data_train_matrix <- as.matrix(data_train)
trainingmapping <- map(som_model,subset(data_train_matrix, select=-c(clicked,notclicked)))
## Add new column of winning SOM, distance from winning node and then subselect and merge with data_train
data_train$winningsom <- trainingmapping$unit.classif
data_train$distance <- trainingmapping$distances
data_train <- subset(data_train, data_train$winningsom %in% winingsom1)
data_train_matrix <- as.matrix(data_train)
# Create the SOM Grid - you generally have to specify the size of the
# training grid prior to training the SOM. Hexagonal and Circular
# topologies are possible.
som_grid2 <- somgrid(xdim = 14, ydim=14, topo="hexagonal")
som2total <- 196
som2row <-14
som2col <-14
# Finally, train the SOM, options for the number of iterations,
# the learning rates, and the neighbourhood are available
set.seed(16564646)
som_model2 <- som(subset(data_train_matrix, select=-c(clicked,notclicked,winningsom,distance)),
grid=som_grid2,
rlen=7,
alpha=c(0.05,0.01),
keep.data = TRUE,
n.hood="circular")
## Training progress. This shows the variation between the weights of the nodes and
## the cases presented to it. Overtime each individual nodes weight should closely
## match its winning cases. This also shows how many iterations are required before
## the mean distance is minimized. This can be used to determine the optimal size of
## of the SOM. If it is too small it may have a hard time to convergence to a minimum.
plot(som_model2, type="changes",main="Training Progress")
## The SOM allows to visualise the count of how many cases are mapped to each
## node on the map. This metric can be used as a measure of map quality – ideally the
## sample distribution is relatively uniform. Large values in some map areas suggests
## that a larger map would be benificial. If increasing the map does not change this
## then it may suggest a large cluster of cases.
plot(som_model2, type="count", palette.name= coolBlueHotRed,main="Counts Plot")
## Often referred to as the “U-Matrix”, this visualisation is of the distance
## between each node and its neighbours. Typically viewed with a grayscale palette,
## areas of low neighbour distance indicate groups of nodes that are similar.
## Areas with large distances indicate the nodes are much more dissimilar –
## and indicate natural boundaries between node clusters.
## The U-Matrix can be used to identify clusters within the SOM map.
plot(som_model2, type="dist.neighbours",palette.name= coolBlueHotRed, main="Neighbour Distance Plot")
## Shows the mean distance of objects mapped to a unit to the codebook vector of that unit.
## The smaller the distances, the better the objects are represented by the codebook vectors.
plot(som_model2, type="quality",palette.name= coolBlueHotRed, main="Winning Node Inter Distance Plot")
## The node weight vectors, or “codes”, are made up of normalised values of the
## original variables used to generate the SOM. Each node’s weight vector is
## representative / similar of the samples mapped to that node.
## By visualising the weight vectors across the map, we can see patterns
## in the distribution of samples and variables.
## The default visualisation of the weight vectors is a “fan diagram”,
## where individual fan representations of the magnitude of each variable
## in the weight vector is shown for each node.
##plot(som_model2, type="codes",palette.name = rainbow)
The reason that at this point we try and plot the potential clusters using KMeans is that there is little point in doing further detailed analysis if the map is not a quality map. The previous plots may have provided some insight, however this plot will provide final validation.
## Look at elbow point which tells your the number of clusters
mydata <- som_model2$codes
wss <- (nrow(mydata)-1)*sum(apply(mydata,2,var))
for (i in 2:15) {
wss[i] <- sum(kmeans(mydata, centers=i)$withinss)
}
plot(wss)
## Plot cluster boundaries based on results found from previous plot(wss)
## use hierarchical clustering to cluster the codebook vectors
som_cluster <- cutree(hclust(dist(som_model2$codes)), 2)
# plot these results:
plot(som_model2, type="mapping", bgcol = pretty_palette[som_cluster], main = "2 Clusters")
add.cluster.boundaries(som_model2, som_cluster)
## Plot cluster boundaries based on results found from previous plot(wss)
## use hierarchical clustering to cluster the codebook vectors
som_cluster <- cutree(hclust(dist(som_model2$codes)), 3)
# plot these results:
plot(som_model2, type="mapping", bgcol = pretty_palette[som_cluster], main = "3 Clusters")
add.cluster.boundaries(som_model2, som_cluster)
## Plot cluster boundaries based on results found from previous plot(wss)
## use hierarchical clustering to cluster the codebook vectors
som_cluster <- cutree(hclust(dist(som_model2$codes)), 4)
# plot these results:
plot(som_model2, type="mapping", bgcol = pretty_palette[som_cluster], main = "4 Clusters")
add.cluster.boundaries(som_model2, som_cluster)
## Heatmaps are perhaps the most important visualisation possible for Self-Organising Maps.
## The use of a weight space view as in that tries to view all dimensions on the
## one diagram is unsuitable for a high-dimensional (>7 variable) SOM.
## A SOM heatmap allows the visualisation of the distribution of a single variable
## across the map. Typically, a SOM investigative process involves the creation of
## multiple heatmaps, and then the comparison of these heatmaps to identify
## interesting areas on the map.
## It is important to remember that the individual sample positions do not move
## from one visualisation to another, the map is simply coloured by different variables.
## The default Kohonen heatmap is created by using the type “heatmap”, and then providing
## one of the variables from the set of node weights.
plot(som_model2, type = "property", property = som_model2$codes[,1], palette.name=coolBlueHotRed,main=names(data_train)[1])
plot(som_model2, type = "property", property = som_model2$codes[,2], palette.name=coolBlueHotRed,main=names(data_train)[2])
## Error: 'breaks' are not unique
plot(som_model2, type = "property", property = som_model2$codes[,3], palette.name=coolBlueHotRed,main=names(data_train)[3])
plot(som_model2, type = "property", property = som_model2$codes[,4], palette.name=coolBlueHotRed,main=names(data_train)[4])
plot(som_model2, type = "property", property = som_model2$codes[,5], palette.name=coolBlueHotRed,main=names(data_train)[5])
plot(som_model2, type = "property", property = som_model2$codes[,6], palette.name=coolBlueHotRed,main=names(data_train)[6])
plot(som_model2, type = "property", property = som_model2$codes[,7], palette.name=coolBlueHotRed,main=names(data_train)[7])
plot(som_model2, type = "property", property = som_model2$codes[,8], palette.name=coolBlueHotRed,main=names(data_train)[8])
plot(som_model2, type = "property", property = som_model2$codes[,9], palette.name=coolBlueHotRed,main=names(data_train)[9])
plot(som_model2, type = "property", property = som_model2$codes[,10], palette.name=coolBlueHotRed,main=names(data_train)[10])
plot(som_model2, type = "property", property = som_model2$codes[,11], palette.name=coolBlueHotRed,main=names(data_train)[11])
## Error: 'breaks' are not unique
plot(som_model2, type = "property", property = som_model2$codes[,12], palette.name=coolBlueHotRed,main=names(data_train)[12])
plot(som_model2, type = "property", property = som_model2$codes[,13], palette.name=coolBlueHotRed,main=names(data_train)[13])
plot(som_model2, type = "property", property = som_model2$codes[,14], palette.name=coolBlueHotRed,main=names(data_train)[14])
plot(som_model2, type = "property", property = som_model2$codes[,15], palette.name=coolBlueHotRed,main=names(data_train)[15])
plot(som_model2, type = "property", property = som_model2$codes[,16], palette.name=coolBlueHotRed,main=names(data_train)[16])
plot(som_model2, type = "property", property = som_model2$codes[,17], palette.name=coolBlueHotRed,main=names(data_train)[17])
plot(som_model2, type = "property", property = som_model2$codes[,18], palette.name=coolBlueHotRed,main=names(data_train)[18])
plot(som_model2, type = "property", property = som_model2$codes[,19], palette.name=coolBlueHotRed,main=names(data_train)[19])
plot(som_model2, type = "property", property = som_model2$codes[,20], palette.name=coolBlueHotRed,main=names(data_train)[20])
plot(som_model2, type = "property", property = som_model2$codes[,21], palette.name=coolBlueHotRed,main=names(data_train)[21])
plot(som_model2, type = "property", property = som_model2$codes[,22], palette.name=coolBlueHotRed,main=names(data_train)[22])
plot(som_model2, type = "property", property = som_model2$codes[,21], palette.name=coolBlueHotRed,main=names(data_train)[23])
plot(som_model2, type = "property", property = som_model2$codes[,22], palette.name=coolBlueHotRed,main=names(data_train)[24])
plot(som_model2, type = "property", property = som_model2$codes[,21], palette.name=coolBlueHotRed,main=names(data_train)[25])
plot(som_model2, type = "property", property = som_model2$codes[,22], palette.name=coolBlueHotRed,main=names(data_train)[26])
plot(som_model2, type = "property", property = som_model2$codes[,21], palette.name=coolBlueHotRed,main=names(data_train)[27])
plot(som_model2, type = "property", property = som_model2$codes[,22], palette.name=coolBlueHotRed,main=names(data_train)[28])
## Select observations that have been clicked
data_train_clicked <- subset(data_train, data_train$clicked == 1,select=-c(clicked,notclicked,winningsom,distance))
data_train_matrix <- as.matrix(data_train_clicked)
## See where observation maps onto the SOM
trainingmappingclicked <- map(som_model2, data_train_matrix)
## To be used later to plot clicked map
trainingsom2clicked <- trainingmappingclicked
## To be used later to plot non-clicked map
data_train_notclicked <- subset(data_train, data_train$notclicked == 1,select=-c(clicked,notclicked,winningsom,distance))
trainingsom2notclicked <- map(som_model2, as.matrix(data_train_notclicked))
## Collection of unique winning SOM nodes that we will use latter
## to select non-clicked observations.
winingsom2 <- unique(trainingmappingclicked$unit.classif)
## See where observation maps onto the SOM
data_train_matrix <- as.matrix(data_train)
trainingmapping <- map(som_model2,subset(data_train_matrix, select=-c(clicked,notclicked,winningsom,distance)))
## Subselect and merge with data_train
data_train$winningsom <- trainingmapping$unit.classif
data_train$distance <- trainingmapping$distances
## Code to Export nodes and edges second SOM to Gephi
## Rescale so that the SOM Winning Distance is between 1 and 100.
data_train$distance <- round( rescale(data_train$distance, to = c(1, 100),from=range(data_train$distance ,na.rm=TRUE)), digits = 2)
## Vertices
verticessom <- as.data.frame(cbind(seq(1,som2total),seq(1,som2total)))
vertices <- as.data.frame(cbind(1000+as.numeric(rownames(data_train)),rownames(data_train)))
vertices <- rbind(verticessom, vertices)
colnames(vertices) <- c('id','label')
## Edges
edges <- as.data.frame(cbind(1000+as.numeric(rownames(data_train)),data_train$winningsom))
colnames(edges) <- c('source','target')
## Edges Attributes
edgesattributes <- data.frame(weight = data_train$distance)
## Nodes Attributes
somnodesattributes <- data.frame(cbind(seq(1,som2total),
clicked = 0,
male = 0,
female = 0,
unknowngender=0,
age0to12=0,
age12to18=0,
age18to24=0,
age24to30=0,
age30to40=0,
age40plus=0) )
nodesattributes <- data.frame(clicked = data_train$clicked,
male = data_train$male,
female = data_train$female,
unknowngender=data_train$unknowngender,
age0to12=data_train$age0to12,
age12to18=data_train$age12to18,
age18to24=data_train$age18to24,
age24to30=data_train$age24to30,
age30to40=data_train$age30to40,
age40plus=data_train$age40plus)
nodesattributes <- rbind(subset(somnodesattributes,select=-c(V1)),nodesattributes)
#write.gexf(nodes=vertices,edges=edges,
# defaultedgetype = "undirected",
# meta = list(creator="InfoKnowledge Inc.",
# description="SOM Graph Representation",
# keywords="gexf graph, SOM, R, rgexf"),
# nodesAtt = nodesattributes,
# edgesAtt = edgesattributes,
# output = "SOM.gexf"
# )
## Validation
advertisingtable <- paste("select * from search_instance LIMIT 1500000 OFFSET 5000000")
rs <- dbSendQuery(con, advertisingtable)
## fetch all elements from the result set
result_set <- fetch(rs,n=-1)
data_validate <- result_set[,c("position","adid_low","adid_medium","adid_high","adid_age_low","adid_age_medium","adid_age_high","adid_gender_low","adid_gender_medium","adid_gender_high","advertiserid_low","advertiserid_medium","advertiserid_high","advertiserid_age_low","advertiserid_age_medium","advertiserid_age_high","advertiserid_gender_low","advertiserid_gender_medium","advertiserid_gender_high", "male","female","unknowngender","age0to12","age12to18","age18to24","age24to30","age30to40","age40plus","clicked", "notclicked")]
## Select observations that have been clicked
data_validate_clicked <- subset(data_validate, data_validate$clicked == 1,select=-c(clicked,notclicked))
data_validate_matrix <- as.matrix(data_validate_clicked)
## See where observation maps onto the SOM 1, 2
validatemappingclickedsom1 <- map(som_model, data_validate_matrix)
validatemappingclickedsom2 <- map(som_model2, data_validate_matrix)
## To be used later to plot non-clicked map SOM 1, 2
data_validate_notclicked <- subset(data_validate, data_validate$notclicked == 1,select=-c(clicked,notclicked))
validatemappingnotclickedsom1 <- map(som_model, as.matrix(data_validate_notclicked))
validatemappingnotclickedsom2 <- map(som_model2, as.matrix(data_validate_notclicked))
Show where clicked and not-clicked aggregate on the SOM for the second SOM
library(fields) #to use designer.colors
## Loading required package: spam
## Loading required package: grid
## Spam version 0.41-0 (2014-02-26) is loaded.
## Type 'help( Spam)' or 'demo( spam)' for a short introduction
## and overview of this package.
## Help for individual functions is also obtained by adding the
## suffix '.spam' to the function name, e.g. 'help( chol.spam)'.
##
## Attaching package: 'spam'
##
## The following objects are masked from 'package:base':
##
## backsolve, forwardsolve
##
## Loading required package: maps
##
## Attaching package: 'maps'
##
## The following object is masked from 'package:kohonen':
##
## map
##-----------------------------
somvisual <- runif(som2total, 0, 0)
#####
#trainingsom2clicked
#####
##Original testsomvisualdataframe <- as.data.frame(table(trainingmappingclicked$unit.classif),stringsAsFactors=FALSE)
somvisualdataframe <- as.data.frame(table(trainingsom2clicked$unit.classif))
for (i in 1:nrow(somvisualdataframe) ) {
somvisual[as.numeric(somvisualdataframe$Var1[i])] <- as.numeric(somvisualdataframe$Freq[i])
}
somvisualmatrix <- as.matrix(array(somvisual,dim=c(som2row,som2col)))
##Start with a matrix that would be the numerical representation of you heatmap
##Here [1,1] will become the lower left node (1st row, 1st column),
##[1,2] will become the node to the right
##[2,1] will be the first node to the left in the second row
##So visually you work your way from bottom left to top right
x <- as.vector(somvisualmatrix)
##Number of rows and columns of your SOM
SOM_Rows <- dim(somvisualmatrix)[1]
SOM_Columns <- dim(somvisualmatrix)[2]
##To make room for the legend
par(mar = c(0.4, 2, 2, 7))
##Initiate the plot window but do show any axes or points on the plot
plot(0, 0, type = "n", axes = FALSE, xlim=c(0, SOM_Columns),
ylim=c(0, SOM_Rows), xlab="", ylab= "", main="SOM 2 Clicked Observations",asp=1)
##Create the color palette
##designer.colors to interpolate 50 colors between
##the maxmimum number of allowed values in Brewer
ColRamp <- rev(designer.colors(n=50, col=brewer.pal(9, "Spectral")))
##Make a vector with length(ColRamp) number of bins between the minimum and
##maximum value of x.
##Next match each point from x with one of the colors in ColorRamp
ColorCode <- rep("#FFFFFF", length(x)) #default is all white
Bins <- seq(min(x, na.rm=T), max(x, na.rm=T), length=length(ColRamp))
for (i in 1:length(x))
if (!is.na(x[i])) ColorCode[i] <- ColRamp[which.min(abs(Bins-x[i]))]
##Actual plotting of hexagonal polygons on map
offset <- 0.5 #offset for the hexagons when moving up a row
for (row in 1:SOM_Rows) {
for (column in 0:(SOM_Columns - 1))
Hexagon(column + offset, row - 1, col = ColorCode[row + SOM_Rows * column])
offset <- ifelse(offset, 0, 0.5)
}
##Add legend to the right
image.plot(legend.only=TRUE, col=ColRamp, zlim=c(min(x, na.rm=T), max(x, na.rm=T)))
##-----------------------------
#####
#trainingsom2notclicked
#####
somvisual <- runif(som2total, 0, 0)
##Original testsomvisualdataframe <- as.data.frame(table(trainingmappingclicked$unit.classif),stringsAsFactors=FALSE)
somvisualdataframe <- as.data.frame(table(trainingsom2notclicked$unit.classif))
for (i in 1:nrow(somvisualdataframe) ) {
somvisual[as.numeric(somvisualdataframe$Var1[i])] <- as.numeric(somvisualdataframe$Freq[i])
}
somvisualmatrix <- as.matrix(array(somvisual,dim=c(som2row,som2col)))
##Start with a matrix that would be the numerical representation of you heatmap
##Here [1,1] will become the lower left node (1st row, 1st column),
##[1,2] will become the node to the right
##[2,1] will be the first node to the left in the second row
##So visually you work your way from bottom left to top right
x <- as.vector(somvisualmatrix)
##Number of rows and columns of your SOM
SOM_Rows <- dim(somvisualmatrix)[1]
SOM_Columns <- dim(somvisualmatrix)[2]
##To make room for the legend
par(mar = c(0.4, 2, 2, 7))
##Initiate the plot window but do show any axes or points on the plot
plot(0, 0, type = "n", axes = FALSE, xlim=c(0, SOM_Columns),
ylim=c(0, SOM_Rows), xlab="", ylab= "", main="SOM 2 Not-Clicked Observations",asp=1)
##Create the color palette
##designer.colors to interpolate 50 colors between
##the maxmimum number of allowed values in Brewer
ColRamp <- rev(designer.colors(n=50, col=brewer.pal(9, "Spectral")))
##Make a vector with length(ColRamp) number of bins between the minimum and
##maximum value of x.
##Next match each point from x with one of the colors in ColorRamp
ColorCode <- rep("#FFFFFF", length(x)) #default is all white
Bins <- seq(min(x, na.rm=T), max(x, na.rm=T), length=length(ColRamp))
for (i in 1:length(x))
if (!is.na(x[i])) ColorCode[i] <- ColRamp[which.min(abs(Bins-x[i]))]
##Actual plotting of hexagonal polygons on map
offset <- 0.5 #offset for the hexagons when moving up a row
for (row in 1:SOM_Rows) {
for (column in 0:(SOM_Columns - 1))
Hexagon(column + offset, row - 1, col = ColorCode[row + SOM_Rows * column])
offset <- ifelse(offset, 0, 0.5)
}
##Add legend to the right
image.plot(legend.only=TRUE, col=ColRamp, zlim=c(min(x, na.rm=T), max(x, na.rm=T)))
##-----------------------------
#####
#validatemappingclicked SOM1
#####
somvisual <- runif(som1total, 0, 0)
somvisualdataframe <- as.data.frame(table(validatemappingclickedsom1$unit.classif))
for (i in 1:nrow(somvisualdataframe) ) {
somvisual[as.numeric(somvisualdataframe$Var1[i])] <- as.numeric(somvisualdataframe$Freq[i])
}
somvisualmatrix <- as.matrix(array(somvisual,dim=c(som1row,som1col)))
##Start with a matrix that would be the numerical representation of you heatmap
##Here [1,1] will become the lower left node (1st row, 1st column),
##[1,2] will become the node to the right
##[2,1] will be the first node to the left in the second row
##So visually you work your way from bottom left to top right
x <- as.vector(somvisualmatrix)
##Number of rows and columns of your SOM
SOM_Rows <- dim(somvisualmatrix)[1]
SOM_Columns <- dim(somvisualmatrix)[2]
##To make room for the legend
par(mar = c(0.4, 2, 2, 7))
##Initiate the plot window but do show any axes or points on the plot
plot(0, 0, type = "n", axes = FALSE, xlim=c(0, SOM_Columns),
ylim=c(0, SOM_Rows), xlab="", ylab= "", main="SOM 1 Clicked Validate Observations",asp=1)
##Create the color palette
##designer.colors to interpolate 50 colors between
##the maxmimum number of allowed values in Brewer
ColRamp <- rev(designer.colors(n=50, col=brewer.pal(9, "Spectral")))
##Make a vector with length(ColRamp) number of bins between the minimum and
##maximum value of x.
##Next match each point from x with one of the colors in ColorRamp
ColorCode <- rep("#FFFFFF", length(x)) #default is all white
Bins <- seq(min(x, na.rm=T), max(x, na.rm=T), length=length(ColRamp))
for (i in 1:length(x))
if (!is.na(x[i])) ColorCode[i] <- ColRamp[which.min(abs(Bins-x[i]))]
##Actual plotting of hexagonal polygons on map
offset <- 0.5 #offset for the hexagons when moving up a row
for (row in 1:SOM_Rows) {
for (column in 0:(SOM_Columns - 1))
Hexagon(column + offset, row - 1, col = ColorCode[row + SOM_Rows * column])
offset <- ifelse(offset, 0, 0.5)
}
##Add legend to the right
image.plot(legend.only=TRUE, col=ColRamp, zlim=c(min(x, na.rm=T), max(x, na.rm=T)))
##-----------------------------
#####
#validatemappingnotclicked SOM 1
#####
somvisual <- runif(som1total, 0, 0)
somvisualdataframe <- as.data.frame(table(validatemappingnotclickedsom1$unit.classif))
for (i in 1:nrow(somvisualdataframe) ) {
somvisual[as.numeric(somvisualdataframe$Var1[i])] <- as.numeric(somvisualdataframe$Freq[i])
}
somvisualmatrix <- as.matrix(array(somvisual,dim=c(som1row,som1col)))
##Start with a matrix that would be the numerical representation of you heatmap
##Here [1,1] will become the lower left node (1st row, 1st column),
##[1,2] will become the node to the right
##[2,1] will be the first node to the left in the second row
##So visually you work your way from bottom left to top right
x <- as.vector(somvisualmatrix)
##Number of rows and columns of your SOM
SOM_Rows <- dim(somvisualmatrix)[1]
SOM_Columns <- dim(somvisualmatrix)[2]
##To make room for the legend
par(mar = c(0.4, 2, 2, 7))
##Initiate the plot window but do show any axes or points on the plot
plot(0, 0, type = "n", axes = FALSE, xlim=c(0, SOM_Columns),
ylim=c(0, SOM_Rows), xlab="", ylab= "", main="SOM 1 Non-Clicked Validate Observations",asp=1)
##Create the color palette
##designer.colors to interpolate 50 colors between
##the maxmimum number of allowed values in Brewer
ColRamp <- rev(designer.colors(n=50, col=brewer.pal(9, "Spectral")))
##Make a vector with length(ColRamp) number of bins between the minimum and
##maximum value of x.
##Next match each point from x with one of the colors in ColorRamp
ColorCode <- rep("#FFFFFF", length(x)) #default is all white
Bins <- seq(min(x, na.rm=T), max(x, na.rm=T), length=length(ColRamp))
for (i in 1:length(x))
if (!is.na(x[i])) ColorCode[i] <- ColRamp[which.min(abs(Bins-x[i]))]
##Actual plotting of hexagonal polygons on map
offset <- 0.5 #offset for the hexagons when moving up a row
for (row in 1:SOM_Rows) {
for (column in 0:(SOM_Columns - 1))
Hexagon(column + offset, row - 1, col = ColorCode[row + SOM_Rows * column])
offset <- ifelse(offset, 0, 0.5)
}
##Add legend to the right
image.plot(legend.only=TRUE, col=ColRamp, zlim=c(min(x, na.rm=T), max(x, na.rm=T)))
##-----------------------------
#####
#validatemappingclicked SOM 2
#####
somvisual <- runif(som2total, 0, 0)
somvisualdataframe <- as.data.frame(table(validatemappingclickedsom2$unit.classif))
for (i in 1:nrow(somvisualdataframe) ) {
somvisual[as.numeric(somvisualdataframe$Var1[i])] <- as.numeric(somvisualdataframe$Freq[i])
}
somvisualmatrix <- as.matrix(array(somvisual,dim=c(som2row,som2col)))
##Start with a matrix that would be the numerical representation of you heatmap
##Here [1,1] will become the lower left node (1st row, 1st column),
##[1,2] will become the node to the right
##[2,1] will be the first node to the left in the second row
##So visually you work your way from bottom left to top right
x <- as.vector(somvisualmatrix)
##Number of rows and columns of your SOM
SOM_Rows <- dim(somvisualmatrix)[1]
SOM_Columns <- dim(somvisualmatrix)[2]
##To make room for the legend
par(mar = c(0.4, 2, 2, 7))
##Initiate the plot window but do show any axes or points on the plot
plot(0, 0, type = "n", axes = FALSE, xlim=c(0, SOM_Columns),
ylim=c(0, SOM_Rows), xlab="", ylab= "", main="SOM 2 Clicked Validate Observations",asp=1)
##Create the color palette
##designer.colors to interpolate 50 colors between
##the maxmimum number of allowed values in Brewer
ColRamp <- rev(designer.colors(n=50, col=brewer.pal(9, "Spectral")))
##Make a vector with length(ColRamp) number of bins between the minimum and
##maximum value of x.
##Next match each point from x with one of the colors in ColorRamp
ColorCode <- rep("#FFFFFF", length(x)) #default is all white
Bins <- seq(min(x, na.rm=T), max(x, na.rm=T), length=length(ColRamp))
for (i in 1:length(x))
if (!is.na(x[i])) ColorCode[i] <- ColRamp[which.min(abs(Bins-x[i]))]
##Actual plotting of hexagonal polygons on map
offset <- 0.5 #offset for the hexagons when moving up a row
for (row in 1:SOM_Rows) {
for (column in 0:(SOM_Columns - 1))
Hexagon(column + offset, row - 1, col = ColorCode[row + SOM_Rows * column])
offset <- ifelse(offset, 0, 0.5)
}
##Add legend to the right
image.plot(legend.only=TRUE, col=ColRamp, zlim=c(min(x, na.rm=T), max(x, na.rm=T)))
##-----------------------------
#####
#validatemappingnotclicked SOM 2
#####
somvisual <- runif(som2total, 0, 0)
somvisualdataframe <- as.data.frame(table(validatemappingnotclickedsom2$unit.classif))
for (i in 1:nrow(somvisualdataframe) ) {
somvisual[as.numeric(somvisualdataframe$Var1[i])] <- as.numeric(somvisualdataframe$Freq[i])
}
somvisualmatrix <- as.matrix(array(somvisual,dim=c(som2row,som2col)))
##Start with a matrix that would be the numerical representation of you heatmap
##Here [1,1] will become the lower left node (1st row, 1st column),
##[1,2] will become the node to the right
##[2,1] will be the first node to the left in the second row
##So visually you work your way from bottom left to top right
x <- as.vector(somvisualmatrix)
##Number of rows and columns of your SOM
SOM_Rows <- dim(somvisualmatrix)[1]
SOM_Columns <- dim(somvisualmatrix)[2]
##To make room for the legend
par(mar = c(0.4, 2, 2, 7))
##Initiate the plot window but do show any axes or points on the plot
plot(0, 0, type = "n", axes = FALSE, xlim=c(0, SOM_Columns),
ylim=c(0, SOM_Rows), xlab="", ylab= "", main="SOM 2 Non-Clicked Validate Observations",asp=1)
##Create the color palette
##designer.colors to interpolate 50 colors between
##the maxmimum number of allowed values in Brewer
ColRamp <- rev(designer.colors(n=50, col=brewer.pal(9, "Spectral")))
##Make a vector with length(ColRamp) number of bins between the minimum and
##maximum value of x.
##Next match each point from x with one of the colors in ColorRamp
ColorCode <- rep("#FFFFFF", length(x)) #default is all white
Bins <- seq(min(x, na.rm=T), max(x, na.rm=T), length=length(ColRamp))
for (i in 1:length(x))
if (!is.na(x[i])) ColorCode[i] <- ColRamp[which.min(abs(Bins-x[i]))]
##Actual plotting of hexagonal polygons on map
offset <- 0.5 #offset for the hexagons when moving up a row
for (row in 1:SOM_Rows) {
for (column in 0:(SOM_Columns - 1))
Hexagon(column + offset, row - 1, col = ColorCode[row + SOM_Rows * column])
offset <- ifelse(offset, 0, 0.5)
}
##Add legend to the right
image.plot(legend.only=TRUE, col=ColRamp, zlim=c(min(x, na.rm=T), max(x, na.rm=T)))
##-----------------------------
detach(package:fields, unload=TRUE)
detach(package:kohonen, unload=TRUE)
library(kohonen)
##
## Attaching package: 'kohonen'
##
## The following object is masked from 'package:maps':
##
## map
## Closes the connection
dbDisconnect(con)
## Frees all the resources on the driver
dbUnloadDriver(drv)