TSB Optimise Proposal - an old method developed to understand the impact of design variables along the Pareto frontier captured by the GA's genetic optimisation engine. Rows are sorted/grouped by similar design variables, and rows are sorted/grouped by similar objective variables.
heatmap(as.matrix(d), Rowv = TRUE, Colv = NA)  # sort 4d des.parameter space
heatmap(as.matrix(d))
Clustering both the 4-dimensional feature space points and the 50-dimensional dual design space lets us examine the features and design parameters represented in the Pareto front. It shuffles the rows and the columns without scrambling the data: the designs in the feature space and the features in the dual design space stay the same, and the rows and columns are just reordered according to the clustering. This is more useful for gene expression datasets with high numbers of features.
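To illustrate the "reordering without scrambling" point, the sketch below clusters the rows and columns of a small random matrix and reorders it by the resulting leaf order. The matrix m and its row and column names are made up for illustration. Note that heatmap() by default also reorders its dendrograms by row and column means, so its ordering can differ from the plain hclust() order used here.

```r
# Hypothetical data: 6 designs (rows) described by 4 features (columns)
set.seed(42)
m <- matrix(rnorm(24), nrow = 6,
            dimnames = list(paste0("design", 1:6), paste0("feature", 1:4)))

# Cluster rows and columns separately, using the same defaults as heatmap()
row_order <- hclust(dist(m))$order
col_order <- hclust(dist(t(m)))$order

# Reorder only: every value keeps its (design, feature) label
m_sorted <- m[row_order, col_order]
print(round(m_sorted, 2))
```

Indexing m_sorted by the original row and column names gives back m exactly, which is what "without scrambling" means here.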
Basic Workflow
0. Here we prepare a trivial example
Let’s put the vertices of a simple 3-4-5 triangle into a data.table…
id x y
<char> <num> <num>
1: A -0.1083825 0.5756526
2: B 3.2772819 -0.4908377
3: C -0.0689406 2.8234177
4: D 0.5678667 -0.5379234
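The listing jumps from step 0 to step 2, and the code that builds d and the distance matrix dm is not shown. A minimal sketch of what it could look like, assuming the coordinates printed above and Euclidean distance on the x and y columns only. A plain data.frame stands in for the data.table so that the sketch needs no extra packages.

```r
# Coordinates copied from the printed table; the data.frame is an
# assumption made to keep the sketch dependency-free
d <- data.frame(
  id = c("A", "B", "C", "D"),
  x  = c(-0.1083825, 3.2772819, -0.0689406, 0.5678667),
  y  = c(0.5756526, -0.4908377, 2.8234177, -0.5379234)
)

# Pairwise Euclidean distances on the numeric columns only
xy <- as.matrix(d[, c("x", "y")])
rownames(xy) <- d$id
dm <- dist(xy, method = "euclidean")
print(round(dm, 3))
```

The result is a "dist" object holding the lower triangle of the 4 x 4 distance matrix, which is the input hclust() expects.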
2. Second, we perform hierarchical clustering to produce a dendrogram
dendgram <- hclust(dm, method = "complete")
dendgram
Call:
hclust(d = dm, method = "complete")
Cluster method : complete
Distance : euclidean
Number of objects: 4
3. Visualise and “slice the tree”
plot(dendgram, labels = d[, id], main = "Have a nice day!")
clusters <- cutree(dendgram, k = 2)  # Cut the dendrogram into 2 clusters
d$cluster <- clusters
print(d)
id x y cluster
<char> <num> <num> <int>
1: A -0.1083825 0.5756526 1
2: B 3.2772819 -0.4908377 2
3: C -0.0689406 2.8234177 1
4: D 0.5678667 -0.5379234 1
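cutree() can also slice the tree at a height h instead of asking for a fixed number of clusters k. The sketch below rebuilds the tree from the printed coordinates and compares the two. The plain matrix xy and the name tree are assumptions used for illustration.

```r
# Coordinates copied from the printed table above
xy <- matrix(c(-0.1083825,  0.5756526,
                3.2772819, -0.4908377,
               -0.0689406,  2.8234177,
                0.5678667, -0.5379234),
             ncol = 2, byrow = TRUE,
             dimnames = list(c("A", "B", "C", "D"), c("x", "y")))
tree <- hclust(dist(xy), method = "complete")

by_k <- cutree(tree, k = 2)  # ask for exactly 2 clusters
by_h <- cutree(tree, h = 2)  # keep only merges at or below height 2

print(round(tree$height, 3))  # merge heights, useful for choosing h
print(by_k)
print(by_h)
```

With these points, k = 2 reproduces the assignment printed above (B on its own), while h = 2 keeps only the A-D merge and leaves B and C as singletons.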
Distance Metrics
The method of calculating distance is customisable through the method argument of the dist(x, method = "euclidean", diag = FALSE, upper = FALSE, p = 2) command.
Method
Purpose
“euclidean”
Useful for high dimensional data
D.I.Y.
https://en.wikipedia.org/wiki/Chebyshev_distance
“maximum”
“manhattan”
https://en.wikipedia.org/wiki/Taxicab_geometry
“canberra”
https://en.wikipedia.org/wiki/Canberra_distance
“binary”
“minkowski”
https://en.wikipedia.org/wiki/Minkowski_distance
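As a quick comparison of the built-in metrics, the sketch below measures the distance between the two ends of the hypotenuse of a 3-4-5 triangle. The second part shows one possible reading of the "D.I.Y." entry: any symmetric dissimilarity matrix can be wrapped with as.dist(). Cosine distance is an illustrative choice here, not something dist() provides, and all object names are assumptions.

```r
# Two points separated by 3 units in x and 4 units in y
p <- rbind(origin = c(0, 0), corner = c(3, 4))

methods <- c("euclidean", "maximum", "manhattan", "canberra")
built_in <- sapply(methods, function(m) as.numeric(dist(p, method = m)))
built_in["minkowski (p = 3)"] <- as.numeric(dist(p, method = "minkowski", p = 3))
print(round(built_in, 3))

# D.I.Y.: build a dissimilarity matrix by hand and wrap it with as.dist()
v <- rbind(a = c(1, 0), b = c(1, 1), c = c(0, 2))
unit <- v / sqrt(rowSums(v^2))            # scale each row to unit length
cosine_dm <- as.dist(1 - unit %*% t(unit))  # 1 - cosine similarity
print(round(cosine_dm, 3))
```

For this pair of points the Euclidean distance is 5, the maximum (Chebyshev) distance 4, the Manhattan distance 7 and the Canberra distance 2. The cosine_dm object can be passed to hclust() like any other "dist" object.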
Agglomeration method
The agglomeration (linkage) method, i.e. how the distance between two clusters is calculated, is customisable through the method argument of the hclust(d, method = "complete", members = NULL) command.
Method
Purpose
“ward.D”
“ward.D2”
“single”
“complete”
complete linkage - distance between the two furthest points, one taken from each cluster
“average” (= UPGMA)
average linkage - mean of all pairwise distances between the points of the two clusters, \(d(A,B) = \frac{1}{|A||B|} \sum_{a \in A} \sum_{b \in B} d(a,b)\)
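The linkage rules can be checked by hand on the four example points. In the sketch below, A and D merge first under all three methods, so the height of the second merge is the linkage distance between the cluster {A, D} and the point C. The object names are assumptions used for illustration.

```r
# Coordinates copied from the worked example
xy <- matrix(c(-0.1083825,  0.5756526,
                3.2772819, -0.4908377,
               -0.0689406,  2.8234177,
                0.5678667, -0.5379234),
             ncol = 2, byrow = TRUE,
             dimnames = list(c("A", "B", "C", "D"), c("x", "y")))
dm <- dist(xy)
D  <- as.matrix(dm)

# Distances from each member of the {A, D} cluster to C
cross <- D[c("A", "D"), "C"]
by_hand <- c(single = min(cross), average = mean(cross), complete = max(cross))

# Height of the second merge as reported by hclust() for each method
by_hclust <- sapply(names(by_hand),
                    function(m) hclust(dm, method = m)$height[2])

print(round(rbind(by_hand, by_hclust), 3))
```

The two rows agree: single linkage uses the closest pair, complete linkage the furthest pair, and average linkage the mean over all pairs.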