POPTOT (total population), HApoly (municipality area),
Surfacesbois (wood area), Airbat (area with buildings)
library(SamplingStrata)
data("swissmunicipalities")
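# Keep only the columns used in this example
df <- swissmunicipalities[, c(1, 3, 6:8, 23)]
# df$DOM <- 1
# Bin the two continuous auxiliary variables into 15 classes each
df$HApoly.cat <- var.bin(df$HApoly, 15)
df$POPTOT.cat <- var.bin(df$POPTOT, 15)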
frame <- buildFrameDF(df = df,
                      id = "id",
                      X = c("HApoly.cat", "POPTOT.cat"),
                      Y = c("Surfacesbois", "Surfacescult"),
                      domainvalue = "REG")
strata <- buildStrataDF(frame,progress=FALSE, verbose=FALSE)
ndom <- length(unique(frame$domainvalue))
cv <- as.data.frame(list(DOM = rep("DOM1", ndom),
                         CV1 = rep(0.05, ndom),
                         CV2 = rep(0.05, ndom),
                         domainvalue = c(1:ndom)))
cv
## DOM CV1 CV2 domainvalue
## 1 DOM1 0.05 0.05 1
## 2 DOM1 0.05 0.05 2
## 3 DOM1 0.05 0.05 3
## 4 DOM1 0.05 0.05 4
## 5 DOM1 0.05 0.05 5
## 6 DOM1 0.05 0.05 6
## 7 DOM1 0.05 0.05 7
library(hEDA)
dom <- unique(strata$DOM1)
ndom <- length(unique(strata$DOM1))
Kmean <- SamplingStrata::KmeansSolution(strata,
                                        errors = cv,
                                        nstrata = NA,
                                        minnumstrat = 2,
                                        maxclusters = 20,
                                        showPlot = FALSE)
##
## -------------------
## Kmeans solution
## -------------------
## *** Domain: 1 ***
## Number of strata: 20
## Sample size : 78
## *** Domain: 2 ***
## Number of strata: 14
## Sample size : 58
## *** Domain: 3 ***
## Number of strata: 11
## Sample size : 35
## *** Domain: 4 ***
## Number of strata: 8
## Sample size : 26
## *** Domain: 5 ***
## Number of strata: 20
## Sample size : 66
## *** Domain: 6 ***
## Number of strata: 12
## Sample size : 35
## *** Domain: 7 ***
## Number of strata: 20
## Sample size : 91
## -------------------
## Total size: 389
## -------------------
# Number of strata suggested by K-means in each domain
nstrata <- NULL
for (i in 1:ndom) {
  nstrata[i] <- length(table(Kmean$suggestions[which(Kmean$domainvalue == dom[i])]))
}
nstrata
## [1] 20 14 11 8 20 12 20
The k-means algorithm:
- It separates observations into \(k\) clusters (where \(k\) is an input parameter) so that the within-cluster sum of squares is minimised.
- The KmeansSolution function generates clusterings for iteratively increasing values of \(k\).
- The goal is to select a good-quality solution; that solution will have \(k\) clusters.
- The algorithm starts by partitioning the atomic strata into \(k\) clusters (strata), where \(k\) starts at a low value, e.g. 2. Each atomic stratum is assigned to the cluster whose centroid is nearest, the centroid being the mean of the target-variable values in that cluster. This solution is then evaluated using the Bethel-Chromy algorithm.
- The algorithm then creates a solution with \(k + 1\) strata and evaluates it in the same way. This is repeated for each \(k\) up to a predefined maximum number of strata (20 in this example). The solution with the lowest cost is then chosen as the initial solution.
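The search over \(k\) described above can be sketched with the base R function kmeans() on toy data. This is only an illustration under stated assumptions, not the KmeansSolution implementation: the toy data and the cost_for_k() helper are hypothetical, and the within-cluster sum of squares plus a fixed penalty per cluster merely stands in for the cost that the Bethel-Chromy algorithm would return.

```r
set.seed(42)

# Hypothetical toy data: 60 "atomic strata" described by two target-variable
# means, scattered around three well-separated centres.
centres <- rbind(c(0, 0), c(10, 10), c(20, 0))
toy <- centres[rep(1:3, each = 20), ] + matrix(rnorm(120, sd = 0.5), ncol = 2)

# Stand-in cost (an assumption, NOT Bethel-Chromy): total within-cluster
# sum of squares plus a fixed penalty for every cluster.
cost_for_k <- function(x, k, penalty = 10) {
  fit <- kmeans(x, centers = k, nstart = 25)
  fit$tot.withinss + penalty * k
}

# Evaluate every k from 2 up to a predefined maximum and keep the cheapest.
k_values <- 2:8
costs <- vapply(k_values, function(k) cost_for_k(toy, k), numeric(1))
best_k <- k_values[which.min(costs)]
best_k
```

With three well-separated groups, the penalised cost is expected to be lowest at three clusters; in the real workflow the cost of each candidate is the result of the Bethel-Chromy evaluation rather than this stand-in.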
Source: https://link.springer.com/referenceworkentry/10.1007/978-0-387-30164-8_426
- The K-means clustering algorithm is sensitive to outliers, because a mean is easily influenced by extreme values.
- K-medoids clustering is a variant of K-means that is more robust to noise and outliers.
- Instead of using the mean point as the center of a cluster, K-medoids uses an actual point in the cluster to represent it.
- The medoid is the most centrally located object of the cluster, with the minimum sum of distances to the other points.
- As an alternative to pam (partitioning around medoids), the median could also be considered.
- The mean is strongly influenced by an outlier and may therefore fail to represent the cluster center, whereas the medoid is robust to the outlier and still represents the cluster center.
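The difference between the mean and the medoid can be illustrated on a small cluster with one extreme value. This is a minimal base R sketch on hypothetical data; it computes a single medoid by brute force and is not how kmedoidsSolution works internally.

```r
# Hypothetical 1-D cluster with one extreme value.
x <- c(1, 2, 3, 4, 100)

# Mean: pulled towards the outlier.
cluster_mean <- mean(x)

# Medoid: the observed point with the smallest total distance to all others.
d <- as.matrix(dist(x))
cluster_medoid <- x[which.min(rowSums(d))]

cluster_mean    # 22
cluster_medoid  # 3
```

The mean (22) lies far from the four points that make up the bulk of the cluster, while the medoid (3) is one of them. For more than one cluster, pam() from the cluster package selects the medoids.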
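library(hEDA)
library(cluster)
Kmedoids <- kmedoidsSolution(strata,
                             errors = cv,
                             nstrata = NA,
                             minnumstrat = 2,
                             maxclusters = 20,
                             showPlot = FALSE)
# Number of strata suggested by K-medoids in each domain.
# A separate vector is initialised so that nstrata (from K-means) is kept.
Kmedoidsnstrata <- NULL
for (i in 1:ndom) {
  Kmedoidsnstrata[i] <- length(table(Kmedoids$suggestions[which(Kmedoids$domainvalue == dom[i])]))
}
Kmedoidsnstrata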
## DOM CV1 CV2 domainvalue
## 1 DOM1 0.05 0.05 1
## 2 DOM1 0.05 0.05 2
## 3 DOM1 0.05 0.05 3
## 4 DOM1 0.05 0.05 4
## 5 DOM1 0.05 0.05 5
## 6 DOM1 0.05 0.05 6
## 7 DOM1 0.05 0.05 7
##
## -----------------
##
## Kmedoids solution
## -----------------
## *** Domain: 1 ***
## Number of strata: 20
## Sample size : 81
## *** Domain: 2 ***
## Number of strata: 16
## Sample size : 62
## *** Domain: 3 ***
## Number of strata: 12
## Sample size : 39
## *** Domain: 4 ***
## Number of strata: 14
## Sample size : 29
## *** Domain: 5 ***
## Number of strata: 20
## Sample size : 68
## *** Domain: 6 ***
## Number of strata: 15
## Sample size : 33
## *** Domain: 7 ***
## Number of strata: 20
## Sample size : 94
## Total Sample size : 406
## [1] 20 16 12 14 20 15 20
library(hEDA)
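fuzzy <- fuzzySolution(strata,
                       cv,
                       minClusters = 2,
                       maxclusters = 20)
# Total sample size of the fuzzy solution
sum(fuzzy[[2]])
# Adapt the fuzzy solution so that it can be used in hEDA:
# copy the K-means solution and overwrite its suggestions domain by domain
fuzzySol <- Kmean
for (i in 1:ndom) {
  fuzzySol$suggestions[which(fuzzySol$domainvalue == dom[i])] <- unlist(fuzzy[[1]][i])
}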
## [1] 346.9717
ptm <- proc.time()
outpar<-hEDA::parallelhEDA(strata, cv, fuzzySol,
Temp=0.0001,initialStrata=nstrata, decrement_constant=0.95, end_time =Inf,
jsize=5,length_of_markov_chain =50,
SAArun=TRUE,SAAiters=50,
popSize = 20, iters = 50, mutationChance = 0.01, elitism = 0.1,
addStrataFactor=0.000001, EDAfreq=1,
verbose = FALSE, dominio=dom,minnumstrat=2,kmax_percent=0.025,ProbNewStratum=0.0001,
strcens=FALSE,writeFiles=FALSE, showPlot=TRUE, minTemp = 0.000005, realAllocation=TRUE)
Time<-proc.time() - ptm
sum(unlist(outpar$SampleSize))
Time
ptm <- proc.time()
solutionGGA <-optimizeStrata(errors=cv, strata, cens = NULL, strcens = FALSE, alldomains = TRUE,
dom = NULL, initialStrata = nstrata, addStrataFactor = 0, minnumstr = 2,
iter = 400, pops = 20, mut_chance = NA, elitism_rate = 0.2,
highvalue = 1e+08, suggestions =fuzzySol,
realAllocation = TRUE,
writeFiles = TRUE, showPlot = TRUE, parallel = TRUE, cores = NA)
Time<-proc.time() - ptm
Time
Bezdek, J. C. (1981). Objective function clustering. In Pattern recognition with fuzzy objective function algorithms, pp. 43–93. Springer.
Klawonn, F. (2004). Fuzzy clustering: Insights and a new approach. Mathware & Soft Computing 11(3).
Morissette, L. and S. Chartier (2013). The k-means clustering technique: General considerations and implementation in Mathematica. Tutorials in Quantitative Methods for Psychology 9(1), 15–24.