According with the data analized in the study, we consider an reduced example of 10 municipalities and fictitious sub-domains. The numerical data is assigned as follows: if the municipality does not meet a specific item, it will be assigned a value of 0. If the municipality partially meets the item, it will be assigned a value of 0.5. And if the municipality fully meets the item, it will be assigned a value of 1.
# Crear el data.frame
data <- data.frame(
Nombre = c("Adsubia", "Agost", "Agres", "Aigues", "Albatera", "Alcalali", "Alcocer de Planes", "Alcoleja", "Alcoy/Alcoi", "Alfafara"),
A1 = c(1, 0.5, 0, 1, 0.5, 0.5, 0.5, 0.5, 1, 0),
A2 = c(1, 1, 1, 0, 0, 0, 0, 0, 0.5, 0),
A3 = c(1, 1, 1, 0, 0, 0, 0, 0, 0.5, 0),
A4 = c(1, 1, 1, 0, 1, 0.5, 1, 1, 1, 0),
A5 = c(1, 0, 0, 0, 0.5, 0, 0, 0, 1, 0),
A6 = c(1, 0, 0, 0.5, 0, 0.5, 1, 0, 0, 0),
A7 = c(1, 1, 1, 0.5, 1, 0, 0, 0, 1, 0),
A8 = c(1, 1, 1, 0, 1, 0, 0.5, 0, 1, 0)
)
print(data)
## Nombre A1 A2 A3 A4 A5 A6 A7 A8
## 1 Adsubia 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
## 2 Agost 0.5 1.0 1.0 1.0 0.0 0.0 1.0 1.0
## 3 Agres 0.0 1.0 1.0 1.0 0.0 0.0 1.0 1.0
## 4 Aigues 1.0 0.0 0.0 0.0 0.0 0.5 0.5 0.0
## 5 Albatera 0.5 0.0 0.0 1.0 0.5 0.0 1.0 1.0
## 6 Alcalali 0.5 0.0 0.0 0.5 0.0 0.5 0.0 0.0
## 7 Alcocer de Planes 0.5 0.0 0.0 1.0 0.0 1.0 0.0 0.5
## 8 Alcoleja 0.5 0.0 0.0 1.0 0.0 0.0 0.0 0.0
## 9 Alcoy/Alcoi 1.0 0.5 0.5 1.0 1.0 0.0 1.0 1.0
## 10 Alfafara 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Now we assume the next transformation of the data according to the ordered set of transparency of each category j, \(datos[i,j]\in\{1,2,3\}\), where the most transparent is \(datos[i,j]=1\) and the least transparent is \(datos[i,j]=3\).
datos <- data[,-1] # Delete the non numeric column
datos <- replace(datos, datos == 0, 3)
datos <- replace(datos, datos == 0.5, 2)
summary(datos)
## A1 A2 A3 A4 A5
## Min. :1.00 Min. :1.00 Min. :1.00 Min. :1.00 Min. :1.00
## 1st Qu.:1.25 1st Qu.:1.25 1st Qu.:1.25 1st Qu.:1.00 1st Qu.:2.25
## Median :2.00 Median :3.00 Median :3.00 Median :1.00 Median :3.00
## Mean :1.90 Mean :2.30 Mean :2.30 Mean :1.50 Mean :2.50
## 3rd Qu.:2.00 3rd Qu.:3.00 3rd Qu.:3.00 3rd Qu.:1.75 3rd Qu.:3.00
## Max. :3.00 Max. :3.00 Max. :3.00 Max. :3.00 Max. :3.00
## A6 A7 A8
## Min. :1.0 Min. :1.0 Min. :1.0
## 1st Qu.:2.0 1st Qu.:1.0 1st Qu.:1.0
## Median :3.0 Median :1.5 Median :1.5
## Mean :2.4 Mean :1.9 Mean :1.9
## 3rd Qu.:3.0 3rd Qu.:3.0 3rd Qu.:3.0
## Max. :3.0 Max. :3.0 Max. :3.0
Let F(x) be a distribution function. We define the ‘’membership’’ function as 1-F(x), which represents the proportion of municipalities with input (‘’1’‘,’‘2’’ or ‘’3’‘) greater than the input (’‘1’‘,’‘2’’ or ‘’3’’) with respect to the municipality i.
membership <- function(item,indiv){
n <- length(item)
memb <- which(item>indiv)
cdistri <- length(memb)/n
return(cdistri)
}
We define the function \(e_{hji}\) as the continuous domain score for the municipality i. Thus, \(e_{hji}\) represents the transparency score in the sub-domain f for the municipality i. Namely, \[e_{h,j,i}=\frac{1-F(x_{ij})}{1-F(1)}\]
ehji <- matrix(nrow = nrow(datos),ncol = ncol(datos))
for (j in 1:ncol(datos)){
for (i in 1:nrow(datos)){
ehji[i,j] <- membership(datos[,j],datos[i,j])/membership(datos[,j],min(datos[,j]))
}
}
return(ehji)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,] 1.0000000 1.0000000 1.0000000 1.0000000 1.000 1.00 1.0 1.0
## [2,] 0.2857143 1.0000000 1.0000000 1.0000000 0.000 0.00 1.0 1.0
## [3,] 0.0000000 1.0000000 1.0000000 1.0000000 0.000 0.00 1.0 1.0
## [4,] 1.0000000 0.0000000 0.0000000 0.0000000 0.000 0.75 0.8 0.0
## [5,] 0.2857143 0.0000000 0.0000000 1.0000000 0.875 0.00 1.0 1.0
## [6,] 0.2857143 0.0000000 0.0000000 0.6666667 0.000 0.75 0.0 0.0
## [7,] 0.2857143 0.0000000 0.0000000 1.0000000 0.000 1.00 0.0 0.8
## [8,] 0.2857143 0.0000000 0.0000000 1.0000000 0.000 0.00 0.0 0.0
## [9,] 1.0000000 0.8571429 0.8571429 1.0000000 1.000 0.00 1.0 1.0
## [10,] 0.0000000 0.0000000 0.0000000 0.0000000 0.000 0.00 0.0 0.0
In order to compute continuous domain score for each municipality, we define a weighting system that assigns a specific weight to each sub-domain j. The weight is composed by the product of to weights. The first one called weightA computes a numeric data between [0,1] as follows \[\omega_j^a=1-\sum_{i=1}^{10}\frac{e_{hji}}{10}\]
weightA <- numeric()
weight_A <- function(item){
wa <- 1- sum(item)/length(item)
return(wa)
}
weightA <- apply(X = ehji, MARGIN = 2, FUN = weight_A)
return(weightA)
## [1] 0.5571429 0.6142857 0.6142857 0.2333333 0.7125000 0.6500000 0.4200000
## [8] 0.4200000
The second one capture the proportion of the variance of sub-domain j that is not explained by the rest of the sub-domains \(\{1,\ldots, j − 1, j + 1,\ldots\}\) in a linear multiple regression model and whose objective is to eliminate the redundant information that may accumulate the continuous domain score for municipality i. The function ``p2distance’’ has been used to calculate these weightings (Pena Trapero, 1977). Please note that this package is not available in the CRAN repository and must be installed manually. The distance \(DP2\) can be defined as follows: \[ DP2_i=\sum_{j=1}^m\frac{|x_{ij}|}{\sigma_j}\omega_j^b \] where \(\sigma_j\) is the standard deviation of the j-sub-domain \(j\in\{1,...,m\}\), subject to the standard deviation \(\sigma_j\neq 0\) and \(\omega_j^b=1-R^2_{j,...,1}\). Likewise for each \(j\in\{1,...,m\}\), \(R^2_{j,...1}\) represents the coefficient of determination in the multiple linear regression of \(X_{\cdot j}\) over the preceding indicators \(X_{\cdot j-1},...,X_{\cdot 1}\) assuming \(R^2_1=0\).
library(p2distance)
names <- colnames(datos)
depscor <- as.data.frame(ehji)
colnames(depscor) <- names
p2dist <- p2distance(as.matrix(depscor),reference_vector = NULL)
## [1] "Iteration 1"
## Warning in summary.lm(lm(matriz[, i + 1] ~ matriz[, 1:i])): essentially perfect
## fit: summary may be unreliable
## [1] "Iteration 2"
## Warning in summary.lm(lm(matriz[, i + 1] ~ matriz[, 1:i])): essentially perfect
## fit: summary may be unreliable
## [1] "Iteration 3"
## Warning in summary.lm(lm(matriz[, i + 1] ~ matriz[, 1:i])): essentially perfect
## fit: summary may be unreliable
res <- as.data.frame(t(p2dist$correction_factors))
res.ord <- res[, names]
weightB <- as.numeric(res.ord[1,])
return(weightB)
## [1] 0.4610996 0.3792168 0.0000000 0.3684120 0.6155732 0.4497543 0.4833195
## [8] 1.0000000
Finally, we compute the weighting system as the product of the above weights.
weight <- weightA*weightB
return(weight)
## [1] 0.2568983 0.2329474 0.0000000 0.0859628 0.4385959 0.2923403 0.2029942
## [8] 0.4200000
After calculating the weights, we compute continuous domain score for
each municipality as
\[e_{hi}=\frac{\sum_{j=1}^8\omega_{hj}e_{hji}}{\sum_{j=1}^8\omega_{hj}}\]
ehi <- numeric()
for(i in 1:nrow(ehji)){
ehi[i] =sum(weight*ehji[i,])/sum(weight)
}
return(cbind(ehi))
## ehi
## [1,] 1.00000000
## [2,] 0.52613538
## [3,] 0.48809940
## [4,] 0.33089910
## [5,] 0.60429310
## [6,] 0.18135264
## [7,] 0.40819127
## [8,] 0.08258232
## [9,] 0.83126294
## [10,] 0.00000000
Now, we compute the degree a municipality is left behind. Defining how ‘behind’ a municipality is in each specific sub-domain involves assessing its relative position or situation in comparison with certain benchmarks or standards within that sub-domain. First of all, we define a new membershipL function which determines how much the sub-domain that exceed the corresponding municipality value contribute to the total sub-domain. This value can be interpreted as the proportion of the sub-domain contribution that exceeds the values in municipality relative to the total sub-domain.
membershipL <- function(item,indiv){
memb <- which(item>indiv)
Ldistri <- sum(item[memb])/sum(item)
return(Ldistri)
}
Then, we can compute the degree the municipality is ‘left behind’ (Lbi)
xhi <- 1- ehi
Fhi <- numeric()
LFhi <- numeric()
for (i in 1:length(xhi)){
Fhi[i] <- membership(xhi[],xhi[i])
LFhi[i] <- membershipL(xhi[],xhi[i])
}
Lbi = LFhi - xhi/mean(xhi)*Fhi
return(data.frame(cbind(Fhi),cbind(LFhi),cbind(Lbi)))
## Fhi LFhi Lbi
## 1 0.9 1.0000000 1.00000000
## 2 0.6 0.8128224 0.30027625
## 3 0.5 0.7205412 0.25913539
## 4 0.3 0.4932350 0.13137519
## 5 0.7 0.8982468 0.39890360
## 6 0.2 0.3456561 0.05049823
## 7 0.4 0.6138549 0.18710954
## 8 0.1 0.1802717 0.01488725
## 9 0.8 0.9695815 0.72623342
## 10 0.0 0.0000000 0.00000000
As can be seen, those municipalities with more sub-domains taking the value 1, i.e. more transparent, have an indicator close to or equal to 1. Those municipalities with the majority of values equal to three have indicators close to 0.
results <- data.frame(datos[,1] ,datos,cbind(ehi),cbind(Lbi))
return(results)
## datos...1. A1 A2 A3 A4 A5 A6 A7 A8 ehi Lbi
## 1 1 1 1 1 1 1 1 1 1 1.00000000 1.00000000
## 2 2 2 1 1 1 3 3 1 1 0.52613538 0.30027625
## 3 3 3 1 1 1 3 3 1 1 0.48809940 0.25913539
## 4 1 1 3 3 3 3 2 2 3 0.33089910 0.13137519
## 5 2 2 3 3 1 2 3 1 1 0.60429310 0.39890360
## 6 2 2 3 3 2 3 2 3 3 0.18135264 0.05049823
## 7 2 2 3 3 1 3 1 3 2 0.40819127 0.18710954
## 8 2 2 3 3 1 3 3 3 3 0.08258232 0.01488725
## 9 1 1 2 2 1 1 3 1 1 0.83126294 0.72623342
## 10 3 3 3 3 3 3 3 3 3 0.00000000 0.00000000
A simple example is used to show the fuzzy methodology for calculating composite indicators. We consider three hypothetical dimensions of ten observations.
Munici = c("Adsubia", "Agost", "Agres", "Aigues", "Albatera", "Alcalali", "Alcocer de Planes", "Alcoleja", "Alcoy/Alcoi", "Alfafara")
set.seed(123)
Dim1 <- runif(10,0,1)
Dim2 <- runif(10,0,1)
Dim3 <- runif(10,0,1)
Data_fuzzy <- data.frame(Munici,Dim1,Dim2,Dim3)
print(Data_fuzzy)
## Munici Dim1 Dim2 Dim3
## 1 Adsubia 0.2875775 0.95683335 0.8895393
## 2 Agost 0.7883051 0.45333416 0.6928034
## 3 Agres 0.4089769 0.67757064 0.6405068
## 4 Aigues 0.8830174 0.57263340 0.9942698
## 5 Albatera 0.9404673 0.10292468 0.6557058
## 6 Alcalali 0.0455565 0.89982497 0.7085305
## 7 Alcocer de Planes 0.5281055 0.24608773 0.5440660
## 8 Alcoleja 0.8924190 0.04205953 0.5941420
## 9 Alcoy/Alcoi 0.5514350 0.32792072 0.2891597
## 10 Alfafara 0.4566147 0.95450365 0.1471136
We want to compute the composite fuzzy indicator. The methodology can be found in Jiménez-Fernández et al. (2022). In this study, we assume a \(n\times m\)-dimension matrix where the columns represent the sub-domains or domains \(X_j\) \(j\in\{1,...,m\}\), and the rows of \(X\) are referred to the municipalities, in our example \(n=10\) and \(m=3\). Let \((X_j,M_j,\ast)\) be fuzzy metrics spaces \(j=\{1,...,m\}\) according to George and Veeramani (1994) over the same t-norm \(\ast\). We assume \(\ast\) as the usual product. We denote by \(x_{i}=(x_{i1},...,x_{im})\) the 3-dimension vector that groups all single indicators corresponding to the i-observation \(i\in\{1,...,10\}\). Supposing that \(x_{*}=(x_{*1},...,x_{*m})\) be a fictitious vector reference (target or baseline) that is composed by the results of a theoretical observation with the best-worst possible scenario for all the single indicators depending on the polarity. In this example, we suppose that the domains has positive polarity, namely, the higher the value in the domain, the higher the value in the transparency indicator. Therefore, the fictitious vector reference is \(x_{*}=(1,1,1)\).
The (TCFI) can be defined as \[\begin{equation} TCFI_i=TCFI(x_{*},x_{i})=\prod_{j=1}^m M_j(x_{*j},x_{ij}) \end{equation}\] where the fuzzy metric \(M_j\) is defined as follows \[\begin{equation} M_j(x_{*j},x_{ij})=\frac{k}{k+ d(x_{*j},x_{ij})} \end{equation}\] \(d\) is a euclidean distance and the parameter \(k\) a sensitivity constant.
We call the appropriate libraries to perform the algorithm to make composite fuzzy indicator.
library (earth)
## Loading required package: Formula
## Loading required package: plotmo
## Loading required package: plotrix
## Loading required package: TeachingDemos
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(rsample)
library(GGally)
## Loading required package: ggplot2
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
library(corrgram)
##
## Attaching package: 'corrgram'
## The following object is masked from 'package:GGally':
##
## baseball
Below, the show the algorithm to compute the fuzzy composite indicator
CompFuzzy <-function(x,polarity,err,iterations,degrees){
if (!is.data.frame(x)) {
warning("the argument 'x' would be a data.frame object")
matriz <- as.matrix(x)
}
################# Functions #########################################
# Normalization compatible with respect to the polarity of indicators
normalization <- function(x,polarity=NULL){
names_var <- colnames(x)
names_regions <- rownames(x)
m <- dim(x)[2]
columns <- c(1:m)
pospol <- polarity
if(!is.null(polarity)){
negpol <- columns[-pospol]
}else{0
negpol <- NULL
}
normdata <- matrix(0,ncol = ncol(x),nrow = nrow(x))
norm_minmax <- function(x){
(x- min(x)) /(max(x)-min(x))
}
norm_maxmin <- function(x){
(max(x)-x) /(max(x)-min(x))
}
for(j in pospol){
normdata[,j] <- norm_minmax(x[,j])
}
for(j in negpol){
normdata[,j] <- norm_maxmin(x[,j])
}
normdata <- as.data.frame(normdata)
colnames(normdata) <- names_var
rownames(normdata) <- names_regions
return(normdata)
}
################ Metric ############################################
# Initial Fuzzy metric computed when k_j=1
calcularDistancia = function(x){
X <- as.matrix(x)
f_y <- (1)/(1+X)
colnames(f_y)<-colnames(X)
mI <- as.matrix(f_y)
return(mI)
}
# Making the composite indicator
CompInd = function(x){
n=nrow(x)
m=ncol(x)
DFS <- matrix(1,nrow=n,ncol = 1)
for (j in 1:m){
DFS = DFS*x[,j]
}
colnames(DFS) <- "Frechet.Index"
return(DFS)
}
# Computes de fuzzy metric for given parameters "Factores"
FuzzyDistance = function(x,Factores){
n <- dim(x)[1]
m <- dim(x)[2]
# Make the similarity fuzzy metric for each variable
Sym_welfare <-matrix(0,nrow=n,ncol=m)
for (j in 1:m){
for (i in 1:n){
Sym_welfare[i,j]=(Factores[j])/((Factores[j])+x[i,j])
}
}
DFS <- CompInd(Sym_welfare)
colnames(DFS) <- paste("fuzzydistance", iteracion, sep = ".")
return(DFS)
}
## Compute weights throught MARS and variable importance
calculoFactoresPonderacion = function(x,Compind,degrees){
m <- ncol(x)
dat_os <- x
#Compind <- df_ite
#degrees=3
x <- x %>% mutate(Cind = as.numeric(Compind))
mymodel <- earth(Cind~ ., data = x, keepxy=TRUE, degree=degrees, nfold=5,trace=.5)
p2 <- evimp(mymodel,trim=FALSE)
Variable <- row.names(p2)
weigths <- as.numeric(p2[,"gcv"])
# Avoid that one weight can be 0
mini <- min(weigths[weigths > 0])
cero <- which(weigths==0)
Variable[cero] <- sub("-unused", "", Variable[cero] )
for(j in 1:m){
if(is.na(weigths[j])) { print('Missing values has been found')}
else
{if(weigths[j]==0){
weigths[j] <- mini/m
}
}
}
weigths <- weigths/100
res <- data.frame(t(weigths))
colnames(res) <- Variable
res.ord <- as.numeric(res[colnames(dat_os)])
important <- list(res.ord,Variable)
return(important)
}
## Algorithm ##############################################################
n <- dim(x)[2]
resultados <- list()
error.d <- numeric()
weights <- c(rep(1,ncol(x)))
Weights <- matrix(nrow = 0, ncol = ncol(x))
normdata <- normalization(x,polarity) # Normalize the data
corrgr <- corrgram(normdata, order=TRUE, lower.panel=panel.shade, upper.panel=panel.pie, text.panel=panel.txt, main="Correlation between each features present in the data")
mDif <- calcularDistancia(normdata) # We compute the distance matrix
dps <- as.numeric(CompInd(mDif))
iteracion <- 1
repeat{
print(paste("Iteration", iteracion))
mFac <- calculoFactoresPonderacion(normdata,dps,degrees)
Weights <- rbind(Weights, mFac[[1]])
df_ite <- as.numeric(FuzzyDistance(normdata, mFac[[1]]))
error <- sum((dps-df_ite)^2)
error.d[iteracion] <- error
if (( error < err) || (iteracion >= iterations)) {
break
}
iteracion <- iteracion + 1
dps <- df_ite
}
## Outputs Summary
resultados$Fuzzydistance <- df_ite
resultados$Fuzzydistance_1 <- dps
resultados$Correlation <- corrgr
resultados$iteration <- iteracion
resultados$Names.Single.Ind <- mFac[[2]]
resultados$Single.Ind <- mFac[[1]]
resultados$error <- error.d
fuzzy_index <<- resultados
}
The Data_fuzzy indicators are used to compute the previous function
Data_num <- Data_fuzzy[,2:4]
degrees = 3;err <- 0.01;iterations <- 20
polarity <- 1:3
CFuzzy <- CompFuzzy(Data_num, polarity,err,iterations,degrees)
## [1] "Iteration 1"
## Model with pmethod="backward": GRSq 0.291 RSq 0.630 nterms 2
## CV fold 1 CVRSq -0.586 n.oof 8 20% n.infold.nz 8 100% n.oof.nz 2 100%
## CV fold 2 CVRSq -2.260 n.oof 8 20% n.infold.nz 8 100% n.oof.nz 2 100%
## CV fold 3 CVRSq 0.747 n.oof 8 20% n.infold.nz 8 100% n.oof.nz 2 100%
## CV fold 4 CVRSq -30.528 n.oof 8 20% n.infold.nz 8 100% n.oof.nz 2 100%
## CV fold 5 CVRSq 0.513 n.oof 8 20% n.infold.nz 8 100% n.oof.nz 2 100%
## CV all CVRSq -6.423 n.infold.nz 10 100%
##
## [1] "Iteration 2"
## Model with pmethod="backward": GRSq 0.013 RSq 0.485 nterms 2
## CV fold 1 CVRSq -1.580 n.oof 8 20% n.infold.nz 8 100% n.oof.nz 2 100%
## CV fold 2 CVRSq -467.260 n.oof 8 20% n.infold.nz 8 100% n.oof.nz 2 100%
## CV fold 3 CVRSq -0.634 n.oof 8 20% n.infold.nz 8 100% n.oof.nz 2 100%
## CV fold 4 CVRSq -0.006 n.oof 8 20% n.infold.nz 8 100% n.oof.nz 2 100%
## CV fold 5 CVRSq -28.500 n.oof 8 20% n.infold.nz 8 100% n.oof.nz 2 100%
## CV all CVRSq -99.596 n.infold.nz 10 100%
Results <- data.frame(Munici,CFuzzy$Fuzzydistance)
print(Results)
## Munici CFuzzy.Fuzzydistance
## 1 Adsubia 0.07355705
## 2 Agost 0.07420018
## 3 Agres 0.09236887
## 4 Aigues 0.04792775
## 5 Albatera 0.13022229
## 6 Alcalali 0.15772939
## 7 Alcocer de Planes 0.15585100
## 8 Alcoleja 0.17051290
## 9 Alcoy/Alcoi 0.16396263
## 10 Alfafara 0.10533216
The results show that the municipality of Alcalali with a value of 0.53664279 is the closest to 1, which represents the notion of transparency presented in this study. On the other hand, Adsubia, with a value of 0.06392467, represents the least transparent municipality as it is the closest to zero.