Grouping of sub-domains

Data

According with the data analized in the study, we consider an reduced example of 10 municipalities and fictitious sub-domains. The numerical data is assigned as follows: if the municipality does not meet a specific item, it will be assigned a value of 0. If the municipality partially meets the item, it will be assigned a value of 0.5. And if the municipality fully meets the item, it will be assigned a value of 1.

# Crear el data.frame
data <- data.frame(
  Nombre = c("Adsubia", "Agost", "Agres", "Aigues", "Albatera", "Alcalali", "Alcocer de Planes", "Alcoleja", "Alcoy/Alcoi", "Alfafara"),
  A1 = c(1, 0.5, 0, 1, 0.5, 0.5, 0.5, 0.5, 1, 0),
  A2 = c(1, 1, 1, 0, 0, 0, 0, 0, 0.5, 0),
  A3 = c(1, 1, 1, 0, 0, 0, 0, 0, 0.5, 0),
  A4 = c(1, 1, 1, 0, 1, 0.5, 1, 1, 1, 0),
  A5 = c(1, 0, 0, 0, 0.5, 0, 0, 0, 1, 0),
  A6 = c(1, 0, 0, 0.5, 0, 0.5, 1, 0, 0, 0),
  A7 = c(1, 1, 1, 0.5, 1, 0, 0, 0, 1, 0),
  A8 = c(1, 1, 1, 0, 1, 0, 0.5, 0, 1, 0)
)
print(data)
##               Nombre  A1  A2  A3  A4  A5  A6  A7  A8
## 1            Adsubia 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
## 2              Agost 0.5 1.0 1.0 1.0 0.0 0.0 1.0 1.0
## 3              Agres 0.0 1.0 1.0 1.0 0.0 0.0 1.0 1.0
## 4             Aigues 1.0 0.0 0.0 0.0 0.0 0.5 0.5 0.0
## 5           Albatera 0.5 0.0 0.0 1.0 0.5 0.0 1.0 1.0
## 6           Alcalali 0.5 0.0 0.0 0.5 0.0 0.5 0.0 0.0
## 7  Alcocer de Planes 0.5 0.0 0.0 1.0 0.0 1.0 0.0 0.5
## 8           Alcoleja 0.5 0.0 0.0 1.0 0.0 0.0 0.0 0.0
## 9        Alcoy/Alcoi 1.0 0.5 0.5 1.0 1.0 0.0 1.0 1.0
## 10          Alfafara 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

Now we assume the next transformation of the data according to the ordered set of transparency of each category j, \(datos[i,j]\in\{1,2,3\}\), where the most transparent is \(datos[i,j]=1\) and the least transparent is \(datos[i,j]=3\).

datos <- data[,-1] # Delete the non numeric column
datos <- replace(datos, datos == 0, 3)
datos <- replace(datos, datos == 0.5, 2)
summary(datos)
##        A1             A2             A3             A4             A5      
##  Min.   :1.00   Min.   :1.00   Min.   :1.00   Min.   :1.00   Min.   :1.00  
##  1st Qu.:1.25   1st Qu.:1.25   1st Qu.:1.25   1st Qu.:1.00   1st Qu.:2.25  
##  Median :2.00   Median :3.00   Median :3.00   Median :1.00   Median :3.00  
##  Mean   :1.90   Mean   :2.30   Mean   :2.30   Mean   :1.50   Mean   :2.50  
##  3rd Qu.:2.00   3rd Qu.:3.00   3rd Qu.:3.00   3rd Qu.:1.75   3rd Qu.:3.00  
##  Max.   :3.00   Max.   :3.00   Max.   :3.00   Max.   :3.00   Max.   :3.00  
##        A6            A7            A8     
##  Min.   :1.0   Min.   :1.0   Min.   :1.0  
##  1st Qu.:2.0   1st Qu.:1.0   1st Qu.:1.0  
##  Median :3.0   Median :1.5   Median :1.5  
##  Mean   :2.4   Mean   :1.9   Mean   :1.9  
##  3rd Qu.:3.0   3rd Qu.:3.0   3rd Qu.:3.0  
##  Max.   :3.0   Max.   :3.0   Max.   :3.0

Continuous domain score for the subdomain 1.1 at the municipality ‘’i’’

Let F(x) be a distribution function. We define the ‘’membership’’ function as 1-F(x), which represents the proportion of municipalities with input (‘’1’‘,’‘2’’ or ‘’3’‘) greater than the input (’‘1’‘,’‘2’’ or ‘’3’’) with respect to the municipality i.

  membership <- function(item,indiv){
    n <- length(item)
    memb <- which(item>indiv)
    cdistri <- length(memb)/n
    return(cdistri)
  }

We define the function \(e_{hji}\) as the continuous domain score for the municipality i. Thus, \(e_{hji}\) represents the transparency score in the sub-domain f for the municipality i. Namely, \[e_{h,j,i}=\frac{1-F(x_{ij})}{1-F(1)}\]

ehji <- matrix(nrow = nrow(datos),ncol = ncol(datos))
  for (j in 1:ncol(datos)){
    for (i in 1:nrow(datos)){
      ehji[i,j] <- membership(datos[,j],datos[i,j])/membership(datos[,j],min(datos[,j]))
    }
  }
return(ehji)
##            [,1]      [,2]      [,3]      [,4]  [,5] [,6] [,7] [,8]
##  [1,] 1.0000000 1.0000000 1.0000000 1.0000000 1.000 1.00  1.0  1.0
##  [2,] 0.2857143 1.0000000 1.0000000 1.0000000 0.000 0.00  1.0  1.0
##  [3,] 0.0000000 1.0000000 1.0000000 1.0000000 0.000 0.00  1.0  1.0
##  [4,] 1.0000000 0.0000000 0.0000000 0.0000000 0.000 0.75  0.8  0.0
##  [5,] 0.2857143 0.0000000 0.0000000 1.0000000 0.875 0.00  1.0  1.0
##  [6,] 0.2857143 0.0000000 0.0000000 0.6666667 0.000 0.75  0.0  0.0
##  [7,] 0.2857143 0.0000000 0.0000000 1.0000000 0.000 1.00  0.0  0.8
##  [8,] 0.2857143 0.0000000 0.0000000 1.0000000 0.000 0.00  0.0  0.0
##  [9,] 1.0000000 0.8571429 0.8571429 1.0000000 1.000 0.00  1.0  1.0
## [10,] 0.0000000 0.0000000 0.0000000 0.0000000 0.000 0.00  0.0  0.0

In order to compute continuous domain score for each municipality, we define a weighting system that assigns a specific weight to each sub-domain j. The weight is composed by the product of to weights. The first one called weightA computes a numeric data between [0,1] as follows \[\omega_j^a=1-\sum_{i=1}^{10}\frac{e_{hji}}{10}\]

  weightA <- numeric()
  weight_A <- function(item){
    wa <- 1- sum(item)/length(item)
    return(wa)
  }
  weightA <- apply(X = ehji, MARGIN = 2, FUN = weight_A)
  return(weightA)
## [1] 0.5571429 0.6142857 0.6142857 0.2333333 0.7125000 0.6500000 0.4200000
## [8] 0.4200000

The second one capture the proportion of the variance of sub-domain j that is not explained by the rest of the sub-domains \(\{1,\ldots, j − 1, j + 1,\ldots\}\) in a linear multiple regression model and whose objective is to eliminate the redundant information that may accumulate the continuous domain score for municipality i. The function ``p2distance’’ has been used to calculate these weightings (Pena Trapero, 1977). Please note that this package is not available in the CRAN repository and must be installed manually. The distance \(DP2\) can be defined as follows: \[ DP2_i=\sum_{j=1}^m\frac{|x_{ij}|}{\sigma_j}\omega_j^b \] where \(\sigma_j\) is the standard deviation of the j-sub-domain \(j\in\{1,...,m\}\), subject to the standard deviation \(\sigma_j\neq 0\) and \(\omega_j^b=1-R^2_{j,...,1}\). Likewise for each \(j\in\{1,...,m\}\), \(R^2_{j,...1}\) represents the coefficient of determination in the multiple linear regression of \(X_{\cdot j}\) over the preceding indicators \(X_{\cdot j-1},...,X_{\cdot 1}\) assuming \(R^2_1=0\).

library(p2distance)
  names <- colnames(datos)
  depscor <- as.data.frame(ehji)
  colnames(depscor) <- names
  p2dist <- p2distance(as.matrix(depscor),reference_vector = NULL)
## [1] "Iteration 1"
## Warning in summary.lm(lm(matriz[, i + 1] ~ matriz[, 1:i])): essentially perfect
## fit: summary may be unreliable
## [1] "Iteration 2"
## Warning in summary.lm(lm(matriz[, i + 1] ~ matriz[, 1:i])): essentially perfect
## fit: summary may be unreliable
## [1] "Iteration 3"
## Warning in summary.lm(lm(matriz[, i + 1] ~ matriz[, 1:i])): essentially perfect
## fit: summary may be unreliable
  res <- as.data.frame(t(p2dist$correction_factors))
  res.ord <- res[, names]
  weightB <- as.numeric(res.ord[1,])
  return(weightB)
## [1] 0.4610996 0.3792168 0.0000000 0.3684120 0.6155732 0.4497543 0.4833195
## [8] 1.0000000

Finally, we compute the weighting system as the product of the above weights.

weight <- weightA*weightB
return(weight)
## [1] 0.2568983 0.2329474 0.0000000 0.0859628 0.4385959 0.2923403 0.2029942
## [8] 0.4200000

After calculating the weights, we compute continuous domain score for each municipality as
\[e_{hi}=\frac{\sum_{j=1}^8\omega_{hj}e_{hji}}{\sum_{j=1}^8\omega_{hj}}\]

ehi <- numeric()
  for(i in 1:nrow(ehji)){
    ehi[i] =sum(weight*ehji[i,])/sum(weight)
  }
return(cbind(ehi))
##              ehi
##  [1,] 1.00000000
##  [2,] 0.52613538
##  [3,] 0.48809940
##  [4,] 0.33089910
##  [5,] 0.60429310
##  [6,] 0.18135264
##  [7,] 0.40819127
##  [8,] 0.08258232
##  [9,] 0.83126294
## [10,] 0.00000000

Degree a municipality ‘’i’’ is left behind

Now, we compute the degree a municipality is left behind. Defining how ‘behind’ a municipality is in each specific sub-domain involves assessing its relative position or situation in comparison with certain benchmarks or standards within that sub-domain. First of all, we define a new membershipL function which determines how much the sub-domain that exceed the corresponding municipality value contribute to the total sub-domain. This value can be interpreted as the proportion of the sub-domain contribution that exceeds the values in municipality relative to the total sub-domain.

  membershipL <- function(item,indiv){
    memb <- which(item>indiv)
    Ldistri <- sum(item[memb])/sum(item)
    return(Ldistri)
  }

Then, we can compute the degree the municipality is ‘left behind’ (Lbi)

xhi <- 1- ehi
Fhi <- numeric()
LFhi <- numeric()
  for (i in 1:length(xhi)){
    Fhi[i] <- membership(xhi[],xhi[i])
    LFhi[i] <- membershipL(xhi[],xhi[i])
  }
Lbi = LFhi - xhi/mean(xhi)*Fhi
return(data.frame(cbind(Fhi),cbind(LFhi),cbind(Lbi)))
##    Fhi      LFhi        Lbi
## 1  0.9 1.0000000 1.00000000
## 2  0.6 0.8128224 0.30027625
## 3  0.5 0.7205412 0.25913539
## 4  0.3 0.4932350 0.13137519
## 5  0.7 0.8982468 0.39890360
## 6  0.2 0.3456561 0.05049823
## 7  0.4 0.6138549 0.18710954
## 8  0.1 0.1802717 0.01488725
## 9  0.8 0.9695815 0.72623342
## 10 0.0 0.0000000 0.00000000

Results

As can be seen, those municipalities with more sub-domains taking the value 1, i.e. more transparent, have an indicator close to or equal to 1. Those municipalities with the majority of values equal to three have indicators close to 0.

results <- data.frame(datos[,1] ,datos,cbind(ehi),cbind(Lbi))
return(results)
##    datos...1. A1 A2 A3 A4 A5 A6 A7 A8        ehi        Lbi
## 1           1  1  1  1  1  1  1  1  1 1.00000000 1.00000000
## 2           2  2  1  1  1  3  3  1  1 0.52613538 0.30027625
## 3           3  3  1  1  1  3  3  1  1 0.48809940 0.25913539
## 4           1  1  3  3  3  3  2  2  3 0.33089910 0.13137519
## 5           2  2  3  3  1  2  3  1  1 0.60429310 0.39890360
## 6           2  2  3  3  2  3  2  3  3 0.18135264 0.05049823
## 7           2  2  3  3  1  3  1  3  2 0.40819127 0.18710954
## 8           2  2  3  3  1  3  3  3  3 0.08258232 0.01488725
## 9           1  1  2  2  1  1  3  1  1 0.83126294 0.72623342
## 10          3  3  3  3  3  3  3  3  3 0.00000000 0.00000000

Computing fuzzy indicators

Data

A simple example is used to show the fuzzy methodology for calculating composite indicators. We consider three hypothetical dimensions of ten observations.

Munici = c("Adsubia", "Agost", "Agres", "Aigues", "Albatera", "Alcalali", "Alcocer de Planes", "Alcoleja", "Alcoy/Alcoi", "Alfafara")
set.seed(123)
Dim1 <- runif(10,0,1)
Dim2 <- runif(10,0,1)
Dim3 <- runif(10,0,1)
Data_fuzzy <- data.frame(Munici,Dim1,Dim2,Dim3)
print(Data_fuzzy)
##               Munici      Dim1       Dim2      Dim3
## 1            Adsubia 0.2875775 0.95683335 0.8895393
## 2              Agost 0.7883051 0.45333416 0.6928034
## 3              Agres 0.4089769 0.67757064 0.6405068
## 4             Aigues 0.8830174 0.57263340 0.9942698
## 5           Albatera 0.9404673 0.10292468 0.6557058
## 6           Alcalali 0.0455565 0.89982497 0.7085305
## 7  Alcocer de Planes 0.5281055 0.24608773 0.5440660
## 8           Alcoleja 0.8924190 0.04205953 0.5941420
## 9        Alcoy/Alcoi 0.5514350 0.32792072 0.2891597
## 10          Alfafara 0.4566147 0.95450365 0.1471136

We want to compute the composite fuzzy indicator. The methodology can be found in Jiménez-Fernández et al. (2022). In this study, we assume a \(n\times m\)-dimension matrix where the columns represent the sub-domains or domains \(X_j\) \(j\in\{1,...,m\}\), and the rows of \(X\) are referred to the municipalities, in our example \(n=10\) and \(m=3\). Let \((X_j,M_j,\ast)\) be fuzzy metrics spaces \(j=\{1,...,m\}\) according to George and Veeramani (1994) over the same t-norm \(\ast\). We assume \(\ast\) as the usual product. We denote by \(x_{i}=(x_{i1},...,x_{im})\) the 3-dimension vector that groups all single indicators corresponding to the i-observation \(i\in\{1,...,10\}\). Supposing that \(x_{*}=(x_{*1},...,x_{*m})\) be a fictitious vector reference (target or baseline) that is composed by the results of a theoretical observation with the best-worst possible scenario for all the single indicators depending on the polarity. In this example, we suppose that the domains has positive polarity, namely, the higher the value in the domain, the higher the value in the transparency indicator. Therefore, the fictitious vector reference is \(x_{*}=(1,1,1)\).

The (TCFI) can be defined as \[\begin{equation} TCFI_i=TCFI(x_{*},x_{i})=\prod_{j=1}^m M_j(x_{*j},x_{ij}) \end{equation}\] where the fuzzy metric \(M_j\) is defined as follows \[\begin{equation} M_j(x_{*j},x_{ij})=\frac{k}{k+ d(x_{*j},x_{ij})} \end{equation}\] \(d\) is a euclidean distance and the parameter \(k\) a sensitivity constant.

We call the appropriate libraries to perform the algorithm to make composite fuzzy indicator.

library (earth)
## Loading required package: Formula
## Loading required package: plotmo
## Loading required package: plotrix
## Loading required package: TeachingDemos
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(rsample)
library(GGally)
## Loading required package: ggplot2
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
library(corrgram)
## 
## Attaching package: 'corrgram'
## The following object is masked from 'package:GGally':
## 
##     baseball

Below, the show the algorithm to compute the fuzzy composite indicator

CompFuzzy <-function(x,polarity,err,iterations,degrees){
  if (!is.data.frame(x)) {
    warning("the argument 'x' would be a data.frame object")
    matriz <- as.matrix(x)
  }
  ################# Functions #########################################
  # Normalization compatible with respect to the polarity of indicators
  normalization <- function(x,polarity=NULL){
    names_var <- colnames(x)
    names_regions <- rownames(x)
    m <- dim(x)[2]
    columns <- c(1:m)
    pospol <- polarity
    if(!is.null(polarity)){
      negpol <- columns[-pospol]
    }else{0
      negpol <- NULL
    }
    
    
    normdata <- matrix(0,ncol = ncol(x),nrow = nrow(x))
    
    norm_minmax <- function(x){
      (x- min(x)) /(max(x)-min(x))
    }
    
    norm_maxmin <- function(x){
      (max(x)-x) /(max(x)-min(x))
    }
    
    for(j in pospol){
      normdata[,j] <- norm_minmax(x[,j])
    }
    
    for(j in negpol){
      normdata[,j] <- norm_maxmin(x[,j])
    }
    normdata <- as.data.frame(normdata)
    colnames(normdata) <-  names_var
    rownames(normdata) <- names_regions
    return(normdata)
  }
  ################ Metric ############################################
  
  # Initial Fuzzy metric computed when k_j=1
  calcularDistancia = function(x){
    X <- as.matrix(x)
    f_y <- (1)/(1+X)
    colnames(f_y)<-colnames(X)
    mI <- as.matrix(f_y)
    return(mI)
  }
  # Making the composite indicator 
  CompInd = function(x){
    n=nrow(x)
    m=ncol(x)
    DFS <- matrix(1,nrow=n,ncol = 1)
    for (j in 1:m){
      DFS = DFS*x[,j] 
    }
    colnames(DFS) <- "Frechet.Index"
    return(DFS)
  }
  # Computes de fuzzy metric for given parameters "Factores"
  
  FuzzyDistance = function(x,Factores){
    n <- dim(x)[1]
    m <- dim(x)[2]
    # Make the similarity fuzzy metric for each variable
    Sym_welfare <-matrix(0,nrow=n,ncol=m)
    for (j in 1:m){
      for (i in 1:n){ 
        Sym_welfare[i,j]=(Factores[j])/((Factores[j])+x[i,j])
      }
    }
    DFS <- CompInd(Sym_welfare)
    colnames(DFS) <- paste("fuzzydistance", iteracion, sep = ".")
    return(DFS)
  }
  ## Compute weights throught MARS and variable importance
  calculoFactoresPonderacion = function(x,Compind,degrees){
    m <- ncol(x)
    dat_os <- x
    #Compind <- df_ite
    #degrees=3
    x <- x %>% mutate(Cind = as.numeric(Compind))
    mymodel <- earth(Cind~ ., data = x, keepxy=TRUE, degree=degrees, nfold=5,trace=.5)
    p2 <- evimp(mymodel,trim=FALSE)
    Variable <- row.names(p2)
    weigths <- as.numeric(p2[,"gcv"])
    # Avoid that one weight can be 0
    mini <- min(weigths[weigths > 0])
    cero <- which(weigths==0)
    Variable[cero] <- sub("-unused", "", Variable[cero] )
    for(j in 1:m){
      if(is.na(weigths[j])) { print('Missing values has been found')}
      else
      {if(weigths[j]==0){
        weigths[j] <- mini/m
      }
      }
    }
    
    weigths <-  weigths/100
    res <- data.frame(t(weigths))
    colnames(res) <- Variable
    res.ord <- as.numeric(res[colnames(dat_os)])
    important <- list(res.ord,Variable)
    return(important)
  }
  
  ## Algorithm ##############################################################
  n <- dim(x)[2]
  resultados <- list()
  error.d <- numeric()  
  weights <- c(rep(1,ncol(x)))
  Weights <- matrix(nrow = 0, ncol = ncol(x))
  normdata <- normalization(x,polarity) # Normalize the data
  corrgr <- corrgram(normdata, order=TRUE, lower.panel=panel.shade, upper.panel=panel.pie, text.panel=panel.txt, main="Correlation between each features present in the data")
  mDif <- calcularDistancia(normdata)  # We compute the distance matrix
  dps <- as.numeric(CompInd(mDif))
  iteracion <- 1
  
  
  repeat{
    print(paste("Iteration", iteracion))
    mFac <- calculoFactoresPonderacion(normdata,dps,degrees)
    Weights <- rbind(Weights, mFac[[1]])
    df_ite <- as.numeric(FuzzyDistance(normdata, mFac[[1]]))
    error <- sum((dps-df_ite)^2)
    error.d[iteracion] <- error
    if (( error < err) || (iteracion >= iterations)) {
      break
    }
    iteracion <- iteracion + 1
    dps <- df_ite
  }
  ## Outputs Summary
  resultados$Fuzzydistance <- df_ite
  resultados$Fuzzydistance_1 <- dps
  resultados$Correlation <- corrgr
  resultados$iteration <- iteracion
  resultados$Names.Single.Ind <- mFac[[2]]
  resultados$Single.Ind <- mFac[[1]]
  resultados$error <- error.d
  fuzzy_index <<- resultados
}

The Data_fuzzy indicators are used to compute the previous function

Data_num <- Data_fuzzy[,2:4]
degrees = 3;err <- 0.01;iterations <- 20
polarity <- 1:3
CFuzzy <- CompFuzzy(Data_num, polarity,err,iterations,degrees)

## [1] "Iteration 1"
## Model with pmethod="backward": GRSq 0.291 RSq 0.630 nterms 2
## CV fold 1  CVRSq -0.586   n.oof 8 20%   n.infold.nz 8 100%   n.oof.nz 2 100%
## CV fold 2  CVRSq -2.260   n.oof 8 20%   n.infold.nz 8 100%   n.oof.nz 2 100%
## CV fold 3  CVRSq  0.747   n.oof 8 20%   n.infold.nz 8 100%   n.oof.nz 2 100%
## CV fold 4  CVRSq -30.528   n.oof 8 20%   n.infold.nz 8 100%   n.oof.nz 2 100%
## CV fold 5  CVRSq  0.513   n.oof 8 20%   n.infold.nz 8 100%   n.oof.nz 2 100%
## CV all     CVRSq -6.423                 n.infold.nz 10 100%
## 
## [1] "Iteration 2"
## Model with pmethod="backward": GRSq 0.013 RSq 0.485 nterms 2
## CV fold 1  CVRSq -1.580   n.oof 8 20%   n.infold.nz 8 100%   n.oof.nz 2 100%
## CV fold 2  CVRSq -467.260   n.oof 8 20%   n.infold.nz 8 100%   n.oof.nz 2 100%
## CV fold 3  CVRSq -0.634   n.oof 8 20%   n.infold.nz 8 100%   n.oof.nz 2 100%
## CV fold 4  CVRSq -0.006   n.oof 8 20%   n.infold.nz 8 100%   n.oof.nz 2 100%
## CV fold 5  CVRSq -28.500   n.oof 8 20%   n.infold.nz 8 100%   n.oof.nz 2 100%
## CV all     CVRSq -99.596                 n.infold.nz 10 100%
Results <- data.frame(Munici,CFuzzy$Fuzzydistance)
print(Results)
##               Munici CFuzzy.Fuzzydistance
## 1            Adsubia           0.07355705
## 2              Agost           0.07420018
## 3              Agres           0.09236887
## 4             Aigues           0.04792775
## 5           Albatera           0.13022229
## 6           Alcalali           0.15772939
## 7  Alcocer de Planes           0.15585100
## 8           Alcoleja           0.17051290
## 9        Alcoy/Alcoi           0.16396263
## 10          Alfafara           0.10533216

The results show that the municipality of Alcalali with a value of 0.53664279 is the closest to 1, which represents the notion of transparency presented in this study. On the other hand, Adsubia, with a value of 0.06392467, represents the least transparent municipality as it is the closest to zero.