Authors

Hasbi Yasin (hasbiyasin@live.undip.ac.id)
Purhadi (purhadi@its.ac.id)
Achmad Choiruddin (choiruddin@its.ac.id)

Resource

The data used in this research are secondary data derived from the official website of BPS of Central Java Provinces Indonesia (https://jateng.bps.go.id/)

Response Variables

Y1 Mean of Years Schooling (MYS)
Y2 School Enrollment Rate (SER)
Y3 Gross Enrollment Rate (GER)

Predictor Variables

X1 GRDP per Capita
X2 Percentage of Households Accessing Proper Sanitation
X3 Teacher to Student Ratio

Library Packages

Before we start the analysis, here are several packages needed to carry out data analysis using Geographically Weighted Multivariate Generalized Gamma Regression (GWMGGR). However, we have to make sure that all the packages have been installed, so that we can call the packages as follows.

library(maxLik)
library(MASS)
library(GWmodel)
library(PerformanceAnalytics)
library(lmtest)
library(car)
library(DT)
library(pracma)
set.seed(123)
options(warn = -1, max.cols = 15, max.rows = 200, max.print = 200)

#Call syntax of related functions
source('List Function UGGD.R')
source('List Function UGGR.R')
source('List Function GWUGGR.R')
source('List Function MGGD.R')
source('List Function GWMGGD.R')
source('List Function Asumption Test.R')
source('List Function MGGR.R')
source('List Function GWMGGR.R')

Data Exploration

Descriptive Statistics

#load data
mydata = read.csv('Education Indicator.csv')
datatable(mydata, caption = "Education Indicator in Central Java 2017-2021")

y = mydata[,c(1:3)]
X = mydata[,c(4:6)]

deskripsi = function(y)
{
  desk = cbind(apply(y,2,mean),
               sqrt(apply(y,2,var)),
               apply(y,2,min),
               apply(y,2,max))
  colnames(desk) = c("Mean","Std.Dev","Min","Max")
  rownames(desk) = c(variable.names(y))
  cat("\n-------------------------------------------------\n")
  cat("Descriptive Statistics\n")
  cat("-------------------------------------------------\n")
  print(desk)
  cat("-------------------------------------------------\n\n")
  return(desk)
}
newdata = cbind(y,X)
desk = deskripsi(newdata)

## 
## -------------------------------------------------
## Descriptive Statistics
## -------------------------------------------------
##          Mean   Std.Dev   Min    Max
## MYS  7.772857  1.211731  6.18  10.90
## SER 71.500971  9.011441 49.56  91.39
## GER 85.734171 14.940557 52.98 121.91
## X1  28.385600 17.966137 12.37  87.36
## X2  77.954057 17.292396  9.24  98.07
## X3  18.062857  2.934445 13.00  39.00
## -------------------------------------------------

Scatter Plot and Correlation Matrix

# Correlation panel
panel.cor <- function(x, y){
  usr <- par("usr"); on.exit(par(usr))
  par(usr = c(0, 1, 0, 1))
  r <- round(cor(x, y), digits=2)
  txt <- paste0("R = ", r)
  text(0.5, 0.5, txt,cex=1.5)
}
# Customize upper panel
upper.panel<-function(x, y){
  points(x,y, pch = 10)
}
# Create the plots
chart.Correlation(cbind(y,X), histogram=TRUE, pch=19)

Fitting Distribution of Response Variables

est.y = est.mggd(y)
par.y = est.y$parameter
mggd.test = mg4gamma.test(y,par.y)

## 
## Mean Generalized Gamma:
##          MYS      SER      GER
## [1,] 8.04296 69.35407 83.11049
## 
## Variance Generalized Gamma:
##          MYS        SER        GER
## MYS 1.118356   1.118356   1.118356
## SER 1.118356 117.848115 117.848115
## GER 1.118356 117.848115 267.201003

## 
## Kolmogorov-Smirnov Test:
## 
##  Asymptotic one-sample Kolmogorov-Smirnov test
## 
## data:  dy
## D = 0.059668, p-value = 0.5616
## alternative hypothesis: two-sided

The MGG distribution completed evaluation by the Kolmogorov-Smirnov (KS) test. The null hypothesis proposes that the observed data conforms to the MGG distribution, whereas the alternative hypothesis proposes that the observed data diverges from the MGG distribution. The KS test produced a test statistic of 0.059668 with p-value of 0.5616. These results lead us to the conclusion that there is no compelling evidence to reject the null hypothesis. In simpler terms, it means that the response variables are well-suited to follow the Multivariate Generalized Gamma (MGG) distribution.

Correlation Test using Bartlett’s Test

Bartlett’s test of sphericity tests the hypothesis that your correlation matrix is an identity matrix, which would indicate that your variables are unrelated and therefore unsuitable for structure detection.

Bartlett.test(y)

## Correlation Test using Bartlett's Test: 
## Test Statistic: 169.933 Sig.: 0

## $Stat
## [1] 169.933   0.000
## 
## $R
##           MYS       SER       GER
## MYS 1.0000000 0.6428344 0.4508070
## SER 0.6428344 1.0000000 0.5942678
## GER 0.4508070 0.5942678 1.0000000
## 
## $Var
##          MYS       SER       GER
## MYS 1.468292  7.019392   8.16138
## SER 7.019392 81.206062  80.00980
## GER 8.161380 80.009802 223.22025

Based on the correlation matrix, the test statistic R is 169.933 with a p-value of 0.000. With a significance level of 5%, it can be concluded that the three response variables are significantly correlated, making them suitable for analysis with a multivariate regression approach.

Multicollinearity Diagnostic

y1 = mydata[,1]
mydata1 = cbind(y=y1,X)
reg = lm(y~., data = mydata1)
vif(reg)

##       X1       X2       X3 
## 1.149863 1.146796 1.003653

At this step, we will identify whether or not there is a case of multicollinearity between the predictors. Based on the VIF value, it can be seen that all predictors have VIF values less than 10. This indicates that there is no case of multicollinearity between the predictor variables.

MGGR Model

mymggr = mggr(y,X)

## 
## -------------------------------------------------------------------
##         Multivariate Generalized Gamma Regression Model
## -------------------------------------------------------------------
## printed at: 02 May 2024 14:45:56 
## created by: Hasbi Yasin
## 
## --- Step 1: MGGD Parameter Estimation ... !!!
## 
## Parameter of MGGR (Null Model):
##                 MYS        SER        GER
## Intercept 2.0847971  4.2392248  4.4201710
## lambda    0.4395209  0.4395209  0.4395209
## tau       3.1446516  3.1446516  3.1446516
## delta     6.1490221 41.9617592 -8.1303801
## 
## Log-likelihood of MGGR (Null Model)  : -1579.397 
## Deviance of MGGR (Null Model)        : 3158.794 
## 
## --- Step 2: MGGR Parameter Estimation ... !!!
## 
## Multivariate Generalized Gamma Regression
## Created by: Hasbi Yasin
## ---
## Parameter of Response: MYS 
##               Estimate    Std.Error     Z-Value Pr(>|Z|)      Odd
## Intercept  1.810801150 1.519425e-10 11917674909        0 6.115345
## X1         0.004362468 2.409808e-09     1810297        0 1.004372
## X2         0.002786232 1.762533e-09     1580811        0 1.002790
## X3        -0.005867199 2.431080e-09    -2413413        0 0.994150
## ---
## Parameter of Response: SER 
##               Estimate    Std.Error       Z-Value Pr(>|Z|)       Odd
## Intercept  4.155684767 4.452319e-19  9.333754e+18        0 63.795635
## X1         0.001107728 7.202675e-18  1.537940e+14        0  1.001108
## X2         0.002691379 5.690612e-18  4.729508e+14        0  1.002695
## X3        -0.009458597 7.161862e-18 -1.320690e+15        0  0.990586
## ---
## Parameter of Response: GER 
##               Estimate    Std.Error       Z-Value Pr(>|Z|)        Odd
## Intercept  4.106941116 1.114477e-16  3.685083e+16        0 60.7605735
## X1         0.001297714 1.767706e-15  7.341233e+11        0  1.0012986
## X2         0.004671501 1.293177e-15  3.612421e+12        0  1.0046824
## X3        -0.005839441 1.783214e-15 -3.274672e+12        0  0.9941776
## ---
## Parameter of Multivariate Generalized Gamma Distribution:
##              MYS        SER        GER
## lambda 0.4463974  0.4463974  0.4463974
## tau    3.1458614  3.1458614  3.1458614
## delta  6.1490221 41.9617592 -8.1303801
## 
## Log-likelihood of MGGR  : -1518.394 
## Deviance of MGGR        : 3036.787 
## 
##     Null Deviance: 3158.794 on 172 degrees of freedom
## Residual Deviance: 3036.787 on 163 degrees of freedom
## G2     : 122.0063     Sig.: 0 
## RSq_LR : 0.5020116 (based on Likelihood Ratio)
## RSg_MF : 0.03862432 (McFadden’s R-Squared)
## AIC    : 3060.787 
## 
## Number of BHHH iterations: 16 
## Required Time:  56.96347 secs

First, we use the MGGR model to find which predictor variables significantly predict the response variable. If we use a significant level of 5%, it can be concluded that all predictors significantly affect to the all responses. The model’s significance can be assessed concurrently using Wilk’s likelihood ratio statistics derived from the MLRT. The calculated test statistic is 122.0063 with a p-value of 0.0000. It implies that all predictors collectively influence the response variables considerably.

Spatial Heterogeneity Test

This hypothesis testing is intended to test whether spatial factors have a significant influence or not. Formally, spatial heterogeneity can be detected by testing the similarity of the covariance matrix at each location (Johnson and Wichern, 2007).

e = mymggr$Residuals #Residual of MGGR Model
heterogen.test = glejser.test(e,X)

## Spatial Heterogeneity Diagnostic
## Test Statistic G2: 1131.981 Sig.: 0

The results of testing spatial heterogeneity obtained a test statistic value of 1131.981 with a p-value of 0.0000, so it can be concluded that there is spatial heterogeneity in educational indicators model. Therefore, we improved the model by using the GWMGGR model.

GWMGGR Model

This study proposes the GWMGGR model, which is based on information on the distribution of response variables following the MGG distribution and accommodates spatial effects that affect the relationship mechanism of response variables and predictor variables. This model was applied with the aim of obtaining factors that affect MYS, SER, and GER locally in each district/city in Central Java. Modelling is done by involving the geographical coordinates of the central location of the district/city in the parameter estimation. One important factor in the GWMGGR model is the bandwidth used in forming the spatial weight matrix.

mydata.cord = cbind(mydata$u,mydata$v)
#input bandwidth for each observation period
hS = rep(0.25,5) #for 2017 - 2021
gwmggr.res = gwmggr(y,X,mydata.cord,period=5,hS,kernel.type ="gaussian")

## 
## -------------------------------------------------------------------
##    Geographically Weighted Multivariate Generalized Gamma Regression   
## -------------------------------------------------------------------
## printed at: 02 May 2024 14:46:54 
## created by: Hasbi Yasin
## 
## --- Step 1: MGGD parameter Estimation... !!!
## 
## Parameter of MGGR (Null Model):
##                 MYS        SER        GER
## Intercept 2.0847971  4.2392248  4.4201710
## lambda    0.4395209  0.4395209  0.4395209
## tau       3.1446516  3.1446516  3.1446516
## delta     6.1490221 41.9617592 -8.1303801
## 
## Log-likelihood of MGGR (Null Model)  : -1579.397 
## Deviance of MGGR (Null Model)        : 3158.794 
## 
## --- Step 2: MGGR Parameter Estimation ... !!!
##                    MYS          SER          GER
## Intercept  1.810801150  4.155684767  4.106941116
## X1         0.004362468  0.001107728  0.001297714
## X2         0.002786232  0.002691379  0.004671501
## X3        -0.005867199 -0.009458597 -0.005839441
## lambda     0.446397446  0.446397446  0.446397446
## tau        3.145861351  3.145861351  3.145861351
## delta      6.149022123 41.961759182 -8.130380125
## 
## Log-likelihood of MGGR: -1518.394 
## Deviance of MGGR      : 3036.787 
## 
## --- Step 3: GWMGGR (Null Model) Parameter Estimation ... !!!
## Parameter Summary of GWMGGR (Null Model):
## Response Variable: MYS 
##         Intercept    lambda      tau    delta
## Min.     1.959137 0.4395209 2.425575 6.149022
## 1st Qu.  2.000038 0.4395209 3.000752 6.149022
## Median   2.074728 0.4395209 3.714247 6.149022
## Mean     2.076298 0.4395209 3.824351 6.149022
## 3rd Qu.  2.159558 0.4395209 4.236676 6.149022
## Max.     2.170550 0.4395209 8.278198 6.149022
## ---
## Response Variable: SER 
##         Intercept    lambda      tau    delta
## Min.     4.127663 0.4395209 2.425575 41.96176
## 1st Qu.  4.160718 0.4395209 3.000752 41.96176
## Median   4.246823 0.4395209 3.714247 41.96176
## Mean     4.229147 0.4395209 3.824351 41.96176
## 3rd Qu.  4.300312 0.4395209 4.236676 41.96176
## Max.     4.321556 0.4395209 8.278198 41.96176
## ---
## Response Variable: GER 
##         Intercept    lambda      tau    delta
## Min.     4.262901 0.4395209 2.425575 -8.13038
## 1st Qu.  4.356873 0.4395209 3.000752 -8.13038
## Median   4.446300 0.4395209 3.714247 -8.13038
## Mean     4.415245 0.4395209 3.824351 -8.13038
## 3rd Qu.  4.477769 0.4395209 4.236676 -8.13038
## Max.     4.504832 0.4395209 8.278198 -8.13038
## ---
## 
## Log-likelihood of GWMGGR (Null Model): -1438.645 
## Deviance of GWMGGR (Null Model)      : 2877.291 
## 
## --- Step 4: GWMGGR Parameter Estimation ... !!!
## Error in qr.solve(H[!fixed, !fixed, drop = FALSE], G0[!fixed], tol = slot(control,  : 
##   singular matrix 'a' in solve
## Error in qr.solve(H[!fixed, !fixed, drop = FALSE], G0[!fixed], tol = slot(control,  : 
##   singular matrix 'a' in solve
## Error in qr.solve(H[!fixed, !fixed, drop = FALSE], G0[!fixed], tol = slot(control,  : 
##   singular matrix 'a' in solve
## Error in qr.solve(H[!fixed, !fixed, drop = FALSE], G0[!fixed], tol = slot(control,  : 
##   singular matrix 'a' in solve
## 
## 
## Geographically Weighted Multivariate Generalized Gamma Regression
## Created by: Hasbi Yasin
## 
## Global Coefficients:
##                    MYS          SER          GER
## Intercept  1.810801150  4.155684767  4.106941116
## X1         0.004362468  0.001107728  0.001297714
## X2         0.002786232  0.002691379  0.004671501
## X3        -0.005867199 -0.009458597 -0.005839441
## lambda     0.446397446  0.446397446  0.446397446
## tau        3.145861351  3.145861351  3.145861351
## delta      6.149022123 41.961759182 -8.130380125
## ---
## Parameter Summary of GWMGGR Model:
## Response Variable: MYS 
##         Intercept           X1            X2            X3
## Min.     1.477809 -0.004687369 -0.0004859595 -0.0104864495
## 1st Qu.  1.703308  0.004410380  0.0018372113 -0.0061723060
## Median   1.742911  0.005447649  0.0022660991 -0.0008191737
## Mean     1.742306  0.005528801  0.0021414057 -0.0012959183
## 3rd Qu.  1.810728  0.006561399  0.0027638974  0.0010960094
## Max.     1.940631  0.014184488  0.0033344901  0.0134441957
## ---
## Response Variable: SER 
##         Intercept            X1            X2           X3
## Min.     3.631474 -0.0006215636 -0.0013312772 -0.021871201
## 1st Qu.  4.116476  0.0009320177  0.0009784528 -0.009521185
## Median   4.155845  0.0014761832  0.0023514271 -0.008309675
## Mean     4.118853  0.0024228519  0.0021249895 -0.006532755
## 3rd Qu.  4.181724  0.0029069221  0.0029691878 -0.005683274
## Max.     4.350242  0.0095493090  0.0073674222  0.019607599
## ---
## Response Variable: GER 
##         Intercept            X1           X2           X3
## Min.     3.624337 -0.0015551856 -0.003494421 -0.022957520
## 1st Qu.  4.106884  0.0006773448  0.002248967 -0.013823296
## Median   4.134112  0.0012823773  0.004877076 -0.008134137
## Mean     4.202760  0.0025275725  0.003697012 -0.008314919
## 3rd Qu.  4.278730  0.0033876642  0.005473420 -0.005828193
## Max.     4.917275  0.0128192444  0.007425980  0.019881558
## ---
## 
## Log-likelihood of GWMGGR (Full Model) : -1376.621 
## Deviance of GWMGGR (Full Model)       : 2753.242 
## Number of Effective Parameters        : 86.54622 
## 
## G2 MGGR   : 122.0063 on 9 degrees of freedom
## G2 GWMGGR : 124.049 on 86.54622 degrees of freedom    Sig.: 0.005110405 
## FStat     : 9.457895      Sig.: 6.852746e-10 
## Bandwidth : 0.25 0.25 0.25 0.25 0.25 
## RSq_LR    : 0.9014744 (based on Likelihood Ratio)
## RSg_MF    : 0.1283882 (McFadden’s R-Squared)
## AIC       : 3014.788 
## 
## Required Time:  25.10843 mins

Save Output of GWMGGR Model

Save the GWMGGR model output into several csv files to facilitate further analysis.

# Save Output of GWMGGR Model 
write.csv(gwmggr.res$Parameter.GWMGGR,"GWMGGR Parameters.csv",
          row.names = FALSE)
write.csv(gwmggr.res$Beta,"Coefficients of GWMGGR Model.csv",
          row.names = FALSE)
write.csv(gwmggr.res$SE.beta,"Standard Error of GWMGGR Coefficients.csv",
          row.names = FALSE)
write.csv(gwmggr.res$Zhit,"Z-Stats of GWMGGR Model.csv",
          row.names = FALSE)
write.csv(gwmggr.res$Sig.beta,"p-value of of GWMGGR Coefficients.csv",
          row.names = FALSE)

Spatial Clustering

Based on the regression parameters of the GWMGGR model at each location, we then group districts/cities that have the same spatial effects. The clustering process is based on the regression coefficient of each predictor on each response variable. Therefore, in this study we use the k-means clustering method in the hope that we will obtain regions that have homogeneous characteristics.

Find optimal number of cluster

library(rgdal) 
library(tidyverse) 
library(tmap) 
library(tmaptools) 
library(ggplot2) 
library(cartogram) 
library(geogrid) 
library(geosphere) 
library(broom)
library(sf)
library(cluster)
library(factoextra)
options(warn = -1)
#Read polygon data####
tun_shp <- sf::st_read("central_java.shp")

## Reading layer `central_java' from data source 
##   `C:\DISERTASI\Artikel Publikasi Disertasi\MethodsX\central_java.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 35 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 108.5554 ymin: -8.212804 xmax: 111.6933 ymax: -5.727798
## CRS:           NA

#Spatial Clustering based on Coefficients of GWMGGR using k-means####
distric = read.csv('location_list.csv')
Beta = gwmggr.res$Beta
rownames(Beta) = c(distric$Name)
colnames(Beta) = c("b01","b11","b21","b31","b02","b12","b22","b32",
                    "b03","b13","b23","b33")

Coeff = scale(Beta)
#find the optimal cluster using silhouette Methods
fviz_nbclust(Coeff, kmeans, method = "silhouette") #silhouette methods

#print silhoutte score
fviz_nbclust(Coeff, kmeans, method = "silhouette")$data

##    clusters         y
## 1         1 0.0000000
## 2         2 0.2427202
## 3         3 0.3402803
## 4         4 0.2321499
## 5         5 0.2999332
## 6         6 0.2289220
## 7         7 0.2340544
## 8         8 0.2438737
## 9         9 0.2524587
## 10       10 0.2380909

k-means Clustering

After we find out the optimal number of clusters, we can perform k-means clustering. By default, the kmeans() call will minimise the Euclidean distance between data points to define clusters.

#clustering using k-means methods
spat.clust = kmeans(Coeff, 3, nstart = 30)
#size of each cluster
spat.clust$size

## [1]  4 25  6

#member of each cluster
spat.clust$cluster

##         Cilacap        Banyumas     Purbalingga    Banjarnegara         Kebumen 
##               3               2               2               2               2 
##       Purworejo        Wonosobo        Magelang        Boyolali          Klaten 
##               2               2               2               2               2 
##       Sukoharjo        Wonogiri     Karanganyar          Sragen        Grobogan 
##               2               2               2               2               2 
##           Blora         Rembang            Pati           Kudus          Jepara 
##               3               3               3               3               2 
##           Demak        Semarang      Temanggung          Kendal          Batang 
##               2               2               2               2               2 
##      Pekalongan        Pemalang           Tegal          Brebes   Magelang City 
##               2               1               1               3               2 
##  Surakarta City   Salatiga City   Semarang City Pekalongan City      Tegal City 
##               2               2               2               1               1

#Cluster plot
fviz_cluster(spat.clust, data = Coeff)

#Profiling the cluster (center of each cluster)
center = as.data.frame(Beta) %>%
  mutate(Cluster = spat.clust$cluster) %>%
  group_by(Cluster) %>%
  summarise_all("mean")

datatable(center, caption = "Center of the cluster")

#Add column to shp file###
tun_shp$Spat_Cluster = as.factor(spat.clust$cluster)
tun_shp$location = distric$Name
names(tun_shp)

## [1] "KODE_KAB"     "NAMA_KAB"     "KODE_PROP"    "NAMA_PROP"    "JKB"         
## [6] "geometry"     "Spat_Cluster" "location"

Mapping the spatial cluster based on GWMGGR Parameters

The map shows that districts/municipalities in Central Java are grouped into three clusters, each with unique characteristics. Thus, these results indicate that to improve the achievement of the three education indicators, special measures are needed according to the characteristics of each cluster of districts/cities.

#Spatial Cluster based on GWMGGR parameters ####
tm_shape(tun_shp) + 
  tm_polygons("Spat_Cluster", palette = "viridis", style = "fixed",
              border.col = "black",legend.is.portrait = TRUE, 
              title = "Spatial cluster based on GWMGGR") +
  tm_text("location",size = "AREA", scale=.5) +
  tm_layout(frame = FALSE)

Spatial Clustering based on Geographically Weighted Multivariate Generalized Gamma Regression for Mapping Multidimensional Education Indicators in Central Java, Indonesia