The data used in this research are secondary data derived from the official website of BPS of Central Java Provinces Indonesia (https://jateng.bps.go.id/)
Y1 Mean of Years Schooling (MYS)
Y2 School Enrollment Rate (SER)
Y3 Gross Enrollment Rate (GER)
X1 GRDP per Capita
X2 Percentage of Households Accessing Proper
Sanitation
X3 Teacher to Student Ratio
Before we start the analysis, here are several packages needed to
carry out data analysis using Geographically Weighted Multivariate
Generalized Gamma Regression (GWMGGR). However, we have to make sure
that all the packages have been installed, so that we can call the
packages as follows.
library(maxLik)
library(MASS)
library(GWmodel)
library(PerformanceAnalytics)
library(lmtest)
library(car)
library(DT)
library(pracma)
set.seed(123)
options(warn = -1, max.cols = 15, max.rows = 200, max.print = 200)
#Call syntax of related functions
source('List Function UGGD.R')
source('List Function UGGR.R')
source('List Function GWUGGR.R')
source('List Function MGGD.R')
source('List Function GWMGGD.R')
source('List Function Asumption Test.R')
source('List Function MGGR.R')
source('List Function GWMGGR.R')
#load data
mydata = read.csv('Education Indicator.csv')
datatable(mydata, caption = "Education Indicator in Central Java 2017-2021")
y = mydata[,c(1:3)]
X = mydata[,c(4:6)]
deskripsi = function(y)
{
desk = cbind(apply(y,2,mean),
sqrt(apply(y,2,var)),
apply(y,2,min),
apply(y,2,max))
colnames(desk) = c("Mean","Std.Dev","Min","Max")
rownames(desk) = c(variable.names(y))
cat("\n-------------------------------------------------\n")
cat("Descriptive Statistics\n")
cat("-------------------------------------------------\n")
print(desk)
cat("-------------------------------------------------\n\n")
return(desk)
}
newdata = cbind(y,X)
desk = deskripsi(newdata)
##
## -------------------------------------------------
## Descriptive Statistics
## -------------------------------------------------
## Mean Std.Dev Min Max
## MYS 7.772857 1.211731 6.18 10.90
## SER 71.500971 9.011441 49.56 91.39
## GER 85.734171 14.940557 52.98 121.91
## X1 28.385600 17.966137 12.37 87.36
## X2 77.954057 17.292396 9.24 98.07
## X3 18.062857 2.934445 13.00 39.00
## -------------------------------------------------
# Correlation panel
panel.cor <- function(x, y){
usr <- par("usr"); on.exit(par(usr))
par(usr = c(0, 1, 0, 1))
r <- round(cor(x, y), digits=2)
txt <- paste0("R = ", r)
text(0.5, 0.5, txt,cex=1.5)
}
# Customize upper panel
upper.panel<-function(x, y){
points(x,y, pch = 10)
}
# Create the plots
chart.Correlation(cbind(y,X), histogram=TRUE, pch=19)
est.y = est.mggd(y)
par.y = est.y$parameter
mggd.test = mg4gamma.test(y,par.y)
##
## Mean Generalized Gamma:
## MYS SER GER
## [1,] 8.04296 69.35407 83.11049
##
## Variance Generalized Gamma:
## MYS SER GER
## MYS 1.118356 1.118356 1.118356
## SER 1.118356 117.848115 117.848115
## GER 1.118356 117.848115 267.201003
##
## Kolmogorov-Smirnov Test:
##
## Asymptotic one-sample Kolmogorov-Smirnov test
##
## data: dy
## D = 0.059668, p-value = 0.5616
## alternative hypothesis: two-sided
The MGG distribution completed evaluation by the Kolmogorov-Smirnov (KS) test. The null hypothesis proposes that the observed data conforms to the MGG distribution, whereas the alternative hypothesis proposes that the observed data diverges from the MGG distribution. The KS test produced a test statistic of 0.059668 with p-value of 0.5616. These results lead us to the conclusion that there is no compelling evidence to reject the null hypothesis. In simpler terms, it means that the response variables are well-suited to follow the Multivariate Generalized Gamma (MGG) distribution.
Bartlett’s test of sphericity tests the hypothesis that your correlation matrix is an identity matrix, which would indicate that your variables are unrelated and therefore unsuitable for structure detection.
Bartlett.test(y)
## Correlation Test using Bartlett's Test:
## Test Statistic: 169.933 Sig.: 0
## $Stat
## [1] 169.933 0.000
##
## $R
## MYS SER GER
## MYS 1.0000000 0.6428344 0.4508070
## SER 0.6428344 1.0000000 0.5942678
## GER 0.4508070 0.5942678 1.0000000
##
## $Var
## MYS SER GER
## MYS 1.468292 7.019392 8.16138
## SER 7.019392 81.206062 80.00980
## GER 8.161380 80.009802 223.22025
Based on the correlation matrix, the test statistic R is 169.933 with a p-value of 0.000. With a significance level of 5%, it can be concluded that the three response variables are significantly correlated, making them suitable for analysis with a multivariate regression approach.
y1 = mydata[,1]
mydata1 = cbind(y=y1,X)
reg = lm(y~., data = mydata1)
vif(reg)
## X1 X2 X3
## 1.149863 1.146796 1.003653
At this step, we will identify whether or not there is a case of
multicollinearity between the predictors. Based on the VIF
value, it can be seen that all predictors have VIF values less than 10.
This indicates that there is no case of multicollinearity between the
predictor variables.
mymggr = mggr(y,X)
##
## -------------------------------------------------------------------
## Multivariate Generalized Gamma Regression Model
## -------------------------------------------------------------------
## printed at: 02 May 2024 14:45:56
## created by: Hasbi Yasin
##
## --- Step 1: MGGD Parameter Estimation ... !!!
##
## Parameter of MGGR (Null Model):
## MYS SER GER
## Intercept 2.0847971 4.2392248 4.4201710
## lambda 0.4395209 0.4395209 0.4395209
## tau 3.1446516 3.1446516 3.1446516
## delta 6.1490221 41.9617592 -8.1303801
##
## Log-likelihood of MGGR (Null Model) : -1579.397
## Deviance of MGGR (Null Model) : 3158.794
##
## --- Step 2: MGGR Parameter Estimation ... !!!
##
## Multivariate Generalized Gamma Regression
## Created by: Hasbi Yasin
## ---
## Parameter of Response: MYS
## Estimate Std.Error Z-Value Pr(>|Z|) Odd
## Intercept 1.810801150 1.519425e-10 11917674909 0 6.115345
## X1 0.004362468 2.409808e-09 1810297 0 1.004372
## X2 0.002786232 1.762533e-09 1580811 0 1.002790
## X3 -0.005867199 2.431080e-09 -2413413 0 0.994150
## ---
## Parameter of Response: SER
## Estimate Std.Error Z-Value Pr(>|Z|) Odd
## Intercept 4.155684767 4.452319e-19 9.333754e+18 0 63.795635
## X1 0.001107728 7.202675e-18 1.537940e+14 0 1.001108
## X2 0.002691379 5.690612e-18 4.729508e+14 0 1.002695
## X3 -0.009458597 7.161862e-18 -1.320690e+15 0 0.990586
## ---
## Parameter of Response: GER
## Estimate Std.Error Z-Value Pr(>|Z|) Odd
## Intercept 4.106941116 1.114477e-16 3.685083e+16 0 60.7605735
## X1 0.001297714 1.767706e-15 7.341233e+11 0 1.0012986
## X2 0.004671501 1.293177e-15 3.612421e+12 0 1.0046824
## X3 -0.005839441 1.783214e-15 -3.274672e+12 0 0.9941776
## ---
## Parameter of Multivariate Generalized Gamma Distribution:
## MYS SER GER
## lambda 0.4463974 0.4463974 0.4463974
## tau 3.1458614 3.1458614 3.1458614
## delta 6.1490221 41.9617592 -8.1303801
##
## Log-likelihood of MGGR : -1518.394
## Deviance of MGGR : 3036.787
##
## Null Deviance: 3158.794 on 172 degrees of freedom
## Residual Deviance: 3036.787 on 163 degrees of freedom
## G2 : 122.0063 Sig.: 0
## RSq_LR : 0.5020116 (based on Likelihood Ratio)
## RSg_MF : 0.03862432 (McFadden’s R-Squared)
## AIC : 3060.787
##
## Number of BHHH iterations: 16
## Required Time: 56.96347 secs
First, we use the MGGR model to find which predictor variables
significantly predict the response variable. If we use a significant
level of 5%, it can be concluded that all predictors significantly
affect to the all responses. The model’s significance can be assessed
concurrently using Wilk’s likelihood ratio statistics derived from the
MLRT. The calculated test statistic is 122.0063 with a p-value of
0.0000. It implies that all predictors collectively influence the
response variables considerably.
This hypothesis testing is intended to test whether spatial factors
have a significant influence or not. Formally, spatial heterogeneity can
be detected by testing the similarity of the covariance matrix at each
location (Johnson and Wichern, 2007).
e = mymggr$Residuals #Residual of MGGR Model
heterogen.test = glejser.test(e,X)
## Spatial Heterogeneity Diagnostic
## Test Statistic G2: 1131.981 Sig.: 0
The results of testing spatial heterogeneity obtained a test
statistic value of 1131.981 with a p-value of 0.0000, so it can be
concluded that there is spatial heterogeneity in educational indicators
model. Therefore, we improved the model by using the GWMGGR model.
This study proposes the GWMGGR model, which is based on information
on the distribution of response variables following the MGG distribution
and accommodates spatial effects that affect the relationship mechanism
of response variables and predictor variables. This model was applied
with the aim of obtaining factors that affect MYS, SER, and GER locally
in each district/city in Central Java. Modelling is done by involving
the geographical coordinates of the central location of the
district/city in the parameter estimation. One important factor in the
GWMGGR model is the bandwidth used in forming the spatial weight
matrix.
mydata.cord = cbind(mydata$u,mydata$v)
#input bandwidth for each observation period
hS = rep(0.25,5) #for 2017 - 2021
gwmggr.res = gwmggr(y,X,mydata.cord,period=5,hS,kernel.type ="gaussian")
##
## -------------------------------------------------------------------
## Geographically Weighted Multivariate Generalized Gamma Regression
## -------------------------------------------------------------------
## printed at: 02 May 2024 14:46:54
## created by: Hasbi Yasin
##
## --- Step 1: MGGD parameter Estimation... !!!
##
## Parameter of MGGR (Null Model):
## MYS SER GER
## Intercept 2.0847971 4.2392248 4.4201710
## lambda 0.4395209 0.4395209 0.4395209
## tau 3.1446516 3.1446516 3.1446516
## delta 6.1490221 41.9617592 -8.1303801
##
## Log-likelihood of MGGR (Null Model) : -1579.397
## Deviance of MGGR (Null Model) : 3158.794
##
## --- Step 2: MGGR Parameter Estimation ... !!!
## MYS SER GER
## Intercept 1.810801150 4.155684767 4.106941116
## X1 0.004362468 0.001107728 0.001297714
## X2 0.002786232 0.002691379 0.004671501
## X3 -0.005867199 -0.009458597 -0.005839441
## lambda 0.446397446 0.446397446 0.446397446
## tau 3.145861351 3.145861351 3.145861351
## delta 6.149022123 41.961759182 -8.130380125
##
## Log-likelihood of MGGR: -1518.394
## Deviance of MGGR : 3036.787
##
## --- Step 3: GWMGGR (Null Model) Parameter Estimation ... !!!
## Parameter Summary of GWMGGR (Null Model):
## Response Variable: MYS
## Intercept lambda tau delta
## Min. 1.959137 0.4395209 2.425575 6.149022
## 1st Qu. 2.000038 0.4395209 3.000752 6.149022
## Median 2.074728 0.4395209 3.714247 6.149022
## Mean 2.076298 0.4395209 3.824351 6.149022
## 3rd Qu. 2.159558 0.4395209 4.236676 6.149022
## Max. 2.170550 0.4395209 8.278198 6.149022
## ---
## Response Variable: SER
## Intercept lambda tau delta
## Min. 4.127663 0.4395209 2.425575 41.96176
## 1st Qu. 4.160718 0.4395209 3.000752 41.96176
## Median 4.246823 0.4395209 3.714247 41.96176
## Mean 4.229147 0.4395209 3.824351 41.96176
## 3rd Qu. 4.300312 0.4395209 4.236676 41.96176
## Max. 4.321556 0.4395209 8.278198 41.96176
## ---
## Response Variable: GER
## Intercept lambda tau delta
## Min. 4.262901 0.4395209 2.425575 -8.13038
## 1st Qu. 4.356873 0.4395209 3.000752 -8.13038
## Median 4.446300 0.4395209 3.714247 -8.13038
## Mean 4.415245 0.4395209 3.824351 -8.13038
## 3rd Qu. 4.477769 0.4395209 4.236676 -8.13038
## Max. 4.504832 0.4395209 8.278198 -8.13038
## ---
##
## Log-likelihood of GWMGGR (Null Model): -1438.645
## Deviance of GWMGGR (Null Model) : 2877.291
##
## --- Step 4: GWMGGR Parameter Estimation ... !!!
## Error in qr.solve(H[!fixed, !fixed, drop = FALSE], G0[!fixed], tol = slot(control, :
## singular matrix 'a' in solve
## Error in qr.solve(H[!fixed, !fixed, drop = FALSE], G0[!fixed], tol = slot(control, :
## singular matrix 'a' in solve
## Error in qr.solve(H[!fixed, !fixed, drop = FALSE], G0[!fixed], tol = slot(control, :
## singular matrix 'a' in solve
## Error in qr.solve(H[!fixed, !fixed, drop = FALSE], G0[!fixed], tol = slot(control, :
## singular matrix 'a' in solve
##
##
## Geographically Weighted Multivariate Generalized Gamma Regression
## Created by: Hasbi Yasin
##
## Global Coefficients:
## MYS SER GER
## Intercept 1.810801150 4.155684767 4.106941116
## X1 0.004362468 0.001107728 0.001297714
## X2 0.002786232 0.002691379 0.004671501
## X3 -0.005867199 -0.009458597 -0.005839441
## lambda 0.446397446 0.446397446 0.446397446
## tau 3.145861351 3.145861351 3.145861351
## delta 6.149022123 41.961759182 -8.130380125
## ---
## Parameter Summary of GWMGGR Model:
## Response Variable: MYS
## Intercept X1 X2 X3
## Min. 1.477809 -0.004687369 -0.0004859595 -0.0104864495
## 1st Qu. 1.703308 0.004410380 0.0018372113 -0.0061723060
## Median 1.742911 0.005447649 0.0022660991 -0.0008191737
## Mean 1.742306 0.005528801 0.0021414057 -0.0012959183
## 3rd Qu. 1.810728 0.006561399 0.0027638974 0.0010960094
## Max. 1.940631 0.014184488 0.0033344901 0.0134441957
## ---
## Response Variable: SER
## Intercept X1 X2 X3
## Min. 3.631474 -0.0006215636 -0.0013312772 -0.021871201
## 1st Qu. 4.116476 0.0009320177 0.0009784528 -0.009521185
## Median 4.155845 0.0014761832 0.0023514271 -0.008309675
## Mean 4.118853 0.0024228519 0.0021249895 -0.006532755
## 3rd Qu. 4.181724 0.0029069221 0.0029691878 -0.005683274
## Max. 4.350242 0.0095493090 0.0073674222 0.019607599
## ---
## Response Variable: GER
## Intercept X1 X2 X3
## Min. 3.624337 -0.0015551856 -0.003494421 -0.022957520
## 1st Qu. 4.106884 0.0006773448 0.002248967 -0.013823296
## Median 4.134112 0.0012823773 0.004877076 -0.008134137
## Mean 4.202760 0.0025275725 0.003697012 -0.008314919
## 3rd Qu. 4.278730 0.0033876642 0.005473420 -0.005828193
## Max. 4.917275 0.0128192444 0.007425980 0.019881558
## ---
##
## Log-likelihood of GWMGGR (Full Model) : -1376.621
## Deviance of GWMGGR (Full Model) : 2753.242
## Number of Effective Parameters : 86.54622
##
## G2 MGGR : 122.0063 on 9 degrees of freedom
## G2 GWMGGR : 124.049 on 86.54622 degrees of freedom Sig.: 0.005110405
## FStat : 9.457895 Sig.: 6.852746e-10
## Bandwidth : 0.25 0.25 0.25 0.25 0.25
## RSq_LR : 0.9014744 (based on Likelihood Ratio)
## RSg_MF : 0.1283882 (McFadden’s R-Squared)
## AIC : 3014.788
##
## Required Time: 25.10843 mins
Save the GWMGGR model output into several csv files to facilitate
further analysis.
# Save Output of GWMGGR Model
write.csv(gwmggr.res$Parameter.GWMGGR,"GWMGGR Parameters.csv",
row.names = FALSE)
write.csv(gwmggr.res$Beta,"Coefficients of GWMGGR Model.csv",
row.names = FALSE)
write.csv(gwmggr.res$SE.beta,"Standard Error of GWMGGR Coefficients.csv",
row.names = FALSE)
write.csv(gwmggr.res$Zhit,"Z-Stats of GWMGGR Model.csv",
row.names = FALSE)
write.csv(gwmggr.res$Sig.beta,"p-value of of GWMGGR Coefficients.csv",
row.names = FALSE)
Based on the regression parameters of the GWMGGR model at each
location, we then group districts/cities that have the same spatial
effects. The clustering process is based on the regression coefficient
of each predictor on each response variable. Therefore, in this study we
use the k-means clustering method in the hope that we will obtain
regions that have homogeneous characteristics.
library(rgdal)
library(tidyverse)
library(tmap)
library(tmaptools)
library(ggplot2)
library(cartogram)
library(geogrid)
library(geosphere)
library(broom)
library(sf)
library(cluster)
library(factoextra)
options(warn = -1)
#Read polygon data####
tun_shp <- sf::st_read("central_java.shp")
## Reading layer `central_java' from data source
## `C:\DISERTASI\Artikel Publikasi Disertasi\MethodsX\central_java.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 35 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 108.5554 ymin: -8.212804 xmax: 111.6933 ymax: -5.727798
## CRS: NA
#Spatial Clustering based on Coefficients of GWMGGR using k-means####
distric = read.csv('location_list.csv')
Beta = gwmggr.res$Beta
rownames(Beta) = c(distric$Name)
colnames(Beta) = c("b01","b11","b21","b31","b02","b12","b22","b32",
"b03","b13","b23","b33")
Coeff = scale(Beta)
#find the optimal cluster using silhouette Methods
fviz_nbclust(Coeff, kmeans, method = "silhouette") #silhouette methods
#print silhoutte score
fviz_nbclust(Coeff, kmeans, method = "silhouette")$data
## clusters y
## 1 1 0.0000000
## 2 2 0.2427202
## 3 3 0.3402803
## 4 4 0.2321499
## 5 5 0.2999332
## 6 6 0.2289220
## 7 7 0.2340544
## 8 8 0.2438737
## 9 9 0.2524587
## 10 10 0.2380909
After we find out the optimal number of clusters, we can perform
k-means clustering. By default, the kmeans() call will minimise
the Euclidean distance between data points to define clusters.
#clustering using k-means methods
spat.clust = kmeans(Coeff, 3, nstart = 30)
#size of each cluster
spat.clust$size
## [1] 4 25 6
#member of each cluster
spat.clust$cluster
## Cilacap Banyumas Purbalingga Banjarnegara Kebumen
## 3 2 2 2 2
## Purworejo Wonosobo Magelang Boyolali Klaten
## 2 2 2 2 2
## Sukoharjo Wonogiri Karanganyar Sragen Grobogan
## 2 2 2 2 2
## Blora Rembang Pati Kudus Jepara
## 3 3 3 3 2
## Demak Semarang Temanggung Kendal Batang
## 2 2 2 2 2
## Pekalongan Pemalang Tegal Brebes Magelang City
## 2 1 1 3 2
## Surakarta City Salatiga City Semarang City Pekalongan City Tegal City
## 2 2 2 1 1
#Cluster plot
fviz_cluster(spat.clust, data = Coeff)
#Profiling the cluster (center of each cluster)
center = as.data.frame(Beta) %>%
mutate(Cluster = spat.clust$cluster) %>%
group_by(Cluster) %>%
summarise_all("mean")
datatable(center, caption = "Center of the cluster")
#Add column to shp file###
tun_shp$Spat_Cluster = as.factor(spat.clust$cluster)
tun_shp$location = distric$Name
names(tun_shp)
## [1] "KODE_KAB" "NAMA_KAB" "KODE_PROP" "NAMA_PROP" "JKB"
## [6] "geometry" "Spat_Cluster" "location"
The map shows that districts/municipalities in Central Java are
grouped into three clusters, each with unique characteristics. Thus,
these results indicate that to improve the achievement of the three
education indicators, special measures are needed according to the
characteristics of each cluster of districts/cities.
#Spatial Cluster based on GWMGGR parameters ####
tm_shape(tun_shp) +
tm_polygons("Spat_Cluster", palette = "viridis", style = "fixed",
border.col = "black",legend.is.portrait = TRUE,
title = "Spatial cluster based on GWMGGR") +
tm_text("location",size = "AREA", scale=.5) +
tm_layout(frame = FALSE)