The objective of this Spatial Statistics project is to construct a regression model employing spatial methodologies. The intention is to explore the spatial relationships within the data to enhance the predictive power of the model. The dataset being utilized for this analysis is obtained from the reputable source, Badan Pusat Statistik (BPS). By incorporating spatial considerations into the regression framework, we aim to capture any geographical dependencies or spatial patterns that may exist in the data, thereby providing more accurate and robust modeling results.

MLR : Multiple Linear Regression
GWR : Geographically Weighted Regression

Data Preparation

In this project, Y represents the dependent variable, while X represents the independent variable. The variables employed in this analysis are as follows:
Y : Human Development Index (IPM)
X1 : School Average Duration (RLS)
X2 : Percentage of 15-49 Years Old Women with Married Status and Using Contraception Tools
X3 : Gini Ratio
X4 : Percentage of Student of Age 5-24 Years Old Using Internet for The Last 3 Months

Import Libraries

library(spgwr)
library(readxl)
library(car)
library(lmtest)
library(zoo)

Import Data

data <- read_excel("D:/JAMBORE/education.xlsx")
data
## # A tibble: 34 × 8
##    Provinsi                         Y    X1    X2    X3    X4 latitude longitude
##    <chr>                        <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>     <dbl>
##  1 Provinsi Aceh                 72.8  9.79  43.6 0.291  53.5    4.70       96.7
##  2 Provinsi Sumatera Utara       72.7  9.99  42.9 0.326  71.8    2.12       99.5
##  3 Provinsi Sumatera Barat       73.3  9.51  45.6 0.292  72.7   -0.740     101. 
##  4 Provinsi Riau                 73.5  9.54  48.2 0.323  73.9    0.293     102. 
##  5 Provinsi Jambi                72.1  9.07  63.6 0.335  72.2   -1.49      102. 
##  6 Provinsi Sumatera Selatan     70.9  8.82  61.7 0.33   75.9   -3.32      104. 
##  7 Provinsi Bengkulu             72.2  9.28  61.9 0.315  75.5   -3.58      102. 
##  8 Provinsi Lampung              70.4  8.61  66.1 0.313  81.2   -4.56      105. 
##  9 Provinsi Kepulauan Bangka B…  72.2  8.57  61.8 0.255  79.8   -2.74      106. 
## 10 Provinsi Kepulauan Riau       76.6 10.5   41.2 0.325  79.4    3.95      108. 
## # ℹ 24 more rows

Descritive Statistics

summary(data[,2:6])
##        Y               X1               X2              X3        
##  Min.   :61.39   Min.   : 7.310   Min.   :22.10   Min.   :0.2550  
##  1st Qu.:69.93   1st Qu.: 8.580   1st Qu.:44.29   1st Qu.:0.3095  
##  Median :72.19   Median : 9.225   Median :52.38   Median :0.3325  
##  Mean   :71.86   Mean   : 9.247   Mean   :51.46   Mean   :0.3432  
##  3rd Qu.:73.22   3rd Qu.: 9.777   3rd Qu.:61.45   3rd Qu.:0.3698  
##  Max.   :81.65   Max.   :11.300   Max.   :67.92   Max.   :0.4590  
##        X4       
##  Min.   :30.89  
##  1st Qu.:68.74  
##  Median :75.12  
##  Mean   :72.39  
##  3rd Qu.:81.06  
##  Max.   :94.08

Data Visualization

These plots illustrate the linearity between each independent variable and the dependent variable.

attach(data)
par(mfrow=c(2,2))
plot(X1, Y, main="Y vs X1")
plot(X2, Y, main="Y vs X2")
plot(X3, Y, main="Y vs X3")
plot(X4, Y, main="Y vs X4")

Multiple Linear Regression Analysis

First, we’re going to check how the data was created using the MLR method. Then, we’ll compare it with the GWR method. This helps us understand how the data varies depending on the method used. It’s like comparing two ways of cooking to see which one tastes better.

Data Modeling

mlr<-lm(formula = data$Y ~ data$X1 + data$X2 + data$X3 + data$X4,
      data = data)
summary(mlr)
## 
## Call:
## lm(formula = data$Y ~ data$X1 + data$X2 + data$X3 + data$X4, 
##     data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.3203 -0.8071  0.2453  0.9406  3.4606 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 30.10177    5.41258   5.561 5.34e-06 ***
## data$X1      2.62842    0.46498   5.653 4.15e-06 ***
## data$X2      0.05298    0.04416   1.200 0.239980    
## data$X3      8.25094    7.36750   1.120 0.271942    
## data$X4      0.16431    0.03734   4.400 0.000134 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.816 on 29 degrees of freedom
## Multiple R-squared:  0.818,  Adjusted R-squared:  0.7929 
## F-statistic: 32.59 on 4 and 29 DF,  p-value: 2.394e-10

Multicolinearity Detection

vif(mlr)
##  data$X1  data$X2  data$X3  data$X4 
## 1.442920 2.246113 1.125901 2.308783

Now, we have obtained the VIF (Variance Inflation Factor) value for each independent variable, where VIFs less than 10 indicate that there is no multicollinearity among the independent variables. Therefore, these variables are suitable for inclusion in the multiple linear regression model.

Parameter Estimation of MLR model

After executing the data modeling process, the results reveal the model obtained through the MLR (Multiple Linear Regression) method, which is outlined below:
\(Y = 30.10177 + 2.62842X_1 + 0.05298X_2 + 8.25094X_3 + 0.16431X_4\)

In addition, an essential step in inferential statistics involves conducting hypothesis testing. By employing the Hypothesis Testing method with a significance level \(\alpha =0.05\), we can perform both Simultaneous and Partial Hypothesis Testing.

Simultaneous Hypothesis Testing

Hypothesis
H0: \(\beta_1=\beta_2=...=\beta_j=0\); \(j=1,2,...,4\)
H1: At least one \(\beta_j \neq 0\)

Critical Region
H0 is rejected if \(p-value<\alpha=0.05\)

Test Statistics
\(p-value = 2.394 \times 10^{-10} < \alpha = 0.05\)
Based on the test conducted above, the null hypothesis (H0) is rejected which implies that the parameters incorporated within the model exert a notable influence on the dependent variable.

Partial Hypothesis Testing

Since there are four parameters representing each independent variable, the test will be applied to each parameter individually. The hypothesis for each parameter can be expressed as follows:

Hypothesis
H0: \(\beta_j=0\);
H1: \(\beta_j \neq 0\)

Critical Region
H0 is rejected if \(p-value<\alpha=0.05\)

Test Statistics
Coefficient \(p-value\) Decision (\(\alpha=0.05\))
X1 \(4.15 \times 10^{-6}\) H0 is rejected
X2 \(0.239980\) H0 is not rejected
X3 \(0.271942\) H0 is not rejected
X4 \(0.000134\) H0 is rejected

Hence, the independent variables significantly influencing the dependent values are X1 and X4, representing ‘School Average Duration’ and ‘Percentage of Students Aged 5-24 Using the Internet in the Last 3 Months’ respectively.

Determination Coefficient (R2)

R2 = 81.8%
Approximately 81.8% of the variability observed in the dependent variable can be explained by the independent variables included in the regression model. In other words, the model is able to account for a large proportion of the variation in the dependent variable. This suggests that the independent variables in the model are providing a good fit to the data and are effective in explaining the changes in the dependent variable.

Clasic Assumption Test of Linear Regression Model

Normal Distribution of residuals

Hypothesis
H0: The residuals of the model are normally distributed H1: The residuals of the model are not normally distributed

Critical Region
H0 is rejected if \(p-value<\alpha=0.05\)

Test Statistics

residu<-abs(mlr$residuals)
res=mlr$residual
ks.test(res,"pnorm",mean(res),sd(res),alternative=c("two.sided"))
## 
##  Exact one-sample Kolmogorov-Smirnov test
## 
## data:  res
## D = 0.11087, p-value = 0.7564
## alternative hypothesis: two-sided

Decision
\(p-value = 0.7564 > \alpha = 0.05\) means that the H0 fails to be rejected.

Conclusion
The residuals of the model are normally distributed

Non-Autocorelation

Hypothesis
H0: There is no autocorrelation in the residuals
H1: There is autocorrelation in the residuals

Critical Region
H0 is rejected if \(p-value<\alpha=0.05\)

Test Statistics

dwtest(lm(mlr$residuals~X1+X2+X3+X4,data=data))
## 
##  Durbin-Watson test
## 
## data:  lm(mlr$residuals ~ X1 + X2 + X3 + X4, data = data)
## DW = 2.0742, p-value = 0.4757
## alternative hypothesis: true autocorrelation is greater than 0

Decision
\(p-value = 0.4757 > \alpha = 0.05\) means that the H0 fails to be rejected.

Conclusion
There is no autocorrelation in the residuals

Non-Heteroskedasticity

Hypothesis
H0: There is no heteroscedasticity in the residuals
H1: There is heteroscedasticity in the residuals

Critical Region
H0 is rejected if \(p-value<\alpha=0.05\)

Test Statistics

bptest(lm(mlr$residuals~X1+X2+X3+X4,data=data))
## 
##  studentized Breusch-Pagan test
## 
## data:  lm(mlr$residuals ~ X1 + X2 + X3 + X4, data = data)
## BP = 12.591, df = 4, p-value = 0.01345

Decision
\(p-value = 0.01345 < \alpha = 0.05\) means that the H0 is rejected.

Conclusion
There is heteroscedasticity in the residuals

Geographically Weighted Regression Analysis

GWR (Geographically Weighted Regression) serves as a spatial modeling method that offers an alternative approach when global regression (MLR) yields a model with heteroscedastic errors. By incorporating the spatial coordinates of each object, GWR enables the creation of localized models, allowing for a more nuanced understanding of spatial relationships and addressing issues of spatial heterogeneity.

Generate Optimum Bandwith

b <- gwr.sel(Y~X1+X2+X3+X4, 
             coords=cbind(data$latitude,data$longitude),
             data=data, adapt=FALSE,gweight=gwr.Gauss)
## Bandwidth: 16.61727 CV score: 146.6738 
## Bandwidth: 26.86046 CV score: 160.382 
## Bandwidth: 10.28663 CV score: 115.5332 
## Bandwidth: 6.374075 CV score: 74.12735 
## Bandwidth: 3.955985 CV score: 70.86084 
## Bandwidth: 4.701866 CV score: 65.94456 
## Bandwidth: 5.022818 CV score: 65.71401 
## Bandwidth: 4.927587 CV score: 65.67795 
## Bandwidth: 4.936246 CV score: 65.67767 
## Bandwidth: 4.935224 CV score: 65.67767 
## Bandwidth: 4.935183 CV score: 65.67767 
## Bandwidth: 4.935142 CV score: 65.67767 
## Bandwidth: 4.935183 CV score: 65.67767

Based on the output, the optimized bandwidth for the GWR model is determined to be approximately 4.935183.

Modelling GWR with The Generated Bandwith

gwr <- gwr(Y~X1+X2+X3+X4, 
            coords=cbind(data$latitude,data$longitude),bandwidth=b,
            data=data,hatmatrix=TRUE,gweight=gwr.Gauss)
names(gwr)
##  [1] "SDF"       "lhat"      "lm"        "results"   "bandwidth" "adapt"    
##  [7] "hatmatrix" "gweight"   "gTSS"      "this.call" "fp.given"  "timings"
names(gwr$SDF)
##  [1] "sum.w"              "(Intercept)"        "X1"                
##  [4] "X2"                 "X3"                 "X4"                
##  [7] "(Intercept)_se"     "X1_se"              "X2_se"             
## [10] "X3_se"              "X4_se"              "gwr.e"             
## [13] "pred"               "pred.se"            "localR2"           
## [16] "(Intercept)_se_EDF" "X1_se_EDF"          "X2_se_EDF"         
## [19] "X3_se_EDF"          "X4_se_EDF"          "pred.se"
t_X1=gwr$SDF$X1/gwr$SDF$X1_se
t_X2=gwr$SDF$X2/gwr$SDF$X2_se
t_X3=gwr$SDF$X3/gwr$SDF$X3_se
t_X4=gwr$SDF$X4/gwr$SDF$X4_se
gwr$SDF$t_X1 = t_X1
gwr$SDF$t_X2 = t_X2
gwr$SDF$t_X3 = t_X3
gwr$SDF$t_X4 = t_X4

Local GWR Model

Each province’s unique set of observations is leveraged to construct its individualized GWR model, resulting in the following outcomes:

gwr$SDF[,c(2:6, 22:25, 15)]
##               coordinates X.Intercept.        X1           X2          X3
## 1     (4.695135, 96.7494)     43.94621 2.9166391  0.059912483 -17.1237534
## 2     (2.115355, 99.5451)     35.04674 3.3958459  0.054028432  -6.4420558
## 3     (-0.7399397, 100.8)     31.48410 3.3618256  0.025426254   0.7002421
## 4   (0.2933469, 101.7068)     31.66935 3.3484283  0.028351194   0.2719607
## 5   (-1.485183, 102.4381)     29.99260 3.2814684  0.008956870   2.4550678
## 6   (-3.319437, 103.9144)     28.57952 3.2139652 -0.010612419   3.1811522
## 7   (-3.577847, 102.3464)     28.69965 3.2750127 -0.006789254   2.8237814
## 8   (-4.558585, 105.4068)     28.34942 3.1705309 -0.021498056   3.4094236
## 9   (-2.741051, 106.4406)     28.49231 3.1455837 -0.005905151   4.6884782
## 10   (3.945651, 108.1429)     26.76383 3.4758000  0.059666167   5.3602977
## 11  (-6.211544, 106.8452)     28.46126 3.1440758 -0.031389865   3.5671077
## 12  (-7.090911, 107.6689)     28.35714 3.1379398 -0.033035236   3.8472211
## 13  (-7.150975, 110.1403)     26.95443 3.1563842 -0.011890050   6.4624826
## 14  (-7.875385, 110.4262)     26.83899 3.1540506 -0.013031447   6.6284984
## 15  (-7.536064, 112.2384)     25.63871 3.1934093  0.004623216   8.9305002
## 16   (-6.405817, 106.064)     28.46691 3.1557113 -0.034476897   3.0652543
## 17  (-8.409518, 115.1889)     24.74199 3.2693577  0.013399612  10.8344406
## 18  (-8.652933, 117.3616)     25.16391 3.2751783  0.012765904  10.4387402
## 19  (-8.657382, 121.0794)     30.92104 2.7249229  0.007737601   5.6441310
## 20 (-0.2787808, 111.4753)     23.58667 3.3885762  0.057317764  10.3577687
## 21  (-1.681488, 113.3824)     22.23934 3.4455127  0.059800059  11.6012425
## 22  (-3.092642, 115.2838)     21.79156 3.4881644  0.055094741  11.9216520
## 23     (0.5387, 116.4194)     20.31083 3.6811028  0.078001785  10.5314686
## 24     (3.0731, 116.0414)     19.50109 3.7863547  0.091140035   9.7439821
## 25   (0.6246932, 123.975)     32.52768 2.4027618  0.128464301  -4.2773791
## 26  (-1.430025, 121.4456)     26.94207 3.0087670  0.072720021   4.0518296
## 27  (-3.668799, 119.9741)     25.53000 3.1990946  0.047182236   7.3107829
## 28   (-4.14491, 122.1746)     30.37003 2.6521854  0.061401997   3.0985647
## 29  (0.6999372, 122.4467)     28.03747 2.8676421  0.099923773   0.7903947
## 30  (-2.844137, 119.2321)     23.87406 3.3589399  0.052292061   8.4175427
## 31  (-3.238462, 130.1453)     50.45634 0.5833644  0.205221904  -2.6362001
## 32   (1.570999, 127.8088)     46.07619 1.2030198  0.191917588 -11.2923264
## 33  (-1.336115, 133.1747)     48.85850 0.7020728  0.246116699   2.6727015
## 34  (-4.269928, 138.0804)     46.97256 0.7384628  0.265913030   6.7789620
##            X4       t_X1       t_X2        t_X3      t_X4   localR2
## 1  0.04814476  2.5201065  0.8103315 -1.06942582 0.9906552 0.4643943
## 2  0.06743617  4.3506665  0.8867297 -0.66324319 1.5682592 0.7982442
## 3  0.11114930  5.6059365  0.4870317  0.09623519 2.9053724 0.8933404
## 4  0.11023079  5.7683783  0.5556072  0.03839079 2.9784465 0.8954548
## 5  0.14633279  6.3557323  0.1904758  0.37748988 3.9320801 0.9246892
## 6  0.18513652  6.9127138 -0.2476963  0.51673192 4.6947897 0.9481231
## 7  0.17440215  6.4833073 -0.1474229  0.43478203 4.3062376 0.9405208
## 8  0.20089474  7.2741233 -0.5380110  0.57679274 5.1049053 0.9588565
## 9  0.18513198  7.7555063 -0.1613816  0.85466471 5.2101726 0.9559467
## 10 0.11628793  7.7597297  1.7468209  1.06472645 4.0360099 0.9543016
## 11 0.20958839  7.6017074 -0.8274121  0.62426534 5.4467760 0.9660754
## 12 0.21173639  7.8211397 -0.8812484  0.68178710 5.6403201 0.9686560
## 13 0.20054105  8.5535382 -0.3378636  1.24599273 6.2030299 0.9702189
## 14 0.20230022  8.5079736 -0.3512210  1.22659133 6.1838096 0.9713135
## 15 0.18973488  8.9105847  0.1272098  1.70396204 6.2782584 0.9711603
## 16 0.21242081  7.3342555 -0.8632432  0.51518913 5.2405650 0.9653712
## 17 0.17673616  9.2038070  0.3469666  1.84949974 5.9417011 0.9720012
## 18 0.17189592  9.0183647  0.3293828  1.74584401 5.8573502 0.9707677
## 19 0.18516290  6.3801451  0.1925776  0.85273527 6.1672842 0.9650894
## 20 0.14818675  9.7481571  1.8616983  2.51427102 5.0901171 0.9607666
## 21 0.15097282 10.4222903  1.9256155  2.81549154 5.6613555 0.9616920
## 22 0.15302556 10.9252035  1.7830614  2.80984827 6.2082054 0.9618467
## 23 0.13695293  9.9217116  2.4144735  2.10723073 5.3222123 0.9583562
## 24 0.12849569  9.1146074  2.6482010  1.74829985 4.6638256 0.9606888
## 25 0.16027796  5.2459420  3.6900779 -0.62050920 6.3703996 0.9569667
## 26 0.16208625  7.5208027  2.2620220  0.64461591 6.6082645 0.9538437
## 27 0.16245700  8.7581153  1.5002805  1.33595344 6.6216033 0.9564782
## 28 0.17345361  6.7075329  1.8017591  0.47517423 6.7395701 0.9553181
## 29 0.15953069  6.5083162  3.0330499  0.11940389 6.4427646 0.9549225
## 30 0.15634865  9.4097279  1.7010257  1.60829848 6.4448752 0.9563857
## 31 0.07966525  1.2807537  4.3034549 -0.21285836 1.7910238 0.9790792
## 32 0.11290351  2.5635618  4.8462125 -1.40514291 3.4983463 0.9707137
## 33 0.02687652  1.0773905  3.9705451  0.15525793 0.4246041 0.9868351
## 34 0.01544941  0.7520661  2.8710479  0.27262650 0.1594778 0.9936883

Hence, there exist 34 distinct models and local determination coefficient (R2), each tailored to represent the characteristics of a specific province. Below is an exemplar model:

  1. Aceh
    • \(Y=43.95+2.92X_1 +0.06X_2-17.12X_3+0.05X_4\)
    • \(R^2 = 46.44%\)
  2. Sumatera Utara
    • \(Y=35.05+3.40X_1 +0.05X_2-6.44X_3+0.07X_4\)
    • \(R^2 = 79.82%\)
  3. Sumatera Barat
    • \(Y=31.48+3.36X_1+0.03X_2+0.7X_3+0.11X_4\)
    • \(R^2=89.33%\)

Goodness of Fit Test

Hypothesis
H0: \(\beta_j(u_i, v_i)=\beta_j\); \(j=1,2,...,4\)
H1: At least one \(\beta_j(u_i, v_i) \neq \beta_j\)

Critical Region
H0 is rejected if \(p-value<\alpha=0.05\)

Test Statistics

BFC02.gwr.test(gwr)
## 
##  Brunsdon, Fotheringham & Charlton (2002, pp. 91-2) ANOVA
## 
## data:  gwr
## F = 7.2057, df1 = 29.000, df2 = 14.251, p-value = 0.000153
## alternative hypothesis: greater
## sample estimates:
## SS OLS residuals SS GWR residuals 
##         95.62953         13.27137

Decision
\(p-value = 0.000153 < \alpha = 0.05\) means that the H0 is rejected.

Conclusion
There exists a notable disparity between the MLR and GWR models in terms of goodness of fit.

Spatial Effect Testing

Hypothesis
H0: There is spatial effect
H1: There is no spatial effect

Critical Region
H0 is rejected if \(p-value<\alpha=0.05\)

Test Statistics

LMZ.F3GWR.test(gwr)
## 
## Leung et al. (2000) F(3) test
## 
##             F statistic Numerator d.f. Denominator d.f.    Pr(>)   
## (Intercept)     1.69258       13.19849           19.149 0.143198   
## X1              3.10676       15.08858           19.149 0.010563 * 
## X2              3.57655       14.68519           19.149 0.005096 **
## X3              0.59755        7.67783           19.149 0.762904   
## X4              2.08687       13.09254           19.149 0.069812 . 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Decision
Coefficient \(p-value\) Decision (\(\alpha=0.05\))
X1 0.010563 H0 is rejected
X2 0.005096 H0 is not rejected
X3 0.762904 H0 is not rejected
X4 0.069812 H0 is rejected

Conclusion
There are spatial effect in X1 and X2

Classic Assumption Testing

Residuals Normality Distribution

Hypothesis
H0: The residuals of the model are normally distributed H1: The residuals of the model are not normally distributed

Critical Region
H0 is rejected if \(p-value<\alpha=0.05\)

Test Statistics

predlokal=gwr$SDF$"pred"
galatlokal=data$Y-predlokal
galatlokalabs<-abs(galatlokal)
ks.test(galatlokal,"pnorm",mean(galatlokal),sd(galatlokal),alternative=c("two.sided"))
## 
##  Exact one-sample Kolmogorov-Smirnov test
## 
## data:  galatlokal
## D = 0.1105, p-value = 0.7599
## alternative hypothesis: two-sided

Decision
\(p-value = 0.7599 > \alpha = 0.05\) means that the H0 fails to be rejected.

Conclusion
The residuals of the model are normally distributed

Spatial Non-Autocorrelation

Hypothesis
H0: There is no autocorrelation in the residuals
H1: There is autocorrelation in the residuals

Critical Region
H0 is rejected if \(p-value<\alpha=0.05\)

Test Statistics

dwtest(lm(galatlokal~X1+X2+X3+X4, data=data))
## 
##  Durbin-Watson test
## 
## data:  lm(galatlokal ~ X1 + X2 + X3 + X4, data = data)
## DW = 1.951, p-value = 0.3376
## alternative hypothesis: true autocorrelation is greater than 0

Decision
\(p-value = 0.3376 > \alpha = 0.05\) means that the H0 fails to be rejected.

Conclusion
There is no autocorrelation in the residuals

Spatial Heteroscedasticity

Hypothesis
H0: There is no heteroscedasticity in the residuals
H1: There is heteroscedasticity in the residuals

Critical Region
H0 is rejected if \(p-value<\alpha=0.05\)

Test Statistics

bptest(lm(galatlokal~X1+X2+X3+X4,data=data))
## 
##  studentized Breusch-Pagan test
## 
## data:  lm(galatlokal ~ X1 + X2 + X3 + X4, data = data)
## BP = 5.6138, df = 4, p-value = 0.2299

Decision
\(p-value = 0.2299 > \alpha = 0.05\) means that the H0 is rejected.

Conclusion
There is no heteroscedasticity in the residuals



The end