The objective of this Spatial Statistics project is to construct a
regression model employing spatial methodologies. The intention is to
explore the spatial relationships within the data to enhance the
predictive power of the model. The dataset being utilized for this
analysis is obtained from the reputable source, Badan Pusat Statistik
(BPS). By incorporating spatial considerations into the regression
framework, we aim to capture any geographical dependencies or spatial
patterns that may exist in the data, thereby providing more accurate and
robust modeling results.
MLR : Multiple Linear Regression
GWR : Geographically Weighted Regression
In this project, Y represents the dependent variable, while X
represents the independent variable. The variables employed in this
analysis are as follows:
Y : Human Development Index (IPM)
X1 : School Average Duration (RLS)
X2 : Percentage of 15-49 Years Old Women with Married Status
and Using Contraception Tools
X3 : Gini Ratio
X4 : Percentage of Student of Age 5-24 Years Old Using
Internet for The Last 3 Months
library(spgwr)
library(readxl)
library(car)
library(lmtest)
library(zoo)
data <- read_excel("D:/JAMBORE/education.xlsx")
data
## # A tibble: 34 × 8
## Provinsi Y X1 X2 X3 X4 latitude longitude
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Provinsi Aceh 72.8 9.79 43.6 0.291 53.5 4.70 96.7
## 2 Provinsi Sumatera Utara 72.7 9.99 42.9 0.326 71.8 2.12 99.5
## 3 Provinsi Sumatera Barat 73.3 9.51 45.6 0.292 72.7 -0.740 101.
## 4 Provinsi Riau 73.5 9.54 48.2 0.323 73.9 0.293 102.
## 5 Provinsi Jambi 72.1 9.07 63.6 0.335 72.2 -1.49 102.
## 6 Provinsi Sumatera Selatan 70.9 8.82 61.7 0.33 75.9 -3.32 104.
## 7 Provinsi Bengkulu 72.2 9.28 61.9 0.315 75.5 -3.58 102.
## 8 Provinsi Lampung 70.4 8.61 66.1 0.313 81.2 -4.56 105.
## 9 Provinsi Kepulauan Bangka B… 72.2 8.57 61.8 0.255 79.8 -2.74 106.
## 10 Provinsi Kepulauan Riau 76.6 10.5 41.2 0.325 79.4 3.95 108.
## # ℹ 24 more rows
summary(data[,2:6])
## Y X1 X2 X3
## Min. :61.39 Min. : 7.310 Min. :22.10 Min. :0.2550
## 1st Qu.:69.93 1st Qu.: 8.580 1st Qu.:44.29 1st Qu.:0.3095
## Median :72.19 Median : 9.225 Median :52.38 Median :0.3325
## Mean :71.86 Mean : 9.247 Mean :51.46 Mean :0.3432
## 3rd Qu.:73.22 3rd Qu.: 9.777 3rd Qu.:61.45 3rd Qu.:0.3698
## Max. :81.65 Max. :11.300 Max. :67.92 Max. :0.4590
## X4
## Min. :30.89
## 1st Qu.:68.74
## Median :75.12
## Mean :72.39
## 3rd Qu.:81.06
## Max. :94.08
These plots illustrate the linearity between each independent variable and the dependent variable.
attach(data)
par(mfrow=c(2,2))
plot(X1, Y, main="Y vs X1")
plot(X2, Y, main="Y vs X2")
plot(X3, Y, main="Y vs X3")
plot(X4, Y, main="Y vs X4")
First, we’re going to check how the data was created using the MLR method. Then, we’ll compare it with the GWR method. This helps us understand how the data varies depending on the method used. It’s like comparing two ways of cooking to see which one tastes better.
mlr<-lm(formula = data$Y ~ data$X1 + data$X2 + data$X3 + data$X4,
data = data)
summary(mlr)
##
## Call:
## lm(formula = data$Y ~ data$X1 + data$X2 + data$X3 + data$X4,
## data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.3203 -0.8071 0.2453 0.9406 3.4606
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 30.10177 5.41258 5.561 5.34e-06 ***
## data$X1 2.62842 0.46498 5.653 4.15e-06 ***
## data$X2 0.05298 0.04416 1.200 0.239980
## data$X3 8.25094 7.36750 1.120 0.271942
## data$X4 0.16431 0.03734 4.400 0.000134 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.816 on 29 degrees of freedom
## Multiple R-squared: 0.818, Adjusted R-squared: 0.7929
## F-statistic: 32.59 on 4 and 29 DF, p-value: 2.394e-10
vif(mlr)
## data$X1 data$X2 data$X3 data$X4
## 1.442920 2.246113 1.125901 2.308783
Now, we have obtained the VIF (Variance Inflation Factor) value for each independent variable, where VIFs less than 10 indicate that there is no multicollinearity among the independent variables. Therefore, these variables are suitable for inclusion in the multiple linear regression model.
After executing the data modeling process, the results reveal the
model obtained through the MLR (Multiple Linear Regression) method,
which is outlined below:
\(Y = 30.10177 + 2.62842X_1 + 0.05298X_2 +
8.25094X_3 + 0.16431X_4\)
In addition, an essential step in inferential statistics involves conducting hypothesis testing. By employing the Hypothesis Testing method with a significance level \(\alpha =0.05\), we can perform both Simultaneous and Partial Hypothesis Testing.
Hypothesis
H0: \(\beta_1=\beta_2=...=\beta_j=0\); \(j=1,2,...,4\)
H1: At least one \(\beta_j \neq
0\)
Critical Region
H0 is rejected if \(p-value<\alpha=0.05\)
Test Statistics
\(p-value = 2.394 \times 10^{-10} < \alpha
= 0.05\)
Based on the test conducted above, the null hypothesis (H0)
is rejected which implies that the parameters incorporated within the
model exert a notable influence on the dependent variable.
Since there are four parameters representing each independent variable, the test will be applied to each parameter individually. The hypothesis for each parameter can be expressed as follows:
Hypothesis
H0: \(\beta_j=0\);
H1: \(\beta_j \neq 0\)
Critical Region
H0 is rejected if \(p-value<\alpha=0.05\)
| Coefficient | \(p-value\) | Decision (\(\alpha=0.05\)) |
|---|---|---|
| X1 | \(4.15 \times 10^{-6}\) | H0 is rejected |
| X2 | \(0.239980\) | H0 is not rejected |
| X3 | \(0.271942\) | H0 is not rejected |
| X4 | \(0.000134\) | H0 is rejected |
Hence, the independent variables significantly influencing the dependent values are X1 and X4, representing ‘School Average Duration’ and ‘Percentage of Students Aged 5-24 Using the Internet in the Last 3 Months’ respectively.
R2 = 81.8%
Approximately 81.8% of the variability observed in the dependent
variable can be explained by the independent variables included in the
regression model. In other words, the model is able to account for a
large proportion of the variation in the dependent variable. This
suggests that the independent variables in the model are providing a
good fit to the data and are effective in explaining the changes in the
dependent variable.
Hypothesis
H0: The residuals of the model are normally distributed
H1: The residuals of the model are not normally
distributed
Critical Region
H0 is rejected if \(p-value<\alpha=0.05\)
Test Statistics
residu<-abs(mlr$residuals)
res=mlr$residual
ks.test(res,"pnorm",mean(res),sd(res),alternative=c("two.sided"))
##
## Exact one-sample Kolmogorov-Smirnov test
##
## data: res
## D = 0.11087, p-value = 0.7564
## alternative hypothesis: two-sided
Decision
\(p-value = 0.7564 > \alpha = 0.05\)
means that the H0 fails to be rejected.
Conclusion
The residuals of the model are normally distributed
Hypothesis
H0: There is no autocorrelation in the residuals
H1: There is autocorrelation in the residuals
Critical Region
H0 is rejected if \(p-value<\alpha=0.05\)
Test Statistics
dwtest(lm(mlr$residuals~X1+X2+X3+X4,data=data))
##
## Durbin-Watson test
##
## data: lm(mlr$residuals ~ X1 + X2 + X3 + X4, data = data)
## DW = 2.0742, p-value = 0.4757
## alternative hypothesis: true autocorrelation is greater than 0
Decision
\(p-value = 0.4757 > \alpha = 0.05\)
means that the H0 fails to be rejected.
Conclusion
There is no autocorrelation in the residuals
Hypothesis
H0: There is no heteroscedasticity in the residuals
H1: There is heteroscedasticity in the residuals
Critical Region
H0 is rejected if \(p-value<\alpha=0.05\)
Test Statistics
bptest(lm(mlr$residuals~X1+X2+X3+X4,data=data))
##
## studentized Breusch-Pagan test
##
## data: lm(mlr$residuals ~ X1 + X2 + X3 + X4, data = data)
## BP = 12.591, df = 4, p-value = 0.01345
Decision
\(p-value = 0.01345 < \alpha =
0.05\) means that the H0 is rejected.
Conclusion
There is heteroscedasticity in the residuals
GWR (Geographically Weighted Regression) serves as a spatial modeling method that offers an alternative approach when global regression (MLR) yields a model with heteroscedastic errors. By incorporating the spatial coordinates of each object, GWR enables the creation of localized models, allowing for a more nuanced understanding of spatial relationships and addressing issues of spatial heterogeneity.
b <- gwr.sel(Y~X1+X2+X3+X4,
coords=cbind(data$latitude,data$longitude),
data=data, adapt=FALSE,gweight=gwr.Gauss)
## Bandwidth: 16.61727 CV score: 146.6738
## Bandwidth: 26.86046 CV score: 160.382
## Bandwidth: 10.28663 CV score: 115.5332
## Bandwidth: 6.374075 CV score: 74.12735
## Bandwidth: 3.955985 CV score: 70.86084
## Bandwidth: 4.701866 CV score: 65.94456
## Bandwidth: 5.022818 CV score: 65.71401
## Bandwidth: 4.927587 CV score: 65.67795
## Bandwidth: 4.936246 CV score: 65.67767
## Bandwidth: 4.935224 CV score: 65.67767
## Bandwidth: 4.935183 CV score: 65.67767
## Bandwidth: 4.935142 CV score: 65.67767
## Bandwidth: 4.935183 CV score: 65.67767
Based on the output, the optimized bandwidth for the GWR model is determined to be approximately 4.935183.
gwr <- gwr(Y~X1+X2+X3+X4,
coords=cbind(data$latitude,data$longitude),bandwidth=b,
data=data,hatmatrix=TRUE,gweight=gwr.Gauss)
names(gwr)
## [1] "SDF" "lhat" "lm" "results" "bandwidth" "adapt"
## [7] "hatmatrix" "gweight" "gTSS" "this.call" "fp.given" "timings"
names(gwr$SDF)
## [1] "sum.w" "(Intercept)" "X1"
## [4] "X2" "X3" "X4"
## [7] "(Intercept)_se" "X1_se" "X2_se"
## [10] "X3_se" "X4_se" "gwr.e"
## [13] "pred" "pred.se" "localR2"
## [16] "(Intercept)_se_EDF" "X1_se_EDF" "X2_se_EDF"
## [19] "X3_se_EDF" "X4_se_EDF" "pred.se"
t_X1=gwr$SDF$X1/gwr$SDF$X1_se
t_X2=gwr$SDF$X2/gwr$SDF$X2_se
t_X3=gwr$SDF$X3/gwr$SDF$X3_se
t_X4=gwr$SDF$X4/gwr$SDF$X4_se
gwr$SDF$t_X1 = t_X1
gwr$SDF$t_X2 = t_X2
gwr$SDF$t_X3 = t_X3
gwr$SDF$t_X4 = t_X4
Each province’s unique set of observations is leveraged to construct its individualized GWR model, resulting in the following outcomes:
gwr$SDF[,c(2:6, 22:25, 15)]
## coordinates X.Intercept. X1 X2 X3
## 1 (4.695135, 96.7494) 43.94621 2.9166391 0.059912483 -17.1237534
## 2 (2.115355, 99.5451) 35.04674 3.3958459 0.054028432 -6.4420558
## 3 (-0.7399397, 100.8) 31.48410 3.3618256 0.025426254 0.7002421
## 4 (0.2933469, 101.7068) 31.66935 3.3484283 0.028351194 0.2719607
## 5 (-1.485183, 102.4381) 29.99260 3.2814684 0.008956870 2.4550678
## 6 (-3.319437, 103.9144) 28.57952 3.2139652 -0.010612419 3.1811522
## 7 (-3.577847, 102.3464) 28.69965 3.2750127 -0.006789254 2.8237814
## 8 (-4.558585, 105.4068) 28.34942 3.1705309 -0.021498056 3.4094236
## 9 (-2.741051, 106.4406) 28.49231 3.1455837 -0.005905151 4.6884782
## 10 (3.945651, 108.1429) 26.76383 3.4758000 0.059666167 5.3602977
## 11 (-6.211544, 106.8452) 28.46126 3.1440758 -0.031389865 3.5671077
## 12 (-7.090911, 107.6689) 28.35714 3.1379398 -0.033035236 3.8472211
## 13 (-7.150975, 110.1403) 26.95443 3.1563842 -0.011890050 6.4624826
## 14 (-7.875385, 110.4262) 26.83899 3.1540506 -0.013031447 6.6284984
## 15 (-7.536064, 112.2384) 25.63871 3.1934093 0.004623216 8.9305002
## 16 (-6.405817, 106.064) 28.46691 3.1557113 -0.034476897 3.0652543
## 17 (-8.409518, 115.1889) 24.74199 3.2693577 0.013399612 10.8344406
## 18 (-8.652933, 117.3616) 25.16391 3.2751783 0.012765904 10.4387402
## 19 (-8.657382, 121.0794) 30.92104 2.7249229 0.007737601 5.6441310
## 20 (-0.2787808, 111.4753) 23.58667 3.3885762 0.057317764 10.3577687
## 21 (-1.681488, 113.3824) 22.23934 3.4455127 0.059800059 11.6012425
## 22 (-3.092642, 115.2838) 21.79156 3.4881644 0.055094741 11.9216520
## 23 (0.5387, 116.4194) 20.31083 3.6811028 0.078001785 10.5314686
## 24 (3.0731, 116.0414) 19.50109 3.7863547 0.091140035 9.7439821
## 25 (0.6246932, 123.975) 32.52768 2.4027618 0.128464301 -4.2773791
## 26 (-1.430025, 121.4456) 26.94207 3.0087670 0.072720021 4.0518296
## 27 (-3.668799, 119.9741) 25.53000 3.1990946 0.047182236 7.3107829
## 28 (-4.14491, 122.1746) 30.37003 2.6521854 0.061401997 3.0985647
## 29 (0.6999372, 122.4467) 28.03747 2.8676421 0.099923773 0.7903947
## 30 (-2.844137, 119.2321) 23.87406 3.3589399 0.052292061 8.4175427
## 31 (-3.238462, 130.1453) 50.45634 0.5833644 0.205221904 -2.6362001
## 32 (1.570999, 127.8088) 46.07619 1.2030198 0.191917588 -11.2923264
## 33 (-1.336115, 133.1747) 48.85850 0.7020728 0.246116699 2.6727015
## 34 (-4.269928, 138.0804) 46.97256 0.7384628 0.265913030 6.7789620
## X4 t_X1 t_X2 t_X3 t_X4 localR2
## 1 0.04814476 2.5201065 0.8103315 -1.06942582 0.9906552 0.4643943
## 2 0.06743617 4.3506665 0.8867297 -0.66324319 1.5682592 0.7982442
## 3 0.11114930 5.6059365 0.4870317 0.09623519 2.9053724 0.8933404
## 4 0.11023079 5.7683783 0.5556072 0.03839079 2.9784465 0.8954548
## 5 0.14633279 6.3557323 0.1904758 0.37748988 3.9320801 0.9246892
## 6 0.18513652 6.9127138 -0.2476963 0.51673192 4.6947897 0.9481231
## 7 0.17440215 6.4833073 -0.1474229 0.43478203 4.3062376 0.9405208
## 8 0.20089474 7.2741233 -0.5380110 0.57679274 5.1049053 0.9588565
## 9 0.18513198 7.7555063 -0.1613816 0.85466471 5.2101726 0.9559467
## 10 0.11628793 7.7597297 1.7468209 1.06472645 4.0360099 0.9543016
## 11 0.20958839 7.6017074 -0.8274121 0.62426534 5.4467760 0.9660754
## 12 0.21173639 7.8211397 -0.8812484 0.68178710 5.6403201 0.9686560
## 13 0.20054105 8.5535382 -0.3378636 1.24599273 6.2030299 0.9702189
## 14 0.20230022 8.5079736 -0.3512210 1.22659133 6.1838096 0.9713135
## 15 0.18973488 8.9105847 0.1272098 1.70396204 6.2782584 0.9711603
## 16 0.21242081 7.3342555 -0.8632432 0.51518913 5.2405650 0.9653712
## 17 0.17673616 9.2038070 0.3469666 1.84949974 5.9417011 0.9720012
## 18 0.17189592 9.0183647 0.3293828 1.74584401 5.8573502 0.9707677
## 19 0.18516290 6.3801451 0.1925776 0.85273527 6.1672842 0.9650894
## 20 0.14818675 9.7481571 1.8616983 2.51427102 5.0901171 0.9607666
## 21 0.15097282 10.4222903 1.9256155 2.81549154 5.6613555 0.9616920
## 22 0.15302556 10.9252035 1.7830614 2.80984827 6.2082054 0.9618467
## 23 0.13695293 9.9217116 2.4144735 2.10723073 5.3222123 0.9583562
## 24 0.12849569 9.1146074 2.6482010 1.74829985 4.6638256 0.9606888
## 25 0.16027796 5.2459420 3.6900779 -0.62050920 6.3703996 0.9569667
## 26 0.16208625 7.5208027 2.2620220 0.64461591 6.6082645 0.9538437
## 27 0.16245700 8.7581153 1.5002805 1.33595344 6.6216033 0.9564782
## 28 0.17345361 6.7075329 1.8017591 0.47517423 6.7395701 0.9553181
## 29 0.15953069 6.5083162 3.0330499 0.11940389 6.4427646 0.9549225
## 30 0.15634865 9.4097279 1.7010257 1.60829848 6.4448752 0.9563857
## 31 0.07966525 1.2807537 4.3034549 -0.21285836 1.7910238 0.9790792
## 32 0.11290351 2.5635618 4.8462125 -1.40514291 3.4983463 0.9707137
## 33 0.02687652 1.0773905 3.9705451 0.15525793 0.4246041 0.9868351
## 34 0.01544941 0.7520661 2.8710479 0.27262650 0.1594778 0.9936883
Hence, there exist 34 distinct models and local determination coefficient (R2), each tailored to represent the characteristics of a specific province. Below is an exemplar model:
Hypothesis
H0: \(\beta_j(u_i,
v_i)=\beta_j\); \(j=1,2,...,4\)
H1: At least one \(\beta_j(u_i,
v_i) \neq \beta_j\)
Critical Region
H0 is rejected if \(p-value<\alpha=0.05\)
Test Statistics
BFC02.gwr.test(gwr)
##
## Brunsdon, Fotheringham & Charlton (2002, pp. 91-2) ANOVA
##
## data: gwr
## F = 7.2057, df1 = 29.000, df2 = 14.251, p-value = 0.000153
## alternative hypothesis: greater
## sample estimates:
## SS OLS residuals SS GWR residuals
## 95.62953 13.27137
Decision
\(p-value = 0.000153 < \alpha =
0.05\) means that the H0 is rejected.
Conclusion
There exists a notable disparity between the MLR and GWR models in terms
of goodness of fit.
Hypothesis
H0: There is spatial effect
H1: There is no spatial effect
Critical Region
H0 is rejected if \(p-value<\alpha=0.05\)
Test Statistics
LMZ.F3GWR.test(gwr)
##
## Leung et al. (2000) F(3) test
##
## F statistic Numerator d.f. Denominator d.f. Pr(>)
## (Intercept) 1.69258 13.19849 19.149 0.143198
## X1 3.10676 15.08858 19.149 0.010563 *
## X2 3.57655 14.68519 19.149 0.005096 **
## X3 0.59755 7.67783 19.149 0.762904
## X4 2.08687 13.09254 19.149 0.069812 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Decision| Coefficient | \(p-value\) | Decision (\(\alpha=0.05\)) |
|---|---|---|
| X1 | 0.010563 | H0 is rejected |
| X2 | 0.005096 | H0 is not rejected |
| X3 | 0.762904 | H0 is not rejected |
| X4 | 0.069812 | H0 is rejected |
Conclusion
There are spatial effect in X1 and X2
Hypothesis
H0: The residuals of the model are normally distributed
H1: The residuals of the model are not normally
distributed
Critical Region
H0 is rejected if \(p-value<\alpha=0.05\)
Test Statistics
predlokal=gwr$SDF$"pred"
galatlokal=data$Y-predlokal
galatlokalabs<-abs(galatlokal)
ks.test(galatlokal,"pnorm",mean(galatlokal),sd(galatlokal),alternative=c("two.sided"))
##
## Exact one-sample Kolmogorov-Smirnov test
##
## data: galatlokal
## D = 0.1105, p-value = 0.7599
## alternative hypothesis: two-sided
Decision
\(p-value = 0.7599 > \alpha = 0.05\)
means that the H0 fails to be rejected.
Conclusion
The residuals of the model are normally distributed
Hypothesis
H0: There is no autocorrelation in the residuals
H1: There is autocorrelation in the residuals
Critical Region
H0 is rejected if \(p-value<\alpha=0.05\)
Test Statistics
dwtest(lm(galatlokal~X1+X2+X3+X4, data=data))
##
## Durbin-Watson test
##
## data: lm(galatlokal ~ X1 + X2 + X3 + X4, data = data)
## DW = 1.951, p-value = 0.3376
## alternative hypothesis: true autocorrelation is greater than 0
Decision
\(p-value = 0.3376 > \alpha = 0.05\)
means that the H0 fails to be rejected.
Conclusion
There is no autocorrelation in the residuals
Hypothesis
H0: There is no heteroscedasticity in the residuals
H1: There is heteroscedasticity in the residuals
Critical Region
H0 is rejected if \(p-value<\alpha=0.05\)
Test Statistics
bptest(lm(galatlokal~X1+X2+X3+X4,data=data))
##
## studentized Breusch-Pagan test
##
## data: lm(galatlokal ~ X1 + X2 + X3 + X4, data = data)
## BP = 5.6138, df = 4, p-value = 0.2299
Decision
\(p-value = 0.2299 > \alpha = 0.05\)
means that the H0 is rejected.
Conclusion
There is no heteroscedasticity in the residuals