Abstract
This analysis examines the factors that contribute to the development of school infrastructure in Pakistan, a developing country. The study looks at the education performance of 145 cities across Pakistan and uses multiple regression analysis to identify the factors that contribute to school infrastructure. The paper also touches on how these variables affect school retention and enrollment rates. The findings on school infrastructure were as expected: a solid boundary wall around the school, along with sufficient access to electricity, drinking water, and fully functioning toilets, were all strong positive factors.
Introduction
Education is a significant part of society, and the privilege of obtaining it opens access to a wide range of opportunities. Pakistan, a developing country currently facing a serious crisis in its education system, has yet to fully realize this significance. At the time of writing, Pakistan has about 22.8 million out-of-school children aged 5 to 16, which represents about 44 percent of the population in this age group (UNICEF). There are many reasons why Pakistan's education system is underdeveloped, and one of the major contributors is poor school infrastructure. Despite numerous studies on the importance of good school infrastructure for education, Pakistan continues to underperform in this area. The purpose of this paper is to identify what contributes to sufficient school infrastructure and to consider how each district in Pakistan could be required to reach a minimum level, or “score”, of school infrastructure.
Importing Packages and Reading the CSV
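library(tidyverse)  # general data-handling tools
library(moments)    # skewness() and kurtosis()
library(olsrr)      # regression diagnostics, VIF/tolerance and stepwise selection
library(mctest)     # eigprop() collinearity diagnostics
# Read the data; fileEncoding = "UTF-8-BOM" strips the byte-order mark that
# otherwise prepends stray characters to the first column name
df <- read.csv("C:/Users/qures/Downloads/Consolidated (Educational Dataset) (1).csv",
               header = TRUE, fileEncoding = "UTF-8-BOM")
# Check that every column was read as numeric
sapply(df, class)
##        BW        DW       SIS         T         E
## "numeric" "numeric" "numeric" "numeric" "numeric"
# Rename columns: BW = boundary wall, DW = drinking water,
# SIS = school infrastructure score, T = toilet, E = electricity
names(df) <- c("Boundary.Wall","Drinking.Water","School.Infrastructure.Score","Toilet","Electricity")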
#Correlation Matrix
cor(df)
##                             Boundary.Wall Drinking.Water School.Infrastructure.Score    Toilet Electricity
## Boundary.Wall                   1.0000000      0.8410532                   0.9406622 0.9134030   0.8508708
## Drinking.Water                  0.8410532      1.0000000                   0.9360965 0.8727824   0.8871347
## School.Infrastructure.Score     0.9406622      0.9360965                   1.0000000 0.9650095   0.9155520
## Toilet                          0.9134030      0.8727824                   0.9650095 1.0000000   0.8440545
## Electricity                     0.8508708      0.8871347                   0.9155520 0.8440545   1.0000000
Model Creation
#Create linear model
models <- lm(School.Infrastructure.Score~Boundary.Wall+Drinking.Water+Toilet+Electricity,data=df)
summary(models)
##
## Call:
## lm(formula = School.Infrastructure.Score ~ Boundary.Wall + Drinking.Water +
## Toilet + Electricity, data = df)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -10.7822  -2.1247   0.5734   1.5429   9.3985
##
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)
## (Intercept)     0.63933    0.80561   0.794    0.429
## Boundary.Wall   0.21034    0.02798   7.517 6.08e-12 ***
## Drinking.Water  0.24019    0.02544   9.441  < 2e-16 ***
## Toilet          0.35248    0.02558  13.780  < 2e-16 ***
## Electricity     0.14947    0.02415   6.189 6.29e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.616 on 140 degrees of freedom
## Multiple R-squared: 0.9838, Adjusted R-squared: 0.9833
## F-statistic: 2126 on 4 and 140 DF, p-value: < 2.2e-16
The results show that the school infrastructure score is strongly and positively associated with having a boundary wall, a supply of drinking water, access to electricity, and a sufficient number of sanitary toilets in schools. All four predictors are significant at the 0.1% level, toilets have the largest coefficient (0.352) and electricity the smallest (0.149), and together the four variables explain about 98% of the variation in the score (adjusted R-squared = 0.983). The main problem is that these requirements are not met in much of Pakistan: it is reported that 48% of schools in the country lack toilets, boundary walls, electricity, or drinking water (Khan & Haq, 2016). Other factors likely also contribute to school infrastructure, such as the share of GDP spent on education or the poor repair and maintenance of existing infrastructure.
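For reference, the coefficient table from `summary(models)` can be written out as a fitted equation, using the original CSV column abbreviations (SIS, BW, DW, T, E). The scale of the indicators is not stated in this section, so the worked reading below is deliberately unit-agnostic.

```latex
\widehat{\mathrm{SIS}} = 0.63933 + 0.21034\,\mathrm{BW} + 0.24019\,\mathrm{DW} + 0.35248\,\mathrm{T} + 0.14947\,\mathrm{E}
```

Holding the other three predictors fixed, a 10-unit increase in T raises the predicted score by 10 × 0.35248 ≈ 3.52 units, while the same increase in E raises it by about 1.49 units. The intercept (0.63933) is not statistically distinguishable from zero (p = 0.429).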
Model Diagnostics
#Jackknife Residuals
sort(rstudent(models))
## 57 58 59 139 132 140 125 134 32 127 46 61
## -3.16120830 -2.85531289 -2.66868122 -2.11386602 -2.01525075 -1.70000979 -1.63184059 -1.58573432 -1.53572643 -1.46999507 -1.45033620 -1.40206689
## 128 144 126 131 56 141 15 136 142 53 145 37
## -1.37410160 -1.31523786 -1.30063184 -1.27104259 -1.21793843 -1.14824835 -1.12011431 -1.11775584 -1.10753844 -1.06986440 -1.03718432 -0.98480938
## 54 123 135 130 137 20 133 27 21 129 36 49
## -0.98193415 -0.93608547 -0.87896993 -0.86962789 -0.84282367 -0.82821016 -0.82762220 -0.78459578 -0.73416607 -0.73018985 -0.70764269 -0.65529023
## 44 124 23 138 34 30 11 102 84 60 35 19
## -0.60413111 -0.58688671 -0.56408851 -0.55876119 -0.48564641 -0.38099421 -0.37041403 -0.37037878 -0.35333363 -0.26123729 -0.25809819 -0.23293936
## 110 107 7 33 1 29 50 114 18 104 25 24
## -0.22379853 -0.19507425 -0.18842805 -0.18485206 -0.18054079 -0.17924531 -0.16621501 -0.13478299 -0.12119578 -0.08027441 -0.06443375 -0.05617945
## 90 115 9 105 119 14 98 3 93 103 95 47
## -0.03968436 -0.03856111 -0.02874915 -0.01709843 0.01958597 0.02220965 0.02698759 0.04365801 0.06886204 0.09293207 0.14348319 0.15199336
## 116 40 113 117 109 41 111 88 48 22 101 106
## 0.15969705 0.16281593 0.16493946 0.16871594 0.16873486 0.17109333 0.19040105 0.19750636 0.21348836 0.21521801 0.22792505 0.23114465
## 89 118 108 8 79 120 92 16 82 65 99 39
## 0.23622923 0.24709972 0.25262161 0.27813350 0.27827054 0.27934399 0.28424606 0.28670012 0.28929910 0.30433245 0.31835030 0.32842533
## 94 122 4 100 96 28 80 10 121 5 112 45
## 0.32917877 0.35471752 0.35592559 0.36279253 0.37735014 0.38080814 0.38246375 0.38712628 0.38991604 0.39702539 0.40233477 0.42554832
## 52 78 97 87 70 143 66 31 13 63 68 6
## 0.43127996 0.44053391 0.44561058 0.45092253 0.49655009 0.50894731 0.57279572 0.57327989 0.60007799 0.64633014 0.67545350 0.71456477
## 42 91 74 69 38 86 2 43 72 76 71 75
## 0.71984089 0.72719423 0.75568548 0.78453106 0.82068868 0.83614408 0.83986215 1.02308884 1.04956559 1.30653673 1.31899102 1.39354602
## 51 81 83 55 62 67 12 17 26 64 73 77
## 1.45204490 1.51134352 1.60165448 1.71619281 1.83692064 1.85809070 1.86458518 1.86674757 2.11479792 2.25507402 2.26464274 2.48577509
## 85
## 2.70813292
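Jackknife.Residuals <- rstudent(models)
# Residuals vs fitted values: look for curvature or a funnel shape
plot(fitted.values(models), Jackknife.Residuals,
     xlab = "Fitted values", ylab = "Jackknife (studentized) residuals")
abline(h = 0, lty = 2)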
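The jackknife (externally studentized) residual for observation i divides its ordinary residual by an estimate of the error standard deviation that leaves observation i out. The sketch below uses simulated data rather than the original CSV (the names `sim`, `fit` and `jackknife` are hypothetical). It computes these residuals from the leverages with the standard deletion formula, so no model has to be refitted, and checks them against `rstudent()`.

```r
# Sketch: externally studentized ("jackknife") residuals computed by hand.
# Simulated data stand in for the original CSV (assumption).
set.seed(1)
n <- 145
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- 1 + 2 * sim$x1 - sim$x2 + rnorm(n)

fit <- lm(y ~ x1 + x2, data = sim)

e  <- residuals(fit)          # ordinary residuals e_i
h  <- hatvalues(fit)          # leverages h_ii
p  <- length(coef(fit))       # number of estimated coefficients
s2 <- sum(e^2) / (n - p)      # residual variance of the full model

# Residual variance with observation i deleted, via the deletion formula
s2_del <- ((n - p) * s2 - e^2 / (1 - h)) / (n - p - 1)

jackknife <- e / sqrt(s2_del * (1 - h))

# Should agree with R's built-in rstudent()
all.equal(unname(jackknife), unname(rstudent(fit)))
```

Applied to `models`, the sorted output above shows observation 57 (-3.16) as the only case whose jackknife residual exceeds 3 in absolute value.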
#Skewness and Kurtosis
skewness(Jackknife.Residuals)
## [1] -0.1356942
kurtosis(Jackknife.Residuals)  # Pearson (not excess) kurtosis: about 3 for normal data
## [1] 3.799746
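`moments::skewness()` and `moments::kurtosis()` appear to use simple moment-based definitions. In particular, the kurtosis they report is not excess kurtosis, so a normal distribution gives 3 rather than 0. The base R sketch below reproduces those definitions (the function names are hypothetical) on a small symmetric vector.

```r
# Sketch of moment-based skewness and (Pearson, non-excess) kurtosis
sample_skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^(3 / 2)
}

sample_kurtosis <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2    # about 3 for normally distributed data
}

x <- c(-2, -1, 0, 1, 2)
sample_skewness(x)   # 0 for symmetric data
sample_kurtosis(x)   # 1.7: lighter tails than a normal distribution
```

On this scale, the reported kurtosis of 3.80 corresponds to an excess kurtosis of about 0.80, i.e. slightly heavier tails than normal, while the skewness of -0.14 is close to zero.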
#Overview of diagnostics
ols_plot_diagnostics(models)
#VIF
ols_vif_tol(models)
##        Variables Tolerance      VIF
## 1  Boundary.Wall 0.1434641 6.970384
## 2 Drinking.Water 0.1595011 6.269548
## 3         Toilet 0.1281554 7.803027
## 4    Electricity 0.1752573 5.705895
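Each tolerance is 1 - R_j^2, where R_j^2 comes from regressing predictor j on the remaining predictors, and the VIF is its reciprocal. The sketch below computes both with base R auxiliary regressions on simulated, deliberately correlated predictors. The data frame `X` and its columns `a`, `b`, `c` are hypothetical stand-ins for the four infrastructure variables.

```r
# Sketch: tolerance and VIF from auxiliary regressions (simulated data)
set.seed(42)
n <- 145
z <- rnorm(n)                        # shared component induces correlation
X <- data.frame(
  a = z + rnorm(n, sd = 0.4),
  b = z + rnorm(n, sd = 0.4),
  c = z + rnorm(n, sd = 0.4)
)

vif_manual <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  aux <- lm(reformulate(others, response = v), data = X)
  r2 <- summary(aux)$r.squared       # share of v explained by the others
  c(Tolerance = 1 - r2, VIF = 1 / (1 - r2))
})
t(vif_manual)                        # one row per predictor
```

Read this way, the tolerance of 0.128 for Toilet means that about 87% of its variation is explained by the other three predictors.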
#Eigenvalues (CI and CN)
eigprop(models)
##
## Call:
## eigprop(mod = models)
##
##   Eigenvalues      CI (Intercept) Boundary.Wall Drinking.Water Toilet Electricity
## 1      4.7297  1.0000      0.0053        0.0011         0.0015 0.0014      0.0021
## 2      0.1822  5.0944      0.5810        0.0001         0.0048 0.0122      0.0380
## 3      0.0455 10.1952      0.0469        0.0760         0.0420 0.2631      0.4423
## 4      0.0273 13.1674      0.0043        0.1500         0.7649 0.0091      0.3579
## 5      0.0153 17.6056      0.3624        0.7729         0.1867 0.7142      0.1597
##
## ===============================
## Row 5==> Boundary.Wall, proportion 0.772891 >= 0.50
## Row 4==> Drinking.Water, proportion 0.764940 >= 0.50
## Row 5==> Toilet, proportion 0.714241 >= 0.50
eigprop(models,Inter=FALSE)
##
## Call:
## eigprop(mod = models, Inter = FALSE)
##
##   Eigenvalues      CI Boundary.Wall Drinking.Water Toilet Electricity
## 1      3.9012  1.0000        0.0023         0.0024 0.0023      0.0034
## 2      0.0489  8.9323        0.1541         0.0262 0.1535      0.5919
## 3      0.0276 11.8874        0.1231         0.9409 0.0082      0.3911
## 4      0.0223 13.2212        0.7205         0.0306 0.8360      0.0136
##
## ===============================
## Row 4==> Boundary.Wall, proportion 0.720498 >= 0.50
## Row 3==> Drinking.Water, proportion 0.940877 >= 0.50
## Row 4==> Toilet, proportion 0.836027 >= 0.50
## Row 2==> Electricity, proportion 0.591878 >= 0.50
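The eigenvalues and condition indices reported by `eigprop()` appear to follow Belsley's approach: scale each column of the design matrix to unit length, take the eigenvalues of the scaled cross-product matrix, and define each condition index as sqrt(largest eigenvalue / eigenvalue). The output above is consistent with this, since sqrt(4.7297 / 0.0153) ≈ 17.6 matches the largest CI and the five eigenvalues sum to 5, the number of columns. The sketch below reproduces the calculation with base R's `svd()` on simulated data (`dat`, `fit` and `props` are hypothetical names); it is not mctest's own code.

```r
# Sketch of Belsley-style collinearity diagnostics on simulated data
set.seed(7)
n <- 145
z <- rnorm(n)
dat <- data.frame(x1 = z + rnorm(n, sd = 0.3), x2 = z + rnorm(n, sd = 0.3))
dat$y <- 1 + dat$x1 + dat$x2 + rnorm(n)

fit <- lm(y ~ x1 + x2, data = dat)
X  <- model.matrix(fit)                       # includes the (Intercept) column
Xs <- sweep(X, 2, sqrt(colSums(X^2)), "/")    # scale columns to unit length

sv <- svd(Xs)
eigenvalues <- sv$d^2                         # eigenvalues of t(Xs) %*% Xs
cond_index  <- sqrt(max(eigenvalues) / eigenvalues)

# Variance-decomposition proportions: share of each coefficient's variance
# tied to each dimension (rows = dimensions, columns = coefficients)
phi   <- sweep(sv$v^2, 2, sv$d^2, "/")        # phi[k, j] = v_kj^2 / d_j^2
props <- t(phi / rowSums(phi))
colnames(props) <- colnames(X)

round(cbind(Eigenvalues = eigenvalues, CI = cond_index, props), 4)
```

A commonly cited rule of thumb flags harmful collinearity when a dimension has a condition index above 30 and two or more coefficients have variance proportions above 0.5 on it. In the output above, dimension 5 carries three proportions above 0.5, but its index is only 17.6.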
#Stepwise approach
ols_step_both_p(models)
##
##                           Stepwise Selection Summary
## ------------------------------------------------------------------------------
##                        Added/                  Adj.
## Step  Variable        Removed   R-Square  R-Square      C(p)       AIC    RMSE
## ------------------------------------------------------------------------------
##    1  Toilet          addition     0.931     0.931  453.4690  994.8074  7.3720
##    2  Drinking.Water  addition     0.968     0.968  135.8170  884.9308  5.0300
##    3  Boundary.Wall   addition     0.979     0.979   41.3090  824.2057  4.0660
##    4  Electricity     addition     0.984     0.983    5.0000  791.1336  3.6157
## ------------------------------------------------------------------------------
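The diagnostics indicate only a slight departure from normality: the jackknife residuals have a skewness of -0.14 and a kurtosis of 3.80 (a normal distribution has 3), so their tails are slightly heavier than normal. For collinearity, the condition indices were computed and none exceeded 30, the threshold usually taken to signal severe collinearity. The largest indices (17.6 with the intercept and 13.2 without) do fall in the 10 to 30 range often read as moderate collinearity, and the VIFs of 5.7 to 7.8 and pairwise predictor correlations of 0.84 to 0.91 point the same way, so individual coefficients should be interpreted with some caution. Stepwise selection was then used to check the choice of predictors: it added all four variables, with AIC falling at every step (from 994.8 to 791.1), so the full four-predictor multiple regression of the school infrastructure score was retained.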
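`ols_step_both_p()` enters and removes variables by p-value. As a cross-check with a different criterion, base R's `step()` performs bidirectional selection by AIC. The sketch below runs it on simulated data with four genuinely related predictors (`sim`, `null_model` and `chosen` are hypothetical names); it illustrates the AIC-based alternative, not olsrr's procedure.

```r
# Sketch: bidirectional stepwise selection by AIC with base R's step()
set.seed(123)
n <- 145
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
sim$y <- 1 + 0.9 * sim$x1 + 0.7 * sim$x2 + 0.5 * sim$x3 + 0.4 * sim$x4 + rnorm(n)

null_model <- lm(y ~ 1, data = sim)           # start from the intercept only
chosen <- step(null_model,
               scope = ~ x1 + x2 + x3 + x4,  # upper scope; lower is empty
               direction = "both",
               trace = 0)                     # suppress step-by-step printout

formula(chosen)
```

`step()` scores models with `extractAIC()`, which for linear models differs from `AIC()` only by an additive constant at a fixed sample size. The rankings therefore agree, but its AIC values are not directly comparable to the AIC column in the olsrr table.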