Abstract
This analysis examines the factors that contribute to the development of school infrastructure in Pakistan, a developing country. It covers the education performance of 145 cities across Pakistan and uses multiple regression to estimate how each factor relates to a school infrastructure score. The paper also touches on how these variables affect school retention and enrollment rates. The findings were as expected: a solid boundary wall around the school, sufficient access to electricity and drinking water, and fully functioning toilets were all major positive factors.

Introduction

Education is a significant part of our society, and the privilege of obtaining it opens up a wide range of opportunities. Pakistan, a developing country whose education system is currently in serious crisis, has not yet fully realized this significance. Today, about 22.8 million Pakistani children aged 5-16 are out of school, which represents about 44 percent of the population in this age group (UNICEF). There are many reasons why Pakistan's education system is so underdeveloped, and one of the major contributors is poor school infrastructure. Despite various studies on the importance of good school infrastructure for education, Pakistan has been underperforming in this area. The purpose of this paper is to identify what contributes to sufficient school infrastructure and to consider how each district in Pakistan could be required to reach a minimum level, or "score", of school infrastructure.

Importing packages and reading in CSV

library(tidyverse)
library(moments)
library(olsrr)
library(mctest)
library(base)

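# Read the CSV and check the class of each column
# fileEncoding = "UTF-8-BOM" drops the byte-order mark that otherwise garbles the first column name
df <- read.csv("C:/Users/qures/Downloads/Consolidated (Educational Dataset) (1).csv",header=TRUE,fileEncoding="UTF-8-BOM")
sapply(df,class)
##        BW        DW       SIS         T         E 
## "numeric" "numeric" "numeric" "numeric" "numeric"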
#Rename columns
names(df) <- c("Boundary.Wall","Drinking.Water","School.Infrastructure.Score","Toilet","Electricity")

#Correlation Matrix
cor(df)
##                             Boundary.Wall Drinking.Water School.Infrastructure.Score    Toilet Electricity
## Boundary.Wall                   1.0000000      0.8410532                   0.9406622 0.9134030   0.8508708
## Drinking.Water                  0.8410532      1.0000000                   0.9360965 0.8727824   0.8871347
## School.Infrastructure.Score     0.9406622      0.9360965                   1.0000000 0.9650095   0.9155520
## Toilet                          0.9134030      0.8727824                   0.9650095 1.0000000   0.8440545
## Electricity                     0.8508708      0.8871347                   0.9155520 0.8440545   1.0000000

Model Creation

#Create linear model
models <- lm(School.Infrastructure.Score~Boundary.Wall+Drinking.Water+Toilet+Electricity,data=df)
summary(models)
## 
## Call:
## lm(formula = School.Infrastructure.Score ~ Boundary.Wall + Drinking.Water + 
##     Toilet + Electricity, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.7822  -2.1247   0.5734   1.5429   9.3985 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     0.63933    0.80561   0.794    0.429    
## Boundary.Wall   0.21034    0.02798   7.517 6.08e-12 ***
## Drinking.Water  0.24019    0.02544   9.441  < 2e-16 ***
## Toilet          0.35248    0.02558  13.780  < 2e-16 ***
## Electricity     0.14947    0.02415   6.189 6.29e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.616 on 140 degrees of freedom
## Multiple R-squared:  0.9838, Adjusted R-squared:  0.9833 
## F-statistic:  2126 on 4 and 140 DF,  p-value: < 2.2e-16



The results of this analysis show that the school infrastructure score rises with the presence of a boundary wall, a supply of fresh water, access to electricity and a sufficient number of sanitary toilets. All four coefficients are positive and significant at the 0.001 level, toilets have the largest coefficient (0.35), and the model explains about 98% of the variation in the score (adjusted R-squared of 0.983). The main problem is that these requirements are not met across much of Pakistan: it is reported that 48% of schools in the country do not have toilets, boundary walls, electricity or drinking water (Khan & Haq, 2016). It is likely that other factors also contribute to school infrastructure, such as the share of GDP allocated to education or the poor repair and maintenance of existing infrastructure.
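
To make the coefficients concrete, the fitted equation can be applied by hand. The sketch below copies the estimates from the summary(models) output and applies them to a hypothetical district; the coverage values are invented for illustration, and it assumes the predictors are measured on a 0-100 scale, which the dataset description does not confirm.

```r
# Coefficient estimates copied from the summary(models) output
coefs <- c(
  "(Intercept)"  = 0.63933,
  Boundary.Wall  = 0.21034,
  Drinking.Water = 0.24019,
  Toilet         = 0.35248,
  Electricity    = 0.14947
)

# Hypothetical district (illustrative values, assumed to be on a 0-100 scale)
district <- c(
  "(Intercept)"  = 1,
  Boundary.Wall  = 60,
  Drinking.Water = 50,
  Toilet         = 40,
  Electricity    = 30
)

# Predicted score = intercept + sum of coefficient * predictor value
predicted_score <- sum(coefs * district[names(coefs)])
predicted_score

# Expected change in the score if toilet coverage rises by 10 points,
# holding the other predictors fixed
toilet_gain <- unname(coefs["Toilet"] * 10)
toilet_gain
```

With the fitted model object available, predict(models, newdata = ...) would give the same prediction without copying coefficients by hand.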

Model Diagnostics

#Jackknife Residuals
sort(rstudent(models))
##          57          58          59         139         132         140         125         134          32         127          46          61 
## -3.16120830 -2.85531289 -2.66868122 -2.11386602 -2.01525075 -1.70000979 -1.63184059 -1.58573432 -1.53572643 -1.46999507 -1.45033620 -1.40206689 
##         128         144         126         131          56         141          15         136         142          53         145          37 
## -1.37410160 -1.31523786 -1.30063184 -1.27104259 -1.21793843 -1.14824835 -1.12011431 -1.11775584 -1.10753844 -1.06986440 -1.03718432 -0.98480938 
##          54         123         135         130         137          20         133          27          21         129          36          49 
## -0.98193415 -0.93608547 -0.87896993 -0.86962789 -0.84282367 -0.82821016 -0.82762220 -0.78459578 -0.73416607 -0.73018985 -0.70764269 -0.65529023 
##          44         124          23         138          34          30          11         102          84          60          35          19 
## -0.60413111 -0.58688671 -0.56408851 -0.55876119 -0.48564641 -0.38099421 -0.37041403 -0.37037878 -0.35333363 -0.26123729 -0.25809819 -0.23293936 
##         110         107           7          33           1          29          50         114          18         104          25          24 
## -0.22379853 -0.19507425 -0.18842805 -0.18485206 -0.18054079 -0.17924531 -0.16621501 -0.13478299 -0.12119578 -0.08027441 -0.06443375 -0.05617945 
##          90         115           9         105         119          14          98           3          93         103          95          47 
## -0.03968436 -0.03856111 -0.02874915 -0.01709843  0.01958597  0.02220965  0.02698759  0.04365801  0.06886204  0.09293207  0.14348319  0.15199336 
##         116          40         113         117         109          41         111          88          48          22         101         106 
##  0.15969705  0.16281593  0.16493946  0.16871594  0.16873486  0.17109333  0.19040105  0.19750636  0.21348836  0.21521801  0.22792505  0.23114465 
##          89         118         108           8          79         120          92          16          82          65          99          39 
##  0.23622923  0.24709972  0.25262161  0.27813350  0.27827054  0.27934399  0.28424606  0.28670012  0.28929910  0.30433245  0.31835030  0.32842533 
##          94         122           4         100          96          28          80          10         121           5         112          45 
##  0.32917877  0.35471752  0.35592559  0.36279253  0.37735014  0.38080814  0.38246375  0.38712628  0.38991604  0.39702539  0.40233477  0.42554832 
##          52          78          97          87          70         143          66          31          13          63          68           6 
##  0.43127996  0.44053391  0.44561058  0.45092253  0.49655009  0.50894731  0.57279572  0.57327989  0.60007799  0.64633014  0.67545350  0.71456477 
##          42          91          74          69          38          86           2          43          72          76          71          75 
##  0.71984089  0.72719423  0.75568548  0.78453106  0.82068868  0.83614408  0.83986215  1.02308884  1.04956559  1.30653673  1.31899102  1.39354602 
##          51          81          83          55          62          67          12          17          26          64          73          77 
##  1.45204490  1.51134352  1.60165448  1.71619281  1.83692064  1.85809070  1.86458518  1.86674757  2.11479792  2.25507402  2.26464274  2.48577509 
##          85 
##  2.70813292
Jackknife.Residuals <- rstudent(models)
plot(fitted.values(models),Jackknife.Residuals)

#Skewness and Kurtosis
skewness(Jackknife.Residuals)
## [1] -0.1356942
kurtosis(Jackknife.Residuals)
## [1] 3.799746
#Overview of diagnostics
ols_plot_diagnostics(models)

[Figures: jackknife residuals plotted against fitted values, followed by the diagnostic panels from ols_plot_diagnostics(models)]

#VIF
ols_vif_tol(models)
##        Variables Tolerance      VIF
## 1  Boundary.Wall 0.1434641 6.970384
## 2 Drinking.Water 0.1595011 6.269548
## 3         Toilet 0.1281554 7.803027
## 4    Electricity 0.1752573 5.705895
#Eigenvalues (CI and CN)
eigprop(models)
## 
## Call:
## eigprop(mod = models)
## 
##   Eigenvalues      CI (Intercept) Boundary.Wall Drinking.Water Toilet Electricity
## 1      4.7297  1.0000      0.0053        0.0011         0.0015 0.0014      0.0021
## 2      0.1822  5.0944      0.5810        0.0001         0.0048 0.0122      0.0380
## 3      0.0455 10.1952      0.0469        0.0760         0.0420 0.2631      0.4423
## 4      0.0273 13.1674      0.0043        0.1500         0.7649 0.0091      0.3579
## 5      0.0153 17.6056      0.3624        0.7729         0.1867 0.7142      0.1597
## 
## ===============================
## Row 5==> Boundary.Wall, proportion 0.772891 >= 0.50 
## Row 4==> Drinking.Water, proportion 0.764940 >= 0.50 
## Row 5==> Toilet, proportion 0.714241 >= 0.50
eigprop(models,Inter=FALSE)
## 
## Call:
## eigprop(mod = models, Inter = FALSE)
## 
##   Eigenvalues      CI Boundary.Wall Drinking.Water Toilet Electricity
## 1      3.9012  1.0000        0.0023         0.0024 0.0023      0.0034
## 2      0.0489  8.9323        0.1541         0.0262 0.1535      0.5919
## 3      0.0276 11.8874        0.1231         0.9409 0.0082      0.3911
## 4      0.0223 13.2212        0.7205         0.0306 0.8360      0.0136
## 
## ===============================
## Row 4==> Boundary.Wall, proportion 0.720498 >= 0.50 
## Row 3==> Drinking.Water, proportion 0.940877 >= 0.50 
## Row 4==> Toilet, proportion 0.836027 >= 0.50 
## Row 2==> Electricity, proportion 0.591878 >= 0.50
#Stepwise approach
ols_step_both_p(models)
## 
##                                  Stepwise Selection Summary                                  
## --------------------------------------------------------------------------------------------
##                            Added/                   Adj.                                        
## Step       Variable       Removed     R-Square    R-Square      C(p)        AIC        RMSE     
## --------------------------------------------------------------------------------------------
##    1        Toilet        addition       0.931       0.931    453.4690    994.8074    7.3720    
##    2    Drinking.Water    addition       0.968       0.968    135.8170    884.9308    5.0300    
##    3    Boundary.Wall     addition       0.979       0.979     41.3090    824.2057    4.0660    
##    4     Electricity      addition       0.984       0.983      5.0000    791.1336    3.6157    
## --------------------------------------------------------------------------------------------


The diagnostics gave a slight indication that the normality assumption is violated, but nothing severe: the jackknife residuals have a skewness of -0.14 and a kurtosis of 3.80, compared with 0 and 3 for a normal distribution. For collinearity, the condition indices were computed and none exceeded 30 (the largest is 17.6). All variance inflation factors are also below 10, which suggests that collinearity is not severe even though the predictors are strongly correlated with one another. Stepwise regression was used to find the best possible regression model, and it retained all four predictors. After these diagnostics, the multiple regression analysis was conducted with the school infrastructure score as the dependent variable.
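
The collinearity figures can be re-derived from the printed output as a check. The sketch below copies the tolerances and eigenvalues from the ols_vif_tol(models) and eigprop(models) output; the cut-offs of 10 for VIF and 30 for the condition index are common rules of thumb assumed here, not values set by the packages. Because the printed eigenvalues are rounded, the recomputed condition indices match the reported ones only approximately.

```r
# Values copied from the ols_vif_tol(models) and eigprop(models) output
tolerance <- c(
  Boundary.Wall  = 0.1434641,
  Drinking.Water = 0.1595011,
  Toilet         = 0.1281554,
  Electricity    = 0.1752573
)
eigenvalues <- c(4.7297, 0.1822, 0.0455, 0.0273, 0.0153)

# VIF is the reciprocal of tolerance
vif <- 1 / tolerance

# Condition index: square root of (largest eigenvalue / each eigenvalue)
condition_index <- sqrt(max(eigenvalues) / eigenvalues)

round(vif, 2)
round(condition_index, 1)

# Rule-of-thumb checks (assumed thresholds)
any(vif > 10)
any(condition_index > 30)
```

Both checks come out FALSE for these values, which is consistent with reading the collinearity as moderate rather than severe.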