I have loaded the basic TEA dataset into R Markdown. My next step is to merge DZEXOTHP, DZEXADSP, and DZEXADMP into a single variable that better represents the percentage of each school budget allocated to administration. First I will select the variables I need and drop the rows that contain NAs.

Researchdraftcleaned<-Researchdraft %>% select(DISTNAME,DZCAMPUS,DAGC4X21R,DA0AT21R,DPSTTOSA,DPSSTOSA,DPSUTOSA,DPSTKIDR,DPSTURNR,DPFEAINSP,DZEXADMP,DZEXADSP,DZEXOTHP) %>% na.omit(.)

With the NAs removed, I will now sum DZEXOTHP, DZEXADSP, and DZEXADMP into a single variable that better represents public administration expenditure in the education system for each district. I will name the new variable DZADMIN.

Researchdraftcleaned2<- Researchdraftcleaned %>% mutate(DZADMIN = DZEXOTHP + DZEXADSP + DZEXADMP) %>% select(-DZEXOTHP, -DZEXADSP, -DZEXADMP)
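
As a quick check on the two cleaning steps above, here is a minimal base-R sketch on a made-up four-row data frame. The values and district names are hypothetical and only the three expenditure column names come from the TEA data. It shows how many rows na.omit() removes and confirms that the summed percentage stays in a plausible range.

```r
# Toy stand-in for Researchdraft; all values are invented for illustration.
toy <- data.frame(
  DISTNAME = c("A ISD", "B ISD", "C ISD", "D ISD"),
  DZEXADMP = c(3.1, 2.5, NA, 4.0),
  DZEXADSP = c(5.2, 6.0, 5.5, 4.8),
  DZEXOTHP = c(1.0, 0.7, 1.2, NA)
)

# na.omit() drops every row that has an NA in any column.
toy_clean <- na.omit(toy)
rows_dropped <- nrow(toy) - nrow(toy_clean)

# Sum the three percentage columns into one administrative share.
toy_clean$DZADMIN <- rowSums(toy_clean[, c("DZEXADMP", "DZEXADSP", "DZEXOTHP")])

# A sum of budget percentages should stay between 0 and 100.
in_range <- all(toy_clean$DZADMIN >= 0 & toy_clean$DZADMIN <= 100)

print(rows_dropped)
print(toy_clean[, c("DISTNAME", "DZADMIN")])
print(in_range)
```

Running the same row count comparison on Researchdraft and Researchdraftcleaned would show how much of the sample the NA filter costs.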

I will now fit a linear regression model with DAGC4X21R as the continuous dependent variable.

Researchdraftmodel <- lm(DAGC4X21R ~  DA0AT21R + DPSTTOSA + 
            DPSSTOSA + DPSUTOSA + DPSTKIDR + DPSTURNR + DPFEAINSP + DZADMIN, data = Researchdraftcleaned2)

summary(Researchdraftmodel)
## 
## Call:
## lm(formula = DAGC4X21R ~ DA0AT21R + DPSTTOSA + DPSSTOSA + DPSUTOSA + 
##     DPSTKIDR + DPSTURNR + DPFEAINSP + DZADMIN, data = Researchdraftcleaned2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -98.522  -1.648   1.484   4.071  18.013 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -3.577e+01  1.439e+01  -2.486  0.01307 *  
## DA0AT21R     1.771e+00  1.091e-01  16.235  < 2e-16 ***
## DPSTTOSA    -2.784e-04  8.654e-05  -3.217  0.00134 ** 
## DPSSTOSA     4.117e-05  3.344e-05   1.231  0.21852    
## DPSUTOSA    -1.815e-05  4.190e-05  -0.433  0.66500    
## DPSTKIDR     3.613e-01  1.238e-01   2.919  0.00359 ** 
## DPSTURNR    -9.907e-02  3.612e-02  -2.743  0.00620 ** 
## DPFEAINSP   -3.209e-01  1.001e-01  -3.207  0.00138 ** 
## DZADMIN     -2.965e-01  1.196e-01  -2.479  0.01332 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.795 on 1035 degrees of freedom
## Multiple R-squared:  0.2301, Adjusted R-squared:  0.2241 
## F-statistic: 38.66 on 8 and 1035 DF,  p-value: < 2.2e-16

First I want to check whether the residuals are homoscedastic, so I will draw a residuals-versus-fitted plot.

plot(Researchdraftmodel, which = 1)
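
A residual plot is a visual check, so it can help to pair it with a formal test. The sketch below is not part of the original analysis. It computes Koenker's studentized version of the Breusch-Pagan statistic by hand in base R, on simulated data whose error spread grows with the predictor. The variable names (x, y, fit, aux) are illustrative only.

```r
set.seed(1)
n <- 200
x <- runif(n, 1, 10)
# The error standard deviation grows with x, so this model is
# heteroscedastic by construction.
y <- 2 + 3 * x + rnorm(n, sd = x)
fit <- lm(y ~ x)

# Auxiliary regression: squared residuals on the same predictor(s).
e2 <- residuals(fit)^2
aux <- lm(e2 ~ x)

# Studentized Breusch-Pagan statistic: n times the R-squared of the
# auxiliary regression, compared with a chi-square distribution whose
# degrees of freedom equal the number of predictors.
bp_stat <- n * summary(aux)$r.squared
bp_df <- length(coef(aux)) - 1
bp_p <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)

print(c(statistic = bp_stat, df = bp_df, p.value = bp_p))
```

A small p-value is evidence against constant variance. For Researchdraftmodel the auxiliary regression would use all eight predictors, giving eight degrees of freedom.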

I will attempt to flatten the red smoother line in the residual plot by taking the square root of each predictor.

Researchdraftmodelsquareroot <- lm(DAGC4X21R ~ I(sqrt(DA0AT21R)) + I(sqrt(DPSTTOSA)) + I(sqrt(DPSSTOSA)) + I(sqrt(DPSUTOSA)) + I(sqrt(DPSTKIDR)) + I(sqrt(DPSTURNR)) + I(sqrt(DPFEAINSP)) + I(sqrt(DZADMIN)), data = Researchdraftcleaned2)
## Warning in sqrt(DPSSTOSA): NaNs produced
## Warning in sqrt(DPSUTOSA): NaNs produced
## Warning in sqrt(DPSTKIDR): NaNs produced
summary(Researchdraftmodelsquareroot)
## 
## Call:
## lm(formula = DAGC4X21R ~ I(sqrt(DA0AT21R)) + I(sqrt(DPSTTOSA)) + 
##     I(sqrt(DPSSTOSA)) + I(sqrt(DPSUTOSA)) + I(sqrt(DPSTKIDR)) + 
##     I(sqrt(DPSTURNR)) + I(sqrt(DPFEAINSP)) + I(sqrt(DZADMIN)), 
##     data = Researchdraftcleaned2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -98.634  -1.769   1.494   3.961  18.928 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        -1.787e+02  2.686e+01  -6.653 4.64e-11 ***
## I(sqrt(DA0AT21R))   3.501e+01  1.968e+00  17.788  < 2e-16 ***
## I(sqrt(DPSTTOSA))  -8.660e-02  3.935e-02  -2.201 0.027980 *  
## I(sqrt(DPSSTOSA))  -1.871e-02  2.054e-02  -0.911 0.362583    
## I(sqrt(DPSUTOSA))  -8.099e-03  2.008e-02  -0.403 0.686743    
## I(sqrt(DPSTKIDR))   2.921e+00  8.845e-01   3.302 0.000992 ***
## I(sqrt(DPSTURNR))  -7.379e-01  3.210e-01  -2.298 0.021738 *  
## I(sqrt(DPFEAINSP)) -4.165e+00  1.406e+00  -2.963 0.003116 ** 
## I(sqrt(DZADMIN))   -2.884e+00  1.358e+00  -2.123 0.033981 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.417 on 1030 degrees of freedom
##   (5 observations deleted due to missingness)
## Multiple R-squared:  0.2573, Adjusted R-squared:  0.2516 
## F-statistic: 44.61 on 8 and 1030 DF,  p-value: < 2.2e-16
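
The three sqrt() warnings and the note "5 observations deleted due to missingness" go together: DPSSTOSA, DPSUTOSA, and DPSTKIDR each contain at least one negative value, sqrt() turns those into NaN, and lm() silently drops the affected rows. The residual degrees of freedom fall from 1035 to 1030, so the two models are fitted to slightly different samples and their R-squared values are not strictly comparable. A minimal sketch for locating such values, on a hypothetical toy data frame (the numbers are invented and whether the real negatives are data-entry codes is not known from this output):

```r
# Toy data with two negative entries; values are invented for illustration.
toy <- data.frame(
  DPSSTOSA = c(52000, -1, 48000, 61000),
  DPSTKIDR = c(14.2, 15.1, -2, 13.8),
  DZADMIN  = c(9.3, 9.2, 10.1, 8.7)
)

# Number of negative values in each column.
neg_counts <- colSums(toy < 0)

# Number of rows a square-root model would lose (any negative predictor).
rows_lost <- sum(rowSums(toy < 0) > 0)

# sqrt() of a negative number is NaN (with a warning), which lm() treats
# as a missing value.
sqrt_of_negative <- suppressWarnings(sqrt(-1))

print(neg_counts)
print(rows_lost)
print(is.nan(sqrt_of_negative))
```

Applying the same colSums() call to the numeric columns of Researchdraftcleaned2 would show which districts are responsible for the five dropped rows.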

I will now redraw the residual plot to see whether that worked.

plot(Researchdraftmodelsquareroot, which = 1)

I will now run the Shapiro-Wilk test on the residuals.

shapiro <- shapiro.test(residuals(Researchdraftmodelsquareroot))

print(shapiro)
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(Researchdraftmodelsquareroot)
## W = 0.5304, p-value < 2.2e-16
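
A W of 0.53 indicates residuals far from normal. The residual summaries above (a minimum near -98 against a first quartile near -2) suggest that a few extreme negative residuals may be driving this, though that is an inference from the printed quantiles, not something the test itself shows. The sketch below uses simulated data with two artificial outliers to illustrate one way to flag such points with standardized residuals. The cutoff of 3 is a common convention, not a rule from the original analysis.

```r
set.seed(2)
n <- 100
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
# Plant two artificial outliers far below the regression line.
y[c(10, 50)] <- y[c(10, 50)] - 40
fit <- lm(y ~ x)

# Standardized residuals; flag observations beyond 3 in absolute value.
std_res <- rstandard(fit)
flagged <- which(abs(std_res) > 3)

# Shapiro-Wilk on all residuals versus residuals without the flagged rows.
sw_all <- shapiro.test(residuals(fit))
sw_trim <- shapiro.test(residuals(fit)[-flagged])

print(flagged)
print(c(W_all = unname(sw_all$statistic), W_trimmed = unname(sw_trim$statistic)))
```

On the real model, qqnorm(residuals(Researchdraftmodelsquareroot)) followed by qqline() on the same residuals would show where the departure from normality sits. Flagged districts deserve inspection before any decision to exclude them.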
library(car)
## Loading required package: carData
## 
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
## 
##     recode
## The following object is masked from 'package:purrr':
## 
##     some
vif(Researchdraftmodelsquareroot)
##  I(sqrt(DA0AT21R))  I(sqrt(DPSTTOSA))  I(sqrt(DPSSTOSA))  I(sqrt(DPSUTOSA)) 
##           1.114780           2.062981           1.414584           1.616921 
##  I(sqrt(DPSTKIDR))  I(sqrt(DPSTURNR)) I(sqrt(DPFEAINSP))   I(sqrt(DZADMIN)) 
##           1.294203           1.150013           2.443574           2.410522
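
All eight variance inflation factors are below 2.5, well under the thresholds of 5 or 10 that are often quoted as rules of thumb, so multicollinearity does not appear to be a concern here. To make the statistic concrete, the sketch below computes it by hand on simulated predictors: each VIF is 1 / (1 - R²), where R² comes from regressing that predictor on all the others. The names x1, x2, and x3 are illustrative only.

```r
set.seed(3)
n <- 300
x1 <- rnorm(n)
x2 <- 0.7 * x1 + rnorm(n, sd = 0.7)  # deliberately correlated with x1
x3 <- rnorm(n)                       # independent of the other two
X <- data.frame(x1, x2, x3)

# For each predictor, regress it on the remaining predictors and
# convert the resulting R-squared into a variance inflation factor.
manual_vif <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
  1 / (1 - r2)
})

print(round(manual_vif, 3))
```

The correlated pair x1 and x2 receive larger values than the independent x3, which stays close to 1.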