library(readxl)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lmtest)
## Loading required package: zoo
##
## Attaching package: 'zoo'
##
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
library(MASS)
##
## Attaching package: 'MASS'
##
## The following object is masked from 'package:dplyr':
##
## select
library(car)
## Loading required package: carData
##
## Attaching package: 'car'
##
## The following object is masked from 'package:dplyr':
##
## recode
##
## The following object is masked from 'package:purrr':
##
## some
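# Read the district-level data and copy the columns of interest into named vectors
District_data <- read_excel("district.xls")
Percent_Meets_STAAR <- District_data$DDA00A001222R
Ratio               <- District_data$DPSTKIDR
Turnover            <- District_data$DPSTURNR
Experience_Yrs      <- District_data$DPSTEXPA
Avg_Salary          <- District_data$DPSTTOSA
# Model the percent meeting STAAR as a function of teacher experience and salary.
# The named vectors are not columns of District_data, so lm() finds them in the
# calling environment; data = District_data does not change the fit here.
District_Model <- lm(Percent_Meets_STAAR ~ Experience_Yrs + Avg_Salary, data = District_data)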
# Linearity
plot(District_Model, which = 1)
raintest(District_Model)
##
## Rainbow test
##
## data: District_Model
## Rain = 1.2234, df1 = 599, df2 = 596, p-value = 0.006926
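The low p-value of the Rainbow test (p = 0.0069, below 0.05) leads us to reject its null hypothesis that the relationship is linear, which suggests the model is NOT adequately linear.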
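To make the Rainbow test less of a black box, here is a minimal hand-rolled sketch of its F statistic, assuming a single predictor. It fits the line to all observations and to the central half only, then asks whether the full fit is much worse than the central fit, as it would be if the true relationship curves. The helper `rainbow_f` and the simulated data are hypothetical illustrations, not part of lmtest. One assumption to flag: as far as we can tell from the lmtest documentation, `raintest()` by default keeps rows in the order they appear in the data (`order.by = NULL`), whereas this sketch sorts by the predictor.

```r
# Hand-rolled sketch of the Rainbow F statistic (rainbow_f is a hypothetical
# helper, not part of lmtest). Observations are sorted by the single predictor x.
rainbow_f <- function(y, x, fraction = 0.5) {
  ord <- order(x)                        # sort observations by the predictor
  y <- y[ord]
  x <- x[ord]
  n  <- length(y)
  n1 <- round(fraction * n)              # size of the central subsample
  start <- floor((n - n1) / 2) + 1
  mid <- start:(start + n1 - 1)          # indices of the central subsample
  full <- lm(y ~ x)                      # fit on all observations
  sub  <- lm(y[mid] ~ x[mid])            # fit on the central subsample only
  k <- length(coef(full))                # number of estimated coefficients
  rss_full <- sum(residuals(full)^2)
  rss_sub  <- sum(residuals(sub)^2)
  df1 <- n - n1
  df2 <- n1 - k
  stat <- ((rss_full - rss_sub) / df1) / (rss_sub / df2)
  c(F = stat, df1 = df1, df2 = df2,
    p.value = pf(stat, df1, df2, lower.tail = FALSE))
}

set.seed(1)
x <- runif(400, 0, 10)
y_lin  <- 2 + 3 * x + rnorm(400)        # truly linear relationship
y_quad <- 2 + 0.5 * x^2 + rnorm(400)    # curved relationship

rainbow_f(y_lin, x)    # F should be near 1 with a large p-value
rainbow_f(y_quad, x)   # F should be large with a tiny p-value
```

Because the "central" subsample depends on how rows are ordered, it may be worth checking whether the district result changes when the rows are sorted by a predictor; `raintest()` has an `order.by` argument for this.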
#Independence of Errors
durbinWatsonTest(District_Model)
## lag Autocorrelation D-W Statistic p-value
## 1 0.04620498 1.905911 0.124
## Alternative hypothesis: rho != 0
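The p-value of the Durbin-Watson test (0.124) is greater than 0.05, so we fail to reject the null hypothesis of no autocorrelation: there is no evidence that the errors are dependent. The D-W statistic of 1.91 is close to 2, which is also consistent with no first-order autocorrelation in the residuals.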
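The D-W statistic itself is simple to compute from the residuals, d = Σ(e_t − e_{t−1})² / Σe_t², and it is approximately 2(1 − r), where r is the lag-1 autocorrelation. With the reported r = 0.0462 this gives about 2 × 0.954 ≈ 1.91, matching the output above. The sketch below uses a hypothetical helper `dw_stat` on simulated residuals to show d near 2 for independent errors and well below 2 for positively autocorrelated ones.

```r
# Durbin-Watson statistic from a residual vector:
# d = sum((e_t - e_{t-1})^2) / sum(e_t^2)   (dw_stat is a hypothetical helper)
dw_stat <- function(e) sum(diff(e)^2) / sum(e^2)

set.seed(42)
n <- 500
e_indep <- rnorm(n)                                              # independent errors
e_ar1   <- as.numeric(arima.sim(model = list(ar = 0.7), n = n))  # AR(1) errors, rho = 0.7

dw_stat(e_indep)   # expected to be close to 2
dw_stat(e_ar1)     # expected to be well below 2, roughly 2 * (1 - 0.7)
```

Applied to `residuals(District_Model)`, this should reproduce the D-W statistic of 1.905911 reported above. Note that `car::durbinWatsonTest()` obtains its p-value by bootstrap resampling, so the p-value can vary slightly between runs while the statistic does not.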
#Homoscedasticity
plot(District_Model,which=3)
bptest(District_Model)
##
## studentized Breusch-Pagan test
##
## data: District_Model
## BP = 2.0926, df = 2, p-value = 0.3512
The null hypothesis of the Breusch-Pagan test is that the residuals are homoscedastic (constant variance). Here the p-value (0.3512) is far above 0.05, so we fail to reject the null hypothesis: there is no evidence of heteroscedasticity, and the homoscedasticity assumption is NOT violated.
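The "studentized" label in the output refers to Koenker's version of the test, which is `bptest()`'s default: regress the squared residuals on the model's regressors and use n × R² as a chi-squared statistic with one degree of freedom per regressor (2 here). The sketch below, with a hypothetical helper `bp_koenker` and simulated data, shows the statistic staying small when the error variance is constant and becoming large when it grows with a predictor.

```r
# Koenker's studentized Breusch-Pagan statistic: n * R^2 from regressing the
# squared OLS residuals on the regressors (bp_koenker is a hypothetical helper)
bp_koenker <- function(model) {
  X  <- model.matrix(model)[, -1, drop = FALSE]  # regressors without the intercept column
  e2 <- residuals(model)^2
  aux <- lm(e2 ~ X)                              # auxiliary regression, intercept added by lm()
  stat <- length(e2) * summary(aux)$r.squared
  df   <- ncol(X)
  c(BP = stat, df = df, p.value = pchisq(stat, df, lower.tail = FALSE))
}

set.seed(3)
n  <- 500
x1 <- runif(n, 0, 3)
x2 <- runif(n, 0, 3)
y_homo   <- 1 + x1 + x2 + rnorm(n)                     # constant error variance
y_hetero <- 1 + x1 + x2 + rnorm(n, sd = 0.2 + 2 * x1)  # error spread grows with x1

bp_koenker(lm(y_homo ~ x1 + x2))     # large p-value expected
bp_koenker(lm(y_hetero ~ x1 + x2))   # tiny p-value expected
```

Calling `bp_koenker(District_Model)` should closely match the `bptest()` result above (BP = 2.0926, df = 2).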
#Normality of Residuals
plot(District_Model, which=2)
shapiro.test(District_Model$residuals)
##
## Shapiro-Wilk normality test
##
## data: District_Model$residuals
## W = 0.99141, p-value = 1.859e-06
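The Shapiro-Wilk p-value (1.859e-06) is well below 0.05, so the residuals differ significantly from a normal distribution, even though the points on the Q-Q plot fall nearly on a straight line. With a large sample, the Shapiro-Wilk test can flag even small departures from normality, so the near-straight Q-Q plot suggests the departure may be modest in practical terms.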
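To illustrate why a large sample can make Shapiro-Wilk reject even when the Q-Q plot looks reasonable, the sketch below simulates a normal sample and a moderately heavy-tailed sample (Student's t with 5 degrees of freedom) of 4,000 observations each. The sample size and distributions are illustrative assumptions, not taken from the district data; `shapiro.test()` itself only accepts between 3 and 5000 non-missing values.

```r
# Shapiro-Wilk sensitivity at large n: a t(5) sample, whose bulk is close to
# normal, is still rejected decisively (simulated, illustrative data only)
set.seed(10)
n <- 4000                        # shapiro.test() accepts 3 to 5000 observations
z_norm  <- rnorm(n)
z_heavy <- rt(n, df = 5)

shapiro.test(z_norm)$p.value     # typically large
shapiro.test(z_heavy)$p.value    # typically far below 0.05
```

Plotting `qqnorm(z_heavy)` next to these p-values is a quick way to judge whether a significant result reflects a practically important departure, mostly in the tails here, or just the power of a large sample.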
#No Multicollinearity
vif(District_Model)
## Experience_Yrs Avg_Salary
## 1.007856 1.007856
Both VIFs (about 1.01) are far below the common cutoff of 10, which indicates that neither predictor is strongly correlated with the other. For research purposes, this means the assumption of "no multicollinearity" is not violated.
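A VIF is defined as VIF_j = 1 / (1 − R²_j), where R²_j comes from regressing predictor j on all the other predictors. The sketch below computes it directly with a hypothetical helper `vif_by_hand` on simulated predictors; for a linear model without factor or interaction terms, `car::vif()` should report the same values. Adding a predictor that is nearly a copy of another pushes both VIFs far above 10.

```r
# VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing predictor j on the others
# (vif_by_hand is a hypothetical helper; data are simulated)
vif_by_hand <- function(X) {
  X <- as.data.frame(X)
  sapply(names(X), function(v) {
    others <- setdiff(names(X), v)
    fit <- lm(reformulate(others, response = v), data = X)
    1 / (1 - summary(fit)$r.squared)
  })
}

set.seed(7)
n  <- 300
x1 <- rnorm(n)
x2 <- rnorm(n)                   # unrelated to x1
x3 <- x1 + rnorm(n, sd = 0.2)    # nearly a copy of x1

vif_by_hand(data.frame(x1, x2))       # both close to 1
vif_by_hand(data.frame(x1, x2, x3))   # x1 and x3 well above 10
```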
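# Correlations among the outcome (DDA00A001222R), average salary (DPSTTOSA) and
# average experience (DPSTEXPA), using complete cases only.
# The dplyr:: prefix is needed because MASS masks select().
District_data_var <- District_data %>%
  dplyr::select(DDA00A001222R, DPSTTOSA, DPSTEXPA) %>%
  na.omit()
cor(District_data_var)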
## DDA00A001222R DPSTTOSA DPSTEXPA
## DDA00A001222R 1.0000000 0.10909469 0.34628360
## DPSTTOSA 0.1090947 1.00000000 0.08828914
## DPSTEXPA 0.3462836 0.08828914 1.00000000
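The correlation matrix ties back to the VIF output. With exactly two predictors, each predictor's R²_j is just the squared correlation between them, so VIF = 1 / (1 − r²). Plugging in the salary-experience correlation from the matrix reproduces the value that `vif()` reported for both predictors; this works because both computations use the same complete cases.

```r
# With two predictors, VIF = 1 / (1 - r^2), where r is their correlation
r_salary_exp <- 0.08828914            # DPSTTOSA vs DPSTEXPA, copied from the matrix above
vif_from_r   <- 1 / (1 - r_salary_exp^2)
round(vif_from_r, 6)                  # 1.007856, the value vif() reported for both predictors
```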