Ashley Cecil-Folds Homework 7

library(readxl)
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(lmtest)

## Loading required package: zoo
## 
## Attaching package: 'zoo'
## 
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

library(MASS)

## 
## Attaching package: 'MASS'
## 
## The following object is masked from 'package:dplyr':
## 
##     select

library(car)

## Loading required package: carData
## 
## Attaching package: 'car'
## 
## The following object is masked from 'package:dplyr':
## 
##     recode
## 
## The following object is masked from 'package:purrr':
## 
##     some

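# Read the district-level data
District_data <- read_excel("district.xls")

# Pull the columns of interest into descriptively named vectors
Percent_Meets_STAAR <- District_data$DDA00A001222R
Ratio <- District_data$DPSTKIDR
Turnover <- District_data$DPSTURNR
Experience_Yrs <- District_data$DPSTEXPA
Avg_Salary <- District_data$DPSTTOSA

# Regress the STAAR outcome on teacher experience and average salary
District_Model <- lm(Percent_Meets_STAAR ~ Experience_Yrs + Avg_Salary, data = District_data)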
plot(District_Model,which=1)

#Linearity

raintest(District_Model)

## 
##  Rainbow test
## 
## data:  District_Model
## Rain = 1.2234, df1 = 599, df2 = 596, p-value = 0.006926

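The rainbow test's p-value (0.006926) is below 0.05, so the null hypothesis of a linear fit is rejected: the relationship between the predictors and the outcome is NOT adequately linear.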

#Independence of Errors

durbinWatsonTest(District_Model)

##  lag Autocorrelation D-W Statistic p-value
##    1      0.04620498      1.905911   0.124
##  Alternative hypothesis: rho != 0

The p-value for the Durbin-Watson test (0.124) is greater than 0.05, so I fail to reject the null hypothesis of no autocorrelation; the errors can be treated as independent. The D-W statistic (1.906) is close to 2, which also indicates little to no autocorrelation in the residuals.
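
To make the "nearly 2" reading concrete, the sketch below computes the Durbin-Watson statistic by hand and compares it with the rule of thumb D-W ≈ 2 × (1 − lag-1 autocorrelation). The data are simulated and purely hypothetical (the names `x`, `y`, and `fit` are not from the homework); only the last line uses the autocorrelation value reported above.

```r
# Hypothetical simulated data, NOT the district data
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)          # errors are independent by construction
fit <- lm(y ~ x)
e <- unname(residuals(fit))

# Durbin-Watson statistic: squared successive differences over squared residuals
dw <- sum(diff(e)^2) / sum(e^2)

# Lag-1 autocorrelation of the residuals
r1 <- sum(e[-1] * e[-n]) / sum(e^2)

# Rule of thumb: D-W is approximately 2 * (1 - r1)
approx_dw <- 2 * (1 - r1)
print(c(dw = dw, r1 = r1, approx = approx_dw))

# Same rule applied to the autocorrelation reported for District_Model
print(2 * (1 - 0.04620498))
```

The approximation differs from the exact statistic only by a small end-point term, which is why an autocorrelation near 0 goes hand in hand with a D-W statistic near 2.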

#Homoscedasticity

plot(District_Model,which=3)

bptest(District_Model)

## 
##  studentized Breusch-Pagan test
## 
## data:  District_Model
## BP = 2.0926, df = 2, p-value = 0.3512

The null hypothesis of the Breusch-Pagan test is that the model is homoscedastic. The p-value (0.3512) is far above .05, so I fail to reject the null hypothesis: there is no evidence of heteroscedasticity, and the homoscedasticity assumption holds for this model.
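
As a rough illustration of where the BP statistic and its p-value come from, the sketch below builds the studentized statistic by hand as n times the R-squared of an auxiliary regression of the squared residuals on the predictors. The data are simulated and hypothetical (`x1`, `x2`, `y`, and `fit` are illustration names, not the district variables); the final line only re-derives the p-value from the statistic and degrees of freedom reported above.

```r
# Hypothetical simulated data, NOT the district data
set.seed(2)
n <- 300
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 1 + 0.5 * x1 - 0.3 * x2 + rnorm(n)   # constant error variance by construction
fit <- lm(y ~ x1 + x2)

# Auxiliary regression: squared residuals on the same predictors
aux <- lm(residuals(fit)^2 ~ x1 + x2)

# Studentized BP statistic: n times the auxiliary R-squared
bp <- n * summary(aux)$r.squared
df_bp <- 2                                 # one degree of freedom per predictor
p_value <- pchisq(bp, df = df_bp, lower.tail = FALSE)
print(c(BP = bp, df = df_bp, p.value = p_value))

# p-value implied by the statistic reported for District_Model
p_reported <- pchisq(2.0926, df = 2, lower.tail = FALSE)
print(p_reported)
```

A small statistic relative to the chi-squared distribution means the predictors explain almost none of the variation in the squared residuals, which is what a homoscedastic model looks like.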

#Normality of Residuals

plot(District_Model, which=2)

shapiro.test(District_Model$residuals)

## 
##  Shapiro-Wilk normality test
## 
## data:  District_Model$residuals
## W = 0.99141, p-value = 1.859e-06

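The p-value (1.859e-06) is well below 0.05, so the residuals differ significantly from a normal distribution, even though the Q-Q plot shows them falling nearly on a straight line. With a sample this large, the Shapiro-Wilk test can flag even small departures from normality.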

#No Multicollinearity

vif(District_Model)

## Experience_Yrs     Avg_Salary 
##       1.007856       1.007856

Both VIFs are about 1.01, far below the common cutoff of 10, so the predictors are not strongly correlated with each other. For research purposes this means the assumption of “no multicollinearity” is not violated.

District_data_var<-District_data %>% dplyr::select(DDA00A001222R,DPSTTOSA,DPSTEXPA) %>% na.omit(.)

cor(District_data_var)

##               DDA00A001222R   DPSTTOSA   DPSTEXPA
## DDA00A001222R     1.0000000 0.10909469 0.34628360
## DPSTTOSA          0.1090947 1.00000000 0.08828914
## DPSTEXPA          0.3462836 0.08828914 1.00000000
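
The correlation matrix also ties back to the VIF output. With only two predictors, each VIF reduces to 1 / (1 − r²), where r is the correlation between the two predictors. The minimal sketch below plugs in the DPSTTOSA–DPSTEXPA correlation from the matrix above; the variable names `r` and `vif_two` are just for illustration.

```r
# Correlation between the two predictors, copied from the matrix above
r <- 0.08828914

# With exactly two predictors, both VIFs equal 1 / (1 - r^2)
vif_two <- 1 / (1 - r^2)
print(round(vif_two, 6))
```

This agrees with the value reported by vif() for both predictors, which is expected because the two predictors are only weakly correlated with each other.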

2025-04-15