library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
teacher_data <- read_csv("Teacher_Hiring_Certification_Turnover.csv")
## Rows: 33 Columns: 25
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): REGION, distname, geotype_new, region_lea, Year
## dbl (20): district, schyr, intern, other_temp, oos_std, lag_starter, no_cert...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(teacher_data)
## # A tibble: 6 × 25
##   district schyr REGION intern other_temp oos_std lag_starter no_cert reenterer
##      <dbl> <dbl> <chr>   <dbl>      <dbl>   <dbl>       <dbl>   <dbl>     <dbl>
## 1   101902  2013 04        145         71      11          60      22       165
## 2   101902  2014 04        201        102       8          36      50       215
## 3   101902  2015 04        267        120      16          21      38       162
## 4   101902  2016 04        306        105      14          27      55       159
## 5   101902  2017 04        371        106      15          17      74       179
## 6   101902  2018 04        245         70       8           9      55       117
## # ℹ 16 more variables: emer <dbl>, std_all <dbl>, distname <chr>,
## # geotype_new <chr>, total_new_hires <dbl>, region_lea <chr>, Year <chr>,
## # total_teachers <dbl>, turnover_rate_teachers <dbl>, beg_year <dbl>,
## # `1-5_years` <dbl>, `6-10_years` <dbl>, `11-20_years` <dbl>,
## # over20_years <dbl>, `st-per-tch` <dbl>, num_st_mem <dbl>
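Dependent variable: teacher turnover rate (turnover_rate_teachers)
Independent variables: certification type (intern through std_all), years of experience (beg_year through over20_years), and student-teacher ratio (st-per-tch)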
turnover_model <- lm(
  turnover_rate_teachers ~ intern + other_temp + oos_std + lag_starter +
    no_cert + reenterer + emer + std_all + beg_year + `1-5_years` +
    `6-10_years` + `11-20_years` + over20_years + `st-per-tch`,
  data = teacher_data
)
summary(turnover_model)
##
## Call:
## lm(formula = turnover_rate_teachers ~ intern + other_temp + oos_std +
## lag_starter + no_cert + reenterer + emer + std_all + beg_year +
## `1-5_years` + `6-10_years` + `11-20_years` + over20_years +
## `st-per-tch`, data = teacher_data)
##
## Residuals:
##       Min        1Q    Median        3Q       Max
## -0.026441 -0.007286 -0.001636  0.005288  0.050850
##
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)
## (Intercept)    1.860e-02  1.568e-01   0.119  0.90686
## intern        -1.330e-04  1.608e-04  -0.827  0.41904
## other_temp    -1.221e-03  7.402e-04  -1.649  0.11642
## oos_std       -7.055e-04  1.931e-03  -0.365  0.71904
## lag_starter   -5.347e-04  5.932e-04  -0.901  0.37926
## no_cert        3.065e-04  1.589e-04   1.929  0.06967 .
## reenterer     -3.633e-05  1.771e-04  -0.205  0.83977
## emer           1.363e-03  3.719e-03   0.367  0.71826
## std_all        1.039e-03  9.052e-04   1.148  0.26610
## beg_year       1.802e-04  7.556e-05   2.385  0.02826 *
## `1-5_years`    1.234e-04  4.134e-05   2.986  0.00792 **
## `6-10_years`   9.695e-05  1.013e-04   0.957  0.35123
## `11-20_years`  2.001e-04  1.046e-04   1.913  0.07182 .
## over20_years  -7.627e-04  1.528e-04  -4.991 9.48e-05 ***
## `st-per-tch`   6.035e-03  9.500e-03   0.635  0.53322
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.02133 on 18 degrees of freedom
## Multiple R-squared: 0.9225, Adjusted R-squared: 0.8622
## F-statistic: 15.3 on 14 and 18 DF, p-value: 3.288e-07
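The model's R-squared is 0.9225, meaning it explains about 92.25% of the variation in teacher turnover rates in this sample. Because 14 predictors are fit to only 33 observations, the adjusted R-squared of 0.8622 is the more conservative measure of fit. The overall F-test (F = 15.3 on 14 and 18 DF, p-value = 3.288e-07) indicates that the predictors are jointly significant.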
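R-squared never decreases when predictors are added, so it can help to read it next to the adjusted value and the number of observations per predictor. The sketch below is one way to pull those numbers from a fitted model; `fit_overview` is a hypothetical helper name, not part of the original analysis, and it assumes the model includes an intercept.

```r
# Hypothetical helper: collect basic fit statistics from a fitted lm object.
# Assumes the model has an intercept, so predictors = coefficients - 1.
fit_overview <- function(model) {
  s <- summary(model)
  n <- nobs(model)               # observations used in the fit
  k <- length(coef(model)) - 1   # predictors, excluding the intercept
  c(
    n_obs = n,
    n_predictors = k,
    obs_per_predictor = n / k,
    r_squared = s$r.squared,
    adj_r_squared = s$adj.r.squared
  )
}

# Usage with the model fitted above (skipped if the object is not in scope)
if (exists("turnover_model")) {
  print(round(fit_overview(turnover_model), 4))
}
```

For the model above this works out to 33 observations for 14 predictors, which is a small sample for this many terms.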
Significant variables (p < 0.05):
beg_year has a p-value of 0.02826
1-5_years has a p-value of 0.00792
over20_years has a p-value of 9.48e-05
Variables that are not significant at the 0.05 level: intern, other_temp, oos_std, lag_starter, no_cert, reenterer, emer, std_all, 6-10_years, 11-20_years, and st-per-tch. Of these, no_cert (p = 0.06967) and 11-20_years (p = 0.07182) are marginal: they are significant only at the 0.10 level.
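Sorting variables by hand from a long summary makes it easy to miss one. As a cross-check, the sketch below reads the p-values directly from the coefficient matrix returned by `summary()`; `coef_table` and its `alpha` argument are hypothetical names introduced here for illustration.

```r
# Hypothetical helper: coefficients sorted by p-value with a significance flag.
coef_table <- function(model, alpha = 0.05) {
  ct <- as.data.frame(summary(model)$coefficients)
  names(ct) <- c("estimate", "std_error", "t_value", "p_value")
  ct$term <- rownames(ct)
  rownames(ct) <- NULL
  ct <- ct[ct$term != "(Intercept)", ]      # drop the intercept row
  ct$significant <- ct$p_value < alpha      # TRUE when p is below the cutoff
  ct[order(ct$p_value), c("term", "estimate", "p_value", "significant")]
}

# Usage with the model fitted above (skipped if the object is not in scope)
if (exists("turnover_model")) {
  print(coef_table(turnover_model))
}
```

Calling `coef_table(turnover_model, alpha = 0.10)` would apply the looser 0.10 cutoff instead.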
beg_year: The positive coefficient (1.802e-04) suggests that, holding the other predictors constant, each additional beginning teacher is associated with a slightly higher turnover rate.
1-5_years: The positive coefficient (1.234e-04) suggests that, holding the other predictors constant, each additional teacher with 1-5 years of experience is associated with a higher turnover rate.
over20_years: The negative coefficient (-7.627e-04) suggests that, holding the other predictors constant, each additional teacher with over 20 years of experience is associated with a lower turnover rate.
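Each coefficient is the change in turnover rate per one additional teacher, so the practical size of an effect depends on how many teachers are involved. A small worked example follows; the change of 10 teachers is a hypothetical figure chosen only for illustration.

```r
# Coefficient for over20_years, copied from the summary output above
b_over20 <- -7.627e-04

# Hypothetical change chosen for illustration: 10 more teachers with 20+ years
delta_teachers <- 10

# Predicted change in turnover rate, holding the other predictors constant
change <- b_over20 * delta_teachers
print(change)
# [1] -0.007627
```

If turnover_rate_teachers is stored as a proportion, which the scale of the residuals suggests but the output does not confirm, this would be a drop of roughly 0.76 percentage points.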
plot(turnover_model, which=1)
The residuals vs. fitted plot suggests that the linearity assumption is reasonably met. Observations 33, 16, and 12 are flagged as potential outliers and may influence the model.
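The observations labeled in the residual plot have large residuals, but that alone does not show how much they move the fit. One possible follow-up is Cook's distance, sketched below. `flag_influential` is a hypothetical helper, and the 4 / n cutoff is a common rule of thumb rather than a formal test.

```r
# Hypothetical helper: list observations whose Cook's distance exceeds a cutoff.
# The default cutoff of 4 / n is a rule of thumb, not a formal test.
flag_influential <- function(model, cutoff = 4 / nobs(model)) {
  d <- cooks.distance(model)
  flagged <- d[!is.na(d) & d > cutoff]
  sort(flagged, decreasing = TRUE)   # most influential first
}

# Usage with the model fitted above (skipped if the object is not in scope)
if (exists("turnover_model")) {
  print(flag_influential(turnover_model))
}
```

The same information can be viewed graphically with `plot(turnover_model, which = 4)`, which draws the Cook's distance plot.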