library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
I have selected district-level teacher hiring, turnover, and certification data for Aldine ISD, Spring ISD, and Tomball ISD, three Houston-area school districts.
teacher_data <- read_csv("Teacher_Hiring_Certification_Turnover.csv")
## Rows: 33 Columns: 25
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): REGION, distname, geotype_new, region_lea, Year
## dbl (20): district, schyr, intern, other_temp, oos_std, lag_starter, no_cert...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
summary(teacher_data)
## district schyr REGION intern
## Min. :101902 Min. :2013 Length:33 Min. : 11.0
## 1st Qu.:101902 1st Qu.:2015 Class :character 1st Qu.: 23.0
## Median :101919 Median :2018 Mode :character Median :101.0
## Mean :101914 Mean :2018 Mean :119.1
## 3rd Qu.:101921 3rd Qu.:2021 3rd Qu.:162.0
## Max. :101921 Max. :2023 Max. :371.0
## other_temp oos_std lag_starter no_cert
## Min. : 2.00 Min. : 2.000 Min. : 1.0 Min. : 0.00
## 1st Qu.: 9.00 1st Qu.: 4.000 1st Qu.: 8.0 1st Qu.: 5.00
## Median : 18.00 Median : 6.000 Median :11.0 Median : 25.00
## Mean : 32.85 Mean : 7.303 Mean :14.7 Mean : 44.73
## 3rd Qu.: 42.00 3rd Qu.:10.000 3rd Qu.:15.0 3rd Qu.: 48.00
## Max. :120.00 Max. :16.000 Max. :60.0 Max. :395.00
## reenterer emer std_all distname
## Min. : 19.0 Min. :0.000 Min. : 10.00 Length:33
## 1st Qu.: 55.0 1st Qu.:0.000 1st Qu.: 18.00 Class :character
## Median :121.0 Median :1.000 Median : 35.00 Mode :character
## Mean :117.9 Mean :1.606 Mean : 50.48
## 3rd Qu.:162.0 3rd Qu.:2.000 3rd Qu.: 67.00
## Max. :243.0 Max. :6.000 Max. :143.00
## geotype_new total_new_hires region_lea Year
## Length:33 Min. : 65.0 Length:33 Length:33
## Class :character 1st Qu.:133.0 Class :character Class :character
## Mode :character Median :381.0 Mode :character Mode :character
## Mean :388.6
## 3rd Qu.:578.0
## Max. :878.0
## total_teachers turnover_rate_teachers beg_year 1-5_years
## Min. : 711 Min. :0.1080 Min. : 21.0 Min. : 173.0
## 1st Qu.:1172 1st Qu.:0.1520 1st Qu.: 36.0 1st Qu.: 256.0
## Median :2221 Median :0.1900 Median :264.0 Median : 711.0
## Mean :2455 Mean :0.1932 Mean :258.5 Mean : 797.2
## 3rd Qu.:3942 3rd Qu.:0.2320 3rd Qu.:414.0 3rd Qu.:1276.0
## Max. :4644 Max. :0.3100 Max. :646.0 Max. :1761.0
## 6-10_years 11-20_years over20_years st-per-tch
## Min. :170.0 Min. :229.0 Min. :118.0 Min. :14.90
## 1st Qu.:286.0 1st Qu.:405.0 1st Qu.:172.0 1st Qu.:15.60
## Median :434.0 Median :546.0 Median :208.0 Median :16.10
## Mean :516.7 Mean :582.2 Mean :300.2 Mean :16.05
## 3rd Qu.:706.0 3rd Qu.:841.0 3rd Qu.:528.0 3rd Qu.:16.40
## Max. :987.0 Max. :949.0 Max. :588.0 Max. :17.20
## num_st_mem
## Min. :11723
## 1st Qu.:18606
## Median :36028
## Mean :39264
## 3rd Qu.:63146
## Max. :70277
teacher_data <- teacher_data %>% rename(turnover = turnover_rate_teachers)
ggplot(teacher_data, aes(turnover)) + geom_histogram(binwidth = 0.025, fill = "darkgreen", color = "black")
ggplot(teacher_data, aes(x = total_teachers, y = turnover)) +
  geom_point(color = "darkgreen") +
  labs(title = "Scatterplot of Total Teachers vs Turnover Rate", x = "Total Teachers", y = "Turnover Rate")
cor(teacher_data$total_teachers, teacher_data$turnover, use = "complete.obs")
## [1] 0.3323673
There is a weak-to-moderate positive correlation (r ≈ 0.33) between the total number of teachers and the turnover rate.
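A correlation coefficient on its own does not say how precisely it is estimated. One way to check is `cor.test()` from base R's stats package, which reports a confidence interval and p-value alongside r. The sketch below is only an illustration: the `toy` data frame and its values are made up so the block runs on its own, and they are not the district data.

```r
# Hypothetical toy data, NOT the district data. The values are invented
# purely so this example is self-contained.
toy <- data.frame(
  total_teachers = c(700, 1200, 2200, 3900, 4600, 1500, 3000, 800),
  turnover       = c(0.12, 0.15, 0.19, 0.23, 0.21, 0.18, 0.20, 0.14)
)

# Pearson correlation with a t-test and a 95% confidence interval.
result <- cor.test(toy$total_teachers, toy$turnover, method = "pearson")

result$estimate   # the correlation coefficient r
result$conf.int   # 95% confidence interval for r
result$p.value    # p-value for the null hypothesis that the true correlation is 0
```

To apply this to the real data, the same call would presumably take `teacher_data$total_teachers` and `teacher_data$turnover`. One caveat: the 33 rows appear to be yearly observations of the same three districts, so they may not be independent, and the interval and p-value should be read with that in mind.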
…