library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

I have selected district-level teacher hiring, turnover, and certification data from Aldine ISD, Spring ISD, and Tomball ISD, three Houston-area school districts.

teacher_data <- read_csv("Teacher_Hiring_Certification_Turnover.csv")
## Rows: 33 Columns: 25
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (5): REGION, distname, geotype_new, region_lea, Year
## dbl (20): district, schyr, intern, other_temp, oos_std, lag_starter, no_cert...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
  1. summary(data$x)
summary(teacher_data)
##     district          schyr         REGION              intern     
##  Min.   :101902   Min.   :2013   Length:33          Min.   : 11.0  
##  1st Qu.:101902   1st Qu.:2015   Class :character   1st Qu.: 23.0  
##  Median :101919   Median :2018   Mode  :character   Median :101.0  
##  Mean   :101914   Mean   :2018                      Mean   :119.1  
##  3rd Qu.:101921   3rd Qu.:2021                      3rd Qu.:162.0  
##  Max.   :101921   Max.   :2023                      Max.   :371.0  
##    other_temp        oos_std        lag_starter      no_cert      
##  Min.   :  2.00   Min.   : 2.000   Min.   : 1.0   Min.   :  0.00  
##  1st Qu.:  9.00   1st Qu.: 4.000   1st Qu.: 8.0   1st Qu.:  5.00  
##  Median : 18.00   Median : 6.000   Median :11.0   Median : 25.00  
##  Mean   : 32.85   Mean   : 7.303   Mean   :14.7   Mean   : 44.73  
##  3rd Qu.: 42.00   3rd Qu.:10.000   3rd Qu.:15.0   3rd Qu.: 48.00  
##  Max.   :120.00   Max.   :16.000   Max.   :60.0   Max.   :395.00  
##    reenterer          emer          std_all         distname        
##  Min.   : 19.0   Min.   :0.000   Min.   : 10.00   Length:33         
##  1st Qu.: 55.0   1st Qu.:0.000   1st Qu.: 18.00   Class :character  
##  Median :121.0   Median :1.000   Median : 35.00   Mode  :character  
##  Mean   :117.9   Mean   :1.606   Mean   : 50.48                     
##  3rd Qu.:162.0   3rd Qu.:2.000   3rd Qu.: 67.00                     
##  Max.   :243.0   Max.   :6.000   Max.   :143.00                     
##  geotype_new        total_new_hires  region_lea            Year          
##  Length:33          Min.   : 65.0   Length:33          Length:33         
##  Class :character   1st Qu.:133.0   Class :character   Class :character  
##  Mode  :character   Median :381.0   Mode  :character   Mode  :character  
##                     Mean   :388.6                                        
##                     3rd Qu.:578.0                                        
##                     Max.   :878.0                                        
##  total_teachers turnover_rate_teachers    beg_year       1-5_years     
##  Min.   : 711   Min.   :0.1080         Min.   : 21.0   Min.   : 173.0  
##  1st Qu.:1172   1st Qu.:0.1520         1st Qu.: 36.0   1st Qu.: 256.0  
##  Median :2221   Median :0.1900         Median :264.0   Median : 711.0  
##  Mean   :2455   Mean   :0.1932         Mean   :258.5   Mean   : 797.2  
##  3rd Qu.:3942   3rd Qu.:0.2320         3rd Qu.:414.0   3rd Qu.:1276.0  
##  Max.   :4644   Max.   :0.3100         Max.   :646.0   Max.   :1761.0  
##    6-10_years     11-20_years     over20_years     st-per-tch   
##  Min.   :170.0   Min.   :229.0   Min.   :118.0   Min.   :14.90  
##  1st Qu.:286.0   1st Qu.:405.0   1st Qu.:172.0   1st Qu.:15.60  
##  Median :434.0   Median :546.0   Median :208.0   Median :16.10  
##  Mean   :516.7   Mean   :582.2   Mean   :300.2   Mean   :16.05  
##  3rd Qu.:706.0   3rd Qu.:841.0   3rd Qu.:528.0   3rd Qu.:16.40  
##  Max.   :987.0   Max.   :949.0   Max.   :588.0   Max.   :17.20  
##    num_st_mem   
##  Min.   :11723  
##  1st Qu.:18606  
##  Median :36028  
##  Mean   :39264  
##  3rd Qu.:63146  
##  Max.   :70277
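
The prompt asks for `summary(data$x)` on a single column, while the call above summarises the whole data frame. Several column names in this file (for example `1-5_years` and `st-per-tch`) are non-syntactic in R, so they need backticks when used with `$`. The following is a minimal sketch of that pattern; the data frame `toy` and its values are made up for illustration and are not the district data.

```r
# Hypothetical toy data frame (NOT the district data). check.names = FALSE
# keeps the non-syntactic column name unchanged, as read_csv() does.
toy <- data.frame(
  `1-5_years` = c(150, 300, 700, 1300, 1800),
  intern      = c(10, 25, 100, 160, 370),
  check.names = FALSE
)

# Backticks are required: the name starts with a digit and contains "-".
summary(toy$`1-5_years`)

# A syntactic column name needs no quoting.
summary(toy$intern)
```

On the real data the same pattern would be, for example, summary(teacher_data$`1-5_years`).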
  1. hist(data$x) - for continuous variables
teacher_data <- teacher_data %>% rename(turnover = turnover_rate_teachers)

ggplot(teacher_data, aes(turnover)) + geom_histogram(binwidth = 0.025, fill = "darkgreen", color = "black")

  1. plot(data$x, data$y) - to compare variables (ggplot is fine if you are comfortable with it)
ggplot(teacher_data, aes(x = total_teachers, y = turnover)) + geom_point(color = "darkgreen") + labs(title = "Scatterplot of Total Teachers vs Turnover Rate", x = "Total Teachers", y = "Turnover Rate")

  1. cor(data$x, data$y) - to see a correlation between two variables
cor(teacher_data$total_teachers, teacher_data$turnover, use = "complete.obs")
## [1] 0.3323673

There is a weak-to-moderate positive correlation (r ≈ 0.33) between the total number of teachers and the turnover rate: observations with more teachers tend to show somewhat higher turnover.
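
With only 33 rows, a single correlation coefficient says little about how uncertain the estimate is. One way to check is `cor.test()` from base R's stats package, which reports a confidence interval and a p-value alongside the Pearson correlation. The sketch below shows the pattern on a small made-up data frame; `toy` and its values are hypothetical and are not the district data.

```r
# Hypothetical toy data (NOT the district data), used only so the sketch
# runs on its own.
toy <- data.frame(
  total_teachers = c(700, 1200, 2200, 3900, 4600, 900, 2500, 4100),
  turnover       = c(0.12, 0.15, 0.19, 0.23, 0.21, 0.17, 0.16, 0.25)
)

# Pearson correlation test: estimate, 95% confidence interval, and p-value.
result <- cor.test(toy$total_teachers, toy$turnover, method = "pearson")

result$estimate  # sample correlation r
result$conf.int  # 95% confidence interval for the true correlation
result$p.value   # p-value for the null hypothesis of zero correlation
```

On the real data the equivalent call would be cor.test(teacher_data$total_teachers, teacher_data$turnover). A wide confidence interval, or one that includes zero, would suggest the relationship should be interpreted cautiously.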