library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
  1. Use your approved dataset
teacher_data <- read_csv("Teacher_Hiring_Certification_Turnover.csv")
## Rows: 33 Columns: 25
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (5): REGION, distname, geotype_new, region_lea, Year
## dbl (20): district, schyr, intern, other_temp, oos_std, lag_starter, no_cert...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
  1. Select some variables of interest and see whether there are any obvious correlations using the COR command
teacher_data <- teacher_data %>% rename(teacher_attrition = turnover_rate_teachers)

Variable Definitions:

(Dependent) teacher_attrition: the teacher turnover rate (renamed from turnover_rate_teachers), that is, the rate at which teachers left their positions within a given academic year

(Independent) beg_year: teachers who are in their first year of teaching experience

1-5_years: teachers with 1 to 5 years of teaching experience

6-10_years: teachers with 6 to 10 years of teaching experience

11-20_years: teachers with 11 to 20 years of teaching experience

over20_years: teachers with over 20 years of teaching experience

cor(teacher_data$beg_year, teacher_data$teacher_attrition)
## [1] 0.5663391
cor(teacher_data$`1-5_years`, teacher_data$teacher_attrition)
## [1] 0.2895374
cor(teacher_data$`6-10_years`, teacher_data$teacher_attrition)
## [1] 0.3116338
cor(teacher_data$`11-20_years`, teacher_data$teacher_attrition)
## [1] 0.327162
cor(teacher_data$over20_years, teacher_data$teacher_attrition)
## [1] 0.1164551
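
The five separate cor() calls above can also be collapsed into one call, since cor() accepts a data frame as its first argument and returns one coefficient per column. The sketch below is only an illustration: `toy` is a synthetic stand-in with made-up values (it is not the teacher data), and the name `predictors` is hypothetical.

```r
# Synthetic stand-in for teacher_data; the values are random and for illustration only
set.seed(1)
toy <- data.frame(
  beg_year          = runif(33),
  `1-5_years`       = runif(33),
  over20_years      = runif(33),
  teacher_attrition = runif(33),
  check.names = FALSE  # keep non-syntactic names such as `1-5_years`
)

# Hypothetical vector of predictor column names
predictors <- c("beg_year", "1-5_years", "over20_years")

# With a data frame as x and a vector as y, cor() returns a
# one-column matrix: one correlation per predictor
cors <- cor(toy[predictors], toy$teacher_attrition, use = "complete.obs")
cors
```

With the real data, replacing `toy` with `teacher_data` and listing all five experience columns in `predictors` should give the same coefficients as the individual calls.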
  1. Examine the same variables visually using the PAIRS command
pairs(~ beg_year + `1-5_years` + `6-10_years` + `11-20_years` + over20_years + teacher_attrition, data = teacher_data)
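
One optional refinement is to print the correlation coefficient in the upper panels of the pairs plot, so the numbers and the scatterplots can be read side by side. This is a minimal sketch on synthetic data; `toy` and the helper `panel_cor` are hypothetical names, not part of the original analysis.

```r
# Synthetic stand-in data; the values are random and for illustration only
set.seed(2)
toy <- data.frame(
  beg_year          = runif(33),
  over20_years      = runif(33),
  teacher_attrition = runif(33)
)

# Hypothetical panel function: writes the Pearson r in the middle of a panel
panel_cor <- function(x, y, ...) {
  old <- par(usr = c(0, 1, 0, 1))  # temporarily use unit coordinates
  on.exit(par(old))                # restore the panel's coordinates afterwards
  r <- cor(x, y, use = "complete.obs")
  text(0.5, 0.5, format(r, digits = 2))
}

# Scatterplots below the diagonal, coefficients above it
pairs(toy, upper.panel = panel_cor)
```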

  1. Select two variables that seem correlated (positively or negatively) and examine them using PEARSON, SPEARMAN or KENDALL (depending on which is more appropriate)
cor.test(teacher_data$beg_year,teacher_data$teacher_attrition, method = "pearson")
## 
##  Pearson's product-moment correlation
## 
## data:  teacher_data$beg_year and teacher_data$teacher_attrition
## t = 3.826, df = 31, p-value = 0.0005911
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2768597 0.7615755
## sample estimates:
##       cor 
## 0.5663391
  1. Explain your findings and justify your choice in selecting the correlation method

The Pearson correlation test returned r ≈ 0.566 (t = 3.826, df = 31, 95% CI 0.277 to 0.762), indicating a moderate positive relationship between “beg_year” and “teacher_attrition”. The p-value of 0.0005911 is well below 0.05, so the correlation is statistically significant. In this dataset, observations with more first-year teachers tend to have higher teacher attrition; the correlation alone does not show that the first-year teachers themselves are the ones leaving.
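
As a sanity check, the test statistic, p-value, and confidence interval can be recomputed by hand from the reported r and df. The sketch below uses the standard formulas t = r·sqrt(n − 2) / sqrt(1 − r²) and the Fisher z-transformation for the interval; the variable names are illustrative, and only r and df are taken from the cor.test() output.

```r
# Values copied from the cor.test() output
r  <- 0.5663391
df <- 31
n  <- df + 2  # Pearson's test uses df = n - 2

# t statistic and two-sided p-value
t_stat <- r * sqrt(df) / sqrt(1 - r^2)
p_val  <- 2 * pt(-abs(t_stat), df)

# 95% confidence interval via the Fisher z-transformation
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

round(c(t = t_stat, p = p_val, lower = ci[1], upper = ci[2]), 4)
```

The results should agree with the reported t, p-value, and interval up to rounding.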

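Pearson’s correlation is a reasonable choice here because both “beg_year” and “teacher_attrition” are continuous variables and their relationship appears roughly linear. If the relationship were monotonic but not linear, or if outliers were a concern, Spearman or Kendall would be the more appropriate method.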
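
One way to back up the choice of method is to check normality with shapiro.test() and to run a rank-based test alongside Pearson's. The sketch below uses simulated vectors `x` and `y` as placeholders (they are not the teacher data); with the real data they would be replaced by teacher_data$beg_year and teacher_data$teacher_attrition.

```r
# Simulated placeholder data for illustration only
set.seed(42)
x <- rnorm(33)
y <- 0.5 * x + rnorm(33)

# Shapiro-Wilk test: a small p-value suggests a departure from normality
shapiro.test(x)
shapiro.test(y)

# Pearson measures linear association; Spearman measures monotonic association
pearson  <- cor.test(x, y, method = "pearson")
spearman <- cor.test(x, y, method = "spearman")

c(pearson = unname(pearson$estimate), spearman = unname(spearman$estimate))
```

If the two coefficients are close and the normality checks raise no concerns, Pearson is a defensible choice; a large gap between them would be a reason to look more closely at outliers or nonlinearity.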

  1. Publish to RPubs and submit via Canvas