Trying to play catch-up after missing lecture this week, using the lecture .Rmd file. Hopefully I’m understanding it as much as I think I am!
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.2 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readxl)
setwd("~/Desktop/UTSA/Quantitative Methods/RStudio")
district <- read_excel("district.xls")
clean_district <- district |> select(DPETGIFP, DPFVTOTK, DPSTURNR, DPSTEXPA, DPSAMIFP) |> drop_na()
cor(clean_district)
##             DPETGIFP    DPFVTOTK   DPSTURNR   DPSTEXPA    DPSAMIFP
## DPETGIFP  1.00000000  0.04646257 -0.1905602  0.1323772 -0.08875024
## DPFVTOTK  0.04646257  1.00000000 -0.1499868  0.1769907 -0.07686949
## DPSTURNR -0.19056023 -0.14998682  1.0000000 -0.4851745  0.20198320
## DPSTEXPA  0.13237721  0.17699073 -0.4851745  1.0000000 -0.42871796
## DPSAMIFP -0.08875024 -0.07686949  0.2019832 -0.4287180  1.00000000
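Rather than scanning the matrix by eye, the strongest pair can also be pulled out programmatically. This is only a minimal sketch: the matrix is re-typed from the printed output above so the snippet runs on its own (with the real data it would simply be `cor(clean_district)`), and `vars`, `cm`, `idx`, and `strongest` are names made up for illustration.

```r
# Correlation matrix re-entered by hand from the printed cor() output above.
# With the real data this would just be: cm <- cor(clean_district)
vars <- c("DPETGIFP", "DPFVTOTK", "DPSTURNR", "DPSTEXPA", "DPSAMIFP")
cm <- matrix(
  c( 1.00000000,  0.04646257, -0.1905602,  0.1323772, -0.08875024,
     0.04646257,  1.00000000, -0.1499868,  0.1769907, -0.07686949,
    -0.19056023, -0.14998682,  1.0000000, -0.4851745,  0.20198320,
     0.13237721,  0.17699073, -0.4851745,  1.0000000, -0.42871796,
    -0.08875024, -0.07686949,  0.2019832, -0.4287180,  1.00000000),
  nrow = 5, byrow = TRUE, dimnames = list(vars, vars)
)

# Blank out the diagonal (always 1) and the duplicated lower triangle
cm[lower.tri(cm, diag = TRUE)] <- NA

# Find the remaining entry with the largest absolute value
idx <- which(abs(cm) == max(abs(cm), na.rm = TRUE), arr.ind = TRUE)
strongest <- c(rownames(cm)[idx[1, "row"]], colnames(cm)[idx[1, "col"]])

strongest  # the two variable names
cm[idx]    # their correlation
```

Using the absolute value matters here because the strongest relationship in this matrix is a negative one.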
pairs(~DPETGIFP+DPFVTOTK+DPSTURNR+DPSTEXPA+DPSAMIFP,data=clean_district)
cor.test(clean_district$DPSTURNR, clean_district$DPSTEXPA, method = "spearman")
## Warning in cor.test.default(clean_district$DPSTURNR, clean_district$DPSTEXPA, :
## Cannot compute exact p-value with ties
##
## Spearman's rank correlation rho
##
## data: clean_district$DPSTURNR and clean_district$DPSTEXPA
## S = 416156254, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## -0.4486065
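The "Cannot compute exact p-value with ties" warning above appears because several districts share the same values, so R falls back to an approximate p-value. As far as I understand it, passing `exact = FALSE` to `cor.test()` asks for that approximation directly, which silences the warning without changing the estimate. The sketch below uses simulated numbers (not the district data) so it can run without `district.xls`; `turnover`, `experience`, and `res` are made-up names.

```r
set.seed(1)  # simulated data only, so the example runs without district.xls

# Rounding to one decimal place guarantees some tied values
turnover   <- round(runif(200, 0, 40), 1)
experience <- round(15 - 0.2 * turnover + rnorm(200, 0, 3), 1)

# exact = FALSE requests the asymptotic p-value up front,
# so the "exact p-value with ties" warning is not raised
res <- cor.test(turnover, experience, method = "spearman", exact = FALSE)

res$estimate  # Spearman's rho
res$p.value
```

With the real data, the equivalent call would be `cor.test(clean_district$DPSTURNR, clean_district$DPSTEXPA, method = "spearman", exact = FALSE)`.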
I selected Teacher Turnover Rate (DPSTURNR) and Teacher Average Years of Experience (DPSTEXPA) because they showed the strongest relationship in the correlation matrix I ran (Pearson r = –0.485). That is a moderate negative relationship: as turnover increases, average experience decreases. This makes sense practically, though I was surprised the number wasn’t even stronger off the bat. I used the Spearman rank correlation because both variables are continuous but not normally distributed, and the sample size is pretty large. The result (ρ = –0.449, p < 0.001) shows a statistically significant, moderate negative correlation, suggesting that districts with higher teacher turnover rates tend to have less experienced teachers. Again, this seems quite obvious practically, but seeing something seemingly simple broken down like this helped me understand it a bit better.
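The write-up says the two variables are not normally distributed, but the code above does not actually check that. One possible way to check is a Shapiro-Wilk test, sketched below on a simulated right-skewed variable standing in for turnover (the real column is not used here, and `turnover_sim` and `sw` are made-up names). Note that `shapiro.test()` only accepts between 3 and 5000 non-missing values.

```r
set.seed(42)  # simulated stand-in, not the district data

# A right-skewed variable, the kind of shape that would justify Spearman
turnover_sim <- rexp(500, rate = 1 / 15)

# Shapiro-Wilk test: a small p-value is evidence against normality
sw <- shapiro.test(turnover_sim)
sw$p.value

# A Q-Q plot gives a visual check of the same thing
qqnorm(turnover_sim)
qqline(turnover_sim)
```

On the real data the calls would be `shapiro.test(clean_district$DPSTURNR)` and `shapiro.test(clean_district$DPSTEXPA)`. With a sample this large the test can flag even small departures from normality, so the Q-Q plot is worth looking at alongside the p-value.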