Trying to play catch-up after missing lecture this week, using the lecture .Rmd. Hopefully I'm understanding it as well as I think I am!

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.2     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readxl)

setwd("~/Desktop/UTSA/Quantitative Methods/RStudio")

district <- read_excel("district.xls")

# Keep the five variables of interest and drop districts with a missing value in any of them
clean_district <- district |> select(DPETGIFP, DPFVTOTK, DPSTURNR, DPSTEXPA, DPSAMIFP) |> drop_na()
  1. Use your approved data set
  2. Select some variables of interest and see if there are any obvious correlations using the COR command
cor(clean_district)
##             DPETGIFP    DPFVTOTK   DPSTURNR   DPSTEXPA    DPSAMIFP
## DPETGIFP  1.00000000  0.04646257 -0.1905602  0.1323772 -0.08875024
## DPFVTOTK  0.04646257  1.00000000 -0.1499868  0.1769907 -0.07686949
## DPSTURNR -0.19056023 -0.14998682  1.0000000 -0.4851745  0.20198320
## DPSTEXPA  0.13237721  0.17699073 -0.4851745  1.0000000 -0.42871796
## DPSAMIFP -0.08875024 -0.07686949  0.2019832 -0.4287180  1.00000000
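Instead of scanning the matrix by eye, you can pull out the strongest pair programmatically: keep each pair once (the upper triangle) and sort by absolute correlation. So that the sketch runs on its own, it copies the printed values above into a matrix. With the real data you would just use `r_mat <- cor(clean_district)`. The names `r_mat` and `pair_table` are only illustrative.

```r
# Correlation matrix copied from the cor(clean_district) output above
# (with the real data: r_mat <- cor(clean_district))
vars <- c("DPETGIFP", "DPFVTOTK", "DPSTURNR", "DPSTEXPA", "DPSAMIFP")
r_mat <- matrix(c(
   1.00000000,  0.04646257, -0.1905602,  0.1323772, -0.08875024,
   0.04646257,  1.00000000, -0.1499868,  0.1769907, -0.07686949,
  -0.19056023, -0.14998682,  1.0000000, -0.4851745,  0.20198320,
   0.13237721,  0.17699073, -0.4851745,  1.0000000, -0.42871796,
  -0.08875024, -0.07686949,  0.2019832, -0.4287180,  1.00000000
), nrow = 5, byrow = TRUE, dimnames = list(vars, vars))

# Row/column positions of the upper triangle, so each pair appears only once
idx <- which(upper.tri(r_mat), arr.ind = TRUE)

pair_table <- data.frame(
  var1 = rownames(r_mat)[idx[, "row"]],
  var2 = colnames(r_mat)[idx[, "col"]],
  r    = r_mat[idx]
)

# Strongest relationships first, regardless of sign
pair_table <- pair_table[order(-abs(pair_table$r)), ]
head(pair_table, 3)
```

The top row is DPSTURNR/DPSTEXPA (r = -0.485), with DPSTEXPA/DPSAMIFP (r = -0.429) a close second.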
  3. Examine the same variables visually using the PAIRS command
pairs(~ DPETGIFP + DPFVTOTK + DPSTURNR + DPSTEXPA + DPSAMIFP, data = clean_district)

  4. Select two variables that seem correlated (positively or negatively) and examine them using PEARSON, SPEARMAN or KENDALL (depending on which is more appropriate)
  5. Explain your findings and justify your choice in selecting the correlation method
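The prompt asks you to choose the method "depending on which is more appropriate". One common rule of thumb is to run a Shapiro-Wilk test on each variable and switch to Spearman if either one looks non-normal. The sketch below wraps that rule in a hypothetical helper, `choose_cor_method()`, which does not come from the lecture or from any package. Treat it only as a starting point, for three reasons. With large samples, Shapiro-Wilk flags even small departures from normality. `shapiro.test()` only accepts 3 to 5000 values. The rule also never considers Kendall's tau. Checking the `pairs()` plot or histograms is still worthwhile.

```r
# Hypothetical helper (not from the lecture or a package): a rule-of-thumb
# choice between Pearson and Spearman based on Shapiro-Wilk normality tests.
choose_cor_method <- function(x, y, alpha = 0.05) {
  # Expects paired, complete data (e.g. after drop_na());
  # shapiro.test() itself only accepts 3 to 5000 values
  stopifnot(length(x) == length(y), length(x) >= 3, length(x) <= 5000)
  p_x <- shapiro.test(x)$p.value
  p_y <- shapiro.test(y)$p.value
  # Pearson only if neither variable shows evidence against normality
  if (p_x > alpha && p_y > alpha) "pearson" else "spearman"
}

# Simulated example (made-up data, not the district file): a right-skewed pair
set.seed(1)
skewed_x <- rexp(300)
skewed_y <- skewed_x + rexp(300)
choose_cor_method(skewed_x, skewed_y)  # "spearman"
```

With the real data the call would be `choose_cor_method(clean_district$DPSTURNR, clean_district$DPSTEXPA)`.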
cor.test(clean_district$DPSTURNR, clean_district$DPSTEXPA, method = "spearman")
## Warning in cor.test.default(clean_district$DPSTURNR, clean_district$DPSTEXPA, :
## Cannot compute exact p-value with ties
## 
##  Spearman's rank correlation rho
## 
## data:  clean_district$DPSTURNR and clean_district$DPSTEXPA
## S = 416156254, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##        rho 
## -0.4486065
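Two parts of this output can look opaque: the ties warning and the `S` statistic. In R's `cor.test()`, rho is simply Pearson's r computed on the ranks, with tied values given their average rank. The reported S is then derived from rho as `(n^3 - n) * (1 - rho) / 6`; when there are no ties, this equals the sum of squared rank differences. The warning only means that tied ranks rule out an exact p-value, so R reports an approximate one instead. The sketch below checks both points on simulated stand-in data. The variable names `turnover` and `experience` are hypothetical, not columns from the district file.

```r
# Simulated stand-in for the district data (values are made up);
# rounding creates tied values, like the real variables
set.seed(42)
n <- 200
turnover   <- round(runif(n, 0, 40))
experience <- round(15 - 0.2 * turnover + rnorm(n, sd = 3), 1)

# exact = FALSE asks for the approximate p-value directly,
# which avoids the "Cannot compute exact p-value with ties" warning
ct  <- cor.test(turnover, experience, method = "spearman", exact = FALSE)
rho <- unname(ct$estimate)

# Spearman's rho is Pearson's r on (average) ranks
all.equal(rho, cor(rank(turnover), rank(experience)))       # TRUE

# The reported S statistic is a rescaling of rho
all.equal(unname(ct$statistic), (n^3 - n) * (1 - rho) / 6)  # TRUE
```

The same identity should hold for the district output above, with `n` equal to `nrow(clean_district)`.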

I selected Teacher Turnover Rate (DPSTURNR) and Teacher Average Years of Experience (DPSTEXPA) because they showed the strongest relationship in the correlation matrix (Pearson r = -0.485). That is a moderate negative relationship: as turnover increases, average experience decreases. This makes sense practically, although I was surprised the correlation wasn't even stronger.

I used Spearman's rank correlation because both variables are continuous but not normally distributed, and the sample is fairly large (Kendall's tau is usually recommended for small samples with many tied ranks). Because Spearman works on ranks, it captures any monotonic relationship, not only a linear one. The result (ρ = -0.449, p < 0.001) shows a statistically significant, moderate negative correlation, suggesting that districts with higher teacher turnover rates tend to have less experienced teachers. Again, this seems quite obvious practically, but seeing something seemingly simple broken down like this helped me understand it better.
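To see how the three methods differ, the sketch below runs all of them on simulated data, not the district data. In this data one variable is skewed and the relationship is monotonic but curved. Pearson measures linear association on the raw values. Spearman and Kendall use only the ordering of the values, so skew and curvature affect them less. Kendall's tau is also on a different scale and is typically smaller in magnitude than Spearman's rho for the same data, so the two estimates should not be compared directly.

```r
# Simulated, skewed, monotonic-but-curved relationship (made-up data)
set.seed(7)
x <- rexp(150)
y <- -log1p(x) + rnorm(150, sd = 0.3)

# Run each method and keep only the point estimate
methods <- c("pearson", "spearman", "kendall")
estimates <- sapply(methods, function(m) {
  unname(cor.test(x, y, method = m, exact = FALSE)$estimate)
})
round(estimates, 3)
```

The `exact` argument is ignored for Pearson. For Spearman and Kendall it selects the approximate p-value, which is what R falls back to anyway when there are ties.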