Apparently I don’t quite understand this in practice yet. I need to go back, study a lot more, and practice different things; I think I’m getting lost in some of the basics. I decided to use the district data because a) I am currently in Capstone and we are doing qualitative research, and b) this doesn’t come naturally to me, so I chose the dataset that would be simplest to learn from, since you already have experience and familiarity with it.

library(readxl)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.2     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
district <- read_excel("district.xls")
summary(district$DZCAMPUS)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   3.000   7.428   5.000 273.000
summary(district$DPSTTOSA)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   36081   50439   53382   53971   56919  110560       4
summary(district$DPETALLC)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##      4.0    337.5    884.0   4476.3   2746.0 193727.0
hist(district$DPETALLC)

plot(district$DPSSTOSA, district$DZCAMPUS)

cor(district$DPSTTOSA, district$DPSSTOSA)
## [1] NA
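
The NA result is consistent with missing values in the data: summary() reports 4 NA's in DPSTTOSA, and cor() returns NA by default whenever either input contains a missing value. Below is a minimal sketch of one way to handle this, using made-up toy data. The data frame `toy` and its columns `col_a` and `col_b` are hypothetical stand-ins, not the district file.

```r
# Toy data standing in for two numeric columns (values are made up)
toy <- data.frame(
  col_a = c(50000, 52000, NA, 56000, 58000),
  col_b = c(60000, 61000, 63000, NA, 68000)
)

# Default behaviour: a missing value in either column makes the result NA
r_default <- cor(toy$col_a, toy$col_b)

# use = "complete.obs" keeps only the rows where both values are present
r_complete <- cor(toy$col_a, toy$col_b, use = "complete.obs")

# Count how many rows were dropped because of a missing value
n_dropped <- sum(!complete.cases(toy$col_a, toy$col_b))

print(r_default)
print(r_complete)
print(n_dropped)
```

On the real data, the same argument could presumably be passed as cor(district$DPSTTOSA, district$DPSSTOSA, use = "complete.obs"). That call was not run here, so no result is shown for it.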