library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readxl)

1) Create an R Markdown document with “district” data

district<-read_excel("district.xls")

2) Create a new data frame with “DISTNAME”, “DPETSPEP” (percent special education) and “DPFPASPEP”

# Keep only the three columns needed for the analysis
data <- district %>% select(DISTNAME, DPETSPEP, DPFPASPEP)

3) Give me “summary()” statistics for both DPETSPEP and DPFPASPEP

summary(data)
##    DISTNAME            DPETSPEP       DPFPASPEP     
##  Length:1207        Min.   : 0.00   Min.   : 0.000  
##  Class :character   1st Qu.: 9.90   1st Qu.: 5.800  
##  Mode  :character   Median :12.10   Median : 8.900  
##                     Mean   :12.27   Mean   : 9.711  
##                     3rd Qu.:14.20   3rd Qu.:12.500  
##                     Max.   :51.70   Max.   :49.000  
##                                     NA's   :5

4) Which variable has missing values?

The variable with missing values is DPFPASPEP: the summary output reports 5 NA's for it and none for DPETSPEP.
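One caveat is that `summary()` does not report NAs for character columns such as DISTNAME. A more direct check is to count the NAs in every column. The sketch below uses a small made-up data frame (`toy` and its values are hypothetical stand-ins, not the district data):

```r
# Toy data standing in for the three district columns (values are made up)
toy <- data.frame(
  DISTNAME  = c("A ISD", "B ISD", "C ISD", "D ISD"),
  DPETSPEP  = c(9.9, 12.1, 14.2, 11.0),
  DPFPASPEP = c(5.8, NA, 12.5, NA)
)

# is.na() returns a logical matrix; colSums() counts the TRUEs per column
na_counts <- colSums(is.na(toy))
print(na_counts)

# Names of the columns that contain at least one NA
cols_with_na <- names(na_counts)[na_counts > 0]
print(cols_with_na)
```

The same `colSums(is.na(...))` call should work unchanged on `data`.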

5) Remove the missing observations. How many are left overall?

DPFPASPEP_Cleaned<-data$DPFPASPEP %>% na.omit(.)
length(DPFPASPEP_Cleaned)
## [1] 1202
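The code above counts the non-missing values of a single column. If "overall" is read as the number of complete rows, one option is to apply `na.omit()` to the whole data frame and count rows. A minimal sketch on made-up data (`toy` is hypothetical), showing that the two counts can differ when NAs sit in different columns:

```r
# Toy data with NAs in two different columns (values are made up)
toy <- data.frame(
  DISTNAME  = c("A ISD", "B ISD", "C ISD", "D ISD"),
  DPETSPEP  = c(9.9, NA, 14.2, 11.0),
  DPFPASPEP = c(5.8, 8.9, NA, 12.5)
)

# Column-wise: non-missing values in one column only
n_column <- length(na.omit(toy$DPFPASPEP))

# Row-wise: rows with no NA in any column
toy_clean <- na.omit(toy)
n_rows <- nrow(toy_clean)

# complete.cases() gives the same row count without building a new data frame
n_complete <- sum(complete.cases(toy))

print(c(column = n_column, rows = n_rows, complete = n_complete))
```

When only one column contains NAs, the column-wise and row-wise counts agree.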

6) Create a point graph (hint: ggplot + geom_point()) to compare DPFPASPEP and DPETSPEP. Are they correlated?

compare_two<-district %>% select(DPFPASPEP,DPETSPEP)
compare_two_clean<-compare_two %>% na.omit(.)

ggplot(compare_two_clean,aes(x=DPFPASPEP,y=DPETSPEP)) + geom_point()
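To make the direction of the relationship easier to judge by eye, a fitted line can be layered over the points. This is only a sketch on made-up data (`toy` and its values are hypothetical); the transparency and the linear smoother are illustrative choices, not part of the original assignment:

```r
library(ggplot2)

# Toy data standing in for compare_two_clean (values are made up)
toy <- data.frame(
  DPFPASPEP = c(5.8, 8.9, 12.5, 7.1, 10.3, 9.4),
  DPETSPEP  = c(9.9, 12.1, 14.2, 11.0, 12.8, 11.5)
)

p <- ggplot(toy, aes(x = DPFPASPEP, y = DPETSPEP)) +
  geom_point(alpha = 0.5) +                 # semi-transparent points reduce overplotting
  geom_smooth(method = "lm", se = FALSE)    # straight-line fit, no confidence band

print(p)
```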

7) Do a mathematical check (cor()) of DPFPASPEP and DPETSPEP. What is the result?

cor(compare_two_clean$DPFPASPEP,compare_two_clean$DPETSPEP)
## [1] 0.3700234
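As an alternative to cleaning the data first, `cor()` can drop incomplete pairs itself through its `use` argument, and `cor.test()` adds a significance test and confidence interval. A minimal sketch on made-up vectors (`x` and `y` are hypothetical, not the district data):

```r
# Toy vectors with some missing values (values are made up)
x <- c(5.8, 8.9, NA, 12.5, 7.1, 10.3)
y <- c(9.9, 12.1, 14.2, 13.0, NA, 12.8)

# Let cor() drop the incomplete pairs
r_direct <- cor(x, y, use = "complete.obs")

# Same result after removing incomplete pairs by hand
ok <- complete.cases(x, y)
r_manual <- cor(x[ok], y[ok])

# cor.test() also ignores incomplete pairs and reports a p-value
result <- cor.test(x, y)

print(r_direct)
print(result$p.value)
```

On the district data, `cor(district$DPFPASPEP, district$DPETSPEP, use = "complete.obs")` should match the value computed from `compare_two_clean`.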

8) How would you interpret these results?

There is a positive correlation between the two variables, but at r ≈ 0.37 it is not very strong. Districts with a higher percentage of students in special education also tend to have higher special education spending, although the points scatter widely around that trend.
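One way to put a number on "not very strong" is to square the correlation. For a simple linear relationship between two variables, the squared correlation is the share of variance in one variable that a straight-line fit on the other accounts for. A short sketch using the value reported above:

```r
# Correlation reported by cor() above
r <- 0.3700234

# Squared correlation: proportion of variance accounted for by a linear fit
r_squared <- r^2

print(round(r_squared, 3))  # 0.137
```

So roughly 14% of the variation in one variable is linearly associated with the other, which leaves most of the variation unexplained.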