Ashley Cecil-Folds PAD 6833 Homework 3

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(readxl)

district<-read_excel("district.xls")

Homework_Dataset<-district %>% select(DISTNAME,DPETSPEP,DPFPASPEP)

head(Homework_Dataset)

## # A tibble: 6 × 3
##   DISTNAME      DPETSPEP DPFPASPEP
##   <chr>            <dbl>     <dbl>
## 1 CAYUGA ISD        14.6      28.9
## 2 ELKHART ISD       12.1       8.8
## 3 FRANKSTON ISD     13.1       8.4
## 4 NECHES ISD        10.5      10.1
## 5 PALESTINE ISD     13.5       6.1
## 6 WESTWOOD ISD      14.5       9.4

summary(Homework_Dataset)

##    DISTNAME            DPETSPEP       DPFPASPEP     
##  Length:1207        Min.   : 0.00   Min.   : 0.000  
##  Class :character   1st Qu.: 9.90   1st Qu.: 5.800  
##  Mode  :character   Median :12.10   Median : 8.900  
##                     Mean   :12.27   Mean   : 9.711  
##                     3rd Qu.:14.20   3rd Qu.:12.500  
##                     Max.   :51.70   Max.   :49.000  
##                                     NA's   :5

DPFPASPEP has missing values: the summary table reports 5 NAs for that column, while DPETSPEP has none.
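As a quick cross-check on the summary table, missing values can also be counted directly per column. This is a minimal sketch on a made-up toy data frame (the `toy` object and its values are hypothetical; only the column names come from the homework data):

```r
# Hypothetical toy data; only the column names come from the homework
toy <- data.frame(
  DISTNAME  = c("A ISD", "B ISD", "C ISD"),
  DPETSPEP  = c(14.6, 12.1, 13.1),
  DPFPASPEP = c(28.9, NA, 8.4)
)

# Count missing values in each column
na_counts <- colSums(is.na(toy))
na_counts
```

Running the same `colSums(is.na(...))` call on Homework_Dataset should agree with the NA's row of the summary output.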

Homework_Dataset_cleaned<-Homework_Dataset %>% filter(DPFPASPEP>0)

summary(Homework_Dataset_cleaned)

##    DISTNAME            DPETSPEP      DPFPASPEP     
##  Length:1201        Min.   : 0.0   Min.   : 0.200  
##  Class :character   1st Qu.: 9.9   1st Qu.: 5.800  
##  Mode  :character   Median :12.2   Median : 8.900  
##                     Mean   :12.3   Mean   : 9.719  
##                     3rd Qu.:14.2   3rd Qu.:12.500  
##                     Max.   :51.7   Max.   :49.000

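After cleaning the data frame, 1201 observations remain out of the original 1207. The filter DPFPASPEP>0 removes the 5 rows with NAs and also the one row where DPFPASPEP is 0, which is why the minimum rises from 0.000 to 0.200.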
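The choice of filter matters here. A minimal sketch, again on made-up toy data (the `toy`, `positive_only`, and `non_missing` names are hypothetical), of how a `> 0` condition differs from a filter that targets NAs only:

```r
library(dplyr)

# Hypothetical toy data reusing a column name from the homework
toy <- data.frame(
  DISTNAME  = c("A ISD", "B ISD", "C ISD", "D ISD"),
  DPFPASPEP = c(8.8, 0, NA, 6.1)
)

# filter(x > 0) drops the NA row (the comparison is NA) and the zero row
positive_only <- toy %>% filter(DPFPASPEP > 0)

# filter(!is.na(x)) drops only the NA row and keeps the zero
non_missing <- toy %>% filter(!is.na(DPFPASPEP))

nrow(positive_only)
nrow(non_missing)
```

Whether a zero is a real value or a placeholder for missing data depends on the data source, so either filter can be defensible as long as the choice is stated.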

compare_two<-district %>% select(DISTNAME,DPETSPEP,DPFPASPEP)

compare_two<-compare_two %>% filter(DPFPASPEP>0)

ggplot(compare_two,aes(DPETSPEP,DPFPASPEP)) + geom_point()

Looking at the scatterplot, the two variables appear at least loosely correlated: DPFPASPEP tends to rise as DPETSPEP increases.
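One way to make an upward trend easier to see is to overlay a least-squares line on the scatterplot. This is only a sketch: the data below are simulated stand-ins for compare_two (the `toy` object and its values are hypothetical), and `geom_smooth(method = "lm")` is an addition that the original plot does not use.

```r
library(ggplot2)

# Hypothetical simulated data standing in for compare_two
set.seed(1)
toy <- data.frame(DPETSPEP = runif(50, 5, 20))
toy$DPFPASPEP <- 2 + 0.5 * toy$DPETSPEP + rnorm(50, sd = 3)

# Scatterplot with a straight least-squares trend line and no error band
p <- ggplot(toy, aes(DPETSPEP, DPFPASPEP)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
p
```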

cor(Homework_Dataset_cleaned$DPETSPEP, Homework_Dataset_cleaned$DPFPASPEP)

## [1] 0.371033

cor.test(Homework_Dataset_cleaned$DPETSPEP, Homework_Dataset_cleaned$DPFPASPEP)

## 
##  Pearson's product-moment correlation
## 
## data:  Homework_Dataset_cleaned$DPETSPEP and Homework_Dataset_cleaned$DPFPASPEP
## t = 13.835, df = 1199, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.3212085 0.4188092
## sample estimates:
##      cor 
## 0.371033

The p-value is reported as < 2.2e-16, which is effectively zero, so the correlation is statistically significant. With r = 0.37 (95% confidence interval 0.32 to 0.42), the relationship is positive but only moderate in strength. One would expect a correlation between the percentage of students enrolled in special education and spending on special education: as the percentage of students receiving special services increases, a district needs to spend more, including the cost of employing special educators.
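The numbers in the cor.test output can be checked by hand from r and the degrees of freedom alone. The sketch below uses the standard textbook formulas for Pearson's test (t statistic on n - 2 degrees of freedom, confidence interval via the Fisher z transformation); the variable names are illustrative, and the inputs are copied from the output above rather than recomputed from the data.

```r
# Values copied from the cor.test output
r  <- 0.371033
df <- 1199
n  <- df + 2  # Pearson's test uses df = n - 2

# t statistic for testing that the true correlation is 0
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df)

# 95% confidence interval via the Fisher z transformation
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

# Share of variance the two variables have in common
r_squared <- r^2

round(t_stat, 3)
round(ci, 4)
round(r_squared, 3)
```

The results should land very close to the reported t = 13.835 and the interval 0.3212 to 0.4188. The r squared value of roughly 0.14 is a reminder that, although the correlation is clearly nonzero, most of the variation in one variable is not accounted for by the other.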

2025-02-11
