Homework3.knit

1) create an Rmarkdown document with “district” data (like this one)

---
title: "Victoria Escobedo PAD6833 Homework 3"
author: "Victoria Escobedo"
date: "2025-02-15"
output: html_document
---

The district data is saved as an .xls file, so the read.csv command cannot be used; read_excel from the readxl package is needed instead. I initially had issues with this command because I forgot to load readxl.

library(readxl)
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

district<-read_excel("district.xls")

create a new data frame with “DISTNAME”, “DPETSPEP” (percent special education) and “DPFPASPEP” (money spent on special education). call the dataframe whatever you want

To narrow the district data, the district name (DISTNAME), the percent of enrollment in special education (DPETSPEP), and the percent of expenditures on special education (DPFPASPEP) are isolated in a new data frame.

New_District<-district%>%select(DISTNAME,DPETSPEP,DPFPASPEP)

give me “summary()” statistics for both DPETSPEP and DPFPASPEP. You can summarize them separately if you want.

summary(New_District$DPETSPEP)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    9.90   12.10   12.27   14.20   51.70

summary(New_District$DPFPASPEP)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   5.800   8.900   9.711  12.500  49.000       5

Which variable has missing values?

The expenditures variable (DPFPASPEP) has 5 NA's, meaning some districts did not report it.

Remove the missing observations. How many are left overall?

Rev_District<-New_District%>%filter(DPFPASPEP>0)

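There are 1,201 observations left. Note that filtering on DPFPASPEP>0 removes the NA rows and also any district that reported exactly 0 (the minimum in the summary above), so the count reflects both exclusions.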
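One thing worth checking here: a comparison like DPFPASPEP>0 inside filter() drops rows where the result is NA, but it also drops rows where the value is exactly 0. Below is a minimal sketch comparing that filter with one that removes only the missing values. The `toy` data frame is made up for illustration and is not taken from district.xls.

```r
library(dplyr)

# Hypothetical toy data: one zero and one NA in the expenditure column
toy <- data.frame(
  DISTNAME  = c("A", "B", "C", "D"),
  DPETSPEP  = c(10.0, 12.5, 9.0, 14.0),
  DPFPASPEP = c(8.0, 0.0, NA, 11.0)
)

# Same approach as above: drops the NA row AND the zero row
positive_only <- toy %>% filter(DPFPASPEP > 0)

# Drops only the NA row, keeping districts that reported 0
complete_only <- toy %>% filter(!is.na(DPFPASPEP))

nrow(positive_only)
nrow(complete_only)

# Number of missing values in the column
sum(is.na(toy$DPFPASPEP))
```

Which version is appropriate depends on whether a reported 0 should count as a real observation or as a stand-in for missing data.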

Create a point graph (hint: ggplot + geom_point()) to compare DPFPASPEP and DPETSPEP. Are they correlated?

ggplot(data=Rev_District,mapping=aes(x=DPFPASPEP,y=DPETSPEP))+geom_point()

I tried to get fancy with titles and labels and failed. I used the code below and am not sure where I messed up.

+labs(title="Special Education", subtitle="Enrollment vs Expenditures", x="Expenditures", y="Enrollment")

This was the error:

Error in +labs(title = "Special Education", subtitle = "Enrollment vs Expenditures", : invalid argument to unary operator
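The "invalid argument to unary operator" error usually appears when a line begins with `+`. R treats the previous line as a finished statement and then tries to evaluate `+labs(...)` on its own. A sketch of the likely fix, assuming that is what happened here, is to end each line with `+` so all the layers stay in one expression. The `toy` data frame is invented so the example runs without district.xls.

```r
library(ggplot2)

# Hypothetical values, for illustration only
toy <- data.frame(
  DPFPASPEP = c(5.8, 8.9, 12.5, 9.7),
  DPETSPEP  = c(9.9, 12.1, 14.2, 12.3)
)

# Each line ends with "+" so R knows the expression continues
p <- ggplot(data = toy, mapping = aes(x = DPFPASPEP, y = DPETSPEP)) +
  geom_point() +
  labs(
    title = "Special Education",
    subtitle = "Enrollment vs Expenditures",
    x = "Expenditures",
    y = "Enrollment"
  )

p
```

The same pattern should work with Rev_District in place of `toy`. Code pasted from a word processor may also carry curly quotes, which R does not accept as string delimiters, so the labels need straight quotes.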

Do a mathematical check (cor()) of DPFPASPEP and DPETSPEP. What is the result?

cor(Rev_District$DPETSPEP,Rev_District$DPFPASPEP)

## [1] 0.371033

The correlation coefficient is about 0.37. (A correlation is a value between -1 and 1, not a percentage.)

How would you interpret these results? (No real right or wrong answer – just tell me what you see)

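A correlation of about 0.37 suggests a positive but fairly weak relationship: districts with a larger share of enrollment in special education tend to spend a larger share on it, but the two do not move closely together. I would like to check the data against a third variable like district size or community type.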
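One rough way to put a number on the strength of the relationship is to square the correlation coefficient, which gives the share of variance in one variable that a straight-line relationship with the other accounts for. Below is a short sketch using the coefficient reported by cor() above. The `x` and `y` vectors are made-up values, included only to show how cor() behaves when a value is missing.

```r
# Correlation reported by cor() above
r <- 0.371033

# Share of variance accounted for by a linear relationship
r_squared <- r^2
round(r_squared, 3)

# Hypothetical vectors with one missing value, for illustration only
x <- c(5.8, 8.9, NA, 12.5, 9.7)
y <- c(9.9, 12.1, 11.0, 14.2, 12.3)

# Returns NA because one pair is incomplete
cor(x, y)

# Uses only the pairs where both values are present
cor(x, y, use = "complete.obs")
```

Here the squared value is about 0.14, so most of the variation in one percentage is not accounted for by the other.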

Knit the Rmarkdown and submit to Rpubs for publishing
submit the link to Rpubs on CANVAS