library(readxl)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
district<-read_excel("district.xls")
#STEP 1 - Create R Markdown
Phew, this took quite a while. I had to start fresh on a new computer, install R, RStudio, and all the packages, and get R to read the data set “district.xls”. I almost had to Zoom you in for assistance!
#STEP 2 - Create a new Data Frame
Let’s see if I can do this part quicker than Step 1.
SpecialEdSpending<-data.frame(district$DISTNAME,district$DPETSPEP,district$DPFPASPEP)
I think I made it! I now have a data frame in the upper right-hand corner called “SpecialEdSpending” that has 1207 observations and 3 variables, so it is a smaller data set built from the larger district data frame.
#STEP 3 - SUMMARY
summary(SpecialEdSpending$district.DPETSPEP)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 9.90 12.10 12.27 14.20 51.70
summary(SpecialEdSpending$district.DPFPASPEP)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 5.800 8.900 9.711 12.500 49.000 5
Maybe I’m getting the hang of this….
#STEP 4 - Missing Variables
From the summaries above, the column DPFPASPEP (the percent of expenditures going to special education) is missing 5 observations. The summary for DPETSPEP reports no missing values.
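The missing values can also be counted directly instead of being read off the summary output. This is a minimal sketch on a made-up data frame; the name `toy` and its columns `pct_students` and `pct_spending` are hypothetical stand-ins for the real columns, not part of the assignment data.

```r
# Toy data standing in for the real columns (values are made up for illustration)
toy <- data.frame(
  pct_students = c(10.2, 12.5, 0.0, 14.1),
  pct_spending = c(8.9, NA, 0.0, NA)
)

# is.na() marks each missing cell; colSums() adds up the marks per column
na_counts <- colSums(is.na(toy))
print(na_counts)
```

Running the same `colSums(is.na(...))` call on `SpecialEdSpending` should agree with the `NA's` count that `summary()` prints.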
#STEP 5 - Remove Missing Observations
# na.omit() takes no column argument; it drops every row that has an NA in any column
SpecialEdSpendingCLEAN <- SpecialEdSpending %>% na.omit()
I’m not sure I had to create a new data frame, but I did. I could not filter with “>0” because one of the observations for DPFPASPEP is in fact 0 percent and should be kept, so I only wanted to remove the “NA” observations.
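To illustrate why a “>0” filter would remove too much, here is a small sketch on made-up data. The data frame `toy` and its column names are hypothetical, chosen only to show the difference between the two approaches.

```r
# Toy data: one missing value and one genuine 0 in the spending column
toy <- data.frame(
  pct_students = c(10.2, 12.5, 0.0, 14.1),
  pct_spending = c(8.9, NA, 0.0, 11.3)
)

# Keeping only rows with spending > 0 drops the NA row AND the real 0 row
positive_only <- toy[which(toy$pct_spending > 0), ]

# Keeping rows where spending is not NA drops only the missing row
non_missing <- toy[!is.na(toy$pct_spending), ]

nrow(positive_only)
nrow(non_missing)
```

The second approach keeps the district that reports 0 percent, which is the behavior wanted in this step.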
#STEP 6 - Point Graph
ggplot(SpecialEdSpendingCLEAN, aes(x = district.DPETSPEP, y = district.DPFPASPEP)) +
  geom_point() +
  labs(title = "Special Education",
       x = "STUDENTS: % SPECIAL EDUCATION",
       y = "EXPENDITURE: % SPECIAL EDUCATION")
This graph shows the percent of the student body in special education on
the X axis and the percent of expenditures going to special education on
the Y axis. There does appear to be a positive correlation, but most of
the points sit in one large clump below 20% of students and 20% of
spending, rather than spreading out evenly along a clear positive or
negative trend. Spending seems to be generally limited to about 20%,
with a few exceptions here and there.
#STEP 7 - Correlation
cor(SpecialEdSpendingCLEAN$district.DPETSPEP,SpecialEdSpendingCLEAN$district.DPFPASPEP)
## [1] 0.3700234
The correlation between the percentage of students in special ed and the percentage of spending on special ed is 0.3700234.
#STEP 8 - Interpretation
The result of 0.37 is not close to 1.0 (which would be a VERY strong positive correlation). There IS a positive correlation here (i.e. when one variable increases, so does the other), but it is a fairly middle-of-the-road correlation. This tells me that a higher percentage of students in special ed is only loosely associated with a higher percentage of spending, and vice versa: a higher share of spending does not necessarily point to a higher percentage of the student body in special ed.
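One way to put the strength of this correlation in perspective is to square it. For a simple linear relationship between two variables, the squared correlation is the share of variance in one variable that is accounted for by the other. The sketch below only reuses the value reported in Step 7; the variable names `r` and `r_squared` are illustrative.

```r
# Correlation reported in Step 7
r <- 0.3700234

# Squared correlation: share of variance explained in a simple linear fit
r_squared <- r^2
round(r_squared, 3)
```

This works out to roughly 0.137, so about 14% of the variation in one percentage lines up with variation in the other.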