R Markdown

library(readxl)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
district<-read_excel("district.xls")

#STEP 1 - Create R Markdown

Phew, this took quite a while. I had to start fresh on a new computer, install R and RStudio and all the packages, and get the library reading the data set “district.xls”. I almost had to Zoom you in for assistance!

#STEP 2 - Create a new Data Frame

Let’s see if I can do this part quicker than step 1.

SpecialEdSpending<-data.frame(district$DISTNAME,district$DPETSPEP,district$DPFPASPEP)

I think I made it! I now have a data frame in the upper right-hand corner called “SpecialEdSpending” that has 1207 observations and 3 variables, so it created a smaller data set from the larger district data frame.
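As a side note, here is a small sketch of the same step using bracket indexing, which keeps the original column names instead of the `district.` prefix that `data.frame()` adds. The `toy` data frame and all of its values are made up for illustration; only the three column names come from the district file. The rest of this write-up keeps using the prefixed names.

```r
# Hypothetical stand-in for the district data (all values are invented)
toy <- data.frame(
  DISTNAME  = c("A ISD", "B ISD", "C ISD"),
  DPETSPEP  = c(9.9, 12.1, 14.2),
  DPFPASPEP = c(5.8, NA, 12.5),
  OTHER     = c(1, 2, 3)
)

# Keep only the three columns of interest; the names stay unchanged
toy_small <- toy[, c("DISTNAME", "DPETSPEP", "DPFPASPEP")]

dim(toy_small)    # number of rows, number of columns
names(toy_small)  # column names without any prefix
```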

#STEP 3 - SUMMARY

summary(SpecialEdSpending$district.DPETSPEP)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    9.90   12.10   12.27   14.20   51.70
summary(SpecialEdSpending$district.DPFPASPEP)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   5.800   8.900   9.711  12.500  49.000       5

Maybe I’m getting the hang of this….

#STEP 4 - Missing Variables

From the summaries above, the column for DPFPASPEP or “money spent on special education” is missing 5 observations.
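Another way to check for missing values, without reading them off each summary, is to count the `NA`s in every column at once. This is only a sketch on a made-up `toy` data frame (the values are invented; the column names mirror the ones in SpecialEdSpending).

```r
# Hypothetical data with the same column names as SpecialEdSpending
toy <- data.frame(
  district.DPETSPEP  = c(9.9, 12.1, 14.2, 0),
  district.DPFPASPEP = c(5.8, NA, 12.5, NA)
)

# is.na() marks each missing cell as TRUE; colSums() adds them up per column
na_counts <- colSums(is.na(toy))
na_counts
```

Running the same `colSums(is.na(...))` call on the real data frame should agree with the `NA's` column in the summaries.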

#STEP 5 - Remove Missing Observations

SpecialEdSpendingCLEAN <- SpecialEdSpending %>% drop_na(district.DPFPASPEP)

I’m not sure I needed to create a new data frame, but I did. I couldn’t filter with “>0” because one of the observations for DPFPASPEP really is 0 percent, and that should stay in as an observation, so I only wanted to remove the “NA” observations.
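To show why “>0” would have been the wrong filter, here is a small base R sketch on invented data (the `toy` values are hypothetical). Keeping the rows that are not `NA` retains the genuine 0, while keeping the rows greater than 0 throws it away along with the `NA`.

```r
# Hypothetical data: one NA and one genuine 0 in the spending column
toy <- data.frame(
  district.DPETSPEP  = c(9.9, 12.1, 14.2, 11.0),
  district.DPFPASPEP = c(5.8, NA, 0, 12.5)
)

# Remove only the missing values; the 0 percent row is kept
keep_not_na <- toy[!is.na(toy$district.DPFPASPEP), ]

# Filtering on "> 0" drops the NA row AND the real 0 percent row
keep_positive <- toy[which(toy$district.DPFPASPEP > 0), ]

nrow(keep_not_na)    # 3 rows remain
nrow(keep_positive)  # 2 rows remain
```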

#STEP 6 - Point Graph

ggplot(SpecialEdSpendingCLEAN, aes(x = district.DPETSPEP, y = district.DPFPASPEP)) +
  geom_point() +
  labs(title = "Special Education",
       x = "STUDENTS: % SPECIAL EDUCATION",
       y = "EXPENDITURE: % SPECIAL EDUCATION")

This graph shows the percent of the student body in special education on the X axis and the percent of expenditure on special education on the Y axis. From this graph, there does look to be a correlation, but most of the points fall below 20% spending and 20% of the student body. The points form one large clump rather than spreading out evenly along a clear positive or negative line. Spending seems to be generally limited to about 20%, with a few exceptions here and there.

#STEP 7 - Correlation

cor(SpecialEdSpendingCLEAN$district.DPETSPEP,SpecialEdSpendingCLEAN$district.DPFPASPEP)
## [1] 0.3700234

The correlation between the percentage of students in special ed and the percentage of spending on special ed is 0.3700234.

#STEP 8 - Interpretation

The result of 0.37 is not close to 1.0 (which would be a VERY strong positive correlation). There IS a positive correlation here (i.e. when one variable increases, so does the other), but it’s quite a middle-of-the-road correlation. This tells me that a higher percentage of students in special ed is only loosely associated with a higher share of spending, and vice versa: a higher share of spending does not necessarily point to a higher percentage of the student body in special ed.
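One way to put a number on how modest the relationship is: squaring the correlation gives the share of the variation in one variable that is linearly associated with the other. This is just a quick arithmetic sketch using the value from Step 7; the variable names `r` and `r_squared` are my own.

```r
r <- 0.3700234        # correlation reported in Step 7
r_squared <- r^2      # share of variance linearly associated
round(r_squared, 3)   # 0.137
```

So only about 14% of the variation in the spending percentage lines up with the percentage of students in special ed, which fits the loose clump seen in the point graph.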