INTRODUCTION

First, ub_data holds the funding amounts awarded to Upward Bound (UB) programs in 2022.

One column (the one that begins with FY.2022.Funding…) stores currency values as characters, so it cannot be summed until it is converted to numeric. That is why I used the parse_number() function to transform it into a numeric variable.
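To illustrate why the conversion is needed, here is a small sketch using made-up currency strings, assuming the funding column is formatted like "$1,234,567" (the values below are hypothetical, not taken from ubfinal2022.csv). parse_number() from readr drops the dollar sign and the thousands separators, after which the values can be summed.

```r
library(readr)

# Hypothetical currency strings in the assumed format of the funding column
funding_chr <- c("$1,234,567", "$250,000", "$410,500")

# sum(funding_chr) would fail: sum() does not accept a character vector
funding_num <- parse_number(funding_chr)

funding_num       # 1234567 250000 410500
sum(funding_num)  # 1895067
```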

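library(tidyverse) # dplyr, tidyr (drop_na), readr (parse_number), ggplot2
library(ggrepel)   # geom_text_repel()
ub_data <- read.csv("~/Desktop/edu_opp_proj/ubfinal2022.csv")
opportunity_index <- read.csv("~/Desktop/edu_opp_proj/opp_index.csv")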

summary <- ub_data %>%
  drop_na() %>% # drop any row with a missing value (e.g., stray blank rows)
  mutate(Funding2022 = parse_number(FY.2022.Funding..2022.23.Project.Yr.)) %>%
  rename(INITALS = State) %>% # match the state-abbreviation key used in opportunity_index
  group_by(INITALS) %>%
  summarise(TotalPrograms = n_distinct(PR.Award.Number),
            TotalServed = sum(Number.of.Participants),
            AvgServed = mean(Number.of.Participants),
            TotalFunded = sum(Funding2022),
            AvgFunded = mean(Funding2022),
            AvgPerStudent = TotalFunded / TotalServed) # total funding / total participants
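Note that AvgPerStudent divides a state's total funding by its total participants, so larger programs carry more weight; it is not the average of each program's per-student cost. The sketch below uses a made-up toy table (the award numbers and amounts are hypothetical) and adds a hypothetical MeanOfRatios column purely for comparison.

```r
library(dplyr)

# Hypothetical programs: two in "AL", one in "AK" (all values are made up)
programs <- data.frame(
  INITALS = c("AL", "AL", "AK"),
  PR.Award.Number = c("P001", "P002", "P003"),
  Number.of.Participants = c(50, 100, 60),
  Funding2022 = c(300000, 400000, 330000)
)

by_state <- programs %>%
  group_by(INITALS) %>%
  summarise(TotalServed = sum(Number.of.Participants),
            TotalFunded = sum(Funding2022),
            AvgPerStudent = TotalFunded / TotalServed,                 # ratio of totals (weighted)
            MeanOfRatios = mean(Funding2022 / Number.of.Participants)) # unweighted, for comparison

by_state
# For AL: AvgPerStudent = 700000 / 150 (about 4666.67),
# while MeanOfRatios = mean(c(6000, 4000)) = 5000
```

Which measure is more appropriate depends on the question; the ratio of totals answers "how many dollars per participant does the state receive overall".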

COMBINING THE DATA

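Next, we combine the Opportunity Index data with the state-level UB funding summary so the two can be analyzed together. merge() joins the tables on the shared INITALS column and, by default, keeps only the states that appear in both.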
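Because the default merge() is an inner join, a state abbreviation that appears in only one table is dropped without any message. A minimal sketch of how to check for this, using hypothetical stand-ins for summary and opportunity_index (the Score column and all values are made up):

```r
# Hypothetical stand-ins for summary and opportunity_index (made-up values)
funding_summary <- data.frame(INITALS = c("AK", "AL", "PR"),
                              TotalFunded = c(330000, 700000, 500000))
opp_index <- data.frame(INITALS = c("AK", "AL", "AZ"),
                        Score = c(50.1, 45.3, 52.0))

# Default merge() is an inner join: only AK and AL are kept
inner <- merge(funding_summary, opp_index, by = "INITALS")

# Keys dropped from each side by the inner join
setdiff(funding_summary$INITALS, opp_index$INITALS)  # "PR"
setdiff(opp_index$INITALS, funding_summary$INITALS)  # "AZ"

# all.x = TRUE keeps every funded state, filling Score with NA where unmatched
left <- merge(funding_summary, opp_index, by = "INITALS", all.x = TRUE)
```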

all_data <- merge(summary, opportunity_index, by = "INITALS")

# Participants served vs. total funding per state, with a linear fit and state labels
graph <- ggplot(all_data, aes(TotalServed, TotalFunded)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  geom_text_repel(aes(label = INITALS))

graph
## Warning: ggrepel: 46 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
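The warning means ggrepel hid 46 state labels because each would overlap more other labels or points than its max.overlaps limit allows (10 by default). Following the warning's suggestion, the sketch below sets max.overlaps = Inf so every label is drawn; it uses a small made-up data frame standing in for all_data, and the smaller text size is an assumption to reduce clutter, not a setting from the original analysis.

```r
library(ggplot2)
library(ggrepel)

# Hypothetical state-level data standing in for all_data (values are made up)
toy <- data.frame(
  INITALS = c("AK", "AL", "AZ", "CA", "CO"),
  TotalServed = c(300, 1200, 1500, 9000, 1100),
  TotalFunded = c(1.6e6, 5.9e6, 7.2e6, 4.1e7, 5.5e6)
)

# max.overlaps = Inf: never drop a label for overlapping too many things.
# With about 50 states the plot can get crowded, so a smaller size may help.
graph_all_labels <- ggplot(toy, aes(TotalServed, TotalFunded)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  geom_text_repel(aes(label = INITALS), max.overlaps = Inf, size = 3)

graph_all_labels
```

Showing every label trades completeness for readability; if the chart becomes unreadable, a moderate value (e.g., 20) is a middle ground.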