First, ub_data holds the funding amounts awarded to Upward
Bound programs in 2022.
One column (the one that begins with FY.2022.Funding…) stores
currency values as character strings, so it cannot be summed
until it is converted to numeric. That is why I used the
parse_number() function to transform it into a numeric
variable.
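As a small illustration of that conversion, here is a sketch using made-up currency strings (these values are not taken from the Upward Bound file). It assumes the readr package is installed and that the amounts are formatted with a dollar sign and comma separators.

```r
library(readr)

# Hypothetical currency strings, formatted the way the funding column is assumed to be
raw_amounts <- c("$1,234.50", "$287,537")

# Character values cannot be summed directly; sum(raw_amounts) would raise an error.
# parse_number() drops the "$" and "," and returns a numeric vector.
amounts <- parse_number(raw_amounts)

class(amounts)
sum(amounts)
```

Once the column is numeric, functions such as sum() and mean() work on it as usual.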
library(tidyverse) # supplies %>%, drop_na(), parse_number(), and ggplot()
library(ggrepel)   # supplies geom_text_repel()

ub_data <- read.csv("~/Desktop/edu_opp_proj/ubfinal2022.csv")
opportunity_index <- read.csv("~/Desktop/edu_opp_proj/opp_index.csv")

summary <- ub_data %>%
  drop_na() %>% # removing random blank rows
  mutate(Funding2022 = parse_number(FY.2022.Funding..2022.23.Project.Yr.)) %>%
  group_by(State) %>%
  rename(INITALS = State) %>% # rename(NewColumnName = OldColumnName)
  summarise(TotalPrograms = n_distinct(PR.Award.Number),
            TotalServed = sum(Number.of.Participants),
            AvgServed = mean(Number.of.Participants),
            TotalFunded = sum(Funding2022),
            AvgFunded = mean(Funding2022),
            AvgPerStudent = TotalFunded / TotalServed)
Next, we combine the Upward Bound summary with the Opportunity Index data, matching rows on the INITALS column, so the two can be analyzed together.
all_data <- merge(summary, opportunity_index, by = "INITALS")
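
One thing worth checking after a merge like this: base R's merge() performs an inner join by default, so any key that appears in only one table is dropped without a message. The sketch below uses two toy data frames (the values and the Score column are made up for illustration) to show one way to spot unmatched keys and, if wanted, keep them with all.x = TRUE.

```r
# Toy stand-ins for the two tables; all values here are hypothetical
left  <- data.frame(INITALS = c("AL", "AK", "PR"),
                    TotalFunded = c(100, 200, 300))
right <- data.frame(INITALS = c("AL", "AK", "AZ"),
                    Score = c(45.1, 52.3, 48.0))

# Default merge keeps only keys present in both tables
inner <- merge(left, right, by = "INITALS")

# Keys in the left table that have no match in the right table
unmatched <- setdiff(left$INITALS, right$INITALS)

# all.x = TRUE keeps every left-hand row and fills missing columns with NA
kept_all <- merge(left, right, by = "INITALS", all.x = TRUE)

nrow(inner)
unmatched
nrow(kept_all)
```

Whether dropping unmatched rows is the right choice depends on the analysis; the point is only to make the drop visible.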
graph <- ggplot(all_data, aes(TotalServed, TotalFunded)) +
  geom_point() +
  stat_smooth(method = "lm",
              formula = y ~ x,
              geom = "smooth") +
  geom_text_repel(aes(label = INITALS))
graph
## Warning: ggrepel: 46 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
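
The warning means ggrepel left most points unlabeled because their labels would overlap too much. A possible way to label more points is to raise the max.overlaps argument of geom_text_repel(), as the message suggests. The sketch below uses a small toy data frame in place of all_data (the values are made up), and setting max.overlaps = Inf is only one option; on a crowded plot it can make the labels hard to read.

```r
library(ggplot2)
library(ggrepel)

# Toy data standing in for all_data; values are hypothetical
toy <- data.frame(
  INITALS = c("AL", "AK", "AZ", "AR"),
  TotalServed = c(1200, 150, 900, 640),
  TotalFunded = c(6.1e6, 0.8e6, 4.7e6, 3.2e6)
)

graph_all_labels <- ggplot(toy, aes(TotalServed, TotalFunded)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  # max.overlaps = Inf tells ggrepel never to drop a label for overlapping
  geom_text_repel(aes(label = INITALS), max.overlaps = Inf)

graph_all_labels
```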