The College Scorecard is a data set collected and published by the U.S. Department of Education. Each row is a postsecondary educational institution, and each column describes something about that institution.
Here is a collection of summary statistics I find interesting about this data.
scorecard %>%
  summarise(`Most expensive tuition` = max(COSTT4_A, na.rm = TRUE),
            `Most students` = max(as.numeric(UGDS), na.rm = TRUE),
            `Highest acceptance rate` = max(ADM_RATE, na.rm = TRUE))
## # A tibble: 1 × 3
##   `Most expensive tuition` `Most students` `Highest acceptance rate`
##                      <dbl>           <dbl>                     <dbl>
## 1                    93704           77269                         1
One institution enrolls as many students as the population of a medium-sized city, and I imagine that is not the same institution charging over $93k per year to attend.
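One way to check that guess is to pull the rows that hold each maximum and compare them. This is only a minimal sketch: it assumes the data has an institution-name column (called `INSTNM` here, which does not appear in the code above), and it uses a tiny made-up `toy_scorecard` tibble in place of the real data so the block runs on its own.

```r
library(dplyr)

# Toy stand-in for the real scorecard data; school names and values are made up.
toy_scorecard <- tibble(
  INSTNM   = c("School A", "School B", "School C"),
  COSTT4_A = c(90000, 21000, NA),
  UGDS     = c("1500", "70000", "300")
)

# Row with the highest cost; the missing cost sorts last, so it is not selected here.
priciest <- toy_scorecard %>%
  slice_max(COSTT4_A, n = 1)

# Row with the most students; UGDS is converted to numeric first, as in the summary above.
largest <- toy_scorecard %>%
  mutate(UGDS = as.numeric(UGDS)) %>%
  slice_max(UGDS, n = 1)

# TRUE only if a single institution holds both maxima.
same_school <- identical(priciest$INSTNM, largest$INSTNM)
```

Swapping `toy_scorecard` for `scorecard` would answer the question for the real data, provided the name column exists under that name.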
Do Ohio schools have a higher cost to attend than California schools?
scorecard %>%
  filter(STABBR %in% c("OH", "CA")) %>%
  group_by(STABBR) %>%
  summarise(`Average Cost` = mean(COSTT4_A, na.rm = TRUE)) %>%
  ggplot(aes(x = STABBR, y = `Average Cost`)) +
  geom_col()
It appears that California schools have a higher average cost than Ohio schools. This could be due to the higher cost of living in California.
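A bar chart of two means hides how many schools feed each average and whether a few expensive schools pull it up. As a rough sketch, the same grouped summary can also report the count of schools with a known cost and the median next to the mean. The `toy_scorecard` tibble and its values are made up so the block runs on its own, and the output column names are my own choice, not part of the data.

```r
library(dplyr)

# Toy stand-in for the real scorecard data; all values are made up.
toy_scorecard <- tibble(
  STABBR   = c("OH", "OH", "CA", "CA", "CA", "TX"),
  COSTT4_A = c(20000, 30000, 25000, 45000, NA, 50000)
)

state_costs <- toy_scorecard %>%
  filter(STABBR %in% c("OH", "CA")) %>%
  group_by(STABBR) %>%
  summarise(
    schools_with_cost = sum(!is.na(COSTT4_A)),            # rows that contribute to the mean
    mean_cost         = mean(COSTT4_A, na.rm = TRUE),
    median_cost       = median(COSTT4_A, na.rm = TRUE)    # less sensitive to a few extreme schools
  )
```

If the mean and median disagree strongly for one state, the gap in the chart may be driven by a handful of institutions rather than by schools in that state generally.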