The College Scorecard

The College Scorecard is a dataset collected and published by the U.S. Department of Education. Each row is a post-secondary educational institution, and each column describes something about that institution.

Summary Statistics

Here is a collection of summary statistics I find interesting about this data.

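# Maxima across all institutions; na.rm = TRUE skips missing values.
# COSTT4_A is the average annual cost of attendance, UGDS the undergraduate
# enrollment (coerced to numeric here), and ADM_RATE the admission rate.
scorecard %>% 
  summarise(`Most expensive tuition` = max(COSTT4_A, na.rm = TRUE),
            `Most students` = max(as.numeric(UGDS), na.rm = TRUE),
            `Highest acceptance rate` = max(ADM_RATE, na.rm = TRUE))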
## # A tibble: 1 × 3
##   `Most expensive tuition` `Most students` `Highest acceptance rate`
##                      <dbl>           <dbl>                     <dbl>
## 1                    93704           77269                         1

One institution enrolls 77,269 undergraduates, about the population of a medium-sized city, and I imagine it is not the same institution whose cost of attendance tops $93K every year.
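
One way to check that guess is to pull the rows that hold each maximum. The sketch below assumes the scorecard data also carries the institution-name column `INSTNM` (as in the public College Scorecard files); `top_institution()` is a hypothetical helper written for illustration, not part of any package.

```r
library(dplyr)

# Hypothetical helper: return the institution(s) with the largest value of `column`.
top_institution <- function(data, column) {
  data %>%
    mutate(value = as.numeric({{ column }})) %>%  # coerce in case the column was read as text
    filter(!is.na(value)) %>%                     # drop missing or non-numeric entries
    slice_max(value, n = 1) %>%                   # keeps ties, so more than one row is possible
    select(INSTNM, value)
}

# Usage with the data above (assumes INSTNM exists in scorecard):
# top_institution(scorecard, COSTT4_A)
# top_institution(scorecard, UGDS)
```

If the two calls return different names, the largest school and the most expensive school are indeed different institutions.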

A Hypothesis

How is the share of first-generation students at an institution correlated with the average family income of its students?

# Scatter plot with one point per institution
scorecard %>%
  ggplot(aes(x = FIRST_GEN, y = FAMINC)) +
  geom_point()
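
The scatter plot can be paired with a single number. The sketch below computes the Pearson correlation between the two columns, coercing them to numeric first on the assumption that, like `UGDS`, they may have been read as text; `first_gen_income_cor()` is a hypothetical helper name.

```r
library(dplyr)

# Hypothetical helper: Pearson correlation between FIRST_GEN and FAMINC,
# using only the rows where both values are present.
first_gen_income_cor <- function(data) {
  data %>%
    mutate(FIRST_GEN = as.numeric(FIRST_GEN),
           FAMINC = as.numeric(FAMINC)) %>%
    summarise(correlation = cor(FIRST_GEN, FAMINC, use = "complete.obs")) %>%
    pull(correlation)
}

# Usage with the data above:
# first_gen_income_cor(scorecard)
```

A value near -1 would mean that institutions with more first-generation students tend to have lower average family incomes; a value near 0 would mean little linear relationship.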

# Compare the average cost of attendance for institutions in Ohio and Michigan
scorecard %>% 
  filter(STABBR %in% c("OH", "MI")) %>% 
  group_by(STABBR) %>% 
  summarise(`Average Cost` = mean(COSTT4_A, na.rm = TRUE)) %>% 
  ggplot(aes(x = STABBR, y = `Average Cost`)) +
  geom_col()