The College Scorecard

The College Scorecard is a dataset collected and published by the U.S. Department of Education. Each row represents a post-secondary educational institution, and each column describes an attribute of that institution.

library(tidyverse)  # provides read_csv(), %>%, summarise(), and ggplot()
scorecard <- read_csv("http://asayanalytics.com/scorecard_csv")
## Rows: 7115 Columns: 29
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): INSTNM, CITY, STABBR, CONTROL, HBCU, UGDS, FAMINC
## dbl (22): ID, ZIP, LOCALE, LATITUDE, LONGITUDE, MENONLY, WOMENONLY, ADM_RATE...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Summary Statistics

Here is a collection of summary statistics I find interesting about this data.

scorecard %>%
  summarise(`Highest cost of attendance` = max(COSTT4_A, na.rm = TRUE),
            `Most students` = max(as.numeric(UGDS), na.rm = TRUE),
            `Highest acceptance rate` = max(ADM_RATE, na.rm = TRUE))
## # A tibble: 1 × 3
##   `Highest cost of attendance` `Most students` `Highest acceptance rate`
##                          <dbl>           <dbl>                     <dbl>
## 1                        93704           77269                         1

One institution enrolls 77,269 students, roughly the population of a medium-sized city, and I imagine it is not the same institution whose cost of attendance exceeds $93k per year.
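One way to check that hunch is to pull the rows that hold each maximum and compare the institution names. The following is a minimal sketch, assuming dplyr is installed. The helper name `extreme_rows` and the toy data frame are hypothetical and made up for illustration; only the column names (INSTNM, COSTT4_A, UGDS) come from the scorecard data.

```r
library(dplyr)

# Hypothetical helper: keep the rows holding the largest cost of attendance
# or the largest enrollment, so the institution names can be compared.
extreme_rows <- function(df) {
  df %>%
    # UGDS is read in as character, so convert it first;
    # entries that are not numbers become NA.
    mutate(UGDS = suppressWarnings(as.numeric(UGDS))) %>%
    filter(COSTT4_A == max(COSTT4_A, na.rm = TRUE) |
             UGDS == max(UGDS, na.rm = TRUE)) %>%
    select(INSTNM, COSTT4_A, UGDS)
}

# Toy data, invented purely so the sketch runs on its own.
toy <- data.frame(
  INSTNM   = c("School A", "School B", "School C"),
  COSTT4_A = c(10000, 50000, NA),
  UGDS     = c("300", "unknown", "9000"),
  stringsAsFactors = FALSE
)

res <- extreme_rows(toy)
print(res)
```

Calling `extreme_rows(scorecard)` on the real data would list the institution or institutions behind the two maxima. Two different names in the result would support the guess above.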

A hypothesis

Here is a visualization of the relationship between average faculty salary and cost of attendance.

scorecard %>%
  ggplot(aes(x=AVGFACSAL, y=COSTT4_A))+
  geom_point()+
  labs(title = "Relationship Between Average Faculty Salary and Cost of Attendance")

Average faculty salary appears to rise as cost of attendance rises, which suggests a positive association between the two.
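To put a number on the pattern in the plot, a Pearson correlation is one option. The following is a minimal sketch in base R. `salary_cost_cor` is a hypothetical helper, and the toy values are made up so the block runs on its own; only the column names AVGFACSAL and COSTT4_A come from the scorecard data.

```r
# Hypothetical helper: Pearson correlation between average faculty salary
# and cost of attendance, using only rows where both values are present.
salary_cost_cor <- function(df) {
  cor(df$AVGFACSAL, df$COSTT4_A, use = "complete.obs")
}

# Toy data, invented purely for illustration.
toy <- data.frame(
  AVGFACSAL = c(4000, 6000, 8000, NA, 10000),
  COSTT4_A  = c(20000, 30000, 40000, 50000, NA)
)

r <- salary_cost_cor(toy)
print(r)
```

On the real data the call would be `salary_cost_cor(scorecard)`. A value near 1 would indicate a strong positive linear association, a value near 0 little linear association.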