#The College Scorecard#

The College Scorecard is a data set collected and published by the U.S. Department of Education. Each row is a postsecondary institution, and each column describes something about that institution.

#Summary Statistics#

Here is a collection of summary statistics I find interesting about this data.

scorecard %>% 
  summarise(`Most Expensive Tuition` = max(COSTT4_A, na.rm = TRUE),
            `Most Students` = max(as.numeric(UGDS), na.rm = TRUE),
            `Highest Acceptance Rate` = max(ADM_RATE, na.rm = TRUE))
## # A tibble: 1 × 3
##   `Most Expensive Tuition` `Most Students` `Highest Acceptance Rate`
##                      <dbl>           <dbl>                     <dbl>
## 1                    93704           77269                         1

The largest institution enrolls 77,269 students, about the population of a medium-sized city, and I imagine it is not the same institution that charges over $93k every year.
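
One way to check that guess is to pull out the rows behind the two maxima. The sketch below is only illustrative: `largest_and_priciest` is a made-up helper name, and it assumes the data still carries the Scorecard's `INSTNM` (institution name) column alongside `UGDS` and `COSTT4_A`.

```r
library(dplyr)

# Hypothetical helper: keeps the rows that hold the largest enrollment or the
# highest cost, so the two institutions can be compared by name.
# Assumes the columns INSTNM, UGDS, and COSTT4_A are present in `data`.
largest_and_priciest <- function(data) {
  data %>%
    # UGDS is converted to numeric here, as in the summary above
    mutate(UGDS = as.numeric(UGDS)) %>%
    # filter() drops rows where the condition is NA, so missing values fall out
    filter(UGDS == max(UGDS, na.rm = TRUE) |
             COSTT4_A == max(COSTT4_A, na.rm = TRUE)) %>%
    select(INSTNM, UGDS, COSTT4_A)
}

# Usage: largest_and_priciest(scorecard)
```

If the result has two rows, the largest school and the most expensive school are different institutions; a single row would mean they are the same one.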

#Hypothesis#

Do students who come from higher-income families tend to attend more expensive schools?

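scorecard %>%
  select(FAMINC, COSTT4_A) %>%
  # Convert FAMINC to numeric first so values that fail to parse become NA
  mutate(FAMINC = as.numeric(FAMINC)) %>%
  # Drop rows where either variable is missing
  na.omit() %>%
  ggplot(aes(FAMINC, COSTT4_A)) +
  geom_point() +
  geom_smooth(method = "lm")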

As seen in the scatter plot, the cost of attending an institution tends to rise as family income rises. This indicates a positive correlation between household income and cost of attendance.
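
To put a number on that relationship, the correlation coefficient can be computed directly. This is a minimal sketch rather than part of the original analysis: `income_cost_cor` is a hypothetical helper name, and it assumes a plain Pearson correlation over rows where both `FAMINC` and `COSTT4_A` are available is a reasonable summary.

```r
# Hypothetical helper: Pearson correlation between family income and cost,
# computed only on rows where both values are present.
income_cost_cor <- function(data) {
  # as.numeric() turns entries that are not numbers into NA; the coercion
  # warning is silenced because those rows are excluded below anyway
  income <- suppressWarnings(as.numeric(data$FAMINC))
  cor(income, data$COSTT4_A, use = "complete.obs")
}

# Usage: income_cost_cor(scorecard)
```

A value near 1 would indicate a strong positive linear relationship, a value near 0 a weak one. Either way, a correlation alone does not show that family income causes students to choose costlier schools.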