#The College Scorecard#
The College Scorecard is a data set collected and published by the U.S. Department of Education. Each row is a postsecondary institution, and each column describes an attribute of that institution.
#Summary Statistics#
Here is a collection of summary statistics I find interesting in this data.
library(tidyverse)
scorecard %>%
  summarise(`Highest Cost of Attendance` = max(COSTT4_A, na.rm = TRUE),   # COSTT4_A: average annual cost of attendance
            `Most Undergraduates` = max(as.numeric(UGDS), na.rm = TRUE),  # UGDS: undergraduate enrollment
            `Highest Acceptance Rate` = max(ADM_RATE, na.rm = TRUE))
## # A tibble: 1 × 3
##   `Highest Cost of Attendance` `Most Undergraduates` `Highest Acceptance Rate`
##                          <dbl>                 <dbl>                     <dbl>
## 1                        93704                 77269                         1
One institution enrolls 77,269 undergraduates, roughly the population of a medium-sized city, and I suspect it is not the same institution that costs over $93k a year to attend. The highest acceptance rate is 1, meaning at least one institution admits every applicant.
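To check that guess, a small helper can return the institution that holds each maximum. This is a minimal sketch assuming `dplyr` 1.0 or later (for `slice_max()`) and that the data set includes the `INSTNM` (institution name) column; `institution_with_max()` is a hypothetical helper, not part of any package.

```r
library(dplyr)

# Hypothetical helper: return the institution(s) with the largest value of `col`.
# The column is coerced to numeric because some Scorecard columns (e.g. UGDS)
# may be read as text; non-numeric entries become NA and are dropped.
institution_with_max <- function(data, col) {
  data %>%
    mutate(value = as.numeric({{ col }})) %>%
    filter(!is.na(value)) %>%
    slice_max(value, n = 1, with_ties = TRUE) %>%  # keep every tied institution
    select(INSTNM, value)
}
```

Calling `institution_with_max(scorecard, UGDS)` and `institution_with_max(scorecard, COSTT4_A)` shows whether the two maxima belong to different institutions. Keeping ties matters for `ADM_RATE`, where several schools may sit at exactly 1.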
#Hypothesis#
Do students from higher-income families tend to attend more expensive schools?
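scorecard %>%
  select(FAMINC, COSTT4_A) %>%
  mutate(FAMINC = as.numeric(FAMINC)) %>%   # FAMINC may be read as text
  na.omit() %>%                             # drop rows missing either value
  ggplot(aes(FAMINC, COSTT4_A)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(x = "Average family income (FAMINC)",
       y = "Average cost of attendance (COSTT4_A)")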
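In the scatter plot, the fitted line slopes upward: institutions whose students come from higher-income families tend to have a higher average cost of attendance. This suggests a positive correlation between family income and cost of attendance. Because each point is an institution rather than a student, the pattern describes schools, and the plot alone does not show how strong the relationship is.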
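One way to put a number on the strength of this relationship is Pearson's correlation coefficient, computed here with base R's `cor.test()`. This is a sketch assuming both columns may be stored as text (so they are coerced to numeric); `income_cost_correlation()` is a hypothetical helper name.

```r
# Hypothetical helper: Pearson correlation between FAMINC and COSTT4_A.
# Non-numeric entries are coerced to NA, and incomplete rows are dropped
# before testing. cor.test() needs at least 3 complete rows.
income_cost_correlation <- function(data) {
  obs <- data.frame(
    faminc = suppressWarnings(as.numeric(data$FAMINC)),
    cost   = suppressWarnings(as.numeric(data$COSTT4_A))
  )
  obs <- obs[complete.cases(obs), ]
  cor.test(obs$faminc, obs$cost, method = "pearson")
}
```

`income_cost_correlation(scorecard)` returns an `htest` object: its `estimate` is r, and its `p.value` tests whether r differs from 0. Squaring r gives the share of the variation in cost that the straight line fitted in the plot accounts for.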