The college admissions dataset explores how income relates to college accessibility across institutional characteristics, using variables such as:
- income percentile
- income group
- attend_rate
- attend_rate_sat
- apply_rate
- apply_rate_sat
- rel_apply
- rel_attend
12/15/2025
In this study, we examine whether patterns of college selectivity vary systematically by institutional type, tier, specific institution, and the socioeconomic status of the student. The study explores whether, and to what extent, these characteristics of an institution or applicant affect access to college.
1st- Explored the variables in our dataset and metadata
2nd- Cleaned the raw dataset
3rd- Explored the relationship between college characteristics
library(dplyr)

# Select the variables to look at
colleges_1 <- colleges |>
  select(name, par_income_bin, attend, attend_sat, rel_apply,
         rel_attend, rel_att_cond_app, rel_apply_sat,
         rel_attend_sat,
         attend_unwgt, public, tier)

colleges_clean1 <- colleges_1 |>
  # Recode the public flag as a readable label
  mutate(public = if_else(public == TRUE, "public", "private")) |>
  # Rename columns
  rename(type = "public",
         income_percentile = "par_income_bin",
         attend_rate = "attend",
         attend_raw = "attend_unwgt",
         rel_attend_apply_ratio = "rel_att_cond_app") |>
  # Convert categorical variables to factors
  mutate(name = factor(name),
         type = factor(type),
         tier = factor(tier)) |>
  # Move type to directly after the college name
  relocate(type, .after = name) |>
  # Bin income percentile into five labeled groups
  mutate(income_lable = case_when(
    income_percentile <= 20 ~ "lowest",
    income_percentile <= 40 ~ "low",
    income_percentile <= 60 ~ "medium",
    income_percentile <= 80 ~ "high",
    TRUE ~ "highest"))
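As a rough illustration of how the cleaned columns might feed the third step, the sketch below averages attend_rate within each combination of type and income_lable. The toy tibble and its values are invented for illustration only and are not taken from the study's data; summary_by_group and mean_attend_rate are hypothetical names, not objects from our analysis.

```r
library(dplyr)

# Toy data invented for illustration only; these are NOT values from the study.
toy <- tibble(
  name = c("College A", "College A", "College B", "College B"),
  type = c("private", "private", "public", "public"),
  income_percentile = c(10, 99, 10, 99),
  attend_rate = c(0.002, 0.020, 0.004, 0.008)
)

# Apply the same income binning as the cleaning step, then average the
# attendance rate within each institution type and income group.
summary_by_group <- toy |>
  mutate(income_lable = case_when(
    income_percentile <= 20 ~ "lowest",
    income_percentile <= 40 ~ "low",
    income_percentile <= 60 ~ "medium",
    income_percentile <= 80 ~ "high",
    TRUE ~ "highest")) |>
  group_by(type, income_lable) |>
  summarise(mean_attend_rate = mean(attend_rate), .groups = "drop")

print(summary_by_group)
```

With the real data, colleges_clean1 would take the place of toy, and the mutate() call could be dropped because income_lable already exists there.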
Based on the visualizations and analysis of our data, it is reasonable to conclude that socioeconomic status is related to college accessibility. Across all institutions in our study, students from higher-income backgrounds not only attended and applied at higher rates, but were also represented by much more data. Additionally, when we examined the relationship between income percentile and attendance rate, we saw a positive correlation, which may help explain why there is so much more data on students from higher socioeconomic backgrounds.
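One way the relationship between income percentile and attendance rate could be checked numerically is a rank correlation. The sketch below is an assumption about method, not necessarily the calculation behind our conclusion, and the numbers are made up for illustration; toy_income, toy_attend, and rho are hypothetical names.

```r
# Toy values invented for illustration only; these are NOT results from the study.
toy_income <- c(10, 30, 50, 70, 90, 99)                   # income percentile
toy_attend <- c(0.001, 0.002, 0.004, 0.005, 0.009, 0.020) # attendance rate

# Spearman's rank correlation measures whether attendance rises monotonically
# with income, without assuming the relationship is linear.
rho <- cor(toy_income, toy_attend, method = "spearman")
print(rho)
```

A value of rho near 1 would indicate that attendance rate increases consistently with income percentile; with the real data, the income_percentile and attend_rate columns of colleges_clean1 would be passed in instead.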