12/15/2025

College Accessibility

The college admissions dataset lets us explore how income relates to college accessibility across institutional characteristics, using variables such as:

  • income percentile
  • income group
  • attend_rate
  • attend_rate_sat
  • apply_rate
  • apply_rate_sat
  • rel_apply
  • rel_attend

Thesis

In this study, we examine whether patterns of college selectivity vary systematically by institutional type, tier, specific institution, and the socioeconomic status of the student. We explore whether, and to what extent, these characteristics of an institution or applicant affect access to college.

Exploring and Cleaning Data

1st- Explored variables in our data set and metadata

2nd- Cleaned raw data set

3rd- Explored relationships between college characteristics

Cleaning Process 1

library(dplyr)

# Keep only the variables used in this analysis
colleges_1 <- colleges |>
  select(name, par_income_bin, attend, attend_sat, rel_apply,
         rel_attend, rel_att_cond_app, rel_apply_sat,
         rel_attend_sat, attend_unwgt, public, tier)

Cleaning Process 2

colleges_clean1 <- colleges_1 |>
  # Recode the public flag as a readable label
  mutate(public = if_else(public == TRUE, "public", "private")) |>
  # Rename columns to more descriptive names
  rename(type = "public",
         income_percentile = "par_income_bin",
         attend_rate = "attend",
         attend_raw = "attend_unwgt",
         rel_attend_apply_ratio = "rel_att_cond_app") |>
  # Convert categorical variables to factors
  mutate(name = factor(name),
         type = factor(type),
         tier = factor(tier)) |>
  # Place the institution type right after the college name
  relocate(type, .after = name) |>
  # Bucket income percentiles into five labeled groups
  mutate(income_lable = case_when(
    income_percentile <= 20 ~ "lowest",
    income_percentile <= 40 ~ "low",
    income_percentile <= 60 ~ "medium",
    income_percentile <= 80 ~ "high",
    TRUE ~ "highest"))
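
A quick way to sanity-check the income grouping is to run the same case_when() rules on a few made-up rows. The sketch below is only illustrative: the toy data frame and its values are hypothetical, not taken from the dataset. It also shows two optional tweaks that are assumptions on our part rather than part of the cleaning step above: keeping a missing percentile as NA (with a bare TRUE fallback it would be labeled "highest"), and storing the label as an ordered factor so tables and plots sort from lowest to highest.

```r
library(dplyr)

# Hypothetical toy rows; illustrative values only, not from the real dataset
toy <- data.frame(income_percentile = c(10, 30, 50, 70, 95, NA))

income_levels <- c("lowest", "low", "medium", "high", "highest")

toy_labeled <- toy |>
  mutate(
    income_lable = case_when(
      is.na(income_percentile) ~ NA_character_,  # keep missing values missing
      income_percentile <= 20 ~ "lowest",
      income_percentile <= 40 ~ "low",
      income_percentile <= 60 ~ "medium",
      income_percentile <= 80 ~ "high",
      TRUE ~ "highest"
    ),
    # Ordered factor so groups sort lowest-to-highest instead of alphabetically
    income_lable = factor(income_lable, levels = income_levels, ordered = TRUE)
  )

# One row per income group, with the number of toy rows in each
count(toy_labeled, income_lable)
```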

Public vs Private Selectivity

Application Rate by Tier

Attendance Rate by Tier

Attendance Rate by Ivy League College

Apply to Attendance Rate by Income Group Plot

Count by Income Group

Average College Attendance Rate by Income Group
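
The counts and averages behind the two income-group summaries could be computed along these lines. This is a minimal sketch, not necessarily the code used for the plots: toy_clean is a hypothetical stand-in for colleges_clean1 with made-up values, and the summary column names n_rows and mean_attend_rate are our own.

```r
library(dplyr)

# Hypothetical stand-in for colleges_clean1; values are made up for illustration
toy_clean <- data.frame(
  income_lable = c("lowest", "lowest", "high", "high", "highest"),
  attend_rate  = c(0.001, 0.003, 0.004, 0.006, 0.010)
)

avg_attend <- toy_clean |>
  group_by(income_lable) |>
  summarise(
    n_rows = n(),                                        # rows per income group
    mean_attend_rate = mean(attend_rate, na.rm = TRUE),  # average attendance rate
    .groups = "drop"
  )

avg_attend
```

Reporting the row count next to the mean makes it easier to see when a group's average rests on very few observations.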

Conclusion

Based on our visualizations and analysis, it is reasonable to conclude that socioeconomic status is related to college accessibility. Across all institutions in our study, students from higher-income backgrounds not only had higher application and attendance rates, but were also represented by far more data. Additionally, when we examined the relationship between income percentile and attendance rate, we saw a positive correlation, which may help explain why there is so much more data on students from higher socioeconomic backgrounds.
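
One way the relationship between income percentile and attendance rate might be quantified is with a rank correlation. The sketch below uses base R's cor() on hypothetical toy values. The choice of Spearman's method is an assumption on our part (it only requires a monotonic trend, not a linear one), and the numbers are not results from the study.

```r
# Hypothetical toy values; illustrative only, not results from the dataset
toy <- data.frame(
  income_percentile = c(10, 30, 50, 70, 90, 99),
  attend_rate       = c(0.001, 0.002, 0.002, 0.004, 0.007, 0.015)
)

# Spearman rank correlation, computed on complete (non-missing) pairs only
rho <- cor(toy$income_percentile, toy$attend_rate,
           method = "spearman", use = "complete.obs")

rho  # a value above 0 indicates attendance tends to rise with income percentile
```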