In recent decades, the cost structure of university tuition has changed substantially, with tuition generally on the rise. However, increases in tuition are not uniform from school to school, which suggests that studying the relationship between a university’s characteristics and its costs is important for understanding how those costs change. For example, according to the Education Data Initiative [1], private university tuition has risen by over 15,000 US dollars since 2010, compared to an increase of less than 3,000 US dollars for public undergraduate programs. We are approaching this project with the following question in mind: How do a school’s location, institution type, and length of program affect tuition and room & board fees over time? Understanding the factors that contribute to tuition costs is important, as the decision to attend college becomes increasingly economic due to rising prices. For example, according to the National Association of Student Financial Aid Administrators [2], the majority of Americans who did not go to college in 2023 cited the financial burden as their reason. By analyzing the affordability of undergraduate education, we provide insight into how the finances of these institutions have changed over time, potentially raising awareness of how high costs limit the accessibility of higher education.
This observational data was collected by the National Center for Education Statistics (NCES) and can be found in their annual Digest of Education Statistics. There are 3,548 total observations, with each row corresponding to one expense from one undergraduate institution, measured along 6 variables. The data, which was collected every year from 2013 to 2021, is a compilation of recorded expenses in American undergraduate institutions, divided by whether the expense is from a private, public in-state, or public out-of-state payment plan, and whether the expense arises from a 2-year or 4-year program. For each expense, the dataset records the type (room and board, or tuition and general fees) and the price in US dollars, which allows quantitative comparisons between states and types of institutions. There is data from a different institution in each of the 50 US states and Washington, D.C. Because there are relatively few variables, and each could reveal a different aspect of our research question, we plan to use all of the data.
We took several steps to adapt this data for our own research. The first thing we wanted to do was group states into broader geographical categories so we could create more readable visualizations and observe trends across geographical areas of the US rather than just individual states. First, we created the new variable “Division.” We used the function fct_collapse() to categorize states into the divisions New England, Mid Atlantic, East North Central, West North Central, South Atlantic, Mountain, Pacific, East South Central, and West South Central. Next, we created the variable “Region” and used fct_collapse() again to further sort the divisions into the regions Northeast, Midwest, South, and West. Our next step was using the function pivot_wider() to turn the two levels of the original variable “Expense,” which were “Fees/Tuition” and “Room/Board,” into their own variables, which we renamed “Fees/Tuition Price” and “Room/Board Price.” Finally, all 2-year colleges had NA values for their room and board price, since the dataset records no room and board expenses for them. We filled these NA values with 0 so that the total price (tuition and fees plus room and board) could be calculated for every row.
# Loading in original dataset
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.2.0 ✔ readr 2.2.0
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.2 ✔ tibble 3.3.1
## ✔ lubridate 1.9.5 ✔ tidyr 1.3.2
## ✔ purrr 1.2.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
LilyLizMiles_Data <- read_csv("~/Sds_164_S26/Project/Lily, Liz, Miles/Stage 2/nces330_20.csv")
## Rows: 3548 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): State, Type, Length, Expense
## dbl (2): Year, Value
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
LilyLizMiles_Data
# Creating region and division variables
LilyLizMiles_Data1 <- LilyLizMiles_Data |>
  mutate(
    Division = fct_collapse(
      State,
      "New England" = c(
        "Connecticut",
        "Maine",
        "Massachusetts",
        "New Hampshire",
        "Rhode Island",
        "Vermont"
      ),
      "Mid Atlantic" = c(
        "New Jersey",
        "New York",
        "Pennsylvania"
      ),
      "East North Central" = c(
        "Illinois",
        "Indiana",
        "Michigan",
        "Ohio",
        "Wisconsin"
      ),
      "West North Central" = c(
        "Iowa",
        "Kansas",
        "Minnesota",
        "Missouri",
        "Nebraska",
        "North Dakota",
        "South Dakota"
      ),
      "South Atlantic" = c(
        "Delaware",
        "District of Columbia",
        "Florida",
        "Georgia",
        "Maryland",
        "North Carolina",
        "South Carolina",
        "Virginia",
        "West Virginia"
      ),
      "Mountain" = c(
        "Arizona",
        "Colorado",
        "Idaho",
        "Montana",
        "Nevada",
        "New Mexico",
        "Utah",
        "Wyoming"
      ),
      "Pacific" = c(
        "Alaska",
        "California",
        "Hawaii",
        "Oregon",
        "Washington"
      ),
      "East South Central" = c(
        "Alabama",
        "Kentucky",
        "Mississippi",
        "Tennessee"
      ),
      "West South Central" = c(
        "Arkansas",
        "Louisiana",
        "Oklahoma",
        "Texas"
      )
    )
  ) |>
  relocate(Division, .after = State) |>
  mutate(
    Region = fct_collapse(
      Division,
      "Northeast" = c("New England", "Mid Atlantic"),
      "Midwest" = c("East North Central", "West North Central"),
      "South" = c("South Atlantic", "East South Central", "West South Central"),
      "West" = c("Mountain", "Pacific")
    )
  ) |>
  relocate(Region, .after = Division)
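One way to check a mapping like the one above is to look for levels that were never collapsed. The sketch below is not part of our pipeline; it uses a small hypothetical tibble (the object names `toy`, `toy_divisions`, `expected`, and `unmapped` are illustrative assumptions) and relies on the fact that fct_collapse() leaves unlisted levels unchanged.

```r
library(dplyr)
library(forcats)
library(tibble)

# Hypothetical toy data: three states we map and one we deliberately leave out
toy <- tibble(State = c("Maine", "Vermont", "New York", "Texas"))

toy_divisions <- toy |>
  mutate(
    Division = fct_collapse(
      State,
      "New England" = c("Maine", "Vermont"),
      "Mid Atlantic" = c("New York")
    )
  )

# fct_collapse() keeps unlisted levels as they are, so any state missing
# from the mapping shows up as its own "division"
expected <- c("New England", "Mid Atlantic")
unmapped <- setdiff(levels(toy_divisions$Division), expected)
unmapped
```

Applied to the full dataset, an empty result from the same comparison would suggest that every state was assigned to one of the nine divisions.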
# Pivoting columns for fees/tuition and room/board prices, renaming them, and creating a total price variable
LilyLizMiles_Data2 <- LilyLizMiles_Data1 |>
  pivot_wider(
    names_from = Expense,
    values_from = Value,
    values_fill = 0
  ) |>
  rename(
    "Fees/Tuition Price" = "Fees/Tuition",
    "Room/Board Price" = "Room/Board"
  ) |>
  mutate(
    "Total Price" = `Fees/Tuition Price` + `Room/Board Price`
  )
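To illustrate what the pivot and the zero fill do, here is a minimal sketch on made-up rows. The values and the names `toy_long` and `toy_wide` are hypothetical and only mimic the shape of the NCES data.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical long-format rows: the 2-year program has no Room/Board record
toy_long <- tribble(
  ~State,  ~Length,  ~Expense,       ~Value,
  "Maine", "4-year", "Fees/Tuition", 10000,
  "Maine", "4-year", "Room/Board",    9000,
  "Maine", "2-year", "Fees/Tuition",  3500
)

toy_wide <- toy_long |>
  # Each Expense level becomes a column; missing combinations are filled with 0
  pivot_wider(names_from = Expense, values_from = Value, values_fill = 0) |>
  mutate(`Total Price` = `Fees/Tuition` + `Room/Board`)

toy_wide
```

One design consideration: filling with 0 keeps the total price defined for 2-year rows, but those zeros would pull down any average of room and board taken across both program lengths, so summaries of that column alone are probably best restricted to 4-year rows.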
# 4-year Total Price Over Time by Type and Region Scatterplot
LilyLizMiles_Data2 |>
  filter(Length == "4-year") |>
  ggplot(aes(x = Year, y = `Total Price`, color = Region)) +
  geom_point(alpha = 0.5) +
  geom_smooth(se = FALSE) +
  scale_x_continuous(breaks = c(2014, 2016, 2018, 2020)) +
  facet_wrap(~ Type) +
  labs(
    title = "Total Price of 4-Year Colleges in US by Type and Region, 2013-2021",
    y = "Total Price (USD)"
  )
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Figure 1. In general, for 4-year colleges across all regions in the US, prices rose from 2013 to 2021. However, public in-state prices began to drop in 2019. Data was collected every year from 2013 to 2021, includes 1,372 4-year institutions, and is grouped by payment type (Private, Public In-State, and Public Out-of-State) and US region.
Figure 1 shows that, for 4-year colleges specifically, private and public out-of-state prices rose from 2013 to 2021 in all US regions. Public in-state prices rose from 2013 to 2019 in all regions, then began to drop in 2019 and continued to drop through 2021, which could offer a direction for future research. For all three institution/payment types, the Northeast region had noticeably higher average prices than all other regions. Private colleges in the West had consistently lower prices than private colleges in other regions from 2013 to 2021. All other regions stayed relatively similar to one another in terms of price for all payment types.
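The regional comparison read from Figure 1 could also be checked numerically with a grouped summary. The sketch below shows the idea on a few made-up rows; the prices are hypothetical placeholders, not NCES values, and the names `toy_prices` and `region_means` are illustrative.

```r
library(dplyr)
library(tibble)

# Hypothetical rows shaped like the wrangled data (values are invented)
toy_prices <- tribble(
  ~Region,     ~Type,     ~Length,  ~Year, ~`Total Price`,
  "Northeast", "Private", "4-year", 2013,  50000,
  "Northeast", "Private", "4-year", 2021,  60000,
  "West",      "Private", "4-year", 2013,  35000,
  "West",      "Private", "4-year", 2021,  41000,
  "West",      "Private", "2-year", 2021,  15000
)

region_means <- toy_prices |>
  filter(Length == "4-year") |>            # same filter as the Figure 1 code
  group_by(Type, Region) |>
  summarize(mean_total = mean(`Total Price`), .groups = "drop") |>
  arrange(desc(mean_total))                # most expensive group first

region_means
```

Running the same pipeline on the full wrangled dataset would give one mean per payment type and region, which could back up the visual ranking with numbers.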
# Fees/Tuition Price Over Time by Length and Region Scatterplot
LilyLizMiles_Data2 |>
  ggplot(aes(x = Year, y = `Fees/Tuition Price`, color = Length)) +
  geom_point(alpha = 0.3) +
  geom_smooth(se = FALSE) +
  scale_x_continuous(breaks = c(2014, 2016, 2018, 2020)) +
  facet_wrap(~ Region) +
  labs(
    title = "Fees/Tuition Prices of US Colleges/Universities By Length and Region",
    subtitle = "For all 2-year and 4-year colleges/universities recorded by the NCES between 2013-2021",
    y = "Fees/Tuition Price (USD)",
    caption = "Data: National Center for Education Statistics (NCES)"
  )
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Figure 2. These scatterplots suggest that, across all four major US geographical regions between 2013 and 2021, 4-year colleges/universities consistently had a higher fees/tuition price (USD) per year than 2-year colleges/universities. For 4-year institutions, these prices appear to increase slightly between 2013 and 2019, then decrease slightly starting in 2019. This observational data was collected by the National Center for Education Statistics (NCES) and can be found in their annual Digest of Education Statistics. The dataset includes 3,548 total observations of recorded expenses at American undergraduate institutions, collected every year from 2013 to 2021.
Figure 2 suggests that, across all four major US geographical regions between 2013 and 2021, 4-year colleges/universities consistently had a higher fees/tuition price (USD) per year than 2-year colleges/universities. On average, across the US, tuition and fees at 2-year institutions stayed under 10,000 USD, with very little variation between institutions. 4-year institutions, on the other hand, were almost double the price at almost 20,000 USD on average, a value that appeared to increase slightly between 2013 and 2019, then decrease slightly starting in 2019. Furthermore, these institutions varied much more than 2-year institutions, with fees and tuition prices ranging from just a couple thousand USD to above 40,000 USD. Overall, Figure 2 suggests that 2-year institutions offer a much more cost-effective choice for students seeking a higher education degree, and that even for students looking for a bachelor’s degree, attending a 2-year institution and then transferring to a 4-year institution could be a cost-effective choice.
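The averages and spread described above are read off the plot; a grouped summary is one way they could be computed directly. This is a minimal sketch on invented values (the tibble `toy_tuition` and the summary `length_summary` are hypothetical names, and the prices are not from the NCES data).

```r
library(dplyr)
library(tibble)

# Hypothetical fees/tuition prices for the two program lengths
toy_tuition <- tribble(
  ~Length,  ~`Fees/Tuition Price`,
  "2-year",  3000,
  "2-year",  5000,
  "4-year",  9000,
  "4-year", 31000
)

length_summary <- toy_tuition |>
  group_by(Length) |>
  summarize(
    mean_price = mean(`Fees/Tuition Price`),  # center
    sd_price   = sd(`Fees/Tuition Price`),    # spread
    min_price  = min(`Fees/Tuition Price`),
    max_price  = max(`Fees/Tuition Price`),
    .groups = "drop"
  )

length_summary
```

Comparing the standard deviation and range columns between the two rows would quantify the claim that 4-year prices vary more than 2-year prices.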
# Fees/Tuition Price by Division and Region Boxplot
LilyLizMiles_Data2 |>
  mutate(
    # Order divisions by their median fees/tuition price (fct_reorder() default)
    Division = fct_reorder(Division, `Fees/Tuition Price`)
  ) |>
  ggplot(aes(x = Division, y = `Fees/Tuition Price`, fill = Region)) +
  geom_boxplot() +
  scale_x_discrete(guide = guide_axis(angle = 45)) +
  labs(
    title = "Fees/Tuition Prices by US Division and Region",
    subtitle = "For all 2-year and 4-year colleges/universities recorded by the NCES between 2013-2021",
    y = "Fees/Tuition Price (USD)",
    caption = "Data: National Center for Education Statistics (NCES)"
  )
Figure 3. Between 2013 and 2021, colleges/universities in the Northeast region, particularly in New England, tended to be more expensive than institutions in other regions, particularly those in the West. This observational data was collected by the National Center for Education Statistics (NCES) and can be found in their annual Digest of Education Statistics. The dataset includes 3,548 total observations of recorded expenses at American undergraduate institutions, collected every year from 2013 to 2021.
Figure 3 offers several insights into the relationship between tuition prices, fees, and a school’s region. First, we see that schools in the Northeast region (teal) have visibly higher typical costs in both the Mid Atlantic and New England (NE) divisions, with NE holding a slight edge in typical fees and maximum observed costs. The variation between the West, Midwest, and South is much less notable, but we do see higher typical fees in the South Atlantic, East South Central, and East North Central divisions. Finally, schools in the Pacific, Mountain, West South Central, and West North Central divisions have the lowest typical costs, with boxplot medians under 10,000 USD. Some of these divisions show lower maximum tuition prices, though these values do not seem to be related to the region of the school.
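Because geom_boxplot() draws medians and fct_reorder() orders levels by the median by default, the "typical" costs in Figure 3 are medians rather than means. The sketch below, on invented prices (the names `toy_box`, `division_stats`, and `ordered_levels` are hypothetical), shows how the two statistics can differ and how the axis order arises.

```r
library(dplyr)
library(forcats)
library(tibble)

# Hypothetical prices for two divisions (values are invented)
toy_box <- tribble(
  ~Division,     ~`Fees/Tuition Price`,
  "New England", 12000,
  "New England", 30000,
  "New England", 14000,
  "Pacific",      4000,
  "Pacific",      9000,
  "Pacific",     25000
)

division_stats <- toy_box |>
  group_by(Division) |>
  summarize(
    median_price = median(`Fees/Tuition Price`),  # what the boxplot line shows
    mean_price   = mean(`Fees/Tuition Price`),    # pulled up by high prices
    .groups = "drop"
  )

# fct_reorder() sorts levels by the median of the second argument, ascending
ordered_levels <- levels(
  fct_reorder(toy_box$Division, toy_box$`Fees/Tuition Price`)
)

division_stats
ordered_levels
```

In this toy case the Pacific median sits below 10,000 USD while its mean does not, which is why it may be worth reporting both when describing a skewed price distribution.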
Our findings on the role of region in US college/university costs contribute to the ongoing discussion of higher education fees in the United States. We grouped states into geographical divisions and regions to better understand geographical differences in costs, but the trends we observed across regions appeared similar: with the exception of public in-state tuition offerings, tuition prices have been steadily rising throughout the United States. These prices appear higher for schools in the Northeast region and lower for those in the West, Midwest, and South, though the rate of growth seems to be quite similar. However, we also find that 2-year tuition expenses have not grown at the same rate as 4-year expenses, showing much less change between 2013 and 2021.
In summary, we see a clear disparity between the Northeast and other regions in terms of tuition fees, with the Northeast showing the highest average costs for higher education. We also see that tuition fees have steadily risen for 4-year institutions, but students who pay public in-state prices have seen costs fall since 2019, signaling higher potential savings for prospective students who wish to remain within their home state. Finally, we find that 2-year programs remain the most cost-effective approach to earning a degree, especially if a student takes the opportunity to transfer to a 4-year institution after attending a 2-year institution, and that total costs for these 2-year institutions seem relatively stagnant across the United States. These findings help explain school-to-school and region-to-region variations in tuition prices. Further research might examine the relationship between price and an institution’s educational quality, but for this project, we are limited by the bounds of our dataset.
[1] Hanson, M. (2025, September 23). Average Cost of College Over Time: Yearly Tuition Since 1970. Education Data Initiative. https://educationdata.org/average-cost-of-college-by-year
[2] Carrasco, M. & NASFAA Staff Reporter. (2023, May 4). Survey: Financial Barriers Are Biggest Reasons Why People Don’t Enroll in Higher Education. National Association of Student Financial Aid Administrators (NASFAA). https://www.nasfaa.org/news-item/30616/Survey_Financial_Barriers_Are_Biggest_Reasons_Why_People_Don_t_Enroll_in_Higher_Education