We chose this project because we’ve always enjoyed browsing houses on Zillow and spotting market trends. As college students, that curiosity is becoming more relevant to our own futures. Soon, we’ll need to decide where we want to live and what kind of housing we can realistically afford. With that in mind, we set out to explore how housing affordability has changed across the United States in recent years. This project examines national and local trends in both home sale prices and rent prices, using Zillow data from 2008 to 2025 (https://www.zillow.com/research/data/). We analyze shifts in affordability over time and across cities, focusing on how these trends affect prospective buyers and renters today, especially younger adults entering the housing market for the first time.
# loading necessary libraries
library(readr)
library(tidyverse)
library(scales)
library(readxl)
library(ggplot2)
library(dplyr)
library(maps)
library(lubridate)
library(plotly)
library(tidyr)
library(gt)
# loading data
median_sale_price_all_homes <- read_csv("Metro_median_sale_price_now_uc_sfrcondo_month.csv", show_col_types = FALSE)
# filtering data to just US
us_data <- median_sale_price_all_homes %>%
filter(RegionName == "United States")
# removing columns and reshape the time-series data
us_long <- us_data %>%
select(-RegionID, -SizeRank, -RegionName, -RegionType, -StateName) %>%
pivot_longer(cols = everything(), names_to = "Date", values_to = "MedianPrice")
# converting date to proper format
us_long$Date <- as.Date(us_long$Date)
# plot
p <- ggplot(us_long, aes(x = Date, y = MedianPrice)) +
geom_line(color = "#4d8fac", linewidth = 1.5) +
labs(title = "US Median House Sale Price", x = "Year", y = "Median Sale Price") +
scale_y_continuous(labels = label_dollar(), breaks = pretty_breaks(n = 4)) +
theme_minimal(base_size = 15, base_family = "Times New Roman") +
theme(
plot.title = element_text(size = 20, face = "bold", color = "black"),
axis.title = element_text(size = 14, face = "bold", color = "black"),
axis.text = element_text(size = 12, color = "black"),
panel.grid.major = element_line(color = "#d3d3d3"),
panel.grid.minor = element_blank(),
plot.background = element_rect(fill = "white")
)
ggplotly(p)
The plot above shows the U.S. median home sale price from February 2008 to March 2025. It clearly shows a substantial rise in home prices over time—particularly during and after the pandemic housing boom. Additionally, there is a noticeable seasonal pattern: prices tend to peak during the summer months, when demand is typically highest.
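A rough way to check the seasonal pattern is to express each month's price relative to its own year's average and then average those ratios by calendar month. The sketch below uses a small made-up series (`toy`, not the Zillow data) and base R only; with the real data, `us_long` would take its place.

```r
# Toy monthly series: an upward trend plus a summer bump (made-up values, not Zillow data)
dates <- seq(as.Date("2020-01-01"), as.Date("2022-12-01"), by = "month")
month_num <- as.integer(format(dates, "%m"))
year_num <- as.integer(format(dates, "%Y"))
toy <- data.frame(
  Date = dates,
  Year = year_num,
  Month = month_num,
  MedianPrice = 300000 + 1000 * seq_along(dates) + ifelse(month_num %in% 6:8, 15000, 0)
)

# Express each month relative to its own year's average so the long-run trend
# does not dominate the comparison
toy$YearMean <- ave(toy$MedianPrice, toy$Year)
toy$RelativeToYear <- toy$MedianPrice / toy$YearMean

# Average relative price for each calendar month across years
seasonal_profile <- aggregate(RelativeToYear ~ Month, data = toy, FUN = mean)
peak_month <- seasonal_profile$Month[which.max(seasonal_profile$RelativeToYear)]
print(seasonal_profile)
print(month.name[peak_month])
```

If the summer months sit above 1 in `seasonal_profile` and the winter months below it, that supports the seasonal reading of the chart.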
# cities to show
cities <- c(
"Seattle, WA", "Los Angeles, CA", "Salt Lake City, UT", "Austin, TX", "Tampa, FL",
"Indianapolis, IN", "New Orleans, LA", "Minneapolis, MN", "New York, NY", "Nashville, TN"
)
# filtering and selecting 2010 and 2025
cities_median_sale_price <- median_sale_price_all_homes %>%
filter(RegionName %in% cities) %>% # keeps only the rows where RegionName is one of the specified cities.
select(RegionName, `2010-03-31`, `2025-03-31`) %>% # selects only the city name and the two date columns.
pivot_longer(cols = c(`2010-03-31`, `2025-03-31`), names_to = "Date", values_to = "Price") # converts the wide format (one column per date) to long format, creating a Date column and a Price column.
# relabeling
cities_median_sale_price$Date <- recode(cities_median_sale_price$Date,"2010-03-31" = "March 2010","2025-03-31" = "March 2025")
# converting Date from a character vector to a factor so that March 2010 is ordered before March 2025
cities_median_sale_price$Date <- factor(cities_median_sale_price$Date, levels = c("March 2010", "March 2025"))
# reordering RegionName by March 2025 prices (descending)
city_order <- cities_median_sale_price %>%
filter(Date == "March 2025") %>% # keeps only the rows where the Date column is "March 2025"
arrange(desc(Price)) %>% # sorts those rows by Price, from highest to lowest
pull(RegionName) # extracts just the RegionName column as a vector
# converting the RegionName column into a factor and setting the order of the factor levels to match the vector city_order
cities_median_sale_price$RegionName <- factor(cities_median_sale_price$RegionName, levels = city_order)
# plotting
p <- ggplot(cities_median_sale_price, aes(x = RegionName, y = Price, fill = Date)) +
geom_bar(stat = "identity", position = position_dodge(), width = 0.7) + # side-by-side bars
scale_fill_manual(values = c("#4d8fac", "#003f5c")) +
labs(title = "Median House Sale Price: 2010 vs 2025",
x = "City",
y = "Median Sale Price",
fill = "Date") +
scale_y_continuous(labels = scales::label_dollar()) + # dollar formatting
theme_minimal(base_size = 15, base_family = "Times New Roman") +
theme(
plot.title = element_text(size = 20, face = "bold", color = "black"),
axis.title = element_text(size = 14, face = "bold", color = "black"),
axis.text = element_text(size = 12, color = "black"),
panel.grid.major = element_line(color = "#d3d3d3"),
panel.grid.minor = element_blank(),
plot.background = element_rect(fill = "white"),
axis.text.x = element_text(angle = 45, hjust = 1)
)
ggplotly(p)
Next, we examine housing price shifts between March 2010 and March 2025 in 10 selected cities. This allows us to compare price growth across different regional markets.
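To compare growth rather than price levels, the two snapshots can be turned into a percent change per city. This sketch uses placeholder prices for two hypothetical cities (not Zillow values); with the real data, the same arithmetic would apply to the March 2010 and March 2025 prices in `cities_median_sale_price`.

```r
# Placeholder prices (not Zillow values) for two hypothetical cities
snapshot <- data.frame(
  RegionName = c("City A", "City B"),
  price_2010 = c(200000, 150000),
  price_2025 = c(500000, 300000)
)

# Percent change from the first snapshot to the second
snapshot$PercentChange <- round(
  (snapshot$price_2025 - snapshot$price_2010) / snapshot$price_2010 * 100
)

# Sort so the fastest-growing city comes first
snapshot <- snapshot[order(-snapshot$PercentChange), ]
print(snapshot)
```

Ranking by percent change can give a different order than ranking by price, since a cheaper city may have grown faster than an expensive one.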
# reading in data
rent_price_all_homes <- read_csv("Metro_zori_uc_sfr_sm_month.csv", show_col_types = FALSE)
# filtering data to just US
us_data_rent <- rent_price_all_homes %>%
filter(RegionName == "United States")
# reshape the time-series data
us_long_rent <- us_data_rent %>%
select(-RegionID, -SizeRank, -RegionName, -RegionType, -StateName) %>%
pivot_longer(cols = everything(), names_to = "Date", values_to = "RentPrice")
# converting date to proper format
us_long_rent$Date <- as.Date(us_long_rent$Date)
# plot
p <- ggplot(us_long_rent, aes(x = Date, y = RentPrice)) +
geom_line(color = "#4d8fac", linewidth = 1) +
labs(title = "US Rent Price for Single Family Residences", y = "Rent Price (per month)", x= "Year") +
scale_y_continuous(labels = label_dollar(), breaks = pretty_breaks(n = 4)) +
theme_minimal(base_size = 15, base_family = "Times New Roman") +
theme(
plot.title = element_text(size = 20, face = "bold", color = "black"),
axis.title = element_text(size = 14, face = "bold", color = "black"),
axis.text = element_text(size = 12, color = "black"),
panel.grid.major = element_line(color = "#d3d3d3"),
panel.grid.minor = element_blank(),
plot.background = element_rect(fill = "white")
)
ggplotly(p)
This chart displays changes in the Zillow Observed Rent Index (ZORI) for single-family residences from January 2015 to March 2025. ZORI measures typical market-rate rent and is calculated from repeat-rent data, using listings in the 35th to 65th percentile range. The index is weighted to reflect the broader rental housing stock, which gives a representative view of market conditions beyond just currently listed rentals. Like the U.S. median home sale price, rent has increased significantly over this period.
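To make the percentile step concrete, the sketch below averages only the rents that fall between the 35th and 65th percentiles of a made-up sample. It is a simplified illustration of that one step, not Zillow's full methodology, which also involves repeat-rent indexing and stock weighting. The rents and the helper name `middle_band_mean` are hypothetical.

```r
# Average the values that fall between two quantiles of the sample (inclusive)
middle_band_mean <- function(x, lower = 0.35, upper = 0.65) {
  bounds <- quantile(x, probs = c(lower, upper), names = FALSE)
  mean(x[x >= bounds[1] & x <= bounds[2]])
}

# Made-up listed rents, including one very cheap and one very expensive listing
listed_rents <- c(700, 1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 9000)

middle_band_mean(listed_rents)  # driven by the middle of the distribution
mean(listed_rents)              # pulled upward by the $9,000 listing
```

Restricting the average to the middle band keeps a handful of unusual listings from moving the "typical" rent.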
# filtering, selecting 2015 and 2025, and making longer
cities_rent_price <- rent_price_all_homes %>%
filter(RegionName %in% cities) %>%
select(RegionName, `2015-03-31`, `2025-03-31`) %>%
pivot_longer(cols = c(`2015-03-31`, `2025-03-31`), names_to = "Date", values_to = "Price")
# cleaning date labels
cities_rent_price$Date <- recode(cities_rent_price$Date, "2015-03-31" = "March 2015", "2025-03-31" = "March 2025")
# converting Date to a factor so that March 2015 is ordered before March 2025
cities_rent_price$Date <- factor(cities_rent_price$Date, levels = c("March 2015", "March 2025"))
# reordering RegionName by March 2025 rent prices (descending)
city_order_rent <- cities_rent_price %>%
filter(Date == "March 2025") %>%
arrange(desc(Price)) %>%
pull(RegionName)
# converting the RegionName column into a factor and setting the order of the factor levels to match the vector city_order_rent
cities_rent_price$RegionName <- factor(cities_rent_price$RegionName, levels = city_order_rent)
# plotting
p_rent <- ggplot(cities_rent_price, aes(x = RegionName, y = Price, fill = Date)) +
geom_bar(stat = "identity", position = position_dodge(), width = 0.7) +
scale_fill_manual(values = c("#4d8fac", "#003f5c")) +
labs(title = "Rent Price: 2015 vs 2025", x = "City", y = "Rent Price (per month)", fill = "Date") +
scale_y_continuous(labels = scales::label_dollar()) +
theme_minimal(base_size = 15, base_family = "Times New Roman") +
theme(
plot.title = element_text(size = 20, face = "bold", color = "black"),
axis.title = element_text(size = 14, face = "bold", color = "black"),
axis.text = element_text(size = 12, color = "black"),
panel.grid.major = element_line(color = "#d3d3d3"),
panel.grid.minor = element_blank(),
plot.background = element_rect(fill = "white"),
axis.text.x = element_text(angle = 45, hjust = 1)
)
ggplotly(p_rent)
Next, we look at rent price shifts from March 2015 to March 2025 across the same 10 cities, showing how much rents have changed in each city.
# reading in data
median_household_income <- read_csv("MEHOINUSA672N.csv", show_col_types = FALSE)
# converting date to proper format
median_household_income$observation_date <- as.Date(median_household_income$observation_date)
# plot
p <- ggplot(median_household_income, aes(x = observation_date, y = MEHOINUSA672N)) +
geom_line(color = "#4d8fac", linewidth = 1) +
labs(title = "US Median Household Income", x = "Year", y = "Income (per year)") +
scale_x_date(date_breaks = "2 year", date_labels = "%Y") +
scale_y_continuous(labels = label_dollar(), breaks = pretty_breaks(n = 4)) +
theme_minimal(base_size = 15, base_family = "Times New Roman") +
theme(
plot.title = element_text(size = 20, face = "bold", color = "black"),
axis.title = element_text(size = 14, face = "bold", color = "black"),
axis.text = element_text(size = 12, color = "black"),
panel.grid.major = element_line(color = "#d3d3d3"),
panel.grid.minor = element_blank(),
plot.background = element_rect(fill = "white")
)
ggplotly(p)
This graph illustrates the trend in median household income in the United States from 2010 to 2023, using data from Federal Reserve Economic Data (FRED, series MEHOINUSA672N). Over this period, the data show a general upward trend, indicating that median household income has risen across the country.
# household income math
# data source: https://www2.census.gov/library/publications/2011/acs/acsbr10-02.pdf
median_household_income_2010 <- 50046
# data source: https://seekingalpha.com/article/4780401-median-household-income-march-2025
median_household_income_2025 <- 82675
percent_increase_household_income <- round(((median_household_income_2025 - median_household_income_2010) / median_household_income_2010) * 100)
# house price math (from Zillow data); both snapshots use March so that
# seasonality does not distort the comparison
median_price_us_2010 <- us_long %>%
filter(Date == as.Date("2010-03-31")) %>%
pull(MedianPrice)
median_price_us_2025 <- us_long %>%
filter(Date == as.Date("2025-03-31")) %>%
pull(MedianPrice)
percent_increase_house_price <- round(((median_price_us_2025 - median_price_us_2010) / median_price_us_2010) * 100)
categories <- c("Median House Price", "Median Household Income")
percent_change <- c(percent_increase_house_price, percent_increase_household_income)
# data frame
growth_data <- data.frame(
Category = factor(categories, levels = categories),
PercentChange = percent_change,
Label = paste0("+", percent_change, "%"),
stringsAsFactors = FALSE # ensures character columns stay as characters
)
# plot
ggplot(growth_data, aes(x = Category, y = PercentChange)) +
geom_col(width = 0.5, fill = "#003f5c") +
geom_text(aes(label = Label), vjust = -0.5, size = 6, family = "Times New Roman") +
scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +
labs(title = "Housing Prices vs Income Growth (2010–2025)", x = NULL, y = "Percent Increase") +
theme_minimal(base_size = 15, base_family = "Times New Roman") +
theme(
plot.title = element_text(size = 20, face = "bold", color = "black"),
axis.title = element_text(size = 14, face = "bold", color = "black"),
axis.text = element_text(size = 12, color = "black"),
panel.grid.major = element_line(color = "#d3d3d3"),
panel.grid.minor = element_blank(),
plot.background = element_rect(fill = "white", color = NA),
legend.position = "none",
panel.border = element_blank(),
)
# reading in the excel file containing median home prices by state
median_sale_price_by_state <- read_xlsx("Median Home Price by State 2025.xlsx")
median_sale_price_by_state <- median_sale_price_by_state %>%
rename(state = `State or Territory`, # rename column to 'state'
price = `Median Home Price (US$)`) %>% # rename column to 'price'
mutate(state = tolower(state)) # convert state names to lowercase for matching
# load built-in map data for u.s. states
states_map <- map_data("state")
# merging map and price data
map_data_joined <- states_map %>%
left_join(median_sale_price_by_state, by = c("region" = "state"))
# plotting the joined data as a choropleth map
p <- ggplot(map_data_joined, aes(x = long, y = lat, group = group, fill = price)) +
geom_polygon(color = "black") +
scale_fill_gradient(low = "#7ba9c6", high = "#002b40", labels = scales::dollar, name = "Median Home Price") +
coord_fixed(1.3) +
theme_minimal(base_size = 15, base_family = "Times New Roman") +
theme(
plot.title = element_text(size = 20, face = "bold", color = "black"),
axis.title = element_text(size = 14, face = "bold", color = "black"),
axis.text = element_text(size = 12, color = "black"),
panel.grid.major = element_line(color = "#d3d3d3"),
panel.grid.minor = element_blank(),
plot.background = element_rect(fill = "white", color = NA),
panel.border = element_blank(),
)+
labs(title = "Median Home Price by State (2025)") +
labs(x = NULL, y = NULL)
ggplotly(p)
This map shows the median home price by state in 2025, offering a snapshot of housing affordability across the U.S. Darker shades represent higher median home prices, helping potential buyers identify more affordable regions or prepare for the financial demands of more expensive markets.
# reading data
new_h_o_income_needed <- read_csv("Metro_new_homeowner_income_needed_downpayment_0.20_uc_sfrcondo_tier_0.33_0.67_sm_sa_month.csv", show_col_types = FALSE)
# turning into long format
new_h_o_income_needed_long <- new_h_o_income_needed %>%
pivot_longer(
cols = -c(RegionID, SizeRank, RegionName, RegionType, StateName),
names_to = "Date",
values_to = "IncomeNeeded"
)
# converting to date type
new_h_o_income_needed_long$Date <- as.Date(new_h_o_income_needed_long$Date)
# filtering for March 2025
march_2025_income_needed_data <- new_h_o_income_needed_long %>%
filter(Date == "2025-03-31")
bottom_25 <- march_2025_income_needed_data %>%
arrange(IncomeNeeded) %>%
head(25)
# plotting
p <- ggplot(bottom_25, aes(x = IncomeNeeded, y = reorder(RegionName, IncomeNeeded))) +
geom_text(aes(label = "$"), size = 6, family = "Times New Roman", color = "#003f5c") +
labs(
title = "Lowest Incomes to Own a Home By City",
subtitle = "Each dollar sign marks a city's estimated income needed to afford a typical home (20% down, <30% income on housing)",
x = "Annual Income Needed (USD)",
y = NULL
) +
scale_x_continuous(labels = scales::dollar_format()) +
theme_minimal(base_size = 15, base_family = "Times New Roman") +
theme(
plot.title = element_text(size = 20, face = "bold", color = "black"),
axis.title = element_text(size = 14, face = "bold", color = "black"),
axis.text = element_text(size = 12, color = "black"),
panel.grid.major = element_line(color = "#d3d3d3"),
panel.grid.minor = element_blank(),
plot.background = element_rect(fill = "white", color = NA),
panel.border = element_blank(),
)
ggplotly(p, tooltip = c("y", "x"))
This graph uses data from Zillow to estimate the annual income needed to afford a typical home in U.S. metro areas as of March 2025, assuming a 20% down payment and that homeowners spend no more than 30% of their monthly income on housing costs. The chart highlights the 25 metro areas where new homeowners need the lowest income, with each dollar sign representing a metro area ordered by income required. Smaller cities and southern metros tend to have the most affordable housing, where home ownership is more accessible to the average American.
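The affordability rule described above can be approximated with a standard fixed-rate mortgage payment formula. This is a rough sketch, not Zillow's calculation: the interest rate and loan term are assumed values, property taxes, insurance, and maintenance are ignored, and `income_needed` is a hypothetical helper.

```r
# Estimate the annual income at which the mortgage payment equals a given
# share of gross income (principal and interest only)
income_needed <- function(price, down_payment = 0.20, annual_rate = 0.07,
                          years = 30, housing_share = 0.30) {
  loan <- price * (1 - down_payment)
  n_payments <- years * 12
  monthly_rate <- annual_rate / 12
  if (monthly_rate == 0) {
    # With no interest, the loan is simply spread evenly over all payments
    monthly_payment <- loan / n_payments
  } else {
    # Standard amortization formula for a fixed-rate loan
    monthly_payment <- loan * monthly_rate / (1 - (1 + monthly_rate)^(-n_payments))
  }
  monthly_payment * 12 / housing_share
}

# Example with an assumed $300,000 home and the assumed default 7% rate
income_needed(300000)
```

Because the required income scales with the monthly payment, a change in the assumed interest rate moves the result even when the home price stays the same.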
new_construction_median_sale_price <- read_csv("Metro_new_con_median_sale_price_uc_sfrcondo_month.csv", show_col_types = FALSE)
# filtering for US
us_new_const_median_sp <- new_construction_median_sale_price %>%
filter(RegionName == "United States")
# reshape the time-series data
us_new_const_median_sp_long <- us_new_const_median_sp %>%
select(-RegionID, -SizeRank, -RegionName, -RegionType, -StateName) %>%
pivot_longer(cols = everything(), names_to = "Date", values_to = "MedianPriceNewConstruction")
# making sure the date columns are Date type
us_new_const_median_sp_long$Date <- as.Date(us_new_const_median_sp_long$Date)
us_long$Date <- as.Date(us_long$Date)
# add a label column to each for merging
us_new_const_median_sp_long <- us_new_const_median_sp_long %>%
rename(MedianPrice = MedianPriceNewConstruction) %>%
mutate(Source = "New Construction")
us_long <- us_long %>%
filter(Date >= as.Date("2018-01-31")) %>%
mutate(Source = "US Overall")
# combine the two datasets
combined_data <- bind_rows(us_new_const_median_sp_long, us_long)
# plot the data
p <- ggplot(combined_data, aes(x = Date, y = MedianPrice, color = Source)) +
geom_line(linewidth = 1) +
labs(title = "Median Sale Prices Over Time",
y = "Median Sale Price (USD)",
x = "Year",
color = "Data Source") +
scale_color_manual(values = c("US Overall" = "#003f5c", "New Construction" = "#4d8fac")) +
scale_y_continuous(labels = dollar) +
theme_minimal(base_size = 15, base_family = "Times New Roman") +
theme(
plot.title = element_text(size = 20, face = "bold", color = "black"),
axis.title = element_text(size = 14, face = "bold", color = "black"),
axis.text = element_text(size = 12, color = "black"),
panel.grid.major = element_line(color = "#d3d3d3"),
panel.grid.minor = element_blank(),
plot.background = element_rect(fill = "white", color = NA),
panel.border = element_blank(),
axis.text.x = element_text(angle = 45, hjust = 1)
)
ggplotly(p)
This graph compares the national median sale price of newly constructed homes with the overall U.S. median home price from 2018 through March 2025. The “New Construction” data reflects the median price of newly built single-family homes and condos sold in each month, while the “US Overall” line shows the median price of all home sales. The graph highlights how new builds consistently cost more.
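The gap between the two lines can be quantified as a monthly premium. The sketch assumes a long table shaped like `combined_data` (columns `Date`, `MedianPrice`, `Source`) but fills it with placeholder values rather than the Zillow figures.

```r
# Placeholder data in the same long layout as combined_data (values are made up)
toy_combined <- data.frame(
  Date = rep(as.Date(c("2024-01-31", "2024-02-29")), times = 2),
  MedianPrice = c(420000, 425000, 350000, 360000),
  Source = rep(c("New Construction", "US Overall"), each = 2)
)

# Split by source and line the two series up by date
new_con <- toy_combined[toy_combined$Source == "New Construction", c("Date", "MedianPrice")]
overall <- toy_combined[toy_combined$Source == "US Overall", c("Date", "MedianPrice")]
premium <- merge(new_con, overall, by = "Date", suffixes = c("_new", "_overall"))

# Premium of new construction over the overall median, in dollars and percent
premium$PremiumUSD <- premium$MedianPrice_new - premium$MedianPrice_overall
premium$PremiumPct <- round(premium$PremiumUSD / premium$MedianPrice_overall * 100, 1)
print(premium)
```

Tracking `PremiumPct` over time would show whether the gap between new and existing homes is widening or narrowing.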
Suppose you are graduating from Taylor University in spring 2025 and have secured a job that lets you work remotely. You hope to buy your first home in the near future and have picked the five places where you would most like to live after college. Which of them can you afford to buy a home in?
# selecting cities
cities <- c("Chicago, IL", "St. Louis, MO", "Grand Rapids, MI",
"Indianapolis, IN", "Nashville, TN")
# reshaping from wide to long format, removing unnecessary columns
median_house_price_long <- median_sale_price_all_homes %>%
select(-RegionID, -SizeRank, -RegionType, -StateName) %>%
pivot_longer(cols = -RegionName, names_to = "Date", values_to = "MedianPrice")
# filter for selected cities and a specific date
case_example_data_buy <- median_house_price_long %>%
filter(Date == "2025-03-31", RegionName %in% cities) %>%
select(-Date)
# reshaping and convert wide format to long format
rent_price_long <- rent_price_all_homes %>%
select(-RegionID, -SizeRank, -RegionType, -StateName) %>%
pivot_longer(cols = -RegionName, names_to = "Date", values_to = "RentPrice")
# filter for selected cities and a specific date
case_example_data_rent <- rent_price_long %>%
filter(Date == "2025-03-31", RegionName %in% cities) %>%
select(-Date)
# join the sale and rent data together by city name
case_example_data <- left_join(case_example_data_buy, case_example_data_rent, by = join_by(RegionName))
# creating a table using gt
case_example_data %>%
gt() %>%
tab_header(title = "Median Sale and Rent Prices by City") %>%
fmt_currency(columns = c(MedianPrice, RentPrice), currency = "USD") %>%
# applying conditional coloring to price columns based on their values
data_color(columns = c(MedianPrice, RentPrice),
fn = function(x) {
dplyr::case_when(
x < 1800 ~ "#729e7a",
x < 2400 ~ "grey",
x < 6000 ~ "#9c4343",
x < 250000 ~ "#729e7a",
x < 400000 ~ "lightgrey",
x >= 400000 ~ "#9c4343",
TRUE ~ "white")})
Median Sale and Rent Prices by City

| RegionName | MedianPrice | RentPrice |
|---|---|---|
| Chicago, IL | $316,000.00 | $2,412.87 |
| St. Louis, MO | $245,626.00 | $1,560.06 |
| Indianapolis, IN | $278,776.00 | $1,821.58 |
| Nashville, TN | $442,948.00 | $2,341.83 |
| Grand Rapids, MI | $317,196.00 | $2,069.27 |
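
The 30% guideline used elsewhere in this report for homeowners can also be applied to the rents in the table, giving a rough estimate of the income a renter would need in each metro. The rents below are copied from the table. Applying the 30% threshold to rent is an assumption made here for illustration.

```r
# Monthly rents copied from the table above
case_rents <- data.frame(
  RegionName = c("Chicago, IL", "St. Louis, MO", "Indianapolis, IN",
                 "Nashville, TN", "Grand Rapids, MI"),
  RentPrice = c(2412.87, 1560.06, 1821.58, 2341.83, 2069.27)
)

# Annual gross income at which rent equals 30% of income (assumed threshold)
rent_share <- 0.30
case_rents$IncomeNeededToRent <- round(case_rents$RentPrice * 12 / rent_share)

# Cheapest metro first
case_rents <- case_rents[order(case_rents$IncomeNeededToRent), ]
print(case_rents)
```

Comparing these figures with an expected starting salary gives a quick first filter before looking at purchase prices.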
# filtering and removing
new_h_o_income_needed_long_current <- new_h_o_income_needed_long %>%
filter(Date == as.Date("2025-03-31"), RegionName %in% cities) %>%
select(-RegionID, -SizeRank, -RegionType, -StateName, -Date) %>%
arrange(desc(IncomeNeeded)) %>%
mutate(RegionName = factor(RegionName, levels = RegionName))
# plot
p <- ggplot(new_h_o_income_needed_long_current, aes(x = RegionName, y = IncomeNeeded)) +
geom_col(width = 0.5, fill = "#003f5c") +
scale_y_continuous(labels = label_dollar(), expand = expansion(mult = c(0, 0.15))) +
labs(
title = "Annual Income Needed by City",
x = NULL,
y = "Income Needed"
) +
theme_minimal(base_size = 15, base_family = "Times New Roman") +
theme(
plot.title = element_text(size = 20, face = "bold", color = "black"),
axis.title = element_text(size = 14, face = "bold", color = "black"),
axis.text = element_text(size = 12, color = "black"),
panel.grid.major = element_line(color = "#d3d3d3"),
panel.grid.minor = element_blank(),
plot.background = element_rect(fill = "white", color = NA),
axis.text.x = element_text(angle = 45, hjust = 1)
)
ggplotly(p)
Home prices and rents have increased significantly over the past several years, outpacing the growth of household income. This presents a challenge to prospective buyers looking for an affordable home. Because home prices vary widely by region, buyers can look to particular areas or cities for more affordable housing; smaller cities and southern metros tend to be the most affordable. The time of year also matters: prices tend to peak in the summer months, likely because that is when more people move. Additionally, the median price of a newly constructed house is higher than the overall U.S. median, suggesting that it may be more affordable to buy an existing home.
One limitation of this project is that the data come from Zillow, so they may not capture every home sale. Additionally, we had limited time for this project, so there may be trends that we did not highlight. Also, neither of us has any experience in real estate, so we are not qualified to give advice in this field; we are only presenting the data.
Lastly, there is much more to deciding where to live than the cost of a home. Things like living near family can be priceless, and the environment where you live can have a large impact on your life. This project should be weighed accordingly.