1. Motivation and Research Question

Housing markets play an important role in the U.S. economy and are closely connected to household wealth, financial stability, and broader economic conditions. Changes in housing prices affect both individual households and the overall economy. For this reason, understanding how housing prices respond to factors such as interest rates, inflation, and labor market conditions is especially relevant during periods of economic uncertainty and tighter monetary policy.

My academic background combines an undergraduate major in Business Administration (Finance concentration) with a minor and, later, a master’s degree in Data Science. This project aims to integrate financial intuition with applied data science techniques, producing analysis that is both academically sound and relevant to future roles in finance, analytics, and data science.

The main research question guiding this project is:

How are changes in U.S. housing prices related to key macroeconomic indicators such as mortgage rates, inflation, and unemployment?


2. Data Sources

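This project combines data from two distinct sources:

  1. Zillow: the Zillow Home Value Index (ZHVI) for U.S. states, downloaded as a static CSV file.
  2. FRED: monthly macroeconomic series (30-year mortgage rate, CPI, and unemployment rate), retrieved through an API.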

Using both a static CSV source and an external API satisfies the requirement for multiple data source types.


3. Data Science Workflow

The project follows a clear and reproducible data science workflow consistent with the OSEMN framework:

  1. Obtain: Download Zillow data and pull macroeconomic indicators via API.
  2. Scrub: Clean column names, reshape data, and align time indices.
  3. Explore: Visualize trends and relationships between variables.
  4. Model: Estimate regression models to quantify associations.
  5. iNterpret: Explain the results and communicate insights clearly.

All steps are implemented using reproducible R scripts and documented within this report, providing a structured framework for the data preparation, exploratory analysis, and modeling steps that follow.


4. Data Preparation and Transformation

library(tidyverse)

housing_data <- read_csv("../data_processed/housing_final.csv", show_col_types = FALSE)

glimpse(housing_data)
## Rows: 15,861
## Columns: 12
## $ region_id         <dbl> 9, 54, 14, 43, 47, 21, 44, 16, 36, 30, 40, 56, 59, 8…
## $ size_rank         <dbl> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15…
## $ region_name       <chr> "California", "Texas", "Florida", "New York", "Penns…
## $ region_type       <chr> "state", "state", "state", "state", "state", "state"…
## $ date.x            <date> 2000-01-31, 2000-01-31, 2000-01-31, 2000-01-31, 200…
## $ zhvi              <dbl> 186305.45, 112312.33, 107384.27, 152077.86, 98624.59…
## $ month             <date> 2000-01-01, 2000-01-01, 2000-01-01, 2000-01-01, 200…
## $ date.y            <date> 2000-01-01, 2000-01-01, 2000-01-01, 2000-01-01, 200…
## $ mortgage_rate     <dbl> 8.21, 8.21, 8.21, 8.21, 8.21, 8.21, 8.21, 8.21, 8.21…
## $ cpi               <dbl> 169.3, 169.3, 169.3, 169.3, 169.3, 169.3, 169.3, 169…
## $ unemployment_rate <dbl> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4…
## $ zhvi_yoy          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …

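Key transformations performed prior to analysis include:

  1. Cleaning column names and reshaping the Zillow data so that each row is one state-month observation.
  2. Creating a standardized monthly key (month) by flooring dates, then joining the Zillow and FRED series on that key.
  3. Computing year-over-year growth in ZHVI (zhvi_yoy), which is missing (NA) for the earliest observations of each state.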
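
The reshaping and growth-rate steps can be sketched as follows. This is a minimal illustration on a made-up two-state table, not the project's actual script: the toy values, the wide layout with one column per month-end date, and the 12-month lag formula for zhvi_yoy are all assumptions.

```r
library(dplyr)
library(tidyr)

# Toy wide-format table (hypothetical values, not Zillow data):
# one row per state, one column per month-end date.
months_end <- seq(as.Date("2000-02-01"), by = "month", length.out = 13) - 1
vals <- rbind(seq(100, 112, by = 1), seq(200, 224, by = 2))
colnames(vals) <- format(months_end)
wide <- data.frame(
  region_name = c("State A", "State B"),
  vals,
  check.names = FALSE
)

long <- wide %>%
  # Reshape so that each row is one state-month observation
  pivot_longer(-region_name, names_to = "date", values_to = "zhvi") %>%
  mutate(date = as.Date(date)) %>%
  group_by(region_name) %>%
  arrange(date, .by_group = TRUE) %>%
  # Assumed definition: percent change relative to the same month one year earlier
  mutate(zhvi_yoy = (zhvi / dplyr::lag(zhvi, 12) - 1) * 100) %>%
  ungroup()

tail(long, 3)
```

Under this definition the first 12 months of each state have no value a year earlier, which would explain the NA entries at the start of zhvi_yoy in the output above.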


5. Exploratory Data Analysis

5.1 National Housing Price Growth Trend

library(scales)
national_trend <- housing_data %>%
  filter(!is.na(zhvi_yoy)) %>%
  group_by(month) %>%
  summarise(avg_zhvi_yoy = mean(zhvi_yoy, na.rm = TRUE))

ggplot(national_trend, aes(x = month, y = avg_zhvi_yoy)) +
  geom_line(color = "steelblue", linewidth = 1) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "gray40") +
  scale_y_continuous(labels = percent_format(scale = 1)) +
  labs(
    title = "Average Year-over-Year Housing Price Growth Across U.S. States",
    x = "Year",
    y = "YoY Housing Price Growth (%)"
  ) +
  theme_minimal()

This figure shows the average year-over-year growth in housing prices across U.S. states over time. Housing price growth was positive and gradually rising in the early to mid-2000s, peaking just before the housing market collapse. During the Great Recession, growth turned sharply negative, reflecting widespread declines in home values. Around 2012, housing prices began to recover, followed by a long period of steady and moderate growth throughout most of the 2010s. A sharp increase is observed between 2020 and 2022, when housing prices grew at unusually high rates, consistent with pandemic-related demand shifts and limited housing supply. After this peak, growth slowed quickly and moved close to zero, indicating a cooling housing market. Overall, the figure highlights clear cycles in housing price growth and sets the stage for examining how macroeconomic conditions influence these patterns.


5.2 Housing Growth and Mortgage Rates

housing_model <- housing_data %>%
  filter(!is.na(zhvi_yoy))

ggplot(housing_model, aes(x = mortgage_rate, y = zhvi_yoy)) +
  geom_point(alpha = 0.2, color = "darkorange") +
  geom_smooth(method = "lm", se = FALSE, color = "black") +
  labs(
    title = "Housing Price Growth vs Mortgage Rates",
    x = "30-Year Mortgage Rate (%)",
    y = "YoY Housing Price Growth (%)"
  ) +
  theme_minimal()

This scatterplot shows the relationship between year-over-year housing price growth and the 30-year mortgage rate. Each point represents one state-month observation, and the fitted line reflects the overall linear trend. The plot indicates a weak negative relationship: higher mortgage rates are generally associated with slightly lower housing price growth. At the same time, the points are widely scattered, showing substantial variation in price growth at any given interest rate. Both strong increases and sharp declines in housing prices appear across a wide range of mortgage rates. This pattern suggests that mortgage rates alone explain little of the variation in housing price growth and that other economic factors likely play an important role.


5.3 Housing Growth and Unemployment

ggplot(housing_model, aes(x = unemployment_rate, y = zhvi_yoy)) +
  geom_point(alpha = 0.2, color = "firebrick") +
  geom_smooth(method = "lm", se = FALSE, color = "black") +
  labs(
    title = "Housing Price Growth vs Unemployment Rate",
    x = "Unemployment Rate (%)",
    y = "YoY Housing Price Growth (%)"
  ) +
  theme_minimal()

This scatterplot shows the relationship between year-over-year housing price growth and the unemployment rate. The fitted trend line points to a clear negative association, with higher unemployment generally corresponding to lower housing price growth. When unemployment is low, housing price growth tends to be positive and more dispersed, reflecting periods of stronger economic activity and housing demand. As unemployment increases, housing price growth shifts downward, with more observations clustered around zero or negative values. This pattern shows a close relationship between labor market conditions and housing market performance, likely through income stability and consumer confidence. Compared to mortgage rates, unemployment appears to have a stronger and more consistent relationship with housing price growth.


6. Regression Analysis

To formally quantify the relationships observed in the exploratory analysis, two linear regression models are estimated. The first model examines the bivariate relationship between housing price growth and mortgage rates, while the second model incorporates additional macroeconomic controls.

library(broom)

model_1 <- lm(
  zhvi_yoy ~ mortgage_rate,
  data = housing_model
)

model_2 <- lm(
  zhvi_yoy ~ mortgage_rate + unemployment_rate + cpi,
  data = housing_model
)

Mortgage Rate Only

tidy(model_1)
## # A tibble: 2 × 5
##   term          estimate std.error statistic   p.value
##   <chr>            <dbl>     <dbl>     <dbl>     <dbl>
## 1 (Intercept)      5.95     0.220      27.1  9.13e-158
## 2 mortgage_rate   -0.304    0.0420     -7.23 4.97e- 13

The first regression includes only the 30-year mortgage rate as an explanatory variable. The estimated coefficient on the mortgage rate is negative and statistically significant, indicating that higher mortgage rates are associated with lower year-over-year housing price growth. Specifically, a one percentage point increase in the mortgage rate is associated with an average decrease of approximately 0.30 percentage points in housing price growth. While this relationship is statistically significant, the magnitude of the association is relatively modest, consistent with the wide dispersion observed in the scatterplot.
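
To make the size of this coefficient concrete, the short sketch below plugs two mortgage rates into the fitted line using the estimates reported in the tidy(model_1) output. The rates of 3% and 7% and the helper name predict_growth are chosen only for illustration.

```r
# Coefficients copied from the tidy(model_1) output above
intercept <- 5.95
slope <- -0.304

# Hypothetical helper: fitted YoY growth (%) at a given mortgage rate (%)
predict_growth <- function(rate) intercept + slope * rate

fitted_low  <- predict_growth(3)  # 5.95 - 0.304 * 3 = 5.038
fitted_high <- predict_growth(7)  # 5.95 - 0.304 * 7 = 3.822

# Difference implied by a 4 percentage point higher mortgage rate
fitted_high - fitted_low          # about -1.22 percentage points
```

A four-point swing in mortgage rates therefore moves the fitted growth rate by only about 1.2 percentage points, which is small relative to the spread of observations in the scatterplot.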

Multiple Macroeconomic Controls

tidy(model_2)
## # A tibble: 4 × 5
##   term              estimate std.error statistic   p.value
##   <chr>                <dbl>     <dbl>     <dbl>     <dbl>
## 1 (Intercept)        27.4      0.516        53.1 0        
## 2 mortgage_rate      -1.30     0.0406      -32.0 2.73e-217
## 3 unemployment_rate  -1.84     0.0281      -65.5 0        
## 4 cpi                -0.0250   0.00133     -18.8 1.42e- 77

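The second regression includes mortgage rates, unemployment, and the consumer price index (CPI) at the same time. All three variables have negative and statistically significant coefficients. Compared to the bivariate model, the mortgage rate coefficient becomes considerably larger in magnitude (-1.30 versus -0.30) once labor market conditions and the price level are taken into account.

Unemployment shows the strongest relationship with housing price growth: a one percentage point increase in the unemployment rate is associated with roughly 1.84 percentage points lower growth. The CPI coefficient is also negative, but CPI enters the model as an index level rather than an inflation rate, so its coefficient (-0.025 per index point) is not directly comparable in size to the other two. Because the macroeconomic series take the same value for every state in a given month, the reported standard errors likely overstate the precision of these estimates. Overall, the results indicate that housing price growth is associated with a combination of borrowing costs, labor market conditions, and the overall price level.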
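
Because the three regressors are measured in different units (percentage points for the two rates, index points for CPI), their raw coefficients are hard to compare directly. One common way to put them on a common scale is to standardize the predictors before fitting. The sketch below does this on simulated data; the sample size, variable ranges, and data-generating coefficients are assumptions chosen for illustration and are not results from this report.

```r
set.seed(42)

# Simulated data (hypothetical ranges, loosely resembling the real variables)
n <- 500
sim <- data.frame(
  mortgage_rate     = runif(n, 3, 8),
  unemployment_rate = runif(n, 3.5, 10),
  cpi               = runif(n, 170, 310)
)
sim$zhvi_yoy <- 27 - 1.3 * sim$mortgage_rate - 1.8 * sim$unemployment_rate -
  0.025 * sim$cpi + rnorm(n, sd = 3)

# Raw-unit model: coefficients are per percentage point or per index point
raw_fit <- lm(zhvi_yoy ~ mortgage_rate + unemployment_rate + cpi, data = sim)

# Standardized predictors: coefficients are per one standard deviation
std_fit <- lm(
  zhvi_yoy ~ scale(mortgage_rate) + scale(unemployment_rate) + scale(cpi),
  data = sim
)

coef(raw_fit)
coef(std_fit)
```

Each standardized slope equals the raw slope multiplied by that predictor's standard deviation, so a variable with a small raw coefficient but a wide range (such as CPI) can still account for a sizeable share of the fitted variation.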


7. Challenge Encountered

A key challenge involved merging datasets with different monthly date conventions. Zillow data uses end-of-month dates, while FRED series are indexed to the beginning of the month. A direct join resulted in no matching observations.

This issue was resolved by creating a standardized monthly time key using date flooring, ensuring proper alignment across data sources. This step was critical for producing a valid merged dataset and represents a common real-world data integration challenge.
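
The date mismatch and its fix can be illustrated with a minimal sketch. The two small tables and their values are made up for illustration, and only the column names mirror those in the merged dataset. Using lubridate's floor_date() with a left join is one plausible way to implement the flooring step described above, not necessarily the exact code used in this project.

```r
library(dplyr)
library(lubridate)

# Toy Zillow-style table: end-of-month dates (hypothetical values)
zillow <- data.frame(
  region_name = "State A",
  date = as.Date(c("2000-01-31", "2000-02-29")),
  zhvi = c(100, 101)
)

# Toy FRED-style table: beginning-of-month dates (hypothetical values)
fred <- data.frame(
  date = as.Date(c("2000-01-01", "2000-02-01")),
  mortgage_rate = c(8.0, 8.1)
)

# A direct join on the raw dates finds no matching rows
direct <- inner_join(zillow, fred, by = "date")
nrow(direct)

# Floor both date columns to the first day of the month, then join on that key
zillow_m <- zillow %>% mutate(month = floor_date(date, "month"))
fred_m   <- fred %>% mutate(month = floor_date(date, "month"))
merged   <- left_join(zillow_m, fred_m, by = "month")

merged
```

Joining on month keeps both original date columns, which dplyr suffixes as date.x and date.y. This matches the column names visible in the glimpse() output of the merged data.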


8. Conclusion

This analysis examines how U.S. housing price growth relates to key macroeconomic indicators, including mortgage rates, unemployment, and consumer prices (CPI). The exploratory results reveal clear cyclical patterns in housing price growth and show that housing markets respond differently across economic environments. Regression results reinforce these findings by showing statistically significant negative associations between housing price growth and all three macroeconomic variables.

Among the factors considered, unemployment exhibits the strongest and most consistent relationship with housing price growth, underscoring the central role of labor market conditions in shaping housing demand. Mortgage rates also matter, particularly when considered alongside other economic indicators, while the CPI level carries a smaller coefficient that is measured in different units and is therefore harder to compare. Overall, the results show that housing market dynamics cannot be explained by a single factor and are better understood within a broader macroeconomic context.

These findings align with economic intuition and highlight the value of incorporating multiple macroeconomic indicators when analyzing housing market behavior.


9. Reproducibility

All code and data required to reproduce this analysis are organized within a structured project directory and executed using R Markdown. The workflow avoids absolute file paths and relies exclusively on relative paths, so the analysis can be rerun on another machine without editing paths, provided the directory structure is preserved and the required R packages are installed.

