This paper examines the factors influencing housing price resilience during economic downturns, using the Ames Housing Dataset as a case study. By analyzing the impact of structural features, neighborhood characteristics, and market conditions on housing price changes during the 2008 financial crisis, the analysis identifies key predictors of price stability. A multiple linear regression model, enhanced with interaction terms to capture pre- and post-crisis dynamics, reveals that features such as larger living areas, modern construction, and high-performing neighborhoods are strong indicators of resilience. Model evaluation, including five-fold cross-validation, demonstrates robust predictive performance with an RMSE of 0.1955 and R-squared of 0.7635. While limited to the Ames, Iowa market, this research provides actionable insights for homebuyers, investors, and policymakers, emphasizing the importance of location, structural quality, and favorable market conditions in mitigating financial risks during economic instability. Future work could expand this analysis to broader markets and incorporate macroeconomic factors for a more comprehensive understanding of housing resilience.
Economic downturns significantly impact housing markets, with some homes experiencing minimal price declines while others face steep reductions in value. This variability prompts a crucial question for potential homebuyers and investors: Can we predict how much a home’s price will decline during an economic downturn based on its features? Identifying the main factors that contribute to price decline resilience—how well a home’s value resists declines during economic downturns—can help buyers and investors make better-informed decisions. This knowledge reduces financial risks, highlights properties less prone to devaluation, and enables confident decision-making in uncertain markets.
The Ames Housing Dataset, developed by Dean De Cock (2011), provides detailed data on 2,930 residential properties in Ames, Iowa, collected through county assessor records. Spanning 2006 to 2010, the dataset captures critical structural attributes, neighborhood characteristics, market conditions, and sale prices during the 2008 financial crisis. This period, marked by widespread foreclosures and plummeting property values, offers a unique context to examine how specific housing features influenced price resilience amidst severe economic instability.
While the dataset is geographically limited to Ames, Iowa, and lacks insights into buyer or seller motivations and broader macroeconomic factors, it remains a valuable tool for identifying drivers of housing market resilience. Insights derived from this data can assist policymakers in shaping zoning regulations and prioritizing resilient features in new developments. Similarly, homebuyers and investors can leverage these findings to mitigate financial risks during economic uncertainty. Future research could enhance this analysis by incorporating data from diverse regions and additional variables, such as buyer demographics and macroeconomic indicators, for a more comprehensive understanding of housing price resilience.
Given this data and question, our goal is to develop a predictive model to estimate how much a home’s price will change, specifically decline, during an economic downturn based on its features. To do so, we first conducted some exploratory data analysis to understand the distribution of key variables, identify potential outliers, address missing values, and assess relationships between features and sale prices.
To analyze the impact of the 2008 financial crisis on Ames, we calculated the average sales price for each year from 2006 to 2010. Prices peaked in 2007, representing the market’s stability before the crisis, and dropped to their lowest point in 2010, reflecting the full effects of the downturn. Defining 2007 as the pre-crisis period and 2010 as the post-crisis period provides clear benchmarks for comparing price changes and understanding the crisis’s effects on housing markets.
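The yearly comparison described above amounts to a group-by-year average. The following is a minimal Python sketch on made-up prices, not the Ames data; the helper `mean_price_by_year` and the sample values are hypothetical and only illustrate how the peak and trough years would be identified.

```python
from collections import defaultdict
from statistics import mean

def mean_price_by_year(sales):
    """Return {year: mean sale price} for an iterable of (year, price) pairs."""
    by_year = defaultdict(list)
    for year, price in sales:
        by_year[year].append(price)
    return {year: mean(prices) for year, prices in sorted(by_year.items())}

# Made-up (year, price) pairs for illustration only; not the Ames data.
sales = [(2006, 180000), (2006, 182000),
         (2007, 190000), (2007, 186000),
         (2010, 170000), (2010, 174000)]

yearly = mean_price_by_year(sales)
peak_year = max(yearly, key=yearly.get)    # year with the highest mean price
trough_year = min(yearly, key=yearly.get)  # year with the lowest mean price
```

In this toy input the peak and trough years play the roles of the pre-crisis and post-crisis benchmarks.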
Missing Data: We checked the dataset for missing values and found none.
Log Transformation: The distribution of sale prices was right-skewed. Applying a log transformation brings the distribution closer to normal, which should improve model performance and reduce heteroscedasticity.
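The effect of the log transformation on skewness can be illustrated with a small sketch. The prices below are invented (one very expensive home creates a long right tail) and the `skewness` helper is a hypothetical name; the point is only that the skewness of the logged values is smaller than that of the raw values.

```python
import math
from statistics import mean, pstdev

def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    mu, sigma = mean(values), pstdev(values)
    return mean([((v - mu) / sigma) ** 3 for v in values])

# Made-up right-skewed prices for illustration only; not the Ames data.
prices = [95000, 120000, 135000, 150000, 160000, 185000, 210000, 600000]
log_prices = [math.log(p) for p in prices]

skew_raw = skewness(prices)      # skewness on the original price scale
skew_log = skewness(log_prices)  # skewness after the log transformation
```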
Outliers: We retained outlier sale prices because they reflect real-world extreme market behavior during downturns, which we considered important to understanding housing price resilience. Applying a log transformation reduced the influence of these outliers while preserving their valuable information.
Feature selection aimed to identify the variables most relevant to
predicting housing price changes during an economic downturn. We selected features
such as Gr_Liv_Area (above-ground living area), Year_Built, and
Lot_Area (lot size) because they intuitively reflect key factors that
determine a home’s value and appeal.
Gr_Liv_Area reflects living space, which is almost universally
desirable and usually commands higher prices. Year_Built
indicates the home’s age, with newer homes generally valued higher due
to updated features and lower maintenance requirements. Lot_Area, for
reasons similar to those for living space, is a key metric in property
valuation. Neighborhood characteristics were included to account for
factors such as access to amenities, quality of schools, and safety,
which strongly affect buyer preferences and likely resilience to market
fluctuations. Additionally, Sale_Condition, which captures distinctions
such as regular sales versus foreclosures, was included because of its
critical role during economic crises, when distressed sales typically
drive price declines.
A binary variable, CrisisPeriod, distinguished pre-crisis (2007) and
post-crisis (2010) periods, enabling the model to evaluate how the
financial crisis influenced housing prices. Interaction terms between
CrisisPeriod and features such as Gr_Liv_Area (above-ground living
area) and Neighborhood were included to capture how the effects of
these features on home prices differ before and after the crisis. This
approach allows the model to identify whether certain characteristics
make homes more or less resilient to the economic downturn. For example,
smaller homes (Gr_Liv_Area) might be more desirable and retain their
value better in the post-crisis period due to shifts in buyer
preferences. Similarly, the impact of location (Neighborhood) could
vary significantly, as some neighborhoods may experience stronger demand
or economic stability compared to others during the crisis.
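The construction of the indicator and its interaction terms can be sketched as follows. This is a hypothetical Python illustration, not the code used in the analysis: the function names, the column layout, and the two neighborhood levels passed as `levels` are assumptions chosen for the example.

```python
def crisis_period(year_sold):
    """Return 0 for the pre-crisis year (2007) and 1 for the post-crisis year (2010)."""
    periods = {2007: 0, 2010: 1}
    if year_sold not in periods:
        raise ValueError(f"year {year_sold} is outside the two comparison periods")
    return periods[year_sold]

def design_row(gr_liv_area, neighborhood, year_sold, levels=("Edwards", "Somerset")):
    """Build main-effect and interaction columns for one sale (hypothetical layout)."""
    crisis = crisis_period(year_sold)
    row = {
        "Gr_Liv_Area": gr_liv_area,
        # Interaction column: zero before the crisis, equal to the feature after it.
        "CrisisPeriod:Gr_Liv_Area": crisis * gr_liv_area,
    }
    for level in levels:
        dummy = int(neighborhood == level)  # 1 if the home is in this neighborhood
        row[f"Neighborhood{level}"] = dummy
        row[f"CrisisPeriod:Neighborhood{level}"] = crisis * dummy
    return row

pre = design_row(1500, "Somerset", 2007)
post = design_row(1500, "Somerset", 2010)
```

Because every interaction column is zero when CrisisPeriod is 0, the interaction coefficients describe only how a feature's effect shifts in the post-crisis period.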
The analysis assumed linearity between predictors (e.g., Gr_Liv_Area,
Year_Built) and the dependent variable (Log_Sale_Price), which was
supported by interaction terms and residual plot inspections. Key
assumptions, including independence of errors, homoscedasticity, and
normality of residuals, were verified through diagnostic plots, such as
Q-Q plots and residual analyses, while variance inflation factors (VIF)
confirmed no significant multicollinearity among predictors. Confirming
that the data contained no missing values further supported the use of
the linear regression model in analyzing housing price resilience.
A multiple linear regression model was constructed using the formula:
\[ \text{Log Sale Price} = \beta_0 + \sum_{i=1}^k \beta_i X_i + \sum_{j=1}^m \gamma_j (X_j \cdot \text{CrisisPeriod}) + \epsilon \]
where \(\beta_0\) is the intercept, \(\beta_i\) are the main-effect coefficients of the predictors \(X_i\), \(\gamma_j\) are the coefficients of the interactions between CrisisPeriod and the \(m\) interacted predictors \(X_j\), and \(\epsilon\) is the error term.
Cross-validation with five folds was conducted to ensure the model’s robustness and generalizability.
Pre-crisis (\(CrisisPeriod = 0\)) and post-crisis (\(CrisisPeriod = 1\)) predictions were calculated for each home using the formula:
\[ \text{Predicted Log Sale Price} = \hat{\beta}_0 + \sum_{i=1}^k \hat{\beta}_i X_i + \sum_{j=1}^m \hat{\gamma}_j (X_j \cdot \text{CrisisPeriod}), \]
where \(\hat{\beta}_0\), \(\hat{\beta}_i\), and \(\hat{\gamma}_j\) are the estimated coefficients and \(X_i\) are the home’s features (Gr_Liv_Area, Year_Built, Lot_Area, etc.).

To assess price declines, crisis resistance was redefined to reflect only price reductions during the economic downturn:
\[ \text{Crisis Resistance} = \max\left(0, \text{Predicted Log Sale Price}_{\text{Pre-Crisis}} - \text{Predicted Log Sale Price}_{\text{Post-Crisis}}\right). \]
This measure quantifies each home’s predicted decline in log sale price between the pre-crisis and post-crisis periods: smaller values indicate greater resistance, and homes predicted to maintain or increase their value receive a value of zero. By truncating increases at zero, this approach emphasizes resilience to price reductions, aligning with the study’s focus on mitigating financial risk during downturns.
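The calculation can be sketched in a few lines of Python. The coefficients below are hypothetical placeholders, not the fitted values, and the function names are invented for the example; the sketch follows the prediction formula as written, with main effects plus interaction terms that switch on when CrisisPeriod is 1.

```python
def predict_log_price(features, crisis, intercept, main, interaction):
    """Predicted log price: intercept + main effects + CrisisPeriod interactions."""
    total = intercept
    for name, value in features.items():
        total += main.get(name, 0.0) * value
        total += interaction.get(name, 0.0) * value * crisis
    return total

def crisis_resistance(features, intercept, main, interaction):
    """Predicted pre-crisis minus post-crisis log price, truncated at zero."""
    pre = predict_log_price(features, 0, intercept, main, interaction)
    post = predict_log_price(features, 1, intercept, main, interaction)
    return max(0.0, pre - post)

# Hypothetical coefficients for illustration only; not the estimates from this study.
intercept = 11.0
main = {"Gr_Liv_Area": 0.0003}
declining = {"Gr_Liv_Area": -0.00005}  # feature's effect shrinks after the crisis
rising = {"Gr_Liv_Area": 0.00005}      # feature's effect grows after the crisis

home = {"Gr_Liv_Area": 1500}
decline = crisis_resistance(home, intercept, main, declining)
no_decline = crisis_resistance(home, intercept, main, rising)
```

With these placeholder values the first home has a predicted decline of 0.075 log units, while the second is predicted to gain value and is therefore assigned zero.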
The dataset was split into training (80%) and testing (20%) sets. We used the following performance metrics: Root Mean Squared Error (RMSE) = 0.1961, Mean Absolute Error (MAE) = 0.1402, and R-squared = 0.898. These metrics indicate that the model fits the data well, effectively explaining a significant proportion of the variability in housing prices, with relatively low prediction errors.
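For reference, the three metrics can be computed as in the sketch below. The observed and predicted log prices are made up for the example, and the function names are illustrative rather than taken from the analysis code.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """One minus the ratio of residual to total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up log sale prices and predictions for illustration only.
observed = [12.0, 11.5, 12.3, 11.9]
predicted = [11.9, 11.6, 12.1, 12.0]

test_rmse = rmse(observed, predicted)
test_mae = mae(observed, predicted)
test_r2 = r_squared(observed, predicted)
```

Because the response is the log of the sale price, RMSE and MAE are expressed in log units rather than dollars.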
To assess the model’s robustness and generalizability, we conducted five-fold cross-validation: the dataset was divided into five subsets, and the model was iteratively trained on four subsets and tested on the remaining one. Cross-validation produced an average RMSE of 0.1955, R-squared of 0.7635, and MAE of 0.1381. The RMSE and MAE closely match the test-set values (0.1961 and 0.1402), indicating consistent prediction error across data splits, although the cross-validated R-squared is lower than the test-set value of 0.898. Overall, these results suggest that the model’s prediction error is stable across different subsets of the data.
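The cross-validation procedure can be illustrated with a minimal sketch. It uses synthetic data and a single-predictor least-squares fit in place of the full regression, so the names (`fit_simple_ols`, `k_fold_rmse`), the data-generating values, and the random seeds are all assumptions made for the example.

```python
import math
import random

def fit_simple_ols(xs, ys):
    """Closed-form intercept and slope for a one-predictor least-squares fit."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def k_fold_rmse(xs, ys, k=5, seed=0):
    """Average held-out RMSE over k folds."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint held-out sets
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        b0, b1 = fit_simple_ols([xs[i] for i in train], [ys[i] for i in train])
        squared = [(ys[i] - (b0 + b1 * xs[i])) ** 2 for i in held_out]
        scores.append(math.sqrt(sum(squared) / len(squared)))
    return sum(scores) / k

# Synthetic data for illustration only: log price rises with living area plus noise.
rng = random.Random(42)
areas = [rng.uniform(800, 3000) for _ in range(100)]
log_prices = [11.0 + 0.0003 * a + rng.gauss(0, 0.1) for a in areas]

cv_rmse = k_fold_rmse(areas, log_prices, k=5)
```

Each observation is held out exactly once, so the averaged score reflects performance on data the model did not see during fitting.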
The table below lists the coefficients that were statistically significant (p < 0.05).
| Variable | Estimate | p-value |
|---|---|---|
| Gr_Liv_Area | 0.0003190 | <0.001 |
| Year_Built | 0.0067600 | <0.001 |
| Lot_Area | 0.0000088 | <0.001 |
| NeighborhoodEdwards | -0.1699000 | <0.001 |
| NeighborhoodNorthridge_Heights | 0.1971000 | 0.006 |
| NeighborhoodCrawford | 0.3384000 | <0.001 |
| NeighborhoodIowa_DOT_and_Rail_Road | -0.3297000 | <0.001 |
| NeighborhoodMeadow_Village | -0.4430000 | <0.001 |
| NeighborhoodBriardale | -0.3288000 | 0.005 |
| NeighborhoodVeenker | 0.2337000 | 0.003 |
| NeighborhoodGreen_Hills | 0.6155000 | 0.001 |
| Sale_ConditionAdjLand | 0.2964000 | 0.012 |
| Sale_ConditionNormal | 0.1457000 | <0.001 |
| Sale_ConditionPartial | 0.2380000 | <0.001 |
| CrisisPeriod:Lot_Area | -0.0000058 | 0.043 |
| CrisisPeriod:Year_Built | -0.0027300 | 0.006 |
| CrisisPeriod:NeighborhoodSomerset | 0.1890000 | 0.015 |
| CrisisPeriod:NeighborhoodIowa_DOT_and_Rail_Road | 0.2370000 | 0.026 |
| CrisisPeriod:NeighborhoodTimberland | 0.2470000 | 0.019 |
| CrisisPeriod:NeighborhoodStone_Brook | 0.2390000 | 0.040 |
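Because the response is the log of the sale price, each coefficient can be read as an approximate proportional change in price. The sketch below converts a few of the tabulated estimates into percentage terms; the helper `pct_effect` is a hypothetical name, and the interpretation assumes the other predictors are held fixed and that neighborhood effects are relative to the model's reference neighborhood, which is not stated here.

```python
import math

def pct_effect(coef, delta=1.0):
    """Percent change in price implied by a `delta` change in a predictor of log price."""
    return (math.exp(coef * delta) - 1.0) * 100.0

# Estimates taken from the table above.
crawford_pct = pct_effect(0.3384)                # Crawford vs. the reference neighborhood
area_100_pct = pct_effect(0.0003190, delta=100)  # 100 additional square feet of living area

# Post-crisis slope = main effect + CrisisPeriod interaction.
year_built_pre = 0.0067600
year_built_post = 0.0067600 + (-0.0027300)
```

Under these assumptions, the Crawford coefficient corresponds to prices about 40% above the reference neighborhood, 100 additional square feet of living area corresponds to roughly a 3.2% higher price, and the Year_Built slope is 0.00403 per year in the post-crisis period compared with 0.00676 before it.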
The analysis highlights that a home’s size, age, and location play crucial roles in its ability to maintain value. Larger, newer homes are more likely to hold their value, and location also has a substantial effect.
This graph displays the distribution of house price declines, with most homes showing minimal declines, indicating strong resilience despite economic challenges. The long tail, however, reveals a smaller subset of properties with significant price reductions, pointing to vulnerabilities linked to specific structural or locational factors.
This bar chart shows the median house prices across different
neighborhoods (Neighborhood), with the orange bars indicating
neighborhoods identified as significant in the regression analysis. This
reveals a clear relationship between neighborhood characteristics and
housing price resilience during economic downturns. High-priced
neighborhoods, such as Stone_Brook, Northridge_Heights, and
Green_Hills, demonstrate strong resilience, with significant positive
coefficients indicating stable or increasing demand even during crises.
Conversely, lower-priced neighborhoods like Meadow_Village and
Iowa_DOT_and_Rail_Road appear more vulnerable, with significant
negative coefficients reflecting steeper price declines. The majority of
neighborhoods in the middle of these extremes show no strong correlation
with price resilience, suggesting neutral performance. For buyers and
investors, focusing on high-priced, resilient neighborhoods can mitigate
financial risks, while policymakers could prioritize improvements in
vulnerable areas to enhance their long-term stability and appeal.
When making housing investment decisions during economic downturns, it
is important to consider structural features (Gr_Liv_Area,
Year_Built, Lot_Area), neighborhood stability (Neighborhood), and
market-specific sale conditions (Sale_Condition). Homes with larger
living spaces (Gr_Liv_Area) and modern construction (Year_Built) are
generally more likely to retain value in challenging economic climates.
Additionally, selecting properties in neighborhoods with a history of
stability, supported by strong infrastructure and amenities, can help
mitigate risks. Favorable sale conditions (Sale_Condition) also play a
role in enhancing value stability. Conversely, caution should be
exercised when investing in areas that have shown vulnerability to
economic shifts, unless there are other factors that offset these risks.
The analysis suggests that housing price resilience during an economic
downturn can be predicted based on structural features, neighborhood
characteristics, and market conditions. Larger homes (Gr_Liv_Area),
newer construction (Year_Built), and properties in high-demand
neighborhoods demonstrated greater resilience, while smaller homes and
properties in lower-priced neighborhoods were more vulnerable to price
declines.
To enhance the analysis, data could be collected from multiple geographic regions to account for broader market variations. Including macroeconomic indicators such as interest rates, unemployment rates, and inflation would provide a more comprehensive view of economic conditions. Additionally, incorporating buyer and seller characteristics, such as income, credit scores, and motivations, would help in understanding behavioral influences on price resilience.
More advanced statistical methods, such as generalized additive models (GAMs) or machine learning approaches (e.g., random forests or gradient boosting), could capture non-linear relationships and interactions more effectively than linear regression. Time-series analysis could also help in understanding trends in housing prices over time, while hierarchical or multilevel models might address regional variations in resilience.
The assumptions of linear regression were evaluated through residual diagnostics. The residual plot indicated no major violations of homoscedasticity, and the Q-Q plot suggested that residuals were approximately normally distributed. Additionally, multicollinearity was checked using Variance Inflation Factors (VIFs), which were within acceptable limits, indicating no problematic multicollinearity among the predictors.
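As a simplified illustration of the VIF check, consider the special case of two predictors, where the VIF of each equals 1 / (1 - r^2) with r their Pearson correlation. The sketch below uses invented values and hypothetical function names; with more than two predictors, r^2 would be replaced by the R-squared from regressing each predictor on all the others.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(xs, ys):
    """VIF shared by both predictors when the model has exactly two of them."""
    r = pearson_r(xs, ys)
    return 1.0 / (1.0 - r * r)

# Made-up living areas and lot areas for illustration only; not the Ames data.
living_area = [1000, 1500, 1200, 2000, 1800]
lot_area = [8000, 9000, 7000, 12000, 9500]

vif = vif_two_predictors(living_area, lot_area)
```

A VIF of 1 means the two predictors are uncorrelated, and larger values indicate that more of one predictor's variance is explained by the other.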
Additional evidence could include formal statistical tests, such as the Breusch-Pagan test for homoscedasticity or the Shapiro-Wilk test for normality. Collecting a larger dataset from diverse regions would help confirm the robustness of model assumptions across different contexts. Conducting sensitivity analyses by excluding or transforming potential outliers could further verify the reliability of the model’s results.
This analysis has several limitations that should be considered when interpreting the results. First, the data come from Ames, Iowa, so the findings may not generalize to broader housing markets or regions with different economic, demographic, and infrastructural contexts. The absence of buyer and seller motivations, such as financial constraints or investment goals, limits our understanding of the behavioral factors influencing price resilience. Additionally, the dataset lacks macroeconomic variables, such as employment rates or lending conditions, which could provide a more comprehensive picture of market dynamics during economic downturns.
Future studies could address these limitations by incorporating data from multiple regions and including variables that capture macroeconomic conditions and buyer demographics. This would allow for a more holistic understanding of housing market resilience. Moreover, exploring non-linear relationships and using advanced machine learning models could capture more complex interactions between features. Lastly, analyzing post-crisis recovery trends could provide additional insights into long-term price resilience and factors contributing to market stabilization. These enhancements would make the analysis more robust and its findings more actionable for a wider audience.