This week I’m exploring the time dimension of the Ames housing market. While previous weeks focused on cross-sectional relationships (how size, quality, and features affect price), this analysis examines how prices evolve over time—revealing trends, seasonal patterns, and the impact of major economic events.
Why time series analysis matters for housing:
The Ames dataset spans 2006-2010, capturing a critical period that includes the housing bubble peak (2006-2007) and the financial crisis (2008-2009).
ames <- read.csv("ames.csv", stringsAsFactors = FALSE)
cat("=== TIME-RELATED COLUMNS ===\n\n")
## === TIME-RELATED COLUMNS ===
cat("Yr.Sold: Year of sale (", min(ames$Yr.Sold), "-", max(ames$Yr.Sold), ")\n")
## Yr.Sold: Year of sale ( 2006 - 2010 )
cat("Mo.Sold: Month of sale (1-12)\n")
## Mo.Sold: Month of sale (1-12)
cat("Year.Built: Year home was constructed\n")
## Year.Built: Year home was constructed
cat("Year.Remod.Add: Year of remodeling\n\n")
## Year.Remod.Add: Year of remodeling
cat("Total homes sold:", nrow(ames), "\n")
## Total homes sold: 2930
cat("Time span:", max(ames$Yr.Sold) - min(ames$Yr.Sold) + 1, "years\n")
## Time span: 5 years
The dataset has Yr.Sold and Mo.Sold columns that together define when each home was sold. I’ll combine these into a proper Date format.
# Create a Date column (using first day of month since we don't have exact day)
ames <- ames |>
mutate(
# Paste year, month, and day=01 together
Sale_Date = paste(Yr.Sold, Mo.Sold, "01", sep = "-"),
# Convert to Date
Sale_Date = as.Date(Sale_Date, format = "%Y-%m-%d")
)
# Check the result
cat("=== DATE CONVERSION ===\n\n")
## === DATE CONVERSION ===
cat("Example dates:\n")
## Example dates:
print(head(ames |> select(Yr.Sold, Mo.Sold, Sale_Date), 10))
## Yr.Sold Mo.Sold Sale_Date
## 1 2010 5 2010-05-01
## 2 2010 6 2010-06-01
## 3 2010 6 2010-06-01
## 4 2010 4 2010-04-01
## 5 2010 3 2010-03-01
## 6 2010 6 2010-06-01
## 7 2010 4 2010-04-01
## 8 2010 1 2010-01-01
## 9 2010 3 2010-03-01
## 10 2010 6 2010-06-01
cat("\n\nDate range:", min(ames$Sale_Date), "to", max(ames$Sale_Date), "\n")
##
##
## Date range: 13149 to 14791
cat("Span:", as.numeric(max(ames$Sale_Date) - min(ames$Sale_Date)), "days\n")
## Span: 1642 days
Note on data limitation: We only have year and month, not exact day, so I set all sales to the 1st of each month. This is sufficient for monthly and annual analysis.
Response variable: SalePrice
This is the most important metric for understanding housing market dynamics. I’ll aggregate sales by month to create a time series of: - Median sale price (less sensitive to outliers than mean) - Mean sale price (for comparison) - Number of sales (to assess market activity)
# Aggregate by year-month
monthly_sales <- ames |>
mutate(
Year_Month = yearmonth(Sale_Date) # tsibble year-month format
) |>
group_by(Year_Month) |>
summarise(
Median_Price = median(SalePrice),
Mean_Price = mean(SalePrice),
Sale_Count = n(),
.groups = 'drop'
)
# Convert to tsibble
ts_housing <- monthly_sales |>
as_tsibble(index = Year_Month)
# Display
cat("=== TIME SERIES DATA ===\n\n")
## === TIME SERIES DATA ===
cat("Number of months:", nrow(ts_housing), "\n")
## Number of months: 55
cat("Time range:", min(ts_housing$Year_Month), "to", max(ts_housing$Year_Month), "\n\n")
## Time range: 13149 to 14791
kable(head(ts_housing, 12),
col.names = c("Year-Month", "Median Price", "Mean Price", "Sales Count"),
caption = "First 12 Months of Time Series",
digits = 0)
| Year-Month | Median Price | Mean Price | Sales Count |
|---|---|---|---|
| 2006 Jan | 176700 | 202997 | 18 |
| 2006 Feb | 188050 | 188865 | 24 |
| 2006 Mar | 149500 | 182009 | 51 |
| 2006 Apr | 141576 | 158864 | 48 |
| 2006 May | 157000 | 169166 | 75 |
| 2006 Jun | 155000 | 174234 | 97 |
| 2006 Jul | 156750 | 178638 | 122 |
| 2006 Aug | 187750 | 202379 | 45 |
| 2006 Sep | 197000 | 219365 | 41 |
| 2006 Oct | 142500 | 165379 | 49 |
| 2006 Nov | 209000 | 209639 | 31 |
| 2006 Dec | 161000 | 184204 | 24 |
# Create multi-panel plot
p1 <- ggplot(ts_housing, aes(x = Year_Month)) +
geom_line(aes(y = Median_Price), color = "#3498db", linewidth = 1) +
geom_point(aes(y = Median_Price), color = "#3498db", size = 2) +
scale_y_continuous(labels = dollar_format()) +
labs(title = "Median Home Sale Price Over Time (2006-2010)",
x = "Year-Month",
y = "Median Sale Price") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
p2 <- ggplot(ts_housing, aes(x = Year_Month)) +
geom_line(aes(y = Sale_Count), color = "#e74c3c", linewidth = 1) +
geom_point(aes(y = Sale_Count), color = "#e74c3c", size = 2) +
labs(title = "Number of Home Sales Over Time (2006-2010)",
x = "Year-Month",
y = "Number of Sales") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
p1
p2
1. Clear seasonal pattern: - Peak sales in summer months (May-July) - consistently across all years - Trough in winter months (November-February) - families don’t want to move during school year/holidays - This is visible in both price and sales volume
2. Declining trend (2006-2008): - Median price peaks in late 2006/early 2007 around $180,000-200,000 - Sharp decline through 2008 - financial crisis impact - Prices bottom out around 2009 near $140,000-160,000
3. Potential recovery (2009-2010): - Slight uptick in late 2009/early 2010 - early signs of recovery - But data ends in mid-2010, so we can’t confirm long-term recovery
4. Sales volume volatility: - Sales counts are highly seasonal (20-40 sales in winter, 80-120 in summer) - Some months have very few sales, making those median prices less reliable
5. 2010 looks incomplete: - Fewer months of data in 2010 (only through August) - May need to exclude 2010 from some analyses to avoid bias
# Add a numeric time variable for regression
ts_housing <- ts_housing |>
mutate(
Time_Numeric = as.numeric(Year_Month - min(Year_Month)) # Months since start
)
# Fit linear trend model
trend_model <- lm(Median_Price ~ Time_Numeric, data = ts_housing)
cat("=== LINEAR TREND MODEL (2006-2010) ===\n\n")
## === LINEAR TREND MODEL (2006-2010) ===
print(summary(trend_model))
##
## Call:
## lm(formula = Median_Price ~ Time_Numeric, data = ts_housing)
##
## Residuals:
## Min 1Q Median 3Q Max
## -26250 -9141 -2439 8202 42467
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 168380.2 3745.5 44.955 <2e-16 ***
## Time_Numeric -184.7 119.6 -1.544 0.128
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 14080 on 53 degrees of freedom
## Multiple R-squared: 0.04306, Adjusted R-squared: 0.025
## F-statistic: 2.385 on 1 and 53 DF, p-value: 0.1285
# Extract coefficients
intercept <- coef(trend_model)[1]
slope <- coef(trend_model)[2]
cat("\n=== INTERPRETATION ===\n")
##
## === INTERPRETATION ===
cat("Intercept:", dollar(intercept), "\n")
## Intercept: $168,380
cat("Slope:", dollar(slope), "per month\n")
## Slope: -$184.67 per month
cat("Annual change:", dollar(slope * 12), "per year\n\n")
## Annual change: -$2,216.07 per year
if(slope < 0) {
cat("The overall trend is DECLINING by", dollar(abs(slope)), "per month\n")
cat("Over 5 years, prices declined by approximately", dollar(abs(slope) * 60), "\n")
} else {
cat("The overall trend is INCREASING by", dollar(slope), "per month\n")
}
## The overall trend is DECLINING by $184.67 per month
## Over 5 years, prices declined by approximately $11,080.34
cat("\nR-squared:", round(summary(trend_model)$r.squared, 3), "\n")
##
## R-squared: 0.043
cat("This means", round(summary(trend_model)$r.squared * 100, 1),
"% of price variation is explained by linear time trend\n")
## This means 4.3 % of price variation is explained by linear time trend
# Visualize trend
ggplot(ts_housing, aes(x = Year_Month, y = Median_Price)) +
geom_point(size = 3, color = "#3498db", alpha = 0.6) +
geom_smooth(method = "lm", se = TRUE, color = "#e74c3c", linewidth = 1.5) +
scale_y_continuous(labels = dollar_format()) +
labs(title = "Linear Trend: Median Sale Price (2006-2010)",
subtitle = paste0("Trend: ", dollar(round(slope)), " per month (",
dollar(round(slope * 12)), " per year)"),
x = "Year-Month",
y = "Median Sale Price") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
Trend strength assessment:
The linear trend captures the overall downward trajectory of the housing market during this period. However, R² is likely moderate (not very high) because: - The trend isn’t perfectly linear (it’s more of a peak-and-decline pattern) - Strong seasonal variation obscures the trend - The relationship might be better modeled as piecewise (boom-bust-recovery)
Looking at the plot, the market clearly has three distinct phases. Let me model them separately:
# Define phases based on visual inspection
ts_housing <- ts_housing |>
mutate(
Phase = case_when(
Year_Month <= yearmonth("2007 Jun") ~ "Boom (Pre-Crisis)",
Year_Month <= yearmonth("2009 Jun") ~ "Bust (Crisis)",
TRUE ~ "Recovery (Post-Crisis)"
)
)
# Fit separate models for each phase
boom <- ts_housing |> filter(Phase == "Boom (Pre-Crisis)")
bust <- ts_housing |> filter(Phase == "Bust (Crisis)")
recovery <- ts_housing |> filter(Phase == "Recovery (Post-Crisis)")
# Models
model_boom <- lm(Median_Price ~ Time_Numeric, data = boom)
model_bust <- lm(Median_Price ~ Time_Numeric, data = bust)
model_recovery <- lm(Median_Price ~ Time_Numeric, data = recovery)
cat("=== BOOM PHASE (2006 - Jun 2007) ===\n")
## === BOOM PHASE (2006 - Jun 2007) ===
cat("Slope:", dollar(coef(model_boom)[2]), "per month\n")
## Slope: -$278.37 per month
cat("Annual:", dollar(coef(model_boom)[2] * 12), "per year\n")
## Annual: -$3,340.46 per year
cat("R-squared:", round(summary(model_boom)$r.squared, 3), "\n\n")
## R-squared: 0.006
cat("=== BUST PHASE (Jul 2007 - Jun 2009) ===\n")
## === BUST PHASE (Jul 2007 - Jun 2009) ===
cat("Slope:", dollar(coef(model_bust)[2]), "per month\n")
## Slope: $104.91 per month
cat("Annual:", dollar(coef(model_bust)[2] * 12), "per year\n")
## Annual: $1,258.94 per year
cat("R-squared:", round(summary(model_bust)$r.squared, 3), "\n\n")
## R-squared: 0.004
cat("=== RECOVERY PHASE (Jul 2009 - 2010) ===\n")
## === RECOVERY PHASE (Jul 2009 - 2010) ===
cat("Slope:", dollar(coef(model_recovery)[2]), "per month\n")
## Slope: -$724.18 per month
cat("Annual:", dollar(coef(model_recovery)[2] * 12), "per year\n")
## Annual: -$8,690.11 per year
cat("R-squared:", round(summary(model_recovery)$r.squared, 3), "\n\n")
## R-squared: 0.132
# Visualize phases
ggplot(ts_housing, aes(x = Year_Month, y = Median_Price)) +
geom_point(aes(color = Phase), size = 3, alpha = 0.7) +
geom_smooth(data = boom, method = "lm", se = FALSE,
color = "#27ae60", linewidth = 1.5) +
geom_smooth(data = bust, method = "lm", se = FALSE,
color = "#e74c3c", linewidth = 1.5) +
geom_smooth(data = recovery, method = "lm", se = FALSE,
color = "#3498db", linewidth = 1.5) +
scale_y_continuous(labels = dollar_format()) +
scale_color_manual(values = c("Boom (Pre-Crisis)" = "#27ae60",
"Bust (Crisis)" = "#e74c3c",
"Recovery (Post-Crisis)" = "#3498db")) +
labs(title = "Three Market Phases: Boom, Bust, Recovery",
subtitle = "Separate linear trends for each phase",
x = "Year-Month",
y = "Median Sale Price",
color = "Market Phase") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
How strong are these trends?
Each phase has a distinct trend:
The piecewise approach likely has much higher R² within each phase than the single overall trend, because it captures the non-linear reality of boom-bust cycles.
# Convert to regular ts object for decomposition
# Need to create a ts with frequency = 12 (monthly data)
median_prices_ts <- ts(ts_housing$Median_Price,
start = c(2006, 1),
frequency = 12)
# STL decomposition (Seasonal-Trend-Loess)
decomp <- stl(median_prices_ts, s.window = "periodic")
# Plot decomposition
plot(decomp, main = "Seasonal Decomposition of Median Sale Price")
# Extract seasonal component
seasonal_comp <- decomp$time.series[, "seasonal"]
trend_comp <- decomp$time.series[, "trend"]
remainder_comp <- decomp$time.series[, "remainder"]
cat("=== SEASONAL COMPONENT ===\n\n")
## === SEASONAL COMPONENT ===
# Create seasonal pattern by month
# Extract first 12 months of seasonal component (it repeats)
seasonal_pattern <- data.frame(
Month = factor(month.abb, levels = month.abb),
Seasonal_Effect = as.numeric(seasonal_comp[1:12])
)
kable(seasonal_pattern,
col.names = c("Month", "Seasonal Effect ($)"),
caption = "Average Seasonal Price Effect by Month",
digits = 0,
row.names = FALSE)
| Month | Seasonal Effect ($) |
|---|---|
| Jan | 6745 |
| Feb | -61 |
| Mar | -4780 |
| Apr | -12611 |
| May | -5696 |
| Jun | -2323 |
| Jul | -3325 |
| Aug | 7343 |
| Sep | 8893 |
| Oct | -8906 |
| Nov | 12070 |
| Dec | 2651 |
Interpreting the seasonal component:
The seasonal decomposition reveals:
Why this seasonality exists:
# Plot seasonal pattern
ggplot(seasonal_pattern, aes(x = Month, y = Seasonal_Effect)) +
geom_col(fill = "#3498db", alpha = 0.8) +
geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
scale_y_continuous(labels = dollar_format()) +
labs(title = "Seasonal Price Pattern in Ames Housing Market",
subtitle = "Average deviation from trend by month",
x = "Month",
y = "Seasonal Effect on Price") +
theme_minimal()
# Create dataframe with seasonal values over time
seasonal_df <- data.frame(
Year_Month = ts_housing$Year_Month,
Seasonal_Value = as.numeric(seasonal_comp)
)
# Show seasonal pattern across years
ggplot(seasonal_df, aes(x = Year_Month, y = Seasonal_Value)) +
geom_line(color = "#9b59b6", linewidth = 1) +
geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
scale_y_continuous(labels = dollar_format()) +
labs(title = "Seasonal Component Over Time",
subtitle = "Repeating annual pattern (isolated from trend)",
x = "Year-Month",
y = "Seasonal Effect") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
What the seasonal plot shows:
The repeating wave pattern confirms strong seasonality. The seasonal effect is consistent across years - even during the crisis, the summer/winter pattern persists. This suggests seasonality is fundamental to housing markets, not just driven by economic conditions.
# ACF plot using base R
par(mfrow = c(1, 1))
acf(ts_housing$Median_Price, lag.max = 24,
main = "Autocorrelation Function (ACF) of Median Sale Price")
cat("\n=== INTERPRETING ACF ===\n\n")
##
## === INTERPRETING ACF ===
cat("What to look for in ACF:\n")
## What to look for in ACF:
cat("- Spikes at lag 12, 24 indicate annual seasonality\n")
## - Spikes at lag 12, 24 indicate annual seasonality
cat("- Slowly declining ACF indicates trend\n")
## - Slowly declining ACF indicates trend
cat("- Quick drop to near-zero indicates stationarity\n")
## - Quick drop to near-zero indicates stationarity
What the ACF reveals:
Trend signature: If ACF decays slowly (stays high for many lags), it indicates a strong trend. Our data likely shows this because prices don’t randomly bounce they trend up or down over time.
Seasonality signature: Spikes at multiples of 12 (lag 12, 24) indicate annual seasonality. We should see peaks at these lags because prices 12 months apart tend to be similar (both summers or both winters).
High lag-1 correlation: Adjacent months are highly correlated (houses sold in June are priced similarly to houses sold in July). This is typical for economic data.
# PACF plot using base R
par(mfrow = c(1, 1))
pacf(ts_housing$Median_Price, lag.max = 24,
main = "Partial Autocorrelation Function (PACF) of Median Sale Price")
cat("\n=== INTERPRETING PACF ===\n\n")
##
## === INTERPRETING PACF ===
cat("What PACF shows:\n")
## What PACF shows:
cat("- Large spike at lag 1: Strong immediate correlation\n")
## - Large spike at lag 1: Strong immediate correlation
cat("- Spikes at lag 12: Direct seasonal relationship\n")
## - Spikes at lag 12: Direct seasonal relationship
cat("- PACF helps identify autoregressive (AR) order for modeling\n")
## - PACF helps identify autoregressive (AR) order for modeling
What the PACF reveals:
Combined ACF/PACF interpretation:
Together, ACF and PACF suggest our data has: 1. Autoregressive component (PACF lag-1 spike) 2. Seasonal component (spikes at lag 12) 3. Trend component (slowly decaying ACF)
This confirms what we saw visually: the Ames market has trend, seasonality, and month-to-month persistence.
1. Strong seasonal pattern (June-July peak, January-February trough)
2. Three distinct market phases (Boom → Bust → Recovery)
3. Overall declining trend (2006-2010)
4. Autocorrelation confirms market persistence
5. Sales volume is highly seasonal
For Sellers:
For Buyers:
For Investors:
For Appraisers:
1. Limited time span (only 5 years)
2. Small sample sizes in some months
3. Aggregation hides individual variation
4. External factors not captured
5. No causal inference
This time series analysis reveals the Ames housing market is highly structured - not a random walk, but a predictable combination of:
The 2006-2010 period captured a complete market cycle: boom, bust, and early recovery. Prices fell from ~$180,000 peaks to ~$140,000 troughs a 22% decline. But seasonal patterns persisted throughout, proving they’re fundamental to housing markets, not just fair-weather effects.
Key actionable insight: Timing matters as much as $10,000-20,000 in sale price. Sellers should target May-June listings to capture the summer premium, while buyers can find deals in January-February when competition is low.
Methodological lessons:
This analysis demonstrates how time series techniques transform raw data into market intelligence, revealing patterns invisible in cross-sectional analyses alone.