Quarterly median home price data were retrieved from FRED for 1963Q1 through 2021Q4. The original series is quoted in nominal dollars, so the median prices were adjusted for inflation using the CPI for urban consumers and expressed in 1/1/2021 USD.
A plot of the inflation-adjusted data is below. The data shows an upward trend in the real median price of homes in the US; there does not appear to be any clear seasonality.
To model the data, quarter-over-quarter percent changes in the real price of homes were calculated across the time series. A plot of this data can be seen to the right of the price data.
The percent change data clearly shows time-varying volatility in the real price of homes, which suggests that a GARCH model may be a good fit. The data was split into a training dataset (all but the last four quarters) and a test dataset (the last four quarters), with the intention of fitting the model on the training dataset and comparing its forecasts to the test dataset.
The results of the GARCH model can be seen below. Two of the variance coefficients, omega and alpha1, are statistically insignificant, and beta1 is estimated at 0.999 with alpha1 at essentially zero, so the fitted conditional variance barely responds to past shocks. The Nyblom joint statistic (28.55) also exceeds its 1% critical value (1.88), which points to parameter instability. It is therefore unclear whether the model produced is the correct fit for the data; more will need to be done to assess this.
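As a rough illustration of the inflation adjustment, the sketch below rescales each nominal price by the ratio of the reference-quarter CPI to that quarter's CPI. The prices and CPI levels are made-up values chosen for easy arithmetic, not FRED data; the column names simply mirror those used in the report's code.

```r
# Hypothetical nominal prices and CPI levels (illustrative values, not FRED data)
prices <- data.frame(
  DATE      = c("2019-01-01", "2020-01-01", "2021-01-01"),
  MED_PRICE = c(300000, 320000, 350000),
  US_CPI    = c(250, 256, 262)
)

# CPI level in the reference quarter (1/1/2021)
cpi_ref <- prices$US_CPI[prices$DATE == "2021-01-01"]

# Real price = nominal price * (reference CPI / CPI in that quarter)
prices$MED_PRICE_ADJ <- prices$MED_PRICE * cpi_ref / prices$US_CPI

print(prices)
```

By construction the reference quarter is left unchanged, and earlier quarters are scaled up whenever their CPI level is below the reference level.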
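The percent change calculation can be sketched in base R as follows. The real prices here are hypothetical values in $000's, picked so the results are easy to check by hand.

```r
# Hypothetical real median prices in $000's (illustrative values only)
real_price <- c(200, 210, 205.8, 216.09)

# Change from the prior quarter: P_t / P_(t-1) - 1
# The first quarter has no prior value, so the result is one element shorter
pct_change <- real_price[-1] / head(real_price, -1) - 1

print(round(pct_change, 4))
```

The result is a fraction (0.05 means a 5% increase), which is worth keeping in mind when labeling axes or reading coefficient magnitudes.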
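One way to back up the visual impression of varying volatility is a Ljung-Box test on the squared, demeaned changes. This check is an assumption on my part and was not part of the original analysis. The sketch below runs it on a simulated ARCH(1) series (hypothetical parameters, not the home price data) and then holds out the last four observations as a test set.

```r
set.seed(1)

# Simulate a hypothetical ARCH(1) series so that volatility clusters
n <- 200
omega <- 0.0002
alpha <- 0.5
x <- numeric(n)
sigma2 <- numeric(n)
sigma2[1] <- omega / (1 - alpha)           # start at the unconditional variance
x[1] <- sqrt(sigma2[1]) * rnorm(1)
for (i in 2:n) {
  sigma2[i] <- omega + alpha * x[i - 1]^2  # variance reacts to the last shock
  x[i] <- sqrt(sigma2[i]) * rnorm(1)
}

# Ljung-Box test on squared demeaned values; a small p-value is evidence
# of autocorrelation in the squares, i.e. time-varying volatility
arch_check <- Box.test((x - mean(x))^2, lag = 4, type = "Ljung-Box")
print(arch_check)

# Hold out the last h observations for forecast evaluation
h <- 4
train <- head(x, -h)
test <- tail(x, h)
```

Splitting at the end of the series, rather than at random, keeps the time ordering intact so the test set is a true out-of-sample period.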
##
## *---------------------------------*
## * GARCH Model Fit *
## *---------------------------------*
##
## Conditional Variance Dynamics
## -----------------------------------
## GARCH Model : sGARCH(1,1)
## Mean Model : ARFIMA(1,0,0)
## Distribution : norm
##
## Optimal Parameters
## ------------------------------------
## Estimate Std. Error t value Pr(>|t|)
## mu 0.004106 0.001614 2.544308 0.010949
## ar1 -0.172278 0.065292 -2.638585 0.008325
## omega 0.000001 0.000003 0.325834 0.744550
## alpha1 0.000000 0.001230 0.000008 0.999994
## beta1 0.999000 0.001400 713.522265 0.000000
##
## Robust Standard Errors:
## Estimate Std. Error t value Pr(>|t|)
## mu 0.004106 0.001911 2.148338 0.031687
## ar1 -0.172278 0.074751 -2.304710 0.021183
## omega 0.000001 0.000017 0.054722 0.956360
## alpha1 0.000000 0.010078 0.000001 0.999999
## beta1 0.999000 0.009461 105.589632 0.000000
##
## LogLikelihood : 492.1471
##
## Information Criteria
## ------------------------------------
##
## Akaike -4.2177
## Bayes -4.1432
## Shibata -4.2186
## Hannan-Quinn -4.1877
##
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 0.01273 0.91015
## Lag[2*(p+q)+(p+q)-1][2] 0.57733 0.94304
## Lag[4*(p+q)+(p+q)-1][5] 6.48788 0.03047
## d.o.f=1
## H0 : No serial correlation
##
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 0.2833 0.594541
## Lag[2*(p+q)+(p+q)-1][5] 10.4887 0.006892
## Lag[4*(p+q)+(p+q)-1][9] 12.7293 0.012340
## d.o.f=2
##
## Weighted ARCH LM Tests
## ------------------------------------
## Statistic Shape Scale P-Value
## ARCH Lag[3] 2.892 0.500 2.000 0.08902
## ARCH Lag[5] 3.467 1.440 1.667 0.22886
## ARCH Lag[7] 3.996 2.315 1.543 0.34715
##
## Nyblom stability test
## ------------------------------------
## Joint Statistic: 28.5468
## Individual Statistics:
## mu 0.04601
## ar1 0.62515
## omega 2.70713
## alpha1 0.12562
## beta1 0.11256
##
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic: 1.28 1.47 1.88
## Individual Statistic: 0.35 0.47 0.75
##
## Sign Bias Test
## ------------------------------------
## t-value prob sig
## Sign Bias 0.4687 0.6398
## Negative Sign Bias 1.1480 0.2522
## Positive Sign Bias 0.3147 0.7533
## Joint Effect 3.5154 0.3188
##
##
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
## group statistic p-value(g-1)
## 1 20 17.57 0.5512
## 2 30 34.06 0.2369
## 3 40 37.92 0.5191
## 4 50 58.83 0.1588
##
##
## Elapsed time : 0.03925705
##
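To see why an alpha1 estimate of zero is a concern, the sketch below writes out the standard GARCH(1,1) variance recursion, sigma2[t] = omega + alpha1 * eps[t-1]^2 + beta1 * sigma2[t-1], in base R and plugs in the rounded estimates from the output above. The residual series and the starting variance are hypothetical, and the helper `garch11_variance` is an illustrative name, not a rugarch function.

```r
# Conditional variance path for a GARCH(1,1) given residuals eps
garch11_variance <- function(eps, omega, alpha1, beta1, sigma2_init) {
  n <- length(eps)
  sigma2 <- numeric(n)
  sigma2[1] <- sigma2_init
  for (i in seq_len(n)[-1]) {
    sigma2[i] <- omega + alpha1 * eps[i - 1]^2 + beta1 * sigma2[i - 1]
  }
  sigma2
}

# Rounded estimates as printed in the model output
omega <- 0.000001
alpha1 <- 0
beta1 <- 0.999

# Two hypothetical residual paths: one calm, one with a large shock in period 2
eps_calm <- rep(0.01, 8)
eps_shock <- c(0.01, 0.10, rep(0.01, 6))

s_calm <- garch11_variance(eps_calm, omega, alpha1, beta1, sigma2_init = 0.001)
s_shock <- garch11_variance(eps_shock, omega, alpha1, beta1, sigma2_init = 0.001)

# With alpha1 = 0 the shock never enters the recursion, so both paths coincide
print(identical(s_calm, s_shock))

# Persistence of the variance process
print(alpha1 + beta1)
```

In other words, with these estimates the model's variance is a slowly decaying function of its starting value only, which is hard to reconcile with the volatility clustering that motivated GARCH in the first place.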
Finally, a 4-quarter forecast was calculated from the GARCH model. A plot of the forecasts (red) versus the actual values (black) is below and suggests irregular performance. Once again, more work needs to be done to understand how best to approach and assess the performance of the forecasts. In terms of any comparison to prior models, it should be noted that past models attempted to forecast the real price of homes rather than the change in price; it is unclear from this work which approach has resulted in better performance. However, given the volatility in home price data, it is likely that a GARCH approach of some kind is most appropriate.
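One possible starting point for assessing forecast performance is to compute simple error metrics and compare them with a naive benchmark. The sketch below uses RMSE and MAE with a zero-change benchmark; both the metric choice and the benchmark are assumptions, and the numbers are hypothetical rather than the report's actual forecasts.

```r
# Hypothetical actual and forecast price changes over 4 quarters
actual <- c(0.020, -0.010, 0.030, 0.015)
fc <- c(0.005, 0.004, 0.004, 0.004)

# Naive benchmark: forecast no change in the real price each quarter
naive <- rep(0, length(actual))

# Root mean squared error and mean absolute error
rmse <- function(a, f) sqrt(mean((a - f)^2))
mae <- function(a, f) mean(abs(a - f))

accuracy_tbl <- data.frame(
  Model = c("GARCH forecast", "Naive (zero change)"),
  RMSE  = c(rmse(actual, fc), rmse(actual, naive)),
  MAE   = c(mae(actual, fc), mae(actual, naive))
)
print(accuracy_tbl)
```

With only four test observations any such comparison is noisy, so a rolling-origin evaluation over many forecast windows would likely give a more reliable picture.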
knitr::opts_chunk$set(echo = TRUE)
library("feasts")
library("seasonal")
library("tsibble")
library("tsibbledata")
library("dplyr")
library("ggplot2")
library("forecast")
library("fable")
library("fpp3")
library("sqldf")
library("quantmod")
library("xts")
library("PerformanceAnalytics")
library("rugarch")
local({
  hook_source <- knitr::knit_hooks$get('source')
  knitr::knit_hooks$set(source = function(x, options) {
    x <- x[!grepl('# SECRET!!$', x)]
    hook_source(x, options)
  })
})
#csv path defined here; excluded from publication
#Pull in Median US House Price data
UShouse_csv <- read.csv(paste0(csv_path,
"MedianSalesHouseUS.csv"))
UShouse_df <- data.frame(UShouse_csv)
#Pull in US CPI data (urban consumers)
USCPI_csv <- read.csv(paste0(csv_path,
"CPIUrbanUS.csv"))
USCPI_df <- data.frame(USCPI_csv)
#Join Price and CPI data
UShouse_CPI1 <- sqldf('
  SELECT H.DATE
       , H.MSPUS AS MED_PRICE
       , C.CPIAUCSL AS US_CPI
  FROM UShouse_df AS H
  LEFT JOIN USCPI_df AS C
    ON C.DATE = H.DATE
')
#Create CPI reference and adjust home prices to 1/1/2021 USD
UShouse_CPI2 <-
  UShouse_CPI1 %>%
  mutate(CPI_REF = US_CPI[DATE == "2021-01-01"],  #CPI level in the reference quarter
         CPI_FAC = CPI_REF/US_CPI,                #scaling factor to 1/1/2021 dollars
         MED_PRICE_ADJ = MED_PRICE*CPI_FAC)       #real (inflation-adjusted) price
#Create tsibble and mutate median price to be in $000's
USdata <-
  UShouse_CPI2 %>%
  mutate(MED_PRICE_ADJ = MED_PRICE_ADJ/1000,
         DATE = yearquarter(as.Date(DATE))) %>%
  as_tsibble(index = DATE)
#Convert prices to quarter-over-quarter changes & exclude the first row
#(it has no prior quarter, so its change is NA)
USdata_return <-
  USdata %>%
  mutate(MED_PRICE_ADJ1L = dplyr::lag(MED_PRICE_ADJ, 1L),  #dplyr::lag made explicit to avoid masking
         MED_PRICE_ADJ_CHG = MED_PRICE_ADJ/MED_PRICE_ADJ1L - 1) %>%
  select(DATE,
         MED_PRICE_ADJ_CHG)
USdata_return <- USdata_return[-1,]
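#Plot of adjusted median home prices
USdata %>%
  autoplot(MED_PRICE_ADJ) +
  labs(x = "Quarter",
       y = "Median Home Price ($000's)",
       title = "Median US Home Price",
       subtitle = "1/1/2021 Dollars")
#Plot of quarterly changes in the adjusted price
#(values are fractions, so the axis is formatted as percent)
USdata_return %>%
  autoplot(MED_PRICE_ADJ_CHG) +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Quarter",
       y = "Percent Change",
       title = "Quarterly Change in Real Median US Home Price",
       subtitle = "1/1/2021 Dollars")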
#GARCH modeling
#Hold out the last 4 quarters as the test set
test_end <- nrow(USdata_return)
train_end <- test_end - 4
test_start <- train_end + 1
USdata_train <- USdata_return[1:train_end,]
USdata_test <- USdata_return[test_start:test_end,]
USdata_return_ts <- as.ts(USdata_return)
USdata_train_ts <- as.ts(USdata_train)
#AR(1) mean model; variance model and distribution are left at the
#ugarchspec defaults (sGARCH(1,1), normal), as shown in the fit output
g1_USdata_spec <- ugarchspec(mean.model=list(armaOrder = c(1,0)))
g1_USdata <- ugarchfit(g1_USdata_spec, data = USdata_train_ts)
print(g1_USdata)
plot(g1_USdata, which = 'all')
#4-quarter-ahead forecast of the price change
g1_USdata_fc4 <- ugarchforecast(g1_USdata, n.ahead = 4)
g1_USdata_fc <- data.frame(Index = 1:4,
                           PriceChange = as.numeric(g1_USdata_fc4@forecast$seriesFor))
#Actual price changes over the same 4 quarters
USdata_test <- data.frame(USdata_test)
USdata_test_use <- data.frame(Index = 1:4,
                              PriceChange = USdata_test$MED_PRICE_ADJ_CHG)
#Forecast (red, dotted) vs. actual (black)
ggplot(g1_USdata_fc,
       aes(x = Index,
           y = PriceChange)) +
  geom_line(col = "red",
            lty = "dotted") +
  geom_line(data = USdata_test_use,
            aes(x = Index,
                y = PriceChange)) +
  labs(x = "Forecast Period",
       y = "Price Change",
       title = "Percent Change Real Median US Home Price",
       subtitle = "4-quarter Forecast Comparison")