Quarterly median home price data (FRED series MSPUS) was retrieved from FRED for 1963Q1 through 2021Q4. The original series is quoted in nominal dollars, so the median prices were adjusted for inflation using the CPI for urban consumers (FRED series CPIAUCSL) and restated in 1/1/2021 USD.
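
In equation form, the adjustment carried out in the appendix code can be written as follows. The symbols \(P_t\) (nominal median price in quarter \(t\)) and \(\text{CPI}_t\) (CPI in quarter \(t\)) are notation introduced here for illustration; the factor \(\text{CPI}_{2021\text{-}01\text{-}01}/\text{CPI}_t\) corresponds to CPI_FAC in the code.

\[
P^{\text{adj}}_t = P_t \times \frac{\text{CPI}_{2021\text{-}01\text{-}01}}{\text{CPI}_t}
\]

As a purely hypothetical example, a nominal price of $100,000 in a quarter whose CPI was half of the 1/1/2021 level would be restated as $200,000.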

A plot of the price-adjusted data is below. The data shows an upward trend in the inflation-adjusted median home price in the US; there is no clear seasonality.

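The data was split into training (230 observations, 1963Q1 through 2020Q2) and testing (6 observations, 2020Q3 through 2021Q4) datasets. An ETS model and an ARIMA model were fit to the training dataset; the residual plots in the appendix label them as ETS(M,Ad,A) and ARIMA(3,1,0)(1,0,0).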

Results of the models’ forecasts across the test dataset are shown below. Each model’s point forecasts stay close to a roughly constant level over the six-quarter horizon. The ETS forecasts display slight seasonality, while the ARIMA forecasts converge to a fixed value; this is expected because the ARIMA model has no constant term and only a first-order difference is taken, so the forecast quarter-to-quarter changes shrink toward zero.
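
The convergence of the ARIMA forecasts can be sketched as follows, assuming the fitted model is the ARIMA(3,1,0)(1,0,0) with quarterly seasonal period 4 named in the appendix residual plots and that it has no constant. The coefficient symbols \(\phi_i\) and \(\Phi_1\) and the differenced series \(w_t\) are notation introduced here, not output from the fitted model.

\[
(1 - \phi_1 B - \phi_2 B^2 - \phi_3 B^3)(1 - \Phi_1 B^4)\, w_t = \varepsilon_t,
\qquad w_t = y_t - y_{t-1}
\]

Because \(w_t\) is a stationary autoregression with zero mean, its forecasts decay toward zero as the horizon grows, and the forecast of the level is the last observation plus the accumulated forecast changes:

\[
\hat{w}_{T+h|T} \to 0 \quad (h \to \infty),
\qquad
\hat{y}_{T+h|T} = y_T + \sum_{j=1}^{h} \hat{w}_{T+j|T}
\]

Since the terms of the sum die out geometrically, the sum settles at a finite value and the level forecast flattens out at a constant rather than following a trend.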

The table below reports each model’s accuracy on the training and testing datasets. ME, RMSE, and MAE are in thousands of 1/1/2021 USD; MPE and MAPE are percentages.

## # A tibble: 4 × 10
##   .model .type        ME  RMSE   MAE     MPE  MAPE    MASE   RMSSE    ACF1
##   <chr>  <chr>     <dbl> <dbl> <dbl>   <dbl> <dbl>   <dbl>   <dbl>   <dbl>
## 1 ARIMA  Training  0.267  6.68  4.92  0.0994  2.04   0.464   0.498 -0.0157
## 2 ETS    Training  0.468  6.65  4.90  0.185   2.02   0.462   0.496 -0.0650
## 3 ARIMA  Test     38.5   42.0  38.5  10.1    10.1  NaN     NaN      0.438 
## 4 ETS    Test     42.8   46.3  42.8  11.3    11.3  NaN     NaN      0.428

Overall, the ETS model fit the training data marginally better, with an RMSE of \(6.65\) vs. \(6.68\) for the ARIMA model; its MAE, MAPE, MASE, and RMSSE were also slightly lower, although its ME and MPE were further from zero, which indicates slightly more bias. The opposite holds on the test data, where the ARIMA model had the lower ME, RMSE, MAE, MPE, and MAPE (for example, an RMSE of \(42.0\) vs. \(46.3\)). For both models the test ME equals the test MAE to the reported precision, which means the forecasts fell below the realized price in essentially every test quarter. As the plots above show, the ARIMA model’s prediction intervals were also narrower than those given by the ETS model. Note that the comparison relies on forecast-error metrics such as the RMSE. Information criteria such as the \(AIC_c\) were not considered because they cannot be compared across different model classes.
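
To make the error metrics in the table concrete, the following is a minimal base R sketch of how ME, RMSE, MAE, MPE, and MAPE are computed from actual and predicted values. The helper name point_accuracy and all of the numbers are hypothetical and chosen only for illustration; they are not taken from the FRED series or from the fitted models.

```r
# Hypothetical actual and predicted prices in $000s (illustrative values only).
actual    <- c(400, 410, 420, 430)
predicted <- c(390, 395, 400, 405)

# Hypothetical helper: point-forecast accuracy with errors defined as
# actual minus predicted, so a positive ME means the forecasts were too low.
point_accuracy <- function(actual, predicted) {
  err <- actual - predicted      # forecast errors
  pct <- 100 * err / actual      # percentage errors
  c(ME   = mean(err),
    RMSE = sqrt(mean(err^2)),
    MAE  = mean(abs(err)),
    MPE  = mean(pct),
    MAPE = mean(abs(pct)))
}

metrics <- point_accuracy(actual, predicted)
print(round(metrics, 2))
```

Every prediction in this example is below the actual value, so all errors are positive and ME coincides with MAE (and MPE with MAPE). The test rows of the table show the same pattern.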

The test interval for the data covers an unexpected spike in US home values, as demand for homes increased during the pandemic. As one final test, the models were re-estimated using only home price data through 2019Q4, with the last six quarters of that shorter series (2018Q3 through 2019Q4) held out for testing.

The results of the second modeling iteration are shown below. The resulting models are similar in character to those fit on the larger dataset.

Once again, the ETS model performed better on the training data (RMSE of \(6.64\) vs. \(6.82\)), while the ARIMA model performed better on the test data (RMSE of \(4.55\) vs. \(7.12\)).

## # A tibble: 4 × 10
##   .model .type        ME  RMSE   MAE    MPE  MAPE    MASE   RMSSE      ACF1
##   <chr>  <chr>     <dbl> <dbl> <dbl>  <dbl> <dbl>   <dbl>   <dbl>     <dbl>
## 1 ARIMA  Training  0.577  6.82  5.08  0.229  2.14   0.480   0.510 -0.000946
## 2 ETS    Training  0.446  6.64  4.88  0.180  2.04   0.460   0.497 -0.0493  
## 3 ARIMA  Test     -0.478  4.55  4.10 -0.166  1.23 NaN     NaN     -0.248   
## 4 ETS    Test      4.64   7.12  5.88  1.37   1.75 NaN     NaN      0.0743
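
The NaN entries for MASE and RMSSE in the test rows of both tables are most likely a consequence of how these scaled metrics are defined. Assuming the usual seasonal scaling with period \(m = 4\) for quarterly data, and writing \(T\) for the number of training observations and \(h\) for the forecast horizon (notation introduced here), MASE divides the test MAE by the in-sample MAE of a seasonal naive forecast:

\[
\text{MASE} = \frac{\frac{1}{h}\sum_{j=1}^{h} \left| y_{T+j} - \hat{y}_{T+j|T} \right|}{\frac{1}{T-m}\sum_{t=m+1}^{T} \left| y_t - y_{t-m} \right|}
\]

The denominator needs the training observations. The appendix code passes only USdata_test to accuracy(), so no training data is available for the scaling term and the ratio cannot be evaluated. RMSSE is built the same way from squared errors. Passing the full series (training plus test) to accuracy() would be expected to yield numeric values for both columns.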

Appendix

Residual Plots - First Model Definitions

Residual Plots - Second Model Definitions

R Code

knitr::opts_chunk$set(echo = TRUE)

library("feasts")
library("seasonal")
library("tsibble")
library("tsibbledata")
library("dplyr")
library("ggplot2")
library("forecast")
library("fable")
library("fpp3")
library("sqldf")

local({
  hook_source <- knitr::knit_hooks$get('source')
  knitr::knit_hooks$set(source = function(x, options) {
    x <- x[!grepl('# SECRET!!$', x)]
    hook_source(x, options)
  })
})

#csv path defined here; excluded from publication

#Pull in Median US House Price data
UShouse_csv <- read.csv(paste0(csv_path, 
                                  "MedianSalesHouseUS.csv"))

UShouse_df <- data.frame(UShouse_csv)

#Pull in US CPI data (urban consumers)
USCPI_csv <- read.csv(paste0(csv_path,
                             "CPIUrbanUS.csv"))

USCPI_df <- data.frame(USCPI_csv)

#Join Price and CPI data
UShouse_CPI1 <- sqldf('
                        
    SELECT H.DATE
    , H.MSPUS AS MED_PRICE
    , C.CPIAUCSL AS US_CPI
    
    FROM UShouse_df AS H
    
    LEFT JOIN USCPI_df AS C
    ON C.DATE = H.DATE
                        
')

#Create CPI reference and adjust home prices to 1/1/2021 USD
UShouse_CPI2 <- 
  UShouse_CPI1 %>%
  mutate(CPI_REF = UShouse_CPI1$US_CPI[UShouse_CPI1$DATE == "2021-01-01"],
         CPI_FAC = CPI_REF/US_CPI,
         MED_PRICE_ADJ = MED_PRICE*CPI_FAC)

#Create tsibble and mutate median price to be in $000's
USdata <- 
  UShouse_CPI2 %>%
  mutate(MED_PRICE_ADJ = MED_PRICE_ADJ/1000,
         DATE = yearquarter(as.Date(DATE))) %>%
  as_tsibble(index = DATE)


#Plot of adjusted median home prices
USdata %>%
  autoplot(MED_PRICE_ADJ) +
  labs(x = "Quarter",
       y = "Median Home Price ($000's)",
       title = "Median US Home Price",
       subtitle = "1/1/2021 Dollars")


#Create training and testing datasets for ETS and ARIMA models
#(the last 6 quarters are held out for testing)
train_cutoff <- nrow(USdata) - 6
test_pickup <- train_cutoff + 1

USdata_train <- USdata[1:train_cutoff,]
USdata_test <- USdata[test_pickup:nrow(USdata),]

#Create modeling variable
USdata_ETS <- 
  USdata_train %>%
  model(ETS = ETS(MED_PRICE_ADJ))

USdata_ETS_forecast <-
  USdata_ETS %>%
  forecast(h = length(rownames(USdata_test)))

USdata_ARIMA <-
  USdata_train %>%
  model(ARIMA = ARIMA(MED_PRICE_ADJ, stepwise = FALSE, approximation = FALSE))

USdata_ARIMA_forecast <-
  USdata_ARIMA %>%
  forecast(h = length(rownames(USdata_test)))


USdata_ARIMA_forecast %>%
  autoplot(USdata_train) +
  labs(x = "Quarter",
       y = "Median Home Price ($000)",
       title = "US Median Home Price, ARIMA",
       subtitle = "Quarterly, 1/1/2021 USD")

USdata_ETS_forecast %>%
  autoplot(USdata_train) +
  labs(x = "Quarter",
       y = "Median Home Price ($000)",
       title = "US Median Home Price, ETS",
       subtitle = "Quarterly, 1/1/2021 USD")


print(
  rbind(USdata_ARIMA %>% accuracy(),
      USdata_ETS %>% accuracy(),
      USdata_ARIMA %>% 
        forecast(h = length(rownames(USdata_test))) %>% accuracy(USdata_test),
      USdata_ETS %>% 
        forecast(h = length(rownames(USdata_test))) %>% accuracy(USdata_test))
)


#Restrict the data to observations through 2019Q4 (pre-pandemic)
USdata <- 
  UShouse_CPI2 %>%
  filter(DATE <= "2019-10-01")

#Create tsibble and mutate median price to be in $000's
USdata <-
  USdata %>%
  mutate(MED_PRICE_ADJ = MED_PRICE_ADJ/1000,
         DATE = yearquarter(as.Date(DATE))) %>%
  as_tsibble(index = DATE)
  
#Create training and testing datasets for ETS and ARIMA models
#(the last 6 quarters are held out for testing)
train_cutoff <- nrow(USdata) - 6
test_pickup <- train_cutoff + 1

USdata_train <- USdata[1:train_cutoff,]
USdata_test <- USdata[test_pickup:nrow(USdata),]

#Create modeling variable
USdata_ETS2 <- 
  USdata_train %>%
  model(ETS = ETS(MED_PRICE_ADJ))

USdata_ETS_forecast2 <-
  USdata_ETS2 %>%
  forecast(h = length(rownames(USdata_test)))

USdata_ARIMA2 <-
  USdata_train %>%
  model(ARIMA = ARIMA(MED_PRICE_ADJ, stepwise = FALSE, approximation = FALSE))

USdata_ARIMA_forecast2 <-
  USdata_ARIMA2 %>%
  forecast(h = length(rownames(USdata_test)))


USdata_ARIMA_forecast2 %>%
  autoplot(USdata_train) +
  labs(x = "Quarter",
       y = "Median Home Price ($000)",
       title = "US Median Home Price, ARIMA",
       subtitle = "Quarterly, 1/1/2021 USD")

USdata_ETS_forecast2 %>%
  autoplot(USdata_train) +
  labs(x = "Quarter",
       y = "Median Home Price ($000)",
       title = "US Median Home Price, ETS",
       subtitle = "Quarterly, 1/1/2021 USD")


print(
  rbind(USdata_ARIMA2 %>% accuracy(),
      USdata_ETS2 %>% accuracy(),
      USdata_ARIMA2 %>% 
        forecast(h = length(rownames(USdata_test))) %>% accuracy(USdata_test),
      USdata_ETS2 %>% 
        forecast(h = length(rownames(USdata_test))) %>% accuracy(USdata_test))
)


USdata_ARIMA %>%
  gg_tsresiduals(lag_max = 24) + 
  labs(title = "Residual Plots",
       subtitle = "ARIMA(3,1,0)(1,0,0)")

USdata_ETS %>%
  gg_tsresiduals(lag_max = 24) +
  labs(title = "Residual Plots",
       subtitle = "ETS(M,Ad,A)")


USdata_ARIMA2 %>%
  gg_tsresiduals(lag_max = 24) + 
  labs(title = "Residual Plots",
       subtitle = "ARIMA(3,1,0)(1,0,0)")

USdata_ETS2 %>%
  gg_tsresiduals(lag_max = 24) +
  labs(title = "Residual Plots",
       subtitle = "ETS(M,Ad,A)")