
Data

The data-sets within this report provide a comprehensive view of the housing market in the San Jose-Sunnyvale-Santa Clara metropolitan area, covering both supply (inventory) and demand (market hotness), as well as the overall trend in house prices.

## [1] "ndiffs for ts_hpi_SJ_SV_SC:  1"
## [1] "ndiffs for ts_inventory_SJ_SV_SC:  0"
## [1] "ndiffs for ts_market_hotness_SJ_SV_SC:  1"
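The ndiffs values above give the number of first differences each series is estimated to need before it can be treated as stationary. As a minimal sketch of what differencing does (the `difference` helper and the sample values are hypothetical and are not taken from the report's data):

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Hypothetical quarterly index values, for illustration only.
index = [100.0, 102.0, 105.0, 103.0, 108.0, 111.0, 115.0, 112.0]

# First difference: change from one quarter to the next.
first_diff = difference(index)

# Seasonal difference (lag 4): change versus the same quarter a year earlier.
seasonal_diff = difference(index, lag=4)

print(first_diff)
print(seasonal_diff)
```

A series with ndiffs equal to 1 would be modelled on `first_diff` rather than on the raw values; a series with ndiffs equal to 0 is modelled as-is.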

Research Questions

In this outlook we ask the following questions:

  1. Can we identify a trended relationship between Housing Prices, Inventory, and Market Hotness?

  2. Given our plots, do the three series share a common seasonal pattern?

Topics covered:

I. Housing Price Indices

  • The data-set employed represents the All-Transactions House Price Index for the San Jose-Sunnyvale-Santa Clara metropolitan statistical area. The data are quarterly, running from July 1975 to October 2022. The House Price Index is a broad measure of the movement of single-family house prices in the region.

  • House Price Index for San Jose-Sunnyvale-Santa Clara, CA (MSA) (ATNHPIUS41940Q)

II. Inventory

  • This dataset represents the number of active housing listings in the San Jose-Sunnyvale-Santa Clara metropolitan statistical area. The data are monthly, running from July 2016 to April 2023. This gives an indication of the supply side of the housing market in the region.

  • Active Listing Count in San Jose-Sunnyvale-Santa Clara, CA (CBSA) (ACTLISCOU41940)

III. Market Hotness

  • This dataset represents the demand score, a measure of market hotness, for the San Jose-Sunnyvale-Santa Clara metropolitan statistical area. The data are monthly, running from August 2017 to April 2023. This gives an indication of the demand side of the housing market in the region.

  • Demand Score in San Jose-Sunnyvale-Santa Clara, CA (CBSA) (DESCMSA41940)

Note: All data used in this study were obtained from FRED.

Summary of Findings

In light of the comprehensive analysis and forecasting conducted in this study, our preferred models project a promising trajectory for the Housing Price Index in Silicon Valley over a five-year horizon. The models indicate a continuous upward trend in home prices, suggesting a robust real estate market in the region.

In terms of housing inventory, our forecasts predict a stable and seasonal pattern, with no discernible upward or downward trends. This steady state of inventory suggests a balanced housing market where supply adequately meets demand. However, when juxtaposed with the market hotness index, intriguing patterns emerge.

The market hotness index, a measure of market demand, is forecasted to exhibit a decreasing trend over the next 14 months according to our optimal model. Interestingly, its seasonality exhibits an inverse relationship with housing inventory. As market hotness increases, indicating heightened demand, inventory correspondingly decreases, and vice versa. This inverse relationship underscores the fundamental economic principles of supply and demand at play in the Silicon Valley housing market.

In conclusion, the upward trajectory of the Housing Price Index signals a healthy real estate market, while the steady state of inventory coupled with a decreasing market hotness index could indicate a shift towards a more balanced market in the future. This could mean that the rapid price increases of the past may slow, and buyers may find themselves with more options and less competition.

General Overview

Analysis

Housing Price Index

Overview & Synopsis

The House Price Index (HPI) is a broad measure of the movement of single-family property prices. It serves as an indicator of house price trends and as an analytical tool for estimating changes in the rates of mortgage defaults, prepayments, and housing affordability.

Number of Observations: 190 (Quarterly)

Range of data: 1975 to 2022 (San Jose, Sunnyvale, and Santa Clara)

Range of Values: 19.29 (Min) - 532.26 (Max)

Trends: The HPI trends upward over time, with a dip during the Great Recession.

Seasonality: There is evidence of seasonality, with peaks in almost every second quarter.

Cycles: No cycles could be observed in the visuals.

Directly below are the necessary visuals of the original data (HPI).

##       date             series_id             value       realtime_start      
##  Min.   :1975-07-01   Length:190         Min.   : 19.2   Min.   :2024-08-06  
##  1st Qu.:1987-04-23   Class :character   1st Qu.: 68.7   1st Qu.:2024-08-06  
##  Median :1999-02-15   Mode  :character   Median :142.3   Median :2024-08-06  
##  Mean   :1999-02-15                      Mean   :189.6   Mean   :2024-08-06  
##  3rd Qu.:2010-12-09                      3rd Qu.:290.3   3rd Qu.:2024-08-06  
##  Max.   :2022-10-01                      Max.   :530.4   Max.   :2024-08-06  
##   realtime_end       
##  Min.   :2024-08-06  
##  1st Qu.:2024-08-06  
##  Median :2024-08-06  
##  Mean   :2024-08-06  
##  3rd Qu.:2024-08-06  
##  Max.   :2024-08-06

Original Plots and Important Figures

Plotting Training Data (For ARIMA Accuracy)

Differenced Data For Stationarity

This section discusses the autocorrelation and partial autocorrelation function (ACF/PACF) plots, which guided the specification of our forecast models.

First Difference (Full-Sample)

##   kpss_stat kpss_pvalue 
##  0.42483288  0.06645134

Second Difference (Full-Sample)

ACF plot for the first difference (I=1): we notice a few significant spikes that decay toward zero, and only two significant spikes in the PACF, so we feel confident in using a pure AR model.

PACF for the first difference (I=1): There are three significant spikes before the seasonal lags; however, the third spike does not warrant much concern, which leaves two non-seasonal spikes to consider. Thus, we can implement an AR(2) model.

ACF plot for the second difference (I=1,d=1): We observe two significant spikes in the ACF and a sinusoidal pattern; thus we assume an AR model.

PACF for the second difference (I=1,d=1): one seasonal spike and one non-seasonal spike.
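The "significant spikes" read off the ACF plots above are sample autocorrelations that fall outside an approximate 95% band of ±1.96/√n. The following is a minimal sketch of that calculation; the `acf` and `significance_bound` helpers and the toy series are hypothetical, and the PACF is not covered here.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


def significance_bound(n):
    """Approximate 95% band for the autocorrelations of white noise."""
    return 1.96 / math.sqrt(n)


# Toy series, for illustration only (not the report's data).
values = [1.0, 2.0, 3.0, 4.0, 5.0]
r = acf(values, 2)
bound = significance_bound(len(values))

# Lags whose autocorrelation falls outside the band count as "spikes".
significant_lags = [k for k in range(1, len(r)) if abs(r[k]) > bound]
print(r, bound, significant_lags)
```

With only five points the band is wide, so no lag is flagged; on a series of 190 quarterly observations the band would shrink to roughly ±0.14.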

ARIMA(p,d,q)(P,D,Q) models after analysis:

  • ARIMA(2,1,0)(2,0,0)
  • ARIMA(2,1,0)(2,1,0)

In our forecast we implement auto-ARIMA as well.

  • ARIMA(2,1,2)(1,0,0)

Differenced Data (Restricting to Training Data)

##   kpss_stat kpss_pvalue 
##   0.1097486   0.1000000

The ACF plot for the first difference of the training-data HPI (SJ-SV-SC) shows a sinusoidal pattern. This suggests that a seasonal ARIMA model may be a better fit than a non-seasonal ARIMA model.

In an ARIMA(p,d,q) model, the AR component with p=1 models the correlation between the current value of the differenced HPI and its previous value. Setting q=0 means that the model has no moving-average (MA) terms, that is, no dependence on past forecast errors. The constant term adds drift to the model. Setting d=1 means that the series is differenced once before the AR and MA terms are fitted; a seasonal component, by contrast, would relate the current value of the HPI to its values in the same quarter of previous years.

It is important to note that the ARIMA(p,1,q) model is just one possible model that could be used to forecast the HPI. Other specifications, such as seasonal ARIMA(p,d,q)(P,D,Q) models, could also be used. The best model to use will depend on the specific characteristics of the data.

The PACF plot shows no significant partial autocorrelation at lags greater than 1. This suggests that an ARIMA(p,1,q) model with p=1 and q=0 would be a good fit for the data.

ARIMA models after analysis:

  • ARIMA(1,1,0)

The use of auto.arima() produced the same results as our analysis.
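To make the ARIMA(1,1,0) with drift specification concrete, the sketch below generates point forecasts by recursion. It assumes the reported constant enters the differenced equation directly, so that each quarterly change equals the constant plus ar1 times the previous change; the function name and the demo inputs are hypothetical, and prediction intervals are not covered.

```python
def forecast_arima110_drift(history, ar1, constant, horizon):
    """Point forecasts for ARIMA(1,1,0) with drift.

    Assumed form: change_t = constant + ar1 * change_{t-1},
    and level_t = level_{t-1} + change_t.
    """
    level = history[-1]
    change = history[-1] - history[-2]
    path = []
    for _ in range(horizon):
        change = constant + ar1 * change  # AR(1) on the differenced series
        level += change                   # undo the differencing
        path.append(level)
    return path


# Hypothetical inputs chosen so the arithmetic is easy to follow.
demo = forecast_arima110_drift([10.0, 14.0], ar1=0.5, constant=1.0, horizon=2)
print(demo)  # [17.0, 19.5]
```

Under this form, the forecast changes settle toward constant / (1 - ar1), which is why a positive constant produces a steadily rising forecast path.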

Forecasts (ARIMA)

## Series: value 
## Model: ARIMA(1,1,0) w/ drift 
## 
## Coefficients:
##          ar1  constant
##       0.7809    0.4139
## s.e.  0.0520    0.2418
## 
## sigma^2 estimated as 9.341:  log likelihood=-382.43
## AIC=770.85   AICc=771.01   BIC=779.9
## Series: value 
## Model: ARIMA(1,1,0)(1,0,0)[4] w/ drift 
## 
## Coefficients:
##          ar1    sar1  constant
##       0.4382  0.4605    0.8376
## s.e.  0.0649  0.0984    0.3837
## 
## sigma^2 estimated as 29.49:  log likelihood=-587.05
## AIC=1182.1   AICc=1182.32   BIC=1195.07
## Series: value 
## Model: ARIMA(2,1,0)(2,1,0)[4] 
## 
## Coefficients:
##          ar1     ar2     sar1     sar2
##       0.7474  0.0437  -0.5965  -0.2623
## s.e.  0.0832  0.0828   0.0823   0.0792
## 
## sigma^2 estimated as 12.51:  log likelihood=-393.29
## AIC=796.57   AICc=797   BIC=811.52
## Series: value 
## Model: ARIMA(2,1,2)(2,1,0)[4] 
## 
## Coefficients:
##           ar1     ar2     ma1      ma2     sar1     sar2
##       -0.0274  0.6744  0.8684  -0.1316  -0.5458  -0.2255
## s.e.   0.0869  0.0715  0.1080   0.1027   0.0894   0.0832
## 
## sigma^2 estimated as 11.71:  log likelihood=-388.88
## AIC=791.75   AICc=792.56   BIC=812.69
## Series: value 
## Model: ARIMA(2,1,0)(2,0,0)[4] w/ drift 
## 
## Coefficients:
##          ar1     ar2    sar1    sar2  constant
##       0.3672  0.1834  0.3468  0.1026    0.6653
## s.e.  0.0741  0.0842  0.1127  0.1119    0.3767
## 
## sigma^2 estimated as 29.09:  log likelihood=-584.72
## AIC=1181.43   AICc=1181.89   BIC=1200.88
## Series: value 
## Model: ARIMA(2,1,0)(2,1,0)[4] 
## 
## Coefficients:
##          ar1     ar2     sar1     sar2
##       0.4506  0.1623  -0.4630  -0.1775
## s.e.  0.0740  0.0844   0.1103   0.1006
## 
## sigma^2 estimated as 31.02:  log likelihood=-578.74
## AIC=1167.47   AICc=1167.81   BIC=1183.58
## Time Series:
## Start = 1 
## End = 190 
## Frequency = 1 
##   [1]           NA           NA           NA           NA   0.00276998
##   [6]   1.34370365  -0.17301537   0.48276996   0.80536665  -0.50182765
##  [11]  -0.49589433  -0.65011918  -1.07718626   0.83023424   0.65337484
##  [16]   0.46671240   0.94171122   0.12071846   0.70783935  -0.37847213
##  [21]  -0.26749930  -0.62600543  -0.52972444   0.68714154  -0.85241062
##  [26]  -0.24537991  -4.62325388  -0.83742528  -0.31017243   4.90358000
##  [31]   0.85423254  -0.90281725  -0.07132231  -2.53538842   2.36115247
##  [36]   0.19165440   0.61448610  -1.06952010   0.78946037   0.89506023
##  [41]   0.03563683  -1.12482688   0.31685237  -0.18669823   0.60584698
##  [46]   0.11675913   0.08984127   0.27208727   0.47203794   0.28748217
##  [51]   1.35171319   1.73154887   2.28787832   2.58143052   1.66844169
##  [56]   2.22364753   0.67012071  -4.52774879  -5.00188297  -4.36825760
##  [61]  -3.17739121  -1.88222472   1.10360011  -0.08884039  -0.26668568
##  [66]   1.01567936  -0.71671745  -0.82541865  -0.32351822   0.77368355
##  [71]  -0.27740564   0.32750163  -0.38772936   0.50316972   0.35007488
##  [76]  -0.40061024   0.02170946   0.49615164   0.70892077   1.35450812
##  [81]  -0.03837993  -0.03154500   0.44390478   0.72456842   0.85911964
##  [86]   0.59153175   1.70132912   2.63797710  -0.06464273   0.41418647
##  [91]   0.70565498   0.56030937   1.45252677  -1.64002533  -1.47605845
##  [96]   0.46079617   3.43171263   2.10054729   8.57994483   3.25580048
## [101]  -0.63345437   0.47166395  -3.62898190  -5.81402469  -9.46596037
## [106]  -6.31909145  -0.09800519   4.20735813   2.96563236   1.47041185
## [111]  -5.50744181  -5.06799932   1.42145945   6.04916634   0.30105644
## [116]   5.31931869  13.04919363  -4.64693687   5.86260995   9.57608455
## [121]  -7.93865318   4.47101647  -7.76821406  -7.08979810  -3.44901135
## [126]  -5.83926728  -4.93282039  -0.12492573  -9.09415973  -2.64381688
## [131]  -1.73642909 -14.76558021  -3.43507520   4.39358946   3.52634496
## [136]  -6.34624047   9.58073356   6.73567052   3.84338159   7.68124559
## [141]   1.12990575  -5.74152179  -3.67615000   5.76931464   1.88331359
## [146]  -1.49717901   1.27006081   4.44551777   4.06343446   3.53465321
## [151]   3.73884516   5.25817756   3.91253836  -3.77703552   0.69417738
## [156]   1.26978353  -4.03532656  -0.79466738   1.18810451   2.92487191
## [161]   0.94929596  -1.36328899  -0.95865107  -4.03150321  -2.65805086
## [166]  -1.36859557  -2.25909270  -1.19058505   3.73059689   4.95811358
## [171]  10.08320694   0.57279166 -10.02882700  -8.30105303 -10.29301103
## [176]  -0.92150087  -1.18087077   1.66354655   3.30359356  -5.37190787
## [181]   0.48059985   6.78216140   1.62366778   7.84241980   2.13694162
## [186]  13.49915617  12.44859464  22.37767097 -45.58618374   5.30420314

## Time Series:
## Start = 1 
## End = 190 
## Frequency = 1 
##   [1]            NA            NA            NA            NA  9.233319e-04
##   [6]  1.424490e-03  1.308230e-03  9.659944e-04  1.020476e-02 -1.323994e+00
##  [11]  1.913912e-02 -1.255253e+00 -1.973280e+00  5.948173e-01  8.174018e-01
##  [16]  5.092992e-01  1.186075e+00 -3.858721e-01  7.309537e-01 -4.838086e-01
##  [21] -2.912330e-01 -7.871945e-01 -1.079667e+00  9.351580e-01 -1.130319e+00
##  [26]  6.364053e-02 -4.788178e+00 -5.270881e-01 -1.275351e-01  5.242872e+00
##  [31]  2.892526e+00 -8.192962e-01  1.001610e+00 -5.286309e+00  4.768110e+00
##  [36]  2.386558e-02  9.426966e-01 -1.728414e+00 -2.621343e-01  1.314030e+00
##  [41] -5.913637e-01  2.403231e-01 -1.721542e+00 -5.474940e-01  3.487706e-01
##  [46]  1.873970e+00 -6.612526e-01  9.975373e-02  4.362701e-02  3.064579e-01
##  [51]  1.308243e+00  1.458925e+00  1.690547e+00  2.289881e+00  5.951258e-01
##  [56]  1.331104e+00 -9.720657e-01 -5.889213e+00 -5.578980e+00 -5.885274e+00
##  [61] -4.025841e+00 -2.509532e-01  2.732381e+00  6.913934e-01  9.715944e-01
##  [66]  3.594258e+00  1.720199e-01  1.181211e+00  1.250374e+00  1.579940e+00
##  [71]  1.839836e-01  1.556544e+00  6.176708e-02 -6.619726e-01  2.637203e-01
##  [76] -8.432558e-01  1.710860e-01 -3.064375e-01  1.038653e+00  1.509421e+00
##  [81]  5.338545e-02 -2.833731e-01 -8.179839e-02  4.055407e-02  8.940880e-01
##  [86]  2.506684e-01  1.083975e+00  1.604725e+00 -7.691340e-01  3.091319e-01
##  [91] -4.630951e-01 -1.320812e+00  1.369598e+00 -2.340743e+00 -2.172838e+00
##  [96] -6.224929e-01  2.455058e+00  2.312551e+00  8.721858e+00  1.388868e+00
## [101] -2.801900e+00  7.373080e-01 -7.935662e+00 -6.386130e+00 -1.005203e+01
## [106] -6.280855e+00 -1.207071e+00  5.583724e+00  7.216413e+00  3.434913e+00
## [111] -5.127201e+00 -4.293752e+00  4.536172e+00  7.109812e+00  3.943686e+00
## [116]  7.261049e+00  1.098871e+01 -9.243639e+00  9.547277e+00  6.015734e+00
## [121] -1.775378e+01  7.128509e+00 -1.235404e+01 -1.173799e+01 -2.406615e+00
## [126] -7.239733e+00 -2.937313e+00 -1.402316e+00 -5.346935e+00  3.488069e-01
## [131]  3.355373e+00 -1.274578e+01  7.209877e+00  5.334138e+00  7.112902e+00
## [136]  3.060661e+00  1.430136e+01  4.755653e+00  2.092125e+00  1.668664e+01
## [141] -4.284484e+00 -1.010509e+01 -5.470449e+00  6.091011e+00 -4.437862e+00
## [146] -6.687080e-01  1.498126e+00 -3.425425e+00  2.022981e+00  6.324966e+00
## [151]  3.604998e+00 -8.525076e-01  2.405674e+00 -4.356954e+00 -4.860163e-01
## [156] -2.955861e+00 -8.022486e+00  1.934489e-01 -1.969994e+00 -6.206125e-02
## [161]  1.205300e+00  3.824022e-02 -2.023471e+00 -6.190704e+00 -6.331518e-01
## [166]  2.970055e-01 -2.135716e+00  5.290317e-01  5.231333e+00  4.883976e+00
## [171]  1.105806e+01  1.301494e+00 -1.121834e+01 -8.264521e+00 -1.405890e+01
## [176]  1.444464e+00  2.396960e+00  3.642278e+00  4.501366e+00 -6.130160e+00
## [181]  5.785221e+00  7.555761e+00  1.845293e+00  1.151608e+01  2.348705e+00
## [186]  1.070718e+01  1.129148e+01  1.944828e+01 -4.996987e+01  2.584037e+00

## Time Series:
## Start = 1 
## End = 190 
## Frequency = 1 
##   [1]            NA            NA            NA            NA   0.002769976
##   [6]   1.279227642  -0.235423247   0.459212211   0.749542616  -0.540140774
##  [11]  -0.420222666  -0.582338768  -0.951076982   0.864092991   0.854245376
##  [16]   0.610000923   1.010138687   0.257588197   0.766702392  -0.203411696
##  [21]   0.013545237  -0.612171379  -0.491153476   0.795848111  -0.819522217
##  [26]   0.070847161  -4.446183169  -0.803743664  -0.386302028   4.723389199
##  [31]   1.110262805  -1.118342296  -0.347931913  -3.378015508   2.485162062
##  [36]   0.134977839   0.550657696  -1.612092186  -0.012356150   0.784668255
##  [41]  -0.019987955  -0.261555275   0.051637008  -0.352915451   0.544322235
##  [46]   0.133475065   0.235538155   0.267801081   0.540857724   0.453509561
##  [51]   1.456788585   2.032127784   2.540856376   3.067470383   2.322037052
##  [56]   3.013040140   1.711745297  -3.233855597  -3.803022291  -3.510916437
##  [61]  -2.414940005  -0.993335810   1.857174150   0.263707533  -0.093367448
##  [66]   1.389291069  -0.769496452  -0.885529926  -0.792670183  -0.380202096
##  [71]  -1.647009187  -0.820858477  -1.463560522  -0.645261203  -0.222281132
##  [76]  -1.052214406  -0.645419133  -0.289883402  -0.036103269   0.667325905
##  [81]  -0.542484319  -0.592351539  -0.185124737   0.296571486   0.568793153
##  [86]   0.411283268   1.576154508   2.464963673   0.217971376   0.878408148
##  [91]   1.083995176   1.080052359   2.242907463  -0.701528779  -0.633387466
##  [96]   1.086593337   4.414000884   3.489144744  10.176815713   5.136291616
## [101]   1.212179613   2.906384612  -1.510413585  -3.224373808  -7.047276797
## [106]  -4.522839407   0.560618545   5.542190126   5.158044886   2.880547895
## [111]  -3.991510717  -4.581016343   1.651747305   6.222611584   0.345294502
## [116]   4.685512259  11.738564750  -5.004703501   7.739154699  11.435507070
## [121]  -6.104461901   7.272357183  -5.706525263  -5.469974130  -1.496701583
## [126]  -2.449190697  -2.674228122   1.083382160  -6.009718583  -3.273636042
## [131]  -0.895129031 -14.655166269  -5.656145775   1.788291506  -0.289901222
## [136]  -9.849571922   5.856507857   1.209562611  -0.702924282   6.205992151
## [141]  -3.090027059  -9.572026507  -6.952436504   2.620867574  -1.143101812
## [146]  -2.320879004   0.271727531   1.522423540   3.916332953   4.565172511
## [151]   4.485645437   5.467882034   4.415101432  -2.657557136   1.893241475
## [156]   2.809230150  -2.004762829   1.877293211   3.449019239   5.149894233
## [161]   4.024768769   2.442984430   1.866057248  -1.506426950   0.411804877
## [166]   0.607507014  -0.350634162   0.492407152   4.785993335   6.489017301
## [171]  11.924663558   3.270509506  -7.689595686  -6.939732060  -9.732722528
## [176]  -0.163539721  -0.307928482   2.079905023   2.527512398  -5.978903578
## [181]   1.008787133   6.469508863   2.381588322   7.667541690   0.861675292
## [186]  11.955596478  11.721066715  24.666201499 -41.686055862   6.744814089

## AIC score for hpi_sj_Sv_sc_fcARIMA210000:  1181.433
## AIC score for hpi_sj_Sv_sc_fcARIMA210210:  1167.474
## AIC score for hpi_auto_ARIMA212100:  1176.748
## AIC score for fit_full_ARIMA110:  1181.33

After analyzing the AIC values, we see that, given the full data-set, the best model is ARIMA210210. We verify this by plotting the forecasts against the test set and by analyzing the corresponding diagnostic plots.
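The AIC figures in the printouts are consistent with the usual definition AIC = 2k - 2·logLik, where k counts the estimated coefficients plus the innovation variance. A minimal sketch follows; the `aic` helper is hypothetical, and the score dictionary simply restates the four values listed above.

```python
def aic(log_likelihood, n_coefficients):
    """AIC = 2k - 2*logLik, with k = coefficients + 1 for sigma^2."""
    k = n_coefficients + 1
    return 2 * k - 2 * log_likelihood


# Check against one printout: ARIMA(1,1,0)(1,0,0)[4] w/ drift has three
# coefficients (ar1, sar1, constant) and log likelihood -587.05.
print(aic(-587.05, 3))  # about 1182.1

# AIC scores as listed in the report; the lowest score is preferred.
aic_scores = {
    "ARIMA210000": 1181.433,
    "ARIMA210210": 1167.474,
    "auto_ARIMA212100": 1176.748,
    "ARIMA110": 1181.33,
}
best = min(aic_scores, key=aic_scores.get)
print(best)  # ARIMA210210
```

One caveat on the design: AIC values are generally regarded as comparable only between models with the same orders of differencing, so the ranking is best confirmed with the test-set and residual checks.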

Plotting ARIMA Forecasts & Test Set for Accuracy (Restricted Forecast Horizon to Test Set)

Residual Diagnostic Tests, Determining The Best Model

Naive Plot (We utilize this as a baseline):

  • The ACF plot shows that the residuals are not white noise, and their distribution is skewed.
  • The graph indicates that this would not be a good forecast.
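For reference, a naive baseline simply carries the last observed value forward. The sketch below shows that rule together with a seasonal-naive variant, which is an added illustration and not something the report fits; both function names are hypothetical.

```python
def naive_forecast(history, horizon):
    """Repeat the last observed value for every future step."""
    return [history[-1]] * horizon


def seasonal_naive_forecast(history, horizon, period):
    """Repeat the last full season (illustrative variant, not in the report)."""
    start = len(history) - period
    return [history[start + (h % period)] for h in range(horizon)]


print(naive_forecast([1, 2, 3], 2))                                # [3, 3]
print(seasonal_naive_forecast([1, 2, 3, 4, 5, 6, 7, 8], 6, 4))     # [5, 6, 7, 8, 5, 6]
```

Because the naive forecast ignores both trend and seasonality, its residuals on a trending series remain autocorrelated, which matches the diagnosis above.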

ARIMA(2,1,0)(2,1,0)

  • Uni-modal histogram, white noise in ACF plot.

Auto ARIMA(2,1,2)(1,0,0)

  • Very similar to ARIMA210210. White noise in ACF plot.

  • Both ARIMA models can plausibly forecast the next five years. However, given its tighter confidence intervals and more realistic trajectory, we employ ARIMA210210 as our most effective model.

Forecasts

This section visualizes the forecasts for the ARIMA models utilizing the full sample of HPI.

Preferred Model

Broader Implications

The forecasts performed in this research indicate a predicted increase in the HPI over the five-year horizon.

Forecasting Bay Area housing price index data, particularly in Silicon Valley, is crucial for a variety of parties. It enables real estate investors to make informed judgments by identifying trends and predicting future price fluctuations. This allows them to maximize their investment returns by purchasing, selling, or holding properties strategically. The second advantage of housing price forecasts is that purchasers and sellers can anticipate market conditions and make well-informed decisions. Such forecasts enable customers to plan purchases based on future price changes, while sellers can strategically schedule property listings to achieve the highest possible sale prices. Thirdly, housing price forecasts are crucial for policy and planning purposes. Government officials and urban planners can use this information to shape housing policies and urban development plans, ensuring a balanced supply of affordable housing and sustainable growth. In addition, financial institutions use housing price forecasts to evaluate risk, manage mortgage portfolios, and determine lending practices. Lastly, housing price index forecasts offer valuable insights for economic analysis, as they serve as leading indicators of economic health and influence consumer purchasing, construction activity, and overall economic growth.

The projected upward trajectory in housing prices also carries implications for the broader economy. It signifies a potential wealth effect, as homeowners experience an increase in their property values, leading to increased consumer confidence and spending.

Inventory

Overview & Synopsis:

This data is collected on a monthly basis, starting from July 2016 and extending to April 2023. The inventory data provides valuable insights into the supply side of the housing market in the SJ-SV-SC MSA. The inventory level in a housing market is a key indicator of market conditions. A high inventory level indicates a buyer’s market, where buyers have more options and may have more negotiating power. On the other hand, a low inventory level indicates a seller’s market, where sellers may have the upper hand due to increased competition among buyers.

Number of Observations: 82

Range of data: July 2016 - April 2023

Range of Values: 489.0 (Min) - 2366.0 (Max)

Trends: Generally flat over the last 7 years, with neither a downward nor an upward trend.

Seasonality: Peaks in summer months, troughs in winter months.

Cycles: No cycles could be identified.

Original Plots and Important Figures

Observe that the inventory data is practically stationary, with a strong seasonal pattern: June and July are the peak months across time, whereas the inventory of houses is at its lowest in January.

Plotting Training Data

This is an eighty/twenty percent split between training data and testing data. We employ this split in our research.
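A chronological split keeps the earliest observations for training and the most recent ones for testing, so the model is never fitted on data that comes after the test period. A minimal sketch, assuming the cut point is rounded down (the report does not state how the boundary was chosen, and the helper name is hypothetical):

```python
def chronological_split(series, train_fraction=0.8):
    """Split a time-ordered series without shuffling."""
    cut = int(len(series) * train_fraction)  # rounding down is an assumption
    return series[:cut], series[cut:]


# Stand-in for 82 monthly observations (indices only, not real data).
months = list(range(82))
train, test = chronological_split(months)
print(len(train), len(test))  # 65 17
```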

Stationarity Tests (Original Data was already Stationary Enough)

##   kpss_stat kpss_pvalue 
##   0.1721824   0.1000000
##   kpss_stat kpss_pvalue 
##   0.1466226   0.1000000

The ACF plot is sinusoidal, and the PACF plot has two significant spikes. One spike lies outside the seasonal lags.

AR(2) model, with no differencing:

  • ARIMA(2,0,0)

We employ an Auto ARIMA as well, providing us with a model:

  • ARIMA(2,0,0)(1,1,0)

Training Data, Stationarity & Models

Thus the plots guide us to an ARIMA(2, d, 0) model: when the ACF plot shows a significant spike at lag 2 and the PACF plot shows significant spikes at lags 1 and 2, an ARIMA(2, d, 0) model might be suitable. This indicates autoregressive components at lags 1 and 2. Moreover, we do not difference these data, so d=0.

  • ARIMA(2,0,0) should be used for the forecast of training data.

An auto ARIMA function aligns with our findings. Interestingly, the model derived from the original, full-sample data-set was also an ARIMA(2,0,0).

## Series: train_df_inv$value 
## ARIMA(2,0,0) with non-zero mean 
## 
## Coefficients:
##          ar1      ar2       mean
##       1.5421  -0.7072  1373.5526
## s.e.  0.0869   0.0882   107.5885
## 
## sigma^2 = 22265:  log likelihood = -424.01
## AIC=856.02   AICc=856.68   BIC=864.78

Forecasts

## Series: ts_inventory_SJ_SV_SC 
## ARIMA(2,0,0)(0,1,2)[12] 
## 
## Coefficients:
##          ar1      ar2     sma1    sma2
##       1.3154  -0.3846  -0.9719  0.1425
## s.e.  0.1108   0.1137   0.3033  0.1710
## 
## sigma^2 = 13828:  log likelihood = -439.72
## AIC=889.45   AICc=890.39   BIC=900.69
## Series: value 
## Model: ARIMA(2,0,0) w/ mean 
## 
## Coefficients:
##          ar1      ar2  constant
##       1.5421  -0.7072  226.7922
## s.e.  0.0869   0.0882   17.7643
## 
## sigma^2 estimated as 22265:  log likelihood=-424.01
## AIC=856.02   AICc=856.68   BIC=864.78
## Series: value 
## Model: ARIMA(2,0,0)(1,1,0)[12] 
## 
## Coefficients:
##          ar1      ar2     sar1
##       1.4196  -0.4809  -0.4278
## s.e.  0.1185   0.1217   0.1336
## 
## sigma^2 estimated as 18109:  log likelihood=-342.45
## AIC=692.91   AICc=693.72   BIC=700.86
## Series: value 
## Model: ARIMA(2,0,0)(1,1,0)[12] 
## 
## Coefficients:
##          ar1      ar2     sar1
##       1.3798  -0.4522  -0.5275
## s.e.  0.1039   0.1047   0.1136
## 
## sigma^2 estimated as 18014:  log likelihood=-444.04
## AIC=896.07   AICc=896.69   BIC=905.07
## Series: value 
## Model: ARIMA(2,0,0) w/ mean 
## 
## Coefficients:
##          ar1      ar2  constant
##       1.5055  -0.6816  238.1773
## s.e.  0.0785   0.0794   16.4333
## 
## sigma^2 estimated as 23433:  log likelihood=-528.8
## AIC=1065.59   AICc=1066.11   BIC=1075.22
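Two of the ARIMA(2,0,0) printouts above share the same ar1, ar2, and log likelihood but report different last coefficients: a mean of 1373.5526 in one and a constant of 226.7922 in the other. These appear to be two parameterizations of the same fit, linked by constant = mean × (1 - ar1 - ar2). The sketch below checks that relation and shows the AR(2) forecast recursion; the function names are hypothetical.

```python
def constant_from_mean(mean, ar_coefficients):
    """Intercept of an AR model implied by its long-run mean."""
    return mean * (1 - sum(ar_coefficients))


def forecast_ar2(history, ar1, ar2, constant, horizon):
    """Point forecasts from y_t = constant + ar1*y_{t-1} + ar2*y_{t-2}."""
    prev2, prev1 = history[-2], history[-1]
    path = []
    for _ in range(horizon):
        nxt = constant + ar1 * prev1 + ar2 * prev2
        path.append(nxt)
        prev2, prev1 = prev1, nxt
    return path


# Coefficients as printed for the training-data ARIMA(2,0,0) fit.
constant = constant_from_mean(1373.5526, [1.5421, -0.7072])
print(constant)  # close to the printed 226.7922, up to coefficient rounding
```

Because the forecasts of a stationary AR(2) model revert to the mean, long-horizon inventory forecasts from this specification flatten out near 1,374 listings unless a seasonal term is added.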

Plotting ARIMA Forecasts & Test Set for Accuracy

Accuracy Test

Given the visualization and AIC analysis, the ARIMA(2,0,0)(1,1,0) seems to more accurately forecast the reality of housing inventory.
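Visual comparison against the test set can be complemented with numeric error measures. The report does not say which accuracy metrics were computed, so the sketch below uses two common ones, RMSE and MAE, as an assumption; the function names and sample numbers are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error; penalizes large misses more heavily."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mae(actual, predicted):
    """Mean absolute error; average size of a miss in the data's units."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical test-set values and forecasts (listings per month).
actual = [1200.0, 1350.0, 1500.0]
predicted = [1100.0, 1400.0, 1450.0]
print(rmse(actual, predicted), mae(actual, predicted))
```

Computing both metrics for each candidate on the same test window gives a like-for-like comparison that does not depend on how the models were differenced.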

Residual Diagnostic Tests, Full-Sample, Determining The Best Model

In order to determine which model we prefer, we run through residual tests in order to see what model should be employed for the full-sample forecast.

ARIMA(2,0,0)

  • Two significant spikes in the ACF; there might be a better type of forecast.

ARIMA(2,0,0)(1,1,0)

ARIMA200110 is the best model fitted: its histogram is the most normal, its residuals are fairly normally distributed, and they resemble white noise.

  • ARIMA200110 is our preferred model.
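The white-noise judgement above is made visually from the residual ACF. A common numeric companion, which the report does not state it used, is the Ljung-Box statistic Q = n(n+2) Σ r_k² / (n - k) summed over lags 1 to h. The sketch below computes Q only; turning it into a p-value requires a chi-squared distribution, which is left out of this standard-library example. All names are hypothetical.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at a single lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return sum(dev[t] * dev[t - lag] for t in range(lag, n)) / denom


def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(residuals)
    total = sum(
        autocorrelation(residuals, k) ** 2 / (n - k)
        for k in range(1, max_lag + 1)
    )
    return n * (n + 2) * total


# Toy residuals, for illustration only.
q = ljung_box_q([1.0, 2.0, 3.0, 4.0, 5.0], 1)
print(q)  # about 1.4
```

Larger values of Q indicate more leftover autocorrelation in the residuals, that is, stronger evidence against white noise.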

Forecast

Preferred Model ARIMA(2,0,0)(1,1,0)

Broader Implications

Our forecasts indicate that housing inventory within the Silicon Valley MSA will most likely stay stable, with seasonal fluctuations. The forecast of a stable housing inventory suggests that the supply of houses in the market is expected to remain steady. This could be due to a balance between new houses being built and old houses being sold or taken off the market. A stable supply could help prevent drastic fluctuations in housing prices.

Market Hotness

Overview & Synopsis:

The Market Hotness index is calculated based on various factors such as the number of views per property on real estate websites, the number of days a property is listed before it is sold, and the change in the median listing price. A higher Market Hotness index indicates a higher demand for properties, suggesting a seller’s market where sellers have the upper hand due to increased competition among buyers. Conversely, a lower Market Hotness index indicates a buyer’s market where buyers have more options and may have more negotiating power.

Number of Observations: 69 (Monthly)

Range of data: 2017-08-01 to 2023-04-01 (San Jose, Sunnyvale, and Santa Clara)

Range of Values: 1.338 (Not so hot) - 95.652 (Very Hot)

Trends: Decreasing trend across the last six years.

Seasonality: Market Hotness peaks in January and June.

Cycles: No cycles could be identified.

Original Plots and Important Figures

Plotting Training Data (For ARIMA Accuracy)

Differenced Data For Stationarity (Full-Sample)

The residual plot of the stationary data (second difference) indicates that there are three significant spikes in the ACF plot, at lags 1, 2, and 3. Additionally, there are two significant spikes at lags 8 and 12 in the ACF plot.

In the PACF plot, there is one significant spike at lag 1 and one spike at lag 12.

We read the ACF spikes at lags 1, 2, and 3 as non-seasonal structure of up to order three, and the spikes at lags 8 and 12 as seasonal structure. In the PACF plot, the spike at lag 1 points to a single non-seasonal term, and the spike at lag 12 to a single seasonal term.

By the usual convention, a cut-off in the PACF suggests the order of the autoregressive (AR) terms, while a cut-off in the ACF suggests the order of the moving average (MA) terms. We therefore carry forward both an MA-type candidate with one non-seasonal and one seasonal term and an AR-type candidate with three non-seasonal terms and one seasonal term.

Therefore, based on the ACF and PACF plots, a possible ARIMA model for the second difference could be ARIMA(0, 1, 1)(0, 1, 1)[12] or ARIMA(3, 1, 0)(1, 1, 0)[12].

The auto ARIMA function suggests a different model: ARIMA(1,1,1)(1,1,0).

Differenced Data (Restricting to Training Data)

##   kpss_stat kpss_pvalue 
##  0.07339159  0.10000000

The ACF plot shows significant spikes at lags 1, 2, and 6, indicating the presence of autocorrelation at these lags. The slightly sinusoidal pattern suggests possible seasonality in the data.

The PACF plot shows a significant spike at lag 1, indicating the presence of an autoregressive (AR) term. The spikes at lags 5 and 6 in the PACF plot suggest the need for additional autoregressive terms.

Based on these findings, an appropriate ARIMA model for the data could be ARIMA(p, 1, q), where p represents the order of the autoregressive component and q represents the order of the moving average component. In this case, a possible model specification could be ARIMA(2, 1, 0) taking into consideration the significant spikes in the ACF and PACF plots.

The auto-ARIMA function recommends an ARIMA(1,1,0) model.

Forecasts (Training Data)

## Series: value 
## Model: ARIMA(2,1,0) 
## 
## Coefficients:
##          ar1      ar2
##       0.5981  -0.1948
## s.e.  0.1380   0.1368
## 
## sigma^2 estimated as 41.06:  log likelihood=-176.09
## AIC=358.19   AICc=358.67   BIC=364.15
## Series: value 
## Model: ARIMA(1,1,1) 
## 
## Coefficients:
##          ar1     ma1
##       0.2103  0.4102
## s.e.  0.2139  0.1871
## 
## sigma^2 estimated as 40.49:  log likelihood=-175.73
## AIC=357.46   AICc=357.94   BIC=363.43
## Series: value 
## Model: ARIMA(3,1,0)(1,1,0)[12] 
## 
## Coefficients:
##          ar1     ar2     ar3     sar1
##       0.3484  -0.004  0.1715  -0.4409
## s.e.  0.1548   0.161  0.1506   0.1536
## 
## sigma^2 estimated as 33.77:  log likelihood=-132.82
## AIC=275.64   AICc=277.31   BIC=284.33

Given the training data, the ARIMA210 and ARIMA111 forecasts are indistinguishable from one another. Neither model follows the trend; both produce steady predictions. Even so, their point forecasts track the test data better than those of ARIMA310, which follows the trend but captures the observations only within its confidence interval. ARIMA210 does not seem to account for seasonal changes, yet its predictions are closest to reality. We analyze the residual plots in the following section to determine the best model between ARIMA110 and ARIMA210. These results are interesting because ARIMA310 produced the lowest AIC.

Residual Plots (Determining the best Model, Training Data)

ARIMA310 Residual Plot

ARIMA011 Residual Plot

None of the forecast models except ARIMA310 produces white-noise residuals, so we adopt ARIMA310 as our forecast model. Moreover, this model had the lowest AIC on the training data.
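
A white-noise judgement of this kind is usually backed by a Ljung-Box test on the residuals alongside the plots. The following is a minimal sketch using base R's `Box.test()` on a simulated series; the data, the chosen lag of 12, and the object names are assumptions for illustration and are not taken from the report.

```r
# Simulated monthly series standing in for the training data (hypothetical).
set.seed(7)
y <- ts(50 + cumsum(as.numeric(arima.sim(model = list(ar = 0.5), n = 69))),
        start = c(2017, 8), frequency = 12)

# Fit one candidate model and extract its residuals.
fit <- arima(y, order = c(2, 1, 0), method = "ML")
res <- residuals(fit)

# Ljung-Box test: the null hypothesis is that the residuals are uncorrelated
# (white noise). fitdf removes one degree of freedom per estimated coefficient.
n_coef <- length(coef(fit))
lb <- Box.test(res, lag = 12, type = "Ljung-Box", fitdf = n_coef)
print(lb)
```

A large p-value means the test finds no evidence against white-noise residuals; a small one suggests leftover autocorrelation that the model has not captured.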

## Series: value 
## Model: ARIMA(2,1,0) 
## 
## Coefficients:
##          ar1      ar2
##       0.5782  -0.1945
## s.e.  0.1181   0.1178
## 
## sigma^2 estimated as 34.4:  log likelihood=-215.93
## AIC=437.87   AICc=438.24   BIC=444.53
## Series: value 
## Model: ARIMA(1,1,1) 
## 
## Coefficients:
##          ar1     ma1
##       0.1809  0.4256
## s.e.  0.1870  0.1630
## 
## sigma^2 estimated as 33.83:  log likelihood=-215.39
## AIC=436.79   AICc=437.16   BIC=443.44
## Series: value 
## Model: ARIMA(0,1,1)(0,1,1)[12] 
## 
## Coefficients:
##          ma1     sma1
##       0.3213  -0.7442
## s.e.  0.1114   0.2769
## 
## sigma^2 estimated as 23.35:  log likelihood=-171.33
## AIC=348.66   AICc=349.12   BIC=354.74
## Series: value 
## Model: ARIMA(3,1,0)(1,1,0)[12] 
## 
## Coefficients:
##          ar1     ar2     ar3     sar1
##       0.3134  0.0447  0.1201  -0.3522
## s.e.  0.1343  0.1412  0.1333   0.1286
## 
## sigma^2 estimated as 29.43:  log likelihood=-172.96
## AIC=355.92   AICc=357.12   BIC=366.05

Forecasts

Preferred Forecast (ARIMA310)

Residual Plots (Full-Sample)

ARIMA011

ARIMA310 (Preferred Model)

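  • The residuals are white noise, and the residual plot is more normal than that of ARIMA011.

  • We nevertheless employ ARIMA310 as our best model because of its more normal residual plot and its lowest AIC on the training data. On the full sample, the output above reports a lower AIC for ARIMA011 (348.66) than for ARIMA310 (355.92).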

Broader Implications

Decreasing Market Hotness: The decreasing trend in market hotness indicates that demand for houses may be cooling off. This could be due to a variety of factors, such as changes in economic conditions, demographic shifts, or changes in housing preferences. A decrease in demand could potentially lead to a slowdown in housing price growth.

Housing Inventory & Market Hotness (VAR Tests)

In this section, we test a Vector Autoregressive (VAR) model for housing inventory and market hotness to assess its potential forecasting accuracy. We create a forecast from the training data and compare it with the values of the original series.
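
The train-then-compare procedure described above might look roughly like the sketch below, which uses `VAR()` and `predict()` from the `vars` package. The two series are simulated, and the hold-out length, the `lag.max` value, and all object names are assumptions for the example; only the column names and the seasonal, AIC-based setup mirror the report's call.

```r
library(vars)  # provides VAR() and a predict() method for fitted VARs

# Simulated differenced series (hypothetical values, not the report's data).
set.seed(123)
n <- 68
inventory_diff <- rnorm(n, sd = 100)
hotness_diff <- -0.02 * c(0, inventory_diff[-n]) + rnorm(n, sd = 4)
diff_mat <- cbind(Market.Hotness_diff = hotness_diff,
                  Inventory_diff = inventory_diff)

# Hold out the last h months for evaluation.
h <- 12
train <- ts(diff_mat[1:(n - h), ], start = c(2017, 9), frequency = 12)
test <- diff_mat[(n - h + 1):n, ]

# Fit a VAR with monthly seasonal dummies; the lag order is chosen by AIC.
var_fit <- VAR(train, lag.max = 4, ic = "AIC", season = 12L)

# Forecast h steps ahead and compare with the held-out values.
fc <- predict(var_fit, n.ahead = h)
pred_hot <- fc$fcst$Market.Hotness_diff[, "fcst"]
rmse_hot <- sqrt(mean((test[, "Market.Hotness_diff"] - pred_hot)^2))
rmse_hot
```

Keeping `lag.max` small relative to the sample length leaves more degrees of freedom per equation, which matters with only a few years of monthly data.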

Differenced Market Hotness With Inventory

From the differenced data, it appears that inventory may be a leading indicator of market hotness.
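
One simple way to probe a lead-lag relationship like this is the sample cross-correlation function. The sketch below uses base R's `ccf()` on simulated data in which inventory is constructed to lead hotness by two periods; the series, the two-period lead, and the variable names are assumptions for the example, not findings from the report.

```r
# Simulated differenced series: hotness responds to inventory two periods
# earlier (a hypothetical relationship built in for illustration).
set.seed(1)
n <- 200
inventory_diff <- rnorm(n)
hotness_diff <- -0.8 * c(0, 0, inventory_diff[1:(n - 2)]) + rnorm(n, sd = 0.3)

# ccf(x, y) at lag k estimates the correlation between x[t + k] and y[t],
# so a strong value at a positive lag means y (inventory) leads x (hotness).
cc <- ccf(hotness_diff, inventory_diff, lag.max = 6, plot = FALSE)

# Lag with the largest absolute cross-correlation.
best_lag <- cc$lag[which.max(abs(cc$acf))]
best_lag
```

On the real data, a dominant spike at a positive lag would support the leading-indicator reading, while spikes at negative lags would point the other way.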

## 
## VAR Estimation Results:
## ========================= 
## Endogenous variables: Market.Hotness_diff, Inventory_diff 
## Deterministic variables: const 
## Sample size: 45 
## Log Likelihood: -354.827 
## Roots of the characteristic polynomial:
## 0.9664 0.9664 0.9535 0.9535 0.9394 0.9394 0.931 0.9031 0.9031 0.8806 0.8806 0.8413 0.8413 0.8368 0.8306 0.8306 0.6934 0.6934
## Call:
## VAR(y = htrain_diff_ts, season = 12L, lag.max = 25, ic = "AIC")
## 
## 
## Estimation results for equation Market.Hotness_diff: 
## ==================================================== 
## Market.Hotness_diff = Market.Hotness_diff.l1 + Inventory_diff.l1 + Market.Hotness_diff.l2 + Inventory_diff.l2 + Market.Hotness_diff.l3 + Inventory_diff.l3 + Market.Hotness_diff.l4 + Inventory_diff.l4 + Market.Hotness_diff.l5 + Inventory_diff.l5 + Market.Hotness_diff.l6 + Inventory_diff.l6 + Market.Hotness_diff.l7 + Inventory_diff.l7 + Market.Hotness_diff.l8 + Inventory_diff.l8 + Market.Hotness_diff.l9 + Inventory_diff.l9 + const + sd1 + sd2 + sd3 + sd4 + sd5 + sd6 + sd7 + sd8 + sd9 + sd10 + sd11 
## 
##                          Estimate Std. Error t value Pr(>|t|)  
## Market.Hotness_diff.l1   0.015874   0.255349   0.062   0.9513  
## Inventory_diff.l1       -0.021062   0.009616  -2.190   0.0447 *
## Market.Hotness_diff.l2  -0.067856   0.247907  -0.274   0.7880  
## Inventory_diff.l2       -0.009493   0.011230  -0.845   0.4112  
## Market.Hotness_diff.l3   0.199276   0.227423   0.876   0.3947  
## Inventory_diff.l3        0.007725   0.013016   0.594   0.5617  
## Market.Hotness_diff.l4   0.131119   0.236942   0.553   0.5882  
## Inventory_diff.l4        0.007892   0.012791   0.617   0.5465  
## Market.Hotness_diff.l5   0.049986   0.242077   0.206   0.8392  
## Inventory_diff.l5       -0.009299   0.011859  -0.784   0.4452  
## Market.Hotness_diff.l6  -0.016349   0.238301  -0.069   0.9462  
## Inventory_diff.l6        0.005904   0.011326   0.521   0.6098  
## Market.Hotness_diff.l7  -0.416777   0.232112  -1.796   0.0927 .
## Inventory_diff.l7       -0.005552   0.011375  -0.488   0.6325  
## Market.Hotness_diff.l8  -0.163591   0.261774  -0.625   0.5414  
## Inventory_diff.l8       -0.005639   0.011808  -0.478   0.6398  
## Market.Hotness_diff.l9  -0.104552   0.267639  -0.391   0.7016  
## Inventory_diff.l9        0.004917   0.010296   0.478   0.6398  
## const                   -2.315223   1.244180  -1.861   0.0825 .
## sd1                     -0.359813   6.530320  -0.055   0.9568  
## sd2                     -6.316160   6.302976  -1.002   0.3322  
## sd3                     -5.709202   6.300315  -0.906   0.3792  
## sd4                     -3.602282   7.937789  -0.454   0.6565  
## sd5                    -15.417372   8.132583  -1.896   0.0774 .
## sd6                    -19.568380   8.356318  -2.342   0.0334 *
## sd7                    -13.486616   9.863323  -1.367   0.1917  
## sd8                     -0.681424   8.735584  -0.078   0.9389  
## sd9                    -12.697419   8.493672  -1.495   0.1557  
## sd10                    -3.636090   7.457130  -0.488   0.6329  
## sd11                    -0.557816   6.379090  -0.087   0.9315  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 4.29 on 15 degrees of freedom
## Multiple R-Squared: 0.8862,  Adjusted R-squared: 0.6661 
## F-statistic: 4.027 on 29 and 15 DF,  p-value: 0.003204 
## 
## 
## Estimation results for equation Inventory_diff: 
## =============================================== 
## Inventory_diff = Market.Hotness_diff.l1 + Inventory_diff.l1 + Market.Hotness_diff.l2 + Inventory_diff.l2 + Market.Hotness_diff.l3 + Inventory_diff.l3 + Market.Hotness_diff.l4 + Inventory_diff.l4 + Market.Hotness_diff.l5 + Inventory_diff.l5 + Market.Hotness_diff.l6 + Inventory_diff.l6 + Market.Hotness_diff.l7 + Inventory_diff.l7 + Market.Hotness_diff.l8 + Inventory_diff.l8 + Market.Hotness_diff.l9 + Inventory_diff.l9 + const + sd1 + sd2 + sd3 + sd4 + sd5 + sd6 + sd7 + sd8 + sd9 + sd10 + sd11 
## 
##                          Estimate Std. Error t value Pr(>|t|)  
## Market.Hotness_diff.l1    5.60459    7.33382   0.764   0.4566  
## Inventory_diff.l1         0.57073    0.27617   2.067   0.0565 .
## Market.Hotness_diff.l2   -4.10304    7.12009  -0.576   0.5730  
## Inventory_diff.l2        -0.02904    0.32254  -0.090   0.9294  
## Market.Hotness_diff.l3   -5.66835    6.53176  -0.868   0.3992  
## Inventory_diff.l3        -0.17924    0.37382  -0.479   0.6385  
## Market.Hotness_diff.l4    6.60098    6.80517   0.970   0.3474  
## Inventory_diff.l4         0.25763    0.36736   0.701   0.4938  
## Market.Hotness_diff.l5  -12.36564    6.95263  -1.779   0.0956 .
## Inventory_diff.l5        -0.44394    0.34061  -1.303   0.2121  
## Market.Hotness_diff.l6    2.22630    6.84420   0.325   0.7495  
## Inventory_diff.l6         0.12456    0.32528   0.383   0.7072  
## Market.Hotness_diff.l7    2.55275    6.66646   0.383   0.7071  
## Inventory_diff.l7         0.23310    0.32669   0.714   0.4865  
## Market.Hotness_diff.l8    2.93011    7.51835   0.390   0.7022  
## Inventory_diff.l8        -0.12397    0.33912  -0.366   0.7198  
## Market.Hotness_diff.l9    8.62885    7.68682   1.123   0.2793  
## Inventory_diff.l9         0.42326    0.29571   1.431   0.1728  
## const                     9.39673   35.73387   0.263   0.7962  
## sd1                     279.85369  187.55605   1.492   0.1564  
## sd2                     182.08627  181.02655   1.006   0.3304  
## sd3                     -71.08186  180.95012  -0.393   0.7000  
## sd4                    -104.34542  227.97971  -0.458   0.6537  
## sd5                     218.45464  233.57435   0.935   0.3645  
## sd6                     398.61489  240.00020   1.661   0.1175  
## sd7                     374.31098  283.28258   1.321   0.2062  
## sd8                     263.24700  250.89303   1.049   0.3107  
## sd9                     201.96667  243.94511   0.828   0.4207  
## sd10                    249.27230  214.17478   1.164   0.2627  
## sd11                    303.69407  183.21260   1.658   0.1182  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 123.2 on 15 degrees of freedom
## Multiple R-Squared: 0.8963,  Adjusted R-squared: 0.6957 
## F-statistic: 4.468 on 29 and 15 DF,  p-value: 0.001821 
## 
## 
## 
## Covariance matrix of residuals:
##                     Market.Hotness_diff Inventory_diff
## Market.Hotness_diff               18.41         -248.2
## Inventory_diff                  -248.24        15183.3
## 
## Correlation matrix of residuals:
##                     Market.Hotness_diff Inventory_diff
## Market.Hotness_diff              1.0000        -0.4696
## Inventory_diff                  -0.4696         1.0000