The purpose of this document is to present the findings of an empirical model of the determinants of house prices. I have formatted the document to be readable for a general audience without excluding the key statistical operations. Any meaningful analysis will need a suite of models, and this document is the baseline. The intention is to provide a worked econometric example, not a crash course in econometrics.
Unreliability: Econometric approaches have been unable to reliably detect market disequilibrium. Some examples of this can be seen in section 2. For that reason it would be a stretch to expect the first iterations of such a model, developed as an introduction for discussion, to be especially insightful or reliable.
While a simpler model specification is generally preferred over a complicated one, the models in this document almost certainly lack the complexity needed to accurately predict disequilibrium. The preferred specification here puts Irish house prices 13% above the model prediction. See Figure 4.1.
Going forward, an improved understanding is required of why models are useful on the one hand but have limitations, some clear and some less so, on the other. The aim is to move closer to the specification developed by McQuinn at the ESRI, as discussed in their Autumn Economic Outlook, where the model suggests Irish house prices were over-valued by approximately 7% at that time. Applying time series methods to improve predictions and forecasting should also be prioritised.
Analyzing the ratio of completions to planning from Model 11, the ratio was insignificant. The main shortcoming of that model was that it was yearly and did not have enough observations. It does not seem plausible that a supply ratio is insignificant. Other specification issues aside, and prior to running another model, I’ve dug deeper into the relationship between house prices and the completion to planning ratio, hereafter the supply ratio.
Figure 1.1: Supply Ratio - Clear disparity between houses & apartments
Figure 1.2: The resulting balance of units from the supply ratio
Viability issues suppress the supply of apartments. A consequence of this is that the supply ratio does not reflect the true tightness in the market: the relatively high quantity of apartment stock in the pipeline that can be seen in Figure 1.2 is illusory.
What is the relationship between the tightness of supply for houses and the price of residential property?
The 12 month moving average (mean) was used to answer this question. The code for this dataset, accessed through PxStat on the CSO website, is HPM08.
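As a rough illustration of the smoothing step, and not the code used for this document (which appears to have been produced in R), a 12 month moving average can be sketched in Python. The function name and the sample numbers are hypothetical, and a trailing window is an assumption: the text does not say whether the average is trailing or centred.

```python
def trailing_moving_average(values, window=12):
    """Mean of the latest `window` points; the first window-1 points yield no value."""
    if window < 1:
        raise ValueError("window must be at least 1")
    averages = []
    running_total = 0.0
    for i, value in enumerate(values):
        running_total += value
        if i >= window:
            # Drop the observation that has just left the window.
            running_total -= values[i - window]
        if i >= window - 1:
            averages.append(running_total / window)
    return averages


# Hypothetical monthly series (14 months), not CSO data.
monthly_values = [float(v) for v in range(1, 15)]
smoothed = trailing_moving_average(monthly_values, window=12)
print(smoothed)  # [6.5, 7.5, 8.5]
```

Note that 14 monthly points give only 3 smoothed values: a window of 12 always costs the first 11 observations.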
Figure 1.3: The correlation between prices and the 12 month moving average supply ratio for houses
Correlation is measured on a scale from -1 to +1; the correlation (r value) between property prices and the supply ratio can be seen in Figure 1.3. The rule of thumb applied regularly in economics and the behavioural sciences is that anything above 0.7 is regarded as a high correlation. Given that the correlation is positive, tighter supply can be associated with higher prices, exactly as we would expect.
The supply ratio used here is the 12 month moving average in a given quarter. I’ve labelled the quarter of the first and last three observations; recall Figure 1.1 for the yearly change in the ratio.
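For readers who want to see what sits behind an r value, the Pearson correlation coefficient can be computed by hand as in the sketch below. The helper name and the two short series are made up for illustration; they are not the price or supply ratio data behind Figure 1.3.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical series, chosen only to give a correlation near the 0.7 rule of thumb.
supply_ratio = [1.0, 2.0, 3.0, 4.0, 5.0]
price_index = [2.0, 4.0, 5.0, 4.0, 5.0]
r = pearson_r(supply_ratio, price_index)
print(round(r, 3))  # 0.775
```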
Measuring affordability in an empirical model is more difficult than would be assumed. An appropriate explanatory variable should not in and of itself be a function of the dependent variable.2 Mortgage repayments (a possible explanatory variable) are a function of house prices (the dependent variable), so they fail this test.
By no means an exhaustive view of the literature
“In general, the choice of a house price model and its empirical estimation is very much influenced by the quality and availability of data” - Bragoudakis, Bank of Greece
Bragoudakis et al (2016)
When housing is treated as a consumption good, its demand is generally a function of a number of variables such as:
As an investment good:
Studies of credit in house price forecasting point to mixed results:
Supply-side factors that are useful in forecasting house prices are primarily real construction costs and construction technology shocks.
Price momentum: price is affected by its lagged value. Price momentum is present in the short run, with reversal in the long run.
Sunega et al (2014)
Econometric models have produced contradictory results and have failed to provide warning of housing market crashes. The standard econometric approaches have been unable to reliably detect market disequilibrium - Sunega et al (2014)
Studies published up to 2007 tended to conclude that house prices for the most part were not too far from their fundamental / equilibrium value. At the same time, similar models were reaching the opposite conclusion. The authors provide two distinct but well specified models which produce different outcomes, even when using the same data. The authors attribute this variation largely to the interest rate variable.
Examples of such models in the literature using Irish data are McQuinn & O’Reilly (2006), Central Bank of Ireland; McQuinn (2017), ESRI; and Roche, ESRI.
Whittle (2014)
Some behavioural explanations which may affect house price bubbles:
Herd Behaviour
Amateurs vs Experts
Anchoring, Loss Aversion and Endowment Bias
Considering the findings in the literature discussed in the last section, it is necessary to take care when interpreting the results.
There will be various specifications of the following model:
\[ \operatorname{P} = \alpha + \beta_{1}(\operatorname{COMP}) + \beta_{2}(\operatorname{MPI}) + \beta_{3}(\operatorname{POP}) + \beta_{4}(\operatorname{SAVC}) + \beta_{5}(\operatorname{DISPC}) + \beta_{6}(\operatorname{UNEMP}) + \epsilon \]
Our dependent variable is P, which is house prices. It is explained, to some degree, by the variables on the right hand side of the equation; the size of each relationship is represented by the Greek letter beta. The alpha is what is known as the intercept, and the epsilon at the end of the equation is the error term. The variables will be tested to see if they are appropriate.
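For readers who prefer matrix notation, the same equation and its ordinary least squares estimate can be written compactly. This is the textbook form rather than anything specific to this document: here y is the column of house prices P, X is a matrix with a column of ones followed by COMP, MPI, POP, SAVC, DISPC and UNEMP, and the symbols y, X and the hat notation are introduced only for this illustration. It assumes the columns of X are not perfectly collinear.

\[ y = X\beta + \epsilon, \qquad \hat{\beta} = \arg\min_{\beta} \, (y - X\beta)^{\top}(y - X\beta) = (X^{\top}X)^{-1}X^{\top}y \]

The first element of the estimated vector corresponds to the intercept alpha, and the remaining six to beta 1 through beta 6.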
Figure 3.1: All property types included, that is a blend of new and secondhand homes
Naturally HPM05, which is the mean value by month, is not as smooth as the 12 month moving average used previously in Figure 1.3. This will be the house price applied in the model.
The explanatory variables are detailed in the following subsections in order of appearance.
COMP
Dataset: NDQ01 which is the number of new dwelling completions by type of house.
Figure 3.2: New Dwelling Completions for Houses and Apartments - Seasonally Adjusted
MPI
Dataset: CPM03 which is the consumer price index by sub indices. Mortgage Interest is the filtered sub index.
Figure 3.3: This represents the average monthly percentage change in mortgage interest paid in a given quarter
Figure 3.4: This represents the average yearly percentage change in mortgage interest paid in a given quarter
POP
Dataset: PEA01, which is the estimated population in April of a given year. This was interpolated and can be seen below.
Figure 3.5: Demographic variable consists of the age group typically in a position to purchase a home
Figure 3.6: Where necessary this series will be used to generate per capita values
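The text does not say how the annual April estimates were interpolated to quarterly values. A minimal sketch, assuming straight-line interpolation with four equal steps between consecutive annual points, might look like this. The function name and the population figures are hypothetical, and how April is aligned to a calendar quarter is left aside.

```python
def interpolate_quarterly(annual_estimates):
    """Linear interpolation: four equal steps between each pair of annual points."""
    if len(annual_estimates) < 2:
        raise ValueError("need at least two annual estimates")
    quarterly = []
    for start, end in zip(annual_estimates, annual_estimates[1:]):
        step = (end - start) / 4
        quarterly.extend(start + step * k for k in range(4))
    # Close the series with the final annual point itself.
    quarterly.append(annual_estimates[-1])
    return quarterly


# Hypothetical April estimates for three consecutive years (not PEA01 values).
april_estimates = [1000.0, 1040.0, 1100.0]
quarterly_population = interpolate_quarterly(april_estimates)
print(quarterly_population)
# [1000.0, 1010.0, 1020.0, 1030.0, 1040.0, 1055.0, 1070.0, 1085.0, 1100.0]
```

A consequence of this approach is that the quarterly change is constant within each year and only shifts at the annual points.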
SAVC
Dataset: ISQ03 which is the Quarterly Accounts at Current Market Prices Seasonally Adjusted, Institutional Sector filtered to “Households including NPISH (S.14+S.15)”. The Current Account variable was filtered to “Gross saving (B.8g)”.
Figure 3.7: Series must be adjusted to per capita basis
Figure 3.8: Value expressed in euro terms, denominator (population) 25 to 64 years
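The per capita adjustment described in the captions above amounts to a single division. The sketch below assumes the ISQ03 series is reported in millions of euro, which is my assumption and should be checked against the dataset notes; the function name and both input figures are hypothetical round numbers.

```python
def per_capita_euro(value_million_euro, population):
    """Convert an aggregate in millions of euro to euro per person."""
    if population <= 0:
        raise ValueError("population must be positive")
    return value_million_euro * 1_000_000 / population


# Hypothetical inputs: 2,000 million euro of gross saving, 2.5 million people aged 25 to 64.
saving_per_person = per_capita_euro(2000.0, 2_500_000)
print(saving_per_person)  # 800.0
```

The same helper would apply to disposable income, since both series share the 25 to 64 population as the denominator.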
DISPC
Dataset: ISQ03. The Current Account variable was filtered to “Gross disposable income (B.6g)”. The population between 25 & 64 was used as per the POP variable previously.
Figure 3.9: Series must be adjusted to per capita basis
Figure 3.10: Value expressed in euro terms, denominator (population) 25 to 64 years
UNEMP
Dataset: LRM11, which is the number of people on the live register. The population between 25 & 64 was used as per the POP variable previously.
Figure 3.11: Series must be adjusted to per capita basis
Figure 3.12: High correlation between completions and house prices, with higher prices being associated with higher completions. On the face of it this may be counterintuitive, given supply constraints are usually associated with higher prices. Given how far prices and completions fell from their peak, which was out of sample, this relationship is less surprising. Completions alone may not be the best metric of housing supply.
Figure 3.13: Low to moderate correlation between the change in mortgage interest payments and house prices. Consider that the largest increases took place in 2011, while house prices were in decline post GFC. These increases immediately preceded the ECB cutting rates toward zero and later going negative. Throughout the sample the changes clustered around and below zero. The low rate environment of the last decade drove this trend.
Figure 3.14: Strong correlation between house prices and estimated population
Figure 3.15: The correlation is slightly short of the 0.7 threshold for a strong correlation, which may (again) on first inspection seem surprising. However the 10 highest observations for savings per capita all came post pandemic, which as we are well aware was ‘unprecedented’.
Figure 3.16: Strong positive correlation as would be expected; as previously noted, the top 10 observations came post pandemic.
Figure 3.17: Strong negative correlation between house prices and persons on the live register
Recall Figure 3.8
Figure 3.18: The above matrix has three parts. The diagonal shows density plots, which illustrate the distribution of observations for each of the explanatory (independent) variables. The diagonal separates the correlation value for each pair, in the top right of the plot, from the scatter plots in the bottom left.
In keeping with the objective of having this document readable to a general audience, the statistical explanations will not be exhaustive. Here is a useful quote from the book Statistics Done Wrong by Alex Reinhart: “If you want to prove that your drug works, you do so by showing the data is inconsistent with the drug not working… Remember, p is a measure of surprise, with a smaller value suggesting that you should be more surprised”.
Year_Q | price | COMP | MPI | POP | SAVC | DISPC | UNEMP |
---|---|---|---|---|---|---|---|
2011 Q1 | 246445.3 | 2205 | 2.2333333 | 2473920 | 875.5337 | 8713.299 | 360836.3 |
2011 Q2 | 227540.7 | 1900 | 1.2666667 | 2479066 | 733.7441 | 8577.424 | 363691.0 |
2011 Q3 | 228576.0 | 1682 | 2.0333333 | 2484212 | 828.0290 | 8519.803 | 372824.0 |
2011 Q4 | 212705.3 | 1368 | -0.9666667 | 2487238 | 1006.3371 | 8667.043 | 355724.0 |
2012 Q1 | 201048.7 | 1336 | -2.9333333 | 2487758 | 1152.0414 | 8798.686 | 362753.0 |
2012 Q2 | 196781.7 | 1180 | -0.4333333 | 2488277 | 924.3343 | 8650.965 | 363538.3 |
2012 Q3 | 213900.0 | 1198 | -1.8666667 | 2488797 | 978.7862 | 8830.773 | 369714.3 |
2012 Q4 | 199807.0 | 1221 | 0.0333333 | 2489018 | 1042.9818 | 8822.357 | 351946.7 |
2013 Q1 | 190612.0 | 1067 | -0.1666667 | 2488820 | 813.2369 | 8700.510 | 359439.7 |
2013 Q2 | 198411.0 | 1176 | -0.7000000 | 2488622 | 840.2241 | 8702.809 | 356688.7 |
2013 Q3 | 210688.3 | 1023 | -0.8333333 | 2488424 | 759.9188 | 8712.343 | 358086.7 |
2013 Q4 | 216347.7 | 1272 | -1.2333333 | 2488569 | 945.1214 | 8898.286 | 333687.3 |
2014 Q1 | 197728.0 | 1260 | -1.0333333 | 2489287 | 844.4186 | 8792.077 | 336137.0 |
2014 Q2 | 210135.0 | 1399 | -0.4666667 | 2490004 | 626.9066 | 8769.463 | 332753.7 |
2014 Q3 | 229335.3 | 1389 | -0.8333333 | 2490722 | 609.8634 | 8829.168 | 331164.7 |
2014 Q4 | 217916.7 | 1451 | -1.3666667 | 2492272 | 682.1085 | 9074.049 | 306164.0 |
2015 Q1 | 213758.0 | 1597 | -0.1000000 | 2495489 | 804.6520 | 9145.303 | 305634.0 |
2015 Q2 | 218073.3 | 1642 | -1.0666667 | 2498705 | 915.2741 | 9307.621 | 300837.0 |
2015 Q3 | 240377.0 | 2003 | -0.3666667 | 2501922 | 826.1650 | 9334.025 | 303140.3 |
2015 Q4 | 231720.3 | 1920 | -0.8000000 | 2506026 | 636.4659 | 9260.080 | 279733.0 |
2016 Q1 | 236915.7 | 2350 | -0.4000000 | 2512286 | 599.0561 | 9481.008 | 279483.7 |
2016 Q2 | 242098.3 | 2406 | -0.6000000 | 2518545 | 698.8161 | 9559.883 | 271309.7 |
2016 Q3 | 257509.0 | 2436 | -0.4333333 | 2524805 | 992.5518 | 9782.933 | 269092.3 |
2016 Q4 | 257258.0 | 2584 | 0.0666667 | 2530818 | 698.5885 | 9679.086 | 244388.7 |
2017 Q1 | 261512.7 | 3091 | -0.0666667 | 2536088 | 1022.8353 | 10014.638 | 241822.0 |
2017 Q2 | 265434.3 | 3520 | 0.0000000 | 2541358 | 1065.1787 | 10048.172 | 234378.3 |
2017 Q3 | 281928.3 | 3668 | 0.0333333 | 2546628 | 1196.8769 | 10243.350 | 230189.7 |
2017 Q4 | 281432.7 | 3909 | -0.2666667 | 2551764 | 1036.5379 | 10304.637 | 209970.0 |
2018 Q1 | 285160.3 | 4202 | -0.1000000 | 2556391 | 899.7060 | 10267.210 | 209595.7 |
2018 Q2 | 286976.3 | 4363 | 0.2000000 | 2561018 | 1028.1070 | 10485.676 | 200321.7 |
2018 Q3 | 301115.0 | 4566 | 0.1000000 | 2565644 | 1137.3361 | 10621.114 | 197014.0 |
2018 Q4 | 290483.0 | 4621 | 0.2333333 | 2570684 | 1111.3776 | 10739.556 | 178063.0 |
2019 Q1 | 289525.0 | 4799 | 0.3666667 | 2577785 | 1165.3418 | 11009.842 | 175591.0 |
2019 Q2 | 289774.0 | 5192 | 0.1000000 | 2584886 | 1194.6370 | 11049.232 | 172749.7 |
2019 Q3 | 303818.0 | 5633 | 0.4000000 | 2591987 | 1197.9229 | 11047.511 | 174656.3 |
2019 Q4 | 298901.0 | 5402 | 0.2333333 | 2598952 | 1265.5103 | 11252.999 | 161420.0 |
2020 Q1 | 291812.3 | 5618 | 0.2666667 | 2604964 | 2693.3195 | 12109.573 | 169620.0 |
2020 Q2 | 283920.7 | 3488 | 0.1000000 | 2610976 | 4091.5730 | 11777.204 | 192160.0 |
2020 Q3 | 295987.3 | 5079 | 0.3666667 | 2616989 | 2481.8602 | 11437.573 | 196362.0 |
2020 Q4 | 310155.3 | 6092 | -0.0333333 | 2622822 | 2894.5925 | 11694.276 | 172666.3 |
2021 Q1 | 311370.0 | 4753 | 0.2333333 | 2626681 | 4006.5764 | 12253.865 | 165420.0 |
2021 Q2 | 315762.0 | 5133 | 0.2666667 | 2630541 | 3040.8191 | 12344.608 | 155450.0 |
2021 Q3 | 336212.0 | 4729 | 0.3000000 | 2634401 | 2830.6245 | 12622.983 | 155903.7 |
2021 Q4 | 339565.0 | 5723 | 0.3666667 | 2638523 | 2369.8868 | 12585.831 | 147805.7 |
2022 Q1 | 340572.7 | 6447 | 0.3666667 | 2648667 | 2678.7059 | 12696.575 | 151299.7 |
2022 Q2 | 347849.0 | 8245 | 0.3666667 | 2658811 | 2799.3712 | 12905.391 | 160827.7 |
2022 Q3 | 370965.3 | 7681 | 1.3666667 | 2668956 | 2653.8470 | 12979.609 | 170403.7 |
The table below shows the results of an OLS (ordinary least squares) regression, i.e. a simple/multiple linear regression. This is the method used to estimate the coefficients (the Greek letter beta from the equation in section 3), which describe the relationship between the explanatory variables and the dependent variable.
Dependent variable: price | Coefficient (std. error) |
---|---|
COMP | 6.039* (3.542) |
MPI | 6,720.093*** (2,042.728) |
POP | 0.096 (0.258) |
SAVC | -8.919* (4.714) |
DISPC | 21.301** (9.959) |
UNEMP | -0.040 (0.070) |
Constant | -196,485.000 (587,797.800) |
Observations | 47 |
R2 | 0.960 |
Adjusted R2 | 0.954 |
Residual Std. Error | 10,225.460 (df = 40) |
F Statistic | 161.551*** (df = 6; 40) |
Note: | * p<0.1; ** p<0.05; *** p<0.01 |
When interpreting regression coefficients, the results show the change in the dependent variable given a unit change in the explanatory variable, all else equal. For example, the COMP coefficient of 6.039 means that one additional completion in a quarter is associated with a mean house price roughly €6 higher, holding the other variables fixed. The asterisks indicate statistical significance. p<0.05 is the level typically considered significant (95% confidence) in the social sciences, with lower p-values indicating significance at a higher confidence level.
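To make the link between the bracketed figures and the asterisks concrete, the sketch below recomputes the significance stars for the baseline specification. It assumes the bracketed figures are standard errors and uses standard two-sided critical values of Student's t for 40 degrees of freedom (the residual degrees of freedom reported in the table). The function and variable names are mine, not the author's.

```python
# Two-sided critical values of Student's t with 40 degrees of freedom
# for the 1%, 5% and 10% levels (standard table values).
CRITICAL_T_DF40 = [(2.704, "***"), (2.021, "**"), (1.684, "*")]


def significance_stars(coefficient, std_error, critical_values=CRITICAL_T_DF40):
    """Return the star label for the absolute t statistic coefficient / std_error."""
    t_stat = abs(coefficient / std_error)
    for threshold, label in critical_values:
        if t_stat >= threshold:
            return label
    return ""


# Coefficients and (assumed) standard errors copied from the baseline table.
baseline = {
    "COMP": (6.039, 3.542),
    "MPI": (6720.093, 2042.728),
    "POP": (0.096, 0.258),
    "SAVC": (-8.919, 4.714),
    "DISPC": (21.301, 9.959),
    "UNEMP": (-0.040, 0.070),
    "Constant": (-196485.000, 587797.800),
}

stars = {name: significance_stars(coef, se) for name, (coef, se) in baseline.items()}
for name, (coef, se) in baseline.items():
    print(f"{name:8s} t = {coef / se:6.2f} {stars[name]}")
```

The stars recovered this way match those printed in the table, which supports reading the bracketed figures as standard errors.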
studentized Breusch-Pagan test
data: BASE
BP = 7.0743, df = 6, p-value = 0.314
From the above there is a clear multicollinearity problem, meaning that several of the independent variables in the model are correlated with each other. As a result disposable income (DISPC) and population (POP), which are above the threshold, will be dropped. See Figure 3.18: disposable income and population are highly correlated with each other and with unemployment.
Inspecting the plot of residuals, heteroscedasticity cannot be ruled out by eye. Heteroscedasticity occurs when the variance of the error term is not constant, for example when it changes over time or with the level of an explanatory variable.
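The text does not name the diagnostic behind "the threshold". One widely used measure, which I am assuming here purely for illustration, is the variance inflation factor (VIF). For explanatory variable j it is defined from the R-squared obtained by regressing that variable on all the other explanatory variables:

\[ \operatorname{VIF}_{j} = \frac{1}{1 - R_{j}^{2}} \]

A variable that the others explain almost perfectly has an R-squared close to 1 and therefore a very large VIF. Common rules of thumb flag values above 5 or 10, although which cut-off was applied in this document is not stated.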
The interpretation of the Breusch-Pagan test for heteroscedasticity is that due to the p-value being insignificant (i.e., >0.05), we do not reject the null hypothesis of homoscedasticity. This double negative can be confusing, but in simple terms the test could be said to have passed.
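The reported p-value can be checked directly from the test statistic, since the Breusch-Pagan statistic is compared against a chi-square distribution with the stated degrees of freedom. The sketch below uses a closed-form expression for the upper tail that only holds for even degrees of freedom, which happens to cover both tests in this document. The function name is hypothetical.

```python
import math


def chi2_upper_tail_even_df(statistic, df):
    """P(chi-square with `df` degrees of freedom > statistic); even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = statistic / 2
    # For df = 2k the upper tail is exp(-x/2) * sum_{i<k} (x/2)^i / i!
    series = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * series


# Statistic and degrees of freedom as printed in the baseline test output.
p_base = chi2_upper_tail_even_df(7.0743, 6)
print(round(p_base, 3))  # 0.314
```

Applying the same function to the Specification 2 output (BP = 4.3213, df = 4) gives approximately 0.3643, again matching the reported value.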
Dependent variable: price | Coefficient (std. error) |
---|---|
COMP | 13.455*** (2.387) |
MPI | 5,674.707** (2,202.194) |
SAVC | 4.172* (2.445) |
UNEMP | -0.200*** (0.054) |
Constant | 263,042.700*** (21,501.430) |
Observations | 47 |
R2 | 0.950 |
Adjusted R2 | 0.945 |
Residual Std. Error | 11,237.460 (df = 42) |
F Statistic | 198.425*** (df = 4; 42) |
Note: | * p<0.1; ** p<0.05; *** p<0.01 |
studentized Breusch-Pagan test
data: SPEC2
BP = 4.3213, df = 4, p-value = 0.3643
As we can see Specification 2 is suitable.
In general, log specifications are used to interpret price elasticities. When both the dependent variable and an explanatory variable are in logs, the coefficient can be interpreted as the approximate percentage change in the dependent variable for a 1% change in that explanatory variable.4
Dependent variable: log(price) | Coefficient (std. error) |
---|---|
log(COMP) | 0.209*** (0.039) |
MPI | 0.012 (0.009) |
log(SAVC) | 0.027 (0.017) |
log(UNEMP) | -0.095 (0.076) |
Constant | 11.788*** (1.268) |
Observations | 47 |
R2 | 0.956 |
Adjusted R2 | 0.952 |
Residual Std. Error | 0.040 (df = 42) |
F Statistic | 227.508*** (df = 4; 42) |
Note: | * p<0.1; ** p<0.05; *** p<0.01 |
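To show how the log specification coefficients translate into percentages, here is a small sketch using two coefficients from the table above. The helper names are hypothetical. The first function covers a logged explanatory variable (an elasticity); the second covers a variable left in levels, such as MPI, where the coefficient multiplied by 100 approximates the percentage change in price per unit change.

```python
import math


def pct_change_log_log(elasticity, pct_change_x):
    """Exact % change in y for a % change in x when log(y) = a + elasticity * log(x)."""
    return ((1 + pct_change_x / 100) ** elasticity - 1) * 100


def pct_change_log_level(coefficient, unit_change_x=1.0):
    """Exact % change in y for a unit change in x when log(y) = a + coefficient * x."""
    return (math.exp(coefficient * unit_change_x) - 1) * 100


# log(COMP) coefficient of 0.209: effect of 10% more completions on price.
comp_effect = pct_change_log_log(0.209, 10)
# MPI coefficient of 0.012 (not statistically significant in this specification).
mpi_effect = pct_change_log_level(0.012)
print(round(comp_effect, 2), round(mpi_effect, 2))  # 2.01 1.21
```

So a 10% rise in completions is associated with prices about 2% higher in this specification, close to the quick approximation of 0.209 times 10.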
As we can see there is a multicollinearity problem. From our earlier assertion (Figure 3.12), completions alone may not be representative of supply. The supply ratio for houses (SRH) may be a better measure, as discussed from Figure 1.1 onwards. So before removing the log of unemployment from the model I will include SRH.
The SRH is the ratio for a given quarter, not the moving average, in order to preserve observations given that the completions data begin in 2011. SRH.6 and SRH.12 are the 6 and 12 month moving averages, which necessarily drop some observations (leaving 46 and 44 respectively, against 47 for SRH).
Dependent variable: log(price) | (1) | (2) | (3) |
---|---|---|---|
SRH | 0.041 (0.051) | | |
SRH.6 | | 0.035 (0.078) | |
SRH.12 | | | -0.082 (0.128) |
MPI | 0.040*** (0.009) | 0.032*** (0.011) | 0.014 (0.016) |
log(SAVC) | 0.015 (0.022) | 0.013 (0.025) | 0.027 (0.029) |
log(UNEMP) | -0.464*** (0.038) | -0.483*** (0.040) | -0.533*** (0.049) |
Constant | 18.079*** (0.583) | 18.333*** (0.602) | 18.939*** (0.705) |
Observations | 47 | 46 | 44 |
R2 | 0.927 | 0.929 | 0.931 |
Adjusted R2 | 0.920 | 0.922 | 0.924 |
Residual Std. Error | 0.052 (df = 42) | 0.052 (df = 41) | 0.051 (df = 39) |
F Statistic | 132.820*** (df = 4; 42) | 133.748*** (df = 4; 41) | 132.432*** (df = 4; 39) |
Note: | * p<0.1; ** p<0.05; *** p<0.01 | | |
“All models are wrong but some are useful.” - George Box, Statistician.
Using the tslm() function to fit a linear model with time series components produces the fitted values.
The deviation from the fitted value can be read as prices being above or below “equilibrium”, as defined by the model specification. It does not mean that prices will return to this equilibrium, or that the fitted value is optimal.
Figure 4.1: The fitted model closely tracks the actual house price value. There has been a divergence in the latest period.
Clear shortfalls of the model:
In the latest period, on the preferred specification, actual house prices are 13% above the fitted value, i.e. over-valued by 13%.
Figure 4.2: Expressing the difference between the actual and fitted values as a percentage of the fitted values shows that the actual price varies between being ‘under’ and ‘over’ valued across the sample period. This is what we would expect; however, whether this is representative of the market dynamic remains to be seen.
The very next step is to develop various other specifications and analyse the results over the same time horizon. Using proxies for some of the data series which date back further than 2012 should give some insight into the responsiveness of house prices to credit conditions in different economic cycles.
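The calculation behind this kind of chart is simple enough to sketch. The function name and the numbers below are hypothetical and are not the model's fitted values; they are chosen so that the last point reproduces a 13% gap.

```python
def pct_deviation_from_fitted(actual, fitted):
    """Gap between actual and fitted values, as a percentage of the fitted value."""
    if len(actual) != len(fitted):
        raise ValueError("actual and fitted must have the same length")
    return [100 * (a - f) / f for a, f in zip(actual, fitted)]


# Hypothetical index values, not output from the model in this document.
actual_prices = [100.0, 95.0, 113.0]
fitted_prices = [100.0, 100.0, 100.0]
deviations = pct_deviation_from_fitted(actual_prices, fitted_prices)
print(deviations)  # [0.0, -5.0, 13.0]
```

Positive values correspond to ‘over’ valuation relative to the specification and negative values to ‘under’ valuation.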
A model I was working on which I did not circulate.↩︎
A dependent variable is the key variable of interest in an econometric model. We want to explain the dependent variable given a series of well specified independent (explanatory) variables↩︎
The relationship between a series and its lags↩︎
A log transformation would also be commonly applied when dealing with heteroscedasticity↩︎