The purpose of this document is to present the findings of an empirical model of the determinants of house prices. I have formatted the document to be readable for a general audience without leaving out the key statistical operations. Any meaningful analysis will require a suite of models, and this document serves as the baseline. It is intended as a worked econometric example, not a crash course in econometrics.


Unreliability: Econometric approaches have been unable to reliably detect market disequilibrium; some examples can be seen in section 2. For that reason, it would be a stretch to expect the first iterations of such a model, developed as an introduction for discussion, to be especially insightful or reliable.

While a simpler model specification is generally preferred over a complicated one, the models in this document almost certainly lack the complexity needed to accurately predict disequilibrium. The preferred specification here puts Irish house prices 13% above the model prediction. See Figure 4.1.

Going forward, an improved understanding is required of why models are useful on the one hand, yet have both clear and not so clear limitations on the other. One step is to move closer to the specification developed by McQuinn at the ESRI, as discussed in their Autumn Economic Outlook, where the model suggested Irish house prices were over-valued by approximately 7% at that time. Another priority is applying time series methods to improve predictions and forecasting.

1 Prerequisites

In Model 1 (see note 1), the ratio of completions to planning was insignificant. The main shortcoming of that model was that it was yearly and did not have enough observations. It does not seem plausible that a supply ratio is insignificant. Other specification issues aside, and prior to running another model, I have dug deeper into the relationship between house prices and the completions to planning ratio, hereafter the supply ratio.

Figure 1.1: Supply Ratio - Clear disparity between houses & apartments

Figure 1.2: The resulting balance of units from the supply ratio

Viability issues suppress the supply of apartments. A consequence is that the supply ratio does not reflect the true tightness in the market: the relatively high quantity of apartment stock in the pipeline seen in Figure 1.2 is illusory.

1.1 Supply Ratio & Prices

What is the relationship between the tightness of supply for houses and the price of residential property?

The 12 month moving average (mean) was used to answer this question. The dataset, accessed through PxStat on the CSO website, has the code HPM08.

Figure 1.3: The correlation between prices and the 12 month moving average supply ratio for houses

Correlation is measured on a scale from -1 to +1. The correlation (R value) between property prices and the supply ratio can be seen in Figure 1.3. The rule of thumb applied regularly in economics and the behavioural sciences is that an absolute value above 0.7 is regarded as a high correlation. Given that the correlation is positive, tighter supply is associated with higher prices, exactly as we would expect.
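As a minimal sketch of how a correlation coefficient of this kind is computed, the snippet below implements Pearson's r from its definition. Python is used purely for illustration, and the two short series are hypothetical values, not the CSO data behind Figure 1.3.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical illustrative values (not the CSO series).
supply_ratio = [0.40, 0.45, 0.55, 0.60, 0.70]
price_index = [100.0, 104.0, 109.0, 112.0, 120.0]

r = pearson_r(supply_ratio, price_index)
print(f"r = {r:.3f}; high by the 0.7 rule of thumb: {abs(r) > 0.7}")
```

The rule of thumb is applied to the absolute value, so a strongly negative correlation would also count as high.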

The supply ratio used here is the 12 month moving average in a given quarter. I have labelled the quarter of the first and last three observations; recall Figure 1.1 for the yearly change in the ratio.

1.2 Other Considerations

Measuring affordability in an empirical model is more difficult than might be assumed. An appropriate explanatory variable should not in and of itself be a function of the dependent variable (see note 2). Mortgage repayments, a possible explanatory variable, are a function of house prices, the dependent variable, and so are problematic as a measure of affordability.
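To illustrate why repayments depend mechanically on prices, the sketch below uses the standard annuity formula for a fixed-rate mortgage. The loan-to-value ratio, interest rate and term are hypothetical parameters chosen only for illustration; Python is used as a convenient notation.

```python
def monthly_repayment(price, ltv=0.9, annual_rate=0.04, years=30):
    """Monthly repayment on a standard annuity mortgage.

    ltv, annual_rate and years are hypothetical illustrative values.
    """
    principal = price * ltv
    r = annual_rate / 12          # monthly interest rate
    n = years * 12                # number of monthly payments
    if r == 0:
        return principal / n
    return principal * r / (1 - (1 + r) ** -n)

for price in (200_000, 300_000, 400_000):
    print(price, round(monthly_repayment(price), 2))
```

Holding the loan terms fixed, the repayment scales one for one with the price, so using it as a regressor would place a function of the dependent variable on the right hand side.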

2 Literature

By no means an exhaustive view of the literature

“In general, the choice of a house price model and its empirical estimation is very much influenced by the quality and availability of data” - Bragoudakis, Bank of Greece

Bragoudakis et al (2016)

When housing is treated as a consumption good, its demand is generally a function of a number of variables such as:

  • Household income
  • Interest rates
  • Financial wealth or demographic and labour market factors

As an investment good:

  • Risk-free rate or alternative investment returns
  • Housing stock premium
  • Housing services - generally proxied by the rental yield of a property

Studies of credit in house price forecasting point to mixed results:

  • Generally credit availability is found to have a strong positive effect on house prices
    • Not straightforward: credit is statistically insignificant in the short and medium run, but remains a key determinant of house prices in the long run (Annett, 2005).
    • Multi-directional causality between house prices and credit (Goodhart and Hofmann), i.e. money growth has a significant effect on house prices and credit, credit influences money and house prices, and house prices influence both credit and money
    • Some evidence that the impact of private credit on house prices is stronger in a boom
    • Credit and money aggregates lead developments in house prices
    • Asymmetric effects through the cycle:
      • Demographic variables
      • Unemployment rate
      • Disposable income
      • Debt-to-income ratio

Supply-side factors that are useful in forecasting house prices are primarily real construction costs and construction technology shocks.

Price momentum: price is affected by its lagged value. Price momentum is present in the short run and reverses in the long run.

  • Fundamental variables alone are typically not enough to explain house prices
  • Theoretical models may explain momentum:
    • Irrational exuberance and unrealistic expectations.
    • Risk-shifting behaviour by banks (Allen and Gale, 2000)
    • Procyclical behaviour
    • Down payment constraints in sellers' reservation prices (Stein, 1995)
  • The autocorrelation structure (see note 3) is typically found to be market specific and to differ across countries
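As a rough sketch of what is meant by the relationship between a series and its lags, the snippet below computes the sample autocorrelation at a few lags. It is written in Python for illustration and uses only the first eight quarterly prices from Table 3.1, so the numbers it prints are indicative at best and are not a result from the literature.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at a given positive lag."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean)
              for t in range(lag, n))
    return num / denom

# First eight quarterly prices from Table 3.1 (2011 Q1 to 2012 Q4).
prices = [246445.3, 227540.7, 228576.0, 212705.3,
          201048.7, 196781.7, 213900.0, 199807.0]

for lag in (1, 2, 4):
    print(lag, round(autocorrelation(prices, lag), 3))
```

Positive values at short lags would be in line with momentum, while negative values at longer lags would be in line with reversal; a sample this short cannot establish either.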

Sunega et al (2014)

Econometric models have produced contradictory results and have failed to provide warning of housing market crashes. The standard econometric approaches have been unable to reliably detect market disequilibrium - Sunega et al (2014)

Studies published up to 2007 tended to conclude that house prices for the most part were not too far from their fundamental / equilibrium value. Contemporaneously, similar models were reaching the opposite conclusion. The authors provide two distinct but well specified models which produce different outcomes, even when using the same data. The authors attribute this variation largely to the interest rate variable.

Examples of such models in the literature using Irish data are McQuinn & O’Reilly (2006), Central Bank of Ireland; McQuinn (2017), ESRI; and Roche, ESRI.

Whittle (2014)

Some behavioural explanations which may affect house price bubbles.

Herd Behaviour

  • Morone & Samanidou (2008) showed an individual making a decision is likely to override the private information they hold, to conform to a popular trend of thinking.

Amateurs vs Experts

  • Behavioural biases impact negotiation of a selling price. Asking prices, however erroneous, will affect a buyer's judgement (Diaz & Black, 1996). Individuals do not use sound economic valuations, and overemphasize the potentially arbitrary reference point (asking price).
  • Amateur investors are quicker to admit to biases, whereas experts are more likely to continue to endorse a flawed estimation.

Anchoring, Loss Aversion and Endowment Bias

  • Disposition effect: “Investors are risk averse when in profit and risk loving when in a loss” - DeWeaver & Shannon, 2010. Consider the money illusion, where inflation erodes the real value of nominal prices. Home owners are unlikely to sell their property at a nominal loss but would sell at a nominal gain, even if that would be a loss in real terms.
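A small worked example may make the money illusion point concrete. The figures below are hypothetical, and Python is used only as a calculator: a sale that shows a nominal gain can still be a loss once the sale price is deflated back to purchase-date prices.

```python
def real_gain(purchase_price, sale_price, cumulative_inflation):
    """Gain or loss on a sale expressed in purchase-date prices."""
    real_sale_price = sale_price / (1 + cumulative_inflation)
    return real_sale_price - purchase_price

# Hypothetical figures: bought for 300,000, sold for 315,000
# after 10% cumulative inflation over the holding period.
nominal = 315_000 - 300_000
real = real_gain(300_000, 315_000, 0.10)
print(f"nominal gain: {nominal}, real gain: {real:.0f}")
```

With these assumed numbers the owner books a nominal gain while the real gain is negative, which is the situation described above.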

3 Linear Regression Model

Considering the findings in the literature discussed in the last section, it is necessary to take care when interpreting the results.

3.1 Baseline Specification

There will be various specifications of the following model:

\[ \operatorname{P} = \alpha + \beta_{1}(\operatorname{COMP}) + \beta_{2}(\operatorname{MPI}) + \beta_{3}(\operatorname{POP}) + \beta_{4}(\operatorname{SAVC}) + \beta_{5}(\operatorname{DISPC}) + \beta_{6}(\operatorname{UNEMP}) + \epsilon \]

Our dependent variable is P, which is house prices. It is explained to some degree by the variables on the right hand side of the equation, each with a coefficient represented by the Greek letter beta. The alpha is known as the intercept, and the epsilon at the end of the equation is the error term. The variables will be tested to see if they are appropriate.
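To give a feel for how alpha and beta are estimated, here is a minimal one-regressor sketch of ordinary least squares, the method used in section 3.4. It is written in Python for illustration and uses only the first six quarters of COMP and price from Table 3.1, so its estimates are not expected to match the multi-variable results reported later.

```python
def simple_ols(x, y):
    """Estimate intercept and slope of y = alpha + beta * x by least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta

# First six quarters of Table 3.1: completions (COMP) and price.
comp = [2205, 1900, 1682, 1368, 1336, 1180]
price = [246445.3, 227540.7, 228576.0, 212705.3, 201048.7, 196781.7]

alpha, beta = simple_ols(comp, price)
print(f"alpha = {alpha:.1f}, beta = {beta:.2f}")
```

Here beta is the estimated change in price for one additional completion in this small subsample; the full model estimates one such coefficient per explanatory variable, holding the others fixed.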

3.1.1 House Prices

Figure 3.1: All property types included, that is a blend of new and secondhand homes

Naturally HPM05, which is the mean value by month, is not as smooth as the 12 month moving average used previously in Figure 1.3. This will be the house price series applied in the model.

The explanatory variables are detailed in the following subsections in order of appearance.

3.2 Variables

3.2.1 Completions

COMP

Dataset: NDQ01 which is the number of new dwelling completions by type of house.

Figure 3.2: New Dwelling Completions for Houses and Apartments - Seasonally Adjusted

3.2.2 Mortgage Interest

MPI

Dataset: CPM03 which is the consumer price index by sub indices. Mortgage Interest is the filtered sub index.

Figure 3.3: This represents the average monthly percentage change in mortgage interest paid in a given quarter

Figure 3.4: This represents the average yearly percentage change in mortgage interest paid in a given quarter
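One way the quarterly MPI series could be built from the monthly sub index is sketched below in Python: take month-on-month percentage changes and average them within each quarter. This is an assumption about the construction based on the caption of Figure 3.3, and the index levels are hypothetical rather than CPM03 values.

```python
def pct_changes(index):
    """Month-on-month percentage changes of an index series."""
    return [(curr / prev - 1) * 100 for prev, curr in zip(index, index[1:])]

def quarterly_mean(monthly_values):
    """Average consecutive blocks of three monthly values."""
    if len(monthly_values) % 3 != 0:
        raise ValueError("expected a whole number of quarters")
    return [sum(monthly_values[i:i + 3]) / 3
            for i in range(0, len(monthly_values), 3)]

# Hypothetical sub-index levels: December of the prior year, then six months.
index = [100.0, 102.0, 104.04, 106.1208, 106.1208, 105.059592, 105.059592]
changes = pct_changes(index)          # six monthly changes
print([round(q, 2) for q in quarterly_mean(changes)])
```

The MPI values in Table 3.1 end in repeating thirds, which is at least consistent with an average of three monthly changes.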

3.2.3 Population

POP

Dataset: PEA01, which is the estimated population in April of a given year. This was interpolated and can be seen below.
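The interpolation method is not stated here. As a minimal sketch, assuming simple linear interpolation between consecutive annual estimates, the Python snippet below fills in three evenly spaced quarterly values between each pair of yearly observations. The input figures are hypothetical and the actual series may have been built differently.

```python
def interpolate_quarterly(annual):
    """Linearly interpolate annual observations to quarterly frequency.

    `annual` holds values observed once a year at the same point in the
    year; three evenly spaced values are filled in between each pair.
    """
    quarterly = []
    for start, end in zip(annual, annual[1:]):
        step = (end - start) / 4
        quarterly.extend(start + step * k for k in range(4))
    quarterly.append(annual[-1])
    return quarterly

# Hypothetical April estimates for three consecutive years.
april_estimates = [2_480_000, 2_488_000, 2_490_000]
print(interpolate_quarterly(april_estimates))
```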

Figure 3.5: Demographic variable consists of the age group typically in a position to purchase a home

Figure 3.6: Where necessary this series will be used to generate per capita values

3.2.4 Savings

SAVC

Dataset: ISQ03 which is the Quarterly Accounts at Current Market Prices Seasonally Adjusted, Institutional Sector filtered to “Households including NPISH (S.14+S.15)”. The Current Account variable was filtered to “Gross saving (B.8g)”.

Figure 3.7: Series must be adjusted to per capita basis

Figure 3.8: Value expressed in euro terms, denominator (population) 25 to 64 years
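The per capita adjustment can be sketched as follows in Python. The aggregate figure and its unit (millions of euro) are assumptions inferred from Table 3.1, where the 2011 Q1 values of SAVC and POP are consistent with aggregate gross saving of roughly 2,166 million; that aggregate is not stated in this document.

```python
def per_capita(aggregate_millions, population):
    """Convert an aggregate in millions of euro to euro per person."""
    return aggregate_millions * 1_000_000 / population

# Population aged 25 to 64 in 2011 Q1, taken from Table 3.1.
pop_2011q1 = 2_473_920
# Assumed aggregate gross saving in EUR million (inferred, not from the source).
gross_saving_millions = 2_166

print(round(per_capita(gross_saving_millions, pop_2011q1), 2))
```

The same division by the population aged 25 to 64 would apply to disposable income.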

3.2.5 Disposable Income

DISPC

Dataset: ISQ03. The Current Account variable was filtered to “Gross disposable income (B.6g)”. The population between 25 & 64 was used, as per the POP variable previously.

Figure 3.9: Series must be adjusted to per capita basis

Figure 3.10: Value expressed in euro terms, denominator (population) 25 to 64 years

3.2.6 Unemployment

UNEMP

Dataset: LRM11, which is the number of people on the live register. The population between 25 & 64 was used, as per the POP variable previously.

Figure 3.11: Series must be adjusted to per capita basis

3.3 Correlation

3.3.1 Dependent & Independent

Figure 3.12: High correlation between completions and house prices, with higher prices associated with higher completions. On the face of it this may be counterintuitive, given that supply constraints are usually associated with higher prices. Given how far prices and completions fell from their peak, which was out of sample, this relationship is less surprising. Completions alone may not be the best metric of housing supply.

Figure 3.13: Low to moderate correlation between the change in mortgage interest payments and house prices. The largest increases took place in 2011, while house prices were in decline post GFC. These increases immediately preceded the ECB cutting rates toward zero and later going negative. Throughout the sample the changes clustered around zero and below. The low rate environment of the last decade drove this trend.

Figure 3.14: Strong correlation between house prices and estimated population

Figure 3.15: The correlation is slightly short of the 0.7 threshold for a strong correlation, which may (again) seem surprising on first inspection. However, the 10 highest observations for savings per capita all came post pandemic, which as we are well aware was ‘unprecedented’.

Figure 3.16: Strong positive correlation, as would be expected. As previously noted, the top 10 observations came post pandemic.

Figure 3.17: Strong negative correlation between house prices and persons on the live register

Recall Figure 3.8

3.3.2 Independent & Independent

Figure 3.18: The above matrix is separated into three portions. The diagonal holds density plots, which illustrate the density of observations for each of the explanatory (independent) variables. The diagonal separates the correlation value for each pair, in the top right of the plot, from the scatter plots in the bottom left.

3.4 Results

In keeping with the objective of having this document readable for a general audience, the statistical explanations will not be exhaustive. Here is a useful quote from the book Statistics Done Wrong by Alex Reinhart: “If you want to prove that your drug works, you do so by showing the data is inconsistent with the drug not working… Remember, p is a measure of surprise, with a smaller value suggesting that you should be more surprised”.

3.4.1 Base Specification

Table 3.1: Model Dataframe
Year_Q price COMP MPI POP SAVC DISPC UNEMP
2011 Q1 246445.3 2205 2.2333333 2473920 875.5337 8713.299 360836.3
2011 Q2 227540.7 1900 1.2666667 2479066 733.7441 8577.424 363691.0
2011 Q3 228576.0 1682 2.0333333 2484212 828.0290 8519.803 372824.0
2011 Q4 212705.3 1368 -0.9666667 2487238 1006.3371 8667.043 355724.0
2012 Q1 201048.7 1336 -2.9333333 2487758 1152.0414 8798.686 362753.0
2012 Q2 196781.7 1180 -0.4333333 2488277 924.3343 8650.965 363538.3
2012 Q3 213900.0 1198 -1.8666667 2488797 978.7862 8830.773 369714.3
2012 Q4 199807.0 1221 0.0333333 2489018 1042.9818 8822.357 351946.7
2013 Q1 190612.0 1067 -0.1666667 2488820 813.2369 8700.510 359439.7
2013 Q2 198411.0 1176 -0.7000000 2488622 840.2241 8702.809 356688.7
2013 Q3 210688.3 1023 -0.8333333 2488424 759.9188 8712.343 358086.7
2013 Q4 216347.7 1272 -1.2333333 2488569 945.1214 8898.286 333687.3
2014 Q1 197728.0 1260 -1.0333333 2489287 844.4186 8792.077 336137.0
2014 Q2 210135.0 1399 -0.4666667 2490004 626.9066 8769.463 332753.7
2014 Q3 229335.3 1389 -0.8333333 2490722 609.8634 8829.168 331164.7
2014 Q4 217916.7 1451 -1.3666667 2492272 682.1085 9074.049 306164.0
2015 Q1 213758.0 1597 -0.1000000 2495489 804.6520 9145.303 305634.0
2015 Q2 218073.3 1642 -1.0666667 2498705 915.2741 9307.621 300837.0
2015 Q3 240377.0 2003 -0.3666667 2501922 826.1650 9334.025 303140.3
2015 Q4 231720.3 1920 -0.8000000 2506026 636.4659 9260.080 279733.0
2016 Q1 236915.7 2350 -0.4000000 2512286 599.0561 9481.008 279483.7
2016 Q2 242098.3 2406 -0.6000000 2518545 698.8161 9559.883 271309.7
2016 Q3 257509.0 2436 -0.4333333 2524805 992.5518 9782.933 269092.3
2016 Q4 257258.0 2584 0.0666667 2530818 698.5885 9679.086 244388.7
2017 Q1 261512.7 3091 -0.0666667 2536088 1022.8353 10014.638 241822.0
2017 Q2 265434.3 3520 0.0000000 2541358 1065.1787 10048.172 234378.3
2017 Q3 281928.3 3668 0.0333333 2546628 1196.8769 10243.350 230189.7
2017 Q4 281432.7 3909 -0.2666667 2551764 1036.5379 10304.637 209970.0
2018 Q1 285160.3 4202 -0.1000000 2556391 899.7060 10267.210 209595.7
2018 Q2 286976.3 4363 0.2000000 2561018 1028.1070 10485.676 200321.7
2018 Q3 301115.0 4566 0.1000000 2565644 1137.3361 10621.114 197014.0
2018 Q4 290483.0 4621 0.2333333 2570684 1111.3776 10739.556 178063.0
2019 Q1 289525.0 4799 0.3666667 2577785 1165.3418 11009.842 175591.0
2019 Q2 289774.0 5192 0.1000000 2584886 1194.6370 11049.232 172749.7
2019 Q3 303818.0 5633 0.4000000 2591987 1197.9229 11047.511 174656.3
2019 Q4 298901.0 5402 0.2333333 2598952 1265.5103 11252.999 161420.0
2020 Q1 291812.3 5618 0.2666667 2604964 2693.3195 12109.573 169620.0
2020 Q2 283920.7 3488 0.1000000 2610976 4091.5730 11777.204 192160.0
2020 Q3 295987.3 5079 0.3666667 2616989 2481.8602 11437.573 196362.0
2020 Q4 310155.3 6092 -0.0333333 2622822 2894.5925 11694.276 172666.3
2021 Q1 311370.0 4753 0.2333333 2626681 4006.5764 12253.865 165420.0
2021 Q2 315762.0 5133 0.2666667 2630541 3040.8191 12344.608 155450.0
2021 Q3 336212.0 4729 0.3000000 2634401 2830.6245 12622.983 155903.7
2021 Q4 339565.0 5723 0.3666667 2638523 2369.8868 12585.831 147805.7
2022 Q1 340572.7 6447 0.3666667 2648667 2678.7059 12696.575 151299.7
2022 Q2 347849.0 8245 0.3666667 2658811 2799.3712 12905.391 160827.7
2022 Q3 370965.3 7681 1.3666667 2668956 2653.8470 12979.609 170403.7


The table below shows the results of an OLS (ordinary least squares) regression, i.e. a simple or multiple linear regression. This is the method used to estimate the coefficients (the Greek letter beta from the equation in section 3), which describe the relationship between the explanatory variables and the dependent variable.

Base Specification - Results
Dependent variable:
price
COMP 6.039*
(3.542)
MPI 6,720.093***
(2,042.728)
POP 0.096
(0.258)
SAVC -8.919*
(4.714)
DISPC 21.301**
(9.959)
UNEMP -0.040
(0.070)
Constant -196,485.000
(587,797.800)
Observations 47
R2 0.960
Adjusted R2 0.954
Residual Std. Error 10,225.460 (df = 40)
F Statistic 161.551*** (df = 6; 40)
Note: *p<0.1; **p<0.05; ***p<0.01


When interpreting regression coefficients, each result shows the change in the dependent variable given a unit change in the explanatory variable, all else equal. The asterisks indicate statistical significance. p<0.05 is the level typically considered significant (95% confidence) in the social sciences, with lower p-values indicating significance at a higher confidence level.
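The asterisks can be checked by hand: each t-statistic is the coefficient divided by its standard error (the figure in brackets), compared against critical values of the t distribution. The Python sketch below does this for the Base Specification. The critical values are the usual two-sided tabulated values for 40 degrees of freedom and are not taken from this document.

```python
# Coefficients and standard errors from the Base Specification table.
base_results = {
    "COMP": (6.039, 3.542),
    "MPI": (6720.093, 2042.728),
    "POP": (0.096, 0.258),
    "SAVC": (-8.919, 4.714),
    "DISPC": (21.301, 9.959),
    "UNEMP": (-0.040, 0.070),
    "Constant": (-196485.000, 587797.800),
}

# Two-sided critical values of the t distribution with 40 degrees of
# freedom, from standard statistical tables (not from the source).
CRITICAL_VALUES = [(2.704, "***"), (2.021, "**"), (1.684, "*")]

def stars(coefficient, std_error):
    """Return the significance stars implied by the t-statistic."""
    t_stat = abs(coefficient / std_error)
    for critical, label in CRITICAL_VALUES:
        if t_stat > critical:
            return label
    return ""

for name, (coef, se) in base_results.items():
    print(f"{name:<9} t = {coef / se:7.3f} {stars(coef, se)}")
```

For example, COMP has a t-statistic of about 1.7, which clears the 10% threshold but not the 5% threshold, matching the single asterisk in the table.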

3.4.1.1 Tests

[Diagnostic plots 1 to 4 are not reproduced here.]

studentized Breusch-Pagan test

data:  BASE
BP = 7.0743, df = 6, p-value = 0.314

From the above there is a clear multicollinearity problem, meaning that several of the independent variables in the model are correlated with one another. As a result disposable income (DISPC) and population (POP), which are above the threshold, will be dropped. See Figure 3.18: disposable income and population are highly correlated with each other and with unemployment.

Inspecting the plot of residuals, it is not obvious whether there is a heteroscedasticity issue. Heteroscedasticity occurs when the variance of the error term is not constant across observations, for example when it changes over time.
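The diagnostic behind "the threshold" is not named here. Assuming it is the variance inflation factor (VIF), a common choice with rule-of-thumb cut-offs of 5 or 10, the Python sketch below shows the calculation for the simplest case of two regressors, where the auxiliary R-squared is just the squared correlation. It uses the Q1 values of POP and DISPC from Table 3.1 for each year, so it will not reproduce the VIFs of the full model.

```python
def r_squared(x, y):
    """R-squared from regressing y on one regressor x (squared correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

def vif(r2):
    """Variance inflation factor given the auxiliary R-squared r2."""
    return 1 / (1 - r2)

# POP and DISPC in Q1 of each year, 2011 to 2022, from Table 3.1.
pop = [2473920, 2487758, 2488820, 2489287, 2495489, 2512286,
       2536088, 2556391, 2577785, 2604964, 2626681, 2648667]
dispc = [8713.299, 8798.686, 8700.510, 8792.077, 9145.303, 9481.008,
         10014.638, 10267.210, 11009.842, 12109.573, 12253.865, 12696.575]

print(round(vif(r_squared(pop, dispc)), 2))
```

In a model with more than two regressors, the auxiliary R-squared comes from regressing each variable on all of the others.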

The interpretation of the Breusch-Pagan test for heteroscedasticity is that due to the p-value being insignificant (i.e., >0.05), we do not reject the null hypothesis of homoscedasticity. This double negative can be confusing, but in simple terms the test could be said to have passed.
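The reported p-values can be reproduced from the test statistic and its degrees of freedom, since under the null hypothesis the statistic follows a chi-squared distribution. The Python sketch below uses a closed form for the upper tail that holds only for even degrees of freedom, which happens to cover both tests reported in this section.

```python
from math import exp, factorial

def chi2_sf_even_df(statistic, df):
    """Upper-tail probability of a chi-squared variable with even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for even, positive df")
    half = statistic / 2
    return exp(-half) * sum(half ** k / factorial(k) for k in range(df // 2))

# Statistics and degrees of freedom as reported in this section.
for label, bp, df in [("Base", 7.0743, 6), ("Specification 2", 4.3213, 4)]:
    p_value = chi2_sf_even_df(bp, df)
    verdict = "do not reject" if p_value > 0.05 else "reject"
    print(f"{label}: p = {p_value:.4f} -> {verdict} homoscedasticity")
```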

Specification 2 - Results
Dependent variable:
price
COMP 13.455***
(2.387)
MPI 5,674.707**
(2,202.194)
SAVC 4.172*
(2.445)
UNEMP -0.200***
(0.054)
Constant 263,042.700***
(21,501.430)
Observations 47
R2 0.950
Adjusted R2 0.945
Residual Std. Error 11,237.460 (df = 42)
F Statistic 198.425*** (df = 4; 42)
Note: *p<0.1; **p<0.05; ***p<0.01

[Diagnostic plots 1 to 4 are not reproduced here.]

studentized Breusch-Pagan test

data:  SPEC2
BP = 4.3213, df = 4, p-value = 0.3643

As we can see Specification 2 is suitable.

3.4.2 Log Specification

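In general, log specifications are used to interpret price elasticities. When both the dependent variable and an explanatory variable are in logs, the coefficient can be interpreted as the percentage change in the dependent variable associated with a 1% change in the explanatory variable (see note 4).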
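As a worked illustration of this interpretation, the Python sketch below converts two coefficients from the Log Specification 2 table into implied percentage changes in price. The function names are my own; the distinction is that log(COMP) enters in logs, giving an elasticity, while MPI enters in levels, giving a semi-elasticity.

```python
from math import exp, log

def implied_pct_change(elasticity, pct_change_x):
    """Percentage change in y implied by a log-log coefficient."""
    return (exp(elasticity * log(1 + pct_change_x / 100)) - 1) * 100

def semi_elasticity_pct(coefficient, unit_change=1.0):
    """Percentage change in y for a unit change in an unlogged regressor."""
    return (exp(coefficient * unit_change) - 1) * 100

# Coefficients as reported in the Log Specification 2 table.
b_log_comp = 0.209   # log(COMP)
b_mpi = 0.012        # MPI enters in levels

print(round(implied_pct_change(b_log_comp, 1), 3))    # 1% more completions
print(round(implied_pct_change(b_log_comp, 10), 3))   # 10% more completions
print(round(semi_elasticity_pct(b_mpi), 3))           # one-unit rise in MPI
```

For small changes the coefficient can be read directly as a percentage, so a 1% rise in completions is associated with roughly a 0.2% rise in price; for larger changes the exact conversion differs slightly from the simple multiplication.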

Log Specification 2 - Results
Dependent variable:
log(price)
log(COMP) 0.209***
(0.039)
MPI 0.012
(0.009)
log(SAVC) 0.027
(0.017)
log(UNEMP) -0.095
(0.076)
Constant 11.788***
(1.268)
Observations 47
R2 0.956
Adjusted R2 0.952
Residual Std. Error 0.040 (df = 42)
F Statistic 227.508*** (df = 4; 42)
Note: *p<0.1; **p<0.05; ***p<0.01

3.4.2.1 Tests

[Diagnostic plots 1 to 4 are not reproduced here.]

As we can see, there is a multicollinearity problem. As noted earlier (Figure 3.12), completions alone may not be representative of supply. The supply ratio for houses (SRH) may be a better measure, as discussed in section 1.1. So before removing the log of unemployment from the model, I will include SRH.

3.4.3 Specification B

SRH is the ratio for a given quarter, not the moving average. This preserves observations, given that the completions data begin in 2011. SRH.6 and SRH.12 are the 6 and 12 month moving averages, which necessarily drop some observations.
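The loss of observations follows directly from how a trailing moving average is built. In the Python sketch below I assume that, on quarterly data, the 6 and 12 month averages correspond to windows of 2 and 4 quarters; the series itself is a placeholder with the same length as the sample.

```python
def trailing_mean(values, window):
    """Trailing moving average; the first window - 1 points are dropped."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]

# Placeholder quarterly series with the same length as the sample (47).
quarterly_ratio = [float(i) for i in range(47)]

# Assumption: 6 and 12 months correspond to windows of 2 and 4 quarters.
for label, window in [("SRH", 1), ("SRH.6", 2), ("SRH.12", 4)]:
    print(label, len(trailing_mean(quarterly_ratio, window)))
```

Under that assumption the counts are 47, 46 and 44, which agrees with the Observations row of the Specification B results.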

Specification B - Results
Dependent variable: log(price)
                    (1)                      (2)                      (3)
SRH                 0.041
                    (0.051)
SRH.6                                        0.035
                                             (0.078)
SRH.12                                                                -0.082
                                                                      (0.128)
MPI                 0.040***                 0.032***                 0.014
                    (0.009)                  (0.011)                  (0.016)
log(SAVC)           0.015                    0.013                    0.027
                    (0.022)                  (0.025)                  (0.029)
log(UNEMP)          -0.464***                -0.483***                -0.533***
                    (0.038)                  (0.040)                  (0.049)
Constant            18.079***                18.333***                18.939***
                    (0.583)                  (0.602)                  (0.705)
Observations        47                       46                       44
R2                  0.927                    0.929                    0.931
Adjusted R2         0.920                    0.922                    0.924
Residual Std. Error 0.052 (df = 42)          0.052 (df = 41)          0.051 (df = 39)
F Statistic         132.820*** (df = 4; 42)  133.748*** (df = 4; 41)  132.432*** (df = 4; 39)
Note: *p<0.1; **p<0.05; ***p<0.01


4 Findings

4.1 Actual vs Fitted

“All models are wrong but some are useful.” - George Box, Statistician.

The fitted values are produced using the tslm() function, which fits a linear model with time series components.

The deviation from the fitted value can be read as prices being above or below “equilibrium”, as defined by the model specification. It does not mean that prices will return to this equilibrium, or that the fitted value is optimal.

Figure 4.1: The fitted model closely tracks the actual house price value. There has been a divergence in the latest period.


Clear shortfalls of the model:

  • Data availability
  • Lack of complexity of credit variable
  • Not enough consideration for lagged effects


4.2 Variation

In the latest period, under the preferred specification, actual house prices are over-valued by 13%.

Figure 4.2: Expressing the difference between the actual and fitted values as a percentage of the fitted values shows that the actual price varies between being ‘under’ and ‘over’ valued across the sample period. This is what we would expect; however, whether this is representative of the market dynamic remains to be seen.
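The calculation behind this measure is simple enough to sketch in a few lines of Python. The actual and fitted values below are hypothetical and are not the model output; they are chosen only so that the last observation sits 13% above its fitted value.

```python
def pct_deviation(actual, fitted):
    """Deviation of actual from fitted, as a percentage of fitted."""
    return [(a - f) / f * 100 for a, f in zip(actual, fitted)]

# Hypothetical values, not the model output.
actual = [95.0, 100.0, 113.0]
fitted = [100.0, 100.0, 100.0]

for dev in pct_deviation(actual, fitted):
    label = "over" if dev > 0 else "under" if dev < 0 else "at"
    print(f"{dev:+.1f}% ({label} fitted value)")
```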


5 Going Forward

The very next step is to develop various other specifications and analyse the results over the same time horizon. Using proxies for some of the data series, which date back further than 2012, should give some insight into the responsiveness of house prices to credit conditions in different economic cycles.


  1. A model I was working on which I did not circulate.

  2. The dependent variable is the key variable of interest in an econometric model. We want to explain the dependent variable given a series of well specified independent (explanatory) variables.

  3. The relationship between a series and its lags.

  4. A log transformation is also commonly applied when dealing with heteroscedasticity.