1. Introduction

In this report, the GLARMA method is applied to four explanatory variables in order to relate them to the number of houses sold by the Stirling Ackroyd real estate company between the \(1^{st}\) of January 2007 and the \(30^{th}\) of June 2016. One model was fitted for each variable, and an additional model including all variables was also fitted to the dataset. The variables used here were: Gold Price, Dollar to Sterling, Real Estate Index and Average House Price.

2. Data Manipulation

The original data were reduced to 114 observations (to match the period of interest), and all the variables were combined into a matrix made up of a column of ones, the four explanatory variables, an annual sine and an annual cosine column, and a semiannual sine and a semiannual cosine column, resulting in a 114x9 matrix. The annual and semiannual columns are control variables whose function is to capture seasonality, if any.
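The construction described above can be sketched as follows. This is a minimal illustration, not the code used in the report: it assumes monthly observations (114 months between January 2007 and June 2016), the function name `design_matrix` and the column order are hypothetical, and the covariate values are zero placeholders rather than the real data.

```python
import math

def design_matrix(covariates, period=12):
    """Return rows [1, covariates..., annual sin, annual cos,
    semiannual sin, semiannual cos] for t = 0, 1, ..., n-1.

    `period` is the number of observations per year (12 assumes monthly data).
    """
    rows = []
    for t, row in enumerate(covariates):
        annual = 2.0 * math.pi * t / period            # one cycle per year
        semiannual = 2.0 * math.pi * t / (period / 2)  # two cycles per year
        rows.append(
            [1.0]
            + list(row)
            + [math.sin(annual), math.cos(annual),
               math.sin(semiannual), math.cos(semiannual)]
        )
    return rows

# Placeholder covariates (all zeros), NOT the report's data:
# 114 observations of 4 explanatory variables.
placeholder = [[0.0, 0.0, 0.0, 0.0] for _ in range(114)]
X = design_matrix(placeholder)
print(len(X), len(X[0]))  # 114 9
```

The nine columns are the column of ones, the four explanatory variables and the four harmonic terms, which matches the 114x9 dimension stated above.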

3. Distribution Fitting

The GLARMA methodology itself is explained in the next section. GLARMA models are usually used for count distributions (such as the Poisson and the Negative Binomial) whose observations are not independent. Hence, the first step in fitting a GLARMA model is to check which distribution best fits the response variable, which can be done with a skewness-kurtosis plot, whose summary statistics are shown below:

## summary statistics
## ------
## min:  2   max:  43 
## median:  11 
## mean:  13.71053 
## estimated sd:  9.019864 
## estimated skewness:  1.335941 
## estimated kurtosis:  4.38887
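As a rough illustration of the quantities in this summary, the sketch below computes the mean, variance, skewness, kurtosis and the variance-to-mean ratio of a count series. The function name `describe_counts` is hypothetical, and the skewness and kurtosis are plain moment estimators, so they may differ slightly from bias-corrected values reported by a statistics package. The last lines use only the mean and standard deviation printed above.

```python
def describe_counts(y):
    """Moment-based summary of a list of counts."""
    n = len(y)
    mean = sum(y) / n
    m2 = sum((v - mean) ** 2 for v in y) / n   # second central moment
    m3 = sum((v - mean) ** 3 for v in y) / n   # third central moment
    m4 = sum((v - mean) ** 4 for v in y) / n   # fourth central moment
    variance = m2 * n / (n - 1)                # sample variance
    return {
        "mean": mean,
        "variance": variance,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
        "dispersion": variance / mean,         # equals 1 in theory for a Poisson
    }

# Variance-to-mean ratio from the summary statistics reported above.
reported_mean = 13.71053
reported_sd = 9.019864
dispersion = reported_sd ** 2 / reported_mean
print(round(dispersion, 2))  # roughly 5.9, far above the Poisson value of 1
```

A ratio this far above 1 indicates overdispersion, which is the situation the Negative Binomial distribution is designed to accommodate.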

The summary statistics suggest that the number of sold houses follows a Negative Binomial or a Poisson distribution. The estimated variance (about 81.4, the square of the standard deviation) is well above the mean (13.7), which is a first sign of overdispersion. To choose between the two distributions, the following plots can be used:

The graphs above show that the dataset is better fitted by the Negative Binomial distribution. It is also possible to use the AIC (Akaike’s information criterion): the lower the AIC, the better the distribution fits the data. The AIC for both distributions is shown below:

##       Poisson    Negative Binomial
## AIC  1087.397             785.3195

As the AIC is lower for the Negative Binomial and the graphs analyzed indicate a better fit to this distribution, we conclude that the Negative Binomial GLARMA model should be used for the number of sold houses.
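The AIC comparison can be sketched as below, using \(AIC = 2k - 2\log L\), where \(k\) is the number of estimated parameters. This is only an illustration under stated assumptions: the counts are made-up values (not the report's data), the function names are hypothetical, and the Negative Binomial size parameter is found with a crude grid search rather than a proper optimizer.

```python
import math

def poisson_loglik(y, mu):
    """Log-likelihood of i.i.d. Poisson counts with mean mu."""
    return sum(v * math.log(mu) - mu - math.lgamma(v + 1) for v in y)

def negbin_loglik(y, mu, size):
    """Log-likelihood of i.i.d. Negative Binomial counts with mean mu
    and variance mu + mu**2 / size."""
    return sum(
        math.lgamma(v + size) - math.lgamma(size) - math.lgamma(v + 1)
        + size * math.log(size / (size + mu))
        + v * math.log(mu / (size + mu))
        for v in y
    )

def aic(loglik, n_params):
    """Akaike's information criterion: 2k - 2 log L."""
    return 2 * n_params - 2 * loglik

def compare_aic(y):
    mu = sum(y) / len(y)  # the sample mean maximises both likelihoods in mu
    # Crude log-spaced grid for the size parameter, from 0.01 to 1000.
    grid = [10 ** (k / 50.0) for k in range(-100, 151)]
    size = max(grid, key=lambda s: negbin_loglik(y, mu, s))
    return {
        "Poisson": aic(poisson_loglik(y, mu), 1),
        "Negative Binomial": aic(negbin_loglik(y, mu, size), 2),
    }

# Made-up overdispersed counts, for illustration only.
counts = [2, 3, 5, 6, 7, 8, 9, 10, 11, 11, 12, 14, 16, 19, 23, 28, 35, 43]
result = compare_aic(counts)
print(result)
```

For overdispersed counts such as these, the Negative Binomial AIC comes out lower even though the model carries one extra parameter.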

4. GLARMA Model

As explained before, the GLARMA methodology is used for count distributions whose observations are not independent. To check whether the response variable and the explanatory variables are correlated, we use the Pearson correlation coefficient and test its significance. If the test p-value is \(<\) 0.05, the coefficient is significant. Otherwise (p-value \(\geq\) 0.05) the coefficient is not significant. For each of the 4 selected variables (Dolar, Avgprice, DSRE and Gold), the Pearson coefficient and the p-value are shown below:

##                    Dolar    Avgprice       DSRE          Gold
## Correlation 5.306842e-01 -0.21171290 0.20568480 -6.039861e-01
## P-value     1.252955e-09  0.02374313 0.02813198  1.127074e-12
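The significance test can be sketched as follows, assuming the usual t-test for a Pearson coefficient, \(t = r\sqrt{(n-2)/(1-r^2)}\) with \(n-2\) degrees of freedom. The function names are hypothetical, and the p-value is obtained by numerically integrating the Student t density with Simpson's rule, which is adequate for moderate p-values but not for extremely small ones. The usage lines take the Avgprice coefficient from the table and \(n = 114\) from the data description.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def correlation_test(r, n, steps=2000):
    """Return (t statistic, two-sided p-value) for H0: no correlation.

    Requires |r| < 1 and an even number of Simpson steps.
    """
    df = n - 2
    t = r * math.sqrt(df / (1.0 - r * r))
    h = abs(t) / steps
    total = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3.0          # P(0 < T < |t|)
    return t, max(0.0, 1.0 - 2.0 * central)

# Avgprice coefficient from the table above, with n = 114 observations.
t_stat, p_value = correlation_test(-0.21171290, 114)
print(round(t_stat, 2), round(p_value, 3))
```

Under these assumptions the p-value should land close to the value reported for Avgprice, below the 0.05 threshold.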

Having checked the Pearson coefficients, the next step is to fit the GLARMA model. Basically, this model is composed of an autoregressive (AR) part and a moving average (MA) part. In this report we fit four models (one for each explanatory variable) and another one with all the variables shown to be significant. The AIC is still the criterion used to choose which model fits the dataset better. The models fitted to the Number of Sold Houses are presented below, compared with the real series.

In the plots above we can observe how the models behave over the dataset. In most cases, even when the fitted value does not match the real number of sold houses, the model captures the peaks over time. In the General GLARMA Model the only variables considered significant were the Gold Price and the annual sine, which is capable of detecting seasonality.
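To make the AR and MA parts concrete, the sketch below evaluates the fitted means of a Negative Binomial GLARMA model for given parameter values. It assumes one common formulation: \(\log \mu_t = x_t^{T}\beta + Z_t\), with \(Z_t = \sum_i \phi_i (Z_{t-i} + e_{t-i}) + \sum_i \theta_i e_{t-i}\) and Pearson residuals \(e_t = (y_t - \mu_t)/\sqrt{\mu_t + \mu_t^2/\alpha}\). The function name, the zero initial values and the toy numbers are assumptions made for illustration. No parameters are estimated here, and this is not the code used for the models in this report.

```python
import math

def glarma_fitted_means(y, X, beta, phi, theta, alpha):
    """Fitted Negative Binomial means under a GLARMA recursion.

    phi[i] and theta[i] are the AR and MA coefficients at lag i + 1;
    values before the start of the series are taken as zero.
    """
    n = len(y)
    z = [0.0] * n    # state Z_t carrying the serial dependence
    e = [0.0] * n    # Pearson residuals e_t
    mu = [0.0] * n   # fitted means mu_t
    for t in range(n):
        ar = sum(phi[i] * (z[t - 1 - i] + e[t - 1 - i])
                 for i in range(len(phi)) if t - 1 - i >= 0)
        ma = sum(theta[i] * e[t - 1 - i]
                 for i in range(len(theta)) if t - 1 - i >= 0)
        z[t] = ar + ma
        eta = sum(b * x for b, x in zip(beta, X[t])) + z[t]
        mu[t] = math.exp(eta)                     # log link
        variance = mu[t] + mu[t] ** 2 / alpha     # Negative Binomial variance
        e[t] = (y[t] - mu[t]) / math.sqrt(variance)
    return mu

# Toy example: intercept-only regression with a single MA term at lag 1.
y_toy = [10, 12, 30, 11]
X_toy = [[1.0] for _ in y_toy]
mu_toy = glarma_fitted_means(y_toy, X_toy, beta=[math.log(10.0)],
                             phi=[], theta=[0.5], alpha=5.0)
print([round(m, 2) for m in mu_toy])
```

In the toy run, the third fitted mean rises above the baseline of 10 because the previous observation (12) exceeded its fitted mean. This is how past residuals feed into later predictions.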

One point to consider is that the dataset covers quite a long period (more than 9 years), and economic changes can be noticed over short periods of time. Taking this into account, two different sets of GLARMA models were fitted according to the length of the time series: i) the last 5 years and ii) the last 3 years.

i) Last 5 years

When a shorter time series is used, the GLARMA model fits the real series a little better. One notable difference is that, when only the last 5 years are used, the model shows no delay in capturing the peaks; in some cases it actually anticipates them. In the General model the only significant variable was the Dolar, which is why the General model overlaps the Dolar one.

ii) Last 3 years

When only the last 3 years are analyzed, the GLARMA models do not fit the dataset as well. They cannot capture the peaks, and the estimated values are too far from the real values.

The following plot shows the best fitted models from above: the GLARMA Dolar for the entire series, the GLARMA Dolar for the last 5 years and the GLARMA Gold for the last 3 years.

As we can clearly see in this graph, the best model is the GLARMA Dolar for the last 5 years. It is the model with the most realistic fitted values and the only one capable of capturing the peaks before they happen.