In this project, the variable ‘Number of Sold Houses’ of the Stirling Ackroyd real estate company, from the \(1^{st}\) of January 2007 to the \(30^{th}\) of June 2016, will be analysed and explored.
The data used in this report were taken from the Stirling Ackroyd real estate company database. The file has two columns and 129 observations (from December 2005 to August 2016). To match the period of interest described above, the original data were reduced to 114 observations, and a column containing only the year of each observation was added. Only the closing value of the index is considered in this report.
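The reduction step described above can be sketched as follows. This is a minimal illustration, not the code used for the report: the `raw` list is a hypothetical stand-in for the original file (with placeholder counts of 0), and the helper `month_range` is introduced here only for the example. It shows that a monthly series from December 2005 to August 2016 has 129 rows and that restricting it to January 2007 through June 2016 leaves 114.

```python
from datetime import date


def month_range(start, end):
    """Yield the first day of every month from start to end, inclusive."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield date(year, month, 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1


# Hypothetical stand-in for the raw file: (month, houses sold) with placeholder counts.
raw = [(d, 0) for d in month_range(date(2005, 12, 1), date(2016, 8, 1))]

# Keep only the period of interest and append the year as a third column.
start, end = date(2007, 1, 1), date(2016, 6, 30)
subset = [(d, sold, d.year) for d, sold in raw if start <= d <= end]

print(len(raw), len(subset))  # 129 114
```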
The dataset was summarized, and the results are shown below:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.00 8.00 11.00 13.71 16.75 43.00
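A summary of this kind, together with the standard deviation and the amplitude (maximum minus minimum), could be computed as in the sketch below. It assumes the counts are available as a plain list; the values in `sold` are made up for illustration and are not the report's data. The `"inclusive"` quantile method is used because it interpolates in the same way as the default quantile rule in R, which appears to have produced the output above.

```python
import statistics


def describe(values):
    """Return the six-number summary plus standard deviation and amplitude."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "mean": statistics.fmean(values),
        "q3": q3,
        "max": max(values),
        "sd": statistics.stdev(values),          # sample standard deviation (n - 1)
        "amplitude": max(values) - min(values),  # range of the data
    }


# Made-up monthly counts, for illustration only.
sold = [1, 2, 3, 4, 5]
for name, value in describe(sold).items():
    print(f"{name}: {value:.2f}")
```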
The dataset has a standard deviation of 9.0198637 and an amplitude (maximum minus minimum) of 41. It is of interest to analyse how the variable behaves over time, as can be seen in the plot below. Some time series techniques support the identification of dynamic changes in the behavior of a series. In the plot below, the orange line marks one point classified as a “changing point”.
The point identified as a “changing point” is February 2008, when the company sold 25 houses. This is when the 2008 crisis emerged in the U.S. and went on to affect the whole world. In the following month, the company sold 12 houses, less than half the number sold in the previous month. The series shows large variation over short periods of time. For example, in December 2015 the Stirling Ackroyd company sold 8 houses, and in the following month (January 2016) it sold 33. Visually, the series seems to have a downward trend until the end of 2013, because the number of houses sold tends to decrease with time: 36 houses were sold in October 2007, while only 6 were sold in October 2013. One possible reason for this decline is that the company may have changed its niche. The reasons for that change might be explored on another occasion.
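The report does not state which method was used to locate the changing point. As one simple possibility, a single shift in the mean can be found by trying every split of the series and keeping the one that minimises the total within-segment sum of squared deviations. The sketch below illustrates that idea only; the function names and the example values are assumptions, and a dedicated change point package would normally add a penalty term and a significance check.

```python
def sum_squared_error(values):
    """Sum of squared deviations of the values from their own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def single_mean_shift(series, min_size=2):
    """Return the index at which the second segment starts, chosen so that
    the two segments together have the smallest squared error."""
    best_index, best_cost = None, float("inf")
    for k in range(min_size, len(series) - min_size + 1):
        cost = sum_squared_error(series[:k]) + sum_squared_error(series[k:])
        if cost < best_cost:
            best_index, best_cost = k, cost
    return best_index


# Made-up series with an obvious drop in level, for illustration only.
example = [30, 28, 31, 29, 10, 12, 9, 11]
print(single_mean_shift(example))  # 4
```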
In the analysis of time series, it is common for observations to be correlated over time; this characteristic is called autocorrelation. The Durbin-Watson test for the Number of Sold Houses returns a p-value of \(1.6171468 \times 10^{-12}\), so the hypothesis of no autocorrelation is rejected at the 5% significance level. The autocorrelation function presented below confirms that the series is autocorrelated over time.
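For reference, the two quantities discussed above can be written out directly. The sketch below computes the Durbin-Watson statistic \(d = \sum_{t=2}^{n}(e_t - e_{t-1})^2 / \sum_{t=1}^{n} e_t^2\) and the sample autocorrelation at a given lag. It assumes the residuals \(e_t\) are deviations from the series mean, since the report does not say which regression the test was applied to, and it yields only the statistic, not the p-value. Values of \(d\) near 2 suggest no first-order autocorrelation, and values near 0 suggest positive autocorrelation. The example series is made up.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for a sequence of residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


def autocorrelation(series, lag):
    """Sample autocorrelation of the series at the given non-negative lag."""
    n = len(series)
    mean = sum(series) / n
    denominator = sum((x - mean) ** 2 for x in series)
    numerator = sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    )
    return numerator / denominator


# Made-up, steadily increasing series: strongly positively autocorrelated.
series = [1, 2, 3, 4, 5, 6, 7, 8]
mean = sum(series) / len(series)
residuals = [x - mean for x in series]  # assumption: constant-mean model

print(durbin_watson(residuals))
print([autocorrelation(series, lag) for lag in range(3)])
```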
It is also very important in the analysis of time series to check whether the series is stationary. The test used in this report is the Dickey-Fuller test. Stationarity is one way of modeling the dependence structure. Many useful results that hold for independent random variables also hold for stationary random variables, and a lot of data can reasonably be treated as stationary, so the concept is very important when modeling non-independent data.
The Dickey-Fuller test for this series gives a p-value of 0.3791127. At the 5% significance level, the null hypothesis of a unit root cannot be rejected, so the series is treated as non-stationary.
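To make the mechanics of the test concrete, the sketch below computes the basic Dickey-Fuller statistic: it regresses \(\Delta y_t = y_t - y_{t-1}\) on a constant and \(y_{t-1}\), and returns the t-statistic of the slope \(\gamma\). This is an assumption about the variant. The report does not say whether a trend term or lagged differences were included, and the sketch does not compute a p-value. The statistic is compared with Dickey-Fuller critical values rather than the usual t table; for the constant-only case and about 100 observations, the 5% critical value is approximately -2.89, and a statistic below it rejects the unit root. The simulated series is made up.

```python
import math
import random


def dickey_fuller_statistic(series):
    """t-statistic of gamma in: diff(y)_t = alpha + gamma * y_(t-1) + error."""
    lagged = series[:-1]
    diffs = [series[t] - series[t - 1] for t in range(1, len(series))]
    n = len(diffs)
    mean_x = sum(lagged) / n
    mean_z = sum(diffs) / n
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in zip(lagged, diffs))
    gamma = sxz / sxx
    alpha = mean_z - gamma * mean_x
    sse = sum((z - alpha - gamma * x) ** 2 for x, z in zip(lagged, diffs))
    standard_error = math.sqrt(sse / (n - 2) / sxx)
    return gamma / standard_error


# Simulated white noise (stationary), for illustration only.
rng = random.Random(0)
white_noise = [rng.gauss(0, 1) for _ in range(200)]

APPROX_CRITICAL_5PCT = -2.89  # approximate value, constant-only case
stat = dickey_fuller_statistic(white_noise)
print(stat, "reject unit root" if stat < APPROX_CRITICAL_5PCT else "do not reject")
```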
To compare the variation across the years of the dataset, the plot below shows a boxplot of the variable for each year separately.
The median values and the distributions tended to decrease until 2013. The median value in 2013 is 5, while in 2016 it is 19. Nevertheless, it is important to highlight that 2016 has fewer observations than the other years, so its distribution (and median value) might change by the end of the year.
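The yearly medians behind a comparison like this one can be obtained by grouping the observations on the year column. The sketch below is an illustration under assumptions: the `observations` pairs are made-up values, not the report's data, and `median_by_year` is a hypothetical helper name. Returning the number of observations alongside each median makes it easy to spot a partial year such as 2016.

```python
from collections import defaultdict
from statistics import median


def median_by_year(observations):
    """Map each year to (median of houses sold, number of observations)."""
    groups = defaultdict(list)
    for year, sold in observations:
        groups[year].append(sold)
    return {
        year: (median(values), len(values))
        for year, values in sorted(groups.items())
    }


# Made-up (year, houses sold) pairs, for illustration only.
observations = [
    (2013, 4), (2013, 5), (2013, 7),
    (2016, 33), (2016, 19),
]
for year, (mid, count) in median_by_year(observations).items():
    print(year, mid, count)
```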