This project analyses variables that may be associated with the number of houses sold by the Stirling Ackroyd real estate company between the \(1^{st}\) of January 2007 and the \(30^{th}\) of June 2016.
This report explores one of those variables, the Average House Price.
The data used in this report were taken from the Stirling Ackroyd database. The file has two columns and 129 observations (from December 2005 to August 2016). To match the period of interest described above, the original data were reduced to 114 observations, and a column containing only the year of each observation was added.
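The filtering step can be sketched as follows. This is only an illustration in Python, not the code behind the report: it assumes one observation per month (which is consistent with the counts of 129 and 114), and the field names `Date` and `Year` are hypothetical. The prices themselves are left out because they are not reproduced here.

```python
from datetime import date


def month_range(start, end):
    """Yield the first day of every month from start to end, inclusive."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield date(year, month, 1)
        month += 1
        if month == 13:
            year, month = year + 1, 1


# Full file (assumed monthly): December 2005 to August 2016.
all_months = list(month_range(date(2005, 12, 1), date(2016, 8, 1)))

# Keep only the period of interest and attach the year as an extra field.
period = [
    {"Date": d, "Year": d.year}
    for d in all_months
    if date(2007, 1, 1) <= d <= date(2016, 6, 30)
]

print(len(all_months), len(period))  # 129 114
```

Under the monthly assumption, the two counts agree with the 129 and 114 observations quoted above.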
The dataset was summarized, and the results can be seen below:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 297900 436800 500500 532700 597300 1007000
The Average House Price varies widely: it has a standard deviation of 141264.5 pounds and a range (maximum minus minimum) of 709285.9 pounds. It is therefore of interest to analyze how the variable behaves over time, as can be seen in the plot below.
Although the pattern is not a typical one, the time series above seems to present some seasonality across the years. The autocorrelation function (ACF) below supports the hypothesis of multiplicative seasonality. In addition, the Durbin-Watson test is a popular option for checking the hypothesis of autocorrelation. The test returns a p-value between 0 and 1, which is read here at the 5% significance level: if the p-value is greater than 0.05, the null hypothesis of no autocorrelation is not rejected; otherwise (p-value \(\leq\) 0.05) we conclude that the observations of the series are correlated over time. For the time series of Average House Price, the results of the Durbin-Watson test are shown right below the ACF. The p-value (0.004124) is smaller than 0.05, so we conclude that the autocorrelation is significant.
##
## Durbin-Watson test
##
## data: avg.house.price$AvgValue ~ avg.house.price$Date
## DW = 1.5275, p-value = 0.004124
## alternative hypothesis: true autocorrelation is greater than 0
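For reference, the DW statistic reported above is the ratio \(\sum_{t=2}^{n}(e_t - e_{t-1})^2 / \sum_{t=1}^{n} e_t^2\), where \(e_t\) are the residuals of the fitted regression. It lies between 0 and 4, with values below 2 pointing to positive autocorrelation. The sketch below is a minimal Python illustration, not the code used for the report: it regresses a series on an equally spaced time index (an assumption), uses two made-up series rather than the Stirling Ackroyd data, and computes only the statistic, not the p-value.

```python
import math


def durbin_watson(y):
    """DW statistic for the residuals of an OLS fit of y on the time index 0..n-1."""
    n = len(y)
    x = range(n)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sum(
        (xi - x_mean) ** 2 for xi in x
    )
    intercept = y_mean - slope * x_mean
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Sum of squared successive differences over the sum of squared residuals.
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, n))
    den = sum(e ** 2 for e in resid)
    return num / den


# Hypothetical series, NOT the Stirling Ackroyd data.
smooth = [t + 5 * math.sin(t / 4) for t in range(40)]  # slowly varying residuals
alternating = [t + (-1) ** t for t in range(40)]       # residuals flip sign each step

dw_smooth = durbin_watson(smooth)
dw_alternating = durbin_watson(alternating)
print(dw_smooth, dw_alternating)
```

The smooth series gives a value below 2 and the alternating one a value above 2. The reported DW = 1.5275 falls on the same side of 2 as the smooth series, in line with the one-sided alternative of positive autocorrelation shown in the output.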
To compare the variation in Average House Price across the years, the plot below shows a box plot of the average price for each year separately.
An interesting feature of the plot above is the box for 2008: it has a very small range and a rather modest median price compared with the other years. This reflects the economic crisis of 2008 and 2009, which led the most powerful nations into a deep recession. Another interesting feature is the considerable increase in the distribution of average prices from 2012 to 2016, which will be investigated on another occasion.
As explained in section 1, the aim of the report is to verify the relationship between the Average House Price and the Number of Sold Houses at the Stirling Ackroyd real estate company. The first exploratory analysis checks the Pearson correlation between the two variables, as shown below.
##
## Pearson's product-moment correlation
##
## data: avg.house.price$AvgValue and house.sold$numSoldHouses
## t = -2.2925, df = 112, p-value = 0.02374
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.38080042 -0.02892433
## sample estimates:
## cor
## -0.2117129
The output contains two main pieces of information: the correlation coefficient (-0.2117129) and the p-value (0.02374). The correlation coefficient (CC) measures the degree of linear relationship between two variables, and the p-value comes from a test of whether the CC differs significantly from zero: if the p-value is greater than 0.05, the null hypothesis that the correlation equals 0 is not rejected; otherwise (p-value \(\leq\) 0.05) the correlation is significant. In this case, the linear relationship between the two variables is negative and weak, although significant at the 5% level. As a complementary analysis, the accompanying scatter plot shows the two variables together with a regression line. It is clear that a linear model does not fit this dataset well.
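The t statistic and the confidence interval in the output above follow directly from the correlation coefficient and the sample size: \(t = r\sqrt{n-2}/\sqrt{1-r^2}\), and the interval is obtained from the Fisher z-transform. The Python sketch below recomputes them from the reported values only (r = -0.2117129 and n = df + 2 = 114). The function name is hypothetical, and the normal quantile 1.959964 is hard-coded for a 95% interval.

```python
import math


def pearson_t_and_ci(r, n, z_crit=1.959964):
    """Return the t statistic and an approximate 95% CI for a Pearson r from n pairs."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    z = math.atanh(r)                        # Fisher z-transform of r
    half_width = z_crit / math.sqrt(n - 3)   # standard error of z is 1/sqrt(n-3)
    return t, (math.tanh(z - half_width), math.tanh(z + half_width))


t_stat, (ci_low, ci_high) = pearson_t_and_ci(r=-0.2117129, n=114)
print(round(t_stat, 4), round(ci_low, 4), round(ci_high, 4))
```

The values agree with the reported t = -2.2925 and the interval from -0.3808 to -0.0289. The p-value itself requires the t distribution with 112 degrees of freedom, which is outside the Python standard library and is therefore not recomputed here.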
These results are not very informative, but it must be considered that the data for the two variables cover quite a long period (more than 9 years), and economic changes can be noticed over much shorter periods. Taking this into account, the correlation coefficient will be analyzed for two shorter periods: i) the last 5 years and ii) the last 3 years. The output of the correlation test can be seen below, and right beneath it a scatter plot is displayed to check for a linear relationship.
##
## Pearson's product-moment correlation
##
## data: avg.house.price.5$AvgValue and house.sold.5$numSoldHouses
## t = 3.1617, df = 58, p-value = 0.002495
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1434674 0.5808014
## sample estimates:
## cor
## 0.383424
Something very interesting happened when the data were filtered: the correlation coefficient became positive and larger in absolute value (0.383424 against -0.2117129). This suggests that the number of houses sold was considerably larger in the earlier years of the series. In the last 5 years, the relationship between the two variables is more linear.
##
## Pearson's product-moment correlation
##
## data: avg.house.price.3$AvgValue and house.sold.3$numSoldHouses
## t = 2.2436, df = 34, p-value = 0.03148
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.03466766 0.61508081
## sample estimates:
## cor
## 0.3591133
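The results for the last 3 years are quite similar to those for the last 5 years: the correlation is positive and of similar size (0.3591133 against 0.383424), and it remains significant at the 5% level (p-value = 0.03148).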
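The effect of restricting the period can be illustrated with a short Python sketch. It assumes monthly observations, so the last 5 and 3 years correspond to the last 60 and 36 rows (consistent with df = 58 and df = 34 in the outputs above). The two series are invented purely to show how a correlation can change sign once the early observations are dropped. They are not the Stirling Ackroyd data, and the function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def windowed_correlations(price, sold, windows=(114, 60, 36)):
    """Pearson r over the last k observations, for each k in windows."""
    return {k: pearson_r(price[-k:], sold[-k:]) for k in windows}


# Hypothetical data: prices rise throughout; sales fall for 54 months, then recover.
price = [float(t) for t in range(114)]
sold = [200.0 - 3 * t for t in range(54)] + [20.0 + t for t in range(60)]

result = windowed_correlations(price, sold)
for k, r in result.items():
    print(k, round(r, 3))
```

With these made-up series, the correlation over all 114 months is negative while the correlations over the last 60 and 36 months are positive, which is the same qualitative pattern as in the three outputs above.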