This report analyses several variables related to the London real estate market. The data are UK Government records of the houses sold in London from January 2006 to December 2014. The dataset contains four variables: the date (month), the number of houses sold in each month, the average house price and the property type (Flat, Semi-detached or Terraced).
The term ‘flat’ is characteristic of British English. A flat, whether a studio flat, a maisonette or a two-storey flat, is a self-contained living area in one part of a building. A building is usually divided into individual flats, and residents share the communal areas, such as lifts, stairwells and reception areas.
Semi-detached properties are a very common choice for homeowners in the UK to buy or rent. They save space because they are built in pairs that share a common wall. They are also good options for homeowners who want to extend at the back and side while keeping some privacy.
Terraced houses are common in old industrial towns and cities such as Manchester, as well as in Bath and parts of central London. They became extremely popular in the 19th century as a way to provide high-density housing for the working class. The houses in a terrace share the same structural design, and each house shares its side walls with its neighbours (apart from the houses at each end of the terrace).
It is also of interest to investigate the relationship between the data for London as a whole and the data from the Stirling Ackroyd real estate company. The company has a large share of the London market, so some of the Government's records may well correspond to Stirling Ackroyd sales.
The data used in this report come from the Office for National Statistics (ONS) of the UK Government. The file has four columns and 323 observations. The aim is to identify the main differences between property types and to correlate the London data with the statistics of the Stirling Ackroyd real estate company.
The tables below summarize the two numeric variables by property type.
Average house price (£) | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. | Std. dev. |
---|---|---|---|---|---|---|---|
Flat | 293600 | 372400 | 406800 | 430500 | 481300 | 664300 | 82345.37 |
Terraced | 326900 | 458100 | 581200 | 634400 | 758900 | 1610000 | 238125.55 |
Semi-detached | 165200 | 357200 | 473800 | 606400 | 621100 | 4422000 | 514498.27 |
The average price of flats varies least among the property types, with a standard deviation of £82,345.37, against £238,125.55 for Terraced and £514,498.27 for Semi-detached properties.
Number of houses sold (per month) | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. | Std. dev. |
---|---|---|---|---|---|---|---|
Flat | 901 | 2252.00 | 2511 | 2723.000 | 3310 | 4594 | 867.00381 |
Terraced | 8 | 32.75 | 43 | 45.240 | 55 | 101 | 19.12129 |
Semi-detached | 1 | 3.00 | 6 | 5.748 | 8 | 15 | 3.06869 |
On the other hand, the number of flats sold varies a lot in absolute terms, with the widest range of the three types (901 to 4594 per month).
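The two tables use different units (pounds and counts), and the standard deviation grows with the size of the mean. One way to compare them on a common, unitless scale is the coefficient of variation, which is the standard deviation divided by the mean. The original analysis does not use this measure. The Python sketch below computes it from the rounded means and standard deviations printed in the tables. The names `price_stats`, `sold_stats` and `cv_table` are illustrative only.

```python
# Means and standard deviations copied from the two summary tables above
# (values as printed, i.e. already rounded).
price_stats = {
    "Flat": (430500, 82345.37),
    "Terraced": (634400, 238125.55),
    "Semi-detached": (606400, 514498.27),
}
sold_stats = {
    "Flat": (2723.000, 867.00381),
    "Terraced": (45.240, 19.12129),
    "Semi-detached": (5.748, 3.06869),
}


def coefficient_of_variation(mean, sd):
    """Standard deviation relative to the mean (unitless)."""
    return sd / mean


def cv_table(stats):
    """Coefficient of variation per property type, rounded to 3 decimals."""
    return {kind: round(coefficient_of_variation(m, s), 3) for kind, (m, s) in stats.items()}


price_cv = cv_table(price_stats)
sold_cv = cv_table(sold_stats)
print("Average price CV:", price_cv)
print("Houses sold CV:", sold_cv)
```

With the table values, the price CVs are about 0.19 (Flat), 0.38 (Terraced) and 0.85 (Semi-detached). The sales CVs are about 0.32, 0.42 and 0.53. Flats therefore have the widest absolute range of sales, but relative to their much larger mean, their monthly sales vary less than those of the other two types.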
The plots below present the distributions of the variables by property type.
Flats clearly sell in far greater numbers than the other property types. This may reflect the structure of large modern cities, where semi-detached and detached properties are more common in the suburbs and generally cost a lot more.
It is also of interest to observe how the variables behave over time, as shown below.
The first three series show the average house price over the years. For Flats, the average price shows an upward trend and did not vary considerably over short periods of time. The Terraced series, on the other hand, varied a lot over the years, but it shows no clear trend. For the Semi-detached category, a constant level (a horizontal line) seems reasonable to describe the series: excluding some outliers, such as the large peak in April 2013, the series fluctuates above and below its mean price of £606,406.8.
The decline in the number of houses sold between 2008 and 2009 is striking, especially in the first two plots (Flat and Terraced). Many economists consider the 2008-09 financial crisis the worst financial crisis since the Great Depression of the 1930s. In the Flat category, the number of houses sold drops from 3695 in November 2007 to 901 in January 2009. In the Terraced category, the drop is from 59 to 11. After 2010, the Flat series seems to show annual seasonality: sales increase in the first half of each year and fall until the end of the year. The number of Terraced houses sold, on the other hand, tends to stabilize after the crisis.
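The size of these drops is easier to compare as percentage changes. The Python sketch below uses only the four monthly counts quoted above. `percent_change` is an illustrative helper, not part of the original analysis.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return 100 * (after - before) / before


# Monthly counts quoted in the text: November 2007 -> January 2009.
drops = {"Flat": (3695, 901), "Terraced": (59, 11)}
for kind, (before, after) in drops.items():
    print(f"{kind}: {percent_change(before, after):.1f}%")
# Flat: -75.6%
# Terraced: -81.4%
```

In relative terms, the Terraced drop (about 81%) is slightly larger than the Flat drop (about 76%), even though the Flat series falls by far more houses.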
In time series analysis, observations are commonly correlated over time; this property is called autocorrelation. The autocorrelation function (ACF) gives the correlation between observations separated by k time steps (lags), and the confidence band on the ACF plot shows which coefficients differ significantly from zero. As can be seen below, there are significant autocorrelations in all categories for both variables.
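The sample ACF can be computed directly. The Python sketch below assumes the usual estimator: the sum of products of deviations at lag k divided by the lag-0 sum of squares, both using the full-series mean. To our knowledge, this is what R's `acf()` reports. The sketch also computes the approximate 95% white-noise band, ±1.96/√n, that is drawn on a standard ACF plot. The series in the example is a toy input, not the report's data.

```python
from statistics import NormalDist, fmean


def acf(x: list[float], max_lag: int) -> list[float]:
    """Sample autocorrelations r_0 .. r_max_lag of a series.

    Uses the full-series mean and divides each lag's sum of products by the
    lag-0 sum of squares, so r_0 is always 1.
    """
    n = len(x)
    mean = fmean(x)
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]


def significance_bound(n: int, level: float = 0.95) -> float:
    """Half-width of the approximate white-noise band: z_{(1+level)/2} / sqrt(n)."""
    return NormalDist().inv_cdf((1 + level) / 2) / n ** 0.5


# Toy series (hypothetical, not the report's data).
toy = [1.0, 2.0, 3.0, 4.0, 5.0]
print(acf(toy, 2))
print(round(significance_bound(108), 3))
```

With n = 108 monthly observations, the band is about ±0.189. Under this approximation, any sample autocorrelation larger than that in absolute value would be flagged as significant.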
To observe and compare the yearly variation of the Average House Price and the Number of Sold Houses, the plots below show boxplots by year.
As noted earlier, the average price of Flats increases over the years, and so does its variance: the spread of the variable is noticeably larger in 2012 and 2014.
As seen previously, the Number of Sold Houses series declines in 2008 and 2009, and this new plot shows a larger variation in those years, mainly for the Flat category. These two plots also show that the average price and the number of houses sold do not behave in the same way: the average price shows no visible effect of the 2008/2009 crisis.
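The boxplots summarize each year by its quartiles. A simple way to put a number on the yearly spread is the interquartile range (IQR) per year, sketched below in Python. The `(year, value)` record layout and the toy values are hypothetical. `statistics.quantiles(..., method="inclusive")` corresponds to R's default `quantile()` (type 7). The hinges of R's `boxplot()` use Tukey's method instead, so they can differ slightly for small groups.

```python
from collections import defaultdict
from statistics import quantiles


def yearly_iqr(records):
    """records: iterable of (year, value) pairs; returns {year: Q3 - Q1}.

    Each year needs at least two values for statistics.quantiles.
    """
    by_year = defaultdict(list)
    for year, value in records:
        by_year[year].append(value)
    result = {}
    for year, values in sorted(by_year.items()):
        q1, _median, q3 = quantiles(values, n=4, method="inclusive")
        result[year] = q3 - q1
    return result


# Toy records (hypothetical): the second year is ten times more spread out.
toy = [(2012, v) for v in (1, 2, 3, 4, 5)] + [(2013, v) for v in (10, 20, 30, 40, 50)]
print(yearly_iqr(toy))
```

Applied to the monthly records of a real series, a rising IQR from one year to the next would correspond to the taller boxes described above.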
The Stirling Ackroyd dataset contains the same variables observed in this report, but it does not classify properties by type. Furthermore, the company's data cover a longer period (December 2005 to August 2016), so they are filtered to match the time period of the London data.
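The filtering step can be sketched as follows. The sketch assumes each monthly record is keyed by the first day of its month. This layout is hypothetical, because the format of the company file is not given here. In practice, the same date comparison would be applied to the date column of each record. The sketch also counts the months that remain.

```python
from datetime import date


def month_range(start: date, end: date) -> list[date]:
    """First day of every month from start to end, inclusive."""
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append(date(year, month, 1))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months


# Months covered by the company file: December 2005 to August 2016.
company_months = month_range(date(2005, 12, 1), date(2016, 8, 1))

# Keep only the months of the London data: January 2006 to December 2014.
study_start, study_end = date(2006, 1, 1), date(2014, 12, 1)
filtered = [d for d in company_months if study_start <= d <= study_end]

print(len(company_months), len(filtered))
```

Under these assumptions, 108 of the 129 months are kept. This matches the df = 106 (108 − 2) reported by the correlation tests.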
The plot below is strongly affected by outliers (such as the £4.42 million Semi-detached monthly average), so it helps to zoom in on the plot to see the main regions of the boxes. The medians of the variables are not very different, and their distributions are similar. However, the variability of the average house price is considerably smaller for the Flat category and for Stirling Ackroyd than for the other categories.
THE PLOT STILL NEEDS TO BE FIXED
The Pearson correlation coefficient between the London average price for each property type and the Stirling Ackroyd average price was calculated, and the results are shown below.
##
## Pearson's product-moment correlation
##
## data: flat$AvgPrice and avg.house.price$AvgValue
## t = 7.4372, df = 106, p-value = 2.807e-11
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4459385 0.6973714
## sample estimates:
## cor
## 0.5855664
##
## Pearson's product-moment correlation
##
## data: terraced$AvgPrice and avg.house.price$AvgValue
## t = 4.8421, df = 106, p-value = 4.397e-06
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2573082 0.5688151
## sample estimates:
## cor
## 0.4255881
##
## Pearson's product-moment correlation
##
## data: semi$AvgPrice and avg.house.price$AvgValue
## t = 1.2008, df = 106, p-value = 0.2325
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.07476042 0.29829373
## sample estimates:
## cor
## 0.1158502
The output contains two main pieces of information: the correlation coefficient and the p-value. The correlation coefficient (CC) measures the degree of linear relationship between two variables, and the p-value tests whether the CC is significantly different from zero. If p-value \(>\) 0.05, the null hypothesis that the correlation equals 0 is not rejected; otherwise (p-value \(\leq\) 0.05), the correlation is considered significant.
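The test statistic and confidence interval in the output above can be reproduced from r and n alone. The output appears to come from R's `cor.test`. That function uses \(t = r\sqrt{n-2}/\sqrt{1-r^2}\) and an interval based on Fisher's z-transform. The Python sketch below mirrors those formulas, and its function names are illustrative. The p-value also requires the t distribution with n − 2 degrees of freedom, for example `2 * scipy.stats.t.sf(abs(t), n - 2)`. It is omitted here to keep the sketch dependency-free.

```python
import math
from statistics import NormalDist


def pearson_t(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def fisher_ci(r: float, n: int, level: float = 0.95) -> tuple[float, float]:
    """Confidence interval for rho via Fisher's z = atanh(r), SE = 1/sqrt(n - 3)."""
    z = math.atanh(r)
    half_width = NormalDist().inv_cdf((1 + level) / 2) / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


# r and n taken from the first test above (Flat average price, 108 months).
r, n = 0.5855664, 108
print(round(pearson_t(r, n), 4))
print(tuple(round(v, 4) for v in fisher_ci(r, n)))
```

With r = 0.5855664 and n = 108, this gives t ≈ 7.437 and an interval of about (0.446, 0.697), in line with the first test in the output.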
In this case, the first two correlation coefficients (Flat, r ≈ 0.59, and Terraced, r ≈ 0.43) are significant, while the Semi-detached one (r ≈ 0.12, p-value = 0.23) is not. So the average prices of Flats and Terraced properties in London are positively correlated with the average price at Stirling Ackroyd. However, these relationships are only moderate.
The numbers of houses sold in the four series (the three property types and Stirling Ackroyd) are on very different scales, so the boxplot is omitted. The Pearson correlation coefficient was calculated, and the results are shown below.
##
## Pearson's product-moment correlation
##
## data: flat$numSoldHouses and house.sold$numSoldHouses
## t = 7.2644, df = 106, p-value = 6.607e-11
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4349284 0.6902875
## sample estimates:
## cor
## 0.5765183
##
## Pearson's product-moment correlation
##
## data: terraced$numSoldHouses and house.sold$numSoldHouses
## t = 7.9941, df = 106, p-value = 1.716e-12
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4799358 0.7189396
## sample estimates:
## cor
## 0.6132874
##
## Pearson's product-moment correlation
##
## data: semi$numSoldHouses and house.sold$numSoldHouses
## t = 3.3395, df = 106, p-value = 0.001159
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.126964 0.470100
## sample estimates:
## cor
## 0.3085353
The results show that all CCs are significant, but the Semi-detached category has the smallest value (0.31). Its average price correlation was also not significant, so this property type seems to be the weakest explanatory variable in both cases.
As an example and a complementary analysis, a scatterplot with a regression line for the largest correlation coefficient (0.6132874, Terraced number of houses sold versus Stirling Ackroyd) is shown below.
The plot clearly shows that a linear model does not fit the data well. Nevertheless, the plot gives a good picture of how the two variables behave together.
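A least-squares line like the one in the scatterplot can be fitted by hand. The Python sketch below is a generic ordinary least squares fit, not necessarily the exact call used to draw the plot. It also returns the residuals and R², which are simple ways to check how well a straight line fits. The inputs in the example are toy numbers.

```python
from statistics import fmean


def fit_line(x: list[float], y: list[float]) -> tuple[float, float, float, list[float]]:
    """Ordinary least squares y ≈ intercept + slope * x.

    Returns (slope, intercept, r_squared, residuals).
    """
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    ss_res = sum(e * e for e in residuals)
    ss_tot = sum((b - my) ** 2 for b in y)
    return slope, intercept, 1 - ss_res / ss_tot, residuals


# Toy data (hypothetical): Pearson r = 0.5 here, so R² should be 0.25.
slope, intercept, r2, residuals = fit_line([1.0, 2.0, 3.0], [1.0, 3.0, 2.0])
print(slope, intercept, r2, residuals)
```

For a simple linear regression, R² equals r². The Terraced correlation of 0.6132874 therefore corresponds to R² ≈ 0.376, meaning the line accounts for only about 38% of the variation. This is consistent with the poor fit seen in the plot.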