1. Introduction

This project analyses variables that may be associated with the number of houses sold by the Stirling Ackroyd real estate company between the \(1^{st}\) of January 2007 and the \(30^{th}\) of June 2016.

This report explores the FTSE 100 index. The FTSE 100 consists of the 100 largest qualifying UK companies by full market value. The constituents of the index are reviewed quarterly, on the Wednesday after the first Friday of the month in March, June, September and December. The values used to make changes to the index are taken at the close of business the night before the review.

2. Data Manipulation

The data used in this report were taken from Quandl Financial and Economic Data. The file has six columns and 165 observations (from December 2002 to August 2016). To match the period of interest described in section 1, the original data were reduced to 114 observations, and a column containing only the year of each observation was added. Only the closing value of the index is considered in this report.
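The reduction to the period of interest can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the code used to produce the report. The helper `month_range` and the `rows` structure are hypothetical, and the sketch assumes one observation per calendar month, which is what the observation counts in the text suggest.

```python
from datetime import date

def month_range(start, end):
    """Return the first day of every month from start to end, inclusive."""
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append(date(year, month, 1))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months

# Assumption: one observation per calendar month, December 2002 to August 2016.
all_months = month_range(date(2002, 12, 1), date(2016, 8, 1))

# Keep only the period of interest (January 2007 to June 2016).
window = [d for d in all_months if date(2007, 1, 1) <= d <= date(2016, 6, 30)]

# Add a "year" field alongside each retained date.
rows = [{"date": d, "year": d.year} for d in window]

print(len(all_months), len(rows))
```

Under the monthly assumption, the two counts agree with the 165 and 114 observations reported above.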

3. Analysis

The dataset was summarised, and the results can be seen below:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    3830    5545    6024    5906    6466    6984

The dataset has a standard deviation of 734.1524 and an amplitude (maximum minus minimum) of 3154.34. It is of interest to analyse how the variable behaves over time, as can be seen in the plot below.
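As a small sketch of the two dispersion measures quoted above, the snippet below computes a sample standard deviation and an amplitude in Python. The `closes` values are hypothetical and are not the FTSE 100 data. The amplitude is taken to mean the maximum minus the minimum, which is consistent with the summary table (6984 minus 3830 is roughly 3154).

```python
import statistics

def amplitude(values):
    """Range of the data: largest value minus smallest value."""
    return max(values) - min(values)

# Hypothetical closing values, for illustration only (not the FTSE 100 data).
closes = [5900.0, 6100.0, 5800.0, 6300.0, 6000.0]

# Sample standard deviation (n - 1 in the denominator).
sd = statistics.stdev(closes)
amp = amplitude(closes)

print(round(sd, 2), amp)
```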

The time series shows some unusual behaviour. For example, from August 2008 to February 2009 the index declined sharply, reflecting the crisis that started in the US housing market. Between May 2012 and May 2013 the index rose considerably. In 2015 the index declined again, possibly reflecting the drop in oil prices. The oil and gas sector represents about 200 million pounds of the British economy.

In time series analysis it is common for observations to be correlated over time; this characteristic is called autocorrelation. The Durbin-Watson test is a popular way to test for autocorrelation. The test returns a p-value between 0 and 1, which is compared here against a 5% significance level: if the p-value is greater than 0.05 (p-value \(>\) 0.05), the null hypothesis of no autocorrelation is not rejected; otherwise (p-value \(\leq\) 0.05) we conclude that the observations of the series are correlated over time. In addition, the autocorrelation function (ACF) shows the correlation of the series with itself at each lag. In the output below the p-value is far smaller than 0.05, so we conclude that the autocorrelation is significant.

## 
##  Durbin-Watson test
## 
## data:  ftse.100$Close ~ ftse.100$Date
## DW = 0.13188, p-value < 2.2e-16
## alternative hypothesis: true autocorrelation is greater than 0
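
The DW statistic in the output above is computed from the residuals of a regression of the closing value on the date. A minimal Python sketch of that statistic follows, under stated assumptions: the helper names `ols_residuals` and `durbin_watson` are hypothetical, the data are made up, and the p-value calculation is not reproduced. Values near 2 indicate little first-order autocorrelation, while values near 0 indicate strong positive autocorrelation, which is consistent with the reported DW = 0.13188.

```python
def ols_residuals(x, y):
    """Residuals from a simple least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def durbin_watson(residuals):
    """Sum of squared successive differences over sum of squared residuals."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2
              for i in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Hypothetical series that drifts slowly around a trend (illustration only).
t = list(range(12))
y = [10, 11, 13, 14, 14, 13, 11, 10, 10, 11, 13, 14]

dw = durbin_watson(ols_residuals(t, y))
print(round(dw, 3))
```

Because neighbouring residuals in this made-up series tend to have the same sign, the statistic falls well below 2.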

To compare the variation of the FTSE 100 across years, the plot below shows a box plot of the index for each year separately.

As seen previously, 2008 and 2009 were marked by a large decline in the series, and this new plot shows the enormous variation in those years. For this kind of index, large variations over short periods are not expected. In contrast, 2012 and 2014 were very stable years. To illustrate the difference, in 2008 the standard deviation was 674.40 and the amplitude was 1799.29 points, while in 2014 the standard deviation was 116.7068 and the amplitude was 334.07 points.

As explained in section 1, the aim of the report is to examine the relationship between the FTSE 100 index and the number of houses sold by the Stirling Ackroyd real estate company. The first exploratory step is to check the Pearson correlation between the two variables, as shown below.

## 
##  Pearson's product-moment correlation
## 
## data:  ftse.100$Close and house.sold$numSoldHouses
## t = 0.1586, df = 112, p-value = 0.8743
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.1693974  0.1983524
## sample estimates:
##        cor 
## 0.01498422

The output contains two main pieces of information: the correlation coefficient (0.01498422) and the p-value (0.8743). The correlation coefficient (CC) measures the degree of linear relationship between two variables, and the p-value comes from a test of whether the CC differs from zero: if the p-value is greater than 0.05 (p-value \(>\) 0.05), the null hypothesis that the correlation equals 0 is not rejected; otherwise (p-value \(\leq\) 0.05) the correlation is significant. In this case, the linear relationship between the two variables is not statistically significant. As a complementary analysis, the plot shown above is a scatter plot of the two variables with a regression line. It clearly shows that a linear model does not fit this dataset well.
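
The link between the correlation coefficient and the reported t statistic can be sketched in Python as follows. The function names `pearson_r` and `t_statistic` are hypothetical and the paired values `x` and `y` are made up for illustration; only the last step uses figures from the output above (r = 0.01498422 and df = 112, that is, n = 114). The p-value itself requires the t distribution and is not computed here.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for H0: correlation = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Hypothetical paired observations (illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
r = pearson_r(x, y)

# Values reported in the output above: r = 0.01498422, df = 112 (n = 114).
t_reported = t_statistic(0.01498422, 114)
print(round(r, 4), round(t_reported, 4))
```

Applied to the reported coefficient, the formula gives the same t = 0.1586 shown in the output, up to rounding.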

One point to consider is that the data for the two variables cover quite a long period (9 years), while economic changes can be noticed over short periods of time. Taking this into account, the correlation coefficient is analysed for two shorter periods: i) the last 5 years and ii) the last 3 years. The output of the correlation test, including the coefficient, can be seen below, followed by a scatter plot used to check for a linear relationship.

i) Last 5 years

## 
##  Pearson's product-moment correlation
## 
## data:  ftse.100.5$Close and house.sold.5$numSoldHouses
## t = -1.1062, df = 58, p-value = 0.2732
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.3836602  0.1143600
## sample estimates:
##        cor 
## -0.1437387

The correlation coefficient has a larger absolute value (-0.1437387), but the p-value remains large (0.2732), so the correlation is still not significant.

ii) Last 3 years

## 
##  Pearson's product-moment correlation
## 
## data:  ftse.100.3$Close and house.sold.3$numSoldHouses
## t = -4.9782, df = 34, p-value = 1.833e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.8059213 -0.4077497
## sample estimates:
##        cor 
## -0.6493044

Something interesting happens when only the last 3 years are considered: the correlation coefficient (-0.6493044) has a considerably larger absolute value and is significant (p-value = 1.833e-05). This result was expected, as recent economic conditions tend to be reflected more strongly over short periods of time.
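
The 95 percent confidence intervals printed in the outputs can be approximated with the Fisher z-transformation, which is, to our understanding, the method R's `cor.test` uses for Pearson intervals. The Python sketch below is an illustration under that assumption; the function name `fisher_ci` is hypothetical, and the inputs are the coefficients and sample sizes reported above (n = df + 2).

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for a Pearson correlation."""
    z = math.atanh(r)                          # Fisher z-transform of r
    se = 1.0 / math.sqrt(n - 3)                # approximate standard error of z
    q = NormalDist().inv_cdf(0.5 + level / 2)  # normal quantile (about 1.96)
    return math.tanh(z - q * se), math.tanh(z + q * se)

# Last 3 years: r = -0.6493044, df = 34, so n = 36.
low_3, high_3 = fisher_ci(-0.6493044, 36)

# Last 5 years: r = -0.1437387, df = 58, so n = 60.
low_5, high_5 = fisher_ci(-0.1437387, 60)

print(round(low_3, 4), round(high_3, 4))
print(round(low_5, 4), round(high_5, 4))
```

The interval for the last 3 years lies entirely below zero, while the interval for the last 5 years contains zero, matching the significance conclusions drawn from the p-values.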