1. Introduction

This project analyzes variables that may be associated with the number of houses sold by the Stirling Ackroyd real estate company between the \(1^{st}\) of January 2007 and the \(30^{th}\) of June 2016.

This report explores the Base Interest Rate index. The Bank of England sets the base rate, which is the rate it charges commercial banks to borrow from it. In normal economic circumstances, the base rate influences the interest rates set by other banks and financial institutions. If the Bank of England cuts the base rate, banks would be expected to cut their mortgage and lending rates as well; if it raises the base rate, banks would be expected to increase their mortgage rates.

2. Data Manipulation

The data used in this report were taken from Quandl Financial and Economic Data. The file has 2 columns and 115 observations. To match the period of interest described above, the original data were reduced to 114 observations, and a column containing only the year of each observation was added. Only the closing value of the index is considered in this report.
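The filtering step can be sketched as follows. This is a minimal illustration in Python, not the code used to produce the report: the `(date, value)` record layout, the function name `prepare`, and the sample values are assumptions made for the example, as is the choice of which extra observation falls outside the period.

```python
from datetime import date

# Period of interest stated in the introduction.
START = date(2007, 1, 1)
END = date(2016, 6, 30)

def prepare(rows):
    """Keep rows inside the period of interest and append the year.

    rows: iterable of (date, value) pairs (hypothetical layout).
    Returns a list of (date, value, year) tuples.
    """
    return [(d, v, d.year) for d, v in rows if START <= d <= END]

# Hypothetical monthly series: one observation before the period,
# followed by 114 monthly observations from January 2007 to June 2016.
rows = [(date(2006, 12, 1), 5.0)]
rows += [(date(2007 + m // 12, m % 12 + 1, 1), 5.0) for m in range(114)]

prepared = prepare(rows)
print(len(rows), len(prepared))
```

A monthly series from January 2007 to June 2016 contains 9 x 12 + 6 = 114 observations, which agrees with the count reported above.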

3. Analysis

The dataset was summarized and the results can be seen below:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.510   2.945   3.565   3.876   3.915   7.040

The box plot is an exploratory graphic used to show the distribution of a dataset. In the present dataset the largest value is 7.04, 25% of the data is greater than 3.915, 50% of the data is greater than 3.565 (the median) and the smallest value is 2.51. The standard deviation is 1.2393645 and the range (maximum minus minimum) is 4.53. It is also of interest to analyze how the variable behaves over time, as can be seen in the plot below.
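The quantities quoted above can be computed as in the following sketch. It is an illustration only: the function name `summarize` and the sample values are assumptions, and the `"inclusive"` quantile method is chosen because it corresponds to the default interpolation of R's `quantile`, which is assumed (not confirmed) to be how the summary above was produced.

```python
import statistics

def summarize(values):
    """Return the five-number summary, mean, standard deviation and range."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "mean": statistics.fmean(values),
        "q3": q3,
        "max": max(values),
        "sd": statistics.stdev(values),      # sample standard deviation
        "range": max(values) - min(values),  # maximum minus minimum
    }

# Hypothetical values, not the report's data.
sample = [2.5, 3.0, 3.5, 4.0, 7.0]
summary = summarize(sample)
print(summary)

# The range reported in the text follows from the summary table:
report_range = round(7.04 - 2.51, 2)
print(report_range)
```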

Some remarkable economic moments can be identified in the time series. For example, the global crisis that started in the US housing market between 2008 and 2009 is reflected in the decline of the rate in the plot. The aim of the cuts was to put more money into the financial system and try to minimize the crisis. In a recent interview, an agent from the Bank of England said: ‘We are cutting rates because lower costs will encourage households and businesses to borrow more, encouraging banks to create more money for loans and keep money flowing into the economy. As rates are so low, the effect may be limited.’

In the analysis of time series it is common for observations to be correlated over time; this characteristic is called autocorrelation. The Durbin-Watson test is a popular way to check for autocorrelation. The test returns a p-value between 0 and 1 which, at the 5% significance level, is read as follows: if (p-value \(>\) 0.05) the null hypothesis of no autocorrelation is not rejected, otherwise (p-value \(\leq\) 0.05) we assume that the observations of the series are correlated over time. The autocorrelation function (ACF) can additionally be used to assess the significance of the autocorrelation coefficient at each lag. In the output below the p-value is smaller than 0.05, so we assume that the autocorrelation is significant.

## 
##  Durbin-Watson test
## 
## data:  base_index ~ base_date
## DW = 0.10834, p-value < 2.2e-16
## alternative hypothesis: true autocorrelation is greater than 0
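To make the DW value in this output easier to interpret, the sketch below computes the Durbin-Watson statistic from the residuals of a simple linear regression, using the standard definition: the sum of squared successive differences of the residuals divided by the sum of squared residuals. Values near 2 suggest no first-order autocorrelation and values near 0 suggest strong positive autocorrelation. The function names and the example series are assumptions for illustration; the p-value calculation is not reproduced here.

```python
def ols_residuals(x, y):
    """Residuals of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def durbin_watson(residuals):
    """Sum of squared successive differences over sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator

# Hypothetical smooth series: neighbouring residuals are very similar,
# so the statistic falls well below 2.
time_index = list(range(21))
series = [(t - 10) ** 2 for t in time_index]
dw = durbin_watson(ols_residuals(time_index, series))
print(round(dw, 3))
```

A statistic as small as the 0.10834 reported above is therefore consistent with a series that changes slowly from one observation to the next.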

To compare the variation of the Base Interest Rate across years, the plot below shows a box plot of the index for each year separately.

As seen previously, 2008 and 2009 are marked by a decline in the series and, in this new plot, a larger variation can be seen in these years. For this kind of rate, large variations over short periods of time are not expected. The index tends to stabilize over time, as the boxes become shorter. A large variation can also be seen in 2013, probably a reflection of the decline in oil prices.
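The height of each box corresponds to the interquartile range (IQR) of that year's observations, so the comparison made above can also be expressed numerically. The following sketch groups observations by year and computes the IQR per year; the `(year, value)` layout, the function name `iqr_by_year`, and the data are hypothetical.

```python
from collections import defaultdict
import statistics

def iqr_by_year(observations):
    """Interquartile range per year for (year, value) pairs."""
    groups = defaultdict(list)
    for year, value in observations:
        groups[year].append(value)
    result = {}
    for year, values in sorted(groups.items()):
        q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
        result[year] = q3 - q1
    return result

# Hypothetical data: a volatile year followed by a stable one.
observations = [(2008, v) for v in (5.0, 4.0, 3.0, 2.0, 1.0)]
observations += [(2010, v) for v in (1.0, 1.0, 1.5, 1.5, 1.0)]

spread = iqr_by_year(observations)
print(spread)
```

A shorter box in a later year shows up here as a smaller IQR for that year.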

As explained in section 1, the aim of the report is to verify the relationship between the Base Interest Rate index and the number of houses sold by the Stirling Ackroyd real estate company. The first exploratory analysis checks the Pearson correlation between the two variables, as shown below.

## 
##  Pearson's product-moment correlation
## 
## data:  base_index and house.sold$numSoldHouses
## t = 6.2736, df = 112, p-value = 6.831e-09
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.3597593 0.6343563
## sample estimates:
##       cor 
## 0.5099342

It can be seen that the largest values are very influential and that they come from the years 2007 and 2008.

The output contains two main pieces of information: the correlation coefficient (0.5099342) and the p-value (\(6.8311849 \times 10^{-9}\)). The correlation coefficient (CC) can be interpreted as a measure of the degree of linear relationship between two variables, and the p-value comes from a test of whether the CC is significant: if (p-value \(>\) 0.05) the null hypothesis that the correlation equals 0 is not rejected, otherwise (p-value \(\leq\) 0.05) the correlation is significant. In this case, the linear relationship between the two variables is positive and moderate. As a complementary analysis, the plot shown above is a scatter plot of the two variables with a regression line.
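The test statistic and confidence interval in the output can be reproduced from the correlation coefficient and the sample size alone. The sketch below uses the usual formulas, \(t = r\sqrt{n-2}/\sqrt{1-r^2}\) and a confidence interval based on the Fisher z-transformation; the function names are assumptions, and n = 114 is taken from df = 112 in the output. The p-value is not computed because it requires the t distribution, which the Python standard library does not provide.

```python
import math
from statistics import NormalDist

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for H0: correlation = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def fisher_ci(r, n, level=0.95):
    """Confidence interval for r via the Fisher z-transformation."""
    z = math.atanh(r)
    half_width = NormalDist().inv_cdf(0.5 + level / 2) / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)

# Values taken from the output above: r = 0.5099342, n = df + 2 = 114.
t_value = t_statistic(0.5099342, 114)
low, high = fisher_ci(0.5099342, 114)
print(round(t_value, 4), round(low, 4), round(high, 4))
```

The same two functions can be applied to the shorter periods by passing the coefficient and the sample size for that period.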

One point to consider is that the data for the two variables cover quite a long period (more than 9 years), while economic changes can be noticed over short periods of time. Taking this into account, the correlation coefficient is also analyzed for two shorter periods: i) the last 5 years and ii) the last 3 years. The output of the correlation test can be seen below, followed by a scatter plot intended to show whether a linear relationship exists.

i) Last 5 years

## 
##  Pearson's product-moment correlation
## 
## data:  base.5[, names(base.5) == variavel[2]] and house.sold.5$numSoldHouses
## t = -3.2762, df = 58, p-value = 0.001779
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.5899036 -0.1570017
## sample estimates:
##        cor 
## -0.3951722

The correlation coefficient is significant at the 5% significance level when only the last 5 years are considered (p-value = 0.0017789), so there is a significant linear relationship between the two variables.

Something interesting happens when only the last 5 years are used: the relationship becomes negative. This happens because the positive correlation over the whole period was driven by the large values in the first years. As seen in this report, after 2008 the rate declined almost continuously.

ii) Last 3 years

## 
##  Pearson's product-moment correlation
## 
## data:  base.3[, names(base.3) == variavel[2]] and house.sold.3$numSoldHouses
## t = -4.2629, df = 34, p-value = 0.0001515
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.7695136 -0.3245838
## sample estimates:
##        cor 
## -0.5901839

Using only the last 3 years, the correlation coefficient is significant and the linear relationship between the variables is negative and moderate. The relationship remains negative and becomes stronger when only the last 3 years are considered (-0.5901839, compared with -0.3951722 for the last 5 years).
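The change of sign between the full period and the shorter periods can be illustrated with a small sketch that computes the Pearson correlation on the last k paired observations only. The data below are invented to show the effect and are not the report's data; the names `last_window_r`, `rate` and `sold` are assumptions.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def last_window_r(x, y, k):
    """Pearson correlation computed on the last k paired observations."""
    return pearson_r(x[-k:], y[-k:])

# Hypothetical series: a few early observations with a high rate and high
# sales, followed by a period in which sales fall as the rate rises.
rate = [7.0, 6.8, 6.9, 3.0, 3.2, 3.4, 3.6]
sold = [90, 95, 92, 40, 38, 35, 33]

full_period = pearson_r(rate, sold)
recent_only = last_window_r(rate, sold, 4)
print(round(full_period, 3), round(recent_only, 3))
```

In this invented example the early, large values dominate the full-period coefficient and make it positive, while the recent window on its own shows a negative relationship. With monthly data, the last 5 and 3 years would correspond to k = 60 and k = 36, which agrees with the degrees of freedom (58 and 34) in the outputs above.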