We would like to understand the performance of assets. A standard way of doing this is to use a sample from the past to understand the bigger-picture population. How well this sample can be used to understand the future depends on the extent to which the sample is representative of the population. Therefore, the sample should be as large as possible and should include positive and negative periods as well as contrasting financial conditions, such as rising and falling interest rates or booms and recessions. We would also like to include financial and political shocks. There is always the limitation that unforeseen black swans, events that have never been seen before, may arise in the future. These are not part of the sample and therefore will not be part of our understanding of the asset. So even with a large, representative sample, we still need to be cautious about our ability to use the sample to understand the future.
Import data for Bank of America and the S&P 500 index into R
Explain the relationship between the individual stock and the index
da <- read.csv('../../Data/BACSPY.csv')
head(da)
## Date BAC S.P500
## 1 03/01/2000 24.21875 1455.22
## 2 04/01/2000 22.78125 1399.42
## 3 05/01/2000 23.03125 1402.11
## 4 06/01/2000 25.00000 1403.45
## 5 07/01/2000 24.34375 1441.47
## 6 10/01/2000 23.50000 1457.60
tail(da)
## Date BAC S.P500
## 4774 04/01/2019 25.58 2531.94
## 4775 07/01/2019 25.56 2549.69
## 4776 08/01/2019 25.51 2574.41
## 4777 09/01/2019 25.76 2584.96
## 4778 10/01/2019 25.73 2596.64
## 4779 11/01/2019 26.03 2596.26
str(da)
## 'data.frame': 4779 obs. of 3 variables:
## $ Date : chr "03/01/2000" "04/01/2000" "05/01/2000" "06/01/2000" ...
## $ BAC : num 24.2 22.8 23 25 24.3 ...
## $ S.P500: num 1455 1399 1402 1403 1441 ...
This code will import the data into R and allow you to look at the
first 6 lines of the data as well as the last 6 lines. We have also
looked at the structure of the data with the function
str.
You can see that Date is a chr or character. It will be more useful to have this as a Date object so that R recognises the chronological order of the observations. We can turn words and numbers into dates with the as.Date function.
da$Date <- as.Date(da$Date, format = "%d/%m/%Y")
str(da)
## 'data.frame': 4779 obs. of 3 variables:
## $ Date : Date, format: "2000-01-03" "2000-01-04" ...
## $ BAC : num 24.2 22.8 23 25 24.3 ...
## $ S.P500: num 1455 1399 1402 1403 1441 ...
We have created the Date object.

Find the format that you would use to identify dates written as 2000-01-01 and as 01-Jan-20.

It is useful to look at the data that has been imported into R. We use the plot function to do this. For Bank of America:
plot(da$Date, da$BAC, type = 'l', main = "Bank of America")
and for the S&P 500:
plot(da$Date, da$S.P500, type = 'l', main = "S&P 500 index")
We would like to compare the two assets on the same graph. This is difficult because BAC trades in the tens of dollars while the S&P 500 index is in the thousands. We will rebase the two variables so that each starts at 100 at the beginning of the series.
If we perform the same adjustment on each element of a series, we change the numbers while maintaining the shape. This is like converting miles into kilometres: the numbers change but the distance remains the same. We will divide each value by the starting value and then multiply by 100, applying the transformation to every row.
da$BACre <- da$BAC/da$BAC[1] * 100
da$SPYre <- da$S.P500/da$S.P500[1] * 100
head(da)
## Date BAC S.P500 BACre SPYre
## 1 2000-01-03 24.21875 1455.22 100.00000 100.00000
## 2 2000-01-04 22.78125 1399.42 94.06452 96.16553
## 3 2000-01-05 23.03125 1402.11 95.09677 96.35038
## 4 2000-01-06 25.00000 1403.45 103.22581 96.44246
## 5 2000-01-07 24.34375 1441.47 100.51613 99.05513
## 6 2000-01-10 23.50000 1457.60 97.03226 100.16355
tail(da)
## Date BAC S.P500 BACre SPYre
## 4774 2019-01-04 25.58 2531.94 105.6206 173.9902
## 4775 2019-01-07 25.56 2549.69 105.5381 175.2099
## 4776 2019-01-08 25.51 2574.41 105.3316 176.9086
## 4777 2019-01-09 25.76 2584.96 106.3639 177.6336
## 4778 2019-01-10 25.73 2596.64 106.2400 178.4362
## 4779 2019-01-11 26.03 2596.26 107.4787 178.4101
str(da)
## 'data.frame': 4779 obs. of 5 variables:
## $ Date : Date, format: "2000-01-03" "2000-01-04" ...
## $ BAC : num 24.2 22.8 23 25 24.3 ...
## $ S.P500: num 1455 1399 1402 1403 1441 ...
## $ BACre : num 100 94.1 95.1 103.2 100.5 ...
## $ SPYre : num 100 96.2 96.4 96.4 99.1 ...
and now plot the two series
plot(da$Date, da$BACre, type = 'l', main = "Bank of America and S&P 500",
xlab = 'Date', ylab = 'Price')
lines(da$Date, da$SPYre, col = 'red', lty = 2)
legend('topright', inset = 0.1, legend = c('Bank of America', 'S&P500'),
col = c('black', 'red'), lty = c(1, 2), cex = 0.7)
You can see that Bank of America has been much more volatile than the S&P 500. Note the build-up to and aftermath of the global financial crisis.
Generally we will use returns rather than prices when trying to understand asset performance. This is because asset returns are usually stationary while asset prices very often have a unit root. A stationary series is one where the key characteristics do not depend on the time at which the series is observed: the mean and standard deviation are the same at each point. If the series has a trend or seasonal patterns, it will not be stationary. You can read more about stationary data and see some examples here.
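To see the difference, here is a minimal simulation sketch (the series are simulated, not taken from our data): a white-noise series is stationary, while its cumulative sum is a random walk with a unit root, so each shock has a permanent effect.
set.seed(1)
e <- rnorm(500)            # white-noise shocks: stationary by construction
randomwalk <- cumsum(e)    # random walk: y_t = y_{t-1} + e_t, so it has a unit root
par(mfrow = c(2, 1))
plot(e, type = 'l', main = "Stationary series (white noise)")
plot(randomwalk, type = 'l', main = "Random walk (unit root)")
par(mfrow = c(1, 1))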
The statistical test for non-stationary data assesses whether there is a unit root. If we take a standard autoregressive series:
\[y_t = \rho y_{t-1} + \varepsilon_t\] we would like to test whether \(\rho\) is more than, less than or equal to one (a unit). If it is more than one, we have an explosive series that is driven ever further away by shocks; if it is less than one, shocks gradually die out over time; if it is exactly one, each shock has a permanent effect and the series has a unit root.
The last equation can be re-arranged as:
\[\Delta y_t = (\rho - 1)y_{t-1} + \varepsilon_t\]
Therefore we can say that \(\alpha = \rho -1\) and then we test
\[\Delta y_t = \alpha y_{t-1} + \varepsilon_t\]
in the usual way. If \(\alpha = 0\), then \(\rho = 1\) and there is a unit root. However, the critical values are not standard, so we use critical values derived from simulations by Dickey and Fuller or MacKinnon (Dickey and Fuller 1979; Dickey and Fuller 1981; MacKinnon 1991).
We can use the urca package to test for a unit root. There are a number of tests; the most common is the Augmented Dickey-Fuller (ADF) test. You can find the others in the documentation. In each case the null hypothesis is a unit root, apart from the Kwiatkowski, Phillips, Schmidt and Shin (KPSS) test, which has a null of stationarity (Kwiatkowski et al. n.d.).
require(urca)
urtest <- ur.df(da$S.P500, 'none')  # ADF test with type = 'none': no drift or trend term
summary(urtest)
##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression none
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -116.889 -7.135 0.677 7.856 112.658
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## z.lag.1 0.0001614 0.0001449 1.114 0.265333
## z.diff.lag -0.0543641 0.0144341 -3.766 0.000168 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.9 on 4775 degrees of freedom
## Multiple R-squared: 0.00318, Adjusted R-squared: 0.002762
## F-statistic: 7.616 on 2 and 4775 DF, p-value: 0.0004983
##
##
## Value of test-statistic is: 1.114
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau1 -2.58 -1.95 -1.62
The test statistic (1.114) is above all of the critical values, so the null hypothesis of a unit root cannot be rejected.
One of the main ways to deal with non-stationary data is to take the first difference or the percentage change. When working with financial data, we use the percentage change, which gives us returns. Here is the same plot with returns data.
We want to calculate the returns for these two assets. Returns are stationary data and returns can be compared to other assets. We will calculate the simple returns (percentage change). It is also possible to calculate the log returns (continuously compounded). Our calculation is:
\[r_i = \frac{P_{i,t} - P_{i, t-1}}{P_{i, t-1}}\] There are packages with set functions that can calculate returns, but this is a crude replication of the basic formula which can be re-arranged as:
\[r_i = \frac{P_{i,t}}{P_{i,t-1}} - 1\]
# simple return: today's price divided by yesterday's price, minus 1;
# the first observation has no previous price, so it is set to NA
da$BACR <- c(NA, da$BAC[2:length(da$BAC)]/
da$BAC[1:(length(da$BAC) - 1)] - 1)
da$SPR <- c(NA, da$S.P500[2:length(da$S.P500)]/
da$S.P500[1:(length(da$S.P500) - 1)] - 1)
plot(da$Date, da$BACR, type = 'l', main = "SPY and BAC returns",
xlab = "Date", ylab = "Return", col = 'red')
lines(da$Date, da$SPR, type = 'l', col = 'black')
legend('topleft', inset = 0.02, c('SPY','BAC'),
col = c('black', 'red'), lty = c(1,1))
It is evident from this that BAC (red) has more volatile returns than SPY (black).
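We claimed above that returns are stationary. As a quick check, we can repeat the ADF test on the S&P 500 returns, dropping the leading NA created by the return calculation (a sketch; the test statistic should now be well below the critical values, rejecting the null of a unit root).
urtestR <- ur.df(na.omit(da$SPR), 'none')
summary(urtestR)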
Write the code to calculate continuously compounded returns.
Add this series to your dataframe.
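One possible approach, shown here as a sketch: the continuously compounded return is the difference of log prices. The column names BACRlog and SPRlog are just illustrative choices.
da$BACRlog <- c(NA, diff(log(da$BAC)))      # log return: log(P_t) - log(P_{t-1})
da$SPRlog <- c(NA, diff(log(da$S.P500)))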
We would like to look at the distribution of these returns to know more about potential outcomes. What is the most likely value? How are returns dispersed around this likely outcome?
par(mfrow = c(2,1))
hist(da$BACR, breaks = 200, xlim = c(-0.3, 0.3), col = 'light blue',
main = "BAC returns", xlab = "BAC")
hist(da$SPR, breaks = 200, xlim = c(-0.3, 0.3), col = 'light blue',
main = "S&P 500 returns", xlab = "S&P")
Though it is evident that the distribution of Bank of America returns is spread more widely than that of the S&P 500, we would like to describe precisely what we see in the pictures. The most important measures are the mean and the standard deviation of the returns. The mean, sometimes called the expected value, gives the centre of the distribution; the standard deviation is the dispersion around the mean and is widely used in finance as a measure of risk.
We would also like to scale up our measures of expected return and risk from the daily data that we have to annual figures so that they are easier to understand and comparable with other investments. Mean returns compound over the roughly 255 trading days in a year, while the standard deviation scales with the square root of time. This is done in the following way.
(1 + mean(da$BACR, na.rm = TRUE))^255 - 1
## [1] 0.1187788
(1 + mean(da$SPR, na.rm = TRUE))^255 - 1
## [1] 0.05069662
sd(da$BACR, na.rm = TRUE) * sqrt(255)
## [1] 0.4670491
sd(da$SPR, na.rm = TRUE) * sqrt(255)
## [1] 0.1926116
We can write our own functions in R. This is useful because it saves us from having to re-write difficult code over and over again.
A function is an object just like everything else and we use the
function function to create a new function. For
example,
mysquare <- function(x){
mynumber <- x^2
return(mynumber)
}
Look at this carefully. Make sure that you know what is happening.
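Once the function has been created, it can be called like any other R function, for example:
mysquare(3)
## [1] 9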
Try to create a function that will double the value input.
Search for basic function writing and create your own function.
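As a further sketch, here is a small function that wraps the annualisation calculations used earlier; the name annualise and the default of 255 trading days are simply the assumptions carried over from above.
annualise <- function(x, days = 255){
  annret <- (1 + mean(x, na.rm = TRUE))^days - 1  # compound the mean daily return over a year
  annsd <- sd(x, na.rm = TRUE) * sqrt(days)       # volatility scales with the square root of time
  return(c("Ann.Return" = annret, "Ann.StDev" = annsd))
}
annualise(da$BACR)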
This is an advanced function that will calculate the key descriptive statistics for our series.
mystats <- function(x){
if(NA %in% x){
x <- x[!is.na(x)]
warning("NA removed")
}
mynumber <- length(x)
mymean <- mean(x)
mymed <- median(x)
mysd <- sd(x)
myskew <- sum((x - mymean)^3/mysd^3)/mynumber      # skewness: asymmetry of the distribution
mykurt <- sum((x - mymean)^4/mysd^4)/mynumber - 3  # excess kurtosis: fat tails relative to the normal
mymax <- max(x)
mymin <- min(x)
return(c("Number" = round(mynumber, 0),
"Mean" = round(mymean, 4),
"Median" = round(mymed, 4),
"St Dev" = round(mysd, 4),
"Skew" = round(myskew, 2),
"Kurtosis" = round(mykurt, 2),
"Min" = round(mymin, 4),
"Max" = round(mymax, 4)))
}
mystats(da$BACR)
## Warning in mystats(da$BACR): NA removed
## Number Mean Median St Dev Skew Kurtosis Min Max
## 4778.0000 0.0004 0.0000 0.0292 0.9100 27.5600 -0.2897 0.3527
mystats(da$SPR)
## Warning in mystats(da$SPR): NA removed
## Number Mean Median St Dev Skew Kurtosis Min Max
## 4778.0000 0.0002 0.0005 0.0121 -0.0200 8.6800 -0.0903 0.1158