We would like to understand the performance of assets. A standard way of doing this is to use a sample from the past to understand the bigger-picture population. How well this sample can be used to understand the future depends on the extent to which the sample is representative of the population. Therefore, the sample should be as large as possible and should include positive and negative periods as well as contrasting financial conditions, such as rising and falling interest rates or booms and recessions. We would also like to include financial and political shocks. There is always the limitation that unforeseen black swans, events that have never been seen before, may arise in the future. These are not part of the sample and therefore will not be part of our understanding of the asset. So even with a large, representative sample, we still need to be cautious about our ability to use the sample to understand the future.
Import data for Bank of America and the S&P 500 index into R
Explain the relationship between the individual stock and the index
da <- read.csv('../../Data/BACSPY.csv')
head(da)
## Date BAC S.P500
## 1 03/01/2000 24.21875 1455.22
## 2 04/01/2000 22.78125 1399.42
## 3 05/01/2000 23.03125 1402.11
## 4 06/01/2000 25.00000 1403.45
## 5 07/01/2000 24.34375 1441.47
## 6 10/01/2000 23.50000 1457.60
tail(da)
## Date BAC S.P500
## 4774 04/01/2019 25.58 2531.94
## 4775 07/01/2019 25.56 2549.69
## 4776 08/01/2019 25.51 2574.41
## 4777 09/01/2019 25.76 2584.96
## 4778 10/01/2019 25.73 2596.64
## 4779 11/01/2019 26.03 2596.26
str(da)
## 'data.frame': 4779 obs. of 3 variables:
## $ Date : chr "03/01/2000" "04/01/2000" "05/01/2000" "06/01/2000" ...
## $ BAC : num 24.2 22.8 23 25 24.3 ...
## $ S.P500: num 1455 1399 1402 1403 1441 ...
This code will import the data into R and allow you to look at the
first 6 lines of the data as well as the last 6 lines. We have also
looked at the structure of the data with the function
str.
You can see that Date is a chr or character. It will be more useful to have this as a Date object so that R recognises the chronological order of the observations. We can turn words and numbers into dates with the as.Date function.
da$Date <- as.Date(da$Date, format = "%d/%m/%Y")
str(da)
## 'data.frame': 4779 obs. of 3 variables:
## $ Date : Date, format: "2000-01-03" "2000-01-04" ...
## $ BAC : num 24.2 22.8 23 25 24.3 ...
## $ S.P500: num 1455 1399 1402 1403 1441 ...
We have created the Date object.

Find the format that you would use to identify dates written as 2000-01-01 and as 01-Jan-20.

It is useful to look at the data that has been imported into R. We use the plot function to do this. For Bank of America:
plot(da$Date, da$BAC, type = 'l', main = "Bank of America")
and for the S&P 500:
plot(da$Date, da$S.P500, type = 'l', main = "S&P 500 index")
We would like to compare the two assets on the same graph. This is difficult because BAC trades in the tens of dollars while the S&P 500 index is in the thousands. We will rebase the two variables so that each starts at 100 at the beginning of the series.
If we perform the same adjustment on each element of a series, we change the numbers while maintaining the shape. This is like converting miles into kilometres: the numbers change but the distance remains the same. We will divide each value by the starting value and then multiply by 100, applying the transformation to every row.
da$BACre <- da$BAC/da$BAC[1] * 100
da$SPYre <- da$S.P500/da$S.P500[1] * 100
head(da)
## Date BAC S.P500 BACre SPYre
## 1 2000-01-03 24.21875 1455.22 100.00000 100.00000
## 2 2000-01-04 22.78125 1399.42 94.06452 96.16553
## 3 2000-01-05 23.03125 1402.11 95.09677 96.35038
## 4 2000-01-06 25.00000 1403.45 103.22581 96.44246
## 5 2000-01-07 24.34375 1441.47 100.51613 99.05513
## 6 2000-01-10 23.50000 1457.60 97.03226 100.16355
tail(da)
## Date BAC S.P500 BACre SPYre
## 4774 2019-01-04 25.58 2531.94 105.6206 173.9902
## 4775 2019-01-07 25.56 2549.69 105.5381 175.2099
## 4776 2019-01-08 25.51 2574.41 105.3316 176.9086
## 4777 2019-01-09 25.76 2584.96 106.3639 177.6336
## 4778 2019-01-10 25.73 2596.64 106.2400 178.4362
## 4779 2019-01-11 26.03 2596.26 107.4787 178.4101
str(da)
## 'data.frame': 4779 obs. of 5 variables:
## $ Date : Date, format: "2000-01-03" "2000-01-04" ...
## $ BAC : num 24.2 22.8 23 25 24.3 ...
## $ S.P500: num 1455 1399 1402 1403 1441 ...
## $ BACre : num 100 94.1 95.1 103.2 100.5 ...
## $ SPYre : num 100 96.2 96.4 96.4 99.1 ...
and now plot the two series
plot(da$Date, da$BACre, type = 'l', main = "Bank of America and S&P 500",
xlab = 'Date', ylab = 'Price')
lines(da$Date, da$SPYre, col = 'red', lty = 2)
legend('topright', inset = 0.1, legend = c('Bank of America', 'S&P500'),
col = c('black', 'red'), lty = c(1, 2), cex = 0.7)
You can see that Bank of America has been much more volatile than the S&P 500. Note the build-up to and aftermath of the global financial crisis.
Generally we will use returns rather than prices when trying to understand asset performance. This is because asset returns are usually stationary while asset prices very often have a unit root. A stationary series is one where the key characteristics do not depend on the time at which the series is observed: the mean and standard deviation are the same at each point. If the series has a trend or seasonal patterns, it will not be stationary. You can read more about stationary data and see some examples here.
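To see the difference, here is a minimal simulation sketch (the series are simulated, not taken from our data): a white-noise series is stationary, while its cumulative sum is a random walk with a unit root, so each shock has a permanent effect.
set.seed(1)
e <- rnorm(500)            # white-noise shocks: stationary by construction
randomwalk <- cumsum(e)    # random walk: y_t = y_{t-1} + e_t, so it has a unit root
par(mfrow = c(2, 1))
plot(e, type = 'l', main = "Stationary series (white noise)")
plot(randomwalk, type = 'l', main = "Random walk (unit root)")
par(mfrow = c(1, 1))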
The statistical test for non-stationary data assesses whether there is a unit root. If we take a standard autoregressive series:
\[y_t = \rho y_{t-1} + \varepsilon_t\] we would like to test whether \(\rho\) is more than, less than or equal to one (a unit). If it is more than one, we have an explosive series that is driven ever further away by shocks; if it is less than one, shocks gradually die out over time; if it is exactly one, each shock has a permanent effect and the series has a unit root.
The last equation can be re-arranged as:
\[\Delta y_t = (\rho - 1)y_{t-1} + \varepsilon_t\]
Therefore we can say that \(\alpha = \rho -1\) and then we test
\[\Delta y_t = \alpha y_{t-1} + \varepsilon_t\]
in the usual way. If \(\alpha = 0\), then \(\rho = 1\) and there is a unit root. However, the critical values are not standard, so we use critical values derived from simulations by Dickey and Fuller or MacKinnon (Dickey and Fuller 1979; Dickey and Fuller 1981; MacKinnon 1991).
We can use the urca package to test for a unit root. There are a number of tests; the most common is the Augmented Dickey-Fuller (ADF) test. You can find the others in the documentation. In each case the null hypothesis is a unit root, apart from the Kwiatkowski, Phillips, Schmidt and Shin (KPSS) test, which has a null of stationarity (Kwiatkowski et al. n.d.).
require(urca)
urtest <- ur.df(da$S.P500, 'none')  # ADF test with type = 'none': no drift or trend term
summary(urtest)
##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression none
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -116.889 -7.135 0.677 7.856 112.658
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## z.lag.1 0.0001614 0.0001449 1.114 0.265333
## z.diff.lag -0.0543641 0.0144341 -3.766 0.000168 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.9 on 4775 degrees of freedom
## Multiple R-squared: 0.00318, Adjusted R-squared: 0.002762
## F-statistic: 7.616 on 2 and 4775 DF, p-value: 0.0004983
##
##
## Value of test-statistic is: 1.114
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau1 -2.58 -1.95 -1.62
The test statistic (1.114) is above all of the critical values, so the null hypothesis of a unit root cannot be rejected.
One of the main ways to deal with non-stationary data is to take the first difference or the percentage change. When working with financial data, we use the percentage change, which gives us returns. Here is the same plot with returns data.
We want to calculate the returns for these two assets. Returns are stationary data and returns can be compared to other assets. We will calculate the simple returns (percentage change). It is also possible to calculate the log returns (continuously compounded). Our calculation is:
\[r_i = \frac{P_{i,t} - P_{i, t-1}}{P_{i, t-1}}\] There are packages with set functions that can calculate returns, but this is a crude replication of the basic formula which can be re-arranged as:
\[r_i = \frac{P_{i,t}}{P_{i,t-1}} - 1\]
# simple return: today's price divided by yesterday's price, minus 1;
# the first observation has no previous price, so it is set to NA
da$BACR <- c(NA, da$BAC[2:length(da$BAC)]/
da$BAC[1:(length(da$BAC) - 1)] - 1)
da$SPR <- c(NA, da$S.P500[2:length(da$S.P500)]/
da$S.P500[1:(length(da$S.P500) - 1)] - 1)
plot(da$Date, da$BACR, type = 'l', main = "SPY and BAC returns",
xlab = "Date", ylab = "Return", col = 'red')
lines(da$Date, da$SPR, type = 'l', col = 'black')
legend('topleft', inset = 0.02, c('SPY','BAC'),
col = c('black', 'red'), lty = c(1,1))
It is evident from this that BAC (red) has more volatile returns than SPY (black).
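We claimed above that returns are stationary. As a quick check, we can repeat the ADF test on the S&P 500 returns, dropping the leading NA created by the return calculation (a sketch; the test statistic should now be well below the critical values, rejecting the null of a unit root).
urtestR <- ur.df(na.omit(da$SPR), 'none')
summary(urtestR)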
Write the code to calculate continuously compounded returns.
Add this series to your dataframe.
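One possible approach, shown here as a sketch: the continuously compounded return is the difference of log prices. The column names BACRlog and SPRlog are just illustrative choices.
da$BACRlog <- c(NA, diff(log(da$BAC)))      # log return: log(P_t) - log(P_{t-1})
da$SPRlog <- c(NA, diff(log(da$S.P500)))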
We would like to look at the distribution of these returns to know more about potential outcomes. What is the most likely value? How are returns dispersed around this likely outcome?
par(mfrow = c(2,1))
hist(da$BACR, breaks = 200, xlim = c(-0.3, 0.3), col = 'light blue',
main = "BAC returns", xlab = "BAC")
hist(da$SPR, breaks = 200, xlim = c(-0.3, 0.3), col = 'light blue',
main = "S&P 500 returns", xlab = "S&P")
Though it is evident that the distribution of Bank of America returns is spread more widely than that of the S&P 500, we would like to describe precisely what we see in the pictures. The most important measures are the mean and the standard deviation of the returns. The mean, sometimes called the expected value, gives the centre of the distribution; the standard deviation is the dispersion around the mean and is widely used in finance as a measure of risk.
We would also like to scale up our measures of expected return and risk from the daily data that we have to annual figures so that they are easier to understand and comparable with other investments. Mean returns compound over the roughly 255 trading days in a year, while the standard deviation scales with the square root of time. This is done in the following way.
(1 + mean(da$BACR, na.rm = TRUE))^255 - 1
## [1] 0.1187788
(1 + mean(da$SPR, na.rm = TRUE))^255 - 1
## [1] 0.05069662
sd(da$BACR, na.rm = TRUE) * sqrt(255)
## [1] 0.4670491
sd(da$SPR, na.rm = TRUE) * sqrt(255)
## [1] 0.1926116
We can write our own functions in R. This is useful because it saves us from having to re-write difficult code over and over again.
A function is an object just like everything else and we use the
function function to create a new function. For
example,
mysquare <- function(x){
mynumber <- x^2
return(mynumber)
}
Look at this carefully. Make sure that you know what is happening.
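Once the function has been created, it can be called like any other R function, for example:
mysquare(3)
## [1] 9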
Try to create a function that will double the value input.
Search for basic function writing and create your own function.
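As a further sketch, here is a small function that wraps the annualisation calculations used earlier; the name annualise and the default of 255 trading days are simply the assumptions carried over from above.
annualise <- function(x, days = 255){
  annret <- (1 + mean(x, na.rm = TRUE))^days - 1  # compound the mean daily return over a year
  annsd <- sd(x, na.rm = TRUE) * sqrt(days)       # volatility scales with the square root of time
  return(c("Ann.Return" = annret, "Ann.StDev" = annsd))
}
annualise(da$BACR)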
This is an advanced function that will calculate the key descriptive statistics for our series.
mystats <- function(x){
if(NA %in% x){
x <- x[!is.na(x)]
warning("NA removed")
}
mynumber <- length(x)
mymean <- mean(x)
mymed <- median(x)
mysd <- sd(x)
myskew <- sum((x - mymean)^3/mysd^3)/mynumber      # skewness: asymmetry of the distribution
mykurt <- sum((x - mymean)^4/mysd^4)/mynumber - 3  # excess kurtosis: fat tails relative to the normal
mymax <- max(x)
mymin <- min(x)
return(c("Number" = round(mynumber, 0),
"Mean" = round(mymean, 4),
"Median" = round(mymed, 4),
"St Dev" = round(mysd, 4),
"Skew" = round(myskew, 2),
"Kurtosis" = round(mykurt, 2),
"Min" = round(mymin, 4),
"Max" = round(mymax, 4)))
}
mystats(da$BACR)
## Warning in mystats(da$BACR): NA removed
## Number Mean Median St Dev Skew Kurtosis Min Max
## 4778.0000 0.0004 0.0000 0.0292 0.9100 27.5600 -0.2897 0.3527
mystats(da$SPR)
## Warning in mystats(da$SPR): NA removed
## Number Mean Median St Dev Skew Kurtosis Min Max
## 4778.0000 0.0002 0.0005 0.0121 -0.0200 8.6800 -0.0903 0.1158