Introduction

We would like to understand the past performance of a firm to try to say something about the possible future outcomes. We will

Download data from Yahoo finance
Import the data into R
Manipulate the data so that we have the required object in the correct class
Calculate the key descriptive statistics

Download data

da <- read.csv('../../Data/BACSPY.csv')
head(da)

##         Date      BAC  S.P500
## 1 03/01/2000 24.21875 1455.22
## 2 04/01/2000 22.78125 1399.42
## 3 05/01/2000 23.03125 1402.11
## 4 06/01/2000 25.00000 1403.45
## 5 07/01/2000 24.34375 1441.47
## 6 10/01/2000 23.50000 1457.60

tail(da)

##            Date   BAC  S.P500
## 4774 2019-01-04 25.58 2531.94
## 4775 2019-01-07 25.56 2549.69
## 4776 2019-01-08 25.51 2574.41
## 4777 2019-01-09 25.76 2584.96
## 4778 2019-01-10 25.73 2596.64
## 4779 2019-01-11 26.03 2596.26

str(da)

## 'data.frame':    4779 obs. of  3 variables:
##  $ Date  : chr  "03/01/2000" "04/01/2000" "05/01/2000" "06/01/2000" ...
##  $ BAC   : num  24.2 22.8 23 25 24.3 ...
##  $ S.P500: num  1455 1399 1402 1403 1441 ...

We have loaded the data and we have looked at the first 6 lines of the data as well as the last 6 lines. We have also looked at the structure of the data with the function str. ## The Date class

You can see that the Date is a chr or character. In the future, it might be useful if we have that as a Date object. There will then be a chronological sequence. We can turn words and numbers into dates with as.Date.

da$Date <- as.Date(da$Date, format = "%d/%m/%Y")
str(da)

## 'data.frame':    4779 obs. of  3 variables:
##  $ Date  : Date, format: "2000-01-03" "2000-01-04" ...
##  $ BAC   : num  24.2 22.8 23 25 24.3 ...
##  $ S.P500: num  1455 1399 1402 1403 1441 ...

Plotting

Let’s plot the data,

Bank of America

plot(da$Date, da$BAC, type = 'l', main = "Bank of America")

and the S&P 500

plot(da$Date, da$S.P500, 'l', main = "S&P 500 index")

I would like to compare the two on the same graph. This is difficult because BAC is around thirty dollars and the S&P 500 index is 3000. We will rebase the two variables to start at 100 at the beginning of the series.

Re-basing

We will divide the starting value by itself and then multiply by 100. Do the transformation to all rows.

da$BACre <- da$BAC/da$BAC[1] * 100
da$SPYre <- da$S.P500/da$S.P500[1] * 100
head(da)

##         Date      BAC  S.P500     BACre     SPYre
## 1 2000-01-03 24.21875 1455.22 100.00000 100.00000
## 2 2000-01-04 22.78125 1399.42  94.06452  96.16553
## 3 2000-01-05 23.03125 1402.11  95.09677  96.35038
## 4 2000-01-06 25.00000 1403.45 103.22581  96.44246
## 5 2000-01-07 24.34375 1441.47 100.51613  99.05513
## 6 2000-01-10 23.50000 1457.60  97.03226 100.16355

tail(da)

##      Date   BAC  S.P500    BACre    SPYre
## 4774 <NA> 25.58 2531.94 105.6206 173.9902
## 4775 <NA> 25.56 2549.69 105.5381 175.2099
## 4776 <NA> 25.51 2574.41 105.3316 176.9086
## 4777 <NA> 25.76 2584.96 106.3639 177.6336
## 4778 <NA> 25.73 2596.64 106.2400 178.4362
## 4779 <NA> 26.03 2596.26 107.4787 178.4101

str(da)

## 'data.frame':    4779 obs. of  5 variables:
##  $ Date  : Date, format: "2000-01-03" "2000-01-04" ...
##  $ BAC   : num  24.2 22.8 23 25 24.3 ...
##  $ S.P500: num  1455 1399 1402 1403 1441 ...
##  $ BACre : num  100 94.1 95.1 103.2 100.5 ...
##  $ SPYre : num  100 96.2 96.4 96.4 99.1 ...

and plot the two series

plot(da$Date, da$BACre, type = 'l', main = "Bank of America and S&P 500", 
     xlab = 'Date', ylab = 'Price')
lines(da$Date, da$SPYre, col = 'red', lty = 2)
legend('topright', inset = 0.1, legend = c('Bank of America', 'S&P500'), 
       col = c('black', 'red'), lty = c(1, 2), cex = 0.7)

You can see that Bank of America has been a much more volatile ride. You can see that the S&P 500 has had a recent out performance.

Returns

We want to calculate the returns for these two assets. Returns are stationary data and returns can be compared to other assets. We will calculate the simple returns (percentage change). It is also possible to calculate the log returns (continuously compounded). Out calculation is:

\[r_i = \frac{p_{i,t} - p_{i, t-1}}{p_{i, t-1}}\]

da$BACR <- c(NA, da$BAC[2:length(da$BAC)] / da$BAC[1:length(da$BAC) - 1] -1)
da$SPYR <- c(NA, da$SPY[2:length(da$SPY)] / da$SPY[1:length(da$SPY) - 1] -1)
head(da)

##         Date      BAC  S.P500     BACre     SPYre        BACR          SPYR
## 1 2000-01-03 24.21875 1455.22 100.00000 100.00000          NA            NA
## 2 2000-01-04 22.78125 1399.42  94.06452  96.16553 -0.05935484 -0.0383446682
## 3 2000-01-05 23.03125 1402.11  95.09677  96.35038  0.01097394  0.0019221827
## 4 2000-01-06 25.00000 1403.45 103.22581  96.44246  0.08548168  0.0009556782
## 5 2000-01-07 24.34375 1441.47 100.51613  99.05513 -0.02625000  0.0270903996
## 6 2000-01-10 23.50000 1457.60  97.03226 100.16355 -0.03465982  0.0111899695

tail(da)

##      Date   BAC  S.P500    BACre    SPYre          BACR          SPYR
## 4774 <NA> 25.58 2531.94 105.6206 173.9902  0.0415309870  0.0343357143
## 4775 <NA> 25.56 2549.69 105.5381 175.2099 -0.0007818999  0.0070104348
## 4776 <NA> 25.51 2574.41 105.3316 176.9086 -0.0019561425  0.0096952851
## 4777 <NA> 25.76 2584.96 106.3639 177.6336  0.0098000784  0.0040980455
## 4778 <NA> 25.73 2596.64 106.2400 178.4362 -0.0011645963  0.0045184189
## 4779 <NA> 26.03 2596.26 107.4787 178.4101  0.0116595803 -0.0001462979

str(da)

## 'data.frame':    4779 obs. of  7 variables:
##  $ Date  : Date, format: "2000-01-03" "2000-01-04" ...
##  $ BAC   : num  24.2 22.8 23 25 24.3 ...
##  $ S.P500: num  1455 1399 1402 1403 1441 ...
##  $ BACre : num  100 94.1 95.1 103.2 100.5 ...
##  $ SPYre : num  100 96.2 96.4 96.4 99.1 ...
##  $ BACR  : num  NA -0.0594 0.011 0.0855 -0.0262 ...
##  $ SPYR  : num  NA -0.038345 0.001922 0.000956 0.02709 ...

Histogram

We would like to look at the distribution of these returns. What is the most likely value? How are they dispersed?

par(mfrow = c(2,1))
hist(da$BACR, breaks = 200, xlim = c(-0.4, 0.4), col = 'light blue')
hist(da$SPYR, breaks = 200, xlim = c(-0.4, 0.4), col = 'light blue')

Descriptive statistics

Finally, we would like to describe what we see in the histograms. The descriptive statistics will allow us to quantify what we see in the diagram.

(1 + mean(da$BACR, na.rm = TRUE))^255 - 1

## [1] 0.1187788

(1 + mean(da$SPYR, na.rm = TRUE))^255 - 1

## [1] 0.05069662

sd(da$BACR, na.rm = TRUE) * sqrt(255)

## [1] 0.4670491

sd(da$SPYR, na.rm = TRUE) * sqrt(255)

## [1] 0.1926116

Functions

We can write our own functions in R. This will be useful as it will prevent us having to re-write difficult code over and over again.

A function is an object just like everything else and we use the function function to create a new function. For example,

mysquare <- function(x){
  mynumber <- x^2
  return(mynumber)
  }

Look at this carefully. Make sure that you know what is happening. Try to create a function that will double the value input.

This is an advanced function that will calculate the key descriptive statistics for our series.

mystats <- function(x){
  if(NA %in% x){
    x <- x[!is.na(x)]
    warning("NA removed")
  }
  mynumber <- length(x)
  mymean <- mean(x)
  mymed <- median(x)
  mysd <- sd(x)
  myskew <- sum((x-mymean)^3/mysd^3)/mynumber
  mykurt <- sum((x - mymean)^4/mysd^4)/mynumber - 3
  mymax <- max(x)
  mymin <- min(x)
  return(c("Number" = round(mynumber, 0), 
           "Mean" = round(mymean, 4),
           "Median" = round(mymed, 4),
           "St Dev" = round(mysd, 4), 
           "Skew" = round(myskew, 2), 
           "Kurtosis" = round(mykurt, 2),
           "Min" = round(mymin, 4),
           "Max" = round(mymax, 4))) 

}
mystats(da$BACR)

## Warning in mystats(da$BACR): NA removed

##    Number      Mean    Median    St Dev      Skew  Kurtosis       Min       Max 
## 4778.0000    0.0004    0.0000    0.0292    0.9100   27.5600   -0.2897    0.3527

mystats(da$SPYR)

## Warning in mystats(da$SPYR): NA removed

##    Number      Mean    Median    St Dev      Skew  Kurtosis       Min       Max 
## 4778.0000    0.0002    0.0005    0.0121   -0.0200    8.6800   -0.0903    0.1158

Firm Performance

Rob Hayward

2024-02-11