We would like to understand the past performance of a firm to try to say something about the possible future outcomes. We will
da <- read.csv('../../Data/BACSPY.csv')
head(da)
## Date BAC S.P500
## 1 03/01/2000 24.21875 1455.22
## 2 04/01/2000 22.78125 1399.42
## 3 05/01/2000 23.03125 1402.11
## 4 06/01/2000 25.00000 1403.45
## 5 07/01/2000 24.34375 1441.47
## 6 10/01/2000 23.50000 1457.60
tail(da)
## Date BAC S.P500
## 4774 2019-01-04 25.58 2531.94
## 4775 2019-01-07 25.56 2549.69
## 4776 2019-01-08 25.51 2574.41
## 4777 2019-01-09 25.76 2584.96
## 4778 2019-01-10 25.73 2596.64
## 4779 2019-01-11 26.03 2596.26
str(da)
## 'data.frame': 4779 obs. of 3 variables:
## $ Date : chr "03/01/2000" "04/01/2000" "05/01/2000" "06/01/2000" ...
## $ BAC : num 24.2 22.8 23 25 24.3 ...
## $ S.P500: num 1455 1399 1402 1403 1441 ...
We have loaded the data and we have looked at the first 6 lines of
the data as well as the last 6 lines. We have also looked at the
structure of the data with the function str. ## The Date
class
You can see that the Date is a chr or character. In the
future, it might be useful if we have that as a Date
object. There will then be a chronological sequence. We can turn words
and numbers into dates with as.Date.
da$Date <- as.Date(da$Date, format = "%d/%m/%Y")
str(da)
## 'data.frame': 4779 obs. of 3 variables:
## $ Date : Date, format: "2000-01-03" "2000-01-04" ...
## $ BAC : num 24.2 22.8 23 25 24.3 ...
## $ S.P500: num 1455 1399 1402 1403 1441 ...
Let’s plot the data,
Bank of America
plot(da$Date, da$BAC, type = 'l', main = "Bank of America")
and the S&P 500
plot(da$Date, da$S.P500, 'l', main = "S&P 500 index")
I would like to compare the two on the same graph. This is difficult because BAC is around thirty dollars and the S&P 500 index is 3000. We will rebase the two variables to start at 100 at the beginning of the series.
We will divide the starting value by itself and then multiply by 100. Do the transformation to all rows.
da$BACre <- da$BAC/da$BAC[1] * 100
da$SPYre <- da$S.P500/da$S.P500[1] * 100
head(da)
## Date BAC S.P500 BACre SPYre
## 1 2000-01-03 24.21875 1455.22 100.00000 100.00000
## 2 2000-01-04 22.78125 1399.42 94.06452 96.16553
## 3 2000-01-05 23.03125 1402.11 95.09677 96.35038
## 4 2000-01-06 25.00000 1403.45 103.22581 96.44246
## 5 2000-01-07 24.34375 1441.47 100.51613 99.05513
## 6 2000-01-10 23.50000 1457.60 97.03226 100.16355
tail(da)
## Date BAC S.P500 BACre SPYre
## 4774 <NA> 25.58 2531.94 105.6206 173.9902
## 4775 <NA> 25.56 2549.69 105.5381 175.2099
## 4776 <NA> 25.51 2574.41 105.3316 176.9086
## 4777 <NA> 25.76 2584.96 106.3639 177.6336
## 4778 <NA> 25.73 2596.64 106.2400 178.4362
## 4779 <NA> 26.03 2596.26 107.4787 178.4101
str(da)
## 'data.frame': 4779 obs. of 5 variables:
## $ Date : Date, format: "2000-01-03" "2000-01-04" ...
## $ BAC : num 24.2 22.8 23 25 24.3 ...
## $ S.P500: num 1455 1399 1402 1403 1441 ...
## $ BACre : num 100 94.1 95.1 103.2 100.5 ...
## $ SPYre : num 100 96.2 96.4 96.4 99.1 ...
and plot the two series
plot(da$Date, da$BACre, type = 'l', main = "Bank of America and S&P 500",
xlab = 'Date', ylab = 'Price')
lines(da$Date, da$SPYre, col = 'red', lty = 2)
legend('topright', inset = 0.1, legend = c('Bank of America', 'S&P500'),
col = c('black', 'red'), lty = c(1, 2), cex = 0.7)
You can see that Bank of America has been a much more volatile ride. You can see that the S&P 500 has had a recent out performance.
We want to calculate the returns for these two assets. Returns are stationary data and returns can be compared to other assets. We will calculate the simple returns (percentage change). It is also possible to calculate the log returns (continuously compounded). Out calculation is:
\[r_i = \frac{p_{i,t} - p_{i, t-1}}{p_{i, t-1}}\]
da$BACR <- c(NA, da$BAC[2:length(da$BAC)] / da$BAC[1:length(da$BAC) - 1] -1)
da$SPYR <- c(NA, da$SPY[2:length(da$SPY)] / da$SPY[1:length(da$SPY) - 1] -1)
head(da)
## Date BAC S.P500 BACre SPYre BACR SPYR
## 1 2000-01-03 24.21875 1455.22 100.00000 100.00000 NA NA
## 2 2000-01-04 22.78125 1399.42 94.06452 96.16553 -0.05935484 -0.0383446682
## 3 2000-01-05 23.03125 1402.11 95.09677 96.35038 0.01097394 0.0019221827
## 4 2000-01-06 25.00000 1403.45 103.22581 96.44246 0.08548168 0.0009556782
## 5 2000-01-07 24.34375 1441.47 100.51613 99.05513 -0.02625000 0.0270903996
## 6 2000-01-10 23.50000 1457.60 97.03226 100.16355 -0.03465982 0.0111899695
tail(da)
## Date BAC S.P500 BACre SPYre BACR SPYR
## 4774 <NA> 25.58 2531.94 105.6206 173.9902 0.0415309870 0.0343357143
## 4775 <NA> 25.56 2549.69 105.5381 175.2099 -0.0007818999 0.0070104348
## 4776 <NA> 25.51 2574.41 105.3316 176.9086 -0.0019561425 0.0096952851
## 4777 <NA> 25.76 2584.96 106.3639 177.6336 0.0098000784 0.0040980455
## 4778 <NA> 25.73 2596.64 106.2400 178.4362 -0.0011645963 0.0045184189
## 4779 <NA> 26.03 2596.26 107.4787 178.4101 0.0116595803 -0.0001462979
str(da)
## 'data.frame': 4779 obs. of 7 variables:
## $ Date : Date, format: "2000-01-03" "2000-01-04" ...
## $ BAC : num 24.2 22.8 23 25 24.3 ...
## $ S.P500: num 1455 1399 1402 1403 1441 ...
## $ BACre : num 100 94.1 95.1 103.2 100.5 ...
## $ SPYre : num 100 96.2 96.4 96.4 99.1 ...
## $ BACR : num NA -0.0594 0.011 0.0855 -0.0262 ...
## $ SPYR : num NA -0.038345 0.001922 0.000956 0.02709 ...
We would like to look at the distribution of these returns. What is the most likely value? How are they dispersed?
par(mfrow = c(2,1))
hist(da$BACR, breaks = 200, xlim = c(-0.4, 0.4), col = 'light blue')
hist(da$SPYR, breaks = 200, xlim = c(-0.4, 0.4), col = 'light blue')
Finally, we would like to describe what we see in the histograms. The descriptive statistics will allow us to quantify what we see in the diagram.
(1 + mean(da$BACR, na.rm = TRUE))^255 - 1
## [1] 0.1187788
(1 + mean(da$SPYR, na.rm = TRUE))^255 - 1
## [1] 0.05069662
sd(da$BACR, na.rm = TRUE) * sqrt(255)
## [1] 0.4670491
sd(da$SPYR, na.rm = TRUE) * sqrt(255)
## [1] 0.1926116
We can write our own functions in R. This will be useful as it will prevent us having to re-write difficult code over and over again.
A function is an object just like everything else and we use the
function function to create a new function. For
example,
mysquare <- function(x){
mynumber <- x^2
return(mynumber)
}
Look at this carefully. Make sure that you know what is happening. Try to create a function that will double the value input.
This is an advanced function that will calculate the key descriptive statistics for our series.
mystats <- function(x){
if(NA %in% x){
x <- x[!is.na(x)]
warning("NA removed")
}
mynumber <- length(x)
mymean <- mean(x)
mymed <- median(x)
mysd <- sd(x)
myskew <- sum((x-mymean)^3/mysd^3)/mynumber
mykurt <- sum((x - mymean)^4/mysd^4)/mynumber - 3
mymax <- max(x)
mymin <- min(x)
return(c("Number" = round(mynumber, 0),
"Mean" = round(mymean, 4),
"Median" = round(mymed, 4),
"St Dev" = round(mysd, 4),
"Skew" = round(myskew, 2),
"Kurtosis" = round(mykurt, 2),
"Min" = round(mymin, 4),
"Max" = round(mymax, 4)))
}
mystats(da$BACR)
## Warning in mystats(da$BACR): NA removed
## Number Mean Median St Dev Skew Kurtosis Min Max
## 4778.0000 0.0004 0.0000 0.0292 0.9100 27.5600 -0.2897 0.3527
mystats(da$SPYR)
## Warning in mystats(da$SPYR): NA removed
## Number Mean Median St Dev Skew Kurtosis Min Max
## 4778.0000 0.0002 0.0005 0.0121 -0.0200 8.6800 -0.0903 0.1158