Introduction

The easiest way to prepare data for R is to hold the data in a csv file with the dates in the first column and variables in columns. It is also possible to store data in excel but this requires you to download an additional package of functions. We will take a look at that later in the week.

If you store data is a set format it is easier to process. It is best practice to do all the data processing in R so that it can be reproduced easily when the data is updated and so that all the steps are clear.

Practice

  • Go to Yahoo Finance and download monthly data for Apple from January 2000 to December 2022.

Managing projects and data

If you have not yet found a way to manage your information you should find a solution as soon as possible. RStudio can help you with Projects. You can start at new project by selecting File and New Project. You can create a new folder or directory for this or use an existing one. You might then decide to divide this project into set folders or directories that contain things like Data, docs, R and Knitr files. It is good practice to have a README file to explain the layout and details of the directory.

For example, I have the following set up for my RStudio files.

### Practice

Create a project in an appropriate folder or directory

Reading data into R

Assuming that you have saved the data into a file called bac.csv in your Data directory, the function to read data into R is read.csv. If it works correctly, you should see the new object called da (or whatever you want to call it) in the Environment. You can take a look at this object using the functions head and str. The first will show the first 6 lines and the second will show the structure.

da <- read.csv('../../Data/bac.csv')
head(da)
##         Date   BAC    SPY
## 1 03/12/2012 11.58 139.59
## 2 01/11/2012  9.82 138.36
## 3 01/10/2012  9.28 137.58
## 4 04/09/2012  8.79 140.13
## 5 01/08/2012  7.95 136.66
## 6 02/07/2012  7.30 133.32
str(da)
## 'data.frame':    156 obs. of  3 variables:
##  $ Date: Factor w/ 156 levels "01/02/2000","01/02/2001",..: 152 91 81 156 65 119 47 37 112 20 ...
##  $ BAC : num  11.58 9.82 9.28 8.79 7.95 ...
##  $ SPY : num  140 138 138 140 137 ...

Practice

  • Import your Apple data into R.
  • Check the data

Changing the class of objects

You can see from these functions that we have a data frame with monthly data for Bank of America and the S&P 500 index with dates in the first column. The price data are numerics (num) while the date is a character (chr). We would like to turn the dates into a date object so that R knows that they run in sequence from yesterday through today. We can doi this using the as.Date function. We need to tell the function the structure of the data. You can find out more about this here:

Mollie's Research Blog: Importing dates

da$Date <- as.Date(da$Date, format = "%d/%m/%Y")
str(da)
## 'data.frame':    156 obs. of  3 variables:
##  $ Date: Date, format: "2012-12-03" "2012-11-01" ...
##  $ BAC : num  11.58 9.82 9.28 8.79 7.95 ...
##  $ SPY : num  140 138 138 140 137 ...

We now have dates and not just words.

Practice

  • Turn the Date column for your Apple data into a date object rather than a character
  • Check that first column is a date

Assessing performance

To assess the performance of Bank of America and the S&P 500 index we will use the sample (from the past) as a way to understand the population (including the future). If the sample is representative of the future, the mean will give us the most likely outcome. This is the expected return.

First we need to calculate the returns. This is a little more tricky than excel, but once we have written the code once, we can use it over and over again. Next week we will write an return function.

da$BACR <- c(da$BAC[1:length(da$BAC) - 1] / da$BAC[2:length(da$BAC)] - 1, NA)
da$SPYR <- c(da$SPY[1:length(da$SPY) - 1] / da$SPY[2:length(da$SPY)] - 1, NA)
head(da)
##         Date   BAC    SPY        BACR         SPYR
## 1 2012-12-03 11.58 139.59  0.17922607  0.008889853
## 2 2012-11-01  9.82 138.36  0.05818966  0.005669429
## 3 2012-10-01  9.28 137.58  0.05574516 -0.018197388
## 4 2012-09-04  8.79 140.13  0.10566038  0.025391483
## 5 2012-08-01  7.95 136.66  0.08904110  0.025052505
## 6 2012-07-02  7.30 133.32 -0.10319410  0.011839709
tail(da)
##           Date   BAC    SPY        BACR        SPYR
## 151 2000-06-01 14.29 112.96 -0.22421281  0.01977070
## 152 2000-05-01 18.42 110.77  0.14197148 -0.01572774
## 153 2000-04-03 16.13 112.54 -0.06546929 -0.03523360
## 154 2000-03-01 17.26 116.65  0.15220294  0.09695317
## 155 2000-02-01 14.98 106.34 -0.05009512 -0.01518800
## 156 2000-01-03 15.77 107.98          NA          NA

Practice

  • Calculate Apple returns
  • Make sure you are clear what is being done here with the square brackets. Why do we need NA?

Mean, standard deviation and other metrics

Now calculate the mean, standard deviation, max and min.

mean(da$BACR, na.rm = TRUE)
## [1] 0.006797748
mean(da$SPYR, na.rm = TRUE)
## [1] 0.00272119
sd(da$BACR, na.rm = TRUE)
## [1] 0.1308765
sd(da$SPYR, na.rm = TRUE)
## [1] 0.04593242

Practice

  • Calculate mean and standard deviation for Apple returns
  • Find how to calculate the skew and kurtosis
  • Calculate skew and kurtosis for Bank of America, the S&P 500 and Apple.

Plotting returns and performance

We would like to look at the distribution of the returns. We do do this by constructing a histogram using the hist function.

hist(da$BACR, prob = TRUE, main = "Histogram of BAC Returns", 
     xlab = "Returns", ylab = 'Probability', col = 'cornflowerblue', 
     breaks = seq(-0.6, 0.8, by = 0.05))

Now calculate the price over time.

plot(da$Date, da$BAC, type = 'l', main = "Bank of America", 
     xlab = 'Date', ylab = "Price")

Plotting both stock and market

We would like to plot both on the same graph. Remember the re-basing that we did with the GDP data? We can create new series that are re-based to be 100 at the beginning. We do this by dividing by the first price and then multiplying by 100. Apply this formula to all the rows of data.

da$BACRebased <- da$BAC/da$BAC[length(da$BAC)] * 100
da$SPYRebased <- da$SPY/da$SPY[length(da$SPY)] * 100
head(da)
##         Date   BAC    SPY        BACR         SPYR BACRebased SPYRebased
## 1 2012-12-03 11.58 139.59  0.17922607  0.008889853   73.43056   129.2739
## 2 2012-11-01  9.82 138.36  0.05818966  0.005669429   62.27013   128.1348
## 3 2012-10-01  9.28 137.58  0.05574516 -0.018197388   58.84591   127.4125
## 4 2012-09-04  8.79 140.13  0.10566038  0.025391483   55.73874   129.7740
## 5 2012-08-01  7.95 136.66  0.08904110  0.025052505   50.41218   126.5605
## 6 2012-07-02  7.30 133.32 -0.10319410  0.011839709   46.29042   123.4673
tail(da)
##           Date   BAC    SPY        BACR        SPYR BACRebased SPYRebased
## 151 2000-06-01 14.29 112.96 -0.22421281  0.01977070   90.61509   104.6120
## 152 2000-05-01 18.42 110.77  0.14197148 -0.01572774  116.80406   102.5838
## 153 2000-04-03 16.13 112.54 -0.06546929 -0.03523360  102.28282   104.2230
## 154 2000-03-01 17.26 116.65  0.15220294  0.09695317  109.44832   108.0293
## 155 2000-02-01 14.98 106.34 -0.05009512 -0.01518800   94.99049    98.4812
## 156 2000-01-03 15.77 107.98          NA          NA  100.00000   100.0000

Now plot

plot(da$Date, da$BACRebased, type = 'l', xlab = "Date", 
     ylab = "Price", main = "BAC and S&P 500", lty = 1)
lines(da$Date, da$SPYRebased, col = 'blue', lty = 2)
legend('topleft', inset = 0.05, legend = c("BAC", "SPY"), 
       col = c('black', 'blue'), lty = c(1, 2), cex = 0.8)

Practice

  • Create an alternative base point for the series
  • Plot data with the new base