FIN 654 - Week 3

NOTES for reading this HTML document

R code is shown in gray boxes.
R comments are italicized.
R Output is shown in White boxes.
Today’s lecture get’s into some R coding details.
Here is a link to a PDF summary of some common R coding syntax: R Basics PDF

Following up on Questions from Last Week (OPTIONAL)

Question: How can we add multiple stocks to the same plot?
Answer (and Preview): We will cover this in Weeks 5 and 6 using a single interactive dygraph.

For now, here is a simple preview using hchart function.

First we load the packages and download multiple data sets:

# packages need to be installed if they weren't already
library(quantmod)
library(highcharter)
library(magrittr) 

getSymbols("SBUX", from="2020-1-1") # Starbucks data starting Jan. 1, 2020
getSymbols("GRUB", from="2020-1-1") # GrubHub data starting Jan. 1, 2020
getSymbols("DASH", from="2020-1-1") # DoorDash data starting Jan. 1, 2020

Then we build a plot and add to it using pipes from the magrittr package and new command, hc_add_series.

Note: This is not required at this time in this course.

hchart(SBUX, name="StarBucks", color="brown") %>%
    hc_add_series(GRUB, name="GrubHub" , color="red")  %>% 
    hc_add_series(DASH, name="DoorDash" , color="blue")

SETUP for Week 3 R Examples:

File downloads

Before we begin, download the following PDFs from the Course Wall or Files.

Save both files to a specific location on your computer.

For example, I created a ‘Week 3’ folder and saved these files there:

The Dunkin data set which is an Excel file:

EXCEL FILE LINK
A PDF lookup table for formatting dates to be readable by R:

PDF LINK

Setting a Working Directory

There are many ways to do this. I like this option to keep projects organized:

Open a new script file in R

File > New File > R Script
Save that script file to the SAME folder where the downloaded files were saved.

File > Save as > then navigate to folder
Set working directory to be that folder

Session > Set Working Directory > To Source File Location

or

Session > Set Working Directory > Choose Directory > then navigate to folder

Resulting R code from this process is an R statement using the setwd(…) function:

# This is from my computer.  
# Your setwd output will show where you set your working directory on your computer
setwd("C:/Users/pspoo/Dropbox/0 - Spring 2021/FIN 654/Week 3")

If you want to check what your working directory you can use the getwd function:

getwd()

## [1] "C:/Users/pspoo/Dropbox/0 - Spring 2021/FIN 654/Week 3"

Importing data (Always works)

Convert Excel file to .csv format.
(.csv stands for comma separated values)
Use read.csv function in R

donut <- read.csv(file = "DUNKIN.csv")

Importing data using file.choose function (Easier but doesn’t work in all platforms)

DONUT <- read.csv(file.choose())

Examine the Data

We use the head and tail functions to examine the data.

We can ALSO examine data in the ‘Global Environment’.

head(donut)

##           DAY  OPEN  HIGH   LOW CLOSE   VOLUME ADJUSTED
## 1 2011 Jul,27 25.00 29.62 24.97 27.85 45399000  23.1321
## 2 2011 Jul,28 29.21 30.49 27.10 28.39  7421800  23.5806
## 3 2011 Jul,29 28.99 29.20 28.01 28.93  2200200  24.0292
## 4 2011 Aug,01 31.37 31.94 28.30 29.66  3147700  24.6355
## 5 2011 Aug,02 29.86 29.88 27.44 27.76  2483500  23.0574
## 6 2011 Aug,03 27.50 27.97 25.75 26.59  2865600  22.0856

tail(donut)

##              DAY   OPEN   HIGH    LOW  CLOSE  VOLUME ADJUSTED
## 2358 2020 Dec,07 106.38 106.43 106.35 106.42  813000   106.42
## 2359 2020 Dec,08 106.40 106.43 106.38 106.41  633900   106.41
## 2360 2020 Dec,09 106.40 106.46 106.39 106.45 1067200   106.45
## 2361 2020 Dec,10 106.42 106.47 106.42 106.45  579600   106.45
## 2362 2020 Dec,11 106.43 106.45 106.42 106.42 1080500   106.42
## 2363 2020 Dec,14 106.43 106.50 106.43 106.48 1826500   106.48

Data must be in correct format for some plot functions

Let’s try to create create a candlestick plot using chartSeries function.

The chartSeries function is used to plot time series data.

We see that it won’t work with our data in it’s current format:

# install command not required in every session
# install.packages("quantmod")
library(quantmod)
chartSeries(donut, type="candlesticks")

## Error in try.xts(x, error = "chartSeries requires an xtsible object"): chartSeries requires an xtsible object

We can use the str command to examine the structure of the data

We can ALSO examine the structure in the ‘Global Environment’.

# examine structure of data using str command
str(donut)

## 'data.frame':    2363 obs. of  7 variables:
##  $ DAY     : chr  "2011 Jul,27" "2011 Jul,28" "2011 Jul,29" "2011 Aug,01" ...
##  $ OPEN    : num  25 29.2 29 31.4 29.9 ...
##  $ HIGH    : num  29.6 30.5 29.2 31.9 29.9 ...
##  $ LOW     : num  25 27.1 28 28.3 27.4 ...
##  $ CLOSE   : num  27.9 28.4 28.9 29.7 27.8 ...
##  $ VOLUME  : int  45399000 7421800 2200200 3147700 2483500 2865600 2356500 1840400 1644300 1769200 ...
##  $ ADJUSTED: num  23.1 23.6 24 24.6 23.1 ...

The donut dataset is a generic dataframe.

It is not an .xts (extensible time series).

xts files are self-aware time series data that are self-aware.

We will change our generic data to .xts format.

Steps to convert a text variable to a date variable in R

Run tail command so data appears on console
Use as.Date function
- First input in as.Date function in the variable to be converted.
- Second input is the CURRENT format of the date and time information.
- The provided pdf, WeeK 3 Date Format Lookup provides information.

In the code example below, we make a mistake which overwrites our ‘Day’ variable.

To fix this, we reimport the data.

# incorrect command
# % is missing before d
# overwrites the 'DAY' variable with missing values NAs
donut$DAY <- as.Date(donut$DAY, format="%Y %b,d")
tail(donut)

##       DAY   OPEN   HIGH    LOW  CLOSE  VOLUME ADJUSTED
## 2358 <NA> 106.38 106.43 106.35 106.42  813000   106.42
## 2359 <NA> 106.40 106.43 106.38 106.41  633900   106.41
## 2360 <NA> 106.40 106.46 106.39 106.45 1067200   106.45
## 2361 <NA> 106.42 106.47 106.42 106.45  579600   106.45
## 2362 <NA> 106.43 106.45 106.42 106.42 1080500   106.42
## 2363 <NA> 106.43 106.50 106.43 106.48 1826500   106.48

# DAY values were overwritten and can't be recovered.
# reimport data and correct the command:
donut <- read.csv(file = "DUNKIN.csv")
donut$DAY <- as.Date(donut$DAY, format="%Y %b,%d")
tail(donut)

##             DAY   OPEN   HIGH    LOW  CLOSE  VOLUME ADJUSTED
## 2358 2020-12-07 106.38 106.43 106.35 106.42  813000   106.42
## 2359 2020-12-08 106.40 106.43 106.38 106.41  633900   106.41
## 2360 2020-12-09 106.40 106.46 106.39 106.45 1067200   106.45
## 2361 2020-12-10 106.42 106.47 106.42 106.45  579600   106.45
## 2362 2020-12-11 106.43 106.45 106.42 106.42 1080500   106.42
## 2363 2020-12-14 106.43 106.50 106.43 106.48 1826500   106.48

# rerun str command
str(donut)

## 'data.frame':    2363 obs. of  7 variables:
##  $ DAY     : Date, format: "2011-07-27" "2011-07-28" ...
##  $ OPEN    : num  25 29.2 29 31.4 29.9 ...
##  $ HIGH    : num  29.6 30.5 29.2 31.9 29.9 ...
##  $ LOW     : num  25 27.1 28 28.3 27.4 ...
##  $ CLOSE   : num  27.9 28.4 28.9 29.7 27.8 ...
##  $ VOLUME  : int  45399000 7421800 2200200 3147700 2483500 2865600 2356500 1840400 1644300 1769200 ...
##  $ ADJUSTED: num  23.1 23.6 24 24.6 23.1 ...

Note that if we try to create our time series chart it STILL won’t work:

chartSeries(donut, type="candlesticks")

## Error in try.xts(x, error = "chartSeries requires an xtsible object"): chartSeries requires an xtsible object

We have converted the DAY variable to to Date format BUT the dataset is still a generic data frame.

In order for this plot function to work, we need to convert our dataset to xts format

Steps to convert a generic data frame to xts format

Print a subset of the data to the console for reference.
Use the xts function.
First input is core data, all variables EXCEPT Date variable (DAY)
- We can include or exclude ANY part of a data frame or matrix using subset brackets [,]
- Entries in [,] are rows and columns to be included or excluded in subset [rows,columns]
- no value BEFORE comma in [,] indicates include all rows
- no value AFTER comma in [,] indicates include all columns
- -1 after comma in [,] indicates include all rows and all columns except first one [,-1]
- 1 after comma in [,] indicates include all rows and first column only [,1]
Second input is the order.by variable, our Time variable, which is Column 1 in our dataset (our Date variable, DAY.

Here are two ways to create this xts object and save it as GLAZED:

# First way specifies column 1 in the dataset as the Time variable
GLAZED <- xts(donut[,-1], order.by = donut[,1])
str(GLAZED)

## An 'xts' object on 2011-07-27/2020-12-14 containing:
##   Data: num [1:2363, 1:6] 25 29.2 29 31.4 29.9 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:6] "OPEN" "HIGH" "LOW" "CLOSE" ...
##   Indexed by objects of class: [Date] TZ: UTC
##   xts Attributes:  
##  NULL

# Second way specifies the  column by it's name within the data object and the $
GLAZED <- xts(donut[,-1], order.by = donut$DAY)
str(GLAZED)

## An 'xts' object on 2011-07-27/2020-12-14 containing:
##   Data: num [1:2363, 1:6] 25 29.2 29 31.4 29.9 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:6] "OPEN" "HIGH" "LOW" "CLOSE" ...
##   Indexed by objects of class: [Date] TZ: UTC
##   xts Attributes:  
##  NULL

head(GLAZED)

##             OPEN  HIGH   LOW CLOSE   VOLUME ADJUSTED
## 2011-07-27 25.00 29.62 24.97 27.85 45399000  23.1321
## 2011-07-28 29.21 30.49 27.10 28.39  7421800  23.5806
## 2011-07-29 28.99 29.20 28.01 28.93  2200200  24.0292
## 2011-08-01 31.37 31.94 28.30 29.66  3147700  24.6355
## 2011-08-02 29.86 29.88 27.44 27.76  2483500  23.0574
## 2011-08-03 27.50 27.97 25.75 26.59  2865600  22.0856

NOTE that the Date variable, DAY, is no longer a variable. It has no column name.

In the GLAZED dataset, an xts object, the Date variable, DAY, is now the index or row ID for each observation.

We have seen xts objects already when we imported data from YaHoo Finance.

Here’s a reminder:

# If we exclude 'to' argument, the default is today
getSymbols("SBUX", from="2021-1-1")

## [1] "SBUX"

Now we can make the chart we planned (but it is static):

# static time series plot
chartSeries(GLAZED, type="candlesticks")

Alternatively, we can make an interactive plot using the highcharter package:

library(highcharter)
hchart(GLAZED, color="green")

Let’s examine this plot to determine why are some candlesticks colored and others aren’t?

Last Topic for Week 3: Risk Free Rate

So far we have downloaded data from YaHoo Finance.

We can also download data from other sources.

For example: the Federal Reserve of St. Louis.

We will download 6 month treasury bills from the Federal Reserve of St. Louis.

With these data, we will practice converting annualized returns to approx. monthly returns.

getSymbols("DGS6MO", src="FRED")

## [1] "DGS6MO"

head(DGS6MO)

##            DGS6MO
## 1981-09-01  17.17
## 1981-09-02  17.32
## 1981-09-03  17.42
## 1981-09-04  17.37
## 1981-09-07     NA
## 1981-09-08  17.43

Steps for converting annualized returns to approx. monthly returns

Divide values by 100 (to convert percent to decimal value)
- e.g. 17.37% = 0.1737 and not 17.37
Divide values by 12 to re-scale them to approximately one month.

Each value is is divided by 100*12 = 1200

Note: If you don’t divide by 100, answer will be a percent.

# DGS6MO only has one variable so we don't have to specify a variable name
risk.free <- DGS6MO / 1200  
head(risk.free)

##                DGS6MO
## 1981-09-01 0.01430833
## 1981-09-02 0.01443333
## 1981-09-03 0.01451667
## 1981-09-04 0.01447500
## 1981-09-07         NA
## 1981-09-08 0.01452500

# or in percent format:
risk.free.pct <- DGS6MO/12

# examine both decimal and percent versions side by side
# same values in decimal and percent format
head(cbind(risk.free, risk.free.pct))

##                DGS6MO DGS6MO.1
## 1981-09-01 0.01430833 1.430833
## 1981-09-02 0.01443333 1.443333
## 1981-09-03 0.01451667 1.451667
## 1981-09-04 0.01447500 1.447500
## 1981-09-07         NA       NA
## 1981-09-08 0.01452500 1.452500

A few final comments:

Kivanc Avrenli’s questions about this (HW 1.6) ask for decimal proportions

cbind functions stands for column bind and can be used to place columns side by side

Only use cbind for variable columns that are of the same length (same number of obs.)