R code is shown in gray boxes.
R comments are italicized.
R Output is shown in White boxes.
Today’s lecture get’s into some R coding details.
Here is a link to a PDF summary of some common R coding syntax: R Basics PDF
Question: How can we add multiple stocks to the same plot?
Answer (and Preview): We will cover this in Weeks 5 and 6 using a single interactive dygraph.
For now, here is a simple preview using hchart function.
First we load the packages and download multiple data sets:
# packages need to be installed if they weren't already
library(quantmod)
library(highcharter)
library(magrittr)
getSymbols("SBUX", from="2020-1-1") # Starbucks data starting Jan. 1, 2020
getSymbols("GRUB", from="2020-1-1") # GrubHub data starting Jan. 1, 2020
getSymbols("DASH", from="2020-1-1") # DoorDash data starting Jan. 1, 2020
Then we build a plot and add to it using pipes from the magrittr package and new command, hc_add_series.
Note: This is not required at this time in this course.
hchart(SBUX, name="StarBucks", color="brown") %>%
hc_add_series(GRUB, name="GrubHub" , color="red") %>%
hc_add_series(DASH, name="DoorDash" , color="blue")
Before we begin, download the following PDFs from the Course Wall or Files.
Save both files to a specific location on your computer.
For example, I created a ‘Week 3’ folder and saved these files there:
The Dunkin data set which is an Excel file:
A PDF lookup table for formatting dates to be readable by R:
There are many ways to do this. I like this option to keep projects organized:
Open a new script file in R
File > New File > R Script
Save that script file to the SAME folder where the downloaded files were saved.
File > Save as > then navigate to folder
Set working directory to be that folder
Session > Set Working Directory > To Source File Location
or
Session > Set Working Directory > Choose Directory > then navigate to folder
Resulting R code from this process is an R statement using the setwd(…) function:
# This is from my computer.
# Your setwd output will show where you set your working directory on your computer
setwd("C:/Users/pspoo/Dropbox/0 - Spring 2021/FIN 654/Week 3")
If you want to check what your working directory you can use the getwd function:
getwd()
## [1] "C:/Users/pspoo/Dropbox/0 - Spring 2021/FIN 654/Week 3"
Convert Excel file to .csv format.
(.csv stands for comma separated values)
Use read.csv function in R
donut <- read.csv(file = "DUNKIN.csv")
DONUT <- read.csv(file.choose())
We use the head and tail functions to examine the data.
We can ALSO examine data in the ‘Global Environment’.
head(donut)
## DAY OPEN HIGH LOW CLOSE VOLUME ADJUSTED
## 1 2011 Jul,27 25.00 29.62 24.97 27.85 45399000 23.1321
## 2 2011 Jul,28 29.21 30.49 27.10 28.39 7421800 23.5806
## 3 2011 Jul,29 28.99 29.20 28.01 28.93 2200200 24.0292
## 4 2011 Aug,01 31.37 31.94 28.30 29.66 3147700 24.6355
## 5 2011 Aug,02 29.86 29.88 27.44 27.76 2483500 23.0574
## 6 2011 Aug,03 27.50 27.97 25.75 26.59 2865600 22.0856
tail(donut)
## DAY OPEN HIGH LOW CLOSE VOLUME ADJUSTED
## 2358 2020 Dec,07 106.38 106.43 106.35 106.42 813000 106.42
## 2359 2020 Dec,08 106.40 106.43 106.38 106.41 633900 106.41
## 2360 2020 Dec,09 106.40 106.46 106.39 106.45 1067200 106.45
## 2361 2020 Dec,10 106.42 106.47 106.42 106.45 579600 106.45
## 2362 2020 Dec,11 106.43 106.45 106.42 106.42 1080500 106.42
## 2363 2020 Dec,14 106.43 106.50 106.43 106.48 1826500 106.48
Let’s try to create create a candlestick plot using chartSeries function.
The chartSeries function is used to plot time series data.
We see that it won’t work with our data in it’s current format:
# install command not required in every session
# install.packages("quantmod")
library(quantmod)
chartSeries(donut, type="candlesticks")
## Error in try.xts(x, error = "chartSeries requires an xtsible object"): chartSeries requires an xtsible object
We can use the str command to examine the structure of the data
We can ALSO examine the structure in the ‘Global Environment’.
# examine structure of data using str command
str(donut)
## 'data.frame': 2363 obs. of 7 variables:
## $ DAY : chr "2011 Jul,27" "2011 Jul,28" "2011 Jul,29" "2011 Aug,01" ...
## $ OPEN : num 25 29.2 29 31.4 29.9 ...
## $ HIGH : num 29.6 30.5 29.2 31.9 29.9 ...
## $ LOW : num 25 27.1 28 28.3 27.4 ...
## $ CLOSE : num 27.9 28.4 28.9 29.7 27.8 ...
## $ VOLUME : int 45399000 7421800 2200200 3147700 2483500 2865600 2356500 1840400 1644300 1769200 ...
## $ ADJUSTED: num 23.1 23.6 24 24.6 23.1 ...
The donut dataset is a generic dataframe.
It is not an .xts (extensible time series).
xts files are self-aware time series data that are self-aware.
We will change our generic data to .xts format.
Run tail command so data appears on console
Use as.Date function
First input in as.Date function in the variable to be converted.
Second input is the CURRENT format of the date and time information.
The provided pdf, WeeK 3 Date Format Lookup provides information.
In the code example below, we make a mistake which overwrites our ‘Day’ variable.
To fix this, we reimport the data.
# incorrect command
# % is missing before d
# overwrites the 'DAY' variable with missing values NAs
donut$DAY <- as.Date(donut$DAY, format="%Y %b,d")
tail(donut)
## DAY OPEN HIGH LOW CLOSE VOLUME ADJUSTED
## 2358 <NA> 106.38 106.43 106.35 106.42 813000 106.42
## 2359 <NA> 106.40 106.43 106.38 106.41 633900 106.41
## 2360 <NA> 106.40 106.46 106.39 106.45 1067200 106.45
## 2361 <NA> 106.42 106.47 106.42 106.45 579600 106.45
## 2362 <NA> 106.43 106.45 106.42 106.42 1080500 106.42
## 2363 <NA> 106.43 106.50 106.43 106.48 1826500 106.48
# DAY values were overwritten and can't be recovered.
# reimport data and correct the command:
donut <- read.csv(file = "DUNKIN.csv")
donut$DAY <- as.Date(donut$DAY, format="%Y %b,%d")
tail(donut)
## DAY OPEN HIGH LOW CLOSE VOLUME ADJUSTED
## 2358 2020-12-07 106.38 106.43 106.35 106.42 813000 106.42
## 2359 2020-12-08 106.40 106.43 106.38 106.41 633900 106.41
## 2360 2020-12-09 106.40 106.46 106.39 106.45 1067200 106.45
## 2361 2020-12-10 106.42 106.47 106.42 106.45 579600 106.45
## 2362 2020-12-11 106.43 106.45 106.42 106.42 1080500 106.42
## 2363 2020-12-14 106.43 106.50 106.43 106.48 1826500 106.48
# rerun str command
str(donut)
## 'data.frame': 2363 obs. of 7 variables:
## $ DAY : Date, format: "2011-07-27" "2011-07-28" ...
## $ OPEN : num 25 29.2 29 31.4 29.9 ...
## $ HIGH : num 29.6 30.5 29.2 31.9 29.9 ...
## $ LOW : num 25 27.1 28 28.3 27.4 ...
## $ CLOSE : num 27.9 28.4 28.9 29.7 27.8 ...
## $ VOLUME : int 45399000 7421800 2200200 3147700 2483500 2865600 2356500 1840400 1644300 1769200 ...
## $ ADJUSTED: num 23.1 23.6 24 24.6 23.1 ...
Note that if we try to create our time series chart it STILL won’t work:
chartSeries(donut, type="candlesticks")
## Error in try.xts(x, error = "chartSeries requires an xtsible object"): chartSeries requires an xtsible object
We have converted the DAY variable to to Date format BUT the dataset is still a generic data frame.
In order for this plot function to work, we need to convert our dataset to xts format
Print a subset of the data to the console for reference.
Use the xts function.
First input is core data, all variables EXCEPT Date variable (DAY)
We can include or exclude ANY part of a data frame or matrix using subset brackets [,]
Entries in [,] are rows and columns to be included or excluded in subset [rows,columns]
no value BEFORE comma in [,] indicates include all rows
no value AFTER comma in [,] indicates include all columns
-1 after comma in [,] indicates include all rows and all columns except first one [,-1]
1 after comma in [,] indicates include all rows and first column only [,1]
Second input is the order.by variable, our Time variable, which is Column 1 in our dataset (our Date variable, DAY.
Here are two ways to create this xts object and save it as GLAZED:
# First way specifies column 1 in the dataset as the Time variable
GLAZED <- xts(donut[,-1], order.by = donut[,1])
str(GLAZED)
## An 'xts' object on 2011-07-27/2020-12-14 containing:
## Data: num [1:2363, 1:6] 25 29.2 29 31.4 29.9 ...
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : chr [1:6] "OPEN" "HIGH" "LOW" "CLOSE" ...
## Indexed by objects of class: [Date] TZ: UTC
## xts Attributes:
## NULL
# Second way specifies the column by it's name within the data object and the $
GLAZED <- xts(donut[,-1], order.by = donut$DAY)
str(GLAZED)
## An 'xts' object on 2011-07-27/2020-12-14 containing:
## Data: num [1:2363, 1:6] 25 29.2 29 31.4 29.9 ...
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : chr [1:6] "OPEN" "HIGH" "LOW" "CLOSE" ...
## Indexed by objects of class: [Date] TZ: UTC
## xts Attributes:
## NULL
head(GLAZED)
## OPEN HIGH LOW CLOSE VOLUME ADJUSTED
## 2011-07-27 25.00 29.62 24.97 27.85 45399000 23.1321
## 2011-07-28 29.21 30.49 27.10 28.39 7421800 23.5806
## 2011-07-29 28.99 29.20 28.01 28.93 2200200 24.0292
## 2011-08-01 31.37 31.94 28.30 29.66 3147700 24.6355
## 2011-08-02 29.86 29.88 27.44 27.76 2483500 23.0574
## 2011-08-03 27.50 27.97 25.75 26.59 2865600 22.0856
NOTE that the Date variable, DAY, is no longer a variable. It has no column name.
In the GLAZED dataset, an xts object, the Date variable, DAY, is now the index or row ID for each observation.
We have seen xts objects already when we imported data from YaHoo Finance.
Here’s a reminder:
# If we exclude 'to' argument, the default is today
getSymbols("SBUX", from="2021-1-1")
## [1] "SBUX"
Now we can make the chart we planned (but it is static):
# static time series plot
chartSeries(GLAZED, type="candlesticks")
Alternatively, we can make an interactive plot using the highcharter package:
library(highcharter)
hchart(GLAZED, color="green")
Let’s examine this plot to determine why are some candlesticks colored and others aren’t?
So far we have downloaded data from YaHoo Finance.
We can also download data from other sources.
For example: the Federal Reserve of St. Louis.
We will download 6 month treasury bills from the Federal Reserve of St. Louis.
With these data, we will practice converting annualized returns to approx. monthly returns.
getSymbols("DGS6MO", src="FRED")
## [1] "DGS6MO"
head(DGS6MO)
## DGS6MO
## 1981-09-01 17.17
## 1981-09-02 17.32
## 1981-09-03 17.42
## 1981-09-04 17.37
## 1981-09-07 NA
## 1981-09-08 17.43
Each value is is divided by 100*12 = 1200
Note: If you don’t divide by 100, answer will be a percent.
# DGS6MO only has one variable so we don't have to specify a variable name
risk.free <- DGS6MO / 1200
head(risk.free)
## DGS6MO
## 1981-09-01 0.01430833
## 1981-09-02 0.01443333
## 1981-09-03 0.01451667
## 1981-09-04 0.01447500
## 1981-09-07 NA
## 1981-09-08 0.01452500
# or in percent format:
risk.free.pct <- DGS6MO/12
# examine both decimal and percent versions side by side
# same values in decimal and percent format
head(cbind(risk.free, risk.free.pct))
## DGS6MO DGS6MO.1
## 1981-09-01 0.01430833 1.430833
## 1981-09-02 0.01443333 1.443333
## 1981-09-03 0.01451667 1.451667
## 1981-09-04 0.01447500 1.447500
## 1981-09-07 NA NA
## 1981-09-08 0.01452500 1.452500
A few final comments:
Kivanc Avrenli’s questions about this (HW 1.6) ask for decimal proportions
cbind functions stands for column bind and can be used to place columns side by side
Only use cbind for variable columns that are of the same length (same number of obs.)