1 Dates

1.1 Introduction

1.1.1 What day is it?

R has a lot to offer in terms of dates and times. The two main classes of data for this are Date and POSIXct. Date is used for calendar date objects like “2015-01-22”. POSIXct is a way to represent datetime objects like “2015-01-22 08:39:40 EST”, meaning that it is 40 seconds after 8:39 AM Eastern Standard Time.

In practice, the best strategy is to use the simplest class that you need. Often, Date will be the simplest choice. We will use the Date class almost exclusively, but it is important to be aware of POSIXct as well for storing intraday financial data.

# What is the current date?
Sys.Date()
[1] "2020-06-27"
# What is the current date and time?
Sys.time()
[1] "2020-06-27 11:21:34 UTC"
# Create the variable today
today <- Sys.Date()


# Confirm the class of today
class(today)
[1] "Date"
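The POSIXct class described above can be created in much the same way as a Date. A minimal sketch, assuming the timestamp is in UTC; the tz value and the variable name open_time are illustrative choices, not part of the original notes:

```r
# Create a datetime; tz is given explicitly so the result does not depend on the machine's time zone
open_time <- as.POSIXct("2015-01-22 08:39:40", tz = "UTC")

# Confirm the class: POSIXct objects also carry the parent class POSIXt
class(open_time)

# Drop the time of day when only the calendar date is needed
as.Date(open_time)
```

If the time of day is not needed, converting to Date keeps the object as simple as possible, in line with the advice above.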

1.1.2 From char to date

We will often have to create dates ourselves from character strings. The as.Date() function is the best way to do this.

# The Great Crash of 1929
great_crash <- as.Date("1929-11-29")

great_crash
[1] "1929-11-29"
class(great_crash)
[1] "Date"

Notice that the date is given in the format “yyyy-mm-dd”. This is the ISO 8601 format (ISO = International Organization for Standardization), and it is the default format in which R accepts and displays dates.

Internally, dates are stored as the number of days since January 1, 1970, and datetimes are stored as the number of seconds since then.

# Create crash
crash <- as.Date("2008-09-29")

# Print crash
crash
[1] "2008-09-29"
# crash as a numeric
as.numeric(crash)
[1] 14151
# Current time as a numeric
as.numeric(Sys.time())
[1] 1593256894
# Incorrect date format: not ISO, so as.Date() cannot parse it without a format
#as.Date("09/29/2008")

1.1.3 Many dates

Creating a single date is nice to know how to do, but with financial data we will often have a large number of dates to work with. When this is the case, we will need to convert multiple dates from character to date format.

# Create a vector of daily character dates
dates <- c("2017-01-01", "2017-01-02",
           "2017-01-03", "2017-01-04") 

as.Date(dates)
[1] "2017-01-01" "2017-01-02" "2017-01-03" "2017-01-04"
# Create dates from "2017-02-05" to "2017-02-08" inclusive.
dates <- c("2017-02-05", "2017-02-06", "2017-02-07", "2017-02-08" )

# Add names to dates
names(dates)<- c("Sunday", "Monday", "Tuesday", "Wednesday")

# Subset dates to only return the date for Monday
dates["Monday"]
      Monday 
"2017-02-06" 

1.2 Date formats and extractor functions

1.2.1 Date formats

as.Date("09/28/2008", format = "%m/%d/%Y")
[1] "2008-09-28"

The basic idea is that you are defining a character vector telling R that your date is in the form of mm/dd/yyyy. It then knows how to extract the components and switch to yyyy-mm-dd.

There are a number of different format codes you can specify. Here are a few of them:

  • %Y: 4-digit year (1982)
  • %y: 2-digit year (82)
  • %m: 2-digit month (01)
  • %d: 2-digit day of the month (13)
  • %A: weekday (Wednesday)
  • %a: abbreviated weekday (Wed)
  • %B: month (January)
  • %b: abbreviated month (Jan)

We will work with the date, “1930-08-30”, Warren Buffett’s birth date:

# "08,30,1930"
as.Date("08,30,1930", format = "%m, %d,%Y")
[1] "1930-08-30"
# "Aug 30,1930"
as.Date("Aug 30,1930", format = "%b %d, %Y")
[1] "1930-08-30"
# "30aug1930"
as.Date("30aug1930", format = "%d%b%Y")
[1] "1930-08-30"

We can convert objects that are already dates to differently formatted dates using format():

# A record point gain at the time: a +936 point change in the Dow!
best_date <- as.Date("2008-10-13")
best_date
[1] "2008-10-13"
format(best_date, format = "%Y/%m/%d")
[1] "2008/10/13"
format(best_date, format = "%B %d, %Y")
[1] "October 13, 2008"
# char_dates
char_dates <- c("1jan17", "2jan17", "3jan17", "4jan17", "5jan17")

# Create dates using as.Date() and the correct format 
dates <- as.Date(char_dates, format = "%d%b%y")

# Use format() to go from "2017-01-04" -> "Jan 04, 17"
format(dates, format = "%b %d, %y")
[1] "Jan 01, 17" "Jan 02, 17" "Jan 03, 17" "Jan 04, 17" "Jan 05, 17"
# Use format() to go from "2017-01-04" -> "01, 04, 2017"
format(dates, format = "%m, %d, %Y")
[1] "01, 01, 2017" "01, 02, 2017" "01, 03, 2017" "01, 04, 2017"
[5] "01, 05, 2017"

1.2.2 Subtraction of dates

Just like with numerics, arithmetic can be done on dates. In particular, we can find the difference between two dates, in days, by using subtraction:

today <- as.Date("2017-01-02")
tomorrow <- as.Date("2017-01-03")
one_year_away <- as.Date("2018-01-02")

tomorrow - today
Time difference of 1 days
one_year_away - today
Time difference of 365 days

Equivalently, we could use the difftime() function to find the time interval instead.

difftime(tomorrow, today)
Time difference of 1 days
# With some extra options!
difftime(tomorrow, today, units = "secs")
Time difference of 86400 secs
# Dates
dates <- as.Date(c("2017-01-01", "2017-01-02", "2017-01-03"))

# Create the origin
origin <- as.Date("1970-01-01")

# Use as.numeric() on dates
as.numeric(dates)
[1] 17167 17168 17169
# Find the difference between dates and origin
dates - origin
Time differences in days
[1] 17167 17168 17169
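Because a Date is stored as a day count, adding a number to it shifts it by that many days, and a day count can be turned back into a Date. A short sketch; the variable name start_date is illustrative:

```r
start_date <- as.Date("2017-01-02")

# Adding a number moves the date forward by that many days
start_date + 30

# seq() builds a regular sequence of dates
seq(start_date, by = "day", length.out = 3)

# Convert a day count back to a Date by supplying the origin
as.Date(17167, origin = "1970-01-01")
```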

1.2.3 months() and weekdays() and quarters()

There are a few functions that are useful for extracting date components. One of those is months().

my_date <- as.Date("2017-01-02")
months(my_date)
[1] "January"

Two other useful functions are weekdays() to extract the day of the week that your date falls on, and quarters() to determine which quarter of the year (Q1-Q4) that your date falls in.

# dates
dates <- as.Date(c("2017-01-02", "2017-05-03", "2017-08-04", "2017-10-17"))

# Extract the months
months(dates)
[1] "January" "May"     "August"  "October"
# Extract the quarters
quarters(dates)
[1] "Q1" "Q2" "Q3" "Q4"
# dates2
dates2 <- as.Date(c("2017-01-02", "2017-01-03", "2017-01-04", "2017-01-05"))

# Assign the weekdays() of dates2 as the names()
names(dates2) <- weekdays(dates2)

# Print dates2
dates2
      Monday      Tuesday    Wednesday     Thursday 
"2017-01-02" "2017-01-03" "2017-01-04" "2017-01-05" 

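For components such as the year or the day of the month, one common approach is to combine format() with a numeric conversion, reusing the format codes listed earlier. A small sketch:

```r
my_date <- as.Date("2017-08-04")

# Year and day of month as numbers, via the %Y and %d format codes
as.numeric(format(my_date, "%Y"))
as.numeric(format(my_date, "%d"))

# Quarter label, for comparison with the extractor functions above
quarters(my_date)
```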
1.3 Relational operators

1.3.1 Operators

The relational operators are:

  • > : Greater than
  • >=: Greater than or equal to
  • < : Less than
  • <=: Less than or equal to
  • ==: Equality
  • !=: Not equal

These relational operators let us make comparisons in our data. If the comparison is true, the relational operator returns TRUE; otherwise it returns FALSE.

apple <- 45.46
microsoft <- 67.88

apple <= microsoft
[1] TRUE
hello <- "Hello world"

# Case sensitive!
hello == "hello world"
[1] FALSE
# Dates - today and tomorrow
today <- as.Date(Sys.Date())
tomorrow <- as.Date(Sys.Date() + 1)

# Today vs Tomorrow
tomorrow < today
[1] FALSE

1.3.2 Vectorized operations

We can extend the concept of relational operators to vectors of any arbitrary length. Compare two vectors using > to get a logical vector back of the same length, holding TRUE when the first is greater than the second, and FALSE otherwise.

apple <- c(120.00, 120.08, 119.97, 121.88)
datacamp  <- c(118.5, 124.21, 125.20, 120.22)

apple > datacamp
[1]  TRUE FALSE FALSE  TRUE

Comparing a vector and a single number works as well.

apple > 120
[1] FALSE  TRUE FALSE  TRUE
stocks <- data.frame(list(date = structure(c(17186, 17189, 17190, 17191), class = "Date"), 
    ibm = c(170.55, 171.03, 175.9, 178.29), panera = c(216.65, 
    216.06, 213.55, 212.22)))
# Print stocks
stocks

# IBM range
stocks$ibm_buy <- stocks$ibm < 175

# Panera range
stocks$panera_sell <- stocks$panera > 213

# IBM vs Panera
stocks$ibm_vs_panera <- stocks$ibm > stocks$panera

# Print stocks
stocks

More complex logic can always be created for useful buy and sell signals.

1.4 Logical operators

1.4.1 And / Or

We might want to check multiple relational conditions at once. For multiple conditions, we need the And operator &, and the Or operator |.

  • & (And): An intersection. a & b is true only if both a and b are true.
  • | (Or): A union. a | b is true if either a or b is true.
apple <- c(120.00, 120.08, 119.97, 121.88)

# Both conditions must hold
(apple > 120) & (apple < 121)
[1] FALSE  TRUE FALSE FALSE
# Only one condition has to hold
(apple <= 120) | (apple > 121)
[1]  TRUE FALSE  TRUE  TRUE
# IBM buy range 
stocks$ibm_buy_range <- stocks$ibm > 171 & stocks$ibm < 176
# Panera spikes 
stocks$panera_spike <- stocks$panera < 213.20 | stocks$panera > 216.5
# Date range    
stocks$good_dates <- stocks$date > as.Date("2017-01-21") & stocks$date < as.Date("2017-01-25")
# Print stocks
stocks

1.4.2 Not!

One last operator to introduce is !, or Not. Add ! in front of a logical expression, and it will flip that expression from TRUE to FALSE (and vice versa).

!TRUE
[1] FALSE
apple <- c(120.00, 120.08, 119.97, 121.88)

!(apple < 121)
[1] FALSE FALSE FALSE  TRUE
# IBM range
!(stocks$ibm > 176)
[1]  TRUE  TRUE  TRUE FALSE
# Missing data
missing <- c(24.5, 25.7, NA, 28, 28.6, NA)

# Is missing?
is.na(missing)
[1] FALSE FALSE  TRUE FALSE FALSE  TRUE
# Not missing?
!is.na(missing)
[1]  TRUE  TRUE FALSE  TRUE  TRUE FALSE

This can help you remove NA’s from your data easily.
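A minimal sketch of that idea, using the same missing vector; the name clean is illustrative:

```r
missing <- c(24.5, 25.7, NA, 28, 28.6, NA)

# Keep only the non-missing values by indexing with the logical vector
clean <- missing[!is.na(missing)]
clean

# Many summary functions can also skip NA values directly
mean(missing, na.rm = TRUE)
```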

1.4.3 Logicals and subset()

We know how to create logical vectors that tell us when a certain condition is true, but can we subset a data frame to contain only the rows where that condition is true?

subset() takes as arguments a data frame (or vector/matrix) and a logical vector of which rows to return:

subset(stocks, ibm < 175)
# Panera range
subset(stocks, panera > 216)

# Specific date
subset(stocks, date == as.Date("2017-01-23"))

# IBM and Panera joint range
subset(stocks, ibm < 175 & panera < 216.5)
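For comparison, the same row filtering can be written with bracket indexing. The sketch below rebuilds the small stocks data frame so that it runs on its own; the names by_subset and by_bracket are illustrative:

```r
stocks <- data.frame(
  date   = as.Date(c("2017-01-20", "2017-01-23", "2017-01-24", "2017-01-25")),
  ibm    = c(170.55, 171.03, 175.9, 178.29),
  panera = c(216.65, 216.06, 213.55, 212.22)
)

# subset() evaluates the condition inside the data frame
by_subset <- subset(stocks, ibm < 175 & panera < 216.5)

# Bracket indexing needs the stocks$ prefix and a trailing comma to select rows
by_bracket <- stocks[stocks$ibm < 175 & stocks$panera < 216.5, ]

# Both approaches select the same rows
all.equal(by_subset, by_bracket)
```

subset() is convenient for interactive work; bracket indexing is more explicit about where each column comes from.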

1.4.4 All together

stocks <- data.frame(list(date = structure(c(17136, 17137, 17138, 17139, 
17140, 17141, 17142, 17143, 17144, 17145, 17146, 17147, 17148, 
17149, 17150, 17151, 17152, 17153, 17154, 17155, 17156, 17157, 
17158, 17159, 17160, 17162, 17163, 17164, 17165), class = "Date"), 
    apple = c(109.49, 109.9, NA, NA, 109.11, 109.95, 111.03, 
    112.12, 113.95, NA, NA, 113.3, 115.19, 115.19, 115.82, 115.97, 
    NA, NA, 116.64, 116.95, 117.06, 116.29, 116.52, NA, NA, 117.26, 
    116.76, 116.73, 115.82), micr = c(59.2, 59.25, NA, NA, 60.22, 
    59.95, 61.37, 61.01, 61.97, NA, NA, 62.17, 62.98, 62.68, 
    62.58, 62.3, NA, NA, 63.62, 63.54, 63.54, 63.55, 63.24, NA, 
    NA, 63.28, 62.99, 62.9, 62.14)))
# View stocks
stocks

# Weekday investigation
stocks$weekday <- weekdays(stocks$date)

# View stocks again
stocks

# Remove missing data
stocks_no_NA <- subset(stocks, !is.na(apple))

# Apple and Microsoft joint range
subset(stocks_no_NA, apple > 117 | micr > 63)

1.5 If statements

1.5.1 If this

apple <- 54.3

if(apple < 70) {
    print("Apple is less than 70")
}
[1] "Apple is less than 70"
# micr
micr <- 48.55

# Fill in the blanks
if( micr < 55 ) {
    print("Buy!")
}
[1] "Buy!"

1.5.2 If this, Else that

# micr
micr <- 57.44

# Fill in the blanks
if( micr < 55 ) {
    print("Buy!")
} else {
    print("Do nothing!")
}
[1] "Do nothing!"

1.5.3 If this, Else If that, Else that other thing

# micr
micr <- 105.67

# Fill in the blanks
if( micr < 55 ) {
    print("Buy!")
} else if( micr >= 55 & micr < 75 ){
    print("Do nothing!")
} else { 
    print("Sell!")
}
[1] "Sell!"

1.5.4 If inside an If

Sometimes it makes sense to have nested if statements to add even more control.

# micr
micr <- 105.67
shares <- 1

# Fill in the blanks
if( micr < 55 ) {
    print("Buy!")
} else if( micr >= 55 & micr < 75 ) {
    print("Do nothing!")
} else { 
    if( shares >=1 ) {
        print("Sell!")
    } else {
        print("Not enough shares to sell!")
    }
}
[1] "Sell!"
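Since an if condition must be a single TRUE or FALSE, the scalar operators && and || are often preferred over & and | inside if(). A sketch of the same decision logic wrapped in a hypothetical helper, decide(), which is not part of the original notes:

```r
# decide() is an illustrative helper name
decide <- function(price, shares) {
  if (price < 55) {
    "Buy!"
  } else if (price >= 55 && price < 75) {
    "Do nothing!"
  } else if (shares >= 1) {
    "Sell!"
  } else {
    "Not enough shares to sell!"
  }
}

decide(105.67, shares = 1)
decide(105.67, shares = 0)
decide(60, shares = 1)
```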

1.5.5 ifelse()

A powerful function to know about is ifelse(). It creates an if statement in 1 line of code, and more than that, it works on entire vectors!

apple <- c(109.49, 109.90, 109.11, 109.95, 111.03, 112.12)

ifelse(test = apple > 110, yes = "Buy!", no = "Do nothing!")
[1] "Do nothing!" "Do nothing!" "Do nothing!" "Do nothing!" "Buy!"       
[6] "Buy!"       
# Microsoft test
stocks$micr_buy <- ifelse(test = stocks$micr > 60 & stocks$micr < 62, yes = 1, no = 0)

# Apple test
stocks$apple_date <- ifelse(test = stocks$apple > 117, yes = stocks$date, no = NA)

# Print stocks
stocks

# Change the class() of apple_date.
class(stocks$apple_date) <- "Date"

# Print stocks again
stocks
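The class() fix above is needed because ifelse() returns the underlying day counts and drops the Date class. A small sketch on standalone vectors; the names dates, prices, flagged_num, and flagged_date are illustrative:

```r
dates  <- as.Date(c("2016-12-21", "2016-12-22", "2016-12-27"))
prices <- c(117.06, 116.29, 117.26)

# ifelse() keeps the day counts but not the Date class
flagged_num <- ifelse(prices > 117, dates, NA)
class(flagged_num)

# Restore the class by converting with the origin
flagged_date <- as.Date(flagged_num, origin = "1970-01-01")
flagged_date
```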

2 Loops

2.1 Repeat loops

2.1.1 Repeat, repeat, repeat

Loops are a core concept in programming. They are used in almost every language. In R, there is another way of performing repeated actions using apply functions.

We use repeat, and inside the curly braces perform some action. We must also specify when to break out of the loop. Otherwise it runs forever!

repeat {
    code
    if(condition) {
        break
    }
}
# Stock price
stock_price <- 126.34

repeat {
  # New stock price
  stock_price <- stock_price * runif(1, .985, 1.01)
  print(stock_price)
  
  # Check
  if(stock_price < 125) {
    print("Stock price is below 125! Buy it while it's cheap!")
    break
  }
}
[1] 125.8773
[1] 125.9764
[1] 124.8531
[1] "Stock price is below 125! Buy it while it's cheap!"

2.1.2 When to break?

The order in which we run the code inside the loop and check whether to break is important. The following two loops would run the code a different number of times.

# Code, then check condition
repeat {
    code
    if(condition) {
        break
    }
}

# Check condition, then code
repeat {
    if(condition) {
        break
    }
    code
}
# Stock price
stock_price <- 67.55

repeat {
  # New stock price
  stock_price <- stock_price * .995
  print(stock_price)
  
  # Check
  if(stock_price < 66) {
    print("Stock price is below 66! Buy it while it's cheap!")
    break
  }
  
}
[1] 67.21225
[1] 66.87619
[1] 66.54181
[1] 66.2091
[1] 65.87805
[1] "Stock price is below 66! Buy it while it's cheap!"
# Stock price
stock_price <- 67.55

repeat {
  # New stock price
  stock_price <- stock_price * .995
 
  # Check
  if(stock_price < 66) {
    print("Stock price is below 66! Buy it while it's cheap!")
    break
  }
  print(stock_price)
}
[1] 67.21225
[1] 66.87619
[1] 66.54181
[1] 66.2091
[1] "Stock price is below 66! Buy it while it's cheap!"

2.2 While loops

2.2.1 While with a print

While loops are slightly different from repeat loops. Like if statements, we specify the condition for them to run at the very beginning. There is no need for a break statement because the condition is checked at each iteration.

while (condition) {
    code
}

It might seem like the while loop is doing the exact same thing as the repeat loop, just with less code. In our cases, this is true. So, why ever use the repeat loop? Occasionally, a repeat loop that is meant to run indefinitely is exactly what is wanted.

# Initial debt
debt <- 5000

# While loop to pay off your debt
while (debt > 0) {
  debt <- debt - 500
  print(paste("Debt remaining", debt))
}
[1] "Debt remaining 4500"
[1] "Debt remaining 4000"
[1] "Debt remaining 3500"
[1] "Debt remaining 3000"
[1] "Debt remaining 2500"
[1] "Debt remaining 2000"
[1] "Debt remaining 1500"
[1] "Debt remaining 1000"
[1] "Debt remaining 500"
[1] "Debt remaining 0"
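For comparison, the same payoff loop can be written with repeat and an explicit break. This is a sketch; the counter payments is an added, illustrative variable:

```r
debt <- 5000
payments <- 0

repeat {
  # Stop before paying if nothing is owed
  if (debt <= 0) {
    break
  }
  debt <- debt - 500
  payments <- payments + 1
}

payments
debt
```

The while version states the stopping rule once, at the top, which is why it is usually the shorter of the two.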

2.2.2 While with a plot

This example uses a loop to model paying off a debt, $500 at a time. At each iteration, the remaining debt is also added to a plot, so that you can visualize the total decreasing as you go.

First, initialize some variables:

  • debt = Your current debt
  • i = Incremented each time debt is reduced. The next point on the x axis.
  • x_axis = A vector of i’s. The x axis for the plots.
  • y_axis = A vector of debt. The y axis for the plots.
  • Also, create the first plot. Just a single point of your current debt.
debt <- 5000    # initial debt
i <- 0          # x axis counter
x_axis <- i     # x axis
y_axis <- debt  # y axis

# Initial plot
plot(x_axis, y_axis, xlim = c(0,10), ylim = c(0,5000))

Then, create a while loop. As long as you still have debt:

  • debt is reduced by 500.
  • i is incremented.
  • x_axis is extended by 1 more point.
  • y_axis is extended by the next debt point.
  • The next plot is created from the updated data.
# Graph your debt
while (debt > 0) {

  # Updating variables
  debt <- debt - 500
  i <- i + 1
  x_axis <- c(x_axis, i)
  y_axis <- c(y_axis, debt)
  
  # Next plot
  plot(x_axis, y_axis, xlim = c(0,10), ylim = c(0,5000))
}

2.2.3 Break it

Sometimes, we have to end a while loop early. With the debt example, if we don’t have enough cash to pay off all of the debt, we won’t be able to continue paying it down.

while (condition) {
    code
    if (breaking_condition) {
        break
    }
}

If the breaking_condition is met, the while loop stops completely and execution continues with the lines after it.

# debt and cash
debt <- 5000
cash <- 4000

# Pay off your debt...if you can!
while (debt > 0) {
  debt <- debt - 500
  cash <- cash - 500
  print(paste("Debt remaining:", debt, "and Cash remaining:", cash))

  if (cash == 0) {
    print("You ran out of cash!")
    break
  }
}
[1] "Debt remaining: 4500 and Cash remaining: 3500"
[1] "Debt remaining: 4000 and Cash remaining: 3000"
[1] "Debt remaining: 3500 and Cash remaining: 2500"
[1] "Debt remaining: 3000 and Cash remaining: 2000"
[1] "Debt remaining: 2500 and Cash remaining: 1500"
[1] "Debt remaining: 2000 and Cash remaining: 1000"
[1] "Debt remaining: 1500 and Cash remaining: 500"
[1] "Debt remaining: 1000 and Cash remaining: 0"
[1] "You ran out of cash!"

2.3 For loops

2.3.1 Loop over a vector

When you know how many times you want to repeat an action, a for loop is a good option. The idea of the for loop is that you are stepping through a sequence, one at a time, and performing an action at each step along the way.

That sequence is commonly a vector of numbers (such as the sequence from 1:10), but could also be numbers that are not in any order like c(2, 5, 4, 6), or even a sequence of characters!

for (value in sequence) {
    code
}
# Sequence
seq <- c(1:10)

# Print loop
for (value in seq) {
    print(value)
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
# A sum variable
sum <- 0

# Sum loop
for (value in seq) {
    sum <- sum + value
    print(sum)
}
[1] 1
[1] 3
[1] 6
[1] 10
[1] 15
[1] 21
[1] 28
[1] 36
[1] 45
[1] 55

2.3.2 Loop over data frame rows

Imagine that you are interested in the days where the stock price of Apple rises above 117. If it goes above this value, you want to print out the current date and stock price. If you have a stock data frame with a date and apple price column, you can loop over the rows of the data frame to accomplish this.

Before you do so, note that you can get the number of rows in your data frame using nrow(stock). Then, you can create a sequence to loop over from 1:nrow(stock).

stock <- data.frame(list(date = structure(c(17136, 17137, 17140, 17141,
17142, 17143, 17144, 17147, 17148, 17149, 17150, 17151, 17154, 
17155, 17156, 17157, 17158, 17162, 17163, 17164, 17165), class = "Date"), 
    apple = c(109.49, 109.9, 109.11, 109.95, 111.03, 112.12, 
    113.95, 113.3, 115.19, 115.19, 115.82, 115.97, 116.64, 116.95, 
    117.06, 116.29, 116.52, 117.26, 116.76, 116.73, 115.82)))
for (row in 1:nrow(stock)) {
    price <- stock[row, "apple"]
    date  <- stock[row, "date"]

    if(price > 117) {
        print(paste("On", date, 
                    "the stock price was", price))
    }
}
[1] "On 2016-12-21 the stock price was 117.06"
[1] "On 2016-12-27 the stock price was 117.26"
# Loop over stock rows
for (row in 1:nrow(stock)) {
    price <- stock[row, "apple"]
    date  <- stock[row, "date"]

    if(price > 116) {
        print(paste("On", date, 
                    "the stock price was", price))
    } else {
        print(paste("The date:", date, 
                    "is not an important day!"))
    }
}
[1] "The date: 2016-12-01 is not an important day!"
[1] "The date: 2016-12-02 is not an important day!"
[1] "The date: 2016-12-05 is not an important day!"
[1] "The date: 2016-12-06 is not an important day!"
[1] "The date: 2016-12-07 is not an important day!"
[1] "The date: 2016-12-08 is not an important day!"
[1] "The date: 2016-12-09 is not an important day!"
[1] "The date: 2016-12-12 is not an important day!"
[1] "The date: 2016-12-13 is not an important day!"
[1] "The date: 2016-12-14 is not an important day!"
[1] "The date: 2016-12-15 is not an important day!"
[1] "The date: 2016-12-16 is not an important day!"
[1] "On 2016-12-19 the stock price was 116.64"
[1] "On 2016-12-20 the stock price was 116.95"
[1] "On 2016-12-21 the stock price was 117.06"
[1] "On 2016-12-22 the stock price was 116.29"
[1] "On 2016-12-23 the stock price was 116.52"
[1] "On 2016-12-27 the stock price was 117.26"
[1] "On 2016-12-28 the stock price was 116.76"
[1] "On 2016-12-29 the stock price was 116.73"
[1] "The date: 2016-12-30 is not an important day!"
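Two refinements are worth knowing. seq_len(nrow(stock)) is a safer sequence than 1:nrow(stock), because the latter yields c(1, 0) for a data frame with no rows. And for a simple filter like this, logical indexing avoids the loop altogether. A sketch on a shortened, three-row version of the data; the name above_117 is illustrative:

```r
stock <- data.frame(
  date  = as.Date(c("2016-12-20", "2016-12-21", "2016-12-27")),
  apple = c(116.95, 117.06, 117.26)
)

# seq_len() produces an empty sequence when there are no rows
for (row in seq_len(nrow(stock))) {
  if (stock[row, "apple"] > 117) {
    print(paste("On", stock[row, "date"],
                "the stock price was", stock[row, "apple"]))
  }
}

# Vectorized alternative: keep the rows where the condition holds
above_117 <- stock[stock$apple > 117, ]
above_117
```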

2.3.3 Loop over matrix elements

We have been looping over 1 dimensional data types. If we want to loop over elements in a matrix (columns and rows), then we will have to use nested loops. We will use this idea to print out the correlations between three stocks.

corr <- rbind(structure(c(1, 0.96, 0.88, 0.96, 1, 0.74, 0.88, 0.74, 1), .Dim = c(3L, 
3L), .Dimnames = list(c("apple", "ibm", "micr"), c("apple", "ibm", 
"micr"))))
# Print out corr
corr
      apple  ibm micr
apple  1.00 0.96 0.88
ibm    0.96 1.00 0.74
micr   0.88 0.74 1.00
# Create a nested loop
for(row in 1:nrow(corr)) {
    for(col in 1:ncol(corr)) {
        print(paste(colnames(corr)[col], "and", rownames(corr)[row], 
                    "have a correlation of", corr[row,col]))
    }
}
[1] "apple and apple have a correlation of 1"
[1] "ibm and apple have a correlation of 0.96"
[1] "micr and apple have a correlation of 0.88"
[1] "apple and ibm have a correlation of 0.96"
[1] "ibm and ibm have a correlation of 1"
[1] "micr and ibm have a correlation of 0.74"
[1] "apple and micr have a correlation of 0.88"
[1] "ibm and micr have a correlation of 0.74"
[1] "micr and micr have a correlation of 1"

2.3.4 Break and next

Let’s return to the concept of break, and the related concept of next. Just like with repeat and while loops, you can break out of a for loop completely by using the break statement. Additionally, if you just want to skip the current iteration, and continue the loop, you can use the next statement. This can be useful if our loop encounters an error, but we don’t want it to break everything.

for (value in sequence) {
    if(next_condition) {
        next
    }
    code
    if(breaking_condition) {
        break
    }
}
# Print apple
apple
[1] 109.49 109.90 109.11 109.95 111.03 112.12
# Loop through apple. Next if NA. Break if above 117.
for (value in apple) {
    if(is.na(value)) {
        print("Skipping NA")
        next
    }
    
    if(value > 117) {
        print("Time to sell!")
        break
    } else {
        print("Nothing to do here!")
    }
}
[1] "Nothing to do here!"
[1] "Nothing to do here!"
[1] "Nothing to do here!"
[1] "Nothing to do here!"
[1] "Nothing to do here!"
[1] "Nothing to do here!"
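In the run above neither branch fires, since this apple vector has no NA and never exceeds 117. A sketch with a made-up vector, prices, that triggers both next and break; the messages are collected in an illustrative vector, messages, instead of being printed:

```r
# Illustrative prices: one missing value and one value above 117
prices <- c(116.52, NA, 117.26, 116.76)
messages <- character(0)

for (value in prices) {
  if (is.na(value)) {
    messages <- c(messages, "Skipping NA")
    next
  }
  if (value > 117) {
    messages <- c(messages, "Time to sell!")
    break
  }
  messages <- c(messages, "Nothing to do here!")
}

messages
```

The last price, 116.76, is never reached because break ends the loop at 117.26.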

3 Functions

3.1 What are functions?

3.1.1 Function help and documentation

?names

?names()

help(names)

3.1.2 Optional arguments

Let’s look at some of the round() function’s help documentation. It simply rounds a numeric vector off to a specified number of decimal places.

round(x, digits = 0)

The first argument, x is required. Without it, the function will not work!

The argument digits is known as an optional argument. Optional arguments are ones that don’t have to be set by the user, either because they are given a default value, or because the function can infer them from the other data you have given it. Even though they don’t have to be set, they often provide extra flexibility. Here, digits specifies the number of decimal places to round to.

# Round 5.4
round(5.4)
[1] 5
# Round 5.4 with 1 decimal place
round(5.4, 1)
[1] 5.4
# numbers
numbers <- c(.002623, pi, 812.33345)

# Round numbers to 3 decimal places
round(numbers, 3)
[1]   0.003   3.142 812.333
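Optional arguments can also be supplied by name, and digits accepts negative values. A brief sketch:

```r
# Arguments can be matched by name, in any order
round(digits = 1, x = 812.33345)

# Negative digits round to the left of the decimal point
round(812.33345, digits = -2)
```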

3.1.3 Functions in functions

To write clean code, sometimes it is useful to use functions inside of other functions. This lets us use the result of one function directly in another one, without having to create an intermediate variable.

company <- c("Goldman Sachs", "J.P. Morgan", "Fidelity Investments")

for(i in 1:3) {
    print(paste("A large financial institution is", company[i]))
}
[1] "A large financial institution is Goldman Sachs"
[1] "A large financial institution is J.P. Morgan"
[1] "A large financial institution is Fidelity Investments"
apple <- c(109.49, 109.9, 109.11, 109.95, 111.03, 112.12, 113.95, 113.3, 
115.19, 115.19, 115.82, 115.97, 116.64, 116.95, 117.06, 116.29, 
116.52, 117.26, 116.76, 116.73, 115.82)
ibm <- c(159.82, 160.02, 159.84, 160.35, 164.79, 165.36, 166.52, 165.5, 
168.29, 168.51, 168.02, 166.73, 166.68, 167.6, 167.33, 167.06, 
166.71, 167.14, 166.19, 166.6, 165.99)
micr <- c(59.2, 59.25, 60.22, 59.95, 61.37, 61.01, 61.97, 62.17, 62.98, 
62.68, 62.58, 62.3, 63.62, 63.54, 63.54, 63.55, 63.24, 63.28, 
62.99, 62.9, 62.14)
# cbind() the stocks
stocks <- cbind(apple, ibm, micr)

# cor() to create the correlation matrix
cor(stocks)
          apple       ibm      micr
apple 1.0000000 0.8872467 0.9477010
ibm   0.8872467 1.0000000 0.9126597
micr  0.9477010 0.9126597 1.0000000
# All at once! Nest cbind() inside of cor()
cor(cbind(apple, ibm, micr))
          apple       ibm      micr
apple 1.0000000 0.8872467 0.9477010
ibm   0.8872467 1.0000000 0.9126597
micr  0.9477010 0.9126597 1.0000000

3.2 Writing functions

“Functions are a fundamental building block of R: to master many of the more advanced techniques … you need a solid foundation in how functions work.” -Hadley Wickham

func_name <- function(arguments) {
    body
}
square <- function(x) {
    x^2
}

square(2)
[1] 4

Arguments are user inputs that the function works on. They can be the data that the function manipulates, or options that affect the calculation. The body of the function is the code that actually performs the manipulation.


# Percent to decimal function
percent_to_decimal <- function(percent){
  return(percent/100)
}



# Use percent_to_decimal() on 6
percent_to_decimal(6)
[1] 0.06
# Example percentage
pct <- 8

# Use percent_to_decimal() on pct
percent_to_decimal(pct)
[1] 0.08

3.2.1 Multiple arguments

Functions can have multiple arguments. These can help extend the flexibility of your function.

pow <- function(x, power = 2) {
    x^power
}

pow(2)
[1] 4
pow(2, power = 3)
[1] 8

The power argument is optional and has a default value of 2, but the user can easily change this. It is also an example of how you can add multiple arguments. Notice how the arguments are separated by a comma, and the default value is set using an equals sign.

# Percent to decimal function
percent_to_decimal <- function(percent, digits = 2) {
    decimal <- percent / 100    
    round(decimal, digits)
}

# percents
percents <- c(25.88, 9.045, 6.23)

# percent_to_decimal() with default digits
percent_to_decimal(percents)
[1] 0.26 0.09 0.06
# percent_to_decimal() with digits = 4
percent_to_decimal(percents, 4)
[1] 0.2588 0.0905 0.0623

Present value: we want to discount money that we will receive in the future at a specific interest rate, to express the value of that money in today’s dollars.

present_value <- cash_flow * (1 + i / 100) ^ -year
# Present value function
pv <- function(cash_flow, i, year) {
    
    # Discount multiplier
    mult <- 1 + percent_to_decimal(i)
    
    # Present value calculation
    cash_flow  * mult ^ -year
}

# Calculate a present value
pv(1200, 7, 3)
[1] 979.5575

3.2.2 Function scope

Scoping is the process by which R looks up a variable’s value when given a name.
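A minimal sketch of what this means in practice, assuming the usual rule that R looks first inside the function and then in the environment where the function was defined. The names rate, add_interest, and interest are illustrative:

```r
rate <- 0.05

add_interest <- function(amount) {
  # interest exists only while the function runs
  interest <- amount * rate   # rate is not defined here, so R finds the global one
  amount + interest
}

add_interest(100)

# The local variable is not visible outside the function
exists("interest")
```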

3.3 Packages

3.3.1 tidyquant package

The tidyquant package is focused on retrieving, manipulating, and scaling financial data analysis in the easiest way possible.

# Load tidyquant
library(tidyquant)

# Pull Apple stock data
apple <- tq_get("AAPL", get = "stock.prices", 
                from = "2007-01-03", to = "2017-06-05")

# Take a look at what it returned
head(apple)

# Plot the stock price over time
plot(apple$date, apple$adjusted, type = "l")


# Calculate daily stock returns for the adjusted price
apple <- tq_mutate(data = apple,
                   select = adjusted,
                   mutate_fun = dailyReturn)
# Sort the returns from least to greatest
sorted_returns <- sort(apple$daily.returns)

# Plot them
plot(sorted_returns)

4 Apply

4.1 Why use apply?

4.1.1 lapply() on a list

The first function in the apply family that you will learn is lapply(), which is short for “list apply.” When you have a list, and you want to apply the same function to each element of the list, lapply() is a potential solution that always returns another list.

stock_return <- list(apple = c(0.374463421317025, -0.718835304822572, 0.769865273577127, 0.982264665757161, 0.981716653156807, 1.63217980734927, -0.570425625274248, 
1.66813768755516, 0, 0.546922475909363, 0.129511310654469, 0.577735621281367, 
0.265775034293555, 0.0940572894399311, -0.657782333845888, 0.197781408547588, 
0.635084105732929, -0.426402865427256, -0.0256937307297029, -0.779576801165091
), ibm = c(0.125140783381315, -0.112485939257597, 0.319069069069063, 
2.7689429373246, 0.345894775168409, 0.70149975810353, -0.612539034350234, 
1.6858006042296, 0.130726721730346, -0.290783929737096, -0.767765742173563, 
-0.0299886043303442, 0.55195584353251, -0.161097852028629, -0.161357795972037,
-0.209505566862202, 0.257932937436254, -0.568385784372376, 0.246705577952943, 
-0.366146458583424), micr = c(0.0844594594594547, 1.63713080168776, 
-0.448356027897702, 2.36864053377814, -0.586605833469121, 1.57351253892805, 
0.322736808132972, 1.30287920218754, -0.476341695776432, -0.159540523292919, 
-0.447427293064879, 2.11878009630819, -0.12574662055957, 0, 0.0157381177211174, 
-0.487804878048773, 0.0632511068943693, -0.4582806573957, -0.142879822194004, 
-1.20826709062003))
# lapply to change percents to decimal
lapply(stock_return, FUN = percent_to_decimal)
$apple
 [1]  0.00 -0.01  0.01  0.01  0.01  0.02 -0.01  0.02  0.00  0.01  0.00
[12]  0.01  0.00  0.00 -0.01  0.00  0.01  0.00  0.00 -0.01

$ibm
 [1]  0.00  0.00  0.00  0.03  0.00  0.01 -0.01  0.02  0.00  0.00 -0.01
[12]  0.00  0.01  0.00  0.00  0.00  0.00 -0.01  0.00  0.00

$micr
 [1]  0.00  0.02  0.00  0.02 -0.01  0.02  0.00  0.01  0.00  0.00  0.00
[12]  0.02  0.00  0.00  0.00  0.00  0.00  0.00  0.00 -0.01

4.1.2 lapply() on a data frame

stock_return <- data.frame(stock_return)
stock_return
stock_return <- data.frame(lapply(stock_return, FUN = function(x) x/100))
# lapply to get the average returns
lapply(stock_return, FUN = mean)
$apple
[1] 0.002838389

$ibm
[1] 0.001926806

$micr
[1] 0.002472939
# Sharpe ratio
sharpe <- function(returns) {
    (mean(returns) - .0003) / sd(returns)
}

# lapply to get the sharpe ratio
lapply(stock_return, FUN = sharpe)
$apple
[1] 0.3546496

$ibm
[1] 0.2000819

$micr
[1] 0.218519

4.1.3 FUN arguments

Often, the function that you want to apply will have other optional arguments that you may want to tweak. Consider the percent_to_decimal() function that allows the user to specify the number of decimal places.

percent_to_decimal(5.4, digits = 3)
[1] 0.054

In the call to lapply() you can specify the named optional arguments after the FUN argument, and they will get passed to the function that you are applying.

# Extend sharpe() to allow optional argument
sharpe <- function(returns, rf = 0.0003) {
    (mean(returns) - rf) / sd(returns)
}

# First lapply()
lapply(stock_return, FUN = sharpe, rf = 0.0004)
$apple
[1] 0.3406781

$ibm
[1] 0.1877828

$micr
[1] 0.2084626
# Second lapply()
lapply(stock_return, FUN = sharpe, rf = 0.0009)
$apple
[1] 0.2708209

$ibm
[1] 0.1262875

$micr
[1] 0.1581807
data.frame(stock_return)
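
As a small self-contained sketch of how these extra arguments are forwarded, the example below uses a made-up list of returns (toy_returns is an assumption for illustration, not the stock_return data above). Passing rf after FUN should give the same result as wrapping sharpe() in a function that sets rf itself.

```r
# Hypothetical toy data, used only for illustration
toy_returns <- list(a = c(0.01, 0.02, -0.01, 0.03),
                    b = c(0.00, 0.01, 0.02, -0.02))

# Same definition as sharpe() in the text
sharpe <- function(returns, rf = 0.0003) {
    (mean(returns) - rf) / sd(returns)
}

# rf is forwarded to sharpe() through the ... argument of lapply()
forwarded <- lapply(toy_returns, FUN = sharpe, rf = 0.001)

# Equivalent call that sets rf explicitly inside a wrapper function
wrapped <- lapply(toy_returns, FUN = function(r) sharpe(r, rf = 0.001))

# Both approaches give the same list
identical(forwarded, wrapped)
```

The forwarded form is shorter, while the wrapper form makes it explicit which function receives rf.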

4.2 sapply() - simplify it!

4.2.1 sapply() VS lapply()

lapply() is great, but sometimes you might want the returned data in a nicer form than a list. For instance, with the Sharpe ratio, wouldn’t it be great if the returned Sharpe ratios were in a vector rather than a list?

For this, you might want to consider sapply(), or simplify apply. It performs exactly like lapply(), but will attempt to simplify the output if it can. The basic syntax is the same, with a few additional arguments:

sapply(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)

These additional optional arguments let you specify whether sapply() should try to simplify the output, and whether it should use the names of the object in the output.

# lapply() on stock_return
lapply(stock_return, sharpe)
$apple
[1] 0.3546496

$ibm
[1] 0.2000819

$micr
[1] 0.218519
# sapply() on stock_return
sapply(stock_return, sharpe)
    apple       ibm      micr 
0.3546496 0.2000819 0.2185190 
# sapply() on stock_return with optional arguments
sapply(stock_return, sharpe, simplify = FALSE, USE.NAMES = FALSE)
$apple
[1] 0.3546496

$ibm
[1] 0.2000819

$micr
[1] 0.218519
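
In the call above the names survive even with USE.NAMES = FALSE, because stock_return already carries names. USE.NAMES has a visible effect when X is an unnamed character vector, as this small sketch suggests (tickers is a hypothetical vector made up for illustration).

```r
# Hypothetical character vector, for illustration only
tickers <- c("apple", "ibm", "micr")

# Default: the strings themselves become the names of the result
with_names <- sapply(tickers, nchar)

# USE.NAMES = FALSE: a plain unnamed vector
no_names <- sapply(tickers, nchar, USE.NAMES = FALSE)

# simplify = FALSE: a list is returned, as with lapply()
as_list <- sapply(tickers, nchar, simplify = FALSE)

names(with_names)
names(no_names)
class(as_list)
```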

4.2.2 Failing to simplify

For interactive use, sapply() is great. It guesses the output type so that it can simplify, and normally that is fine. However, sapply() is not a safe choice inside functions that you write. If sapply() cannot simplify your output, it falls back to returning a list, just like lapply(). This can be dangerous and break custom functions if you wrote them expecting sapply() to return a simplified vector.

# Market crash with as.Date()
market_crash <- list(dow_jones_drop = 777.68, 
                     date = as.Date("2008-09-29"))
                     
# Find the classes with sapply()
sapply(market_crash, class)
dow_jones_drop           date 
     "numeric"         "Date" 
# Market crash with as.POSIXct()
market_crash2 <- list(dow_jones_drop = 777.68, 
                      date = as.POSIXct("2008-09-29"))

# Find the classes with lapply()
lapply(market_crash2, class)
$dow_jones_drop
[1] "numeric"

$date
[1] "POSIXct" "POSIXt" 
# Find the classes with sapply()
sapply(market_crash2, class)
$dow_jones_drop
[1] "numeric"

$date
[1] "POSIXct" "POSIXt" 

See how sapply() returns a list like lapply() when it fails to simplify?
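
The same fallback happens whenever the applied function returns results of different lengths, not only with class(). As a hedged sketch, the toy data and the 1 percent threshold below are assumptions made up for illustration.

```r
# Hypothetical toy data, for illustration only
toy_returns <- list(a = c(0.005, 0.020, 0.030),
                    b = c(0.001, 0.015, 0.002))

# Keep only returns above 1 percent; the number kept differs per element,
# so sapply() cannot simplify and returns a list
big_moves <- sapply(toy_returns, function(x) x[x > 0.01])

# One value per element, so this simplifies to a named numeric vector
largest <- sapply(toy_returns, max)

is.list(big_moves)
is.list(largest)
```

Code that later treats big_moves as a vector would misbehave, even though sapply() raised no warning.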

4.3 vapply() - specify your output!

4.3.1 vapply() VS sapply()

In the last example, sapply() failed to simplify because class() returned two values (POSIXct and POSIXt) for the date element of market_crash2, but only one for dow_jones_drop. Notice, however, that no error was thrown! If a function you had written expected sapply() to return a simplified vector, the silent switch to a list could cause a confusing failure later on.

To account for this, there is a stricter apply function called vapply(), which takes an extra argument, FUN.VALUE, where you specify the type and length of the output that should be returned each time your applied function is called.

If you expected the return value of class() to be a character vector of length 1, you can specify that using vapply():

vapply(market_crash, class, FUN.VALUE = character(1))
dow_jones_drop           date 
     "numeric"         "Date" 

Other examples of FUN.VALUE might be numeric(2) or logical(1).
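
A minimal sketch of those two templates, using a made-up list (toy_returns is hypothetical and not part of the course data):

```r
# Hypothetical toy data, for illustration only
toy_returns <- list(a = c(0.01, -0.02, 0.03),
                    b = c(0.02, 0.01, 0.04))

# logical(1): exactly one TRUE/FALSE per element
any_loss <- vapply(toy_returns, function(x) any(x < 0),
                   FUN.VALUE = logical(1))

# numeric(2): exactly two numbers per element, returned as a 2-row matrix
ranges <- vapply(toy_returns, range, FUN.VALUE = numeric(2))

any_loss
dim(ranges)
```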

# Market crash with as.POSIXct()
market_crash2 <- list(dow_jones_drop = 777.68, 
                      date = as.POSIXct("2008-09-29"))

# Find the classes with sapply()
sapply(market_crash2, FUN = class)
$dow_jones_drop
[1] "numeric"

$date
[1] "POSIXct" "POSIXt" 
# Find the classes with vapply()
vapply(market_crash2, FUN = class, FUN.VALUE = character(1))
Error in vapply(market_crash2, FUN = class, FUN.VALUE = character(1)) : 
  values must be length 1,
 but FUN(X[[2]]) result is length 2

This error is much clearer than a silently returned list, since we expected a simplified vector.
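
Strictness also pays off with empty input. In the sketch below (no_stocks is a hypothetical empty list), sapply() has no results to inspect and returns an empty list, while vapply() still returns an empty vector of the declared type.

```r
# Hypothetical empty input, for illustration only
no_stocks <- list()

# sapply() has nothing to simplify, so it returns an empty list
s <- sapply(no_stocks, length)

# vapply() knows the intended type from FUN.VALUE
v <- vapply(no_stocks, length, FUN.VALUE = integer(1))

class(s)
class(v)
```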

4.3.2 More vapply()

The last example showed vapply() failing appropriately, but what about when it doesn’t fail? When there are no errors, vapply() returns a simplified result whose type and shape follow the FUN.VALUE argument.

# Sharpe ratio for all stocks
vapply(stock_return, sharpe, FUN.VALUE = numeric(1))
    apple       ibm      micr 
0.3546496 0.2000819 0.2185190 
# Summarize Apple (summary() returns six values)
summary(stock_return$apple)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
-0.007796 -0.001259  0.002318  0.002838  0.006688  0.016681 
# Summarize all stocks
vapply(stock_return, summary, FUN.VALUE = numeric(6))
               apple           ibm          micr
Min.    -0.007795768 -0.0076776574 -0.0120826709
1st Qu. -0.001258710 -0.0022982516 -0.0045083719
Median   0.002317782  0.0004757609 -0.0006287331
Mean     0.002838389  0.0019268062  0.0024729391
3rd Qu.  0.006687794  0.0032577550  0.0056777241
Max.     0.016681377  0.0276894294  0.0236864053

vapply() requires more thought when writing the function, but its robustness far outweighs that cost!

4.3.3 Anonymous functions

This section introduces a concept called anonymous functions. So far, when calling an apply function like vapply(), you have been passing named functions to FUN. Doesn’t it seem like a waste to have to create a function just for that specific vapply() call? Instead, you can use an anonymous function, which is defined inline, right where it is passed to FUN, without ever being given a name.

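# Note: stock_return was already converted to decimals in 4.1.2, so this
# call divides by 100 a second time; it only illustrates the syntax
vapply(stock_return, FUN = function(percent) { percent / 100 }, 
       FUN.VALUE = numeric(20))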
              apple           ibm          micr
 [1,]  3.744634e-05  1.251408e-05  8.445946e-06
 [2,] -7.188353e-05 -1.124859e-05  1.637131e-04
 [3,]  7.698653e-05  3.190691e-05 -4.483560e-05
 [4,]  9.822647e-05  2.768943e-04  2.368641e-04
 [5,]  9.817167e-05  3.458948e-05 -5.866058e-05
 [6,]  1.632180e-04  7.014998e-05  1.573513e-04
 [7,] -5.704256e-05 -6.125390e-05  3.227368e-05
 [8,]  1.668138e-04  1.685801e-04  1.302879e-04
 [9,]  0.000000e+00  1.307267e-05 -4.763417e-05
[10,]  5.469225e-05 -2.907839e-05 -1.595405e-05
[11,]  1.295113e-05 -7.677657e-05 -4.474273e-05
[12,]  5.777356e-05 -2.998860e-06  2.118780e-04
[13,]  2.657750e-05  5.519558e-05 -1.257466e-05
[14,]  9.405729e-06 -1.610979e-05  0.000000e+00
[15,] -6.577823e-05 -1.613578e-05  1.573812e-06
[16,]  1.977814e-05 -2.095056e-05 -4.878049e-05
[17,]  6.350841e-05  2.579329e-05  6.325111e-06
[18,] -4.264029e-05 -5.683858e-05 -4.582807e-05
[19,] -2.569373e-06  2.467056e-05 -1.428798e-05
[20,] -7.795768e-05 -3.661465e-05 -1.208267e-04
# Max and min
vapply(stock_return, 
       FUN = function(x) { c(max(x), min(x)) }, 
       FUN.VALUE = numeric(2))
            apple          ibm        micr
[1,]  0.016681377  0.027689429  0.02368641
[2,] -0.007795768 -0.007677657 -0.01208267
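
The rows of the last result are unlabeled. One possible refinement, sketched here on made-up data, is to give FUN.VALUE names, which vapply() then uses as row names. toy_returns is hypothetical, and the names max and min are chosen only for illustration.

```r
# Hypothetical toy data, for illustration only
toy_returns <- list(a = c(0.01, -0.02, 0.03),
                    b = c(0.02, 0.01, 0.04))

# A named FUN.VALUE template labels the rows of the result
extremes <- vapply(toy_returns,
                   FUN = function(x) { c(max(x), min(x)) },
                   FUN.VALUE = c(max = 0, min = 0))

rownames(extremes)
extremes["max", "a"]
```

Labeled rows allow values to be looked up by name rather than by position.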
