R has a lot to offer in terms of dates and times. The two main classes of data for this are Date and POSIXct. Date is used for calendar date objects like "2015-01-22". POSIXct is a way to represent datetime objects like "2015-01-22 08:39:40 EST", meaning that it is 40 seconds after 8:39 AM Eastern Standard Time.
In practice, the best strategy is to use the simplest class that you need. Often, Date will be the simplest choice. This course will use the Date class almost exclusively, but it is important to be aware of POSIXct as well for storing intraday financial data.
In the exercise below, you will explore your first date and time objects by asking R to return the current date and the current time.
Instructions
# What is the current date?
Sys.Date()
## [1] "2020-09-06"
# What is the current date and time?
Sys.time()
## [1] "2020-09-06 06:58:57 +07"
# Create the variable today
today <- Sys.Date()
# Confirm the class of today
class(today)
## [1] "Date"
You will often have to create dates yourself from character strings. The as.Date() function is the best way to do this:
# The Great Crash of 1929
great_crash <- as.Date("1929-11-29")
great_crash
[1] "1929-11-29"
class(great_crash)
[1] "Date"
Notice that the date is given in the format "yyyy-mm-dd". This is the ISO 8601 format (ISO = International Organization for Standardization), and it is the format R accepts and displays by default.
Internally, dates are stored as the number of days since January 1, 1970, and datetimes are stored as the number of seconds since then. You will confirm this in the exercises below.
Instructions
# Create crash
crash <- as.Date("2008-09-29")
# Print crash
crash
## [1] "2008-09-29"
# crash as a numeric
as.numeric(crash)
## [1] 14151
# Current time as a numeric
as.numeric(Sys.time())
## [1] 1599350338
# Incorrect date format
#as.Date("09/29/2008")
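The internal storage described above can also be run in reverse. The following is a minimal sketch, assuming you want to rebuild a date from its day count and look at a datetime in the same way; the variable names and the choice of UTC as the time zone are illustrative assumptions, not part of the course material.

```r
# Days since 1970-01-01, as stored internally for a Date
crash_days <- as.numeric(as.Date("2008-09-29"))

# Rebuild the Date from the raw number by stating the origin explicitly
rebuilt <- as.Date(crash_days, origin = "1970-01-01")
print(rebuilt)

# A POSIXct datetime is stored as seconds since the same origin
# (the time zone is set to UTC here so the result does not depend on your machine)
open_bell <- as.POSIXct("2008-09-29 09:30:00", tz = "UTC")
open_secs <- as.numeric(open_bell)
print(open_secs)
```

Stating the origin explicitly keeps the conversion readable and works across R versions.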
Creating a single date is nice to know how to do, but with financial data you will often have a large number of dates to work with. When this is the case, you will need to convert multiple dates from character to date format. You can do this all at once using vectors. In fact, if you remembered that a single character string is actually a vector of length 1, then you would know that you have been doing this all along!
# Create a vector of daily character dates
dates <- c("2017-01-01", "2017-01-02",
"2017-01-03", "2017-01-04")
as.Date(dates)
[1] "2017-01-01" "2017-01-02" "2017-01-03" "2017-01-04"
Like before, this might look like it returned another character vector, but internally these are all stored as numerics, with some special properties that only dates have.
Instructions
# Create dates from "2017-02-05" to "2017-02-08" inclusive.
dates <- c("2017-02-05", "2017-02-06", "2017-02-07", "2017-02-08")
# Add names to dates
names(dates) <- c("Sunday", "Monday", "Tuesday", "Wednesday")
# Subset dates to only return the date for Monday
dates["Monday"]
## Monday
## "2017-02-06"
As you saw earlier, R is picky about how it reads dates. To remind you, as.Date("09/29/2008") threw an error because it was not in the correct format. The fix for this is to specify the format you are using through the format argument:
as.Date("09/29/2008", format = "%m/%d/%Y")
[1] "2008-09-29"
This might look strange, but the basic idea is that you are defining a character vector telling R that your date is in the form of mm/dd/yyyy. It then knows how to extract the components and switch to yyyy-mm-dd.
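To make the three possible outcomes concrete, here is a small sketch, assuming the same "09/29/2008" string as above. The variable names are illustrative. It compares no format (an error), a format that does not match (NA), and the matching format (a Date).

```r
raw <- "09/29/2008"

# Without a format, as.Date() cannot parse this string and signals an error;
# tryCatch() turns that error into a marker value so the script keeps running
no_format <- tryCatch(as.Date(raw), error = function(e) "error")

# With a format that does not match the string, the result is NA, not an error
wrong_format <- as.Date(raw, format = "%Y-%m-%d")

# With the matching format, the conversion succeeds
right_format <- as.Date(raw, format = "%m/%d/%Y")

print(no_format)
print(wrong_format)
print(right_format)
```

The silent NA in the second case is worth checking for with is.na() after converting a large vector of dates.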
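There are a number of different formats you can specify. Here are a few of them:
%Y: 4-digit year (1982)
%y: 2-digit year (82)
%m: 2-digit month (01)
%d: 2-digit day of the month (13)
%A: weekday (Wednesday)
%a: abbreviated weekday (Wed)
%B: month (January)
%b: abbreviated month (Jan)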
Instructions
In this exercise you will work with the date, "1930-08-30", Warren Buffett's birth date!
Use as.Date() and an appropriate format to convert "30aug1930" to a date.
# "08,30,1930"
as.Date("08,30,1930", format = "%m,%d,%Y")
## [1] "1930-08-30"
# "Aug 30,1930"
as.Date("Aug 30,1930", format = "%b %d, %Y")
## [1] "1930-08-30"
# "30aug1930"
as.Date("30aug1930", format = "%d%b%Y")
## [1] "1930-08-30"
Not only can you convert characters to dates, but you can convert objects that are already dates to differently formatted dates using format():
# The best point move in stock market history. A +936 point change in the Dow!
best_date
[1] "2008-10-13"
format(best_date, format = "%Y/%m/%d")
[1] "2008/10/13"
format(best_date, format = "%B %d, %Y")
[1] "October 13, 2008"
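As a reminder, here are the formats:
%Y: 4-digit year (1982)
%y: 2-digit year (82)
%m: 2-digit month (01)
%d: 2-digit day of the month (13)
%A: weekday (Wednesday)
%a: abbreviated weekday (Wed)
%B: month (January)
%b: abbreviated month (Jan)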
Instructions
# char_dates
char_dates <- c("1jan17", "2jan17", "3jan17", "4jan17", "5jan17")
# Create dates using as.Date() and the correct format
dates <- as.Date(char_dates, format = "%d%b%y")
# Use format() to go from "2017-01-04" -> "Jan 04, 17"
format(dates, format = "%b %d, %y")
## [1] "Jan 01, 17" "Jan 02, 17" "Jan 03, 17" "Jan 04, 17" "Jan 05, 17"
# Use format() to go from "2017-01-04" -> "01,04,2017"
format(dates, format = "%m,%d,%Y")
## [1] "01,01,2017" "01,02,2017" "01,03,2017" "01,04,2017" "01,05,2017"
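One detail that is easy to miss is that format() returns character strings, not dates. The sketch below, with illustrative variable names, shows the class of the result and one way to get back to a Date. It uses only numeric format codes so the output does not depend on your locale.

```r
best_date <- as.Date("2008-10-13")

# format() produces a character string in the requested layout
slash_date <- format(best_date, format = "%Y/%m/%d")
print(class(slash_date))

# To do date arithmetic again, convert back with as.Date() and the same format
round_trip <- as.Date(slash_date, format = "%Y/%m/%d")
print(round_trip - best_date)
```

A reasonable habit is to keep the Date object for calculations and call format() only when printing or exporting.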
Just like with numerics, arithmetic can be done on dates. In particular, you can find the difference between two dates, in days, by using subtraction:
today <- as.Date("2017-01-02")
tomorrow <- as.Date("2017-01-03")
one_year_away <- as.Date("2018-01-02")
tomorrow - today
Time difference of 1 days
one_year_away - today
Time difference of 365 days
Equivalently, you could use the difftime() function to find the time interval instead.
difftime(tomorrow, today)
Time difference of 1 days
# With some extra options!
difftime(tomorrow, today, units = "secs")
Time difference of 86400 secs
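The result of a date subtraction is a difftime object, which prints with its units. When you need a plain number, for example to use in a further calculation, one option is as.numeric() with a units argument. The sketch below uses illustrative variable names and also shows that adding a number to a Date moves it forward by that many days.

```r
start_date <- as.Date("2017-01-02")
end_date <- as.Date("2018-01-02")

# Subtraction gives a difftime object
gap <- end_date - start_date

# Extract the plain number of days
gap_days <- as.numeric(gap, units = "days")

# Ask difftime() for weeks directly, then drop the units
gap_weeks <- as.numeric(difftime(end_date, start_date, units = "weeks"))

# Adding a number to a Date shifts it by that many days
thirty_days_later <- start_date + 30

print(gap_days)
print(gap_weeks)
print(thirty_days_later)
```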
Instructions
# Dates
dates <- as.Date(c("2017-01-01", "2017-01-02", "2017-01-03"))
# Create the origin
origin <- as.Date("1970-01-01")
# Use as.numeric() on dates
as.numeric(dates)
## [1] 17167 17168 17169
# Find the difference between dates and origin
dates - origin
## Time differences in days
## [1] 17167 17168 17169
As a final lesson on dates, there are a few functions that are useful for extracting date components. One of those is months().
my_date <- as.Date("2017-01-02")
months(my_date)
[1] "January"
Two other useful functions are weekdays() to extract the day of the week that your date falls on, and quarters() to determine which quarter of the year (Q1-Q4) that your date falls in.
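months() and weekdays() return names in the language of your R session's locale, which can be inconvenient when you want to compare or sort. As a hedged alternative, the sketch below extracts numeric components instead; the variable names are illustrative.

```r
my_date <- as.Date("2017-01-02")

# Month and year as integers, using numeric format codes
month_number <- as.integer(format(my_date, "%m"))
year_number <- as.integer(format(my_date, "%Y"))

# Day of the week as a number: 0 = Sunday, 1 = Monday, ..., 6 = Saturday
weekday_index <- as.POSIXlt(my_date)$wday

# quarters() returns "Q1" to "Q4"
quarter_label <- quarters(my_date)

print(c(month_number, year_number, weekday_index))
print(quarter_label)
```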
Instructions
# dates
dates <- as.Date(c("2017-01-02", "2017-05-03", "2017-08-04", "2017-10-17"))
# Extract the months
months(dates)
## [1] "January" "May" "August" "October"
# Extract the quarters
quarters(dates)
## [1] "Q1" "Q2" "Q3" "Q4"
# dates2
dates2 <- as.Date(c("2017-01-02", "2017-01-03", "2017-01-04", "2017-01-05"))
# Assign the weekdays() of dates2 as the names()
names(dates2) <- weekdays(dates2)
# Print dates2
dates2
## Monday Tuesday Wednesday Thursday
## "2017-01-02" "2017-01-03" "2017-01-04" "2017-01-05"
In the video, Lore taught you all about different types of relational operators. For reference, here they are again:
> : Greater than
>=: Greater than or equal to
< : Less than
<=: Less than or equal to
==: Equality
!=: Not equal
These relational operators let us make comparisons in our data. If the comparison is true, then the relational operator will return TRUE; otherwise it will return FALSE.
apple <- 45.46
microsoft <- 67.88
apple <= microsoft
[1] TRUE
hello <- "Hello world"
# Case sensitive!
hello == "hello world"
[1] FALSE
Instructions
micr and apple stock prices have been created for you.
Two dates have been created for you.
# Stock prices
apple <- 48.99
micr <- 77.93
# Apple vs Microsoft
apple > micr
## [1] FALSE
# Not equals
apple != micr
## [1] TRUE
# Dates - today and tomorrow
today <- as.Date(Sys.Date())
tomorrow <- as.Date(Sys.Date() + 1)
# Today vs Tomorrow
tomorrow < today
## [1] FALSE
You can extend the concept of relational operators to vectors of any arbitrary length. Compare two vectors using > to get a logical vector back of the same length, holding TRUE when the first is greater than the second, and FALSE otherwise.
apple <- c(120.00, 120.08, 119.97, 121.88)
datacamp <- c(118.5, 124.21, 125.20, 120.22)
apple > datacamp
[1] TRUE FALSE FALSE TRUE
Comparing a vector and a single number works as well. R will recycle the number to be the same length as the vector:
apple > 120
[1] FALSE TRUE FALSE TRUE
Imagine how this could be used as a buy/sell signal in stock analysis!
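As a small illustration of how such a logical vector might be used, the sketch below counts the signals, finds their positions, and pulls out the matching prices. It reuses the apple vector from above; the name above_120 is an illustrative choice.

```r
apple <- c(120.00, 120.08, 119.97, 121.88)

# Logical vector: TRUE where the price is above 120
above_120 <- apple > 120

# TRUE counts as 1, so sum() gives the number of signals
signal_count <- sum(above_120)

# which() gives the positions of the TRUE values
signal_days <- which(above_120)

# A logical vector inside [ ] keeps only the matching elements
signal_prices <- apple[above_120]

print(signal_count)
print(signal_days)
print(signal_prices)
```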
Instructions
A data frame, stocks, is in your workspace.
# Print stocks
stocks
# IBM range
stocks$ibm_buy <- stocks$ibm < 175
# Panera range
stocks$panera_sell <- stocks$panera > 213
# IBM vs Panera
stocks$ibm_vs_panera <- stocks$ibm > stocks$panera
# Print stocks
stocks
Output:
> # Print stocks
> stocks
date ibm panera
1 2017-01-20 170.55 216.65
2 2017-01-23 171.03 216.06
3 2017-01-24 175.90 213.55
4 2017-01-25 178.29 212.22
>
> # IBM range
> stocks$ibm_buy <- stocks$ibm < 175
>
> # Panera range
> stocks$panera_sell <- stocks$panera > 213
>
> # IBM vs Panera
> stocks$ibm_vs_panera <- stocks$ibm > stocks$panera
>
> # Print stocks
> stocks
date ibm panera ibm_buy panera_sell ibm_vs_panera
1 2017-01-20 170.55 216.65 TRUE TRUE FALSE
2 2017-01-23 171.03 216.06 TRUE TRUE FALSE
3 2017-01-24 175.90 213.55 FALSE TRUE FALSE
4 2017-01-25 178.29 212.22 FALSE FALSE FALSE
You might want to check multiple relational conditions at once. What if you wanted to know if Apple stock was above 120, but below 121? Simple relational operators are not enough! For multiple conditions, you need the And operator &, and the Or operator |.
# Both conditions must hold
(apple > 120) & (apple < 121)
[1] FALSE TRUE FALSE FALSE
# Only one condition has to hold
(apple <= 120) | (apple > 121)
[1] TRUE FALSE TRUE TRUE
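Financial data often contains missing values, and these pass through & and |. The sketch below, using a hypothetical prices vector with one NA, shows what the range check returns and one way to treat missing prices as "not in range".

```r
# Hypothetical prices with one missing value
prices <- c(120.00, 120.08, NA, 121.88)

# The comparison with NA is NA, so the combined condition is NA as well
in_range <- (prices > 120) & (prices < 121)
print(in_range)

# FALSE & NA is FALSE, so adding a "not missing" condition removes the NA
in_range_known <- in_range & !is.na(prices)
print(in_range_known)
```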
Instructions
stocks is in your workspace.
# IBM buy range
stocks$ibm_buy_range <- (stocks$ibm > 171) & (stocks$ibm < 176)
# Panera spikes
stocks$panera_spike <- (stocks$panera < 213.20) | (stocks$panera > 216.50)
# Date range
stocks$good_dates <- (stocks$date > as.Date("2017-01-21")) & (stocks$date < as.Date("2017-01-25"))
# Print stocks
stocks
One last operator to introduce is !, or Not. You have already seen a similar operator, !=, so you might be able to guess what it does. Add ! in front of a logical expression, and it will flip that expression from TRUE to FALSE (and vice versa).
!TRUE
[1] FALSE
apple <- c(120.00, 120.08, 119.97, 121.88)
!(apple < 121)
[1] FALSE FALSE FALSE TRUE
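Negating a combined condition is a common source of mistakes. The sketch below checks, on the same apple vector, that "not (above 120 and below 121)" gives the same answer as "at most 120 or at least 121". The variable names are illustrative.

```r
apple <- c(120.00, 120.08, 119.97, 121.88)

# Negate the whole combined condition
outside_a <- !((apple > 120) & (apple < 121))

# Equivalent form: flip each comparison and swap & for |
outside_b <- (apple <= 120) | (apple >= 121)

print(outside_a)
print(identical(outside_a, outside_b))
```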
Instructions
stocks is in your workspace.
# IBM range
!(stocks$ibm > 176)
# Missing data
missing <- c(24.5, 25.7, NA, 28, 28.6, NA)
# Is missing?
is.na(missing)
# Not missing?
!is.na(missing)
Output:
> # IBM range
> !(stocks$ibm > 176)
[1] TRUE TRUE TRUE FALSE
>
> # Missing data
> missing <- c(24.5, 25.7, NA, 28, 28.6, NA)
>
> # Is missing?
> is.na(missing)
[1] FALSE FALSE TRUE FALSE FALSE TRUE
>
> # Not missing?
> !is.na(missing)
[1] TRUE TRUE FALSE TRUE TRUE FALSE
Here's a fun problem. You know how to create logical vectors that tell you when a certain condition is true, but can you subset a data frame to only contain rows where that condition is true?
If you took Introduction to R for Finance, you might remember the subset() function. subset() takes as arguments a data frame (or vector/matrix) and a logical vector of which rows to return:
stocks
date ibm panera
1 2017-01-20 170.55 216.65
2 2017-01-23 171.03 216.06
3 2017-01-24 175.90 213.55
4 2017-01-25 178.29 212.22
subset(stocks, ibm < 175)
date ibm panera
1 2017-01-20 170.55 216.65
2 2017-01-23 171.03 216.06
Useful, right?
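For comparison, the same rows can be selected with square brackets. The sketch below rebuilds the stocks data frame from the printed values above (an assumption, since the real object lives in the course workspace) and checks that both approaches agree.

```r
# Rebuild the example data frame from the values printed above
stocks <- data.frame(
  date = as.Date(c("2017-01-20", "2017-01-23", "2017-01-24", "2017-01-25")),
  ibm = c(170.55, 171.03, 175.90, 178.29),
  panera = c(216.65, 216.06, 213.55, 212.22)
)

# subset() lets you refer to columns by name
via_subset <- subset(stocks, ibm < 175)

# Bracket indexing needs the stocks$ prefix and a trailing comma for "all columns"
via_brackets <- stocks[stocks$ibm < 175, ]

print(via_subset)
print(all.equal(via_subset, via_brackets))
```

subset() is convenient for interactive work; the bracket form is a reasonable choice inside functions, where the column names used in subset() can be harder to reason about.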
Instructions
stocks is in your workspace.
# Panera range
subset(stocks, panera > 216)
# Specific date
subset(stocks, date == as.Date("2017-01-23"))
# IBM and Panera joint range
subset(stocks, ibm < 175 & panera < 216.50)
Output:
> # Panera range
> subset(stocks, panera > 216)
date ibm panera
1 2017-01-20 170.55 216.65
2 2017-01-23 171.03 216.06
>
> # Specific date
> subset(stocks, date == as.Date("2017-01-23"))
date ibm panera
2 2017-01-23 171.03 216.06
>
> # IBM and Panera joint range
> subset(stocks, ibm < 175 & panera < 216.50)
date ibm panera
2 2017-01-23 171.03 216.06
Great! You have learned a lot about operators and subsetting. This will serve you well in future data analysis projects. Let's do one last exercise that combines a number of operators together.
Instructions
A new version of stocks is in your workspace.
# View stocks
stocks
# Weekday investigation
stocks$weekday <- weekdays(stocks$date)
# View stocks again
stocks
# Remove missing data
stocks_no_NA <- subset(stocks, !is.na(apple))
# Apple and Microsoft joint range
subset(stocks_no_NA, apple > 117 | micr > 63)
Output:
> # View stocks
> stocks
date apple micr
1 2016-12-01 109.49 59.20
2 2016-12-02 109.90 59.25
3 2016-12-03 NA NA
4 2016-12-04 NA NA
5 2016-12-05 109.11 60.22
6 2016-12-06 109.95 59.95
7 2016-12-07 111.03 61.37
8 2016-12-08 112.12 61.01
9 2016-12-09 113.95 61.97
10 2016-12-10 NA NA
11 2016-12-11 NA NA
12 2016-12-12 113.30 62.17
13 2016-12-13 115.19 62.98
14 2016-12-14 115.19 62.68
15 2016-12-15 115.82 62.58
16 2016-12-16 115.97 62.30
17 2016-12-17 NA NA
18 2016-12-18 NA NA
19 2016-12-19 116.64 63.62
20 2016-12-20 116.95 63.54
21 2016-12-21 117.06 63.54
22 2016-12-22 116.29 63.55
23 2016-12-23 116.52 63.24
24 2016-12-24 NA NA
25 2016-12-25 NA NA
26 2016-12-27 117.26 63.28
27 2016-12-28 116.76 62.99
28 2016-12-29 116.73 62.90
29 2016-12-30 115.82 62.14
>
> # Weekday investigation
> stocks$weekday <- weekdays(stocks$date)
>
> # View stocks again
> stocks
date apple micr weekday
1 2016-12-01 109.49 59.20 Thursday
2 2016-12-02 109.90 59.25 Friday
3 2016-12-03 NA NA Saturday
4 2016-12-04 NA NA Sunday
5 2016-12-05 109.11 60.22 Monday
6 2016-12-06 109.95 59.95 Tuesday
7 2016-12-07 111.03 61.37 Wednesday
8 2016-12-08 112.12 61.01 Thursday
9 2016-12-09 113.95 61.97 Friday
10 2016-12-10 NA NA Saturday
11 2016-12-11 NA NA Sunday
12 2016-12-12 113.30 62.17 Monday
13 2016-12-13 115.19 62.98 Tuesday
14 2016-12-14 115.19 62.68 Wednesday
15 2016-12-15 115.82 62.58 Thursday
16 2016-12-16 115.97 62.30 Friday
17 2016-12-17 NA NA Saturday
18 2016-12-18 NA NA Sunday
19 2016-12-19 116.64 63.62 Monday
20 2016-12-20 116.95 63.54 Tuesday
21 2016-12-21 117.06 63.54 Wednesday
22 2016-12-22 116.29 63.55 Thursday
23 2016-12-23 116.52 63.24 Friday
24 2016-12-24 NA NA Saturday
25 2016-12-25 NA NA Sunday
26 2016-12-27 117.26 63.28 Tuesday
27 2016-12-28 116.76 62.99 Wednesday
28 2016-12-29 116.73 62.90 Thursday
29 2016-12-30 115.82 62.14 Friday
>
> # Remove missing data
> stocks_no_NA <- subset(stocks, !is.na(apple))
>
> # Apple and Microsoft joint range
> subset(stocks_no_NA, apple > 117 | micr > 63)
date apple micr weekday
19 2016-12-19 116.64 63.62 Monday
20 2016-12-20 116.95 63.54 Tuesday
21 2016-12-21 117.06 63.54 Wednesday
22 2016-12-22 116.29 63.55 Thursday
23 2016-12-23 116.52 63.24 Friday
26 2016-12-27 117.26 63.28 Tuesday
If statements are great for adding extra logical flow to your code. First, let's look at the basic structure of an if statement:
if(condition) {
code
}
The condition is anything that returns a single TRUE or FALSE. If the condition is TRUE, then the code inside gets executed. Otherwise, the code gets skipped and the program continues. Here is an example:
apple <- 54.3
if(apple < 70) {
print("Apple is less than 70")
}
[1] "Apple is less than 70"
Relational operators are a common way to create the condition in the if statement!
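Because the condition must be a single TRUE or FALSE, a comparison on a whole vector cannot be used directly; recent versions of R signal an error in that case. A minimal sketch, assuming a short hypothetical price vector, uses any() and all() to collapse the vector to one value first.

```r
# Hypothetical vector of prices
apple <- c(109.49, 109.90, 111.03)

message_text <- "No price is above 110"

# any() is TRUE if at least one element of the comparison is TRUE
if (any(apple > 110)) {
  message_text <- "At least one price is above 110"
}
print(message_text)

# all() is TRUE only if every element of the comparison is TRUE
all_above_100 <- all(apple > 100)
print(all_above_100)
```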
Instructions
The variable micr has been created for you.
# micr
micr <- 48.55
# Fill in the blanks
if( micr < 55 ) {
print("Buy!")
}
## [1] "Buy!"
An extension of the if statement is to perform a different action if the condition is false. You can do this by adding else after your if statement:
if(condition) {
code if true
} else {
code if false
}
Instructions
Extend the last exercise by adding an else statement that prints "Do nothing!".
# micr
micr <- 57.44
# Fill in the blanks
if( micr < 55 ) {
print("Buy!")
} else {
print("Do nothing!")
}
## [1] "Do nothing!"
To add even more logic, you can follow the pattern of if, else if, else. You can add as many else if's as you need for your control logic.
if(condition1) {
code if condition1 is true
} else if(condition2) {
code if condition2 is true
} else {
code if both are false
}
Instructions
Extend the last example by filling in the blanks to complete the following logic:
* if micr is less than 55, print "Buy!"
* else if micr is greater than or equal to 55 and less than 75, print "Do nothing!"
* else print "Sell!"
# micr
micr <- 105.67
# Fill in the blanks
if( micr < 55 ) {
print("Buy!")
} else if( micr >= 55 & micr < 75 ){
print("Do nothing!")
} else {
print("Sell!")
}
## [1] "Sell!"
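If you need the same decision for several prices, one option is to wrap the logic in a function. The sketch below uses a hypothetical helper named signal_for() (not part of the course) with the same thresholds as the exercise. Because the first branch already handles prices below 55, the second branch only needs to test the upper bound.

```r
# Hypothetical helper: returns the trading signal for a single price
signal_for <- function(price) {
  if (price < 55) {
    "Buy!"
  } else if (price < 75) {
    # Reached only when price >= 55
    "Do nothing!"
  } else {
    "Sell!"
  }
}

print(signal_for(48.55))
print(signal_for(57.44))
print(signal_for(105.67))
```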
Sometimes it makes sense to have nested if statements to add even more control. In the following exercise, you will add an if statement that checks if you are holding a share of the Microsoft stock before you attempt to sell it.
Here is the structure of nested if statements. It should look somewhat familiar:
if(condition1) {
if(condition2) {
code if both pass
} else {
code if 1 passes, 2 fails
}
} else {
code if 1 fails
}
Instructions
The variables micr and shares have been created for you.
# micr
micr <- 105.67
shares <- 1
# Fill in the blanks
if( micr < 55 ) {
print("Buy!")
} else if( micr >= 55 & micr < 75 ) {
print("Do nothing!")
} else {
if( shares >= 1 ) {
print("Sell!")
} else {
print("Not enough shares to sell!")
}
}
## [1] "Sell!"
A powerful function to know about is ifelse(). It creates an if statement in 1 line of code, and more than that, it works on entire vectors!
Suppose you have a vector of stock prices. What if you want to return "Buy!" each time apple > 110, and "Do nothing!" otherwise? A simple if statement would not be enough, because its condition must be a single TRUE or FALSE rather than a whole vector. However, with ifelse() you can do:
apple
[1] 109.49 109.90 109.11 109.95 111.03 112.12
ifelse(test = apple > 110, yes = "Buy!", no = "Do nothing!")
[1] "Do nothing!" "Do nothing!" "Do nothing!" "Do nothing!" "Buy!"
[6] "Buy!"
ifelse() evaluates the test to get a logical vector, and where the logical vector is TRUE it replaces TRUE with whatever is in yes. Similarly, FALSE is replaced by no.
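ifelse() calls can also be nested to choose among more than two outcomes. The following sketch applies the Buy / Do nothing / Sell thresholds from the earlier if exercises to a hypothetical vector of Microsoft prices.

```r
# Hypothetical vector of prices
micr <- c(48.55, 57.44, 105.67)

# The inner ifelse() is only used where the outer test is FALSE
signal <- ifelse(micr < 55, "Buy!",
                 ifelse(micr < 75, "Do nothing!", "Sell!"))

print(signal)
```

More than two or three levels of nesting become hard to read, so this pattern is probably best kept for short rules.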
Instructions
stocks is in your workspace.
# Microsoft test
stocks$micr_buy <- ifelse(test = stocks$micr > 60 & stocks$micr < 62, yes = 1, no = 0)
# Apple test
stocks$apple_date <- ifelse(test = stocks$apple > 117, yes = stocks$date, no = NA)
# Print stocks
stocks
# Change the class() of apple_date.
class(stocks$apple_date) <- "Date"
# Print stocks again
stocks
Output:
> # Microsoft test
> stocks$micr_buy <- ifelse(test = stocks$micr > 60 & stocks$micr < 62, yes = 1, no = 0)
>
> # Apple test
> stocks$apple_date <- ifelse(test = stocks$apple > 117, yes = stocks$date, no = NA)
>
> # Print stocks
> stocks
date apple micr micr_buy apple_date
1 2016-12-01 109.49 59.20 0 NA
2 2016-12-02 109.90 59.25 0 NA
5 2016-12-05 109.11 60.22 1 NA
6 2016-12-06 109.95 59.95 0 NA
7 2016-12-07 111.03 61.37 1 NA
8 2016-12-08 112.12 61.01 1 NA
9 2016-12-09 113.95 61.97 1 NA
12 2016-12-12 113.30 62.17 0 NA
13 2016-12-13 115.19 62.98 0 NA
14 2016-12-14 115.19 62.68 0 NA
15 2016-12-15 115.82 62.58 0 NA
16 2016-12-16 115.97 62.30 0 NA
19 2016-12-19 116.64 63.62 0 NA
20 2016-12-20 116.95 63.54 0 NA
21 2016-12-21 117.06 63.54 0 17156
22 2016-12-22 116.29 63.55 0 NA
23 2016-12-23 116.52 63.24 0 NA
26 2016-12-27 117.26 63.28 0 17162
27 2016-12-28 116.76 62.99 0 NA
28 2016-12-29 116.73 62.90 0 NA
29 2016-12-30 115.82 62.14 0 NA
>
> # Change the class() of apple_date.
> class(stocks$apple_date) <- "Date"
>
> # Print stocks again
> stocks
date apple micr micr_buy apple_date
1 2016-12-01 109.49 59.20 0 <NA>
2 2016-12-02 109.90 59.25 0 <NA>
5 2016-12-05 109.11 60.22 1 <NA>
6 2016-12-06 109.95 59.95 0 <NA>
7 2016-12-07 111.03 61.37 1 <NA>
8 2016-12-08 112.12 61.01 1 <NA>
9 2016-12-09 113.95 61.97 1 <NA>
12 2016-12-12 113.30 62.17 0 <NA>
13 2016-12-13 115.19 62.98 0 <NA>
14 2016-12-14 115.19 62.68 0 <NA>
15 2016-12-15 115.82 62.58 0 <NA>
16 2016-12-16 115.97 62.30 0 <NA>
19 2016-12-19 116.64 63.62 0 <NA>
20 2016-12-20 116.95 63.54 0 <NA>
21 2016-12-21 117.06 63.54 0 2016-12-21
22 2016-12-22 116.29 63.55 0 <NA>
23 2016-12-23 116.52 63.24 0 <NA>
26 2016-12-27 117.26 63.28 0 2016-12-27
27 2016-12-28 116.76 62.99 0 <NA>
28 2016-12-29 116.73 62.90 0 <NA>
29 2016-12-30 115.82 62.14 0 <NA>
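The exercise output shows that ifelse() returned plain numbers for the dates, which is why the class had to be restored afterwards. The sketch below reproduces that on a small hypothetical sample and shows one alternative that keeps the Date class throughout: copy the dates and set the unwanted ones to NA.

```r
# Small hypothetical sample of dates and prices
dates <- as.Date(c("2016-12-20", "2016-12-21", "2016-12-27"))
apple <- c(116.95, 117.06, 117.26)

# ifelse() drops the Date class and returns day counts
stripped <- ifelse(apple > 117, dates, NA)
print(class(stripped))

# Alternative: start from the dates and blank out the rows that fail the test
kept <- dates
kept[!(apple > 117)] <- NA
print(class(kept))
print(kept)
```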
Loops are a core concept in programming. They are used in almost every language. In R, there is another way of performing repeated actions using apply functions, but we will save those until chapter 5. For now, let's look at the repeat loop!
This is the simplest loop. You use repeat, and inside the curly braces perform some action. You must specify when you want to break out of the loop. Otherwise it runs for eternity!
repeat {
code
if(condition) {
break
}
}
Do not do the following. This is an infinite loop! In words, you are telling R to repeat your code for eternity.
repeat {
code
}
Instructions
# Stock price
stock_price <- 126.34
repeat {
# New stock price
stock_price <- stock_price * runif(1, .985, 1.01)
print(stock_price)
# Check
if(stock_price < 125) {
print("Stock price is below 125! Buy it while it's cheap!")
break
}
}
## [1] 126.9496
## [1] 125.8381
## [1] 126.5319
## [1] 126.1155
## [1] 126.228
## [1] 126.993
## [1] 128.0152
## [1] 127.1447
## [1] 125.4034
## [1] 125.2684
## [1] 123.7274
## [1] "Stock price is below 125! Buy it while it's cheap!"
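Because the price path is random, the loop above runs a different number of times on each run, and in principle it could run for a very long time. A hedged variant is sketched below: set.seed() makes the run repeatable, and a step counter with a hypothetical max_steps cap guarantees that the loop ends. The seed value and the cap are arbitrary assumptions.

```r
set.seed(42)        # arbitrary seed so the run is repeatable
stock_price <- 126.34
steps <- 0
max_steps <- 1000   # hypothetical safety cap

repeat {
  # New stock price
  stock_price <- stock_price * runif(1, 0.985, 1.01)
  steps <- steps + 1

  # Stop when the price is low enough, or when the cap is reached
  if (stock_price < 125 || steps >= max_steps) {
    break
  }
}

print(steps)
print(stock_price)
```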
The order in which you execute your code inside the loop and check when you should break is important. The following would run the code a different number of times.
# Code, then check condition
repeat {
code
if(condition) {
break
}
}
# Check condition, then code
repeat {
if(condition) {
break
}
code
}
Let's see this in an extension of the previous exercise. For the purposes of this example, the runif() function has been replaced with a static multiplier to remove randomness.
Instructions
# Stock price
stock_price <- 67.55
repeat {
# New stock price
stock_price <- stock_price * .995
# Check
if(stock_price < 66) {
print("Stock price is below 66! Buy it while it's cheap!")
break
}
print(stock_price)
}
## [1] 67.21225
## [1] 66.87619
## [1] 66.54181
## [1] 66.2091
## [1] "Stock price is below 66! Buy it while it's cheap!"
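The difference between the two orderings is easiest to see when the condition is already true before the loop starts. The sketch below assumes a hypothetical starting price of 65.50, which is already below 66, and counts how many times the price update runs in each version.

```r
# Code, then check: the update always runs at least once
price_a <- 65.50
runs_a <- 0
repeat {
  price_a <- price_a * 0.995
  runs_a <- runs_a + 1
  if (price_a < 66) {
    break
  }
}

# Check, then code: the update may never run
price_b <- 65.50
runs_b <- 0
repeat {
  if (price_b < 66) {
    break
  }
  price_b <- price_b * 0.995
  runs_b <- runs_b + 1
}

print(c(runs_a, runs_b))
```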
While loops are slightly different from repeat loops. Like if statements, you specify the condition for them to run at the very beginning. There is no need for a break statement because the condition is checked at each iteration.
while (condition) {
code
}
It might seem like the while loop is doing the exact same thing as the repeat loop, just with less code. In the cases shown here, this is true. So, why ever use the repeat loop? Occasionally, there are cases when using a repeat loop to run forever is desired.
For the exercise, imagine that you have a debt of $5000 that you need to pay back. Each month, you pay off $500, until you've paid everything off. You will use a loop to model the process of paying off the debt each month, where each iteration you decrease your total debt and print out the new total!
Instructions
# Initial debt
debt <- 5000
# While loop to pay off your debt
while (debt > 0) {
debt <- debt - 500
print(paste("Debt remaining", debt))
}
## [1] "Debt remaining 4500"
## [1] "Debt remaining 4000"
## [1] "Debt remaining 3500"
## [1] "Debt remaining 3000"
## [1] "Debt remaining 2500"
## [1] "Debt remaining 2000"
## [1] "Debt remaining 1500"
## [1] "Debt remaining 1000"
## [1] "Debt remaining 500"
## [1] "Debt remaining 0"
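The loop above ends exactly at zero because 5000 is a multiple of 500. As a sketch of a slightly more general version, the code below assumes a hypothetical payment of 700 and caps the final payment at the remaining debt so the balance never goes negative. It also counts the months.

```r
debt <- 5000
payment <- 700   # hypothetical monthly payment
months <- 0

while (debt > 0) {
  # Never pay more than what is still owed
  paid <- min(payment, debt)
  debt <- debt - paid
  months <- months + 1
}

print(months)
print(debt)
```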
Loops can be used for all kinds of fun examples! What if you wanted to visualize your debt decreasing over time? Like the last exercise, this one uses a loop to model paying it off, $500 at a time. However, at each iteration you will also append your remaining debt total to a plot, so that you can visualize the total decreasing as you go.
This exercise has already been done for you. Let's talk about what is happening here.
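First, initialize some variables: debt (the initial debt of 5000), i (a counter for the x axis), and x_axis and y_axis (the points to plot, starting at i and debt). An initial plot shows the starting point.
Then, create a while loop. As long as you still have debt, debt is reduced by 500, i is incremented by 1, the new values are appended to x_axis and y_axis, and the next plot is drawn.
After you run the code, you can use Previous Plot to go back and view all 11 of the created plots!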
debt <- 5000 # initial debt
i <- 0 # x axis counter
x_axis <- i # x axis
y_axis <- debt # y axis
# Initial plot
plot(x_axis, y_axis, xlim = c(0,10), ylim = c(0,5000))
# Graph your debt
while (debt > 0) {
# Updating variables
debt <- debt - 500
i <- i + 1
x_axis <- c(x_axis, i)
y_axis <- c(y_axis, debt)
# Next plot
plot(x_axis, y_axis, xlim = c(0,10), ylim = c(0,5000))
}
Sometimes, you have to end your while loop early. With the debt example, if you don't have enough cash to pay off all of your debt, you won't be able to continue paying it down. In this exercise, you will add an if statement and a break to let you know if you run out of money!
while (condition) {
code
if (breaking_condition) {
break
}
}
The while loop will completely stop, and all lines after it will be run, if the breaking_condition is met. In this case, that condition will be running out of cash!
Instructions
debt and cash have been defined for you.
# debt and cash
debt <- 5000
cash <- 4000
# Pay off your debt...if you can!
while (debt > 0) {
debt <- debt - 500
cash <- cash - 500
print(paste("Debt remaining:", debt, "and Cash remaining:", cash))
if (cash == 0) {
print("You ran out of cash!")
break
}
}
## [1] "Debt remaining: 4500 and Cash remaining: 3500"
## [1] "Debt remaining: 4000 and Cash remaining: 3000"
## [1] "Debt remaining: 3500 and Cash remaining: 2500"
## [1] "Debt remaining: 3000 and Cash remaining: 2000"
## [1] "Debt remaining: 2500 and Cash remaining: 1500"
## [1] "Debt remaining: 2000 and Cash remaining: 1000"
## [1] "Debt remaining: 1500 and Cash remaining: 500"
## [1] "Debt remaining: 1000 and Cash remaining: 0"
## [1] "You ran out of cash!"
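The check cash == 0 works here because the cash is a multiple of the payment. A hedged variation is sketched below, assuming a hypothetical starting cash of 4200 and a payment variable: it tests whether the next payment is affordable before making it, so the cash never goes negative.

```r
debt <- 5000
cash <- 4200      # hypothetical amount that is not a multiple of the payment
payment <- 500
out_of_cash <- FALSE

while (debt > 0) {
  # Check before paying, so cash never drops below zero
  if (cash < payment) {
    out_of_cash <- TRUE
    break
  }
  debt <- debt - payment
  cash <- cash - payment
}

print(paste("Debt remaining:", debt, "and Cash remaining:", cash))
```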
Last, but not least, in our discussion of loops is the for loop. When you know how many times you want to repeat an action, a for loop is a good option. The idea of the for loop is that you are stepping through a sequence, one element at a time, and performing an action at each step along the way. That sequence is commonly a vector of numbers (such as 1:10), but could also be numbers that are not in any order like c(2, 5, 4, 6), or even a sequence of characters!
for (value in sequence) {
code
}
In words this is saying, "for each value in my sequence, run this code." Examples could be, "for each row of my data frame, print column 1", or "for each word in my sentence, check if that word is DataCamp."
Let's try an example! First, you will create a loop that prints out the values in a sequence from 1 to 10. Then, you will modify that loop to also sum the values from 1 to 10, where at each iteration the next value in the sequence is added to the running sum.
Instructions
# Sequence
seq <- c(1:10)
# Print loop
for (value in seq) {
print(value)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10
# A sum variable
sum <- 0
# Sum loop
for (value in seq) {
sum <- sum + value
print(sum)
}
## [1] 1
## [1] 3
## [1] 6
## [1] 10
## [1] 15
## [1] 21
## [1] 28
## [1] 36
## [1] 45
## [1] 55
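For reference, R also has vectorized functions that give the same results as the sum loop. The sketch below stores the running totals from a loop and compares them with sum() and cumsum(). The names values, total and running are illustrative, chosen so that the built-in seq() and sum() functions are not masked.

```r
values <- 1:10

# Loop version: keep the running total at each step
running <- numeric(length(values))
total <- 0
for (i in seq_along(values)) {
  total <- total + values[i]
  running[i] <- total
}

# Vectorized versions
print(total == sum(values))
print(all(running == cumsum(values)))
```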
Imagine that you are interested in the days where the stock price of Apple rises above 117. If it goes above this value, you want to print out the current date and stock price. If you have a stock data frame with a date and apple price column, could you loop over the rows of the data frame to accomplish this? You certainly could!
Before you do so, note that you can get the number of rows in your data frame using nrow(stock). Then, you can create a sequence to loop over from 1:nrow(stock).
for (row in 1:nrow(stock)) {
price <- stock[row, "apple"]
date <- stock[row, "date"]
if(price > 117) {
print(paste("On", date,
"the stock price was", price))
}
}
[1] "On 2016-12-21 the stock price was 117.06"
[1] "On 2016-12-27 the stock price was 117.26"
This incorporates a number of things we have learned so far: if statements, subsetting vectors, conditionals, and loops! Congratulations on learning so much!
Instructions
stock is in your workspace.
# Loop over stock rows
for (row in 1:nrow(stock)) {
price <- stock[row, "apple"]
date <- stock[row, "date"]
if(price > 116) {
print(paste("On", date,
"the stock price was", price))
} else {
print(paste("The date:", date,
"is not an important day!"))
}
}
Output:
> # Loop over stock rows
> for (row in 1:nrow(stock)) {
price <- stock[row, "apple"]
date <- stock[row, "date"]
if(price > 116) {
print(paste("On", date,
"the stock price was", price))
} else {
print(paste("The date:", date,
"is not an important day!"))
}
}
[1] "The date: 2016-12-01 is not an important day!"
[1] "The date: 2016-12-02 is not an important day!"
[1] "The date: 2016-12-05 is not an important day!"
[1] "The date: 2016-12-06 is not an important day!"
[1] "The date: 2016-12-07 is not an important day!"
[1] "The date: 2016-12-08 is not an important day!"
[1] "The date: 2016-12-09 is not an important day!"
[1] "The date: 2016-12-12 is not an important day!"
[1] "The date: 2016-12-13 is not an important day!"
[1] "The date: 2016-12-14 is not an important day!"
[1] "The date: 2016-12-15 is not an important day!"
[1] "The date: 2016-12-16 is not an important day!"
[1] "On 2016-12-19 the stock price was 116.64"
[1] "On 2016-12-20 the stock price was 116.95"
[1] "On 2016-12-21 the stock price was 117.06"
[1] "On 2016-12-22 the stock price was 116.29"
[1] "On 2016-12-23 the stock price was 116.52"
[1] "On 2016-12-27 the stock price was 117.26"
[1] "On 2016-12-28 the stock price was 116.76"
[1] "On 2016-12-29 the stock price was 116.73"
[1] "The date: 2016-12-30 is not an important day!"
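One caveat with 1:nrow(stock) is what happens when the data frame has no rows: 1:0 counts down and gives the two values 1 and 0, so the loop body would run twice on rows that do not exist. A minimal sketch, assuming an empty data frame with the same columns, shows the difference and the commonly used seq_len() alternative.

```r
# An empty data frame with the same columns as stock
stock <- data.frame(date = as.Date(character(0)), apple = numeric(0))

# 1:nrow() counts down from 1 to 0 when there are no rows
risky_index <- 1:nrow(stock)
print(risky_index)

# seq_len() returns an empty sequence instead
safe_index <- seq_len(nrow(stock))
print(length(safe_index))

# The loop body is skipped entirely
iterations <- 0
for (row in safe_index) {
  iterations <- iterations + 1
}
print(iterations)
```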
So far, you have been looping over 1 dimensional data types. If you want to loop over elements in a matrix (columns and rows), then you will have to use nested loops. You will use this idea to print out the correlations between three stocks.
The easiest way to think about this is that you are going to start on row1, and move to the right, hitting col1, col2, ..., up until the last column in row1. Then, you move down to row2 and repeat the process.
my_matrix
[,1] [,2]
[1,] "r1c1" "r1c2"
[2,] "r2c1" "r2c2"
# Loop over my_matrix
for(row in 1:nrow(my_matrix)) {
for(col in 1:ncol(my_matrix)) {
print(my_matrix[row, col])
}
}
[1] "r1c1"
[1] "r1c2"
[1] "r2c1"
[1] "r2c2"
Instructions
The correlation matrix, corr, is in your workspace.
# Print out corr
corr
# Create a nested loop
for(row in 1:nrow(corr)) {
for(col in 1:ncol(corr)) {
print(paste(colnames(corr)[col], "and", rownames(corr)[row],
"have a correlation of", corr[row,col]))
}
}
Output:
> # Print out corr
> corr
apple ibm micr
apple 1.00 0.96 0.88
ibm 0.96 1.00 0.74
micr 0.88 0.74 1.00
>
> # Create a nested loop
> for(row in 1:nrow(corr)) {
for(col in 1:ncol(corr)) {
print(paste(colnames(corr)[col], "and", rownames(corr)[row],
"have a correlation of", corr[row,col]))
}
}
[1] "apple and apple have a correlation of 1"
[1] "ibm and apple have a correlation of 0.96"
[1] "micr and apple have a correlation of 0.88"
[1] "apple and ibm have a correlation of 0.96"
[1] "ibm and ibm have a correlation of 1"
[1] "micr and ibm have a correlation of 0.74"
[1] "apple and micr have a correlation of 0.88"
[1] "ibm and micr have a correlation of 0.74"
[1] "micr and micr have a correlation of 1"
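The output above lists every pair twice, plus each stock with itself. If only the distinct pairs are of interest, one possible approach is to act only when the column index is greater than the row index. The sketch below rebuilds corr from the printed values (an assumption) and collects the three unique pairs in a vector named unique_pairs.

```r
# Rebuild the correlation matrix from the values printed above
tickers <- c("apple", "ibm", "micr")
corr <- matrix(
  c(1.00, 0.96, 0.88,
    0.96, 1.00, 0.74,
    0.88, 0.74, 1.00),
  nrow = 3, byrow = TRUE,
  dimnames = list(tickers, tickers)
)

unique_pairs <- character(0)
for (row in seq_len(nrow(corr))) {
  for (col in seq_len(ncol(corr))) {
    # Only the upper triangle: skips the diagonal and mirrored pairs
    if (col > row) {
      unique_pairs <- c(unique_pairs,
                        paste(rownames(corr)[row], "and", colnames(corr)[col],
                              "have a correlation of", corr[row, col]))
    }
  }
}

print(unique_pairs)
```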
To finish your lesson on loops, let's return to the concept of break, and the related concept of next. Just like with repeat and while loops, you can break out of a for loop completely by using the break statement. Additionally, if you just want to skip the current iteration, and continue the loop, you can use the next statement. This can be useful if your loop encounters an error, but you don't want it to break everything.
for (value in sequence) {
if(next_condition) {
next
}
code
if(breaking_condition) {
break
}
}
You don't have to use both break and next at the same time. This simply shows the general structure of using them.
The point of using next at the beginning, before the code runs, is to check for a problem before it happens.
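As a small illustration of that structure, the sketch below uses a made-up prices vector (not part of the course data) and counts how many values are actually processed before the loop stops.

```r
# Hypothetical prices; NA marks a missing quote
prices <- c(101.5, NA, 103.2, 106.8, 104.1)

checked <- 0
for (price in prices) {
  if (is.na(price)) {
    next            # skip the missing value; the code below never sees it
  }
  checked <- checked + 1
  if (price > 105) {
    break           # stop the loop entirely; 104.1 is never reached
  }
}
checked
```

Because the NA check comes first, the comparison price > 105 is never evaluated on a missing value.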
Instructions
The apple vector is in your workspace.
# Print apple
apple
# Loop through apple. Next if NA. Break if above 117.
for (value in apple) {
if(is.na(value)) {
print("Skipping NA")
next
}
if(value > 117) {
print("Time to sell!")
break
} else {
print("Nothing to do here!")
}
}
Output:
> # Print apple
> apple
[1] 109.49 109.90 NA NA 109.11 109.95 111.03 112.12 113.95 NA
[11] NA 113.30 115.19 115.19 115.82 115.97 NA NA 116.64 116.95
[21] 117.06 116.29 116.52 NA NA 117.26 116.76 116.73 115.82
>
> # Loop through apple. Next if NA. Break if above 117.
> for (value in apple) {
if(is.na(value)) {
print("Skipping NA")
next
}
if(value > 117) {
print("Time to sell!")
break
} else {
print("Nothing to do here!")
}
}
[1] "Nothing to do here!"
[1] "Nothing to do here!"
[1] "Skipping NA"
[1] "Skipping NA"
[1] "Nothing to do here!"
[1] "Nothing to do here!"
[1] "Nothing to do here!"
[1] "Nothing to do here!"
[1] "Nothing to do here!"
[1] "Skipping NA"
[1] "Skipping NA"
[1] "Nothing to do here!"
[1] "Nothing to do here!"
[1] "Nothing to do here!"
[1] "Nothing to do here!"
[1] "Nothing to do here!"
[1] "Skipping NA"
[1] "Skipping NA"
[1] "Nothing to do here!"
[1] "Nothing to do here!"
[1] "Time to sell!"
When you don't know how to use a function, or don't know what arguments it takes, where do you turn? Luckily for you, R has built-in documentation. For example, to get help for the names() function, you can type one of:
?names
?names()
help(names)
These all do the same thing; they take you straight to the help page for names()!
In the DataCamp console, this takes you to the RDocumentation site to get help from there, but the information is all the same!
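Beyond opening a help page, base R offers a few quick ways to inspect a function from the console. The sketch below uses sd() purely as an example; the choice of function is an assumption, not something the lesson requires.

```r
# args() shows a function's arguments and defaults without opening the help page
args(sd)

# formals() returns the same information as a list you can inspect
names(formals(sd))

# example(round) would run the examples from the round() help page
# ??"standard deviation" would search all installed help pages for a phrase
```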
Below, you will explore the documentation for a few other functions.
Instructions
# subset help
?subset
## starting httpd help server ... done
# Sys.time help
?Sys.time
Let's look at some of the round() function's help documentation. It simply rounds a numeric vector off to a specified number of decimal places.
round(x, digits = 0)
The first argument, x, is required. Without it, the function will not work!
The argument digits is known as an optional argument. Optional arguments are ones that don't have to be set by the user, either because they are given a default value, or because the function can infer them from the other data you have given it. Even though they don't have to be set, they often provide extra flexibility. Here, digits specifies the number of decimal places to round to.
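Two further uses of digits, sketched with made-up numbers: the argument can be matched by position instead of by name, and negative values round to the left of the decimal point.

```r
# digits matched by position instead of by name
by_position <- round(3.14159, 2)
by_position

# Negative digits round to tens, hundreds, and so on
to_hundreds <- round(1234.567, digits = -2)
to_hundreds
```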
Explore the round() function in the exercise!
Instructions
# Round 5.4
round(5.4)
## [1] 5
# Round 5.4 with 1 decimal place
round(5.4, digits = 1)
## [1] 5.4
# numbers
numbers <- c(.002623, pi, 812.33345)
# Round numbers to 3 decimal places
round(numbers, digits = 3)
## [1] 0.003 3.142 812.333
To write clean code, sometimes it is useful to use functions inside of other functions. This lets you use the result of one function directly in another one, without having to create an intermediate variable. You have actually already seen an example of this with print() and paste().
company <- c("Goldman Sachs", "J.P. Morgan", "Fidelity Investments")
for(i in 1:3) {
print(paste("A large financial institution is", company[i]))
}
[1] "A large financial institution is Goldman Sachs"
[1] "A large financial institution is J.P. Morgan"
[1] "A large financial institution is Fidelity Investments"
paste() strings together the character vectors, and print() prints it to the console.
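The same idea applies to numeric work. The sketch below, using a hypothetical returns vector, compares the intermediate-variable style with the nested style; R evaluates the innermost call first and passes its result outward.

```r
# Hypothetical daily returns in percent
returns <- c(0.37, -0.72, 0.77, 0.98)

# With an intermediate variable
avg <- mean(returns)
rounded_avg <- round(avg, digits = 2)

# Nested: mean() is evaluated first, and its result is passed to round()
nested_avg <- round(mean(returns), digits = 2)

identical(rounded_avg, nested_avg)
```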
The exercise below explores simplifying the calculation of the correlation matrix using nested functions.
Instructions
Three vectors of stock prices are in your workspace.
# cbind() the stocks
stocks <- cbind(apple, ibm, micr)
# cor() to create the correlation matrix
cor(stocks)
# All at once! Nest cbind() inside of cor()
cor(cbind(apple, ibm, micr))
Output:
> # cbind() the stocks
> stocks <- cbind(apple, ibm, micr)
>
> # cor() to create the correlation matrix
> cor(stocks)
apple ibm micr
apple 1.0000000 0.8872467 0.9477010
ibm 0.8872467 1.0000000 0.9126597
micr 0.9477010 0.9126597 1.0000000
>
> # All at once! Nest cbind() inside of cor()
> cor(cbind(apple, ibm, micr))
apple ibm micr
apple 1.0000000 0.8872467 0.9477010
ibm 0.8872467 1.0000000 0.9126597
micr 0.9477010 0.9126597 1.0000000
Time for your first function! This is a big step in an R programmer's journey. "Functions are a fundamental building block of R: to master many of the more advanced techniques ... you need a solid foundation in how functions work." -Hadley Wickham
Here is the basic structure of a function:
func_name <- function(arguments) {
body
}
And here is an example:
square <- function(x) {
x^2
}
square(2)
[1] 4
Two things to remember from what Lore taught you are arguments and the function body. Arguments are user inputs that the function works on. They can be the data that the function manipulates, or options that affect the calculation. The body of the function is the code that actually performs the manipulation.
The value that a function returns is simply the value of the last evaluated expression in the function body. In the example, since x^2 is the last line of the body, its value is what gets returned.
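To make the return rule concrete, here is a short sketch contrasting the implicit return with an explicit return(). The safe_sqrt() helper is hypothetical and exists only for this illustration.

```r
# Implicit return: the value of the last evaluated expression
square <- function(x) {
  x^2
}

# Explicit return(): exits immediately, so sqrt(x) is skipped for negative input
safe_sqrt <- function(x) {
  if (x < 0) {
    return(NA_real_)
  }
  sqrt(x)
}

square(3)
safe_sqrt(16)
safe_sqrt(-4)
```

An explicit return() is mostly useful for leaving a function early; at the end of the body, the implicit form is the usual R style.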
In the exercise, you will create your first function to turn a percentage into a decimal, a useful calculation in finance!
Instructions
# Percent to decimal function
percent_to_decimal <- function(percent) {
percent / 100
}
# Use percent_to_decimal() on 6
percent_to_decimal(6)
## [1] 0.06
# Example percentage
pct <- 8
# Use percent_to_decimal() on pct
percent_to_decimal(pct)
## [1] 0.08
As you saw in the optional arguments example, functions can have multiple arguments. These can help extend the flexibility of your function. Let's see this in action.
pow <- function(x, power = 2) {
x^power
}
pow(2)
[1] 4
pow(2, power = 3)
[1] 8
Instead of a square() function, we now have a version that works with any power.
The power argument is optional and has a default value of 2, but the user can easily change this. It is also an example of how you can add multiple arguments. Notice how the arguments are separated by a comma, and the default value is set using an equals sign.
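The different ways of supplying these arguments can be sketched as follows, reusing the pow() definition from above.

```r
pow <- function(x, power = 2) {
  x^power
}

pow(3)                 # power falls back to its default of 2
pow(3, 3)              # matched by position: x = 3, power = 3
pow(power = 3, x = 2)  # matched by name, so the order does not matter
```

Naming optional arguments, as in pow(2, power = 3), tends to make calls easier to read than relying on position alone.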
Let's add some more functionality to percent_to_decimal() that allows you to round the percentage to a certain number of digits.
Instructions
# Percent to decimal function
percent_to_decimal <- function(percent, digits = 2) {
decimal <- percent / 100
round(decimal, digits)
}
# percents
percents <- c(25.88, 9.045, 6.23)
# percent_to_decimal() with default digits
percent_to_decimal(percents)
## [1] 0.26 0.09 0.06
# percent_to_decimal() with digits = 4
percent_to_decimal(percents, digits = 4)
## [1] 0.2588 0.0904 0.0623
Let's think about a more complicated example. Do you remember present value from the Introduction to R for Finance course? The idea is that you want to discount money that you will get in the future at a specific interest rate to represent the value of that money in today's dollars. The following general formula was developed to help with this:
present_value <- cash_flow * (1 + i / 100) ^ -year
Wouldn't it be nice to have a function that did this calculation for you? Maybe something of the form:
present_value <- pv(cash_flow, i, year)
This function should work if you pass in numerics like pv(1500, 5, 2) and it should work if you pass in vectors of equal length to calculate an entire present value vector at once!
Instructions
The percent_to_decimal() function is in your workspace.
Perform the present value calculation. Do not store this in a variable. As the last executed line, it will be returned automatically.
Calculate the present value of $1200, at an interest rate of 7%, to be received 3 years from now.
# Present value function
pv <- function(cash_flow, i, year) {
# Discount multiplier
mult <- 1 + percent_to_decimal(i)
# Present value calculation
cash_flow * mult ^ -year
}
# Calculate a present value
pv(1200, 7, 3)
## [1] 979.5575
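Because R arithmetic works element by element, the same formula also handles vectors of equal length. The sketch below redefines pv() inline without percent_to_decimal() so that it runs on its own; the cash flows and the 5% rate are made up for illustration.

```r
# Same formula as above, written without the helper so the block is self-contained
pv <- function(cash_flow, i, year) {
  cash_flow * (1 + i / 100) ^ -year
}

# Hypothetical cash flows of $1000 received at the end of years 1, 2, and 3
cash_flows <- c(1000, 1000, 1000)
years <- 1:3

# One present value per cash flow, all discounted at 5%
present_values <- pv(cash_flows, 5, years)
present_values

# Total value today of the whole stream
sum(present_values)
```

The single rate 5 is recycled across all three cash flows, so only the year changes from one element to the next.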
The tidyquant package is focused on retrieving, manipulating, and scaling financial data analysis in the easiest way possible. To get the tidyquant package and start working with it, you first have to install it.
install.packages("tidyquant")
This places it on your local computer. You then have to load it into your current R session. This gives you access to all of the functions in the package.
library(tidyquant)
Installing a package and then loading it with library() are the steps required for any CRAN package you want to use.
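Since installation only needs to happen once per machine, a common pattern is to install a package only when it is missing. This is a sketch of one possible workflow, not something the course prescribes; is_installed() is a hypothetical helper, and the tidyquant lines are left as comments so that running the block does not trigger a download.

```r
# TRUE if the package can be loaded, without attaching it to the search path
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# "stats" ships with R, so this check succeeds on any installation
is_installed("stats")

# For a CRAN package such as tidyquant, install only when it is missing:
# if (!is_installed("tidyquant")) install.packages("tidyquant")
# library(tidyquant)
```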
The exercise code is already written for you. You will explore some of the functions that tidyquant has for financial analysis.
Instructions
The code is already written, but these instructions will walk you through the steps.
# Load tidyquant
library(tidyquant)
## Warning: package 'tidyquant' was built under R version 3.6.3
## Loading required package: lubridate
## Warning: package 'lubridate' was built under R version 3.6.3
##
## Attaching package: 'lubridate'
## The following object is masked _by_ '.GlobalEnv':
##
## origin
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
## Loading required package: PerformanceAnalytics
## Warning: package 'PerformanceAnalytics' was built under R version 3.6.3
## Loading required package: xts
## Warning: package 'xts' was built under R version 3.6.3
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 3.6.3
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
##
## Attaching package: 'PerformanceAnalytics'
## The following object is masked from 'package:graphics':
##
## legend
## Loading required package: quantmod
## Warning: package 'quantmod' was built under R version 3.6.3
## Loading required package: TTR
## Warning: package 'TTR' was built under R version 3.6.3
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
## Version 0.4-0 included new data defaults. See ?getSymbols.
## == Need to Learn tidyquant? ==============================
## Business Science offers a 1-hour course - Learning Lab #9: Performance Analysis & Portfolio Optimization with tidyquant!
## </> Learn more at: https://university.business-science.io/p/learning-labs-pro </>
# Pull Apple stock data
apple <- tq_get("AAPL", get = "stock.prices",
from = "2007-01-03", to = "2017-06-05")
# Take a look at what it returned
head(apple)
## # A tibble: 6 x 8
## symbol date open high low close volume adjusted
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AAPL 2007-01-03 3.08 3.09 2.92 2.99 1238319600 2.59
## 2 AAPL 2007-01-04 3.00 3.07 2.99 3.06 847260400 2.64
## 3 AAPL 2007-01-05 3.06 3.08 3.01 3.04 834741600 2.62
## 4 AAPL 2007-01-08 3.07 3.09 3.05 3.05 797106800 2.64
## 5 AAPL 2007-01-09 3.09 3.32 3.04 3.31 3349298400 2.86
## 6 AAPL 2007-01-10 3.38 3.49 3.34 3.46 2952880000 2.99
# Plot the stock price over time
plot(apple$date, apple$adjusted, type = "l")
# Calculate daily stock returns for the adjusted price
apple <- tq_mutate(data = apple,
select = adjusted,
mutate_fun = dailyReturn)
# Sort the returns from least to greatest
sorted_returns <- sort(apple$daily.returns)
# Plot them
plot(sorted_returns)
The first function in the apply family that you will learn is lapply(), which is short for "list apply." When you have a list, and you want to apply the same function to each element of the list, lapply() is a potential solution that always returns another list. How might this work?
Let's look at a simple example. Suppose you want to find the length of each vector in the following list.
my_list
$a
[1] 2 4 5
$b
[1] 10 14 5 3 4 5 6
# Using lapply
# Note that you don't need parentheses after length when passing it to FUN
lapply(my_list, FUN = length)
$a
[1] 3
$b
[1] 7
As noted in the video, if at first you thought about looping over each element in the list, and using length() at each iteration, you aren't wrong. lapply() does the same work as this kind of loop in a single call, and is often preferred (and simpler) in the R world.
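To see the equivalence, the sketch below rebuilds my_list (its construction with list() is an assumption based on the printed output) and computes the lengths both ways.

```r
my_list <- list(a = c(2, 4, 5), b = c(10, 14, 5, 3, 4, 5, 6))

# Loop version: pre-allocate a list, then fill it element by element
loop_result <- vector("list", length(my_list))
names(loop_result) <- names(my_list)
for (i in seq_along(my_list)) {
  loop_result[[i]] <- length(my_list[[i]])
}

# lapply version: the same result in one call
lapply_result <- lapply(my_list, FUN = length)

identical(loop_result, lapply_result)
```

The loop needs explicit bookkeeping for the output list and its names, while lapply() handles both.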
Instructions
In your workspace is a list of daily stock returns as percentages called stock_return and the percent_to_decimal() function.
# Print stock_return
stock_return
# lapply to change percents to decimal
lapply(stock_return, FUN = percent_to_decimal)
Output:
> # Print stock_return
> stock_return
$apple
[1] 0.37446342 -0.71883530 0.76986527 0.98226467 0.98171665 1.63217981
[7] -0.57042563 1.66813769 0.00000000 0.54692248 0.12951131 0.57773562
[13] 0.26577503 0.09405729 -0.65778233 0.19778141 0.63508411 -0.42640287
[19] -0.02569373 -0.77957680
$ibm
[1] 0.1251408 -0.1124859 0.3190691 2.7689429 0.3458948 0.7014998
[7] -0.6125390 1.6858006 0.1307267 -0.2907839 -0.7677657 -0.0299886
[13] 0.5519558 -0.1610979 -0.1613578 -0.2095056 0.2579329 -0.5683858
[19] 0.2467056 -0.3661465
$micr
[1] 0.08445946 1.63713080 -0.44835603 2.36864053 -0.58660583 1.57351254
[7] 0.32273681 1.30287920 -0.47634170 -0.15954052 -0.44742729 2.11878010
[13] -0.12574662 0.00000000 0.01573812 -0.48780488 0.06325111 -0.45828066
[19] -0.14287982 -1.20826709
>
> # lapply to change percents to decimal
> lapply(stock_return, FUN = percent_to_decimal)
$apple
[1] 0.00 -0.01 0.01 0.01 0.01 0.02 -0.01 0.02 0.00 0.01 0.00 0.01
[13] 0.00 0.00 -0.01 0.00 0.01 0.00 0.00 -0.01
$ibm
[1] 0.00 0.00 0.00 0.03 0.00 0.01 -0.01 0.02 0.00 0.00 -0.01 0.00
[13] 0.01 0.00 0.00 0.00 0.00 -0.01 0.00 0.00
$micr
[1] 0.00 0.02 0.00 0.02 -0.01 0.02 0.00 0.01 0.00 0.00 0.00 0.02
[13] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 -0.01
If, instead of a list, you had a data frame of stock returns, could you still use lapply()? Yes! Perhaps surprisingly, data frames are actually lists under the hood, and an lapply() call would apply the function to each column of the data frame.
df
a b
1 1 4
2 2 6
class(df)
[1] "data.frame"
lapply(df, FUN = sum)
$a
[1] 3
$b
[1] 10
lapply() summed each column in the data frame, but still follows its convention of always returning a list.
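The claim that a data frame is a list can be checked directly. The sketch below rebuilds df with data.frame(), which is an assumption about how the printed example was created.

```r
df <- data.frame(a = c(1, 2), b = c(4, 6))

is.list(df)    # a data frame is a list of equal-length columns
length(df)     # so its length is the number of columns
df[["b"]]      # and list-style extraction returns a column

column_sums <- lapply(df, FUN = sum)
column_sums
```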
Instructions
In your workspace is a data frame of daily stock returns as decimals called stock_return.
# Print stock_return
stock_return
# lapply to get the average returns
lapply(stock_return, FUN = mean)
# Sharpe ratio
sharpe <- function(returns) {
(mean(returns) - .0003) / sd(returns)
}
# lapply to get the sharpe ratio
lapply(stock_return, FUN = sharpe)
Output:
> # Print stock_return
> stock_return
apple ibm micr
1 0.0037446342 0.001251408 0.0008445946
2 -0.0071883530 -0.001124859 0.0163713080
3 0.0076986527 0.003190691 -0.0044835603
4 0.0098226467 0.027689429 0.0236864053
5 0.0098171665 0.003458948 -0.0058660583
6 0.0163217981 0.007014998 0.0157351254
7 -0.0057042563 -0.006125390 0.0032273681
8 0.0166813769 0.016858006 0.0130287920
9 0.0000000000 0.001307267 -0.0047634170
10 0.0054692248 -0.002907839 -0.0015954052
11 0.0012951131 -0.007677657 -0.0044742729
12 0.0057773562 -0.000299886 0.0211878010
13 0.0026577503 0.005519558 -0.0012574662
14 0.0009405729 -0.001610979 0.0000000000
15 -0.0065778233 -0.001613578 0.0001573812
16 0.0019778141 -0.002095056 -0.0048780488
17 0.0063508411 0.002579329 0.0006325111
18 -0.0042640287 -0.005683858 -0.0045828066
19 -0.0002569373 0.002467056 -0.0014287982
20 -0.0077957680 -0.003661465 -0.0120826709
>
> # lapply to get the average returns
> lapply(stock_return, FUN = mean)
$apple
[1] 0.002838389
$ibm
[1] 0.001926806
$micr
[1] 0.002472939
>
> # Sharpe ratio
> sharpe <- function(returns) {
(mean(returns) - .0003) / sd(returns)
}
>
> # lapply to get the sharpe ratio
> lapply(stock_return, FUN = sharpe)
$apple
[1] 0.3546496
$ibm
[1] 0.2000819
$micr
[1] 0.218519
Often, the function that you want to apply will have other optional arguments that you may want to tweak. Consider the percent_to_decimal() function that allows the user to specify the number of decimal places.
percent_to_decimal(5.4, digits = 3)
[1] 0.054
In the call to lapply() you can specify the named optional arguments after the FUN argument, and they will get passed to the function that you are applying.
my_list
$a
[1] 2.444 3.500
$b
[1] 1.100 2.678 3.450
lapply(my_list, FUN = percent_to_decimal, digits = 4)
$a
[1] 0.0244 0.0350
$b
[1] 0.0110 0.0268 0.0345
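The same mechanism works for arguments of built-in functions. As a sketch with made-up data, the na.rm argument of mean() can be forwarded through lapply() to ignore missing values.

```r
# Hypothetical returns with a missing observation
returns_list <- list(a = c(0.01, NA, 0.03), b = c(0.02, 0.04))

# Without na.rm, the mean of a vector containing NA is NA
raw_means <- lapply(returns_list, FUN = mean)
raw_means

# na.rm = TRUE is forwarded to every call of mean()
clean_means <- lapply(returns_list, FUN = mean, na.rm = TRUE)
clean_means
```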
In the exercise, you will extend the capability of your Sharpe ratio function to allow the user to input the risk-free rate as an argument, and then use this with lapply().
Instructions
In your workspace is a data frame of daily stock returns as decimals called stock_return.
# Extend sharpe() to allow optional argument
sharpe <- function(returns, rf = .0003) {
(mean(returns) - rf) / sd(returns)
}
# First lapply()
lapply(stock_return, FUN = sharpe, rf = .0004)
# Second lapply()
lapply(stock_return, FUN = sharpe, rf = .0009)
Output:
> # Extend sharpe() to allow optional argument
> sharpe <- function(returns, rf = .0003) {
(mean(returns) - rf) / sd(returns)
}
>
> # First lapply()
> lapply(stock_return, FUN = sharpe, rf = .0004)
$apple
[1] 0.3406781
$ibm
[1] 0.1877828
$micr
[1] 0.2084626
>
> # Second lapply()
> lapply(stock_return, FUN = sharpe, rf = .0009)
$apple
[1] 0.2708209
$ibm
[1] 0.1262875
$micr
[1] 0.1581807
lapply() is great, but sometimes you might want the returned data in a nicer form than a list. For instance, with the Sharpe ratio, wouldn't it be great if the returned Sharpe ratios were in a vector rather than a list? Further analysis would likely be easier!
For this, you might want to consider sapply(), or simplify apply. It performs exactly like lapply(), but will attempt to simplify the output if it can. The basic syntax is the same, with a few additional arguments:
sapply(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)
These additional optional arguments let you specify if you want sapply() to try and simplify the output, and if you want it to use the names of the object in the output.
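What "simplify" means depends on what the applied function returns. The sketch below uses a made-up list of returns to show the two most common outcomes, a vector and a matrix.

```r
# Hypothetical returns for two stocks
returns <- list(x = c(0.01, 0.03, -0.02), y = c(0.02, 0.00, 0.01))

# One value per element: simplified to a named numeric vector
means <- sapply(returns, FUN = mean)
means

# Two values per element: simplified to a matrix with one column per element
ranges <- sapply(returns, FUN = range)
ranges

# Names already present on the input carry over to the result
names(means)
```

Note that USE.NAMES only controls whether a character input is used to name the result; names that already exist on a list or data frame are kept either way.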
In the exercise, you will recalculate sharpe ratios using sapply() to simplify the output.
Instructions
stock_return and the sharpe function are in your workspace.
# lapply() on stock_return
lapply(stock_return, FUN = sharpe)
# sapply() on stock_return
sapply(stock_return, FUN = sharpe)
# sapply() on stock_return with optional arguments
sapply(stock_return, FUN = sharpe, simplify = FALSE, USE.NAMES = FALSE)
Output:
> # lapply() on stock_return
> lapply(stock_return, FUN = sharpe)
$apple
[1] 0.3546496
$ibm
[1] 0.2000819
$micr
[1] 0.218519
>
> # sapply() on stock_return
> sapply(stock_return, FUN = sharpe)
apple ibm micr
0.3546496 0.2000819 0.2185190
>
> # sapply() on stock_return with optional arguments
> sapply(stock_return, FUN = sharpe, simplify = FALSE, USE.NAMES = FALSE)
$apple
[1] 0.3546496
$ibm
[1] 0.2000819
$micr
[1] 0.218519
For interactive use, sapply() is great. It guesses the output type so that it can simplify, and normally that is fine. However, sapply() is not a safe option to be used when writing functions. If sapply() cannot simplify your output, then it will default to returning a list just like lapply(). This can be dangerous and break custom functions if you wrote them expecting sapply() to return a simplified vector.
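A minimal sketch of the problem: the same sapply() call can return a matrix or a list depending only on the data. The positives() helper and the input values are hypothetical.

```r
# Hypothetical helper: returns the positive values in a vector
positives <- function(x) {
  x[x > 0]
}

# Both elements yield two values, so sapply() simplifies to a matrix
same_lengths <- sapply(list(a = c(1, -2, 3), b = c(4, 5, -6)), FUN = positives)

# The elements now yield two values and one value, so sapply() silently returns a list
mixed_lengths <- sapply(list(a = c(1, -2, 3), b = c(-4, 5, -6)), FUN = positives)

is.matrix(same_lengths)
is.list(mixed_lengths)
```

Code written against the first result, for example indexing by row and column, would break on the second without any warning from sapply() itself.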
Let's look at an exercise using a list containing information about the stock market crash of 2008.
Instructions
The list market_crash has been created for you.
A new list, market_crash2 has been created. The difference is in the creation of the date!
date in market_crash2 has multiple classes. Why couldn't sapply() simplify this?
# Market crash with as.Date()
market_crash <- list(dow_jones_drop = 777.68,
date = as.Date("2008-09-28"))
# Find the classes with sapply()
sapply(market_crash, class)
# Market crash with as.POSIXct()
market_crash2 <- list(dow_jones_drop = 777.68,
date = as.POSIXct("2008-09-28"))
# Find the classes with lapply()
lapply(market_crash2, class)
# Find the classes with sapply()
sapply(market_crash2, class)
Output:
> # Market crash with as.Date()
> market_crash <- list(dow_jones_drop = 777.68,
date = as.Date("2008-09-28"))
>
> # Find the classes with sapply()
> sapply(market_crash, class)
dow_jones_drop date
"numeric" "Date"
>
> # Market crash with as.POSIXct()
> market_crash2 <- list(dow_jones_drop = 777.68,
date = as.POSIXct("2008-09-28"))
>
> # Find the classes with lapply()
> lapply(market_crash2, class)
$dow_jones_drop
[1] "numeric"
$date
[1] "POSIXct" "POSIXt"
>
> # Find the classes with sapply()
> sapply(market_crash2, class)
$dow_jones_drop
[1] "numeric"
$date
[1] "POSIXct" "POSIXt"
In the last example, sapply() failed to simplify because the date element of market_crash2 had two classes (POSIXct and POSIXt). Notice, however, that no error was thrown! If a function you had written expected a simplified vector to be returned by sapply(), this would be confusing.
To account for this, there is a stricter apply function called vapply(), which contains an extra argument FUN.VALUE where you can specify the type and length of the output that should be returned each time your applied function is called.
If you expected the return value of class() to be a character vector of length 1, you can specify that using vapply():
vapply(market_crash, class, FUN.VALUE = character(1))
dow_jones_drop date
"numeric" "Date"
Other examples of FUN.VALUE might be numeric(2) or logical(1).
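The strictness can be sketched as follows. tryCatch() is used here only so the block runs to completion, and first_class_of() is a hypothetical helper that always returns exactly one string.

```r
market_crash2 <- list(dow_jones_drop = 777.68,
                      date = as.POSIXct("2008-09-28"))

# class() returns two values for the POSIXct element, so the character(1)
# contract is violated and vapply() signals an error instead of returning a list
result <- tryCatch(
  vapply(market_crash2, class, FUN.VALUE = character(1)),
  error = function(e) "vapply failed"
)
result

# Hypothetical helper that keeps only the first class of an object
first_class_of <- function(x) {
  class(x)[1]
}

# Exactly one string per element, so the contract is satisfied
first_class <- vapply(market_crash2, first_class_of, FUN.VALUE = character(1))
first_class
```

Failing loudly is the point: a function built on vapply() either gets the shape it asked for or stops immediately.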
Instructions
market_crash2 is again defined for you.
# Market crash with as.POSIXct()
market_crash2 <- list(dow_jones_drop = 777.68,
date = as.POSIXct("2008-09-28"))
# Find the classes with sapply()
sapply(market_crash2, class)
# Find the classes with vapply()
vapply(market_crash2, class, FUN.VALUE = character(1))
The last example showed vapply() appropriately failing: class() returned two values for the date element, which violates FUN.VALUE = character(1), so vapply() threw an error instead of quietly returning a list. But what about when it doesn't fail? When there are no errors, vapply() returns a simplified result according to the FUN.VALUE argument.
Instructions
The stock_return dataset is in your workspace containing daily returns for Apple, IBM, and Microsoft. The sharpe() function is also available.
# Sharpe ratio for all stocks
vapply(stock_return, sharpe, FUN.VALUE = numeric(1))
# Summarize Apple
summary(stock_return$apple)
# Summarize all stocks
vapply(stock_return, summary, FUN.VALUE = numeric(6))
Output:
> # Sharpe ratio for all stocks
> vapply(stock_return, sharpe, FUN.VALUE = numeric(1))
apple ibm micr
0.3546496 0.2000819 0.2185190
>
> # Summarize Apple
> summary(stock_return$apple)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.007796 -0.001259 0.002318 0.002838 0.006688 0.016681
>
> # Summarize all stocks
> vapply(stock_return, summary, FUN.VALUE = numeric(6))
apple ibm micr
Min. -0.007795768 -0.0076776574 -0.0120826709
1st Qu. -0.001258710 -0.0022982516 -0.0045083719
Median 0.002317782 0.0004757609 -0.0006287331
Mean 0.002838389 0.0019268062 0.0024729391
3rd Qu. 0.006687794 0.0032577550 0.0056777241
Max. 0.016681377 0.0276894294 0.0236864053
As a last exercise, you'll learn about a concept called anonymous functions. So far, when calling an apply function like vapply(), you have been passing in named functions to FUN. Doesn't it seem like a waste to have to create a function just for that specific vapply() call? Instead, you can use anonymous functions!
Named function:
percent_to_decimal <- function(percent) {
percent / 100
}
Anonymous function:
function(percent) { percent / 100 }
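As you can see, anonymous functions are basically functions that aren't assigned a name. To use them in vapply() you might do the following, where stock_return is a small example with two values per column, which is why FUN.VALUE is numeric(2):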
vapply(stock_return, FUN = function(percent) { percent / 100 },
FUN.VALUE = numeric(2))
apple ibm
[1,] 0.003744634 0.001251408
[2,] -0.007188353 -0.001124859
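Anonymous functions are handy for one-off calculations that do not deserve a name. As a sketch with made-up percentage returns, the block below computes the spread between the largest and smallest value for each stock.

```r
# Hypothetical returns in percent
stock_pct <- list(apple = c(0.37, -0.72, 0.77), ibm = c(0.13, -0.11, 0.32))

# The anonymous function exists only for the duration of this call
spread <- vapply(stock_pct,
                 FUN = function(x) { max(x) - min(x) },
                 FUN.VALUE = numeric(1))
spread
```

If the same calculation is needed in more than one place, giving the function a name is usually the clearer choice.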
Instructions
stock_return is in your workspace.
# Max and min
vapply(stock_return,
FUN = function(x) { c(max(x), min(x)) },
FUN.VALUE = numeric(2))
Output:
> # Max and min
> vapply(stock_return,
FUN = function(x) { c(max(x), min(x)) },
FUN.VALUE = numeric(2))
apple ibm micr
[1,] 0.016681377 0.027689429 0.02368641
[2,] -0.007795768 -0.007677657 -0.01208267