Dates

What day is it?

R has a lot to offer in terms of dates and times. The two main classes of data for this are Date and POSIXct. Date is used for calendar date objects like "2015-01-22". POSIXct is a way to represent datetime objects like "2015-01-22 08:39:40 EST", meaning that it is 40 seconds after 8:39 AM Eastern Standard Time.

In practice, the best strategy is to use the simplest class that you need. Often, Date will be the simplest choice. This course will use the Date class almost exclusively, but it is important to be aware of POSIXct as well for storing intraday financial data.
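To make the difference between the two classes concrete, here is a minimal sketch. The variable names d and dt are illustrative, and the time zone is set to "UTC" only so the result does not depend on your machine's settings.

```r
# A calendar date: no time-of-day component
d <- as.Date("2015-01-22")

# A datetime; tz = "UTC" is an assumption made for reproducibility
dt <- as.POSIXct("2015-01-22 08:39:40", tz = "UTC")

class(d)   # "Date"
class(dt)  # "POSIXct" "POSIXt"

# Dropping the time of day turns the datetime back into a plain date
date_only <- as.Date(dt)
date_only  # "2015-01-22"
```

If you only ever need the calendar day, the Date version is easier to work with.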

In the exercise below, you will explore your first date and time objects by asking R to return the current date and the current time.

Instructions

  • Type Sys.Date() to have R return the current date.
  • Type Sys.time() to have R return the current date and time. Notice the difference in capitalization of Date vs time.
  • Store Sys.Date() in the variable today.
  • Use class() on today to confirm its class.
# What is the current date?
Sys.Date()
## [1] "2020-09-06"
# What is the current date and time?
Sys.time()
## [1] "2020-09-06 06:58:57 +07"
# Create the variable today
today <- Sys.Date()

# Confirm the class of today
class(today)
## [1] "Date"

From char to date

You will often have to create dates yourself from character strings. The as.Date() function is the best way to do this:

# The Great Crash of 1929 (Black Tuesday)
great_crash <- as.Date("1929-10-29")

great_crash
[1] "1929-10-29"

class(great_crash)
[1] "Date"

Notice that the date is given in the format "yyyy-mm-dd". This is the ISO 8601 format (ISO = International Organization for Standardization), and it is the default format in which R accepts and displays dates.

Internally, dates are stored as the number of days since January 1, 1970, and datetimes are stored as the number of seconds since then. You will confirm this in the exercises below.
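As a quick illustration of this storage scheme, the sketch below checks a date and a datetime that sit just after the origin, and then converts a day count back into a date. The variable names are illustrative, and "UTC" is chosen so the datetime is not shifted by a local time zone.

```r
# One day after the origin is stored as 1
one_day <- as.numeric(as.Date("1970-01-02"))
one_day     # 1

# One minute after the origin (in UTC) is stored as 60
one_minute <- as.numeric(as.POSIXct("1970-01-01 00:01:00", tz = "UTC"))
one_minute  # 60

# Going the other way: a day count plus an origin gives back a date
back_to_date <- as.Date(365, origin = "1970-01-01")
back_to_date  # "1971-01-01"
```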

Instructions

  • Create a date variable named crash for "2008-09-29", the date of what was then the largest single-day point drop in the stock market.
  • Print crash.
  • Use as.numeric() on crash to convert it to the number of days since January 1, 1970.
  • Wrap as.numeric() around Sys.time() to see the current time in number of seconds since January 1, 1970.
  • Attempt to create a date from "09/29/2008". What happens?
# Create crash
crash <- as.Date("2008-09-29")

# Print crash
crash
## [1] "2008-09-29"
# crash as a numeric
as.numeric(crash)
## [1] 14151
# Current time as a numeric
as.numeric(Sys.time())
## [1] 1599350338
# Incorrect date format (left commented out because it throws an error)
# as.Date("09/29/2008")
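
To see what goes wrong without halting a script, the failing call can be wrapped in tryCatch(). This is a minimal sketch: the name result is illustrative, and the exact wording of the error message may differ between R versions.

```r
# Capture the error message instead of stopping the script
result <- tryCatch(
  as.Date("09/29/2008"),
  error = function(e) conditionMessage(e)
)

# result now holds the error text rather than a Date
class(result)  # "character"
```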

Many dates

Creating a single date is useful, but with financial data you will often have a large number of dates to work with. In that case, you need to convert multiple dates from character to date format, which you can do all at once using vectors. In fact, if you remember that a single character string is actually a vector of length 1, then you have been doing this all along!

# Create a vector of daily character dates
dates <- c("2017-01-01", "2017-01-02",
           "2017-01-03", "2017-01-04") 

as.Date(dates)
[1] "2017-01-01" "2017-01-02" "2017-01-03" "2017-01-04"

Like before, this might look like it returned another character vector, but internally these are all stored as numerics, with some special properties that only dates have.
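
One way to look behind the printed strings is to ask for the underlying numbers and the class. The sketch below reuses the four dates from above; day_counts is an illustrative name.

```r
dates <- as.Date(c("2017-01-01", "2017-01-02",
                   "2017-01-03", "2017-01-04"))

# The underlying day counts since 1970-01-01
day_counts <- as.numeric(dates)
day_counts    # 17167 17168 17169 17170

# The class is Date, not character or numeric
class(dates)  # "Date"
```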

Instructions

  • Create another vector of dates containing the 4 days from "2017-02-05" to "2017-02-08" inclusive. Call this dates.
  • Assign the days of the week "Sunday", "Monday", "Tuesday", "Wednesday", in that order, as names() of the vector dates.
  • Subset dates using [ ] to retrieve only the date for "Monday".
# Create dates from "2017-02-05" to "2017-02-08" inclusive.
dates <- as.Date(c("2017-02-05", "2017-02-06", "2017-02-07", "2017-02-08"))

# Add names to dates
names(dates) <- c("Sunday", "Monday", "Tuesday", "Wednesday")

# Subset dates to only return the date for Monday
dates["Monday"]
##       Monday 
## "2017-02-06"

Date formats (1)

As you saw earlier, R is picky about how it reads dates. To remind you, as.Date("09/29/2008") threw an error because it was not in the correct format. The fix for this is to specify the format you are using through the format argument:

as.Date("09/29/2008", format = "%m/%d/%Y")
[1] "2008-09-29"

This might look strange, but the basic idea is that you are passing a character string that tells R your date is in the form mm/dd/yyyy. R then knows how to extract the components and switch to yyyy-mm-dd.

You can specify a number of different formats. Here are a few of them:

  • %Y: 4-digit year (1982)
  • %y: 2-digit year (82)
  • %m: 2-digit month (01)
  • %d: 2-digit day of the month (13)
  • %A: weekday (Wednesday)
  • %a: abbreviated weekday (Wed)
  • %B: month (January)
  • %b: abbreviated month (Jan)
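
The codes in this list can be tried out directly with format(). The sketch below uses the date behind the examples in the list, January 13, 1982; the variable names are illustrative. Note that the weekday and month names depend on your locale, so the expected values for those are given only for an English locale.

```r
d <- as.Date("1982-01-13")

# Numeric components do not depend on the locale
year4 <- format(d, "%Y")  # "1982"
year2 <- format(d, "%y")  # "82"
month <- format(d, "%m")  # "01"
day   <- format(d, "%d")  # "13"

# Names depend on the locale; in an English locale these give
# "Wednesday", "Wed", "January" and "Jan"
format(d, "%A")
format(d, "%a")
format(d, "%B")
format(d, "%b")
```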

Instructions

In this exercise you will work with the date "1930-08-30", Warren Buffett's birth date!

  • Use as.Date() and an appropriate format to convert "08,30,1930" to a date (it is in the form of "month,day,year").
  • Use as.Date() and an appropriate format to convert "Aug 30,1930" to a date.
  • Use as.Date() and an appropriate format to convert "30aug1930" to a date.

# "08,30,1930"
as.Date("08,30,1930", format = "%m, %d, %Y")
## [1] "1930-08-30"
# "Aug 30,1930"
as.Date("Aug 30,1930", format = "%b %d, %Y")
## [1] "1930-08-30"
# "30aug1930"
as.Date("30aug1930", format = "%d%b%Y")
## [1] "1930-08-30"

Date formats (2)

Not only can you convert characters to dates, but you can also turn objects that are already dates into differently formatted character strings using format():

# The best point move in stock market history. A +936 point change in the Dow!
best_date <- as.Date("2008-10-13")

best_date
[1] "2008-10-13"

format(best_date, format = "%Y/%m/%d")
[1] "2008/10/13"

format(best_date, format = "%B %d, %Y")
[1] "October 13, 2008"
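
One detail worth checking is what format() actually returns. The sketch below, with illustrative names pretty and round_trip, suggests that the result is a character string rather than a Date, so it has to be converted back with a matching format before doing any date arithmetic.

```r
best_date <- as.Date("2008-10-13")
pretty <- format(best_date, format = "%Y/%m/%d")

class(best_date)  # "Date"
class(pretty)     # "character"

# Date arithmetic works on the Date itself
next_day <- best_date + 1
next_day          # "2008-10-14"

# Converting the string back requires the matching format
round_trip <- as.Date(pretty, format = "%Y/%m/%d")
```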

As a reminder, here are the formats:

  • %Y: 4-digit year (1982)
  • %y: 2-digit year (82)
  • %m: 2-digit month (01)
  • %d: 2-digit day of the month (13)
  • %A: weekday (Wednesday)
  • %a: abbreviated weekday (Wed)
  • %B: month (January)
  • %b: abbreviated month (Jan)

Instructions

  • The vector char_dates has been created for you.
  • Create the vector dates from char_dates. Specify the format so R reads them correctly.
  • Modify dates using format() so that each date looks like "Jan 04, 17".
  • Modify dates using format() so that each date looks like "01,04,2017".
# char_dates
char_dates <- c("1jan17", "2jan17", "3jan17", "4jan17", "5jan17")

# Create dates using as.Date() and the correct format 
dates <- as.Date(char_dates, format = "%d%b%y")

# Use format() to go from "2017-01-04" -> "Jan 04, 17"
format(dates, format = "%b %d, %y")
## [1] "Jan 01, 17" "Jan 02, 17" "Jan 03, 17" "Jan 04, 17" "Jan 05, 17"
# Use format() to go from "2017-01-04" -> "01,04,2017"
format(dates, format = "%m,%d,%Y")
## [1] "01,01,2017" "01,02,2017" "01,03,2017" "01,04,2017" "01,05,2017"

Subtraction of dates

Just like with numerics, arithmetic can be done on dates. In particular, you can find the difference between two dates, in days, by using subtraction:

today <- as.Date("2017-01-02")
tomorrow <- as.Date("2017-01-03")
one_year_away <- as.Date("2018-01-02")

tomorrow - today
Time difference of 1 days

one_year_away - today
Time difference of 365 days

Equivalently, you could use the difftime() function to find the time interval instead.

difftime(tomorrow, today)
Time difference of 1 days

# With some extra options!
difftime(tomorrow, today, units = "secs")
Time difference of 86400 secs
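
A few more variations on date arithmetic, as a small sketch. The names start, end, gap_weeks, gap_days and later are illustrative.

```r
start <- as.Date("2017-01-01")
end   <- as.Date("2017-01-15")

# Ask for the interval in weeks
gap_weeks <- difftime(end, start, units = "weeks")
gap_weeks  # Time difference of 2 weeks

# Strip the units to get a plain number for further calculations
gap_days <- as.numeric(end - start, units = "days")
gap_days   # 14

# Adding a number to a Date moves it forward by that many days
later <- start + 30
later      # "2017-01-31"
```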

Instructions

  • A vector of dates has been created for you.
  • You can use subtraction to confirm that January 1, 1970 is the first date that R counts from. First, create a variable called origin containing "1970-01-01" as a date.
  • Now, use as.numeric() on dates to see how many days from January 1, 1970 it has been.
  • Finally, subtract origin from dates to confirm the results! (Notice how recycling is used here!)
# Dates
dates <- as.Date(c("2017-01-01", "2017-01-02", "2017-01-03"))

# Create the origin
origin <- as.Date("1970-01-01")

# Use as.numeric() on dates
as.numeric(dates)
## [1] 17167 17168 17169
# Find the difference between dates and origin
dates - origin
## Time differences in days
## [1] 17167 17168 17169

months() and weekdays() and quarters(), oh my!

As a final lesson on dates, there are a few functions that are useful for extracting date components. One of those is months().

my_date <- as.Date("2017-01-02")

months(my_date)
[1] "January"

Two other useful functions are weekdays() to extract the day of the week that your date falls on, and quarters() to determine which quarter of the year (Q1-Q4) that your date falls in.
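
Here is a short sketch of both functions on the same date. The weekday name depends on your locale, so its expected value is stated for an English locale only; q and later_quarters are illustrative names.

```r
my_date <- as.Date("2017-01-02")

# Day of the week; "Monday" in an English locale
weekdays(my_date)

# Quarter of the year
q <- quarters(my_date)
q  # "Q1"

# Both functions are vectorized over several dates
later_quarters <- quarters(as.Date(c("2017-04-15", "2017-12-31")))
later_quarters  # "Q2" "Q4"
```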

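Instructions

  • A vector of dates has been created for you. Extract the months() and the quarters() of these dates.
  • A second vector, dates2, has also been created. Assign the weekdays() of dates2 as its names(), then print dates2.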
# dates
dates <- as.Date(c("2017-01-02", "2017-05-03", "2017-08-04", "2017-10-17"))

# Extract the months
months(dates)
## [1] "January" "May"     "August"  "October"
# Extract the quarters
quarters(dates)
## [1] "Q1" "Q2" "Q3" "Q4"
# dates2
dates2 <- as.Date(c("2017-01-02", "2017-01-03", "2017-01-04", "2017-01-05"))

# Assign the weekdays() of dates2 as the names()
names(dates2) <- weekdays(dates2)

# Print dates2
dates2
##       Monday      Tuesday    Wednesday     Thursday 
## "2017-01-02" "2017-01-03" "2017-01-04" "2017-01-05"

If Statements and Operators

Relational practice

In the video, Lore taught you all about different types of relational operators. For reference, here they are again:

> : Greater than
>=: Greater than or equal to
< : Less than
<=: Less than or equal to
==: Equality
!=: Not equal

These relational operators let us make comparisons in our data. If the comparison is true, then the relational operator will return TRUE; otherwise it will return FALSE.

apple <- 45.46
microsoft <- 67.88

apple <= microsoft
[1] TRUE

hello <- "Hello world"

# Case sensitive!
hello == "hello world"
[1] FALSE

Instructions

micr and apple stock prices have been created for you.

  • Is apple larger than micr? Use >.
  • The != operator returns TRUE if two objects are not equal. Use != with apple and micr.

Two dates have been created for you.

  • Is tomorrow less than today?
# Stock prices
apple <- 48.99
micr <- 77.93

# Apple vs Microsoft
apple > micr
## [1] FALSE
# Not equals
apple != micr
## [1] TRUE
# Dates - today and tomorrow
today <- Sys.Date()
tomorrow <- Sys.Date() + 1

# Today vs Tomorrow
tomorrow < today
## [1] FALSE

Vectorized operations

You can extend the concept of relational operators to vectors of any arbitrary length. Compare two vectors using > to get a logical vector back of the same length, holding TRUE when the first is greater than the second, and FALSE otherwise.

apple <- c(120.00, 120.08, 119.97, 121.88)
datacamp  <- c(118.5, 124.21, 125.20, 120.22)

apple > datacamp
[1]  TRUE FALSE FALSE  TRUE

Comparing a vector and a single number works as well. R will recycle the number to be the same length as the vector:

apple > 120
[1] FALSE  TRUE FALSE  TRUE

Imagine how this could be used as a buy/sell signal in stock analysis!
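
As a rough sketch of that idea, a logical vector can be turned into positions, counts, or the matching prices themselves. The names signal, hits, n_hits and hit_prices are illustrative.

```r
apple <- c(120.00, 120.08, 119.97, 121.88)
signal <- apple > 120

# Positions where the condition holds
hits <- which(signal)
hits        # 2 4

# TRUE counts as 1, so sum() gives the number of hits
n_hits <- sum(signal)
n_hits      # 2

# A logical vector can also pull out the matching prices
hit_prices <- apple[signal]
hit_prices  # 120.08 121.88
```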

Instructions

A data.frame, stocks, is in your workspace.

  • Print stocks.
  • You want to buy ibm when it crosses below 175. Use $ to select the ibm column and a relational operator to know when this happens. Add it to stocks as the column, ibm_buy.
  • If panera crosses above 213, sell. Use a relational operator to know when this happens. Add it to stocks as the column, panera_sell.
  • Is ibm ever above panera? Add the result to stocks as the column, ibm_vs_panera.
  • Print stocks.
# Print stocks
stocks

# IBM range
stocks$ibm_buy <- stocks$ibm < 175

# Panera range
stocks$panera_sell <- stocks$panera > 213

# IBM vs Panera
stocks$ibm_vs_panera <- stocks$ibm > stocks$panera

# Print stocks
stocks

Output:

> # Print stocks
> stocks
        date    ibm panera
1 2017-01-20 170.55 216.65
2 2017-01-23 171.03 216.06
3 2017-01-24 175.90 213.55
4 2017-01-25 178.29 212.22
> 
> # IBM range
> stocks$ibm_buy <- stocks$ibm < 175
> 
> # Panera range
> stocks$panera_sell <- stocks$panera > 213
> 
> # IBM vs Panera
> stocks$ibm_vs_panera <- stocks$ibm > stocks$panera
> 
> # Print stocks
> stocks
        date    ibm panera ibm_buy panera_sell ibm_vs_panera
1 2017-01-20 170.55 216.65    TRUE        TRUE         FALSE
2 2017-01-23 171.03 216.06    TRUE        TRUE         FALSE
3 2017-01-24 175.90 213.55   FALSE        TRUE         FALSE
4 2017-01-25 178.29 212.22   FALSE       FALSE         FALSE

And / Or

You might want to check multiple relational conditions at once. What if you wanted to know if Apple stock was above 120, but below 121? Simple relational operators are not enough! For multiple conditions, you need the And operator &, and the Or operator |.

  • & (And): An intersection. a & b is true only if both a and b are true.
  • | (Or): A union. a | b is true if either a or b is true.

apple <- c(120.00, 120.08, 119.97, 121.88)

# Both conditions must hold
(apple > 120) & (apple < 121)
[1] FALSE  TRUE FALSE FALSE

# Only one condition has to hold
(apple <= 120) | (apple > 121)
[1]  TRUE FALSE  TRUE  TRUE
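
The single-character operators used here work element by element. R also has the double forms && and ||, which are meant for single values. The sketch below contrasts the two; the names in_range, first_price and single_check are illustrative.

```r
apple <- c(120.00, 120.08, 119.97, 121.88)

# & compares element by element and returns a vector
in_range <- (apple > 120) & (apple < 121)
apple[in_range]  # 120.08

# && works on single values and stops as soon as the answer is known
first_price <- apple[1]
single_check <- (first_price > 120) && (first_price < 121)
single_check     # FALSE
```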

Instructions

stocks is in your workspace.

  • When is ibm between 171 and 176? Add the logical vector to stocks as ibm_buy_range.
  • Check if panera drops below 213.20 or rises above 216.50, then add it to stocks as the column panera_spike.
  • Suppose you are interested in dates after 2017-01-21 but before 2017-01-25, exclusive. Use as.Date() and & for this. Add the result to stocks as good_dates.
  • Print stocks.
# IBM buy range
stocks$ibm_buy_range <- (stocks$ibm > 171) & (stocks$ibm < 176)

# Panera spikes
stocks$panera_spike <- (stocks$panera < 213.20) | (stocks$panera > 216.50)

# Date range
stocks$good_dates <- (stocks$date > as.Date("2017-01-21")) & (stocks$date < as.Date("2017-01-25"))

# Print stocks
stocks

Not!

One last operator to introduce is !, or Not. You have already seen a similar operator, !=, so you might be able to guess what it does. Add ! in front of a logical expression, and it will flip that expression from TRUE to FALSE (and vice versa).

!TRUE
[1] FALSE

apple <- c(120.00, 120.08, 119.97, 121.88)

!(apple < 121)
[1] FALSE FALSE FALSE  TRUE
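
Negating a combined condition is equivalent to flipping each comparison and swapping & for |. The sketch below checks this on the same prices; outside_a and outside_b are illustrative names.

```r
apple <- c(120.00, 120.08, 119.97, 121.88)

# "Not between 120 and 121" written with !
outside_a <- !((apple > 120) & (apple < 121))

# The same condition written without !, by flipping each comparison
outside_b <- (apple <= 120) | (apple >= 121)

outside_a                        # TRUE FALSE TRUE TRUE
identical(outside_a, outside_b)  # TRUE
```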

Instructions

stocks is in your workspace.

  • Use ! and a relational operator to know when ibm is not above 176.
  • A new vector, missing, has been created, which contains missing data.
  • The function is.na() checks for missing data. Use is.na() on missing.
  • Suppose you are more interested in where you are not missing data. ! can show you this. Use ! in front of is.na() to show positions where you do have data.
# IBM range
!(stocks$ibm > 176)

# Missing data
missing <- c(24.5, 25.7, NA, 28, 28.6, NA)

# Is missing?
is.na(missing)

# Not missing?
!is.na(missing)

Output:

> # IBM range
> !(stocks$ibm > 176)
[1]  TRUE  TRUE  TRUE FALSE
> 
> # Missing data
> missing <- c(24.5, 25.7, NA, 28, 28.6, NA)
> 
> # Is missing?
> is.na(missing)
[1] FALSE FALSE  TRUE FALSE FALSE  TRUE
> 
> # Not missing?
> !is.na(missing)
[1]  TRUE  TRUE FALSE  TRUE  TRUE FALSE

Logicals and subset()

Here's a fun problem. You know how to create logical vectors that tell you when a certain condition is true, but can you subset a data frame to contain only the rows where that condition is true?

If you took Introduction to R for Finance, you might remember the subset() function. subset() takes as arguments a data frame (or vector/matrix) and a logical vector of which rows to return:

stocks
        date    ibm panera
1 2017-01-20 170.55 216.65
2 2017-01-23 171.03 216.06
3 2017-01-24 175.90 213.55
4 2017-01-25 178.29 212.22

subset(stocks, ibm < 175)
        date    ibm panera
1 2017-01-20 170.55 216.65
2 2017-01-23 171.03 216.06

Useful, right?
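
For comparison, the same rows can be selected with bracket indexing. The sketch below rebuilds the example data frame from the values printed above (an assumption, since the original object comes from the course workspace); by_subset and by_bracket are illustrative names.

```r
# Rebuild the example data frame from the printed values
stocks <- data.frame(
  date   = as.Date(c("2017-01-20", "2017-01-23", "2017-01-24", "2017-01-25")),
  ibm    = c(170.55, 171.03, 175.90, 178.29),
  panera = c(216.65, 216.06, 213.55, 212.22)
)

# subset() lets you refer to columns by name
by_subset <- subset(stocks, ibm < 175)

# Bracket indexing needs the full stocks$ibm reference,
# plus a trailing comma to keep all columns
by_bracket <- stocks[stocks$ibm < 175, ]

nrow(by_subset)  # 2
```

The subset() form is shorter to type, which is why this course uses it.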

Instructions

stocks is in your workspace.

  • Subset stocks to include rows where panera is greater than 216.
  • Subset stocks to retrieve the row where date is equal to "2017-01-23". Don't forget as.Date()!
  • Subset stocks to retrieve rows where ibm is less than 175 and panera is less than 216.50.
# Panera range
subset(stocks, panera > 216)

# Specific date
subset(stocks, date == as.Date("2017-01-23"))

# IBM and Panera joint range
subset(stocks, ibm < 175 & panera < 216.50)

Output:

> # Panera range
> subset(stocks, panera > 216)
        date    ibm panera
1 2017-01-20 170.55 216.65
2 2017-01-23 171.03 216.06
> 
> # Specific date
> subset(stocks, date == as.Date("2017-01-23"))
        date    ibm panera
2 2017-01-23 171.03 216.06
> 
> # IBM and Panera joint range
> subset(stocks, ibm < 175 & panera < 216.50)
        date    ibm panera
2 2017-01-23 171.03 216.06

All together now!

Great! You have learned a lot about operators and subsetting. This will serve you well in future data analysis projects. Let's do one last exercise that combines a number of operators together.

Instructions

A new version of stocks is in your workspace.

  • First, print stocks. It contains Apple and Microsoft prices for December, 2016.
  • It seems like you have missing data. Let's investigate further. Use weekdays() on the date column, and assign it to stocks as the column, weekday.
  • View stocks now. The missing data is on weekends! This makes sense, the stock market is not open on weekends.
  • Remove the missing rows using subset(). Use !is.na() on apple as your condition. Assign this new data.frame to stocks_no_NA.
  • Now, you are interested in days where apple was above 117, or when micr was above 63. Use relational operators, |, and subset() to accomplish this with stocks_no_NA.
# View stocks
stocks

# Weekday investigation
stocks$weekday <- weekdays(stocks$date)

# View stocks again
stocks

# Remove missing data
stocks_no_NA <- subset(stocks, !is.na(apple))

# Apple and Microsoft joint range
subset(stocks_no_NA, apple > 117 | micr > 63)

Output:

> # View stocks
> stocks
         date  apple  micr
1  2016-12-01 109.49 59.20
2  2016-12-02 109.90 59.25
3  2016-12-03     NA    NA
4  2016-12-04     NA    NA
5  2016-12-05 109.11 60.22
6  2016-12-06 109.95 59.95
7  2016-12-07 111.03 61.37
8  2016-12-08 112.12 61.01
9  2016-12-09 113.95 61.97
10 2016-12-10     NA    NA
11 2016-12-11     NA    NA
12 2016-12-12 113.30 62.17
13 2016-12-13 115.19 62.98
14 2016-12-14 115.19 62.68
15 2016-12-15 115.82 62.58
16 2016-12-16 115.97 62.30
17 2016-12-17     NA    NA
18 2016-12-18     NA    NA
19 2016-12-19 116.64 63.62
20 2016-12-20 116.95 63.54
21 2016-12-21 117.06 63.54
22 2016-12-22 116.29 63.55
23 2016-12-23 116.52 63.24
24 2016-12-24     NA    NA
25 2016-12-25     NA    NA
26 2016-12-27 117.26 63.28
27 2016-12-28 116.76 62.99
28 2016-12-29 116.73 62.90
29 2016-12-30 115.82 62.14
> 
> # Weekday investigation
> stocks$weekday <- weekdays(stocks$date)
> 
> # View stocks again
> stocks
         date  apple  micr   weekday
1  2016-12-01 109.49 59.20  Thursday
2  2016-12-02 109.90 59.25    Friday
3  2016-12-03     NA    NA  Saturday
4  2016-12-04     NA    NA    Sunday
5  2016-12-05 109.11 60.22    Monday
6  2016-12-06 109.95 59.95   Tuesday
7  2016-12-07 111.03 61.37 Wednesday
8  2016-12-08 112.12 61.01  Thursday
9  2016-12-09 113.95 61.97    Friday
10 2016-12-10     NA    NA  Saturday
11 2016-12-11     NA    NA    Sunday
12 2016-12-12 113.30 62.17    Monday
13 2016-12-13 115.19 62.98   Tuesday
14 2016-12-14 115.19 62.68 Wednesday
15 2016-12-15 115.82 62.58  Thursday
16 2016-12-16 115.97 62.30    Friday
17 2016-12-17     NA    NA  Saturday
18 2016-12-18     NA    NA    Sunday
19 2016-12-19 116.64 63.62    Monday
20 2016-12-20 116.95 63.54   Tuesday
21 2016-12-21 117.06 63.54 Wednesday
22 2016-12-22 116.29 63.55  Thursday
23 2016-12-23 116.52 63.24    Friday
24 2016-12-24     NA    NA  Saturday
25 2016-12-25     NA    NA    Sunday
26 2016-12-27 117.26 63.28   Tuesday
27 2016-12-28 116.76 62.99 Wednesday
28 2016-12-29 116.73 62.90  Thursday
29 2016-12-30 115.82 62.14    Friday
> 
> # Remove missing data
> stocks_no_NA <- subset(stocks, !is.na(apple))
> 
> # Apple and Microsoft joint range
> subset(stocks_no_NA, apple > 117 | micr > 63)
         date  apple  micr   weekday
19 2016-12-19 116.64 63.62    Monday
20 2016-12-20 116.95 63.54   Tuesday
21 2016-12-21 117.06 63.54 Wednesday
22 2016-12-22 116.29 63.55  Thursday
23 2016-12-23 116.52 63.24    Friday
26 2016-12-27 117.26 63.28   Tuesday

If this

If statements are great for adding extra logical flow to your code. First, let's look at the basic structure of an if statement:

if(condition) {
    code
}

The condition is anything that returns a single TRUE or FALSE. If the condition is TRUE, then the code inside gets executed. Otherwise, the code gets skipped and the program continues. Here is an example:

apple <- 54.3

if(apple < 70) {
    print("Apple is less than 70")
}
[1] "Apple is less than 70"

Relational operators are a common way to create the condition in the if statement!
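
Because the condition must be a single TRUE or FALSE, a comparison on a whole vector cannot be used directly; R 4.2.0 and later raise an error in that case. A minimal sketch of one way around this is to collapse the vector with any() or all(). The vector prices and the name all_below are illustrative.

```r
prices <- c(54.3, 71.2, 68.9)

# any(): TRUE if at least one element meets the condition
if (any(prices < 70)) {
    print("At least one price is less than 70")
}

# all(): TRUE only if every element meets the condition
all_below <- all(prices < 70)
all_below  # FALSE
```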

Instructions

The variable micr has been created for you.

  • Fill in the if statement that first tests if micr is less than 55, and if it is, then prints "Buy!".
# micr
micr <- 48.55

# Fill in the blanks
if( micr < 55 ) {
    print("Buy!")
}
## [1] "Buy!"

If this, Else that

An extension of the if statement is to perform a different action if the condition is false. You can do this by adding else after your if statement:

if(condition) {
    code if true
} else {
    code if false 
}

Instructions

Extend the last exercise by adding an else statement that prints "Do nothing!".

# micr
micr <- 57.44

# Fill in the blanks
if( micr < 55 ) {
    print("Buy!")
} else {
    print("Do nothing!")
}
## [1] "Do nothing!"

If this, Else If that, Else that other thing

To add even more logic, you can follow the pattern of if, else if, else. You can add as many else if's as you need for your control logic.

if(condition1) {
    code if condition1 is true
} else if(condition2) {
    code if condition2 is true
} else {
    code if both are false
}
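
One way to reuse such a chain is to wrap it in a function. The sketch below uses a hypothetical helper, price_signal(), with illustrative thresholds of 55 and 75. Note that the second branch is reached only when the first condition was false, so it does not need to test price >= 55 again.

```r
# Hypothetical helper that returns a trading signal for one price
price_signal <- function(price) {
    if (price < 55) {
        "Buy!"
    } else if (price < 75) {
        # Reached only when price >= 55
        "Do nothing!"
    } else {
        "Sell!"
    }
}

price_signal(48.55)   # "Buy!"
price_signal(57.44)   # "Do nothing!"
price_signal(105.67)  # "Sell!"
```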

Instructions

Extend the last example by filling in the blanks to complete the following logic:

  • if micr is less than 55, print "Buy!"
  • else if micr is greater than or equal to 55 and micr is less than 75, print "Do nothing!"
  • else print "Sell!"

# micr
micr <- 105.67

# Fill in the blanks
if( micr < 55 ) {
    print("Buy!")
} else if( micr >= 55 & micr < 75 ){
    print("Do nothing!")
} else { 
    print("Sell!")
}
## [1] "Sell!"

Can you If inside an If?

Sometimes it makes sense to have nested if statements to add even more control. In the following exercise, you will add an if statement that checks if you are holding a share of the Microsoft stock before you attempt to sell it.

Here is the structure of nested if statements; it should look somewhat familiar:

if(condition1) {        
    if(condition2) {     
        code if both pass
    } else {            
        code if 1 passes, 2 fails
    }
} else {            
    code if 1 fails
}

Instructions

The variables micr and shares have been created for you.

  • Fill in the nested if statement to check if shares is greater than or equal to 1 before you decide to sell.
  • If this is true, then print "Sell!".
  • Else, print "Not enough shares to sell!".
# micr
micr <- 105.67
shares <- 1

# Fill in the blanks
if( micr < 55 ) {
    print("Buy!")
} else if( micr >= 55 & micr < 75 ) {
    print("Do nothing!")
} else { 
    if( shares >= 1 ) {
        print("Sell!")
    } else {
        print("Not enough shares to sell!")
    }
}
## [1] "Sell!"

ifelse()

A powerful function to know about is ifelse(). It creates an if statement in 1 line of code, and more than that, it works on entire vectors!

Suppose you have a vector of stock prices. What if you want to return "Buy!" each time apple > 110, and "Do nothing!", otherwise? A simple if statement would not be enough to solve this problem. However, with ifelse() you can do:

apple
[1] 109.49 109.90 109.11 109.95 111.03 112.12

ifelse(test = apple > 110, yes = "Buy!", no = "Do nothing!")
[1] "Do nothing!" "Do nothing!" "Do nothing!" "Do nothing!" "Buy!"       
[6] "Buy!"

ifelse() evaluates the test to get a logical vector, and where the logical vector is TRUE it replaces TRUE with whatever is in yes. Similarly, FALSE is replaced by no.
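
Calls to ifelse() can also be nested to handle more than two outcomes. The sketch below is one possible way to do this; the thresholds 111 and 109.5 and the name signal are illustrative, not part of the exercise.

```r
apple <- c(109.49, 109.90, 109.11, 109.95, 111.03, 112.12)

# The inner ifelse() is only consulted where the outer test is FALSE
signal <- ifelse(apple > 111, "Sell!",
                 ifelse(apple < 109.5, "Buy!", "Do nothing!"))
signal
# "Buy!" "Do nothing!" "Buy!" "Do nothing!" "Sell!" "Sell!"
```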

Instructions

stocks is in your workspace.

  • Use ifelse() to test if micr is above 60 but below 62. When true, return a 1 and when false return a 0. Add the result to stocks as the column, micr_buy.
  • Use ifelse() to test if apple is greater than 117. The returned value should be the date column if TRUE, and NA otherwise.
  • Print stocks. date became a numeric! ifelse() strips the date of its class attribute before returning it, so it becomes a numeric.
  • Assign the apple_date column the class() of "Date".
  • Print stocks again.
# Microsoft test
stocks$micr_buy <- ifelse(test = stocks$micr > 60 & stocks$micr < 62, yes = 1, no = 0)

# Apple test
stocks$apple_date <- ifelse(test = stocks$apple > 117, yes = stocks$date, no = NA)

# Print stocks
stocks

# Change the class() of apple_date.
class(stocks$apple_date) <- "Date"

# Print stocks again
stocks

Output:

> # Microsoft test
> stocks$micr_buy <- ifelse(test = stocks$micr > 60 & stocks$micr < 62, yes = 1, no = 0)
> 
> # Apple test
> stocks$apple_date <- ifelse(test = stocks$apple > 117, yes = stocks$date, no = NA)
> 
> # Print stocks
> stocks
         date  apple  micr micr_buy apple_date
1  2016-12-01 109.49 59.20        0         NA
2  2016-12-02 109.90 59.25        0         NA
5  2016-12-05 109.11 60.22        1         NA
6  2016-12-06 109.95 59.95        0         NA
7  2016-12-07 111.03 61.37        1         NA
8  2016-12-08 112.12 61.01        1         NA
9  2016-12-09 113.95 61.97        1         NA
12 2016-12-12 113.30 62.17        0         NA
13 2016-12-13 115.19 62.98        0         NA
14 2016-12-14 115.19 62.68        0         NA
15 2016-12-15 115.82 62.58        0         NA
16 2016-12-16 115.97 62.30        0         NA
19 2016-12-19 116.64 63.62        0         NA
20 2016-12-20 116.95 63.54        0         NA
21 2016-12-21 117.06 63.54        0      17156
22 2016-12-22 116.29 63.55        0         NA
23 2016-12-23 116.52 63.24        0         NA
26 2016-12-27 117.26 63.28        0      17162
27 2016-12-28 116.76 62.99        0         NA
28 2016-12-29 116.73 62.90        0         NA
29 2016-12-30 115.82 62.14        0         NA
> 
> # Change the class() of apple_date.
> class(stocks$apple_date) <- "Date"
> 
> # Print stocks again
> stocks
         date  apple  micr micr_buy apple_date
1  2016-12-01 109.49 59.20        0       <NA>
2  2016-12-02 109.90 59.25        0       <NA>
5  2016-12-05 109.11 60.22        1       <NA>
6  2016-12-06 109.95 59.95        0       <NA>
7  2016-12-07 111.03 61.37        1       <NA>
8  2016-12-08 112.12 61.01        1       <NA>
9  2016-12-09 113.95 61.97        1       <NA>
12 2016-12-12 113.30 62.17        0       <NA>
13 2016-12-13 115.19 62.98        0       <NA>
14 2016-12-14 115.19 62.68        0       <NA>
15 2016-12-15 115.82 62.58        0       <NA>
16 2016-12-16 115.97 62.30        0       <NA>
19 2016-12-19 116.64 63.62        0       <NA>
20 2016-12-20 116.95 63.54        0       <NA>
21 2016-12-21 117.06 63.54        0 2016-12-21
22 2016-12-22 116.29 63.55        0       <NA>
23 2016-12-23 116.52 63.24        0       <NA>
26 2016-12-27 117.26 63.28        0 2016-12-27
27 2016-12-28 116.76 62.99        0       <NA>
28 2016-12-29 116.73 62.90        0       <NA>
29 2016-12-30 115.82 62.14        0       <NA>

Loops

Repeat, repeat, repeat

Loops are a core concept in programming. They are used in almost every language. In R, there is another way of performing repeated actions using apply functions, but we will save those until chapter 5. For now, let's look at the repeat loop!

This is the simplest loop. You use repeat, and inside the curly braces perform some action. You must specify when you want to break out of the loop. Otherwise it runs for eternity!

repeat {
    code
    if(condition) {
        break
    }
}

Do not do the following. This is an infinite loop! In words, you are telling R to repeat your code for eternity.

repeat {
    code
}
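
A common safeguard against accidental infinite loops is to add an iteration cap next to the real stopping condition. The sketch below assumes a fixed 0.5% drop per step so that the result is reproducible; max_iterations and iterations are hypothetical names.

```r
stock_price <- 126.34
iterations <- 0
max_iterations <- 100  # hypothetical safety cap

repeat {
    # Deterministic 0.5% drop per step, used here instead of random moves
    stock_price <- stock_price * 0.995
    iterations <- iterations + 1

    # Stop on the real condition, or when the cap is reached
    if (stock_price < 125 || iterations >= max_iterations) {
        break
    }
}

iterations  # 3
```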

Instructions

  • A repeat loop has been created for you. Run the script and see what happens.
  • Change the condition in the if statement to break when stock_price is below 125.
  • Update the stock price value in the print statement to be consistent with the change.
  • Rerun the script.
# Stock price
stock_price <- 126.34

repeat {
  # New stock price
  stock_price <- stock_price * runif(1, .985, 1.01)
  print(stock_price)
  
  # Check
  if(stock_price < 125) {
    print("Stock price is below 125! Buy it while it's cheap!")
    break
  }
}
## [1] 126.9496
## [1] 125.8381
## [1] 126.5319
## [1] 126.1155
## [1] 126.228
## [1] 126.993
## [1] 128.0152
## [1] 127.1447
## [1] 125.4034
## [1] 125.2684
## [1] 123.7274
## [1] "Stock price is below 125! Buy it while it's cheap!"

When to break?

The order in which you execute your code inside the loop and check when you should break is important. The following would run the code a different number of times.

# Code, then check condition
repeat {
    code
    if(condition) {
        break
    }
}

# Check condition, then code
repeat {
    if(condition) {
        break
    }
    code
}
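
To make the difference measurable, the sketch below counts how often a statement would run in each position instead of printing. It uses a fixed multiplier of 0.995 and a threshold of 66; the helper count_prints() and its argument are hypothetical.

```r
# Count how many times a print placed before or after the check would run
count_prints <- function(print_before_check) {
    stock_price <- 67.55
    n_printed <- 0
    repeat {
        stock_price <- stock_price * 0.995
        if (print_before_check) n_printed <- n_printed + 1
        if (stock_price < 66) {
            break
        }
        if (!print_before_check) n_printed <- n_printed + 1
    }
    n_printed
}

count_prints(TRUE)   # 5
count_prints(FALSE)  # 4
```

Placing the statement before the check lets it run once more, on the iteration that triggers the break.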

Let's see this in an extension of the previous exercise. For the purposes of this example, the runif() function has been replaced with a static multiplier to remove randomness.

Instructions

  • The structure of a repeat loop has been created. Fill in the blanks so that the loop checks if the stock_price is below 66, and breaks if so. Run this, and note the number of times that the stock price was printed.
  • Move the statement print(stock_price) to after the if statement, but still inside the repeat loop. Run the script again, how many times was the stock_price printed now?
# Stock price
stock_price <- 67.55

repeat {
  # New stock price
  stock_price <- stock_price * .995
  
 
  # Check
  if(stock_price < 66) {
    print("Stock price is below 66! Buy it while it's cheap!")
    break
  }
  print(stock_price)
}
## [1] 67.21225
## [1] 66.87619
## [1] 66.54181
## [1] 66.2091
## [1] "Stock price is below 66! Buy it while it's cheap!"

While with a print

While loops are slightly different from repeat loops. Like if statements, you specify the condition for them to run at the very beginning. There is no need for a break statement because the condition is checked at each iteration.

while (condition) {
    code
}
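
The sketch below shows two consequences of checking the condition first: the body of a while loop may never run, and an equivalent repeat loop has to test the opposite condition before doing any work. The counters payments_while and payments_repeat are illustrative.

```r
# With a while loop the body may never run at all
debt <- 0
payments_while <- 0
while (debt > 0) {
    debt <- debt - 500
    payments_while <- payments_while + 1
}
payments_while   # 0

# The equivalent repeat loop tests the opposite condition up front
debt <- 1500
payments_repeat <- 0
repeat {
    if (!(debt > 0)) {
        break
    }
    debt <- debt - 500
    payments_repeat <- payments_repeat + 1
}
payments_repeat  # 3
```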

It might seem like the while loop is doing the exact same thing as the repeat loop, just with less code. In our cases, this is true. So, why ever use the repeat loop? Occasionally, there are cases when using a repeat loop to run forever is desired; this is sometimes called intentional looping.

For the exercise, imagine that you have a debt of $5000 that you need to pay back. Each month, you pay off $500, until you've paid everything off. You will use a loop to model the process of paying off the debt each month, where each iteration you decrease your total debt and print out the new total!

Instructions

  • The variable debt has been created for you.
  • Fill in the while loop condition to check if debt is greater than 0. If this is true, decrease debt by 500.
# Initial debt
debt <- 5000

# While loop to pay off your debt
while (debt > 0) {
  debt <- debt - 500
  print(paste("Debt remaining", debt))
}
## [1] "Debt remaining 4500"
## [1] "Debt remaining 4000"
## [1] "Debt remaining 3500"
## [1] "Debt remaining 3000"
## [1] "Debt remaining 2500"
## [1] "Debt remaining 2000"
## [1] "Debt remaining 1500"
## [1] "Debt remaining 1000"
## [1] "Debt remaining 500"
## [1] "Debt remaining 0"

While with a plot

Loops can be used for all kinds of fun examples! What if you wanted to visualize your debt decreasing over time? Like the last exercise, this one uses a loop to model paying it off, $500 at a time. However, at each iteration you will also append your remaining debt total to a plot, so that you can visualize the total decreasing as you go.

This exercise has already been done for you. Let's talk about what is happening here.

First, initialize some variables:

  • debt = Your current debt
  • i = Incremented each time debt is reduced. The next point on the x axis.
  • x_axis = A vector of i's. The x axis for the plots.
  • y_axis = A vector of debt. The y axis for the plots.
  • Also, create the first plot. Just a single point of your current debt.

Then, create a while loop. As long as you still have debt:

  • debt is reduced by 500.
  • i is incremented.
  • x_axis is extended by 1 more point.
  • y_axis is extended by the next debt point.
  • The next plot is created from the updated data.

After you run the code, you can use Previous Plot to go back and view all 11 of the created plots!

debt <- 5000    # initial debt
i <- 0          # x axis counter
x_axis <- i     # x axis
y_axis <- debt  # y axis

# Initial plot
plot(x_axis, y_axis, xlim = c(0,10), ylim = c(0,5000))

# Graph your debt
while (debt > 0) {

  # Updating variables
  debt <- debt - 500
  i <- i + 1
  x_axis <- c(x_axis, i)
  y_axis <- c(y_axis, debt)
  
  # Next plot
  plot(x_axis, y_axis, xlim = c(0,10), ylim = c(0,5000))
}
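If you only care about the final picture, one possible variation is to collect the points in the loop and leave the plotting for afterwards. This sketch also shows where the 11 comes from: one starting point plus ten payments.

```r
debt <- 5000
y_axis <- debt          # start with the initial debt

while (debt > 0) {
  debt <- debt - 500
  y_axis <- c(y_axis, debt)   # record the debt after each payment
}

# One x value per recorded debt, starting at 0
x_axis <- seq_along(y_axis) - 1

length(y_axis)   # 11 points: the initial debt plus 10 payments
```

Calling plot(x_axis, y_axis) once at this point would draw the same points as the last plot of the loop version.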

Break it

Sometimes, you have to end your while loop early. With the debt example, if you don't have enough cash to pay off all of your debt, you won't be able to continue paying it down. In this exercise, you will add an if statement and a break to let you know if you run out of money!

while (condition) {
    code
    if (breaking_condition) {
        break
    }
}

The while loop will completely stop, and all lines after it will be run, if the breaking_condition is met. In this case, that condition will be running out of cash!

Instructions

debt and cash have been defined for you.

  • First, fill in the while loop, but don't touch the commented if statement. It should decrement cash and debt by 500 each time. Run this. What happens to cash when you reach 0 debt?
  • Negative cash? That's not good! Remove the comments and fill in the if statement. It should break if you run out of cash. Specifically, if cash equals 0. Run the entire program again.
# debt and cash
debt <- 5000
cash <- 4000

# Pay off your debt...if you can!
while (debt > 0) {
  debt <- debt - 500
  cash <- cash - 500
  print(paste("Debt remaining:", debt, "and Cash remaining:", cash))

  if (cash == 0) {
    print("You ran out of cash!")
    break
  }
}
## [1] "Debt remaining: 4500 and Cash remaining: 3500"
## [1] "Debt remaining: 4000 and Cash remaining: 3000"
## [1] "Debt remaining: 3500 and Cash remaining: 2500"
## [1] "Debt remaining: 3000 and Cash remaining: 2000"
## [1] "Debt remaining: 2500 and Cash remaining: 1500"
## [1] "Debt remaining: 2000 and Cash remaining: 1000"
## [1] "Debt remaining: 1500 and Cash remaining: 500"
## [1] "Debt remaining: 1000 and Cash remaining: 0"
## [1] "You ran out of cash!"
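The check cash == 0 works here because cash is an exact multiple of the payment. As a hedged sketch of a more defensive variant, the hypothetical function pay_down() below (not part of the exercise) tests whether another payment is affordable before making it, so cash cannot go negative even when the amounts do not divide evenly.

```r
# Hypothetical helper: pay `payment` per step until the debt is gone
# or the remaining cash cannot cover another payment
pay_down <- function(debt, cash, payment = 500) {
  while (debt > 0) {
    if (cash < payment) {
      break   # stop before overspending
    }
    debt <- debt - payment
    cash <- cash - payment
  }
  c(debt = debt, cash = cash)
}

pay_down(5000, 4000)   # same situation as the exercise
pay_down(5000, 4200)   # cash is not a multiple of 500
```

With 4200 in cash the loop stops with 200 left over and 1000 of debt remaining, whereas a test for cash == 0 would never trigger in that case.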

Loop over a vector

Last, but not least, in our discussion of loops is the for loop. When you know how many times you want to repeat an action, a for loop is a good option. The idea of the for loop is that you are stepping through a sequence, one at a time, and performing an action at each step along the way. That sequence is commonly a vector of numbers (such as the sequence from 1:10), but could also be numbers that are not in any order like c(2, 5, 4, 6), or even a sequence of characters!

for (value in sequence) {
    code
}

In words this is saying, "for each value in my sequence, run this code." Examples could be, "for each row of my data frame, print column 1", or "for each word in my sentence, check if that word is DataCamp."
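As a small illustration of the "for each word" idea, the sketch below loops over a character vector and records whether any element equals "DataCamp". The vector sentence and the flag found are names invented for this example.

```r
# Hypothetical sentence, split into words by hand
sentence <- c("learning", "R", "with", "DataCamp")

found <- FALSE
for (word in sentence) {
  # At each step, `word` holds the next element of `sentence`
  if (word == "DataCamp") {
    found <- TRUE
  }
}

found
```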

Let's try an example! First, you will create a loop that prints out the values in a sequence from 1 to 10. Then, you will modify that loop to also sum the values from 1 to 10, where at each iteration the next value in the sequence is added to the running sum.

Instructions

  • A vector seq has been created for you.
  • Fill in the for loop, using seq as your sequence. Print out value during each iteration.
  • A variable sum has been created for you.
  • Use the loop to sum the numbers in seq. Each iteration, value should be added to sum, then sum is printed out.
# Sequence
seq <- c(1:10)

# Print loop
for (value in seq) {
    print(value)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10
# A sum variable
sum <- 0

# Sum loop
for (value in seq) {
    sum <- sum + value
    print(sum)
}
## [1] 1
## [1] 3
## [1] 6
## [1] 10
## [1] 15
## [1] 21
## [1] 28
## [1] 36
## [1] 45
## [1] 55
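The exercise names its variables seq and sum, which are also the names of built-in R functions. R still finds the functions when you call seq() or sum(), but different names avoid confusion. Here is a minimal sketch with the hypothetical names values and running_total, checked against the built-in sum() and cumsum():

```r
values <- 1:10
running_total <- 0

for (value in values) {
  running_total <- running_total + value
}

running_total    # total after the last iteration
sum(values)      # built-in equivalent of the final total
cumsum(values)   # built-in equivalent of the printed running sums
```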

Loop over data frame rows

Imagine that you are interested in the days where the stock price of Apple rises above 117. If it goes above this value, you want to print out the current date and stock price. If you have a stock data frame with a date and apple price column, could you loop over the rows of the data frame to accomplish this? You certainly could!

Before you do so, note that you can get the number of rows in your data frame using nrow(stock). Then, you can create a sequence to loop over from 1:nrow(stock).

for (row in 1:nrow(stock)) {
    price <- stock[row, "apple"]
    date  <- stock[row, "date"]

    if(price > 117) {
        print(paste("On", date, 
                    "the stock price was", price))
    }
}
[1] "On 2016-12-21 the stock price was 117.06"
[1] "On 2016-12-27 the stock price was 117.26"

This incorporates a number of things we have learned so far: if statements, subsetting vectors, conditionals, and loops! Congratulations on learning so much!
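The stock data frame only exists in the course workspace, so here is a self-contained sketch of the same loop. The three rows are copied from the exercise output further down, the hits vector is an addition for inspecting the result, and seq_len(nrow(stock)) is swapped in for 1:nrow(stock) as a slightly safer way to build the sequence.

```r
# Hypothetical three-row stand-in for the course's stock data frame
stock <- data.frame(
  date  = as.Date(c("2016-12-20", "2016-12-21", "2016-12-22")),
  apple = c(116.95, 117.06, 116.29)
)

hits <- character(0)  # collects the messages so they can be inspected later

# seq_len(nrow(stock)) behaves like 1:nrow(stock), but gives an empty
# sequence (instead of c(1, 0)) when the data frame has no rows
for (row in seq_len(nrow(stock))) {
  price <- stock[row, "apple"]
  date  <- stock[row, "date"]

  if (price > 117) {
    hits <- c(hits, paste("On", date, "the stock price was", price))
  }
}

print(hits)
```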

Instructions

stock is in your workspace.

  • Fill in the blanks in the for loop to make the following true:
  • price should hold that iteration's price
  • date should hold that iteration's date
  • This time, you want to know if apple goes above 116.
  • If it does, print the date and price.
  • Otherwise (116 or below), print out the date and print that it was not an important day!
# Loop over stock rows
for (row in 1:nrow(stock)) {
    price <- stock[row, "apple"]
    date  <- stock[row, "date"]

    if(price > 116) {
        print(paste("On", date, 
                    "the stock price was", price))
    } else {
        print(paste("The date:", date, 
                    "is not an important day!"))
    }
}

Output:

> # Loop over stock rows
> for (row in 1:nrow(stock)) {
      price <- stock[row, "apple"]
      date  <- stock[row, "date"]
  
      if(price > 116) {
          print(paste("On", date, 
                      "the stock price was", price))
      } else {
          print(paste("The date:", date, 
                      "is not an important day!"))
      }
  }
[1] "The date: 2016-12-01 is not an important day!"
[1] "The date: 2016-12-02 is not an important day!"
[1] "The date: 2016-12-05 is not an important day!"
[1] "The date: 2016-12-06 is not an important day!"
[1] "The date: 2016-12-07 is not an important day!"
[1] "The date: 2016-12-08 is not an important day!"
[1] "The date: 2016-12-09 is not an important day!"
[1] "The date: 2016-12-12 is not an important day!"
[1] "The date: 2016-12-13 is not an important day!"
[1] "The date: 2016-12-14 is not an important day!"
[1] "The date: 2016-12-15 is not an important day!"
[1] "The date: 2016-12-16 is not an important day!"
[1] "On 2016-12-19 the stock price was 116.64"
[1] "On 2016-12-20 the stock price was 116.95"
[1] "On 2016-12-21 the stock price was 117.06"
[1] "On 2016-12-22 the stock price was 116.29"
[1] "On 2016-12-23 the stock price was 116.52"
[1] "On 2016-12-27 the stock price was 117.26"
[1] "On 2016-12-28 the stock price was 116.76"
[1] "On 2016-12-29 the stock price was 116.73"
[1] "The date: 2016-12-30 is not an important day!"

Loop over matrix elements

So far, you have been looping over one-dimensional data types. If you want to loop over the elements of a matrix (rows and columns), you will have to use nested loops. You will use this idea to print out the correlations between three stocks.

The easiest way to think about this is that you are going to start on row1, and move to the right, hitting col1, col2, ..., up until the last column in row1. Then, you move down to row2 and repeat the process.

my_matrix
     [,1]   [,2]  
[1,] "r1c1" "r1c2"
[2,] "r2c1" "r2c2"

# Loop over my_matrix
for(row in 1:nrow(my_matrix)) {
    for(col in 1:ncol(my_matrix)) {
        print(my_matrix[row, col])
    }
}
[1] "r1c1"
[1] "r1c2"
[1] "r2c1"
[1] "r2c2"
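To try this outside the course workspace, my_matrix can be rebuilt by hand. The sketch below does so (the visited vector is an addition for checking the order) and confirms that the nested loop walks the matrix row by row.

```r
# Rebuild the 2 x 2 example matrix; byrow = TRUE fills it row by row
my_matrix <- matrix(c("r1c1", "r1c2",
                      "r2c1", "r2c2"),
                    nrow = 2, byrow = TRUE)

visited <- character(0)  # records the order in which cells are reached

for (row in seq_len(nrow(my_matrix))) {      # outer loop: rows
  for (col in seq_len(ncol(my_matrix))) {    # inner loop: columns
    visited <- c(visited, my_matrix[row, col])
  }
}

visited
```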

Instructions

The correlation matrix, corr, is in your workspace.

  • Print corr to get a peek at the data.
  • Fill in the nested for loop! It should satisfy the following:
  • The outer loop should be over the rows of corr.
  • The inner loop should be over the cols of corr.
  • The print statement should print the names of the current column and row, and also print their correlation.
# Print out corr
corr

# Create a nested loop
for(row in 1:nrow(corr)) {
    for(col in 1:ncol(corr)) {
        print(paste(colnames(corr)[col], "and", rownames(corr)[row],
                    "have a correlation of", corr[row,col]))
    }
}

Output:

> # Print out corr
> corr
      apple  ibm micr
apple  1.00 0.96 0.88
ibm    0.96 1.00 0.74
micr   0.88 0.74 1.00
> 
> # Create a nested loop
> for(row in 1:nrow(corr)) {
      for(col in 1:ncol(corr)) {
          print(paste(colnames(corr)[col], "and", rownames(corr)[row],
                      "have a correlation of", corr[row,col]))
      }
  }
[1] "apple and apple have a correlation of 1"
[1] "ibm and apple have a correlation of 0.96"
[1] "micr and apple have a correlation of 0.88"
[1] "apple and ibm have a correlation of 0.96"
[1] "ibm and ibm have a correlation of 1"
[1] "micr and ibm have a correlation of 0.74"
[1] "apple and micr have a correlation of 0.88"
[1] "ibm and micr have a correlation of 0.74"
[1] "micr and micr have a correlation of 1"
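Because a correlation matrix is symmetric, the loop above reports every pair twice and also compares each stock with itself. One possible refinement, sketched here with corr rebuilt by hand from the printed values, is to report only the cells above the diagonal. The pairs vector is an addition for collecting the messages.

```r
# corr rebuilt from the printed output above
tickers <- c("apple", "ibm", "micr")
corr <- matrix(c(1.00, 0.96, 0.88,
                 0.96, 1.00, 0.74,
                 0.88, 0.74, 1.00),
               nrow = 3, byrow = TRUE,
               dimnames = list(tickers, tickers))

pairs <- character(0)

for (row in seq_len(nrow(corr))) {
  for (col in seq_len(ncol(corr))) {
    # Keep only cells above the diagonal: each pair once, no self-pairs
    if (col > row) {
      pairs <- c(pairs,
                 paste(colnames(corr)[col], "and", rownames(corr)[row],
                       "have a correlation of", corr[row, col]))
    }
  }
}

print(pairs)
```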

Break and next

To finish your lesson on loops, let's return to the concept of break, and the related concept of next. Just like with repeat and while loops, you can break out of a for loop completely by using the break statement. Additionally, if you just want to skip the current iteration, and continue the loop, you can use the next statement. This can be useful if your loop encounters an error, but you don't want it to break everything.

for (value in sequence) {
    if(next_condition) {
        next
    }
    code
    if(breaking_condition) {
        break
    }
}

You don't have to use both break and next at the same time; this simply shows the general structure of using them.

The point of using next at the beginning, before the code runs, is to check for a problem before it happens.
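Here is a minimal sketch of that structure on a short, made-up price vector. The counter checked is an addition that shows how many values were actually examined before the loop ended.

```r
# Hypothetical prices with one missing value
prices <- c(109.49, NA, 116.95, 117.06, 116.29)

checked <- 0
for (value in prices) {
  if (is.na(value)) {
    next            # skip the missing value, keep looping
  }

  checked <- checked + 1

  if (value > 117) {
    break           # stop the loop entirely
  }
}

checked
```

The result is 3: the NA is skipped by next, and the loop ends at 117.06, so the final 116.29 is never reached.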

Instructions

The apple vector is in your workspace.

  • Print out apple. You have some missing values!
  • Fill in the blanks in the loop to do the following:
  • Check if value is NA. If so, go to the next iteration.
  • Check if value is above 117. If so, break and sell!
  • Else print "Nothing to do here!"
# Print apple
apple

# Loop through apple. Next if NA. Break if above 117.
for (value in apple) {
    if(is.na(value)) {
        print("Skipping NA")
        next
    }
    
    if(value > 117) {
        print("Time to sell!")
        break
    } else {
        print("Nothing to do here!")
    }
}

Output:

> # Print apple
> apple
 [1] 109.49 109.90     NA     NA 109.11 109.95 111.03 112.12 113.95     NA
[11]     NA 113.30 115.19 115.19 115.82 115.97     NA     NA 116.64 116.95
[21] 117.06 116.29 116.52     NA     NA 117.26 116.76 116.73 115.82
> 
> # Loop through apple. Next if NA. Break if above 117.
> for (value in apple) {
      if(is.na(value)) {
          print("Skipping NA")
          next
      }
      
      if(value > 117) {
          print("Time to sell!")
          break
      } else {
          print("Nothing to do here!")
      }
  }
[1] "Nothing to do here!"
[1] "Nothing to do here!"
[1] "Skipping NA"
[1] "Skipping NA"
[1] "Nothing to do here!"
[1] "Nothing to do here!"
[1] "Nothing to do here!"
[1] "Nothing to do here!"
[1] "Nothing to do here!"
[1] "Skipping NA"
[1] "Skipping NA"
[1] "Nothing to do here!"
[1] "Nothing to do here!"
[1] "Nothing to do here!"
[1] "Nothing to do here!"
[1] "Nothing to do here!"
[1] "Skipping NA"
[1] "Skipping NA"
[1] "Nothing to do here!"
[1] "Nothing to do here!"
[1] "Time to sell!"

Functions

Function help and documentation

When you don't know how to use a function, or don't know what arguments it takes, where do you turn? Luckily for you, R has built-in documentation. For example, to get help for the names() function, you can type one of:

?names

?names()

help(names)

These all do the same thing; they take you straight to the help page for names()!

In the DataCamp console, this takes you to the RDocumentation site to get help from there, but the information is all the same!

Below, you will explore the documentation for a few other functions.

Instructions

  • Use ? to look at the documentation for subset().
  • Use ? to look at the documentation for Sys.time().
# subset help
?subset
## starting httpd help server ... done
# Sys.time help
?Sys.time

Optional arguments

Let's look at some of the round() function's help documentation. It simply rounds a numeric vector off to a specified number of decimal places.

round(x, digits = 0)

The first argument, x is required. Without it, the function will not work!

The argument digits is known as an optional argument. Optional arguments are ones that don't have to be set by the user, either because they are given a default value, or because the function can infer them from the other data you have given it. Even though they don't have to be set, they often provide extra flexibility. Here, digits specifies the number of decimal places to round to.
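As a quick sketch of how the optional argument behaves, the three calls below leave digits at its default, set it by name, and set it by position.

```r
round(5.456)              # digits falls back to its default of 0
round(5.456, digits = 1)  # optional argument set by name
round(5.456, 1)           # the same call, with digits matched by position
```

The first call returns 5, and the other two both return 5.5.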

Explore the round() function in the exercise!

Instructions

  • Use round() on 5.4.
  • Use round() on 5.4, specify digits = 1.
  • A vector numbers has been created for you.
  • Use round() on numbers and specify digits = 3.
# Round 5.4
round(5.4)
## [1] 5
# Round 5.4 with 1 decimal place
round(5.4, digits = 1)
## [1] 5.4
# numbers
numbers <- c(.002623, pi, 812.33345)

# Round numbers to 3 decimal places
round(numbers, digits = 3)
## [1]   0.003   3.142 812.333

Functions in functions

To write clean code, sometimes it is useful to use functions inside of other functions. This lets you use the result of one function directly in another one, without having to create an intermediate variable. You have actually already seen an example of this with print() and paste().

company <- c("Goldman Sachs", "J.P. Morgan", "Fidelity Investments")

for(i in 1:3) {
    print(paste("A large financial institution is", company[i]))
}
[1] "A large financial institution is Goldman Sachs"
[1] "A large financial institution is J.P. Morgan"
[1] "A large financial institution is Fidelity Investments"

paste() strings together the character vectors, and print() prints it to the console.
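For comparison, this sketch writes the same step twice: once with an intermediate variable (msg, a name chosen for this illustration) and once with paste() nested inside print().

```r
company <- c("Goldman Sachs", "J.P. Morgan", "Fidelity Investments")

# With an intermediate variable
msg <- paste("A large financial institution is", company[1])
print(msg)

# Nested: the result of paste() goes straight into print()
print(paste("A large financial institution is", company[1]))
```

Both print the same line. The nested form saves a variable, while the two-step form can be easier to debug because you can inspect msg.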

The exercise below explores simplifying the calculation of the correlation matrix using nested functions.

Instructions

3 vectors of stock prices are in your workspace.

  • First, cbind() them together in the order of apple, ibm, micr. Save this as stocks.
  • Then, use cor() on stocks.
  • Now, let's see how this would work all at once. Use cbind() inside of cor() with the 3 stock vectors in the same order as above to create the correlation matrix.
# cbind() the stocks
stocks <- cbind(apple, ibm, micr)

# cor() to create the correlation matrix
cor(stocks)

# All at once! Nest cbind() inside of cor()
cor(cbind(apple, ibm, micr))

Output:

> # cbind() the stocks
> stocks <- cbind(apple, ibm, micr)
> 
> # cor() to create the correlation matrix
> cor(stocks)
          apple       ibm      micr
apple 1.0000000 0.8872467 0.9477010
ibm   0.8872467 1.0000000 0.9126597
micr  0.9477010 0.9126597 1.0000000
> 
> # All at once! Nest cbind() inside of cor()
> cor(cbind(apple, ibm, micr))
          apple       ibm      micr
apple 1.0000000 0.8872467 0.9477010
ibm   0.8872467 1.0000000 0.9126597
micr  0.9477010 0.9126597 1.0000000

Your first function

Time for your first function! This is a big step in an R programmer's journey. "Functions are a fundamental building block of R: to master many of the more advanced techniques ... you need a solid foundation in how functions work." -Hadley Wickham

Here is the basic structure of a function:

func_name <- function(arguments) {
    body
}

And here is an example:

square <- function(x) {
    x^2
}

square(2)
[1] 4

Two things to remember from what Lore taught you are arguments and the function body. Arguments are user inputs that the function works on. They can be the data that the function manipulates, or options that affect the calculation. The body of the function is the code that actually performs the manipulation.

The value that a function returns is the value of the last expression evaluated in the function body. In the example, x^2 is the last expression in the body, so its value is what gets returned.
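R also has an explicit return() function. As a small sketch, square_explicit (a name made up here) behaves the same as square:

```r
# Implicit return: the value of the last expression
square <- function(x) {
  x^2
}

# Explicit return: same result, spelled out
square_explicit <- function(x) {
  return(x^2)
}

square(2)
square_explicit(2)
```

Both calls give 4. Which style to use is largely a matter of taste; return() is mainly needed when you want to leave a function before its last line.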

In the exercise, you will create your first function to turn a percentage into a decimal, a useful calculation in finance!

Instructions

  • Create a function named percent_to_decimal that takes 1 argument, percent, and returns percent divided by 100.
  • Call percent_to_decimal() on the percentage 6 (we aren't using % here, but assume this is 6%).
  • A variable pct has been created for you.
  • Call percent_to_decimal() on pct.
# Percent to decimal function
percent_to_decimal <- function(percent) {
    percent / 100
}

# Use percent_to_decimal() on 6
percent_to_decimal(6)
## [1] 0.06
# Example percentage
pct <- 8

# Use percent_to_decimal() on pct
percent_to_decimal(pct)
## [1] 0.08

Multiple arguments (1)

As you saw in the optional arguments example, functions can have multiple arguments. These can help extend the flexibility of your function. Let's see this in action.

pow <- function(x, power = 2) {
    x^power
}

pow(2)
[1] 4

pow(2, power = 3)
[1] 8

Instead of a square() function, we now have a version that works with any power.

The power argument is optional and has a default value of 2, but the user can easily change this. It is also an example of how you can add multiple arguments. Notice how the arguments are separated by a comma, and the default value is set using an equals sign.
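The sketch below shows a few equivalent ways of calling pow(): arguments can be matched by position, by name, or by a mix of both, and named arguments can appear in any order.

```r
pow <- function(x, power = 2) {
  x^power
}

pow(2, 3)              # both arguments matched by position
pow(2, power = 3)      # second argument matched by name
pow(power = 3, x = 2)  # named arguments can come in any order
```

All three calls return 8.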

Let's add some more functionality to percent_to_decimal() that allows you to round the resulting decimal to a certain number of digits.

Instructions

  • Fill in the blanks in the improved percent_to_decimal() function to do the following:
  • Add a second optional argument named digits that defaults to 2.
  • In the body of the function, divide percent by 100 and assign this to decimal.
  • Use the round function on decimal, and set the second argument to digits to specify the number of decimal places.
  • Your function will work on vectors with length >1 too. percents has been defined for you.
  • Call percent_to_decimal() on percents. Do not specify any optional arguments.
  • Call percent_to_decimal() on percents again. Specify digits = 4.
# Percent to decimal function
percent_to_decimal <- function(percent, digits = 2) {
    decimal <- percent / 100    
    round(decimal, digits)
}

# percents
percents <- c(25.88, 9.045, 6.23)

# percent_to_decimal() with default digits
percent_to_decimal(percents)
## [1] 0.26 0.09 0.06
# percent_to_decimal() with digits = 4
percent_to_decimal(percents, digits = 4)
## [1] 0.2588 0.0904 0.0623

Multiple arguments (2)

Let's think about a more complicated example. Do you remember present value from the Introduction to R for Finance course? If not, you can review the video for that here. The idea is that you want to discount money that you will get in the future at a specific interest rate to represent the value of that money in today's dollars. The following general formula was developed to help with this:

present_value <- cash_flow * (1 + i / 100) ^ -year

Wouldn't it be nice to have a function that did this calculation for you? Maybe something of the form:

present_value <- pv(cash_flow, i, year)

This function should work if you pass in numerics like pv(1500, 5, 2) and it should work if you pass in vectors of equal length to calculate an entire present value vector at once!

Instructions

The percent_to_decimal() function is in your workspace.

  • Fill in the blanks in the function so it does the following:
  • Require the arguments: cash_flow, i, year
  • Create the discount multiplier: (1 + i / 100). Use the percent_to_decimal() function to convert i to a decimal.
  • Perform the present value calculation. Do not store this in a variable. As the last executed line, it will be returned automatically.

  • Calculate the present value of $1200, at an interest rate of 7%, to be received 3 years from now.

# Present value function
pv <- function(cash_flow, i, year) {
    
    # Discount multiplier
    mult <- 1 + percent_to_decimal(i)
    
    # Present value calculation
    cash_flow * mult ^ -year
}

# Calculate a present value
pv(1200, 7, 3)
## [1] 979.5575
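One thing to keep in mind with this design: percent_to_decimal() rounds to 2 decimal places by default, so an interest rate such as 7.25% is turned into 0.07 before discounting. The sketch below repeats the two functions from the exercises and compares them with pv_direct(), a hypothetical variant that applies the formula without rounding. It also shows the vector behaviour described above.

```r
# percent_to_decimal() and pv() as defined in the exercises
percent_to_decimal <- function(percent, digits = 2) {
  round(percent / 100, digits)
}
pv <- function(cash_flow, i, year) {
  cash_flow * (1 + percent_to_decimal(i)) ^ -year
}

# Hypothetical variant: the formula applied directly, without rounding
pv_direct <- function(cash_flow, i, year) {
  cash_flow * (1 + i / 100) ^ -year
}

pv(1200, 7.25, 3)         # rate is rounded to 0.07, same as pv(1200, 7, 3)
pv_direct(1200, 7.25, 3)  # discounts at the full 7.25%

# Vectors of equal length give one present value per element
pv_direct(c(1500, 1500), c(5, 5), c(1, 2))
```

If fractional rates matter, one option is to call percent_to_decimal() with a larger digits value inside pv().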

tidyquant package

The tidyquant package is focused on retrieving, manipulating, and scaling financial data analysis in the easiest way possible. To get the tidyquant package and start working with it, you first have to install it.

install.packages("tidyquant")

This places it on your local computer. You then have to load it into your current R session. This gives you access to all of the functions in the package.

library(tidyquant)

These steps of installing and librarying packages are necessary for any CRAN package you want to use.

The exercise code is already written for you. You will explore some of the functions that tidyquant has for financial analysis.

Instructions

The code is already written, but these instructions will walk you through the steps.

  • First, library the package to access its functions. Use the tidyquant function, tq_get() to get the stock price data for Apple.
  • Take a look at the data frame it returned.
  • Plot the stock price over time.
  • Calculate daily returns for the adjusted stock price using tq_mutate(). This function "mutates" your data frame by adding a new column onto it. Here, that new column is the daily returns.
  • Sort the returns.
  • Plot the sorted returns. You can see that Apple had a few days of losses >10%, and a number of days with gains of >5%.
# Library tidyquant
library(tidyquant)
# Pull Apple stock data
apple <- tq_get("AAPL", get = "stock.prices", 
                from = "2007-01-03", to = "2017-06-05")

# Take a look at what it returned
head(apple)
## # A tibble: 6 x 8
##   symbol date        open  high   low close     volume adjusted
##   <chr>  <date>     <dbl> <dbl> <dbl> <dbl>      <dbl>    <dbl>
## 1 AAPL   2007-01-03  3.08  3.09  2.92  2.99 1238319600     2.59
## 2 AAPL   2007-01-04  3.00  3.07  2.99  3.06  847260400     2.64
## 3 AAPL   2007-01-05  3.06  3.08  3.01  3.04  834741600     2.62
## 4 AAPL   2007-01-08  3.07  3.09  3.05  3.05  797106800     2.64
## 5 AAPL   2007-01-09  3.09  3.32  3.04  3.31 3349298400     2.86
## 6 AAPL   2007-01-10  3.38  3.49  3.34  3.46 2952880000     2.99
# Plot the stock price over time
plot(apple$date, apple$adjusted, type = "l")

# Calculate daily stock returns for the adjusted price
apple <- tq_mutate(data = apple,
                   select = adjusted,
                   mutate_fun = dailyReturn)
# Sort the returns from least to greatest
sorted_returns <- sort(apple$daily.returns)

# Plot them
plot(sorted_returns)
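To make the idea of a daily return concrete without needing tidyquant or a data download, here is a base R sketch using the first three adjusted prices from the head(apple) output above. It illustrates a simple return (today's price divided by yesterday's, minus 1); it is not a description of how dailyReturn computes its values internally, for instance how it treats the first observation.

```r
# First three adjusted prices from the head(apple) output above
adjusted <- c(2.59, 2.64, 2.62)

# Simple return: today's price divided by yesterday's, minus 1
simple_returns <- adjusted[-1] / adjusted[-length(adjusted)] - 1
simple_returns

# Sorting works the same way as in the exercise
sort(simple_returns)
```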

Apply

lapply() on a list

The first function in the apply family that you will learn is lapply(), which is short for "list apply." When you have a list, and you want to apply the same function to each element of the list, lapply() is a potential solution that always returns another list. How might this work?

Let's look at a simple example. Suppose you want to find the length of each vector in the following list.

my_list
$a
[1] 2 4 5

$b
[1] 10 14  5  3  4  5  6

# Using lapply
# Note that you don't need parentheses when calling length
lapply(my_list, FUN = length)
$a
[1] 3

$b
[1] 7

As noted in the video, if at first you thought about looping over each element in the list, and using length() at each iteration, you aren't wrong. lapply() is the vectorized version of this kind of loop, and is often preferred (and simpler) in the R world.
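To make the comparison concrete, this sketch rebuilds my_list from the printed values and computes the lengths twice: with an explicit for loop and with lapply(). The name lengths_loop is made up for the illustration.

```r
my_list <- list(a = c(2, 4, 5),
                b = c(10, 14, 5, 3, 4, 5, 6))

# Loop version: fill a result list one element at a time
lengths_loop <- list()
for (name in names(my_list)) {
  lengths_loop[[name]] <- length(my_list[[name]])
}

# lapply version: one call, same result
lengths_lapply <- lapply(my_list, FUN = length)

all.equal(lengths_loop, lengths_lapply)
```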

Instructions

In your workspace is a list of daily stock returns as percentages called stock_return and the percent_to_decimal() function.

  • Print stock_return.
  • Fill in the lapply() function to apply percent_to_decimal() to each element in stock_return.
# Print stock_return
stock_return

# lapply to change percents to decimal
lapply(stock_return, FUN = percent_to_decimal)

Output:

> # Print stock_return
> stock_return
$apple
 [1]  0.37446342 -0.71883530  0.76986527  0.98226467  0.98171665  1.63217981
 [7] -0.57042563  1.66813769  0.00000000  0.54692248  0.12951131  0.57773562
[13]  0.26577503  0.09405729 -0.65778233  0.19778141  0.63508411 -0.42640287
[19] -0.02569373 -0.77957680

$ibm
 [1]  0.1251408 -0.1124859  0.3190691  2.7689429  0.3458948  0.7014998
 [7] -0.6125390  1.6858006  0.1307267 -0.2907839 -0.7677657 -0.0299886
[13]  0.5519558 -0.1610979 -0.1613578 -0.2095056  0.2579329 -0.5683858
[19]  0.2467056 -0.3661465

$micr
 [1]  0.08445946  1.63713080 -0.44835603  2.36864053 -0.58660583  1.57351254
 [7]  0.32273681  1.30287920 -0.47634170 -0.15954052 -0.44742729  2.11878010
[13] -0.12574662  0.00000000  0.01573812 -0.48780488  0.06325111 -0.45828066
[19] -0.14287982 -1.20826709
> 
> # lapply to change percents to decimal
> lapply(stock_return, FUN = percent_to_decimal)
$apple
 [1]  0.00 -0.01  0.01  0.01  0.01  0.02 -0.01  0.02  0.00  0.01  0.00  0.01
[13]  0.00  0.00 -0.01  0.00  0.01  0.00  0.00 -0.01

$ibm
 [1]  0.00  0.00  0.00  0.03  0.00  0.01 -0.01  0.02  0.00  0.00 -0.01  0.00
[13]  0.01  0.00  0.00  0.00  0.00 -0.01  0.00  0.00

$micr
 [1]  0.00  0.02  0.00  0.02 -0.01  0.02  0.00  0.01  0.00  0.00  0.00  0.02
[13]  0.00  0.00  0.00  0.00  0.00  0.00  0.00 -0.01

lapply() on a data frame

If, instead of a list, you had a data frame of stock returns, could you still use lapply()? Yes! Perhaps surprisingly, data frames are actually lists under the hood, and an lapply() call would apply the function to each column of the data frame.

df
  a b
1 1 4
2 2 6

class(df)
[1] "data.frame"

lapply(df, FUN = sum)
$a
[1] 3

$b
[1] 10

lapply() summed each column in the data frame, but still follows its convention of always returning a list.
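You can check the "lists under the hood" claim yourself. The sketch below rebuilds df from the printed values and asks R directly.

```r
df <- data.frame(a = c(1, 2), b = c(4, 6))

is.list(df)      # TRUE: a data frame is a list of equal-length columns
length(df)       # 2: one list element per column

column_sums <- lapply(df, FUN = sum)
is.list(column_sums)   # TRUE: lapply() always returns a list
column_sums$b          # 10
```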

Instructions

In your workspace is a data frame of daily stock returns as decimals called stock_return.

  • Print stock_return to see the data frame. Use lapply() to get the average (mean) of each column.
  • Create a function for the Sharpe ratio. It should take the average of the returns, subtract the risk-free rate (.03%) from it, and then divide by the standard deviation of the returns.
  • Use lapply() to calculate the Sharpe ratio of each column.
# Print stock_return
stock_return

# lapply to get the average returns
lapply(stock_return, FUN = mean)

# Sharpe ratio
sharpe <- function(returns) {
    (mean(returns) - .0003) / sd(returns)
}

# lapply to get the sharpe ratio
lapply(stock_return, FUN = sharpe)

Output:

> # Print stock_return
> stock_return
           apple          ibm          micr
1   0.0037446342  0.001251408  0.0008445946
2  -0.0071883530 -0.001124859  0.0163713080
3   0.0076986527  0.003190691 -0.0044835603
4   0.0098226467  0.027689429  0.0236864053
5   0.0098171665  0.003458948 -0.0058660583
6   0.0163217981  0.007014998  0.0157351254
7  -0.0057042563 -0.006125390  0.0032273681
8   0.0166813769  0.016858006  0.0130287920
9   0.0000000000  0.001307267 -0.0047634170
10  0.0054692248 -0.002907839 -0.0015954052
11  0.0012951131 -0.007677657 -0.0044742729
12  0.0057773562 -0.000299886  0.0211878010
13  0.0026577503  0.005519558 -0.0012574662
14  0.0009405729 -0.001610979  0.0000000000
15 -0.0065778233 -0.001613578  0.0001573812
16  0.0019778141 -0.002095056 -0.0048780488
17  0.0063508411  0.002579329  0.0006325111
18 -0.0042640287 -0.005683858 -0.0045828066
19 -0.0002569373  0.002467056 -0.0014287982
20 -0.0077957680 -0.003661465 -0.0120826709
> 
> # lapply to get the average returns
> lapply(stock_return, FUN = mean)
$apple
[1] 0.002838389

$ibm
[1] 0.001926806

$micr
[1] 0.002472939
> 
> # Sharpe ratio
> sharpe <- function(returns) {
      (mean(returns) - .0003) / sd(returns)
  }
> 
> # lapply to get the sharpe ratio
> lapply(stock_return, FUN = sharpe)
$apple
[1] 0.3546496

$ibm
[1] 0.2000819

$micr
[1] 0.218519

FUN arguments

Often, the function that you want to apply will have other optional arguments that you may want to tweak. Consider the percent_to_decimal() function that allows the user to specify the number of decimal places.

percent_to_decimal(5.4, digits = 3)
[1] 0.054

In the call to lapply() you can specify the named optional arguments after the FUN argument, and they will get passed to the function that you are applying.

my_list
$a
[1] 2.444 3.500

$b
[1] 1.100 2.678 3.450

lapply(my_list, FUN = percent_to_decimal, digits = 4)
$a
[1] 0.0244 0.0350

$b
[1] 0.0110 0.0268 0.0345
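A possible alternative to passing digits through lapply() is to wrap the call in a small anonymous function. The sketch below rebuilds my_list and percent_to_decimal() from the text and checks that both spellings agree; the names via_dots and via_wrapper are made up for this illustration.

```r
percent_to_decimal <- function(percent, digits = 2) {
  decimal <- percent / 100
  round(decimal, digits)
}

my_list <- list(a = c(2.444, 3.500),
                b = c(1.100, 2.678, 3.450))

# Extra named arguments after FUN are handed on to percent_to_decimal()
via_dots <- lapply(my_list, FUN = percent_to_decimal, digits = 4)

# Same idea, written as an anonymous function
via_wrapper <- lapply(my_list, FUN = function(x) percent_to_decimal(x, digits = 4))

all.equal(via_dots, via_wrapper)
```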

In the exercise, you will extend the capability of your Sharpe ratio function to allow the user to input the risk-free rate as an argument, and then use this with lapply().

Instructions

In your workspace is a data frame of daily stock returns as decimals called stock_return.

  • Extend sharpe to allow the input of the risk-free rate as an optional argument. The default should be set at .0003.
  • Use lapply() on stock_return to find the Sharpe ratio if the risk-free rate is .0004.
  • Use lapply() on stock_return to find the Sharpe ratio if the risk-free rate is .0009.
# Extend sharpe() to allow optional argument
sharpe <- function(returns, rf = .0003) {
    (mean(returns) - rf) / sd(returns)
}

# First lapply()
lapply(stock_return, FUN = sharpe, rf = .0004)

# Second lapply()
lapply(stock_return, FUN = sharpe, rf = .0009)

Output:

> # Extend sharpe() to allow optional argument
> sharpe <- function(returns, rf = .0003) {
      (mean(returns) - rf) / sd(returns)
  }
> 
> # First lapply()
> lapply(stock_return, FUN = sharpe, rf = .0004)
$apple
[1] 0.3406781

$ibm
[1] 0.1877828

$micr
[1] 0.2084626
> 
> # Second lapply()
> lapply(stock_return, FUN = sharpe, rf = .0009)
$apple
[1] 0.2708209

$ibm
[1] 0.1262875

$micr
[1] 0.1581807

sapply() VS lapply()

lapply() is great, but sometimes you might want the returned data in a nicer form than a list. For instance, with the Sharpe ratio, wouldn't it be great if the returned Sharpe ratios were in a vector rather than a list? Further analysis would likely be easier!

For this, you might want to consider sapply(), short for "simplify apply". It works like lapply(), but attempts to simplify the output to a vector or matrix if it can. The basic syntax is the same, with a few additional arguments:

sapply(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)

These additional optional arguments give you more control over the result. simplify specifies whether sapply() should try to simplify the output. USE.NAMES specifies whether, when X is a character vector without names, its values are used as the names of the output.
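
As a rough illustration of the difference, here is a minimal sketch on a small made-up data frame (toy_return is hypothetical and is not the course's stock_return):

```r
# Hypothetical toy data; not the course's stock_return
toy_return <- data.frame(a = c(0.01, -0.02, 0.03),
                         b = c(0.00, 0.01, 0.02))

as_list   <- lapply(toy_return, FUN = mean)                     # always a list
as_vector <- sapply(toy_return, FUN = mean)                     # named numeric vector
as_list2  <- sapply(toy_return, FUN = mean, simplify = FALSE)   # list again

is.list(as_list)
is.numeric(as_vector)
identical(as_list, as_list2)
```

With simplify = FALSE, sapply() skips the simplification step and hands back the same list that lapply() would.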

In the exercise, you will recalculate Sharpe ratios using sapply() to simplify the output.

Instructions

stock_return and the sharpe function are in your workspace.

  • First, use lapply() on stock_return to get the Sharpe ratio again.
  • Now, use sapply() on stock_return to see the simplified Sharpe ratio output.
  • Use sapply() on stock_return to get the Sharpe ratio with the arguments simplify = FALSE and USE.NAMES = FALSE. This is equivalent to lapply()!
# lapply() on stock_return
lapply(stock_return, FUN = sharpe)

# sapply() on stock_return
sapply(stock_return, FUN = sharpe)

# sapply() on stock_return with optional arguments
sapply(stock_return, FUN = sharpe, simplify = FALSE, USE.NAMES = FALSE)

Output:

> # lapply() on stock_return
> lapply(stock_return, FUN = sharpe)
$apple
[1] 0.3546496

$ibm
[1] 0.2000819

$micr
[1] 0.218519
> 
> # sapply() on stock_return
> sapply(stock_return, FUN = sharpe)
    apple       ibm      micr 
0.3546496 0.2000819 0.2185190
> 
> # sapply() on stock_return with optional arguments
> sapply(stock_return, FUN = sharpe, simplify = FALSE, USE.NAMES = FALSE)
$apple
[1] 0.3546496

$ibm
[1] 0.2000819

$micr
[1] 0.218519

Failing to simplify

For interactive use, sapply() is great. It guesses the output type so that it can simplify, and normally that is fine. However, sapply() is not a safe choice inside functions that you write. If sapply() cannot simplify your output, it silently returns a list, just like lapply(). This can be dangerous and can break custom functions that expect sapply() to return a simplified vector.
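
To make the guessing concrete, here is a minimal sketch with made-up inputs showing how the shape of the result changes with the length of what the applied function returns:

```r
# Length-1 results: simplified to a numeric vector
same_length_1 <- sapply(1:3, function(i) i^2)

# Length-2 results: simplified to a matrix with one column per input
same_length_2 <- sapply(1:3, function(i) c(i, i^2))

# Results of differing lengths: no simplification, a list comes back silently
mixed_length <- sapply(1:3, function(i) seq_len(i))

class(same_length_1)
dim(same_length_2)
class(mixed_length)
```

None of the three calls raises an error or warning, which is exactly why code that relies on one particular shape can break without notice.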

Let's look at an exercise using a list containing information about the stock market crash of 2008.

Instructions

The list market_crash has been created for you.

  • Use sapply() to get the class() of each element in market_crash.

A new list, market_crash2, has been created. The difference is that its date is created with as.POSIXct() instead of as.Date()!

  • Use lapply() to get the class() of each element in market_crash2.
  • Use sapply() to get the class() of each element in market_crash2.

date in market_crash2 has multiple classes. Why couldn't sapply() simplify this?


# Market crash with as.Date()
market_crash <- list(dow_jones_drop = 777.68, 
                     date = as.Date("2008-09-28"))
                     
# Find the classes with sapply()
sapply(market_crash, class)

# Market crash with as.POSIXct()
market_crash2 <- list(dow_jones_drop = 777.68, 
                      date = as.POSIXct("2008-09-28"))

# Find the classes with lapply()
lapply(market_crash2, class)

# Find the classes with sapply()
sapply(market_crash2, class)

Output:

> # Market crash with as.Date()
> market_crash <- list(dow_jones_drop = 777.68, 
                       date = as.Date("2008-09-28"))
> 
> # Find the classes with sapply()
> sapply(market_crash, class)
dow_jones_drop           date 
     "numeric"         "Date"
> 
> # Market crash with as.POSIXct()
> market_crash2 <- list(dow_jones_drop = 777.68, 
                        date = as.POSIXct("2008-09-28"))
> 
> # Find the classes with lapply()
> lapply(market_crash2, class)
$dow_jones_drop
[1] "numeric"

$date
[1] "POSIXct" "POSIXt"
> 
> # Find the classes with sapply()
> sapply(market_crash2, class)
$dow_jones_drop
[1] "numeric"

$date
[1] "POSIXct" "POSIXt"

vapply() VS sapply()

In the last example, sapply() failed to simplify because the date element of market_crash2 had two classes (POSIXct and POSIXt). class() therefore returned a vector of length 2 for date but of length 1 for dow_jones_drop, and results of different lengths cannot be combined into a single vector. Notice, however, that no error was thrown! If a function you had written expected a simplified vector to be returned by sapply(), this would be confusing.

To account for this, there is a stricter apply function called vapply(). It takes an extra argument, FUN.VALUE, where you specify the type and length of the output that should be returned each time your applied function is called. If any result does not match, vapply() throws an error.

If you expected the return value of class() to be a character vector of length 1, you can specify that using vapply():

vapply(market_crash, class, FUN.VALUE = character(1))
dow_jones_drop           date 
     "numeric"         "Date"

Other examples of FUN.VALUE might be numeric(2) or logical(1).
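
A minimal sketch of those two templates, using a made-up list (prices is hypothetical):

```r
# Hypothetical toy data for illustration
prices <- list(a = c(1.5, 3, 2), b = c(10, 4))

# Each call must return exactly two numbers (here the min and the max)
ranges <- vapply(prices, range, FUN.VALUE = numeric(2))

# Each call must return a single TRUE or FALSE
any_above_5 <- vapply(prices, function(p) any(p > 5), FUN.VALUE = logical(1))

ranges
any_above_5
```

With numeric(2) the result is a matrix with one column per list element. With logical(1) it is a named logical vector.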

Instructions

market_crash2 is again defined for you.

  • Use sapply() again to find the class() of market_crash2 elements. Notice how it returns a list and not an error.
  • Use vapply() on market_crash2 to find the class(). Specify FUN.VALUE = character(1). It should appropriately fail.
# Market crash with as.POSIXct()
market_crash2 <- list(dow_jones_drop = 777.68, 
                      date = as.POSIXct("2008-09-28"))

# Find the classes with sapply()
sapply(market_crash2, class)

# Find the classes with vapply()
vapply(market_crash2, class, FUN.VALUE = character(1))
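
Because the vapply() call stops with an error, one way to inspect the failure without halting a script is to wrap it in tryCatch(). This is only a sketch; the name outcome is made up for illustration:

```r
market_crash2 <- list(dow_jones_drop = 777.68,
                      date = as.POSIXct("2008-09-28"))

# Catch the error so the script keeps running; return the condition object
outcome <- tryCatch(
    vapply(market_crash2, class, FUN.VALUE = character(1)),
    error = function(e) e
)

inherits(outcome, "error")   # TRUE when vapply() failed
conditionMessage(outcome)    # the text of the error
```

The error arises because class() returns two values for the date element while FUN.VALUE = character(1) allows only one.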

More vapply()

The last example showed vapply() appropriately failing where sapply() silently returned a list, but what about when it doesn't fail? When there are no errors, vapply() returns a simplified result according to the FUN.VALUE argument: a vector when each result has length 1, and a matrix with one column per element when each result is longer.

Instructions

The stock_return dataset, containing daily returns for Apple, IBM, and Microsoft, is in your workspace. The sharpe() function is also available.

  • Calculate the Sharpe ratio for each stock using vapply().
  • Use summary() on the apple column to get a six-number summary.
  • vapply() the summary() function across stock_return to summarize each column.

# Sharpe ratio for all stocks
vapply(stock_return, sharpe, FUN.VALUE = numeric(1))

# Summarize Apple
summary(stock_return$apple)

# Summarize all stocks
vapply(stock_return, summary, FUN.VALUE = numeric(6))

Output:

> # Sharpe ratio for all stocks
> vapply(stock_return, sharpe, FUN.VALUE = numeric(1))
    apple       ibm      micr 
0.3546496 0.2000819 0.2185190
> 
> # Summarize Apple
> summary(stock_return$apple)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
-0.007796 -0.001259  0.002318  0.002838  0.006688  0.016681
> 
> # Summarize all stocks
> vapply(stock_return, summary, FUN.VALUE = numeric(6))
               apple           ibm          micr
Min.    -0.007795768 -0.0076776574 -0.0120826709
1st Qu. -0.001258710 -0.0022982516 -0.0045083719
Median   0.002317782  0.0004757609 -0.0006287331
Mean     0.002838389  0.0019268062  0.0024729391
3rd Qu.  0.006687794  0.0032577550  0.0056777241
Max.     0.016681377  0.0276894294  0.0236864053

Anonymous functions

As a last exercise, you'll learn about a concept called anonymous functions. So far, when calling an apply function like vapply(), you have been passing in named functions to FUN. Doesn't it seem like a waste to have to create a function just for that specific vapply() call? Instead, you can use anonymous functions!

Named function:

percent_to_decimal <- function(percent) {
    percent / 100
}

Anonymous function:

function(percent) { percent / 100 }

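As you can see, anonymous functions are simply functions that aren't assigned a name. To use one in vapply(), pass it directly as FUN. In the example below, FUN.VALUE = numeric(2) works only because stock_return is assumed to hold just two rows of returns, for apple and ibm. With a different number of rows, the length given in FUN.VALUE would have to match: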

vapply(stock_return, FUN = function(percent) { percent / 100 }, 
       FUN.VALUE = numeric(2))
            apple          ibm
[1,]  0.003744634  0.001251408
[2,] -0.007188353 -0.001124859

Instructions

stock_return is in your workspace.

  • Use vapply() to apply an anonymous function that returns a vector of the max() and min() (in that order) of each column of stock_return.
# Max and min
vapply(stock_return, 
       FUN = function(x) { c(max(x), min(x)) }, 
       FUN.VALUE = numeric(2))

Output:

> # Max and min
> vapply(stock_return, 
         FUN = function(x) { c(max(x), min(x)) }, 
         FUN.VALUE = numeric(2))
            apple          ibm        micr
[1,]  0.016681377  0.027689429  0.02368641
[2,] -0.007795768 -0.007677657 -0.01208267
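
The rows above are labelled only [1,] and [2,]. If FUN.VALUE is given names, vapply() uses them as row names. A minimal sketch on made-up data (toy_return is hypothetical and is not the course's stock_return):

```r
# Hypothetical toy data; not the course's stock_return
toy_return <- data.frame(a = c(0.01, -0.02, 0.03),
                         b = c(0.00, 0.01, 0.02))

# The names on FUN.VALUE become the row names of the result
extremes <- vapply(toy_return,
                   FUN = function(x) { c(max(x), min(x)) },
                   FUN.VALUE = c(max = 0, min = 0))
extremes
```

The values 0 in FUN.VALUE serve only as a template for the type and length. They never appear in the result.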

The End of Module