The Basics

Your first R script

Welcome! In the script to the right you will type R code to solve the exercises. When you hit the Submit Answer button, every line of code in the script is executed by R and you get a message that indicates whether or not your code was correct. The output of your submission is shown in the R console.

You can also execute code directly in the R Console. When you type in the console, your submission will not be checked for correctness! Try, for example, typing 3 + 4 and pressing Enter. R should return [1] 7.

Instructions

  • An addition example has already been created for you.
  • Add another line of code in the script to calculate the difference of 6 and 4.
  • Note: Check out the # symbol in the script! This denotes a comment in your code. Comments are a great way to document your code, and are not run when you submit your answer.
# Addition!
3 + 5
## [1] 8
# Subtraction!
6 - 4
## [1] 2

Arithmetic in R (1)

Let's play around with your new calculator. First, check out these arithmetic operators, most of them should look familiar:

  • Addition: +
  • Subtraction: -
  • Multiplication: *
  • Division: /
  • Exponentiation: ^ or **
  • Modulo: %%

You might be unfamiliar with the last two. The ^ operator raises the number on its left to the power of the number on its right. For example, 3^2 is 9. The modulo operator returns the remainder after dividing the number on its left by the number on its right; for example, 5 modulo 3 (written 5 %% 3) is 2.

Lastly, there is another useful way to execute your code besides typing in the R Console or pressing Submit Answer. Placing the cursor on a line of code in the script and pressing Command + Enter (Control + Enter on Windows) executes just that line in the R Console. Try it out with the 2 + 2 line already in the script!

Instructions

  • Some examples for addition, subtraction, and multiplication are shown for you.
  • Type 4 / 2 in the script to perform division.
  • Type 2^4 to raise 2 to the power of 4.
  • Type 7 %% 3 to calculate 7 modulo 3.
  • Don't forget to press Submit Answer when you finish!
# Addition 
2 + 2
## [1] 4
# Subtraction
4 - 1
## [1] 3
# Multiplication
3 * 4
## [1] 12
# Division
4 / 2
## [1] 2
# Exponentiation
2^4
## [1] 16
# Modulo
7 %% 3
## [1] 1

Arithmetic in R (2)

The order in which you perform your mathematical operations is critical to getting the correct answer. The correct order of operations is:

Parentheses, Exponentiation, Multiplication and Division, Addition and Subtraction

Or PEMDAS for short! Multiplication and division share the same level and are evaluated from left to right, as are addition and subtraction.

This means that when you come across the expression 20 - 8 * 2, you know to do the multiplication first, then the subtraction, to get the correct answer of 4.
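A quick way to convince yourself of the rule is to evaluate the example with and without parentheses. This small sketch restates the expression above and adds one case where exponentiation comes first; the variable names are only for illustration.

```r
# Multiplication happens before subtraction: 20 - 16
without_parens <- 20 - 8 * 2
without_parens
## [1] 4

# Parentheses force the subtraction first: 12 * 2
with_parens <- (20 - 8) * 2
with_parens
## [1] 24

# Exponentiation happens before multiplication: 3 * 4
exp_first <- 3 * 2^2
exp_first
## [1] 12
```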

Assignment and variables (1)

It looks like you're becoming an expert at using R as a calculator! Time to take it one step further. The numbers you have been calculating haven't been very descriptive. 5? 5 what? 5 apples? 5 monkeys? What if you could assign that 5 a descriptive name like number_of_apples, and then simply type that name whenever you want to use 5? Enter variables.

A variable allows you to store a value or an object in R. You can later use the variable's name to access the value or object stored in it. You use <- to assign a value to a variable:

my_money <- 100

Instructions

  • Assign a value of 200 to the savings variable in the script.
  • Press Submit Answer and note how simply typing savings in the script makes R print its value to the console!
# Assign 200 to savings
savings <- 200

# Print the value of savings to the console
savings
## [1] 200

Assignment and variables (2)

Suppose you have $100 stored in my_money, and your friend Dan has $200. To keep things clear, you decide to give Dan's money a variable name too. You want to know how much money the two of you have together. Now that each variable has a descriptive name, this is easy using the arithmetic you learned earlier:

my_money + dans_money

Instructions

  • my_money has been defined for you.
  • Assign 200 to Dan's money.
  • Follow the example in the exercise text and add your money to Dan's money.
  • Add your money to Dan's money again, but this time save the result to our_money!
# Assign 100 to my_money
my_money <- 100

# Assign 200 to dans_money
dans_money <- 200

# Add my_money and dans_money
my_money + dans_money
## [1] 300
# Add my_money and dans_money again, save the result to our_money
our_money <- my_money + dans_money

Financial returns (1)

Time for some application! Earlier, Lore taught you about financial returns. Now, it's time for you to put that knowledge to work! But first, a quick review.

Assume you have $100. During January, you make a 5% return on that money. How much do you have at the end of January? Well, you have 100% of your starting money, plus another 5%: 100% + 5% = 105%. In decimals, this is 1 + .05 = 1.05. This 1.05 is the return multiplier for January, and you multiply your original $100 by it to get the amount you have at the end of January.

105 = 100 * 1.05

Or in terms of variables:

post_jan_cash <- starting_cash * jan_ret

A quick way to get the multiplier is:

multiplier = 1 + (return / 100)

Instructions

# Variables for starting_cash and 5% return during January
starting_cash <- 200
jan_ret <- 5
jan_mult <- 1 + (jan_ret / 100)

# How much money do you have at the end of January?
post_jan_cash <- starting_cash * jan_mult

# Print post_jan_cash
post_jan_cash
## [1] 210
# January 10% return multiplier
jan_ret_10 <- 10
jan_mult_10 <- 1 + (jan_ret_10 / 100)

# How much money do you have at the end of January now?
post_jan_cash_10 <- starting_cash * jan_mult_10

# Print post_jan_cash_10
post_jan_cash_10
## [1] 220

Financial returns (2)

Let's make you some more money. If, in February, you earn another 2% on your cash, how would you calculate the total amount at the end of February? You already know that the amount at the end of January is $100 * 1.05 = $105. To get from the end of January to the end of February, just use another multiplier!

$105 * 1.02 = $107.1

Which is equivalent to:

$100 * 1.05 * 1.02 = $107.1

In this last form, you can see the effect of both multipliers on your original $100. In fact, this form can help you find the total return over both months. The correct way to do this is to multiply the two multipliers together: 1.05 * 1.02 = 1.071. Subtracting 1 leaves 0.071, which means you earned 7.1% in total over the two-month period.
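The same two-month calculation can be written with a vector of multipliers. This sketch previews c(), which builds a vector and is covered later on, and uses prod(), a base R function that multiplies all elements of a vector together, to get the combined multiplier.

```r
# January and February returns, in percent
monthly_ret <- c(5, 2)

# Convert each return to a multiplier: 1.05 and 1.02
monthly_mult <- 1 + (monthly_ret / 100)

# Combined multiplier over both months: 1.05 * 1.02
total_mult <- prod(monthly_mult)
total_mult
## [1] 1.071

# Ending cash from $100
100 * total_mult
## [1] 107.1

# Total return in percent
(total_mult - 1) * 100
## [1] 7.1
```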

Instructions

  • Your starting cash, and the returns for January and February have been given.
  • Use them to calculate the January and February return multipliers: jan_mult and feb_mult.
  • Use those multipliers and starting_cash to find your total_cash at the end of the two months.
  • Print total_cash to see how your money has grown!
# Starting cash and returns 
starting_cash <- 200
jan_ret <- 4
feb_ret <- 5

# Multipliers
jan_mult <- 1 + (jan_ret / 100)
feb_mult <- 1 + (feb_ret / 100)

# Total cash at the end of the two months
total_cash <- starting_cash * jan_mult * feb_mult

# Print total_cash
total_cash
## [1] 218.4

Data type exploration

To get started, here are some of R's most basic data types:

  • Numerics are decimal numbers like 4.5. A special type of numeric is an integer, which is a numeric without a decimal part. Integers must be specified with an L suffix, like 4L; a plain 4 is stored as a numeric, not an integer.
  • Logicals are the boolean values TRUE and FALSE. Capital letters are important here; true and false are not valid.
  • Characters are text values like "hello world".
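To see the integer rule in action, this short sketch compares a plain 4 with 4L using the base R checks is.integer() and is.numeric(). The variable names are made up for illustration.

```r
# A plain whole number is still stored as numeric, not integer
whole_number <- 4
is.integer(whole_number)
## [1] FALSE

# The L suffix creates a true integer
int_number <- 4L
is.integer(int_number)
## [1] TRUE

# An integer still counts as a numeric
is.numeric(int_number)
## [1] TRUE
```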

Instructions

  • Assign the numeric 150.45 to apple_stock.
  • Assign the character "AAA" to credit_rating.
  • Answer the final question with either TRUE or FALSE, we won't judge!
  • Print my_answer!
# Apple's stock price is a numeric
apple_stock <- 150.45

# Bond credit ratings are characters
credit_rating <- "AAA"

# You like the stock market. TRUE or FALSE?
my_answer <- TRUE

# Print my_answer
my_answer
## [1] TRUE

What's that data type?

Up until now, you have been determining what data type a variable is just by looks. There is actually a better way to check this.

class(my_var)

This will return the data type (or class) of whatever variable you pass in.
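Here is a sketch of class() applied to one value of each basic type. The variables below are hypothetical stand-ins, not the a, b, and c defined in the exercise.

```r
# Hypothetical variables, one of each basic type
my_num <- 150.45
my_int <- 4L
my_char <- "AAA"
my_lgl <- TRUE

class(my_num)
## [1] "numeric"
class(my_int)
## [1] "integer"
class(my_char)
## [1] "character"
class(my_lgl)
## [1] "logical"
```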

The variables a, b, and c have already been defined for you. You can type ls() in the console at any time to "list" the variables currently available to you. Use the console and class() to decide which statement below is correct.

Vectors and Matrices

c()ombine

Now is where things get fun! It is time to create your first vector. Since this is a finance-oriented course, it is only appropriate that your first vector be a numeric vector of stock prices. Remember, you create a vector using the combine function, c(), and each element you add is separated by a comma.

For example, this is a vector of Apple's stock prices from December 2016:

apple_stock <- c(109.49, 109.90, 109.11, 109.95, 111.03, 112.12)

And this is a character vector of bond credit ratings:

credit_rating <- c("AAA", "AA", "BBB", "BB", "B")

Instructions

  • Another example of a numeric vector for IBM stock prices is shown for you.
  • Create a character vector of the finance related words "stocks", "bonds", and "investments", in that order.
  • Create a logical vector of TRUE, FALSE, TRUE in that order.
# Another numeric vector
ibm_stock <- c(159.82, 160.02, 159.84)

# Another character vector
finance <- c("stocks", "bonds", "investments")

# A logical vector
logic <- c(TRUE, FALSE, TRUE)

Coerce it

It is important to remember that a vector can only be composed of one data type. This means that you cannot have both a numeric and a character in the same vector. If you attempt to do this, the lower ranking type will be coerced into the higher ranking type.

For example: c(1.5, "hello") results in c("1.5", "hello") where the numeric 1.5 has been coerced into the character data type.

The hierarchy for coercion is:

logical < integer < numeric < character

Logicals are coerced a bit differently depending on what the highest data type is. c(TRUE, 1.5) will return c(1, 1.5) where TRUE is coerced to the numeric 1 (FALSE would be converted to a 0). On the other hand, c(TRUE, "this_char") is converted to c("TRUE", "this_char").
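The coercion rules above can be checked directly. This sketch reproduces each example from the text and adds an integer mixed with a numeric; the variable names are only for illustration.

```r
# Numeric + character: everything becomes character
mixed_char <- c(1.5, "hello")
mixed_char
## [1] "1.5"   "hello"

# Logical + numeric: TRUE becomes 1 and FALSE becomes 0
mixed_num <- c(TRUE, FALSE, 1.5)
mixed_num
## [1] 1.0 0.0 1.5

# Logical + character: TRUE becomes the string "TRUE"
mixed_lgl_char <- c(TRUE, "this_char")

# Integer + numeric: the integer is promoted to numeric
class(c(1L, 2.5))
## [1] "numeric"
```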

Vector names()

Let's return to the example about January and February's returns. As a refresher, in January you earned a 5% return, and in February, an extra 2% return. Being the savvy data scientist you are, you realize that you can put these returns into a vector! That would look something like this:

ret <- c(5, 2)

This is great! Now all of the returns are in one place. However, you could go one step further by adding names to each return in your vector. You do this using names(). Check this out:

names(ret) <- c("Jan", "Feb")

Printing ret now returns:

Jan Feb 
  5   2 

Pretty cool, right?
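names() is not the only way to attach names. As a small sketch in base R, you can also name the elements while creating the vector with c(), and calling names() without the assignment arrow returns the current names.

```r
# Name the elements directly inside c()
ret <- c(Jan = 5, Feb = 2)
ret
## Jan Feb 
##   5   2 

# names() without <- returns the names
names(ret)
## [1] "Jan" "Feb"
```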

Instructions

  • Defined for you are a vector of 12 monthly returns, and a vector of month names.
  • Add months as names to ret to create a more descriptive vector.
  • Print out ret to see the newly named vector!
# Vectors of 12 months of returns, and month names
ret <- c(5, 2, 3, 7, 8, 3, 5, 9, 1, 4, 6, 3)
months <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

# Add names to ret
names(ret) <- months

# Print out ret to see the new names!
ret
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec 
##   5   2   3   7   8   3   5   9   1   4   6   3

Visualize your vector

Time to try something a bit different. So far, you have been programming in the script, and looking at your data by printing it out. For a more informative visualization, try a plot!

For this exercise, you will again be working with some Apple stock data. This time it contains the prices for all of December 2016.

The plot() function is one of the many ways to create a graph from your data in R. When you pass in a single vector, its values go on the y-axis, and the x-axis shows each value's position (index) in the vector: 1, 2, 3, and so on.

Inside of plot(), you can change the type of your graph using type =. The default is "p" for points, but you can also change it to "l" for line.
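To make the x-axis index explicit, this sketch builds it yourself with seq_along(), which returns 1, 2, ... up to the length of the vector. It assumes the six December 2016 Apple prices from the c()ombine section rather than the full month; xlab and ylab are standard plot() arguments used here only to label the axes.

```r
apple_stock <- c(109.49, 109.90, 109.11, 109.95, 111.03, 112.12)

# The index that plot() uses for the x-axis by default
day_index <- seq_along(apple_stock)
day_index
## [1] 1 2 3 4 5 6

# Same line graph as plot(apple_stock, type = "l"), with the index spelled out
plot(day_index, apple_stock, type = "l", xlab = "Index", ylab = "Apple price")
```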

Instructions

apple_stock has already been defined, and everything has been set up for you. Try running the script line by line: place the cursor on each line and press Command + Enter on Mac or Control + Enter on Windows.

# Look at the data
apple_stock

# Plot the data points
plot(apple_stock)

# Plot the data as a line graph
plot(apple_stock, type = "l")

Weighted average (1)

As a finance professional, you will need to know a number of important calculations. One of these is the weighted average, which you can use to calculate your portfolio return over a time period. Consider the following example:

Assume you have 40% of your cash in Apple stock, and 60% of your cash in IBM stock. If, in January, Apple earned 5% and IBM earned 7%, what was your total portfolio return?

To calculate this, take the return of each stock in your portfolio, and multiply it by the weight of that stock. Then sum up all of the results. For this example, you would do:

6.2 = 5 * .4 + 7 * .6

Or, in variable terms:

portf_ret <- apple_ret * apple_weight + ibm_ret * ibm_weight

Instructions

  • Weights and returns for Microsoft and Sony have been defined for you.
  • Calculate the portf_ret for this portfolio.
# Weights and returns
micr_ret <- 7
sony_ret <- 9
micr_weight <- .2
sony_weight <- .8

# Portfolio return
portf_ret <- micr_ret * micr_weight + sony_ret * sony_weight

Weighted average (2)

Wait a minute, Lore taught us a much better way to do this! Remember, R does arithmetic with vectors! Can you take advantage of this fact to calculate the portfolio return more efficiently? Think carefully about the following code:

ret <- c(5, 7)
weight <- c(.4, .6)

ret_X_weight <- ret * weight

sum(ret_X_weight)

[1] 6.2

First, calculate ret * weight, which multiplies the two vectors element by element (5 * .4 and 7 * .6) to create a new vector, ret_X_weight. All you need to do then is add up the pieces, so you use sum() to add up every element of that vector.
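Base R also has weighted.mean(), which computes sum(x * w) / sum(w). When the weights add up to 1, as portfolio weights do, the denominator is 1 and the result matches sum(ret * weight). A minimal check with the Apple and IBM numbers from above:

```r
ret <- c(5, 7)
weight <- c(.4, .6)

# Portfolio weights should add up to 1 (100% of your cash)
sum(weight)
## [1] 1

# Same answer as sum(ret * weight)
weighted.mean(ret, weight)
## [1] 6.2
```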

Now it's your turn!

Instructions

  • ret and weight for Microsoft and Sony are defined for you again, but this time, in vector form!
  • Add company names to your ret and weight vectors. Use vectorized arithmetic to multiply ret and weight together.
  • Print ret_X_weight to see the results.
  • Use sum() to get the total portf_ret.
  • Print portf_ret and compare to the last exercise!
# Weights, returns, and company names
ret <- c(7, 9)
weight <- c(.2, .8)
companies <- c("Microsoft", "Sony")

# Assign company names to your vectors
names(ret) <- companies
names(weight) <- companies

# Multiply the returns and weights together 
ret_X_weight <- ret * weight

# Print ret_X_weight
ret_X_weight
## Microsoft      Sony 
##       1.4       7.2
# Sum to get the total portfolio return
portf_ret <- sum(ret_X_weight)

# Print portf_ret
portf_ret
## [1] 8.6

Weighted average (3)

Let's look at an example of recycling. What if you wanted to give equal weight to your Microsoft and Sony stock returns? That is, you want to be invested 50% in Microsoft and 50% in Sony.

ret <- c(7, 9)

weight <- .5

ret_X_weight <- ret * weight

ret_X_weight

[1] 3.5 4.5

ret is a vector of length 2, and weight is a vector of length 1. R recycles the .5 in weight, reusing it once for each element of ret so that the lengths match, and then performs the element-wise arithmetic.
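Recycling a length-1 weight gives exactly the same result as writing the weight out once per element. This sketch uses rep(), which repeats a value a given number of times, to build that longer weight vector by hand for comparison.

```r
ret <- c(7, 9)
weight <- .5

# R recycles weight behind the scenes...
recycled <- ret * weight

# ...which is the same as repeating it to match the length of ret
explicit <- ret * rep(weight, times = length(ret))

recycled
## [1] 3.5 4.5
identical(recycled, explicit)
## [1] TRUE
```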

Instructions

  • A named vector, ret, containing the returns of 3 stocks is in your workspace.
  • Print ret to see the returns of your 3 stocks.
  • Assign the value of 1/3 to weight. This will be the weight that each stock receives.
  • Create ret_X_weight by multiplying ret and weight. See how R recycles weight? sum() the ret_X_weight variable to create your equally weighted portf_ret.
  • Run the last line of code multiplying a vector of length 3 by a vector of length 2. R reuses the 1st value of the vector of length 2, but notice the warning!
# Print ret
ret
## Microsoft      Sony 
##         7         9
# Assign 1/3 to weight
weight <- 1/3

# Create ret_X_weight
ret_X_weight <- weight * ret

# Calculate your portfolio return
portf_ret <- sum(ret_X_weight)

# Vector of length 3 * Vector of length 2?
ret * c(.2, .6)
## Microsoft      Sony 
##       1.4       5.4
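To see the warning described in the last instruction, you need a vector whose length is not a multiple of 2. This sketch uses a hypothetical three-stock vector (the third stock, "Other", and its 5% return are made up) and captures the warning text with tryCatch() so you can read it.

```r
# Hypothetical returns for three stocks; "Other" is made up
ret3 <- c(Microsoft = 7, Sony = 9, Other = 5)

# Length 3 times length 2: catch the warning message R raises
warning_text <- tryCatch(
  ret3 * c(.2, .6),
  warning = function(w) conditionMessage(w)
)
warning_text
## [1] "longer object length is not a multiple of shorter object length"

# Silence the warning to see the result: 7 * .2, 9 * .6, then .2 is reused for 5 * .2
suppressWarnings(ret3 * c(.2, .6))
```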

Vector subsetting

Sometimes, you will only want to use specific pieces of your vectors, and you'll need some way to access just those parts. For example, what if you only wanted the first month of returns from the vector of 12 months of returns? To solve this, you can subset the vector using [ ].

Here is the 12 month return vector:

ret <- c(5, 2, 3, 7, 8, 3, 5, 9, 1, 4, 6, 3)

Select the first month: ret[1].

Select the first month by name (this works once ret has names, as in the Vector names() section): ret["Jan"].

Select the first three months: ret[1:3] or ret[c(1, 2, 3)].
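A few more indexing patterns, sketched with base R only: using length() to reach the last element without hard-coding 12, dropping several elements with a minus sign, and what happens when you ask for a position that does not exist.

```r
ret <- c(5, 2, 3, 7, 8, 3, 5, 9, 1, 4, 6, 3)
names(ret) <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
                "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

# The last month, without hard-coding 12
ret[length(ret)]
## Dec 
##   3 

# A minus sign drops elements: everything except January and December
ret[-c(1, 12)]

# There is no 13th month, so R returns NA
ret[13]
```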

Instructions

  • The named vector ret is defined in your workspace.
  • Subset the first 6 months of returns.
  • Subset only March and May's returns using c() and "Mar", "May".
  • Run the last line of code to perform a subset that omits the first month of returns.
ret <- c(5, 2, 3, 7, 8, 3, 5, 9, 1, 4, 6, 3)
names(ret) <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
# First 6 months of returns
ret[1:6]
## Jan Feb Mar Apr May Jun 
##   5   2   3   7   8   3
# Just March and May
ret[c("Mar", "May")]
## Mar May 
##   3   8
# Omit the first month of returns
ret[-1]
## Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec 
##   2   3   7   8   3   5   9   1   4   6   3

Create a matrix!

Matrices are similar to vectors, except they are in 2 dimensions! Let's create a 2x2 matrix "by hand" using matrix().

matrix(data = c(2, 3, 4, 5), nrow = 2, ncol = 2)

     [,1] [,2]
[1,]    2    4
[2,]    3    5

Notice that the actual data for the matrix is passed in as a vector using c(), and is then converted to a matrix by specifying the number of rows and columns (also known as the dimensions).

Because the matrix is just created from a vector, the following is equivalent to the above code.

my_vector <- c(2, 3, 4, 5)

matrix(data = my_vector, nrow = 2, ncol = 2)
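matrix() fills its values column by column unless you set byrow = TRUE. This sketch checks that, uses dim() to report the number of rows and columns, and shows that ncol can be left out because R works it out from the length of the data.

```r
my_vector <- c(2, 3, 4, 5)

# ncol is inferred: 4 values / 2 rows = 2 columns
my_matrix <- matrix(data = my_vector, nrow = 2)
dim(my_matrix)
## [1] 2 2

# Default: values fill down the columns, so row 1 is 2 and 4
my_matrix[1, ]
## [1] 2 4

# byrow = TRUE fills across the rows, so row 1 is 2 and 3
by_row <- matrix(data = my_vector, nrow = 2, byrow = TRUE)
by_row[1, ]
## [1] 2 3
```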

Instructions

  • my_vector has been defined for you.
  • Replace the ___ to create a 3x3 matrix from my_vector.
  • Print my_matrix.
  • By default, matrices are filled column by column (down each column). Run the code in the last example and note how the matrix instead fills across each row when you use byrow = TRUE. Compare this to the example given above.
# A vector of 9 numbers
my_vector <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# 3x3 matrix
my_matrix <- matrix(data = my_vector, nrow = 3, ncol = 3)

# Print my_matrix
my_matrix
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
# Filling across using byrow = TRUE
matrix(data = c(2, 3, 4, 5), nrow = 2, ncol = 2, byrow = TRUE)
##      [,1] [,2]
## [1,]    2    3
## [2,]    4    5

Matrix <- bind vectors

Often, you won't create matrices "by hand" as in the last example. Instead, you will build them from multiple vectors that you want to combine. For this, it is easiest to use the functions cbind() and rbind() (column bind and row bind, respectively). To see these in action, let's combine two vectors of Apple and IBM stock prices:

apple <- c(109.49, 109.90, 109.11, 109.95, 111.03)
ibm <- c(159.82, 160.02, 159.84, 160.35, 164.79)

cbind(apple, ibm)

      apple    ibm
[1,] 109.49 159.82
[2,] 109.90 160.02
[3,] 109.11 159.84
[4,] 109.95 160.35
[5,] 111.03 164.79

rbind(apple, ibm)

        [,1]   [,2]   [,3]   [,4]   [,5]
apple 109.49 109.90 109.11 109.95 111.03
ibm   159.82 160.02 159.84 160.35 164.79
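The two results hold the same numbers laid out differently. As a sketch, dim() shows the swapped shapes, and the base R transpose function t() turns one layout into the other.

```r
apple <- c(109.49, 109.90, 109.11, 109.95, 111.03)
ibm <- c(159.82, 160.02, 159.84, 160.35, 164.79)

# cbind: one column per vector, one row per day
dim(cbind(apple, ibm))
## [1] 5 2

# rbind: one row per vector, one column per day
dim(rbind(apple, ibm))
## [1] 2 5

# Transposing the cbind result gives the rbind result
identical(t(cbind(apple, ibm)), rbind(apple, ibm))
## [1] TRUE
```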

Now it's your turn!

Instructions

  • The apple, ibm, and micr stock price vectors from December 2016 are in your workspace.
  • Use cbind() to column bind apple, ibm, and micr together, in that order, as cbind_stocks. Print cbind_stocks.
  • Use rbind() to row bind the three vectors together, in the same order, as rbind_stocks.
  • Print rbind_stocks.
# cbind the vectors together
cbind_stocks <- cbind(apple, ibm, micr)

# Print cbind_stocks
cbind_stocks

# rbind the vectors together
rbind_stocks <- rbind(apple, ibm, micr)

# Print rbind_stocks
rbind_stocks

Outputs:

> # cbind the vectors together
> cbind_stocks <- cbind(apple, ibm, micr)
> 
> # Print cbind_stocks
> cbind_stocks
       apple    ibm  micr
 [1,] 109.49 159.82 59.20
 [2,] 109.90 160.02 59.25
 [3,] 109.11 159.84 60.22
 [4,] 109.95 160.35 59.95
 [5,] 111.03 164.79 61.37
 [6,] 112.12 165.36 61.01
 [7,] 113.95 166.52 61.97
 [8,] 113.30 165.50 62.17
 [9,] 115.19 168.29 62.98
[10,] 115.19 168.51 62.68
[11,] 115.82 168.02 62.58
[12,] 115.97 166.73 62.30
[13,] 116.64 166.68 63.62
[14,] 116.95 167.60 63.54
[15,] 117.06 167.33 63.54
[16,] 116.29 167.06 63.55
[17,] 116.52 166.71 63.24
[18,] 117.26 167.14 63.28
[19,] 116.76 166.19 62.99
[20,] 116.73 166.60 62.90
[21,] 115.82 165.99 62.14
> 
> # rbind the vectors together
> rbind_stocks <- rbind(apple, ibm, micr)
> 
> # Print rbind_stocks
> rbind_stocks
        [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   [,7]   [,8]   [,9]  [,10]
apple 109.49 109.90 109.11 109.95 111.03 112.12 113.95 113.30 115.19 115.19
ibm   159.82 160.02 159.84 160.35 164.79 165.36 166.52 165.50 168.29 168.51
micr   59.20  59.25  60.22  59.95  61.37  61.01  61.97  62.17  62.98  62.68
       [,11]  [,12]  [,13]  [,14]  [,15]  [,16]  [,17]  [,18]  [,19]  [,20]
apple 115.82 115.97 116.64 116.95 117.06 116.29 116.52 117.26 116.76 116.73
ibm   168.02 166.73 166.68 167.60 167.33 167.06 166.71 167.14 166.19 166.60
micr   62.58  62.30  63.62  63.54  63.54  63.55  63.24  63.28  62.99  62.90
       [,21]
apple 115.82
ibm   165.99
micr   62.14

Visualize your matrix

Similar to vectors, we can visualize our matrix to gain some insights about the relationships in the data.

In this exercise, you will plot the matrix of Apple and Microsoft stock prices to see the relationship between the two companies' stock prices during December 2016.

Instructions

  • The matrix apple_micr_matrix is available in your workspace.
  • First, print out apple_micr_matrix to get a look at the data.
  • Use plot() to create a scatter plot of Microsoft vs. Apple stock prices.
# View the data
apple_micr_matrix

# Scatter plot of Microsoft vs Apple
plot(apple_micr_matrix)

Outputs:

> # View the data
> apple_micr_matrix
       apple  micr
 [1,] 109.49 59.20
 [2,] 109.90 59.25
 [3,] 109.11 60.22
 [4,] 109.95 59.95
 [5,] 111.03 61.37
 [6,] 112.12 61.01
 [7,] 113.95 61.97
 [8,] 113.30 62.17
 [9,] 115.19 62.98
[10,] 115.19 62.68
[11,] 115.82 62.58
[12,] 115.97 62.30
[13,] 116.64 63.62
[14,] 116.95 63.54
[15,] 117.06 63.54
[16,] 116.29 63.55
[17,] 116.52 63.24
[18,] 117.26 63.28
[19,] 116.76 62.99
[20,] 116.73 62.90
[21,] 115.82 62.14
> 
> # Scatter plot of Microsoft vs Apple
> plot(apple_micr_matrix)

cor()relation

Did you notice the relationship between the two stocks? It seems that when Apple's stock moves up, Microsoft's does as well. One way to capture this kind of relationship is to calculate the correlation between the two stocks. Correlation measures how strongly two variables (here, stock prices) move together in a linear way, and is a number from -1 to 1. A correlation of 1 represents perfect positive correlation, -1 represents perfect negative correlation, and 0 means there is no linear relationship between the two. Correlation is a common metric in finance, and it is useful to know how to calculate it in R.

The cor() function will calculate the correlation between two vectors, or will create a correlation matrix when given a matrix.

cor(apple, micr)
[1] 0.9477011

cor(apple_micr_matrix)

          apple      micr
apple 1.0000000 0.9477011
micr  0.9477011 1.0000000

cor(apple, micr) simply returned the correlation between the two stocks. A large correlation of .9477 hints that Apple and Microsoft's stock prices move closely together. cor(apple_micr_matrix) returned a matrix that shows all of the possible pairwise correlations. The top left correlation of 1 is the correlation of Apple with itself, which makes sense!
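By default, cor() computes the Pearson correlation: the covariance of the two vectors divided by the product of their standard deviations. This sketch checks that with the base R functions cov() and sd(), using only the first five December 2016 Apple and Microsoft prices listed earlier, so its value will not match the 0.9477 above.

```r
# First five days of December 2016 prices, taken from the tables above
apple5 <- c(109.49, 109.90, 109.11, 109.95, 111.03)
micr5 <- c(59.20, 59.25, 60.22, 59.95, 61.37)

# Pearson correlation by hand: covariance scaled by both standard deviations
by_hand <- cov(apple5, micr5) / (sd(apple5) * sd(micr5))

# cor() uses the Pearson method by default, so the two should agree
by_hand
cor(apple5, micr5)
```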

Instructions

  • The vectors of stock prices for apple, micr, and ibm are in your workspace.
  • Calculate the correlation between apple and ibm.
  • Create a matrix of apple, micr, and ibm, in that order, named stocks using cbind().
  • Try to run the code for the correlation of all three stocks. Notice how it fails when using more than 2 vectors!
  • Rewrite the failing code to use the stocks matrix instead. Correlation matrices are very powerful when you have many stocks!
# Correlation of Apple and IBM
cor(apple, ibm)

# stock matrix
stocks <- cbind(apple, micr, ibm)

# cor() of all three
cor(stocks)

Outputs:

> # Correlation of Apple and IBM
> cor(apple, ibm)
[1] 0.8872467
> 
> # stock matrix
> stocks <- cbind(apple, micr, ibm)
> 
> # cor() of all three
> cor(stocks)
          apple      micr       ibm
apple 1.0000000 0.9477010 0.8872467
micr  0.9477010 1.0000000 0.9126597
ibm   0.8872467 0.9126597 1.0000000

Matrix subsetting

Just like vectors, matrices can be selected from and subsetted! To do this, you will again use [ ], but this time it will have two inputs. The basic structure is:

my_matrix[row, col]

To select the element in the first row and first column of stocks from the last example: stocks[1, 1]

To select the entire first row, leave the col empty: stocks[1, ]

To select the first two rows: stocks[1:2, ] or stocks[c(1,2), ]

To select an entire column, leave the row empty: stocks[, 1]

You can also select an entire column by name: stocks[, "apple"]
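Selecting a single column, as in stocks[, 1], gives back a plain vector rather than a one-column matrix. If you want to keep the matrix shape, [ ] accepts drop = FALSE. This sketch builds a small hypothetical stocks_small matrix from the first five Apple and IBM prices shown earlier.

```r
apple <- c(109.49, 109.90, 109.11, 109.95, 111.03)
ibm <- c(159.82, 160.02, 159.84, 160.35, 164.79)
stocks_small <- cbind(apple, ibm)

# A single column comes back as a plain vector
is.matrix(stocks_small[, "apple"])
## [1] FALSE

# drop = FALSE keeps it as a 5 x 1 matrix
apple_col <- stocks_small[, "apple", drop = FALSE]
dim(apple_col)
## [1] 5 1

# One element, selected by row number and column name
stocks_small[2, "ibm"]
```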

Instructions

  • stocks is in your workspace.
  • Select the third row of stocks.
  • Select the fourth and fifth rows of the ibm column of stocks.
  • Select the apple and micr columns from stocks using c() inside the brackets.
# Third row
stocks[3, ]

# Fourth and fifth row of the ibm column
stocks[4:5, "ibm"]

# apple and micr columns
stocks[ , c("apple","micr")]

Outputs:

> # Third row
> stocks[3, ]
 apple    ibm   micr 
109.11 159.84  60.22
> 
> # Fourth and fifth row of the ibm column
> stocks[4:5, "ibm"]
[1] 160.35 164.79
> 
> # apple and micr columns
> stocks[ , c("apple","micr")]
       apple  micr
 [1,] 109.49 59.20
 [2,] 109.90 59.25
 [3,] 109.11 60.22
 [4,] 109.95 59.95
 [5,] 111.03 61.37
 [6,] 112.12 61.01
 [7,] 113.95 61.97
 [8,] 113.30 62.17
 [9,] 115.19 62.98
[10,] 115.19 62.68
[11,] 115.82 62.58
[12,] 115.97 62.30
[13,] 116.64 63.62
[14,] 116.95 63.54
[15,] 117.06 63.54
[16,] 116.29 63.55
[17,] 116.52 63.24
[18,] 117.26 63.28
[19,] 116.76 62.99
[20,] 116.73 62.90
[21,] 115.82 62.14

Data Frames

Create your first data.frame()

Data frames are great because of their ability to hold a different type of data in each column. To get started, let's use the data.frame() function to create a data frame of your business's future cash flows. Here are the variables that will be in the data frame:

  • company - The company that is paying you the cash flow (A or B).
  • cash_flow - The amount of money you will receive from that company.
  • year - The number of years from now that you receive the cash flow.

To create the data frame, you do the following:

data.frame(company = c("A", "A", "B"), cash_flow = c(100, 200, 300), year = c(1, 3, 2))

  company cash_flow year
1       A       100    1
2       A       200    3
3       B       300    2

Like matrices, data frames are created from vectors, so this code would have also worked:

company <- c("A", "A", "B")
cash_flow <- c(100, 200, 300)
year <- c(1, 3, 2)

data.frame(company, cash_flow, year)
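To check which type each column ended up with, sapply() can apply class() to every column. This sketch passes stringsAsFactors = FALSE, a real data.frame() argument, so company stays a character column regardless of your R version (before R 4.0.0 the default was TRUE, which turned text columns into factors). The name cash_small is only for illustration.

```r
company <- c("A", "A", "B")
cash_flow <- c(100, 200, 300)
year <- c(1, 3, 2)

cash_small <- data.frame(company, cash_flow, year, stringsAsFactors = FALSE)

# One class per column: text, numbers, numbers
sapply(cash_small, class)
```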

Instructions

  • New company, cash_flow, and year variables have been defined for you.
  • Create another data frame containing company, cash_flow, and year, in that order. Assign it to cash. You will use this data frame throughout the rest of the chapter!
  • Print out cash to get a look at your shiny new data frame.
# Variables
company <- c("A", "A", "A", "B", "B", "B", "B")
cash_flow <- c(1000, 4000, 550, 1500, 1100, 750, 6000)
year <- c(1, 3, 4, 1, 2, 4, 5)

# Data frame
cash <- data.frame(company, cash_flow, year)

# Print cash
cash
##   company cash_flow year
## 1       A      1000    1
## 2       A      4000    3
## 3       A       550    4
## 4       B      1500    1
## 5       B      1100    2
## 6       B       750    4
## 7       B      6000    5

Making head()s and tail()s of your data with some str()ucture

Time to introduce a few simple, but very useful functions.

  • head() - Returns the first few rows of a data frame. By default, 6. To change this, use head(cash, n = ___)
  • tail() - Returns the last few rows of a data frame. By default, 6. To change this, use tail(cash, n = ___)
  • str() - Check the structure of an object. This fantastic function will show you the data type of the object you pass in (here, data.frame), and will list each column variable along with its data type.

With a small data set such as yours, head() and tail() are not incredibly useful, but imagine if you had a data frame of hundreds or thousands of rows!
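For a quick sense of size before looking at any rows, nrow() and ncol() report how many rows and columns a data frame has. head() also accepts a negative n, meaning "all rows except the last n". A short sketch with the seven-row cash data from the previous exercise:

```r
cash <- data.frame(
  company = c("A", "A", "A", "B", "B", "B", "B"),
  cash_flow = c(1000, 4000, 550, 1500, 1100, 750, 6000),
  year = c(1, 3, 4, 1, 2, 4, 5)
)

nrow(cash)
## [1] 7
ncol(cash)
## [1] 3

# All rows except the last 5, i.e. the first 2
head(cash, n = -5)
```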

Instructions

  • Call head() on cash to see the first 4 rows.
  • Call tail() on cash to see the last 3 rows.
  • Call str() on cash to check out the structure of your data frame. (You might notice that the class of company is a Factor and not a character. This happens in R versions before 4.0.0, where data.frame() converts text columns to factors by default. Do not fear! Factors will be covered in Chapter 4. For now, don't worry about it.)
# Call head() for the first 4 rows
head(cash, n = 4)
##   company cash_flow year
## 1       A      1000    1
## 2       A      4000    3
## 3       A       550    4
## 4       B      1500    1
# Call tail() for the last 3 rows
tail(cash, n = 3)
##   company cash_flow year
## 5       B      1100    2
## 6       B       750    4
## 7       B      6000    5
# Call str()
str(cash)
## 'data.frame':    7 obs. of  3 variables:
##  $ company  : Factor w/ 2 levels "A","B": 1 1 1 2 2 2 2
##  $ cash_flow: num  1000 4000 550 1500 1100 750 6000
##  $ year     : num  1 3 4 1 2 4 5

Naming your columns / rows

Let's look at cash again:

cash

  comp cash yr
1    A 1000  1
2    A 4000  3
3    A  550  4
4    B 1500  1
5    B 1100  2
6    B  750  4
7    B 6000  5

Wait, that's not right! It looks like someone has changed your column names! Don't worry, you can change them back using colnames(), just as you used names() with vectors.

Similarly, you can change the row names using rownames(), but this is less common.
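You can also fix a single column name by indexing into colnames(), and for data frames names() returns the same thing as colnames(). This sketch uses a hypothetical two-row version of the scrambled cash data shown above.

```r
# Hypothetical two-row data frame with scrambled column names
cash <- data.frame(comp = c("A", "B"), cash = c(1000, 1500), yr = c(1, 1))

# Rename only the second column
colnames(cash)[2] <- "cash_flow"
colnames(cash)
## [1] "comp"      "cash_flow" "yr"

# For data frames, names() and colnames() agree
identical(names(cash), colnames(cash))
## [1] TRUE

# Row names work the same way (less common)
rownames(cash) <- c("first", "second")
```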

Instructions

  • The altered data frame cash is in your workspace.
  • Fix your column names by using colnames() and assigning a character vector of "company", "cash_flow", and "year" in that order.
  • Print out the fixed colnames() of cash.
# Fix your column names
colnames(cash) <- c("company", "cash_flow", "year")

# Print out the column names of cash
colnames(cash)
## [1] "company"   "cash_flow" "year"

Accessing and subsetting data frames (1)

Even more often than with vectors, you are going to want to subset your data frame or access certain columns. Again, one of the ways to do this is to use [ ]. The notation is just like the one for matrices! Here are some examples:

Select the first row: cash[1, ]

Select the first column: cash[ ,1]

Select the first column by name: cash[ ,"company"]
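Matrix-style indexing also lets you pick several rows and columns at once. Note what comes back in each case: selecting one column gives a vector, while selecting a single row keeps the data frame structure. A sketch using the seven-row cash data from earlier:

```r
cash <- data.frame(
  company = c("A", "A", "A", "B", "B", "B", "B"),
  cash_flow = c(1000, 4000, 550, 1500, 1100, 750, 6000),
  year = c(1, 3, 4, 1, 2, 4, 5)
)

# Rows 1 and 4, only the company and year columns
cash[c(1, 4), c("company", "year")]

# A single column comes back as a vector
class(cash[, "cash_flow"])
## [1] "numeric"

# A single row stays a data frame
class(cash[1, ])
## [1] "data.frame"
```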

Instructions

  • Select the third row and second column of cash.
  • Select the fifth row of the "year" column of cash.
# Third row, second column
cash[3, 2]
## [1] 550
# Fifth row of the "year" column
cash[5, "year"]
## [1] 2

Accessing and subsetting data frames (2)

As you might imagine, selecting a specific column from a data frame is a common manipulation. So common, in fact, that it was given its own shortcut, the $. The following return the same answer:

cash$cash_flow

[1] 1000 4000  550 1500 1100  750 6000

cash[,"cash_flow"]

[1] 1000 4000  550 1500 1100  750 6000

Useful, right? Try it out!
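Double square brackets, as in cash[["cash_flow"]], are a third way to pull out a single column as a vector. Unlike $, they also accept a column name stored in a variable. A short sketch with a small cash data frame; col_to_get is a hypothetical helper variable.

```r
cash <- data.frame(company = c("A", "A", "B"),
                   cash_flow = c(100, 200, 300),
                   year = c(1, 3, 2))

# The column name is stored in a variable
col_to_get <- "cash_flow"

by_dollar <- cash$cash_flow
by_brackets <- cash[[col_to_get]]
identical(by_dollar, by_brackets)
## [1] TRUE

# cash$col_to_get would return NULL: there is no column literally named col_to_get
```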

Instructions

  • Select the "year" column from cash using $.
  • Select the "cash_flow" column from cash using $ and multiply it by 2.
  • You can delete a column by assigning it NULL. Run the code that deletes "company".
  • Now print out cash again.
# Select the year column
cash$year
## [1] 1 3 4 1 2 4 5
# Select the cash_flow column and multiply by 2
cash$cash_flow * 2
## [1]  2000  8000  1100  3000  2200  1500 12000
# Delete the company column
cash$company <- NULL

# Print cash again
cash
##   cash_flow year
## 1      1000    1
## 2      4000    3
## 3       550    4
## 4      1500    1
## 5      1100    2
## 6       750    4
## 7      6000    5
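To check these three ideas outside the exercise, here is a sketch on a rebuilt stand-in for cash. It confirms that $ and [ , "name"] return the same vector. It shows that arithmetic on a column applies to every element, and that assigning NULL removes a column.

```r
cash <- data.frame(company = c("A", "A", "A", "B", "B", "B", "B"),
                   cash_flow = c(1000, 4000, 550, 1500, 1100, 750, 6000),
                   year = c(1, 3, 4, 1, 2, 4, 5),
                   stringsAsFactors = FALSE)

# $ is shorthand for selecting one column by name
same <- identical(cash$cash_flow, cash[ , "cash_flow"])
same

# Arithmetic on a column is applied element by element
doubled <- cash$cash_flow * 2
doubled

# Assigning NULL removes the column entirely
cash$company <- NULL
names(cash)
```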

Accessing and subsetting data frames (3)

Often, simply selecting a column from a data frame is not all you want to do. What if you are only interested in the cash flows from company A? For more flexibility, try subset()!

subset(cash, company == "A")

  company cash_flow year
1       A      1000    1
2       A      4000    3
3       A       550    4

There are a few important things happening here:

The first argument you pass to subset() is the name of your data frame, cash. The second argument is the condition a row must satisfy to be kept. Notice that you shouldn't put company in quotes: subset() looks the column up inside the data frame for you. The == is the equality operator. It tests where two things are equal and returns a logical vector. There is a lot more to learn about these relational operators, and you can learn all about them in the second finance course, Intermediate R for Finance!
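This sketch uses a rebuilt stand-in for cash to show what subset() does with its condition. The == test produces one TRUE or FALSE per row, and subset() keeps the TRUE rows. Indexing the rows with that logical vector directly gives the same rows. One difference is worth knowing: subset() also discards rows where the condition is NA, which plain [ ] indexing does not.

```r
cash <- data.frame(company = c("A", "A", "A", "B", "B", "B", "B"),
                   cash_flow = c(1000, 4000, 550, 1500, 1100, 750, 6000),
                   year = c(1, 3, 4, 1, 2, 4, 5),
                   stringsAsFactors = FALSE)

# One TRUE/FALSE per row: is this row about company A?
is_A <- cash$company == "A"
is_A

# subset() keeps the rows where the condition is TRUE ...
subset_way <- subset(cash, company == "A")

# ... which matches indexing the rows with the logical vector directly
bracket_way <- cash[is_A, ]
nrow(subset_way)
identical(subset_way$cash_flow, bracket_way$cash_flow)
```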

Instructions

  • Use subset() to select only the rows of cash corresponding to company B.
  • Now subset() rows that have cash flows due in 1 year.

# Rows about company B
subset(cash, company == "B")
##   cash_flow year
## 4      1500    1
## 5      1100    2
## 6       750    4
## 7      6000    5
# Rows with cash flows due in 1 year
subset(cash, year == 1)
##   cash_flow year
## 1      1000    1
## 4      1500    1

Adding new columns

In a perfect world, you could be 100% certain that you will receive all of your cash flows. But, since these are predictions about the future, there is always a chance that someone won't be able to pay! You decide to analyze a worst-case scenario in which you receive only half of your expected cash flow. To save this scenario for later analysis, you decide to add it as a new column to the data frame!

cash$half_cash <- cash$cash_flow * .5

cash

  company cash_flow year half_cash
1       A      1000    1       500
2       A      4000    3      2000
3       A       550    4       275
4       B      1500    1       750
5       B      1100    2       550
6       B       750    4       375
7       B      6000    5      3000

And that's it! Creating new columns in your data frame is as simple as assigning the new information to data_frame$new_column. Often, the newly created column is some transformation of existing columns, so the $ operator really comes in handy here!
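Here is a quick self-contained check of the half_cash example on a stand-in cash rebuilt from the printed values. Assigning to a column name that does not exist yet adds one more column. The multiplication is applied row by row.

```r
cash <- data.frame(company = c("A", "A", "A", "B", "B", "B", "B"),
                   cash_flow = c(1000, 4000, 550, 1500, 1100, 750, 6000),
                   year = c(1, 3, 4, 1, 2, 4, 5),
                   stringsAsFactors = FALSE)

# A new column computed from an existing one, one value per row
cash$half_cash <- cash$cash_flow * .5

ncol(cash)        # one more column than before: 4
cash$half_cash
```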

Instructions

  • Create a new worst-case scenario where you receive only 25% of your expected cash flow, and add it to the data frame as quarter_cash.
  • What if it took twice as long (in terms of year) to receive your money? Add a new column double_year with this scenario.

# Quarter cash flow scenario
cash$quarter_cash <- cash$cash_flow * .25

# Double year scenario
cash$double_year <- cash$year * 2

Present value of projected cash flows (1)

Time for some analysis! Earlier, Lore introduced the idea of present value. You will use that idea in the next two exercises, so here is another example.

If you expect a cash flow of $100 to be received 1 year from now, what is the present value of that cash flow at a 5% interest rate? To calculate this, you discount the cash flow to get it in terms of today's dollars. The general formula for this is:

present_value <- cash_flow * (1 + interest / 100) ^ -year

95.238 = 100 * (1.05) ^ -1

Another way to think about this is to reverse the problem. If you have $95.238 today, and it earns 5% over the next year, how much money do you have at the end of the year? We know how to do this problem from way back in chapter 1! Find the multiplier that corresponds to 5% and multiply by $95.238!

100 = 95.238 * (1.05)

Aha! To discount your money, just do the reverse of what you did with stock returns in chapter 1.
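The $100 example can be worked through directly with the general formula. The sketch below discounts the cash flow and then reverses the problem by growing the result at 5% for one year. The reversal recovers the original $100, up to floating-point rounding.

```r
cash_flow <- 100
interest  <- 5      # percent per year
year      <- 1

# Discount: divide by the 5% growth multiplier once per year
present_value <- cash_flow * (1 + interest / 100) ^ -year
round(present_value, 3)          # 95.238

# Reverse: grow today's value by the 5% multiplier for one year
future_value <- present_value * (1 + interest / 100)
all.equal(future_value, cash_flow)
```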

Instructions

  • If you expect to receive $4000 in 3 years, at a 5% interest rate, what is the present value of that money? Follow the general formula above and assign the result to present_value_4k.
  • Using vectors, you can calculate the present value of the entire column of cash_flow at once! Use cash$cash_flow, cash$year and the general formula to calculate the present value of all of your cash flows at 5% interest. Add it to cash as the column present_value.
  • Print out cash to see your new column.
# Present value of $4000, in 3 years, at 5%
present_value_4k <- 4000 * (1.05) ^ -3

# Present value of all cash flows
cash$present_value <- cash$cash_flow * (1.05) ^ -cash$year

# Print out cash
cash
##   cash_flow year quarter_cash double_year present_value
## 1      1000    1        250.0           2      952.3810
## 2      4000    3       1000.0           6     3455.3504
## 3       550    4        137.5           8      452.4864
## 4      1500    1        375.0           2     1428.5714
## 5      1100    2        275.0           4      997.7324
## 6       750    4        187.5           8      617.0269
## 7      6000    5       1500.0          10     4701.1570
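This sketch makes the vectorized calculation explicit on a stand-in with only the cash_flow and year columns. Each cash flow is discounted by its own number of years in a single expression. Row 2 reproduces the $4000-in-3-years value from the first instruction.

```r
cash <- data.frame(cash_flow = c(1000, 4000, 550, 1500, 1100, 750, 6000),
                   year = c(1, 3, 4, 1, 2, 4, 5))
interest <- 5

# Vectorized: each element of cash_flow is paired with its own year
cash$present_value <- cash$cash_flow * (1 + interest / 100) ^ -cash$year

# Row 2 is the same calculation as the single $4000, 3-year example
present_value_4k <- 4000 * (1.05) ^ -3
all.equal(cash$present_value[2], present_value_4k)

round(cash$present_value, 4)
```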

Present value of projected cash flows (2)

Amazing! You are almost done with this chapter, and you are becoming a true wizard of data frames and finance. Before you move on, let's answer a few more questions.

You now have a column for present_value, but you want to report the total amount of that column to your board members. Calculating this is easy: use the sum() function you learned earlier to add up the elements of cash$present_value.

However, you also want to know how much company A and company B individually contribute to the total present value. Do you remember how to separate the rows of your data frame to only include company A or B?

cash_A <- subset(cash, company == "A")

sum(cash_A$present_value)

[1] 4860.218

Instructions

  • Use the sum() function to calculate the total present_value of cash. Assign it to total_pv.
  • Subset cash to only include rows about company B to create cash_B.
  • Use sum() and cash_B to calculate the total present_value from company B. Assign it to total_pv_B.
# Total present value of cash
total_pv <- sum(cash$present_value)

# Company B information
cash_B <- subset(cash, company == "B")

# Total present value of cash_B
total_pv_B <- sum(cash_B$present_value)
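As a sanity check on the per-company totals, here is a sketch on a rebuilt stand-in for cash. It computes both company subtotals with subset() and sum(). Every row belongs to exactly one company, so the two subtotals add back up to the overall total, and company A matches the 4860.218 shown in the example.

```r
cash <- data.frame(company = c("A", "A", "A", "B", "B", "B", "B"),
                   cash_flow = c(1000, 4000, 550, 1500, 1100, 750, 6000),
                   year = c(1, 3, 4, 1, 2, 4, 5),
                   stringsAsFactors = FALSE)
cash$present_value <- cash$cash_flow * (1.05) ^ -cash$year

total_pv   <- sum(cash$present_value)
total_pv_A <- sum(subset(cash, company == "A")$present_value)
total_pv_B <- sum(subset(cash, company == "B")$present_value)

round(total_pv_A, 3)   # 4860.218

# The company subtotals partition the overall total
all.equal(total_pv_A + total_pv_B, total_pv)
```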

Factors

Create a factor

Bond credit ratings are common on the fixed income side of the finance world as a simple measure of how "risky" a certain bond might be. Here, riskiness can be defined as the probability of default, which means an inability to pay back your debts. The Standard and Poor's and Fitch credit rating agencies use the following ratings, from least likely to default to most likely:

AAA, AA, A, BBB, BB, B, CCC, CC, C, D

This is a perfect example of a factor! It is a categorical variable that takes on a limited number of levels.

To create a factor in R, use the factor() function, and pass in a vector that you want to be converted into a factor.

Suppose you have a portfolio of 7 bonds with these credit ratings:

credit_rating <- c("AAA", "AA", "A", "BBB", "AA", "BBB", "A")

To create a factor from this:

factor(credit_rating)

[1] AAA AA  A   BBB AA  BBB A  
Levels: A AA AAA BBB

A new character vector, credit_rating, has been created for you in the code for this exercise.

Instructions

  • Turn credit_rating into a factor using factor(). Assign it to credit_factor.
  • Print out credit_factor.
  • Call str() on credit_rating to note the structure.
  • Call str() on credit_factor and compare the structure to credit_rating.
# credit_rating character vector
credit_rating <- c("BB", "AAA", "AA", "CCC", "AA", "AAA", "B", "BB")

# Create a factor from credit_rating
credit_factor <- factor(credit_rating)

# Print out your new factor
credit_factor
## [1] BB  AAA AA  CCC AA  AAA B   BB 
## Levels: AA AAA B BB CCC
# Call str() on credit_rating
str(credit_rating)
##  chr [1:8] "BB" "AAA" "AA" "CCC" "AA" "AAA" "B" "BB"
# Call str() on credit_factor
str(credit_factor)
##  Factor w/ 5 levels "AA","AAA","B",..: 4 2 1 5 1 2 3 4
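The str() output hints at how a factor is stored: a set of levels plus one integer code per element. This sketch pulls out those codes with as.integer(). It then uses them to index the levels, which recovers the original strings.

```r
credit_rating <- c("BB", "AAA", "AA", "CCC", "AA", "AAA", "B", "BB")
credit_factor <- factor(credit_rating)

levels(credit_factor)      # "AA" "AAA" "B" "BB" "CCC" (alphabetical)

# The integer codes are positions in levels(), as str() showed
codes <- as.integer(credit_factor)
codes                      # 4 2 1 5 1 2 3 4

# Indexing the levels by the codes gives back the original character vector
identical(levels(credit_factor)[codes], credit_rating)
```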

Factor levels

Accessing the unique levels of your factor is simple enough by using the levels() function. You can also use this to rename your factor levels!

credit_factor

[1] AAA AA  A   BBB AA  BBB A  
Levels: A AA AAA BBB

levels(credit_factor)

[1] "A"   "AA"  "AAA" "BBB"

levels(credit_factor) <- c("1A", "2A", "3A", "3B")

credit_factor

[1] 3A 2A 1A 3B 2A 3B 1A
Levels: 1A 2A 3A 3B

The credit_factor variable you created in the last exercise is available in your workspace.

Instructions

  • Use levels() on credit_factor to identify the unique levels.
  • Using the same "1A", "2A" notation as in the example, rename the levels of credit_factor. Pay close attention to the level order!
  • Print the renamed credit_factor.
# Identify unique levels
levels(credit_factor)
## [1] "AA"  "AAA" "B"   "BB"  "CCC"
# Rename the levels of credit_factor
levels(credit_factor) <- c("2A", "3A", "1B", "2B", "3C")

# Print credit_factor
credit_factor
## [1] 2B 3A 2A 3C 2A 3A 1B 2B
## Levels: 2A 3A 1B 2B 3C
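Renaming by position, as above, depends on getting the level order exactly right. One way to avoid that is a named lookup vector, so each current level is matched by its name. The table new_names below is a hypothetical helper, not part of the exercise. It produces the same result as the positional rename.

```r
credit_rating <- c("BB", "AAA", "AA", "CCC", "AA", "AAA", "B", "BB")
credit_factor <- factor(credit_rating)

# Hypothetical lookup table: old level name -> new label (any order)
new_names <- c(AAA = "3A", AA = "2A", BB = "2B", B = "1B", CCC = "3C")

# Look each current level up by name, so the order of new_names is irrelevant
levels(credit_factor) <- unname(new_names[levels(credit_factor)])
credit_factor
```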

Factor summary

As any good bond investor would do, you would like to keep track of how many bonds you are holding of each credit rating. A way to present a table of the counts of each bond credit rating would be great! Luckily for you, the summary() function for factors can help you with that.

The character vector credit_rating and the factor credit_factor are both in your workspace.

Instructions

  • First call summary() on credit_rating. Does this seem useful?
  • Now try summary() again, but this time on credit_factor.
# Summarize the character vector, credit_rating
summary(credit_rating)
##    Length     Class      Mode 
##         8 character character
# Summarize the factor, credit_factor
summary(credit_factor)
## 2A 3A 1B 2B 3C 
##  2  2  1  2  1
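To see where the factor counts come from, this sketch compares summary() on a factor with table(). It uses the original (un-renamed) credit ratings. For a factor, summary() reports one count per level, and these agree with table().

```r
credit_rating <- c("BB", "AAA", "AA", "CCC", "AA", "AAA", "B", "BB")
credit_factor <- factor(credit_rating)

summary(credit_rating)   # only Length / Class / Mode for a character vector

# One count per level, named by level
counts <- summary(credit_factor)
counts

all(counts == table(credit_factor))
```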

Visualize your factor

You can also visualize the table that you created in the last example by using a bar chart. A bar chart is a type of graph that displays groups of data using rectangular bars where the height of each bar represents the number of counts in that group.

The plot() function can again take care of all of the magic for you. Check it out!

Note that in the example below, you are creating the plot from a factor and not a character vector. R will throw an error if you try to plot a character vector!

Instructions

  • The factor credit_factor is in your workspace.
  • Plot credit_factor to create your first bar chart!
# Visualize your factor!
plot(credit_factor)

Bucketing a numeric variable into a factor

Your old friend Dan sent you a list of 50 AAA-rated bonds called AAA_rank, with each bond having an additional number from 1 to 100 describing how profitable he thinks that bond will be (100 being the most profitable). You are interested in doing further analysis on his suggestions, but first it would be nice if the bonds were bucketed by their ranking somehow. This would help you create groups of bonds, from least profitable to most profitable, to more easily analyze them.

This is a great example of creating a factor from a numeric vector. The easiest way to do this is to use cut(). Below, Dan's 1-100 ranking is bucketed into 5 evenly spaced groups. Note that the ( in the factor levels means we do not include the number beside it in that group, and the ] means that we do include that number in the group.

head(AAA_rank)

[1]  31  48 100  53  85  73

AAA_factor <- cut(x = AAA_rank, breaks = c(0, 20, 40, 60, 80, 100))

head(AAA_factor)

[1] (20,40]  (40,60]  (80,100] (40,60]  (80,100] (60,80] 
Levels: (0,20] (20,40] (40,60] (60,80] (80,100]

In the cut() function, using breaks = allows you to specify the groups that you want R to bucket your data by!
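The ( and ] rules matter right at the bucket edges. The ranks below are hypothetical values chosen to sit on and just past those edges; they are not Dan's data. The sketch also shows cut()'s labels argument, which names the buckets directly. In it, seq(0, 100, by = 25) builds the 4-bucket breaks.

```r
# Hypothetical rankings, chosen to sit on or just past bucket edges
ranks <- c(20, 21, 100, 1, 60)

buckets <- cut(x = ranks, breaks = c(0, 20, 40, 60, 80, 100))
buckets
# 20 lands in (0,20] because "]" includes the right edge;
# 21 lands in (20,40] because "(" excludes the left edge

# 0 is excluded from (0,20], so it falls in no bucket and becomes NA
cut(x = 0, breaks = c(0, 20, 40, 60, 80, 100))

# labels = names the buckets at creation; breaks are 0, 25, 50, 75, 100
named <- cut(x = ranks, breaks = seq(0, 100, by = 25),
             labels = c("low", "medium", "high", "very_high"))
named
```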

Instructions

  • Instead of 5 buckets, can you create just 4? In breaks = use a vector from 0 to 100 where each element is 25 numbers apart. Assign it to AAA_factor.
  • The 4 buckets do not have very descriptive names. Use levels() to rename the levels to "low", "medium", "high", and "very_high", in that order.
  • Print the newly named AAA_factor.
  • Plot the AAA_factor to visualize your work!
# Create 4 buckets for AAA_rank using cut()
AAA_factor <- cut(x = AAA_rank, breaks = c(0, 25, 50, 75, 100))

# Rename the levels 
levels(AAA_factor) <- c("low", "medium", "high", "very_high")

# Print AAA_factor
AAA_factor

# Plot AAA_factor
plot(AAA_factor)

Outputs:

> # Create 4 buckets for AAA_rank using cut()
> AAA_factor <- cut(x = AAA_rank, breaks = c(0, 25, 50, 75, 100))
> 
> # Rename the levels
> levels(AAA_factor) <- c("low", "medium", "high", "very_high")
> 
> # Print AAA_factor
> AAA_factor
 [1] medium    medium    very_high high      very_high high      high     
 [8] high      medium    medium    very_high high      medium    very_high
[15] medium    low       medium    low       high      medium    low      
[22] medium    high      very_high very_high very_high medium    very_high
[29] low       low       low       medium    very_high low       very_high
[36] low       very_high low       low       high      medium    medium   
[43] medium    low       low       low       low       medium    medium   
[50] medium   
Levels: low medium high very_high
> 
> # Plot AAA_factor
> plot(AAA_factor)

Create an ordered factor

Look at the plot created over on the right. It looks great, but look at the order of the bars! No order was specified when you created the factor, so, when R tried to plot it, it just placed the levels in alphabetical order. By now, you know that there is an order to credit ratings, and your plots should reflect that!

As a reminder, the order of credit ratings from least risky to most risky is:

AAA, AA, A, BBB, BB, B, CCC, CC, C, D

To order your factor, there are two options.

When creating a factor, specify ordered = TRUE and add unique levels in order from least to greatest:

credit_rating <- c("AAA", "AA", "A", "BBB", "AA", "BBB", "A")

credit_factor_ordered <- factor(credit_rating, ordered = TRUE, 
                                levels = c("AAA", "AA", "A", "BBB"))

For an existing unordered factor like credit_factor, use the ordered() function:

ordered(credit_factor, levels = c("AAA", "AA", "A", "BBB"))

Both ways result in:

credit_factor_ordered

[1] AAA AA  A   BBB AA  BBB A  
Levels: AAA < AA < A < BBB

Notice the < specifying the order of the levels that was not there before!
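This sketch checks that the two options really give the same object. It then shows what the < ordering buys you: comparisons between elements and max(). Here "greater" means "riskier", because of the level order chosen.

```r
credit_rating <- c("AAA", "AA", "A", "BBB", "AA", "BBB", "A")
risk_order <- c("AAA", "AA", "A", "BBB")

option_1 <- factor(credit_rating, ordered = TRUE, levels = risk_order)
option_2 <- ordered(factor(credit_rating), levels = risk_order)
identical(option_1, option_2)

# Ordered factors support comparisons and max()/min()
option_1[1] < option_1[4]   # AAA < BBB
max(option_1)               # riskiest rating present
```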

Instructions

  • The character vector credit_rating is in your workspace.
  • Use the unique() function with credit_rating to print only the unique words in the character vector. These will be your levels.
  • Use factor() to create an ordered factor for credit_rating and store it as credit_factor_ordered. Make sure to list the levels from least to greatest in terms of risk!
  • Plot credit_factor_ordered and note the new order of the bars.
# Use unique() to find unique words
unique(credit_rating)
## [1] "BB"  "AAA" "AA"  "CCC" "B"
# Create an ordered factor
credit_factor_ordered <- factor(credit_rating, ordered = TRUE, levels = c("AAA", "AA", "BB", "B", "CCC"))

# Plot credit_factor_ordered
plot(credit_factor_ordered)

Subsetting a factor

You can subset factors in the same way that you subset vectors. As usual, [ ] is the key! However, R has some interesting behavior when you want to remove a factor level from your analysis. For example, what if you wanted to remove the AAA bond from your portfolio?

credit_factor

[1] AAA AA  A   BBB AA  BBB A  
Levels: BBB < A < AA < AAA

credit_factor[-1]

[1] AA  A   BBB AA  BBB A  
Levels: BBB < A < AA < AAA

R removed the AAA bond at the first position, but left the AAA level behind! If you were to plot this, you would end up with the bar chart over to the right. A better plan would have been to tell R to drop the AAA level entirely. To do that, add drop = TRUE:

credit_factor[-1, drop = TRUE]

[1] AA  A   BBB AA  BBB A  
Levels: BBB < A < AA

That's what you wanted!

Instructions

  • Using the same data, remove the "A" bonds from positions 3 and 7 of credit_factor. For now, do not use drop = TRUE. Assign this to keep_level.
  • Plot keep_level.
  • Now, remove "A" from credit_factor again, but this time use drop = TRUE. Assign this to drop_level.
  • Plot drop_level.
# Remove the A bonds at positions 3 and 7. Don't drop the A level.
keep_level <- credit_factor[-c(3,7)]

# Plot keep_level
plot(keep_level)

# Remove the A bonds at positions 3 and 7. Drop the A level.
drop_level <- credit_factor[-c(3,7), drop = TRUE]

# Plot drop_level
plot(drop_level)
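Plots aside, this sketch shows the leftover level directly. It uses the ordered credit_factor from the example. After a plain [-1], table() still lists AAA with a count of 0, which is the empty bar in the plot. Base R's droplevels() is an alternative that removes unused levels after the fact, and it gives the same levels as drop = TRUE.

```r
credit_factor <- factor(c("AAA", "AA", "A", "BBB", "AA", "BBB", "A"),
                        ordered = TRUE, levels = c("BBB", "A", "AA", "AAA"))

kept <- credit_factor[-1]                  # AAA bond gone, AAA level kept
levels(kept)
table(kept)                                # AAA shows a count of 0

dropped <- credit_factor[-1, drop = TRUE]  # unused AAA level removed
levels(dropped)

# droplevels() removes unused levels from an existing factor
identical(levels(droplevels(kept)), levels(dropped))
```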

stringsAsFactors

Do you remember back in the data frame chapter when you used str() on your cash data frame? This was the output:

str(cash)

'data.frame':    3 obs. of  3 variables:
 $ company  : Factor w/ 2 levels "A","B": 1 1 2
 $ cash_flow: num  100 200 300
 $ year     : num  1 3 2
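
See how the company column has been converted to a factor? Before R 4.0.0, R's default behavior when creating data frames was to convert all character columns into factors. This caused countless novice R users a headache trying to figure out why their character columns were not working properly, but not you! You will be prepared! Since R 4.0.0 the default is stringsAsFactors = FALSE, but you will still run into the old behavior in older R installations and in code written for them.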

To turn off this behavior:

cash <- data.frame(company, cash_flow, year, stringsAsFactors = FALSE)

str(cash)

'data.frame':    3 obs. of  3 variables:
 $ company  : chr  "A" "A" "B"
 $ cash_flow: num  100 200 300
 $ year     : num  1 3 2

Instructions

  • Two variables, credit_rating and bond_owners, have been defined for you. bond_owners is a character vector of the names of some of your friends.
  • Create a data frame named bonds from credit_rating and bond_owners, in that order, and use stringsAsFactors = FALSE.
  • Use str() to confirm that both columns are characters. bond_owners would not be a useful factor, but credit_rating could be! Using $, create a new column in bonds called credit_factor that holds credit_rating as a correctly ordered factor.
  • Use str() again to confirm that credit_factor is an ordered factor.
# Variables
credit_rating <- c("AAA", "A", "BB")
bond_owners <- c("Dan", "Tom", "Joe")

# Create the data frame of character vectors, bonds
bonds <- data.frame(credit_rating, bond_owners, stringsAsFactors = FALSE)

# Use str() on bonds
str(bonds)
## 'data.frame':    3 obs. of  2 variables:
##  $ credit_rating: chr  "AAA" "A" "BB"
##  $ bond_owners  : chr  "Dan" "Tom" "Joe"
# Create a factor column in bonds called credit_factor from credit_rating
bonds$credit_factor <- factor(bonds$credit_rating, ordered = TRUE, levels = c("AAA","A","BB"))

# Use str() on bonds again
str(bonds)
## 'data.frame':    3 obs. of  3 variables:
##  $ credit_rating: chr  "AAA" "A" "BB"
##  $ bond_owners  : chr  "Dan" "Tom" "Joe"
##  $ credit_factor: Ord.factor w/ 3 levels "AAA"<"A"<"BB": 1 2 3
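Because the default for stringsAsFactors changed in R 4.0.0, code that depends on it is easiest to read when the argument is spelled out. This sketch uses the three-row cash values from the str() example. It builds the data frame both ways and then converts the character column to a factor deliberately, in a new column named company_factor (a name chosen for illustration).

```r
company   <- c("A", "A", "B")
cash_flow <- c(100, 200, 300)
year      <- c(1, 3, 2)

# Spell the argument out so the result does not depend on the R version
as_factors <- data.frame(company, cash_flow, year, stringsAsFactors = TRUE)
as_chars   <- data.frame(company, cash_flow, year, stringsAsFactors = FALSE)

class(as_factors$company)   # "factor"
class(as_chars$company)     # "character"

# A character column can still become a factor later, on purpose
as_chars$company_factor <- factor(as_chars$company)
levels(as_chars$company_factor)
```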

Lists

Create a list

Just like a grocery list, lists in R can be used to hold together items of different data types. Creating a list is, you guessed it, as simple as using the list() function. You could say that a list is a kind of super data type: you can store practically any piece of information in it! Create a list like so:

words <- c("I <3 R")
numbers <- c(42, 24)

my_list <- list(words, numbers)

my_list

[[1]]
[1] "I <3 R"

[[2]]
[1] 42 24
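To show the "super data type" idea concretely, this sketch puts four different kinds of object into one list. They are a character vector, a numeric vector, a logical value, and a small matrix, and each keeps its own type inside the list.

```r
words   <- c("I <3 R")
numbers <- c(42, 24)

# Four elements of four different kinds in one list
my_list <- list(words, numbers, TRUE, matrix(1:4, nrow = 2))

length(my_list)          # 4
class(my_list[[2]])      # "numeric"
is.matrix(my_list[[4]])  # TRUE
```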

Below, you will create your first list from some of the data you have already worked with!

Instructions

  • The 4 components for your list have been created for you.
  • Use list() to create a list of name, apple, ibm, and cor_matrix, in that order, and assign it to portfolio.
  • Print your portfolio.
# List components
name <- "Apple and IBM"
apple <- c(109.49, 109.90, 109.11, 109.95, 111.03)
ibm <- c(159.82, 160.02, 159.84, 160.35, 164.79)
cor_matrix <- cor(cbind(apple, ibm))

# Create a list
portfolio <- list(name, apple, ibm, cor_matrix)

# View your first list
portfolio
## [[1]]
## [1] "Apple and IBM"
## 
## [[2]]
## [1] 109.49 109.90 109.11 109.95 111.03
## 
## [[3]]
## [1] 159.82 160.02 159.84 160.35 164.79
## 
## [[4]]
##           apple       ibm
## apple 1.0000000 0.9131575
## ibm   0.9131575 1.0000000

Named lists

Knowing how forgetful you are, you decide it would be important to add names to your list so you can remember what each element is describing. There are two ways to do this!

You could name the elements as you create the list with the form name = value:

my_list <- list(my_words = words, my_numbers = numbers)

Or, if the list was already created, you could use names():

my_list <- list(words, numbers)
names(my_list) <- c("my_words", "my_numbers") 

Both would result in:

my_list

$my_words
[1] "I <3 R"

$my_numbers
[1] 42 24
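The claim that both naming approaches give the same result can be checked directly with identical().

```r
words   <- c("I <3 R")
numbers <- c(42, 24)

# Option 1: name = value while creating the list
named_at_creation <- list(my_words = words, my_numbers = numbers)

# Option 2: create the list first, then assign names()
named_afterwards <- list(words, numbers)
names(named_afterwards) <- c("my_words", "my_numbers")

identical(named_at_creation, named_afterwards)   # TRUE
names(named_at_creation)
```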

Instructions

  • The portfolio list is available to work with.
  • Use names() to add the following names to your list: "portfolio_name", "apple", "ibm", "correlation", in that order.
  • Print portfolio to see your newly named list.
# Add names to your portfolio
names(portfolio) <- c("portfolio_name", "apple", "ibm", "correlation")

# Print the named portfolio
portfolio
## $portfolio_name
## [1] "Apple and IBM"
## 
## $apple
## [1] 109.49 109.90 109.11 109.95 111.03
## 
## $ibm
## [1] 159.82 160.02 159.84 160.35 164.79
## 
## $correlation
##           apple       ibm
## apple 1.0000000 0.9131575
## ibm   0.9131575 1.0000000

Access elements in a list

Subsetting a list is similar to subsetting a vector or data frame, with one extra useful operation.

To access the elements in the list, use [ ]. This will always return another list.

my_list[1]

$my_words
[1] "I <3 R"

my_list[c(1,2)]

$my_words
[1] "I <3 R"

$my_numbers
[1] 42 24

To pull out the data inside each element of your list, use [[ ]].

my_list[[1]]

[1] "I <3 R"

If your list is named, you can use the $ operator: my_list$my_words. This is the same as using [[ ]] to return the inner data.
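This sketch contrasts [ ] and [[ ]] by checking the class of what each returns. It also shows that $ and [[ ]] with the element's name are equivalent, and that ordinary vector indexing can follow [[ ]].

```r
my_list <- list(my_words = "I <3 R", my_numbers = c(42, 24))

class(my_list[1])      # "list": [ ] keeps the list wrapper
class(my_list[[1]])    # "character": [[ ]] returns the element itself

# $ and [[ ]] with the element's name are equivalent
identical(my_list$my_numbers, my_list[["my_numbers"]])

# [[ ]] can be followed by ordinary vector indexing
my_list[[2]][1]        # 42
```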

Instructions

  • The portfolio named list is available for use.
  • Access the second and third elements of portfolio using [ ] and c().
  • Use $ to access the correlation data.
# Second and third elements of portfolio
portfolio[c(2,3)]
## $apple
## [1] 109.49 109.90 109.11 109.95 111.03
## 
## $ibm
## [1] 159.82 160.02 159.84 160.35 164.79
# Use $ to get the correlation data
portfolio$correlation
##           apple       ibm
## apple 1.0000000 0.9131575
## ibm   0.9131575 1.0000000

Adding to a list

Once you create a list, you aren't stuck with it forever. You can add new elements to it whenever you want! Say you want to add your friend Dan's favorite movie to your list. You can do so using $ like you did when adding new columns to data frames.

my_list$dans_movie <- "StaR Wars"

my_list

$my_words
[1] "I <3 R"

$my_numbers
[1] 42 24

$dans_movie
[1] "StaR Wars"

You could have also used c() to add another element to the list: my_list <- c(my_list, dans_movie = "StaR Wars"). Note that c() returns a new list, so you need to assign the result back to my_list. This can be useful if you want to add multiple elements to your list at once.
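This sketch confirms that adding with $ and with c() gives the same list. It also shows one c() pitfall when adding several elements at once. A multi-element vector passed to c() is spread into separate list elements, with names like weight1 and weight2. Wrapping it in list() keeps it as a single element. The weight values here are only illustrative.

```r
my_list <- list(my_words = "I <3 R", my_numbers = c(42, 24))

with_dollar <- my_list
with_dollar$dans_movie <- "StaR Wars"

# c() returns a new list; assign it to keep the result
with_c <- c(my_list, dans_movie = "StaR Wars")
identical(with_dollar, with_c)

# Caution: c() spreads a multi-element vector into separate elements ...
spread <- c(my_list, weight = c(0.3, 0.7))
names(spread)      # ends with "weight1" "weight2"
# ... so wrap it in list() to add it as one element
kept_whole <- c(my_list, list(weight = c(0.3, 0.7)))
length(kept_whole) # 3
```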

Instructions

  • Another useful piece of information for your portfolio is the variable weight describing how invested you are in Apple and IBM. Fill in the ___ correctly so that you are invested 20% in Apple and 80% in IBM. Remember to use decimal numbers, not percentages!
  • Print portfolio to see the weight element.
  • You can change an existing element of a list the same way you add one, using $. Change weight so that you are invested 30% in Apple and 70% in IBM.
  • Print portfolio again to see your changes.
# Add weight: 20% Apple, 80% IBM
portfolio$weight <- c(apple = .2, ibm = .8)

# Print portfolio
portfolio
## $portfolio_name
## [1] "Apple and IBM"
## 
## $apple
## [1] 109.49 109.90 109.11 109.95 111.03
## 
## $ibm
## [1] 159.82 160.02 159.84 160.35 164.79
## 
## $correlation
##           apple       ibm
## apple 1.0000000 0.9131575
## ibm   0.9131575 1.0000000
## 
## $weight
## apple   ibm 
##   0.2   0.8
# Change the weight variable: 30% Apple, 70% IBM
portfolio$weight <- c(apple = .3, ibm = .7)

# Print portfolio to see the changes
portfolio
## $portfolio_name
## [1] "Apple and IBM"
## 
## $apple
## [1] 109.49 109.90 109.11 109.95 111.03
## 
## $ibm
## [1] 159.82 160.02 159.84 160.35 164.79
## 
## $correlation
##           apple       ibm
## apple 1.0000000 0.9131575
## ibm   0.9131575 1.0000000
## 
## $weight
## apple   ibm 
##   0.3   0.7

Removing from a list

The natural next step is to learn how to remove elements from a list. You decide that even though Dan is your best friend, you don't want his info in your list. To remove dans_movie:

my_list$dans_movie <- NULL

my_list

$my_words
[1] "I <3 R"

$my_numbers
[1] 42 24

Using NULL is the easiest way to remove an element from your list! If your list is not named, you can also remove elements by position using my_list[1] <- NULL or my_list[[1]] <- NULL.
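This sketch shows removal by name and by position. It also shows that assigning NULL to an element that does not exist, such as microsoft here, quietly leaves the list unchanged.

```r
my_list <- list(my_words = "I <3 R", my_numbers = c(42, 24),
                dans_movie = "StaR Wars")

# Remove by name
my_list$dans_movie <- NULL
names(my_list)        # "my_words" "my_numbers"

# Remove by position
my_list[[1]] <- NULL
names(my_list)        # "my_numbers"

# Removing a name that is not in the list changes nothing
my_list$microsoft <- NULL
length(my_list)       # 1
```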

Instructions

  • Take a look at your portfolio. It seems that someone has added microsoft stock that you did not buy!
  • Remove the microsoft element of portfolio using NULL.
# Take a look at portfolio
portfolio
## $portfolio_name
## [1] "Apple and IBM"
## 
## $apple
## [1] 109.49 109.90 109.11 109.95 111.03
## 
## $ibm
## [1] 159.82 160.02 159.84 160.35 164.79
## 
## $correlation
##           apple       ibm
## apple 1.0000000 0.9131575
## ibm   0.9131575 1.0000000
## 
## $weight
## apple   ibm 
##   0.3   0.7
# Remove the microsoft stock prices from your portfolio
portfolio$microsoft <- NULL

Split it

Often, you will have data for multiple groups together in one data frame. The cash data frame was an example of this back in Chapter 3. There were cash_flow and year columns for two groups (companies A and B). What if you wanted to split up this data frame into two separate data frames divided by company? In the next exercise, you will explore why you might want to do this, but first let's explore how to make this happen using the split() function.

Create a grouping to split on, and use split() to create a list of two data frames.

grouping <- cash$company
split_cash <- split(cash, grouping)

split_cash 

$A
  company cash_flow year
1       A      1000    1
2       A      4000    3
3       A       550    4

$B
  company cash_flow year
4       B      1500    1
5       B      1100    2
6       B       750    4
7       B      6000    5

To get your original data frame back, use unsplit(split_cash, grouping).
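This self-contained sketch of the split and unsplit round trip uses a stand-in cash rebuilt from the printed values. The list gets one element per group, named after the group value. The check compares columns rather than whole data frames, because the unsplit result may store its row names differently from the original.

```r
cash <- data.frame(company = c("A", "A", "A", "B", "B", "B", "B"),
                   cash_flow = c(1000, 4000, 550, 1500, 1100, 750, 6000),
                   year = c(1, 3, 4, 1, 2, 4, 5),
                   stringsAsFactors = FALSE)

grouping   <- cash$company
split_cash <- split(cash, grouping)

# One list element per group, named after the group value
names(split_cash)        # "A" "B"
nrow(split_cash$A)       # 3
nrow(split_cash$B)       # 4

# unsplit() with the same grouping restores the original row order
original_cash <- unsplit(split_cash, grouping)
identical(original_cash$cash_flow, cash$cash_flow)
```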

Instructions

  • The cash data frame is available in your workspace. Create a new grouping from the year column.
  • Use split() to split cash into a list of 5 data frames separated by year. Assign this to split_cash.
  • Print split_cash.
  • Use unsplit() to combine the data frames again. Assign this to original_cash.
  • Print original_cash to compare to the first cash data frame.
# Define grouping from year
grouping <- cash$year

# Split cash on your new grouping
split_cash <- split(cash, grouping)

# Look at your split_cash list
split_cash
## $`1`
##   cash_flow year quarter_cash double_year present_value
## 1      1000    1          250           2       952.381
## 4      1500    1          375           2      1428.571
## 
## $`2`
##   cash_flow year quarter_cash double_year present_value
## 5      1100    2          275           4      997.7324
## 
## $`3`
##   cash_flow year quarter_cash double_year present_value
## 2      4000    3         1000           6       3455.35
## 
## $`4`
##   cash_flow year quarter_cash double_year present_value
## 3       550    4        137.5           8      452.4864
## 6       750    4        187.5           8      617.0269
## 
## $`5`
##   cash_flow year quarter_cash double_year present_value
## 7      6000    5         1500          10      4701.157
# Unsplit split_cash to get the original data back.
original_cash <- unsplit(split_cash, grouping)

# Print original_cash
original_cash
##   cash_flow year quarter_cash double_year present_value
## 1      1000    1        250.0           2      952.3810
## 2      4000    3       1000.0           6     3455.3504
## 3       550    4        137.5           8      452.4864
## 4      1500    1        375.0           2     1428.5714
## 5      1100    2        275.0           4      997.7324
## 6       750    4        187.5           8      617.0269
## 7      6000    5       1500.0          10     4701.1570

Split-Apply-Combine

A common data science problem is to split your data frame by a grouping, apply some transformation to each group, and then recombine those pieces back into one data frame. This is such a common class of problems in R that it has been given the name split-apply-combine. In Intermediate R for Finance, you will explore a number of these problems and functions that are useful when solving them, but, for now, let's do a simple example.

Suppose, for the cash data frame, you are interested in doubling the cash_flow for company A, and tripling it for company B:

grouping <- cash$company
split_cash <- split(cash, grouping)

# We can access each list element's cash_flow column by:
split_cash$A$cash_flow
[1] 1000 4000  550

split_cash$A$cash_flow <- split_cash$A$cash_flow * 2
split_cash$B$cash_flow <- split_cash$B$cash_flow * 3

new_cash <- unsplit(split_cash, grouping)

Take a look again at how you access the cash_flow column. The first $ is to access the A element of the split_cash list. The second $ is to access the cash_flow column of the data frame in A.
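Running the example end to end on a stand-in cash makes the three steps visible. Company A's flows are doubled and company B's are tripled. unsplit() then puts every row back in its original position.

```r
cash <- data.frame(company = c("A", "A", "A", "B", "B", "B", "B"),
                   cash_flow = c(1000, 4000, 550, 1500, 1100, 750, 6000),
                   year = c(1, 3, 4, 1, 2, 4, 5),
                   stringsAsFactors = FALSE)

# Split
grouping   <- cash$company
split_cash <- split(cash, grouping)

# Apply a different transformation to each group
split_cash$A$cash_flow <- split_cash$A$cash_flow * 2
split_cash$B$cash_flow <- split_cash$B$cash_flow * 3

# Combine back into one data frame, in the original row order
new_cash <- unsplit(split_cash, grouping)
new_cash$cash_flow
```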

Instructions

  • The split_cash list is available for you. Also, the grouping that was used to split cash is available.
  • Print split_cash to get a look at the list.
  • Print the cash_flow column for company B in split_cash.
  • Tragically, you have learned that company A went out of business. Set the cash_flow for company A to 0.
  • Use grouping to unsplit() the split_cash list. Assign this to cash_no_A.
  • Finally, print cash_no_A to see the modified data frame.
# Print split_cash
split_cash
## $`1`
##   cash_flow year quarter_cash double_year present_value
## 1      1000    1          250           2       952.381
## 4      1500    1          375           2      1428.571
## 
## $`2`
##   cash_flow year quarter_cash double_year present_value
## 5      1100    2          275           4      997.7324
## 
## $`3`
##   cash_flow year quarter_cash double_year present_value
## 2      4000    3         1000           6       3455.35
## 
## $`4`
##   cash_flow year quarter_cash double_year present_value
## 3       550    4        137.5           8      452.4864
## 6       750    4        187.5           8      617.0269
## 
## $`5`
##   cash_flow year quarter_cash double_year present_value
## 7      6000    5         1500          10      4701.157
# Print the cash_flow column of B in split_cash
split_cash$B$cash_flow
## NULL
# Set the cash_flow column of company A in split_cash to 0
split_cash$A$cash_flow <- 0

# Use the grouping to unsplit split_cash
cash_no_A <- unsplit(split_cash, grouping)

# Print cash_no_A
cash_no_A
##   cash_flow year quarter_cash double_year present_value
## 1      1000    1        250.0           2      952.3810
## 2      4000    3       1000.0           6     3455.3504
## 3       550    4        137.5           8      452.4864
## 4      1500    1        375.0           2     1428.5714
## 5      1100    2        275.0           4      997.7324
## 6       750    4        187.5           8      617.0269
## 7      6000    5       1500.0          10     4701.1570

Attributes

You have made it to the last exercise in the course! Congrats! Let's finish up with an easy one.

Attributes are a bit of extra metadata about your data structure. Some of the most common attributes are row names and column names, dimensions, and class. You can use the attributes() function to return a list of the attributes of the object you pass in. To access a specific attribute, you can use the attr() function.

Exploring the attributes of cash:

attributes(cash)

$names
[1] "company"   "cash_flow" "year"     

$row.names
[1] 1 2 3 4 5 6 7

$class
[1] "data.frame"

attr(cash, which = "names")

[1] "company"   "cash_flow" "year"     

Instructions

  • The matrix my_matrix and the factor my_factor are defined for you.
  • Use attributes() on my_matrix. Use attr() on my_matrix to return the "dim" attribute.
  • Use attributes() on my_factor.
# my_matrix and my_factor
my_matrix <- matrix(c(1,2,3,4,5,6), nrow = 2, ncol = 3)
rownames(my_matrix) <- c("Row1", "Row2")
colnames(my_matrix) <- c("Col1", "Col2", "Col3")

my_factor <- factor(c("A", "A", "B"), ordered = TRUE, levels = c("A", "B"))

# attributes of my_matrix
attributes(my_matrix)
## $dim
## [1] 2 3
## 
## $dimnames
## $dimnames[[1]]
## [1] "Row1" "Row2"
## 
## $dimnames[[2]]
## [1] "Col1" "Col2" "Col3"
# Just the dim attribute of my_matrix
attr(my_matrix, which = "dim")
## [1] 2 3
# attributes of my_factor
attributes(my_factor)
## $levels
## [1] "A" "B"
## 
## $class
## [1] "ordered" "factor"
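attr() can also appear on the left side of an assignment to set an attribute. This sketch uses a hypothetical vector x to show that a matrix is essentially a vector with a "dim" attribute. It also attaches custom metadata; the "note" attribute and its text are made up for illustration.

```r
# A plain vector has no attributes yet
x <- c(1, 2, 3, 4, 5, 6)
attributes(x)              # NULL

# Setting the "dim" attribute turns it into a 2 x 3 matrix
attr(x, which = "dim") <- c(2, 3)
is.matrix(x)               # TRUE

# Attributes can also hold your own metadata (hypothetical "note")
attr(x, which = "note") <- "prices in USD"
names(attributes(x))       # includes "dim" and "note"
```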

The End of The Module