Intro to R for Finance

Chapter 1

Throughout this chapter I learned the basics of R: first simple addition and subtraction, then assigning variables, calculating financial returns, and determining different data types.

First R Script

Remember that R follows the standard order of operations when evaluating these expressions.
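
As a quick illustration of the order of operations, the sketch below compares a few expressions with and without parentheses. The variable names are made up for this example and are not from the course.

```r
# Multiplication binds tighter than addition
no_parens <- 2 + 3 * 4       # evaluated as 2 + (3 * 4)
with_parens <- (2 + 3) * 4   # parentheses force the addition first

# Exponentiation binds tighter than the unary minus
neg_square <- -2^2           # evaluated as -(2^2)
square_neg <- (-2)^2         # parentheses make the base negative

print(c(no_parens, with_parens, neg_square, square_neg))
```

When in doubt, adding parentheses costs nothing and makes the intent explicit.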

# Addition!
3 + 5
## [1] 8

# Subtraction!
6 - 4
## [1] 2

Arithmetic in R (1)

# Addition 
2 + 2
## [1] 4
# Subtraction
4 - 1
## [1] 3

# Multiplication
3 * 4
## [1] 12

# Division
4 / 2
## [1] 2

# Exponentiation
2^4
## [1] 16

# Modulo
7 %% 3
## [1] 1

Assignment and variables

# Assign 200 to savings
savings <- 200

# Print the value of savings to the console
savings
## [1] 200

Assignment and variables 2

We then looked at possible real-life situations, such as money problems and how to calculate how much people owe you.

# Assign 100 to my_money
my_money <- 100

# Assign 200 to dans_money
dans_money <- 200

# Add my_money and dans_money
my_money + dans_money
## [1] 300

# Add my_money and dans_money again, save the result to our_money
our_money <- my_money + dans_money

Financial returns

We learned about the multiplier equation, which converts a percentage return into a growth factor: $\text{multiplier} = 1 + (\text{return} / 100)$
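
A minimal sketch of the multiplier equation as a reusable function. The function name `return_multiplier` is hypothetical and not part of the course; the 4% and 5% figures are just sample returns.

```r
# Convert a percentage return into a growth multiplier
return_multiplier <- function(ret_pct) {
  1 + ret_pct / 100
}

starting_cash <- 200

# One month at 5%
one_month <- starting_cash * return_multiplier(5)

# Two months at 4% and then 5%: multipliers compound, so multiply them together
two_months <- starting_cash * prod(return_multiplier(c(4, 5)))

print(one_month)
print(two_months)
```

Because the function is vectorized, `prod()` can chain any number of monthly multipliers.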

# Variables for starting_cash and 5% return during January
starting_cash <- 200
jan_ret <- 5   # 5% interest rate
jan_mult <- 1 + (jan_ret / 100)

# How much money do you have at the end of January?
post_jan_cash <- starting_cash * jan_mult

# Print post_jan_cash
post_jan_cash
## [1] 210

# January 10% return multiplier
jan_ret_10 <- 10
jan_mult_10 <- 1 + (jan_ret_10 / 100)

# How much money do you have at the end of January now?
post_jan_cash_10 <- starting_cash * jan_mult_10 

# Print post_jan_cash_10
post_jan_cash_10
## [1] 220

Financial Returns 2

# Starting cash and returns 
starting_cash <- 200
jan_ret <- 4   # 4% interest rate
feb_ret <- 5

# Multipliers
jan_mult <- 1 + (jan_ret / 100)
feb_mult <- 1 + (feb_ret / 100)

# Total cash at the end of the two months
total_cash <- starting_cash * jan_mult * feb_mult

# Print total_cash
total_cash
## [1] 218.4

Data type exploration

R’s most basic data types are numeric, logical, and character.

# Apple's stock price is a numeric
apple_stock <- 150.45 

# Bond credit ratings are characters
credit_rating <- "AAA"

# You like the stock market. TRUE or FALSE?
my_answer <- TRUE

# Print my_answer
my_answer
## [1] TRUE

What’s that data type?

a <- TRUE
class(a)
## [1] "logical"

b <- 5.5
class(b)
## [1] "numeric"

c <- "Hello World"
class(c)
## [1] "character"

Chapter 2

In the next chapter we learned about vectors and matrices using real-life examples with Apple and IBM stock prices. We were taught how to create, name, rename, and manipulate vectors and matrices. We also worked on plotting a matrix.

c()ombine

# Another numeric vector
ibm_stock <- c(159.82, 160.02, 159.84)

# Another character vector
finance <- c("stocks", "bonds", "investments")

# A logical vector
logic <- c(TRUE, FALSE, TRUE)

Coerce It

A vector can only be composed of one data type. This means that you cannot have both a numeric and a character in the same vector. If you attempt to do this, the lower ranking type will be coerced into the higher ranking type.
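
A short sketch of coercion in action. For the three basic types in these notes, the ranking from lowest to highest is logical, numeric, character. The example values are made up.

```r
# Logical + numeric: TRUE is coerced to 1
num_vec <- c(TRUE, 2.5)
class(num_vec)    # "numeric"

# Numeric + character: the number is coerced to a string
chr_vec <- c(150.45, "AAA")
class(chr_vec)    # "character"

# Logical + character: TRUE is coerced to the string "TRUE"
mixed <- c(TRUE, "stocks")
class(mixed)      # "character"
```

Once a number has been coerced to a character, arithmetic on it no longer works, so it is worth checking `class()` after combining values.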

Vector Names

Naming the elements of a vector is useful because the names are stored with the vector: the printed output is easier to read, and you can select values by name later on.

# Vectors of 12 months of returns, and month names
ret <- c(5, 2, 3, 7, 8, 3, 5, 9, 1, 4, 6, 3)
months <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

# Add names to ret
names(ret) <- months

# Print out ret to see the new names!
ret
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec 
##   5   2   3   7   8   3   5   9   1   4   6   3

Visualizing your vector

# Look at the data
apple_stock <- c(109.49, 109.90, 109.11, 109.95, 111.03, 112.12, 113.95, 113.30, 115.19, 115.19, 115.82, 115.97, 116.64, 116.95, 117.06, 116.29, 116.52, 117.26, 116.76, 116.73, 115.82)

# Plot the data points
plot(apple_stock)   # The default is "p" for points


# Plot the data as a line graph
plot(apple_stock, type = "l")

Weighted Average

A weighted average gives the return of a whole portfolio: each stock's return is multiplied by its weight in the portfolio, and the results are added together.
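
As a cross-check, the sketch below computes a weighted average by hand and with base R's `weighted.mean()`. The variable names `example_ret` and `example_weight` are made up here so they do not clash with the course variables.

```r
example_ret <- c(7, 9)          # returns in percent
example_weight <- c(0.2, 0.8)   # portfolio weights, summing to 1

# Manual weighted average: multiply pairwise, then sum
manual_avg <- sum(example_ret * example_weight)

# weighted.mean() divides by sum(weights), so it agrees when the weights sum to 1
built_in_avg <- weighted.mean(example_ret, example_weight)

print(manual_avg)
print(built_in_avg)
```

If the weights did not sum to 1, the two results would differ, because `weighted.mean()` normalizes the weights and the manual sum does not.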

# Weights and returns
micr_ret <- 7
sony_ret <- 9
micr_weight <- .2
sony_weight <- .8

# Portfolio return
portf_ret <- micr_ret * micr_weight + sony_ret * sony_weight

Weighted Average 2

It is awesome that R is vectorized: it multiplies the returns and the weights element by element for you.

# Weights, returns, and company names
ret <- c(7, 9)
weight <- c(.2, .8)
companies <- c("Microsoft", "Sony")

# Assign company names to your vectors
names(ret) <- companies
names(weight) <- companies

# Multiply the returns and weights together 
ret_X_weight <- ret * weight

# Print ret_X_weight
ret_X_weight
## Microsoft      Sony 
##       1.4       7.2

# Sum to get the total portfolio return
portf_ret <- sum(ret_X_weight)

# Print portf_ret
portf_ret
## [1] 8.6

Weighted average 3

# Print ret
ret
## Microsoft      Sony 
##         7         9

# Assign 1/3 to weight
weight <- 1/3

# Create ret_X_weight
ret_X_weight <- ret * weight
ret_X_weight
## Microsoft      Sony 
##  2.333333  3.000000

# Calculate your portfolio return
portf_ret <- sum(ret_X_weight)
portf_ret
## [1] 5.333333

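# Vector of length 2 * vector of length 2: here ret has only two elements,
# so the values are multiplied pairwise with no recycling and no warning.
# If ret had length 3, R would reuse the 1st value of the shorter vector and issue a warning.
ret * c(.2, .6)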
## Microsoft      Sony 
##       1.4       5.4
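
To actually see recycling with a warning, the two vectors need lengths that do not divide evenly. The sketch below adds a hypothetical third return (4) so that a vector of length 3 is multiplied by a vector of length 2; `ret3` and `weight2` are names invented for this example.

```r
ret3 <- c(7, 9, 4)       # hypothetical returns, length 3
weight2 <- c(0.2, 0.6)   # weights, length 2

# R recycles weight2 as c(0.2, 0.6, 0.2); the warning is silenced here to keep the result
recycled <- suppressWarnings(ret3 * weight2)
print(recycled)

# Capture the warning text instead of the numeric result
warn_msg <- tryCatch(ret3 * weight2, warning = function(w) conditionMessage(w))
print(warn_msg)
```

The warning is R's hint that the lengths probably do not match what you intended.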

Vector Subsetting

# Define ret
ret <- c(5, 2, 3, 7, 8, 3, 5, 9, 1, 4, 6, 3)
names(ret) <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
ret
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec 
##   5   2   3   7   8   3   5   9   1   4   6   3

# First 6 months of returns
ret[1:6]
## Jan Feb Mar Apr May Jun 
##   5   2   3   7   8   3

# Just March and May
ret[c("Mar", "May")]
## Mar May 
##   3   8

# Omit the first month of returns
ret[-1]
## Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec 
##   2   3   7   8   3   5   9   1   4   6   3

Create your Matrix

# A vector of 9 numbers
my_vector <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# 3x3 matrix
my_matrix <- matrix(data = my_vector, nrow = 3, ncol = 3)

# Print my_matrix
my_matrix
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9

# Filling across using byrow = TRUE
matrix(data = c(2, 3, 4, 5), nrow = 2, ncol = 2, byrow = TRUE)
##      [,1] [,2]
## [1,]    2    3
## [2,]    4    5

Matrix <- bind vectors

# Define vectors
apple <- c(109.49, 109.90, 109.11, 109.95, 111.03, 112.12, 113.95, 113.30, 115.19, 115.19,
           115.82, 115.97, 116.64, 116.95, 117.06, 116.29, 116.52, 117.26, 116.76, 116.73,
           115.82)
ibm <- c(159.82, 160.02, 159.84, 160.35, 164.79, 165.36, 166.52, 165.50, 168.29, 168.51, 
         168.02, 166.73, 166.68, 167.60, 167.33, 167.06, 166.71, 167.14, 166.19, 166.60, 
         165.99)
micr <- c(59.20, 59.25, 60.22, 59.95, 61.37, 61.01, 61.97, 62.17, 62.98, 62.68, 62.58,
          62.30, 63.62, 63.54, 63.54, 63.55, 63.24, 63.28, 62.99, 62.90, 62.14)

# cbind the vectors together
cbind_stocks <- cbind(apple, ibm, micr)

# Print cbind_stocks
cbind_stocks
##        apple    ibm  micr
##  [1,] 109.49 159.82 59.20
##  [2,] 109.90 160.02 59.25
##  [3,] 109.11 159.84 60.22
##  [4,] 109.95 160.35 59.95
##  [5,] 111.03 164.79 61.37
##  [6,] 112.12 165.36 61.01
##  [7,] 113.95 166.52 61.97
##  [8,] 113.30 165.50 62.17
##  [9,] 115.19 168.29 62.98
## [10,] 115.19 168.51 62.68
## [11,] 115.82 168.02 62.58
## [12,] 115.97 166.73 62.30
## [13,] 116.64 166.68 63.62
## [14,] 116.95 167.60 63.54
## [15,] 117.06 167.33 63.54
## [16,] 116.29 167.06 63.55
## [17,] 116.52 166.71 63.24
## [18,] 117.26 167.14 63.28
## [19,] 116.76 166.19 62.99
## [20,] 116.73 166.60 62.90
## [21,] 115.82 165.99 62.14

# rbind the vectors together
rbind_stocks <- rbind(apple, ibm, micr) 

# Print rbind_stocks
rbind_stocks
##         [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   [,7]   [,8]   [,9]
## apple 109.49 109.90 109.11 109.95 111.03 112.12 113.95 113.30 115.19
## ibm   159.82 160.02 159.84 160.35 164.79 165.36 166.52 165.50 168.29
## micr   59.20  59.25  60.22  59.95  61.37  61.01  61.97  62.17  62.98
##        [,10]  [,11]  [,12]  [,13]  [,14]  [,15]  [,16]  [,17]  [,18]
## apple 115.19 115.82 115.97 116.64 116.95 117.06 116.29 116.52 117.26
## ibm   168.51 168.02 166.73 166.68 167.60 167.33 167.06 166.71 167.14
## micr   62.68  62.58  62.30  63.62  63.54  63.54  63.55  63.24  63.28
##        [,19]  [,20]  [,21]
## apple 116.76 116.73 115.82
## ibm   166.19 166.60 165.99
## micr   62.99  62.90  62.14

Visualize your Matrix

# Define matrix
apple_micr_matrix <- cbind(apple, micr)

# View the data
apple_micr_matrix
##        apple  micr
##  [1,] 109.49 59.20
##  [2,] 109.90 59.25
##  [3,] 109.11 60.22
##  [4,] 109.95 59.95
##  [5,] 111.03 61.37
##  [6,] 112.12 61.01
##  [7,] 113.95 61.97
##  [8,] 113.30 62.17
##  [9,] 115.19 62.98
## [10,] 115.19 62.68
## [11,] 115.82 62.58
## [12,] 115.97 62.30
## [13,] 116.64 63.62
## [14,] 116.95 63.54
## [15,] 117.06 63.54
## [16,] 116.29 63.55
## [17,] 116.52 63.24
## [18,] 117.26 63.28
## [19,] 116.76 62.99
## [20,] 116.73 62.90
## [21,] 115.82 62.14

# Scatter plot of Microsoft vs Apple
plot(apple_micr_matrix)

Correlation

The cor() function measures the linear correlation between two vectors, or between every pair of columns in a matrix.
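
To make the calculation less of a black box, here is a sketch that computes the Pearson correlation by hand and compares it with `cor()`, whose default method is Pearson. The data in `x` and `y` are made up for illustration.

```r
x <- c(1, 2, 3, 4, 5)   # made-up data
y <- c(2, 4, 5, 4, 5)

# Deviations from the mean
dx <- x - mean(x)
dy <- y - mean(y)

# Pearson correlation: covariation divided by the product of the spreads
manual_cor <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

built_in_cor <- cor(x, y)
print(all.equal(manual_cor, built_in_cor))
```

A value near 1 means the two series tend to move up and down together; a value near -1 means they move in opposite directions.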

apple <- c(109.49, 109.90, 109.11, 109.95, 111.03, 112.12, 113.95, 113.30, 115.19, 115.19,
           115.82, 115.97, 116.64, 116.95, 117.06, 116.29, 116.52, 117.26, 116.76, 116.73,
           115.82)
ibm <- c(159.82, 160.02, 159.84, 160.35, 164.79, 165.36, 166.52, 165.50, 168.29, 168.51, 
         168.02, 166.73, 166.68, 167.60, 167.33, 167.06, 166.71, 167.14, 166.19, 166.60, 
         165.99)
micr <- c(59.20, 59.25, 60.22, 59.95, 61.37, 61.01, 61.97, 62.17, 62.98, 62.68, 62.58,
          62.30, 63.62, 63.54, 63.54, 63.55, 63.24, 63.28, 62.99, 62.90, 62.14)


# Correlation of Apple and IBM
cor(apple, ibm)
## [1] 0.8872467

# stock matrix
stocks <- cbind(apple, micr, ibm)

# cor() of all three
cor(stocks) 
##           apple      micr       ibm
## apple 1.0000000 0.9477010 0.8872467
## micr  0.9477010 1.0000000 0.9126597
## ibm   0.8872467 0.9126597 1.0000000

# Note that cor() fails when you pass it more than 2 vectors; bind them into a matrix instead, as above.
# Uncomment the line below to see the error.
#cor(apple, micr, ibm)

Matrix Subsetting

# Third row
stocks[3, ]
##  apple   micr    ibm 
## 109.11  60.22 159.84

# Fourth and fifth row of the ibm column
stocks[4:5, "ibm"]
## [1] 160.35 164.79

# apple and micr columns
stocks[, c("apple", "micr")]
##        apple  micr
##  [1,] 109.49 59.20
##  [2,] 109.90 59.25
##  [3,] 109.11 60.22
##  [4,] 109.95 59.95
##  [5,] 111.03 61.37
##  [6,] 112.12 61.01
##  [7,] 113.95 61.97
##  [8,] 113.30 62.17
##  [9,] 115.19 62.98
## [10,] 115.19 62.68
## [11,] 115.82 62.58
## [12,] 115.97 62.30
## [13,] 116.64 63.62
## [14,] 116.95 63.54
## [15,] 117.06 63.54
## [16,] 116.29 63.55
## [17,] 116.52 63.24
## [18,] 117.26 63.28
## [19,] 116.76 62.99
## [20,] 116.73 62.90
## [21,] 115.82 62.14

Chapter 3

In the Data Frames chapter we worked with data stored in tables and learned how to access individual rows and columns. We also learned how to add, manipulate, and delete columns. Along the way we calculated the present value of projected cash flows.

Creating a data frame

# Variables
company <- c("A", "A", "A", "B", "B", "B", "B")
cash_flow <- c(1000, 4000, 550, 1500, 1100, 750, 6000)
year <- c(1, 3, 4, 1, 2, 4, 5)

# Data frame
cash <- data.frame(company, cash_flow, year)

# Print cash
cash
##   company cash_flow year
## 1       A      1000    1
## 2       A      4000    3
## 3       A       550    4
## 4       B      1500    1
## 5       B      1100    2
## 6       B       750    4
## 7       B      6000    5

Making head()s and tail()s of your data with some str()ucture

These functions are useful for inspecting a data frame: head() shows the first rows, tail() shows the last rows, and str() summarizes the structure and column types.

# Call head() for the first 4 rows
head(cash, n = 4)
##   company cash_flow year
## 1       A      1000    1
## 2       A      4000    3
## 3       A       550    4
## 4       B      1500    1

# Call tail() for the last 3 rows
tail(cash, n = 3)
##   company cash_flow year
## 5       B      1100    2
## 6       B       750    4
## 7       B      6000    5

# Call str()
str(cash)
## 'data.frame':    7 obs. of  3 variables:
##  $ company  : Factor w/ 2 levels "A","B": 1 1 1 2 2 2 2
##  $ cash_flow: num  1000 4000 550 1500 1100 750 6000
##  $ year     : num  1 3 4 1 2 4 5

Naming your columns / rows

# Fix your column names
colnames(cash) <- c("company", "cash_flow", "year")

# Print out the column names of cash
colnames(cash)
## [1] "company"   "cash_flow" "year"

Accessing and subsetting data frames

# Third row, second column
cash[3, 2]
## [1] 550

# Fifth row of the "year" column
cash[5, "year"]
## [1] 2

Accessing and subsetting data frames 2

# Third row, second column
cash[3, 2]
## [1] 550

# Fifth row of the "year" column
cash[5, "year"]
## [1] 2

Accessing and subsetting data frames 3

# Restore cash
company <- c("A", "A", "A", "B", "B", "B", "B")
cash_flow <- c(1000, 4000, 550, 1500, 1100, 750, 6000)
year <- c(1, 3, 4, 1, 2, 4, 5)

cash <- data.frame(company, cash_flow, year)

# Rows about company B
subset(cash, company == "B")
##   company cash_flow year
## 4       B      1500    1
## 5       B      1100    2
## 6       B       750    4
## 7       B      6000    5

# Rows with cash flows due in 1 year
subset(cash, year == 1)
##   company cash_flow year
## 1       A      1000    1
## 4       B      1500    1

Adding new columns

# Quarter cash flow scenario
cash$quarter_cash <- cash$cash_flow * 0.25
cash 
##   company cash_flow year quarter_cash
## 1       A      1000    1        250.0
## 2       A      4000    3       1000.0
## 3       A       550    4        137.5
## 4       B      1500    1        375.0
## 5       B      1100    2        275.0
## 6       B       750    4        187.5
## 7       B      6000    5       1500.0

# Double year scenario
cash$double_year <- cash$year * 2
cash
##   company cash_flow year quarter_cash double_year
## 1       A      1000    1        250.0           2
## 2       A      4000    3       1000.0           6
## 3       A       550    4        137.5           8
## 4       B      1500    1        375.0           2
## 5       B      1100    2        275.0           4
## 6       B       750    4        187.5           8
## 7       B      6000    5       1500.0          10

Present value of projected cash flows

The present value equation is helpful when you want to know what a cash flow received a given number of years from now is worth today: $\text{present\_value} = \text{cash\_flow} \times (1 + \text{interest} / 100)^{-\text{year}}$
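
A minimal sketch of the present value equation wrapped in a function. The helper name `present_value` and its argument names are hypothetical; the 5% rate matches the example that follows.

```r
# Hypothetical helper: discount a future cash flow back to today
present_value <- function(cash_flow, interest_pct, year) {
  cash_flow * (1 + interest_pct / 100)^(-year)
}

# $4000 received in 3 years, discounted at 5%
pv_4k <- present_value(4000, 5, 3)
print(pv_4k)

# Sanity check: growing the present value forward for 3 years recovers the cash flow
future_value <- pv_4k * (1 + 5 / 100)^3
print(future_value)

# The function is vectorized, so it handles several cash flows at once
pv_many <- present_value(c(1000, 1500), 5, c(1, 1))
print(pv_many)
```

The negative exponent is what turns compounding into discounting: the further away the cash flow, the smaller its value today.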

# Restore cash
cash$quarter_cash <- NULL
cash$double_year <- NULL

# Present value of $4000, in 3 years, at 5%
present_value_4k <- 4000 * (1+0.05)^(-3)

# Present value of all cash flows
cash$present_value <- cash$cash_flow * (1+0.05)^(-cash$year)

# Print out cash
cash
##   company cash_flow year present_value
## 1       A      1000    1      952.3810
## 2       A      4000    3     3455.3504
## 3       A       550    4      452.4864
## 4       B      1500    1     1428.5714
## 5       B      1100    2      997.7324
## 6       B       750    4      617.0269
## 7       B      6000    5     4701.1570

Present value of projected cash flows 2

# Total present value of cash
total_pv <- sum(cash$present_value)
total_pv
## [1] 12604.71

# Company B information
cash_B <- subset(cash, company == "B")
cash_B
##   company cash_flow year present_value
## 4       B      1500    1     1428.5714
## 5       B      1100    2      997.7324
## 6       B       750    4      617.0269
## 7       B      6000    5     4701.1570

# Total present value of cash_B
total_pv_B <- sum(cash_B$present_value)
total_pv_B
## [1] 7744.488

Chapter 4

In the Factors chapter we assessed the different levels of a credit rating factor. We learned about the summary() function, which is helpful when you want a table of counts for each level of a factor.

Creating a factor

# credit_rating character vector
credit_rating <- c("BB", "AAA", "AA", "CCC", "AA", "AAA", "B", "BB")

# Create a factor from credit_rating
credit_factor <- factor(credit_rating)

# Print out your new factor
credit_factor
## [1] BB  AAA AA  CCC AA  AAA B   BB 
## Levels: AA AAA B BB CCC

Factor Levels

# Identify unique levels
levels(credit_factor)
## [1] "AA"  "AAA" "B"   "BB"  "CCC"

# Rename the levels of credit_factor
levels(credit_factor)
## [1] "AA"  "AAA" "B"   "BB"  "CCC"
levels(credit_factor) <- c("2A", "3A", "1B", "2B", "3C")

# Print credit_factor
credit_factor
## [1] 2B 3A 2A 3C 2A 3A 1B 2B
## Levels: 2A 3A 1B 2B 3C

Factor summary

# Restore credit_factor
levels(credit_factor)
## [1] "2A" "3A" "1B" "2B" "3C"
levels(credit_factor) <- c("AA", "AAA", "B", "BB", "CCC")

# Summarize the character vector, credit_rating
summary(credit_rating)
##    Length     Class      Mode 
##         8 character character

# Summarize the factor, credit_factor
summary(credit_factor)
##  AA AAA   B  BB CCC 
##   2   2   1   2   1

Visualize your factor

# Visualize your factor!
plot(credit_factor)

Bucketing a numeric variable into a factor

# Define AAA_rank.
AAA_rank <- c(31,  48, 100, 53, 85, 73, 62, 74, 42, 38, 97, 61, 48, 86, 44, 9, 43, 18,  62,
              38, 23, 37, 54, 80, 78, 93, 47, 100, 22,  22, 18, 26, 81, 17, 98, 4,  83, 5,
              6,  52, 29, 44, 50, 2,  25, 19, 15, 42, 30, 27)

# Create 4 buckets for AAA_rank using cut()
AAA_factor <- cut(x = AAA_rank, breaks = c(0, 25, 50, 75, 100))

# Rename the levels 
levels(AAA_factor)
## [1] "(0,25]"   "(25,50]"  "(50,75]"  "(75,100]"
levels(AAA_factor) <- c("low", "medium", "high", "very_high")

# Print AAA_factor
AAA_factor
##  [1] medium    medium    very_high high      very_high high      high     
##  [8] high      medium    medium    very_high high      medium    very_high
## [15] medium    low       medium    low       high      medium    low      
## [22] medium    high      very_high very_high very_high medium    very_high
## [29] low       low       low       medium    very_high low       very_high
## [36] low       very_high low       low       high      medium    medium   
## [43] medium    low       low       low       low       medium    medium   
## [50] medium   
## Levels: low medium high very_high

# Plot AAA_factor
plot(AAA_factor)
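
By default, cut() builds intervals that are open on the left and closed on the right, which is why the levels print as (0,25] and so on. The sketch below, using made-up boundary values, shows how that affects values sitting exactly on a break. `include.lowest` is a standard cut() argument, but using it here is my own addition, not something from the course.

```r
ranks <- c(0, 25, 26, 100)         # made-up boundary values
breaks <- c(0, 25, 50, 75, 100)

# Default: (0,25], (25,50], ... so a rank of exactly 0 falls outside every bucket
default_cut <- cut(ranks, breaks = breaks)
print(default_cut)

# include.lowest = TRUE closes the first interval on the left as well: [0,25]
inclusive_cut <- cut(ranks, breaks = breaks, include.lowest = TRUE)
print(inclusive_cut)
```

The AAA_rank data above never contains 0, so the default behaviour is fine there.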

Create an ordered factor

# Use unique() to find unique words
unique(credit_rating)
## [1] "BB"  "AAA" "AA"  "CCC" "B"

# Create an ordered factor
credit_factor_ordered <- factor(credit_rating, ordered = TRUE, levels = c("AAA", "AA", "BB", "B", "CCC"))

# Plot credit_factor_ordered
plot(credit_factor_ordered)

Subsetting a factor

# Define credit_factor
credit_factor <- factor(c("AAA", "AA", "A", "BBB", "AA", "BBB", "A"),
                        ordered = TRUE,
                        levels = c("BBB", "A", "AA", "AAA"))

# Remove the A bonds at positions 3 and 7. Don't drop the A level.
keep_level <- credit_factor[-c(3, 7)]

# Plot keep_level
plot(keep_level)


# Remove the A bonds at positions 3 and 7. Drop the A level.
drop_level <- droplevels(keep_level)

# Plot drop_level
plot(drop_level)
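
Since the plots are not reproduced in these notes, the difference between keeping and dropping a level can also be checked numerically. This sketch rebuilds the same factor under the hypothetical name `ratings` and uses the `drop = TRUE` argument of factor subsetting as an alternative to droplevels().

```r
ratings <- factor(c("AAA", "AA", "A", "BBB", "AA", "BBB", "A"),
                  ordered = TRUE,
                  levels = c("BBB", "A", "AA", "AAA"))

# Removing the A bonds keeps "A" as an (empty) level
kept <- ratings[-c(3, 7)]
summary(kept)

# drop = TRUE removes unused levels while subsetting
dropped <- ratings[-c(3, 7), drop = TRUE]
levels(dropped)

# Same levels as calling droplevels() afterwards
identical(levels(dropped), levels(droplevels(kept)))
```

Keeping the empty level is useful when a plot or table should still show a zero count for it.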

stringsAsFactors

# Variables
credit_rating <- c("AAA", "A", "BB")
bond_owners <- c("Dan", "Tom", "Joe")

# Create the data frame of character vectors, bonds
bonds <- data.frame(credit_rating, bond_owners, stringsAsFactors = FALSE)
bonds
##   credit_rating bond_owners
## 1           AAA         Dan
## 2             A         Tom
## 3            BB         Joe

# Use str() on bonds
str(bonds)
## 'data.frame':    3 obs. of  2 variables:
##  $ credit_rating: chr  "AAA" "A" "BB"
##  $ bond_owners  : chr  "Dan" "Tom" "Joe"

# Create a factor column in bonds called credit_factor from credit_rating
bonds$credit_factor <- factor(bonds$credit_rating, ordered = TRUE, levels = c("AAA", "A", "BB"))

# Use str() on bonds again
str(bonds)
## 'data.frame':    3 obs. of  3 variables:
##  $ credit_rating: chr  "AAA" "A" "BB"
##  $ bond_owners  : chr  "Dan" "Tom" "Joe"
##  $ credit_factor: Ord.factor w/ 3 levels "AAA"<"A"<"BB": 1 2 3

Chapter 5

In the last chapter, Lists, we built a portfolio of stocks to help us keep track of all our vectors, matrices, and data frames. We learned how to create and name a list, add to and remove from a list, split and unsplit a data frame, and combine lists.

Create a list

# List components
name <- "Apple and IBM"
apple <- c(109.49, 109.90, 109.11, 109.95, 111.03)
ibm <- c(159.82, 160.02, 159.84, 160.35, 164.79)
cor_matrix <- cor(cbind(apple, ibm))

# Create a list
portfolio <- list(name, apple, ibm, cor_matrix)

# View your first list
portfolio
## [[1]]
## [1] "Apple and IBM"
## 
## [[2]]
## [1] 109.49 109.90 109.11 109.95 111.03
## 
## [[3]]
## [1] 159.82 160.02 159.84 160.35 164.79
## 
## [[4]]
##           apple       ibm
## apple 1.0000000 0.9131575
## ibm   0.9131575 1.0000000

Named lists

There are 2 ways to name a list: supply the names when you create the list, or assign them afterwards with names().
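
A small sketch showing that both naming approaches give the same result. The names `apple_prices` and `ibm_prices` are made up for this example so they do not overwrite the course variables.

```r
apple_prices <- c(109.49, 109.90)
ibm_prices <- c(159.82, 160.02)

# Way 1: name the elements while creating the list
named_at_creation <- list(apple = apple_prices, ibm = ibm_prices)

# Way 2: create the list first, then assign names
named_afterwards <- list(apple_prices, ibm_prices)
names(named_afterwards) <- c("apple", "ibm")

# Both approaches produce the same list
identical(named_at_creation, named_afterwards)
```

Naming at creation keeps the name next to the value, which is harder to get out of sync when the list grows.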

# Add names to your portfolio
names(portfolio) <- c("portfolio_name", "apple", "ibm", "correlation")

# Print portfolio
portfolio
## $portfolio_name
## [1] "Apple and IBM"
## 
## $apple
## [1] 109.49 109.90 109.11 109.95 111.03
## 
## $ibm
## [1] 159.82 160.02 159.84 160.35 164.79
## 
## $correlation
##           apple       ibm
## apple 1.0000000 0.9131575
## ibm   0.9131575 1.0000000

Accessing elements in a list

# Second and third elements of portfolio
portfolio[c(2,3)]
## $apple
## [1] 109.49 109.90 109.11 109.95 111.03
## 
## $ibm
## [1] 159.82 160.02 159.84 160.35 164.79

# Use $ to get the correlation data
portfolio$correlation
##           apple       ibm
## apple 1.0000000 0.9131575
## ibm   0.9131575 1.0000000

# Third item of the second element of portfolio
portfolio[[c(2,3)]]
## [1] 109.11

Adding to a list

# Add weight: 20% Apple, 80% IBM
portfolio$weight <- c(apple = 0.2, ibm = 0.8)

# Print portfolio
portfolio
## $portfolio_name
## [1] "Apple and IBM"
## 
## $apple
## [1] 109.49 109.90 109.11 109.95 111.03
## 
## $ibm
## [1] 159.82 160.02 159.84 160.35 164.79
## 
## $correlation
##           apple       ibm
## apple 1.0000000 0.9131575
## ibm   0.9131575 1.0000000
## 
## $weight
## apple   ibm 
##   0.2   0.8

# Change the weight variable: 30% Apple, 70% IBM
portfolio$weight <- c(apple = 0.3, ibm = 0.7)

# Print portfolio to see the changes
portfolio
## $portfolio_name
## [1] "Apple and IBM"
## 
## $apple
## [1] 109.49 109.90 109.11 109.95 111.03
## 
## $ibm
## [1] 159.82 160.02 159.84 160.35 164.79
## 
## $correlation
##           apple       ibm
## apple 1.0000000 0.9131575
## ibm   0.9131575 1.0000000
## 
## $weight
## apple   ibm 
##   0.3   0.7

Split-Apply-Combine

# Define split_cash and grouping
split_cash <- split(cash, company)
grouping <- company

# Print split_cash
split_cash
## $A
##   company cash_flow year present_value
## 1       A      1000    1      952.3810
## 2       A      4000    3     3455.3504
## 3       A       550    4      452.4864
## 
## $B
##   company cash_flow year present_value
## 4       B      1500    1     1428.5714
## 5       B      1100    2      997.7324
## 6       B       750    4      617.0269
## 7       B      6000    5     4701.1570

# Print the cash_flow column of B in split_cash
split_cash$B$cash_flow
## [1] 1500 1100  750 6000

# Set the cash_flow column of company A in split_cash to 0
split_cash$A$cash_flow <- 0

# Use the grouping to unsplit split_cash
cash_no_A <- unsplit(split_cash, grouping)

# Print cash_no_A
cash_no_A
##   company cash_flow year present_value
## 1       A         0    1      952.3810
## 2       A         0    3     3455.3504
## 3       A         0    4      452.4864
## 4       B      1500    1     1428.5714
## 5       B      1100    2      997.7324
## 6       B       750    4      617.0269
## 7       B      6000    5     4701.1570
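
The exercise above covers the split and combine steps. A minimal sketch of the "apply" step is shown below, assuming a `cash` data frame rebuilt from the printed output (without the `present_value` column). The name `total_by_company` is hypothetical.

```r
# Recreate the cash data frame from the output above (present_value omitted)
cash <- data.frame(
  company   = c("A", "A", "A", "B", "B", "B", "B"),
  cash_flow = c(1000, 4000, 550, 1500, 1100, 750, 6000),
  year      = c(1, 3, 4, 1, 2, 4, 5)
)

# Split: one data frame per company
split_cash <- split(cash, cash$company)

# Apply: total cash flow within each piece
total_by_company <- sapply(split_cash, function(df) sum(df$cash_flow))

total_by_company
# A is 5550 and B is 9350
```

`sapply()` returns a named vector here because each piece reduces to a single number; `lapply()` would return the same totals as a list.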

Removing from a list

# Define portfolio
portfolio_name <- "Apple and IBM"
apple <- c(109.49, 109.90, 109.11, 109.95, 111.03)
ibm <- c(159.82, 160.02, 159.84, 160.35, 164.79)
microsoft <- c(150.0, 152.0, 154.0, 154.5)
correlation <- cor(cbind(apple, ibm))

portfolio <- list(portfolio_name = portfolio_name, 
                  apple = apple, 
                  ibm = ibm, 
                  microsoft = microsoft, 
                  correlation = correlation)


# Take a look at portfolio
portfolio
## $portfolio_name
## [1] "Apple and IBM"
## 
## $apple
## [1] 109.49 109.90 109.11 109.95 111.03
## 
## $ibm
## [1] 159.82 160.02 159.84 160.35 164.79
## 
## $microsoft
## [1] 150.0 152.0 154.0 154.5
## 
## $correlation
##           apple       ibm
## apple 1.0000000 0.9131575
## ibm   0.9131575 1.0000000

# Remove the microsoft stock prices from your portfolio
portfolio$microsoft <- NULL
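
To confirm what assigning `NULL` does, the sketch below uses a shortened version of the portfolio list and compares the element names before and after the removal. The names `names_before` and `names_after` are hypothetical.

```r
# A shortened portfolio list, using values from the example above
portfolio <- list(
  portfolio_name = "Apple and IBM",
  apple          = c(109.49, 109.90, 109.11, 109.95, 111.03),
  microsoft      = c(150.0, 152.0, 154.0, 154.5)
)

names_before <- names(portfolio)

# Assigning NULL removes the element entirely; it does not leave an empty slot
portfolio$microsoft <- NULL

names_after <- names(portfolio)
length(portfolio)
```

After the assignment the list has two elements, and `microsoft` no longer appears in `names(portfolio)`.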

Splitting a list

# Remove the present_value column from cash
cash$present_value <- NULL

# Define grouping from year
grouping <- cash$year

# Split cash on your new grouping
split_cash <- split(cash, grouping)

# Look at your split_cash list
split_cash
## $`1`
##   company cash_flow year
## 1       A      1000    1
## 4       B      1500    1
## 
## $`2`
##   company cash_flow year
## 5       B      1100    2
## 
## $`3`
##   company cash_flow year
## 2       A      4000    3
## 
## $`4`
##   company cash_flow year
## 3       A       550    4
## 6       B       750    4
## 
## $`5`
##   company cash_flow year
## 7       B      6000    5
# Look at the structure of split_cash
str(split_cash)
## List of 5
##  $ 1:'data.frame':   2 obs. of  3 variables:
##   ..$ company  : Factor w/ 2 levels "A","B": 1 2
##   ..$ cash_flow: num [1:2] 1000 1500
##   ..$ year     : num [1:2] 1 1
##  $ 2:'data.frame':   1 obs. of  3 variables:
##   ..$ company  : Factor w/ 2 levels "A","B": 2
##   ..$ cash_flow: num 1100
##   ..$ year     : num 2
##  $ 3:'data.frame':   1 obs. of  3 variables:
##   ..$ company  : Factor w/ 2 levels "A","B": 1
##   ..$ cash_flow: num 4000
##   ..$ year     : num 3
##  $ 4:'data.frame':   2 obs. of  3 variables:
##   ..$ company  : Factor w/ 2 levels "A","B": 1 2
##   ..$ cash_flow: num [1:2] 550 750
##   ..$ year     : num [1:2] 4 4
##  $ 5:'data.frame':   1 obs. of  3 variables:
##   ..$ company  : Factor w/ 2 levels "A","B": 2
##   ..$ cash_flow: num 6000
##   ..$ year     : num 5

# Unsplit split_cash to get the original data back.
original_cash <- unsplit(split_cash, grouping)

# Print original_cash
original_cash
##   company cash_flow year
## 1       A      1000    1
## 2       A      4000    3
## 3       A       550    4
## 4       B      1500    1
## 5       B      1100    2
## 6       B       750    4
## 7       B      6000    5
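
Rather than comparing the printed tables by eye, the round trip can be checked in code. The sketch below assumes a `cash` data frame rebuilt from the output above and compares each column of the unsplit result with the original. The name `columns_match` is hypothetical.

```r
# Recreate the cash data frame from the output above
cash <- data.frame(
  company   = c("A", "A", "A", "B", "B", "B", "B"),
  cash_flow = c(1000, 4000, 550, 1500, 1100, 750, 6000),
  year      = c(1, 3, 4, 1, 2, 4, 5)
)

# Split by year, then unsplit with the same grouping
grouping <- cash$year
split_cash <- split(cash, grouping)
original_cash <- unsplit(split_cash, grouping)

# Compare the result with the original, column by column
columns_match <- all(original_cash$company == cash$company) &&
  all(original_cash$cash_flow == cash$cash_flow) &&
  all(original_cash$year == cash$year)
```

The same `grouping` vector has to be passed to `unsplit()`, since it is what tells R which row each piece came from.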

Attributes

# my_matrix and my_factor
my_matrix <- matrix(c(1,2,3,4,5,6), nrow = 2, ncol = 3)
rownames(my_matrix) <- c("Row1", "Row2")
colnames(my_matrix) <- c("Col1", "Col2", "Col3")

my_factor <- factor(c("A", "A", "B"), ordered = TRUE, levels = c("A", "B"))

# attributes of my_matrix
attributes(my_matrix)
## $dim
## [1] 2 3
## 
## $dimnames
## $dimnames[[1]]
## [1] "Row1" "Row2"
## 
## $dimnames[[2]]
## [1] "Col1" "Col2" "Col3"

# Just the dim attribute of my_matrix
attr(my_matrix, which = "dim")
## [1] 2 3

# attributes of my_factor
attributes(my_factor)
## $levels
## [1] "A" "B"
## 
## $class
## [1] "ordered" "factor"
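
Attributes can be set as well as read. As a small sketch of why the `dim` attribute matters, the code below starts from a plain vector and gives it a `dim` attribute, after which R treats it as a matrix. The variable `v` and the `note` attribute are hypothetical.

```r
# Start with a plain vector: it has no attributes
v <- 1:6
attributes(v)   # NULL

# Setting the dim attribute makes R treat v as a 2 x 3 matrix
attr(v, "dim") <- c(2, 3)
is.matrix(v)    # TRUE

# Any other name can be used for a custom attribute (hypothetical example)
attr(v, "note") <- "filled column by column"
names(attributes(v))
```

This is the same `dim` attribute that `attributes(my_matrix)` reports above; `matrix()` sets it for you.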

QUIZ

  1. Compare vectors and matrices. What are similarities and differences? A vector and a matrix are similar in that each can hold only one data type. They differ in that a matrix is two-dimensional, with rows and columns, while a vector is one-dimensional.

  2. Compare matrices and data frames. What are similarities and differences? Both are two-dimensional, with rows and columns. A matrix holds a single data type, while a data frame is more general: different columns can hold different data types.

  3. Create your first vector, matrix, data frame, factor, and list. Do this within an R code chunk.

# A vector of 5 numbers
my_vector <- c(111, 61, 15, 100, 39)

# 5x1 matrix (the dimensions must match the 5 values in my_vector)
my_matrix <- matrix(data = my_vector, nrow = 5, ncol = 1)
# Variables
company <- c("C", "BB", "CC", "D", "EF")
cash_flow <- c(150, 175, 200, 225, 250)
year <- c(1, 2, 3, 4, 5)

# Data frame
cash <- data.frame(company, cash_flow, year)
# credit_rating character vector
credit_rating <- c("C", "BB", "CC", "D", "EF")

# Create a factor from credit_rating
credit_factor <- factor(credit_rating)

# Print out your new factor
credit_factor
## [1] C  BB CC D  EF
## Levels: BB C CC D EF
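
The answers to quiz questions 1 and 2 can be illustrated with a short sketch: mixing types in a vector or matrix coerces everything to one type, while a data frame keeps a separate type for each column. All object names below are hypothetical, and `stringsAsFactors = FALSE` is set explicitly so the result does not depend on the R version.

```r
# A vector holds one type: mixing types coerces everything to character
mixed_vector <- c(1, "a", TRUE)
class(mixed_vector)   # "character"

# A matrix also holds one type, arranged in rows and columns
mixed_matrix <- matrix(c(1, "a", 2, "b"), nrow = 2)
class(mixed_matrix[1, 1])   # "character"

# A data frame keeps a separate type for each column
mixed_df <- data.frame(id = c(1, 2), label = c("a", "b"),
                       stringsAsFactors = FALSE)
col_classes <- sapply(mixed_df, class)
col_classes
```

Here `id` stays numeric and `label` stays character, which is the sense in which a data frame is more general than a matrix.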