Chapter 1

1.1 Your first R script

Type R code at the prompt to evaluate an expression. To add 3 and 4, type 3 + 4 and press Enter; the result is returned as [1] 7. The exercise demonstrated 3 + 5 and asked you to perform 6 - 4.

# Addition!
3 + 5
## [1] 8

# Subtraction!
6 - 4
## [1] 2

1.2 Arithmetic in R (1)

This exercise showed how to perform basic arithmetic. A helpful hint: clicking on a line of code in the script and then pressing Command + Enter executes just that line in the R Console.

# Addition 
2 + 2
## [1] 4

# Subtraction
4 - 1
## [1] 3

# Multiplication
3 * 4
## [1] 12

# Division
4 / 2
## [1] 2

# Exponentiation
2 ^ 4
## [1] 16

# Modulo
7 %% 3
## [1] 1

1.3 Arithmetic in R (2)

This section reviewed the order of operations rule, PEMDAS: parentheses, exponents, multiplication and division, then addition and subtraction.
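
As a minimal sketch of the rule (the numbers are arbitrary and not from the course), R evaluates exponents first, then multiplication and division, then addition and subtraction, and parentheses override that order:

```r
# Exponent first (4 ^ 2 = 16), then multiplication (3 * 16 = 48), then addition
no_parens <- 2 + 3 * 4 ^ 2
no_parens
## [1] 50

# Parentheses force the addition to happen first: 5 * 16
with_parens <- (2 + 3) * 4 ^ 2
with_parens
## [1] 80

# Exponentiation also binds tighter than a leading minus sign
neg_square <- -2 ^ 2
neg_square
## [1] -4
```

When in doubt, adding parentheses makes the intended order explicit.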

1.4 Assignment and variables (1)

A variable allows you to store a value or an object in R. You can then later use this variable’s name to easily access the value or the object that is stored within this variable. You use <- to assign a variable:

# Assign 200 to savings
savings <- 200

# Print the value of savings to the console
savings
## [1] 200

1.5 Assignment and variables (2)

You can assign values to your variables and then use arithmetic on those variables to perform calculations.

# Assign 100 to my_money
my_money <- 100

# Assign 200 to dans_money
dans_money <- 200

# Add my_money and dans_money
my_money + dans_money
## [1] 300

# Add my_money and dans_money again, save the result to our_money
our_money <- my_money + dans_money

1.6 Financial Returns (1)

You can use multipliers to calculate financial returns. Multiplier = 1 + (return/100).

# Variables for starting_cash and 5% return during January
starting_cash <- 200
jan_ret <- 5
jan_mult <- 1 + (jan_ret / 100)

# How much money do you have at the end of January?
post_jan_cash <- starting_cash * jan_mult

# Print post_jan_cash
post_jan_cash
## [1] 210

# January 10% return multiplier
jan_ret_10 <- 10
jan_mult_10 <- 1 + (jan_ret_10 / 100)

# How much money do you have at the end of January now?
post_jan_cash_10 <- starting_cash * jan_mult_10

# Print post_jan_cash_10
post_jan_cash_10
## [1] 220

1.7 Financial Returns (2)

You find the total return over two or more months by multiplying the multipliers together.

# Starting cash and returns 
starting_cash <- 200
jan_ret <- 4
feb_ret <- 5

# Multipliers
jan_mult <- 1 + (jan_ret / 100)
feb_mult <- 1 + (feb_ret / 100)

# Total cash at the end of the two months
total_cash <- starting_cash * jan_mult * feb_mult

# Print total_cash
total_cash
## [1] 218.4

1.8 Data Type Exploration

There are 3 types of data. Numbers are either decimals or integers (whole numbers); integers must be specified by adding L after the number, as in 5L. Logical data are the values TRUE and FALSE, which must be capitalized. Characters are text and must be entered in quotation marks.

# Apple's stock price is a numeric
apple_stock <- 150.45

# Bond credit ratings are characters
credit_rating <- "AAA"

# You like the stock market. TRUE or FALSE?
my_answer <- TRUE

# Print my_answer
my_answer
## [1] TRUE

1.9 What’s that data type?

You can determine what data type a variable is by entering class(my_var). This will return the data type (or class) of whatever variable you pass in.
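
A short sketch of class() applied to each data type described in 1.8. The variable names and values here are hypothetical and chosen only for illustration:

```r
# Hypothetical example values, one per data type
price <- 150.45
shares <- 5L
rating <- "AAA"
likes_market <- TRUE

class(price)
## [1] "numeric"
class(shares)
## [1] "integer"
class(rating)
## [1] "character"
class(likes_market)
## [1] "logical"

# Without the L suffix, a whole number is still stored as numeric
class(5)
## [1] "numeric"
```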

Chapter 2

2.1 c()ombine

A vector is created using the combine function, c(). Each element you add is separated by a comma.

# Another numeric vector
ibm_stock <- c(159.82, 160.02, 159.84)

# Another character vector
finance <- c("stocks", "bonds", "investments")

# A logical vector
logic <- c(TRUE, FALSE, TRUE)

2.2 Coerce It

A vector can only be composed of one data type. If you use more than one data type in a vector, the lower-ranking type will be coerced into the higher-ranking type. The hierarchy for coercion is: logical < integer < numeric < character
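
The coercion hierarchy can be checked with a few small vectors. This is only a sketch; the vector names and values are made up for illustration:

```r
# logical + integer -> integer (TRUE becomes 1, FALSE becomes 0)
lgl_int <- c(TRUE, FALSE, 3L)
class(lgl_int)
## [1] "integer"

# integer + numeric -> numeric
int_num <- c(1L, 2.5)
class(int_num)
## [1] "numeric"

# numeric + character -> character
num_chr <- c(1.5, "AAA")
num_chr
## [1] "1.5" "AAA"
```

Once a number has been coerced to character, arithmetic on it no longer works, so it is worth checking class() whenever a vector mixes types.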

2.3 Vector names()

You can add names to each value in your vector. You do this using names()

# Vectors of 12 months of returns, and month names
ret <- c(5, 2, 3, 7, 8, 3, 5, 9, 1, 4, 6, 3)
months <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

# Add names to ret
names(ret) <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

# Print out ret to see the new names!
ret
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec 
##   5   2   3   7   8   3   5   9   1   4   6   3

2.4 Visualize your vector

You can create a graph of your data by using the plot() function. Passing in a vector will add its values to the y-axis of the graph, and on the x-axis will be an index created from the order that your vector is in. Inside of plot(), you can change the type of your graph using type =. The default is “p” for points, but you can also change it to “l” for line.


# Define apple_stock

apple_stock <- c(109.49, 109.90, 109.11, 109.95, 111.03, 112.12, 113.95, 113.30, 115.19, 115.19, 115.82, 115.97, 116.64, 116.95, 117.06, 116.29, 116.52, 117.26, 116.76, 116.73, 115.82)


# Look at the data
apple_stock
##  [1] 109.49 109.90 109.11 109.95 111.03 112.12 113.95 113.30 115.19 115.19
## [11] 115.82 115.97 116.64 116.95 117.06 116.29 116.52 117.26 116.76 116.73
## [21] 115.82

# Plot the data points
plot(apple_stock)


# Plot the data as a line graph
plot(apple_stock, type = "l")

2.5 Weighted Average

A weighted average allows you to calculate your portfolio return over a time period. To calculate the weighted average, take the return of each stock in your portfolio, multiply it by the weight of that stock, and add up the results.

# Weights and returns
micr_ret <- 7
sony_ret <- 9
micr_weight <- .2
sony_weight <- .8

# Portfolio return
portf_ret <- micr_ret * micr_weight + sony_ret * sony_weight

2.6 Weighted Average (2)

# Weights, returns, and company names
ret <- c(7, 9)
weight <- c(.2, .8)
companies <- c("Microsoft", "Sony")

# Assign company names to your vectors
names(ret) <- c("Microsoft", "Sony")
names(weight) <- c("Microsoft", "Sony")

# Multiply the returns and weights together 
ret_X_weight <- ret * weight

# Print ret_X_weight
ret_X_weight
## Microsoft      Sony 
##       1.4       7.2

# Sum to get the total portfolio return
portf_ret <- sum(ret_X_weight)

# Print portf_ret
portf_ret
## [1] 8.6

2.7 Create a matrix!

Matrices are similar to vectors, except they are in 2 dimensions. The actual data for the matrix is passed in as a vector using c(), and is then converted to a matrix by specifying the number of rows and columns (also known as the dimensions).

# A vector of 9 numbers
my_vector <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# 3x3 matrix
my_matrix <- matrix(data = my_vector, nrow = 3, ncol = 3)

# Print my_matrix
my_matrix
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9

# Filling across using byrow = TRUE
matrix(data = c(2, 3, 4, 5), nrow = 2, ncol = 2, byrow = TRUE)
##      [,1] [,2]
## [1,]    2    3
## [2,]    4    5

2.8 Matrix <- bind vectors

You can create matrices by combining multiple vectors by using the functions cbind() and rbind().

apple <- c(109.49, 109.90, 109.11, 109.95, 111.03, 112.12, 113.95, 113.30, 115.19, 115.19, 115.82, 115.97, 116.64, 116.95, 117.06, 116.29, 116.52, 117.26, 116.76, 116.73, 115.82)

ibm <- c(159.82, 160.02, 159.84, 160.35, 164.79, 165.36, 166.52, 165.50, 168.29, 168.51, 168.02, 166.73, 166.68, 167.60, 167.33, 167.06, 166.71, 167.14, 166.19, 166.60, 165.99)

micr <- c(59.20, 59.25, 60.22, 59.95, 61.37, 61.01, 61.97, 62.17, 62.98, 62.68, 62.58, 62.30, 63.62, 63.54, 63.54, 63.55, 63.24, 63.28, 62.99, 62.90, 62.14)

# cbind the vectors together
cbind_stocks <- cbind (apple, ibm, micr)

# Print cbind_stocks
cbind_stocks
##        apple    ibm  micr
##  [1,] 109.49 159.82 59.20
##  [2,] 109.90 160.02 59.25
##  [3,] 109.11 159.84 60.22
##  [4,] 109.95 160.35 59.95
##  [5,] 111.03 164.79 61.37
##  [6,] 112.12 165.36 61.01
##  [7,] 113.95 166.52 61.97
##  [8,] 113.30 165.50 62.17
##  [9,] 115.19 168.29 62.98
## [10,] 115.19 168.51 62.68
## [11,] 115.82 168.02 62.58
## [12,] 115.97 166.73 62.30
## [13,] 116.64 166.68 63.62
## [14,] 116.95 167.60 63.54
## [15,] 117.06 167.33 63.54
## [16,] 116.29 167.06 63.55
## [17,] 116.52 166.71 63.24
## [18,] 117.26 167.14 63.28
## [19,] 116.76 166.19 62.99
## [20,] 116.73 166.60 62.90
## [21,] 115.82 165.99 62.14

# rbind the vectors together
rbind_stocks <- rbind(apple, ibm, micr)

# Print rbind_stocks
rbind_stocks
##         [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   [,7]   [,8]   [,9]
## apple 109.49 109.90 109.11 109.95 111.03 112.12 113.95 113.30 115.19
## ibm   159.82 160.02 159.84 160.35 164.79 165.36 166.52 165.50 168.29
## micr   59.20  59.25  60.22  59.95  61.37  61.01  61.97  62.17  62.98
##        [,10]  [,11]  [,12]  [,13]  [,14]  [,15]  [,16]  [,17]  [,18]
## apple 115.19 115.82 115.97 116.64 116.95 117.06 116.29 116.52 117.26
## ibm   168.51 168.02 166.73 166.68 167.60 167.33 167.06 166.71 167.14
## micr   62.68  62.58  62.30  63.62  63.54  63.54  63.55  63.24  63.28
##        [,19]  [,20]  [,21]
## apple 116.76 116.73 115.82
## ibm   166.19 166.60 165.99
## micr   62.99  62.90  62.14

2.9 Visualize your matrix

You can plot matrices using plot().


apple <- c(109.49, 109.90, 109.11, 109.95, 111.03, 112.12, 113.95, 113.30, 115.19, 115.19, 115.82, 115.97, 116.64, 116.95, 117.06, 116.29, 116.52, 117.26, 116.76, 116.73, 115.82)

micr <- c(59.20, 59.25, 60.22, 59.95, 61.37, 61.01, 61.97, 62.17, 62.98, 62.68, 62.58, 62.30, 63.62, 63.54, 63.54, 63.55, 63.24, 63.28, 62.99, 62.90, 62.14)

apple_micr_matrix <- cbind(apple, micr)

# View the data
apple_micr_matrix
##        apple  micr
##  [1,] 109.49 59.20
##  [2,] 109.90 59.25
##  [3,] 109.11 60.22
##  [4,] 109.95 59.95
##  [5,] 111.03 61.37
##  [6,] 112.12 61.01
##  [7,] 113.95 61.97
##  [8,] 113.30 62.17
##  [9,] 115.19 62.98
## [10,] 115.19 62.68
## [11,] 115.82 62.58
## [12,] 115.97 62.30
## [13,] 116.64 63.62
## [14,] 116.95 63.54
## [15,] 117.06 63.54
## [16,] 116.29 63.55
## [17,] 116.52 63.24
## [18,] 117.26 63.28
## [19,] 116.76 62.99
## [20,] 116.73 62.90
## [21,] 115.82 62.14

# Scatter plot of Microsoft vs Apple
plot (apple_micr_matrix)

2.10 cor()relation

The cor() function will calculate the correlation between two vectors, or will create a correlation matrix when given a matrix.


apple <- c(109.49, 109.90, 109.11, 109.95, 111.03, 112.12, 113.95, 113.30, 115.19, 115.19, 115.82, 115.97, 116.64, 116.95, 117.06, 116.29, 116.52, 117.26, 116.76, 116.73, 115.82)

ibm <- c(159.82, 160.02, 159.84, 160.35, 164.79, 165.36, 166.52, 165.50, 168.29, 168.51, 168.02, 166.73, 166.68, 167.60, 167.33, 167.06, 166.71, 167.14, 166.19, 166.60, 165.99)

micr <- c(59.20, 59.25, 60.22, 59.95, 61.37, 61.01, 61.97, 62.17, 62.98, 62.68, 62.58, 62.30, 63.62, 63.54, 63.54, 63.55, 63.24, 63.28, 62.99, 62.90, 62.14)

# Correlation of Apple and IBM
cor(apple,ibm)
## [1] 0.8872467

# stock matrix
stocks <- cbind (apple, micr, ibm)


cor(stocks)
##           apple      micr       ibm
## apple 1.0000000 0.9477010 0.8872467
## micr  0.9477010 1.0000000 0.9126597
## ibm   0.8872467 0.9126597 1.0000000

2.11 Matrix subsetting

Matrices can be selected from and subsetted. The basic structure is: my_matrix[row, col]


apple <- c(109.49, 109.90, 109.11, 109.95, 111.03, 112.12, 113.95, 113.30, 115.19, 115.19, 115.82, 115.97, 116.64, 116.95, 117.06, 116.29, 116.52, 117.26, 116.76, 116.73, 115.82)

ibm <- c(159.82, 160.02, 159.84, 160.35, 164.79, 165.36, 166.52, 165.50, 168.29, 168.51, 168.02, 166.73, 166.68, 167.60, 167.33, 167.06, 166.71, 167.14, 166.19, 166.60, 165.99)

micr <- c(59.20, 59.25, 60.22, 59.95, 61.37, 61.01, 61.97, 62.17, 62.98, 62.68, 62.58, 62.30, 63.62, 63.54, 63.54, 63.55, 63.24, 63.28, 62.99, 62.90, 62.14)

# Third row
stocks[3, ]
##  apple   micr    ibm 
## 109.11  60.22 159.84

# Fourth and fifth row of the ibm column
stocks[4:5,"ibm"]
## [1] 160.35 164.79

# apple and micr columns
stocks[,c("apple", "micr")]
##        apple  micr
##  [1,] 109.49 59.20
##  [2,] 109.90 59.25
##  [3,] 109.11 60.22
##  [4,] 109.95 59.95
##  [5,] 111.03 61.37
##  [6,] 112.12 61.01
##  [7,] 113.95 61.97
##  [8,] 113.30 62.17
##  [9,] 115.19 62.98
## [10,] 115.19 62.68
## [11,] 115.82 62.58
## [12,] 115.97 62.30
## [13,] 116.64 63.62
## [14,] 116.95 63.54
## [15,] 117.06 63.54
## [16,] 116.29 63.55
## [17,] 116.52 63.24
## [18,] 117.26 63.28
## [19,] 116.76 62.99
## [20,] 116.73 62.90
## [21,] 115.82 62.14

Chapter 3

3.1 Create your first data frame

A data frame is a table. It is like a matrix but it can hold different types of data.

# Variables
company <- c("A", "A", "A", "B", "B", "B", "B")
cash_flow <- c(1000, 4000, 550, 1500, 1100, 750, 6000)
year <- c(1, 3, 4, 1, 2, 4, 5)

# Data frame
cash <- data.frame(company, cash_flow, year)


# Print cash
cash
##   company cash_flow year
## 1       A      1000    1
## 2       A      4000    3
## 3       A       550    4
## 4       B      1500    1
## 5       B      1100    2
## 6       B       750    4
## 7       B      6000    5

3.2 Knowledge Test

You can create data frames with all types of data.

3.3 Making head()s and tail()s of your data with str()ucture

head() - Returns the first few rows of a data frame. By default, 6. To change this, use head(cash, n = ).

tail() - Returns the last few rows of a data frame. By default, 6. To change this, use tail(cash, n = ).

str() - Checks the structure of an object. This function will show you the data type of the object you pass in (here, data.frame), and will list each column variable along with its data type.

# Call head() for the first 4 rows
head (cash, n = 4)
##   company cash_flow year
## 1       A      1000    1
## 2       A      4000    3
## 3       A       550    4
## 4       B      1500    1

# Call tail() for the last 3 rows
tail (cash, n = 3)
##   company cash_flow year
## 5       B      1100    2
## 6       B       750    4
## 7       B      6000    5

# Call str() on cash
str (cash)
## 'data.frame':    7 obs. of  3 variables:
##  $ company  : Factor w/ 2 levels "A","B": 1 1 1 2 2 2 2
##  $ cash_flow: num  1000 4000 550 1500 1100 750 6000
##  $ year     : num  1 3 4 1 2 4 5

3.4 Naming your columns / rows

You can name columns by using colnames(), and you can name rows by using rownames().

# Fix your column names
colnames(cash) <- c("company", "cash_flow", "year")

# Print cash to check the column names
cash
##   company cash_flow year
## 1       A      1000    1
## 2       A      4000    3
## 3       A       550    4
## 4       B      1500    1
## 5       B      1100    2
## 6       B       750    4
## 7       B      6000    5

3.5 Accessing and subsetting data frames (1)

You can subset your data frame or access certain columns by using [ ].

# Third row, second column
cash[3,2]
## [1] 550

# Fifth row of the "year" column
cash[5, "year"]
## [1] 2


# Select the year column
cash$year
## [1] 1 3 4 1 2 4 5

3.6 Accessing and subsetting data frames (2)

Selecting a specific column from a data frame can be done using the $ shortcut, as in cash$year.




# Select the cash_flow column and multiply by 2
cash$cash_flow * 2
## [1]  2000  8000  1100  3000  2200  1500 12000

# Delete the company column
cash$company <- NULL

# Print cash again
cash
##   cash_flow year
## 1      1000    1
## 2      4000    3
## 3       550    4
## 4      1500    1
## 5      1100    2
## 6       750    4
## 7      6000    5

3.7 Accessing and subsetting data frames (3)

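The first argument you pass to subset() is the name of your data frame, and the second is a logical condition that selects rows. The == is the equality operator. It tests to find where two things are equal, and returns a logical vector. Note that the company column was deleted from cash in 3.6, so company in the first call below refers to the company vector that is still in the workspace rather than to a column of cash.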

# Rows about company B
subset(cash, company == "B")
##   cash_flow year
## 4      1500    1
## 5      1100    2
## 6       750    4
## 7      6000    5

# Rows with cash flows due in 1 year
subset (cash, year == 1)
##   cash_flow year
## 1      1000    1
## 4      1500    1

3.8 Adding new columns

You can add new columns in your data frame by assigning the new information to data_frame$new_column.

# Quarter cash flow scenario
cash$quarter_cash <- cash$cash_flow * .25

cash
##   cash_flow year quarter_cash
## 1      1000    1        250.0
## 2      4000    3       1000.0
## 3       550    4        137.5
## 4      1500    1        375.0
## 5      1100    2        275.0
## 6       750    4        187.5
## 7      6000    5       1500.0

# Double year scenario
cash$double_year <- cash$year * 2

cash
##   cash_flow year quarter_cash double_year
## 1      1000    1        250.0           2
## 2      4000    3       1000.0           6
## 3       550    4        137.5           8
## 4      1500    1        375.0           2
## 5      1100    2        275.0           4
## 6       750    4        187.5           8
## 7      6000    5       1500.0          10

3.9 Present value of projected cash flows

The general formula for calculating the present value is: present_value <- cash_flow * (1 + interest / 100) ^ -year

# Present value of $4000, in 3 years, at 5%
present_value_4k <- 4000 * (1 + 5 / 100) ^ -3

# Present value of all cash flows
cash$present_value <- cash$cash_flow * (1 + 5 / 100) ^ -cash$year


# Print out cash
cash
##   cash_flow year quarter_cash double_year present_value
## 1      1000    1        250.0           2      952.3810
## 2      4000    3       1000.0           6     3455.3504
## 3       550    4        137.5           8      452.4864
## 4      1500    1        375.0           2     1428.5714
## 5      1100    2        275.0           4      997.7324
## 6       750    4        187.5           8      617.0269
## 7      6000    5       1500.0          10     4701.1570
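
The present value formula above can also be wrapped in a small function. This is a minimal sketch: present_value() is a hypothetical helper written for these notes, not a function from the course or from base R, and it assumes the interest rate is given in percent, as in the formula.

```r
# Hypothetical helper: discount a cash flow received `year` years from now
# at `interest` percent per year
present_value <- function(cash_flow, interest, year) {
  cash_flow * (1 + interest / 100) ^ -year
}

# Same calculation as present_value_4k: $4000, in 3 years, at 5%
pv_4k <- present_value(4000, interest = 5, year = 3)

# The function is vectorized, so it also works on whole columns
pv_many <- present_value(c(1000, 4000, 550), interest = 5, year = c(1, 3, 4))
```

Because the arithmetic operators work element by element, the same function covers both the single-value case and the column case without any loop.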

3.10 Present value of projected cash flows (2)

You can use the sum() function to add up the elements of your present value calculations.

# Total present value of cash
total_pv <- sum (cash$present_value)

# Company B information
cash_B <- subset (cash, company == "B")

# Total present value of cash_B
total_pv_B <- sum (cash_B$present_value)

Chapter 4

4.1 Create a factor

To create a factor in R, use the factor() function, and pass in a vector that you want to be converted into a factor.

# credit_rating character vector
credit_rating <- c("BB", "AAA", "AA", "CCC", "AA", "AAA", "B", "BB")

# Create a factor from credit_rating
credit_factor <- factor (credit_rating)

# Print out your new factor
credit_factor
## [1] BB  AAA AA  CCC AA  AAA B   BB 
## Levels: AA AAA B BB CCC

# Call str() on credit_rating
str (credit_rating)
##  chr [1:8] "BB" "AAA" "AA" "CCC" "AA" "AAA" "B" "BB"

# Call str() on credit_factor
str (credit_factor)
##  Factor w/ 5 levels "AA","AAA","B",..: 4 2 1 5 1 2 3 4

4.2 Factor Levels

You can access and rename your factor levels by using the levels() function.

# Identify unique levels
levels (credit_factor)
## [1] "AA"  "AAA" "B"   "BB"  "CCC"

# Rename the levels of credit_factor
levels (credit_factor) <- c("2A", "3A", "1B", "2B", "3C")

# Print credit_factor
credit_factor
## [1] 2B 3A 2A 3C 2A 3A 1B 2B
## Levels: 2A 3A 1B 2B 3C

4.3 Factor Summary

You can summarize factors using the summary() command.

# Summarize the character vector, credit_rating
summary (credit_rating)
##    Length     Class      Mode 
##         8 character character

# Summarize the factor, credit_factor
summary (credit_factor)
## 2A 3A 1B 2B 3C 
##  2  2  1  2  1

4.4 Visualize your factor

You can use plot() to create a bar graph of your factor.

# Visualize your factor!
plot (credit_factor)

4.5 Bucketing a numeric variable into a factor

You can create a factor from a numeric vector by using cut(). The ( in the factor levels means we do not include the number beside it in that group, and the ] means that we do include that number in the group.

AAA_rank <- c(31,  48, 100,  53,  85,  73,  62,  74,  42,  38,  97,  61,  48,  86,  44,   9,  43,  18,  62, 38,  23,  37,  54,  80,  78,  93,  47, 100,  22,  22,  18,  26,  81,  17,  98,   4,  83,   5, 6,  52,  29,  44,  50,   2,  25,  19,  15,  42,  30,  27)

# Create 4 buckets for AAA_rank using cut()
AAA_factor <- cut(x = AAA_rank, breaks = c(0, 25, 50, 75, 100))

# Rename the levels 
levels(AAA_factor) <- c("low", "medium", "high", "very_high")

# Print AAA_factor
AAA_factor
##  [1] medium    medium    very_high high      very_high high      high     
##  [8] high      medium    medium    very_high high      medium    very_high
## [15] medium    low       medium    low       high      medium    low      
## [22] medium    high      very_high very_high very_high medium    very_high
## [29] low       low       low       medium    very_high low       very_high
## [36] low       very_high low       low       high      medium    medium   
## [43] medium    low       low       low       low       medium    medium   
## [50] medium   
## Levels: low medium high very_high

# Plot AAA_factor
plot(AAA_factor)
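
To see the ( and ] notation described above before the levels are renamed, here is a small sketch. The scores vector is made up for illustration and deliberately includes values that sit exactly on the breaks:

```r
# Illustrative values, including the boundary values 0, 25, and 50
scores <- c(0, 25, 26, 50, 100)

# Default intervals are open on the left and closed on the right
default_cut <- cut(scores, breaks = c(0, 25, 50, 75, 100))
levels(default_cut)      # "(0,25]" "(25,50]" "(50,75]" "(75,100]"

# 25 belongs to (0,25], and 26 to (25,50]
as.character(default_cut[2:3])

# 0 is not inside (0,25], so it becomes NA
is.na(default_cut[1])

# include.lowest = TRUE closes the first interval so that 0 is kept
closed_cut <- cut(scores, breaks = c(0, 25, 50, 75, 100), include.lowest = TRUE)
levels(closed_cut)[1]    # "[0,25]"
```

The AAA_rank data has no zeros, so the default works there; include.lowest matters only when a value can equal the smallest break.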

4.6 Create an ordered factor

You can assign an order to your factor by adding ordered = TRUE when you create the factor, and assigning levels.

# Use unique() to find unique words
unique(credit_rating)
## [1] "BB"  "AAA" "AA"  "CCC" "B"

# Create an ordered factor
credit_factor_ordered <- factor(credit_rating, ordered = TRUE, levels = c("AAA", "AA", "BB", "B", "CCC"))

# Plot credit_factor_ordered
plot (credit_factor_ordered)
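
One thing an ordered factor allows that a plain factor does not is comparison between elements. A minimal sketch, using a shortened, made-up ratings vector with the same level order as above:

```r
# Levels run from first ("AAA") to last ("CCC") in the order given
ratings <- factor(c("BB", "AAA", "CCC"), ordered = TRUE,
                  levels = c("AAA", "AA", "BB", "B", "CCC"))

# Comparisons follow the level order, not alphabetical order
bb_before_aaa <- ratings[1] < ratings[2]   # is BB earlier than AAA?
bb_before_aaa
## [1] FALSE

aaa_before_ccc <- ratings[2] < ratings[3]  # is AAA earlier than CCC?
aaa_before_ccc
## [1] TRUE

# max() returns the element whose level comes last
last_rating <- as.character(max(ratings))
last_rating
## [1] "CCC"
```

Note that "less than" here means "earlier in the level order". With the levels listed from AAA to CCC, the best credit rating is the smallest value.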

4.7 Subsetting a factor

You can subset a factor using [ ]. Removing elements does not remove their levels: if you also want to drop a level that is no longer used, add drop = TRUE (lowercase, since R is case-sensitive).

# Remove the bonds at positions 3 and 7. Keep every level, even one that is no longer used.
keep_level <- credit_factor[-c(3, 7)]

# Plot keep_level
plot (keep_level)


# Remove the bonds at positions 3 and 7. Drop any level that is no longer used.
drop_level <- credit_factor[-c(3, 7), drop = TRUE]

# Plot drop_level
plot (drop_level)

4.8 stringsAsFactors

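In versions of R before 4.0.0, the default behavior when creating data frames is to convert all characters into factors; from R 4.0.0 onward, characters are left as characters by default. To turn off the conversion explicitly: cash <- data.frame(company, cash_flow, year, stringsAsFactors = FALSE)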

# Variables
credit_rating <- c("AAA", "A", "BB")
bond_owners <- c("Dan", "Tom", "Joe")

# Create the data frame of character vectors, bonds
bonds <- data.frame(credit_rating, bond_owners, stringsAsFactors = FALSE)

# Use str() on bonds
str(bonds)
## 'data.frame':    3 obs. of  2 variables:
##  $ credit_rating: chr  "AAA" "A" "BB"
##  $ bond_owners  : chr  "Dan" "Tom" "Joe"

# Create a factor column in bonds called credit_factor from credit_rating
bonds$credit_factor <- factor(bonds$credit_rating, ordered = TRUE, levels = c("AAA","A","BB"))

# Use str() on bonds again
str(bonds)
## 'data.frame':    3 obs. of  3 variables:
##  $ credit_rating: chr  "AAA" "A" "BB"
##  $ bond_owners  : chr  "Dan" "Tom" "Joe"
##  $ credit_factor: Ord.factor w/ 3 levels "AAA"<"A"<"BB": 1 2 3

Chapter 5

5.1 Create a list

You can create a list in R to hold together items of different data types by using the list() function.

# List components
name <- "Apple and IBM"
apple <- c(109.49, 109.90, 109.11, 109.95, 111.03)
ibm <- c(159.82, 160.02, 159.84, 160.35, 164.79)
cor_matrix <- cor(cbind(apple, ibm))

# Create a list
portfolio <- list (name, apple, ibm, cor_matrix)

# View your first list
portfolio
## [[1]]
## [1] "Apple and IBM"
## 
## [[2]]
## [1] 109.49 109.90 109.11 109.95 111.03
## 
## [[3]]
## [1] 159.82 160.02 159.84 160.35 164.79
## 
## [[4]]
##           apple       ibm
## apple 1.0000000 0.9131575
## ibm   0.9131575 1.0000000

5.2 Named lists

You can name the elements as you create a list with the form name = value. If the list was already created, you can use names():

# Add names to your portfolio
names(portfolio) <- c("portfolio_name", "apple", "ibm", "correlation")

# Print portfolio
portfolio
## $portfolio_name
## [1] "Apple and IBM"
## 
## $apple
## [1] 109.49 109.90 109.11 109.95 111.03
## 
## $ibm
## [1] 159.82 160.02 159.84 160.35 164.79
## 
## $correlation
##           apple       ibm
## apple 1.0000000 0.9131575
## ibm   0.9131575 1.0000000
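
The name = value form mentioned at the start of this section can be sketched as follows. The list name portfolio_named is hypothetical, used here so that the portfolio list above is left untouched:

```r
# Name each element at creation time with name = value
portfolio_named <- list(
  portfolio_name = "Apple and IBM",
  apple = c(109.49, 109.90, 109.11, 109.95, 111.03),
  ibm = c(159.82, 160.02, 159.84, 160.35, 164.79)
)

# The names are available immediately, with no separate names() call
element_names <- names(portfolio_named)
```

Naming at creation keeps each name next to its value, which makes it harder to mismatch the two than assigning a separate vector of names afterwards.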

5.3 Access elements in a list

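To access elements in the list, use [ ]. This will always return another list. To extract a single element itself rather than a list containing it, use [[ ]] or, for a named list, $.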

# Second and third elements of portfolio
portfolio[c(2, 3)]
## $apple
## [1] 109.49 109.90 109.11 109.95 111.03
## 
## $ibm
## [1] 159.82 160.02 159.84 160.35 164.79

# Use $ to get the correlation data
portfolio$correlation
##           apple       ibm
## apple 1.0000000 0.9131575
## ibm   0.9131575 1.0000000
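
The difference between single and double brackets is easiest to see with class(). A minimal sketch on a small made-up list (demo_list is a hypothetical name):

```r
demo_list <- list(name = "Apple and IBM", apple = c(109.49, 109.90))

# Single brackets return a list, even when only one element is selected
single <- demo_list["apple"]
class(single)
## [1] "list"

# Double brackets (or $) return the element itself
double <- demo_list[["apple"]]
class(double)
## [1] "numeric"

# So further indexing works directly on the extracted vector
second_price <- demo_list[["apple"]][2]
second_price
## [1] 109.9
```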

5.4 Adding to a list

You can use $ to add new elements to a list, as in my_list$new_element <- value. You can also use c() to add elements to the list, which is useful if you want to add multiple elements at once.

# Add weight: 20% Apple, 80% IBM
portfolio$weight <- c(apple = .20, ibm = .80)

# Print portfolio
portfolio
## $portfolio_name
## [1] "Apple and IBM"
## 
## $apple
## [1] 109.49 109.90 109.11 109.95 111.03
## 
## $ibm
## [1] 159.82 160.02 159.84 160.35 164.79
## 
## $correlation
##           apple       ibm
## apple 1.0000000 0.9131575
## ibm   0.9131575 1.0000000
## 
## $weight
## apple   ibm 
##   0.2   0.8


# Change the weight variable: 30% Apple, 70% IBM
portfolio$weight <- c(apple = .30, ibm = .70)

# Print portfolio to see the changes
portfolio
## $portfolio_name
## [1] "Apple and IBM"
## 
## $apple
## [1] 109.49 109.90 109.11 109.95 111.03
## 
## $ibm
## [1] 159.82 160.02 159.84 160.35 164.79
## 
## $correlation
##           apple       ibm
## apple 1.0000000 0.9131575
## ibm   0.9131575 1.0000000
## 
## $weight
## apple   ibm 
##   0.3   0.7
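
The c() approach mentioned at the start of this section is not shown in the exercise, so here is a minimal sketch. The list demo_list and the currency element are hypothetical, and wrapping the new elements in list() is a choice made here so that the weight vector stays in one piece:

```r
demo_list <- list(portfolio_name = "Apple and IBM")

# Add two elements at once; list() keeps each new element intact
demo_list <- c(demo_list,
               list(weight = c(apple = 0.2, ibm = 0.8), currency = "USD"))

added_names <- names(demo_list)

# Without list(), c() would split the weight vector into separate elements
split_up <- c(list(portfolio_name = "Apple and IBM"),
              weight = c(apple = 0.2, ibm = 0.8))
split_length <- length(split_up)
```

Here demo_list ends up with three elements, while split_up also has three because the two weights became two separate elements next to the name.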

5.5 Removing from a list

Assigning NULL is the easiest way to remove an element from your list, as in my_list$element <- NULL. If your list is not named, you can also remove elements by position using my_list[1] <- NULL or my_list[[1]] <- NULL.

# Take a look at portfolio
portfolio
## $portfolio_name
## [1] "Apple and IBM"
## 
## $apple
## [1] 109.49 109.90 109.11 109.95 111.03
## 
## $ibm
## [1] 159.82 160.02 159.84 160.35 164.79
## 
## $correlation
##           apple       ibm
## apple 1.0000000 0.9131575
## ibm   0.9131575 1.0000000
## 
## $weight
## apple   ibm 
##   0.3   0.7

# Remove the microsoft stock prices from your portfolio (this portfolio has no microsoft element, so nothing changes)
portfolio$microsoft <- NULL
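
Removal by name and by position can be sketched on small made-up lists (the names below are hypothetical):

```r
# Remove by name
demo_named <- list(apple = 109.49, ibm = 159.82, microsoft = 59.20)
demo_named$microsoft <- NULL
remaining_names <- names(demo_named)

# Remove by position from an unnamed list
demo_unnamed <- list("Apple and IBM", c(109.49, 109.90), c(159.82, 160.02))
demo_unnamed[[1]] <- NULL
remaining_length <- length(demo_unnamed)

# The elements after the removed one shift up by one position
new_first <- demo_unnamed[[1]]
```

Because positions shift after each removal, removing several elements by position is safer in a single step, for example my_list[c(1, 3)] <- NULL.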

5.6 Split it

You can use split() to break a data frame into a list of data frames, one per group. You can reverse the split by using unsplit().

# Define grouping from year
grouping <- cash$year

# Split cash on your new grouping
split_cash <- split (cash, grouping)

# Look at your split_cash list
split_cash
## $`1`
##   cash_flow year quarter_cash double_year present_value
## 1      1000    1          250           2       952.381
## 4      1500    1          375           2      1428.571
## 
## $`2`
##   cash_flow year quarter_cash double_year present_value
## 5      1100    2          275           4      997.7324
## 
## $`3`
##   cash_flow year quarter_cash double_year present_value
## 2      4000    3         1000           6       3455.35
## 
## $`4`
##   cash_flow year quarter_cash double_year present_value
## 3       550    4        137.5           8      452.4864
## 6       750    4        187.5           8      617.0269
## 
## $`5`
##   cash_flow year quarter_cash double_year present_value
## 7      6000    5         1500          10      4701.157

# Unsplit split_cash to get the original data back.
original_cash <- unsplit (split_cash, grouping)

# Print original_cash
original_cash
##   cash_flow year quarter_cash double_year present_value
## 1      1000    1        250.0           2      952.3810
## 2      4000    3       1000.0           6     3455.3504
## 3       550    4        137.5           8      452.4864
## 4      1500    1        375.0           2     1428.5714
## 5      1100    2        275.0           4      997.7324
## 6       750    4        187.5           8      617.0269
## 7      6000    5       1500.0          10     4701.1570

5.7 Split-Apply-Combine

You can split your data frame by a grouping, apply some transformation to each group, and then recombine those pieces back into one data frame. This is referred to in R as split-apply-combine.

# Print split_cash
split_cash
## $`1`
##   cash_flow year quarter_cash double_year present_value
## 1      1000    1          250           2       952.381
## 4      1500    1          375           2      1428.571
## 
## $`2`
##   cash_flow year quarter_cash double_year present_value
## 5      1100    2          275           4      997.7324
## 
## $`3`
##   cash_flow year quarter_cash double_year present_value
## 2      4000    3         1000           6       3455.35
## 
## $`4`
##   cash_flow year quarter_cash double_year present_value
## 3       550    4        137.5           8      452.4864
## 6       750    4        187.5           8      617.0269
## 
## $`5`
##   cash_flow year quarter_cash double_year present_value
## 7      6000    5         1500          10      4701.157

# Print the cash_flow column of B in split_cash (NULL here, because cash was split by year and the groups are named 1 to 5)
split_cash$B$cash_flow
## NULL

# Set the cash_flow column of company A in split_cash to 0 (no group is named A, so the year groups are left unchanged)
split_cash$A$cash_flow <- 0

# Use the grouping to unsplit split_cash
cash_no_A <- unsplit(split_cash, grouping)

# Print cash_no_A
cash_no_A
##   cash_flow year quarter_cash double_year present_value
## 1      1000    1        250.0           2      952.3810
## 2      4000    3       1000.0           6     3455.3504
## 3       550    4        137.5           8      452.4864
## 4      1500    1        375.0           2     1428.5714
## 5      1100    2        275.0           4      997.7324
## 6       750    4        187.5           8      617.0269
## 7      6000    5       1500.0          10     4701.1570
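
Since the company column was deleted earlier, the run above splits by year and never changes company A. The following self-contained sketch shows the three steps with a grouping by company. The data frame cash_demo is a small made-up example, not the course data:

```r
# A small example data frame that still has a company column
cash_demo <- data.frame(
  company = c("A", "A", "B", "B"),
  cash_flow = c(1000, 4000, 1500, 1100),
  stringsAsFactors = FALSE
)

# Split: one data frame per company
grouping_demo <- cash_demo$company
split_demo <- split(cash_demo, grouping_demo)

# Apply: set company A's cash flows to 0
split_demo$A$cash_flow <- 0

# Combine: put the pieces back in the original row order
combined_demo <- unsplit(split_demo, grouping_demo)
combined_demo$cash_flow
## [1]    0    0 1500 1100
```

The same grouping vector is passed to both split() and unsplit(), which is what lets unsplit() restore the original row order.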

5.8 Attributes

You can use the attributes() function to return a list of attributes about the object you pass in. To access a specific attribute, you can use the attr() function.

# my_matrix and my_factor
my_matrix <- matrix(c(1,2,3,4,5,6), nrow = 2, ncol = 3)
rownames(my_matrix) <- c("Row1", "Row2")
colnames(my_matrix) <- c("Col1", "Col2", "Col3")

my_factor <- factor(c("A", "A", "B"), ordered = TRUE, levels = c("A", "B"))

# attributes of my_matrix
attributes (my_matrix)
## $dim
## [1] 2 3
## 
## $dimnames
## $dimnames[[1]]
## [1] "Row1" "Row2"
## 
## $dimnames[[2]]
## [1] "Col1" "Col2" "Col3"

# Just the dim attribute of my_matrix
attr (my_matrix, which = "dim")
## [1] 2 3

# attributes of my_factor
attributes (my_factor)
## $levels
## [1] "A" "B"
## 
## $class
## [1] "ordered" "factor"

Quiz

The first week’s quiz is very brief and simple, just to get your feet wet. Complete the tasks below and include them at the end of your RMarkdown file for the first week. Then publish it on RPubs.com and email me the link for grading.

1. Compare vectors and matrices. What are similarities and differences?

A vector is a collection of data that is all of the same type. Vectors have only one dimension. Matrices are also collections of data that are all of the same type. The difference is that matrices have two dimensions: rows and columns.

2. Compare matrices and data frames. What are similarities and differences?

Matrices are like tables of data that contain only one data type. Data frames are also tables of data but data frames can contain different types of data.

3. Create your first vector, matrix, data frame, factor, and list. Do this within an R code chunk.

stocks is a vector of stocks that I own.

stocks <- c("amazon", "apple", "starbucks")

stocks
## [1] "amazon"    "apple"     "starbucks"

stock_portfolio is a matrix created from stocks and their symbols.

stocks <- c("amazon", "apple", "starbucks")

symbols <- c("AMZN", "AAPL", "SBUX")

stock_portfolio <- cbind(stocks, symbols)

stock_portfolio
##      stocks      symbols
## [1,] "amazon"    "AMZN" 
## [2,] "apple"     "AAPL" 
## [3,] "starbucks" "SBUX"

stock_values is a data frame made by combining stocks, symbols and their current prices.

stocks <- c("amazon", "apple", "starbucks")

symbols <- c("AMZN", "AAPL", "SBUX")

prices <- c( 978, 149, 62)

stock_values <- data.frame (stocks, symbols, prices)

stock_values
##      stocks symbols prices
## 1    amazon    AMZN    978
## 2     apple    AAPL    149
## 3 starbucks    SBUX     62

stock_list is a list of my stocks, their symbols and their prices.

stocks <- c("amazon", "apple", "starbucks")

symbols <- c("AMZN", "AAPL", "SBUX")

prices <- c( 978, 149, 62)

stock_list <- list (stocks, symbols, prices)
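
The task also asks for a factor, which the answers above do not include. One possible sketch, assuming the same prices vector as above, buckets the prices with cut(). The name price_band, the break at 100, and the labels are all illustrative choices rather than part of the original answer:

```r
prices <- c(978, 149, 62)

# Bucket each price into (0,100] or (100,1000] and label the buckets
price_band <- cut(prices, breaks = c(0, 100, 1000),
                  labels = c("up_to_100", "over_100"))

price_band
## [1] over_100  over_100  up_to_100
## Levels: up_to_100 over_100
```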