1 Basics

1.1 Financial returns

Assume you have $100. During January, you make a 5% return on that money. How much do you have at the end of January? Well, you have 100% of your starting money, plus another 5%: 100% + 5% = 105%. In decimals, this is 1 + .05 = 1.05. This 1.05 is the return multiplier for January, and you multiply your original $100 by it to get the amount you have at the end of January.

105 = 100 * 1.05

Or in terms of variables:

post_jan_cash <- starting_cash * jan_mult

A quick way to get the multiplier is:

multiplier = 1 + (return / 100)
# Variables for starting_cash and 5% return during January
starting_cash <- 200
jan_ret <- 5
jan_mult <- 1 + (jan_ret / 100)

# How much money do you have at the end of January?
post_jan_cash <- starting_cash * jan_mult

# Print post_jan_cash
post_jan_cash
[1] 210
# January 10% return multiplier
jan_ret_10 <- 10
jan_mult_10 <- 1 + (jan_ret_10 / 100)

# How much money do you have at the end of January now?
post_jan_cash_10 <- starting_cash * jan_mult_10

# Print post_jan_cash_10
post_jan_cash_10
[1] 220

If, in February, you earn another 2% on your cash, how would you calculate the total amount at the end of February? You already know that the amount at the end of January is $100 * 1.05 = $105. To get from the end of January to the end of February, just use another multiplier!

$105 * 1.02 = $107.1

Which is equivalent to:

$100 * 1.05 * 1.02 = $107.1

In this last form, you can see the effect of both multipliers on your original $100. In fact, this form can help you find the total return over both months. The correct way to do this is by multiplying the two multipliers together: 1.05 * 1.02 = 1.071. This means you earned 7.1% in total over the two-month period.

# Starting cash and returns 
starting_cash <- 200
jan_ret <- 4
feb_ret <- 5

# Multipliers
jan_mult <- 1 + (jan_ret/100)
feb_mult <- 1 + (feb_ret/100)

# Total cash at the end of the two months
total_cash <- starting_cash * jan_mult * feb_mult

# Print total_cash
total_cash
[1] 218.4
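To see why the multipliers are multiplied rather than the percentages added, here is a small sketch using the 5% and 2% returns from the text. The names naive_total and compound_total are illustrative only and do not come from the original notes.

```r
# January and February returns from the text, in percent
jan_ret <- 5
feb_ret <- 2

# Adding the percentages ignores the return earned in February on January's gain
naive_total <- jan_ret + feb_ret

# Compounding: multiply the multipliers, then convert back to a percentage
compound_total <- ((1 + jan_ret / 100) * (1 + feb_ret / 100) - 1) * 100

naive_total      # 7
compound_total   # 7.1 (up to floating-point rounding)
```

The 0.1 percentage point gap is the 2% February return earned on January's $5 gain.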

2 Vectors and Matrices

2.1 What is a vector?

2.1.1 c()ombine

We create a vector using the combine function, c(), and each element you add is separated by a comma.

# A numeric vector
ibm_stock <- c(159.82, 160.02, 159.84)

# A character vector
finance <- c("stocks", "bonds", "investments")

# A logical vector
logic <- c(TRUE, FALSE, TRUE)

2.1.2 Coerce it

It is important to remember that a vector can only be composed of one data type. This means that you cannot have both a numeric and a character in the same vector. If you attempt to do this, the lower ranking type will be coerced into the higher ranking type.

For example: c(1.5, "hello") results in c("1.5", "hello") where the numeric 1.5 has been coerced into the character data type.

The hierarchy for coercion is:

logical < integer < numeric < character

Logicals are coerced a bit differently depending on what the highest data type is. c(TRUE, 1.5) will return c(1, 1.5) where TRUE is coerced to the numeric 1 (FALSE would be converted to a 0). On the other hand, c(TRUE, "this_char") is converted to c("TRUE", "this_char").
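The coercion hierarchy can be checked directly with class(). This is a minimal sketch; the variable name mixed is illustrative.

```r
# class() reports the type that the combined vector ends up with
class(c(TRUE, 2L))        # "integer":   TRUE becomes 1L
class(c(2L, 1.5))         # "numeric":   2L becomes 2
class(c(1.5, "hello"))    # "character": 1.5 becomes "1.5"
class(c(TRUE, "a"))       # "character": TRUE becomes "TRUE"

# With three types mixed together, everything becomes character
mixed <- c(TRUE, 1.5, "hello")
mixed                     # "TRUE" "1.5" "hello"
```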

2.1.3 Vector names()

Let’s return to the example about January and February’s returns. We can put these returns into a vector:

ret <- c(5, 2)

Now all of the returns are in one place. However, you could go one step further by adding names to each return in your vector. You do this using names():

names(ret) <- c("Jan", "Feb")
# Vectors of 12 months of returns, and month names
ret <- c(5, 2, 3, 7, 8, 3, 5, 9, 1, 4, 6, 3)
months <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

# Add names to ret
names(ret) <- months

# Print out ret to see the new names!
ret
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec 
  5   2   3   7   8   3   5   9   1   4   6   3 
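Names can also be supplied when the vector is created, using the name = value form inside c(). The sketch below, with illustrative variable names, shows that both routes give the same result.

```r
# Names supplied inline when the vector is created
ret_inline <- c(Jan = 5, Feb = 2)

# Names added afterwards with names()
ret_named <- c(5, 2)
names(ret_named) <- c("Jan", "Feb")

# Both approaches give the same named vector
identical(ret_inline, ret_named)   # TRUE
```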

2.1.4 Visualize your vector

The plot() function is one of the many ways to create a graph from your data in R. Passing in a vector will add its values to the y-axis of the graph, and on the x-axis will be an index created from the order that your vector is in.

Inside of plot(), you can change the type of your graph using type =. The default is "p" for points, but you can also change it to "l" for line.

# Look at the data
apple_stock <- c(109.49, 109.9, 109.11, 109.95, 111.03, 112.12, 113.95, 113.3, 
115.19, 115.19, 115.82, 115.97, 116.64, 116.95, 117.06, 116.29, 
116.52, 117.26, 116.76, 116.73, 115.82)
apple_stock
 [1] 109.49 109.90 109.11 109.95 111.03 112.12 113.95 113.30 115.19 115.19
[11] 115.82 115.97 116.64 116.95 117.06 116.29 116.52 117.26 116.76 116.73
[21] 115.82
# Plot the data points
plot(apple_stock)


# Plot the data as a line graph
plot(apple_stock, type = "l")

2.2 Vector manipulation

2.2.1 Weighted average

The weighted average allows you to calculate your portfolio return over a time period.

Assume you have 40% of your cash in Apple stock, and 60% of your cash in IBM stock. If, in January, Apple earned 5% and IBM earned 7%, what was your total portfolio return?

portf_ret <- apple_ret * apple_weight + ibm_ret * ibm_weight
# Weights and returns
micr_ret <- 7
sony_ret <- 9
micr_weight <- .2
sony_weight <- .8

# Portfolio return
portf_ret <- micr_ret * micr_weight + sony_ret * sony_weight
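The Apple and IBM question posed above can be worked through with the same formula. This is a minimal sketch; the result name portf_ret_apple_ibm is chosen here only to avoid overwriting portf_ret.

```r
# Returns (in percent) and weights from the Apple and IBM question
apple_ret <- 5
ibm_ret <- 7
apple_weight <- .4
ibm_weight <- .6

# Weighted average of the two returns
portf_ret_apple_ibm <- apple_ret * apple_weight + ibm_ret * ibm_weight
portf_ret_apple_ibm   # 6.2, i.e. a 6.2% portfolio return
```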

We can do the same calculation using vectors:

# Weights, returns, and company names
ret <- c(7, 9)
weight <- c(.2, .8)
companies <- c("Microsoft", "Sony")

# Assign company names to your vectors
names(ret) <- companies
names(weight) <- companies

# Multiply the returns and weights together 
ret_X_weight <- ret * weight

# Print ret_X_weight
ret_X_weight
Microsoft      Sony 
      1.4       7.2 
# Sum to get the total portfolio return
portf_ret <- sum(ret_X_weight)

# Print portf_ret
portf_ret
[1] 8.6

When vectors of different lengths are multiplied, R recycles the shorter one until it matches the length of the longer one:

ret <- c(Microsoft = 7, Apple = 8, Sony = 9)
# Print ret
ret
Microsoft     Apple      Sony 
        7         8         9 
# Assign 1/3 to weight
weight <- 1/3

# Create ret_X_weight
ret_X_weight <- ret * weight

# Calculate your portfolio return
portf_ret <- sum(ret_X_weight)

# Vector of length 3 * Vector of length 2?
ret * c(.2, .6)
longer object length is not a multiple of shorter object length
Microsoft     Apple      Sony 
      1.4       4.8       1.8 

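Look at the warning message: because 3 is not a multiple of 2, R recycles c(.2, .6) to c(.2, .6, .2) and warns that the shorter vector did not fit evenly into the longer one.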
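The recycling behind that warning can be made explicit. The sketch below compares the recycled product with one where the third weight is written out by hand; suppressWarnings() is used only to keep the output quiet, and the names recycled and explicit are illustrative.

```r
ret <- c(Microsoft = 7, Apple = 8, Sony = 9)

# R reuses the first weight (.2) for the third element
recycled <- suppressWarnings(ret * c(.2, .6))

# The same calculation with the recycled weight written out
explicit <- ret * c(.2, .6, .2)

all.equal(recycled, explicit)   # TRUE
```

In a portfolio calculation this is rarely what you want, since the weights no longer sum to 1.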

2.2.2 Vector subsetting

Sometimes, you will only want to use specific pieces of your vectors, and you’ll need some way to access just those parts. To do this, you can subset the vector using [ ].

ret <- c(Jan = 5, Feb = 2, Mar = 3, Apr = 7, May = 8, Jun = 3, Jul = 5, 
Aug = 9, Sep = 1, Oct = 4, Nov = 6, Dec = 3)
# First 6 months of returns
ret[1:6]
Jan Feb Mar Apr May Jun 
  5   2   3   7   8   3 
# Just March and May
ret[c("Mar", "May")]
Mar May 
  3   8 
# Omit the first month of returns
ret[-1]
Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec 
  2   3   7   8   3   5   9   1   4   6   3 

2.3 Matrix - a 2D vector

2.3.1 Creating a matrix

Matrices are similar to vectors, except they are in 2 dimensions! Let’s create a 2x2 matrix “by hand” using matrix().

matrix(data = c(2, 3, 4, 5), nrow = 2, ncol = 2)
     [,1] [,2]
[1,]    2    4
[2,]    3    5

Notice that the actual data for the matrix is passed in as a vector using c(), and is then converted to a matrix by specifying the number of rows and columns (also known as the dimensions).

# A vector of 9 numbers
my_vector <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# 3x3 matrix
my_matrix <- matrix(data = my_vector, nrow = 3, ncol = 3)

# Print my_matrix
my_matrix
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
# Filling across using byrow = TRUE
matrix(data = c(2, 3, 4, 5), nrow = 2, ncol = 2, byrow = TRUE)
     [,1] [,2]
[1,]    2    3
[2,]    4    5

2.3.2 Matrix <- bind vectors

Often, you won’t be creating matrices like we did in the last example. Instead, you will create them from multiple vectors that you want to combine together. For this, it is easiest to use the functions cbind() and rbind() (column bind and row bind respectively). To see these in action, let’s combine two vectors of Apple and IBM stock prices:

apple <- c(109.49, 109.90, 109.11, 109.95, 111.03)
ibm <- c(159.82, 160.02, 159.84, 160.35, 164.79)

cbind(apple, ibm)
      apple    ibm
[1,] 109.49 159.82
[2,] 109.90 160.02
[3,] 109.11 159.84
[4,] 109.95 160.35
[5,] 111.03 164.79
rbind(apple, ibm)
        [,1]   [,2]   [,3]   [,4]   [,5]
apple 109.49 109.90 109.11 109.95 111.03
ibm   159.82 160.02 159.84 160.35 164.79

The functions cbind() and rbind() are pretty common. They also work with data frames.
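As a rough illustration of the data frame case (data frames themselves are covered in section 3), the sketch below stacks two small data frames with rbind() and then adds a column with cbind(). The data frame names and values are made up for this example.

```r
# Two small data frames with the same columns (values are illustrative)
first_half <- data.frame(company = c("A", "A"), cash_flow = c(1000, 4000))
second_half <- data.frame(company = c("B", "B"), cash_flow = c(1500, 1100))

# rbind() stacks rows; the column names must match
all_flows <- rbind(first_half, second_half)

# cbind() adds columns; the number of rows must match
all_flows <- cbind(all_flows, year = c(1, 3, 1, 2))

dim(all_flows)        # 4 3
colnames(all_flows)   # "company" "cash_flow" "year"
```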

2.3.3 Visualize your matrix

We can visualize our matrix to gain some insights about the relationships in the data.

apple_micr_matrix <-  cbind(apple, ibm)
# View the data
apple_micr_matrix
      apple    ibm
[1,] 109.49 159.82
[2,] 109.90 160.02
[3,] 109.11 159.84
[4,] 109.95 160.35
[5,] 111.03 164.79
# Scatter plot of IBM vs Apple
plot(apple_micr_matrix)

2.3.4 cor()relation

It seems that when Apple’s stock moves up, IBM’s does as well. One way to capture this kind of relationship is by finding the correlation between the two stocks. Correlation is a measure of association between two things, here, stock prices, and is represented by a number from -1 to 1. A 1 represents perfect positive correlation, a -1 represents perfect negative correlation, and a correlation of 0 means that there is no linear relationship between the two stocks.

The cor() function will calculate the correlation between two vectors, or will create a correlation matrix when given a matrix.

cor(apple, ibm)
[1] 0.9131575
cor(apple_micr_matrix)
          apple       ibm
apple 1.0000000 0.9131575
ibm   0.9131575 1.0000000

A large correlation of .913 hints that Apple and IBM’s stock prices move closely together.
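For intuition about what cor() computes by default (the Pearson correlation), the sketch below rebuilds it from the covariance and the two standard deviations. The name manual_cor is illustrative.

```r
apple <- c(109.49, 109.90, 109.11, 109.95, 111.03)
ibm <- c(159.82, 160.02, 159.84, 160.35, 164.79)

# Pearson correlation: covariance scaled by both standard deviations
manual_cor <- cov(apple, ibm) / (sd(apple) * sd(ibm))

# Matches the built-in function (up to floating-point rounding)
all.equal(manual_cor, cor(apple, ibm))   # TRUE
```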

2.3.5 Matrix subsetting

Matrices can be selected from and subsetted with [ ]. The basic structure is:

my_matrix[row, col]
apple <- c(109.49, 109.90, 109.11, 109.95, 111.03)
ibm <- c(159.82, 160.02, 159.84, 160.35, 164.79)
stocks <- cbind(apple, ibm)
# Third row
stocks[3,]
 apple    ibm 
109.11 159.84 
# Fourth and fifth row of the ibm column
stocks[4:5, 2]
[1] 160.35 164.79
# ibm column
stocks[, 2]
[1] 159.82 160.02 159.84 160.35 164.79

3 Data Frames

3.1 What is a data frame?

3.1.1 Create a data frame

Data frames are great because of their ability to hold a different type of data in each column. To get started, let’s use the data.frame() function to create a data frame of your business’s future cash flows. Here are the variables that will be in the data frame:

  • company - The company that is paying you the cash flow (A or B).
  • cash_flow - The amount of money you will receive from that company.
  • year - The number of years from now that you receive the cash flow.

# Variables
company <- c("A", "A", "A", "B", "B", "B", "B")
cash_flow <- c(1000, 4000, 550, 1500, 1100, 750, 6000)
year <- c(1, 3, 4, 1, 2, 4, 5)

# Data frame
cash <- data.frame(company, cash_flow, year)

# Print cash
cash

3.2 Making head()s and tail()s of your data with some str()ucture

Time to introduce a few simple, but very useful functions.

  • head() - Returns the first few rows of a data frame. By default, 6. To change this, use head(cash, n = ___)
  • tail() - Returns the last few rows of a data frame. By default, 6. To change this, use tail(cash, n = ___)
  • str() - Check the structure of an object. This fantastic function will show you the data type of the object you pass in (here, data.frame), and will list each column variable along with its data type.
# Call head() for the first 4 rows
head(cash, 4)

# Call tail() for the last 3 rows
tail(cash, 3)

# Call str()
str(cash)
'data.frame':   7 obs. of  3 variables:
 $ company  : chr  "A" "A" "A" "B" ...
 $ cash_flow: num  1000 4000 550 1500 1100 750 6000
 $ year     : num  1 3 4 1 2 4 5

3.2.1 Naming your columns / rows

We can change the column names using colnames() and the row names using rownames(), although renaming rows is less common.

# Fix your column names
colnames(cash) <- c("company", "cash_flow", "year")

# Print out the column names of cash
colnames(cash)
[1] "company"   "cash_flow" "year"     

3.3 Data frame manipulation

3.3.1 Accessing and subsetting data frames

One way to access and subset a data frame is to use [ ]. The notation is just like that for matrices:

# Third row, second column
cash[3, 2]
[1] 550
# Fifth row of the "year" column
cash[5, "year"]
[1] 2

As you might imagine, selecting a specific column from a data frame is a common manipulation. So common, in fact, that it was given its own shortcut, the $.

# Select the year column
cash$year
[1] 1 3 4 1 2 4 5
# Select the cash_flow column and multiply by 2
cash$cash_flow * 2
[1]  2000  8000  1100  3000  2200  1500 12000
# Delete the company column
cash$company <- NULL

# Print cash again
cash

What if you are only interested in the cash flows from company A? For more flexibility, try subset()

subset(cash, company == "A")

There are a few important things happening here:

  • The first argument you pass to subset() is the name of your data frame, cash.
  • Notice that you shouldn’t put company in quotes.
  • The == is the equality operator. It tests to find where two things are equal, and returns a logical vector.
# Rows about company B
subset(cash, company == "B")

# Rows with cash flows due in 1 year

subset(cash, year == 1)
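Since == returns a logical vector, that vector can also be used directly as a row index inside [ ]. The sketch below rebuilds the cash data frame so that it runs on its own; is_A, by_index and by_subset are illustrative names.

```r
company <- c("A", "A", "A", "B", "B", "B", "B")
cash_flow <- c(1000, 4000, 550, 1500, 1100, 750, 6000)
year <- c(1, 3, 4, 1, 2, 4, 5)
cash <- data.frame(company, cash_flow, year)

# == compares every element of the column and returns a logical vector
is_A <- cash$company == "A"
is_A   # TRUE TRUE TRUE FALSE FALSE FALSE FALSE

# Using that logical vector as the row index keeps only the TRUE rows
by_index <- cash[is_A, ]
by_subset <- subset(cash, company == "A")

# Both approaches select the same cash flows
identical(by_index$cash_flow, by_subset$cash_flow)   # TRUE
```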

3.3.2 Adding new columns

Creating new columns in your data frame is as simple as assigning the new information to data_frame$new_column. Often, the newly created column is some transformation of existing columns, so the $ operator really comes in handy here.

# Quarter cash flow scenario
cash$quarter_cash <- cash$cash_flow * .25

# Double year scenario
cash$double_year <- cash$year * 2

3.4 Present value

3.4.1 Present value of projected cash flows

If you expect a cash flow of $100 to be received 1 year from now, what is the present value of that cash flow at a 5% interest rate? To calculate this, you discount the cash flow to get it in terms of today’s dollars. The general formula for this is:

present_value <- cash_flow * (1 + interest / 100) ^ -year
95.238 = 100 * (1.05) ^ -1
# Present value of $4000, in 3 years, at 5%
present_value_4k <- 4000 * (1.05)^-3

# Present value of all cash flows
cash$present_value <- cash$cash_flow * (1.05)^-cash$year

# Print out cash
cash
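The discounting formula can also be wrapped in a small helper so the interest rate is not hard-coded. This is only a sketch: the function name pv and its arguments are hypothetical and not part of the original notes.

```r
# Hypothetical helper: discount a cash flow received `year` years from now
# at `interest` percent per year
pv <- function(cash_flow, interest, year) {
  cash_flow * (1 + interest / 100) ^ -year
}

pv(100, 5, 1)    # about 95.238, matching the worked example
pv(4000, 5, 3)   # about 3455.35

# Works on whole vectors too, like the cash$cash_flow and cash$year columns
pv(c(1000, 4000), 5, c(1, 3))
```

Because the arithmetic is vectorized, the same function works on a single cash flow and on entire columns of a data frame.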

You now have a column for present_value, but you want to report the total amount of that column to your board members. Calculating this part is easy: use the sum() function you used earlier to add up the elements of cash$present_value.

However, you also want to know how much company A and company B individually contribute to the total present value.

# Total present value of cash
total_pv <- sum(cash$present_value)

# Company B information
cash_B <- subset(cash, company == "B")

# Total present value of cash_B
total_pv_B <- sum(cash_B$present_value)
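As an alternative to subsetting one company at a time, base R's tapply() can compute the total for every company in one call. tapply() is not used elsewhere in these notes, so treat this as an optional sketch; pv_by_company is an illustrative name.

```r
company <- c("A", "A", "A", "B", "B", "B", "B")
cash_flow <- c(1000, 4000, 550, 1500, 1100, 750, 6000)
year <- c(1, 3, 4, 1, 2, 4, 5)
cash <- data.frame(company, cash_flow, year)
cash$present_value <- cash$cash_flow * (1.05)^-cash$year

# Sum present_value within each company
pv_by_company <- tapply(cash$present_value, cash$company, sum)
pv_by_company

# The per-company totals add back up to the overall total
all.equal(sum(pv_by_company), sum(cash$present_value))   # TRUE
```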

4 Factors

4.1 What is a factor?

4.1.1 Create a factor

Bond credit ratings are common in the fixed income side of the finance world as a simple measure of how “risky” a certain bond might be. Here, riskiness can be defined as the probability of default, which means an inability to pay back your debts. The Standard and Poor’s and Fitch credit rating agencies have defined the following ratings, from least likely to default to most likely: AAA, AA, A, BBB, BB, B, CCC, CC, C, D. This is a perfect example of a factor! It is a categorical variable that takes on a limited number of levels.

To create a factor in R, use the factor() function, and pass in a vector that you want to be converted into a factor.

# credit_rating character vector
credit_rating <- c("BB", "AAA", "AA", "CCC", "AA", "AAA", "B", "BB")

# Create a factor from credit_rating
credit_factor <- factor(credit_rating)

# Print out your new factor
credit_factor
[1] BB  AAA AA  CCC AA  AAA B   BB 
Levels: AA AAA B BB CCC
# Call str() on credit_rating
str(credit_rating)
 chr [1:8] "BB" "AAA" "AA" "CCC" "AA" "AAA" "B" "BB"
# Call str() on credit_factor
str(credit_factor)
 Factor w/ 5 levels "AA","AAA","B",..: 4 2 1 5 1 2 3 4

4.1.2 Factor levels

Accessing the unique levels of your factor is simple enough by using the levels() function. You can also use this to rename your factor levels:

# Identify unique levels
levels(credit_factor)
[1] "AA"  "AAA" "B"   "BB"  "CCC"
# Rename the levels of credit_factor
levels(credit_factor)  <- c("2A", "3A", "1B", "2B", "3C")

# Print credit_factor
credit_factor
[1] 2B 3A 2A 3C 2A 3A 1B 2B
Levels: 2A 3A 1B 2B 3C

4.1.3 Factor summary

As any good bond investor would do, you would like to keep track of how many bonds you are holding of each credit rating. A way to present a table of the counts of each bond credit rating would be great! Luckily for you, the summary() function for factors can help you with that.

# Summarize the character vector, credit_rating
summary(credit_rating)
   Length     Class      Mode 
        8 character character 
# Summarize the factor, credit_factor
summary(credit_factor)
2A 3A 1B 2B 3C 
 2  2  1  2  1 
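If you only have the character vector, base R's table() gives the same counts without creating a factor first. The sketch below uses the original rating labels and an illustrative name, rating_counts.

```r
credit_rating <- c("BB", "AAA", "AA", "CCC", "AA", "AAA", "B", "BB")
credit_factor <- factor(credit_rating)

# table() counts each distinct value, and works on the character vector too
rating_counts <- table(credit_rating)
rating_counts

# The counts agree with summary() on the factor
all(as.vector(rating_counts) == as.vector(summary(credit_factor)))   # TRUE
```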

4.1.4 Visualize your factor

We can also visualize the table that you created in the last example by using a bar chart. A bar chart is a type of graph that displays groups of data using rectangular bars where the height of each bar represents the number of counts in that group.

The plot() function can again take care of all of the magic for us. Note that we are creating the plot from a factor and not a character vector: R will throw an error if you try to plot a character vector!

# Visualize your factor!
plot(credit_factor)

4.1.5 Bucketing a numeric variable into a factor

Your old friend Dan sent you a list of 50 AAA rated bonds called AAA_rank, with each bond having an additional number from 1-100 describing how profitable he thinks that bond will be (100 being the most profitable). You are interested in doing further analysis on his suggestions, but first it would be nice if the bonds were bucketed by their ranking somehow. This would help you create groups of bonds, from least profitable to most profitable, to more easily analyze them.

AAA_rank <- c(31L, 48L, 100L, 53L, 85L, 73L, 62L, 74L, 42L, 38L, 97L, 61L, 
48L, 86L, 44L, 9L, 43L, 18L, 62L, 38L, 23L, 37L, 54L, 80L, 78L, 
93L, 47L, 100L, 22L, 22L, 18L, 26L, 81L, 17L, 98L, 4L, 83L, 5L, 
6L, 52L, 29L, 44L, 50L, 2L, 25L, 19L, 15L, 42L, 30L, 27L)

This is a great example of creating a factor from a numeric vector. The easiest way to do this is to use cut().

In the cut() function, breaks = specifies the boundaries of the groups that you want R to bucket your data by.

In the resulting factor levels, a ( means the number beside it is not included in that group, and a ] means that it is included. For example, with the breaks used below, the level (0,25] contains values greater than 0 and up to and including 25.

# Create 4 buckets for AAA_rank using cut()
AAA_factor <- cut(x = AAA_rank, breaks = c(0, 25, 50, 75, 100))

# Rename the levels 
levels(AAA_factor) <- c("low", "medium", "high", "very_high")

# Print AAA_factor
AAA_factor
 [1] medium    medium    very_high high      very_high high      high     
 [8] high      medium    medium    very_high high      medium    very_high
[15] medium    low       medium    low       high      medium    low      
[22] medium    high      very_high very_high very_high medium    very_high
[29] low       low       low       medium    very_high low       very_high
[36] low       very_high low       low       high      medium    medium   
[43] medium    low       low       low       low       medium    medium   
[50] medium   
Levels: low medium high very_high
# Plot AAA_factor
plot(AAA_factor)
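Because the levels are renamed straight away in the code above, the default interval labels never get printed. The sketch below shows them for a few made-up ranks that sit on or next to the break points; ranks and buckets are illustrative names.

```r
# A few ranks chosen to sit on or next to the break points
ranks <- c(1, 25, 26, 50, 51, 100)
buckets <- cut(x = ranks, breaks = c(0, 25, 50, 75, 100))

# Default level labels show which end of each interval is included
levels(buckets)        # "(0,25]" "(25,50]" "(50,75]" "(75,100]"

# 25 falls in (0,25], while 26 falls in (25,50]
as.character(buckets)  # "(0,25]" "(0,25]" "(25,50]" "(25,50]" "(50,75]" "(75,100]"
```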

4.2 Ordering and subsetting factors

4.2.1 Create an ordered factor

To order your factor, there are two options.

When creating a factor, specify ordered = TRUE and add unique levels in order from least to greatest:

credit_rating <- c("AAA", "AA", "A", "BBB", "AA", "BBB", "A")

credit_factor_ordered <- factor(credit_rating, ordered = TRUE, 
                                levels = c("AAA", "AA", "A", "BBB"))

For an existing unordered factor like credit_factor, use the ordered() function:

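# Recreate credit_factor from the current credit_rating (its levels were renamed in 4.1.2)
credit_factor <- factor(credit_rating)

ordered(credit_factor, levels = c("AAA", "AA", "A", "BBB"))
[1] AAA AA  A   BBB AA  BBB A  
Levels: AAA < AA < A < BBB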
credit_factor_ordered
[1] AAA AA  A   BBB AA  BBB A  
Levels: AAA < AA < A < BBB

Notice the < specifying the order of the levels.

# Use unique() to find unique words
unique(credit_rating)
[1] "AAA" "AA"  "A"   "BBB"
# Create an ordered factor, with levels from lowest to highest rating
credit_factor_ordered <- factor(credit_rating, ordered = TRUE, levels = c("BBB", "A", "AA", "AAA"))

# Plot credit_factor_ordered
plot(credit_factor_ordered)

Ordered factors are great for plotting or creating tables with a predefined order.
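As a small sketch of what the predefined order buys you, the example below builds an ordered factor from the ratings used in this section, tabulates it, and compares two of its elements. The name rating_ordered is illustrative.

```r
credit_rating <- c("AAA", "AA", "A", "BBB", "AA", "BBB", "A")

# Levels listed from lowest to highest rating
rating_ordered <- factor(credit_rating, ordered = TRUE,
                         levels = c("BBB", "A", "AA", "AAA"))

# Tables follow the level order rather than alphabetical order
table(rating_ordered)

# Comparison operators use the level order too
rating_ordered[1] > rating_ordered[4]   # TRUE: AAA ranks above BBB
```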

4.2.2 Subsetting a factor

You can subset factors in much the same way that you subset vectors. As usual, [ ] is the key! By default, subsetting keeps all of the original levels, even ones that no longer appear in the result; add drop = TRUE to remove the unused levels.

# Remove the A bonds at positions 3 and 7. Don't drop the A level.
keep_level <- credit_factor_ordered[c(-3, -7)]

# Plot keep_level
plot(keep_level)


# Remove the A bonds at positions 3 and 7. Drop the A level.
drop_level <- credit_factor_ordered[c(-3, -7), drop = TRUE]

# Plot drop_level
plot(drop_level)

The drop argument will help you get rid of those pesky factor levels that stick around.
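The difference is easiest to see by comparing levels() with and without drop = TRUE. This self-contained sketch uses the seven ratings from this section; rating_ordered, kept and dropped are illustrative names.

```r
credit_rating <- c("AAA", "AA", "A", "BBB", "AA", "BBB", "A")
rating_ordered <- factor(credit_rating, ordered = TRUE,
                         levels = c("AAA", "AA", "A", "BBB"))

# Without drop, the now-unused "A" level is kept
kept <- rating_ordered[c(-3, -7)]
levels(kept)      # "AAA" "AA" "A" "BBB"

# With drop = TRUE, unused levels are removed
dropped <- rating_ordered[c(-3, -7), drop = TRUE]
levels(dropped)   # "AAA" "AA" "BBB"
```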

4.2.3 stringsAsFactors

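Before version 4.0.0, R’s default behavior when creating data frames was to convert all character columns into factors. This caused countless novice R users a headache trying to figure out why their character columns were not working properly. Since R 4.0.0, the default is stringsAsFactors = FALSE, so character columns stay as characters.

To turn off the conversion explicitly, whichever version of R you are running: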

cash <- data.frame(company, cash_flow, year, stringsAsFactors = FALSE)

str(cash)
'data.frame':   7 obs. of  3 variables:
 $ company  : chr  "A" "A" "A" "B" ...
 $ cash_flow: num  1000 4000 550 1500 1100 750 6000
 $ year     : num  1 3 4 1 2 4 5
# Variables
credit_rating <- c("AAA", "A", "BB")
bond_owners <- c("Dan", "Tom", "Joe")

# Create the data frame of character vectors, bonds
bonds <- data.frame(credit_rating, bond_owners, stringsAsFactors = FALSE)

# Use str() on bonds
str(bonds)
'data.frame':   3 obs. of  2 variables:
 $ credit_rating: chr  "AAA" "A" "BB"
 $ bond_owners  : chr  "Dan" "Tom" "Joe"
# Create a factor column in bonds called credit_factor from credit_rating
bonds$credit_factor <- factor(bonds$credit_rating, ordered = TRUE, levels = c("BB", "A", "AAA"))

# Use str() on bonds again
str(bonds)
'data.frame':   3 obs. of  3 variables:
 $ credit_rating: chr  "AAA" "A" "BB"
 $ bond_owners  : chr  "Dan" "Tom" "Joe"
 $ credit_factor: Ord.factor w/ 3 levels "BB"<"A"<"AAA": 3 2 1

5 Lists

5.1 What is a list?

5.1.1 Create a list

Lists in R can be used to hold together items of different data types. Creating a list is as simple as using the list() function.

# List components
name <- "Apple and IBM"
apple <- c(109.49, 109.90, 109.11, 109.95, 111.03)
ibm <- c(159.82, 160.02, 159.84, 160.35, 164.79)
cor_matrix <- cor(cbind(apple, ibm))

# Create a list
portfolio <- list(name, apple, ibm, cor_matrix)

# View your first list
portfolio
[[1]]
[1] "Apple and IBM"

[[2]]
[1] 109.49 109.90 109.11 109.95 111.03

[[3]]
[1] 159.82 160.02 159.84 160.35 164.79

[[4]]
          apple       ibm
apple 1.0000000 0.9131575
ibm   0.9131575 1.0000000

5.1.2 Named lists

You could name the elements as you create the list with the form name = value. Or, if the list was already created, you could use names().

# Add names to your portfolio
names(portfolio) <- c("portfolio_name", "apple", "ibm", "correlation")

# Print portfolio
portfolio
$portfolio_name
[1] "Apple and IBM"

$apple
[1] 109.49 109.90 109.11 109.95 111.03

$ibm
[1] 159.82 160.02 159.84 160.35 164.79

$correlation
          apple       ibm
apple 1.0000000 0.9131575
ibm   0.9131575 1.0000000
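The name = value form mentioned above looks like this in practice. It is a minimal sketch that rebuilds the same portfolio under an illustrative name, portfolio_named.

```r
apple <- c(109.49, 109.90, 109.11, 109.95, 111.03)
ibm <- c(159.82, 160.02, 159.84, 160.35, 164.79)

# Name each element as the list is created, using name = value
portfolio_named <- list(portfolio_name = "Apple and IBM",
                        apple = apple,
                        ibm = ibm,
                        correlation = cor(cbind(apple, ibm)))

names(portfolio_named)   # "portfolio_name" "apple" "ibm" "correlation"
```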

5.1.3 Access elements in a list

Subsetting a list is similar to subsetting a vector or data frame, with one extra useful operation.

To access the elements in the list, use [ ]. This will always return another list.

To pull out the data inside a single element of your list, use [[ ]]. For a named list, $ followed by the element’s name also works.

# Second and third elements of portfolio
portfolio[c(2,3)]
$apple
[1] 109.49 109.90 109.11 109.95 111.03

$ibm
[1] 159.82 160.02 159.84 160.35 164.79
# Use $ to get the correlation data
portfolio$correlation
          apple       ibm
apple 1.0000000 0.9131575
ibm   0.9131575 1.0000000
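The difference between [ ] and [[ ]] shows up clearly when you check the class of what comes back. The sketch below uses a cut-down portfolio with illustrative variable names.

```r
portfolio <- list(apple = c(109.49, 109.90, 109.11, 109.95, 111.03),
                  ibm = c(159.82, 160.02, 159.84, 160.35, 164.79))

# Single brackets return a list, even when only one element is selected
apple_as_list <- portfolio["apple"]
class(apple_as_list)     # "list"

# Double brackets return the element itself
apple_prices <- portfolio[["apple"]]
class(apple_prices)      # "numeric"

# $ is shorthand for [[ ]] with a name
identical(portfolio$apple, apple_prices)   # TRUE
```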

5.1.4 Adding to a list

We can add new elements to an existing list. You can do so using $. You could also use c() to add another element to the list: c(my_list, dans_movie = "StaR Wars"). This can be useful if you want to add multiple elements to your list at once.

# Add weight: 20% Apple, 80% IBM
portfolio$weight <- c(apple = 0.2, ibm = 0.8)

# Print portfolio
portfolio
$portfolio_name
[1] "Apple and IBM"

$apple
[1] 109.49 109.90 109.11 109.95 111.03

$ibm
[1] 159.82 160.02 159.84 160.35 164.79

$correlation
          apple       ibm
apple 1.0000000 0.9131575
ibm   0.9131575 1.0000000

$weight
apple   ibm 
  0.2   0.8 
# Change the weight variable: 30% Apple, 70% IBM
portfolio$weight <- c(apple = 0.3, ibm = 0.7)

# Print portfolio to see the changes
portfolio
$portfolio_name
[1] "Apple and IBM"

$apple
[1] 109.49 109.90 109.11 109.95 111.03

$ibm
[1] 159.82 160.02 159.84 160.35 164.79

$correlation
          apple       ibm
apple 1.0000000 0.9131575
ibm   0.9131575 1.0000000

$weight
apple   ibm 
  0.3   0.7 
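Here is a sketch of the c() route for adding several elements at once. The currency element is made up for illustration. Note the list() wrapper: passing a vector of length 2 straight to c() would add its two values as two separate list elements.

```r
portfolio <- list(portfolio_name = "Apple and IBM",
                  apple = c(109.49, 109.90, 109.11, 109.95, 111.03),
                  ibm = c(159.82, 160.02, 159.84, 160.35, 164.79))

# Wrap the new elements in list() so each one is added as a single element
portfolio <- c(portfolio,
               list(weight = c(apple = 0.3, ibm = 0.7),
                    currency = "USD"))   # currency is a made-up element

names(portfolio)    # "portfolio_name" "apple" "ibm" "weight" "currency"
length(portfolio)   # 5
```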

5.1.5 Removing from a list

Using NULL is the easiest way to remove an element from your list. We can also remove elements by position using my_list[1] <- NULL or my_list[[1]] <- NULL.

# Take a look at portfolio
portfolio
$portfolio_name
[1] "Apple and IBM"

$apple
[1] 109.49 109.90 109.11 109.95 111.03

$ibm
[1] 159.82 160.02 159.84 160.35 164.79

$correlation
          apple       ibm
apple 1.0000000 0.9131575
ibm   0.9131575 1.0000000

$weight
apple   ibm 
  0.3   0.7 
# Remove the ibm stock prices from your portfolio
portfolio$ibm <- NULL
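Removal by position, as mentioned above, works the same way. The sketch below uses a throwaway list with made-up contents.

```r
my_list <- list(first = 1, second = "two", third = TRUE)

# Remove the first element with [[ ]]
my_list[[1]] <- NULL
names(my_list)    # "second" "third"

# Remove what is now the first element with [ ]
my_list[1] <- NULL
names(my_list)    # "third"
length(my_list)   # 1
```

The remaining elements shift down after each removal, so positions should be rechecked before removing more than one element this way.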

5.2 A few list creating functions

5.2.1 Split it

Often, you will have data for multiple groups together in one data frame.

We can create a grouping to split on, and use split() to create a list of two data frames.

grouping <- cash$company
split_cash <- split(cash, grouping)

split_cash 
$A

$B

To get your original data frame back, use:

unsplit(split_cash, grouping)
# Define grouping from year
grouping <- cash$year

# Split cash on your new grouping
split_cash <- split(cash, grouping)

# Look at your split_cash list
split_cash
$`1`

$`2`

$`3`

$`4`

$`5`
# Unsplit split_cash to get the original data back.
original_cash <- unsplit(split_cash, grouping)

# Print original_cash
original_cash
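To check what split() and unsplit() did, you can look at the group names and sizes and confirm that the rows come back in their original order. This sketch rebuilds cash so that it runs on its own; sapply() is used here only to count rows per group.

```r
cash <- data.frame(company = c("A", "A", "A", "B", "B", "B", "B"),
                   cash_flow = c(1000, 4000, 550, 1500, 1100, 750, 6000),
                   year = c(1, 3, 4, 1, 2, 4, 5))

grouping <- cash$year
split_cash <- split(cash, grouping)

# One list element per distinct year
names(split_cash)          # "1" "2" "3" "4" "5"

# Number of rows that landed in each group
sapply(split_cash, nrow)

# unsplit() puts the rows back in their original positions
original_cash <- unsplit(split_cash, grouping)
all(original_cash$cash_flow == cash$cash_flow)   # TRUE
```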

5.2.2 Split-Apply-Combine

A common data science problem is to split your data frame by a grouping, apply some transformation to each group, and then recombine those pieces back into one data frame. This is such a common class of problems in R that it has been given the name split-apply-combine.

Suppose, for the cash data frame, you are interested in doubling the cash_flow for company A, and tripling it for company B:

grouping <- cash$company
split_cash <- split(cash, grouping)

# We can access each list element's cash_flow column by:
split_cash$A$cash_flow
[1] 1000 4000  550
split_cash$A$cash_flow <- split_cash$A$cash_flow * 2
split_cash$B$cash_flow <- split_cash$B$cash_flow * 3

new_cash <- unsplit(split_cash, grouping)
new_cash
# Print the cash_flow column of B in split_cash
split_cash$B$cash_flow
[1]  4500  3300  2250 18000
# Set the cash_flow column of company A in split_cash to 0
split_cash$A$cash_flow <- 0

# Use the grouping to unsplit split_cash
cash_no_A <- unsplit(split_cash, grouping)

# Print cash_no_A
cash_no_A
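When there are more than a couple of groups, the apply step can be written as a loop over the list instead of one line per company. This is a sketch, not the method used in the notes; the multipliers vector is a hypothetical lookup of scaling factors keyed by company.

```r
cash <- data.frame(company = c("A", "A", "A", "B", "B", "B", "B"),
                   cash_flow = c(1000, 4000, 550, 1500, 1100, 750, 6000),
                   year = c(1, 3, 4, 1, 2, 4, 5))

# Split
grouping <- cash$company
split_cash <- split(cash, grouping)

# Apply: scale each group's cash_flow by its own multiplier
multipliers <- c(A = 2, B = 3)   # hypothetical per-company factors
for (co in names(split_cash)) {
  split_cash[[co]]$cash_flow <- split_cash[[co]]$cash_flow * multipliers[[co]]
}

# Combine
new_cash <- unsplit(split_cash, grouping)
new_cash$cash_flow   # 2000 8000 1100 4500 3300 2250 18000
```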

5.2.3 Attributes

Attributes are a bit of extra metadata about your data structure. Some of the most common attributes are: row names and column names, dimensions, and class. You can use the attributes() function to return a list of attributes about the object you pass in. To access a specific attribute, you can use the attr() function.

# my_matrix and my_factor
my_matrix <- matrix(c(1,2,3,4,5,6), nrow = 2, ncol = 3)
rownames(my_matrix) <- c("Row1", "Row2")
colnames(my_matrix) <- c("Col1", "Col2", "Col3")

my_factor <- factor(c("A", "A", "B"), ordered = TRUE, levels = c("A", "B"))

# attributes of my_matrix
attributes(my_matrix)
$dim
[1] 2 3

$dimnames
$dimnames[[1]]
[1] "Row1" "Row2"

$dimnames[[2]]
[1] "Col1" "Col2" "Col3"
# Just the dim attribute of my_matrix
attr(my_matrix, which = "dim")
[1] 2 3
# attributes of my_factor
attributes(my_factor)
$levels
[1] "A" "B"

$class
[1] "ordered" "factor" 
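Several functions used earlier in these notes are shortcuts for reading a single attribute. The sketch below checks two of them, names() and dim(), against attr().

```r
# A named vector stores its names as an attribute
ret <- c(Jan = 5, Feb = 2)
attributes(ret)   # $names: "Jan" "Feb"
identical(attr(ret, which = "names"), names(ret))   # TRUE

# A matrix stores its dimensions as an attribute
my_matrix <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
identical(attr(my_matrix, which = "dim"), dim(my_matrix))   # TRUE
```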
