Data Structures
Lists
Lists are R objects that can store elements of different classes,
unlike vectors, which can only store elements of the same class.
Each element of a list can be an object of any class, such as a data
frame, a numeric vector, a matrix or even another list. As an example,
I will create a list that stores information about a financial
portfolio. I first create the elements of the portfolio:
# Create vectors for stock tickers and expected stock returns:
tickers <- c("AAPL","AMZN","WMT")
expected_stock_returns <- c(0.020,0.015,0.010)
expected_stock_risk <- c(0.04,0.045,0.02)
tickers
expected_stock_returns
expected_stock_risk
Now I indicate what percentage of my portfolio I allocate to each
asset. I assign 50% to Apple, and 25% each to Amazon and
Wal-Mart. I create a vector with these weights (in decimals):
stock_weights <- c(0.50,0.25,0.25)
Now I will calculate the expected portfolio return as a weighted
average of the expected stock returns:
weighted_returns <- expected_stock_returns*stock_weights
weighted_returns
When you multiply two vectors of the same length you get an
element-by-element multiplication. This is called a “vectorized”
operation. By default R “vectorizes” arithmetic operations. This means
the first element of the first vector is multiplied by the first element
of the second vector, and the same happens for the 2nd and 3rd elements.
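A small sketch of element-wise arithmetic, using two made-up vectors a and b that are not part of the portfolio example. It also illustrates R's recycling rule: when one operand is shorter, its elements are reused.

```r
a <- c(1, 2, 3)
b <- c(10, 20, 30)
# Element-by-element product: 1*10, 2*20, 3*30
a * b
# A single number is recycled across every element of the vector
a * 2
# Recycling also applies to vectors: c(1, 2) is reused as 1, 2, 1, 2
c(1, 2, 3, 4) * c(1, 2)
```

Recycling is convenient, but it can hide mistakes when two vectors were meant to have the same length, so it is worth checking lengths before multiplying.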
Now I sum all the elements of the weighted_returns vector to get the
expected return of my portfolio:
expected_portfolio_return <- sum(weighted_returns)
I applied the function sum to the weighted_returns vector, so all the
elements of the vector are added up to get the expected portfolio return.
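As a cross-check, base R's weighted.mean() function computes a weighted average in one step. The sketch below repeats the two vectors so that it runs on its own; the names manual and builtin are only illustrative.

```r
expected_stock_returns <- c(0.020, 0.015, 0.010)
stock_weights <- c(0.50, 0.25, 0.25)
# Manual calculation: multiply element by element, then sum
manual <- sum(expected_stock_returns * stock_weights)
# weighted.mean() divides by the sum of the weights, which is 1 here
builtin <- weighted.mean(expected_stock_returns, stock_weights)
# all.equal() compares numbers allowing for tiny rounding differences
all.equal(manual, builtin)
```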
I now create a list with all these pieces of information of my
portfolio. I use the function list() to create the portfolio object as a
list that contains the tickers vector, and the vectors for expected
stock return, expected stock risk, stock weights and the expected
portfolio return:
portfolio1 <- list(tickers, expected_stock_returns, expected_stock_risk,
stock_weights, expected_portfolio_return)
# I assign names to each element of the portfolio:
names(portfolio1)<-c("Tickers","Stock_returns","Stock_risk","Stock_weights",
"Portfolio expected Return")
# I display the portfolio object:
portfolio1
I can access the elements of the portfolio1 list using either the
name of each element, or the number of the element inside double square
brackets [[ ]]:
# I display the stock tickers:
portfolio1$Tickers
# I can also display the stock tickers using the element number and double
# brackets:
portfolio1[[1]]
# I display the class and length of the first element:
class(portfolio1$Tickers)
length(portfolio1$Tickers)
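The following sketch uses a small stand-in list p (hypothetical values, not the portfolio1 object) to show the difference between single and double brackets, and how to reach an element whose name contains spaces, as is the case for "Portfolio expected Return" above.

```r
# A small stand-in list with one name that contains a space
p <- list(Tickers = c("AAPL", "AMZN"), "Expected return" = 0.0175)
# Single brackets return a list that contains the selected element
class(p[1])
# Double brackets (and $) return the element itself
class(p[[1]])
# Names with spaces need backticks with $, or a quoted name with [[ ]]
p$`Expected return`
p[["Expected return"]]
```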
The xts data class
R has different data classes (or R objects). You can even create
your own data class, which is a set of one or more data structures. For
time series datasets, which are very common in Finance, the most popular
data classes are ts, zoo and xts. xts stands for eXtensible Time Series.
An xts object is a matrix of data with a time index; because it is a
matrix, all of its columns must share the same type (for example, all
numeric). xts objects are kept sorted chronologically by their index.
The index of an xts object is like the first column of the object,
but it is actually its index; it is not considered a column of the
object. An xts object has the advantage that we can apply several
functions to that object for data merging, sorting, selecting, etc.
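A minimal sketch of these ideas, assuming the xts package is installed (it is installed together with PerformanceAnalytics). The dates, fund names and returns are made up for illustration.

```r
library(xts)
# Three made-up monthly returns for two hypothetical funds
dates <- as.Date(c("2020-01-31", "2020-02-29", "2020-03-31"))
values <- matrix(c(0.01, -0.02, 0.03, 0.00, 0.01, -0.01), ncol = 2,
                 dimnames = list(NULL, c("FundA", "FundB")))
x <- xts(values, order.by = dates)
# The dates live in the index, not in a column
index(x)
ncol(x)
# coredata() returns the underlying matrix without the index
coredata(x)
```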
Try the following example. Load a sample dataset called edhec,
which comes with the PerformanceAnalytics package. In R, we can use
thousands of available packages to do different types of calculations
and operations. PerformanceAnalytics is a package with many
functions related to financial analytics. We will use some of these
functions later in the course.
We need the PerformanceAnalytics package, so you can install it from
the Packages tab (bottom-right pane of RStudio) or by running
install.packages("PerformanceAnalytics") in the console.
Once the package is installed on my computer, I do not need to
install it again. However, each time I need to use the package, I
need to load it into memory:
library(PerformanceAnalytics)
I load the sample dataset edhec, which contains monthly returns
of different hedge fund indexes created by the EDHEC Risk Institute.
data("edhec")
The edhec dataset will appear in my Global Environment.
edhec is an xts R object that contains historical monthly returns of
different hedge fund indexes. We can easily do different calculations
with specific functions that apply to xts objects. For now, we will
learn how to do subsetting with xts objects.
If you want to learn about more functions for xts objects, you can
check the xts Cheat Sheet at: https://s3.amazonaws.com/assets.datacamp.com/blog_assets/xts_Cheat_Sheet_R.pdf
Subsetting data
R has powerful indexing features for accessing object elements. These
features can be used to select and exclude columns and rows
(observations). In the next paragraphs we will learn how to extract the
information we want from an xts object. Several of these functions also
apply to data frames.
I look at the structure of the edhec dataset and at its first and last
rows:
str(edhec)
head(edhec)
tail(edhec)
From the results above, we see that this xts object has 13 columns,
which are returns of different hedge fund indexes.
I can subset rows, columns or both. For example, I can select
the months in which the Emerging Markets fund had negative returns:
negative_returns_em <- edhec[edhec$`Emerging Markets`<0,4]
# I display the first rows of this new object, which has only negative returns for Emerging Markets:
head(negative_returns_em)
negative_returns_em_all <- edhec[edhec$`Emerging Markets`<0,]
head(negative_returns_em_all)
As you see, to subset I use square brackets after the object. The
first argument refers to the ROWS to be selected, and the second
argument refers to the COLUMNS to be selected. In the first case I
selected column 4 (Emerging Markets); in the second case I wrote NOTHING
after the comma, indicating that I want ALL COLUMNS!
If I want to subset specific rows using a range of dates, I can do
the following. Since this is an xts object, it is very easy to
subset using dates. I select the returns of the 12 months of the year
2008:
edhec2008<-edhec["2008-01-01/2008-12-31",]
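The date string inside the brackets follows the start/end pattern, and xts also accepts shorter forms. A small sketch on a made-up daily series (assuming the xts package is installed; the object x is hypothetical and unrelated to edhec):

```r
library(xts)
# Made-up daily series covering 2007-12-30 to 2008-01-03
dates <- seq(as.Date("2007-12-30"), by = "day", length.out = 5)
x <- xts(1:5, order.by = dates)
# All observations in the year 2008
x["2008"]
# From the start of the series up to the end of 2007
x["/2007-12-31"]
# An explicit range, written as start/end
x["2008-01-02/2008-01-03"]
```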
I can also select those months where the Emerging Markets fund and
the Convertible Arbitrage fund had positive returns in 2008:
positive_edhec2008<-edhec2008[edhec2008$`Convertible Arbitrage`>0 & edhec2008$`Emerging Markets`>0,]
positive_edhec2008
We can see that only 2 months had positive returns in both funds in
2008.
There are several logical operators to be used for the subsetting
condition of rows. These operators are:
== equal to
!= different from
< less than
<= less than or equal to
> more than
>= more than or equal to
| or
& and
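Since several of these features also apply to data frames, here is a short sketch that combines conditions on a made-up data frame (the returns object and its values are hypothetical):

```r
# Made-up returns for two hypothetical funds
returns <- data.frame(month = c("Jan", "Feb", "Mar", "Apr"),
                      fund_a = c(0.02, -0.01, 0.03, -0.02),
                      fund_b = c(0.01, 0.02, -0.01, -0.03))
# & keeps rows where both conditions hold
both_positive <- returns[returns$fund_a > 0 & returns$fund_b > 0, ]
# | keeps rows where at least one condition holds
any_positive <- returns[returns$fund_a > 0 | returns$fund_b > 0, ]
# != excludes a specific value
not_january <- returns[returns$month != "Jan", ]
both_positive
any_positive
not_january
```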
Control-flow commands
Control-flow commands or constructs are used
to program a series of tasks using conditions and/or loops.
There are 7 main control structures:
if/else: test a condition
for: execute a loop a fixed number of times
repeat: execute an infinite loop (until break is called)
while: execute a loop while a condition is true
break: break the execution of a loop
next: skip an iteration of a loop
return: exit a function
The simplest ones are the conditionals if and if/else, which
allow you to implement some basic algorithm logic in your analysis. We
will start with the conditional statements and then move on to the loop
statements.
Conditionals
If and if else
if(<condition1>) {
## do stuff
}
Here is a simple example, that is self-explanatory:
grade <- 70
if(grade > 69){cat("You have passed this course. Well done!")}
In this case the condition will be true, so the message will be
displayed. You can try different values for grade and re-run this code
to see what happens.
ELSE IF
Imagine that you have several conditions, not only one as in the
previous example. In this case, you need to use the else if statement.
The structure looks like this:
if(<condition1>) {
## do something
} else if(<condition2>) {
## do something different
}
So our previous example can be upgraded as follows. Try changing the
values of the input variables.
gradePartial1 <- 80
gradePartial2 <- 60
gradeFinalExam <- 100
FinalGrade <- (gradePartial1*0.25 + gradePartial2*0.25 + gradeFinalExam*0.5)
if(FinalGrade > 90){
"You did an Excellent job!"
} else if(FinalGrade > 69){
"You have passed this course. Well done."
} else if(FinalGrade > 0){
"Sorry you have failed the course."
}
ELSE
Now imagine that we have several conditions but we also want to do
something with the remaining cases that do not fulfill any of the
previous conditions. Then we use else. Here is an example:
gradePartial1 <- -70
gradePartial2 <- 60
gradeFinalExam <- -60
FinalGrade <- (gradePartial1*0.25 + gradePartial2*0.25 + gradeFinalExam*0.5)
if(FinalGrade > 90){
"You did awesome!"
} else if(FinalGrade > 69){
"You have passed this course. Well done."
} else if(FinalGrade > 0){
"Sorry you have failed the course."
} else {
"You have negative grades; it should be an error"
}
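The same logic can be wrapped in a function, which also shows return, the control structure that exits a function. This is only a sketch: grade_message is a hypothetical helper, and treating any grade outside 0-100 as an input error is an assumption of this example.

```r
# Hypothetical helper: returns a message for a final grade
grade_message <- function(final_grade) {
  # Exit early when the input cannot be a valid grade
  if (final_grade < 0 || final_grade > 100) {
    return("The grade is outside 0-100; please check the inputs.")
  }
  # The value of the matching branch is the value of the function
  if (final_grade > 90) {
    "You did an excellent job!"
  } else if (final_grade > 69) {
    "You have passed this course. Well done."
  } else {
    "Sorry, you have failed the course."
  }
}
grade_message(85)
grade_message(-32.5)
```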
%in%
In the next example we use the operator %in%. This operator checks
whether values exist inside a vector. For each element of the first
vector it returns TRUE if the value is contained in the second vector
and FALSE otherwise:
# Both elements of the first vector appear in the second vector
c(1,2) %in% c(6,4,8,3,2,1)
# Only the last two elements of the first vector appear in the second
c(6,4,8,3,2,1) %in% c(1,2)
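A common use of %in% is to filter a vector. A short sketch with made-up vectors (all_tickers and wanted are illustrative names):

```r
all_tickers <- c("AAPL", "GM", "GE", "JPM")
wanted <- c("AAPL", "GE")
# Logical vector: one TRUE/FALSE per element of all_tickers
all_tickers %in% wanted
# Use it inside brackets to keep only the matching tickers
all_tickers[all_tickers %in% wanted]
# Negate with ! to keep the ones that do not match
all_tickers[!(all_tickers %in% wanted)]
```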
As you see, the result of using the %in% operator is a vector of
Boolean values (TRUE or FALSE). If the first element is TRUE, it means
that the first element of the first vector is inside the second
vector.
For Loop
A for loop runs a block of code once for each element of a vector:
students<-c("Pedro","Laura","Bryan")
for(name in students){
  # Instead of name you can use any variable name; just use the same one inside the loop
  cat("Hi, my name is", name, "\n")
}
In the next example we will use the quantmod package to download some
financial data from online data sources such as Yahoo Finance and
FRED. Our initial vector indicates the tickers to download, in this
case AAPL, JPM and GE.
You have to install the package from the Packages tab in the
bottom-right pane of RStudio.
Once you install the package, you have to load it with the library
function. The quantmod package has the getSymbols function, which is used
to download data from the web:
library(quantmod)
# Vector of tickers
tickers <- c("AAPL","JPM","GE")
for(i in tickers){
  getSymbols(i)
  cat("The prices of the ticker", i, "have been downloaded\n")
}
In the next example we follow a similar process for a different list
of tickers. For GE and AAPL we download the data from Yahoo, while
INTGSTMXM193N is downloaded from FRED (Federal Reserve Economic Data).
What happens with the remaining tickers?
tickers <- c("AAPL","GM","INTGSTMXM193N","GE","JPM")
for(i in tickers){
  if(i %in% c("AAPL","GE")){
    getSymbols(i, src = "yahoo")
    cat("The prices of the ticker", i, "have been downloaded from Yahoo\n")
  } else if(i == "INTGSTMXM193N"){
    # Note that the source name for Federal Reserve data is FRED
    getSymbols(i, src = "FRED")
    cat("The data of the ticker", i, "have been downloaded from FRED\n")
  }
}
We are going to use the previous example, but in this case we also
want to add a condition to print the tickers that have NOT been
downloaded.
tickers <- c("AAPL","GM","INTGSTMXM193N","GE","JPM")
for(i in tickers){
  if(i %in% c("AAPL","GE")){
    getSymbols(i, src = "yahoo")
    cat("The prices of the ticker", i, "have been downloaded from Yahoo\n")
  } else if(i == "INTGSTMXM193N"){
    getSymbols(i, src = "FRED")
    cat("The data of the ticker", i, "have been downloaded from FRED\n")
  } else {
    cat("The ticker", i, "has not been downloaded\n")
  }
}
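The list of control structures also mentions next and break, which can be used inside any loop. A small sketch with made-up numbers (the vector visited is only there to record which values were processed):

```r
visited <- integer(0)
for (i in 1:10) {
  if (i %% 2 == 0) {
    next   # skip even numbers and go to the next iteration
  }
  if (i > 7) {
    break  # leave the loop entirely once i exceeds 7
  }
  visited <- c(visited, i)
  cat("i =", i, "\n")
}
visited
```

Here %% is the remainder operator, so i %% 2 == 0 is TRUE for even numbers.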
WHILE AND REPEAT TOOLS
In a similar manner to the for loop, it is possible to program a
loop with the while and repeat control-flow statements. The main
difference is that in both you need to specify the exit condition, and
the number of iterations does not have to be known in advance (unlike
in the for case).
For while, you need to specify the condition at the beginning.
Here is an example. Suppose you want to know how many years you need to
keep an investment with a fixed interest rate in order to double your
initial investment:
APR <- 0.10
# I define the Annual Percentage Rate to be equal to 10%
INV <- 100
# Initial investment equal to $100
MULTIPLE <- 2
# Multiple = 2 to check when the investment doubles
BALANCE <- INV
# I start by setting the balance equal to the initial investment
year <- 0
# I start with year equal to zero
while (BALANCE < MULTIPLE*INV) {
  # The loop continues while the balance is less than the initial investment
  # multiplied by the multiple; it stops as soon as this condition is FALSE
  year <- year + 1
  # I increase the value of year by 1
  BALANCE <- BALANCE*(1+APR)
  # I multiply the current balance by the growth factor (1+APR), according to the
  # Annual Percentage Rate
}
cat("To multiply your investment by", MULTIPLE, "you need", year, "years.\n")
cat("Your balance after", year, "years will be $", BALANCE, "\n")
The repeat statement also creates a loop but, unlike while, it has no
condition at the beginning: you have to specify the exit condition
using if and the break statement somewhere within the loop. Here is the
same loop we did above, but using repeat:
APR <- 0.10
# I define the Annual Percentage Rate to be equal to 10%
INV <- 100
# Initial investment equal to $100
MULTIPLE <- 2
# Multiple = 2 to check when the investment doubles
BALANCE <- INV
# I start by setting the balance equal to the initial investment
year <- 0
# I start with year equal to zero
repeat {
  year <- year + 1
  # I increase the value of year by 1
  BALANCE <- BALANCE*(1+APR)
  # I multiply the current balance by the growth factor (1+APR), according to the
  # Annual Percentage Rate
  if (BALANCE >= MULTIPLE*INV) {
    break
  }
  # If the balance is greater than or equal to the multiple times the investment,
  # the break statement is executed, so the loop stops iterating
}
cat("To multiply your investment by", MULTIPLE, "you need", year, "years.\n")
cat("Your balance after", year, "years will be $", BALANCE, "\n")
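Looping vs R-vectorization
Many R programmers believe that loops should be AVOIDED whenever
possible! This may sound weird, since programming usually requires
doing repetitive tasks. However, in R, unlike traditional computer
languages such as C, many repetitive tasks can be done without writing
loops, thanks to vectorization. For example, suppose A and B hold the
annual free cash flows of two products and we want the total cash flow
of each year. With a loop: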
A <-c(3,3,4,6,8)
B <-c(-2,-3,-1,3,5)
# A and B are vectors with the annual free cash flows of each product
sumcashflows <- numeric(length(A))
for (i in seq_along(A)) {
  sumcashflows[i] <- A[i] + B[i]
  # We sum the element i of both vectors and store the result
  # in position i of the vector sumcashflows
}
sumcashflows
With vectorization, the same result is obtained without a loop:
A <-c(3,3,4,6,8)
B <-c(-2,-3,-1,3,5)
sumcashflows<-A+B
# We just sum both vectors as if they were numbers
sumcashflows
Application: Present value of future cash flows
Now, how can you calculate the Present Value of these cash flows? You
have to remember the basics of the time value of money. To calculate the
present value (PV0) of a sequence of future cash flows, you discount
each cash flow CF_t at the rate R for t periods and sum the results:
$$PV_0 = \sum_{t=1}^{N} \frac{CF_t}{(1+R)^t}$$
With a loop
# I assign 0 to the variable PV, the present value
PV <- 0
# I define the discount rate as R:
R <- 0.15
for (i in 1:length(sumcashflows)) {
  # I calculate the present value of each cash flow i:
  PVCF <- sumcashflows[i] / (1 + R)^i
  # I add the present value of cash flow i to the cumulative variable PV
  PV <- PV + PVCF
}
# The loop iterates 5 times, once for each cash flow, and ends with the sum of all
# the present values
cat("The Present Value of all cash flows is ", PV)
Without a loop
# I create a sequence from 1 to 5 for the exponents of the present value formula,
# one for each cash flow
exponents <- seq(1,5)
# seq is a function that generates a sequence. Here I specified that the
# sequence starts at 1 and ends at 5
# I calculate a vector with the present value of each cash flow. I use vectorization:
PVvector <- sumcashflows / (1 + R)^exponents
# Note that this mathematical expression is applied to each element of the vectors.
# Also, both vectors have the same length.
# I finally sum all elements of the vector using the function sum:
PV <- sum(PVvector)
cat("The Present Value of all cash flows is ", PV)
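The vectorized calculation can also be wrapped in a reusable function. This is a sketch: present_value is a hypothetical helper name, and it assumes that the cash flows arrive at the end of periods 1, 2, ..., n and that the discount rate is constant.

```r
# Hypothetical helper: present value of end-of-period cash flows
present_value <- function(cash_flows, rate) {
  periods <- seq_along(cash_flows)
  sum(cash_flows / (1 + rate)^periods)
}
# Same cash flows as A + B above, repeated so this block runs on its own
cash_flows <- c(3, 3, 4, 6, 8) + c(-2, -3, -1, 3, 5)
pv_15 <- present_value(cash_flows, rate = 0.15)
# The same cash flows are worth more at a lower discount rate
pv_10 <- present_value(cash_flows, rate = 0.10)
pv_15
pv_10
```

Using seq_along() instead of a fixed seq(1,5) lets the function work for any number of cash flows.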
CHALLENGE 1
You have to write a program to calculate the number of months needed
to finish paying a mortgage loan. The information about the loan is the
following:
Loan amount = $3,000,000.00 pesos
APR (Annual % rate) = 11% (compounded monthly)
Monthly Fixed Payment = $40,000.00 (includes interest and
principal)
Your program has to provide 2 results: the number of months needed to
finish paying the loan, and the amount of the last payment if the
payment is less than the fixed payment amount. Your program has to be
able to run with any change in any of the values of the above
variables.
This is quite a challenging exercise!
Hint: if you are familiar with Excel, start solving the problem in
Excel, and then try to write your program in R.
FinancialMath::amort.period(Loan = 3000000, pmt = 40000, i = .11, n = NA)
Error in FinancialMath::amort.period(Loan = 3e+06, pmt = 40000, i = 0.11, :
Too small of pmt.
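The error above is consistent with the 11% rate being applied once per payment period: 11% of 3,000,000 is 330,000 of interest, far more than the 40,000 payment. Because the APR is compounded monthly, the monthly rate is 0.11/12. One possible sketch in base R, using a while loop as in the earlier examples; the variable names are illustrative and it assumes payments are made at the end of each month.

```r
loan <- 3000000
apr <- 0.11
payment <- 40000
monthly_rate <- apr / 12
# The loop only ends if the payment exceeds the first month's interest
stopifnot(payment > loan * monthly_rate)
balance <- loan
months <- 0
last_payment <- payment
while (balance > 0) {
  months <- months + 1
  # Add one month of interest to the outstanding balance
  balance <- balance * (1 + monthly_rate)
  if (balance <= payment) {
    # Final month: pay only what is still owed
    last_payment <- balance
    balance <- 0
  } else {
    balance <- balance - payment
  }
}
cat("Months needed:", months, "\n")
cat("Last payment:", round(last_payment, 2), "\n")
```

Changing loan, apr or payment at the top is enough to rerun the calculation for a different loan.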
CHALLENGE 2
Write a program that calculates the price of the following bond
issued by the company ABC. ABC needs to finance an important project to
develop a new technological device.
To get the money, ABC issued a bond with the following
characteristics:
Principal: $3,000,000
Time to maturity: 20 years
Coupon rate: 11% (annual); coupons are paid every 6 months
Calculate the price of this bond for each of the following annual
interest rates:
8%
11%
13%
You have to get the price of the bond for each of these 3 interest
rates.
Remember that the price of a bond is the present value of its future
cash flows.
FinancialMath::bond(t = NA, f = 3000000, r = 20, n = 11, i = .08, c = 1)
Bond Summary
Price 4.283379e+08
Premium 4.283379e+08
Coupon 6.000000e+07
Eff Rate 8.000000e-02
Years 1.100000e+01
MAC D 5.239503e+00
MOD D 4.851392e+00
MAC C 3.710111e+01
MOD C 3.630025e+01
(I think?)
FinancialMath::bond(t = NA, f = 3000000, r = 20, n = 11, i = .11, c = 1)
Bond Summary
Price 3.723909e+08
Premium 3.723909e+08
Coupon 6.000000e+07
Eff Rate 1.100000e-01
Years 1.100000e+01
MAC D 4.978808e+00
MOD D 4.485413e+00
MAC C 3.415739e+01
MOD C 3.176382e+01
FinancialMath::bond(t = NA, f = 3000000, r = 20, n = 13, i = .11, c = 1)
Bond Summary
Price 4.049922e+08
Premium 4.049922e+08
Coupon 6.000000e+07
Eff Rate 1.100000e-01
Years 1.300000e+01
MAC D 5.582155e+00
MOD D 5.028969e+00
MAC C 4.395334e+01
MOD C 4.020412e+01
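In the output above, the Coupon line (6e+07) equals 3,000,000 times 20, and the Years line shows 11 or 13, which suggests that the maturity and the rates were passed in each other's place. As a cross-check that does not depend on the package, the price can be sketched in base R as the present value of the semiannual coupons plus the principal. bond_price is a hypothetical helper, and the sketch assumes that each annual interest rate is a nominal rate compounded every 6 months, so the rate per period is the annual rate divided by 2.

```r
# Hypothetical helper: price of a bond as the present value of its cash flows
bond_price <- function(principal, coupon_rate, years, market_rate,
                       payments_per_year = 2) {
  n <- years * payments_per_year
  coupon <- principal * coupon_rate / payments_per_year
  period_rate <- market_rate / payments_per_year
  periods <- seq_len(n)
  # Present value of every coupon plus the principal repaid at maturity
  sum(coupon / (1 + period_rate)^periods) + principal / (1 + period_rate)^n
}
rates <- c(0.08, 0.11, 0.13)
prices <- sapply(rates, function(r) bond_price(3000000, 0.11, 20, r))
data.frame(rate = rates, price = round(prices, 2))
```

A quick sanity check on the result: when the market rate equals the coupon rate (11%), the price should equal the principal; at 8% the price should be above it, and at 13% below it.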
