Finance Programming - Workshop 1

Authors

Alberto Dorantes Dosamantes, Ph.D.

Monterrey Tech, Queretaro Campus

Published

February 14, 2023

Abstract
In this workshop we will learn what a programming language is and the basics of objects and data structures in R.

Learning objectives:

  1. Understand what a programming language is.
  2. Understand the way R manages data: what an R object is, and what the main (atomic) data classes of R objects are.
  3. Understand the basic R data structures such as vectors, matrices and data frames.

General directions for this Workshop

You will work in RStudio. Create an R Notebook document (File -> New File -> R Notebook), where you have to write whatever is asked in this workshop.

More specifically, you have to:

  • Replicate all the R Code along with its output.

  • You have to respond to the CHALLENGES of the workshop and any other question asked in the workshop.

Any QUESTION or any INTERPRETATION you need to do will be written in CAPITAL LETTERS. For ANY QUESTION or INTERPRETATION, you have to RESPOND IN CAPITAL LETTERS right after the question.

  • It is STRONGLY RECOMMENDED that you write your OWN NOTES as if this were your notebook. Your own workshop/notebook will be very helpful for your further study.

You have to keep saving your .Rmd file, and ONLY SUBMIT the .html version of your .Rmd file. Pay attention in class to know how to generate an html file from your .Rmd.

Set up the name of your R Notebook for this workshop

Set up the title and name of your workshop

Once you have created a new R Notebook, you will see a sample R Notebook document. You must DELETE all the lines of this sample document except the first lines related to title and output. As the title, write the workshop number and the course name, and add a new line with your name. You should end up with something like:


---
title: "Workshop 1, Financial Programming"
author: YourName
output: html_notebook
---


Now you are ready to continue writing your first R Notebook.

You can start writing your own notes and explanations about what we cover in this workshop. When you need to write lines of R code, click Insert at the top of the RStudio window and select R. A chunk of R code will immediately be set up so you can start writing your R code. You can execute this chunk by clicking the play button (green triangle).

Note that you can open and edit several R Notebooks, which will appear as tabs at the top of the window. You can see the output (results) of your code below each chunk and in the Console, located at the bottom of the window. Also, the variables you create are listed in the Environment tab, located in the top-right pane. The bottom-right pane shows the Files, Plots, Packages, Help, and Viewer tabs.

Save your R Notebook file as W1-YourName.Rmd. Go to the File menu and select Save As.

Introduction to programming languages

What is a programming language? A programming language is a set of commands or instructions that the computer executes, usually to automate repetitive tasks or tasks that require intensive data processing. Programming languages are also used to develop computer applications. In Economics and Finance, we use programming languages mainly for the following:

  1. data input and/or data collection,
  2. data cleaning,
  3. data management including data transformation and data merging,
  4. data storing,
  5. data processing that includes descriptive analytics and predictive models, and finally
  6. information delivery that includes reports or documents usually with summary tables and graphs.

You can effectively perform these programming processes in R.
One of the great advantages of R as opposed to other statistical software is that R is free: it is distributed under the GNU General Public License. The R language was initially designed for statistical computing. R is maintained by the scientific community, and its popularity has increased substantially since its initial version was released in 1995. Nowadays, Python and R are two of the most popular programming languages for Data Science around the world.

R objects

In R, any piece of information is stored in an object. An object occupies part of your computer's memory so that the information is available to be read, manipulated, analysed, exported or deleted. In general terms, all you have to understand is that R takes those objects and performs the tasks indicated in the code using the data stored in them.

In R, each object belongs to a specific data class. Each R class has its own data structure and attributes. In other programming languages, a stored piece of data is usually called a variable.

Assigning values to an object

In its simplest form, R can be used as a calculator. We can simply write an operation in the R console to get the answer:

5+7
[1] 12

R can do much more than a simple calculator. We can save (assign) the result of an expression into an object (variable). To do this, we use the assignment operator <-, also known as the back arrow operator:

x <- 5+7

To view the contents of the new object x, just type x and press Enter in the Console:

x
[1] 12

R replies by printing the value of x on the screen, which in this example is equal to 12. We can do more calculations using x and other numbers. For example:

z <- x+5

In this case we assigned the value of x plus 5 to a new variable z. An interesting feature of R is that it uses vectors to store single values such as x and z.

When you display the content of x, you can see the number 1 between square brackets []:

x
[1] 12

The [1] indicates that 12 is the first element of the vector x; in this case, x is a vector of only 1 element. However, a vector can have more than one element, and all of its elements must be of the same atomic class: numeric, character, integer, logical (TRUE/FALSE) or complex.
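A quick way to confirm that a single value is stored as a vector is to check its length with length() and to test it with is.vector(). A minimal sketch that recreates the x and z objects from above:

```r
# Recreate the objects from the examples above
x <- 5 + 7
z <- x + 5

z             # 17
length(x)     # 1: x is a vector with a single element
is.vector(z)  # TRUE: even a single number is stored as a vector
```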

Besides vectors, R also uses matrices, data frames and lists. We examine these types of objects in the second part of this document.

Data Structures

Now we will learn about data structures in R. The most common data structures are:

  1. Vectors
  2. Matrices
  3. Data frames
  4. Lists

Vectors

A vector is a collection of values. The values of a vector can be of any of the following atomic classes: numeric, character, integer, logical (TRUE/FALSE) or complex. However, all the values of a given vector must be of the same class.

Now we will create a small collection of numbers in a numeric vector. A numeric vector is the simplest type of data structure in R. In fact, even a single number is considered a vector of length one.

To create a vector, we can use the function c(), which stands for combine. We can define a vector with the numbers 1, 2 and 3 using the c() function, separating the elements with commas, as shown below.

y <- c(1,2,3)
y
[1] 1 2 3

We can also create an integer vector with the colon operator (:), which generates a sequence of consecutive integers:

y <- 1:3
y
[1] 1 2 3
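Both vectors above print the same values, but they are not of the same class. A small sketch to check this with class(); the names y_num and y_int are hypothetical, chosen only so both versions can coexist:

```r
y_num <- c(1, 2, 3)  # c() with plain numbers creates a numeric (double) vector
y_int <- 1:3         # the colon operator creates an integer vector

class(y_num)  # "numeric"
class(y_int)  # "integer"
```

The values are equal element by element; only the way R stores them differs.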

We can do arithmetic operations with a numeric vector. For example, we can add a number to each element of the vector and assign the result into another numeric vector as follows:

z <- y+1
z
[1] 2 3 4

You can also perform arithmetic operations with two or more vectors:

w <- z-y
w
[1] 1 1 1
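Multiplication and division between vectors also work element by element. A short sketch that recreates y and z from the examples above:

```r
y <- 1:3
z <- y + 1   # 2 3 4

z * y        # element-by-element product: 2 6 12
z / y        # element-by-element division: 2, 1.5, 1.333333
```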

When given two vectors of the same length, R performs the specified arithmetic operation (+, -, *, etc.) element-by-element. You can even combine vectors to create new ones:

w <- c(y,"a","b")

In this case, we have added two letters to the integer vector y. The resulting vector w is a character vector, so each number of the y vector was converted into a string (character value). If you print w, you will see that each element appears in quotes, which indicates character values. A vector in R can only contain elements of the same class; in this case, R converts the values from integer to character:

class(y)
[1] "integer"
class(w)
[1] "character"
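To see the conversion directly, we can print w. As a sketch, the numbers stored as text can be turned back into numbers with as.numeric() before doing arithmetic with them:

```r
y <- 1:3
w <- c(y, "a", "b")

w                        # "1" "2" "3" "a" "b": every element is now text
as.numeric(w[1:3]) + 1   # convert the first 3 elements back to numbers: 2 3 4
```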

Each element of a vector belongs to one of five basic (atomic) data classes:

  1. Numeric:
x <- c(2,3.43,4.21)
  2. Character:
x <- c("2", "hola", "4 gatos")
  3. Integer:
x <- c(2L,3L,4L)  # the L suffix stores each number as an integer
  4. Logical (TRUE/FALSE):
x <- c(T,FALSE,F,TRUE)  # T and F are shorthand for TRUE and FALSE
  5. Complex (imaginary numbers):
z <- c(5i,4i)
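The class() function tells us which atomic class each of the vectors above belongs to. A compact sketch that stores the five examples in a list (the list and its names are only for illustration) and applies class() to each one with sapply():

```r
examples <- list(
  numeric   = c(2, 3.43, 4.21),
  character = c("2", "hola", "4 gatos"),
  integer   = c(2L, 3L, 4L),
  logical   = c(T, FALSE, F, TRUE),
  complex   = c(5i, 4i)
)

# Apply class() to each vector in the list
sapply(examples, class)
```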

Now we will learn how to access a particular element of a vector. First, we will create a vector with the numbers from 101 to 105 using the function seq(), which generates a sequence of numbers according to its arguments, in this case from 101 to 105.

vector1 <- seq(from = 101, to = 105)
vector1
[1] 101 102 103 104 105

If we want to access the third element of this vector, which is 103, we write its position (index) inside the square brackets operator []:

vector1[3]
[1] 103

Imagine that we want to access not just one, but several elements of the vector. To do this, we write inside the square brackets a vector with the positions of the elements we want:

vector1[c(1,3,5)]
[1] 101 103 105
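Since 2:4 is itself a vector of positions, it can be used to extract a consecutive range of elements. A short sketch that also uses length() to reach the last element without counting by hand:

```r
vector1 <- seq(from = 101, to = 105)

vector1[2:4]               # elements 2, 3 and 4: 102 103 104
length(vector1)            # number of elements: 5
vector1[length(vector1)]   # last element: 105
```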

If you want to modify a certain element of the existing vector vector1, all you have to do is assign (<-) the new value to the specific position indicated inside the square brackets operator [].

The example below can be read as follows: the second element of the vector vector1 gets the value 109.

vector1[2] <- 109
vector1
[1] 101 109 103 104 105
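The same idea works for several positions at once: on the left side we write a vector of positions, and on the right side a vector with the same number of new values. A sketch with illustrative values (200 and 300 are arbitrary):

```r
vector1 <- seq(from = 101, to = 105)
vector1[2] <- 109

# Replace the 4th and 5th elements at once (illustrative values)
vector1[c(4, 5)] <- c(200, 300)
vector1   # 101 109 103 200 300
```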

R treats vectors as VERTICAL (column) vectors; for example, when a vector is converted into a matrix, it becomes a single column. However, when you display a vector on the screen, you will see it HORIZONTALLY. Let's see an example:

vector2 <- seq(1,100)
vector2
  [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
 [19]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
 [37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
 [55]  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
 [73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
 [91]  91  92  93  94  95  96  97  98  99 100

We can see that the elements of the vector are displayed horizontally. Why is it important to know whether a vector is vertical or horizontal? When we do matrix operations with matrices and/or vectors, we have to know whether each vector is horizontal or vertical.
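A plain vector has no dimension attribute; its orientation becomes explicit once it is turned into a matrix. A small sketch using as.matrix(), which produces a single column, and the transpose function t(), which produces a single row:

```r
vector2 <- seq(1, 100)

dim(vector2)              # NULL: a plain vector has no dimension attribute
dim(as.matrix(vector2))   # 100 1: one COLUMN (vertical)
dim(t(vector2))           # 1 100: one ROW (horizontal)
```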

When we display a vector, R always shows the index [1], indicating that the row starts with element 1. If the vector has more elements than fit in one row, the display continues in the following row, which starts with [#], where # is the position of the first element shown in that row.

Actually, when we declare a numeric variable with only 1 value, R defines it as a vector of 1 element. Therefore, the simplest R object is a vector, no matter whether it has 1 or more elements.

Matrix

Matrices are vectors with a dimension attribute. A typical matrix object has two dimensions: rows and columns. Those dimensions define the size of the matrix, and you usually specify them when you create a new matrix object.

To create a matrix, you can use the function matrix(): the first argument is the data in your matrix, the nrow argument is the desired number of rows, and the ncol argument is the desired number of columns. By default, matrix() fills the values column by column, as the output below shows.

matrix(1:4, nrow = 2, ncol = 2)
     [,1] [,2]
[1,]    1    3
[2,]    2    4
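If you prefer to fill the matrix row by row, matrix() accepts the argument byrow = TRUE. A sketch comparing both options (the name m_byrow is only for illustration):

```r
# Default: values fill the matrix column by column
matrix(1:4, nrow = 2, ncol = 2)

# byrow = TRUE: values fill the matrix row by row
m_byrow <- matrix(1:4, nrow = 2, ncol = 2, byrow = TRUE)
m_byrow
#      [,1] [,2]
# [1,]    1    2
# [2,]    3    4
```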

Alternatively, you can create a matrix by joining 2 or more vectors with the functions rbind() or cbind(): rbind() joins vectors as rows, while cbind() joins them as columns. In the example below, we create two vectors v1 and v2, each with 3 elements, and then combine them into a matrix.

# Create vectors
v1 <- c(1,2,3)
v2 <- c(90,91,92)

# Combine vectors as VERTICAL vectors and save as `matrix1`
matrix1 <- cbind(v1, v2)

# Print object matrix1
matrix1
     v1 v2
[1,]  1 90
[2,]  2 91
[3,]  3 92
# Combine vectors as HORIZONTAL vectors and save as `matrix2`
matrix2 <- rbind(v1, v2)

# Print object matrix2
matrix2
   [,1] [,2] [,3]
v1    1    2    3
v2   90   91   92

Once your matrix object has been created, you can check its dimensions with the dim() function, as shown below. The dim() function returns the number of rows and columns of your matrix.

As you would expect, for this particular example we have 3 rows and 2 columns.

dim(matrix1)
[1] 3 2
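If you only need one of the two dimensions, the functions nrow() and ncol() return the number of rows and the number of columns separately. A sketch that recreates matrix1:

```r
v1 <- c(1, 2, 3)
v2 <- c(90, 91, 92)
matrix1 <- cbind(v1, v2)

nrow(matrix1)  # number of rows: 3
ncol(matrix1)  # number of columns: 2
```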

To access a certain element of a matrix, we use the square brackets operator [] after its name. The first index (before the comma) indicates the row number, while the second index (after the comma) indicates the column number. This way, you can point to any specific location in the matrix and extract the value saved in that location. In the case below, we extract the element in the first row and second column, which is the number 90:

matrix1[1,2]
v2 
90 

We can also refer to a whole row or column simply by leaving the other index empty. For example, to see the whole second row we write:

matrix1[2,]
v1 v2 
 2 91 
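Leaving the row index empty works the same way for columns. Because the columns of matrix1 have names (v1 and v2), a sketch can also select a column by its name instead of its position:

```r
v1 <- c(1, 2, 3)
v2 <- c(90, 91, 92)
matrix1 <- cbind(v1, v2)

matrix1[, 2]     # whole second column: 90 91 92
matrix1[, "v2"]  # the same column, selected by its name
```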

A very important point to note is that, just like vectors, matrices can only store one class of object (for example, only numeric values). In other words, you cannot have both character and numeric elements in the same matrix. To store different classes together, we use a different class of object called a data frame.
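To see what happens if we try anyway, the sketch below joins a numeric vector and a character vector with cbind(). R does not raise an error; instead, it converts every element to character, just as it did with vectors (the name mixed is only for illustration):

```r
v1 <- c(1, 2, 3)
mixed <- cbind(v1, c("a", "b", "c"))

typeof(mixed)  # "character": the numbers were converted to text
mixed[1, 1]    # the character value "1", not the number 1
```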

Data Frames

Data frames are used to store tabular data. A data frame is similar to an Excel spreadsheet table. However, unlike matrices, data frames can store different classes of objects in different columns.

You can create a data frame using the data.frame() function. In the example below, we combine character and numeric data in the same object x.

x <- data.frame(Student = c("Paco","Raul","Beto"), Grade = c(9, 5, 7))
x
  Student Grade
1    Paco     9
2    Raul     5
3    Beto     7
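Each column of a data frame keeps its own class. A sketch that extracts a column with the $ operator, a row with square brackets, and checks the class of each column with sapply(). It assumes R 4.0 or later, where text columns are stored as character by default; older versions of R stored them as factors.

```r
x <- data.frame(Student = c("Paco", "Raul", "Beto"), Grade = c(9, 5, 7))

x$Grade           # the Grade column as a numeric vector: 9 5 7
x[2, ]            # the whole second row (Raul)
sapply(x, class)  # class of each column (character and numeric in R >= 4.0)
```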

As we did before with matrices, you can also see the dimensions of the data frame using the dim() function.

dim(x)
[1] 3 2

You can also create a data frame object by converting an existing matrix with the function as.data.frame(). In the example below, we first create a matrix called my.matrix and then convert it into my.df. In the third line of code, we check whether the new object is actually of class data.frame.

my.matrix <- matrix(1:4, nrow = 2, ncol = 2)
my.df <- as.data.frame(my.matrix)
class(my.df)
[1] "data.frame"
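Because my.matrix has no column names, as.data.frame() assigns default names to the columns. A quick sketch to check them:

```r
my.matrix <- matrix(1:4, nrow = 2, ncol = 2)
my.df <- as.data.frame(my.matrix)

colnames(my.df)  # "V1" "V2": default names, since the matrix had none
```

These default names can be changed with colnames(), as explained below.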

The rbind() and cbind() functions can also be used to append rows or columns to an existing data frame. In the example below, I will add a new column to the data frame x created previously, but first I will make a copy called grades.df, just to make it more descriptive.

# Make a copy
grades.df <- x
grades.df <- cbind(grades.df, c("Pass", "Fail", "Pass"))
grades.df
  Student Grade c("Pass", "Fail", "Pass")
1    Paco     9                      Pass
2    Raul     5                      Fail
3    Beto     7                      Pass

As we can see, the first two columns have meaningful names, but the name of the third column is just the code used to create it. Data frames, like any object in R, have attributes:

attributes(grades.df)
$names
[1] "Student"                         "Grade"                          
[3] "c(\"Pass\", \"Fail\", \"Pass\")"

$class
[1] "data.frame"

$row.names
[1] 1 2 3

In this case, the data frame has 3 attributes: $names, $row.names and $class. We can not only see them but also modify them. The $names attribute stores the names of the columns of your data frame, so if you do not like the names of your columns, you can change them with the colnames() function as follows:

colnames(grades.df) <- c("Student","Grade","Status")

To see the changes you have made, you can print the attributes of the data frame again:

attributes(grades.df)
$names
[1] "Student" "Grade"   "Status" 

$class
[1] "data.frame"

$row.names
[1] 1 2 3

Or simply get the names of the columns or rows using the functions colnames() or rownames().

colnames(grades.df)
[1] "Student" "Grade"   "Status" 
rownames(grades.df)
[1] "1" "2" "3"
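Just as colnames() can be combined with <- to rename columns, rownames() can be used the same way to assign names to the rows, which is also what the CHALLENGE asks for with month names. A minimal sketch on a fresh copy of the grades data; the row labels "first", "second" and "third" are hypothetical:

```r
grades.df <- data.frame(Student = c("Paco", "Raul", "Beto"), Grade = c(9, 5, 7))

# Hypothetical row labels, used only for illustration
rownames(grades.df) <- c("first", "second", "third")
rownames(grades.df)     # "first" "second" "third"
grades.df["second", ]   # select a row by its name
```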

Changing the names of all the columns is easy here, but now imagine that we had 100 or 1,000 columns: writing all the column names again would not be practical. Instead, we can modify only one name. To do this, we need to pinpoint the element we want to modify, so again we use the square brackets [] to indicate the position of the column to be renamed. Let's say you want to rename the third column, Status, to Pass-Fail:

colnames(grades.df)[3] <- "Pass-Fail"

# See the changes
colnames(grades.df)
[1] "Student"   "Grade"     "Pass-Fail"
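With many columns, you may know the name of the column you want to change but not its position. A sketch that finds the position by comparing the column names with == and renames only the matching column; the new name Score is hypothetical:

```r
grades.df <- data.frame(Student = c("Paco", "Raul", "Beto"),
                        Grade = c(9, 5, 7),
                        Status = c("Pass", "Fail", "Pass"))

# Rename only the column whose current name is "Grade"
colnames(grades.df)[colnames(grades.df) == "Grade"] <- "Score"
colnames(grades.df)  # "Student" "Score" "Status"
```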

The list data structure will be covered in the following workshop.

CHALLENGE

CREATE A DATA-FRAME WITH THE OPEN AND CLOSE PRICES FOR MICROSOFT STOCK (MSFT) FOR THE LAST 2 MONTHS

Name the columns as OPEN and CLOSE for the prices, and name the rows with the names of the months. You have to end up with a data frame with 2 rows and 2 columns.

Hints:

  1. You can check the monthly prices of Microsoft in Yahoo Finance!
  2. Check how to create vectors. You can create 2 vectors for the open and close prices
  3. You can use cbind to create a matrix
  4. Change the matrix to a data frame using the data.frame function

Datacamp activities

You will receive an invitation by email to register in Datacamp.com. Just follow the directions to enroll in my Datacamp group. Datacamp is one of the most comprehensive platforms of online interactive courses related to Data Science, with hundreds of premium data science courses. For this course, we will have complete (and free!) access to all premium courses. This access will last for 6 MONTHS, so you can keep learning after we finish the course.

You have to go to the Introduction to R for Finance course and view and DO all activities for 2 chapters: a) The Basics, and b) Vectors and Matrices. You will receive points for each activity you do in these chapters.

Workshop submission

You have to submit your .html file of this workshop through Canvas BEFORE NEXT CLASS.

The grade of this Workshop will be the following:

  • Complete (100%): If you submit an ORIGINAL and COMPLETE HTML file with all the activities, with your notes, and with your OWN RESPONSES to questions

  • Incomplete (75%): If you submit an ORIGINAL HTML file with ALL the activities but you did NOT RESPOND to the questions, and/or you did not do all the activities and responded to only some of the questions.

  • Very Incomplete (10%-70%): If you complete from 10% to 75% of the workshop, or you completed more but parts of your work are copied and pasted from other workshops.

  • Not submitted (0%)