You will work in RStudio. Create an R Notebook document (File -> New File -> R Notebook) and write in it everything that is asked in this workshop. More specifically, you have to:
Replicate all the R Code along with its output.
You have to respond to the CHALLENGES of the workshop and any other question asked in the workshop.
Any QUESTION or any INTERPRETATION you need to do will be written in CAPITAL LETTERS. For ANY QUESTION or INTERPRETATION, you have to RESPOND IN CAPITAL LETTERS right after the question.
You have to keep saving your .Rmd file, and ONLY SUBMIT the .html version of your .Rmd file. Pay attention in class to know how to generate an html file from your .Rmd.
Set up the title and name of your Workshop
Once you have created a new R Notebook, you will see a sample R Notebook document. You must DELETE all the lines of this sample document except the first lines related to title and output. As title, write the workshop # and course, and add a new line with your name. You have to end up with something like:
---
title: "Workshop 1, Financial Programming"
author: YourName
output: html_notebook
---
Now you are ready to continue writing your first R Notebook.
You can start writing your own notes and explanations of what we cover in this workshop. When you need to write lines of R code, click Insert at the top of the RStudio window and select R. A chunk of R code will immediately be set up so you can start writing your R code. You can execute this piece of code by clicking the play button (green triangle).
Note that you can open and edit several R Notebooks, which will appear as tabs at the top of the window. You can visualize the output (results) of your code in the console, located at the bottom of the window. Also, the created variables are listed in the environment, located in the top-right pane. The bottom-right pane shows the files, plots, installed packages, help, and viewer tabs.
Save your R Notebook file as W1-YourName.Rmd. Go to the File menu and select Save As.
What is a programming language? A programming language is a set of commands or instructions that the computer executes, usually to automate repetitive tasks or functions that require intensive data processing. Programming languages are also used to develop computer applications. In Economics and Finance, we use programming languages mainly to do the following:
You can effectively perform these programming processes in R.
One of the great advantages of R as opposed to other statistical software is that R is free. Its distribution is available under the GNU General Public License. The R language was initially designed for statistical computing. R is maintained by the scientific community and its popularity has increased substantially in recent years since the initial version released in 1995. Nowadays, Python and R are the two most popular programming languages for Data Science around the world.
In R, any piece of information is stored into an object. A computer object takes bits of memory from your computer so the information is available to be read, manipulated, analysed, exported or deleted. In general terms, all you have to understand is that R will take those objects and perform the tasks indicated in the code using the data stored in the object.
In R, each object is considered to be of a specific data class. Each R class has its own data structure and attributes. In other programming languages any piece of data is called a variable.
R at its simplest form can be used as a calculator. We can simply write an operation at the R console to receive the answer:
5+7
[1] 12
R can do much more than a simple calculator. We can save or assign the result of an expression into an object (variable). We can use the assignment operator <-, also known as the back arrow operator, to assign a value to a variable:
x <- 5+7
To view the contents of the new object x, just type x and press Enter in the Console:
x
[1] 12
R replies by printing on the screen the value of the x variable, which in this example is equal to 12. We can do more calculations using x and other numbers. For example:
z <- x+5
In this case we assigned the value of x plus 5 to a new variable z. An interesting feature of R is that it uses vectors to store single values such as x and z.
When you display the content of x you can see the number 1 between square brackets []:
x
[1] 12
The [1] means that x is a vector of only 1 element, which in this case is equal to 12. However, a vector can have more than one element, and its elements can be of any of the following atomic classes: numeric, character, integer, logical (true/false) or complex.
Besides vectors, R also uses matrices, data frames and lists. We examine those types of objects in the second part of this document.
Now we will learn about data structures in R. The most common data structures are vectors, matrices, data frames and lists.
A vector is a collection of values. The values of a vector can be of any of the following atomic classes: numeric, character, integer, logical (true/false) or complex. Each vector can hold only one data class.
Now we will create a small collection of numbers in a numeric vector. A numeric vector is the simplest type of data structure in R. In fact, even a single number is considered a vector of length one.
To create a vector we can use the function c(), which stands for combine. We can define a vector with the numbers 1, 2 and 3 using the c() function and separating each element by a comma as shown below.
y <- c(1,2,3)
y
[1] 1 2 3
We can also create an integer vector as follows:
y <- 1:3
y
[1] 1 2 3
We can do arithmetic operations with a numeric vector. For example, we can add a number to each element of the vector and assign the result into another numeric vector as follows:
z <- y+1
z
[1] 2 3 4
You can also perform arithmetic operations with two or more vectors:
w <- z-y
w
[1] 1 1 1
When given two vectors of the same length, R performs the specified arithmetic operation (+, -, *, etc.) element by element. You can even combine vectors to create new ones:
w <- c(y,"a","b")
In this case we have added two letters to the integer vector y. The final vector w, the result of this operation, is a character vector, so each number of the y vector was converted into a string or character value. You can tell that the elements of this new vector are character values because each one is shown in quotes. A vector in R can only contain objects of the same class, so in this case the numbers are converted from integer to character:
class(y)
[1] "integer"
class(w)
[1] "character"
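The conversion described above can also be reversed by hand. The lines below are only a small sketch: the name nums is made up for this illustration, and as.numeric() is the base R function that turns number-like strings back into numbers.

```r
# Mixing numbers and text inside c() converts everything to character
y <- 1:3
w <- c(y, "a", "b")
class(w)                      # "character"

# The first three elements look like numbers, so they can be converted back
nums <- as.numeric(w[1:3])    # nums is a hypothetical name for this sketch
nums + 1                      # arithmetic works again: 2 3 4
```

Converting the letters "a" or "b" the same way would produce NA with a warning, because they do not represent numbers.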
The elements of a vector are R objects, so these elements belong to one of the five basic or atomic data classes:
x <- c(2,3.43,4.21)   # numeric
x <- c("2", "hola", "4 gatos")   # character
x <- c(2L,3L,4L)   # integer
x <- c(T,FALSE,F,TRUE)   # logical
z <- c(5i,4i)   # complex
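To check which class R assigned to each of these vectors, you can ask with class(). The sketch below repeats the five vectors under separate, made-up names (v_numeric, v_character, and so on) so that none of them overwrites another.

```r
# One small vector for each atomic class (names are illustrative only)
v_numeric   <- c(2, 3.43, 4.21)
v_character <- c("2", "hola", "4 gatos")
v_integer   <- c(2L, 3L, 4L)               # the L suffix marks integers
v_logical   <- c(TRUE, FALSE, FALSE, TRUE)
v_complex   <- c(5i, 4i)

class(v_numeric)     # "numeric"
class(v_character)   # "character"
class(v_integer)     # "integer"
class(v_logical)     # "logical"
class(v_complex)     # "complex"
```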
Now we will learn how to access a particular element of a vector. First we will create a vector with numbers from 101 to 105 using the function seq(), which generates a sequence of numbers according to the arguments in the function, in this case from 101 to 105.
vector1 <- seq(from = 101, to = 105)
vector1
[1] 101 102 103 104 105
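As a side note, seq() accepts other arguments besides from and to. The sketch below uses by (the step size) and length.out (the number of elements wanted), both part of base R; the object names odd_ids and quarters are made up for this illustration.

```r
# Step of 2 instead of the default step of 1
odd_ids <- seq(from = 101, to = 105, by = 2)
odd_ids      # 101 103 105

# Ask for 5 equally spaced numbers between 0 and 1
quarters <- seq(from = 0, to = 1, length.out = 5)
quarters     # 0.00 0.25 0.50 0.75 1.00
```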
If we want to access the third element of this vector, which is 103, we write the element number inside the operator []:
vector1[3]
[1] 103
Imagine that we want to access not just one but several elements of the vector. This is done by supplying a vector of numbers that indicate the positions of the elements inside the original vector. We do this as follows:
vector1[c(1,3,5)]
[1] 101 103 105
If you want to modify a certain element of the existing vector vector1, all you have to do is assign (<-) the new value to the specific vector location indicated by the value inside the square brackets operator [].
The example below can be read as follows: the second element of the vector vector1 gets the value 109.
vector1[2] <- 109
vector1
[1] 101 109 103 104 105
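The same idea extends to several positions at once: a vector of positions on the left receives a vector of new values on the right. This is only a sketch, and the replacement values 0 and 999 are arbitrary.

```r
vector1 <- seq(from = 101, to = 105)

# Replace the first and the fifth elements in a single assignment
vector1[c(1, 5)] <- c(0, 999)
vector1      # 0 102 103 104 999
```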
R considers all vectors as VERTICAL vectors. However, when you display a vector on the screen you will see it as HORIZONTAL. Let’s see an example:
vector2 <- seq(1,100)
vector2
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
[19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
[37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
[55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
[73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
[91] 91 92 93 94 95 96 97 98 99 100
We can see that the elements of the vector are displayed horizontally. Why is it important to know whether a vector is vertical or horizontal? Because when we do matrix operations with matrices and/or vectors, we have to know whether each vector is horizontal or vertical.
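One way to see the vertical orientation is to turn the vector into a matrix and look at its dimensions. The sketch below relies on the base R functions as.matrix(), t() (transpose) and the matrix product %*%; the names v, col_v and row_v are made up for this illustration.

```r
v <- c(1, 2, 3)
dim(v)                   # NULL: a plain vector carries no dimensions yet

col_v <- as.matrix(v)    # becomes a 3 x 1 matrix (vertical)
dim(col_v)               # 3 1

row_v <- t(v)            # transposing gives a 1 x 3 matrix (horizontal)
dim(row_v)               # 1 3

# Orientation matters in matrix products: (1 x 3) times (3 x 1) is 1 x 1
(row_v %*% col_v)[1, 1]  # 1*1 + 2*2 + 3*3 = 14
```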
When we display a vector, R always shows an index [1] indicating that the row starts with element 1. If the vector has too many elements to be displayed in one row, the vector continues in the following row, which starts with [#], where # is the position of the first element shown in that row.
In fact, when we declare a numeric variable with only 1 value, R defines it as a vector of 1 element. So the simplest R object is a vector, no matter whether it has 1 or more elements.
Matrices are vectors with a dimension attribute. A typical matrix object has two dimensions: rows and columns. Those dimensions define the size of the matrix and must be defined every time you create a new matrix object.
In order to create a matrix you can use the function matrix(). The first argument is the data in your matrix, the nrow argument is the desired number of rows, and the ncol argument is the desired number of columns.
matrix(1:4, nrow = 2, ncol = 2)
[,1] [,2]
[1,] 1 3
[2,] 2 4
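Notice in the output above that matrix() fills the matrix column by column. If you prefer to fill it row by row, the base R argument byrow = TRUE can be used. The sketch below compares both options; the names m_by_col and m_by_row are made up for this illustration.

```r
# Default: values are placed down the columns
m_by_col <- matrix(1:4, nrow = 2, ncol = 2)

# byrow = TRUE: values are placed across the rows
m_by_row <- matrix(1:4, nrow = 2, ncol = 2, byrow = TRUE)

m_by_col[1, 2]   # 3
m_by_row[1, 2]   # 2
```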
Alternatively, you can create a matrix by joining 2 or more different vectors, using the functions rbind() or cbind(). rbind joins two vectors as rows while cbind joins them as columns. In the example below, we create two vectors v1 and v2, each having 3 elements, then we combine them together into a matrix.
# Create vectors
v1 <- c(1,2,3)
v2 <- c(90,91,92)
# Combine vectors as VERTICAL vectors and save as `matrix1`
matrix1 <- cbind(v1,v2)
# Print object matrix1
matrix1
v1 v2
[1,] 1 90
[2,] 2 91
[3,] 3 92
# Combine vectors as HORIZONTAL vectors and save as `matrix2`
matrix2 <- rbind(v1,v2)
# Print object matrix2
matrix2
[,1] [,2] [,3]
v1 1 2 3
v2 90 91 92
Once your matrix object has been created you can check its dimensions by using the dim() function as shown below. The dim() function returns the number of rows and columns of your matrix.
As you would expect, for this particular example we have 3 rows and 2 columns.
dim(matrix1)
[1] 3 2
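If you only need one of the two dimensions, base R also offers nrow() and ncol(). A minimal sketch, rebuilding matrix1 so the lines run on their own:

```r
v1 <- c(1, 2, 3)
v2 <- c(90, 91, 92)
matrix1 <- cbind(v1, v2)

nrow(matrix1)      # 3
ncol(matrix1)      # 2

# dim() returns a vector, so its elements can be picked with []
dim(matrix1)[1]    # 3, the number of rows
```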
To access a certain element of a matrix we use the square brackets operator [] after the name. The first element before the comma indicates the row number, while the second element inside the square brackets refers to the column. This way, you can point to any specific location in the matrix and extract the value saved at that location. In the case below, we extract the element in the first row, second column, which corresponds to the number 90.
matrix1[1,2]
v2
90
We can also refer to a whole column or row simply by leaving the other element empty. If we want to see the whole second row we just write:
matrix1[2,]
v1 v2
2 91
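The same rule works for columns: leave the row position empty and give the column number. Since cbind() kept the vector names as column names, the column can also be selected by name. A short sketch, rebuilding matrix1 so it runs on its own:

```r
v1 <- c(1, 2, 3)
v2 <- c(90, 91, 92)
matrix1 <- cbind(v1, v2)

matrix1[, 2]       # whole second column: 90 91 92
matrix1[, "v2"]    # same column, selected by its name
matrix1[2:3, ]     # rows 2 and 3, all columns
```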
A very important point to note is that, just like vectors, matrices can only store one class of object, e.g. numeric objects. In other words, you cannot have both string class elements and numeric class elements in the same matrix. To store both, we use a different class of object called a data frame.
Data frames are used to store tabular data. A data frame is similar to a table in an Excel spreadsheet. However, unlike matrices, data frames can store different classes of objects in different columns.
You can create a data frame using the data.frame() function. In the example shown below, we combine both string and numeric data in the same x object.
x <- data.frame(Student = c("Paco","Raul","Beto"), Grade = c(9, 5, 7))
x
Student Grade
1 Paco 9
2 Raul 5
3 Beto 7
As we did before with matrices, you can also see the dimensions of the data frame using the dim() function.
dim(x)
[1] 3 2
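To confirm that each column keeps its own class, you can apply class() to every column with sapply(), and pick a single column with the $ operator. This is only a sketch; stringsAsFactors = FALSE is written explicitly here as an assumption, so that the text column stays character in any R version.

```r
x <- data.frame(Student = c("Paco", "Raul", "Beto"),
                Grade   = c(9, 5, 7),
                stringsAsFactors = FALSE)

# Class of each column: Student is character, Grade is numeric
sapply(x, class)

# A single column is an ordinary vector, so vector functions work on it
x$Grade          # 9 5 7
mean(x$Grade)    # 7
```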
You can also create a data frame object by transforming an existing matrix via the function as.data.frame(). In the example below, we first create a matrix called my.matrix and then transform it to be my.df. In the third line of code we check whether the new object is actually a data.frame class object.
my.matrix <- matrix(1:4, nrow = 2, ncol = 2)
my.df <- as.data.frame(my.matrix)
class(my.df)
[1] "data.frame"
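Because my.matrix has no column names, as.data.frame() gives the new columns the default names V1 and V2. The sketch below shows this and then assigns more descriptive names; "first" and "second" are made-up names for this illustration.

```r
my.matrix <- matrix(1:4, nrow = 2, ncol = 2)
my.df <- as.data.frame(my.matrix)

colnames(my.df)    # "V1" "V2"

# Replace the default names with names of your choice
colnames(my.df) <- c("first", "second")
colnames(my.df)    # "first" "second"
```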
The rbind or cbind functions can also be used to append rows or columns to an existing data frame. In the example below I will add a new column to the data frame x previously created, but first I will create a copy and call it grades.df just to make it more descriptive.
# Make a copy
grades.df <- x
grades.df <- cbind(grades.df, c("Pass", "Fail", "Pass"))
grades.df
Student Grade c("Pass", "Fail", "Pass")
1 Paco 9 Pass
2 Raul 5 Fail
3 Beto 7 Pass
As we can see, the first two columns have a meaningful name, but the name of the third does not mean anything. Data frames, like any object in R, have attributes:
attributes(grades.df)
$names
[1] "Student" "Grade"
[3] "c(\"Pass\", \"Fail\", \"Pass\")"
$class
[1] "data.frame"
$row.names
[1] 1 2 3
In this case the data frame has 3 attributes: $names, $row.names and $class. We can not only see them but also manipulate them. The $names attribute stores the names of the columns in your data frame, so if you do not like the names of your columns you can always change them using the colnames() function as follows:
colnames(grades.df) <- c("Student","Grade","Status")
To see the changes you have made, you can always print out the attributes of the data frame:
attributes(grades.df)
$names
[1] "Student" "Grade" "Status"
$class
[1] "data.frame"
$row.names
[1] 1 2 3
Or simply get the names of the columns or rows using the functions colnames() or rownames().
colnames(grades.df)
[1] "Student" "Grade" "Status"
rownames(grades.df)
[1] "1" "2" "3"
Changing the name of one of the columns is easy, but now imagine that we had 100 or 1,000 columns: writing all the column names again is not practical. So we need to learn how to modify only one. To do this we have to know how to pinpoint the element we want to modify, so again we use the square brackets [] to point to the specific location of the column that will be renamed. Let’s say you want to change the name of the third column from Status to Pass-Fail:
colnames(grades.df)[3] <- "Pass-Fail"
# See the changes
colnames(grades.df)
[1] "Student" "Grade" "Pass-Fail"
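Two related details may be useful here. First, a new column can receive its name at the moment it is appended, by writing name = values inside cbind(). Second, a name such as Pass-Fail contains a hyphen, so it has to be wrapped in backticks when used with the $ operator. The lines below are a sketch that rebuilds grades.df from scratch so they run on their own.

```r
grades.df <- data.frame(Student = c("Paco", "Raul", "Beto"),
                        Grade   = c(9, 5, 7))

# Name the new column while appending it
grades.df <- cbind(grades.df, Status = c("Pass", "Fail", "Pass"))
colnames(grades.df)        # "Student" "Grade" "Status"

# Rename only the third column, as in the text above
colnames(grades.df)[3] <- "Pass-Fail"

# Backticks are needed because of the hyphen in the name
grades.df$`Pass-Fail`
```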
The list data structure will be covered in the following workshop.
CREATE A DATA-FRAME WITH THE OPEN AND CLOSE PRICES FOR MICROSOFT STOCK (MSFT) FOR THE LAST 2 MONTHS
Name the columns as OPEN and CLOSE for the prices, and name the rows with the names of the months. You have to end up with a data frame with 2 rows and 2 columns.
Hints:
You will receive an invitation by email to register in Datacamp.com. Just follow the directions to enroll in my Datacamp group. Datacamp is one of the most comprehensive online interactive courses related to Data Science. Datacamp has hundreds of data science premium courses. For this course we will have complete (and free!) access to all premium courses. This access will last for 6 MONTHS! So, you can keep learning when we finish the course.
You have to go to the Introduction to R for Finance course and view and DO all activities for 2 chapters: a) The Basics, and b) Vectors and Matrices. You will receive points for each activity you do in these chapters.
You have to submit your .html file of this workshop through Canvas BEFORE NEXT CLASS.
The grade of this Workshop will be the following:
Complete (100%): If you submit an ORIGINAL and COMPLETE HTML file with all the activities, with your notes, and with your OWN RESPONSES to questions
Incomplete (75%): If you submit an ORIGINAL HTML file with ALL the activities but you did NOT RESPOND to the questions and/or you did not do all activities and respond to some of the questions.
Very Incomplete (10%-70%): If you complete from 10% to 75% of the workshop or you completed more but parts of your work are a copy-paste from other workshops.
Not submitted (0%)