Reminder:
To begin, we’ll go through the basics of R syntax by using R as a fancy calculator. R is what is known as an object-oriented programming language, which means that all data, variables, models, output, etc. are stored as objects. So when we read in a dataset or create a new variable, it is stored in the R session as an object; this should become clearer as we go through some examples.
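We want to use R for more than arithmetic, but arithmetic is an easy way to introduce R syntax. Each line in an R script is treated as a new command unless the command on that line is incomplete, for example when the line ends with an operator such as + or a parenthesis is still open (as in a list of values separated by commas). In that case R keeps reading on the next line. We will work more with this later on.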
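As a rough illustration of where R decides a command ends, here is a small sketch. The numbers and the object names total and values are arbitrary and were chosen only for this example.

```r
# Two separate commands: each complete line runs on its own
1 + 2
3 + 4

# One command split over two lines: the trailing + tells R it is unfinished
total <- 1 + 2 +
  3 + 4
total

# An open parenthesis also keeps the command going, so a
# comma-separated list of values can span several lines
values <- c(1, 2,
            3, 4)
values
```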
Let’s go through some arithmetic examples.
# addition
# subtraction
# multiplication
# division
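The code for these arithmetic examples isn't shown above. A minimal sketch might look like the following, where the numbers 7 and 3 are an arbitrary choice. Each line is a complete command, so R prints each result separately.

```r
# addition
7 + 3
#> [1] 10
# subtraction
7 - 3
#> [1] 4
# multiplication
7 * 3
#> [1] 21
# division
7 / 3
#> [1] 2.333333
```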
For each command, R printed the numerical answer. To demonstrate how R uses objects, let’s consider this example:
# create object a = 5+5
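Here is a sketch of this step that follows the explanation in the next paragraph. The assignment operator <- stores the result, and typing the object’s name on its own line prints it.

```r
# create object a; R prints nothing when a value is assigned
a <- 5 + 5
# type the object name on its own line to print its value
a
#> [1] 10
```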
What we did was store the result of 5 + 5 in the object ‘a’ using the assignment operator <-. Note that when we created the object ‘a’, R didn’t print the answer to 5 + 5. To see it, we needed another line containing just the object name, ‘a’. (FYI, R is case-sensitive: ‘a’ and ‘A’ are two different names.)
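To see case sensitivity in action, here is a small sketch. The second object, A, is hypothetical and exists only to contrast with a.

```r
a <- 5 + 5
A <- 100   # a separate object: R treats upper and lower case as distinct
a
#> [1] 10
A
#> [1] 100
```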
It is also possible to do arithmetic on objects, where it makes sense to do so. For example, let’s create a new object called ‘b’ that stores the result of 5 * 5, and then do some arithmetic with ‘a’ and ‘b’.
# create object b = 5*5
# a + b
# a * b
# a/b
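This sketch covers the steps for ‘b’ and uses the values given in the comments (a is 5 + 5 and b is 5 * 5).

```r
a <- 5 + 5   # a is 10
b <- 5 * 5   # b is 25
a + b
#> [1] 35
a * b
#> [1] 250
a / b
#> [1] 0.4
```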
We can also create a new object from other objects.
# create object c = a+b
# create object d = a*b
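This sketch builds ‘c’ and ‘d’ from ‘a’ and ‘b’, using the definitions given in the comments.

```r
a <- 5 + 5
b <- 5 * 5
# create object c from a and b
c <- a + b
# create object d from a and b
d <- a * b
c
#> [1] 35
d
#> [1] 250
```

The object name c is the same as the name of the c() function. This doesn't break c(), because when R looks up a name used as a function call, it skips objects that are not functions.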
Let’s spend a bit of time looking at vectors, matrices, and data frames. These are different classes of objects in R, and they are the building blocks of datasets. A vector is a one-dimensional collection of values, usually numbers, stored in a specific order. A matrix is a two-dimensional arrangement of values in rows and columns. Where a vector is a single row or column, a matrix can have many rows and many columns, and all of its values must be of the same type (usually numeric). Finally, a data frame is a table that R treats as a dataset. We can think of a data frame as a spreadsheet in which all the columns have the same number of rows. Unlike a matrix, a data frame can hold different types of data in different columns (for example, numbers in one column and text in another). We refer to the columns of a data frame as variables. We will discuss variables more later, but generally a variable is a column that records one value, numeric or text, for each observation (each row in a data frame).
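Here is a small sketch of the three classes. The object names (v, m, df_demo) and their contents are made up for illustration. class() reports what kind of object each one is.

```r
# a vector: one-dimensional, values in a fixed order
v <- c(2, 4, 6)
# a matrix: two-dimensional, every value of the same type
m <- matrix(1:6, nrow = 2)
# a data frame: equal-length columns that may hold different types
df_demo <- data.frame(id = 1:3, group = c("x", "y", "z"))

class(v)        # "numeric"
class(m)        # "matrix" "array" in R 4.0 and later
class(df_demo)  # "data.frame"
dim(m)          # 2 rows, 3 columns
```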
Let’s look at a simple example of creating vectors, matrices, and data frames. First, let’s create a vector called v1 composed of five numbers using the c() function. The name c() stands for concatenate: it tells R to glue values or objects together into a new object.
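The code for v1 isn't shown. This sketch assumes the five numbers are 1 through 5, which is a guess made only for illustration.

```r
# create v1 by concatenating five numbers (values assumed)
v1 <- c(1, 2, 3, 4, 5)
v1
#> [1] 1 2 3 4 5
length(v1)
#> [1] 5
```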
Now, let’s create a second vector called v2, also with five numbers, and combine it with v1 to create a new vector, v3.
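Here is a sketch of this step. The values in v1 and v2 are assumptions, but combining two five-element vectors gives the ten values that v3 needs.

```r
v1 <- c(1, 2, 3, 4, 5)     # values assumed
v2 <- c(6, 7, 8, 9, 10)    # values assumed
# combine v1 and v2 end to end into v3
v3 <- c(v1, v2)
v3
length(v3)
#> [1] 10
```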
*Note the difference between combining two vectors with c(), which joins them end to end, and adding them together with +, which adds them element by element, as we will do when we create v4.
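This sketch contrasts the two operations, using the same assumed values for v1 and v2. With c(), you get all ten values. With +, you get five sums, one for each pair of matching elements.

```r
v1 <- c(1, 2, 3, 4, 5)     # values assumed
v2 <- c(6, 7, 8, 9, 10)    # values assumed
# combining: ten values, v1 followed by v2
v3 <- c(v1, v2)
# adding: five values, each the sum of matching elements
v4 <- v1 + v2
v4
#> [1]  7  9 11 13 15
length(v3)
#> [1] 10
length(v4)
#> [1] 5
```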
We can turn the vector v3 into a data frame by using the as.data.frame() function and create a new object that we will call df1.
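Here is a sketch of the conversion, still assuming the same values for v1 and v2. When as.data.frame() is given a plain vector, it uses the vector’s name (here v3) as the column name.

```r
v1 <- c(1, 2, 3, 4, 5)     # values assumed
v2 <- c(6, 7, 8, 9, 10)    # values assumed
v3 <- c(v1, v2)
# convert the vector into a one-column data frame
df1 <- as.data.frame(v3)
df1
```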
Now we see that R has converted v3 into a data frame with 10 observations (rows) and their corresponding values. In this data frame, we can consider v3 to be a variable that has 10 observations.
While we’re here, let’s spend some time using a few commonly used commands to explore the data frame df1.
When familiarizing ourselves with the data, we typically want to start by checking basic properties such as dimensions, row names, column names, the first (and/or last) few rows, and summary statistics. You’ll want to get to know these functions.
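Row and column names don't appear in the command list that follows, so here is a brief sketch using rownames(), colnames(), and names(). The values in v3 are assumed.

```r
v3 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)   # values assumed
df1 <- as.data.frame(v3)
# row names: by default, the row numbers stored as text
rownames(df1)
# column names: here, the single variable v3
colnames(df1)
#> [1] "v3"
# for a data frame, names() also returns the column names
names(df1)
#> [1] "v3"
```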
# for finding dataframe dimensions
# check the first and last rows; the default is 6 rows
# to see a different number of rows than the default, add a comma followed by the number of rows you wish to see
# to list all objects in the global environment
# to remove an object, in this case, we'll remove v4
# it looks like it's gone, but let's double check
# View opens the entire data set in spreadsheet format in a new tab. Note the capital V here (don't ask me why)
# finding summary stats
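The commands for these steps aren't shown. A minimal sketch using standard base R and utils functions might look like this, with v1 and v2 set to assumed values. View() is wrapped in an interactive() check because it opens a viewer window, which isn't available when a script runs non-interactively.

```r
v1 <- c(1, 2, 3, 4, 5)     # values assumed
v2 <- c(6, 7, 8, 9, 10)    # values assumed
v3 <- c(v1, v2)
v4 <- v1 + v2
df1 <- as.data.frame(v3)

# dimensions: number of rows, then number of columns
dim(df1)
#> [1] 10  1
# first and last 6 rows (the default)
head(df1)
tail(df1)
# a different number of rows: comma, then the count
head(df1, 3)
# list all objects in the global environment
ls()
# remove v4, then double-check that it is gone
rm(v4)
exists("v4")
#> [1] FALSE
# spreadsheet-style viewer (capital V), only in an interactive session
if (interactive()) View(df1)
# summary statistics for each column
summary(df1)
```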