In this class, we’ll familiarize ourselves with the R environment and basic functionality of R. My assumption is that you have successfully downloaded R and R Studio. In this lesson, we’ll cover:
Let’s start by opening R Studio. When we first open the program, you’ll see three “windows” or “panes” (Figure 1). Each one serves a different purpose.
Figure 1: Looking at R
The console (2.) is like the R engine. Here, we can enter commands (code) and run them. In a moment, when we start working in R scripts, you’ll see how the code from the script (which appears in the source pane (1.)) is reproduced and run in the console (2.). This is also where we can see whether the code we’ve tried to run has completed successfully OR whether there are errors (and associated error messages). In other words, the source pane (1.) shows our R script or R Markdown document, while the console (2.) shows any code that we have executed.
The global environment (3.) is where saved “objects” are displayed. As you can see, when we first open R Studio, nothing is here. This will change soon.
Finally, the plots window (4.) is where the visualizations we make will be displayed. As you can see, there are other tabs as well. You can always switch between tabs depending on what you need.
Take a minute to start familiarizing yourself with the environment. Click on the different tabs, look through the drop down menus, adjust the size of the panes to suit your preferences.
It’s important to note that the setup you see here, which I’ll be using throughout the course, is the R Studio default. If you’re feeling ambitious, you can rearrange the order of the panes, turn on “dark mode”, and make a host of other customizations. If you do this, you’ll just need to pay a bit of extra attention to what’s going on in the lessons (e.g. a saved data frame won’t appear in the same place as it would otherwise).
Going forward, I’ll often say “R” as shorthand for “R Studio”. To be clear, we’ll ALWAYS be working in R Studio.
One of the benefits of working in R is that we can work in an R Script, which allows us to keep track of the code we’ve written and share it with others. If you’ve ever worked in Excel, you may have encountered an issue where you can’t remember exactly what you did to produce a plot because there’s no way of looking back at which tabs you navigated to.
Technically, you could execute all of your code in the console and look at exactly what you did, but you wouldn’t be able to save it upon exiting RStudio. Instead, it is better to write your code in an “R Script”. In fact, the document you are viewing right now on your browser was originally a type of R Script that I then “published” online!
For simplicity’s sake, we will work with two types of R files: regular R scripts and R Markdown files. If you navigate to File -> New File -> R Script this will open an R script. If you navigate to File -> New File -> R Markdown this will open an R Markdown document.
An R Script is the basic way to save your code in a file format so that it can be re-run or shared with others.
When you are writing code in an R Script, you can just type your code and run it. If you want to have some “text” or “notes” about your code, you use the # symbol ahead of the typing to do so. This is called annotating your code - I discuss this in more detail below.
An R Markdown document is similar to an R Script, except that R Markdown allows you to save code and formatted text alongside each other in the same document. Think of an R Markdown document as a Word document with the added ability to write and execute code.
Figure 2: Code Chunks
Figure 3: Add a Code Chunk when working in R Markdown (not needed for R Scripts)
If you want to return to your code, you can save an R Script and an R Markdown by navigating to File -> Save As… - similar to how you would save a Word document, for instance.
WARNING: Saving the R Script or R Markdown file does not save the “global environment” (all of the objects in our environment - window 3. in Figure 1). In order to save all of the objects we’ve created while working in our R script, we will need to save the global environment separately. I will discuss this in more detail below.
Let’s go ahead and try to open a new RScript and save it. It is best practice to create a new folder for each new lecture or assignment for this course. This will make more sense once we discuss the working directory.
When working in R, it is best practice to set your Working Directory (abbreviated as wd). The working directory is the folder R looks in by default when it tries to open or save a file, such as a dataset that you have saved on your computer. Every time you start a new assignment or project that will involve R, it is best practice to create a new folder on your computer. For instance, you may have a folder called “POL3325G” for this class, and then a folder for “Class 2” where you would save today’s lecture notes and R scripts.
There are two ways to set the working directory (both do the same thing).
The first way is to navigate to the toolbar tab called Session -> Set Working Directory -> Choose Directory… This will allow you to navigate to the folder that you’d like to set as your working directory for that particular R session. You can change it at any time if needed, but typically you would set your working directory once and then store all relevant files in this directory.
The second way to set the working directory is by using the setwd() function.
# To setwd on a Mac:
setwd("/Users/shanayavanhooren/Documents/Teaching/POL3325G/Lectures/Lecture 2")
# notice how the "pathway" in the wd is in quotation marks
# If you are working on a Mac, you can find the pathway to a folder by opening the
# "Finder" application on your computer, navigating to the folder,
# right click the relevant folder and click "Get Info" and
# then copy the pathway (located beside "where") and paste it into the setwd with "" around it.
# To setwd on a PC:
setwd("C:/Users/Shanaya/Dropbox/School/POL3325G/Lecture 2")
(You’ll notice that when you set the working directory using the first method, a line of code is executed in your console. When we set the working directory using a line of code, we skip the extra step of navigating through the menu and just write the code ourselves. Again, this shows that these two different ways of setting the working directory do the same thing!)
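If you’re ever unsure which folder R is currently using, getwd() reports the working directory. The short sketch below is an illustration rather than part of the lesson’s own script: it uses R’s temporary folder (tempdir()) as a stand-in for a course folder so that it runs on any computer, and the object name old_wd is just a label chosen for this example.

```r
# Check the current working directory and remember it
old_wd <- getwd()
old_wd

# Switch to another folder; tempdir() is only a stand-in for your own course folder
setwd(tempdir())
getwd()   # now reports the temporary folder

# Switch back to where we started
setwd(old_wd)
getwd()
```

In your own scripts you would replace tempdir() with the pathway to your lecture folder, in quotation marks.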
rm(list = ls()) # remove ALL objects from the global environment
#rm(nameofobject) # remove an object called nameofobject from the global environment
# below we will discuss objects
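As a small sketch of how rm() behaves, the example below creates two throwaway objects, lists what is in the global environment with ls(), and then removes just one of them. The names temp_a and temp_b are made up for this illustration.

```r
# Create two throwaway objects (the names are illustrative)
temp_a <- 1
temp_b <- 2

ls()               # lists the names of the objects in the global environment

rm(temp_a)         # remove only temp_a
exists("temp_a")   # FALSE: temp_a is gone
exists("temp_b")   # TRUE: temp_b is still there
```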
Objects are a way to store data in a specific structure. R is an object-oriented system, meaning that we save the output of our code as “objects”, which we can then see in the Global Environment (the top right panel in R Studio).
Let’s look at some examples to get a better handle on what an object is.
First, in order to save something to an object, we use the <- operator. Think of this left-pointing arrow as “save as”. Let’s start by creating an object which we will call x. Inside the object x, we store the number 5.
We type the following code into our R script, and then we either highlight the whole line and hit Ctrl + Enter (Cmd + Return on a Mac) or manually click “Run” at the top of the script.
x <- 5
If we now look to our global environment (the top right window), we see an object, named “x”, that has a value of 5.
Now let’s create an object called “hazel” and assign it the word “dog”.
hazel <- dog
When we tried to store “dog” in an object called “hazel”, we got an error. Why is that? Without quotation marks, R reads dog as the name of an object and goes looking for it in our environment. If we look at our global environment (top right panel), we can see that no such object exists, which is why the error message tells us that the object was not found.
We need to put “dog” in quotation marks because we want R to know that we want this combination of letters (character string) to be assigned to an object.
hazel <- "dog"
Now you’ll notice that there is an object called hazel in the global environment. If we run hazel, it will return the word dog. We can also use the function print() to look at the object.
hazel #option 1 to look at the object
## [1] "dog"
print(hazel) #option 2 to look at the object
## [1] "dog"
# both return "dog"
That’s the very basics of objects! Let’s make it a bit more complicated. Let’s save a new object with multiple values (or elements). An object with a collection of elements is called a vector. Think of it in terms of lego. A single lego block is an element. We can stack lego blocks together; this is a vector.
Here, we create a new object called “animals”. Stored inside of the object “animals” is a collection of different types of animals.
In order to combine elements together, we use the c() function (technically this stands for concatenate, but it’s easier to just think of it as combine).
animals <- c("dog", "cat", "fish", "rabbit")
fish <- c(55, 43, 60)
cat <- c(7, "small")
Let’s look at the three objects we’ve created. Do you notice anything off about any of the objects?
animals
## [1] "dog" "cat" "fish" "rabbit"
fish
## [1] 55 43 60
cat
## [1] "7" "small"
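One way to investigate the question above is the class() function, which reports what type of data an object holds. The sketch below re-creates the three vectors so that it runs on its own. Notice that the 7 in “cat” is printed in quotation marks: a vector can only hold one type of data, so when numbers and text are combined, R stores everything as text (a character string).

```r
animals <- c("dog", "cat", "fish", "rabbit")
fish <- c(55, 43, 60)
cat <- c(7, "small")

class(animals)   # "character"
class(fish)      # "numeric"
class(cat)       # "character": the number 7 was converted to the text "7"
```

This matters later on, because we can do arithmetic with “fish” but not with “cat”.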
If we want to save a specific object from the global environment, we can do so by using save().
save(fish, file="fish.RData")
The object is saved as an RData file on your computer.
Where do you think you’ll find it? In your working directory, of course! This is another reason why it is so important to set the working directory at the top of your R script or Markdown document.
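To bring a saved object back into the global environment, we can use load(). The sketch below is a minimal round trip under one assumption: it writes the file to R’s temporary folder (tempdir()) instead of the working directory so that it runs anywhere, and the name fish_file is just a label for this example.

```r
fish <- c(55, 43, 60)

# tempdir() is a stand-in here; normally the file would go in your working directory
fish_file <- file.path(tempdir(), "fish.RData")
save(fish, file = fish_file)

rm(fish)          # remove fish from the global environment
load(fish_file)   # bring it back from the saved file
fish
```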
It is best practice to keep the names of objects relatively short and informative. You also cannot use a number at the start of an object name:
object_number_two <- "two" # this is waaaay too long
2_object <- "two" # won't work!
object_2 <- "two" # this works
obj_2 <- "two" # even better
Functions are the workhorse of R.
Functions are commands; they contain (a little or a lot of) code that does something specific, within a pretty wrapper.
As a running example, we’ll use an imaginary function called “MakeSandwich”. This function has one job: make a peanut butter and jam sandwich. The parts of this function are displayed in Figure 4.
Figure 4: The MakeSandwich Function
The function name, “MakeSandwich”, is highlighted in blue. Within the parentheses, we can provide the instructions for how to make the sandwich. Each unique instruction is highlighted in orange; the technical term for a unique instruction is an argument. This function contains four arguments: Food, Bread, PB, Jam. Each argument can receive specific inputs, which correspond with specific outputs. For example, the PB argument can take an input of either 1 or 2, which corresponds with Smooth or Crunchy peanut butter. The example from Figure 4 would create a sandwich with the following inputs:
The first argument (Food) tells R where to find all of the food, and the remaining arguments let us customize the sandwich.
Let’s start with an easy function: mean(). As you might guess, this calculates the mean. So let’s calculate the mean of an object we created previously, “fish”, which is a vector of three elements (55, 43, and 60).
mean(fish)
## [1] 52.66667
Good! Maybe we want to save the mean of “fish” for later use, so let’s save it as an object called “fishmean”.
fishmean <- mean(fish)
# print results to check
fishmean
## [1] 52.66667
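To see what mean() is doing behind its “pretty wrapper”, we can rebuild the calculation from two simpler functions: sum(), which adds up the elements, and length(), which counts them. This is only an illustration of the arithmetic, not how you would normally calculate a mean.

```r
fish <- c(55, 43, 60)

sum(fish)                  # total of the three elements: 158
length(fish)               # number of elements: 3
sum(fish) / length(fish)   # the total divided by the count
mean(fish)                 # the same calculation in a single step
```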
As discussed above, functions are controlled using arguments. Let’s use a different function to illustrate: round(). The round() function takes two arguments: the number we want to round, and how many digits we want to round to. Let’s round the number 4.232 to 2 digits.
round(x=4.232, digits = 2)
## [1] 4.23
R functions can be pretty smart; for many functions, they can interpret which instructions belong with which argument by the order you specify them. For example:
# this works
round(4.232, 2)
## [1] 4.23
# this runs without an error, but it rounds the number 2 instead of 4.232
round(2, 4.232)
## [1] 2
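Order only matters when we leave the argument names out. As a quick sketch, if we name each argument, R matches the instructions by name, so we can write them in any order.

```r
# Named arguments can be given in any order
round(digits = 2, x = 4.232)

# Unnamed arguments are matched by position: the number first, the digits second
round(4.232, 2)

# To check a function's argument names and their order, open its help page
# ?round
```

Both calls round 4.232 to two digits. When in doubt, naming the arguments is the safer habit.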
R functions also often have default instructions. In other words, if you don’t provide explicit instructions for one of the function’s arguments, it proceeds with default instructions. Thinking back to our imaginary “MakeSandwich” function, it might be the case that we set the default for each argument to be equal to 1. In other words, if we don’t tell R what type of jam we want, it assumes we want raspberry (1). We can see that with the round() function, since the default value for the “digits” argument is 0.
round(4.232)
## [1] 4
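To tie arguments and defaults together, here is a hypothetical sketch of what the imaginary “MakeSandwich” function could look like if we wrote it ourselves. It is not a real R function. The smooth/crunchy and raspberry options come from the description above; the bread types and the second jam flavour are invented purely for illustration, and Food is treated as a simple label for where the food is kept.

```r
# Hypothetical sketch of the imaginary MakeSandwich function (not a real R function).
# Every argument except Food has a default value of 1.
MakeSandwich <- function(Food, Bread = 1, PB = 1, Jam = 1) {
  bread_types <- c("white", "whole wheat")    # assumed options, for illustration only
  pb_types <- c("smooth", "crunchy")          # from the lesson: 1 = smooth, 2 = crunchy
  jam_types <- c("raspberry", "strawberry")   # 1 = raspberry (lesson); 2 is assumed
  paste0("From ", Food, ": ", bread_types[Bread], " bread, ",
         pb_types[PB], " peanut butter, ", jam_types[Jam], " jam")
}

# Only Food is supplied, so the defaults are used for everything else
MakeSandwich(Food = "kitchen")

# Override one default: ask for crunchy peanut butter
MakeSandwich(Food = "kitchen", PB = 2)
```

The defaults are written directly into the first line of the function (Bread = 1, PB = 1, Jam = 1), which is the same way round() sets digits to 0.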
In the bottom right corner window, you will notice a tab called “Packages”. We can think of a package as a container for functions and/or data. Packages help extend the usability of R. There are thousands of packages created by individuals in the R community, such as the R Core team, researchers, data scientists etc. Typically, a package is composed of a collection of functions and/or data that are geared towards a particular type of analysis or set of tasks. Sometimes, however, researchers create a package to simply house all of the functions that they’ve written for various tasks in R. Some packages are much broader and expansive than others.
In order to use the functions inside of a package, we need to install the package, in most cases. In some cases, a package comes “pre-installed” with R. These packages are a part of “Base R”. However, for those packages that are not a part of Base R (which is most packages), we need to install the package first.
Let’s look at an example of a package that comes pre-installed.
The parallel package is composed of a focused set of functions that help researchers conduct parallel computing, which is basically a way to reduce the amount of time it takes for R to complete complicated tasks (see Figure 5). This package has a clearly defined purpose. An easy way to learn about packages is to google them.
Figure 5: Parallel Package Information
Best practices: It is best practice to install and/or load packages at the top of your RScript or RMarkdown. This just helps keep everything organized. Even if you start working on something and realize partway through coding that you require another package, we typically return to the top of the script and make space to install/load the package.
library(parallel) # here, we load the package.
# Remember: you will need to re-load your packages that you'd like to use each time you re-open R Studio.
Some packages are a part of what we call “Base R”, meaning they come essentially pre-installed. To use them, we only need to load the package. This is the case for the parallel package: if we were to try to install the package, we would receive the following warning message: Warning in install.packages : package ‘parallel’ is a base package, and should not be updated.
Most packages that you’ll be loading and working with are NOT part of Base R. One such package is the tidyverse package, which is actually a package that is a collection of packages (confusing right?). All this means is that when we load tidyverse, we are loading the core packages contained inside of tidyverse, such as dplyr, tidyr, ggplot2 etc. Don’t worry too much about this. All you need to remember is that tidyverse is a useful package for Data Science as it contains functions that help us wrangle and plot data, which is the focus of our course.
In order to use the functions inside of the tidyverse package, we need to install the package. Once the package has been installed, we can load the package at the start of each R session. In other words, you will typically “install” a package once on your computer, and after that, you will simply load it each time you re-open RStudio.
install.packages("tidyverse") # here, we install the package.
library(tidyverse) # here, we load the package. You will need to re-load your packages that you'd like to use each time you re-open R Studio.
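Because a package only needs to be installed once, it can be handy to check whether it is already installed before installing it again. The sketch below is one possible approach, not a required step: needs_install is a hypothetical helper written for this example, built on requireNamespace(), which checks whether a package is available without attaching it. The install line is left commented out because it requires an internet connection.

```r
# Hypothetical helper: TRUE if a package still needs to be installed
needs_install <- function(pkg) {
  !requireNamespace(pkg, quietly = TRUE)
}

needs_install("parallel")   # FALSE: parallel ships with R

# Install tidyverse only if it is missing (uncomment to run)
# if (needs_install("tidyverse")) install.packages("tidyverse")
```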
It is best practice to annotate your code in R. Annotating code refers to writing comments as you code to better help yourself and others understand what you’re doing.
We can use the # symbol directly embedded in our code in order to tell R that the text that follows the # is not meant to be executed (or run). In other words, we’re telling R that anything we write that follows # is our own notes and it is not a command for the computer.
# Below, I use the # symbol to annotate my code and remind myself what each line of code does.
# Create an object called check
check <- "dog" # save the word "dog" as an object called check
print(check) # print check to make sure that it shows "dog"
## [1] "dog"
In order to be able to re-open an R Script or R Markdown document and pick up exactly where we left off, we need to save both the R Script/R Markdown AS WELL AS the global environment. Remember, the global environment is where saved “objects” are displayed. If we want to be able to return to a script while keeping all of the objects we’ve saved, we need to remember to save the global environment.
First, make sure that your working directory is set to the folder where your script is saved.
Second, use save.image(). The save.image() function will save all of the objects in your global environment. You will need to provide a name for the file using the file = argument, specifying the file name in quotation marks. Be sure that the file name has .RData at the end of it, which identifies the file format. Without it, your computer and R Studio may not recognize the file as a saved environment, which makes it harder to find and re-open later.
save.image(file = "global-environment-Lecture2.RData") # REMEMBER: put .RData at the end of the file name
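When we come back to the script later, load() restores everything that save.image() stored. The sketch below is a minimal round trip under a couple of assumptions: it saves to R’s temporary folder (tempdir()) rather than the working directory so that it runs anywhere, and lecture_note and env_file are made-up names for this illustration.

```r
# An example object so there is something to save
lecture_note <- "saved during Lecture 2"

# tempdir() is a stand-in; normally this would be a file in your working directory
env_file <- file.path(tempdir(), "global-environment-Lecture2.RData")
save.image(file = env_file)

rm(lecture_note)   # simulate starting a fresh session
load(env_file)     # restore every object that was saved
lecture_note
```

In your own work, you would run the load() line at the start of a new session, after setting the working directory.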
Note: another option is to use “Projects” to save your scripts and the associated global environment. For the purpose of this course, we will not be using projects.
Generally, you would follow these steps when you start working on a new script in R:
Open R Studio on your computer.
Open a new R script or R Markdown document where you will write your code (e.g. File -> New File -> R Markdown).
Set your working directory to a new folder for this particular project or lecture. Ensure all of the data you’re working with for this particular script is also saved in this same folder (e.g. Session -> Set Working Directory -> Choose Directory… ).
Load the packages that house the functions that you’ll be using in the script. (Remember, it’s okay if you forget to load a package or two, you can return to the top of the script later on to load them.)
Do your work! Write your code, execute it, comment your code etc.
When you’re finished working on your script, save it (e.g. File -> Save As…). (You might also consider saving your script as you go so that if R crashes or gets closed accidentally, you’ll have the last saved version of the script.)
Before closing RStudio, save the global environment so that when you re-open the script, you can also re-open the global environment and have all of the previously saved objects, dataframes etc. as they appeared the last time you were working on this script/markdown document.