Introduction

In this class, we’ll familiarize ourselves with the R environment and basic functionality of R. My assumption is that you have successfully downloaded R and R Studio. In this lesson, we’ll cover:

  • The different R Studio windows
  • How to open and save an R script
  • How to set your working directory
  • How to annotate your script with comments
  • Objects
  • Functions
  • Packages
  • The global environment and how to save it

Looking at RStudio for the first time

Let’s start by opening R Studio. When we first open the program, you’ll see three “windows” or “panes”; a fourth, the source pane, appears once you open a script (Figure 1). Each one serves a different purpose.

Figure 1: Looking at R

The console (2.) is like the R engine. Here, we can enter commands (code) and run them. In a moment, when we start working in R scripts, you’ll see how the code from the script (which appears in the source pane (1.)) is reproduced and run in the console (2.). This is also where we can see whether the code we’ve tried to run has completed successfully OR whether there are errors (and associated error messages). In other words, the source pane (1.) shows our R script or R Markdown document, while the console (2.) shows any code that we have executed.

The global environment (3.) is where saved “objects” are displayed. As you can see, when we first open R Studio, nothing is here. This will change soon.

Finally, the plots window (4.) is where the visualizations we make will be displayed. As you can see, there are other tabs as well. You can always switch between tabs depending on what you need.

Take a minute to start familiarizing yourself with the environment. Click on the different tabs, look through the drop down menus, adjust the size of the panes to suit your preferences.

It’s important to note that the setup you see here, which I’ll be using throughout the course, is the R Studio default. If you’re feeling ambitious, you can rearrange the order of the panes, turn on “dark mode”, and make a host of other customizations. If you do this, you’ll just need to pay a bit of extra attention to what’s going on in the lessons (e.g. a saved data frame won’t appear in the same place as it would otherwise).

Going forward, I’ll often say “R” as shorthand for “R Studio”. To be clear, we’ll ALWAYS be working in R Studio.

R Scripts and R Markdown documents

One of the benefits of working in R is that we can work in an R Script, which allows us to keep track of the code we’ve written and share it with others. If you’ve ever worked in Excel, you may have encountered an issue where you can’t remember exactly what you did to produce a plot because there’s no way of looking back at which tabs you navigated to.

Technically, you could execute all of your code in the console and look at exactly what you did, but you wouldn’t be able to save it upon exiting RStudio. Instead, it is better to write your code in an “R Script”. In fact, the document you are viewing right now on your browser was originally a type of R Script that I then “published” online!

Opening R scripts or R markdown files

For simplicity’s sake, we will work with two types of R files: regular R scripts and R Markdown files. If you navigate to File -> New File -> R Script, this will open an R script. If you navigate to File -> New File -> R Markdown, this will open an R Markdown document.

What is the difference between an R Script and an R Markdown file?

  • An R Script is the basic way to save your code in a file format so that it can be re-run or shared with others.

  • When you are writing code in an R Script, you can just type your code and run it. If you want to have some “text” or “notes” about your code, you use the # symbol ahead of the typing to do so. This is called annotating your code - I discuss this in more detail below.

  • An R Markdown document is similar to an R Script except R Markdown allows you to save code and format text alongside your code in the same document. Think of an R Markdown as a word document but with the added ability to write and execute code.

    • You could write a paper in R Markdown like you would in Word, but you can also show the code that you ran to create plots etc. In order to write code that can be executed in an R Markdown document, you must include it in a code chunk. In Figure 2, the code chunks are outlined in red. To add a code chunk, navigate to the green c+ in the top right of the source pane (outlined in green in Figure 2). A drop-down menu will appear, and you then click “R” to insert a code chunk. This is shown in Figure 3.
    • R Markdown will be especially helpful for writing reports or papers, such as the Written Data Assignments that you will complete for this class. You can also use R Markdown to publish your code online.

Figure 2: Code Chunks

Figure 3: Add a Code Chunk when working in R Markdown (not needed for R Scripts)

Saving an R Script or R Markdown file

If you want to return to your code, you can save an R Script and an R Markdown by navigating to File -> Save As… - similar to how you would save a Word document, for instance.

WARNING: Saving the R Script or R Markdown file does not save the “global environment” (all of the objects in our environment - window 3. in Figure 1). In order to save all of the objects we’ve created while working in our R script, we will need to save the global environment separately. I will discuss this in more detail below.

Let’s go ahead and try to open a new R Script and save it. It is best practice to create a new folder for each new lecture or assignment for this course. This will make more sense once we discuss the working directory.

The working directory

When working in R, it is best practice to set your Working Directory (denoted by the abbreviation wd). The working directory is the folder that R defaults to when trying to open or save a file, such as a dataset that you have saved on your computer. Every time you start a new assignment or project that will involve R, it is best practice to create a new folder on your computer. For instance, you may have a folder called “POL3325G” for this class, and then a folder for “Class 2” where you would save today’s lecture notes and R scripts.

How do you set the working directory?

There are two ways to set the working directory (both do the same thing).

  1. The first way is to navigate to the toolbar tab called Session -> Set Working Directory -> Choose Directory… This will allow you to navigate to the folder that you’d like to set as your working directory for that particular R session. You can change it at any time if needed, but typically you would set your working directory once and then store all relevant files in this directory.

  2. The second way to set the working directory is by using the setwd() function.

# To setwd on a Mac: 
setwd("/Users/shanayavanhooren/Documents/Teaching/POL3325G/Lectures/Lecture 2")  

# notice how the "pathway" in the wd is in quotation marks 
# If you are working on a Mac, you can find the pathway to a folder by opening the 
# "Finder" application on your computer, navigating to the folder, 
# right click the relevant folder and click "Get Info" and 
# then copy the pathway (located beside "where") and paste it into the setwd with "" around it.


# To setwd on a PC: 
setwd("C:/Users/Shanaya/Dropbox/School/POL3325G/Lecture 2")

(You’ll notice that when you set the working directory using the first method, a line of code is executed in your console. When we set the working directory using a line of code, we skip the extra step of navigating through the menu and just write the code ourselves. Again, this shows that these two different ways of setting the working directory do the same thing!)
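
To check which folder R is currently using, base R provides getwd() (“get working directory”). The lines below are a minimal sketch rather than part of the lecture script: they use R’s temporary folder (tempdir()) as a stand-in for your own course folder, and the object name old_wd is just an illustrative choice.

```r
# Remember the current working directory so we can switch back afterwards
old_wd <- getwd()

# tempdir() is only a stand-in here; you would use the path to your own folder
setwd(tempdir())

# Confirm that the working directory has changed
getwd()

# List the files R can see in the current working directory
list.files()

# Switch back to the original working directory
setwd(old_wd)
```

Running getwd() right after setwd() is a quick way to confirm that R is looking in the folder you expect.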

Removing objects from the global environment

rm(list = ls()) # remove ALL objects from the global environment
#rm(nameofobject) # remove an object called nameofobject from the global environment
# below we will discuss objects 
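
As a small illustration of what rm() does, the sketch below creates two throwaway objects, removes one of them, and checks the result with ls() and exists(). The names temp_a and temp_b are hypothetical and only used here; for now, you can read them simply as stored values.

```r
# Create two throwaway objects (hypothetical names, for illustration only)
temp_a <- 1
temp_b <- 2

# ls() lists the names of the objects that currently exist
ls()

# Remove only temp_a
rm(temp_a)

# exists() checks whether an object with that name can still be found
exists("temp_a")  # FALSE
exists("temp_b")  # TRUE
```

Removing a single object by name is usually safer than rm(list = ls()), which clears everything at once.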

What is an object?

Objects are a way to store data in a specific structure. R is an object-oriented system, meaning that we save the output of our code as “objects”, which we can then see in the Global Environment (the top right panel in R Studio).

Let’s look at some examples to get a better handle on what an object is.

First, in order to save something to an object, we use the <- operator. Think of this left-pointing arrow as “save as”. Let’s start by creating an object which we will call x. Inside the object x, we store the number 5.

We type the following code into our R script, and then we either highlight the whole line and hit Ctrl + Enter (Cmd + Return on a Mac) or manually click “Run” at the top of the script.

x <- 5

If we now look to our global environment (the top right window), we see an object, named “x”, that has a value of 5.
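
Once a value is stored, we can use the object’s name anywhere we would otherwise type the value. The following lines are a small sketch of that idea; the object y is a hypothetical name added only for this example.

```r
x <- 5

# Use the stored value in a calculation
x + 2        # 7

# Store the result of a calculation in a new object
y <- x * 10
y            # 50

# Assigning to an existing name overwrites the old value
x <- 8
x            # 8
```

Note that changing x afterwards does not change y; y keeps the value it was given when it was created.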

Now let’s create an object called “hazel” and assign it the word “dog”.

hazel <- dog 

When we tried to store “dog” in an object called “hazel”, we got an error (Error: object 'dog' not found). Why is that? Without quotation marks, R treats dog as the name of an existing object and goes looking for it. If we look at our global environment (top right panel), we can see that no such object exists.

We need to put “dog” in quotation marks because we want R to know that we want this combination of letters (character string) to be assigned to an object.

hazel <- "dog" 

Now you’ll notice that there is an object called hazel in the global environment. If we run hazel, it will return the word dog. We can also use the function print() to look at the object.

hazel #option 1 to look at the object 
## [1] "dog"
print(hazel) #option 2 to look at the object 
## [1] "dog"
# both return "dog" 

Those are the very basics of objects! Let’s make it a bit more complicated. Let’s save a new object with multiple values (or elements). An object with a collection of elements is called a vector. Think of it in terms of Lego. A single Lego block is an element. We can stack Lego blocks together; this is a vector.

Here, we create a new object called “animals”. Stored inside of the object “animals” is a collection of different types of animals.

In order to combine elements together, we use the c() function (technically this stands for concatenate, but it’s easier to just think of it as combine).

animals <- c("dog", "cat", "fish", "rabbit")

fish <- c(55, 43, 60) 

cat <- c(7, "small")

Let’s look at the three objects we’ve created. Do you notice anything off about any of the objects?

animals
## [1] "dog"    "cat"    "fish"   "rabbit"
fish
## [1] 55 43 60
cat
## [1] "7"     "small"
  • A vector can only hold elements of the same type.
    • Fish combines three numbers. Animals combines four words. Cat, however, tries to combine a word AND a number. To compensate for our mistake, R converts the number 7 into a word. We can see this in the printed results by looking at the use of quotation marks. There are no quotation marks around the numbers 55, 43, and 60 in the fish output. By comparison, there are quotation marks around the number 7 in the cat output. This matters when we try to do other things with that element/vector. For example, you can take the mean (average) of a vector of numbers; you can’t take the mean (average) of a vector of words.
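
One way to check what type of elements a vector holds is the base R function class(). The sketch below recreates the fish and cat vectors so that it runs on its own. The square-bracket indexing (cat[1], meaning “the first element of cat”) and the object name cat_number are additions for illustration only.

```r
fish <- c(55, 43, 60)
cat <- c(7, "small")

# class() reports the type of the elements stored in each vector
class(fish)  # "numeric"
class(cat)   # "character"

# mean() works on the numeric vector
mean(fish)

# The first element of cat is the text "7", not the number 7.
# as.numeric() converts it back into a number we can calculate with.
cat_number <- as.numeric(cat[1])
cat_number + 1  # 8
```

If you see quotation marks around something you expected to be a number, class() is a quick way to confirm that R has stored it as text.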

Saving objects so they can be re-opened at a later date

If we want to save a specific object from the global environment, we can do so by using the save() function.

save(fish, file="fish.RData")

The object is saved as an RData file on your computer.

Where do you think you’ll find it? In your working directory, of course! This is another reason why it is so important to set the working directory at the top of your R script or Markdown document.
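
To re-open a saved object later, base R provides the load() function. The sketch below shows the full round trip: save, remove, and load again. To keep the example from cluttering your course folder, it saves into R’s temporary folder via tempdir() and file.path(); that location and the name fish_file are assumptions made for this example, and in your own work you would normally just use "fish.RData" in your working directory.

```r
fish <- c(55, 43, 60)

# Build a file path inside R's temporary folder (a stand-in for your working directory)
fish_file <- file.path(tempdir(), "fish.RData")

# Save the object to the file
save(fish, file = fish_file)

# Remove the object, as if we had closed and re-opened R Studio
rm(fish)

# Load the file; the object comes back under its original name
load(fish_file)
fish
```

Notice that load() restores the object under the name it had when it was saved, so there is no need to use <- here.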

Object naming best practices

It is best practice to keep the names of objects relatively short and informative. You also cannot use a number at the start of an object name.

object_number_two <- "two" # this is waaaay too long 
# 2_object <- "two" # won't work! (remove the first # to see the error)
object_2 <- "two" # this works
obj_2 <- "two" # even better

What is a function?

Functions are the workhorse of R.

Functions are commands; they contain (a little or a lot of) code that does something specific, within a pretty wrapper.

As a running example, we’ll use an imaginary function called “MakeSandwich”. This function has one job: make a peanut butter and jam sandwich. The parts of this function are displayed in Figure 4.

Figure 4: The MakeSandwich Function

The function name, “MakeSandwich”, is highlighted in blue. Within the parentheses, we can provide the instructions for how to make the sandwich. Each unique instruction is highlighted in orange; the technical term for a unique instruction is an argument. This function contains four arguments: Food, Bread, PB, Jam. Each argument can receive specific inputs, which correspond with specific outputs. For example, the PB argument can take an input of either 1 or 2, which corresponds with Smooth or Crunchy peanut butter. The example from Figure 4 would create a sandwich with the following inputs:

  • Food from the kitchen
  • Whole wheat bread
  • Crunchy peanut butter
  • Raspberry jam

The first argument (Food) tells R where to find all of the food, and the remaining arguments let us customize the sandwich.

Let’s start with an easy function: mean(). As you might guess, this calculates the mean. So let’s calculate the mean of an object we created previously, “fish”, which is a vector of three elements (55, 43, and 60).

mean(fish)
## [1] 52.66667

Good! Maybe we want to save the mean of “fish” for later use, so let’s save it as an object called “fishmean”.

fishmean <- mean(fish)

# print results to check
fishmean
## [1] 52.66667

As discussed above, functions are controlled using arguments. Let’s use a different function to illustrate: round(). The round() function takes two arguments: the number we want to round, and the number of decimal places (digits) we want to round to. Let’s round the number 4.232 to 2 digits.

round(x=4.232, digits = 2)
## [1] 4.23

R functions can be pretty smart; many of them can work out which input belongs with which argument from the order in which you supply them. This only works if you keep to the order the function expects. For example:

# this works
round(4.232, 2)
## [1] 4.23
# this runs without an error, but gives the wrong answer: R rounds the number 2, not 4.232
round(2, 4.232)
## [1] 2
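
If you name the arguments explicitly, the order no longer matters. The lines below are a short sketch of this using round(); the object names by_position and by_name are hypothetical and only there so the two results can be compared.

```r
# Arguments matched by position: the order matters
by_position <- round(4.232, 2)

# Arguments matched by name: the order does not matter
by_name <- round(digits = 2, x = 4.232)

by_position
by_name

# args() shows the arguments a function accepts and any default values
args(round)
```

When in doubt, naming the arguments makes your code easier to read and protects against mixing up the order.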

R functions also often have default instructions. In other words, if you don’t provide explicit instructions for one of the function’s arguments, it proceeds with default instructions. Thinking back to our imaginary “MakeSandwich” function, it might be the case that we set the default for each argument to be equal to 1. In other words, if we don’t tell R what type of jam we want, it assumes we want raspberry (1). We can see that with the round() function, since the default value for the “digits” argument is 0.

round(4.232)
## [1] 4
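
To make arguments and defaults more concrete, here is one way the imaginary MakeSandwich function could be written. This is only a sketch under stated assumptions: the text specifies that PB takes 1 (smooth) or 2 (crunchy) and that a Jam value of 1 means raspberry, but the remaining option codes (the bread types and the second jam) and the wording of the output are invented for illustration.

```r
# A hypothetical version of the imaginary MakeSandwich function.
# Every argument except Food has a default value of 1.
MakeSandwich <- function(Food, Bread = 1, PB = 1, Jam = 1) {
  # Option codes: position 1 is the default choice (partly invented, see above)
  bread_types <- c("white", "whole wheat")
  pb_types    <- c("smooth", "crunchy")
  jam_types   <- c("raspberry", "strawberry")

  # Build a description of the sandwich from the chosen options
  paste0("From the ", Food, ": ", bread_types[Bread], " bread, ",
         pb_types[PB], " peanut butter, ", jam_types[Jam], " jam")
}

# Only Food is supplied, so every other argument uses its default of 1
default_sandwich <- MakeSandwich(Food = "kitchen")
default_sandwich

# Override two of the defaults; Jam is not supplied, so it stays raspberry
custom_sandwich <- MakeSandwich(Food = "kitchen", Bread = 2, PB = 2)
custom_sandwich
```

The defaults are set in the first line of the function definition (Bread = 1, PB = 1, Jam = 1), which is the same mechanism that gives round() its default of digits = 0.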

What is a package?

In the bottom right corner window, you will notice a tab called “Packages”. We can think of a package as a container for functions and/or data. Packages help extend the usability of R. There are thousands of packages created by individuals in the R community, such as the R Core team, researchers, data scientists etc. Typically, a package is composed of a collection of functions and/or data that are geared towards a particular type of analysis or set of tasks. Sometimes, however, researchers create a package to simply house all of the functions that they’ve written for various tasks in R. Some packages are much broader and expansive than others.

In most cases, we need to install a package before we can use the functions inside of it. In some cases, a package comes “pre-installed” with R itself. These packages are a part of “Base R”. However, for those packages that are not a part of Base R (which is most packages), we need to install the package first.

Let’s look at an example of a package that comes pre-installed with R.

Example of a package: Parallel

The parallel package is a focused collection of functions that help researchers conduct parallel computing, which is basically a way to reduce the amount of time it takes for R to complete complicated tasks (see Figure 5). This package has a clearly defined purpose. An easy way to learn about packages is to google them.

Figure 5: Parallel Package Information

Best practices: It is best practice to install and/or load packages at the top of your RScript or RMarkdown. This just helps keep everything organized. Even if you start working on something and realize partway through coding that you require another package, we typically return to the top of the script and make space to install/load the package.

library(parallel) # here, we load the package. 
# Remember: you will need to re-load your packages that you'd like to use each time you re-open R Studio.

Some packages are a part of what we call “Base R”, meaning they come essentially pre-installed with R. To use them, we only need to load the package. This is the case for the parallel package: if we were to try to install it, we would receive the following warning message: Warning in install.packages : package ‘parallel’ is a base package, and should not be updated.
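
Once a package is loaded, its functions can be called just like any other function. As a small sketch, the lines below use detectCores() from the parallel package, which reports how many processor cores R can detect on your computer. The object name n_cores is a hypothetical choice, and the number you see will depend on your machine.

```r
# Load the package (no installation needed for parallel)
library(parallel)

# Ask how many processor cores R can detect on this computer
n_cores <- detectCores()
n_cores

# The package::function() form calls a function from a specific package by name
parallel::detectCores()
```

The package::function() form is handy when you want to make clear which package a function comes from.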

Example of a package: Tidyverse

Most packages that you’ll be loading and working with are NOT part of Base R. One such package is the tidyverse package, which is actually a package that is a collection of packages (confusing right?). All this means is that when we load tidyverse, we are loading all of the packages contained inside of tidyverse, such as dplyr, tidyr, ggplot2 etc. Don’t worry too much about this - all you need to remember is that tidyverse is a useful package for Data Science as it contains functions that help us wrangle and plot data - the focus of our course.

In order to use the functions inside of the tidyverse package, we need to install the package. Once the package has been installed on our computer, we can load the package at the start of each R session. In other words, you will typically “install” a package only once, and after that, you will simply load it each time you re-open RStudio.

install.packages("tidyverse") # here, we install the package (typically only needed once).
library(tidyverse) # here, we load the package. You will need to re-load your packages that you'd like to use each time you re-open R Studio.
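
If you are not sure whether a package is already installed, you can ask R before installing it again. The sketch below uses the base R function requireNamespace(), which returns TRUE when the package is installed and FALSE otherwise. The object name has_tidyverse and the wording of the messages are just illustrative choices.

```r
# requireNamespace() returns TRUE if the package is installed, FALSE otherwise
has_tidyverse <- requireNamespace("tidyverse", quietly = TRUE)

if (has_tidyverse) {
  message("tidyverse is already installed; load it with library(tidyverse).")
} else {
  message("tidyverse is not installed yet; run install.packages(\"tidyverse\") once.")
}
```

This check does not install anything by itself; it only tells you which step you still need to take.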

Annotating your code

It is best practice to annotate your code in R. Annotating code refers to writing comments as you code to better help yourself and others understand what you’re doing.

We can use the # symbol directly embedded in our code in order to tell R that the text that follows the # is not meant to be executed (or run). In other words, we’re telling R that anything we write that follows # is our own notes and it is not a command for the computer.

# Below, I use the # symbol to annotate my code and remind myself what each line of code does. 

# Create an object called check
check <- "dog" # save the word "dog" as an object called check 
print(check) # print check to make sure that it shows "dog" 
## [1] "dog"

Saving the global environment

In order to be able to re-open an R Script or R Markdown document and pick up exactly where we left off, we need to save both the R Script/R Markdown AS WELL AS the global environment. Remember, the global environment is where saved “objects” are displayed. If we want to be able to return to a script while keeping all of the objects we’ve saved, we need to remember to save the global environment.

How do we save our global environment?

First, make sure that your working directory is set to the folder where your script is saved.

Second, use save.image(). The save.image() function will save all of the objects in your global environment. You will need to provide a name for the file using the file = argument and then specifying the file name in quotation marks. Be sure that the file name has .RData at the end of it, which specifies the file format. If you leave it off, your computer and RStudio may not recognize the file as a saved R environment, which makes it harder to re-open.

save.image(file = "global-environment-Lecture2.RData") # REMEMBER: put .RData at the end of the file name
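
To pick up where you left off, the saved file can be re-opened with the base R function load(). The lines below are a minimal sketch: the object scores and the file name are hypothetical, and the file is written to R’s temporary folder (tempdir()) only so that the example does not add files to your course folder.

```r
# A hypothetical object, so that there is something to save
scores <- c(90, 85, 77)

# Build a file path in R's temporary folder (a stand-in for your working directory)
env_file <- file.path(tempdir(), "global-environment-example.RData")

# Save every object in the global environment to the file
save.image(file = env_file)

# Later (for example, after re-opening R Studio), load the file again.
# load() also returns the names of the objects it restored.
restored_names <- load(env_file)
restored_names
```

In your own work, you would usually call load() with just the file name, since the file sits in your working directory.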

Note: another option is to use “Projects” to save your scripts and the associated global environment. For the purpose of this course, we will not be using projects.

Important steps when working in a script or markdown document:

Generally, you would follow these steps when you start working on a new script in R:

  1. Open R Studio on your computer.

  2. Open a new R script or R Markdown document where you will write your code (e.g. File -> New File -> R Markdown).

  3. Set your working directory to a new folder for this particular project or lecture. Ensure all of the data you’re working with for this particular script is also saved in this same folder (e.g. Session -> Set Working Directory -> Choose Directory… ).

  4. Load the packages that house the functions that you’ll be using in the script. (Remember, it’s okay if you forget to load a package or two, you can return to the top of the script later on to load them.)

  5. Do your work! Write your code, execute it, comment your code etc.

  6. When you’re finished working on your script, save it. (e.g. File -> Save As…) (You might also consider saving your script as you go so if R crashes or gets closed accidentally, you’ll have the last saved version of the script.)

  7. Before closing RStudio, save the global environment so that when you re-open the script, you can also re-open the global environment and have all of the previously saved objects, dataframes etc. as they appeared the last time you were working on this script/markdown document.

Some Helpful RStudio options

  • Highlight R function calls (Tools > Global Options > Code > Display)
  • Rainbow parentheses (Tools > Global Options > Code > Display)
  • Using a dark theme can help reduce eye strain (Tools > Global Options > Appearance > Editor Theme)

Exercises:

  • Open a new R Script. Save it as “Exercises-Lecture2”. It may be helpful to put it into its own folder/subfolder for this week (follow the steps above to set the working directory to this folder).
  • Create an object called “country” and list the names of five different countries.
  • Create an object called “age” and list ten ages. If you save ‘age’, are you able to re-open it in RStudio? If not, why didn’t it work? (Hint: check that you’ve set the working directory properly, and check that you wrote the file name correctly, with the “” around it and .RData at the end.)
  • Try changing some of the RStudio display options (e.g., select a dark theme or enable rainbow parentheses). Find what works best for you.