Learning Objectives:
R is essentially another option for computing statistics. It is (initially) less intuitive to use compared to SPSS and Excel, but it has several advantages that make it a useful tool. Over the next few months, you will most likely consistently ask yourself: “Why am I struggling with this instead of just using SPSS or Excel?” That’s totally normal, and you just have to remind yourself of the two major advantages of R:
Workflow in R, image from https://r4ds.had.co.nz/explore-intro.html
R is essentially a one-stop-shop for all your research needs: from data cleaning to analysis to visualization, everything can be done in one place.
To get things started, you will need to download R as well as RStudio:
Downloading R:
(Links last updated: March 11, 2026)
Now that we have RStudio installed, let’s take a look at the different parts of the interface. First, let’s explore what R can do by looking at the console. The console is where you give ‘commands’ to R, ranging anywhere from simple math questions to several lines of code.
To illustrate, let’s ask R to solve a math question that’s literally impossible for an average human to compute:
9+10
## [1] 19
Here are some common operations and how to enter them:
| operation | operator/symbol | example input | example output |
|---|---|---|---|
| addition | + | 2 + 2 | 4 |
| subtraction | - | 5 - 1 | 4 |
| multiplication | * | 2 * 2 | 4 |
| division | / | 8 / 2 | 4 |
| power | ^ | 2 ^ 2 | 4 |
One thing to note here is that R doesn’t care about spacing: 1+1 and 1 + 1 are treated the same way, so you can lay out your eventual code however you find easiest to read.
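As a quick illustration, the operators in the table can be combined in one line, and R follows the usual order of operations from math class. This is only a small sketch with made-up numbers; the object names are invented for the example:

```r
# Multiplication happens before addition, just like in math class
no_brackets <- 2 + 3 * 4       # 3 * 4 first, then + 2, giving 14

# Parentheses change the order
with_brackets <- (2 + 3) * 4   # 2 + 3 first, then * 4, giving 20

# Spacing makes no difference to the result
same_result <- identical(1+1, 1 + 1)

no_brackets
with_brackets
same_result
```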
Exercise 1
Try entering and running some operations in the console.
Using R like a calculator is fun, but we can also store our answers as temporary variables in our workspace using the <- operator. Try typing the following in your console:
answer <- 3 + 5
Notice now it doesn’t give you the answer as an output in the console, but it stores the answer as an object called “answer” in your workspace (or your Environment). Now, type in answer in your console to recall it:
answer
## [1] 8
This will be especially useful when you have to do more complex things like applying functions to larger data sets. Notice that you have to type in the variable name every single time you use it, so try to avoid long variable names like “Number of times baby chose blue”, even though they are more informative. Instead, here are some common ways people standardize variables:
General rules to follow when naming variables: avoid capitalization and spaces. Since R uses white space as a separator, anything separated by white space is treated as two objects. This will get annoying for you later, so best to avoid it altogether.
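Here is a small sketch of what that looks like in practice. The name blue_choices and the value 4 are made up for illustration, and this is just one possible naming style:

```r
# This would NOT run, because R reads each word as a separate object:
# Number of times baby chose blue <- 4

# Short, lowercase, with underscores instead of spaces
blue_choices <- 4
blue_choices

# R is case-sensitive, so Blue_Choices would be a different object.
# exists() checks whether an object with that exact name is in your workspace.
exists("blue_choices")
```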
R would not be a reproducible solution if we are re-entering code every time. Instead, we want to create a workflow such that anyone can reproduce your analysis pipeline on their own computer. To do this, we need to organize our data and code in a way that makes sense.
The first thing we will do is create an R Project.
Creating an R Project
Your R project is now created, and you can access it at any time by going to where you saved the project and opening the file.
Next, we want to organize our project in a way that is both intuitive
and accessible with code. Take a look at your Files pane.
You can use this pane to navigate your project folder, including making
new folders and opening scripts. To start, click on “New Folder” and
create two new folders: “data” and “scripts”.
Go into your “scripts” folder. Now, click on the “File” button to create a new R script. Let’s call this “workshop_1.R”.
R scripts are how we store code for later use instead of retyping it every time. Let’s try it out: write the following line of code in your R script:
sum <- 5 + 6 + 7
To run this line of code, either highlight the entire code that you want to run, or simply have your cursor at the line you want to run, then hit the “Run” button (or Ctrl + Enter on Windows; Cmd + Enter on Mac). This should do the exact same thing as typing this line of code directly in your console.
Exercise 2
Pretend you ran 5 babies in a looking time study, and their looking times on the test trial are 5, 7, 15, 21, and 25 seconds. Create a variable to store the sum of their looking times, and a variable that represents the number of babies you ran. Finally, using those two variables, create a third variable that represents the mean looking time for your study.
Now that we know how to store variables, we can start doing cool things to them using functions. Functions are essentially code that someone already wrote to be applied to a given set of inputs, and all we have to do is provide the correct input. These inputs are called “arguments”.
Some functions are very simple, like nchar() which just
counts the number of characters in a given object:
nchar("Francis")
## [1] 7
Some can do useful statistical computations, like sd()
which calculates the standard deviation of a set of numbers. Generally,
functions are called using the format:
function(argument1, argument2, argument3, etc…)
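For example, here is a minimal sketch of sd() following that format. The numbers are made up, and scores and score_sd are just names chosen for this example:

```r
# c() combines several numbers into one set (a vector)
scores <- c(2, 4, 4, 4, 5, 5, 7, 9)   # made-up numbers

# sd() takes that set of numbers as its argument
score_sd <- sd(scores)
score_sd
```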
Every function has its own unique set of arguments, and they are very
well documented. As an example, let’s take a look at the function
round(). First, type in ?round() in your
console, which will give you the explanation of the function. Scroll
down to the ‘arguments’ section, and notice that it takes two main
arguments: x and digits. That means you have to specify x (what
number/vector you want rounded) and digits (how many decimal places to
round to). Now that we know what arguments this function takes, try to
round 3.14159 to 3 decimal places:
round(x = 3.14159, digits = 3)
## [1] 3.142
But what if we don’t specify and just try to type it in?
round(3.14159, 3)
## [1] 3.142
Notice that it still works because the things we put in for the arguments are in the correct order. However, if we do this:
round(2, 3.14159)
## [1] 2
It doesn’t work the same way: R now reads 2 as the number to round and 3.14159 as the number of digits. Generally, it is highly recommended to name your arguments even if you know you are entering them in the correct order because:
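For instance, named arguments keep working even if you shuffle their order. A quick sketch using the same round() call (the object names are made up for this example):

```r
# Named arguments in the documented order
usual_order <- round(x = 3.14159, digits = 3)

# Named arguments in reversed order: R matches them by name, not position
reversed_order <- round(digits = 3, x = 3.14159)

usual_order
reversed_order
identical(usual_order, reversed_order)
```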
People are constantly building new functions and combining them into
collections called ‘packages’. These packages are free for you to use,
so you are not limited to whatever your program offers (like Excel or
SPSS) or stuck paying for a subscription. The function you need to use to
install packages is install.packages("package_name"). To
start, let’s install three of my most commonly used packages:
“tidyverse”, “here”, and “janitor”. Run these lines in your console
one-by-one (we don’t need to save these in the R script since this is a
one-time installation):
install.packages("tidyverse")
install.packages("here")
install.packages("janitor")
Before you can use any of your installed packages, you have to first
‘load’ them by adding them to your current working library using the
library() function. Note that you will need to do this
whenever you start a new R session, so this part is often, if not
always, the first few lines of an R script:
library(tidyverse)
library(here)
library(janitor)
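If library() complains that a package does not exist, it usually means that package was never installed. One way to check, sketched here with base R’s installed.packages() (wanted_packages and missing_packages are names made up for this example):

```r
# The packages this workshop uses
wanted_packages <- c("tidyverse", "here", "janitor")

# Names of every package currently installed on this computer
installed <- rownames(installed.packages())

# Keep only the wanted packages that are NOT installed yet
missing_packages <- wanted_packages[!(wanted_packages %in% installed)]
missing_packages
```

If this prints character(0), everything is installed. Otherwise, run install.packages() on whichever names show up.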
Now that we have our packages loaded, we are ready to read in some
data. The vast majority of datasets you will work with will be in .csv,
or comma separated values format (sometimes called comma delimited). The
function we will use is read_csv(). Though it can
technically take many arguments, we will only worry about one: the
dataset name. For instance, the code to read in a dataset called
“sample_data.csv”, and store it as an object called “workshop_data”,
will be as follows:
workshop_data <- read_csv("sample_data.csv")
But you might have noticed a problem: How does R know where your data is located? The traditional way of helping R find your data is to specify your R session’s working directory. You can do so in two ways:
Go to Session -> Set Working Directory -> Choose Directory, and navigate to the folder where your data is saved
Use the setwd() function, and specify the pathway as
the argument
Here is an example of what that approach would look like:
setwd("C:/Users/franc/OneDrive/Desktop/Github/RWorkshop/data")
workshop_data <- read_csv("sample_data.csv")
However, for our workshop we will never be using
this method because it has many inherent issues, the most glaring of
which is that it is not reproducible. This code works when
I run it on my computer, but the same code will not work on someone
else’s computer because this exact pathway is unlikely to exist. To
ensure data reading works properly in a vacuum - that is, without
needing to manually set the working directory every time - we will be
restricting the search space (i.e. where R looks for your dataset) using
the here() function.
For our purposes, here() takes two arguments: the pathway to where your
data is located, starting at the R Project folder, and the dataset’s name.
Using the above example, instead of
C:/Users/franc/OneDrive/Desktop/Github/RWorkshop/data, the
pathway using here() will only be data. A
quick way to verify which part of your pathway can be taken out is by
running the here() function without any arguments:
here()
## [1] "C:/Users/franc/OneDrive/Desktop/Github/RWorkshop"
The pathway that here() returns is the part you can delete, as it
will differ depending on where you saved your entire project
folder.
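One way to picture this: here() supplies the front part of the pathway for you, and you only supply the pieces after it. The sketch below imitates that idea with base R’s file.path(). It is only a rough picture, not how here() is actually written, and project_root is a stand-in for whatever here() prints on your own computer:

```r
# Stand-in for the output of here() on one particular computer (hypothetical)
project_root <- "C:/Users/franc/OneDrive/Desktop/Github/RWorkshop"

# file.path() joins the pieces together with "/"
full_path <- file.path(project_root, "data", "sample_data.csv")
full_path
```

On someone else’s computer only project_root would change, while "data" and "sample_data.csv" stay the same.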
Exercise 3
Move your entire folder (containing your R Project, data folder, and
scripts folder) to a totally different location on your computer. When
you open your project and run here() again, do you get the
same output?
Now, we will combine the two functions. First, our complete line of
code using here() to specify our file location, in our
example, will be as follows:
here("data", "sample_data.csv")
Then, we take this entire line of code, and insert it as one
argument for the read_csv() function:
workshop_data <- read_csv(here("data", "sample_data.csv"))
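If R reports that it cannot find the file, it can help to look at the pathway here() built before reading anything. A minimal sketch, assuming the here and readr packages are installed (readr is the tidyverse package that provides read_csv()); data_path is a name made up for this example:

```r
library(here)
library(readr)

# Build the pathway first so we can inspect it
data_path <- here("data", "sample_data.csv")
data_path

# Only try to read the file if it is really there
if (file.exists(data_path)) {
  workshop_data <- read_csv(data_path)
} else {
  message("Could not find a file at: ", data_path)
}
```

If the message appears, compare the printed pathway against your Files pane: the usual culprits are a misspelled folder name or a dataset saved outside the “data” folder.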