Learning Objectives:

  • Introduction to R
  • Installing R and RStudio
  • Project Setup
  • Using Functions and Packages
  • Reading in datasets

Introduction to R

R is essentially another option for computing statistics. It is (initially) less intuitive to use compared to SPSS and Excel, but it has several advantages that make it a useful tool. Over the next few months, you will most likely consistently ask yourself: “Why am I struggling with this instead of just using SPSS or Excel?” That’s totally normal, and you just have to remind yourself of the two major advantages of R:

Workflow in R, image from https://r4ds.had.co.nz/explore-intro.html

  1. Flexibility: R has many more options as your computations get more complicated
  2. Reproducibility: Because every step of an analysis is written as code, anyone can rerun it and get the same results. Even entire manuscripts can be made reproducible when R is used in conjunction with RMarkdown (more on this later)

R is essentially a one-stop-shop for all your research needs: from data cleaning to analysis to visualization, everything can be done in one place.

Installing R and RStudio

To get things started, you will need to download R as well as RStudio:

Downloading R:

R RStudio

(Links last updated: March 11, 2026)

Math operations

Now that we have RStudio installed, let’s take a look at the different parts of the interface. First, let’s explore what R can do by looking at the console. The console is where you give ‘commands’ to R, ranging anywhere from simple math questions to several lines of code.

To illustrate, let’s ask R to solve a math question that’s literally impossible for an average human to compute:

9+10
## [1] 19

Here are some common operations and how to enter them:

operation        operator/symbol   example input   example output
addition         +                 2 + 2           4
subtraction      -                 5 - 1           4
multiplication   *                 2 * 2           4
division         /                 8 / 2           4
power            ^                 2 ^ 2           4

One thing to note here is that R ignores spaces around operators: 1+1 and 1 + 1 are treated the same way, so you can organize your eventual code according to how you like it visually.
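As a quick check of the spacing point, the sketch below compares the same calculations typed with and without spaces. It uses identical(), a base R function not covered in this workshop, purely to confirm that both versions give the same result.

```r
# Same calculation, typed with and without spaces around the operators
identical(9+10, 9 + 10)
## [1] TRUE

# Spacing is also ignored in longer expressions
identical(2^2*3, 2 ^ 2 * 3)
## [1] TRUE
```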

Exercise 1

Try entering and running some operations in the console.

Storing variables as objects

Using R like a calculator is fun, but we can also store our answers as temporary variables in our workspace using the <- operator. Try typing the following in your console:

answer <- 3 + 5

Notice now it doesn’t give you the answer as an output in the console, but it stores the answer as an object called “answer” in your workspace (or your Environment). Now, type in answer in your console to recall it:

answer
## [1] 8

This will be especially useful when you have to do more complex things like applying functions to larger data sets. Notice that you have to type in the variable name every single time you use it, so try to avoid long but more informative variable names like “Number of times baby chose blue”. Instead, here are some common ways people standardize variables:

  • choice_blue
  • blue.choice.num
  • ChoseBlue

General rules to follow when naming variables: avoid capitalization and spaces. R is case-sensitive, so answer and Answer would be two different objects, and because R uses white space as a separator, a name containing spaces is not read as a single object. This will get annoying for you later, so best to avoid both altogether.
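To make the naming advice concrete, here is a minimal sketch. The value 12 and the divisor 20 are made-up numbers for illustration, and exists() is a base R function (not covered in this workshop) that checks whether an object with a given name has been created.

```r
# A short, lowercase name with underscores instead of spaces
choice_blue <- 12   # hypothetical count, for illustration only

# Stored objects can be reused in later calculations
choice_blue / 20
## [1] 0.6

# R is case-sensitive, so a differently capitalized name is a different object;
# in a fresh session, nothing called Choice_Blue exists
exists("Choice_Blue")
## [1] FALSE
```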

Project Setup

R Projects and R Scripts

R would not be a reproducible solution if we had to re-enter code every time. Instead, we want to create a workflow such that anyone can reproduce your analysis pipeline on their own computer. To do this, we need to organize our data and code in a way that makes sense.

The first thing we will do is create an R Project.

  • Go to File -> New Project
  • Select New Directory -> New Project
  • Give your project an appropriate name. Let’s call it “Workshop 1”
  • Click “Browse” and navigate to where you want to save this project. You can just put it on your desktop
  • Check the box that says “Open in new session”, and press “Create Project”

Creating an R Project

Your R project is now created, and you can access it at any time by going to where you saved the project and opening the file.

Folder structure

Next, we want to organize our project in a way that is both intuitive and accessible with code. Take a look at your Files pane. You can use this pane to navigate your project folder, including making new folders and opening scripts. To start, click on “New Folder” and create two new folders: “data” and “scripts”.

Go into your “scripts” folder. Now, click on the “File” button to create a new R script. Let’s call this “workshop_1.R”.

Working in R scripts

R scripts are how we store code for later use instead of retyping it every time. Let’s try it out: write the following line of code in your R script:

total <- 5 + 6 + 7

To run this line of code, either highlight the entire code that you want to run, or simply have your cursor at the line you want to run, then hit the “Run” button (or Ctrl + Enter on Windows; Cmd + Enter on Mac). This should do the exact same thing as typing this line of code directly in your console.

Exercise 2

Pretend you ran 5 babies in a looking time study, and their looking times on the test trial are 5, 7, 15, 21, and 25 seconds. Create a variable to store the sum of their looking times, and a variable that represents the number of babies you ran. Finally, using those two variables, create a third variable that represents the mean looking time for your study.
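If you get stuck, here is one possible way to approach Exercise 2, using only the operators and the <- assignment covered so far. The object names (looking_sum, n_babies, looking_mean) are just suggestions; any short, descriptive names would work.

```r
# Total looking time across the five babies, in seconds
looking_sum <- 5 + 7 + 15 + 21 + 25

# Number of babies tested
n_babies <- 5

# Mean looking time = total looking time / number of babies
looking_mean <- looking_sum / n_babies
looking_mean
## [1] 14.6
```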

Using Functions and Packages

Functions and Arguments

Now that we know how to store variables, we can start doing cool things to them using functions. Functions are essentially code that someone has already written to be applied to a given set of inputs, and all we have to do is provide the correct inputs. These inputs are called “arguments”.

Some functions are very simple, like nchar() which just counts the number of characters in a given object:

nchar("Francis")
## [1] 7

Some can do useful statistical computations, like sd() which calculates the standard deviation of a set of numbers. Generally, functions are called using the format:

function(argument1, argument2, argument3, etc…)
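As a small illustration of this format, the sketch below applies sd() and two other base R functions to the looking times from Exercise 2. It assumes the numbers are first combined into a single vector with c(), a base R function this workshop has not introduced yet; the object name looking_times is made up for the example.

```r
# Looking times (in seconds) from Exercise 2, combined into one vector with c()
looking_times <- c(5, 7, 15, 21, 25)

# One argument: the vector of numbers to summarize
sd(looking_times)

# Other functions follow the same function(argument) pattern
mean(looking_times)
## [1] 14.6
length(looking_times)
## [1] 5
```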

Every function has its own unique set of arguments, and they are very well documented. As an example, let’s take a look at the function round(). First, type in ?round() in your console, which will open the documentation for the function. Scroll down to the ‘Arguments’ section, and notice that it takes two main arguments: x and digits. That means you have to specify x (what number/vector you want rounded) and digits (how many decimal places to round to). Now that we know what arguments this function takes, try to round 3.14159 to 3 decimal places:

round(x = 3.14159, digits = 3)
## [1] 3.142

But what if we don’t specify and just try to type it in?

round(3.14159, 3)
## [1] 3.142

Notice that it still works because the things we put in for the arguments are in the correct order. However, if we do this:

round(2, 3.14159)
## [1] 2

It doesn’t work the same way, because R matches unnamed arguments by position: here 2 is treated as x and 3.14159 as digits. Generally, it is highly recommended to name your arguments even if you know you are doing it in the correct order because:

  1. You can use them out of order (e.g. round(digits = 3, x = 3.14159) will work!)
  2. The next person reading your code will know what is going on
  3. Future you will know what is going on
  4. As functions get more complicated, you will want to keep track to prevent mistakes
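The points above can be checked directly in the console. This is a minimal sketch using round() again, plus args(), a base R function (not covered in this workshop) that prints a function's argument names and defaults.

```r
# args() prints a function's arguments and their default values
args(round)

# Named arguments can be given in any order
round(digits = 3, x = 3.14159)
## [1] 3.142

# Unnamed arguments are matched by position: here 2 is x and 3.14159 is digits
round(2, 3.14159)
## [1] 2

# Naming the arguments gives the same result regardless of order
identical(round(x = 3.14159, digits = 3), round(digits = 3, x = 3.14159))
## [1] TRUE
```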

Packages

People are constantly building new functions and combining them into collections called ‘packages’. These packages are free for you to use, instead of being limited by whatever your program offers (like Excel or SPSS) and having to pay for a subscription. The function you need to use to install packages is install.packages("package_name"). To start, let’s install three of my most commonly used packages: “tidyverse”, “here”, and “janitor”. Run these lines in your console one-by-one (we don’t need to save these in the R script since this is a one-time installation):

  • install.packages("tidyverse")
  • install.packages("here")
  • install.packages("janitor")

Before you can use any of your installed packages, you have to first ‘load’ them by adding them to your current working library using the library() function. Note that you will need to do this whenever you start a new R session, so this part is often, if not always, the first few lines of an R script:

library(tidyverse)
library(here)
library(janitor)
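If library() gives an error saying there is no package with that name, the package has probably not been installed yet. The sketch below is one way to check, assuming the three package names used in this workshop; requireNamespace() and vapply() are base R functions not covered here, and the object names (pkgs, installed) are made up for the example.

```r
# Packages this workshop relies on
pkgs <- c("tidyverse", "here", "janitor")

# requireNamespace() returns TRUE if a package is installed, FALSE otherwise
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
installed

# Names of any packages that still need install.packages()
pkgs[!installed]
```

Nothing is installed by this code; it only reports which packages are missing so you can run install.packages() for those yourself.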

Reading in datasets

Now that we have our packages loaded, we are ready to read in some data. The vast majority of datasets you will work with will be in .csv, or comma-separated values format (sometimes called comma delimited). The function we will use is read_csv(), which comes with the tidyverse. Though it can technically take many arguments, we will only worry about one: the dataset name. For instance, the code to read in a dataset called “sample_data.csv”, and store it as an object called “workshop_data”, will be as follows:

workshop_data <- read_csv("sample_data.csv")

But you might have noticed a problem: How does R know where your data is located? The traditional way of helping R find your data is to specify your R session’s working directory. You can do so in two ways:

  1. Go to Session -> Set Working Directory -> Choose Directory, and navigate to the folder where your data is saved

  2. Use the setwd() function, and specify the pathway as the argument

Here is an example of what that approach would look like:

setwd("C:/Users/franc/OneDrive/Desktop/Github/RWorkshop/data")

workshop_data <- read_csv("sample_data.csv")

However, for our workshop we will never be using this method because it has many inherent issues, the most glaring of which is that it is not reproducible. This code works when I run it on my computer, but the same code will not work on someone else’s computer because this exact pathway is unlikely to exist. To ensure data reading works properly in a vacuum (that is, without needing to set the working directory manually every time), we will be restricting the search space (i.e. where R looks for your dataset) using the here() function.
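To see why a bare file name depends on the working directory, it can help to look at where R is currently pointed. This is a minimal sketch using only base R functions (getwd(), file.path(), file.exists()); the object names are made up, and the output is omitted because it will differ on every computer.

```r
# The folder R currently treats as its working directory
current_dir <- getwd()
current_dir

# A bare file name is looked up relative to that folder,
# so this is the full path R would actually try to open
looked_up_path <- file.path(current_dir, "sample_data.csv")
looked_up_path

# TRUE only if the file really is in the working directory
file.exists("sample_data.csv")
```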

For our purposes, here() takes two arguments: the pathway where your data is located, starting at the R Project folder, and the dataset’s name. Using the above example, instead of C:/Users/franc/OneDrive/Desktop/Github/RWorkshop/data, the pathway using here() will only be data. A quick way to verify which part of your pathway can be taken out is by running the here() function without any arguments:

here()
## [1] "C:/Users/franc/OneDrive/Desktop/Github/RWorkshop"

The pathway that here() shows is the part you can leave out, as these components will differ depending on where you saved your entire project folder.

Exercise 3

Move your entire folder (containing your RProject, data folder, and script folder) to a totally different location on your computer. When you open your project and run here() again, do you get the same output?

Putting it together: Reproducible reading with here()

Now, we will combine the two functions. First, the complete line of code that uses here() to specify the file location in our example is as follows:

here("data", "sample_data.csv")

Then, we take this entire line of code, and insert it as one argument for the read_csv() function:

workshop_data <- read_csv(here("data", "sample_data.csv"))
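A slightly more defensive version of the same idea is sketched below. It assumes the “here” and “readr” packages are installed (readr is the tidyverse package that provides read_csv()) and that your data file is called sample_data.csv, as in the example above. The object name data_path and the file.exists() check are additions for illustration, not a required part of the workflow.

```r
library(readr)  # provides read_csv(); loaded automatically by library(tidyverse)
library(here)   # provides here()

# Build the full path from the project folder down to the file
data_path <- here("data", "sample_data.csv")
data_path

# Only try to read the file if it is actually there
if (file.exists(data_path)) {
  workshop_data <- read_csv(data_path)
  print(head(workshop_data))  # show the first few rows
} else {
  message("No file found at: ", data_path)
}
```

Printing data_path first is a quick way to confirm that R is looking in the folder you expect before you troubleshoot anything else.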