This module will cover:
- Introduction to R and RStudio
- Basic R operations and syntax
- Working with different types of data (vectors and data frames)
- Simple calculations in R
- Installing and using R packages
- Creating R Notebooks for reproducible research
R is a powerful programming language designed specifically for statistical computing and data visualization. It’s:
Let’s start with a simple command:
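A minimal version of such a command, assuming `print()` is used to display the text, would be:

```r
# Display a text string in the console
print("Hello Education Researchers!")
```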
[1] "Hello Education Researchers!"
RStudio is distinct from the language R. Both need to be installed on your computer. R alone can only run scripts or provide a very simple command line interface. RStudio provides a modern software environment to complete all of your analytics work.
RStudio has four main panes:
1. Source Editor (top left) - where we write our code
2. Console (bottom left) - where code runs and we see output
3. Environment/History (top right) - shows what data is currently loaded
4. Files/Plots/Help (bottom right) - for file management, viewing plots, and getting help
Think of an R Project like a folder that keeps all your work organized. Just like you might have different folders for different work tasks, R Projects help you keep your R work tidy and in one place.
To create a new project:
1. Click File > New Project
2. Choose “New Directory”
3. Select “New Project”
4. Pick a name and where to save it, something like “R Workshop”
5. Click “Create Project”
Your new project folder will store:
- The R work we do today
- Any data files we use
- Settings specific to this project
When you open your project later:
- RStudio remembers where all your files are
- Opens the files you were working on
- Keeps everything together
Let’s create a new project now to store the work we’ll do in this workshop.
Let’s start with the simplest possible R commands:
# This line is a comment - R ignores anything after #
2 + 2 # Basic math works just like a calculator
[1] 4
# We can save values by giving them names (no spaces allowed in names!)
my_number <- 5
another_number <- 10
# Now we can use those names in calculations
my_number + another_number
[1] 15
# We can save the result too
total <- my_number + another_number
total # Type the name to see what's stored in it
[1] 15
# Multiple values can be combined into a list called a "vector" in R
numbers <- c(8, 9, 15) # The c() function combines values
# Functions take inputs inside round brackets
mean(numbers) # Calculates the average
[1] 10.66667
# Functions can be used inside other functions
round(mean(numbers)) # First calculates mean, then rounds the result
[1] 11
A few important things to remember:
- R is case sensitive (“Number” and “number” are different)
- Use <- to save values (this is called “assignment”)
- Type a variable name to see what’s stored in it
- Functions always use round brackets ()
- Use # for comments that R will ignore
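Case sensitivity and assignment are easy to check directly. A small sketch, using the made-up names `number` and `Number` (chosen for illustration only):

```r
# R treats these as two separate objects because the case differs
number <- 5
Number <- 10

number          # the lowercase object
Number          # the capitalised object
number + Number # both can be used in the same calculation
```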
Don’t worry if you make mistakes! Everyone does when learning R. We’ll practice these basics together.
Vectors are how R stores data in its most basic form. In social sciences, we might call what they contain variables. A vector is a list of one type of data - it could be numbers (like student grades), text (like student names), or TRUE/FALSE values (like whether a student passed).
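Because a vector holds only one type of data, R converts mixed values to a common type instead of raising an error. A brief sketch with hypothetical vectors `mixed` and `grades`, where `class()` reports the type R settled on:

```r
# Mixing numbers, text, and logical values forces everything to text
mixed <- c(1, "a", TRUE)
class(mixed)   # "character"

# A vector of only numbers stays numeric
grades <- c(75, 82.3, 90.01)
class(grades)  # "numeric"
```

This is worth checking whenever a column of numbers unexpectedly behaves like text.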
# Different types of vectors
# Numeric (decimal numbers of any precision)
scores <- c(75, 82.3, 90.01, 68.7, 95.2)
print(scores)
[1] 75.00 82.30 90.01 68.70 95.20
# Character (text)
student_names <- c("Alice", "Bob", "Charlie", "David", "Emma")
print(student_names)
[1] "Alice"   "Bob"     "Charlie" "David"   "Emma"
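# Logical (TRUE/FALSE); the variable name "passed" is assumed here
passed <- c(TRUE, TRUE, FALSE, TRUE, FALSE)
print(passed)
[1]  TRUE  TRUE FALSE  TRUE FALSE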
# Factor (categorical data - like grade levels)
grade_levels <- factor(c("9", "10th", "9th", "10th", "9th"))
print(grade_levels)
[1] 9    10th 9th  10th 9th
Levels: 10th 9 9th
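Two base R functions are handy for inspecting a factor: `nlevels()` counts the distinct categories and `table()` tallies how often each one occurs. A sketch, assuming the `grade_levels` factor defined above:

```r
grade_levels <- factor(c("9", "10th", "9th", "10th", "9th"))

# Number of distinct categories
nlevels(grade_levels)

# How many times each category appears
table(grade_levels)
```

Note that "9" and "9th" are counted as separate levels, so a tally like this is a quick way to spot inconsistent data entry.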
[1] 3
10th 9 9th
2 1 2
With numeric data, we can calculate various statistics:
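One plausible set of commands for these summaries, assuming the `scores` vector defined earlier, uses base R's `mean()`, `median()`, `sd()`, `min()`, `max()`, `quantile()`, and `IQR()`:

```r
scores <- c(75, 82.3, 90.01, 68.7, 95.2)

mean(scores)                    # average
median(scores)                  # middle value
sd(scores)                      # standard deviation
min(scores)                     # lowest score
max(scores)                     # highest score
quantile(scores)                # minimum, quartiles, and maximum
quantile(scores, c(0.1, 0.9))   # 10th and 90th percentiles
IQR(scores)                     # interquartile range (75th minus 25th percentile)
```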
[1] 82.242
[1] 82.3
[1] 10.77134
[1] 68.7
[1] 95.2
0% 25% 50% 75% 100%
68.70 75.00 82.30 90.01 95.20
10% 90%
71.220 93.124
[1] 15.01
A data frame combines multiple vectors of the same length. Think of it like a spreadsheet table where each column can contain a different type of data.
# Numeric data
math_scores <- c(85, 92, 78, 95, 88)
# Character data (text)
student_names <- c("Alice", "Bob", "Charlie", "David", "Emma")
# Logical data (TRUE/FALSE)
passed <- math_scores >= 80 # Creates TRUE for scores 80 or above
# Create a data frame
student_data <- data.frame(
student_names,
math_scores,
passed
)
# View the data
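student_data
  student_names math_scores passed
1         Alice          85   TRUE
2           Bob          92   TRUE
3       Charlie          78  FALSE
4         David          95   TRUE
5          Emma          88   TRUE
[1] 85 92 78 95 88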
[1] 85 92 78 95 88
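A single column can be pulled out of a data frame as a plain vector. As a sketch, assuming a `student_data` frame like the one built above, both `$` and `[[ ]]` return the math scores:

```r
student_data <- data.frame(
  student_names = c("Alice", "Bob", "Charlie", "David", "Emma"),
  math_scores = c(85, 92, 78, 95, 88)
)

# Dollar sign: the most common way to pick a column by name
student_data$math_scores

# Double square brackets: same result, with the name given as text
student_data[["math_scores"]]
```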
Note: While column names in a data frame can include spaces, it’s best to avoid them as they make your code more complicated.
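To see why spaces get in the way, here is a small sketch with two hypothetical data frames, `spaced` and `tidy`. It assumes `check.names = FALSE`, which tells `data.frame()` to keep the space rather than replace it with a dot:

```r
# check.names = FALSE keeps the space instead of converting it to a dot
spaced <- data.frame(
  "math score" = c(85, 92, 78, 95, 88),
  check.names = FALSE
)

# A name with a space must be wrapped in backticks every time it is used
mean(spaced$`math score`)

# The same column with an underscore needs no special quoting
tidy <- data.frame(math_score = c(85, 92, 78, 95, 88))
mean(tidy$math_score)
```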
The pipe operator (|> or %>%) makes code easier to read by passing data through a sequence of operations. Think of it as saying “then”:
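As a sketch of the idea, assuming the `scores` vector from the vectors section and R 4.1 or later (needed for `|>`), the same average can be written in the traditional form or with the pipe:

```r
scores <- c(75, 82.3, 90.01, 68.7, 95.2)

# Traditional form: the data goes inside the function
mean(scores)

# Piped form: start with the data, then apply the function
scores |> mean()
```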
[1] 82.242
[1] 82.242
You may see both |> and %>% online; they work similarly. We’ll use |> because it’s built into R (version 4.1 and later), while %>% comes from the magrittr package.
# Create some test data
test_scores <- c(65, 72, 88, 95, 78, 84)
# Use the pipe to:
# 1. Start with test_scores THEN
# 2. Calculate the average with mean() THEN
# 3. Round to whole numbers with round()
test_scores |>
  mean() |>
  round()
[1] 80
Later, this pipe workflow will help us make complex, step-by-step changes to data frames in a way that’s easier to write and understand.
R’s power comes from its extensive ecosystem of packages. These packages add new tools and capabilities to R.
The simplest way to install packages is through the “Tools” menu in RStudio.
Most packages are available from CRAN (Comprehensive R Archive Network), R’s main package repository. You can browse packages at https://cran.r-project.org/web/packages/ or search for specific tasks at https://rseek.org/.
Packages can also be installed using code:
# Install a single package
install.packages("dplyr")
# Install multiple packages at once (uncomment to run)
#install.packages(c("dplyr", "ggplot2", "tidyr"))
You only need to install a package once, but you need to load it each time you start R. It’s helpful to keep installation commands (commented out) at the start of your project files.
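An alternative to commenting out install commands is to install only when a package is missing. The sketch below uses a hypothetical helper, `install_if_missing()` (not part of R or any package), built on base R's `requireNamespace()`:

```r
# Hypothetical helper: install a package only if it is not already available
install_if_missing <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(invisible(FALSE)) # already installed, nothing to do
  }
  install.packages(pkg)
  invisible(TRUE)            # an install was attempted
}

# Uncomment to run:
# install_if_missing("dplyr")
```

Running this at the top of a script is safe to repeat, because it does nothing when the package is already present.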
After installation, packages must be loaded to use them. Load only what you need for your current project to keep R running efficiently.
The dplyr package is particularly useful for working with data:
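Loading is done with `library()`. A minimal sketch, assuming dplyr has already been installed:

```r
# Load dplyr for this R session (repeat each time R is restarted)
library(dplyr)
```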
Throughout this course, we’ll use dplyr functions like summarize(), filter(), and rename(). If these don’t work, make sure you’ve loaded dplyr with the library() command above.
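To give a feel for these three functions, here is a short sketch that chains them with the pipe. It assumes dplyr is installed and rebuilds a small `student_data` frame like the one used earlier; `high_scorer_summary` is a made-up name:

```r
library(dplyr)

student_data <- data.frame(
  student_names = c("Alice", "Bob", "Charlie", "David", "Emma"),
  math_scores = c(85, 92, 78, 95, 88)
)

high_scorer_summary <- student_data |>
  rename(name = student_names) |>          # new_name = old_name
  filter(math_scores >= 80) |>             # keep rows that meet the condition
  summarize(average = mean(math_scores))   # collapse to one summary row

high_scorer_summary
```

Each step hands its result to the next, so the code reads in the same order as the analysis: rename, then filter, then summarize.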
An R Notebook combines:
- Text explanations (in Markdown format)
- R code (in special code sections)
- Output (tables, plots, statistics)
To create code sections (called “chunks”):
- Click the “Insert” button or press Ctrl+Alt+I
- Type your R code between the ``` marks
- Run code with the green “play” button or Ctrl+Shift+Enter
Try creating a notebook:
1. Click File > New File > R Notebook
2. Save the example notebook that opens
3. Click Preview to see how it looks
Things to try:
- [ ] Change the output format to PDF or Word
- [ ] Add section headers and see how they appear in the preview
- [ ] Create and run a new code chunk
# Use <- for assignment (storing values)
x <- 10
# Use == to test if things are equal
x == 10 # Returns TRUE
[1] TRUE
There are often several ways to do the same thing in R. Here are three ways to find students with math scores over 80:
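One possible set of three approaches is sketched below: square-bracket indexing, base R's `subset()`, and dplyr's `filter()`. It assumes dplyr is installed and uses a `student_data` frame like the one built earlier. The result names (`high_base`, `high_subset`, `high_dplyr`) are made up for illustration:

```r
library(dplyr)

student_data <- data.frame(
  student_names = c("Alice", "Bob", "Charlie", "David", "Emma"),
  math_scores = c(85, 92, 78, 95, 88)
)

# 1. Base R: logical test inside square brackets (rows before the comma)
high_base <- student_data[student_data$math_scores > 80, ]

# 2. Base R: subset() lets us name the column directly
high_subset <- subset(student_data, math_scores > 80)

# 3. dplyr (tidyverse style): pipe the data into filter()
high_dplyr <- student_data |>
  filter(math_scores > 80)
```

One small difference: `filter()` renumbers the rows from 1, while the two base R forms keep the original row numbers.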
All three give the same result. The tidyverse style is often easier to read and understand.
Create your own notebook and try these basic operations:
Let’s review what we’ve covered so far:
Now let’s put these skills into practice!
Take 10 minutes to: