knitr::opts_chunk$set(
 echo = TRUE,
 message = FALSE,
 warning = FALSE
)

Module 1: Getting Started with R

This module will cover:

  • Introduction to R and RStudio
  • Basic R operations and syntax
  • Working with different types of data (vectors and data frames)
  • Simple calculations in R
  • Installing and using R packages
  • Creating R Notebooks for reproducible research

What is R?

R is a powerful programming language designed specifically for statistical computing and data visualization. It:

  • Is free and open source
  • Has extensive package support (18,000+ packages)
  • Has a strong community, especially in research
  • Is excellent for data analysis and visualization
  • Works on all major platforms

Why R for Education Research?

  • Reproducible research (literate programming)
  • Compartmentalize work (e.g., accessing data, preparing data, reporting)
  • Publication-quality reporting
  • Handles large datasets efficiently
  • Integrates with other tools (PowerBI, Alchemer, databases)
  • Active community

Let’s start with a simple command:

print("Hello Education Researchers!")
[1] "Hello Education Researchers!"

The RStudio Interface

RStudio is separate from the R language, and both need to be installed on your computer. On its own, R can run scripts and offers only a very simple command-line interface. RStudio adds a modern software environment in which you can complete all of your analytics work.

RStudio has four main panes:

  1. Source Editor (top left) - where we write our code
  2. Console (bottom left) - where code runs and we see output
  3. Environment/History (top right) - shows what data is currently loaded
  4. Files/Plots/Help (bottom right) - for file management, viewing plots, and getting help

R Projects in RStudio

Think of an R Project like a folder that keeps all your work organized. Just like you might have different folders for different work tasks, R Projects help you keep your R work tidy and in one place.

To create a new project:

  1. Click File > New Project
  2. Choose “New Directory”
  3. Select “New Project”
  4. Pick a name (something like “R Workshop”) and where to save it
  5. Click “Create Project”

Your new project folder will store:

  • The R work we do today
  • Any data files we use
  • Settings specific to this project

When you open your project later, RStudio:

  • Remembers where all your files are
  • Opens the files you were working on
  • Keeps everything together

Let’s create a new project now to store the work we’ll do in this workshop.
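
One practical effect of working in a project is that RStudio sets R's working directory to the project folder, so files can be referred to with short relative paths. Here is a minimal sketch for checking this from the console. The folder name "data" and the file name "survey.csv" are hypothetical and only serve as an illustration:

```r
# Show the folder R is currently working in
# (inside an RStudio project, this is the project folder)
getwd()

# List the files and folders found there
list.files()

# Build a path relative to the project folder
# ("data" and "survey.csv" are made-up names for illustration)
survey_path <- file.path("data", "survey.csv")
survey_path

# Check whether the file exists before trying to read it
file.exists(survey_path)
```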

Basic R Operations

R as a Calculator

5 + 3
[1] 8
10 / 2
[1] 5
3 * 4
[1] 12
2^3  # Exponents (2 raised to power of 3)
[1] 8

Basic R Syntax

Let’s start with the simplest possible R commands:

# This line is a comment - R ignores anything after #
2 + 2    # Basic math works just like a calculator
[1] 4
# We can save values by giving them names (no spaces allowed in names!)
my_number <- 5
another_number <- 10

# Now we can use those names in calculations
my_number + another_number
[1] 15
# We can save the result too
total <- my_number + another_number
total    # Type the name to see what's stored in it
[1] 15
# Multiple values can be combined into a sequence, called a "vector" in R
numbers <- c(8, 9, 15)  # The c() function combines values

# Functions take inputs inside round brackets
mean(numbers)  # Calculates the average
[1] 10.66667
# Functions can be used inside other functions
round(mean(numbers)) # First calculates mean, then rounds the result
[1] 11

A few important things to remember:

  • R is case sensitive (“Number” and “number” are different)
  • Use <- to save values (this is called “assignment”)
  • Type a variable name to see what’s stored in it
  • Functions always use round brackets ()
  • Use # for comments that R will ignore

Don’t worry if you make mistakes! Everyone does when learning R. We’ll practice these basics together.

Creating and Working with Vectors

Vectors are how R stores data in its most basic form. In the social sciences, we might call what they contain a variable. A vector is an ordered collection of values that all share one type: numbers (like student grades), text (like student names), or TRUE/FALSE values (like whether a student passed).
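
Because a vector holds only one type of data, R quietly converts mixed values to a single common type instead of raising an error. The short sketch below uses class() to check what a vector holds. The variable name mixed is only for illustration:

```r
# class() reports what type of data a vector holds
class(c(75, 82.3, 90.01))   # "numeric"
class(c("Alice", "Bob"))    # "character"
class(c(TRUE, FALSE))       # "logical"

# Mixing types does not cause an error: R converts everything
# to one common type (here, text)
mixed <- c(85, "Bob", TRUE)
mixed                       # "85" "Bob" "TRUE"
class(mixed)                # "character"
```

This is worth remembering when a column of numbers unexpectedly behaves like text: a single stray text value is often the cause.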

# Different types of vectors
# Numeric (whole or decimal numbers)
scores <- c(75, 82.3, 90.01, 68.7, 95.2)
print(scores)
[1] 75.00 82.30 90.01 68.70 95.20
# Character (text)
student_names <- c("Alice", "Bob", "Charlie", "David", "Emma")
print(student_names)
[1] "Alice"   "Bob"     "Charlie" "David"   "Emma"   
# Logical (TRUE/FALSE)
passed_course <- c(TRUE, TRUE, FALSE, TRUE, FALSE)
print(passed_course)
[1]  TRUE  TRUE FALSE  TRUE FALSE
# Factor (categorical data - like grade levels)
grade_levels <- factor(c("9", "10th", "9th", "10th", "9th"))
print(grade_levels)
[1] 9    10th 9th  10th 9th 
Levels: 10th 9 9th
length(levels(grade_levels)) # Shows we have 3 different categories! Why?
[1] 3
summary(grade_levels)        # Shows how many students in each category
10th    9  9th 
   2    1    2 
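
The extra category appears because "9" and "9th" are spelled differently, so R treats them as separate groups. One possible way to tidy this up, sketched here with base R, is to correct the label before creating the factor and to state the order of the levels explicitly. The names grade_raw and grade_clean are introduced only for this illustration:

```r
grade_raw <- c("9", "10th", "9th", "10th", "9th")

# Replace the inconsistent label "9" with "9th"
grade_clean <- ifelse(grade_raw == "9", "9th", grade_raw)

# Create the factor, listing the levels in a meaningful order
grade_levels <- factor(grade_clean, levels = c("9th", "10th"))

levels(grade_levels)   # "9th" "10th"
table(grade_levels)    # 9th: 3, 10th: 2
```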

With numeric data, we can calculate various statistics:

# Basic statistics
mean(scores)    # Average
[1] 82.242
median(scores)  # Middle value
[1] 82.3
sd(scores)      # Standard deviation
[1] 10.77134
min(scores)     # Minimum
[1] 68.7
max(scores)     # Maximum
[1] 95.2
quantile(scores)  # Quartiles (0%, 25%, 50%, 75%, 100%)
   0%   25%   50%   75%  100% 
68.70 75.00 82.30 90.01 95.20 
quantile(scores, probs = c(.1, .9))  # 10th and 90th percentiles
   10%    90% 
71.220 93.124 
IQR(scores)     # Interquartile range (75th - 25th percentile)
[1] 15.01
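
To see how these functions relate to each other, the sketch below recomputes the interquartile range by hand from the quartiles and then uses summary() to get several of the statistics above in one call. The names quartiles and iqr_by_hand are illustrative:

```r
scores <- c(75, 82.3, 90.01, 68.7, 95.2)

# The 25th and 75th percentiles
quartiles <- quantile(scores, probs = c(0.25, 0.75))
quartiles

# The IQR is the distance between them
iqr_by_hand <- unname(quartiles["75%"] - quartiles["25%"])
iqr_by_hand

# Compare with the built-in function
all.equal(iqr_by_hand, IQR(scores))   # TRUE

# summary() reports min, quartiles, median, mean, and max together
summary(scores)
```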

Working with Different Types of Data: The Data Frame

A data frame combines multiple vectors of the same length. Think of it like a spreadsheet table where each column can contain a different type of data.

# Numeric data
math_scores <- c(85, 92, 78, 95, 88)

# Character data (text)
student_names <- c("Alice", "Bob", "Charlie", "David", "Emma")

# Logical data (TRUE/FALSE)
passed <- math_scores >= 80  # Creates TRUE for scores 80 or above

# Create a data frame
student_data <- data.frame(
  student_names,
  math_scores,
  passed
)

# View the data
student_data

# Access a single column (variable)
student_data$math_scores
[1] 85 92 78 95 88
# Alternative way to access a column
student_data[["math_scores"]]
[1] 85 92 78 95 88
# You can also click on "student_data" in the environment pane to view it

Note: While column names in a data frame can include spaces, it’s best to avoid them as they make your code more complicated.
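
To illustrate why spaces are awkward, the sketch below builds a small data frame with a column called "math score" (a hypothetical name). By default, data.frame() replaces the space with a dot. Keeping the space requires the check.names = FALSE argument, and the column then has to be wrapped in backticks whenever it is used:

```r
# Default behaviour: the space is replaced with a dot
df_default <- data.frame("math score" = c(85, 92))
names(df_default)          # "math.score"

# Keeping the space requires an extra argument...
df_spaces <- data.frame("math score" = c(85, 92), check.names = FALSE)
names(df_spaces)           # "math score"

# ...and backticks every time the column is used
df_spaces$`math score`

# An underscore avoids the problem entirely
df_clean <- data.frame(math_score = c(85, 92))
df_clean$math_score
```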

The Pipe Operator

The pipe operator (|> or %>%) makes code easier to read by passing data through a sequence of operations. Think of it as saying “then”:

# Without pipe
mean(scores)
[1] 82.242
# With pipe: "Start with scores THEN calculate the mean"
scores |> mean()
[1] 82.242

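You may see both |> and %>% online; they work similarly. We’ll use |> because it is built into R (version 4.1 and later), while %>% comes from the magrittr package and becomes available when you load tidyverse packages such as dplyr.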

Exercise: Try the Pipe

# Create some test data
test_scores <- c(65, 72, 88, 95, 78, 84)

# Use the pipe to:
# 1. Start with test_scores THEN
# 2. Calculate the average with mean() THEN
# 3. Round to whole numbers with round()

Later, this pipe workflow will help us make complex, step-by-step changes to data frames in a way that’s easier to write and understand.
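
As a small preview of that workflow, here is a sketch that chains two base R steps on a data frame. The data repeat the student example from earlier, and the cutoff of 90 is an arbitrary choice for illustration. The |> operator needs R 4.1 or later:

```r
student_data <- data.frame(
  student_names = c("Alice", "Bob", "Charlie", "David", "Emma"),
  math_scores = c(85, 92, 78, 95, 88)
)

# "Start with student_data THEN keep scores of 90 or more THEN count the rows"
high_scorers <- student_data |>
  subset(math_scores >= 90) |>
  nrow()

high_scorers   # 2

# The same steps without the pipe have to be read from the inside out
nrow(subset(student_data, math_scores >= 90))
```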

R Libraries and Packages

R’s power comes from its extensive ecosystem of packages. These packages add new tools and capabilities to R.

Installing Packages

The simplest way to install packages is through the “Tools” menu in RStudio (Tools > Install Packages...).

Most packages are distributed through CRAN (the Comprehensive R Archive Network). You can browse packages at https://cran.r-project.org/web/packages/ or search for specific tasks at https://rseek.org/.

Packages can also be installed using code:

# Install a single package
install.packages("dplyr")

# Install multiple packages at once (uncomment to run)
#install.packages(c("dplyr", "ggplot2", "tidyr"))

You only need to install a package once, but you need to load it each time you start R. It’s helpful to keep installation commands (commented out) at the start of your project files.
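
Instead of reinstalling packages each time, you can check which ones are already installed. The sketch below uses only base R functions. The package list is an example, and the names needed, already_installed, and missing_packages are illustrative:

```r
# Packages this project needs (an example list)
needed <- c("dplyr", "ggplot2", "tidyr")

# Names of all packages currently installed on this computer
already_installed <- rownames(installed.packages())

# Which of the needed packages are not installed yet?
missing_packages <- setdiff(needed, already_installed)
missing_packages

# Install only what is missing (uncomment to run)
# if (length(missing_packages) > 0) install.packages(missing_packages)
```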

Loading Packages

After installation, packages must be loaded to use them. Load only what you need for your current project to keep R running efficiently.

The dplyr package is particularly useful for working with data:

# Load dplyr
library(dplyr)

Throughout this course, we’ll use dplyr functions like summarize(), filter(), and rename(). If these don’t work, make sure you’ve loaded dplyr with the library() command above.
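
As a brief preview, the sketch below applies these three functions to the student data from earlier. It assumes dplyr is installed. The new column name name and the summary names average and n_students are illustrative choices:

```r
library(dplyr)

student_data <- data.frame(
  student_names = c("Alice", "Bob", "Charlie", "David", "Emma"),
  math_scores = c(85, 92, 78, 95, 88)
)

result <- student_data |>
  rename(name = student_names) |>   # rename a column (new name = old name)
  filter(math_scores > 80) |>       # keep rows that meet a condition
  summarize(                        # collapse rows into summary values
    average = mean(math_scores),
    n_students = n()
  )

result   # average = 90, n_students = 4
```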

Creating Your First R Notebook

An R Notebook combines:

  • Text explanations (in Markdown format)
  • R code (in special code sections)
  • Output (tables, plots, statistics)

To create code sections (called “chunks”):

  • Click the “Insert” button or press Ctrl+Alt+I
  • Type your R code between the ``` marks
  • Run code with the green “play” button or Ctrl+Shift+Enter

Try creating a notebook:

  1. Click File > New File > R Notebook
  2. Save the example notebook that opens
  3. Click Preview to see how it looks

Things to try:

  [ ] Change the output format to PDF or Word
  [ ] Add section headers and see how they appear in the preview
  [ ] Create and run a new code chunk

Common Mistakes to Watch Out For

Case Sensitivity

# These are different variables
score <- 85
SCORE <- 95
print(score)  # Shows 85
[1] 85
# This would cause an error because Score (capital S) doesn't exist
# print(Score)
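
One way to check a name without triggering an error is exists(), a base R function that reports whether an object with exactly that spelling has been created. A minimal sketch:

```r
score <- 85

# The name must match exactly, including capital letters
exists("score")   # TRUE
exists("Score")   # FALSE, unless you created an object called Score earlier

# ls() lists every name currently defined in your workspace
ls()
```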

Assignment vs. Equality Testing

# Use <- for assignment (storing values)
x <- 10

# Use == to test if things are equal
x == 10  # Returns TRUE
[1] TRUE
x = 10   # A single = also assigns a value, but <- is preferred in R

Getting Help

# Add # for comments that R ignores
# Use help() or ? to get documentation
?mean
help(mean)
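
A few other base R helpers are useful when you do not remember a function's exact name or arguments. A short sketch (the name mean_functions is illustrative):

```r
# See a function's arguments without opening the help page
args(round)

# Find functions whose names start with "mean"
mean_functions <- apropos("^mean")
mean_functions

# Search all installed help pages for a phrase (uncomment to run)
# help.search("standard deviation")

# Run the examples shown at the bottom of a help page (uncomment to run)
# example(mean)
```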

Different Coding Styles

There are often several ways to do the same thing in R. Here are three ways to find students with math scores over 80:

# Base R style using subset()
subset(student_data, math_scores > 80)

# Base R with brackets
student_data[student_data$math_scores > 80,]

# Modern tidyverse style
library(dplyr)
student_data |>
  filter(math_scores > 80)

# Save every object in the workspace to a .RData file in the working directory
save.image()

All three return the same rows. The tidyverse style is often easier to read and understand.

Exercise

Create your own notebook and try these basic operations:

  1. Create a vector of 10 test scores
  2. Calculate the mean and standard deviation
  3. Create a data frame with student names and scores
  4. Use summary() to explore your data
# Your code here:
# scores <- c(85, 92, 78, 88, 95, 82, 90, 85, 89, 91)

Practice Time

Let’s review what we’ve covered so far:

  • We learned about R and RStudio:
    • R is a powerful programming language for statistics and data
    • RStudio provides a user-friendly environment with helpful features
    • Projects help keep our work organized
  • We practiced basic R operations:
    • Using R as a calculator
    • Creating variables with <-
    • Building vectors with c()
    • Working with different data types (numbers, text, TRUE/FALSE)
    • Creating data frames to organize multiple variables
    • Installing and loading packages
    • Using the pipe operator |>
  • We started working with R Notebooks:
    • Combining text, code, and output
    • Creating code chunks
    • Running code and seeing results
    • Adding explanatory text
  • We learned about common pitfalls:
    • R is case sensitive
    • Different ways to write the same operation
    • How to get help when stuck

Now let’s put these skills into practice!

Take 10 minutes to:

  • Explore the RStudio interface:
    • Environment pane (where your data lives)
    • Console pane (where you can type commands)
    • Source pane (where you write your code)
      • Try the Outline button
      • Use the Preview button
      • Practice with the Run button
  • Create and run your own code chunks
  • Try some basic R commands
  • Ask questions - we’re here to help!
