R Basics — Optional On-Ramp

Complete this before the first module (Data Collection & Prep) if you are new to R

Author

Meredith

Published

June 3, 2026

About this file

This file is optional and ungraded. It exists for one reason: if you have never used R before, working through it before the first required file (1_Data_Collection_and_Prep.qmd) will make next week's practice much less stressful.

If you can already answer yes to all three of these, skip this file:

Do you know what <- does in R?
Do you know what a data frame is?
Have you used filter() or select() from the tidyverse before?

One important note before you start

Everything you practice here you will do again in 1_Data_Collection_and_Prep.qmd with real data and a real submission. This file is the warm-up.

Part 1 · Variables and data types

A variable stores a value so you can use it later. In R, you assign values with <- — read it as “gets.” x <- 10 means “x gets 10.”

x <- 10
y <- 5
z <- x + y
z

[1] 15

# starts a comment. R ignores everything after it on that line. Use comments to explain what your code does.
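A tiny sketch of the two ways comments usually appear: on a line of their own, or after code on the same line. The variable name total_minutes is only an illustration.

```r
# This whole line is a comment, so R skips it
total_minutes <- 30 + 15   # everything after the # on this line is ignored
total_minutes              # prints 45
```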

The three data types you will use most

# Numeric — any number
score <- 87.5

# Character — text, always in quotes
subject <- "Mathematics"

# Logical — TRUE or FALSE (always capitalized)
passed <- TRUE

# Check what type a variable is
class(score)

[1] "numeric"

class(subject)

[1] "character"

class(passed)

[1] "logical"

Why data types matter

When you load real data later, R sometimes imports numeric columns as character. Knowing how to check a type (class()) and fix it (as.numeric(), as.character()) is a data cleaning skill you will use often in later modules.
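A minimal sketch of the check-then-fix pattern, assuming some scores arrived as text. The vectors scores_text and scores_num are hypothetical, not course data.

```r
# Hypothetical scores that arrived as text, as can happen after importing a CSV
scores_text <- c("85", "92", "78")
class(scores_text)          # "character", so mean() would return NA with a warning

# Convert to numeric, then check again
scores_num <- as.numeric(scores_text)
class(scores_num)           # "numeric"
mean(scores_num)            # 85

# Text that is not a number becomes NA (R also prints a warning)
as.numeric(c("90", "absent"))
```

Checking with class() before and after a conversion is a quick way to confirm the fix worked.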

Practice 1

# Create a variable called 'my_score' with any number between 0 and 100.
my_score <- 67

# Create a variable called 'my_subject' with the name of a subject you
# teach or plan to teach.
my_subject <- "ESL"

# Print both variables.
print(my_score)

[1] 67

print(my_subject)

[1] "ESL"

Part 2 · Vectors

A vector is a sequence of values of the same type. Create one with c() — which stands for “combine.”

# Numeric vector — quiz scores for 5 students
scores <- c(85, 92, 78, 95, 88)
scores

[1] 85 92 78 95 88

# Character vector — student names
students <- c("Maya", "Jordan", "Sam", "Alex", "Riley")
students

[1] "Maya"   "Jordan" "Sam"    "Alex"   "Riley"

# How many items?
length(scores)

[1] 5
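Because every value in a vector must share one type, R quietly converts mixed values to a common type. A short sketch; the vector mixed is only an illustration.

```r
# Mixing numbers and text: R converts everything to character
mixed <- c(85, "absent", 92)
mixed             # "85" "absent" "92"
class(mixed)      # "character"

# Mixing numbers and logicals: TRUE becomes 1, FALSE becomes 0
c(1, TRUE, FALSE) # 1 1 0
```

This is one way a score column can end up as character after import: a single non-numeric entry is enough to change the type of the whole column.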

Working with vectors

mean(scores)      # average

[1] 87.6

median(scores)    # middle value

[1] 88

max(scores)       # highest

[1] 95

min(scores)       # lowest

[1] 78

sum(scores)       # total

[1] 438

# Access a specific item by position (R counts from 1, not 0)
scores[1]         # first item

[1] 85

scores[3]         # third item

[1] 78
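Square brackets also accept several positions at once. A sketch reusing the same quiz scores (recreated so the chunk runs on its own):

```r
scores <- c(85, 92, 78, 95, 88)

# Several positions at once with c()
scores[c(1, 3)]          # first and third items: 85 78

# A range of positions with :
scores[2:4]              # 92 78 95

# The last item, using length()
scores[length(scores)]   # 88

# There is no position 0 in R: this returns an empty vector
scores[0]
```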

Practice 2

# Create a numeric vector called 'weekly_hours' with 5 values between 1–20,
# representing hours 5 students spent on an LMS in one week.
weekly_hours <- c(5, 6, 7, 8, 9)

# Calculate the mean and max.
mean(weekly_hours)

[1] 7

max(weekly_hours)

[1] 9

# Access the second item.
weekly_hours[2]

[1] 6

Part 3 · Data frames

A data frame is like a spreadsheet — rows are observations (students, learners), columns are variables (scores, attendance, time spent).

student_data <- data.frame(
  student_id   = c(101, 102, 103, 104),
  name         = c("Maya", "Jordan", "Sam", "Alex"),
  quiz_score   = c(85, 92, 78, 95),
  time_on_task = c(25, 30, 20, 35)
)

student_data

  student_id   name quiz_score time_on_task
1        101   Maya         85           25
2        102 Jordan         92           30
3        103    Sam         78           20
4        104   Alex         95           35

Inspecting a data frame

nrow(student_data)      # number of rows

[1] 4

ncol(student_data)      # number of columns

[1] 4

glimpse(student_data)   # structure — names, types, first values

Rows: 4
Columns: 4
$ student_id   <dbl> 101, 102, 103, 104
$ name         <chr> "Maya", "Jordan", "Sam", "Alex"
$ quiz_score   <dbl> 85, 92, 78, 95
$ time_on_task <dbl> 25, 30, 20, 35

summary(student_data)   # summary statistics

   student_id        name             quiz_score     time_on_task  
 Min.   :101.0   Length:4           Min.   :78.00   Min.   :20.00  
 1st Qu.:101.8   Class :character   1st Qu.:83.25   1st Qu.:23.75  
 Median :102.5   Mode  :character   Median :88.50   Median :27.50  
 Mean   :102.5                      Mean   :87.50   Mean   :27.50  
 3rd Qu.:103.2                      3rd Qu.:92.75   3rd Qu.:31.25  
 Max.   :104.0                      Max.   :95.00   Max.   :35.00

Accessing columns

# Access a column with $
student_data$quiz_score

[1] 85 92 78 95

# Mean of a column
mean(student_data$quiz_score)

[1] 87.5

# Select specific columns with tidyverse select()
student_data |> select(name, quiz_score)

    name quiz_score
1   Maya         85
2 Jordan         92
3    Sam         78
4   Alex         95

Filtering rows

# Keep only students who scored above 85
high_scorers <- student_data |>
  filter(quiz_score > 85)

high_scorers

  student_id   name quiz_score time_on_task
1        102 Jordan         92           30
2        104   Alex         95           35
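filter() also accepts several conditions; separating them with a comma keeps only rows that meet all of them. A sketch with student_data recreated so the chunk runs on its own:

```r
library(dplyr)

student_data <- data.frame(
  student_id   = c(101, 102, 103, 104),
  name         = c("Maya", "Jordan", "Sam", "Alex"),
  quiz_score   = c(85, 92, 78, 95),
  time_on_task = c(25, 30, 20, 35)
)

# Both conditions must be TRUE: score above 80 AND under 35 minutes on task
student_data |>
  filter(quiz_score > 80, time_on_task < 35)   # Maya and Jordan

# Match text exactly with == (two equals signs, and the text in quotes)
student_data |>
  filter(name == "Sam")
```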

The pipe |>

filter(), select(), and glimpse() come from the tidyverse, which you load once per session with library(tidyverse). The pipe |> is built into R itself (version 4.1 and later) and means “take this, then do this.” You will use it constantly in the coming weeks. Read student_data |> filter(quiz_score > 85) as: “take student_data, then keep rows where quiz_score is above 85.”
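A sketch of what the pipe actually does: x |> f(y) is the same call as f(x, y), and pipes can be chained so each step feeds the next. student_data is recreated so the chunk runs on its own. Older code often uses %>% from the magrittr package, which behaves the same way in simple cases like these.

```r
library(dplyr)

student_data <- data.frame(
  student_id   = c(101, 102, 103, 104),
  name         = c("Maya", "Jordan", "Sam", "Alex"),
  quiz_score   = c(85, 92, 78, 95),
  time_on_task = c(25, 30, 20, 35)
)

# These two lines do the same thing
nested <- filter(student_data, quiz_score > 85)
piped  <- student_data |> filter(quiz_score > 85)
identical(nested, piped)   # TRUE

# Chaining: keep high scorers, then keep only two columns
student_data |>
  filter(quiz_score > 85) |>
  select(name, quiz_score)
```

Reading a chain top to bottom, one step per line, is usually easier than unpacking nested calls from the inside out.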

Practice 3

# Create a data frame called 'my_class' with 5 rows:
#   - student_id (any numbers)
#   - subject (any subject name)
#   - score (numbers between 60–100)
#   - attended (TRUE or FALSE)

my_class <- data.frame(
  student_id = c(12345, 54321, 13524, 42531),
  subject    = c("ESL", "Math", "Science", "ELD"),
  score      = c(55, 70, 85, 100),
  attended   = c(TRUE, TRUE, TRUE, TRUE)
)

# Use glimpse() to inspect it.
glimpse(my_class)

Rows: 4
Columns: 4
$ student_id <dbl> 12345, 54321, 13524, 42531
$ subject    <chr> "ESL", "Math", "Science", "ELD"
$ score      <dbl> 55, 70, 85, 100
$ attended   <lgl> TRUE, TRUE, TRUE, TRUE

# Filter to keep only students with score above 75.
my_class |> filter(score > 75)

  student_id subject score attended
1      13524 Science    85     TRUE
2      42531     ELD   100     TRUE

# Select only student_id and score.
my_class |> select(student_id, score)

  student_id score
1      12345    55
2      54321    70
3      13524    85
4      42531   100

Part 4 · Loading data — a preview

In the next module, you will load a real CSV file. Here is the syntax so it is not new when you see it:

# This chunk is set to eval: false, so it does NOT run; it is just for reading.
# In the next module you will replace the file name with a real one.

library(tidyverse)   # provides read_csv() and glimpse()
data <- read_csv("data/your_file_name.csv")

glimpse(data)
head(data)
nrow(data)
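If you want to try this pattern before you have a real file, here is a sketch that writes a tiny made-up CSV to a temporary location and reads it back. The two-student table and the tempfile() path are placeholders, not course data; show_col_types = FALSE (readr 2.0 and later) only silences readr's column-type message.

```r
library(readr)
library(dplyr)   # for glimpse()

# Write a tiny made-up table to a temporary CSV file
tmp <- tempfile(fileext = ".csv")
write_csv(
  data.frame(student_id = c(101, 102), quiz_score = c(85, 92)),
  tmp
)

# Read it back the same way you would read a real course file
data <- read_csv(tmp, show_col_types = FALSE)

glimpse(data)
head(data)
nrow(data)   # 2
```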

read_csv() vs read.csv()

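Older R tutorials use read.csv(), which is built into base R. This course uses read_csv() from the tidyverse (the readr package): it is faster, more consistent, and returns a tibble, the tidyverse version of a data frame. When you see read.csv() in an online example, you can usually swap it for read_csv().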
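A quick sketch of the difference, reading the same small made-up CSV both ways. The file is a temporary placeholder created inside the chunk.

```r
library(readr)

# Write a small made-up CSV to a temporary file
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(name = c("Maya", "Sam"), score = c(85, 78)),
          tmp, row.names = FALSE)

base_version <- read.csv(tmp)                          # base R
tidy_version <- read_csv(tmp, show_col_types = FALSE)  # readr (tidyverse)

class(base_version)   # "data.frame"
class(tidy_version)   # includes "tbl_df" and "data.frame": a tibble
```

Because a tibble is still a data frame, nrow(), $, filter(), and select() work on either result.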

You are ready for the next module

If you worked through all four parts and the three practice sections, you are ready for 1_Data_Collection_and_Prep.qmd. Here is what will be new there:

Loading and cleaning a real educational dataset
Handling missing values with drop_na() and replace_na()
Creating new variables with mutate()
Sorting data with arrange()
Chaining operations with the pipe |>
Your first scatter plot with ggplot2

No submission needed

This file is optional and ungraded. You do not need to render or submit anything. When you are ready, open 1_Data_Collection_and_Prep.qmd.

Want to keep this as a reference?

If you would like to publish this file as part of your e-portfolio to show your R learning journey, you can.

Render

Click the Render button in the toolbar above. A formatted HTML page will appear in your Viewer tab or a new browser window.

Publish options

Choose any method that fits your portfolio:

Option	Best for	Link
Posit Cloud	Quickest — one click from your workspace	Guide
RPubs	Free, public, easy to share a link	rpubs.com
Quarto Pub	Clean public portfolio pages	Guide
GitHub Pages	Best for a professional portfolio	Guide

E-portfolio tip

If you are building an e-portfolio, publishing all four .qmd files (this one plus the three required files) tells a complete story — from your first R steps to a full learning analytics capstone. A viewer can follow your progression across the semester.

If you have any questions or run into technical issues, post in the course discussion board or contact your instructor.