R Basics — Optional On-Ramp

Complete this before the first module (Data Collection & Prep) if you are new to R

Author

Meredith

Published

June 3, 2026


About this file

This file is optional and ungraded. It exists for one reason: if you have never used R before, working through it before the first required file (Data Collection & Prep) will make next week’s practice significantly less stressful.

If you can already answer yes to all three of these, skip this file:

  • Do you know what <- does in R?
  • Do you know what a data frame is?
  • Have you used filter() or select() from the tidyverse before?

Note: One important note before you start

Everything you practice here you will do again in 1_Data_Collection_and_Prep.qmd with real data and a real submission. This file is the warm-up.


Part 1 · Variables and data types

A variable stores a value so you can use it later. In R, you assign values with <- — read it as “gets.” x <- 10 means “x gets 10.”

x <- 10
y <- 5
z <- x + y
z
[1] 15

# starts a comment. R ignores everything after it on that line. Use comments to explain what your code does.

The three data types you will use most

# Numeric — any number
score <- 87.5

# Character — text, always in quotes
subject <- "Mathematics"

# Logical — TRUE or FALSE (always capitalized)
passed <- TRUE

# Check what type a variable is
class(score)
[1] "numeric"
class(subject)
[1] "character"
class(passed)
[1] "logical"

Tip: Why data types matter

When you load real data later, R sometimes imports numeric columns as character. Knowing how to check a type (class()) and convert it (as.numeric(), as.character()) is a data cleaning skill you will use often in later modules.
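
As a small illustration of the tip above, the sketch below uses a made-up vector, raw_scores, to stand in for a column that was read in as text. The name and values are hypothetical and do not come from any course dataset.

```r
# Hypothetical scores that arrived as text instead of numbers
raw_scores <- c("85", "92", "78")
class(raw_scores)                  # "character"

# Convert the text to numbers, then check the type again
clean_scores <- as.numeric(raw_scores)
class(clean_scores)                # "numeric"

# Arithmetic now works
mean(clean_scores)                 # 85
```

Calling mean() on raw_scores directly would return NA with a warning, which is often the first sign that a column has the wrong type.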

Practice 1

# Create a variable called 'my_score' with any number between 0 and 100.
my_score <- 67

# Create a variable called 'my_subject' with the name of a subject you
# teach or plan to teach.
my_subject <- "ESL"

# Print both variables.
print(my_score)
[1] 67
print(my_subject)
[1] "ESL"

Part 2 · Vectors

A vector is a sequence of values of the same type. Create one with c() — which stands for “combine.”

# Numeric vector — quiz scores for 5 students
scores <- c(85, 92, 78, 95, 88)
scores
[1] 85 92 78 95 88
# Character vector — student names
students <- c("Maya", "Jordan", "Sam", "Alex", "Riley")
students
[1] "Maya"   "Jordan" "Sam"    "Alex"   "Riley" 
# How many items?
length(scores)
[1] 5

Working with vectors

mean(scores)      # average
[1] 87.6
median(scores)    # middle value
[1] 88
max(scores)       # highest
[1] 95
min(scores)       # lowest
[1] 78
sum(scores)       # total
[1] 438
# Access a specific item by position (R counts from 1, not 0)
scores[1]         # first item
[1] 85
scores[3]         # third item
[1] 78
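
Position-based access also works for more than one item at a time. The following is a minimal sketch that reuses the scores vector from above. It is an optional extra, not something the practice below requires.

```r
scores <- c(85, 92, 78, 95, 88)

# Several positions at once: pass a vector of positions
first_and_third <- scores[c(1, 3)]       # 85 78

# A run of neighbouring positions: use the colon operator
middle_three <- scores[2:4]              # 92 78 95

# The last item: length() gives the final position
last_item <- scores[length(scores)]      # 88
```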

Practice 2

# Create a numeric vector called 'weekly_hours' with 5 values between 1–20,
# representing hours 5 students spent on an LMS in one week.
weekly_hours <- c(5,6,7,8,9)

# Calculate the mean and max.
mean(weekly_hours)
[1] 7
max(weekly_hours)
[1] 9
# Access the second item.
weekly_hours[2]
[1] 6

Part 3 · Data frames

A data frame is like a spreadsheet — rows are observations (students, learners), columns are variables (scores, attendance, time spent).

student_data <- data.frame(
  student_id   = c(101, 102, 103, 104),
  name         = c("Maya", "Jordan", "Sam", "Alex"),
  quiz_score   = c(85, 92, 78, 95),
  time_on_task = c(25, 30, 20, 35)
)

student_data
  student_id   name quiz_score time_on_task
1        101   Maya         85           25
2        102 Jordan         92           30
3        103    Sam         78           20
4        104   Alex         95           35

Inspecting a data frame

library(tidyverse)      # provides glimpse(), select(), and filter()

nrow(student_data)      # number of rows
[1] 4
ncol(student_data)      # number of columns
[1] 4
glimpse(student_data)   # structure — names, types, first values
Rows: 4
Columns: 4
$ student_id   <dbl> 101, 102, 103, 104
$ name         <chr> "Maya", "Jordan", "Sam", "Alex"
$ quiz_score   <dbl> 85, 92, 78, 95
$ time_on_task <dbl> 25, 30, 20, 35
summary(student_data)   # summary statistics
   student_id        name             quiz_score     time_on_task  
 Min.   :101.0   Length:4           Min.   :78.00   Min.   :20.00  
 1st Qu.:101.8   Class :character   1st Qu.:83.25   1st Qu.:23.75  
 Median :102.5   Mode  :character   Median :88.50   Median :27.50  
 Mean   :102.5                      Mean   :87.50   Mean   :27.50  
 3rd Qu.:103.2                      3rd Qu.:92.75   3rd Qu.:31.25  
 Max.   :104.0                      Max.   :95.00   Max.   :35.00  

Accessing columns

# Access a column with $
student_data$quiz_score
[1] 85 92 78 95
# Mean of a column
mean(student_data$quiz_score)
[1] 87.5
# Select specific columns with tidyverse select()
student_data |> select(name, quiz_score)
    name quiz_score
1   Maya         85
2 Jordan         92
3    Sam         78
4   Alex         95

Filtering rows

# Keep only students who scored above 85
high_scorers <- student_data |>
  filter(quiz_score > 85)

high_scorers
  student_id   name quiz_score time_on_task
1        102 Jordan         92           30
2        104   Alex         95           35

Important: The pipe |>

filter() and select() are from the tidyverse (the dplyr package). The |> operator is R's native pipe, built into R 4.1 and later; it means “take this, then do this.” Older tidyverse code uses %>%, which works much the same way. You will use the pipe constantly in the coming weeks. Read student_data |> filter(quiz_score > 85) as: “take student_data, then keep rows where quiz_score is above 85.”
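
One way to see what the pipe does is to write the same step with and without it. The sketch below rebuilds student_data so it runs on its own. The names nested, piped, and top_names are illustrative only.

```r
library(dplyr)  # filter() and select() live in dplyr, part of the tidyverse

student_data <- data.frame(
  student_id   = c(101, 102, 103, 104),
  name         = c("Maya", "Jordan", "Sam", "Alex"),
  quiz_score   = c(85, 92, 78, 95),
  time_on_task = c(25, 30, 20, 35)
)

# Without the pipe: the data frame is written as the first argument
nested <- filter(student_data, quiz_score > 85)

# With the pipe: the left-hand side is passed in as the first argument
piped <- student_data |> filter(quiz_score > 85)

identical(nested, piped)   # TRUE

# Pipes can be chained: filter the rows, then keep two columns
top_names <- student_data |>
  filter(quiz_score > 85) |>
  select(name, quiz_score)
```

Both forms give the same result. The piped form tends to be easier to read once several steps are chained together.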

Practice 3

# Create a data frame called 'my_class' with 5 rows:
#   - student_id (any numbers)
#   - subject (any subject name)
#   - score (numbers between 60–100)
#   - attended (TRUE or FALSE)

my_class <- data.frame(
  student_id = c(12345, 54321, 13524, 42531, 31425),
  subject    = c("ESL", "Math", "Science", "ELD", "Art"),
  score      = c(65, 70, 85, 100, 90),
  attended   = c(TRUE, TRUE, FALSE, TRUE, TRUE)
)

# Use glimpse() to inspect it.
glimpse(my_class)
Rows: 5
Columns: 4
$ student_id <dbl> 12345, 54321, 13524, 42531, 31425
$ subject    <chr> "ESL", "Math", "Science", "ELD", "Art"
$ score      <dbl> 65, 70, 85, 100, 90
$ attended   <lgl> TRUE, TRUE, FALSE, TRUE, TRUE
# Filter to keep only students with score above 75.
my_class |> filter(score > 75)
  student_id subject score attended
1      13524 Science    85    FALSE
2      42531     ELD   100     TRUE
3      31425     Art    90     TRUE
# Select only student_id and score.
my_class |> select(student_id, score)
  student_id score
1      12345    65
2      54321    70
3      13524    85
4      42531   100
5      31425    90

Part 4 · Loading data — a preview

In the next module, you will load a real CSV file. Here is the syntax so it is not new when you see it:

# eval: false means this chunk will NOT run — it is just for reading.
# You will use this in later modules with a real file name.

data <- read_csv("data/your_file_name.csv")

glimpse(data)
head(data)
nrow(data)

Tip: read_csv() vs read.csv()

Older R tutorials use read.csv(). This course uses read_csv() from the tidyverse — faster and more consistent. When you see read.csv() in an online example, you can usually swap it for read_csv().
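
To compare the two functions side by side, the sketch below writes a tiny made-up CSV to a temporary file and reads it both ways. The file contents and the names tmp, base_df, and tidy_df are assumptions for illustration. In the course you will read a real file instead.

```r
library(readr)  # read_csv() comes from readr, part of the tidyverse

# Write a small made-up CSV to a temporary file so the example runs anywhere
tmp <- tempfile(fileext = ".csv")
writeLines(c("name,quiz_score", "Maya,85", "Jordan,92"), tmp)

base_df <- read.csv(tmp)                           # base R
tidy_df <- read_csv(tmp, show_col_types = FALSE)   # tidyverse

class(base_df)   # a plain data frame
class(tidy_df)   # a tibble, which is also a data frame

# The values are the same either way
base_df$quiz_score
tidy_df$quiz_score
```

The main visible difference is that read_csv() returns a tibble, which prints more compactly and shows each column's type.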


You are ready for the next module

If you worked through all four parts, you are ready for 1_Data_Collection_and_Prep.qmd. What will be new there:

  • Loading and cleaning a real educational dataset
  • Handling missing values with drop_na() and replace_na()
  • Creating new variables with mutate()
  • Sorting data with arrange()
  • Chaining operations with the pipe |>
  • Your first scatter plot with ggplot2
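
For a first look at a few of the functions listed above, here is a rough preview on made-up data. The data frame demo, the pass mark of 80, and the choice to fill a missing score with 0 are all assumptions for illustration, not the approach the next module necessarily takes.

```r
library(dplyr)  # mutate(), arrange()
library(tidyr)  # drop_na(), replace_na()

# Made-up data with one missing quiz score
demo <- data.frame(
  name       = c("Maya", "Jordan", "Sam"),
  quiz_score = c(85, NA, 78)
)

# Option A: drop the row with the missing score, add a new column,
# then sort from highest to lowest score
cleaned <- demo |>
  drop_na(quiz_score) |>
  mutate(passed = quiz_score >= 80) |>   # hypothetical pass mark
  arrange(desc(quiz_score))

# Option B: keep every row and fill the missing score with 0
filled <- demo |>
  replace_na(list(quiz_score = 0))
```

Whether to drop or fill missing values depends on the analysis; the sketch only shows that both are short pipe steps.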

Note: No submission needed

This file is optional and ungraded. Nothing to render or submit. When you are ready, open 1_Data_Collection_and_Prep.qmd.


Want to keep this as a reference?

This file is optional and ungraded — there is no submission required. But if you would like to publish it as part of your e-portfolio to show your R learning journey, you can.

Render

Click the Render button in the toolbar above. A formatted HTML page will appear in your Viewer tab or a new browser window.

Publish options

Choose any method that fits your portfolio:

Option       | Best for                                  | Link
Posit Cloud  | Quickest (one click from your workspace)  | Guide
RPubs        | Free, public, easy to share a link        | rpubs.com
Quarto Pub   | Clean public portfolio pages              | Guide
GitHub Pages | Best for a professional portfolio         | Guide

Tip: E-portfolio tip

If you are building an e-portfolio, publishing all four .qmd files (this one plus the three required files) tells a complete story — from your first R steps to a full learning analytics capstone. A viewer can follow your progression across the semester.

If you have any questions or run into technical issues, post in the course discussion board or contact your instructor.