1 How to use this file (for students)

This is a self-contained lesson. Read the note under each heading, then copy the chunk and run it.

2 0) R as a calculator

R does everyday math. Trig functions expect angles in radians (π radians = 180°).

1 / 200 * 30                # divide, then multiply
## [1] 0.15
(59 + 73 + 2) / 3           # average of three numbers
## [1] 44.66667
sin(pi / 2)                 # sine of 90 degrees (pi/2 radians)
## [1] 1

Note: pi is built in (about 3.14159). Computers store most decimal numbers approximately, so a result can be extremely close to the exact mathematical value without being equal to it.
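A minimal sketch of both points in the note: converting degrees to radians before calling a trig function, and comparing decimals with a tolerance instead of ==. The helper name deg_to_rad is made up for this illustration and is not part of R.

```r
# deg_to_rad() is a hypothetical helper, not a built-in function
deg_to_rad <- function(deg) deg * pi / 180

sin(deg_to_rad(90))                  # 1, same as sin(pi / 2)

# Decimal fractions are stored approximately, so exact comparison can fail
0.1 + 0.2 == 0.3                     # FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))    # TRUE: equal within a small tolerance
```

When checking whether two computed decimals match, all.equal() is usually the safer choice.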

3 0A) Environment: list / remove / clear objects + mini plot

Your Environment is R’s working memory. Learn to list, delete, reset — then make a tiny plot.

rm(list = ls())             # start clean: remove everything from the Global Environment

# Create a few objects
x <- 2 + 2                  # preferred assignment operator in scripts
y = 2 + 2                   # also valid; less common in scripts
z <- 5

ls()                        # list object names (expect "x","y","z")
## [1] "x" "y" "z"

# Delete ONE object
rm(z)
exists("z")                 # should print FALSE
## [1] FALSE
ls()                        # only x and y remain
## [1] "x" "y"

# Delete SEVERAL at once
rm(x, y)
ls()                        # empty -> character(0)
## character(0)

# Bulk removal by name (a missing name gives a warning, not an error)
# rm(list = c("x","y","z"))

# Clear ALL, then keep only some
a <- 1; b <- 2; c <- 3
rm(list = ls()); ls()
## character(0)

a <- 1; b <- 2; c <- 3
rm(list = setdiff(ls(), c("a","c")))  # remove all EXCEPT 'a' and 'c'
ls()                                   # should show 'a' and 'c'
## [1] "a" "c"

# Tiny arithmetic + two mini plots (watch the Plots tab)
a <- 10; b <- 20; c <- a + b; c
## [1] 30
plot(1:5, 1:5)

plot(1:10, 1:10)
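rm() warns when asked to remove a name that does not exist. One way to avoid that warning, sketched here with made-up object names, is to keep only the names that ls() actually reports before removing them.

```r
# tmp_one and tmp_two are made-up object names for this sketch
tmp_one <- 1
tmp_two <- 2

to_remove <- c("tmp_one", "tmp_two", "tmp_missing")   # "tmp_missing" was never created

# Keep only the names that actually exist, then remove those
rm(list = intersect(to_remove, ls()))

exists("tmp_one")                    # FALSE
exists("tmp_two")                    # FALSE
```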

4 1) Objects and assignment

Objects are named boxes that store values. Use <- to assign.

x <- 3 * 4
x
## [1] 12

5 2) Vectors

A vector is a one-dimensional collection of values of the same type. Arithmetic operators and many R functions work element by element.

primes <- c(2, 3, 5, 7, 11, 13)  # c() combines values
primes * 2                       # element-wise multiply
## [1]  4  6 10 14 22 26
primes - 1
## [1]  1  2  4  6 10 12
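A short sketch of two consequences of that definition: summary functions take the whole vector at once, and mixing types forces every element to one common type.

```r
primes <- c(2, 3, 5, 7, 11, 13)

length(primes)          # 6
sum(primes)             # 41
sqrt(primes[1:2])       # square root of each of the first two elements

# Mixing types forces everything to one common type (here: character)
mixed <- c(1, "two", TRUE)
class(mixed)            # "character"
mixed                   # "1" "two" "TRUE"
```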

6 3) Comments and naming rules

Use # for notes. Good names help you later. R is case-sensitive (age ≠ AGE). Valid names start with a letter (or a period not followed by a digit), followed by letters, digits, underscores (_), or periods (.).

i_use_snake_case <- 5      # snake_case uses the _ character between words
CamelCaseExample <- 10     # CamelCase capitalizes words
some.people.use.periods <- 1

r_rocks <- 2^3; r_rocks    # "R_rocks" would be a different object name
## [1] 8
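A small sketch of the naming rules in action. The object names are made up for this illustration. make.names() is a base R function that shows how R repairs names that break the rules, and backticks let you use such a name unchanged.

```r
age <- 30
AGE <- 99
age == AGE                                   # FALSE: two separate objects

# make.names() shows how R would repair names that break the rules
make.names(c("ok_name", "2fast", "Student ID"))
# "ok_name" "X2fast" "Student.ID"

# Backticks let you use a non-standard name as-is
`Student ID` <- 1
`Student ID` + 1                             # 2
```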

7 4) Packages — install and load

Packages add functions. Install once; load every session.

install.packages("tidyverse")     # run once on your computer

library(tidyverse)                # load for this session (ggplot2, dplyr, tidyr, readr, etc.)
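To avoid reinstalling a package every time a script runs, one option is to install only when the package is missing. This is a sketch: install_if_missing is a hypothetical helper written for this lesson, and the demonstration call uses "stats", which ships with R, so nothing is downloaded.

```r
# install_if_missing() is a hypothetical helper written for this lesson
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  invisible(requireNamespace(pkg, quietly = TRUE))
}

install_if_missing("stats")        # ships with R, so nothing is downloaded
# install_if_missing("tidyverse")  # uncomment to install tidyverse only when absent
```

You still need library(tidyverse) in each session; the helper only covers the install step.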

8 5) Core tidyverse (quick reference)

ggplot2 — plots
dplyr — filter/select/mutate/summarise
tidyr — reshape (pivot_longer/pivot_wider)
readr — read flat files (CSV/TSV)
tibble — modern data frame
stringr / forcats — text & categories
purrr — map functions over lists/vectors
lubridate — dates/times

9 6) Importing data — web → save to folder → read local copy

Start from the web (fastest for class), then save a copy for later, then read the saved file. Optional: set a working directory for your lab folder.

# Optional: working directory (edit path for your machine)
# setwd("C:/Users/touseefh/OneDrive - Høgskolen i Innlandet/labs")
# getwd()  # check current working directory

# 1) Read directly from the web
url_students <- "https://raw.githubusercontent.com/hadley/r4ds/main/data/students.csv"
students_web <- readr::read_csv(url_students)
head(students_web)  # quick check
## # A tibble: 6 × 5
##   `Student ID` `Full Name`      favourite.food     mealPlan            AGE  
##          <dbl> <chr>            <chr>              <chr>               <chr>
## 1            1 Sunil Huffmann   Strawberry yoghurt Lunch only          4    
## 2            2 Barclay Lynn     French fries       Lunch only          5    
## 3            3 Jayendra Lyne    N/A                Breakfast and lunch 7    
## 4            4 Leon Rossini     Anchovies          Lunch only          <NA> 
## 5            5 Chidiegwu Dunkel Pizza              Breakfast and lunch five 
## 6            6 Güvenç Attila    Ice cream          Lunch only          6

# 2) Save a copy to your OneDrive "labs" folder
labs_dir <- "C:/Users/touseefh/OneDrive - Høgskolen i Innlandet/labs"
labs_csv <- file.path(labs_dir, "students.csv")

# create the folder if needed (harmless if it already exists)
if (!dir.exists(labs_dir)) dir.create(labs_dir, recursive = TRUE)

# 2A) Save the data already in R to disk
readr::write_csv(students_web, labs_csv)

# 2B) OR download straight to disk (alternative; leave commented for class)
# utils::download.file(url_students, destfile = labs_csv, mode = "wb")

file.exists(labs_csv)  # should return TRUE
## [1] TRUE

# 3) Read the SAVED LOCAL FILE (from your OneDrive folder)
students_local <- readr::read_csv(labs_csv)
head(students_local)
## # A tibble: 6 × 5
##   `Student ID` `Full Name`      favourite.food     mealPlan            AGE  
##          <dbl> <chr>            <chr>              <chr>               <chr>
## 1            1 Sunil Huffmann   Strawberry yoghurt Lunch only          4    
## 2            2 Barclay Lynn     French fries       Lunch only          5    
## 3            3 Jayendra Lyne    N/A                Breakfast and lunch 7    
## 4            4 Leon Rossini     Anchovies          Lunch only          <NA> 
## 5            5 Chidiegwu Dunkel Pizza              Breakfast and lunch five 
## 6            6 Güvenç Attila    Ice cream          Lunch only          6
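The save-then-read pattern above depends on a folder path that exists only on one machine. The sketch below repeats the same steps with a temporary folder so it runs anywhere. Assumptions: the folder name, file name, and data are made up, and base R's write.csv() and read.csv() are swapped in for the readr functions so the sketch needs no extra packages.

```r
# Assumption: a temporary folder stands in for the OneDrive "labs" folder
demo_dir <- file.path(tempdir(), "labs_demo")
demo_csv <- file.path(demo_dir, "demo.csv")

if (!dir.exists(demo_dir)) dir.create(demo_dir, recursive = TRUE)

# Made-up data, only for this illustration
demo_data <- data.frame(id = 1:3, food = c("Pizza", "Soup", "Rice"))

write.csv(demo_data, demo_csv, row.names = FALSE)  # base R counterpart of write_csv()
file.exists(demo_csv)                              # TRUE

demo_back <- read.csv(demo_csv)                    # base R counterpart of read_csv()
nrow(demo_back)                                    # 3
```

Building paths with file.path() rather than pasting strings keeps the separators correct on Windows, macOS, and Linux.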

10 7) Quick data checks

Always glance at the shape, names, and a few rows before analysis.

head(students_web)            # first 6 rows
## # A tibble: 6 × 5
##   `Student ID` `Full Name`      favourite.food     mealPlan            AGE  
##          <dbl> <chr>            <chr>              <chr>               <chr>
## 1            1 Sunil Huffmann   Strawberry yoghurt Lunch only          4    
## 2            2 Barclay Lynn     French fries       Lunch only          5    
## 3            3 Jayendra Lyne    N/A                Breakfast and lunch 7    
## 4            4 Leon Rossini     Anchovies          Lunch only          <NA> 
## 5            5 Chidiegwu Dunkel Pizza              Breakfast and lunch five 
## 6            6 Güvenç Attila    Ice cream          Lunch only          6
nrow(students_web); ncol(students_web)
## [1] 6
## [1] 5
names(students_web)
## [1] "Student ID"     "Full Name"      "favourite.food" "mealPlan"      
## [5] "AGE"
# str(students_web)           # structure (types & preview)
# dplyr::glimpse(students_web) # compact structure view
ls()                           # objects in your Environment
##  [1] "a"                       "b"                      
##  [3] "c"                       "CamelCaseExample"       
##  [5] "i_use_snake_case"        "labs_csv"               
##  [7] "labs_dir"                "primes"                 
##  [9] "r_rocks"                 "some.people.use.periods"
## [11] "students_local"          "students_web"           
## [13] "url_students"            "x"
# View(students_web)          # spreadsheet view (use in RStudio)
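A few more base R checks that pair well with the ones above, sketched on a small made-up data frame: dim() returns rows and columns in one call, sapply(..., class) lists each column's type, and tail() shows the last rows.

```r
# Made-up data frame, only for this illustration
pets <- data.frame(
  name = c("Rex", "Milo", "Luna"),
  age  = c(3, 5, 2)
)

dim(pets)               # 3 2  (rows, then columns)
sapply(pets, class)     # type of each column
tail(pets, 2)           # last 2 rows
```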

11 8) Cleaning: treat “N/A” and blanks as missing on read

Convert common missing markers to NA during import to avoid surprises later.

students_clean <- readr::read_csv(
  "https://raw.githubusercontent.com/hadley/r4ds/main/data/students.csv",
  na = c("N/A", "")
)

# Quick missingness check
sum(is.na(students_clean))       # total missing cells
## [1] 2
colSums(is.na(students_clean))   # missing cells by column
##     Student ID      Full Name favourite.food       mealPlan            AGE 
##              0              0              1              0              1
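Why the missing values matter later: many summary functions return NA as soon as one input is missing. A minimal sketch with made-up values:

```r
# Made-up values, only for this illustration
ages <- c(4, 5, NA, 6)

mean(ages)                   # NA: one missing value makes the result missing
mean(ages, na.rm = TRUE)     # 5: missing values dropped before averaging
sum(is.na(ages))             # 1 missing value
```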

12 9) Fixing data types (character → numeric)

Example: if AGE contains the word “five”, change it to 5 and convert the column to numeric. Guard with a name check so the knit won’t fail if AGE is absent.

if ("AGE" %in% names(students_clean)) {
  students_clean$AGE <- gsub("five", "5", students_clean$AGE)  # replace text with digits
  students_clean$AGE <- as.numeric(students_clean$AGE)         # convert to numeric
  summary(students_clean$AGE)                                  # sanity check
} else {
  "AGE column not found; skipping type conversion."
}
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     4.0     5.0     5.0     5.4     6.0     7.0       1
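Before replacing text by hand, it helps to find out which entries would fail to convert. The sketch below uses made-up values that mirror the AGE column. It tries the conversion quietly, then lists the entries that were present but did not become numbers.

```r
# Made-up values mirroring the AGE column: digits stored as text, one word, one NA
age_text <- c("4", "5", "five", NA)

# Try the conversion quietly, then list entries that were present but failed to convert
age_try <- suppressWarnings(as.numeric(age_text))
age_text[is.na(age_try) & !is.na(age_text)]     # "five"
```

Any value listed this way needs a fix such as the gsub() call above before as.numeric() can convert it.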

13 10) Multiple files at once (tag rows with source)

You can read several files and tag each row with its source. Keep this as a template.

# Remote example (needs internet access; run only if the URLs point to real CSVs)
sales_files <- c(
  "https://pos.it/r4ds-01-sales",
  "https://pos.it/r4ds-02-sales",
  "https://pos.it/r4ds-03-sales"
)
sales_all <- readr::read_csv(sales_files, id = "file")   # adds a 'file' column with the source
head(sales_all)

# Local folder example: read all CSVs in a folder
# files <- list.files("C:/path/to/folder", pattern = "\\.csv$", full.names = TRUE)
# sales_all <- readr::read_csv(files, id = "source_file")  # 'source_file' column holds each row's file path
# head(sales_all)
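To see what tagging rows with their source amounts to, here is a sketch that does the same job step by step. Assumptions: the folder, file names, and data are made up, read_tagged is a hypothetical helper, and base R's read.csv() is swapped in for readr so the sketch runs without extra packages.

```r
# Two small made-up CSV files in a temporary folder (file names are hypothetical)
sales_dir <- file.path(tempdir(), "sales_demo")
dir.create(sales_dir, showWarnings = FALSE)
write.csv(data.frame(n = 1:2), file.path(sales_dir, "jan.csv"), row.names = FALSE)
write.csv(data.frame(n = 3:5), file.path(sales_dir, "feb.csv"), row.names = FALSE)

files <- list.files(sales_dir, pattern = "\\.csv$", full.names = TRUE)

# Read one file and add a column holding the file it came from
read_tagged <- function(path) {
  df <- read.csv(path)
  df$file <- basename(path)
  df
}
sales_demo <- do.call(rbind, lapply(files, read_tagged))

nrow(sales_demo)            # 5 rows: 2 from jan.csv + 3 from feb.csv
table(sales_demo$file)      # rows per source file
```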

14 11) Other file formats (readr + friends)

Use the reader that matches your data.

# readr::read_csv2("data_semicolon.csv")        # semicolon ; as separator, comma as decimal mark
# readr::read_tsv("data.tsv")                   # tab \t
# readr::read_delim("data_pipe.txt", delim="|") # any delimiter
# readr::read_fwf("data_fwf.txt",
#   col_positions = readr::fwf_widths(c(3,5), col_names = c("col1","col2")))
# readr::read_table("data.txt")                 # whitespace
# readr::read_log("access.log")                 # web server logs

# Non-readr formats (install as needed)
# install.packages("readxl");  library(readxl)
# excel_df <- readxl::read_excel("data.xlsx", sheet = 1)
# install.packages("jsonlite"); library(jsonlite)
# json_df  <- jsonlite::fromJSON("data.json", flatten = TRUE)
# install.packages("arrow");    library(arrow)
# parquet_df <- arrow::read_parquet("data.parquet")

Lecture 3 — R & Tidyverse package Guide

Touseef Hameed

2025-08-28