R Programming - Class 2: Packages, Data Import, and Tidyverse Basics

Author

Akash Mitra

🎯 Objectives

By the end of this class, you will be able to:

- Install and load R packages.
- Import .csv and .xlsx files into R.
- Understand what the tidyverse is and how to use it.
- Perform basic data manipulation using dplyr.

🗂️ Files associated with this class can be downloaded from the links:

🧭 Class Outline

1. 📦 Packages in R

- Packages extend R’s functionality.
- Use install.packages() to install a package.
- Use library() to load an installed package into your current session.

# Installing a package (run once; remove the leading # to install)
# install.packages("tidyverse")

# Loading a package (run in every new session)
library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

A package needs to be installed only once, but it must be loaded with library() each time you start a new R session.
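To make the "install once, load every session" idea concrete, here is a minimal sketch that checks which packages are still missing before installing anything. The helper name `missing_packages` is hypothetical (it is not part of any package), and the install line is left commented out because it needs internet access.

```r
# Return the packages from `pkgs` that are not yet installed.
# `missing_packages` is a hypothetical helper written for this example.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Packages used in this class
to_install <- missing_packages(c("tidyverse", "readxl"))
to_install

# Install only what is missing (remove the leading # to run)
# if (length(to_install) > 0) install.packages(to_install)
```

After installation, `library(tidyverse)` still has to be run at the start of each session.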

2. 📁 Importing Data

🔹 Import CSV

# Using base R (returns a data.frame)
data <- read.csv("data.csv")

# Using readr (part of the tidyverse; returns a tibble)
library(readr)
data <- read_csv("data.csv")

Rows: 4 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): name, department
dbl (2): age, score

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
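The message above offers two options: specify the column types, or set `show_col_types = FALSE`. The sketch below tries both. It assumes the four columns listed in the column specification and recreates the class data in a temporary file so that it runs without `data.csv`; the object name `students` is only illustrative.

```r
library(readr)

# Recreate the class dataset in a temporary file (assumption: same
# contents as data.csv, based on the output shown in these notes)
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "name,age,score,department",
  "John,25,80,Sales",
  "Sara,30,95,HR",
  "Alex,22,85,Sales",
  "Maya,28,90,Marketing"
), csv_path)

# Option 1: declare the column types explicitly
students <- read_csv(
  csv_path,
  col_types = cols(
    name = col_character(),
    age = col_double(),
    score = col_double(),
    department = col_character()
  )
)

# Option 2: let readr guess the types, but silence the message
students_quiet <- read_csv(csv_path, show_col_types = FALSE)
```

Declaring the types is a little more typing, but it makes the import fail loudly if a column ever arrives in an unexpected format.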

🔸 Import Excel

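# Requires the readxl package. It is installed along with the tidyverse,
# but library(tidyverse) does not load it, so load it explicitly.
# install.packages("readxl")
library(readxl)

data <- read_excel("data.xlsx")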
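Excel workbooks often contain several sheets, so it can help to list them before importing. The sketch below uses the example workbook that ships with readxl (located with `readxl_example()`) rather than the class file, so the sheet names are specific to that example workbook.

```r
library(readxl)

# Path to an example workbook bundled with readxl
xlsx_path <- readxl_example("datasets.xlsx")

# List the sheets in the workbook
sheets <- excel_sheets(xlsx_path)
sheets

# Read one sheet by name, keeping only the first 5 data rows
iris_head <- read_excel(xlsx_path, sheet = "iris", n_max = 5)
iris_head
```

With the class file, the same pattern would be `excel_sheets("data.xlsx")` followed by `read_excel("data.xlsx", sheet = ...)`.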

3. 🌐 Introduction to Tidyverse

The tidyverse is a collection of packages for data science. The core packages loaded by library(tidyverse) are:
- ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, forcats, lubridate

library(tidyverse)

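The tidyverse packages share a consistent, readable syntax: most functions that work on a data frame take the data as their first argument, which makes them easy to chain together.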

4. 🧹 Basic Data Manipulation with dplyr

🔍 View and inspect your data

head(data)

# A tibble: 4 × 4
  name    age score department
  <chr> <dbl> <dbl> <chr>     
1 John     25    80 Sales     
2 Sara     30    95 HR        
3 Alex     22    85 Sales     
4 Maya     28    90 Marketing

glimpse(data)

Rows: 4
Columns: 4
$ name       <chr> "John", "Sara", "Alex", "Maya"
$ age        <dbl> 25, 30, 22, 28
$ score      <dbl> 80, 95, 85, 90
$ department <chr> "Sales", "HR", "Sales", "Marketing"
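If you do not have the class files at hand, the same four rows can be typed in directly. This sketch rebuilds the dataset from the output shown above with `tribble()` and adds a few base R inspection functions alongside `glimpse()`.

```r
library(tibble)

# Rebuild the class dataset by hand (values copied from the output above)
data <- tribble(
  ~name,  ~age, ~score, ~department,
  "John",   25,     80, "Sales",
  "Sara",   30,     95, "HR",
  "Alex",   22,     85, "Sales",
  "Maya",   28,     90, "Marketing"
)

dim(data)      # number of rows and columns
names(data)    # column names
summary(data)  # quick summary of each column
glimpse(data)  # one line per column, with its type
```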

✨ Key `dplyr` functions

Function	Description
`select()`	Choose columns
`filter()`	Subset rows
`mutate()`	Create new variables
`arrange()`	Sort rows
`summarise()`	Aggregate values
`group_by()`	Group data

✅ Piping into the dataset:

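# Using the pipe operator: native '|>' (R 4.1 or later) or '%>%' (loaded with the tidyverse)
df <- read_excel("data.xlsx")

# Each step passes its result to the next one.
# arrange() is included only to show chaining; sorting does not change the mean.
df |> 
  arrange(desc(age)) |> 
  summarise(mean = mean(age))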

# A tibble: 1 × 1
   mean
  <dbl>
1  26.2
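To see what the pipe actually does, it helps to write the same steps without it. The sketch below compares nested calls with both pipe operators, using the class data rebuilt by hand; the filter on Sales is just an illustrative choice.

```r
library(dplyr)

# Class dataset rebuilt from the output shown in these notes
data <- tibble::tribble(
  ~name,  ~age, ~score, ~department,
  "John",   25,     80, "Sales",
  "Sara",   30,     95, "HR",
  "Alex",   22,     85, "Sales",
  "Maya",   28,     90, "Marketing"
)

# Nested calls are read from the inside out
nested <- summarise(filter(data, department == "Sales"), mean_age = mean(age))

# The native pipe reads top to bottom: take data, then filter, then summarise
piped <- data |>
  filter(department == "Sales") |>
  summarise(mean_age = mean(age))

# The magrittr pipe gives the same result here
piped_magrittr <- data %>%
  filter(department == "Sales") %>%
  summarise(mean_age = mean(age))

identical(nested$mean_age, piped$mean_age)  # TRUE
```

The pipe passes the left-hand side as the first argument of the function on the right, so all three versions do the same work.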

# Selecting columns
data %>% select(name, age)

# A tibble: 4 × 2
  name    age
  <chr> <dbl>
1 John     25
2 Sara     30
3 Alex     22
4 Maya     28
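`select()` can do more than list columns by name. A short sketch of a few common variations, again on the class data rebuilt by hand; the new column name `student` is only an example.

```r
library(dplyr)

data <- tibble::tribble(
  ~name,  ~age, ~score, ~department,
  "John",   25,     80, "Sales",
  "Sara",   30,     95, "HR",
  "Alex",   22,     85, "Sales",
  "Maya",   28,     90, "Marketing"
)

# Drop a column with a minus sign
no_dept <- data %>% select(-department)

# Keep only the numeric columns
numeric_only <- data %>% select(where(is.numeric))

# Rename a column while selecting it
renamed <- data %>% select(student = name, score)

names(no_dept)       # "name" "age" "score"
names(numeric_only)  # "age" "score"
names(renamed)       # "student" "score"
```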

# Filtering rows
data %>% filter(age > 25)

# A tibble: 2 × 4
  name    age score department
  <chr> <dbl> <dbl> <chr>     
1 Sara     30    95 HR        
2 Maya     28    90 Marketing
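`filter()` also accepts several conditions at once. In this sketch, conditions separated by commas must all be true, `|` means "or", and `%in%` matches any value from a set. The thresholds are arbitrary examples.

```r
library(dplyr)

data <- tibble::tribble(
  ~name,  ~age, ~score, ~department,
  "John",   25,     80, "Sales",
  "Sara",   30,     95, "HR",
  "Alex",   22,     85, "Sales",
  "Maya",   28,     90, "Marketing"
)

# Both conditions must hold (comma means AND)
both <- data %>% filter(age > 25, score >= 95)

# Either condition may hold
either <- data %>% filter(department == "Sales" | age >= 30)

# Match any value in a set
in_set <- data %>% filter(department %in% c("HR", "Marketing"))

both$name    # "Sara"
either$name  # "John" "Sara" "Alex"
in_set$name  # "Sara" "Maya"
```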

# Creating a new column (score as a fraction of 100)
data %>% mutate(score_percent = score / 100)

# A tibble: 4 × 5
  name    age score department score_percent
  <chr> <dbl> <dbl> <chr>              <dbl>
1 John     25    80 Sales               0.8 
2 Sara     30    95 HR                  0.95
3 Alex     22    85 Sales               0.85
4 Maya     28    90 Marketing           0.9
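Note that `mutate()` returns a new table and leaves `data` unchanged unless the result is assigned. The sketch below adds two columns in one call and stores the result; the column names `age_in_months` and `above_avg` are illustrative.

```r
library(dplyr)

data <- tibble::tribble(
  ~name,  ~age, ~score, ~department,
  "John",   25,     80, "Sales",
  "Sara",   30,     95, "HR",
  "Alex",   22,     85, "Sales",
  "Maya",   28,     90, "Marketing"
)

# Several new columns can be created in a single mutate() call
data_new <- data %>%
  mutate(
    age_in_months = age * 12,
    above_avg = score > mean(score)  # mean(score) uses the whole column
  )

names(data)      # the original still has 4 columns
names(data_new)  # the stored result has 6
```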

# Sorting
data %>% arrange(desc(score))

# A tibble: 4 × 4
  name    age score department
  <chr> <dbl> <dbl> <chr>     
1 Sara     30    95 HR        
2 Maya     28    90 Marketing 
3 Alex     22    85 Sales     
4 John     25    80 Sales
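`arrange()` can sort by more than one column: later columns break ties in earlier ones. A small sketch sorting by department first and then by score within each department:

```r
library(dplyr)

data <- tibble::tribble(
  ~name,  ~age, ~score, ~department,
  "John",   25,     80, "Sales",
  "Sara",   30,     95, "HR",
  "Alex",   22,     85, "Sales",
  "Maya",   28,     90, "Marketing"
)

# Sort by department (A to Z), then by score (low to high) within department
sorted <- data %>% arrange(department, score)

sorted$name  # "Sara" "Maya" "John" "Alex"
```

Wrapping a column in `desc()` reverses the order for that column only.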

# Grouping and summarizing
data %>%
  group_by(department) %>%
  summarise(avg_score = mean(score, na.rm = TRUE))

# A tibble: 3 × 2
  department avg_score
  <chr>          <dbl>
1 HR              95  
2 Marketing       90  
3 Sales           82.5
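`summarise()` can compute several statistics per group in one call. The sketch below counts rows with `n()`, adds two more summaries, and uses `.groups = "drop"` to return an ungrouped table; the summary column names are illustrative.

```r
library(dplyr)

data <- tibble::tribble(
  ~name,  ~age, ~score, ~department,
  "John",   25,     80, "Sales",
  "Sara",   30,     95, "HR",
  "Alex",   22,     85, "Sales",
  "Maya",   28,     90, "Marketing"
)

# Several summaries per department
dept_summary <- data %>%
  group_by(department) %>%
  summarise(
    n = n(),                  # number of rows in the group
    avg_score = mean(score),
    oldest = max(age),
    .groups = "drop"          # return an ungrouped tibble
  )
dept_summary

# count() is a shortcut for group_by() followed by summarise(n = n())
dept_counts <- data %>% count(department, sort = TRUE)
dept_counts
```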
