Data Wrangling with R

Author

Angelina Jao

Published

March 18, 2026

Note

This codebook follows the structure and content of the published codebook for the project: https://jmjung.quarto.pub/m05-1-principles-data-wrangling-with-tidyverse-in-r/

Overview

Learning Outcomes

By the end of this module, you will be able to:

  • Describe the concept of data wrangling.
  • Describe how tibbles differ from data frames.
  • Explain how to convert wide or long data to “tidy” data.
  • Explain how to merge relational data sets using join functions. (Next module)
  • Explain how to use grouped mutates and filters together.
  • Be familiar with the major dplyr functions for transforming data.
  • Create a new variable with mutate() and case_when().
  • Use the pipe operator to shape data in preparation for analysis and visualization.

The textbook chapters to cover

  • Ch3: Data Transformation
  • Ch5: Data Tidying
  • Ch13: Numbers

Introduction to Data Wrangling

Loading the packages

  • To add a code chunk, press Cmd + Option + I (macOS) or Ctrl + Alt + I (Windows/Linux).
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.0     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.1
✔ lubridate 1.9.5     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

What is Tidyverse?

The tidyverse is a collection of R packages that share a common design philosophy, grammar, and data structures, so they work together seamlessly. It includes packages for data manipulation, visualization, and modeling, among other tasks. Some of the core packages in the tidyverse include:

  • ggplot2: for data visualization
  • dplyr: for data manipulation
  • tidyr: for data tidying
  • readr: for data import
  • purrr: for functional programming
  • tibble: for tibbles, a modern take on data frames
  • stringr: for string manipulation
  • forcats: for working with categorical variables
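
As a rough illustration of how these packages fit together, the sketch below builds a small tibble and passes it through stringr and dplyr functions in one pipeline. The carriers data is made up for this example and is not part of the course material; the last two lines show one practical difference between a tibble and a base data frame.

```r
library(tidyverse)

# Hypothetical data, invented for illustration only
carriers <- tibble(
  code = c("AA", "DL", "UA"),
  name = c("american airlines", "delta air lines", "united air lines")
)

# stringr and dplyr used together in a single pipeline
carriers_clean <- carriers |>
  mutate(name = str_to_title(name)) |>  # stringr: capitalize each word
  filter(code != "UA")                  # dplyr: keep only some rows

carriers_clean

# Selecting one column with [ ] keeps a tibble a tibble,
# while a base data frame drops to a plain vector
class(carriers[, "code"])
class(as.data.frame(carriers)[, "code"])
```

Because every function takes a data frame as its first argument and returns one, the steps chain with the pipe without intermediate objects.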

Initial Data Preparation and Exploration

# install.packages("nycflights13")  # run once if the package is not installed
library(nycflights13)
data()  # list the datasets available in the loaded packages
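
A first look at the data could go along these lines. This is a minimal sketch, assuming nycflights13 is installed; it narrows the dataset listing to this one package and then inspects the flights table with glimpse() from the tidyverse.

```r
library(tidyverse)
library(nycflights13)

# Names of the datasets that ship with nycflights13 only
data(package = "nycflights13")$results[, "Item"]

# Compact overview: one line per column with its type and first values
glimpse(flights)

# Number of rows and columns
dim(flights)
```

glimpse() is often easier to read than printing the tibble when a table has more columns than fit on screen.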

Data Wrangling with dplyr

Note: Use Shift + Option + I to activate multiple cursors.

  • airlines: Airline names
  • airports: Airport metadata
  • flights: Flights data
  • planes: Plane metadata
  • weather: Hourly weather data
# Copy the packaged dataset into the global environment
flights <- flights

# Count flights per month, order the months by volume, and draw horizontal bars
flights |> 
  count(year, month) |> 
  mutate(month = as_factor(month)) |> 
  mutate(month = fct_reorder(month, n)) |> 
  ggplot(aes(month, n)) +
  geom_col(fill = "#c6891f") +  # one fixed bar color, so no fill mapping or legend is needed
  coord_flip() +
  labs(x = "Month",
       y = "# of Flights",
       title = "# of Flights by Month Departing New York City, 2013")
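
The counting step of the pipeline above can also be inspected as a table before anything is plotted. The following is a minimal sketch, assuming nycflights13 is installed; the names monthly_counts, n_flights, and month_name are introduced here for illustration only.

```r
library(tidyverse)
library(nycflights13)

monthly_counts <- flights |>
  count(month, name = "n_flights") |>       # one row per month
  mutate(month_name = month.abb[month]) |>  # 1 -> "Jan", 2 -> "Feb", ...
  arrange(desc(n_flights))                  # busiest month first

monthly_counts

# Sanity check: the monthly counts should add up to the full table
sum(monthly_counts$n_flights) == nrow(flights)
```

Checking the summarized table first makes it easier to spot a problem in the data step before it is hidden inside a chart.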