Slide 1: Title Slide
- Title: “Introduction to dplyr: Data Manipulation in R”
- Subtitle: “A powerful tool for data wrangling and analysis”
- Your Name
- Date: 2025-02-06

Slide 2: What is dplyr?
- dplyr is an R package for data manipulation, part of the tidyverse.
- Operations are chained together with the pipe operator (%>%).

Slide 3: Installing dplyr
Code:
install.packages("dplyr")  # Install the package (once per machine)
library(dplyr)             # Load the package (once per R session)
Note: dplyr is part of the tidyverse, so you can also install and load it via:
install.packages("tidyverse")
library(tidyverse)         # Also attaches dplyr

Slide 4: Key dplyr Verbs
- select(): Select columns.
- filter(): Filter rows based on conditions.
- mutate(): Create or modify columns.
- arrange(): Sort rows.
- summarize(): Aggregate data.
- group_by(): Group data for grouped operations.

Slide 5: Example Dataset
Dataset: Use a built-in dataset such as mtcars or iris.
Code:
data("mtcars")
head(mtcars)  # Preview the first six rows

Slide 6: select()
Purpose: Choose specific columns.
Example:
mtcars %>% select(mpg, hp, wt)
Output: Show the resulting subset of columns.
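Besides listing the columns to keep, select() accepts a minus sign to drop a column and a new_name = old_name pair to rename while selecting. The following is a minimal sketch that assumes a reasonably recent dplyr. The name miles_per_gallon is purely illustrative:

```r
library(dplyr)

# Keep three columns (same as the slide example)
kept <- mtcars %>% select(mpg, hp, wt)

# Drop a column with a minus sign
without_cyl <- mtcars %>% select(-cyl)

# Rename while selecting: new_name = old_name (miles_per_gallon is illustrative)
renamed <- mtcars %>% select(miles_per_gallon = mpg, hp)

names(kept)     # "mpg" "hp" "wt"
names(renamed)  # "miles_per_gallon" "hp"
```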

Slide 7: filter()
Purpose: Subset rows based on conditions.
Example:
mtcars %>% filter(mpg > 20, cyl == 4)  # Comma-separated conditions are combined with AND
Output: Show the filtered rows.
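When filter() receives several conditions separated by commas, a row must satisfy all of them, so the commas act as a logical AND. To express OR, use |. To match any one of several values, use %in%. Here is a minimal sketch on mtcars:

```r
library(dplyr)

# Comma-separated conditions: both must hold
a <- mtcars %>% filter(mpg > 20, cyl == 4)
# The same filter written with an explicit &
b <- mtcars %>% filter(mpg > 20 & cyl == 4)

# OR: keep rows where at least one condition holds
fast_or_efficient <- mtcars %>% filter(hp > 200 | mpg > 30)

# %in%: keep 4- and 6-cylinder cars
small_engines <- mtcars %>% filter(cyl %in% c(4, 6))

identical(a, b)      # TRUE
nrow(small_engines)  # 18
```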

Slide 8: mutate()
Purpose: Add new columns or modify existing ones.
Example:
mtcars %>% mutate(kpl = mpg * 0.425) # Convert miles per (US) gallon to kilometers per liter (approximate)
Output: Show the new column.
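A single mutate() call can create several columns at once. Columns defined later in the call may refer to columns defined earlier in the same call, and reusing an existing name overwrites that column. The sketch below illustrates both behaviors. The name l_per_100km is illustrative, and the factor 0.425 is the approximate conversion used on the slide:

```r
library(dplyr)

fuel <- mtcars %>%
  mutate(
    kpl = mpg * 0.425,        # km per liter (approximate)
    l_per_100km = 100 / kpl,  # reuses kpl, defined just above
    hp = as.integer(hp)       # modifies an existing column in place
  ) %>%
  select(mpg, kpl, l_per_100km, hp)

head(fuel, 3)
```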

Slide 9: arrange()
Purpose: Reorder rows (ascending by default; wrap a column in desc() for descending order).
Example:
mtcars %>% arrange(desc(mpg)) # Sort by mpg in descending order
Output: Show the sorted data.
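You can pass several columns to arrange(). Rows are sorted by the first column, and each later column breaks ties in the columns before it. A minimal sketch that sorts by cylinder count and then by mpg in descending order within each count:

```r
library(dplyr)

# Sort by cyl (ascending), then by mpg (descending) within each cyl value
sorted <- mtcars %>%
  select(cyl, mpg) %>%
  arrange(cyl, desc(mpg))

head(sorted)
```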

Slide 10: summarize()
Purpose: Compute summary statistics.
Example:
mtcars %>% summarize(avg_mpg = mean(mpg), max_hp = max(hp))
Output: Show the summary.
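summarize() reduces the data to a single row when it is not grouped, and the helper n() counts the rows being summarized. dplyr also offers the British spelling summarise(), which gives the same result. Here is a minimal sketch of both:

```r
library(dplyr)

overall <- mtcars %>%
  summarize(
    n = n(),               # number of rows summarized
    avg_mpg = mean(mpg),
    sd_mpg = sd(mpg),
    max_hp = max(hp)
  )

# The British spelling summarise() gives the same result
overall_uk <- mtcars %>%
  summarise(n = n(), avg_mpg = mean(mpg), sd_mpg = sd(mpg), max_hp = max(hp))

overall
```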

Slide 11: group_by()
Purpose: Group rows so that subsequent operations are applied to each group separately.
Example:
mtcars %>% group_by(cyl) %>% summarize(avg_mpg = mean(mpg)) # Mean mpg for each cylinder count
Output: Show the grouped summary.
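A grouped summary produces one row per group. Adding n() reports the size of each group, and the .groups argument of summarize() (available since dplyr 1.0.0) controls whether the result remains grouped. group_by() also works with mutate(): there, every row keeps its place, but functions such as mean() are computed within each group. The sketch below shows both uses. The name mpg_vs_cyl_avg is illustrative:

```r
library(dplyr)

by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarize(
    n = n(),               # cars per cylinder count
    avg_mpg = mean(mpg),
    .groups = "drop"       # return an ungrouped result
  )

by_cyl
# Expect three rows (cyl = 4, 6, 8) with n = 11, 7, 14

# Grouped mutate(): each car's mpg relative to its cylinder group's mean
relative <- mtcars %>%
  group_by(cyl) %>%
  mutate(mpg_vs_cyl_avg = mpg - mean(mpg)) %>%
  ungroup()
```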

Slide 12: The Pipe Operator (%>%)
Purpose: Chain multiple operations; x %>% f(y) passes x as the first argument of f, i.e. f(x, y).
Example:
mtcars %>% filter(cyl == 6) %>% select(mpg, hp) %>% arrange(desc(mpg))
Output: Show the final result.
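Since x %>% f(y) is the same as f(x, y), you can rewrite the pipeline above as nested function calls, which are read from the inside out. The following minimal sketch checks that the two forms return identical results:

```r
library(dplyr)

# Piped: read left to right
piped <- mtcars %>%
  filter(cyl == 6) %>%
  select(mpg, hp) %>%
  arrange(desc(mpg))

# Nested: the same steps, read from the innermost call outward
nested <- arrange(select(filter(mtcars, cyl == 6), mpg, hp), desc(mpg))

identical(piped, nested)  # TRUE
```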

Slide 13: Joining Data
Purpose: Combine datasets that share a key column.
Key Functions:
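- left_join(): keep all rows of the first (left) dataset
- right_join(): keep all rows of the second (right) dataset
- inner_join(): keep only rows whose key appears in both datasets
- full_join(): keep all rows of both datasets
Example: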
left_join(df1, df2, by = "key_column")  # df1, df2 and key_column are placeholder names
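The slide example uses placeholder names. To make the differences between the join types visible, this sketch builds two small, made-up data frames. df1, df2, key_column, and all of their values are hypothetical:

```r
library(dplyr)

# Hypothetical example data sharing the column key_column
df1 <- data.frame(key_column = c(1, 2, 3), x = c("a", "b", "c"))
df2 <- data.frame(key_column = c(1, 3, 4), y = c(TRUE, FALSE, TRUE))

left  <- left_join(df1, df2, by = "key_column")   # all rows of df1; y is NA for key 2
right <- right_join(df1, df2, by = "key_column")  # all rows of df2; x is NA for key 4
inner <- inner_join(df1, df2, by = "key_column")  # only keys 1 and 3
full  <- full_join(df1, df2, by = "key_column")   # keys 1, 2, 3 and 4

nrow(left)   # 3
nrow(inner)  # 2
nrow(full)   # 4
```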

Slide 14: Tips and Resources
- Use na.omit() to drop rows that contain missing values (NA) in any column.
- Combine dplyr with ggplot2 for visualization.
- Read the dplyr documentation: ?dplyr
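na.omit() removes every row that has a missing value in any column. If only certain columns matter, filter() combined with !is.na() is more targeted. Here is a minimal sketch using a small, made-up data frame. The name scores and its values are hypothetical:

```r
library(dplyr)

# Hypothetical data with missing values in two different columns
scores <- data.frame(
  id    = 1:4,
  score = c(10, NA, 7, 9),
  note  = c("ok", "ok", NA, "ok")
)

complete_rows <- scores %>% na.omit()              # drops ids 2 and 3
has_score     <- scores %>% filter(!is.na(score))  # drops only id 2

complete_rows$id  # 1 4
has_score$id      # 1 3 4
```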