2025-02-06

Slide 1: Title Slide

  • Title: “Introduction to dplyr: Data Manipulation in R”
  • Subtitle: “A powerful tool for data wrangling and analysis”
  • Your Name
  • Date

Slide 2: What is dplyr?

  • Definition:
    • dplyr is an R package for data manipulation, part of the tidyverse.
    • It provides a consistent and intuitive set of verbs for working with data frames.
  • Key Features:
    • Fast and efficient on data frames that fit in memory.
    • Works with the pipe operator (%>%), so a sequence of steps reads top to bottom.
    • Easy to read and write code.
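  • Sketch: a minimal illustration of the pipe point above, using the built-in mtcars dataset. The object names nested and piped are illustrative only; both forms are expected to give the same result.

```r
library(dplyr)

# Nested form: has to be read from the inside out
nested <- arrange(filter(mtcars, cyl == 4), desc(mpg))

# Piped form: the same two steps, read top to bottom
piped <- mtcars %>%
  filter(cyl == 4) %>%
  arrange(desc(mpg))

# Both forms run the same operations on the same data
identical(nested, piped)
```

  • Note: R 4.1 and later also ship a native pipe (|>) that works with dplyr verbs in the same way for simple chains like this one.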

Slide 3: Installing and Loading dplyr

  • Code:

    install.packages("dplyr")  # Install the package
    library(dplyr)             # Load the package
  • Note: dplyr is part of the tidyverse, so you can also install it via:

    install.packages("tidyverse")
    library(tidyverse)

Slide 4: Key dplyr Verbs

  • Core Functions:
    • select(): Select columns.
    • filter(): Filter rows based on conditions.
    • mutate(): Create or modify columns.
    • arrange(): Sort rows.
    • summarize(): Collapse rows into summary statistics (summarise() is an equivalent spelling).
    • group_by(): Group data for grouped operations.

Slide 5: Example Dataset

  • Dataset: Use a built-in dataset such as mtcars or iris; the examples in this deck use mtcars.

  • Code:

    data("mtcars")
    head(mtcars)

Slide 6: Selecting Columns with select()

  • Purpose: Choose specific columns.

  • Example:

    mtcars %>% select(mpg, hp, wt)
  • Output: Show the resulting subset of columns.

Slide 7: Filtering Rows with filter()

  • Purpose: Subset rows based on conditions.

  • Example:

    mtcars %>% filter(mpg > 20, cyl == 4)
  • Output: Show the filtered rows.

Slide 8: Creating New Columns with mutate()

  • Purpose: Add or modify columns.

  • Example:

    mtcars %>% mutate(kpl = mpg * 0.425)  # Convert mpg to km per liter
  • Output: Show the new column.

Slide 9: Sorting Data with arrange()

  • Purpose: Reorder rows.

  • Example:

    mtcars %>% arrange(desc(mpg))  # Sort by mpg in descending order
  • Output: Show the sorted data.

Slide 10: Aggregating Data with summarize()

  • Purpose: Compute summary statistics.

  • Example:

    mtcars %>% summarize(avg_mpg = mean(mpg), max_hp = max(hp))
  • Output: Show the summary.

Slide 11: Grouped Operations with group_by()

  • Purpose: Perform operations by group.

  • Example:

    mtcars %>% group_by(cyl) %>% summarize(avg_mpg = mean(mpg))
  • Output: Show the grouped summary.
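  • Sketch: a slightly fuller grouped summary, as one possible extension of the example above. The column names n_cars, avg_mpg and avg_hp are chosen here for illustration.

```r
library(dplyr)

by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarize(
    n_cars  = n(),        # number of rows (cars) in each cylinder group
    avg_mpg = mean(mpg),  # mean fuel economy per group
    avg_hp  = mean(hp)    # mean horsepower per group
  )

by_cyl  # one row per distinct value of cyl
```

  • Note: summarize() returns one row per group, so the result has as many rows as there are distinct cyl values.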

Slide 12: Combining Verbs with Pipes (%>%)

  • Purpose: Chain multiple operations.

  • Example:

    mtcars %>%
      filter(cyl == 6) %>%
      select(mpg, hp) %>%
      arrange(desc(mpg))
  • Output: Show the final result.

Slide 13: Joining Data Frames

  • Purpose: Combine datasets.

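  • Key Functions:
    • left_join(): keep every row of the first table, adding matches from the second.
    • right_join(): keep every row of the second table, adding matches from the first.
    • inner_join(): keep only rows with a match in both tables.
    • full_join(): keep all rows from both tables.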
  • Example:

    left_join(df1, df2, by = "key_column")
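  • Sketch: the call above needs two data frames to run. The toy tables below are hypothetical (their names, columns and values are made up for illustration) and show how the join types differ in which rows they keep.

```r
library(dplyr)

# Hypothetical toy tables sharing a column named key_column
df1 <- data.frame(key_column = c(1, 2, 3), name = c("a", "b", "c"))
df2 <- data.frame(key_column = c(2, 3, 4), score = c(20, 30, 40))

# All rows of df1; score is NA where df2 has no matching key (key 1)
left <- left_join(df1, df2, by = "key_column")

# Only keys present in both tables (2 and 3)
inner <- inner_join(df1, df2, by = "key_column")

# Every key from either table (1 through 4), with NA where a side is missing
full <- full_join(df1, df2, by = "key_column")
```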

Slide 14: Practical Tips

  • Tips:
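    • Handle missing data explicitly: na.omit() (base R) drops every row that contains an NA, while filter(!is.na(col)) drops rows only where a chosen column is missing.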
    • Combine dplyr with ggplot2 for visualization.
    • Practice with real-world datasets.
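  • Sketch: three common ways to deal with missing values, shown on the built-in airquality dataset (chosen here because mtcars has no missing values). The object names are illustrative only.

```r
library(dplyr)

# airquality has missing values in the Ozone and Solar.R columns
complete_rows <- na.omit(airquality)  # drop rows with an NA in any column

# Drop rows only where Ozone itself is missing
ozone_known <- airquality %>%
  filter(!is.na(Ozone))

# Or keep every row and skip NAs inside the summary instead
monthly <- airquality %>%
  group_by(Month) %>%
  summarize(avg_ozone = mean(Ozone, na.rm = TRUE))
```

  • Note: which approach fits depends on the analysis; dropping whole rows can discard usable values in other columns.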

Slide 15: Resources

  • Learning Resources:
  • Community: Posit Community (formerly RStudio Community), Stack Overflow.

Slide 16: Questions?

  • Title: “Any Questions?”
  • Contact Information (if applicable).