✔️ The dplyr library is a powerful tool in R for data manipulation.
✔️ It provides various functions for data wrangling,particularly useful for handling and transforming data frames.
✔️ Here, dplyr is used to perform joins, filter rows, and select columns efficiently.
Three data frames, employees, departments, and projects, are created with different columns:
employees: Contains employee IDs, names, genders, and department IDs.
departments: Maps department IDs to department names.
projects: Contains project IDs, project names, and assigned employee IDs.The emp_id in projects indicates which employees are assigned to which projects. The missing (NA) value in emp_id for “Project D” represents an unassigned project.
employees <- data.frame(
emp_id = c(1, 2, 3, 4, 5),
emp_name = c("Alexa", "Barath", "Chandran", "Divya", "Elango"),
gender = c("F", "M", "M", "F", "M"),
dept_id = c(1, 1, 2, 2, 3)
)
departments <- data.frame(
dept_id = c(1, 2, 3, 4),
dept_name = c("HR", "Finance", "IT", "Marketing")
)
projects <- data.frame(
proj_id = c(1, 2, 3, 4),
proj_name = c("Project A", "Project B", "Project C", "Project D"),
emp_id = c(1, 2, 2, NA)
)An inner join merges employees and projects where emp_id exists in both tables, capturing only employees assigned to projects. The select() function specifies the columns to keep: employee ID, employee name, and project name.
assigned_employees <- employees %>%
inner_join(projects, by = "emp_id") %>%
select(emp_id, emp_name, proj_name)
print(assigned_employees)## emp_id emp_name proj_name
## 1 1 Alexa Project A
## 2 2 Barath Project B
## 3 2 Barath Project C
Output: Only employees with project assignments are listed.
A left join retains all rows from employees and matches rows from projects based on emp_id. Unassigned employees appear with NA in proj_name.
all_employees_projects <- employees %>%
left_join(projects, by = "emp_id") %>%
select(emp_id, emp_name, proj_name)
print(all_employees_projects)## emp_id emp_name proj_name
## 1 1 Alexa Project A
## 2 2 Barath Project B
## 3 2 Barath Project C
## 4 3 Chandran <NA>
## 5 4 Divya <NA>
## 6 5 Elango <NA>
Output: Displays all employees, with or without project assignments.
A right join keeps all rows from employees and attempts to match rows from projects on emp_id.
all_projects_employees <- projects %>%
right_join(employees, by = "emp_id") %>%
select(proj_id, proj_name, emp_name)
print(all_projects_employees)## proj_id proj_name emp_name
## 1 1 Project A Alexa
## 2 2 Project B Barath
## 3 3 Project C Barath
## 4 NA <NA> Chandran
## 5 NA <NA> Divya
## 6 NA <NA> Elango
Output: Lists all projects and employees, including NA for unassigned projects, as in “Project D.”
A full join keeps all records from both tables, whether they have matching rows or not. This join is useful for getting a comprehensive view of all employees and projects, including unassigned employees and projects.
complete_list <- employees %>%
full_join(projects, by = "emp_id") %>%
select(emp_id, emp_name, proj_name)
print(complete_list)## emp_id emp_name proj_name
## 1 1 Alexa Project A
## 2 2 Barath Project B
## 3 2 Barath Project C
## 4 3 Chandran <NA>
## 5 4 Divya <NA>
## 6 5 Elango <NA>
## 7 NA <NA> Project D
Output: Shows all employees and projects, filling with NA where no match exists.
This version of an inner join lists unique employees who are assigned to at least one project, using distinct() to remove duplicates.
assigned_employees_distinct <- employees %>%
inner_join(projects, by = "emp_id") %>%
distinct(emp_id, emp_name)
print(assigned_employees_distinct)## emp_id emp_name
## 1 1 Alexa
## 2 2 Barath
Output: Returns each assigned employee only once, regardless of the number of projects they work on.
An anti join captures rows in employees that have no match in projects by emp_id. This is useful to identify employees without any project assignments.
unassigned_employees <- employees %>%
anti_join(projects, by = "emp_id") %>%
select(emp_id, emp_name)
print(unassigned_employees)## emp_id emp_name
## 1 3 Chandran
## 2 4 Divya
## 3 5 Elango
Output: Displays employees who are currently unassigned.