LOAD LIBRARY:

✔️ The dplyr library is a powerful tool in R for data manipulation.

✔️ It provides various functions for data wrangling,particularly useful for handling and transforming data frames.

✔️ Here, dplyr is used to perform joins, filter rows, and select columns efficiently.

library(dplyr)

CREATING A DATA FRAME:

Three data frames, employees, departments, and projects, are created with different columns:

employees: Contains employee IDs, names, genders, and department IDs.

departments: Maps department IDs to department names.

projects: Contains project IDs, project names, and assigned employee IDs.The emp_id in projects indicates which employees are assigned to which projects. The missing (NA) value in emp_id for “Project D” represents an unassigned project.

employees <- data.frame(
  emp_id = c(1, 2, 3, 4, 5),
  emp_name = c("Alexa", "Barath", "Chandran", "Divya", "Elango"),
  gender = c("F", "M", "M", "F", "M"),
  dept_id = c(1, 1, 2, 2, 3)
)

departments <- data.frame(
  dept_id = c(1, 2, 3, 4),
  dept_name = c("HR", "Finance", "IT", "Marketing")
)

projects <- data.frame(
  proj_id = c(1, 2, 3, 4),
  proj_name = c("Project A", "Project B", "Project C", "Project D"),
  emp_id = c(1, 2, 2, NA)
)

A Inner Join: assigned_employees:

An inner join merges employees and projects where emp_id exists in both tables, capturing only employees assigned to projects. The select() function specifies the columns to keep: employee ID, employee name, and project name.

assigned_employees <- employees %>%
  inner_join(projects, by = "emp_id") %>%
  select(emp_id, emp_name, proj_name)
print(assigned_employees)
##   emp_id emp_name proj_name
## 1      1    Alexa Project A
## 2      2   Barath Project B
## 3      2   Barath Project C

Output: Only employees with project assignments are listed.

B.Left Join: all_employees_projects:

A left join retains all rows from employees and matches rows from projects based on emp_id. Unassigned employees appear with NA in proj_name.

all_employees_projects <- employees %>%
  left_join(projects, by = "emp_id") %>%
  select(emp_id, emp_name, proj_name)
print(all_employees_projects)
##   emp_id emp_name proj_name
## 1      1    Alexa Project A
## 2      2   Barath Project B
## 3      2   Barath Project C
## 4      3 Chandran      <NA>
## 5      4    Divya      <NA>
## 6      5   Elango      <NA>

Output: Displays all employees, with or without project assignments.

C.Right Join: all_projects_employees:

A right join keeps all rows from employees and attempts to match rows from projects on emp_id.

all_projects_employees <- projects %>%
  right_join(employees, by = "emp_id") %>%
  select(proj_id, proj_name, emp_name)
print(all_projects_employees)
##   proj_id proj_name emp_name
## 1       1 Project A    Alexa
## 2       2 Project B   Barath
## 3       3 Project C   Barath
## 4      NA      <NA> Chandran
## 5      NA      <NA>    Divya
## 6      NA      <NA>   Elango

Output: Lists all projects and employees, including NA for unassigned projects, as in “Project D.”

D.Full Join: complete_list:

A full join keeps all records from both tables, whether they have matching rows or not. This join is useful for getting a comprehensive view of all employees and projects, including unassigned employees and projects.

complete_list <- employees %>%
  full_join(projects, by = "emp_id") %>%
  select(emp_id, emp_name, proj_name)
print(complete_list)
##   emp_id emp_name proj_name
## 1      1    Alexa Project A
## 2      2   Barath Project B
## 3      2   Barath Project C
## 4      3 Chandran      <NA>
## 5      4    Divya      <NA>
## 6      5   Elango      <NA>
## 7     NA     <NA> Project D

Output: Shows all employees and projects, filling with NA where no match exists.

D. Inner Join with distinct(): assigned_employees_distinct:

This version of an inner join lists unique employees who are assigned to at least one project, using distinct() to remove duplicates.

assigned_employees_distinct <- employees %>%
  inner_join(projects, by = "emp_id") %>%
  distinct(emp_id, emp_name)
print(assigned_employees_distinct)
##   emp_id emp_name
## 1      1    Alexa
## 2      2   Barath

Output: Returns each assigned employee only once, regardless of the number of projects they work on.

F. Anti Join: unassigned_employees:

An anti join captures rows in employees that have no match in projects by emp_id. This is useful to identify employees without any project assignments.

unassigned_employees <- employees %>%
  anti_join(projects, by = "emp_id") %>%
  select(emp_id, emp_name)
print(unassigned_employees)
##   emp_id emp_name
## 1      3 Chandran
## 2      4    Divya
## 3      5   Elango

Output: Displays employees who are currently unassigned.