dplyr is an R package that provides a set of functions for manipulating data frames in a user-friendly way.
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
Creating the employees data frame, which contains emp_id, emp_name, gender, and dept_id as variables.
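The code for this step is not shown, so the following is only a minimal sketch of what it might look like. The column names come from the description above; the object name `employees` and every value in it are hypothetical and chosen purely for illustration.

```r
# Hypothetical employee records: only the column names are taken from the text,
# the values are illustrative assumptions.
employees <- data.frame(
  emp_id   = c(1, 2, 3, 4),
  emp_name = c("Asha", "Ben", "Chen", "Dana"),
  gender   = c("F", "M", "M", "F"),
  dept_id  = c(10, 20, 10, NA)  # NA: an employee not yet assigned to a department
)

str(employees)
```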
Creating the departments data frame, which contains dept_id and dept_name as variables.
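As with the employees table, the original code is missing here, so this is a hedged sketch. The department ids and names are hypothetical, and the second column is assumed to be called `dept_name`.

```r
# Hypothetical department lookup table (illustrative values only).
departments <- data.frame(
  dept_id   = c(10, 20, 30),
  dept_name = c("HR", "Finance", "IT")
)

str(departments)
```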
Creating the projects data frame, which contains proj_id, proj_name, and emp_id as variables.
projects <- data.frame(
  proj_id = c(1, 2, 3, 4),
  proj_name = c("Project A", "Project B", "Project C", "Project D"),
  emp_id = c(1, 2, 2, NA)  # Project D has no employee assigned
)
A semi join returns the rows of the employees table that have a match in the projects table, without adding any columns from the projects table.
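A short sketch of the semi join described above, using `semi_join()` from dplyr. The `projects` data frame is the one defined earlier; the `employees` values are hypothetical, and joining on `emp_id` is an assumption based on the shared column name.

```r
library(dplyr)

# Hypothetical employees (illustrative values only)
employees <- data.frame(
  emp_id   = c(1, 2, 3, 4),
  emp_name = c("Asha", "Ben", "Chen", "Dana"),
  gender   = c("F", "M", "M", "F"),
  dept_id  = c(10, 20, 10, NA)
)

# projects as defined in the text
projects <- data.frame(
  proj_id   = c(1, 2, 3, 4),
  proj_name = c("Project A", "Project B", "Project C", "Project D"),
  emp_id    = c(1, 2, 2, NA)
)

# Keep only employees that appear in projects; no projects columns are added
employees_with_projects <- semi_join(employees, projects, by = "emp_id")
print(employees_with_projects)
```

With this sample data the result contains employees 1 and 2 only. Employee 2 appears once even though two projects reference them, because a semi join filters rows rather than duplicating them.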
left_join() keeps every row of the employees table (the first table) and adds the matching columns from the projects table (the second table). Employees with no matching project get NA in the projects columns.
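The left join can be sketched as follows. Again, the `employees` values are hypothetical, `projects` is the data frame from the text, and `emp_id` is assumed to be the join key.

```r
library(dplyr)

# Hypothetical employees (illustrative values only)
employees <- data.frame(
  emp_id   = c(1, 2, 3, 4),
  emp_name = c("Asha", "Ben", "Chen", "Dana"),
  gender   = c("F", "M", "M", "F"),
  dept_id  = c(10, 20, 10, NA)
)

# projects as defined in the text
projects <- data.frame(
  proj_id   = c(1, 2, 3, 4),
  proj_name = c("Project A", "Project B", "Project C", "Project D"),
  emp_id    = c(1, 2, 2, NA)
)

# employees is the first (left) table, so all of its rows are kept
emp_left <- left_join(employees, projects, by = "emp_id")
print(emp_left)
```

With this sample data the result has five rows: employee 2 appears twice (once per project), while employees 3 and 4 have NA in `proj_id` and `proj_name`.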
right_join() keeps every row of the employees table (here the second table) and adds the matching columns from the projects table (the first table). Employees with no matching project get NA in the projects columns.
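A sketch of the right join with the argument order described above, that is, projects first and employees second. The `employees` values are hypothetical and `emp_id` is assumed to be the join key.

```r
library(dplyr)

# Hypothetical employees (illustrative values only)
employees <- data.frame(
  emp_id   = c(1, 2, 3, 4),
  emp_name = c("Asha", "Ben", "Chen", "Dana"),
  gender   = c("F", "M", "M", "F"),
  dept_id  = c(10, 20, 10, NA)
)

# projects as defined in the text
projects <- data.frame(
  proj_id   = c(1, 2, 3, 4),
  proj_name = c("Project A", "Project B", "Project C", "Project D"),
  emp_id    = c(1, 2, 2, NA)
)

# employees is the second (right) table, so all of its rows are kept
emp_right <- right_join(projects, employees, by = "emp_id")
print(emp_right)
```

With this sample data the result again has five rows, but the projects columns come first. Project D is dropped because its `emp_id` is NA and matches no employee.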