Load the Datasets

library(dplyr)
# Dataset 1: Customers
customers <- tibble(
  customer_id = c(1, 2, 3, 4, 5),
  name = c("Alice", "Bob", "Charlie", "David", "Eve"),
  city = c("New York", "Los Angeles", "Chicago", "Houston", "Phoenix")
)

# Dataset 2: Orders
orders <- tibble(
  order_id = c(101, 102, 103, 104, 105, 106),
  customer_id = c(1, 2, 3, 2, 6, 7),
  product = c("Laptop", "Phone", "Tablet", "Desktop", "Camera", "Printer"),
  amount = c(1200, 800, 300, 1500, 600, 150)
)

Question 1

q1 <- customers %>%
  inner_join(orders, by = "customer_id")
print(q1)
## # A tibble: 4 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300
nrow(q1)
## [1] 4

A. There are four rows in the table

B. Customer id number 5 (Eve) and id number 4 (David) is not represented in the orders data set (did not buy anything) so it is removed

Question 2

q2 <- customers %>%
  left_join(orders, by = "customer_id")
print(q2)
## # A tibble: 6 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300
## 5           4 David   Houston           NA <NA>        NA
## 6           5 Eve     Phoenix           NA <NA>        NA
nrow(q2)
## [1] 6

A. Six rows in the table

B. The number of rows is larger than the inner join because all customers are included, regardless of whether they have matching orders

Question 3

q3 <- customers %>%
  right_join(orders, by = "customer_id")
print(q3)
## # A tibble: 6 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300
## 5           6 <NA>    <NA>             105 Camera     600
## 6           7 <NA>    <NA>             106 Printer    150
nrow(q3)
## [1] 6

A. Six rows in the table

B. Customer ids six and seven are null in the table, this is because their numbers are not within the orders data set

Question 4

q4 <- customers %>%
  full_join(orders, by = "customer_id")
print(q4)
## # A tibble: 8 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300
## 5           4 David   Houston           NA <NA>        NA
## 6           5 Eve     Phoenix           NA <NA>        NA
## 7           6 <NA>    <NA>             105 Camera     600
## 8           7 <NA>    <NA>             106 Printer    150
nrow(q4)
## [1] 8

A. Eight rows in the table

B. If a row has NULL in the orders column, that means the customer has not placed any orders. If a row has NULL in the customers column, that means there is an order that does not correspond to any existing customer

Question 5

q5 <- customers %>%
  semi_join(customers, orders, by = "customer_id")
print(q5)
## # A tibble: 5 × 3
##   customer_id name    city       
##         <dbl> <chr>   <chr>      
## 1           1 Alice   New York   
## 2           2 Bob     Los Angeles
## 3           3 Charlie Chicago    
## 4           4 David   Houston    
## 5           5 Eve     Phoenix
nrow(q5)
## [1] 5

A. Five rows in the table

B. The inner join would return all matching rows between the two tables and includes columns from both tables. A semi join only returns rows from the left table that have matches in the right table, without including columns from the right table.

Question 6

q6 <- customers %>%
  anti_join(orders, by = "customer_id")
print(q6)
## # A tibble: 2 × 3
##   customer_id name  city   
##         <dbl> <chr> <chr>  
## 1           4 David Houston
## 2           5 Eve   Phoenix
nrow(q6)
## [1] 2

A. Two rows in the table

B. The anti join would result in returning the customer ids for those who do not have an order (ID)

Question 7

q7a <- customers %>%
  full_join(orders, by = "customer_id")
print(q7a)
## # A tibble: 8 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300
## 5           4 David   Houston           NA <NA>        NA
## 6           5 Eve     Phoenix           NA <NA>        NA
## 7           6 <NA>    <NA>             105 Camera     600
## 8           7 <NA>    <NA>             106 Printer    150
q7b <- customers %>%
  right_join(orders, by = "customer_id")
print(q7b)
## # A tibble: 6 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300
## 5           6 <NA>    <NA>             105 Camera     600
## 6           7 <NA>    <NA>             106 Printer    150

A. Using left Join gives a full list of customers, including those with no orders (customers without orders will show NA for order details)

B. Using Inner Join will only include customers who have placed at least one order, as it shows only the matches between the two tables.