Q1 <- inner_join(customers, orders)
## Joining with `by = join_by(customer_id)`
The resulting data has 4 rows.
Because the customers and orders that are not included did not have a match in the other table.
## # A tibble: 4 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
Q2 <- left_join(customers , orders)
## Joining with `by = join_by(customer_id)`
The resulting data has 6 rows.
Because the left join keeps all customers and the customers without orders appear with NA values.
## # A tibble: 6 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
## 5 4 David Houston NA <NA> NA
## 6 5 Eve Phoenix NA <NA> NA
Q3 <- right_join(customers , orders)
## Joining with `by = join_by(customer_id)`
The resulting data has 6 rows.
Orders exist for 6 and 7, but no matching customers because right join keeps all rows from orders.
## # A tibble: 6 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
## 5 6 <NA> <NA> 105 Camera 600
## 6 7 <NA> <NA> 106 Printer 150
Q4 <- full_join(customers , orders)
## Joining with `by = join_by(customer_id)`
The resulting data has 8 rows.
The full join returns 8 rows because it includes all customers and all orders, with customers 4 and 5 having no orders and orders 6 and 7 having no customers.
## # A tibble: 6 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
## 5 4 David Houston NA <NA> NA
## 6 5 Eve Phoenix NA <NA> NA
Q5 <- semi_join(customers , orders)
## Joining with `by = join_by(customer_id)`
The resulting data has 3 rows.
The semi join returns 3 rows because it includes only customers 1,2, and 3 who have at least one order and does not include order details like the inner join.
## # A tibble: 3 × 3
## customer_id name city
## <dbl> <chr> <chr>
## 1 1 Alice New York
## 2 2 Bob Los Angeles
## 3 3 Charlie Chicago
Q6 <- anti_join(customers , orders)
## Joining with `by = join_by(customer_id)`
The anti join returns customers 4 (David) and 5(Eve), showing that these customers have not placed any orders.
The result tells us that David and Eve do not have any matching records in the orders table, meaning that they have not placed any orders.
## # A tibble: 2 × 3
## customer_id name city
## <dbl> <chr> <chr>
## 1 4 David Houston
## 2 5 Eve Phoenix
I would used a left join because it keeps all customers even if they do not have matching orders.
I would use a semi join because it returns only customers who have at least one matching order.
Q7 <- left_join(customers, orders)
## Joining with `by = join_by(customer_id)`
Q7_orders_only <- semi_join(customers, orders, by = "customer_id")
## # A tibble: 3 × 3
## customer_id name city
## <dbl> <chr> <chr>
## 1 1 Alice New York
## 2 2 Bob Los Angeles
## 3 3 Charlie Chicago
## # A tibble: 6 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
## 5 4 David Houston NA <NA> NA
## 6 5 Eve Phoenix NA <NA> NA
## # A tibble: 5 × 5
## customer_id name city total_orders total_spent
## <dbl> <chr> <chr> <int> <dbl>
## 1 1 Alice New York 1 1200
## 2 2 Bob Los Angeles 2 2300
## 3 3 Charlie Chicago 1 300
## 4 4 David Houston 0 0
## 5 5 Eve Phoenix 0 0