q1 <- inner_join(customers, orders, by = "customer_id")
There are 4 rows
They do not have a match in the other table.
q1
## # A tibble: 4 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
q2 <- left_join(customers, orders, by = "customer_id")
There are 6 rows
Left join keeps all customers even if they haven’t placed any orders.
q2
## # A tibble: 6 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
## 5 4 David Houston NA <NA> NA
## 6 5 Eve Phoenix NA <NA> NA
q3 <- right_join(customers, orders, by = "customer_id")
There are 6 rows
The customer id with NULL for customer name and city exist in the orders table but not in the customers table.
q3
## # A tibble: 6 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
## 5 6 <NA> <NA> 105 Camera 600
## 6 7 <NA> <NA> 106 Printer 150
q4 <- full_join(customers, orders, by = "customer_id")
There are 8 rows
There was a customer with no order or an order with no matching customer.
q4
## # A tibble: 8 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
## 5 4 David Houston NA <NA> NA
## 6 5 Eve Phoenix NA <NA> NA
## 7 6 <NA> <NA> 105 Camera 600
## 8 7 <NA> <NA> 106 Printer 150
q5 <- semi_join(customers, orders, by = "customer_id")
There are 3 rows
Semi join returns customers who have orders but doesn’t show the order details, while inner join returns customers and order details.
q5
## # A tibble: 3 × 3
## customer_id name city
## <dbl> <chr> <chr>
## 1 1 Alice New York
## 2 2 Bob Los Angeles
## 3 3 Charlie Chicago
q6 <- anti_join(customers, orders, by = "customer_id")
People who have no orders in the order table.
Customers are registered but have not placed an order.
q6
## # A tibble: 2 × 3
## customer_id name city
## <dbl> <chr> <chr>
## 1 4 David Houston
## 2 5 Eve Phoenix
Use a left join to find all customers. It keeps every customer even those who have no matching order.
Use semi join to find only customers who have placed orders. It shows just the customers with at least one order.
q7 <- left_join(customers, orders, by = "customer_id"); semi_join(customers, orders, by = "customer_id")
## # A tibble: 3 × 3
## customer_id name city
## <dbl> <chr> <chr>
## 1 1 Alice New York
## 2 2 Bob Los Angeles
## 3 3 Charlie Chicago
q7
## # A tibble: 6 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
## 5 4 David Houston NA <NA> NA
## 6 5 Eve Phoenix NA <NA> NA
name, city, total number of orders, and total amount spent. Include all customers, even those without orders. Hint: You’ll need to use a combination of joins and group_by/summarize operations.
q8 <- customers %>%
left_join(orders, by = "customer_id") %>%
group_by(customer_id, city) %>%
summarize(total_orders = n(), total_spent = sum(amount, na.rm = TRUE)) %>%
arrange(desc(total_spent))
## `summarise()` has grouped output by 'customer_id'. You can override using the
## `.groups` argument.
q8
## # A tibble: 5 × 4
## # Groups: customer_id [5]
## customer_id city total_orders total_spent
## <dbl> <chr> <int> <dbl>
## 1 2 Los Angeles 2 2300
## 2 1 New York 1 1200
## 3 3 Chicago 1 300
## 4 4 Houston 1 0
## 5 5 Phoenix 1 0