Perform an inner join between the customers and orders datasets.
Q1 <- inner_join(customers, orders, by = "customer_id")
print(Q1)
## # A tibble: 4 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
Perform a left join with customers as the left table and orders as the right table.
Q2 <- left_join(customers, orders, by = "customer_id")
print(Q2)
## # A tibble: 6 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
## 5 4 David Houston NA <NA> NA
## 6 5 Eve Phoenix NA <NA> NA
Perform a right join with customers as the left table and orders as the right table.
Q3 <- right_join(customers, orders, by = "customer_id")
print(Q3)
## # A tibble: 6 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
## 5 6 <NA> <NA> 105 Camera 600
## 6 7 <NA> <NA> 106 Printer 150
Perform a full join between customers and orders.
Q4 <- full_join(customers, orders, by = "customer_id")
print(Q4)
## # A tibble: 8 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
## 5 4 David Houston NA <NA> NA
## 6 5 Eve Phoenix NA <NA> NA
## 7 6 <NA> <NA> 105 Camera 600
## 8 7 <NA> <NA> 106 Printer 150
Perform a semi join with customers as the left table and orders as the right table.
Q5 <- semi_join(customers, orders, by = "customer_id")
print(Q5)
## # A tibble: 3 × 3
## customer_id name city
## <dbl> <chr> <chr>
## 1 1 Alice New York
## 2 2 Bob Los Angeles
## 3 3 Charlie Chicago
Perform an anti join with customers as the left table and orders as the right table.
Q6 <- anti_join(customers, orders, by = "customer_id")
print (Q6)
## # A tibble: 2 × 3
## customer_id name city
## <dbl> <chr> <chr>
## 1 4 David Houston
## 2 5 Eve Phoenix
scenario_a <- full_join(customers, orders, by = "customer_id") %>%
select(customer_id, name, city) %>%
distinct()
scenario_b <- right_join(customers, orders, by = "customer_id") %>%
select(customer_id, name, city) %>%
distinct()
print(scenario_a)
## # A tibble: 7 × 3
## customer_id name city
## <dbl> <chr> <chr>
## 1 1 Alice New York
## 2 2 Bob Los Angeles
## 3 3 Charlie Chicago
## 4 4 David Houston
## 5 5 Eve Phoenix
## 6 6 <NA> <NA>
## 7 7 <NA> <NA>
print(scenario_b)
## # A tibble: 5 × 3
## customer_id name city
## <dbl> <chr> <chr>
## 1 1 Alice New York
## 2 2 Bob Los Angeles
## 3 3 Charlie Chicago
## 4 6 <NA> <NA>
## 5 7 <NA> <NA>
Create a summary that shows each customer’s name, city, total number of orders, and total amount spent. Include all customers, even those without orders. Hint: You’ll need to use a combination of joins and group_by/summarize operations.
summary_data <- customers %>%
left_join(orders, by = "customer_id") %>%
group_by(name, city) %>%
summarize(
total_orders = n_distinct(order_id, na.rm = TRUE),
total_amount_spent = sum(amount, na.rm = TRUE)
) %>%
ungroup()
print(summary_data)
## # A tibble: 5 × 4
## name city total_orders total_amount_spent
## <chr> <chr> <int> <dbl>
## 1 Alice New York 1 1200
## 2 Bob Los Angeles 2 2300
## 3 Charlie Chicago 1 300
## 4 David Houston 0 0
## 5 Eve Phoenix 0 0