4 Rows
because, they did not have a match on the other table
Q1 <- inner_join(customers, orders, by = "customer_id")
head(Q1)
## # A tibble: 4 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
6 Rows
Because, all customers are included, even those without order information.
Q2 <- left_join(customers, orders, by = "customer_id")
head(Q2)
## # A tibble: 6 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
## 5 4 David Houston NA <NA> NA
## 6 5 Eve Phoenix NA <NA> NA
6 Rows
The customer_ids which have null are id 105 and 106
Q3 <- right_join(customers, orders, by = "customer_id")
head(Q3)
## # A tibble: 6 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
## 5 6 <NA> <NA> 105 Camera 600
## 6 7 <NA> <NA> 106 Printer 150
8 Rows
The rows where information is only from one table is that rows 5 and 6 are from the customers table and rows 7 and 8 are from the orders table
Q4 <- full_join(customers, orders, by = "customer_id")
head(Q4)
## # A tibble: 6 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
## 5 4 David Houston NA <NA> NA
## 6 5 Eve Phoenix NA <NA> NA
3 Rows
Only customers with order information are returned, but order data isn’t included.
Q5 <- semi_join(customers, orders, by = "customer_id")
head(Q5)
## # A tibble: 3 × 3
## customer_id name city
## <dbl> <chr> <chr>
## 1 1 Alice New York
## 2 2 Bob Los Angeles
## 3 3 Charlie Chicago
The customers that are in the result are customer 4(David) and customer 5(Eve)
It shows that only customers without order information are returned
Q6 <- anti_join(customers, orders, by = "customer_id")
head(Q6)
## # A tibble: 2 × 3
## customer_id name city
## <dbl> <chr> <chr>
## 1 4 David Houston
## 2 5 Eve Phoenix
Full Join, because it shows all customers and orders from both tables and shows customers that do not have orders and orders that do not have customer information.
Inner join, because it shows all the customers who have placed orders leaving out the ones who have not placed orders and the orders that do not have customer information.
Q7 <- full_join(customers, orders, by = "customer_id")
head(Q7)
## # A tibble: 6 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
## 5 4 David Houston NA <NA> NA
## 6 5 Eve Phoenix NA <NA> NA
Q7 <- inner_join(customers, orders, by = "customer_id")
head(Q7)
## # A tibble: 4 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
customer_summary <- customers %>%
left_join(orders, by = "customer_id") %>% # Ensure all customers are included
group_by(customer_id, name, city) %>%
summarize(
total_orders = n_distinct(order_id, na.rm = TRUE), # Count unique orders
total_spent = sum(amount, na.rm = TRUE), # Sum total amount spent
.groups = "drop"
) %>%
arrange(desc(total_spent)) # Optional: Sort by highest spenders
head(customer_summary)
## # A tibble: 5 × 5
## customer_id name city total_orders total_spent
## <dbl> <chr> <chr> <int> <dbl>
## 1 2 Bob Los Angeles 2 2300
## 2 1 Alice New York 1 1200
## 3 3 Charlie Chicago 1 300
## 4 4 David Houston 0 0
## 5 5 Eve Phoenix 0 0