q1 <- inner_join(customers , orders )
## Joining with `by = join_by(customer_id)`
nrow(q1)
## [1] 4
Inner join returns all rows from both tables where there is a match.Therefore the rows not returned did not have a match in the other table.
head(q1)
## # A tibble: 4 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
q2 <- left_join(customers , orders )
## Joining with `by = join_by(customer_id)`
nrow(q2)
## [1] 6
Left join returns all rows from the left table and matching rows from the right table.
head(q2)
## # A tibble: 6 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
## 5 4 David Houston NA <NA> NA
## 6 5 Eve Phoenix NA <NA> NA
q3 <- right_join(customers , orders )
## Joining with `by = join_by(customer_id)`
nrow(q3)
## [1] 6
Customer_ids 6 and 7 because they don’t exist in customers. Right join returns all rows from the right table and matching rows from the left table.
head(q3)
## # A tibble: 6 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
## 5 6 <NA> <NA> 105 Camera 600
## 6 7 <NA> <NA> 106 Printer 150
q4 <- full_join(customers , orders )
## Joining with `by = join_by(customer_id)`
nrow(q4)
## [1] 8
Customers 4 and 5 have no orders. Orders with customer_id 6 and 7 have no customer match. Full join returns all rows when there is a match in either left or right table.
head(q4)
## # A tibble: 6 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
## 5 4 David Houston NA <NA> NA
## 6 5 Eve Phoenix NA <NA> NA
q5 <- semi_join(customers , orders )
## Joining with `by = join_by(customer_id)`
nrow(q5)
## [1] 3
Semi join returns all rows from the left table where there is a match in the right table.
head(q5)
## # A tibble: 3 × 3
## customer_id name city
## <dbl> <chr> <chr>
## 1 1 Alice New York
## 2 2 Bob Los Angeles
## 3 3 Charlie Chicago
q6 <- anti_join(customers , orders )
## Joining with `by = join_by(customer_id)`
nrow(q6)
## [1] 2
It shows employees without salary information.
head(q6)
## # A tibble: 2 × 3
## customer_id name city
## <dbl> <chr> <chr>
## 1 4 David Houston
## 2 5 Eve Phoenix
Left join because it will show all the customers in the dataset even without orders.
Inner join because it will only show customers that have value in the orders table.
q7 <- left_join(customers , orders )
## Joining with `by = join_by(customer_id)`
q8 <- inner_join(customers , orders )
## Joining with `by = join_by(customer_id)`
head(q7)
## # A tibble: 6 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
## 5 4 David Houston NA <NA> NA
## 6 5 Eve Phoenix NA <NA> NA
head(q8)
## # A tibble: 4 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
summary_result <- customers %>%
left_join(orders, by = "customer_id") %>%
group_by(customer_id, name, city) %>%
summarise(
total_orders = n_distinct(order_id, na.rm = TRUE),
total_amount = sum(amount, na.rm = TRUE),
.groups = "drop"
)
head(summary_result)
## # A tibble: 5 × 5
## customer_id name city total_orders total_amount
## <dbl> <chr> <chr> <int> <dbl>
## 1 1 Alice New York 1 1200
## 2 2 Bob Los Angeles 2 2300
## 3 3 Charlie Chicago 1 300
## 4 4 David Houston 0 0
## 5 5 Eve Phoenix 0 0