Q1 <- inner_join(
customers,
orders,
by = "customer_id"
)
print(Q1)
## # A tibble: 4 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
print(
paste(
nrow(Q1),
"rows in the result"
)
)
## [1] "4 rows in the result"
Inner join returns all rows from both the customers and orders tables where there is a match. Only customers with matching customer_id in both the customers table and the orders table are included. This means there are some customers without order information or some orders without customer information that are excluded.
Q2 <- left_join(
customers,
orders,
by = "customer_id"
)
print(Q2)
## # A tibble: 6 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
## 5 4 David Houston NA <NA> NA
## 6 5 Eve Phoenix NA <NA> NA
print(
paste(
nrow(Q2),
"rows in the result"
)
)
## [1] "6 rows in the result"
Left join returns all rows from the customers table and matching rows from the orders table. All customers are included (even those without order information). Orders without customer information are still excluded.
Q3 <- right_join(
customers,
orders,
by = "customer_id"
)
print(Q3)
## # A tibble: 6 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
## 5 6 <NA> <NA> 105 Camera 600
## 6 7 <NA> <NA> 106 Printer 150
print(
paste(
nrow(Q3),
"rows in the result"
)
)
## [1] "6 rows in the result"
print(
filter(
Q3,
is.na(name),
is.na(city)
)$customer_id
)
## [1] 6 7
Right join returns all rows from the orders table and matching rows from the customers table. All orders are included (even those without customer information). Customers without order information are still excluded. customer_ids with NULL for customer name and city are orders without customer information.
Q4 <- full_join(
customers,
orders,
by = "customer_id"
)
print(Q4)
## # A tibble: 8 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
## 5 4 David Houston NA <NA> NA
## 6 5 Eve Phoenix NA <NA> NA
## 7 6 <NA> <NA> 105 Camera 600
## 8 7 <NA> <NA> 106 Printer 150
print(
paste(
nrow(Q4),
"rows in the result"
)
)
## [1] "8 rows in the result"
print((
mutate(
Q4,
row_num = row_number()
) %>%
filter(
(is.na(name) & is.na(city)) |
(is.na(product) & is.na(amount))
)
)$row_num)
## [1] 5 6 7 8
Full join returns all rows when there is a match in either the customers or orders table. All rows from both tables are included (with NA where there is no match). Rows with NA have results from only one table.
Q5 <- semi_join(
customers,
orders,
by = "customer_id"
)
print(Q5)
## # A tibble: 3 × 3
## customer_id name city
## <dbl> <chr> <chr>
## 1 1 Alice New York
## 2 2 Bob Los Angeles
## 3 3 Charlie Chicago
print(
paste(
nrow(Q5),
"rows in the result"
)
)
## [1] "3 rows in the result"
Semi join returns all rows from the customers table where there is a match in the orders table. Only customers with order information are returned. Order information is not included in the result.
Q6 <- anti_join(
customers,
orders,
by = "customer_id"
)
print(Q6)
## # A tibble: 2 × 3
## customer_id name city
## <dbl> <chr> <chr>
## 1 4 David Houston
## 2 5 Eve Phoenix
print(Q6$name)
## [1] "David" "Eve"
Anti join returns all rows from the customers table where there is no match in the orders table. Only customers without order information are returned. This means David and Eve have not placed any orders.
Left join. Left join returns all rows from the customers table and matching rows from the orders table. All customers are included (even those without order information).
Right join. Right join returns all rows from the orders table and matching rows from the customers table. All orders are included (even those without customer information).
Q7A <- left_join(
customers,
orders,
by = "customer_id"
)
print(Q7A)
## # A tibble: 6 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
## 5 4 David Houston NA <NA> NA
## 6 5 Eve Phoenix NA <NA> NA
print((
group_by(
Q7A,
customer_id,
name
) %>%
summarise()
)$name)
## [1] "Alice" "Bob" "Charlie" "David" "Eve"
Q7B <- right_join(
customers,
orders,
by = "customer_id"
)
print(Q7B)
## # A tibble: 6 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
## 5 6 <NA> <NA> 105 Camera 600
## 6 7 <NA> <NA> 106 Printer 150
print((
filter(
Q7B,
!is.na(name)
) %>%
group_by(
customer_id,
name
) %>%
summarise()
)$name)
## [1] "Alice" "Bob" "Charlie"
Q8 <-
full_join(
customers,
orders,
by = "customer_id"
) %>%
group_by(
customer_id,
name,
city
) %>%
summarise(
total_orders = sum(!is.na(order_id)),
total_spent = sum(amount)
)
print(Q8)
## # A tibble: 7 × 5
## # Groups: customer_id, name [7]
## customer_id name city total_orders total_spent
## <dbl> <chr> <chr> <int> <dbl>
## 1 1 Alice New York 1 1200
## 2 2 Bob Los Angeles 2 2300
## 3 3 Charlie Chicago 1 300
## 4 4 David Houston 0 NA
## 5 5 Eve Phoenix 0 NA
## 6 6 <NA> <NA> 1 600
## 7 7 <NA> <NA> 1 150