q1<- inner_join(customers,orders)
## Joining with `by = join_by(customer_id)`
There are 4 rows
It is showing all of the orders that had a customer in the customer table. If a customer did not make an order or if they made and order but they are not in the custumer table it will not be included.
head(q1)
## # A tibble: 4 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
q2<-left_join(customers,orders)
## Joining with `by = join_by(customer_id)`
There are 6 rows.
This displays all data from the customers table regardless whether or not they made an order or not.
head(q2)
## # A tibble: 6 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
## 5 4 David Houston NA <NA> NA
## 6 5 Eve Phoenix NA <NA> NA
q3<- right_join(customers, orders)
## Joining with `by = join_by(customer_id)`
There are 6 rows.
Customer IDs 6 and 7 have null for customer name and city because they are not included in the customers table.
head(q3)
## # A tibble: 6 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
## 5 6 <NA> <NA> 105 Camera 600
## 6 7 <NA> <NA> 106 Printer 150
q4<- full_join(customers, orders)
## Joining with `by = join_by(customer_id)`
There are 8 rows.
Rows 5, 6, 7, and 8 all have information from only one table because their customerid is not included in the other table.
head(q4)
## # A tibble: 6 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
## 5 4 David Houston NA <NA> NA
## 6 5 Eve Phoenix NA <NA> NA
q5<- semi_join(customers, orders)
## Joining with `by = join_by(customer_id)`
There are 3 rows
It includes all the customers that made an order but it does not show the order data.
head(q5)
## # A tibble: 3 × 3
## customer_id name city
## <dbl> <chr> <chr>
## 1 1 Alice New York
## 2 2 Bob Los Angeles
## 3 3 Charlie Chicago
q6<- anti_join(customers, orders)
## Joining with `by = join_by(customer_id)`
David and Eve (customers 4 and 5)
It tells us that these customers did not make orders.
head(q6)
## # A tibble: 2 × 3
## customer_id name city
## <dbl> <chr> <chr>
## 1 4 David Houston
## 2 5 Eve Phoenix
I would use a left join with the customers table on the left because it shows customers regardless of whether they make an order or not. ### Which join would you use to find only the customers who have placed orders? Why? I would use a right join with customers as the left table because it shows customers who place orders regardless if they are on the customer table ### Write the R code for both scenarios.
all_customers <- left_join(customers, orders)
## Joining with `by = join_by(customer_id)`
customers_with_orders <- right_join(customers, orders)
## Joining with `by = join_by(customer_id)`
head(all_customers)
## # A tibble: 6 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
## 5 4 David Houston NA <NA> NA
## 6 5 Eve Phoenix NA <NA> NA
head(customers_with_orders)
## # A tibble: 6 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
## 5 6 <NA> <NA> 105 Camera 600
## 6 7 <NA> <NA> 106 Printer 150
customer_summary <- customers %>%
left_join(orders, by = "customer_id") %>%
group_by(customer_id, name, city) %>%
summarize(
total_orders = sum(!is.na(order_id)),
total_spent = sum(amount, na.rm = TRUE),
.groups = "drop"
) %>%
arrange(customer_id)
customer_summary
## # A tibble: 5 × 5
## customer_id name city total_orders total_spent
## <dbl> <chr> <chr> <int> <dbl>
## 1 1 Alice New York 1 1200
## 2 2 Bob Los Angeles 2 2300
## 3 3 Charlie Chicago 1 300
## 4 4 David Houston 0 0
## 5 5 Eve Phoenix 0 0