q1 <- inner_join(customers, orders , by = 'customer_id')
## [1] 4
Becuase they do not have a match in the other table.
q1
## # A tibble: 4 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
q2 <- left_join(customers, orders, by = "customer_id")
## [1] 6
This number of rows is different from the inner join result because it shows the values tha produce NA results
q2
## # A tibble: 6 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
## 5 4 David Houston NA <NA> NA
## 6 5 Eve Phoenix NA <NA> NA
q3 <- right_join(customers, orders, by = "customer_id")
## [1] 6
Customer ids 6 and 7 both have NULL for customer name and city because there is not customer data for them in the database.
q3
## # A tibble: 6 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
## 5 6 <NA> <NA> 105 Camera 600
## 6 7 <NA> <NA> 106 Printer 150
q4 <- full_join(customers, orders, by = "customer_id")
## [1] 8
Rows 5,6,7, and 8 have information from only one table. Rows 5 and 6 are missing order_id information for customers with the customer_id of 4 and 5. Rows 7 and 8 are missing customer_id information withorders withe the order_id of 105 and 106.
q4
## # A tibble: 8 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
## 5 4 David Houston NA <NA> NA
## 6 5 Eve Phoenix NA <NA> NA
## 7 6 <NA> <NA> 105 Camera 600
## 8 7 <NA> <NA> 106 Printer 150
q5 <- semi_join(customers, orders, by = "customer_id")
## [1] 3
This result does not duplicate any same results while the inner join has duplicates.
q5
## # A tibble: 3 × 3
## customer_id name city
## <dbl> <chr> <chr>
## 1 1 Alice New York
## 2 2 Bob Los Angeles
## 3 3 Charlie Chicago
q6 <- anti_join(customers, orders, by = "customer_id")
David and Eve
They have never purchased anything.
q6
## # A tibble: 2 × 3
## customer_id name city
## <dbl> <chr> <chr>
## 1 4 David Houston
## 2 5 Eve Phoenix
The only customers that count are the ones that appear in the customer table. To find these customers I would use left join because this includes all customers, and those without orders will still appear with their order details as NA.
To find only customers who have placed orders I would use inner join because this will return only customers who have matching orders, and exclude those who haven’t placed any.
left_join(customers, orders, by = "customer_id")
## # A tibble: 6 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
## 5 4 David Houston NA <NA> NA
## 6 5 Eve Phoenix NA <NA> NA
inner_join(customers, orders , by = 'customer_id')
## # A tibble: 4 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
challenge <- left_join(customers, orders, by = "customer_id") %>%
group_by(name, city) %>%
summarize(
total_orders = sum(!is.na(order_id)),
total_spent = sum(amount, na.rm = TRUE)
) %>%
ungroup()
challenge
## # A tibble: 5 × 4
## name city total_orders total_spent
## <chr> <chr> <int> <dbl>
## 1 Alice New York 1 1200
## 2 Bob Los Angeles 2 2300
## 3 Charlie Chicago 1 300
## 4 David Houston 0 0
## 5 Eve Phoenix 0 0