q1 <- inner_join(customers , orders)
## Joining with `by = join_by(customer_id)`
The resulting data has 4 rows.
Because the customers and orders that are not included did not have a match in the other table, and inner join only returns when there is a match.
## # A tibble: 4 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
q2 <- left_join(customers , orders)
## Joining with `by = join_by(customer_id)`
The resulting data has 6 rows.
Customers with a customer_id, but have yet to place an order with said customer_id.
## # A tibble: 6 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
## 5 4 David Houston NA <NA> NA
## 6 5 Eve Phoenix NA <NA> NA
q3<- right_join(customers , orders)
## Joining with `by = join_by(customer_id)`
The resulting data has 6 rows.
Customer_ids 6 and 7, because there are no customers that have customer_id 6 or 7.
## # A tibble: 6 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
## 5 6 <NA> <NA> 105 Camera 600
## 6 7 <NA> <NA> 106 Printer 150
q4 <- full_join(customers , orders)
## Joining with `by = join_by(customer_id)`
The resulting data has 8 rows.
In rows 5 and 6, only information from customers is present, and in rows 7 and 8 only orders are present, because there are missing customer ids in customers, and customers who have not placed orders with their ids yet.
## # A tibble: 6 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
## 5 4 David Houston NA <NA> NA
## 6 5 Eve Phoenix NA <NA> NA
q5 <- semi_join(customers , orders)
## Joining with `by = join_by(customer_id)`
The resulting data has 3 rows.
Regardless of how many orders are made by one person, semi join only returns a customer id once, where inner join will return a customer id for each unique order placed.
## # A tibble: 3 × 3
## customer_id name city
## <dbl> <chr> <chr>
## 1 1 Alice New York
## 2 2 Bob Los Angeles
## 3 3 Charlie Chicago
q6 <- anti_join(customers , orders)
## Joining with `by = join_by(customer_id)`
It tells me these customers have a customer id, but have not placed any orders with their customer ids.
## # A tibble: 2 × 3
## customer_id name city
## <dbl> <chr> <chr>
## 1 4 David Houston
## 2 5 Eve Phoenix
I would use left join because that way I see all customer ids, including customers without orders placed.
I would use semi join, because it only shows customer ids that are found on both datasets.
q2 <- left_join(customers , orders)
## Joining with `by = join_by(customer_id)`
q5<- semi_join(customers , orders)
## Joining with `by = join_by(customer_id)`
## # A tibble: 6 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
## 5 4 David Houston NA <NA> NA
## 6 5 Eve Phoenix NA <NA> NA
## # A tibble: 3 × 3
## customer_id name city
## <dbl> <chr> <chr>
## 1 1 Alice New York
## 2 2 Bob Los Angeles
## 3 3 Charlie Chicago