1. Inner Join (3 points) Perform an inner join between the customers and orders datasets.

q1 <- inner_join(customers , orders )
## Joining with `by = join_by(customer_id)`

How many rows are in the result?

There are four rows

Why are some customers or orders not included in the result?

inner join returns all rows from both tables where there is a match, therefore the rows not returned did not have a match in the other table

Display the result

head(q1)
## # A tibble: 4 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300

2. Left Join (3 points) Perform a left join with customers as the left table and orders as the right table.

q2 <- left_join(customers, orders, by = "customer_id")

How many rows are in the result?

There are six rows

Explain why this number differs from the inner join result.

A left join keeps all rows from the left table (customers), and adds matching rows from the right table (orders). If there’s no match in orders, the result will have NA values for those columns. Therefore, the number of rows equals the number of rows in ‘customers’(even if some customers have no matching orders).

Display the result

head(q2)
## # A tibble: 6 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300
## 5           4 David   Houston           NA <NA>        NA
## 6           5 Eve     Phoenix           NA <NA>        NA

3. Right Join (3 points) Perform a right join with customers as the left table and orders as the right table.

q3 <- right_join(customers, orders, by = "customer_id")

How many rows are in the result?

There are six rows

Which customer_ids in the result have NULL for customer name and city? Explain why.

q3[is.na(q3$name), "customer_id"]
## # A tibble: 2 × 1
##   customer_id
##         <dbl>
## 1           6
## 2           7

customer ids 6 and 7 have no matching entries in the ‘customers’ table, so their name and city are NA.

Display the result

head(q3)
## # A tibble: 6 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300
## 5           6 <NA>    <NA>             105 Camera     600
## 6           7 <NA>    <NA>             106 Printer    150