1. Perform an inner join between the customers and orders datasets:

q1 <- inner_join(customers, orders )
## Joining with `by = join_by(customer_id)`

A. How many rows are in the result?

as.english(nrow(q1))
## [1] four

B. Why are some customers or orders not included in the result?

Inner joins returns all rows from both tables where there is a match. Therefore the rows not returned did not have a match in the other table.

C. Display the result:

head(q1)
## # A tibble: 4 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300

2. Perform a left join with customers as the left table and orders as the right table:

q2 <- left_join(customers, orders )
## Joining with `by = join_by(customer_id)`

A. How many rows are in the result?

as.english(nrow(q2))
## [1] six

B. Explain why this number differs from the inner join result.

Left join keep all orders even if they do not have orders.

C. Display the result:

head(q2)
## # A tibble: 6 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300
## 5           4 David   Houston           NA <NA>        NA
## 6           5 Eve     Phoenix           NA <NA>        NA

3. Perform a right join with customers as the left table and orders as the right table:

q3 <- right_join(customers, orders )
## Joining with `by = join_by(customer_id)`

A. How many rows are in the result?

as.english(nrow(q3))
## [1] six

B. Which customer_ids in the result have NULL for customer name and city? Explain why.

Customer_ids 6 and 7 have NULL for customer name and city because they only exist in orders and not in customers.

C. Display the result:

head(q3)
## # A tibble: 6 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300
## 5           6 <NA>    <NA>             105 Camera     600
## 6           7 <NA>    <NA>             106 Printer    150

4. Perform a full join between customers and orders:

q4 <- full_join(customers, orders )
## Joining with `by = join_by(customer_id)`

A. How many rows are in the result?

as.english(nrow(q4))
## [1] eight

B. Identify any rows where there’s information from only one table. Explain these results.

Customers without orders (David, Eve). Orders without valid customers (IDs 6, 7).

C. Display the result:

head(q4)
## # A tibble: 6 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300
## 5           4 David   Houston           NA <NA>        NA
## 6           5 Eve     Phoenix           NA <NA>        NA

5. Perform a semi join with customers as the left table and orders as the right table:

q5 <- semi_join(customers, orders )
## Joining with `by = join_by(customer_id)`

A. How many rows are in the result?

as.english(nrow(q5))
## [1] three

B. How does this result differ from the inner join result?

Semi join only keeps customers who have orders, but doesn’t bring in order details.

C. Display the result:

head(q5)
## # A tibble: 3 × 3
##   customer_id name    city       
##         <dbl> <chr>   <chr>      
## 1           1 Alice   New York   
## 2           2 Bob     Los Angeles
## 3           3 Charlie Chicago

6. Perform an anti join with customers as the left table and orders as the right table.

q6<- anti_join(customers, orders) 
## Joining with `by = join_by(customer_id)`

A. Which customers are in the result?

David and Eve.

B. Explain what this result tells you about these customers.

These customers exist in the customers data set but have not placed any orders.

C. Display the result:

head(q6)
## # A tibble: 2 × 3
##   customer_id name  city   
##         <dbl> <chr> <chr>  
## 1           4 David Houston
## 2           5 Eve   Phoenix

7. Imagine you’re analyzing customer behavior.

A. Which join would you use to find all customers, including those who haven’t placed any orders? Why?

We would use left join to find all customers including those who have not placed any orders because the left join keeps every row from the customers table and adds its order details even for those who have not placed orders.

B. Which join would you use to find only the customers who have placed orders? Why?

To find only the customers who have placed orders we would use inner join because it includes only matching customer_ids that exist in both tables.

C. Write the R code for both scenarios.

all_customers <- left_join(customers, orders) 
## Joining with `by = join_by(customer_id)`
customers_with_orders <- inner_join(customers, orders) 
## Joining with `by = join_by(customer_id)`

D. Display the result:

head(all_customers)
## # A tibble: 6 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300
## 5           4 David   Houston           NA <NA>        NA
## 6           5 Eve     Phoenix           NA <NA>        NA
head(customers_with_orders)
## # A tibble: 4 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300

8. Create a summary that shows each customer’s name, city, total number of orders, and total amount spent.

summary_table <- customers %>%
  left_join(orders, by = "customer_id") %>%
  group_by(name, city) %>%
  summarise(
    total_orders = n_distinct(order_id, na.rm = TRUE),
    total_amount_spent = sum(amount, na.rm = TRUE),
    .groups = "drop"
  )

Display the result:

summary_table
## # A tibble: 5 × 4
##   name    city        total_orders total_amount_spent
##   <chr>   <chr>              <int>              <dbl>
## 1 Alice   New York               1               1200
## 2 Bob     Los Angeles            2               2300
## 3 Charlie Chicago                1                300
## 4 David   Houston                0                  0
## 5 Eve     Phoenix                0                  0