Question 1: Inner Join (3 points) Perform an inner join between the customers and orders datasets.

How many rows are in the result?

4 Rows

Why are some customers or orders not included in the result?

because, they did not have a match on the other table

Display the result
Q1 <- inner_join(customers, orders, by = "customer_id")

head(Q1)
## # A tibble: 4 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300

Question 2: Left Join (3 points) Perform a left join with customers as the left table and orders as the right table.

How many rows are in the result?

6 Rows

Explain why this number differs from the inner join result.

Because, all customers are included, even those without order information.

Display the result
Q2 <- left_join(customers, orders, by = "customer_id")

head(Q2)
## # A tibble: 6 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300
## 5           4 David   Houston           NA <NA>        NA
## 6           5 Eve     Phoenix           NA <NA>        NA

Question 3: Right Join (3 points) Perform a right join with customers as the left table and orders as the right table.

How many rows are in the result?

6 Rows

Which customer_ids in the result have NULL for customer name and city? Explain why.

The customer_ids which have null are id 105 and 106

Display the result
Q3 <- right_join(customers, orders, by = "customer_id")

head(Q3)
## # A tibble: 6 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300
## 5           6 <NA>    <NA>             105 Camera     600
## 6           7 <NA>    <NA>             106 Printer    150

Question 4: Full Join (3 points) Perform a full join between customers and orders.

How many rows are in the result?

8 Rows

Identify any rows where there’s information from only one table. Explain these results.

The rows where information is only from one table is that rows 5 and 6 are from the customers table and rows 7 and 8 are from the orders table

Display the result
Q4 <- full_join(customers, orders, by = "customer_id")

head(Q4)
## # A tibble: 6 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300
## 5           4 David   Houston           NA <NA>        NA
## 6           5 Eve     Phoenix           NA <NA>        NA

Question 5: Semi Join (3 points) Perform a semi join with customers as the left table and orders as the right table.

How many rows are in the result?

3 Rows

How does this result differ from the inner join result?

Only customers with order information are returned, but order data isn’t included.

Display the result
Q5 <- semi_join(customers, orders, by = "customer_id")

head(Q5)
## # A tibble: 3 × 3
##   customer_id name    city       
##         <dbl> <chr>   <chr>      
## 1           1 Alice   New York   
## 2           2 Bob     Los Angeles
## 3           3 Charlie Chicago

Question 6: Anti Join (3 points) Perform an anti join with customers as the left table and orders as the right table.

Which customers are in the result?

The customers that are in the result are customer 4(David) and customer 5(Eve)

Explain what this result tells you about these customers.

It shows that only customers without order information are returned

Display the result
Q6 <- anti_join(customers, orders, by = "customer_id")

head(Q6)
## # A tibble: 2 × 3
##   customer_id name  city   
##         <dbl> <chr> <chr>  
## 1           4 David Houston
## 2           5 Eve   Phoenix

Practical Application (4 points) Imagine you’re analyzing customer behavior.

Which join would you use to find all customers, including those who haven’t placed any orders? Why?

Full Join, because it shows all customers and orders from both tables and shows customers that do not have orders and orders that do not have customer information.

Which join would you use to find only the customers who have placed orders? Why?

Inner join, because it shows all the customers who have placed orders leaving out the ones who have not placed orders and the orders that do not have customer information.

Write the R code for both scenarios.
Display the result
Q7 <- full_join(customers, orders, by = "customer_id")

head(Q7)
## # A tibble: 6 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300
## 5           4 David   Houston           NA <NA>        NA
## 6           5 Eve     Phoenix           NA <NA>        NA
Q7 <- inner_join(customers, orders, by = "customer_id")

head(Q7)
## # A tibble: 4 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300

Challenge Question (3 points) Create a summary that shows each customer’s name, city, total number of orders, and total amount spent. Include all customers, even those without orders. Hint: You’ll need to use a combination of joins and group_by/summarize operations.

In this summary, I only consider the customers in the customer table.
customer_summary <- customers %>%
  left_join(orders, by = "customer_id") %>%  # Ensure all customers are included
  group_by(customer_id, name, city) %>%
  summarize(
    total_orders = n_distinct(order_id, na.rm = TRUE),  # Count unique orders
    total_spent = sum(amount, na.rm = TRUE),           # Sum total amount spent
    .groups = "drop"
  ) %>%
  arrange(desc(total_spent))  # Optional: Sort by highest spenders

head(customer_summary)
## # A tibble: 5 × 5
##   customer_id name    city        total_orders total_spent
##         <dbl> <chr>   <chr>              <int>       <dbl>
## 1           2 Bob     Los Angeles            2        2300
## 2           1 Alice   New York               1        1200
## 3           3 Charlie Chicago                1         300
## 4           4 David   Houston                0           0
## 5           5 Eve     Phoenix                0           0