1. Inner Join (3 points) Perform an inner join between the customers and orders datasets.

q1 <- inner_join(customers , orders, by ='customer_id')

#How many rows are in the result?

nrow(q1)
## [1] 4

Why are some customers or orders not included in the result?

Because only customers 1,2,3 were present in both tables

#Display the result

q1
## # A tibble: 4 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300

#2. Left Join (3 points) Perform a left join with customers as the left table and orders as the right table.

q2 <- left_join(customers, orders, by =‘customer_id’)

#How many rows are in the result?

nrow(q2)

#Explain why this number differs from the inner join result.

In a left join, all of the items from the left table(customers), are included even if they don’t match to what is in the right table(orders). Those that don’t have a match return ’N/A’. Inner join returns items where there is a matching record between both tables.

#Display the result

q2

#3. Right Join (3 points) Perform a right join with customers as the left table and orders as the right table.

q3 <-right_join(customers, orders, by = ‘customer_id’)

#How many rows are in the result?

nrow(q3)

#Which customer_ids in the result have NULL for customer name and city? Explain why.

Customer ID 6 and 7 because they do not exist on the customers table

#Display the result

q3

#4. Full Join (3 points) Perform a full join between customers and orders.

q4 <- full_join(customers, orders, by = “customer_id”) print(q4)

#How many rows are in the result?

nrow(q4)

#Identify any rows where there’s information from only one table. Explain these results.

Customer id 4 and 4 have no information from the orders table and customer id 5 and 6 have missing information from the customers table

#Display the result

q4

#5. Semi Join (3 points) Perform a semi join with customers as the left table and orders as the right table.

q5 <-semi_join(customers, orders, by =‘customer_id’)

#How many rows are in the result?

nrow(q5)

#How does this result differ from the inner join result?

This differs from inner join by only shows distinct customers who have orders and not showing order details

#Display the result

q5

#6. Anti Join (3 points) Perform an anti join with customers as the left table and orders as the right table.

q6 <-anti_join(customers, orders, by = ‘customer_id’)

#Which customers are in the result?

David and Eve

#Explain what this result tells you about these customers.

This tells us that David and Eve have not placed any orders

#Display the result

q6

#7. Practical Application (4 points) Imagine you’re analyzing customer behavior.

q7a <-full_join(customers, orders, by = ‘customer_id’) head(q7a)

q7b <-right_join(customers, orders, by = ‘customer_id’) print(q7b)

#Which join would you use to find all customers, including those who haven’t placed any orders? Why?

#Which join would you use to find only the customers who have placed orders? Why?

#Write the R code for both scenarios.

#Display the result

q7a

q7b

#8. Challenge Question (3 points) Create a summary that shows each customer’s name, city, total number of orders, and total amount spent. Include all customers, even those without orders. Hint: You’ll need to use a combination of joins and group_by/summarize operations.

challenge_question <- customers %>% full_join(orders, by = “customer_id”) %>% group_by(customer_id, name, city) %>% summarize( total_orders = sum(!is.na(order_id)), total_amount_spent = sum(amount, na.rm = TRUE), .groups = “drop” ) head(challenge_question)