q1 <- inner_join(customers , orders, by ='customer_id')
#How many rows are in the result?
nrow(q1)
## [1] 4
Because only customers 1,2,3 were present in both tables
#Display the result
q1
## # A tibble: 4 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
#2. Left Join (3 points) Perform a left join with customers as the left table and orders as the right table.
q2 <- left_join(customers, orders, by =‘customer_id’)
#How many rows are in the result?
nrow(q2)
#Explain why this number differs from the inner join result.
In a left join, all of the items from the left table(customers), are included even if they don’t match to what is in the right table(orders). Those that don’t have a match return ’N/A’. Inner join returns items where there is a matching record between both tables.
#Display the result
q2
#3. Right Join (3 points) Perform a right join with customers as the left table and orders as the right table.
q3 <-right_join(customers, orders, by = ‘customer_id’)
#How many rows are in the result?
nrow(q3)
#Which customer_ids in the result have NULL for customer name and city? Explain why.
Customer ID 6 and 7 because they do not exist on the customers table
#Display the result
q3
#4. Full Join (3 points) Perform a full join between customers and orders.
q4 <- full_join(customers, orders, by = “customer_id”) print(q4)
#How many rows are in the result?
nrow(q4)
#Identify any rows where there’s information from only one table. Explain these results.
Customer id 4 and 4 have no information from the orders table and customer id 5 and 6 have missing information from the customers table
#Display the result
q4
#5. Semi Join (3 points) Perform a semi join with customers as the left table and orders as the right table.
q5 <-semi_join(customers, orders, by =‘customer_id’)
#How many rows are in the result?
nrow(q5)
#How does this result differ from the inner join result?
This differs from inner join by only shows distinct customers who have orders and not showing order details
#Display the result
q5
#6. Anti Join (3 points) Perform an anti join with customers as the left table and orders as the right table.
q6 <-anti_join(customers, orders, by = ‘customer_id’)
#Which customers are in the result?
David and Eve
#Explain what this result tells you about these customers.
This tells us that David and Eve have not placed any orders
#Display the result
q6
#7. Practical Application (4 points) Imagine you’re analyzing customer behavior.
q7a <-full_join(customers, orders, by = ‘customer_id’) head(q7a)
q7b <-right_join(customers, orders, by = ‘customer_id’) print(q7b)
#Which join would you use to find all customers, including those who haven’t placed any orders? Why?
#Which join would you use to find only the customers who have placed orders? Why?
#Write the R code for both scenarios.
#Display the result
q7a
q7b
#8. Challenge Question (3 points) Create a summary that shows each customer’s name, city, total number of orders, and total amount spent. Include all customers, even those without orders. Hint: You’ll need to use a combination of joins and group_by/summarize operations.
challenge_question <- customers %>% full_join(orders, by = “customer_id”) %>% group_by(customer_id, name, city) %>% summarize( total_orders = sum(!is.na(order_id)), total_amount_spent = sum(amount, na.rm = TRUE), .groups = “drop” ) head(challenge_question)