Question 1

Inner join (3 points) Perform an inner join between the customers and orders datasets. How many rows are in the result?

q1 <- inner_join(customers , orders, by = 'customer_id')
nrow(q1)
## [1] 4
print(q1)
## # A tibble: 4 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300

Why are some customers or orders not included in the result?
They do not have a match on the other table

Question 2

Left Join (3 points) Perform a left join with customers as the left table and orders as the right table.

 q2 <- left_join(customers, orders, by = "customer_id")
# Display result
print(q2)
## # A tibble: 6 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300
## 5           4 David   Houston           NA <NA>        NA
## 6           5 Eve     Phoenix           NA <NA>        NA

How many rows are in the result? Explain why this number differs from the inner join result.
There are 6 rows.
The result is different because the left join shows all customers even if they had no orders.

Question 3

Right Join (3 points) Perform a right join with customers as the left table and orders as the right table.How many rows are in the result? Display the result

Perform RIGHT JOIN (Orders as the Right Table)

q3 <- right_join(customers, orders, by = “customer_id”)

q3 <- right_join(customers, orders, by = "customer_id")
# Display the result
print(q3)
## # A tibble: 6 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300
## 5           6 <NA>    <NA>             105 Camera     600
## 6           7 <NA>    <NA>             106 Printer    150

Count number of rows
There are 6 rows.

Which customer_ids in the result have NULL for customer name and city? Explain why.
IDs 6 and 7 have NULL because the order exists but it has no matching customers

Question 4

full Join (3 points) Perform a full join between customers and orders. How many rows are in the result? Display the result

q4 <- full_join(customers, orders, by = 'customer_id')
print(q4)
## # A tibble: 8 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300
## 5           4 David   Houston           NA <NA>        NA
## 6           5 Eve     Phoenix           NA <NA>        NA
## 7           6 <NA>    <NA>             105 Camera     600
## 8           7 <NA>    <NA>             106 Printer    150

How many rows are in the result?
There are 8

Identify any rows where there’s information from only one table. Explain these results.
Customers 4 & 5 appear with NA for order details (they have no orders). Orders from customer_id 6 & 7 appear with NA for customer details (no matching customer).

Question 5

Semi Join (3 points) Perform a semi join with customers as the left table and orders as the right table. How many rows are in the result? Display the result

q5 <- semi_join(customers, orders, by = "customer_id")
print(q5)
## # A tibble: 3 × 3
##   customer_id name    city       
##         <dbl> <chr>   <chr>      
## 1           1 Alice   New York   
## 2           2 Bob     Los Angeles
## 3           3 Charlie Chicago

How does this result differ from the inner join result?
Its different from an inner join because it excludes order details

Question 6

Anti Join (3 points) Perform an anti join with customers as the left table and orders as the right table. Display the result

q6 <- anti_join(customers, orders, by = "customer_id")
print(q6)
## # A tibble: 2 × 3
##   customer_id name  city   
##         <dbl> <chr> <chr>  
## 1           4 David Houston
## 2           5 Eve   Phoenix

Which customers are in the result? Explain what this result tells you about these customers.
Customers 4 and 5 are in the result. It tells you that these customers are inactive

Question 7

Practical Application (4 points) Imagine you’re analyzing customer behavior. q7a) Which join would you use to find all customers, including those who haven’t placed any orders? Why?
left join because it ensures all customers are included, even if they haven’t placed orders and is useful for analyzing the total customer base.

q7b) Which join would you use to find only the customers who have placed orders? Why?
Inner join because it ensures only customers who have placed orders are included and is useful for analyzing active customers only.

q7a <- left_join(customers, orders, by = "customer_id")

# Display result
print(q7a)
## # A tibble: 6 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300
## 5           4 David   Houston           NA <NA>        NA
## 6           5 Eve     Phoenix           NA <NA>        NA
q7b <- inner_join(customers, orders, by = "customer_id")
# Display result
print(q7b)
## # A tibble: 4 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300

Question 8

Challenge Question (3 points) Create a summary that shows each customer’s name, city, total number of orders, and total amount spent. Include all customers, even those without orders.

q8 <- customers %>%
left_join(orders, by = "customer_id") %>%
group_by(customer_id, name, city) %>%
summarize(total_orders = n(),
total_amount_spent = sum(amount, na.rm = TRUE))
## `summarise()` has grouped output by 'customer_id', 'name'. You can override
## using the `.groups` argument.
# Display the summary
print(q8)
## # A tibble: 5 × 5
## # Groups:   customer_id, name [5]
##   customer_id name    city        total_orders total_amount_spent
##         <dbl> <chr>   <chr>              <int>              <dbl>
## 1           1 Alice   New York               1               1200
## 2           2 Bob     Los Angeles            2               2300
## 3           3 Charlie Chicago                1                300
## 4           4 David   Houston                1                  0
## 5           5 Eve     Phoenix                1                  0