## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Q1 Inner Join Perform an inner join between the customers and orders datasets.

How many rows are in the result?

4 rows

Why are some customers or orders not included in the result?

An inner join only includes rows where there is a match in both datasets, which is why some customers (David & Eve) are not included.

# Perform an inner join
inner_join_result <- inner_join(customers, orders, by = "customer_id")

# Display the result
inner_join_result
## # A tibble: 4 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300

Q2 Left Join Perform a left join with customers as the left table and orders as the right table.

How many rows are in the result?

6 rows

Explain why this number differs from the inner join result.

This is different from Inner Join because a Left Join includes all rows from the customers table, even if there is no matching record in the orders table. Customers David and Eve are included in the result, even though they have not placed any orders. Their corresponding values in the orders columns appear as NA, indicating no matching record in the orders dataset.

# Perform a left join (customers as left table)
left_join_result <- left_join(customers, orders, by = "customer_id")

# Display the result
left_join_result
## # A tibble: 6 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300
## 5           4 David   Houston           NA <NA>        NA
## 6           5 Eve     Phoenix           NA <NA>        NA

Q3 Right Join perform a right join with customers as the left table and orders as the right table.

How many rows are in the result?

6 rows

Which customer_ids in the result have NULL for customer name and city? Explain why.

The 5th and 6th customers have NULL values for name and city. This is because they exist in the orders table but do not exist in the customers table. Since there is no matching customer record, their details appear as missing (NA) in the result.

# Perform a right join (orders as the right table)
right_join_result <- right_join(customers, orders, by = "customer_id")

# Display the result
right_join_result
## # A tibble: 6 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300
## 5           6 <NA>    <NA>             105 Camera     600
## 6           7 <NA>    <NA>             106 Printer    150

Q4 Full Join Perform a full join between customers and orders.

How many rows are in the result?

8 rows

Identify any rows where there’s information from only one table. Explain these results.

David and Eve do not have order information because they have never placed an order. Customers 6 and 7 appear with NULL values for name and city because they exist in the orders table but do not exist in the customers table. This is because a full join includes all records from both tables, and where there is no match, NULL values are used in the result.

# Perform a full join between customers and orders
full_join_result <- full_join(customers, orders, by = "customer_id")

# Display the result
full_join_result
## # A tibble: 8 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300
## 5           4 David   Houston           NA <NA>        NA
## 6           5 Eve     Phoenix           NA <NA>        NA
## 7           6 <NA>    <NA>             105 Camera     600
## 8           7 <NA>    <NA>             106 Printer    150

Q5 Semi Join perform a semi join with customers as the left table and orders as the right table.

How many rows are in the result?

3 rows

How does this result differ from the inner join result?

This result differs from the Inner Join result because the Semi Join only returns customers who have at least one order. The Semi Join has fewer rows than the Inner Join because it keeps only unique customers who placed orders, rather than including multiple order records for the same customer.

# Perform a semi join (customers as the left table)
semi_join_result <- semi_join(customers, orders, by = "customer_id")

# Display the result
semi_join_result
## # A tibble: 3 × 3
##   customer_id name    city       
##         <dbl> <chr>   <chr>      
## 1           1 Alice   New York   
## 2           2 Bob     Los Angeles
## 3           3 Charlie Chicago

Q6 Anti Join Perform an anti join with customers as the left table and orders as the right table.

Which customers are in the result?

David and Eve

Explain what this result tells you about these customers.

These customers are in the result because they have not yet made a purchase. The Anti Join returns only the customers who do not have a matching record in the orders table, meaning they have not placed any orders.

# Perform an anti join (customers as the left table)
anti_join_result <- anti_join(customers, orders, by = "customer_id")

# Display the result
anti_join_result
## # A tibble: 2 × 3
##   customer_id name  city   
##         <dbl> <chr> <chr>  
## 1           4 David Houston
## 2           5 Eve   Phoenix

Q7 Practical Application Imagine you’re analyzing customer behavior.

Which join would you use to find all customers, including those who haven’t placed any orders? Why?

To find all customers, including those who haven’t placed any orders, I would use a Left Join. A Left Join ensures that all customers from the customers table are included in the result, even if they have not placed any orders. If a customer has not placed an order, their corresponding order details will appear as NA.

Which join would you use to find only the customers who have placed orders? Why?

To find only customers who have placed orders, I would use an Inner Join. An Inner Join includes only the rows where there is a match in both tables, meaning it will return only customers who have at least one order in the orders table, excluding those who haven’t placed any orders.

# Perform a left join to find all customers, including those who haven't placed any orders
left_join_result_all_customers <- left_join(customers, orders, by = "customer_id")

# Display the result
left_join_result_all_customers
## # A tibble: 6 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300
## 5           4 David   Houston           NA <NA>        NA
## 6           5 Eve     Phoenix           NA <NA>        NA
# Perform a semi join to find only customers who have placed orders
semi_join_result_customers_with_orders <- semi_join(customers, orders, by = "customer_id")

# Display the result
semi_join_result_customers_with_orders
## # A tibble: 3 × 3
##   customer_id name    city       
##         <dbl> <chr>   <chr>      
## 1           1 Alice   New York   
## 2           2 Bob     Los Angeles
## 3           3 Charlie Chicago

Q8 (Challenge question) create a summary that shows each customer’s name, city, total number of orders, and total amount spent.

Include all customers, even those without orders.

To ensure all customers are included in the summary, we will use a Left Join to merge the customers and orders tables. This allows us to keep all customer records, even if they have not placed any orders. We then use group_by() and summarize() to calculate the total number of orders and the total amount spent by each customer.

# Perform a left join to include all customers, even those without orders
summary_result <- customers %>%
  left_join(orders, by = "customer_id") %>%
  group_by(customer_id, name, city) %>%
  summarize(
    total_orders = n(),               # Count the number of orders for each customer
    total_spent = sum(amount, na.rm = TRUE)  # Sum the amount spent by each customer, ignoring NA
  ) %>%
  ungroup()  # Remove grouping for easier display
## `summarise()` has grouped output by 'customer_id', 'name'. You can override
## using the `.groups` argument.
# Display the result
summary_result
## # A tibble: 5 × 5
##   customer_id name    city        total_orders total_spent
##         <dbl> <chr>   <chr>              <int>       <dbl>
## 1           1 Alice   New York               1        1200
## 2           2 Bob     Los Angeles            2        2300
## 3           3 Charlie Chicago                1         300
## 4           4 David   Houston                1           0
## 5           5 Eve     Phoenix                1           0