1. Inner Join (3 points) Perform an inner join between the customers and orders datasets.

    ## How many rows are in the result? # 4 rows ## Why are some customers or orders not included in the result? # Because, inner join returns data that is only a match on the other table by customer ID. ## Display the result
## # A tibble: 4 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300
  1. Left Join (3 points) Perform a left join with customers as the left table and orders as the right table.

    ## How many rows are in the result? # 6 ## Explain why this number differs from the inner join result. # Because left join returns matching data from the left data frame and the same from the right ##Display the result
## # A tibble: 6 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300
## 5           4 David   Houston           NA <NA>        NA
## 6           5 Eve     Phoenix           NA <NA>        NA
  1. Right Join (3 points) Perform a right join with customers as the left table and orders as the right table.

    ## How many rows are in the result? # 6 ## Which customer_ids in the result have NULL for customer name and city? Explain why. # customer_ids that have null are 6 & 7 because this represents the name & city of these two ID’s are unknown. ## Display the result
## # A tibble: 6 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300
## 5           6 <NA>    <NA>             105 Camera     600
## 6           7 <NA>    <NA>             106 Printer    150
  1. Full Join (3 points) Perform a full join between customers and orders.

    ## How many rows are in the result? # 8 ## Identify any rows where there’s information from only one table. Explain these results. # rows with information from one table are 4, 5, 7 & 8. This means that some information is missing. ## Display the result
## # A tibble: 6 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300
## 5           4 David   Houston           NA <NA>        NA
## 6           5 Eve     Phoenix           NA <NA>        NA
  1. Semi Join (3 points) Perform a semi join with customers as the left table and orders as the right table.

    ##How many rows are in the result? # 3 ## How does this result differ from the inner join result? # inner join result had 4 rows ## Display the result
## # A tibble: 3 × 3
##   customer_id name    city       
##         <dbl> <chr>   <chr>      
## 1           1 Alice   New York   
## 2           2 Bob     Los Angeles
## 3           3 Charlie Chicago
  1. Anti Join (3 points) Perform an anti join with customers as the left table and orders as the right table.

    ## Which customers are in the result? # customers in the result are David from Houston & Eve from Phoenix ## Explain what this result tells you about these customers. # it means these customers dont’t have any info for the product & the amount. ## Display the result
## # A tibble: 2 × 3
##   customer_id name  city   
##         <dbl> <chr> <chr>  
## 1           4 David Houston
## 2           5 Eve   Phoenix
  1. Practical Application (4 points) Imagine you’re analyzing customer behavior.

    ## Which join would you use to find all customers, including those who haven’t placed any orders? Why? # left join because it returns the customer names who made an order, including customers that haven’t ordered. ## Which join would you use to find only the customers who have placed orders? Why? # inner join because it returns customers who have an order. ## Write the R code for both scenarios. ## Display the result
all_customers <- customers %>%
      left_join(orders, by = "customer_id")

    customers_with_orders <- customers %>%
      inner_join(orders, by = "customer_id")
    
    list("All Customers (Left Join)" = all_customers,
      "Customers with Orders (Inner Join)" = customers_with_orders)
## $`All Customers (Left Join)`
## # A tibble: 6 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300
## 5           4 David   Houston           NA <NA>        NA
## 6           5 Eve     Phoenix           NA <NA>        NA
## 
## $`Customers with Orders (Inner Join)`
## # A tibble: 4 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300
  1. Challenge Question (3 points) Create a summary that shows each customer’s name, city, total number of orders, and total amount spent. Include all customers, even those without orders. Hint: You’ll need to use a combination of joins and group_by/summarize operations.

## # A tibble: 5 × 5
##   customer_id name    city        total_orders total_spent
##         <dbl> <chr>   <chr>              <int>       <dbl>
## 1           1 Alice   New York               1        1200
## 2           2 Bob     Los Angeles            2        2300
## 3           3 Charlie Chicago                1         300
## 4           4 David   Houston                1           0
## 5           5 Eve     Phoenix                1           0