knitr::opts_chunk$set(echo = FALSE)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Dataset 1: Customers

Dataset 2: Orders

1 Inner Join (3 points) Perform an inner join between the customers and orders datasets.

#1a how many row are in the result

## [1] 4

#1b why are some customers or orders not included in the result?
Because they do not have a match in the other table

#1c Display the result

## # A tibble: 4 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300

Left Join (3 points) Perform a left join with customers as the left table and orders as the right table.

#How many rows are in the result?

## [1] 6

#Explain why this number differs from the inner join result.
This funtion only allow matching customer ID from both table to show up

#Display the result

## # A tibble: 6 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300
## 5           4 David   Houston           NA <NA>        NA
## 6           5 Eve     Phoenix           NA <NA>        NA

Right Join (3 points) Perform a right join with customers as the left table and orders as the right table.

#How many rows are in the result?

## [1] 6

#Which customer_ids in the result have NULL for customer name and city? Explain why.
Id 6 and 7 have null for customer name and city since they only exist in the order table but not in the customer table #Display the result

## # A tibble: 6 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300
## 5           6 <NA>    <NA>             105 Camera     600
## 6           7 <NA>    <NA>             106 Printer    150

Full Join (3 points) Perform a full join between customers and orders.

## Joining with `by = join_by(customer_id)`

#How many rows are in the result?

## [1] 8

#Identify any rows where there’s information from only one table. Explain these results.
Customer: 4,5 have no order info Order: 6 and 7 have no customer info #Display the result

## # A tibble: 8 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300
## 5           4 David   Houston           NA <NA>        NA
## 6           5 Eve     Phoenix           NA <NA>        NA
## 7           6 <NA>    <NA>             105 Camera     600
## 8           7 <NA>    <NA>             106 Printer    150

Semi Join (3 points) Perform a semi join with customers as the left table and orders as the right table.

#How many rows are in the result?

## [1] 3

#How does this result differ from the inner join result?
Semi join only shows each customer once, even if they have purchased mutilple times. #Display the result

## # A tibble: 3 × 3
##   customer_id name    city       
##         <dbl> <chr>   <chr>      
## 1           1 Alice   New York   
## 2           2 Bob     Los Angeles
## 3           3 Charlie Chicago

Anti Join (3 points) Perform an anti join with customers as the left table and orders as the right table.

#Which customers are in the result?

## [1] 2

#Explain what this result tells you about these customers.
the result is showing the the customer have not order anything #Display the result

## # A tibble: 2 × 3
##   customer_id name  city   
##         <dbl> <chr> <chr>  
## 1           4 David Houston
## 2           5 Eve   Phoenix

Practical Application (4 points) Imagine you’re analyzing customer behavior.

#Which join would you use to find all customers, including those who haven’t placed any orders? Why?

## # A tibble: 6 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300
## 5           4 David   Houston           NA <NA>        NA
## 6           5 Eve     Phoenix           NA <NA>        NA

‘left join since we want all customers, including the NA’ #Which join would you use to find only the customers who have placed orders? Why?

## # A tibble: 4 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300

‘Inner join since we only want the matching records’

Challenge Question (3 points) Create a summary that shows each customer’s name, city, total number of orders, and total amount spent. Include all customers, even those without orders. Hint: You’ll need to use a combination of joins and group_by/summarize operations.

## # A tibble: 5 × 5
##   customer_id name    city        total_orders total_spent
##         <dbl> <chr>   <chr>              <int>       <dbl>
## 1           1 Alice   New York               1        1200
## 2           2 Bob     Los Angeles            2        2300
## 3           3 Charlie Chicago                1         300
## 4           4 David   Houston                0           0
## 5           5 Eve     Phoenix                0           0