1) Inner Join

a - How many rows are in the result?

There are 4 rows in the results.

b - Why are some customers or orders not included in the result?

Inner join only keeps rows that have the same customer_id in both tables.

c - Display the results.

## # A tibble: 4 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300

2) Left Join

a - How many rows are in the result?

There are 6 rows in the resluts.

b - Explain why this number differs from the inner join result.

Left Join takes all rows from the left table (in this case customers) and connects the matching orders. There are 5 customers but one of them has 2 orders so that customers has 2 rows.

c - Display the results.

## # A tibble: 6 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300
## 5           4 David   Houston           NA <NA>        NA
## 6           5 Eve     Phoenix           NA <NA>        NA

3) Right Join

a - How many rows are in the result?

There are 6 rows in the results.

b - Which customer_ids in the result have NULL for customer name and city? Explain why.

6 and 7. In the orders data there are order ids in rows 6 and 7 but there are no matching customers in the customers data to go with them. Right Join brings all the rows from the right data (in this case orders) and attaches the matching data from customers if it’s there.

c - Display the results.

## # A tibble: 6 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300
## 5           6 <NA>    <NA>             105 Camera     600
## 6           7 <NA>    <NA>             106 Printer    150

4) Full Joins

a - How many rows are in the result?

There are 8 rows.

b - Identify any rows where there’s information from only one table. Explain these results.

Rows 4 and 5 contain customers that don’t show up in the orders data. Rows 6 and 7 contain orders that don’t have matching customers in the customer data. Full Join brings all rows from both tables, and rows that don’t have matching data show N/A for the missing side.

c - Display the results.

## # A tibble: 8 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300
## 5           4 David   Houston           NA <NA>        NA
## 6           5 Eve     Phoenix           NA <NA>        NA
## 7           6 <NA>    <NA>             105 Camera     600
## 8           7 <NA>    <NA>             106 Printer    150

5) Semi Join

a - How many rows are in the results?

There are 3 rows.

b - How does this result differ from the inner join result?

Semi join returns rows from customers that have at least one match in orders, but it returns only the customer columns (no duplicated rows for multiple orders). Inner join returns combined rows and duplicates customers for each matching order. In our data, customers with orders are customer_id 1, 2, 3 — semi join returns those 3 customer rows, only customer columns.

c - Display the results.

semi_join(customers, orders, by = "customer_id")
## # A tibble: 3 × 3
##   customer_id name    city       
##         <dbl> <chr>   <chr>      
## 1           1 Alice   New York   
## 2           2 Bob     Los Angeles
## 3           3 Charlie Chicago

6) Anti Join

a - How many rows are in the results?

There are 2 Rows.

b - Explain what this result tells you about these customers.

The customers that show up in the Anti Join table are customers in the customer data that don’t have matching customer_ids with any of the rows in the orders data. David and Eve didn’t place any orders according to the orders data.

c - Display the results.

## # A tibble: 2 × 3
##   customer_id name  city   
##         <dbl> <chr> <chr>  
## 1           4 David Houston
## 2           5 Eve   Phoenix

7) Practical Application

a - Which join would you use to find all customers, including those who haven’t placed any orders? Why?

We would use the Left Join command becuase it shows all cutomers and their order info if its there, if there is no order info its puts N/A in those collums but you still see the customer.

b - Which join would you use to find only the customers who have placed orders? Why?

We could use Semi Join or Inner Join. Inner Join shows all the customers that also have order with columns from both data sets, but its shows duplicate customers if they have multiple orders, which could be hard to read in large data sets. Semi Join shows a list of all the customers that have matching customer_ids in the orders data, it shows just the customer data columns with no duplicates for the matches.

c - Write the R code for both scenarios.

# all customers including those without orders

left_join(customers, orders, by = "customer_id")

# only customers who have placed orders

semi_join(customers, orders, by = "customer_id")

d - Display the results.

# all customers including those without orders

left_join(customers, orders, by = "customer_id")
## # A tibble: 6 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300
## 5           4 David   Houston           NA <NA>        NA
## 6           5 Eve     Phoenix           NA <NA>        NA
# only customers who have placed orders

semi_join(customers, orders, by = "customer_id")
## # A tibble: 3 × 3
##   customer_id name    city       
##         <dbl> <chr>   <chr>      
## 1           1 Alice   New York   
## 2           2 Bob     Los Angeles
## 3           3 Charlie Chicago

8) Challenge Question

customer_id name city total_orders total_amount
1 Alice New York 1 1200
2 Bob Los Angeles 2 2300
3 Charlie Chicago 1 300
4 David Houston 0 0
5 Eve Phoenix 0 0