1. Inner Join (3 points) Perform an inner join between the customers and orders datasets.

q1 <- inner_join(customers , orders)
## Joining with `by = join_by(customer_id)`

How many rows are in the result?

The resulting data has 4 rows.

Why are some customers or orders not included in the result?

Because the customers and orders that are not included did not have a match in the other table, and inner join only returns when there is a match.

Display the result

## # A tibble: 4 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300

2. Left Join (3 points) Perform a left join with customers as the left table and orders as the right table.

q2 <- left_join(customers , orders)
## Joining with `by = join_by(customer_id)`

How many rows are in the result?

The resulting data has 6 rows.

Explain why this number differs from the inner join result.

Customers with a customer_id, but have yet to place an order with said customer_id.

Display the result

## # A tibble: 6 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300
## 5           4 David   Houston           NA <NA>        NA
## 6           5 Eve     Phoenix           NA <NA>        NA

3. Right Join (3 points) Perform a right join with customers as the left table and orders as the right table.

q3<- right_join(customers , orders)
## Joining with `by = join_by(customer_id)`

How many rows are in the result?

The resulting data has 6 rows.

Which customer_ids in the result have NULL for customer name and city? Explain why.

Customer_ids 6 and 7, because there are no customers that have customer_id 6 or 7.

Display the result

## # A tibble: 6 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300
## 5           6 <NA>    <NA>             105 Camera     600
## 6           7 <NA>    <NA>             106 Printer    150

4. Full Join (3 points) Perform a full join between customers and orders.

q4 <- full_join(customers , orders)
## Joining with `by = join_by(customer_id)`

How many rows are in the result?

The resulting data has 8 rows.

Identify any rows where there’s information from only one table. Explain these results.

In rows 5 and 6, only information from customers is present, and in rows 7 and 8 only orders are present, because there are missing customer ids in customers, and customers who have not placed orders with their ids yet.

Display the result;

## # A tibble: 6 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300
## 5           4 David   Houston           NA <NA>        NA
## 6           5 Eve     Phoenix           NA <NA>        NA

5. Semi Join (3 points) Perform a semi join with customers as the left table and orders as the right table.

q5 <- semi_join(customers , orders)
## Joining with `by = join_by(customer_id)`

How many rows are in the result?

The resulting data has 3 rows.

How does this result differ from the inner join result?

Regardless of how many orders are made by one person, semi join only returns a customer id once, where inner join will return a customer id for each unique order placed.

Display the result

## # A tibble: 3 × 3
##   customer_id name    city       
##         <dbl> <chr>   <chr>      
## 1           1 Alice   New York   
## 2           2 Bob     Los Angeles
## 3           3 Charlie Chicago

6. Anti Join (3 points) Perform an anti join with customers as the left table and orders as the right table.

q6 <- anti_join(customers , orders)
## Joining with `by = join_by(customer_id)`

Which customers are in the result?

David and Eve

Explain what this result tells you about these customers.

It tells me these customers have a customer id, but have not placed any orders with their customer ids.

Display the result

## # A tibble: 2 × 3
##   customer_id name  city   
##         <dbl> <chr> <chr>  
## 1           4 David Houston
## 2           5 Eve   Phoenix

7. Practical Application (4 points) Imagine you’re analyzing customer behavior.

Which join would you use to find all customers, including those who haven’t placed any orders? Why?

I would use left join because that way I see all customer ids, including customers without orders placed.

Which join would you use to find only the customers who have placed orders? Why?

I would use semi join, because it only shows customer ids that are found on both datasets.

Write the R code for both scenarios.

q2 <- left_join(customers , orders)
## Joining with `by = join_by(customer_id)`
q5<- semi_join(customers , orders)
## Joining with `by = join_by(customer_id)`

Display the result

## # A tibble: 6 × 6
##   customer_id name    city        order_id product amount
##         <dbl> <chr>   <chr>          <dbl> <chr>    <dbl>
## 1           1 Alice   New York         101 Laptop    1200
## 2           2 Bob     Los Angeles      102 Phone      800
## 3           2 Bob     Los Angeles      104 Desktop   1500
## 4           3 Charlie Chicago          103 Tablet     300
## 5           4 David   Houston           NA <NA>        NA
## 6           5 Eve     Phoenix           NA <NA>        NA
## # A tibble: 3 × 3
##   customer_id name    city       
##         <dbl> <chr>   <chr>      
## 1           1 Alice   New York   
## 2           2 Bob     Los Angeles
## 3           3 Charlie Chicago