2) Left Join
a - How many rows are in the result?
There are 6 rows in the resluts.
b - Explain why this number differs from the inner join result.
Left Join takes all rows from the left table (in this case
customers) and connects the matching orders. There are 5 customers but
one of them has 2 orders so that customers has 2 rows.
c - Display the results.
## # A tibble: 6 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
## 5 4 David Houston NA <NA> NA
## 6 5 Eve Phoenix NA <NA> NA
3) Right Join
a - How many rows are in the result?
There are 6 rows in the results.
b - Which customer_ids in the result have NULL for customer name and
city? Explain why.
6 and 7. In the orders data there are order ids in rows 6 and 7 but
there are no matching customers in the customers data to go with them.
Right Join brings all the rows from the right data (in this case orders)
and attaches the matching data from customers if it’s there.
c - Display the results.
## # A tibble: 6 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
## 5 6 <NA> <NA> 105 Camera 600
## 6 7 <NA> <NA> 106 Printer 150
4) Full Joins
a - How many rows are in the result?
There are 8 rows.
c - Display the results.
## # A tibble: 8 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
## 5 4 David Houston NA <NA> NA
## 6 5 Eve Phoenix NA <NA> NA
## 7 6 <NA> <NA> 105 Camera 600
## 8 7 <NA> <NA> 106 Printer 150
5) Semi Join
a - How many rows are in the results?
There are 3 rows.
b - How does this result differ from the inner join result?
Semi join returns rows from customers that have at least one match
in orders, but it returns only the customer columns (no duplicated rows
for multiple orders). Inner join returns combined rows and duplicates
customers for each matching order. In our data, customers with orders
are customer_id 1, 2, 3 — semi join returns those 3 customer rows, only
customer columns.
c - Display the results.
semi_join(customers, orders, by = "customer_id")
## # A tibble: 3 × 3
## customer_id name city
## <dbl> <chr> <chr>
## 1 1 Alice New York
## 2 2 Bob Los Angeles
## 3 3 Charlie Chicago
7) Practical Application
a - Which join would you use to find all customers, including those
who haven’t placed any orders? Why?
We would use the Left Join command becuase it shows all cutomers and
their order info if its there, if there is no order info its puts N/A in
those collums but you still see the customer.
b - Which join would you use to find only the customers who have
placed orders? Why?
We could use Semi Join or Inner Join. Inner Join shows all the
customers that also have order with columns from both data sets, but its
shows duplicate customers if they have multiple orders, which could be
hard to read in large data sets. Semi Join shows a list of all the
customers that have matching customer_ids in the orders data, it shows
just the customer data columns with no duplicates for the matches.
c - Write the R code for both scenarios.
# all customers including those without orders
left_join(customers, orders, by = "customer_id")
# only customers who have placed orders
semi_join(customers, orders, by = "customer_id")
d - Display the results.
# all customers including those without orders
left_join(customers, orders, by = "customer_id")
## # A tibble: 6 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
## 5 4 David Houston NA <NA> NA
## 6 5 Eve Phoenix NA <NA> NA
# only customers who have placed orders
semi_join(customers, orders, by = "customer_id")
## # A tibble: 3 × 3
## customer_id name city
## <dbl> <chr> <chr>
## 1 1 Alice New York
## 2 2 Bob Los Angeles
## 3 3 Charlie Chicago