-
## How many rows are in the result? # 4 rows ## Why are some customers
or orders not included in the result? # Because, inner join returns data
that is only a match on the other table by customer ID. ## Display the
result
## # A tibble: 4 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
-
## How many rows are in the result? # 6 ## Explain why this number
differs from the inner join result. # Because left join returns matching
data from the left data frame and the same from the right ##Display the
result
## # A tibble: 6 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
## 5 4 David Houston NA <NA> NA
## 6 5 Eve Phoenix NA <NA> NA
-
## How many rows are in the result? # 6 ## Which customer_ids in the
result have NULL for customer name and city? Explain why. # customer_ids
that have null are 6 & 7 because this represents the name & city
of these two ID’s are unknown. ## Display the result
## # A tibble: 6 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
## 5 6 <NA> <NA> 105 Camera 600
## 6 7 <NA> <NA> 106 Printer 150
-
## How many rows are in the result? # 8 ## Identify any rows where
there’s information from only one table. Explain these results. # rows
with information from one table are 4, 5, 7 & 8. This means that
some information is missing. ## Display the result
## # A tibble: 6 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
## 5 4 David Houston NA <NA> NA
## 6 5 Eve Phoenix NA <NA> NA
-
##How many rows are in the result? # 3 ## How does this result differ
from the inner join result? # inner join result had 4 rows ## Display
the result
## # A tibble: 3 × 3
## customer_id name city
## <dbl> <chr> <chr>
## 1 1 Alice New York
## 2 2 Bob Los Angeles
## 3 3 Charlie Chicago
-
## Which customers are in the result? # customers in the result are
David from Houston & Eve from Phoenix ## Explain what this result
tells you about these customers. # it means these customers dont’t have
any info for the product & the amount. ## Display the result
## # A tibble: 2 × 3
## customer_id name city
## <dbl> <chr> <chr>
## 1 4 David Houston
## 2 5 Eve Phoenix
Practical
Application (4 points) Imagine you’re analyzing customer behavior.
## Which join would you use to find all customers, including those who
haven’t placed any orders? Why? # left join because it returns the
customer names who made an order, including customers that haven’t
ordered. ## Which join would you use to find only the customers who have
placed orders? Why? # inner join because it returns customers who have
an order. ## Write the R code for both scenarios. ## Display the
result
all_customers <- customers %>%
left_join(orders, by = "customer_id")
customers_with_orders <- customers %>%
inner_join(orders, by = "customer_id")
list("All Customers (Left Join)" = all_customers,
"Customers with Orders (Inner Join)" = customers_with_orders)
## $`All Customers (Left Join)`
## # A tibble: 6 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
## 5 4 David Houston NA <NA> NA
## 6 5 Eve Phoenix NA <NA> NA
##
## $`Customers with Orders (Inner Join)`
## # A tibble: 4 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
Challenge
Question (3 points) Create a summary that shows each customer’s name,
city, total number of orders, and total amount spent. Include all
customers, even those without orders. Hint: You’ll need to use a
combination of joins and group_by/summarize operations.
## # A tibble: 5 × 5
## customer_id name city total_orders total_spent
## <dbl> <chr> <chr> <int> <dbl>
## 1 1 Alice New York 1 1200
## 2 2 Bob Los Angeles 2 2300
## 3 3 Charlie Chicago 1 300
## 4 4 David Houston 1 0
## 5 5 Eve Phoenix 1 0