q1 <- inner_join(customers,orders,by = "customer_id")
Ans: 4 rows
Ans: Because only 4 of the customer id’s match
print(q1)
## # A tibble: 4 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
q2 <- left_join(customers, orders, by = "customer_id")
Ans: 6 rows
Ans: because it includes all of the rows from the left data dataset (5) and one extra matching from the right data set.
print(q2)
## # A tibble: 6 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
## 5 4 David Houston NA <NA> NA
## 6 5 Eve Phoenix NA <NA> NA
q3<- right_join(customers, orders, by ="customer_id")
Ans: 6 rows
Ans: Id’s 6 & 7 because these numbers are within the right data set but have no match from the left which is the set that has name and city information
print(q3)
## # A tibble: 6 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
## 5 6 <NA> <NA> 105 Camera 600
## 6 7 <NA> <NA> 106 Printer 150
q4 <- full_join(customers, orders, by ="customer_id")
Ans: 8 rows
Ans: Id #’s 4,5,6, and 7 are missing information - 4&5 are missing order_id and product. This means there is no match and the row comes from the left dataset only. - 6&7 are missing name and city info. This means there is no match and the row comes from the right dataset only.
print(q4)
## # A tibble: 8 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
## 5 4 David Houston NA <NA> NA
## 6 5 Eve Phoenix NA <NA> NA
## 7 6 <NA> <NA> 105 Camera 600
## 8 7 <NA> <NA> 106 Printer 150
q5 <- semi_join(customers, orders, by ="customer_id")
Ans: 3 rows
Ans: It contains one less row and only name and city information. This is because it only outputs left rows wit a right row match ID, and only the left row information.
print (q5)
## # A tibble: 3 × 3
## customer_id name city
## <dbl> <chr> <chr>
## 1 1 Alice New York
## 2 2 Bob Los Angeles
## 3 3 Charlie Chicago
q6 <- anti_join(customers, orders, by ="customer_id")
Ans: Id# 4 and 5
Ans: These customer’s id#’s are in the customers (left) dataset but not the right dataset, and have no match
print(q6)
## # A tibble: 2 × 3
## customer_id name city
## <dbl> <chr> <chr>
## 1 4 David Houston
## 2 5 Eve Phoenix
Ans: left_Join because it gives all customer name and ID’s regardless of if they have a corresponding order ID.
Ans: semi_join because it oututs a customer from the left data set only when they have a matching id with an order from the right dataset
print(q2)
## # A tibble: 6 × 6
## customer_id name city order_id product amount
## <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 Alice New York 101 Laptop 1200
## 2 2 Bob Los Angeles 102 Phone 800
## 3 2 Bob Los Angeles 104 Desktop 1500
## 4 3 Charlie Chicago 103 Tablet 300
## 5 4 David Houston NA <NA> NA
## 6 5 Eve Phoenix NA <NA> NA
print(q5)
## # A tibble: 3 × 3
## customer_id name city
## <dbl> <chr> <chr>
## 1 1 Alice New York
## 2 2 Bob Los Angeles
## 3 3 Charlie Chicago
CQnums <- full_join(customers, orders, by = "customer_id")%>% #gives ttl # orders & ttl amt spent
group_by(customer_id) %>% #groups customers with same id
summarise(
order_count = sum(!is.na(order_id)), #counts how many rows are in each group
total_amount_spent = sum(amount) # creates new column based on sum of amt of the rows within each group
)
CQ_ans <- full_join(customers, CQnums, by= "customer_id")
print (CQ_ans)
## # A tibble: 7 × 5
## customer_id name city order_count total_amount_spent
## <dbl> <chr> <chr> <int> <dbl>
## 1 1 Alice New York 1 1200
## 2 2 Bob Los Angeles 2 2300
## 3 3 Charlie Chicago 1 300
## 4 4 David Houston 0 NA
## 5 5 Eve Phoenix 0 NA
## 6 6 <NA> <NA> 1 600
## 7 7 <NA> <NA> 1 150