Introduction

We have been tasked with performing data analysis for the grocery store chain Regork. We have used different forms of data analysis to sort through the data provided to us by Regork, in hope of finding new ways for Regork to increase their sales. In this report, you will learn what specific question we sought to answer, our findings in the data, and our solution to our business question.

Growth Possibiity for Seafood Products

The business problem that we are trying to solve is how can we increase the sales of seafood products.

Addressing the Problem

To address this problem we took a look at Regork’s customer data, completejourney, and joined the product data with transactions by product ID, then joined that data with demographics by household ID to get our full data set to work with. Then we filtered, grouped by, and arranged the data to get the most sold seafood product. After this we applied a similar method to plot the most sold seafood products by the age demographic. Continuing on with the same approach we then plotted the most sold seafood products by customers income range. Our next step was to see which months had the best sales. We filtered our data set for the seafood products once again, but this time we created a new column for months and summarized the total quantity to get total items sold for each month. After this we created a plot to showcase our findings. Lastly we needed to see what products were bought with seafood we created a data set of the baskets that included seafood and joined it with our full data set. This allowed us to filter out the actual seafood products and count the items left, which would be what customers bought with the seafood.

Why is this Reports and Analysis Important?

This analysis provides insights on what seafood product are the most sold and what income groups and age groups are buying these products. Then it looks at the purchasing trend throughout the year to identify which months have the greatest and least sales. This is good to know so we can identify when we need to increase sales. The main component to the solution is seeing which other products customers are buying with seafood. This gives an opportunity to create coupons for those products in the months with low sales to increase the overall sales of seafood products at Regork.

Packages and Libraries required

library(tidyverse)
library(lubridate)
library(completejourney)
library(DT)
transactions <- get_transactions()
products <- products
demographics <- demographics

Our Combined Dataset:

full_dataset <- products %>%
  inner_join(transactions, by = "product_id") %>%
  inner_join(demographics, by = "household_id") 

datatable(head(full_dataset,100), options = list(pageLength = 10))

Exploratory data analysis

Top seafood products sold overall

full_dataset %>%
  filter(str_detect(product_category, regex("SEAFOOD", ignore_case = TRUE)))%>%
  group_by(product_type) %>%
  summarize(items_sold = sum(quantity, na.rm = TRUE)) %>%
  arrange(desc(items_sold)) %>%
  slice(1:10) %>%
  ggplot(aes(x = product_type, y = items_sold)) +
  geom_col() + 
  geom_col(fill = "Red") +
  theme(axis.text.x = element_text(size = 7, angle = 90))  +
  scale_x_discrete("Product Type") +
  scale_y_continuous("Total Quantity of Seafood Sold")+
  ggtitle("What Are the most sold seafood products")

As seen in the table, tuna is by far the most sold product. More than 5000 units of tuna sold, where as the next closest item, raw fillets, only had 900 units sold. We do believe that this makes sense, as Tuna is the cheapest seafood product.

Top seafood products sold by revenue

full_dataset %>%
  filter(str_detect(product_category, regex("SEAFOOD", ignore_case = TRUE)))%>%
  group_by(product_type) %>%
  summarize(total_sales = sum(sales_value, na.rm = TRUE)) %>%
  arrange(desc(total_sales)) %>%
  slice(1:10) %>%
  ggplot(aes(x = product_type, y = total_sales)) +
  geom_col() + 
  geom_col(fill = "Red") +
  theme(axis.text.x = element_text(size = 7, angle = 90))  +
  scale_x_discrete("Product Type") +
  scale_y_continuous("Sales", labels = scales::dollar)+
  ggtitle("What Are the most sold seafood products")

Here we can see that Tuna also produces the highest sales value, with shrimp and raw fillets as the next highest.

Income vs Age for Products Sold

#income vs age for purchasing seafood.
full_dataset %>%
  #inner_join(transactions, by = "product_id") %>%
  #inner_join(demographics, by = "household_id") %>%
  filter(str_detect(product_category, regex('seafood', ignore_case = TRUE))) %>%
  group_by(age,income) %>%
  summarize(items_purchased = sum(quantity, na.rm = TRUE)) %>%
  ggplot(aes(x = age, y = items_purchased)) +
  geom_col(fill = "BLUE") +
  theme(axis.text.x = element_text(size = 8, angle = 55))  +
  facet_wrap(~ income) +
  scale_x_discrete("Age") +
  scale_y_continuous("Total Seafood Products Purchased") +
  ggtitle("Who's Buying Seafood?", 
        subtitle = "Ages 45-54 with an income of 50-74k are purchasing the most seafood.")

This graph shows that the customers who purchase the most seafood are ages 45-54 and in the income range of 50-74k. We can use this information to adjust marketing campaigns to better target this demographic, or adjust the, to target demographics we aren’t hitting as well, such as 55-64 year olds in the same income range.

How are these products purchased over time?

#How are these products purchased over time?
full_dataset %>%
  filter(str_detect(product_category, regex("SEAFOOD", ignore_case = TRUE)))%>%
  mutate(month = month(transaction_timestamp, label = TRUE)) %>%
  group_by(month) %>%
  summarise(items_sold = sum(quantity)) %>%
  ggplot(aes(x = month, y = items_sold, group = 1)) +
  geom_point() + 
  geom_line() +
  #geom_line(fill = "RED") +
  scale_x_discrete("Month") +
  scale_y_continuous("Total Number of Seafood Products Sold")+
  ggtitle("In what months are the most seafood sold?")

We found that sales of seafood are highest during the months of December through March, and are lowest during the summer and fall seasons.

Top Products Purchased with Seafood

#Top products purchased with seafood

#create a dataset of baskets w/ seafood in them
baskets_w_sf <- full_dataset %>%
  filter(str_detect(product_category, regex('seafood', ignore_case = TRUE))) %>%
  distinct(basket_id) 
#find the most bought items in baskets w/ seafood
full_dataset %>%
  semi_join(baskets_w_sf) %>%
  filter(!str_detect(product_category, regex('seafood', ignore_case = TRUE))) %>%
  count(product_category) %>%
  arrange(desc(n)) %>%
  slice(1:7) %>%
  ggplot(aes(x = product_category, y = n)) +
  geom_col() + 
  geom_col(fill = "RED") +
  theme(axis.text.x = element_text(size = 7, angle = 35))  +
  scale_x_discrete("Product Category") +
  scale_y_continuous("Items purchased with seafood")+
  ggtitle("What Products are purchased with seafood",
          subtitle = "The products most purchased with seafood are Cheese and Bread.")
## Joining with `by = join_by(basket_id)`

This graph shows the most purchased items in baskets that contain seafood. This information can be used in marketing or coupon campaigns where we could advertise seafood with these items in hopes of further increasing sales of both product categories.

Summary

  1. The business problem we hoped to address was how can we increase the sales of seafood.

  2. To solve this problem, we first created a combined dataframe including the data from products, demographics, and get_transactions. Once we had this combined dataframe, we filtered it for only transactions containing seafood, and found the top selling seafood products by quantity and sales value. We discovered that tuna is the highest selling seafood product by quantity and sales value. We next grouped by age and income, and found that the demographic that purchases the largest quantity of seafood is 45-54 year olds making $50-74k per year. Next, we grouped by month to find the quantity of sales per month, and to see if there were any trends in the data. We found that the four highest selling months are December through March. Finally, we created a dataframe that only contained baskets that included seafood. We then combined this dataframe with our original dataframe. We then filtered out seafood purchases from the baskets, and found the products most commonly purchased with seafood. With this, we found that bread and cheese are the most commonly purchased items in transactions that contain seafood.

  3. While completing this analysis we found a few interesting insights about customer behavior and how well seafood products do at Regork. The first interesting insight we found is that the most sold seafood product by quantity is tuna, and its by a large margin. Even when you look at the revenue generated by each product tuna still is the highest in that regard as well. This was interesting becasue tuna is the cheapest seafood product so it makes a lot of sense that it sold the most in quantity, but even while being the cheapest it still generates the most revenue. An interesting insight about the customers behaviors is that they purchase the most amount of seafood in the month of March and then buy significantly less in the following months. A final insight that we found interesting is how the middle class based on income level are buying the most amount of seafood, when you get to the larger incomes the quantity of seafood bought is significantly less.

  4. The overall takeaway from our analysis is to showcase who is buying seafood products, what are they buying with seafood, and when are they buying it. By identifying these main consumers, the time of greatest and lowest sales we can incorporate a marketing strategy, with coupons, to increase the overall sales of seafood products. Ultimately, we propose that Regork have coupons that pair those top selling products bought with seafood, with those seafood products to increase sales. Additionally, developing a marketing campaign for the summer months can help increase the sales druing that time when they are usually the lowest.

  5. Some of the limitations we ran into include that the data is only for one year, having multiple years would give a stronger analysis for the trends that we saw. Also if we had more time then we would’ve taken a deeper look into the coupons data and find out how effective the current promotional campaigns have been. The best way to improve or build on this analysis is to see how current coupons have been doing and collecting more data over a longer period of time to really identify the strongest trends.