Introduction

The Business Problem

We are trying to find areas of financial improvement with our cereal products. To do this we have analyzed multiple different demographics, time periods, and sales in order to detect potential growth opportunities. This analysis is important to understanding the data behind the sale of cereal at Regork, as well for finding opportunities for improvement.

Packages & Adjusted Data Tables

suppressWarnings(suppressMessages(library(tidyverse)))
suppressWarnings(suppressMessages(library(completejourney)))
suppressWarnings(suppressMessages(library(DT)))

cereal_prod <- products %>%
  filter(str_detect(product_category, regex("cereal", ignore_case = TRUE))) %>%
  filter(str_detect(product_type, regex("cereal", ignore_case = TRUE)))

suppressWarnings(suppressMessages(cereal_trans <- cereal_prod %>%
  inner_join(get_transactions())))

suppressWarnings(suppressMessages(cereal_dems <- cereal_trans %>%
  inner_join(demographics)))

suppressWarnings(suppressMessages(datatable(head(cereal_dems, 100), options = list(pagelength = 10))))

The Packages we used were ‘tidyverse’, ‘completejourney’, and ‘DT’. Tidyverse provides us with multiple different tools for data analysis. Complete Journey provides us with the Regork data, and DT is used to make this interactive data table above.

We have made three different data tables to work with. A filtered table regarding only cereal products, filtered by ‘product category’, as well as ‘product type’. We then joined this with the transactions data set to create our second data table, and then joined this one with the demographics data to create our third table. This demographics table is the one shown above.

Cereal Sales by Age & Household Size

suppressWarnings(suppressMessages(plot1 <- cereal_dems %>%
  group_by(household_size, age) %>%
  summarize(total_purchases = sum(quantity))))

ggplot(data = plot1, aes(x = household_size, y = total_purchases, fill = household_size)) +
geom_col(show.legend = FALSE) +
facet_wrap(~ age, scales = "free_y") +
labs(
  title = "Total Cereal Puchases by Age & Household Size", 
  x = "Household Size", 
  y = "Total Purchases") +
  theme_minimal()

These graphs show the difference in volume of cereal purchased between different household sizes, as well as age of the consumer. We found this data by using our third data set, and then summarizing the total quantity purchased. We then grouped this data by age and household size, and then faceted our graphs by age.

Findings

These graphs show unique findings. We discovered that in almost all age groups, the household size purchasing cereal the most is 1-2 people. We expected households with children to purchase cereal the most, but it is more likely for 45-54 year old couples to buy a box of cereal.

Cereal Sales by Price

cereal_sales_bins <- cereal_trans %>%
  filter(quantity == "1") %>%
  mutate(sales_bin = cut_width(sales_value, width = 0.50, boundary = 0)) %>%
  group_by(sales_bin) %>%
  summarise(count = n()) %>%
  arrange(sales_bin)

ggplot(cereal_sales_bins, aes(x = sales_bin, y = count, fill = sales_bin)) +
  geom_col(show.legend = FALSE) +
  labs(title = "Cereal Sales Value ", x = "Cost of Cereal in $0.50 Increments", y = "Total Cereal Purchased") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

ggplot(cereal_sales_bins, aes(x = "", y = count, fill = sales_bin)) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar(theta = "y") +
  labs(title = "Cereal Sales Value Distribution", fill = "Price Bins in $") +
  theme_minimal() +
  theme(axis.text.x = element_blank(), axis.ticks = element_blank(), axis.title.x = element_blank(), axis.title.y = element_blank())

These two graphs show the same data, which we found by using our cereal transactions data table. First we filtered the quantity to 1, so that we wouldn’t include buyers purchasing multiple boxes of cereal. We then mutated the data to group prices by $.50 increments, and then summarized this data by count.

Findings

From these graphs we found that most cereal boxes sell for about $3-$3.50, even though looking at the graph, you would expect the volume for that price range to be lower. The volume of cereals purchased that are more than $4 is also very low. This may be because there are not many products being listed at that price, but it may also be that they are too expensive.

Cereal Sales by Month

monthly_cereal_sales <- cereal_trans %>%
  mutate(month = month(transaction_timestamp, label = TRUE)) %>%
  group_by(month) %>%
  summarise(total_purchases = sum(quantity, na.rm = TRUE)) %>%
  arrange(match(month, month.abb))

suppressWarnings(suppressMessages(ggplot(monthly_cereal_sales, aes(x = factor(month, levels = month.abb), y = total_purchases, group = 1)) +
  geom_line(color = "blue", size = 1 ) +
  geom_point(color = "red", size = 2) +
  scale_y_continuous(limits = c( 1500, 2500)) +
  labs(title = "Cereal Sales by Month", x = "Month", y = "Total Quantity Purchased") +
  theme_minimal()))

In this graph we see the total cereal sales by month. We found this data by mutating the transaction time stamp into months, and then summarizing the total purchases of cereal. We chose a line graph to showcase the difference between months.

Findings

We found useful information with this graph, including that cereal purchases lowered in the February, April, and November, but boomed in August. We presume that the increased volume in August has to do with back to school.

Top 5 Cereal Sellers

top5_cereals <- cereal_trans %>%
  group_by(product_id) %>%
  summarise(total_purchases = sum(quantity, na.rm = TRUE)) %>%
  arrange(desc(total_purchases)) %>%
  slice_head(n = 5)

ggplot(top5_cereals, aes(x = reorder(product_id, total_purchases), y = total_purchases, fill = product_id)) +
  geom_col(show.legend = FALSE) +
  coord_flip() +
  labs(title = "Top 5 Cereals Purchased", x = "Cereal Product ID", y = "Total Purchases") +
  theme_minimal()

We found this data simply by grouping the filtered transactions data by product ID, and then summarizing by total purchases, while slicing the data at the top five.

Findings

This graph shows that the top two cereal products are alone the best sellers, being one hundred sales ahead of the third best.

Summary

This report helps us financially improve the sale of cereal products. To do this, we wanted to find the price range of the product, the age group that buys the product, the household size that buys the product, what times during the year did the product’s sales fluctuate, and what were the top products sold.

Findings

The interesting data that we found through this analysis is that household sizes of two buy the most amount of cereal. Cereal is sold the most in the month of August. The most bought cereal is within the price range of $3.00 and $3.50. The top two cereals purchased are by far the most popular. We also found the people ages from 45-54 buy the most amount of cereal.

Recommendations

Our team would recommend that the companies should sell their Cereals in between $1.50 and $3.50 because cereal was most purchased around that price range. The analysis we have will allow companies to decide on what their price should be, and how they can make the most amount of profit on their product. We recommend that companies change the way their advertisement is for bigger families since companies aren’t selling enough to big household sizes. We also would advise the companies to make more promotions and advertisements during the months of February and November for the fact that the cereal sales volume was lower during those months.

Limitations

Some of the limitations we ran into while performing our exploratory data analysis included the fact that the data provided was only from 2017. Another thing that would lead to more conclusive results would be if the product names were provided. Lastly we could also have used more time to further explore the data and yield more accurate results.