Introduction

Our goal today is to educate you on the breakdown of spending within all age groups at Regork and draw insights on how to market and draw in these customers to continue shopping at our stores.

We are looking to analyze the monthly amount sold for the top 3 categories for each age group identified by Regork. Once we have this data we will be able to pull out the down months and counter those with sales or coupons to drive sales throughout the year. You will be seeing tables and graphs to help draw attention to areas of extreme concern.

Overall, we know that it is important to bring customers into the stores each week to have consistent sales throughout the year. This is why we have spent time bringing you digestible insights on how to increase our sales and customer volume consistently for the sustainable future.

Our methodology was to combine the transactions, products, and demographics data together and sort that data into groups based on age groups. Once we had them grouped, we were able to sort each via their highest bought product categories. We then broke that data down by months to get the yearly view on the data. This allowed us to see months that the product was selling great and months where the product needed some support. From there, we pulled our final insights and will present those to you at the end of this report.

Data Used

During our analysis, we used the completejourney data set. This data set is comprised of household level transactions over one year time span from a group of approximately 2,500 households who are frequent shoppers at a grocery store. We used a combination transactions, demographics, and products data to come to our final conclusions and make sure our work was exhaustive and complete.

Here are the individual packages we used and their essential functions:

dplyr - Primarily a set of functions designed to enable dataframe manipulation

tidyverse - A collection of open source packages for the R programming language that share an underlying design philosophy, grammar, and data structures” of tidy data

lubridate - Functions to work with date-times and time-spans

ggthemes - Provides extra themes, geoms, and scales for the ggplot2 package

ggplot2 - A system for declarative creating graphics, based on The Grammar of Graphics

paged/rmarkdown - For building table styles

library(completejourney)
library(dplyr)
library(tidyverse)
library(lubridate)
library(ggthemes)
library(ggplot2)
library(rmarkdown)
transactions <- get_transactions()

Data Preparation & Tidy Data

As mentioned previously, we used the transactions, products, and demographics data. Within that we used the columns product_category(to group by top bought categories), age, transaction_timestamp(to filter by month), and quantity(this played into the product_category data). We loaded this in and the packages mentioned in the last section via the library() function before proceeding to our analysis.

This is how we joined our data

transactions %>%
 left_join(products) %>%
  left_join(demographics) -> df_main

We started by filtering per month for later analysis

transactions %>%
  mutate(month = month(transaction_timestamp)) -> monthly_data

This is an example of how we broke the overall data into each age group. We repeated this as needed.

df_main %>%
  filter(str_detect(age, "19-24") ) -> df_19to24 

This is an example of how we broke down from filtering, slicing, and eventually graphing an output. We did this to find the top 3 of each age group and graphed each category individually. This resulted in 18 graphs which you will see in the next section.

df_19to24 %>%
  group_by(product_category) %>% # grouping by category  
  
  count() %>%                    # finding top bought categories  
  
  arrange(desc(n)) %>%
  filter(n>1440) %>%             # filtering for only the top 3 results  
  
 left_join(df_19to24)%>%         # rejoining old data to bring back months  
 
  select(product_category, n, transaction_timestamp) %>%
  filter(product_category == "SOFT DRINKS") %>%  # only selecting one of the top 3 at a time  
  
  mutate(only_months = month(transaction_timestamp)) %>%
  group_by(only_months) %>%
  count() %>%                    # breaking it down by each month to then graph  
  
  ggplot(aes(x=only_months, y = n)) +
  geom_col()                     # baseline graph of a top category by month

For simplicity, we only included one code chunk, but we reworked the above for each age and category until all graphs were finalized. You will see full code for each graph in the Data Analysis section.

Data Analysis

Below you will see tabs for each of the age groups followed by the top three categories of each and the monthly graphical breakdown. Moreover, you can also see a table that lists all age groups and their top three for ease of visualization. All insights will be stated in the summary section.

Table Breakdown

table <- data.frame("Ages_19_to_24" = c('Soft Drinks', 'Baked Bread/Buns/Rolls', 'Cheese'), 
                 "Ages_25_to_34" = c('Soft Drinks', 'Cheese', 'Fluid Milk'),
                 "Ages_35_to_44" = c('Soft Drinks', 'Fluid Milk', 'Baked Bread/Buns/Rolls'),
                 "Ages_45_to_54" = c('Soft Drinks', 'Fluid Milk', 'Baked Bread/Buns/Rolls'),
                 "Ages_55_to_64" = c('Soft Drinks', 'Baked Bread/Buns/Rolls', 'Fluid Milk'),
                 "Ages_65_and_over" = c('Baked Bread/Buns/Rolls', 'Soft Drinks', 'Fluid Milk'))

library(rmarkdown)

paged_table(table)

19 to 24 year olds

This section will display the code and graphs for the top 3 categories.

t1 <- transactions %>% 
  mutate(month = month(transaction_timestamp))

group1 <- t1 %>% 
  inner_join(products) %>% 
  inner_join(demographics) %>% 
  filter(str_detect(age, "19-24")) %>% 
  group_by(product_category) %>% 
  count(product_category) %>% 
  arrange(desc(n)) 

group1.1 <- t1 %>% 
  inner_join(products) %>% 
  inner_join(demographics) %>% 
  filter(str_detect(age, "19-24")) %>% 
  filter(product_category == "SOFT DRINKS")
  
group1.1 %>%
  group_by(product_category) %>%
  count(name = "Sales_Quantity") %>%
  arrange(desc(Sales_Quantity)) %>%
  filter(Sales_Quantity > 1400) %>%
  left_join(group1.1) %>%
  select(product_category, Sales_Quantity, transaction_timestamp) %>%
  filter(product_category == "SOFT DRINKS") %>% 
  mutate(only_months = month(transaction_timestamp)) %>%
  group_by(only_months) %>%
  count(name = "Sales_Quantity") %>%
  ggplot(aes(x = only_months, y = Sales_Quantity, fill = Sales_Quantity)) +
  geom_col() +
  scale_fill_gradient(low = "black", high = "green") +
  scale_x_continuous("Month",
                     breaks = c(1,2,3,4,5,6,7,8,9,10,11,12),
                     label = c("1","2","3","4","5","6","7","8","9","10","11","12")) +
  scale_y_continuous("Number of Sales") +
  ggtitle("Sales Per Month For Customers 19-24",
          subtitle = "For Soft Drinks")

group1.2 <- t1 %>% 
  inner_join(products) %>% 
  inner_join(demographics) %>% 
  filter(str_detect(age, "19-24")) %>% 
  filter(product_category == "BAKED BREAD/BUNS/ROLLS")

group1.2 %>%
  group_by(product_category) %>%
  count(name = "Sales_Quantity") %>%
  arrange(desc(Sales_Quantity)) %>%
  filter(Sales_Quantity > 1400) %>%
  left_join(group1.2) %>%
  select(product_category, Sales_Quantity, transaction_timestamp) %>%
  filter(product_category == "BAKED BREAD/BUNS/ROLLS") %>% 
  mutate(only_months = month(transaction_timestamp)) %>%
  group_by(only_months) %>%
  count(name = "Sales_Quantity") %>%
  ggplot(aes(x = only_months, y = Sales_Quantity, fill = Sales_Quantity)) +
  geom_col() +
  scale_fill_gradient(low = "black", high = "red") +
  scale_x_continuous("Month",
                     breaks = c(1,2,3,4,5,6,7,8,9,10,11,12),
                     label = c("1","2","3","4","5","6","7","8","9","10","11","12")) +
  scale_y_continuous("Number of Sales") +
  ggtitle("Sales Per Month For Customers 19-24",
          subtitle = "For Baked Bread/Buns/Rolls")

group1.3 <- t1 %>% 
  inner_join(products) %>% 
  inner_join(demographics) %>% 
  filter(str_detect(age, "19-24")) %>% 
  filter(product_category == "CHEESE")

group1.3 %>%
  group_by(product_category) %>%
  count(name = "Sales_Quantity") %>%
  arrange(desc(Sales_Quantity)) %>%
  filter(Sales_Quantity > 1400) %>%
  left_join(group1.3) %>%
  select(product_category, Sales_Quantity, transaction_timestamp) %>%
  filter(product_category == "CHEESE") %>% 
  mutate(only_months = month(transaction_timestamp)) %>%
  group_by(only_months) %>%
  count(name = "Sales_Quantity") %>%
  ggplot(aes(x = only_months, y = Sales_Quantity, fill = Sales_Quantity)) +
  geom_col() +
  scale_fill_gradient(low = "black", high = "blue") +
  scale_x_continuous("Month",
                     breaks = c(1,2,3,4,5,6,7,8,9,10,11,12),
                     label = c("1","2","3","4","5","6","7","8","9","10","11","12")) +
  scale_y_continuous("Number of Sales") +
  ggtitle("Sales Per Month For Customers 19-24",
          subtitle = "For Cheese")

25 to 34 year olds

This section will display the code and graphs for the top 3 categories.

group2 <- t1 %>% 
  inner_join(products) %>% 
  inner_join(demographics) %>% 
  filter(str_detect(age, "25-34")) %>% 
  group_by(product_category) %>% 
  count(product_category) %>% 
  arrange(desc(n))

group2.1 <- t1 %>% 
  inner_join(products) %>% 
  inner_join(demographics) %>% 
  filter(str_detect(age, "25-34")) %>% 
  filter(product_category == "SOFT DRINKS")

group2.1 %>%
  group_by(product_category) %>%
  count(name = "Sales_Quantity") %>%
  arrange(desc(Sales_Quantity)) %>%
  filter(Sales_Quantity > 4650) %>%
  left_join(group2.1) %>%
  select(product_category, Sales_Quantity, transaction_timestamp) %>%
  filter(product_category == "SOFT DRINKS") %>% 
  mutate(only_months = month(transaction_timestamp)) %>%
  group_by(only_months) %>%
  count(name = "Sales_Quantity") %>%
  ggplot(aes(x = only_months, y = Sales_Quantity, fill = Sales_Quantity)) +
  geom_col() +
  scale_fill_gradient(low = "black", high = "orange") +
  scale_x_continuous("Month",
                     breaks = c(1,2,3,4,5,6,7,8,9,10,11,12),
                     label = c("1","2","3","4","5","6","7","8","9","10","11","12")) +
  scale_y_continuous("Number of Sales") +
  ggtitle("Sales Per Month For Customers 25-34",
          subtitle = "For Soft Drinks")

group2.2 <- t1 %>% 
  inner_join(products) %>% 
  inner_join(demographics) %>% 
  filter(str_detect(age, "25-34")) %>% 
  filter(product_category == "CHEESE")

group2.2 %>%
  group_by(product_category) %>%
  count(name = "Sales_Quantity") %>%
  arrange(desc(Sales_Quantity)) %>%
  filter(Sales_Quantity > 4650) %>%
  left_join(group2.2) %>%
  select(product_category, Sales_Quantity, transaction_timestamp) %>%
  filter(product_category == "CHEESE") %>% 
  mutate(only_months = month(transaction_timestamp)) %>%
  group_by(only_months) %>%
  count(name = "Sales_Quantity") %>%
  ggplot(aes(x = only_months, y = Sales_Quantity, fill = Sales_Quantity)) +
  geom_col() +
  scale_fill_gradient(low = "black", high = "purple") +
  scale_x_continuous("Month",
                     breaks = c(1,2,3,4,5,6,7,8,9,10,11,12),
                     label = c("1","2","3","4","5","6","7","8","9","10","11","12")) +
  scale_y_continuous("Number of Sales") +
  ggtitle("Sales Per Month For Customers 25-34",
          subtitle = "For Cheese")

group2.3 <- t1 %>% 
  inner_join(products) %>% 
  inner_join(demographics) %>% 
  filter(str_detect(age, "25-34")) %>% 
  filter(product_category == "FLUID MILK PRODUCTS")

group2.3 %>%
  group_by(product_category) %>%
  count(name = "Sales_Quantity") %>%
  arrange(desc(Sales_Quantity)) %>%
  filter(Sales_Quantity > 4650) %>%
  left_join(group2.3) %>%
  select(product_category, Sales_Quantity, transaction_timestamp) %>%
  filter(product_category == "FLUID MILK PRODUCTS") %>% 
  mutate(only_months = month(transaction_timestamp)) %>%
  group_by(only_months) %>%
  count(name = "Sales_Quantity") %>%
  ggplot(aes(x = only_months, y = Sales_Quantity, fill = Sales_Quantity)) +
  geom_col() +
  scale_fill_gradient(low = "black", high = "yellow") +
  scale_x_continuous("Month",
                     breaks = c(1,2,3,4,5,6,7,8,9,10,11,12),
                     label = c("1","2","3","4","5","6","7","8","9","10","11","12")) +
  scale_y_continuous("Number of Sales") +
  ggtitle("Sales Per Month For Customers 25-34",
          subtitle = "For Fluid Milk Products")

35 to 44 year olds

This section will display the code and graphs for the top 3 categories.

group3 <- t1 %>% 
  inner_join(products) %>% 
  inner_join(demographics) %>% 
  filter(str_detect(age, "35-44")) %>% 
  group_by(product_category) %>% 
  count(product_category) %>% 
  arrange(desc(n))

group3.1 <- t1 %>% 
  inner_join(products) %>% 
  inner_join(demographics) %>% 
  filter(str_detect(age, "35-44")) %>% 
  filter(product_category == "SOFT DRINKS")

group3.1 %>%
  group_by(product_category) %>%
  count(name = "Sales_Quantity") %>%
  arrange(desc(Sales_Quantity)) %>%
  filter(Sales_Quantity > 6800) %>%
  left_join(group3.1) %>%
  select(product_category, Sales_Quantity, transaction_timestamp) %>%
  filter(product_category == "SOFT DRINKS") %>% 
  mutate(only_months = month(transaction_timestamp)) %>%
  group_by(only_months) %>%
  count(name = "Sales_Quantity") %>%
  ggplot(aes(x = only_months, y = Sales_Quantity, fill = Sales_Quantity)) +
  geom_col() +
  scale_fill_gradient(low = "black", high = "seagreen") +
  scale_x_continuous("Month",
                     breaks = c(1,2,3,4,5,6,7,8,9,10,11,12),
                     label = c("1","2","3","4","5","6","7","8","9","10","11","12")) +
  scale_y_continuous("Number of Sales") +
  ggtitle("Sales Per Month For Customers 35-44",
          subtitle = "For Soft Drinks")

group3.2 <- t1 %>% 
  inner_join(products) %>% 
  inner_join(demographics) %>% 
  filter(str_detect(age, "35-44")) %>% 
  filter(product_category == "FLUID MILK PRODUCTS")

group3.2 %>%
  group_by(product_category) %>%
  count(name = "Sales_Quantity") %>%
  arrange(desc(Sales_Quantity)) %>%
  filter(Sales_Quantity > 6800) %>%
  left_join(group3.2) %>%
  select(product_category, Sales_Quantity, transaction_timestamp) %>%
  filter(product_category == "FLUID MILK PRODUCTS") %>% 
  mutate(only_months = month(transaction_timestamp)) %>%
  group_by(only_months) %>%
  count(name = "Sales_Quantity") %>%
  ggplot(aes(x = only_months, y = Sales_Quantity, fill = Sales_Quantity)) +
  geom_col() +
  scale_fill_gradient(low = "black", high = "gold") +
  scale_x_continuous("Month",
                     breaks = c(1,2,3,4,5,6,7,8,9,10,11,12),
                     label = c("1","2","3","4","5","6","7","8","9","10","11","12")) +
  scale_y_continuous("Number of Sales") +
  ggtitle("Sales Per Month For Customers 35-44",
          subtitle = "For Fluid Milk Products")

group3.3 <- t1 %>% 
  inner_join(products) %>% 
  inner_join(demographics) %>% 
  filter(str_detect(age, "35-44")) %>% 
  filter(product_category == "BAKED BREAD/BUNS/ROLLS")

group3.3 %>%
  group_by(product_category) %>%
  count(name = "Sales_Quantity") %>%
  arrange(desc(Sales_Quantity)) %>%
  filter(Sales_Quantity > 6800) %>%
  left_join(group3.3) %>%
  select(product_category, Sales_Quantity, transaction_timestamp) %>%
  filter(product_category == "BAKED BREAD/BUNS/ROLLS") %>% 
  mutate(only_months = month(transaction_timestamp)) %>%
  group_by(only_months) %>%
  count(name = "Sales_Quantity") %>%
  ggplot(aes(x = only_months, y = Sales_Quantity, fill = Sales_Quantity)) +
  geom_col() +
  scale_fill_gradient(low = "black", high = "deepskyblue") +
  scale_x_continuous("Month",
                     breaks = c(1,2,3,4,5,6,7,8,9,10,11,12),
                     label = c("1","2","3","4","5","6","7","8","9","10","11","12")) +
  scale_y_continuous("Number of Sales") +
  ggtitle("Sales Per Month For Customers 35-44",
          subtitle = "For Baked Bread/Buns/Rolls")

45 to 54 year olds

This section will display the code and graphs for the top 3 categories.

group4 <- t1 %>% 
  inner_join(products) %>% 
  inner_join(demographics) %>% 
  filter(str_detect(age, "45-54")) %>% 
  group_by(product_category) %>% 
  count(product_category) %>% 
  arrange(desc(n))

group4.1 <- t1 %>% 
  inner_join(products) %>% 
  inner_join(demographics) %>% 
  filter(str_detect(age, "45-54")) %>% 
  filter(product_category == "SOFT DRINKS")

group4.1 %>%
  group_by(product_category) %>%
  count(name = "Sales_Quantity") %>%
  arrange(desc(Sales_Quantity)) %>%
  filter(Sales_Quantity > 9000) %>%
  left_join(group4.1) %>%
  select(product_category, Sales_Quantity, transaction_timestamp) %>%
  filter(product_category == "SOFT DRINKS") %>% 
  mutate(only_months = month(transaction_timestamp)) %>%
  group_by(only_months) %>%
  count(name = "Sales_Quantity") %>%
  ggplot(aes(x = only_months, y = Sales_Quantity, fill = Sales_Quantity)) +
  geom_col() +
  scale_fill_gradient(low = "black", high = "coral") +
  scale_x_continuous("Month",
                     breaks = c(1,2,3,4,5,6,7,8,9,10,11,12),
                     label = c("1","2","3","4","5","6","7","8","9","10","11","12")) +
  scale_y_continuous("Number of Sales") +
  ggtitle("Sales Per Month For Customers 45-54",
          subtitle = "For Soft Drinks")

group4.2 <- t1 %>% 
  inner_join(products) %>% 
  inner_join(demographics) %>% 
  filter(str_detect(age, "45-54")) %>% 
  filter(product_category == "FLUID MILK PRODUCTS")

group4.2 %>%
  group_by(product_category) %>%
  count(name = "Sales_Quantity") %>%
  arrange(desc(Sales_Quantity)) %>%
  filter(Sales_Quantity > 9000) %>%
  left_join(group4.2) %>%
  select(product_category, Sales_Quantity, transaction_timestamp) %>%
  filter(product_category == "FLUID MILK PRODUCTS") %>% 
  mutate(only_months = month(transaction_timestamp)) %>%
  group_by(only_months) %>%
  count(name = "Sales_Quantity") %>%
  ggplot(aes(x = only_months, y = Sales_Quantity, fill = Sales_Quantity)) +
  geom_col() +
  scale_fill_gradient(low = "black", high = "cyan") +
  scale_x_continuous("Month",
                     breaks = c(1,2,3,4,5,6,7,8,9,10,11,12),
                     label = c("1","2","3","4","5","6","7","8","9","10","11","12")) +
  scale_y_continuous("Number of Sales") +
  ggtitle("Sales Per Month For Customers 45-54",
          subtitle = "For Fluid Milk Products")

group4.3 <- t1 %>% 
  inner_join(products) %>% 
  inner_join(demographics) %>% 
  filter(str_detect(age, "45-54")) %>% 
  filter(product_category == "BAKED BREAD/BUNS/ROLLS")

group4.3 %>%
  group_by(product_category) %>%
  count(name = "Sales_Quantity") %>%
  arrange(desc(Sales_Quantity)) %>%
  filter(Sales_Quantity > 9000) %>%
  left_join(group4.3) %>%
  select(product_category, Sales_Quantity, transaction_timestamp) %>%
  filter(product_category == "BAKED BREAD/BUNS/ROLLS") %>% 
  mutate(only_months = month(transaction_timestamp)) %>%
  group_by(only_months) %>%
  count(name = "Sales_Quantity") %>%
  ggplot(aes(x = only_months, y = Sales_Quantity, fill = Sales_Quantity)) +
  geom_col() +
  scale_fill_gradient(low = "black", high = "mediumpurple") +
  scale_x_continuous("Month",
                     breaks = c(1,2,3,4,5,6,7,8,9,10,11,12),
                     label = c("1","2","3","4","5","6","7","8","9","10","11","12")) +
  scale_y_continuous("Number of Sales") +
  ggtitle("Sales Per Month For Customers 45-54",
          subtitle = "For Baked Bread/Buns/Rolls")

55 to 64 year olds

This section will display the code and graphs for the top 3 categories.

group5 <- t1 %>% 
  inner_join(products) %>% 
  inner_join(demographics) %>% 
  filter(str_detect(age, "55-64")) %>% 
  group_by(product_category) %>% 
  count(product_category) %>% 
  arrange(desc(n))

group5.1 <- t1 %>% 
  inner_join(products) %>% 
  inner_join(demographics) %>% 
  filter(str_detect(age, "55-64")) %>% 
  filter(product_category == "SOFT DRINKS")

group5.1 %>%
  group_by(product_category) %>%
  count(name = "Sales_Quantity") %>%
  arrange(desc(Sales_Quantity)) %>%
  filter(Sales_Quantity > 1700) %>%
  left_join(group5.1) %>%
  select(product_category, Sales_Quantity, transaction_timestamp) %>%
  filter(product_category == "SOFT DRINKS") %>% 
  mutate(only_months = month(transaction_timestamp)) %>%
  group_by(only_months) %>%
  count(name = "Sales_Quantity") %>%
  ggplot(aes(x = only_months, y = Sales_Quantity, fill = Sales_Quantity)) +
  geom_col() +
  scale_fill_gradient(low = "black", high = "mediumspringgreen") +
  scale_x_continuous("Month",
                     breaks = c(1,2,3,4,5,6,7,8,9,10,11,12),
                     label = c("1","2","3","4","5","6","7","8","9","10","11","12")) +
  scale_y_continuous("Number of Sales") +
  ggtitle("Sales Per Month For Customers 55-64",
          subtitle = "For Soft Drinks")

group5.2 <- t1 %>% 
  inner_join(products) %>% 
  inner_join(demographics) %>% 
  filter(str_detect(age, "55-64")) %>% 
  filter(product_category == "BAKED BREAD/BUNS/ROLLS")

group5.2 %>%
  group_by(product_category) %>%
  count(name = "Sales_Quantity") %>%
  arrange(desc(Sales_Quantity)) %>%
  filter(Sales_Quantity > 1700) %>%
  left_join(group5.2) %>%
  select(product_category, Sales_Quantity, transaction_timestamp) %>%
  filter(product_category == "BAKED BREAD/BUNS/ROLLS") %>% 
  mutate(only_months = month(transaction_timestamp)) %>%
  group_by(only_months) %>%
  count(name = "Sales_Quantity") %>%
  ggplot(aes(x = only_months, y = Sales_Quantity, fill = Sales_Quantity)) +
  geom_col() +
  scale_fill_gradient(low = "black", high = "magenta") +
  scale_x_continuous("Month",
                     breaks = c(1,2,3,4,5,6,7,8,9,10,11,12),
                     label = c("1","2","3","4","5","6","7","8","9","10","11","12")) +
  scale_y_continuous("Number of Sales") +
  ggtitle("Sales Per Month For Customers 55-64",
          subtitle = "For Baked Bread/Buns/Rolls")

group5.3 <- t1 %>% 
  inner_join(products) %>% 
  inner_join(demographics) %>% 
  filter(str_detect(age, "55-64")) %>% 
  filter(product_category == "FLUID MILK PRODUCTS")

group5.3 %>%
  group_by(product_category) %>%
  count(name = "Sales_Quantity") %>%
  arrange(desc(Sales_Quantity)) %>%
  filter(Sales_Quantity > 1700) %>%
  left_join(group5.3) %>%
  select(product_category, Sales_Quantity, transaction_timestamp) %>%
  filter(product_category == "FLUID MILK PRODUCTS") %>% 
  mutate(only_months = month(transaction_timestamp)) %>%
  group_by(only_months) %>%
  count(name = "Sales_Quantity") %>%
  ggplot(aes(x = only_months, y = Sales_Quantity, fill = Sales_Quantity)) +
  geom_col() +
  scale_fill_gradient(low = "black", high = "red") +
  scale_x_continuous("Month",
                     breaks = c(1,2,3,4,5,6,7,8,9,10,11,12),
                     label = c("1","2","3","4","5","6","7","8","9","10","11","12")) +
  scale_y_continuous("Number of Sales") +
  ggtitle("Sales Per Month For Customers 55-64",
          subtitle = "For Fluid Milk Products")

65 and up

This section will display the code and graphs for the top 3 categories.

group6 <- t1 %>% 
  inner_join(products) %>% 
  inner_join(demographics) %>% 
  filter(str_detect(age, "65+")) %>% 
  group_by(product_category) %>% 
  count(product_category) %>% 
  arrange(desc(n))

group6.1 <- t1 %>% 
  inner_join(products) %>% 
  inner_join(demographics) %>% 
  filter(str_detect(age, "65+")) %>% 
  filter(product_category == "BAKED BREAD/BUNS/ROLLS")

group6.1 %>%
  group_by(product_category) %>%
  count(name = "Sales_Quantity") %>%
  arrange(desc(Sales_Quantity)) %>%
  filter(Sales_Quantity > 2000) %>%
  left_join(group6.1) %>%
  select(product_category, Sales_Quantity, transaction_timestamp) %>%
  filter(product_category == "BAKED BREAD/BUNS/ROLLS") %>% 
  mutate(only_months = month(transaction_timestamp)) %>%
  group_by(only_months) %>%
  count(name = "Sales_Quantity") %>%
  ggplot(aes(x = only_months, y = Sales_Quantity, fill = Sales_Quantity)) +
  geom_col() +
  scale_fill_gradient(low = "black", high = "green") +
  scale_x_continuous("Month",
                     breaks = c(1,2,3,4,5,6,7,8,9,10,11,12),
                     label = c("1","2","3","4","5","6","7","8","9","10","11","12")) +
  scale_y_continuous("Number of Sales") +
  ggtitle("Sales Per Month For Customers 65+",
          subtitle = "For Baked Bread/Buns/Rolls")

group6.2 <- t1 %>% 
  inner_join(products) %>% 
  inner_join(demographics) %>% 
  filter(str_detect(age, "65+")) %>% 
  filter(product_category == "SOFT DRINKS")

group6.2 %>%
  group_by(product_category) %>%
  count(name = "Sales_Quantity") %>%
  arrange(desc(Sales_Quantity)) %>%
  filter(Sales_Quantity > 2000) %>%
  left_join(group6.2) %>%
  select(product_category, Sales_Quantity, transaction_timestamp) %>%
  filter(product_category == "SOFT DRINKS") %>% 
  mutate(only_months = month(transaction_timestamp)) %>%
  group_by(only_months) %>%
  count(name = "Sales_Quantity") %>%
  ggplot(aes(x = only_months, y = Sales_Quantity, fill = Sales_Quantity)) +
  geom_col() +
  scale_fill_gradient(low = "black", high = "blue") +
  scale_x_continuous("Month",
                     breaks = c(1,2,3,4,5,6,7,8,9,10,11,12),
                     label = c("1","2","3","4","5","6","7","8","9","10","11","12")) +
  scale_y_continuous("Number of Sales") +
  ggtitle("Sales Per Month For Customers 65+",
          subtitle = "For Soft Drinks")

group6.3 <- t1 %>% 
  inner_join(products) %>% 
  inner_join(demographics) %>% 
  filter(str_detect(age, "65+")) %>% 
  filter(product_category == "FLUID MILK PRODUCTS")

group6.3 %>%
  group_by(product_category) %>%
  count(name = "Sales_Quantity") %>%
  arrange(desc(Sales_Quantity)) %>%
  filter(Sales_Quantity > 2000) %>%
  left_join(group6.3) %>%
  select(product_category, Sales_Quantity, transaction_timestamp) %>%
  filter(product_category == "FLUID MILK PRODUCTS") %>% 
  mutate(only_months = month(transaction_timestamp)) %>%
  group_by(only_months) %>%
  count(name = "Sales_Quantity") %>%
  ggplot(aes(x = only_months, y = Sales_Quantity, fill = Sales_Quantity)) +
  geom_col() +
  scale_fill_gradient(low = "black", high = "darkorchid") +
  scale_x_continuous("Month",
                     breaks = c(1,2,3,4,5,6,7,8,9,10,11,12),
                     label = c("1","2","3","4","5","6","7","8","9","10","11","12")) +
  scale_y_continuous("Number of Sales") +
  ggtitle("Sales Per Month For Customers 65+",
          subtitle = "For Fluid Milk Products")

Summary

Overall, our goal was to understand how certain age groups were spending on top categories throughout the year to increase promotion during down months.

We stated by loading the data, grouping it by age and months, and slicing the data for top categories which eventually gave us 18 unique graphs.

Insights for 19-24 years old

For this age group, the most impacted item was Soft Drinks. This category saw huge down swings in January, May, and September. This is a very strange pattern with dips every three months. January has the lowest sales most likely due to the start of the new year and the dieting that generally follows. We would recommend advertising the healthier soft drinks during this time to make up for lower sales. Lastly, for May and September, having personalized ads on social media to get the youth to think more about soft drinks and hydration will be huge to pick these months up.

Insights for 25-34 years old

For ages 25-34 cheese is a top-selling category with a noticeably high sales period from September through December.We should focus marketing efforts and increase promotions from January through August. This could be done through coupons and ad campaigns that specialize in cheese. We should also try to target other age ranges, since this is the only age group that had cheese as a top three product category in sales, and the more popular we make it the more sales we will get.

Insights for 35-44 years old

For ages 35-44 bread/buns/rolls is a top-selling category with a higher sales period in the summer months.We should focus on increasing marketing efforts from November through February, as those are our worst months.We should target the 25-34 age group more since this is the only age group that did not have bread in the top three sales by product category. Likewise, having a way to bring this age group in and buy like they do during the summer, the more sales Regork will obtain.

Insights for 45-54 years old

The 45-54 age range had very consistent sales in their top-three product categories sold: soft drinks, fluid milk products, and bread/buns/rolls.Therefore no necessary marketing changes need to be made in targeting this age group in these categories. We should focus efforts elsewhere.

Insights for 55-64 years old

For ages 55-64 years old, bread is again a top-selling category with the same seasonal high in sales during the summer months, with especially low sales in January and February. (There was a very similar trend for 65+).We should focus on increasing marketing efforts and promotional campaigns during the off-season, specifically January and February.This will even out our sales for the year and lead to more volume and sales for the company.

Insights for 65+ years old

For ages 65+ soft drinks were a top-selling category with higher sales in warmer months (May through September). Sales were again very high in December, most likely due to the holidays. We should focus on increasing marketing efforts during the off-season, specifically November with a particularly noticeable low month of sales.This could look like a Thanksgiving campaign on soft drinks and offer discounts throughout those low months.

Overall, we hope that this report has given you the insights and information about age groups and their seasonal spending throughout the year. We believe that with this information, we can create a sustainable sales outlook for Regork as the quantity sold would be even in each month. The more targeted the marketing and ads can be to each age group and down months, the higher the chances of positive success are. Another key is to market by the age group itself, as it makes more sense to use outlets like the newspaper and Facebook for the older groups, but not so much on the younger end.

We do understand that this report is slightly limited with the amount of data that was given and the short time frame given for the project. Given more time, this project can be expanded on with things such as a weekly breakdown during the low months, a breakdown into the top selling brands within the category, or even looking at more categories in general. Regardless, we hope this report has been informative and helpful as you continue moving the company forward.