Growth Possibilities

The business problem I am looking into is which age and income groups are spending the most. I want to see which products are being bought the most or not bought enough. After finding this data, I will make my recommendations on how to keep certain age groups buying certain items, or how to get different age and income groups involved.

Load Necessary Libraries

Here I loaded the necessary libraries needed to view the data sets located in tidyverse and complete journey.

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(completejourney)
## Welcome to the completejourney package! Learn more about these data
## sets at http://bit.ly/completejourney.

Load Data Sets

transactions <- get_transactions()
demographics <- completejourney::demographics

Inspect data sets

Datasets are inspected to check the structure and to see what data is located in there. You can also view any missing or unique values in the dataset.

glimpse(transactions)
## Rows: 1,469,307
## Columns: 11
## $ household_id          <chr> "900", "900", "1228", "906", "906", "906", "906"…
## $ store_id              <chr> "330", "330", "406", "319", "319", "319", "319",…
## $ basket_id             <chr> "31198570044", "31198570047", "31198655051", "31…
## $ product_id            <chr> "1095275", "9878513", "1041453", "1020156", "105…
## $ quantity              <dbl> 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ sales_value           <dbl> 0.50, 0.99, 1.43, 1.50, 2.78, 5.49, 1.50, 1.88, …
## $ retail_disc           <dbl> 0.00, 0.10, 0.15, 0.29, 0.80, 0.50, 0.29, 0.21, …
## $ coupon_disc           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ coupon_match_disc     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ week                  <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ transaction_timestamp <dttm> 2017-01-01 06:53:26, 2017-01-01 07:10:28, 2017-…
glimpse(demographics)
## Rows: 801
## Columns: 8
## $ household_id   <chr> "1", "1001", "1003", "1004", "101", "1012", "1014", "10…
## $ age            <ord> 65+, 45-54, 35-44, 25-34, 45-54, 35-44, 45-54, 45-54, 4…
## $ income         <ord> 35-49K, 50-74K, 25-34K, 15-24K, Under 15K, 35-49K, 15-2…
## $ home_ownership <ord> Homeowner, Homeowner, NA, NA, Homeowner, NA, NA, Homeow…
## $ marital_status <ord> Married, Unmarried, Unmarried, Unmarried, Married, Marr…
## $ household_size <ord> 2, 1, 1, 1, 4, 5+, 4, 1, 5+, 2, 5+, 4, 2, 1, 5+, 1, 1, …
## $ household_comp <ord> 2 Adults No Kids, 1 Adult No Kids, 1 Adult No Kids, 1 A…
## $ kids_count     <ord> 0, 0, 0, 0, 2, 3+, 2, 0, 3+, 0, 3+, 2, 0, 0, 3+, 0, 0, …

Join demographics and transactions with household_id

In this code chunk, I am merging transaction data with the demographic information based on household IDs to link purchases to customer profiles.

df <- transactions %>%
  left_join(demographics, by = "household_id") %>%
  select(household_id, age, income, sales_value) %>%
  drop_na()

Summarize total spending by age and income

Group and summarize total spending by age and income groups.

demographic_spending <- df %>%
  group_by(age, income) %>%
  summarise(total_spent = sum(sales_value, na.rm = TRUE)) %>%
  ungroup()
## `summarise()` has grouped output by 'age'. You can override using the `.groups`
## argument.
glimpse(demographic_spending)
## Rows: 62
## Columns: 3
## $ age         <ord> 19-24, 19-24, 19-24, 19-24, 19-24, 19-24, 19-24, 19-24, 25…
## $ income      <ord> Under 15K, 15-24K, 25-34K, 35-49K, 50-74K, 75-99K, 200-249…
## $ total_spent <dbl> 30485.56, 7297.92, 9519.19, 28985.34, 28326.37, 7796.25, 1…

Visualize Total Spending by Age and Income

ggplot(demographic_spending, aes(x = age, y = total_spent, fill = as.factor(income))) +
  geom_col(position = "dodge") + 
  labs(title = "Income Distrubution per Age Group", x = "Age Groups", y = "Count", fill = "Income Level") +
  theme_minimal()

This bar chart displays total spending across age and income groups located in the data set. From the results of this chart, we can see that most of the clients are in the 45-54 age group. Regork is lacking in the 19-24 and 55+ age groups. In these age groups, the average income level ranges from 34-49k and 50-74k. We can use this information and make recommendations based on the given income level and targeted age groups. This also allows us to create recommendations to increase activity in the age groups that aren’t as involved.

Load product data sets

Next, I loaded the “products” data set to be able to links products with demographics and household IDs.

data("products")

Join products with the demographic spending dataset

df <- transactions %>%
  left_join(demographics, by = "household_id") %>%
  select(household_id, age, income, sales_value, product_id) %>%
  drop_na()
df_products <- df %>%
  left_join(products, by = "product_id") %>%
  select(household_id, age, income, sales_value, product_category) %>%
  drop_na(product_category)

Summarize total spending by age, income, and product category

This section calculates the total sales value for each product category by age and income groups.

product_spending <- df_products %>%
  group_by(age, income, product_category) %>%
  summarise(total_spent = sum(sales_value, na.rm = TRUE)) %>%
  ungroup()
## `summarise()` has grouped output by 'age', 'income'. You can override using the
## `.groups` argument.

Separating Income Levels

df_products <- df_products %>%
  mutate(income_group = case_when(
    income <= 50 ~ "Low",
    income > 50 & income <= 100 ~ "Middle",
    income > 100 ~ "High"
))

Filter Top 5 Products for Each Age Group

top_5_products <- df_products %>% group_by(age, product_category) %>% summarise(total_spent = sum(sales_value, na.rm = TRUE)) %>% slice_max(order_by = total_spent, n = 5) %>% ungroup() %>% select(age, product_category)

Join Top Products with Original Data

df_top_products <- df_products %>% semi_join(top_5_products, by = c(“age”, “product_category”))

Visualize Spending by Income Group for Top 5 Products

ggplot(df_top_products, aes(x = product_category, y = sales_value, fill = income_group)) + geom_col(position = “dodge”, stat = “summary”, fun = sum) + facet_wrap(~ age) + labs(title = “Top 5 Product Spending by Income Group and Age Group”, x = “Product Category”, y = “Total Spending”, fill = “Income Group”) + theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 10), axis.title = element_text(size = 12), plot.title = element_text(size = 14), strip.text = element_text(size = 12)) + theme_minimal()

Visualize Spending by Income Group and Product Category

Displays product spending for each income group across different age groups.

ggplot(df_products, aes(x = product_category, y = sales_value, fill = income_group)) +
  geom_col(position = "dodge", stat = "summary", fun = sum) +
  facet_wrap(~ age) +
  labs(title = "Spending by Income Group and Product Category", x = "Product Category", y = "Total Spending", fill = "Income Group") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 10),
        axis.title = element_text(size = 12),
        plot.title = element_text(size = 14), 
        strip.text = element_text(size = 12)) +
  theme_minimal()
## Warning in geom_col(position = "dodge", stat = "summary", fun = sum): Ignoring
## unknown parameters: `stat` and `fun`

Visualized Count of Products Bought by Age Groups and Income

A bar plot that counts the number of products purchased by different age and income groups.

ggplot(df_products, aes(x = product_category, fill = income_group)) + 
  geom_bar(position = "dodge") +
  facet_wrap(~ age) + 
  labs(title = "Count of Products Bought by Age Group and Income Level", x = "Product Category", y = "Count", fill = "Income Group") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 10), 
        axis.title = element_text(size = 12),
        plot.title = element_text(size = 14),
        strip.text = element_text(size = 12)) +
  theme_minimal()

Tibble for Top Product Bought in Each Age Group

Identifies the top 5 best-selling product categories for each age group.

top_products <- df_products %>%
  group_by(age, product_category) %>%
  summarise(total_spent = sum(sales_value, na.rm = TRUE)) %>%
  arrange(desc(total_spent)) %>%
  slice_max(order_by = total_spent, n = 5) %>%
  ungroup()
## `summarise()` has grouped output by 'age'. You can override using the `.groups`
## argument.
print(top_products, n = 30)
## # A tibble: 30 × 3
##    age   product_category       total_spent
##    <ord> <chr>                        <dbl>
##  1 19-24 COUPON/MISC ITEMS            7840.
##  2 19-24 SOFT DRINKS                  6572.
##  3 19-24 BEEF                         4398.
##  4 19-24 FROZEN PIZZA                 3699.
##  5 19-24 CHEESE                       3672.
##  6 25-34 COUPON/MISC ITEMS           43606.
##  7 25-34 SOFT DRINKS                 15150.
##  8 25-34 BEEF                        14737.
##  9 25-34 FLUID MILK PRODUCTS         11913.
## 10 25-34 CHEESE                      11723.
## 11 35-44 COUPON/MISC ITEMS           72982.
## 12 35-44 SOFT DRINKS                 27337.
## 13 35-44 BEEF                        27124.
## 14 35-44 FLUID MILK PRODUCTS         17171.
## 15 35-44 CHEESE                      16180.
## 16 45-54 COUPON/MISC ITEMS           89171.
## 17 45-54 SOFT DRINKS                 38801.
## 18 45-54 BEEF                        36485.
## 19 45-54 FLUID MILK PRODUCTS         25563.
## 20 45-54 CHEESE                      21791.
## 21 55-64 COUPON/MISC ITEMS           18077.
## 22 55-64 BEEF                         6956.
## 23 55-64 SOFT DRINKS                  5744.
## 24 55-64 FRZN MEAT/MEAT DINNERS       3730.
## 25 55-64 CHEESE                       3657.
## 26 65+   COUPON/MISC ITEMS           13810.
## 27 65+   BEEF                         6190.
## 28 65+   SOFT DRINKS                  6180.
## 29 65+   FLUID MILK PRODUCTS          4401.
## 30 65+   BAKED BREAD/BUNS/ROLLS       3834.

Visualize Top 5 Products for Each Age Group

Creating a bar chart to display the top 5 products for each age group by total spending.

ggplot(top_products, aes(x = reorder(product_category, -total_spent), y = total_spent, fill = as.factor(age))) +
  geom_col() +
  facet_wrap(~ age, scales = "free_x") + 
  labs(title = " Top 5 Products by Age Group", x = "Product Cateogry", y = "Total Spending", fill = "Age Group") + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 8, vjust = 1),
        axis.title = element_text(size = 12),
        plot.title = element_text(size = 14),
        strip.text = element_text(size = 12)) +
  theme_minimal()

Analytical Approach

In my analysis, I used the “transactions” and “demographics” data sets from the CompleteJourney package. After merging transactions and demographics, I summarized total spending by age and income groups. Next, I merged both data sets with the “products” data set and made multiple visualizations to find certain trends on product spending within age and income groups. After finding the spending results from the top 5 products in the age and income groups, I made my recommendations.

My Recommendations

Based on the items listed in the tibble from earlier, the top 5 items were: Soft Drinks, Beef, Fluid Milk Products, Frozen Pizza, and Cheese. My first recommendation would be to target the younger age groups with promotions. This age group of 19-24 years old had the lowest overall spending out of all of the groups, meaning there is room to gather some more activity from them. I recommend that Regork should introduce either a student discount for college kids and maybe offer a bundle deal that includes frozen pizza and soft drinks. Coupons being created for certain products would also be a great idea, since this age groups heavily relies on discounts.

Regarding the 55+ age groups, they are a lower spending group, but they primarily buy Baked Breads, Fluid Milk Products, and Beef. I suggest that Regork increases their marketing towards this group on healthy options such as the convenient products like dairy and baked bread.

The highest spending groups is the middle-aged group that consists of ages 45-54. They are the leading buyers of almost all of the top 5 products listed above. I suggest that Regork keeps up on marketing to this group and offering new promotions for all of these products to ensure repeat purchases. For the higher income levels in this group, Regork can offer premium products at a higher price to create new top products specifically for this group. Premium products will double revenue and provide the store with plenty of options for all groups, no matter the income level or age.

In conclusion, the recommendations above will give Regork what they need to ensure that they are marketing to the right groups and offering sought after products to ensure that engagement improves among the age groups, and that they are maximizing revenue.