I am trying to solve the business problem of not marketing correctly to specific demographics. This analysis uncovers areas to increase revenue through targeted marketing based on existing spending habits and where there is over/under-indexing for different demographics.
To arrive at a solution to where we can earn more revenue from specific demographics, it was important to index whether customers were below, above, or average in spending. This provided insight for which demographics were over or under-indexing in categories and where we could adjust our marketing investment. With this information, we can alter our advertising strategies or coupons to encourage more spending from those customers. The data required is the transactions, products, and demographics data.
This analysis provides easily intpretable plots that show where spending is low or high and provides an index score to compare to all other demographics. Data from plots shows areas of improvement for marketing and promotions.
Complete Journey - Provides required data for analysis
Tidyverse - Provides tools to tidy and visualize data
Knitr - Provides tools to create tables
# Load Packages
library(completejourney)
library(tidyverse)
library(knitr)
# Load Transactions Data
transactions <- get_transactions()
# Join Data
demo_purc <- transactions %>%
left_join(products, by = "product_id") %>%
left_join(demographics, by = "household_id")
# Share of Total Spend by Group
category_demo_summary <- demo_purc %>%
group_by(income, product_category) %>%
summarize(
total_revenue = sum(sales_value),
total_units = sum(quantity),
unique_households = n_distinct(household_id),
.groups = "drop"
)
# Overall share by demographic group
overall_share <- demo_purc %>%
group_by(income) %>%
summarize(group_spend = sum(sales_value), .groups = "drop") %>%
mutate(overall_share = group_spend / sum(group_spend))
# Share by category
category_share <- category_demo_summary %>%
group_by(product_category) %>%
mutate(category_share = total_revenue / sum(total_revenue)) %>%
ungroup()
# Create index by combining demographic share with category share
index_scores <- category_share %>%
left_join(overall_share, by = "income") %>%
mutate(index = (category_share / overall_share) * 100)
# Find Groups above and below average spend
## Below Index
index_scores %>%
filter(index < 75) %>%
arrange(index) %>%
select(income, product_category, index)
## # A tibble: 913 × 3
## income product_category index
## <ord> <chr> <dbl>
## 1 15-24K PROD SUPPLIES 0
## 2 35-49K MEAT SUPPLIES 0
## 3 50-74K BULK FOODS 0
## 4 50-74K PROD SUPPLIES 0
## 5 75-99K COFFEE SHOP SWEET GOODS&RETAIL 2.27
## 6 175-199K PET CARE SUPPLIES 2.53
## 7 35-49K BOTTLE DEPOSITS 2.75
## 8 250K+ FRZN JCE CONC/DRNKS 2.80
## 9 Under 15K BIRD SEED 3.30
## 10 50-74K SERVICE BEVERAGE 3.54
## # ℹ 903 more rows
## Above Index
index_scores %>%
filter(index > 125) %>%
arrange(desc(index)) %>%
select(income, product_category, index)
## # A tibble: 798 × 3
## income product_category index
## <ord> <chr> <dbl>
## 1 200-249K RESTRICTED DIET 4607.
## 2 175-199K HOME HEALTH CARE 2485.
## 3 200-249K COUPONS/STORE & MFG 2088.
## 4 25-34K MISCELLANEOUS CROUTONS 2077.
## 5 250K+ PROD SUPPLIES 1496.
## 6 250K+ MISCELLANEOUS HBC 1334.
## 7 200-249K VEAL 1304.
## 8 250K+ FIREWORKS 1131.
## 9 125-149K PHARMACY 1096.
## 10 35-49K NDAIRY/TEAS/JUICE/SOD 947.
## # ℹ 788 more rows
# Create plot to visualize results
# Narrow top categories to prevent over plotting
top_categories <- category_demo_summary %>%
group_by(product_category) %>%
summarize(total_households = sum(unique_households)) %>%
slice_max(total_households, n = 10) %>%
pull(product_category)
# Plot index scores on heat map to compare
index_scores %>%
filter(product_category %in% top_categories) %>%
ggplot(aes(x = income, y = product_category, fill = index)) +
geom_tile() +
scale_fill_gradient2(low = "red", mid = "white", high = "blue", midpoint = 100) +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
labs(title = "Purchase Index by Income and Product Category",
fill = "Index\n(100 = avg)")
ggsave("income_heatmap.png", width = 10, height = 6, dpi = 300)
Heat map shows high spending in soup and crackers categories for high income group and low spending in soft drinks and fluid milk.
# Create age index to see spending habits based on age
age_index <- demo_purc %>%
filter(!is.na(age)) %>%
group_by(age, product_category) %>%
summarize(total_revenue = sum(sales_value), .groups = "drop") |>
group_by(product_category) %>%
mutate(category_share = total_revenue / sum(total_revenue)) %>%
ungroup() %>%
left_join(
demo_purc %>%
filter(!is.na(age)) %>%
group_by(age) %>%
summarize(group_spend = sum(sales_value), .groups = "drop") %>%
mutate(overall_share = group_spend / sum(group_spend)),
by = "age"
) %>%
mutate(index = (category_share / overall_share) * 100)
# Plot to compare spending by age
age_index %>%
filter(product_category %in% top_categories) %>%
ggplot(aes(x = age, y = index, fill = index)) +
geom_col() +
geom_hline(yintercept = 100, linetype = "dashed", color = "black") +
scale_fill_gradient2(low = "red", mid = "white", high = "blue", midpoint = 100) +
facet_wrap(~ product_category, scales = "free_y") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
labs(title = "Purchase Index by Age and Product Category",
x = "Age Range", y = "Index (100 = avg)", fill = "Index")
ggsave("age_bar.png", width = 10, height = 6, dpi = 300)
This bar chart shows lower spending on bag snacks and cheese in the oldest age group. There is high spending on either extreme for soup and high spending on eggs in the oldest age groups and high spending on soft drinks in the youngest group.
# Create household size index
size_index <- demo_purc %>%
filter(!is.na(household_size)) %>%
group_by(household_size, product_category) %>%
summarize(total_revenue = sum(sales_value), .groups = "drop") %>%
group_by(product_category) %>%
mutate(category_share = total_revenue / sum(total_revenue)) %>%
ungroup() %>%
left_join(
demo_purc %>%
filter(!is.na(household_size)) %>%
group_by(household_size) %>%
summarize(group_spend = sum(sales_value), .groups = "drop") %>%
mutate(overall_share = group_spend / sum(group_spend)),
by = "household_size"
) %>%
mutate(index = (category_share / overall_share) * 100)
# Plot household index to compare spending habits
size_index %>%
filter(product_category %in% top_categories) %>%
ggplot(aes(x = product_category, y = index, fill = household_size)) +
geom_col(position = "dodge") +
geom_hline(yintercept = 100, linetype = "dashed", color = "black") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
labs(title = "Purchase Index by Household Size and Product Category",
x = "Product Category", y = "Index (100 = avg)", fill = "Household Size")
ggsave("household_size_bar.png", width = 10, height = 6, dpi = 300)
Household size shows large households are the biggest spenders when it comes to crackers, bread, and fluid milk products. On the other side, one person households spend more on eggs, soup, and soft drinks.
# Create marital status index
marital_index <- demo_purc %>%
filter(!is.na(marital_status)) %>%
mutate(marital_status = case_when(
marital_status == "Married" ~ "Married",
TRUE ~ "Unmarried"
)) %>%
group_by(marital_status, product_category) %>%
summarize(total_revenue = sum(sales_value), .groups = "drop") %>%
group_by(product_category) %>%
mutate(category_share = total_revenue / sum(total_revenue)) %>%
ungroup() %>%
left_join(
demo_purc %>%
filter(!is.na(marital_status)) %>%
mutate(marital_status = case_when(
marital_status == "Married" ~ "Married",
TRUE ~ "Unmarried"
)) %>%
group_by(marital_status) %>%
summarize(group_spend = sum(sales_value), .groups = "drop") %>%
mutate(overall_share = group_spend / sum(group_spend)),
by = "marital_status"
) %>%
mutate(
index = (category_share / overall_share) * 100,
deviation = index - 100
)
# Plot marital status index to compare spending habits
marital_index %>%
filter(product_category %in% top_categories) %>%
ggplot(aes(x = reorder(product_category, deviation), y = deviation, fill = marital_status)) +
geom_col(position = "dodge") +
geom_hline(yintercept = 0, linetype = "dashed") +
coord_flip() +
scale_fill_manual(values = c("Married" = "blue", "Unmarried" = "red")) +
labs(title = "Purchase Index Deviation by Marital Status and Product Category",
x = "Product Category", y = "Deviation from Average (0 = avg)", fill = "Marital Status")
marital_table <- marital_index %>%
filter(product_category %in% top_categories) %>%
select(marital_status, product_category, index) %>%
mutate(index = round(index, 1)) %>%
pivot_wider(names_from = marital_status, values_from = index) %>%
arrange(desc(Married)) %>%
rename("Product Category" = product_category)
ggsave("marital_status_bar.png", width = 10, height = 6, dpi = 300)
The marital status chart shows us that married people spend more on average on crackers and bread and unmarried spend more on soft drinks, beef, and eggs.
# Create combined index of age and income
combined_index <- demo_purc %>%
filter(!is.na(income), !is.na(age)) %>%
group_by(income, age, product_category) %>%
summarize(total_revenue = sum(sales_value), .groups = "drop") %>%
group_by(product_category) %>%
mutate(category_share = total_revenue / sum(total_revenue)) %>%
ungroup() %>%
left_join(
demo_purc %>%
filter(!is.na(income), !is.na(age)) %>%
group_by(income, age) %>%
summarize(group_spend = sum(sales_value), .groups = "drop") %>%
mutate(overall_share = group_spend / sum(group_spend)),
by = c("income", "age")
) %>%
mutate(index = (category_share / overall_share) * 100)
# Chart was too busy with 10 categories, so we will reduce to top 5
top_5_categories <- category_demo_summary %>%
group_by(product_category) %>%
summarize(total = sum(total_revenue)) %>%
slice_max(total, n = 5) %>%
pull(product_category)
# Plot combined index
combined_index %>%
filter(product_category %in% top_5_categories) %>%
ggplot(aes(x = income, y = index, fill = age)) +
geom_col(position = "dodge") +
geom_hline(yintercept = 100, linetype = "dashed", color = "black") +
facet_wrap(~ product_category, ncol = 2) +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
labs(title = "Purchase Index by Income and Age Group",
x = "Income", y = "Index (100 = avg)", fill = "Age Group")
ggsave("age_income_combined.png", width = 10, height = 6, dpi = 300)
age_income_table <- combined_index %>%
filter(product_category %in% top_5_categories) %>%
mutate(
index = round(index, 1),
age_income = paste(age, "|", income)
) %>%
select(product_category, age_income, index) %>%
pivot_wider(names_from = product_category, values_from = index) %>%
rename("Age | Income" = age_income)
kable(age_income_table)
| Age | Income | BEEF | CHEESE | COUPON/MISC ITEMS | FLUID MILK PRODUCTS | SOFT DRINKS |
|---|---|---|---|---|---|
| 19-24 | Under 15K | 147.8 | 168.2 | 47.3 | 125.5 | 154.0 |
| 25-34 | Under 15K | 201.6 | 100.4 | 84.4 | 78.3 | 157.1 |
| 35-44 | Under 15K | 140.8 | 98.3 | 94.1 | 80.9 | 120.1 |
| 45-54 | Under 15K | 108.6 | 118.1 | 105.5 | 92.5 | 129.9 |
| 55-64 | Under 15K | 114.2 | 96.6 | 106.4 | 56.2 | 74.7 |
| 65+ | Under 15K | 144.9 | 49.8 | 66.6 | 62.6 | 109.0 |
| 19-24 | 15-24K | 118.2 | 77.9 | 12.6 | 61.1 | 209.4 |
| 25-34 | 15-24K | 113.8 | 124.7 | 63.9 | 87.5 | 98.4 |
| 35-44 | 15-24K | 115.5 | 80.9 | 98.8 | 105.2 | 145.4 |
| 45-54 | 15-24K | 136.7 | 79.5 | 84.4 | 102.8 | 222.3 |
| 55-64 | 15-24K | 127.1 | 115.3 | 28.4 | 85.9 | 110.6 |
| 65+ | 15-24K | 69.0 | 61.6 | 51.3 | 129.4 | 68.1 |
| 19-24 | 25-34K | 131.6 | 114.2 | 68.9 | 74.9 | 156.3 |
| 25-34 | 25-34K | 115.6 | 119.1 | 45.7 | 115.4 | 92.1 |
| 35-44 | 25-34K | 137.3 | 94.4 | 99.0 | 86.8 | 113.3 |
| 45-54 | 25-34K | 121.3 | 112.2 | 92.4 | 115.4 | 91.6 |
| 55-64 | 25-34K | 80.4 | 112.0 | 57.0 | 87.8 | 144.8 |
| 65+ | 25-34K | 166.0 | 79.3 | 78.2 | 78.3 | 118.6 |
| 19-24 | 35-49K | 71.9 | 119.9 | 119.3 | 97.1 | 93.8 |
| 25-34 | 35-49K | 89.6 | 95.7 | 70.8 | 110.5 | 88.2 |
| 35-44 | 35-49K | 120.9 | 117.0 | 111.6 | 92.9 | 85.9 |
| 45-54 | 35-49K | 110.3 | 107.0 | 81.9 | 125.7 | 103.8 |
| 55-64 | 35-49K | 164.7 | 102.1 | 86.4 | 105.8 | 99.5 |
| 65+ | 35-49K | 80.3 | 70.2 | 75.3 | 103.4 | 127.5 |
| 19-24 | 50-74K | 70.0 | 110.8 | 48.6 | 71.9 | 122.3 |
| 25-34 | 50-74K | 84.9 | 130.0 | 124.4 | 107.8 | 84.2 |
| 35-44 | 50-74K | 93.8 | 90.9 | 128.8 | 94.3 | 103.3 |
| 45-54 | 50-74K | 101.5 | 103.8 | 96.2 | 102.3 | 90.1 |
| 55-64 | 50-74K | 55.3 | 74.5 | 110.9 | 95.6 | 90.7 |
| 65+ | 50-74K | 101.6 | 83.0 | 65.2 | 82.0 | 33.2 |
| 19-24 | 75-99K | 137.2 | 168.5 | 12.2 | 53.8 | 273.8 |
| 25-34 | 75-99K | 56.3 | 101.5 | 162.1 | 126.9 | 99.0 |
| 35-44 | 75-99K | 89.6 | 97.7 | 91.2 | 93.6 | 97.9 |
| 45-54 | 75-99K | 93.8 | 95.1 | 102.2 | 107.5 | 115.1 |
| 55-64 | 75-99K | 122.0 | 130.7 | 101.7 | 66.5 | 77.2 |
| 65+ | 75-99K | 59.7 | 88.5 | 157.2 | 141.7 | 71.2 |
| 25-34 | 100-124K | 97.9 | 100.1 | 164.9 | 100.9 | 63.6 |
| 35-44 | 100-124K | 41.1 | 90.5 | 81.7 | 107.2 | 113.4 |
| 45-54 | 100-124K | 111.4 | 80.2 | 181.9 | 125.4 | 107.3 |
| 55-64 | 100-124K | 101.3 | 60.0 | 130.8 | 93.3 | 94.6 |
| 65+ | 100-124K | 61.2 | 63.0 | 150.0 | 90.4 | 91.9 |
| 25-34 | 125-149K | 40.9 | 108.7 | 103.8 | 135.4 | 58.0 |
| 35-44 | 125-149K | 73.5 | 102.4 | 97.0 | 96.1 | 59.0 |
| 45-54 | 125-149K | 62.1 | 95.1 | 158.5 | 90.4 | 72.5 |
| 55-64 | 125-149K | 45.5 | 177.3 | 201.6 | 130.8 | 95.6 |
| 65+ | 125-149K | 96.0 | 95.8 | 110.7 | 109.6 | 138.2 |
| 25-34 | 150-174K | 56.8 | 119.2 | 105.6 | 80.8 | 76.8 |
| 35-44 | 150-174K | 87.3 | 95.7 | 129.4 | 106.0 | 103.0 |
| 45-54 | 150-174K | 113.5 | 66.3 | 58.0 | 81.6 | 149.6 |
| 55-64 | 150-174K | 81.2 | 54.3 | 186.2 | 50.0 | 37.2 |
| 65+ | 150-174K | 93.4 | 49.8 | NA | 194.6 | 46.2 |
| 25-34 | 175-199K | 2.2 | 43.1 | 114.1 | 7.4 | 72.1 |
| 35-44 | 175-199K | 32.6 | 122.6 | 176.9 | 174.7 | 40.3 |
| 45-54 | 175-199K | 79.2 | 74.2 | 83.8 | 68.7 | 20.1 |
| 55-64 | 175-199K | 410.5 | 66.4 | 1.4 | 65.9 | 61.2 |
| 19-24 | 200-249K | 5.2 | 158.9 | 142.9 | 20.8 | 66.4 |
| 25-34 | 200-249K | 45.8 | 234.6 | 4.1 | 49.8 | 61.1 |
| 35-44 | 200-249K | 134.3 | 62.1 | 12.8 | 42.3 | 119.4 |
| 45-54 | 200-249K | 43.7 | 38.3 | NA | 102.0 | 47.8 |
| 19-24 | 250K+ | 22.7 | 97.0 | 90.6 | 36.7 | 97.9 |
| 35-44 | 250K+ | 98.3 | 67.3 | 77.1 | 93.6 | 66.3 |
| 45-54 | 250K+ | 96.6 | 84.4 | 22.9 | 129.4 | 58.8 |
This combined chart shows that 55-64 upper income individuals spend significantly more on beef. Younger, high income individuals spend more on cheese. It also shows potential for growth for high income individuals with soft drinks and fluid milk products.
# Create combined index of income and household size
combined_income_size <- demo_purc %>%
filter(!is.na(income), !is.na(household_size)) %>%
mutate(income_group = case_when(
income %in% c("Under 15K", "15-24K", "25-34K") ~ "Low (Under 35K)",
income %in% c("35-49K", "50-74K") ~ "Middle (35-74K)",
income %in% c("75-99K", "100-124K") ~ "Upper Middle (75-124K)",
income %in% c("125-149K", "150-174K", "175-199K", "200K+") ~ "High (125K+)",
TRUE ~ NA_character_
)) %>%
filter(!is.na(income_group)) %>%
group_by(income_group, household_size, product_category) %>%
summarize(total_revenue = sum(sales_value),
total_transactions = n(),
.groups = "drop") %>%
filter(total_transactions >= 30) %>%
group_by(product_category) %>%
mutate(category_share = total_revenue / sum(total_revenue)) %>%
ungroup() %>%
left_join(
demo_purc %>%
filter(!is.na(income), !is.na(household_size)) %>%
mutate(income_group = case_when(
income %in% c("Under 15K", "15-24K", "25-34K") ~ "Low (Under 35K)",
income %in% c("35-49K", "50-74K") ~ "Middle (35-74K)",
income %in% c("75-99K", "100-124K") ~ "Upper Middle (75-124K)",
income %in% c("125-149K", "150-174K", "175-199K", "200K+") ~ "High (125K+)",
TRUE ~ NA_character_
)) %>%
filter(!is.na(income_group)) %>%
group_by(income_group, household_size) %>%
summarize(group_spend = sum(sales_value), .groups = "drop") %>%
mutate(overall_share = group_spend / sum(group_spend)),
by = c("income_group", "household_size")
) %>%
mutate(
index = (category_share / overall_share) * 100,
income_group = factor(income_group, levels = c("Low (Under 35K)", "Middle (35-74K)",
"Upper Middle (75-124K)", "High (125K+)"))
)
# Plot combined index
combined_income_size %>%
filter(product_category %in% top_5_categories) %>%
ggplot(aes(x = index, y = income_group, color = household_size)) +
geom_point(size = 4, alpha = 0.8) +
geom_vline(xintercept = 100, linetype = "dashed", color = "black") +
facet_wrap(~ product_category, ncol = 2) +
labs(title = "Purchase Index by Income Group and Household Size",
x = "Index (100 = avg)", y = "Income Group", color = "Household Size")
income_size_table <- combined_income_size %>%
filter(product_category %in% top_5_categories) %>%
mutate(index = round(index, 1)) %>%
select(income_group, household_size, product_category, index) %>%
pivot_wider(names_from = product_category, values_from = index) %>%
arrange(income_group, household_size) %>%
rename("Income Group" = income_group, "Household Size" = household_size)
kable(income_size_table)
| Income Group | Household Size | BEEF | CHEESE | COUPON/MISC ITEMS | FLUID MILK PRODUCTS | SOFT DRINKS |
|---|---|---|---|---|---|---|
| Low (Under 35K) | 1 | 121.8 | 113.4 | 85.6 | 93.3 | 108.2 |
| Low (Under 35K) | 2 | 129.7 | 85.6 | 85.7 | 95.7 | 131.3 |
| Low (Under 35K) | 3 | 141.9 | 110.9 | 51.6 | 109.0 | 114.6 |
| Low (Under 35K) | 4 | 128.4 | 122.2 | 51.5 | 104.5 | 259.4 |
| Low (Under 35K) | 5+ | 87.2 | 90.2 | 114.6 | 101.0 | 119.4 |
| Middle (35-74K) | 1 | 108.4 | 106.0 | 89.2 | 105.0 | 103.1 |
| Middle (35-74K) | 2 | 95.9 | 101.2 | 103.9 | 96.6 | 89.9 |
| Middle (35-74K) | 3 | 85.1 | 100.9 | 84.8 | 95.7 | 82.6 |
| Middle (35-74K) | 4 | 97.2 | 101.4 | 140.0 | 114.0 | 83.8 |
| Middle (35-74K) | 5+ | 100.7 | 114.8 | 80.7 | 117.3 | 84.8 |
| Upper Middle (75-124K) | 1 | 58.2 | 89.9 | 136.5 | 103.3 | 126.7 |
| Upper Middle (75-124K) | 2 | 98.1 | 92.9 | 109.8 | 106.2 | 92.8 |
| Upper Middle (75-124K) | 3 | 85.1 | 104.3 | 101.4 | 95.0 | 126.7 |
| Upper Middle (75-124K) | 4 | 63.3 | 93.2 | 154.8 | 121.9 | 90.5 |
| Upper Middle (75-124K) | 5+ | 139.7 | 101.7 | 89.3 | 103.5 | 60.6 |
| High (125K+) | 1 | 66.5 | 60.6 | 108.8 | 73.6 | 89.9 |
| High (125K+) | 2 | 102.3 | 87.0 | 98.2 | 76.1 | 69.4 |
| High (125K+) | 3 | 64.7 | 110.3 | 81.6 | 100.6 | 103.0 |
| High (125K+) | 4 | 48.2 | 103.2 | 132.8 | 120.2 | 37.2 |
| High (125K+) | 5+ | 54.8 | 103.3 | 176.6 | 113.8 | 69.4 |
ggsave("income_size_dotplot.png", width = 10, height = 6, dpi = 300)
High income households with 1-2 people spend less on cheese. 4 person households are complete opposites for soft drink spending, so there is room for growth in high income to spend more on soft drinks.
This analysis uncovered areas to increase revenue through targeted marketing based on existing spending habits and where there is over/under-indexing for different demographics. These areas include soft drinks and fluid milk among high-income customers as well as beef for 55-64 year olds, and cheese for young, high-income customers.
I recommend a premium soft drink marketing campaign to reach the high-income demographic based on their current low index. This can include sparkling water and premium sodas. I also recommend organic or locally sourced milk promotions for high-income customers. For upper-income, 55-64 year olds, there should be a focus on premium beef/quality cuts since they spend significantly more in that category. To target the young, high-income individuals, there should be cheese promotions focused on quality cheese and pairing guides. To reach single person households in the bag snacks and cheese categories, there should be a promotion for convenient, single serve options. For large households of 4+ people, soft drinks should be the foucs with promotions on large quantity value packs to grow sales from high-income households.
Some limitations in this analysis include the data being older and not covering a long time period. Using an index could shows relative comparison, but does not show full purchasing power. Price sensitivity is also a variable not considered in this analysis.