The goal of this analysis is to identify key demographic groups among
retail shoppers and understand their shopping behavior. We use the
completejourney package to analyze household transactions and
demographics, treating coupon discounts as the indicator of promotion
use. Our primary focus is on which demographic groups shop most
frequently and how they respond to promotions.
## Packages Required
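# Install completejourney only if it is not already available
if (!requireNamespace("completejourney", quietly = TRUE)) {
  install.packages("completejourney", repos = "https://cran.rstudio.com/")
}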
library(completejourney)
library(dplyr)
library(ggplot2)
transactions <- get_transactions()
demographics <- completejourney::demographics
head(transactions)
## # A tibble: 6 × 11
## household_id store_id basket_id product_id quantity sales_value retail_disc
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 900 330 31198570044 1095275 1 0.5 0
## 2 900 330 31198570047 9878513 1 0.99 0.1
## 3 1228 406 31198655051 1041453 1 1.43 0.15
## 4 906 319 31198705046 1020156 1 1.5 0.29
## 5 906 319 31198705046 1053875 2 2.78 0.8
## 6 906 319 31198705046 1060312 1 5.49 0.5
## # ℹ 4 more variables: coupon_disc <dbl>, coupon_match_disc <dbl>, week <int>,
## # transaction_timestamp <dttm>
head(demographics)
## # A tibble: 6 × 8
## household_id age income home_ownership marital_status household_size
## <chr> <ord> <ord> <ord> <ord> <ord>
## 1 1 65+ 35-49K Homeowner Married 2
## 2 1001 45-54 50-74K Homeowner Unmarried 1
## 3 1003 35-44 25-34K <NA> Unmarried 1
## 4 1004 25-34 15-24K <NA> Unmarried 1
## 5 101 45-54 Under 15K Homeowner Married 4
## 6 1012 35-44 35-49K <NA> Married 5+
## # ℹ 2 more variables: household_comp <ord>, kids_count <ord>
colnames(demographics)
## [1] "household_id" "age" "income" "home_ownership"
## [5] "marital_status" "household_size" "household_comp" "kids_count"
# Drop rows with missing sales values, then remove exact duplicate rows
transactions <- transactions %>%
  filter(!is.na(sales_value)) %>%
  distinct()
demographics <- demographics %>% distinct()
summary(transactions)
## household_id store_id basket_id product_id
## Length:1469307 Length:1469307 Length:1469307 Length:1469307
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## quantity sales_value retail_disc coupon_disc
## Min. : 0.0 Min. : 0.000 Min. : 0.0000 Min. : 0.00000
## 1st Qu.: 1.0 1st Qu.: 1.290 1st Qu.: 0.0000 1st Qu.: 0.00000
## Median : 1.0 Median : 2.000 Median : 0.0100 Median : 0.00000
## Mean : 104.1 Mean : 3.128 Mean : 0.5388 Mean : 0.01794
## 3rd Qu.: 1.0 3rd Qu.: 3.490 3rd Qu.: 0.6800 3rd Qu.: 0.00000
## Max. :89638.0 Max. :840.000 Max. :130.0200 Max. :55.93000
## coupon_match_disc week transaction_timestamp
## Min. :0.000000 Min. : 1.00 Min. :2017-01-01 06:53:26.00
## 1st Qu.:0.000000 1st Qu.:14.00 1st Qu.:2017-04-01 19:31:00.00
## Median :0.000000 Median :27.00 Median :2017-07-02 11:17:58.00
## Mean :0.003092 Mean :27.37 Mean :2017-07-02 10:58:07.54
## 3rd Qu.:0.000000 3rd Qu.:41.00 3rd Qu.:2017-10-02 12:28:34.00
## Max. :7.700000 Max. :53.00 Max. :2017-12-31 23:01:20.00
summary(demographics)
## household_id age income home_ownership
## Length:801 19-24: 46 50-74K :192 Renter : 42
## Class :character 25-34:142 35-49K :172 Probable Renter : 11
## Mode :character 35-44:194 75-99K : 96 Homeowner :504
## 45-54:288 25-34K : 77 Probable Homeowner: 11
## 55-64: 59 15-24K : 74 Unknown : 0
## 65+ : 72 Under 15K: 61 NA's :233
## (Other) :129
## marital_status household_size household_comp kids_count
## Married :340 1 :255 1 Adult Kids : 93 0 :513
## Unmarried:324 2 :318 1 Adult No Kids :255 1 :159
## Unknown : 0 3 :109 2 Adults Kids :195 2 : 60
## NA's :137 4 : 53 2 Adults No Kids:258 3+ : 69
## 5+: 66 Unknown : 0 Unknown: 0
##
##
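The transaction summary above reports a median `quantity` of 1 but a mean of 104.1 and a maximum of 89,638, so a small number of extreme rows dominate any average of `quantity`. The sketch below shows one way to inspect such rows before averaging. It runs on a made-up toy tibble, and the cutoff of 100 units is an arbitrary assumption rather than a value taken from the data.

```r
library(dplyr)

# Toy stand-in for `transactions`; all values are invented for illustration.
toy_tx <- tibble(
  product_id  = c("A", "B", "C", "D", "E"),
  quantity    = c(1, 1, 2, 1, 5000),
  sales_value = c(0.99, 1.50, 2.78, 5.49, 20.00)
)

# Mean and median diverge sharply when a single extreme row is present.
toy_summary <- toy_tx %>%
  summarize(mean_qty = mean(quantity), median_qty = median(quantity))

# Flag rows above a hypothetical cutoff (100 units) for manual inspection.
toy_outliers <- toy_tx %>%
  filter(quantity > 100)
```

On the real data, the same `filter()` call can be applied to `transactions` to see which products account for the largest quantities.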
ggplot(transactions, aes(x = sales_value)) +
  geom_histogram(binwidth = 0.5, fill = "skyblue", color = "black", alpha = 0.7) +
  coord_cartesian(xlim = c(0, 100)) + # Zoom to the lower range without dropping rows or edge bins
  labs(title = "Distribution of Sales Values",
       subtitle = "Focusing on the lower range of retail transaction sales values",
       x = "Sales Value",
       y = "Frequency") +
  theme_minimal(base_size = 14) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5),
    axis.title = element_text(face = "bold"),
    panel.grid.major = element_line(color = "gray80"),
    panel.grid.minor = element_blank()
  )
merged_data <- transactions %>%
  inner_join(demographics, by = "household_id")
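The `demographics` table covers 801 households, so `inner_join()` drops every transaction from a household that has no demographic record. The sketch below, on toy data with invented values, shows how `anti_join()` could be used to measure how much is dropped. The names `toy_tx`, `toy_demo`, and `share_kept` are illustrative only.

```r
library(dplyr)

# Toy stand-ins for `transactions` and `demographics`.
toy_tx <- tibble(
  household_id = c("1", "1", "2", "3"),
  sales_value  = c(2.00, 3.50, 1.25, 4.00)
)
toy_demo <- tibble(
  household_id = c("1", "3"),
  age          = c("65+", "35-44")
)

# Rows kept by the inner join (households 1 and 3 only).
toy_merged <- inner_join(toy_tx, toy_demo, by = "household_id")

# Rows the inner join drops (household 2 has no demographic record).
toy_dropped <- anti_join(toy_tx, toy_demo, by = "household_id")

# Share of transaction rows that survive the join.
share_kept <- nrow(toy_merged) / nrow(toy_tx)
```

Running the same `anti_join()` on the real tables would show how representative `merged_data` is of all shoppers.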
age_spending <- merged_data %>%
  group_by(age) %>%
  summarize(average_spending = mean(sales_value))
ggplot(age_spending, aes(x = age, y = average_spending)) +
  geom_bar(stat = "identity", fill = "skyblue") +
  labs(title = "Average Spending by Age",
       x = "Age",
       y = "Average Spending")
income_spending <- merged_data %>%
  group_by(income) %>%
  summarize(average_spending = mean(sales_value))
ggplot(income_spending, aes(x = income, y = average_spending)) +
  geom_bar(stat = "identity", fill = "lightcoral") +
  labs(title = "Average Spending by Income Level",
       x = "Income Level",
       y = "Average Spending")
household_size_spending <- merged_data %>%
  group_by(household_size) %>%
  summarize(average_spending = mean(sales_value))
ggplot(household_size_spending, aes(x = household_size, y = average_spending)) +
  geom_bar(stat = "identity", fill = "lightgreen") +
  labs(title = "Average Spending by Household Size",
       x = "Household Size",
       y = "Average Spending")
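Each row of `transactions` is a single product line, and one `basket_id` can span several rows, as the `head(transactions)` output shows. `mean(sales_value)` therefore averages line values rather than basket totals. The following sketch, on toy data with invented group labels, contrasts the two. Summing per basket first is an alternative offered here for comparison, not the method used in the chunks above.

```r
library(dplyr)

# Toy stand-in for `merged_data`; ages are assigned arbitrarily.
toy_merged <- tibble(
  household_id = c("900", "900", "906", "906", "906"),
  basket_id    = c("b1", "b2", "b3", "b3", "b3"),
  age          = c("25-34", "25-34", "45-54", "45-54", "45-54"),
  sales_value  = c(0.50, 0.99, 1.50, 2.78, 5.49)
)

# Average value of a single product line, as in the chunks above.
per_line <- toy_merged %>%
  group_by(age) %>%
  summarize(average_spending = mean(sales_value))

# Average basket total: sum each basket first, then average across baskets.
per_basket <- toy_merged %>%
  group_by(age, basket_id) %>%
  summarize(basket_total = sum(sales_value), .groups = "drop") %>%
  group_by(age) %>%
  summarize(average_basket_total = mean(basket_total))
```

The two measures agree when every basket holds one line and diverge as baskets grow, so the choice affects how "spending per transaction" should be read.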
purchase_frequency_age <- merged_data %>%
  group_by(age) %>%
  summarize(frequency = n())
ggplot(purchase_frequency_age, aes(x = age, y = frequency)) +
  geom_bar(stat = "identity", fill = "dodgerblue") +
  labs(title = "Purchase Frequency by Age",
       x = "Age",
       y = "Frequency") +
  theme_minimal(base_size = 14) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    axis.title = element_text(face = "bold"),
    panel.grid.major = element_line(color = "gray80"),
    panel.grid.minor = element_blank()
  )
purchase_frequency_income <- merged_data %>%
  group_by(income) %>%
  summarize(frequency = n())
ggplot(purchase_frequency_income, aes(x = income, y = frequency)) +
  geom_bar(stat = "identity", fill = "darkorange") +
  labs(title = "Purchase Frequency by Income Level",
       x = "Income Level",
       y = "Frequency") +
  theme_minimal(base_size = 14) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    axis.title = element_text(face = "bold"),
    panel.grid.major = element_line(color = "gray80"),
    panel.grid.minor = element_blank()
  )
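Counting rows with `n()` favors groups that simply contain more households. The demographics summary lists 288 households aged 45-54 but only 46 aged 19-24. A per-household rate makes groups of different sizes comparable. The sketch below uses toy data and assumes that a distinct `basket_id` corresponds to one shopping trip.

```r
library(dplyr)

# Toy stand-in for `merged_data`; values are invented for illustration.
toy_merged <- tibble(
  household_id = c("h1", "h1", "h1", "h2", "h3", "h3"),
  basket_id    = c("b1", "b1", "b2", "b3", "b4", "b5"),
  age          = c("19-24", "19-24", "19-24", "45-54", "45-54", "45-54")
)

trips_per_household <- toy_merged %>%
  group_by(age) %>%
  summarize(
    households = n_distinct(household_id), # group size
    trips      = n_distinct(basket_id),    # assumed: one basket = one trip
    rows       = n()                       # raw row count, as plotted above
  ) %>%
  mutate(trips_per_household = trips / households)
```

In this toy example both age groups have the same raw row count, yet the trips-per-household rate differs, which is the distinction the normalization is meant to expose.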
promotion_response <- merged_data %>%
  mutate(promotion_used = ifelse(coupon_disc > 0, "Yes", "No")) %>%
  group_by(age, promotion_used) %>%
  summarize(count = n(), .groups = "drop_last") %>% # keep grouping by age
  mutate(percentage = count / sum(count) * 100) %>% # share within each age group
  ungroup()
ggplot(promotion_response, aes(x = age, y = percentage, fill = promotion_used)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Promotion Response Rate by Age",
       x = "Age",
       y = "Percentage",
       fill = "Promotion Used")
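The chunk above counts a promotion as used only when `coupon_disc > 0`, and the transaction summary shows that `coupon_disc` is zero up to the third quartile, so "Yes" rows are rare. `retail_disc` is nonzero far more often (its median is 0.01). A sketch of reporting both rates side by side follows. Treating `retail_disc` as a second kind of promotion is an assumption made here for illustration, and the data are invented.

```r
library(dplyr)

# Toy stand-in for `merged_data`; values are invented for illustration.
toy_merged <- tibble(
  age         = c("25-34", "25-34", "25-34", "25-34", "65+", "65+"),
  retail_disc = c(0, 0.10, 0.15, 0, 0.29, 0),
  coupon_disc = c(0, 0, 0.50, 0, 0, 0)
)

# Percentage of rows in each age group carrying each type of discount.
discount_rates <- toy_merged %>%
  group_by(age) %>%
  summarize(
    coupon_rate = mean(coupon_disc > 0) * 100,
    retail_rate = mean(retail_disc > 0) * 100
  )
```

Taking the mean of a logical vector gives the share of `TRUE` values directly, which avoids the separate count and percentage steps.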
basket_size <- merged_data %>%
  group_by(income, basket_id) %>%
  summarize(items = sum(quantity, na.rm = TRUE), .groups = "drop") %>% # items per basket
  group_by(income) %>%
  summarize(average_basket_size = mean(items)) # average across baskets
ggplot(basket_size, aes(x = income, y = average_basket_size)) +
  geom_bar(stat = "identity", fill = "lightblue") +
  labs(title = "Average Basket Size by Income Level",
       x = "Income Level",
       y = "Average Basket Size") +
  theme_minimal(base_size = 14) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    axis.title = element_text(face = "bold"),
    panel.grid.major = element_line(color = "gray80"),
    panel.grid.minor = element_blank()
  )
The analysis suggests that demographic groups differ in their shopping behavior. Average spending varies noticeably across age groups, with younger shoppers tending to spend more per transaction. Spending patterns also differ between higher-income and lower-income households, and households with more members tend to have higher average spending.
Promotion response rates vary across different age groups, indicating that some age groups are more inclined to use promotions. The average basket size also varies by income level, suggesting that higher-income shoppers tend to purchase more items per transaction.
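The comparisons above rest on group averages read from bar charts. Whether the differences exceed what chance alone would produce needs a formal test. The sketch below applies a Kruskal-Wallis test to per-household totals. Both the choice of test and the toy values are assumptions for illustration, not part of the original analysis.

```r
library(dplyr)

# Toy per-household totals; values are invented for illustration only.
toy_households <- tibble(
  age = factor(rep(c("19-24", "45-54", "65+"), each = 4)),
  total_spend = c(120, 135, 110, 140,
                  300, 280, 310, 295,
                  200, 190, 210, 205)
)

# Rank-based test of whether spending differs across age groups.
kw <- kruskal.test(total_spend ~ age, data = toy_households)
kw$p.value
```

On the real data, the per-household totals could be built by summing `sales_value` within `household_id` before testing, so that each household contributes one observation.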
Targeted Promotions: Develop targeted marketing campaigns for higher-income households to leverage their distinct spending patterns. Focus on age groups that are more responsive to promotions to maximize the effectiveness of promotional campaigns.
Age-Specific Marketing: Create age-specific marketing strategies to cater to the spending habits of different age groups. Younger shoppers tend to spend more per transaction, so offering them special deals and promotions might increase their shopping frequency.
Enhanced Loyalty Programs: Implement loyalty programs that offer personalized rewards based on demographic insights to drive customer engagement and retention. Focus on households with more members by offering family-oriented promotions and rewards, as they tend to spend more.
Inventory and Stock Management: Plan inventory based on the average basket size by income level to ensure that popular items are always in stock. Monitor purchase patterns over time to identify seasonal trends and adjust inventory accordingly.
Promotion Strategies: Design promotions that appeal to specific age groups that are more likely to use them. Consider offering bundled products or discounts on frequently purchased items to increase the average basket size.