This report analyzes survey data collected from 22 grocery store customers. Respondents were asked to rate various aspects of their shopping experience on Likert-type scales, and demographic information (gender, age group, and education level) was also recorded.This experience was very easy and straightforward. This grocery store customer segmentation dataset, though modest at just 22 respondents, offers a useful snapshot of shopper attitudes across several dimensions of the retail experience. The survey captures both service quality perceptions — staff helpfulness, professionalism, and ease of finding items — and preferences around modern shopping channels like online ordering, delivery, and pick-up. Core service measures trend positive overall, while pick-up responses are notably more mixed, hinting at an operational gap worth addressing. Demographically, the sample skews younger and female, which limits generalizability, but the data still reveals a telling pattern: most customers cluster in a “moderate satisfaction” band, representing a prime opportunity to convert lukewarm shoppers into loyal advocates. Future iterations would benefit from a larger, more diverse sample and open-ended questions to better understand the specific friction points driving that middle-ground sentiment.
Research questions explored:
# Install any missing packages before loading
required_packages <- c("tidyverse", "knitr", "kableExtra", "ggcorrplot",
"scales", "patchwork", "psych")
installed <- rownames(installed.packages())
to_install <- required_packages[!(required_packages %in% installed)]
if (length(to_install) > 0) install.packages(to_install, repos = "https://cloud.r-project.org")
library(tidyverse)
library(knitr)
library(kableExtra)
library(ggcorrplot)
library(scales)
library(patchwork)
library(psych)df_raw <- read.csv("customer_segmentation.csv", stringsAsFactors = FALSE)
# Trim whitespace from column names (one column has a trailing space)
names(df_raw) <- trimws(names(df_raw))
glimpse(df_raw)## Rows: 22
## Columns: 15
## $ ID <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, …
## $ CS_helpful <int> 2, 1, 2, 3, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 3…
## $ Recommend <int> 2, 2, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ Come_again <int> 2, 1, 1, 2, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 2…
## $ All_Products <int> 2, 1, 1, 4, 5, 2, 2, 2, 2, 1, 2, 2, 1, 2, 4, 2, 2, 1, 3…
## $ Profesionalism <int> 2, 1, 1, 1, 2, 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2…
## $ Limitation <int> 2, 1, 2, 2, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 3, 4…
## $ Online_grocery <int> 2, 2, 3, 3, 2, 1, 2, 1, 2, 3, 2, 3, 1, 3, 2, 3, 2, 3, 1…
## $ delivery <int> 3, 3, 3, 3, 3, 2, 2, 1, 1, 2, 2, 2, 2, 3, 2, 1, 3, 3, 3…
## $ Pick_up <int> 4, 3, 2, 2, 1, 1, 2, 2, 3, 2, 2, 3, 2, 3, 2, 3, 5, 3, 1…
## $ Find_items <int> 1, 1, 1, 2, 2, 1, 1, 2, 1, 1, 1, 1, 1, 3, 2, 1, 2, 1, 3…
## $ other_shops <int> 2, 2, 3, 2, 3, 4, 1, 4, 1, 1, 3, 3, 1, 1, 5, 5, 5, 2, 2…
## $ Gender <int> 1, 1, 1, 1, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 2…
## $ Age <int> 2, 2, 2, 3, 4, 2, 2, 2, 2, 2, 4, 3, 4, 3, 2, 3, 2, 2, 2…
## $ Education <int> 2, 2, 2, 5, 2, 5, 3, 2, 1, 2, 5, 1, 5, 5, 5, 5, 1, 5, 2…
All Likert variables use numeric codes. We decode them into human-readable labels and convert demographic variables to ordered factors.
# ── Likert scale labels ──────────────────────────────────────────────────
likert3 <- c("1" = "Agree", "2" = "Neutral", "3" = "Disagree")
likert5 <- c("1" = "Strongly Agree", "2" = "Agree", "3" = "Neutral",
"4" = "Disagree", "5" = "Strongly Disagree")
# ── Demographic labels ───────────────────────────────────────────────────
gender_labels <- c("1" = "Male", "2" = "Female")
age_labels <- c("2" = "18–24", "3" = "25–34", "4" = "35–44")
education_labels <- c("1" = "Some High School", "2" = "High School Diploma",
"3" = "Some College", "5" = "Bachelor's or Higher")
df <- df_raw %>%
mutate(
# Service satisfaction items (3-point)
CS_helpful_f = factor(CS_helpful, levels = 1:3, labels = likert3),
Recommend_f = factor(Recommend, levels = 1:3, labels = likert3),
Come_again_f = factor(Come_again, levels = 1:3, labels = likert3),
Profesionalism_f = factor(Profesionalism, levels = 1:3, labels = likert3),
Online_grocery_f = factor(Online_grocery, levels = 1:3, labels = likert3),
delivery_f = factor(delivery, levels = 1:3, labels = likert3),
Find_items_f = factor(Find_items, levels = 1:3, labels = likert3),
# 5-point items
All_Products_f = factor(All_Products, levels = 1:5, labels = likert5),
Limitation_f = factor(Limitation, levels = 1:4,
labels = c("Strongly Agree","Agree","Neutral","Disagree")),
Pick_up_f = factor(Pick_up, levels = 1:5, labels = likert5),
other_shops_f = factor(other_shops, levels = 1:5, labels = likert5),
# Demographics
Gender_f = factor(Gender, levels = 1:2, labels = gender_labels),
Age_f = factor(Age, levels = c(2,3,4), labels = age_labels),
Education_f = factor(Education, levels = c(1,2,3,5), labels = education_labels)
)
cat("Rows:", nrow(df), " | Columns:", ncol(df), "\n")## Rows: 22 | Columns: 29
demo_summary <- df %>%
select(Gender_f, Age_f, Education_f) %>%
pivot_longer(everything(), names_to = "Variable", values_to = "Category") %>%
mutate(Variable = recode(Variable,
Gender_f = "Gender",
Age_f = "Age Group",
Education_f = "Education")) %>%
count(Variable, Category) %>%
mutate(Percent = percent(n / nrow(df), accuracy = 1))
demo_summary %>%
kable(caption = "Demographic Breakdown (N = 22)",
col.names = c("Variable", "Category", "n", "%")) %>%
kable_styling(bootstrap_options = c("striped","hover","condensed"),
full_width = FALSE) %>%
collapse_rows(columns = 1, valign = "top")| Variable | Category | n | % |
|---|---|---|---|
| Age Group | 18–24 | 15 | 68% |
| 25–34 | 4 | 18% | |
| 35–44 | 3 | 14% | |
| Education | Some High School | 3 | 14% |
| High School Diploma | 8 | 36% | |
| Some College | 2 | 9% | |
| Bachelor’s or Higher | 9 | 41% | |
| Gender | Male | 16 | 73% |
| Female | 6 | 27% |
p_gender <- df %>%
count(Gender_f) %>%
ggplot(aes(x = Gender_f, y = n, fill = Gender_f)) +
geom_col(width = 0.6, show.legend = FALSE) +
geom_text(aes(label = n), vjust = -0.4, fontface = "bold") +
scale_fill_manual(values = c("#4E79A7","#F28E2B")) +
labs(title = "Gender", x = NULL, y = "Count") +
theme_minimal(base_size = 12)
p_age <- df %>%
count(Age_f) %>%
ggplot(aes(x = Age_f, y = n, fill = Age_f)) +
geom_col(width = 0.6, show.legend = FALSE) +
geom_text(aes(label = n), vjust = -0.4, fontface = "bold") +
scale_fill_brewer(palette = "Set2") +
labs(title = "Age Group", x = NULL, y = "Count") +
theme_minimal(base_size = 12)
p_edu <- df %>%
count(Education_f) %>%
ggplot(aes(x = n, y = reorder(Education_f, n), fill = Education_f)) +
geom_col(width = 0.6, show.legend = FALSE) +
geom_text(aes(label = n), hjust = -0.3, fontface = "bold") +
scale_fill_brewer(palette = "Pastel1") +
xlim(0, max(table(df$Education_f)) + 2) +
labs(title = "Education Level", x = "Count", y = NULL) +
theme_minimal(base_size = 12)
(p_gender | p_age) / p_edu +
plot_annotation(title = "Sample Demographics",
theme = theme(plot.title = element_text(size = 14, face = "bold")))service_vars <- c("CS_helpful","Recommend","Come_again","All_Products",
"Profesionalism","Limitation","Online_grocery",
"delivery","Pick_up","Find_items","other_shops")
df %>%
select(all_of(service_vars)) %>%
describe() %>%
select(n, mean, sd, median, min, max, skew, kurtosis) %>%
round(2) %>%
kable(caption = "Descriptive Statistics — Service & Shopping Variables") %>%
kable_styling(bootstrap_options = c("striped","hover","condensed"))| n | mean | sd | median | min | max | skew | kurtosis | |
|---|---|---|---|---|---|---|---|---|
| CS_helpful | 22 | 1.59 | 0.73 | 1 | 1 | 3 | 0.73 | -0.89 |
| Recommend | 22 | 1.32 | 0.65 | 1 | 1 | 3 | 1.67 | 1.38 |
| Come_again | 22 | 1.45 | 0.74 | 1 | 1 | 3 | 1.16 | -0.23 |
| All_Products | 22 | 2.09 | 1.06 | 2 | 1 | 5 | 1.18 | 0.79 |
| Profesionalism | 22 | 1.41 | 0.59 | 1 | 1 | 3 | 1.00 | -0.14 |
| Limitation | 22 | 1.50 | 0.80 | 1 | 1 | 4 | 1.59 | 1.99 |
| Online_grocery | 22 | 2.27 | 0.77 | 2 | 1 | 3 | -0.46 | -1.25 |
| delivery | 22 | 2.41 | 0.73 | 3 | 1 | 3 | -0.73 | -0.89 |
| Pick_up | 22 | 2.45 | 1.06 | 2 | 1 | 5 | 0.46 | -0.37 |
| Find_items | 22 | 1.45 | 0.67 | 1 | 1 | 3 | 1.06 | -0.19 |
| other_shops | 22 | 2.59 | 1.40 | 2 | 1 | 5 | 0.42 | -1.21 |
Note on scale direction: Lower scores indicate more positive responses (1 = Agree / Strongly Agree).
satisfaction_items <- c("CS_helpful_f","Recommend_f","Come_again_f",
"Profesionalism_f","Find_items_f")
item_labels <- c("CS Helpfulness","Recommend Store",
"Come Again","Professionalism","Find Items Easily")
freq_df <- map2_dfr(satisfaction_items, item_labels, function(var, lbl) {
df %>%
count(.data[[var]]) %>%
rename(Response = 1) %>%
mutate(Item = lbl, Pct = n / sum(n))
})
ggplot(freq_df, aes(x = Response, y = Pct, fill = Response)) +
geom_col(show.legend = FALSE, width = 0.7) +
geom_text(aes(label = percent(Pct, accuracy = 1)),
vjust = -0.4, size = 3.5, fontface = "bold") +
facet_wrap(~Item, nrow = 2) +
scale_y_continuous(labels = percent_format(), limits = c(0, 0.9)) +
scale_fill_manual(values = c("#59A14F","#EDC948","#E15759")) +
labs(title = "Customer Satisfaction: Key Service Items",
subtitle = "Response distribution across 22 respondents",
x = NULL, y = "Proportion") +
theme_minimal(base_size = 12) +
theme(strip.text = element_text(face = "bold"))channel_items <- c("Online_grocery_f","delivery_f","Pick_up_f")
channel_labels <- c("Online Grocery","Delivery","Pick-Up")
channel_df <- map2_dfr(channel_items, channel_labels, function(var, lbl) {
df %>%
count(.data[[var]]) %>%
rename(Response = 1) %>%
mutate(Item = lbl, Pct = n / sum(n))
})
ggplot(channel_df, aes(x = Response, y = Pct, fill = Item)) +
geom_col(position = "dodge", width = 0.7) +
geom_text(aes(label = percent(Pct, accuracy = 1)),
position = position_dodge(width = 0.7),
vjust = -0.4, size = 3.2, fontface = "bold") +
scale_y_continuous(labels = percent_format(), limits = c(0, 0.85)) +
scale_fill_brewer(palette = "Set1") +
labs(title = "Shopping Channel Preferences",
subtitle = "Agreement rates for online, delivery, and pick-up options",
x = "Response", y = "Proportion", fill = "Channel") +
theme_minimal(base_size = 12)cor_matrix <- df %>%
select(all_of(service_vars)) %>%
cor(use = "complete.obs")
ggcorrplot(cor_matrix,
method = "square",
type = "lower",
lab = TRUE,
lab_size = 3,
colors = c("#2166AC","white","#D6604D"),
title = "Correlation Matrix — Service & Shopping Variables",
ggtheme = theme_minimal(base_size = 11)) +
theme(plot.title = element_text(face = "bold", size = 13))Interpretation: Positive correlations (red) suggest items that tend to move together. Because the scale is reverse-coded (1 = positive), a positive correlation means respondents who agreed on one item tended to agree on the other.
df %>%
select(Gender_f, CS_helpful, Recommend, Come_again, Profesionalism, Find_items) %>%
pivot_longer(-Gender_f, names_to = "Item", values_to = "Score") %>%
group_by(Gender_f, Item) %>%
summarise(Mean = mean(Score), SE = sd(Score)/sqrt(n()), .groups = "drop") %>%
ggplot(aes(x = Item, y = Mean, fill = Gender_f)) +
geom_col(position = position_dodge(0.75), width = 0.65) +
geom_errorbar(aes(ymin = Mean - SE, ymax = Mean + SE),
position = position_dodge(0.75), width = 0.25) +
scale_fill_manual(values = c("#4E79A7","#F28E2B")) +
scale_y_continuous(limits = c(0, 3.2),
breaks = 1:3,
labels = c("1 = Agree","2 = Neutral","3 = Disagree")) +
labs(title = "Mean Satisfaction Scores by Gender",
subtitle = "Lower = more positive; error bars show ±1 SE",
x = NULL, y = "Mean Score", fill = "Gender") +
coord_flip() +
theme_minimal(base_size = 12)df %>%
select(Age_f, CS_helpful, Recommend, Come_again, Profesionalism, Find_items) %>%
pivot_longer(-Age_f, names_to = "Item", values_to = "Score") %>%
group_by(Age_f, Item) %>%
summarise(Mean = mean(Score), SE = sd(Score)/sqrt(n()), .groups = "drop") %>%
ggplot(aes(x = Item, y = Mean, color = Age_f, group = Age_f)) +
geom_point(size = 3, position = position_dodge(0.4)) +
geom_errorbar(aes(ymin = Mean - SE, ymax = Mean + SE),
position = position_dodge(0.4), width = 0.25) +
geom_line(position = position_dodge(0.4), linetype = "dashed", linewidth = 0.7) +
scale_color_brewer(palette = "Dark2") +
scale_y_continuous(limits = c(0.5, 3),
breaks = 1:3,
labels = c("1 = Agree","2 = Neutral","3 = Disagree")) +
labs(title = "Mean Satisfaction Scores by Age Group",
subtitle = "Lower = more positive; error bars show ±1 SE",
x = NULL, y = "Mean Score", color = "Age Group") +
coord_flip() +
theme_minimal(base_size = 12)We create a composite Overall Satisfaction Score by averaging the five core service items (CS_helpful, Recommend, Come_again, Professionalism, Find_items), then classify respondents as High, Moderate, or Low satisfaction.
df <- df %>%
mutate(
Satisfaction_Score = rowMeans(select(., CS_helpful, Recommend,
Come_again, Profesionalism, Find_items)),
Segment = case_when(
Satisfaction_Score <= 1.4 ~ "High Satisfaction",
Satisfaction_Score <= 2.2 ~ "Moderate Satisfaction",
TRUE ~ "Low Satisfaction"
),
Segment = factor(Segment,
levels = c("High Satisfaction","Moderate Satisfaction","Low Satisfaction"))
)
df %>%
count(Segment) %>%
mutate(Percent = percent(n / sum(n), accuracy = 1)) %>%
kable(caption = "Customer Segments Based on Composite Satisfaction Score",
col.names = c("Segment","n","%")) %>%
kable_styling(bootstrap_options = c("striped","hover"), full_width = FALSE)| Segment | n | % |
|---|---|---|
| High Satisfaction | 16 | 73% |
| Moderate Satisfaction | 5 | 23% |
| Low Satisfaction | 1 | 5% |
p_seg_demo <- df %>%
count(Segment, Gender_f) %>%
ggplot(aes(x = Segment, y = n, fill = Gender_f)) +
geom_col(position = "fill", width = 0.65) +
scale_y_continuous(labels = percent_format()) +
scale_fill_manual(values = c("#4E79A7","#F28E2B")) +
labs(title = "Segments by Gender", x = NULL, y = "Proportion", fill = "Gender") +
theme_minimal(base_size = 11) +
theme(axis.text.x = element_text(angle = 15, hjust = 1))
p_seg_score <- df %>%
ggplot(aes(x = Segment, y = Satisfaction_Score, fill = Segment)) +
geom_boxplot(width = 0.5, show.legend = FALSE, alpha = 0.8) +
geom_jitter(width = 0.1, size = 2.5, alpha = 0.6, show.legend = FALSE) +
scale_fill_manual(values = c("#59A14F","#EDC948","#E15759")) +
labs(title = "Score Distribution by Segment",
x = NULL, y = "Avg. Satisfaction Score") +
theme_minimal(base_size = 11) +
theme(axis.text.x = element_text(angle = 15, hjust = 1))
p_seg_demo | p_seg_scorefindings <- tibble(
`#` = 1:5,
Finding = c(
"Majority of customers agree that staff are helpful and that they would recommend the store.",
"Pick-up service receives the most mixed ratings, suggesting room for improvement.",
"Online grocery and delivery channels show moderate-to-high agreement, indicating demand.",
"Younger customers (18–24) tend to show slightly higher overall satisfaction.",
"Most respondents fall in the 'Moderate Satisfaction' segment, pointing to an opportunity to convert them to highly satisfied customers."
),
Recommendation = c(
"Reinforce staff training programs and recognize high-performing employees.",
"Audit the pick-up process for bottlenecks; consider a dedicated pick-up lane.",
"Invest in digital infrastructure; promote online ordering and same-day delivery.",
"Tailor marketing and loyalty programs to retain younger demographics.",
"Identify the specific friction points for 'Moderate' customers and address them proactively."
)
)
findings %>%
kable(caption = "Summary of Findings and Recommendations") %>%
kable_styling(bootstrap_options = c("striped","hover","condensed")) %>%
column_spec(2, width = "45%") %>%
column_spec(3, width = "45%")| # | Finding | Recommendation |
|---|---|---|
| 1 | Majority of customers agree that staff are helpful and that they would recommend the store. | Reinforce staff training programs and recognize high-performing employees. |
| 2 | Pick-up service receives the most mixed ratings, suggesting room for improvement. | Audit the pick-up process for bottlenecks; consider a dedicated pick-up lane. |
| 3 | Online grocery and delivery channels show moderate-to-high agreement, indicating demand. | Invest in digital infrastructure; promote online ordering and same-day delivery. |
| 4 | Younger customers (18–24) tend to show slightly higher overall satisfaction. | Tailor marketing and loyalty programs to retain younger demographics. |
| 5 | Most respondents fall in the ‘Moderate Satisfaction’ segment, pointing to an opportunity to convert them to highly satisfied customers. | Identify the specific friction points for ‘Moderate’ customers and address them proactively. |
## R version 4.5.2 (2025-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 22631)
##
## Matrix products: default
## LAPACK version 3.12.1
##
## locale:
## [1] LC_COLLATE=English_United States.utf8
## [2] LC_CTYPE=English_United States.utf8
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.utf8
##
## time zone: America/Los_Angeles
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] psych_2.6.3 patchwork_1.3.2 scales_1.4.0 ggcorrplot_0.1.4.1
## [5] kableExtra_1.4.0 knitr_1.51 lubridate_1.9.5 forcats_1.0.1
## [9] stringr_1.6.0 dplyr_1.2.0 purrr_1.2.1 readr_2.2.0
## [13] tidyr_1.3.2 tibble_3.3.1 ggplot2_4.0.2 tidyverse_2.0.0
##
## loaded via a namespace (and not attached):
## [1] sass_0.4.10 generics_0.1.4 xml2_1.5.2 lattice_0.22-7
## [5] stringi_1.8.7 hms_1.1.4 digest_0.6.39 magrittr_2.0.4
## [9] evaluate_1.0.5 grid_4.5.2 timechange_0.4.0 RColorBrewer_1.1-3
## [13] fastmap_1.2.0 plyr_1.8.9 jsonlite_2.0.0 viridisLite_0.4.3
## [17] textshaping_1.0.5 jquerylib_0.1.4 mnormt_2.1.2 cli_3.6.5
## [21] rlang_1.1.7 withr_3.0.2 cachem_1.1.0 yaml_2.3.12
## [25] parallel_4.5.2 tools_4.5.2 reshape2_1.4.5 tzdb_0.5.0
## [29] vctrs_0.7.2 R6_2.6.1 lifecycle_1.0.5 pkgconfig_2.0.3
## [33] pillar_1.11.1 bslib_0.10.0 gtable_0.3.6 Rcpp_1.1.1
## [37] glue_1.8.0 systemfonts_1.3.2 xfun_0.57 tidyselect_1.2.1
## [41] rstudioapi_0.18.0 farver_2.1.2 nlme_3.1-168 htmltools_0.5.9
## [45] labeling_0.4.3 rmarkdown_2.30 svglite_2.2.2 compiler_4.5.2
## [49] S7_0.2.1