At the start of this analysis, I didn’t know what to expect from Claude AI with regard to how varied its data reports would be or how it would choose to analyze the data. I was genuinely impressed by its ability to generate the R Markdown file and to approach the data differently each time it was asked.
This was an effective exercise in using Claude to analyze data. I didn’t realize it was so good at writing the script, and I will keep using it for this kind of work.
This report analyzes a grocery shopping survey dataset collected from classmates. The dataset contains 22 responses and 15 variables covering customer satisfaction, shopping behaviors, demographics, and preferences related to a grocery store experience.
Survey Scale Reference (unless otherwise noted):
| Score | Meaning |
|---|---|
| 1 | Strongly Agree / Very Satisfied |
| 2 | Agree / Satisfied |
| 3 | Neutral |
| 4 | Disagree / Dissatisfied |
| 5 | Strongly Disagree / Very Dissatisfied |
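Because 1 is the most favorable response on this scale, lower means indicate higher satisfaction. If a higher-is-better reading is preferred, the items can be reverse-coded. The following is a minimal sketch and not part of the original analysis: `reverse_likert` is a hypothetical helper, and the sample values are the first ten `CS_helpful` responses printed in this report's `str()` output.

```r
# Hypothetical helper: flip a Likert item so the top score is the most favorable.
reverse_likert <- function(x, min_score = 1, max_score = 5) {
  (min_score + max_score) - x
}

# First ten CS_helpful responses as printed in the str() output of this report
cs_helpful <- c(2, 1, 2, 3, 2, 1, 2, 1, 1, 1)
cs_reversed <- reverse_likert(cs_helpful)

mean(cs_helpful)   # lower = more favorable on the original coding
mean(cs_reversed)  # higher = more favorable after reversing
```

Reversing only changes the direction of the scale; the spread of the responses and the strength of any correlations stay the same, though the sign of a correlation flips if only one of the two variables is reversed.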
# Load required libraries
library(tidyverse)
library(ggplot2)
library(dplyr)
library(knitr)
library(kableExtra)
library(scales)
library(corrplot)
library(reshape2)
# Load the dataset
df <- read.csv("customer_segmentation.csv", stringsAsFactors = FALSE)
# Trim whitespace from column names
colnames(df) <- trimws(colnames(df))
# Preview the data
head(df)
## Number of rows (respondents): 22
## Number of columns (variables): 15
## 'data.frame': 22 obs. of 15 variables:
## $ ID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ CS_helpful : int 2 1 2 3 2 1 2 1 1 1 ...
## $ Recommend : int 2 2 1 3 1 1 1 1 1 1 ...
## $ Come_again : int 2 1 1 2 3 3 1 1 1 1 ...
## $ All_Products : int 2 1 1 4 5 2 2 2 2 1 ...
## $ Profesionalism: int 2 1 1 1 2 1 2 1 2 1 ...
## $ Limitation : int 2 1 2 2 1 1 1 2 1 1 ...
## $ Online_grocery: int 2 2 3 3 2 1 2 1 2 3 ...
## $ delivery : int 3 3 3 3 3 2 2 1 1 2 ...
## $ Pick_up : int 4 3 2 2 1 1 2 2 3 2 ...
## $ Find_items : int 1 1 1 2 2 1 1 2 1 1 ...
## $ other_shops : int 2 2 3 2 3 4 1 4 1 1 ...
## $ Gender : int 1 1 1 1 2 1 1 1 2 2 ...
## $ Age : int 2 2 2 3 4 2 2 2 2 2 ...
## $ Education : int 2 2 2 5 2 5 3 2 1 2 ...
## CS_helpful Recommend Come_again All_Products
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:1.250
## Median :1.000 Median :1.000 Median :1.000 Median :2.000
## Mean :1.591 Mean :1.318 Mean :1.455 Mean :2.091
## 3rd Qu.:2.000 3rd Qu.:1.000 3rd Qu.:2.000 3rd Qu.:2.000
## Max. :3.000 Max. :3.000 Max. :3.000 Max. :5.000
## Profesionalism Limitation Online_grocery delivery Pick_up
## Min. :1.000 Min. :1.0 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:1.000 1st Qu.:1.0 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000
## Median :1.000 Median :1.0 Median :2.000 Median :3.000 Median :2.000
## Mean :1.409 Mean :1.5 Mean :2.273 Mean :2.409 Mean :2.455
## 3rd Qu.:2.000 3rd Qu.:2.0 3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:3.000
## Max. :3.000 Max. :4.0 Max. :3.000 Max. :3.000 Max. :5.000
## Find_items other_shops Gender Age
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :2.000
## 1st Qu.:1.000 1st Qu.:1.250 1st Qu.:1.000 1st Qu.:2.000
## Median :1.000 Median :2.000 Median :1.000 Median :2.000
## Mean :1.455 Mean :2.591 Mean :1.273 Mean :2.455
## 3rd Qu.:2.000 3rd Qu.:3.750 3rd Qu.:1.750 3rd Qu.:3.000
## Max. :3.000 Max. :5.000 Max. :2.000 Max. :4.000
## Education
## Min. :1.000
## 1st Qu.:2.000
## Median :2.500
## Mean :3.182
## 3rd Qu.:5.000
## Max. :5.000
missing_counts <- colSums(is.na(df))
missing_df <- data.frame(
Variable = names(missing_counts),
Missing = missing_counts
)
kable(missing_df, row.names = FALSE, caption = "Missing Values per Variable") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
| Variable | Missing |
|---|---|
| ID | 0 |
| CS_helpful | 0 |
| Recommend | 0 |
| Come_again | 0 |
| All_Products | 0 |
| Profesionalism | 0 |
| Limitation | 0 |
| Online_grocery | 0 |
| delivery | 0 |
| Pick_up | 0 |
| Find_items | 0 |
| other_shops | 0 |
| Gender | 0 |
| Age | 0 |
| Education | 0 |
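A missing-value count of zero does not by itself confirm that every response falls inside the expected 1 to 5 range. A range check along the following lines could complement it. This is a sketch under stated assumptions: `count_out_of_range` is a hypothetical helper, and the `toy` data frame holds made-up values rather than survey responses.

```r
# Hypothetical helper: count values outside the expected range for each column.
count_out_of_range <- function(data, cols, lower = 1, upper = 5) {
  sapply(data[cols], function(x) sum(x < lower | x > upper, na.rm = TRUE))
}

# Made-up values for illustration only (not the survey data)
toy <- data.frame(
  CS_helpful = c(1, 2, 6, 3),  # 6 is above the scale
  Recommend  = c(1, 1, 2, 0)   # 0 is below the scale
)

out_of_range <- count_out_of_range(toy, c("CS_helpful", "Recommend"))
out_of_range
```

In the summary output of this report, every survey item has a minimum of 1 and a maximum of at most 5, so a check like this would be expected to return zeros for the real data.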
# Recode Gender: 1 = Male, 2 = Female
df$Gender_Label <- ifelse(df$Gender == 1, "Male", "Female")
gender_counts <- df %>%
count(Gender_Label) %>%
mutate(Percentage = round(n / sum(n) * 100, 1))
kable(gender_counts, col.names = c("Gender", "Count", "Percentage (%)"),
caption = "Gender Distribution") %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
| Gender | Count | Percentage (%) |
|---|---|---|
| Female | 6 | 27.3 |
| Male | 16 | 72.7 |
ggplot(gender_counts, aes(x = Gender_Label, y = n, fill = Gender_Label)) +
geom_bar(stat = "identity", width = 0.5, color = "white") +
geom_text(aes(label = paste0(n, " (", Percentage, "%)")), vjust = -0.5, size = 4) +
scale_fill_manual(values = c("Male" = "#2E86AB", "Female" = "#E84855")) +
labs(title = "Gender Distribution of Survey Respondents",
x = "Gender", y = "Count") +
theme_minimal() +
theme(legend.position = "none")
# Recode Age: 1 = Under 18, 2 = 18-25, 3 = 26-35, 4 = 36-50, 5 = 51+
age_labels <- c("1" = "Under 18", "2" = "18–25", "3" = "26–35", "4" = "36–50", "5" = "51+")
df$Age_Label <- recode(as.character(df$Age), !!!age_labels)
age_counts <- df %>%
count(Age_Label) %>%
arrange(match(Age_Label, age_labels))
ggplot(age_counts, aes(x = Age_Label, y = n, fill = Age_Label)) +
geom_bar(stat = "identity", color = "white") +
geom_text(aes(label = n), vjust = -0.5, size = 4) +
scale_fill_brewer(palette = "Blues") +
labs(title = "Age Distribution of Survey Respondents",
x = "Age Group", y = "Count") +
theme_minimal() +
theme(legend.position = "none")
# Recode Education: 1=High School, 2=Some College, 3=Associate's, 4=Bachelor's, 5=Graduate+
edu_labels <- c("1" = "High School", "2" = "Some College",
"3" = "Associate's", "4" = "Bachelor's", "5" = "Graduate+")
df$Education_Label <- recode(as.character(df$Education), !!!edu_labels)
edu_counts <- df %>%
count(Education_Label) %>%
mutate(Percentage = round(n / sum(n) * 100, 1))
ggplot(edu_counts, aes(x = reorder(Education_Label, -n), y = n, fill = Education_Label)) +
geom_bar(stat = "identity", color = "white") +
geom_text(aes(label = paste0(n, "\n(", Percentage, "%)")), vjust = -0.3, size = 3.5) +
scale_fill_brewer(palette = "Set2") +
labs(title = "Education Level of Survey Respondents",
x = "Education Level", y = "Count") +
theme_minimal() +
theme(legend.position = "none",
axis.text.x = element_text(angle = 15, hjust = 1))
cs_counts <- df %>%
count(CS_helpful) %>%
mutate(
Label = recode(as.character(CS_helpful),
"1" = "Strongly Agree", "2" = "Agree",
"3" = "Neutral", "4" = "Disagree", "5" = "Strongly Disagree"),
Percentage = round(n / sum(n) * 100, 1)
)
# Order bars by scale value instead of alphabetically by label
ggplot(cs_counts, aes(x = reorder(Label, CS_helpful), y = n, fill = factor(CS_helpful))) +
geom_bar(stat = "identity", color = "white") +
geom_text(aes(label = paste0(n, " (", Percentage, "%)")), vjust = -0.4, size = 3.5) +
scale_fill_brewer(palette = "RdYlGn", direction = -1) +
labs(title = "Customer Service is Helpful",
x = "Response", y = "Count") +
theme_minimal() +
theme(legend.position = "none")
rec_counts <- df %>%
count(Recommend) %>%
mutate(
Label = recode(as.character(Recommend),
"1" = "Strongly Agree", "2" = "Agree",
"3" = "Neutral", "4" = "Disagree", "5" = "Strongly Disagree"),
Percentage = round(n / sum(n) * 100, 1)
)
# Order bars by scale value instead of alphabetically by label
ggplot(rec_counts, aes(x = reorder(Label, Recommend), y = n, fill = factor(Recommend))) +
geom_bar(stat = "identity", color = "white") +
geom_text(aes(label = paste0(n, " (", Percentage, "%)")), vjust = -0.4, size = 3.5) +
scale_fill_brewer(palette = "RdYlGn", direction = -1) +
labs(title = "Likelihood to Recommend the Store",
x = "Response", y = "Count") +
theme_minimal() +
theme(legend.position = "none")
ca_counts <- df %>%
count(Come_again) %>%
mutate(
Label = recode(as.character(Come_again),
"1" = "Strongly Agree", "2" = "Agree",
"3" = "Neutral", "4" = "Disagree", "5" = "Strongly Disagree"),
Percentage = round(n / sum(n) * 100, 1)
)
# Order bars by scale value instead of alphabetically by label
ggplot(ca_counts, aes(x = reorder(Label, Come_again), y = n, fill = factor(Come_again))) +
geom_bar(stat = "identity", color = "white") +
geom_text(aes(label = paste0(n, " (", Percentage, "%)")), vjust = -0.4, size = 3.5) +
scale_fill_brewer(palette = "RdYlGn", direction = -1) +
labs(title = "Likelihood to Come Again",
x = "Response", y = "Count") +
theme_minimal() +
theme(legend.position = "none")
# Compute mean scores for key satisfaction variables
sat_vars <- c("CS_helpful", "Recommend", "Come_again", "Profesionalism", "Find_items")
sat_labels <- c("Customer Service", "Recommend", "Come Again", "Professionalism", "Find Items")
sat_means <- colMeans(df[, sat_vars], na.rm = TRUE)
sat_df <- data.frame(
Variable = sat_labels,
Mean_Score = round(sat_means, 2)
)
ggplot(sat_df, aes(x = reorder(Variable, Mean_Score), y = Mean_Score, fill = Mean_Score)) +
geom_bar(stat = "identity", color = "white", width = 0.6) +
geom_text(aes(label = Mean_Score), hjust = -0.2, size = 4) +
scale_fill_gradient(low = "#2ECC71", high = "#E74C3C") +
coord_flip() +
labs(title = "Average Satisfaction Scores by Category",
subtitle = "Scale: 1 = Strongly Agree / Very Satisfied → 5 = Strongly Disagree / Very Dissatisfied",
x = "", y = "Mean Score") +
theme_minimal() +
theme(legend.position = "none") +
ylim(0, 3.5)
og_counts <- df %>%
count(Online_grocery) %>%
mutate(
Label = recode(as.character(Online_grocery),
"1" = "Strongly Agree", "2" = "Agree",
"3" = "Neutral", "4" = "Disagree", "5" = "Strongly Disagree"),
Percentage = round(n / sum(n) * 100, 1)
)
# Order bars by scale value instead of alphabetically by label
ggplot(og_counts, aes(x = reorder(Label, Online_grocery), y = n, fill = factor(Online_grocery))) +
geom_bar(stat = "identity", color = "white") +
geom_text(aes(label = paste0(n, " (", Percentage, "%)")), vjust = -0.4, size = 3.5) +
scale_fill_brewer(palette = "PuBuGn") +
labs(title = "Interest in Online Grocery Shopping",
x = "Response", y = "Count") +
theme_minimal() +
theme(legend.position = "none")
pref_df <- data.frame(
Category = c(rep("Delivery", nrow(df)), rep("Pick-Up", nrow(df))),
Score = c(df$delivery, df$Pick_up)
)
ggplot(pref_df, aes(x = factor(Score), fill = Category)) +
geom_bar(position = "dodge", color = "white") +
scale_fill_manual(values = c("Delivery" = "#3498DB", "Pick-Up" = "#E67E22")) +
scale_x_discrete(labels = c("1" = "Strongly\nAgree", "2" = "Agree",
"3" = "Neutral", "4" = "Disagree", "5" = "Strongly\nDisagree")) +
labs(title = "Delivery vs. Pick-Up Preference",
x = "Response", y = "Count", fill = "Shopping Method") +
theme_minimal()
os_counts <- df %>%
count(other_shops) %>%
mutate(
Label = recode(as.character(other_shops),
"1" = "Strongly Agree", "2" = "Agree",
"3" = "Neutral", "4" = "Disagree", "5" = "Strongly Disagree"),
Percentage = round(n / sum(n) * 100, 1)
)
# Order bars by scale value instead of alphabetically by label
ggplot(os_counts, aes(x = reorder(Label, other_shops), y = n, fill = factor(other_shops))) +
geom_bar(stat = "identity", color = "white") +
geom_text(aes(label = paste0(n, " (", Percentage, "%)")), vjust = -0.4, size = 3.5) +
scale_fill_brewer(palette = "Oranges") +
labs(title = "Shops at Other Grocery Stores as Well",
x = "Response", y = "Count") +
theme_minimal() +
theme(legend.position = "none")
# Select numeric survey variables (exclude ID, Gender, Age, Education)
survey_vars <- df[, c("CS_helpful", "Recommend", "Come_again", "All_Products",
"Profesionalism", "Limitation", "Online_grocery",
"delivery", "Pick_up", "Find_items", "other_shops")]
cor_matrix <- cor(survey_vars, use = "complete.obs")
corrplot(cor_matrix,
method = "color",
type = "upper",
tl.col = "black",
tl.srt = 45,
tl.cex = 0.8,
addCoef.col = "black",
number.cex = 0.65,
col = colorRampPalette(c("#E74C3C", "white", "#2980B9"))(200),
title = "Correlation Matrix — Survey Variables",
mar = c(0, 0, 2, 0))
cor_with_recommend <- cor(survey_vars, use = "complete.obs")[, "Recommend"]
cor_df <- data.frame(
Variable = names(cor_with_recommend),
Correlation = round(cor_with_recommend, 3)
) %>%
filter(Variable != "Recommend") %>%
arrange(desc(abs(Correlation)))
kable(cor_df, row.names = FALSE,
caption = "Correlation of Variables with 'Likelihood to Recommend'") %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
| Variable | Correlation |
|---|---|
| CS_helpful | 0.488 |
| delivery | 0.415 |
| Profesionalism | 0.391 |
| Come_again | 0.381 |
| Online_grocery | 0.297 |
| Pick_up | -0.082 |
| other_shops | -0.060 |
| Limitation | 0.046 |
| All_Products | 0.025 |
| Find_items | -0.020 |
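With only 22 respondents, even moderate correlations can arise by chance, so it may help to attach a rough significance test to the table above. The sketch below is not part of the original analysis. It takes the reported coefficients as given and applies the standard t transformation for a Pearson correlation, t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom. It assumes all 22 rows were used (the missing-value table reports no gaps) and treats the 1 to 5 responses as numeric, which is itself an approximation for ordinal data.

```r
# Sample size and Pearson correlations with Recommend, as reported in the table above
n <- 22
r <- c(CS_helpful = 0.488, delivery = 0.415, Profesionalism = 0.391,
       Come_again = 0.381, Online_grocery = 0.297, Pick_up = -0.082,
       other_shops = -0.060, Limitation = 0.046, All_Products = 0.025,
       Find_items = -0.020)

# t statistic for H0: true correlation is zero, with n - 2 degrees of freedom
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

sig_df <- data.frame(
  Variable = names(r),
  Correlation = r,
  t = round(t_stat, 2),
  p = round(p_value, 3),
  row.names = NULL
)
sig_df
```

Under these assumptions, only `CS_helpful` clears the conventional 0.05 threshold. The p-values are unadjusted for multiple comparisons, so the result is best read as suggestive rather than conclusive.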
crosstab_gender <- df %>%
count(Gender_Label, Recommend) %>%
mutate(
Recommend_Label = recode(as.character(Recommend),
"1" = "Strongly Agree", "2" = "Agree",
"3" = "Neutral", "4" = "Disagree", "5" = "Strongly Disagree")
)
# Order response categories by scale value instead of alphabetically
ggplot(crosstab_gender, aes(x = reorder(Recommend_Label, Recommend), y = n, fill = Gender_Label)) +
geom_bar(stat = "identity", position = "dodge", color = "white") +
scale_fill_manual(values = c("Male" = "#2E86AB", "Female" = "#E84855")) +
labs(title = "Likelihood to Recommend by Gender",
x = "Response", y = "Count", fill = "Gender") +
theme_minimal()
crosstab_age <- df %>%
count(Age_Label, Come_again) %>%
mutate(
Come_again_Label = recode(as.character(Come_again),
"1" = "Strongly Agree", "2" = "Agree",
"3" = "Neutral", "4" = "Disagree", "5" = "Strongly Disagree")
)
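# Order fill categories by scale value so the green-to-red palette follows the scale
ggplot(crosstab_age, aes(x = Age_Label, y = n, fill = reorder(Come_again_Label, Come_again))) +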
geom_bar(stat = "identity", position = "fill", color = "white") +
scale_fill_brewer(palette = "RdYlGn", direction = -1) +
scale_y_continuous(labels = percent_format()) +
labs(title = "Likelihood to Come Again by Age Group (Proportional)",
x = "Age Group", y = "Proportion", fill = "Response") +
theme_minimal()
findings <- data.frame(
Finding = c(
"Dominant age group",
"Most common gender",
"Most common education level",
"Customer service helpfulness",
"Likelihood to recommend",
"Likelihood to return",
"Online grocery preference",
"Preferred fulfillment method"
),
Result = c(
paste0("18–25 (", round(mean(df$Age == 2) * 100, 1), "% of respondents)"),
paste0("Male (", round(mean(df$Gender == 1) * 100, 1), "%)"),
"Graduate+ and Some College (tied most common)",
paste0("Mean score: ", round(mean(df$CS_helpful), 2), " — largely positive"),
paste0("Mean score: ", round(mean(df$Recommend), 2), " — largely positive"),
paste0("Mean score: ", round(mean(df$Come_again), 2), " — largely positive"),
paste0("Mean score: ", round(mean(df$Online_grocery), 2), " — mixed/neutral interest"),
paste0("Delivery mean: ", round(mean(df$delivery), 2),
" | Pick-up mean: ", round(mean(df$Pick_up), 2))
)
)
kable(findings, col.names = c("Finding", "Result"),
caption = "Summary of Key Findings") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = TRUE)
| Finding | Result |
|---|---|
| Dominant age group | 18–25 (68.2% of respondents) |
| Most common gender | Male (72.7%) |
| Most common education level | Graduate+ and Some College (tied most common) |
| Customer service helpfulness | Mean score: 1.59 — largely positive |
| Likelihood to recommend | Mean score: 1.32 — largely positive |
| Likelihood to return | Mean score: 1.45 — largely positive |
| Online grocery preference | Mean score: 2.27 — mixed/neutral interest |
| Preferred fulfillment method | Delivery mean: 2.41 \| Pick-up mean: 2.45 |
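The "most common education level" entry is typed in by hand in the findings code, whereas the other results are computed from `df`. A value like this could also be derived from the data so that it stays correct if the responses change. The sketch below shows one way to do that. `most_common` is a hypothetical helper, and the `edu` vector holds made-up labels, not the survey responses.

```r
# Hypothetical helper: return every value that shares the highest frequency.
most_common <- function(x) {
  counts <- table(x)
  names(counts)[as.vector(counts) == max(counts)]
}

# Made-up labels for illustration only (not the survey data)
edu <- c("Some College", "Graduate+", "Some College", "Graduate+", "Associate's")

top <- most_common(edu)
paste(top, collapse = " and ")
```

Returning all tied values, rather than only the first, keeps a tie visible in the summary instead of silently picking one level.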
Based on analysis of this 22-respondent classmate grocery survey dataset:
Report generated using R Markdown. Dataset: customer_segmentation.csv.