After I improved my prompt by including the dataset name and specific variables, the AI provided more accurate and useful output. I think this newly designed lab is very useful because it teaches not only data analysis but also how to use AI effectively.

# Introduction
This report analyzes the dataset
customer_segmentation.csv, which contains survey data from
grocery store customers. The dataset includes information on customer
satisfaction, shopping behavior, and demographic characteristics such as
gender, age, and education. The main goal of this analysis is to explore
patterns in customer experiences and identify potential differences
across groups. Since the variables are coded numerically and no codebook
is provided, the analysis focuses on basic summaries and visual
patterns.
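
Because the codes stand for categories rather than measurements, counting how often each code occurs can be more informative than averaging the codes. The sketch below is one possible way to do that in base R, so it runs without extra packages. It uses only the first six rows printed by head() later in this report, and the name first_six is made up for illustration; the same calls could be applied to the full customer_data.

```r
# First six rows of three columns, copied from the head() output in this report.
# first_six is a hypothetical name used only for this illustration.
first_six <- data.frame(
  Recommend  = c(2, 2, 1, 3, 1, 1),
  Come_again = c(2, 1, 1, 2, 3, 3),
  Gender     = c(1, 1, 1, 1, 2, 1)
)

# Count how often each Recommend code occurs instead of averaging the codes.
recommend_counts <- table(first_six$Recommend)
print(recommend_counts)

# Share of respondents per code.
recommend_share <- prop.table(recommend_counts)
print(round(recommend_share, 2))

# For ordinal codes, the median is often a safer summary than the mean.
recommend_median <- median(first_six$Recommend)
print(recommend_median)
```

Without a codebook, the counts still show how responses are spread across the codes, even though the meaning of each code remains unknown.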
library(tidyverse)
library(knitr)
customer_data <- read.csv("customer_segmentation.csv")
head(customer_data)
##   ID CS_helpful Recommend Come_again All_Products Profesionalism Limitation
## 1  1          2         2          2            2              2          2
## 2  2          1         2          1            1              1          1
## 3  3          2         1          1            1              1          2
## 4  4          3         3          2            4              1          2
## 5  5          2         1          3            5              2          1
## 6  6          1         1          3            2              1          1
##   Online_grocery delivery Pick_up Find_items other_shops Gender Age Education
## 1              2        3       4          1           2      1   2         2
## 2              2        3       3          1           2      1   2         2
## 3              3        3       2          1           3      1   2         2
## 4              3        3       2          2           2      1   3         5
## 5              2        3       1          2           3      2   4         2
## 6              1        2       1          1           4      1   2         5
names(customer_data)
## [1] "ID" "CS_helpful" "Recommend" "Come_again"
## [5] "All_Products" "Profesionalism" "Limitation" "Online_grocery"
## [9] "delivery" "Pick_up" "Find_items" "other_shops"
## [13] "Gender" "Age" "Education"
colSums(is.na(customer_data))
## ID CS_helpful Recommend Come_again All_Products
## 0 0 0 0 0
## Profesionalism Limitation Online_grocery delivery Pick_up
## 0 0 0 0 0
## Find_items other_shops Gender Age Education
## 0 0 0 0 0
summary(customer_data)
## ID CS_helpful Recommend Come_again
## Min. : 1.00 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.: 6.25 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:1.000
## Median :11.50 Median :1.000 Median :1.000 Median :1.000
## Mean :11.50 Mean :1.591 Mean :1.318 Mean :1.455
## 3rd Qu.:16.75 3rd Qu.:2.000 3rd Qu.:1.000 3rd Qu.:2.000
## Max. :22.00 Max. :3.000 Max. :3.000 Max. :3.000
## All_Products Profesionalism Limitation Online_grocery delivery
## Min. :1.000 Min. :1.000 Min. :1.0 Min. :1.000 Min. :1.000
## 1st Qu.:1.250 1st Qu.:1.000 1st Qu.:1.0 1st Qu.:2.000 1st Qu.:2.000
## Median :2.000 Median :1.000 Median :1.0 Median :2.000 Median :3.000
## Mean :2.091 Mean :1.409 Mean :1.5 Mean :2.273 Mean :2.409
## 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.0 3rd Qu.:3.000 3rd Qu.:3.000
## Max. :5.000 Max. :3.000 Max. :4.0 Max. :3.000 Max. :3.000
## Pick_up Find_items other_shops Gender
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:2.000 1st Qu.:1.000 1st Qu.:1.250 1st Qu.:1.000
## Median :2.000 Median :1.000 Median :2.000 Median :1.000
## Mean :2.455 Mean :1.455 Mean :2.591 Mean :1.273
## 3rd Qu.:3.000 3rd Qu.:2.000 3rd Qu.:3.750 3rd Qu.:1.750
## Max. :5.000 Max. :3.000 Max. :5.000 Max. :2.000
## Age Education
## Min. :2.000 Min. :1.000
## 1st Qu.:2.000 1st Qu.:2.000
## Median :2.000 Median :2.500
## Mean :2.455 Mean :3.182
## 3rd Qu.:3.000 3rd Qu.:5.000
## Max. :4.000 Max. :5.000
# Recommend by Gender
recommend_gender <- customer_data %>%
  group_by(Gender) %>%
  summarise(mean_recommend = mean(Recommend))
kable(recommend_gender, caption = "Average Recommendation Score by Gender")
| Gender | mean_recommend |
|---|---|
| 1 | 1.312500 |
| 2 | 1.333333 |
ggplot(recommend_gender, aes(x = factor(Gender), y = mean_recommend, fill = factor(Gender))) +
  geom_col() +
  labs(
    title = "Average Recommendation Score by Gender",
    x = "Gender",
    y = "Mean Recommend Score"
  ) +
  theme_minimal()
# Correlation Heatmap
cor_matrix <- cor(customer_data %>% select(-ID))
cor_data <- as.data.frame(as.table(cor_matrix))
ggplot(cor_data, aes(Var1, Var2, fill = Freq)) +
  geom_tile() +
  geom_text(aes(label = round(Freq, 2)), size = 3) +
  scale_fill_gradient2(midpoint = 0, limits = c(-1, 1), name = "Correlation") +  # diverging scale centered at zero
  labs(
    title = "Correlation Heatmap of Numeric Variables",
    x = "",
    y = ""
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggplot(customer_data, aes(x = factor(Recommend))) +
  geom_bar(fill = "lightblue") +
  labs(
    title = "Distribution of Recommendation Scores",
    x = "Recommend (coded values)",
    y = "Count"
  )
ggplot(customer_data, aes(x = factor(Come_again))) +
  geom_bar(fill = "yellow") +
  labs(
    title = "Distribution of Come Again Scores",
    x = "Come Again (coded values)",
    y = "Count"
  )
ggplot(customer_data, aes(x = factor(Gender))) +
  geom_bar(fill = "lightgreen") +
  labs(
    title = "Gender Distribution",
    x = "Gender (coded values)",
    y = "Count"
  )
experience_vars <- customer_data %>%
  select(CS_helpful, Recommend, Come_again, All_Products, Profesionalism, Limitation, Online_grocery)
mean_scores <- data.frame(
  Variable = names(experience_vars),
  Mean = unname(sapply(experience_vars, mean))  # unname() keeps the variable names out of the row names
)
kable(mean_scores, caption = "Mean Scores for Selected Customer Experience Variables")
| Variable | Mean |
|---|---|
| CS_helpful | 1.590909 |
| Recommend | 1.318182 |
| Come_again | 1.454546 |
| All_Products | 2.090909 |
| Profesionalism | 1.409091 |
| Limitation | 1.500000 |
| Online_grocery | 2.272727 |
ggplot(mean_scores, aes(x = reorder(Variable, Mean), y = Mean)) +
  geom_col(fill = "pink") +
  coord_flip() +
  labs(
    title = "Average Scores of Customer Experience Variables",
    x = "Variable",
    y = "Mean Score"
  )
# Stacked Chart: Recommend by Gender
ggplot(customer_data, aes(x = factor(Gender), fill = factor(Recommend))) +
  geom_bar(position = "fill") +
  labs(
    title = "Recommendation Distribution by Gender",
    x = "Gender",
    y = "Proportion",
    fill = "Recommend"
  ) +
  theme_minimal()
# Line Chart: Mean Scores Across Variables
experience_vars <- customer_data %>%
  select(CS_helpful, Recommend, Come_again, All_Products, Profesionalism,
         Limitation, Online_grocery, delivery, Pick_up, Find_items, other_shops)
mean_scores <- data.frame(
  Variable = names(experience_vars),
  Mean = unname(sapply(experience_vars, mean))  # unname() keeps the variable names out of the row names
)
kable(mean_scores, caption = "Mean Scores Across Key Variables")
| Variable | Mean |
|---|---|
| CS_helpful | 1.590909 |
| Recommend | 1.318182 |
| Come_again | 1.454546 |
| All_Products | 2.090909 |
| Profesionalism | 1.409091 |
| Limitation | 1.500000 |
| Online_grocery | 2.272727 |
| delivery | 2.409091 |
| Pick_up | 2.454546 |
| Find_items | 1.454546 |
| other_shops | 2.590909 |
ggplot(mean_scores, aes(x = reorder(Variable, Mean), y = Mean, group = 1)) +
  geom_line(linewidth = 1) +
  geom_point(size = 3) +
  coord_flip() +
  labs(
    title = "Mean Scores Across Key Variables",
    x = "Variable",
    y = "Mean Score"
  ) +
  theme_minimal()
# Boxplot: Recommend by Age
ggplot(customer_data, aes(x = factor(Age), y = Recommend, fill = factor(Age))) +
  geom_boxplot() +
  labs(
    title = "Recommendation Scores by Age Group",
    x = "Age Group",
    y = "Recommend Score"
  ) +
  theme_minimal()
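The results suggest generally positive customer satisfaction, although that reading depends on how the responses are coded. The mean scores for Recommend (1.32) and Come_again (1.45) sit near the low end of the observed 1 to 3 range, which indicates favorable responses only if 1 is the most positive code; without a codebook this cannot be confirmed. Differences across gender and age groups appear small: the mean Recommend score is 1.31 for Gender code 1 and 1.33 for Gender code 2. With only 22 respondents, no group difference should be treated as conclusive. The variation in responses suggests that customers could be grouped into different segments, although no formal segmentation was performed in this report.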
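
Group means are easier to judge when the group sizes are shown next to them. As a rough sketch, the base R code below reports the count, mean, and median of Recommend for each Gender code. It uses only the first six rows from the head() output earlier in this report, so the numbers are illustrative and will not match the full-data table; first_six and group_summary are hypothetical names.

```r
# First six rows of two columns, copied from the head() output in this report.
# first_six and group_summary are hypothetical names used only for illustration.
first_six <- data.frame(
  Recommend = c(2, 2, 1, 3, 1, 1),
  Gender    = c(1, 1, 1, 1, 2, 1)
)

# Count, mean, and median of Recommend within each Gender code.
n_by_gender      <- tapply(first_six$Recommend, first_six$Gender, length)
mean_by_gender   <- tapply(first_six$Recommend, first_six$Gender, mean)
median_by_gender <- tapply(first_six$Recommend, first_six$Gender, median)

# Collect the three statistics into one table, one row per Gender code.
group_summary <- data.frame(
  Gender           = names(n_by_gender),
  n                = as.vector(n_by_gender),
  mean_recommend   = as.vector(mean_by_gender),
  median_recommend = as.vector(median_by_gender)
)
print(group_summary)
```

In this six-row excerpt, Gender code 2 has a single respondent, which cannot support a comparison; this is why the count column matters when reading group means.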
This report provides a descriptive analysis of the customer segmentation dataset using R Markdown. The results show general trends in customer satisfaction, shopping behavior, and demographic differences. AI tools were helpful in generating the structure and code for this analysis.