Reflection

After I improved my prompt by including the dataset name and the specific variables, the AI produced more accurate and useful output. I find this newly designed lab very useful because it teaches not only data analysis but also how to use AI effectively.

# Introduction

This report analyzes the dataset customer_segmentation.csv, which contains survey data from grocery store customers. The dataset includes information on customer satisfaction, shopping behavior, and demographic characteristics such as gender, age, and education. The main goal of this analysis is to explore patterns in customer experiences and identify potential differences across groups. Since the variables are coded numerically and no codebook is provided, the analysis focuses on basic summaries and visual patterns.

library(tidyverse)
library(knitr)

customer_data <- read.csv("customer_segmentation.csv")
head(customer_data)

##   ID CS_helpful Recommend Come_again All_Products Profesionalism Limitation
## 1  1          2         2          2            2              2          2
## 2  2          1         2          1            1              1          1
## 3  3          2         1          1            1              1          2
## 4  4          3         3          2            4              1          2
## 5  5          2         1          3            5              2          1
## 6  6          1         1          3            2              1          1
##   Online_grocery delivery Pick_up Find_items other_shops Gender Age Education
## 1              2        3       4          1           2      1   2         2
## 2              2        3       3          1           2      1   2         2
## 3              3        3       2          1           3      1   2         2
## 4              3        3       2          2           2      1   3         5
## 5              2        3       1          2           3      2   4         2
## 6              1        2       1          1           4      1   2         5

names(customer_data)

##  [1] "ID"             "CS_helpful"     "Recommend"      "Come_again"    
##  [5] "All_Products"   "Profesionalism" "Limitation"     "Online_grocery"
##  [9] "delivery"       "Pick_up"        "Find_items"     "other_shops"   
## [13] "Gender"         "Age"            "Education"

colSums(is.na(customer_data))

##             ID     CS_helpful      Recommend     Come_again   All_Products 
##              0              0              0              0              0 
## Profesionalism     Limitation Online_grocery       delivery        Pick_up 
##              0              0              0              0              0 
##     Find_items    other_shops         Gender            Age      Education 
##              0              0              0              0              0

summary(customer_data)

##        ID          CS_helpful      Recommend       Come_again   
##  Min.   : 1.00   Min.   :1.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.: 6.25   1st Qu.:1.000   1st Qu.:1.000   1st Qu.:1.000  
##  Median :11.50   Median :1.000   Median :1.000   Median :1.000  
##  Mean   :11.50   Mean   :1.591   Mean   :1.318   Mean   :1.455  
##  3rd Qu.:16.75   3rd Qu.:2.000   3rd Qu.:1.000   3rd Qu.:2.000  
##  Max.   :22.00   Max.   :3.000   Max.   :3.000   Max.   :3.000  
##   All_Products   Profesionalism    Limitation  Online_grocery     delivery    
##  Min.   :1.000   Min.   :1.000   Min.   :1.0   Min.   :1.000   Min.   :1.000  
##  1st Qu.:1.250   1st Qu.:1.000   1st Qu.:1.0   1st Qu.:2.000   1st Qu.:2.000  
##  Median :2.000   Median :1.000   Median :1.0   Median :2.000   Median :3.000  
##  Mean   :2.091   Mean   :1.409   Mean   :1.5   Mean   :2.273   Mean   :2.409  
##  3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.0   3rd Qu.:3.000   3rd Qu.:3.000  
##  Max.   :5.000   Max.   :3.000   Max.   :4.0   Max.   :3.000   Max.   :3.000  
##     Pick_up        Find_items     other_shops        Gender     
##  Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.:2.000   1st Qu.:1.000   1st Qu.:1.250   1st Qu.:1.000  
##  Median :2.000   Median :1.000   Median :2.000   Median :1.000  
##  Mean   :2.455   Mean   :1.455   Mean   :2.591   Mean   :1.273  
##  3rd Qu.:3.000   3rd Qu.:2.000   3rd Qu.:3.750   3rd Qu.:1.750  
##  Max.   :5.000   Max.   :3.000   Max.   :5.000   Max.   :2.000  
##       Age          Education    
##  Min.   :2.000   Min.   :1.000  
##  1st Qu.:2.000   1st Qu.:2.000  
##  Median :2.000   Median :2.500  
##  Mean   :2.455   Mean   :3.182  
##  3rd Qu.:3.000   3rd Qu.:5.000  
##  Max.   :4.000   Max.   :5.000
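
Because the columns hold coded categories rather than measurements, the means and quartiles above compress information that a count of each code would show directly. The following is a minimal sketch of that tabulation in base R. It assumes only the six rows printed by head(customer_data) and three of the columns; the name sample_rows is hypothetical, and the counts describe this subset, not the full dataset.

```r
# Illustrative subset: the six rows printed by head(customer_data),
# restricted to three columns. This is NOT the full dataset.
sample_rows <- data.frame(
  Recommend  = c(2, 2, 1, 3, 1, 1),
  Come_again = c(2, 1, 1, 2, 3, 3),
  Gender     = c(1, 1, 1, 1, 2, 1)
)

# Count how often each code occurs in every column.
# table() treats the codes as categories rather than as numbers.
code_counts <- lapply(sample_rows, table)

# Share of responses per code for one variable.
recommend_share <- prop.table(code_counts$Recommend)

print(code_counts)
print(round(recommend_share, 2))
```

Applied to customer_data without the ID column, the same lapply() call would give one frequency table per variable.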

# Recommend by Gender

recommend_gender <- customer_data %>%
  group_by(Gender) %>%
  summarise(mean_recommend = mean(Recommend))

kable(recommend_gender, caption = "Average Recommendation Score by Gender")

Average Recommendation Score by Gender
Gender	mean_recommend
1	1.312500
2	1.333333
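
The two means are close, but they rest on groups of different sizes, so it can help to report the number of respondents next to each mean. Below is a minimal sketch in base R, assuming only the six rows printed by head(customer_data). The names sample_rows and gender_summary are hypothetical, and the values it prints are for this subset, not for the table above.

```r
# Illustrative subset: Gender and Recommend from the six rows
# printed by head(customer_data). This is NOT the full dataset.
sample_rows <- data.frame(
  Gender    = c(1, 1, 1, 1, 2, 1),
  Recommend = c(2, 2, 1, 3, 1, 1)
)

# Mean Recommend code within each Gender code.
group_mean <- tapply(sample_rows$Recommend, sample_rows$Gender, mean)

# Number of respondents behind each mean.
group_n <- tapply(sample_rows$Recommend, sample_rows$Gender, length)

# Combine both into one summary table.
gender_summary <- data.frame(
  Gender         = names(group_mean),
  n              = as.vector(group_n),
  mean_recommend = as.vector(group_mean)
)

print(gender_summary)
```

A mean based on very few respondents can shift noticeably if a single answer changes, which is why the count column is worth showing alongside the mean.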

ggplot(recommend_gender, aes(x = factor(Gender), y = mean_recommend, fill = factor(Gender))) +
  geom_col() +
  labs(
    title = "Average Recommendation Score by Gender",
    x = "Gender",
    y = "Mean Recommend Score"
  ) +
  theme_minimal()

# Correlation Heatmap

cor_matrix <- cor(customer_data %>% select(-ID))

cor_data <- as.data.frame(as.table(cor_matrix))

ggplot(cor_data, aes(Var1, Var2, fill = Freq)) +
  geom_tile() +
  geom_text(aes(label = round(Freq, 2)), size = 3) +
  labs(
    title = "Correlation Heatmap of Numeric Variables",
    x = "",
    y = ""
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

ggplot(customer_data, aes(x = factor(Recommend))) +
  geom_bar(fill="lightblue") +
  labs(
    title = "Distribution of Recommendation Scores",
    x = "Recommend (coded values)",
    y = "Count"
  )

ggplot(customer_data, aes(x = factor(Come_again))) +
  geom_bar(fill="yellow") +
  labs(
    title = "Distribution of Come Again Scores",
    x = "Come Again (coded values)",
    y = "Count"
  )

ggplot(customer_data, aes(x = factor(Gender))) +
  geom_bar(fill="lightgreen") +
  labs(
    title = "Gender Distribution",
    x = "Gender (coded values)",
    y = "Count"
  )

experience_vars <- customer_data %>%
  select(CS_helpful, Recommend, Come_again, All_Products, Profesionalism, Limitation, Online_grocery)

mean_scores <- data.frame(
  Variable = names(experience_vars),
  Mean = sapply(experience_vars, mean)
)

kable(mean_scores, caption = "Mean Scores for Selected Customer Experience Variables")

Mean Scores for Selected Customer Experience Variables
	Variable	Mean
CS_helpful	CS_helpful	1.590909
Recommend	Recommend	1.318182
Come_again	Come_again	1.454546
All_Products	All_Products	2.090909
Profesionalism	Profesionalism	1.409091
Limitation	Limitation	1.500000
Online_grocery	Online_grocery	2.272727

ggplot(mean_scores, aes(x = reorder(Variable, Mean), y = Mean)) +
  geom_col(fill = "pink") +
  coord_flip() +
  labs(
    title = "Average Scores of Customer Experience Variables",
    x = "Variable",
    y = "Mean Score"
  )

# Stacked Chart: Recommend by Gender

ggplot(customer_data, aes(x = factor(Gender), fill = factor(Recommend))) +
  geom_bar(position = "fill") +
  labs(
    title = "Recommendation Distribution by Gender",
    x = "Gender",
    y = "Proportion",
    fill = "Recommend"
  ) +
  theme_minimal()

# Line Chart: Mean Scores Across Variables

experience_vars <- customer_data %>%
  select(CS_helpful, Recommend, Come_again, All_Products, Profesionalism,
         Limitation, Online_grocery, delivery, Pick_up, Find_items, other_shops)

mean_scores <- data.frame(
  Variable = names(experience_vars),
  Mean = sapply(experience_vars, mean)
)

kable(mean_scores, caption = "Mean Scores Across Key Variables")

Mean Scores Across Key Variables
	Variable	Mean
CS_helpful	CS_helpful	1.590909
Recommend	Recommend	1.318182
Come_again	Come_again	1.454546
All_Products	All_Products	2.090909
Profesionalism	Profesionalism	1.409091
Limitation	Limitation	1.500000
Online_grocery	Online_grocery	2.272727
delivery	delivery	2.409091
Pick_up	Pick_up	2.454546
Find_items	Find_items	1.454546
other_shops	other_shops	2.590909

ggplot(mean_scores, aes(x = reorder(Variable, Mean), y = Mean, group = 1)) +
  geom_line(linewidth = 1) +
  geom_point(size = 3) +
  coord_flip() +
  labs(
    title = "Mean Scores Across Key Variables",
    x = "Variable",
    y = "Mean Score"
  ) +
  theme_minimal()

# Boxplot: Recommend by Age

ggplot(customer_data, aes(x = factor(Age), y = Recommend, fill = factor(Age))) +
  geom_boxplot() +
  labs(
    title = "Recommendation Scores by Age Group",
    x = "Age Group",
    y = "Recommend Score"
  ) +
  theme_minimal()

Key Findings

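Responses cluster at the low end of the coded scales: the mean is 1.32 for Recommend and 1.45 for Come_again, and the median is 1 for both. Because no codebook is provided, these scores indicate positive satisfaction only if a code of 1 is the most favorable answer. Differences across gender and age groups are small, with no major gaps; for example, the mean Recommend score is 1.31 for gender code 1 and 1.33 for gender code 2. With only 22 respondents, these comparisons are tentative. The report does not include a clustering step, so grouping customers into segments based on their responses remains a possibility to explore rather than a result.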
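
One way the idea of grouping customers by their responses might be explored is hierarchical clustering. The report itself does not run any clustering, so the following is only a sketch under stated assumptions: it uses the six rows printed by head(customer_data), three rating columns, Euclidean distance on the numeric codes, and two segments. The names sample_rows, distances, tree, and segment are hypothetical.

```r
# Illustrative subset: three rating columns from the six rows
# printed by head(customer_data). This is NOT the full dataset.
sample_rows <- data.frame(
  CS_helpful = c(2, 1, 2, 3, 2, 1),
  Recommend  = c(2, 2, 1, 3, 1, 1),
  Come_again = c(2, 1, 1, 2, 3, 3)
)

# Pairwise Euclidean distances between respondents.
# Assumption: the ordinal codes are treated as numbers here.
distances <- dist(sample_rows)

# Agglomerative clustering; hclust() uses complete linkage by default.
tree <- hclust(distances)

# Cut the tree into two candidate segments (k = 2 is an arbitrary choice).
segment <- cutree(tree, k = 2)

# Number of respondents assigned to each segment.
print(table(segment))
```

The number of segments and the distance measure are both choices the analyst would need to justify, ideally after the codebook clarifies what each code means.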

Conclusion

This report provides a descriptive analysis of the customer segmentation dataset using R Markdown. The results show general trends in customer satisfaction, shopping behavior, and demographic differences. AI tools were helpful in generating the structure and code for this analysis.

Lab 9 - Customer Segmentation Analysis

Hanh Duyen Le

2026-03-25
