Reflection

Working on this lab helped me understand how useful AI can be when creating an R Markdown file, particularly for organizing and starting an analysis. My first prompt did not include specific variable names, and the generated code referred to variables that did not exist in my data set. This showed me that AI does not automatically know what I need unless I explain it clearly. After that, I improved my prompt by including specific variable names like CS_helpful, Recommend, and Online_grocery, and I also asked for a structured report. This made a big difference because the output became much more accurate and relevant to my data set. Overall, this experience taught me that giving detailed and specific prompts leads to much better results when using AI.

Even though AI helped me a lot, I still had to carefully review the code and make sure everything worked correctly. Some parts of the code needed small adjustments because not everything matched perfectly at first. I also had to understand what each graph and output meant instead of just copying the results. This helped me realize that AI is a helpful tool, but it cannot replace understanding the material. One challenge was interpreting the variables since they were coded numerically, which made them harder to understand at first. Overall, I learned that AI can make the process faster and easier, but I still need to think critically and check my work to make sure it is correct.

Challenges:

One challenge I faced in this lab was making sure the AI-generated code actually matched the dataset I was using. At first, the code included variables that did not exist, which caused errors when I tried to run it in RStudio. I had to go back and improve my prompt by adding specific variable names so the output would be more accurate. Even after that, I still needed to carefully review the code and fix small issues to make everything run correctly. Another challenge was understanding what the variables meant since many of them were coded numerically instead of being clearly labeled. This made it harder to interpret the graphs and fully understand what the results were showing at first.
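The mismatch between generated code and the real data can be caught before running anything else. The sketch below is one possible check, written in base R. The column list is copied from the names(customers) output in this report, while needed_cols and the name "satisfaction_score" are made up here to stand in for a variable that an AI-generated script might invent.

```r
# Column names copied from the names(customers) output in this report
actual_cols <- c("id", "cs_helpful", "recommend", "come_again", "all_products",
                 "profesionalism", "limitation", "online_grocery", "delivery",
                 "pick_up", "find_items", "other_shops", "gender", "age",
                 "education")

# Hypothetical list of columns a generated script refers to;
# "satisfaction_score" is invented for this example and is not in the data
needed_cols <- c("cs_helpful", "recommend", "online_grocery", "satisfaction_score")

# setdiff() keeps the needed names that are absent from the data
missing_cols <- setdiff(needed_cols, actual_cols)

if (length(missing_cols) > 0) {
  message("Not in the data: ", paste(missing_cols, collapse = ", "))
}
```

In the actual report, actual_cols could be replaced with names(customers) after the data is imported and cleaned.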

Opportunities:

This lab also showed me the opportunities of using AI as a tool for data analysis. AI made it easier to get started by generating a basic structure for the R Markdown file, which saved me time. It helped organize the analysis into clear sections like introduction, data overview, and visualizations, which made my work more structured. AI also suggested different types of graphs and analysis that I might not have thought of on my own. This made the assignment feel less overwhelming and more manageable. Overall, AI made my work more efficient while still allowing me to learn and better understand the data analysis process.

Introduction

This report analyzes the customer_segmentation.csv dataset, which contains survey data about grocery shopping experiences, behaviors, and customer demographics. The goal is to explore patterns in customer satisfaction, shopping behavior, and demographic differences.

Load packages

library(tidyverse)
library(janitor)
library(skimr)

Import data

customers <- read.csv("customer_segmentation.csv")
customers <- clean_names(customers)

head(customers)
##   id cs_helpful recommend come_again all_products profesionalism limitation
## 1  1          2         2          2            2              2          2
## 2  2          1         2          1            1              1          1
## 3  3          2         1          1            1              1          2
## 4  4          3         3          2            4              1          2
## 5  5          2         1          3            5              2          1
## 6  6          1         1          3            2              1          1
##   online_grocery delivery pick_up find_items other_shops gender age education
## 1              2        3       4          1           2      1   2         2
## 2              2        3       3          1           2      1   2         2
## 3              3        3       2          1           3      1   2         2
## 4              3        3       2          2           2      1   3         5
## 5              2        3       1          2           3      2   4         2
## 6              1        2       1          1           4      1   2         5

Data overview

dim(customers)
## [1] 22 15
names(customers)
##  [1] "id"             "cs_helpful"     "recommend"      "come_again"    
##  [5] "all_products"   "profesionalism" "limitation"     "online_grocery"
##  [9] "delivery"       "pick_up"        "find_items"     "other_shops"   
## [13] "gender"         "age"            "education"
str(customers)
## 'data.frame':    22 obs. of  15 variables:
##  $ id            : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ cs_helpful    : int  2 1 2 3 2 1 2 1 1 1 ...
##  $ recommend     : int  2 2 1 3 1 1 1 1 1 1 ...
##  $ come_again    : int  2 1 1 2 3 3 1 1 1 1 ...
##  $ all_products  : int  2 1 1 4 5 2 2 2 2 1 ...
##  $ profesionalism: int  2 1 1 1 2 1 2 1 2 1 ...
##  $ limitation    : int  2 1 2 2 1 1 1 2 1 1 ...
##  $ online_grocery: int  2 2 3 3 2 1 2 1 2 3 ...
##  $ delivery      : int  3 3 3 3 3 2 2 1 1 2 ...
##  $ pick_up       : int  4 3 2 2 1 1 2 2 3 2 ...
##  $ find_items    : int  1 1 1 2 2 1 1 2 1 1 ...
##  $ other_shops   : int  2 2 3 2 3 4 1 4 1 1 ...
##  $ gender        : int  1 1 1 1 2 1 1 1 2 2 ...
##  $ age           : int  2 2 2 3 4 2 2 2 2 2 ...
##  $ education     : int  2 2 2 5 2 5 3 2 1 2 ...
skim(customers)
Data summary

Name                    customers
Number of rows          22
Number of columns       15
Column type frequency:
  numeric               15
Group variables         None

Variable type: numeric

skim_variable   n_missing  complete_rate   mean    sd  p0   p25   p50    p75  p100  hist
id                      0              1  11.50  6.49   1  6.25  11.5  16.75    22  ▇▆▆▆▇
cs_helpful              0              1   1.59  0.73   1  1.00   1.0   2.00     3  ▇▁▅▁▂
recommend               0              1   1.32  0.65   1  1.00   1.0   1.00     3  ▇▁▂▁▁
come_again              0              1   1.45  0.74   1  1.00   1.0   2.00     3  ▇▁▂▁▂
all_products            0              1   2.09  1.06   1  1.25   2.0   2.00     5  ▃▇▁▁▁
profesionalism          0              1   1.41  0.59   1  1.00   1.0   2.00     3  ▇▁▃▁▁
limitation              0              1   1.50  0.80   1  1.00   1.0   2.00     4  ▇▃▁▁▁
online_grocery          0              1   2.27  0.77   1  2.00   2.0   3.00     3  ▃▁▆▁▇
delivery                0              1   2.41  0.73   1  2.00   3.0   3.00     3  ▂▁▅▁▇
pick_up                 0              1   2.45  1.06   1  2.00   2.0   3.00     5  ▃▇▇▂▁
find_items              0              1   1.45  0.67   1  1.00   1.0   2.00     3  ▇▁▃▁▁
other_shops             0              1   2.59  1.40   1  1.25   2.0   3.75     5  ▇▇▅▃▃
gender                  0              1   1.27  0.46   1  1.00   1.0   1.75     2  ▇▁▁▁▃
age                     0              1   2.45  0.74   2  2.00   2.0   3.00     4  ▇▁▂▁▂
education               0              1   3.18  1.62   1  2.00   2.5   5.00     5  ▂▇▂▁▇
summary(customers)
##        id          cs_helpful      recommend       come_again   
##  Min.   : 1.00   Min.   :1.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.: 6.25   1st Qu.:1.000   1st Qu.:1.000   1st Qu.:1.000  
##  Median :11.50   Median :1.000   Median :1.000   Median :1.000  
##  Mean   :11.50   Mean   :1.591   Mean   :1.318   Mean   :1.455  
##  3rd Qu.:16.75   3rd Qu.:2.000   3rd Qu.:1.000   3rd Qu.:2.000  
##  Max.   :22.00   Max.   :3.000   Max.   :3.000   Max.   :3.000  
##   all_products   profesionalism    limitation  online_grocery     delivery    
##  Min.   :1.000   Min.   :1.000   Min.   :1.0   Min.   :1.000   Min.   :1.000  
##  1st Qu.:1.250   1st Qu.:1.000   1st Qu.:1.0   1st Qu.:2.000   1st Qu.:2.000  
##  Median :2.000   Median :1.000   Median :1.0   Median :2.000   Median :3.000  
##  Mean   :2.091   Mean   :1.409   Mean   :1.5   Mean   :2.273   Mean   :2.409  
##  3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.0   3rd Qu.:3.000   3rd Qu.:3.000  
##  Max.   :5.000   Max.   :3.000   Max.   :4.0   Max.   :3.000   Max.   :3.000  
##     pick_up        find_items     other_shops        gender     
##  Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.:2.000   1st Qu.:1.000   1st Qu.:1.250   1st Qu.:1.000  
##  Median :2.000   Median :1.000   Median :2.000   Median :1.000  
##  Mean   :2.455   Mean   :1.455   Mean   :2.591   Mean   :1.273  
##  3rd Qu.:3.000   3rd Qu.:2.000   3rd Qu.:3.750   3rd Qu.:1.750  
##  Max.   :5.000   Max.   :3.000   Max.   :5.000   Max.   :2.000  
##       age          education    
##  Min.   :2.000   Min.   :1.000  
##  1st Qu.:2.000   1st Qu.:2.000  
##  Median :2.000   Median :2.500  
##  Mean   :2.455   Mean   :3.182  
##  3rd Qu.:3.000   3rd Qu.:5.000  
##  Max.   :4.000   Max.   :5.000

Missing values

colSums(is.na(customers))
##             id     cs_helpful      recommend     come_again   all_products 
##              0              0              0              0              0 
## profesionalism     limitation online_grocery       delivery        pick_up 
##              0              0              0              0              0 
##     find_items    other_shops         gender            age      education 
##              0              0              0              0              0

Customer satisfaction variables

satisfaction_vars <- customers %>%
  select(cs_helpful, recommend, come_again, all_products, profesionalism, limitation)

satisfaction_vars %>%
  pivot_longer(cols = everything(), names_to = "variable", values_to = "value") %>%
  ggplot(aes(x = factor(value))) +
  geom_bar(fill = "steelblue") +
  facet_wrap(~variable, scales = "free") +
  theme_minimal() +
  labs(
    title = "Customer Satisfaction Responses",
    x = "Response",
    y = "Count"
  )

Shopping behavior variables

behavior_vars <- customers %>%
  select(online_grocery, delivery, pick_up, find_items, other_shops)

behavior_vars %>%
  pivot_longer(cols = everything(), names_to = "variable", values_to = "value") %>%
  ggplot(aes(x = factor(value))) +
  geom_bar(fill = "darkgreen") +
  facet_wrap(~variable, scales = "free") +
  theme_minimal() +
  labs(
    title = "Shopping Behavior Variables",
    x = "Response",
    y = "Count"
  )

Demographics

ggplot(customers, aes(x = factor(gender))) +
  geom_bar(fill = "purple") +
  theme_minimal() +
  labs(
    title = "Gender Distribution",
    x = "Gender",
    y = "Count"
  )

ggplot(customers, aes(x = factor(age))) +
  geom_bar(fill = "orange") +
  theme_minimal() +
  labs(
    title = "Age Distribution",
    x = "Age Group",
    y = "Count"
  )

ggplot(customers, aes(x = factor(education))) +
  geom_bar(fill = "brown") +
  theme_minimal() +
  labs(
    title = "Education Levels",
    x = "Education",
    y = "Count"
  )
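Because gender, age, and education are stored as numeric codes, the bar charts above show bare numbers on the x-axis. One way to make them easier to read is to turn the codes into a labeled factor. The sketch below uses the first ten gender codes from the str(customers) output. The labels "Category A" and "Category B" are placeholders, since this report does not say what 1 and 2 mean; the real labels would have to come from the survey codebook.

```r
# First ten gender codes, copied from the str(customers) output
gender_codes <- c(1, 1, 1, 1, 2, 1, 1, 1, 2, 2)

# Placeholder labels (assumption): replace with the meanings in the survey codebook
gender_labeled <- factor(gender_codes,
                         levels = c(1, 2),
                         labels = c("Category A", "Category B"))

# Counts now print with readable names instead of bare numbers
table(gender_labeled)
```

The same pattern could be applied to age and education once their code meanings are known.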

Relationships between variables

ggplot(customers, aes(x = recommend, y = come_again)) +
  geom_jitter(width = 0.2, height = 0.2, alpha = 0.5, color = "blue") +
  theme_minimal() +
  labs(
    title = "Recommend vs Come Again",
    x = "Recommend",
    y = "Come Again"
  )

ggplot(customers, aes(x = cs_helpful, y = recommend)) +
  geom_jitter(width = 0.2, height = 0.2, alpha = 0.5, color = "red") +
  theme_minimal() +
  labs(
    title = "Customer Service Helpfulness vs Recommendation",
    x = "CS Helpful",
    y = "Recommend"
  )
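The jitter plots show the relationships visually, but a correlation puts a number on them. The sketch below computes Spearman rank correlations, which may suit these responses better than Pearson if the codes are ordered ratings (an assumption, since the codebook is not shown here). It uses only the first ten respondents from the str(customers) output, so the values are illustrative and not results for the full data set.

```r
# First ten responses, copied from the str(customers) output
first_ten <- data.frame(
  cs_helpful = c(2, 1, 2, 3, 2, 1, 2, 1, 1, 1),
  recommend  = c(2, 2, 1, 3, 1, 1, 1, 1, 1, 1),
  come_again = c(2, 1, 1, 2, 3, 3, 1, 1, 1, 1)
)

# Spearman correlation is computed on ranks rather than raw values
rho <- cor(first_ten, method = "spearman")

# Show the correlation matrix rounded to two decimals
round(rho, 2)
```

For the full analysis, the same call could be run on the matching columns of customers.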

Clustering (Customer Segmentation)

# Drop the respondent id before clustering: it is a row label, not a survey answer
cluster_data <- customers %>%
  select(-id) %>%
  select(where(is.numeric)) %>%
  drop_na()

scaled_data <- scale(cluster_data)

set.seed(123)
k3 <- kmeans(scaled_data, centers = 3, nstart = 25)

cluster_df <- as.data.frame(cluster_data)
cluster_df$cluster <- as.factor(k3$cluster)

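# Jitter the points so respondents with identical integer answers do not overlap
ggplot(cluster_df, aes(x = cs_helpful, y = recommend, color = cluster)) +
  geom_jitter(width = 0.2, height = 0.2, size = 3, alpha = 0.7) +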
  theme_minimal() +
  labs(
    title = "Customer Segments (K-means Clustering)",
    x = "CS Helpful",
    y = "Recommend"
  )
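The clustering above fixes the number of clusters at 3. A common way to check that choice is the elbow method, which compares the total within-cluster sum of squares across several values of k. The sketch below is a minimal version in base R. It uses only three variables and the first ten respondents from the str(customers) output, so it illustrates the technique and is not a result for the full data set.

```r
# First ten responses, copied from the str(customers) output
first_ten <- data.frame(
  cs_helpful = c(2, 1, 2, 3, 2, 1, 2, 1, 1, 1),
  recommend  = c(2, 2, 1, 3, 1, 1, 1, 1, 1, 1),
  come_again = c(2, 1, 1, 2, 3, 3, 1, 1, 1, 1)
)

# Standardize each column, as in the clustering step above
scaled_ten <- scale(first_ten)

set.seed(123)
k_values <- 1:4

# Total within-cluster sum of squares for each candidate k
wss <- sapply(k_values, function(k) {
  kmeans(scaled_ten, centers = k, nstart = 25)$tot.withinss
})

# A sharp drop followed by a flat stretch suggests a reasonable k
data.frame(k = k_values, wss = round(wss, 2))
```

With the full data set, scaled_ten would be replaced by scaled_data and a wider range of k could be tried.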

Key Findings

The analysis suggests that customer satisfaction variables such as helpfulness, recommendation, and likelihood to return are related. Customers who rate service as more helpful also appear more likely to recommend the store and return in the future. This suggests that improving customer service could increase customer loyalty and positive feedback, although with only 22 respondents the pattern is tentative. Shopping behavior variables also show variation, indicating that customers use different methods such as delivery, pickup, or online grocery shopping. These differences may reflect personal preferences or convenience factors among customers. Overall, the data highlights that both satisfaction and behavior play an important role in understanding customer experiences.

The demographic graphs also help describe the sample by showing the distribution of gender, age, and education. These factors provide useful context for understanding who the customers are. In addition, the clustering analysis suggests that customers can be grouped into segments based on their survey responses. These segments may represent different types of shoppers, such as highly satisfied customers, moderate users, or less engaged customers. Identifying these groups can help businesses target their services and marketing strategies. Overall, segmentation helps turn raw data into more meaningful insights about customer behavior.

Conclusion

This analysis explored customer survey data using R Markdown, combining code, visualizations, and interpretation in one report. The use of R Markdown made it easier to organize the analysis in a clear and structured way. AI was helpful in generating the structure and initial code, which saved time at the beginning of the lab. However, the final version required adjustments to match the actual data set and fix small issues. This shows that AI is a useful support tool, but it still requires human review and understanding. Overall, this lab showed the importance of combining AI assistance with critical thinking in order to produce meaningful and accurate results.