Working on this lab helped me understand how useful AI can be when creating an R Markdown file, particularly for organizing and starting an analysis. My first prompt did not include specific variable names, and the code it produced referenced variables that did not exist in my dataset. This showed me that AI does not automatically know what I need unless I explain it clearly. After that, I improved my prompt by including specific variable names like CS_helpful, Recommend, and Online_grocery, and I also asked for a structured report. This made a big difference because the output became much more accurate and relevant to my dataset. Overall, this experience taught me that giving detailed and specific prompts leads to much better results when using AI.
Even though AI helped me a lot, I still had to carefully review the code and make sure everything worked correctly. Some parts of the code needed small adjustments because not everything matched my dataset at first. I also had to understand what each graph and output meant instead of just copying the results. This helped me realize that AI is a helpful tool, but it cannot replace understanding the material. One challenge was interpreting the variables, since they were coded numerically, which made them harder to understand at first. Overall, I learned that AI can make the process faster and easier, but I still need to think critically and check my work to make sure it is correct.
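One way to make numerically coded variables easier to read is to convert them to labeled factors before plotting. The sketch below is only an illustration: the six values are the first six gender codes in the dataset, and the labels are placeholders I am assuming, because the survey codebook is not part of this report.

```r
# First six gender codes from the dataset (the full data has 22 rows).
gender_codes <- c(1, 1, 1, 1, 2, 1)

# ASSUMPTION: these labels are placeholders. Replace them with the
# wording from the survey codebook before interpreting any graph.
gender_labels <- c("Code 1", "Code 2")

# Convert the numeric codes to a factor so tables and plots show labels.
gender_f <- factor(gender_codes, levels = c(1, 2), labels = gender_labels)
table(gender_f)
```

The same pattern could be applied to the other coded columns once the real meaning of each code is known.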
One challenge I faced in this lab was making sure the AI-generated code actually matched the dataset I was using. At first, the code included variables that did not exist, which caused errors when I tried to run it in RStudio. I had to go back and improve my prompt by adding specific variable names so the output would be more accurate. Even after that, I still needed to carefully review the code and fix small issues to make everything run correctly. Another challenge was understanding what the variables meant since many of them were coded numerically instead of being clearly labeled. This made it harder to interpret the graphs and fully understand what the results were showing at first.
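A quick check like the one below could catch this kind of mismatch before running the full report. This is a minimal sketch and not part of the original lab: `missing_vars()` is a hypothetical helper, the small data frame is a stand-in for the real file, and `satisfaction_score` is a made-up name representing a column the AI invented.

```r
# Hypothetical helper: return the expected column names that are
# not present in the data frame.
missing_vars <- function(data, expected) {
  setdiff(expected, names(data))
}

# Stand-in data frame using a few column names from this report.
demo <- data.frame(
  id = 1:3,
  cs_helpful = c(2, 1, 2),
  recommend = c(2, 2, 1),
  online_grocery = c(2, 2, 3)
)

# "satisfaction_score" is made up to represent a variable that does not exist.
expected <- c("cs_helpful", "recommend", "online_grocery", "satisfaction_score")
missing_vars(demo, expected)
```

An empty result would mean every variable the generated code uses is actually in the dataset. The names are compared exactly, so they should be written the way they appear after clean_names(), which is lower case.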
This lab also showed me the opportunities of using AI as a helpful tool for data analysis. AI made it easier to get started by generating a basic structure for the R Markdown file, which saved me time. It helped organize the analysis into clear sections like introduction, data overview, and visualizations, which made my work more structured. AI also suggested different types of graphs and analyses that I might not have thought of on my own. This made the assignment feel less overwhelming and more manageable. Overall, AI improved my efficiency while still allowing me to learn and better understand the data analysis process.
This report analyzes the customer_segmentation.csv
dataset, which contains survey data about grocery shopping experiences,
behaviors, and customer demographics. The goal is to explore patterns in
customer satisfaction, shopping behavior, and demographic
differences.
library(tidyverse)
library(janitor)
library(skimr)
customers <- read.csv("customer_segmentation.csv")
customers <- clean_names(customers)
head(customers)
##   id cs_helpful recommend come_again all_products profesionalism limitation
## 1  1          2         2          2            2              2          2
## 2  2          1         2          1            1              1          1
## 3  3          2         1          1            1              1          2
## 4  4          3         3          2            4              1          2
## 5  5          2         1          3            5              2          1
## 6  6          1         1          3            2              1          1
##   online_grocery delivery pick_up find_items other_shops gender age education
## 1              2        3       4          1           2      1   2         2
## 2              2        3       3          1           2      1   2         2
## 3              3        3       2          1           3      1   2         2
## 4              3        3       2          2           2      1   3         5
## 5              2        3       1          2           3      2   4         2
## 6              1        2       1          1           4      1   2         5
dim(customers)
## [1] 22 15
names(customers)
## [1] "id" "cs_helpful" "recommend" "come_again"
## [5] "all_products" "profesionalism" "limitation" "online_grocery"
## [9] "delivery" "pick_up" "find_items" "other_shops"
## [13] "gender" "age" "education"
str(customers)
## 'data.frame': 22 obs. of 15 variables:
## $ id : int 1 2 3 4 5 6 7 8 9 10 ...
## $ cs_helpful : int 2 1 2 3 2 1 2 1 1 1 ...
## $ recommend : int 2 2 1 3 1 1 1 1 1 1 ...
## $ come_again : int 2 1 1 2 3 3 1 1 1 1 ...
## $ all_products : int 2 1 1 4 5 2 2 2 2 1 ...
## $ profesionalism: int 2 1 1 1 2 1 2 1 2 1 ...
## $ limitation : int 2 1 2 2 1 1 1 2 1 1 ...
## $ online_grocery: int 2 2 3 3 2 1 2 1 2 3 ...
## $ delivery : int 3 3 3 3 3 2 2 1 1 2 ...
## $ pick_up : int 4 3 2 2 1 1 2 2 3 2 ...
## $ find_items : int 1 1 1 2 2 1 1 2 1 1 ...
## $ other_shops : int 2 2 3 2 3 4 1 4 1 1 ...
## $ gender : int 1 1 1 1 2 1 1 1 2 2 ...
## $ age : int 2 2 2 3 4 2 2 2 2 2 ...
## $ education : int 2 2 2 5 2 5 3 2 1 2 ...
skim(customers)
| | |
|---|---|
| Name | customers |
| Number of rows | 22 |
| Number of columns | 15 |
| _______________________ | |
| Column type frequency: | |
| numeric | 15 |
| ________________________ | |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| id | 0 | 1 | 11.50 | 6.49 | 1 | 6.25 | 11.5 | 16.75 | 22 | ▇▆▆▆▇ |
| cs_helpful | 0 | 1 | 1.59 | 0.73 | 1 | 1.00 | 1.0 | 2.00 | 3 | ▇▁▅▁▂ |
| recommend | 0 | 1 | 1.32 | 0.65 | 1 | 1.00 | 1.0 | 1.00 | 3 | ▇▁▂▁▁ |
| come_again | 0 | 1 | 1.45 | 0.74 | 1 | 1.00 | 1.0 | 2.00 | 3 | ▇▁▂▁▂ |
| all_products | 0 | 1 | 2.09 | 1.06 | 1 | 1.25 | 2.0 | 2.00 | 5 | ▃▇▁▁▁ |
| profesionalism | 0 | 1 | 1.41 | 0.59 | 1 | 1.00 | 1.0 | 2.00 | 3 | ▇▁▃▁▁ |
| limitation | 0 | 1 | 1.50 | 0.80 | 1 | 1.00 | 1.0 | 2.00 | 4 | ▇▃▁▁▁ |
| online_grocery | 0 | 1 | 2.27 | 0.77 | 1 | 2.00 | 2.0 | 3.00 | 3 | ▃▁▆▁▇ |
| delivery | 0 | 1 | 2.41 | 0.73 | 1 | 2.00 | 3.0 | 3.00 | 3 | ▂▁▅▁▇ |
| pick_up | 0 | 1 | 2.45 | 1.06 | 1 | 2.00 | 2.0 | 3.00 | 5 | ▃▇▇▂▁ |
| find_items | 0 | 1 | 1.45 | 0.67 | 1 | 1.00 | 1.0 | 2.00 | 3 | ▇▁▃▁▁ |
| other_shops | 0 | 1 | 2.59 | 1.40 | 1 | 1.25 | 2.0 | 3.75 | 5 | ▇▇▅▃▃ |
| gender | 0 | 1 | 1.27 | 0.46 | 1 | 1.00 | 1.0 | 1.75 | 2 | ▇▁▁▁▃ |
| age | 0 | 1 | 2.45 | 0.74 | 2 | 2.00 | 2.0 | 3.00 | 4 | ▇▁▂▁▂ |
| education | 0 | 1 | 3.18 | 1.62 | 1 | 2.00 | 2.5 | 5.00 | 5 | ▂▇▂▁▇ |
summary(customers)
## id cs_helpful recommend come_again
## Min. : 1.00 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.: 6.25 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:1.000
## Median :11.50 Median :1.000 Median :1.000 Median :1.000
## Mean :11.50 Mean :1.591 Mean :1.318 Mean :1.455
## 3rd Qu.:16.75 3rd Qu.:2.000 3rd Qu.:1.000 3rd Qu.:2.000
## Max. :22.00 Max. :3.000 Max. :3.000 Max. :3.000
## all_products profesionalism limitation online_grocery delivery
## Min. :1.000 Min. :1.000 Min. :1.0 Min. :1.000 Min. :1.000
## 1st Qu.:1.250 1st Qu.:1.000 1st Qu.:1.0 1st Qu.:2.000 1st Qu.:2.000
## Median :2.000 Median :1.000 Median :1.0 Median :2.000 Median :3.000
## Mean :2.091 Mean :1.409 Mean :1.5 Mean :2.273 Mean :2.409
## 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.0 3rd Qu.:3.000 3rd Qu.:3.000
## Max. :5.000 Max. :3.000 Max. :4.0 Max. :3.000 Max. :3.000
## pick_up find_items other_shops gender
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:2.000 1st Qu.:1.000 1st Qu.:1.250 1st Qu.:1.000
## Median :2.000 Median :1.000 Median :2.000 Median :1.000
## Mean :2.455 Mean :1.455 Mean :2.591 Mean :1.273
## 3rd Qu.:3.000 3rd Qu.:2.000 3rd Qu.:3.750 3rd Qu.:1.750
## Max. :5.000 Max. :3.000 Max. :5.000 Max. :2.000
## age education
## Min. :2.000 Min. :1.000
## 1st Qu.:2.000 1st Qu.:2.000
## Median :2.000 Median :2.500
## Mean :2.455 Mean :3.182
## 3rd Qu.:3.000 3rd Qu.:5.000
## Max. :4.000 Max. :5.000
colSums(is.na(customers))
##             id     cs_helpful      recommend     come_again   all_products
##              0              0              0              0              0
## profesionalism     limitation online_grocery       delivery        pick_up
##              0              0              0              0              0
##     find_items    other_shops         gender            age      education
##              0              0              0              0              0
satisfaction_vars <- customers %>%
  select(cs_helpful, recommend, come_again, all_products, profesionalism, limitation)
satisfaction_vars %>%
  pivot_longer(cols = everything(), names_to = "variable", values_to = "value") %>%
  ggplot(aes(x = factor(value))) +
  geom_bar(fill = "steelblue") +
  facet_wrap(~variable, scales = "free") +
  theme_minimal() +
  labs(
    title = "Customer Satisfaction Responses",
    x = "Response",
    y = "Count"
  )
behavior_vars <- customers %>%
  select(online_grocery, delivery, pick_up, find_items, other_shops)
behavior_vars %>%
  pivot_longer(cols = everything(), names_to = "variable", values_to = "value") %>%
  ggplot(aes(x = factor(value))) +
  geom_bar(fill = "darkgreen") +
  facet_wrap(~variable, scales = "free") +
  theme_minimal() +
  labs(
    title = "Shopping Behavior Variables",
    x = "Response",
    y = "Count"
  )
ggplot(customers, aes(x = factor(gender))) +
  geom_bar(fill = "purple") +
  theme_minimal() +
  labs(
    title = "Gender Distribution",
    x = "Gender",
    y = "Count"
  )
ggplot(customers, aes(x = factor(age))) +
  geom_bar(fill = "orange") +
  theme_minimal() +
  labs(
    title = "Age Distribution",
    x = "Age Group",
    y = "Count"
  )
ggplot(customers, aes(x = factor(education))) +
  geom_bar(fill = "brown") +
  theme_minimal() +
  labs(
    title = "Education Levels",
    x = "Education",
    y = "Count"
  )
ggplot(customers, aes(x = recommend, y = come_again)) +
  geom_jitter(width = 0.2, height = 0.2, alpha = 0.5, color = "blue") +
  theme_minimal() +
  labs(
    title = "Recommend vs Come Again",
    x = "Recommend",
    y = "Come Again"
  )
ggplot(customers, aes(x = cs_helpful, y = recommend)) +
  geom_jitter(width = 0.2, height = 0.2, alpha = 0.5, color = "red") +
  theme_minimal() +
  labs(
    title = "Customer Service Helpfulness vs Recommendation",
    x = "CS Helpful",
    y = "Recommend"
  )
cluster_data <- customers %>%
  select(where(is.numeric), -id) %>%  # drop id: it is a row label, not a survey response
  drop_na()
scaled_data <- scale(cluster_data)  # standardize so each variable has equal weight
set.seed(123)  # make the random starts reproducible
k3 <- kmeans(scaled_data, centers = 3, nstart = 25)
cluster_df <- as.data.frame(cluster_data)
cluster_df$cluster <- as.factor(k3$cluster)
ggplot(cluster_df, aes(x = cs_helpful, y = recommend, color = cluster)) +
  geom_point(size = 3, alpha = 0.7) +
  theme_minimal() +
  labs(
    title = "Customer Segments (K-means Clustering)",
    x = "CS Helpful",
    y = "Recommend"
  )
The jitter plots suggest that customer satisfaction variables such as helpfulness, recommendation, and likelihood to return are related. Customers who rate service as more helpful also appear more likely to recommend the store and return in the future. This suggests that improving customer service could increase customer loyalty and positive feedback, although with only 22 respondents this is a pattern in the sample and not proof of cause. Shopping behavior variables also show variation, indicating that customers use different methods such as delivery, pickup, or online grocery shopping. These differences may reflect personal preferences or convenience factors among customers. Overall, the data highlights that both satisfaction and behavior play an important role in understanding customer experiences.
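The relationship between helpfulness and recommendation can also be checked with numbers instead of only a plot. The sketch below uses just the first ten cs_helpful and recommend values printed by str(customers), not the full 22 rows, so the result is illustrative only. Spearman's rank correlation is my own choice here because the responses are ordered codes.

```r
# First ten responses as printed by str(customers); not the full dataset.
cs_helpful <- c(2, 1, 2, 3, 2, 1, 2, 1, 1, 1)
recommend  <- c(2, 2, 1, 3, 1, 1, 1, 1, 1, 1)

# Cross-tabulation: how many respondents gave each pair of codes.
xt <- table(cs_helpful, recommend)
xt

# Rank-based correlation, which only assumes the codes are ordered.
rho <- cor(cs_helpful, recommend, method = "spearman")
rho
```

A value of rho near 1 or -1 would indicate a strong relationship, while a value near 0 would indicate a weak one. The sign can only be interpreted once the meaning of each code is known.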
The demographic graphs also help describe the sample by showing the distribution of gender, age, and education. These factors provide useful context for understanding who the customers are. In addition, the clustering analysis suggests that customers can be grouped into segments based on their survey responses. These segments may represent different types of shoppers, such as highly satisfied customers, moderate users, or less engaged customers. Identifying these groups can help businesses target their services and marketing strategies. Overall, segmentation helps turn raw data into more meaningful insights about customer behavior.
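To describe what each segment looks like, the average response per cluster can be compared. The sketch below is a minimal illustration and not the analysis from this report: the nine rows in `toy` are made-up responses, only three variables are used, and the choice of three clusters is assumed.

```r
# Made-up responses for nine hypothetical customers (not the real survey data).
toy <- data.frame(
  cs_helpful = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  recommend  = c(1, 1, 2, 2, 2, 1, 3, 3, 2),
  come_again = c(1, 2, 1, 2, 3, 3, 3, 2, 3)
)

# Cluster on standardized values, as in the report.
set.seed(123)
km <- kmeans(scale(toy), centers = 3, nstart = 25)

# Average response per cluster, on the original (unscaled) coding.
profile <- aggregate(toy, by = list(cluster = km$cluster), FUN = mean)
profile

# Number of customers in each cluster.
table(km$cluster)
```

Reading the rows of `profile` side by side shows which clusters have higher or lower average codes, which is what would support labels like the ones suggested above.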
This analysis explored customer survey data using R Markdown, combining code, visualizations, and interpretation in one report. The use of R Markdown made it easier to organize the analysis in a clear and structured way. AI was helpful in generating the structure and initial code, which saved time at the beginning of the lab. However, the final version required adjustments to match the actual dataset and fix small issues. This shows that AI is a useful support tool, but it still requires human review and understanding. Overall, this lab showed the importance of combining AI assistance with critical thinking in order to produce meaningful and accurate results.