After following the instructions, I opened ChatGPT and entered my first prompt. I went in with a rough idea of what the AI would give me, but the result was very different from what I expected. I thought the AI would simply take the collected data and break it down into bar graphs showing the results for each category, which is what I would have done in the R Markdown file on my own.
I was surprised to see that the AI cleaned the data, split it, and even grouped it into clusters. From there, I was able to use the AI to turn those clusters into graphs that are easier to read and more visually appealing. This gave me a foundation that I could then customize in R Markdown to display the information I needed. I can see why AI can be a helpful tool when working in R cloud.
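# Load the packages used below (the output does not show where they were first loaded)
library(dplyr)       # %>%, select(), group_by(), summarise()
library(tidyr)       # pivot_longer()
library(ggplot2)     # ggplot() and related plotting functions
library(factoextra)  # fviz_nbclust(), fviz_cluster()
# Read the survey data
data <- read.csv("customer_segmentation.csv")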
# Preview
head(data)
##   ID CS_helpful Recommend Come_again All_Products Profesionalism Limitation
## 1  1          2         2          2            2              2          2
## 2  2          1         2          1            1              1          1
## 3  3          2         1          1            1              1          2
## 4  4          3         3          2            4              1          2
## 5  5          2         1          3            5              2          1
## 6  6          1         1          3            2              1          1
##   Online_grocery delivery Pick_up Find_items other_shops Gender Age Education
## 1              2        3       4          1           2      1   2         2
## 2              2        3       3          1           2      1   2         2
## 3              3        3       2          1           3      1   2         2
## 4              3        3       2          2           2      1   3         5
## 5              2        3       1          2           3      2   4         2
## 6              1        2       1          1           4      1   2         5
str(data)
## 'data.frame': 22 obs. of 15 variables:
## $ ID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ CS_helpful : int 2 1 2 3 2 1 2 1 1 1 ...
## $ Recommend : int 2 2 1 3 1 1 1 1 1 1 ...
## $ Come_again : int 2 1 1 2 3 3 1 1 1 1 ...
## $ All_Products : int 2 1 1 4 5 2 2 2 2 1 ...
## $ Profesionalism: int 2 1 1 1 2 1 2 1 2 1 ...
## $ Limitation : int 2 1 2 2 1 1 1 2 1 1 ...
## $ Online_grocery: int 2 2 3 3 2 1 2 1 2 3 ...
## $ delivery : int 3 3 3 3 3 2 2 1 1 2 ...
## $ Pick_up : int 4 3 2 2 1 1 2 2 3 2 ...
## $ Find_items : int 1 1 1 2 2 1 1 2 1 1 ...
## $ other_shops : int 2 2 3 2 3 4 1 4 1 1 ...
## $ Gender : int 1 1 1 1 2 1 1 1 2 2 ...
## $ Age : int 2 2 2 3 4 2 2 2 2 2 ...
## $ Education : int 2 2 2 5 2 5 3 2 1 2 ...
summary(data)
## ID CS_helpful Recommend Come_again
## Min. : 1.00 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.: 6.25 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:1.000
## Median :11.50 Median :1.000 Median :1.000 Median :1.000
## Mean :11.50 Mean :1.591 Mean :1.318 Mean :1.455
## 3rd Qu.:16.75 3rd Qu.:2.000 3rd Qu.:1.000 3rd Qu.:2.000
## Max. :22.00 Max. :3.000 Max. :3.000 Max. :3.000
## All_Products Profesionalism Limitation Online_grocery delivery
## Min. :1.000 Min. :1.000 Min. :1.0 Min. :1.000 Min. :1.000
## 1st Qu.:1.250 1st Qu.:1.000 1st Qu.:1.0 1st Qu.:2.000 1st Qu.:2.000
## Median :2.000 Median :1.000 Median :1.0 Median :2.000 Median :3.000
## Mean :2.091 Mean :1.409 Mean :1.5 Mean :2.273 Mean :2.409
## 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.0 3rd Qu.:3.000 3rd Qu.:3.000
## Max. :5.000 Max. :3.000 Max. :4.0 Max. :3.000 Max. :3.000
## Pick_up Find_items other_shops Gender
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:2.000 1st Qu.:1.000 1st Qu.:1.250 1st Qu.:1.000
## Median :2.000 Median :1.000 Median :2.000 Median :1.000
## Mean :2.455 Mean :1.455 Mean :2.591 Mean :1.273
## 3rd Qu.:3.000 3rd Qu.:2.000 3rd Qu.:3.750 3rd Qu.:1.750
## Max. :5.000 Max. :3.000 Max. :5.000 Max. :2.000
## Age Education
## Min. :2.000 Min. :1.000
## 1st Qu.:2.000 1st Qu.:2.000
## Median :2.000 Median :2.500
## Mean :2.455 Mean :3.182
## 3rd Qu.:3.000 3rd Qu.:5.000
## Max. :4.000 Max. :5.000
# Remove ID column (not useful for clustering)
data_clean <- data %>% select(-ID)
# Check missing values
colSums(is.na(data_clean))
## CS_helpful Recommend Come_again All_Products Profesionalism
## 0 0 0 0 0
## Limitation Online_grocery delivery Pick_up Find_items
## 0 0 0 0 0
## other_shops Gender Age Education
## 0 0 0 0
# Histogram of every variable, one panel per column
data_clean %>%
  pivot_longer(cols = everything()) %>%
  ggplot(aes(x = value)) +
  geom_histogram(bins = 10) +
  facet_wrap(~name, scales = "free") +
  theme_minimal()
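# Correlation between every pair of variables
cor_matrix <- cor(data_clean)
library(corrplot)
## corrplot 0.95 loaded
corrplot(cor_matrix, method = "color", tl.cex = 0.7)
# Standardize each column (mean 0, standard deviation 1) before clustering
data_scaled <- scale(data_clean)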
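To see what the scale() step does, here is a minimal sketch on a small made-up data frame. The `toy` values are not from the survey; they only stand in for two of its columns. It compares scale() with the same calculation done by hand.

```r
# Toy data standing in for two survey columns (values are made up)
toy <- data.frame(
  CS_helpful = c(1, 2, 3, 1, 2),
  Pick_up    = c(4, 3, 2, 2, 1)
)

# scale() subtracts each column's mean and divides by its standard deviation
toy_scaled <- scale(toy)

# The same calculation done by hand for one column
by_hand <- (toy$CS_helpful - mean(toy$CS_helpful)) / sd(toy$CS_helpful)

# After scaling, each column has mean 0 and standard deviation 1
round(colMeans(toy_scaled), 10)
apply(toy_scaled, 2, sd)

# TRUE when scale() and the hand calculation agree
all.equal(as.numeric(toy_scaled[, "CS_helpful"]), by_hand)
```

Scaling matters for k-means because the algorithm works on distances, so a column with a wider range of answers would otherwise count for more than the others.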
# Plot total within-cluster sum of squares for different numbers of clusters
fviz_nbclust(data_scaled, kmeans, method = "wss") +
  labs(title = "Elbow Method")
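The elbow plot can also be built by hand, which helps show what is being measured. This is only a sketch in base R on made-up data (three artificial groups of points), not the survey itself: it runs kmeans() for several values of k and records the total within-cluster sum of squares each time.

```r
set.seed(123)  # kmeans() starts from random centers, so fix the seed

# Made-up data: three loose groups of 20 points in two dimensions
toy <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(40, mean = 4, sd = 0.5), ncol = 2),
  matrix(rnorm(40, mean = 8, sd = 0.5), ncol = 2)
)
toy_scaled <- scale(toy)

# Total within-cluster sum of squares for k = 1 to 6
k_values <- 1:6
wss <- sapply(k_values, function(k) {
  kmeans(toy_scaled, centers = k, nstart = 25)$tot.withinss
})

# The "elbow" is where the curve stops dropping steeply
plot(k_values, wss, type = "b",
     xlab = "Number of clusters (k)",
     ylab = "Total within-cluster sum of squares",
     main = "Elbow Method (base R sketch)")
```

On this toy data the curve should flatten after k = 3, because the data was built with three groups. With real survey answers the bend is usually less obvious, so the choice of k is a judgment call.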
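# Fix the random seed so the clusters are reproducible
set.seed(123)
# Run k-means with 3 clusters and 25 random starts
kmeans_result <- kmeans(data_scaled, centers = 3, nstart = 25)
# Add cluster labels
data$Cluster <- kmeans_result$cluster
# Plot the clusters on the first two principal components
fviz_cluster(kmeans_result, data = data_scaled)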
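The object returned by kmeans() holds more than the cluster labels. The sketch below uses made-up data (two artificial groups, not the survey) to show a few fields that are useful to check before plotting.

```r
set.seed(123)  # results depend on the random starting centers

# Made-up data: two well-separated groups of 10 points each
toy <- rbind(
  matrix(rnorm(20, mean = 0), ncol = 2),
  matrix(rnorm(20, mean = 6), ncol = 2)
)

fit <- kmeans(scale(toy), centers = 2, nstart = 25)

fit$size            # number of rows assigned to each cluster
fit$centers         # cluster centers, in scaled units
table(fit$cluster)  # the same counts, built from the labels
```

Checking `size` first is a quick way to spot a cluster with only one or two rows, which can easily happen with a small sample.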
demo_vars <- data %>%
  select(Cluster, Gender, Age, Education) %>%
  pivot_longer(-Cluster)
ggplot(demo_vars, aes(x = name, y = value, fill = factor(Cluster))) +
  geom_bar(stat = "summary", fun = "mean", position = "dodge") +
  scale_fill_manual(values = c("#FF8FAB", "#FFC75F", "#4D96FF")) +
  labs(title = "Demographics by Cluster",
       x = "Variable",
       y = "Average") +
  theme_minimal()
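Gender, Age, and Education are stored as small whole numbers, so they may be category codes rather than measurements. The data does not say, so this is an assumption. If they are codes, the share of each code within a cluster can be easier to read than an average. Here is a minimal base R sketch on made-up values:

```r
# Made-up example: cluster labels and a coded Gender column (1 or 2)
toy <- data.frame(
  Cluster = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  Gender  = c(1, 1, 2, 1, 2, 2, 2, 2, 1)
)

# Count each Gender code within each cluster
counts <- table(Cluster = toy$Cluster, Gender = toy$Gender)

# Convert counts to row proportions, so each cluster sums to 1
shares <- prop.table(counts, margin = 1)
round(shares, 2)

# Stacked bars: one bar per cluster, split by Gender code
barplot(t(shares), beside = FALSE,
        xlab = "Cluster", ylab = "Share of customers",
        legend.text = TRUE, args.legend = list(title = "Gender code"))
```

The same idea would apply to Age and Education if they turn out to be coded groups.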
ggplot(data, aes(x = factor(Cluster), fill = factor(Cluster))) +
  geom_bar() +
  scale_fill_manual(values = c("#FF6B6B", "#FFD93D", "#6BCB77")) +
  labs(title = "Number of Customers per Cluster",
       x = "Cluster",
       y = "Count") +
  theme_minimal()
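# These packages are also needed by the earlier steps, so load them first
# when running the code from top to bottom
library(ggplot2)
library(dplyr)
library(tidyr)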
# Average of every variable within each cluster, in long format for plotting
cluster_summary <- data %>%
  group_by(Cluster) %>%
  summarise(across(-ID, mean)) %>%
  pivot_longer(-Cluster)
ggplot(cluster_summary, aes(x = name, y = value, fill = factor(Cluster))) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_manual(values = c("#FF6B6B", "#FFD93D", "#6BCB77")) +
  labs(title = "Average Feature Values by Cluster",
       x = "Variables",
       y = "Average Score",
       fill = "Cluster") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
data %>%
  group_by(Cluster) %>%
  summarise(across(-ID, mean))
## # A tibble: 3 × 15
##   Cluster CS_helpful Recommend Come_again All_Products Profesionalism Limitation
##     <int>      <dbl>     <dbl>      <dbl>        <dbl>          <dbl>      <dbl>
## 1       1       1         1          1.5          2.17           1          1.17
## 2       2       2.5       2          2.5          3.25           2          2
## 3       3       1.58      1.25       1.08         1.67           1.42       1.5
## # ℹ 8 more variables: Online_grocery <dbl>, delivery <dbl>, Pick_up <dbl>,
## #   Find_items <dbl>, other_shops <dbl>, Gender <dbl>, Age <dbl>,
## #   Education <dbl>