Reflection

After following the instructions, I opened ChatGPT and entered my first prompt. I went in with a rough idea of what the AI would give me, but the result was very different from what I expected. I thought the AI would simply take the collected data and break it down into bar graphs showing the results for each category, which is what I would have done in the R Markdown file on my own.

I was quite surprised to see that the AI cleaned the data, split it, and even grouped it into clusters. From there, I was able to use the AI to turn those clusters into graphs that are easier to read and more visually appealing. It helped me build a foundation that I could then customize in the R Markdown file to display the information I needed. I can see why AI can be a helpful tool when working in R Cloud.

1. Load Data

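# Load the packages used in the steps below
library(dplyr)       # %>%, select(), group_by(), summarise()
library(tidyr)       # pivot_longer()
library(ggplot2)     # plots
library(factoextra)  # fviz_nbclust(), fviz_cluster()

data <- read.csv("customer_segmentation.csv")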

# Preview
head(data)
##   ID CS_helpful Recommend Come_again All_Products Profesionalism Limitation
## 1  1          2         2          2            2              2          2
## 2  2          1         2          1            1              1          1
## 3  3          2         1          1            1              1          2
## 4  4          3         3          2            4              1          2
## 5  5          2         1          3            5              2          1
## 6  6          1         1          3            2              1          1
##   Online_grocery delivery Pick_up Find_items other_shops Gender Age Education
## 1              2        3       4          1           2      1   2         2
## 2              2        3       3          1           2      1   2         2
## 3              3        3       2          1           3      1   2         2
## 4              3        3       2          2           2      1   3         5
## 5              2        3       1          2           3      2   4         2
## 6              1        2       1          1           4      1   2         5
str(data)
## 'data.frame':    22 obs. of  15 variables:
##  $ ID            : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ CS_helpful    : int  2 1 2 3 2 1 2 1 1 1 ...
##  $ Recommend     : int  2 2 1 3 1 1 1 1 1 1 ...
##  $ Come_again    : int  2 1 1 2 3 3 1 1 1 1 ...
##  $ All_Products  : int  2 1 1 4 5 2 2 2 2 1 ...
##  $ Profesionalism: int  2 1 1 1 2 1 2 1 2 1 ...
##  $ Limitation    : int  2 1 2 2 1 1 1 2 1 1 ...
##  $ Online_grocery: int  2 2 3 3 2 1 2 1 2 3 ...
##  $ delivery      : int  3 3 3 3 3 2 2 1 1 2 ...
##  $ Pick_up       : int  4 3 2 2 1 1 2 2 3 2 ...
##  $ Find_items    : int  1 1 1 2 2 1 1 2 1 1 ...
##  $ other_shops   : int  2 2 3 2 3 4 1 4 1 1 ...
##  $ Gender        : int  1 1 1 1 2 1 1 1 2 2 ...
##  $ Age           : int  2 2 2 3 4 2 2 2 2 2 ...
##  $ Education     : int  2 2 2 5 2 5 3 2 1 2 ...
summary(data)
##        ID          CS_helpful      Recommend       Come_again   
##  Min.   : 1.00   Min.   :1.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.: 6.25   1st Qu.:1.000   1st Qu.:1.000   1st Qu.:1.000  
##  Median :11.50   Median :1.000   Median :1.000   Median :1.000  
##  Mean   :11.50   Mean   :1.591   Mean   :1.318   Mean   :1.455  
##  3rd Qu.:16.75   3rd Qu.:2.000   3rd Qu.:1.000   3rd Qu.:2.000  
##  Max.   :22.00   Max.   :3.000   Max.   :3.000   Max.   :3.000  
##   All_Products   Profesionalism    Limitation  Online_grocery     delivery    
##  Min.   :1.000   Min.   :1.000   Min.   :1.0   Min.   :1.000   Min.   :1.000  
##  1st Qu.:1.250   1st Qu.:1.000   1st Qu.:1.0   1st Qu.:2.000   1st Qu.:2.000  
##  Median :2.000   Median :1.000   Median :1.0   Median :2.000   Median :3.000  
##  Mean   :2.091   Mean   :1.409   Mean   :1.5   Mean   :2.273   Mean   :2.409  
##  3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.0   3rd Qu.:3.000   3rd Qu.:3.000  
##  Max.   :5.000   Max.   :3.000   Max.   :4.0   Max.   :3.000   Max.   :3.000  
##     Pick_up        Find_items     other_shops        Gender     
##  Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.:2.000   1st Qu.:1.000   1st Qu.:1.250   1st Qu.:1.000  
##  Median :2.000   Median :1.000   Median :2.000   Median :1.000  
##  Mean   :2.455   Mean   :1.455   Mean   :2.591   Mean   :1.273  
##  3rd Qu.:3.000   3rd Qu.:2.000   3rd Qu.:3.750   3rd Qu.:1.750  
##  Max.   :5.000   Max.   :3.000   Max.   :5.000   Max.   :2.000  
##       Age          Education    
##  Min.   :2.000   Min.   :1.000  
##  1st Qu.:2.000   1st Qu.:2.000  
##  Median :2.000   Median :2.500  
##  Mean   :2.455   Mean   :3.182  
##  3rd Qu.:3.000   3rd Qu.:5.000  
##  Max.   :4.000   Max.   :5.000

2. Data Cleaning

# Remove ID column (not useful for clustering)
data_clean <- data %>% select(-ID)

# Check missing values
colSums(is.na(data_clean))
##     CS_helpful      Recommend     Come_again   All_Products Profesionalism 
##              0              0              0              0              0 
##     Limitation Online_grocery       delivery        Pick_up     Find_items 
##              0              0              0              0              0 
##    other_shops         Gender            Age      Education 
##              0              0              0              0

3. Exploratory Data Analysis (EDA)

Distribution of Variables

# Reshape to long format and draw one histogram panel per variable
data_clean %>%
  pivot_longer(cols = everything()) %>%
  ggplot(aes(x = value)) +
  geom_histogram(bins = 10) +
  facet_wrap(~name, scales = "free") +
  theme_minimal()

Correlation Matrix

cor_matrix <- cor(data_clean)

library(corrplot)
## corrplot 0.95 loaded
corrplot(cor_matrix, method = "color", tl.cex = 0.7)
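
Exact values are hard to read off a color grid. As a rough sketch, the strongest pairs can also be listed straight from a correlation matrix. The toy data frame below holds only the first six rows of four columns from the head(data) preview, so the numbers are illustrative and will not match the full 22-row data set. The names toy, cor_toy, and pairs_df are made up for this example.

```r
# Toy subset: first six rows of four columns, copied from head(data)
toy <- data.frame(
  CS_helpful   = c(2, 1, 2, 3, 2, 1),
  Recommend    = c(2, 2, 1, 3, 1, 1),
  All_Products = c(2, 1, 1, 4, 5, 2),
  Pick_up      = c(4, 3, 2, 2, 1, 1)
)

cor_toy <- cor(toy)

# Keep each pair once by taking only the upper triangle of the matrix
idx <- which(upper.tri(cor_toy), arr.ind = TRUE)

pairs_df <- data.frame(
  var1 = rownames(cor_toy)[idx[, "row"]],
  var2 = colnames(cor_toy)[idx[, "col"]],
  r    = round(cor_toy[idx], 2)
)

# Sort by absolute correlation, strongest first
pairs_df[order(-abs(pairs_df$r)), ]
```

With 14 variables the full matrix has 91 unique pairs, so a sorted list like this can be easier to scan than the plot.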

4. Standardize Data

data_scaled <- scale(data_clean)
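
To make clear what this step does, here is a minimal sketch of scale() on a toy subset (the first six rows of three columns from the head(data) preview). By default scale() subtracts each column's mean and divides by its sample standard deviation, so every column ends up with mean 0 and SD 1. The names toy, toy_scaled, and manual_z are only for illustration.

```r
# Toy subset: first six rows of three columns, copied from head(data)
toy <- data.frame(
  CS_helpful   = c(2, 1, 2, 3, 2, 1),
  All_Products = c(2, 1, 1, 4, 5, 2),
  Pick_up      = c(4, 3, 2, 2, 1, 1)
)

# scale() centers each column on its mean and divides by its sample SD
toy_scaled <- scale(toy)

# The same z-scores computed by hand for one column
manual_z <- (toy$CS_helpful - mean(toy$CS_helpful)) / sd(toy$CS_helpful)

# Compare the by-hand result with the scale() column
all.equal(as.numeric(toy_scaled[, "CS_helpful"]), manual_z)

# Check the column means and standard deviations after scaling
colMeans(toy_scaled)
apply(toy_scaled, 2, sd)
```

Standardizing matters for k-means because it is distance-based: without it, a 1 to 5 item such as All_Products would carry more weight than a 1 to 3 item.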

5. Determine Optimal Number of Clusters

set.seed(123)  # kmeans() uses random starts, so fix the seed for a reproducible plot

fviz_nbclust(data_scaled, kmeans, method = "wss") +
  labs(title = "Elbow Method")
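
The elbow plot shows the total within-cluster sum of squares (WSS) for each candidate number of clusters. A rough base-R sketch of the same idea is below. It assumes random stand-in data with 22 rows rather than the survey file, so the curve will differ from the one in this report, and the names toy_scaled, k_values, and wss are hypothetical.

```r
set.seed(123)

# Hypothetical stand-in for data_scaled: 22 rows, 4 standardized columns
toy_scaled <- scale(matrix(rnorm(22 * 4), nrow = 22))

# Total within-cluster sum of squares for k = 1..6
k_values <- 1:6
wss <- sapply(k_values, function(k) {
  kmeans(toy_scaled, centers = k, nstart = 25)$tot.withinss
})

# The "elbow" is the k after which the curve flattens out
plot(k_values, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

WSS generally keeps shrinking as k grows, so the goal is not the lowest value but the point where adding another cluster stops helping much.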

6. K-Means Clustering

set.seed(123)

kmeans_result <- kmeans(data_scaled, centers = 3, nstart = 25)

# Add cluster labels
data$Cluster <- kmeans_result$cluster
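
Before plotting, it can help to check how the fit turned out. The sketch below uses random stand-in data (not the survey file) to show the fields a kmeans() result carries: the cluster sizes and the share of total variance that lies between clusters. The names toy_scaled and fit are assumptions for this example.

```r
set.seed(123)

# Hypothetical stand-in for data_scaled: 22 rows, 4 standardized columns
toy_scaled <- scale(matrix(rnorm(22 * 4), nrow = 22))

fit <- kmeans(toy_scaled, centers = 3, nstart = 25)

# Number of observations in each cluster
fit$size

# The same counts taken from the label vector
table(fit$cluster)

# Share of total variance explained by the clustering (between SS / total SS)
fit$betweenss / fit$totss
```

With only 22 respondents, a cluster holding two or three people is worth noting, because its averages can shift a lot with a single response.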

7. Visualize Clusters

fviz_cluster(kmeans_result, data = data_scaled)

# Reshape the demographic columns to long format: one row per (Cluster, variable, value)
demo_vars <- data %>%
  select(Cluster, Gender, Age, Education) %>%
  pivot_longer(-Cluster)

ggplot(demo_vars, aes(x = name, y = value, fill = factor(Cluster))) +
  geom_bar(stat = "summary", fun = "mean", position = "dodge") +
  scale_fill_manual(values = c("#FF8FAB", "#FFC75F", "#4D96FF")) +
  labs(title = "Demographics by Cluster",
       x = "Variable",
       y = "Average") +
  theme_minimal() 

ggplot(data, aes(x = factor(Cluster), fill = factor(Cluster))) +
  geom_bar() +
  scale_fill_manual(values = c("#FF6B6B", "#FFD93D", "#6BCB77")) +
  labs(title = "Number of Customers per Cluster",
       x = "Cluster",
       y = "Count") +
  theme_minimal() 

library(ggplot2)
library(dplyr)
library(tidyr)

# Mean of every feature per cluster (ID excluded), reshaped to long format for plotting
cluster_summary <- data %>%
  group_by(Cluster) %>%
  summarise(across(-ID, mean)) %>%
  pivot_longer(-Cluster)

ggplot(cluster_summary, aes(x = name, y = value, fill = factor(Cluster))) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_manual(values = c("#FF6B6B", "#FFD93D", "#6BCB77")) +
  labs(title = "Average Feature Values by Cluster",
       x = "Variables",
       y = "Average Score",
       fill = "Cluster") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) 

8. Cluster Profiling

data %>%
  group_by(Cluster) %>%
  summarise(across(-ID, mean))
## # A tibble: 3 × 15
##   Cluster CS_helpful Recommend Come_again All_Products Profesionalism Limitation
##     <int>      <dbl>     <dbl>      <dbl>        <dbl>          <dbl>      <dbl>
## 1       1       1         1          1.5          2.17           1          1.17
## 2       2       2.5       2          2.5          3.25           2          2   
## 3       3       1.58      1.25       1.08         1.67           1.42       1.5 
## # ℹ 8 more variables: Online_grocery <dbl>, delivery <dbl>, Pick_up <dbl>,
## #   Find_items <dbl>, other_shops <dbl>, Gender <dbl>, Age <dbl>,
## #   Education <dbl>
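
The per-cluster means above are computed on the original scale, while k-means ran on the standardized data. As a small sketch of how the two relate, the cluster centers from kmeans() can be converted back to the original scale and compared with the group means. This uses random 1 to 5 stand-in data, base R aggregate() in place of the dplyr pipeline, and hypothetical names (raw, scaled, fit, centers_raw, profile).

```r
set.seed(123)

# Hypothetical survey-like data: 22 respondents, four items scored 1..5
raw <- matrix(sample(1:5, 22 * 4, replace = TRUE), nrow = 22,
              dimnames = list(NULL, c("A", "B", "C", "D")))

scaled <- scale(raw)
fit <- kmeans(scaled, centers = 3, nstart = 25)

# Undo the standardization: multiply by each column's SD, then add its mean
centers_raw <- sweep(fit$centers, 2, attr(scaled, "scaled:scale"), "*")
centers_raw <- sweep(centers_raw, 2, attr(scaled, "scaled:center"), "+")

# Per-cluster means computed directly on the raw data
profile <- aggregate(as.data.frame(raw),
                     by = list(Cluster = fit$cluster), FUN = mean)

# Compare the two sets of cluster averages
all.equal(unname(as.matrix(profile[, -1])), unname(centers_raw))
```

Either route gives the same averages, so the profile table can be read as the k-means centers expressed in the survey's own units.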

9. Interpretation

10. Conclusion