Reflection

This lab gave me a chance to use AI as a tool for writing an R Markdown file and organizing a data analysis project. At first, I thought using AI would make the process very easy, but I learned that the quality of the answer depends on the quality of the prompt. My first prompt gave me a basic response, but it was too general and did not fully match the lab instructions. After improving my prompt, the AI gave me a much more useful response with clearer code, better structure, and stronger explanations. This showed me that AI can save time, but I still need to guide it carefully and check that the work matches the assignment.

One challenge in this lab was making sure the R code would fit the dataset and the course expectations. AI can generate code quickly, but it does not always know the exact column names or the level of detail the professor wants. A major opportunity is that AI can help students organize their ideas, build a starting point for analysis, and explain the results in a simpler way. Overall, I think this newly designed lab was helpful because it taught me not only how to use R Markdown, but also how to use AI more effectively and responsibly in academic work.
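The point about column names is easy to check in code. In this dataset the professionalism item is spelled `Profesionalism`, so generated code that assumes the usual spelling `Professionalism` would fail. The sketch below is a minimal example that uses a hypothetical helper, `check_columns()`, which is not from any package. It stops with a clear message before the analysis runs. The toy data frame stands in for the real file.

```r
# Hypothetical helper (not from any package): stop early if any expected
# column is absent from the data frame.
check_columns <- function(df, required) {
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "), call. = FALSE)
  }
  invisible(TRUE)
}

# Toy stand-in for customer_data, using two real column names
toy <- data.frame(CS_helpful = c(1, 2), Profesionalism = c(1, 1))

check_columns(toy, c("CS_helpful", "Profesionalism"))  # passes silently
# check_columns(toy, "Professionalism") stops with "Missing columns: Professionalism"
```

In the report itself, a call like this would go right after `read_csv()`, listing the columns the later code uses.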

Introduction

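This report analyzes customer_segmentation.csv, a grocery shopping dataset with 22 customers and 15 numeric columns. These are an ID, eleven coded response items (such as CS_helpful, Recommend, and Pick_up), and three demographic codes (Gender, Age, and Education). The goal is to explore patterns in customer behavior and identify possible customer segments. Understanding customer segments can help businesses make better marketing decisions, improve customer targeting, and better understand shopping habits.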

Load Data

# Load packages; suppressPackageStartupMessages() hides dplyr's masking notices
library(readr)
suppressPackageStartupMessages(library(dplyr))
library(ggplot2)

# show_col_types = FALSE quiets the column-specification message;
# glimpse() below reports the same column names and types
customer_data <- read_csv("customer_segmentation.csv", show_col_types = FALSE)
glimpse(customer_data)
## Rows: 22
## Columns: 15
## $ ID             <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, …
## $ CS_helpful     <dbl> 2, 1, 2, 3, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 3…
## $ Recommend      <dbl> 2, 2, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ Come_again     <dbl> 2, 1, 1, 2, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 2…
## $ All_Products   <dbl> 2, 1, 1, 4, 5, 2, 2, 2, 2, 1, 2, 2, 1, 2, 4, 2, 2, 1, 3…
## $ Profesionalism <dbl> 2, 1, 1, 1, 2, 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2…
## $ Limitation     <dbl> 2, 1, 2, 2, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 3, 4…
## $ Online_grocery <dbl> 2, 2, 3, 3, 2, 1, 2, 1, 2, 3, 2, 3, 1, 3, 2, 3, 2, 3, 1…
## $ delivery       <dbl> 3, 3, 3, 3, 3, 2, 2, 1, 1, 2, 2, 2, 2, 3, 2, 1, 3, 3, 3…
## $ Pick_up        <dbl> 4, 3, 2, 2, 1, 1, 2, 2, 3, 2, 2, 3, 2, 3, 2, 3, 5, 3, 1…
## $ Find_items     <dbl> 1, 1, 1, 2, 2, 1, 1, 2, 1, 1, 1, 1, 1, 3, 2, 1, 2, 1, 3…
## $ other_shops    <dbl> 2, 2, 3, 2, 3, 4, 1, 4, 1, 1, 3, 3, 1, 1, 5, 5, 5, 2, 2…
## $ Gender         <dbl> 1, 1, 1, 1, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 2…
## $ Age            <dbl> 2, 2, 2, 3, 4, 2, 2, 2, 2, 2, 4, 3, 4, 3, 2, 3, 2, 2, 2…
## $ Education      <dbl> 2, 2, 2, 5, 2, 5, 3, 2, 1, 2, 5, 1, 5, 5, 5, 5, 1, 5, 2…

Data Preview

head(customer_data)
## # A tibble: 6 × 15
##      ID CS_helpful Recommend Come_again All_Products Profesionalism Limitation
##   <dbl>      <dbl>     <dbl>      <dbl>        <dbl>          <dbl>      <dbl>
## 1     1          2         2          2            2              2          2
## 2     2          1         2          1            1              1          1
## 3     3          2         1          1            1              1          2
## 4     4          3         3          2            4              1          2
## 5     5          2         1          3            5              2          1
## 6     6          1         1          3            2              1          1
## # ℹ 8 more variables: Online_grocery <dbl>, delivery <dbl>, Pick_up <dbl>,
## #   Find_items <dbl>, other_shops <dbl>, Gender <dbl>, Age <dbl>,
## #   Education <dbl>

Missing Values

colSums(is.na(customer_data))
##             ID     CS_helpful      Recommend     Come_again   All_Products 
##              0              0              0              0              0 
## Profesionalism     Limitation Online_grocery       delivery        Pick_up 
##              0              0              0              0              0 
##     Find_items    other_shops         Gender            Age      Education 
##              0              0              0              0              0

Numeric Variable Distributions

numeric_data <- customer_data %>% select(where(is.numeric))

# Draw one histogram per numeric column
for (col in names(numeric_data)) {
  hist(numeric_data[[col]], main = paste("Histogram of", col), xlab = col)
}
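Every column holds only a few whole-number codes, so histogram bins can be hard to read. Counting how many customers gave each code is an alternative. The sketch below is a minimal example that assumes a hypothetical helper, `response_counts()`, and runs it on a toy data frame.

```r
# Hypothetical helper: count how many times each code appears in every column.
# useNA = "ifany" adds an NA count only when a column has missing values.
response_counts <- function(df) {
  lapply(df, function(x) table(x, useNA = "ifany"))
}

# Toy stand-in for customer_data
toy <- data.frame(Recommend = c(1, 1, 2, 3), Pick_up = c(2, 2, 2, 5))
counts <- response_counts(toy)
counts$Pick_up
```

On the report's data, `response_counts(select(customer_data, -ID))` would give one table per column. Calling `barplot()` on any of those tables draws its counts as bars.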

Basic Summary Statistics

customer_data %>%
  select(where(is.numeric)) %>%
  summary()
##        ID          CS_helpful      Recommend       Come_again   
##  Min.   : 1.00   Min.   :1.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.: 6.25   1st Qu.:1.000   1st Qu.:1.000   1st Qu.:1.000  
##  Median :11.50   Median :1.000   Median :1.000   Median :1.000  
##  Mean   :11.50   Mean   :1.591   Mean   :1.318   Mean   :1.455  
##  3rd Qu.:16.75   3rd Qu.:2.000   3rd Qu.:1.000   3rd Qu.:2.000  
##  Max.   :22.00   Max.   :3.000   Max.   :3.000   Max.   :3.000  
##   All_Products   Profesionalism    Limitation  Online_grocery     delivery    
##  Min.   :1.000   Min.   :1.000   Min.   :1.0   Min.   :1.000   Min.   :1.000  
##  1st Qu.:1.250   1st Qu.:1.000   1st Qu.:1.0   1st Qu.:2.000   1st Qu.:2.000  
##  Median :2.000   Median :1.000   Median :1.0   Median :2.000   Median :3.000  
##  Mean   :2.091   Mean   :1.409   Mean   :1.5   Mean   :2.273   Mean   :2.409  
##  3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.0   3rd Qu.:3.000   3rd Qu.:3.000  
##  Max.   :5.000   Max.   :3.000   Max.   :4.0   Max.   :3.000   Max.   :3.000  
##     Pick_up        Find_items     other_shops        Gender     
##  Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.:2.000   1st Qu.:1.000   1st Qu.:1.250   1st Qu.:1.000  
##  Median :2.000   Median :1.000   Median :2.000   Median :1.000  
##  Mean   :2.455   Mean   :1.455   Mean   :2.591   Mean   :1.273  
##  3rd Qu.:3.000   3rd Qu.:2.000   3rd Qu.:3.750   3rd Qu.:1.750  
##  Max.   :5.000   Max.   :3.000   Max.   :5.000   Max.   :2.000  
##       Age          Education    
##  Min.   :2.000   Min.   :1.000  
##  1st Qu.:2.000   1st Qu.:2.000  
##  Median :2.000   Median :2.500  
##  Mean   :2.455   Mean   :3.182  
##  3rd Qu.:3.000   3rd Qu.:5.000  
##  Max.   :4.000   Max.   :5.000
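The summary above reports a mean of 1.273 for Gender. Gender, Age, and Education appear to be category codes, so means and quartiles describe them poorly. The sketch below is a minimal example that assumes a hypothetical helper, `as_categories()`. It converts such columns to factors so that `summary()` reports counts instead. The dataset does not say what each code means, so the factor levels keep the numeric codes.

```r
# Hypothetical helper: convert the named columns to factors (categories)
as_categories <- function(df, cols) {
  df[cols] <- lapply(df[cols], factor)
  df
}

# Toy stand-in for customer_data; the levels keep the numeric codes
toy <- data.frame(Gender = c(1, 2, 1, 1), Age = c(2, 3, 2, 4), Recommend = c(1, 1, 2, 3))
toy_cat <- as_categories(toy, c("Gender", "Age"))
summary(toy_cat)
```

For the report, the call would be `as_categories(customer_data, c("Gender", "Age", "Education"))`.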

Correlation Analysis

numeric_data <- customer_data %>%
  select(where(is.numeric))

cor(numeric_data, use = "complete.obs")
##                         ID  CS_helpful   Recommend  Come_again All_Products
## ID              1.00000000  0.15482785 -0.08509414 -0.12908035  -0.11705779
## CS_helpful      0.15482785  1.00000000  0.48809623  0.27146195   0.29345435
## Recommend      -0.08509414  0.48809623  1.00000000  0.38089069   0.02515624
## Come_again     -0.12908035  0.27146195  0.38089069  1.00000000   0.36875582
## All_Products   -0.11705779  0.29345435  0.02515624  0.36875582   1.00000000
## Profesionalism  0.25465839  0.51442802  0.39143306  0.42695809   0.08951478
## Limitation      0.19664246  0.60674478  0.04594474  0.00000000   0.05576720
## Online_grocery  0.23893106  0.20749595  0.29678764 -0.14514393  -0.14833305
## delivery        0.09489449  0.59036145  0.41510987  0.16766768   0.07197937
## Pick_up         0.15959528 -0.17854819 -0.08238912 -0.52135402  -0.25000740
## Find_items      0.24044075  0.29879792 -0.01996410  0.04367853   0.53916624
## other_shops     0.09671790 -0.30898381 -0.05968695  0.32594355   0.21734201
## Gender          0.08043618  0.06467921  0.01469318  0.32146531   0.14267528
## Age            -0.10922184 -0.16766768 -0.11789474  0.12698413   0.30821382
## Education       0.21244579  0.06542384  0.12385279  0.08671100   0.07266003
##                Profesionalism  Limitation Online_grocery    delivery
## ID                 0.25465839  0.19664246     0.23893106  0.09489449
## CS_helpful         0.51442802  0.60674478     0.20749595  0.59036145
## Recommend          0.39143306  0.04594474     0.29678764  0.41510987
## Come_again         0.42695809  0.00000000    -0.14514393  0.16766768
## All_Products       0.08951478  0.05576720    -0.14833305  0.07197937
## Profesionalism     1.00000000  0.05030388     0.05734345  0.25471679
## Limitation         0.05030388  1.00000000    -0.15480679  0.36404687
## Online_grocery     0.05734345 -0.15480679     1.00000000  0.29971638
## delivery           0.25471679  0.36404687     0.29971638  1.00000000
## Pick_up           -0.15959528  0.00000000     0.30963403  0.11717225
## Find_items        -0.01092912  0.44257084    -0.15975979  0.28122157
## other_shops       -0.19082180 -0.06351171    -0.11262158 -0.19968341
## Gender             0.45044262  0.00000000    -0.08663791 -0.06467921
## Age               -0.22837293 -0.32166527    -0.06111323 -0.09581010
## Education         -0.28024764 -0.07321628     0.07302945 -0.02544260
##                    Pick_up   Find_items  other_shops      Gender         Age
## ID              0.15959528  0.240440748  0.096717897  0.08043618 -0.10922184
## CS_helpful     -0.17854819  0.298797921 -0.308983807  0.06467921 -0.16766768
## Recommend      -0.08238912 -0.019964097 -0.059686954  0.01469318 -0.11789474
## Come_again     -0.52135402  0.043678535  0.325943546  0.32146531  0.12698413
## All_Products   -0.25000740  0.539166240  0.217342007  0.14267528  0.30821382
## Profesionalism -0.15959528 -0.010929125 -0.190821797  0.45044262 -0.22837293
## Limitation      0.00000000  0.442570837 -0.063511705  0.00000000 -0.32166527
## Online_grocery  0.30963403 -0.159759789 -0.112621585 -0.08663791 -0.06111323
## delivery        0.11717225  0.281221573 -0.199683413 -0.06467921 -0.09581010
## Pick_up         1.00000000 -0.103782087 -0.029202713 -0.46727535 -0.21630646
## Find_items     -0.10378209  1.000000000  0.004599561  0.04246039  0.04367853
## other_shops    -0.02920271  0.004599561  1.000000000 -0.11509630 -0.04178763
## Gender         -0.46727535  0.042460389 -0.115096299  1.00000000  0.18002057
## Age            -0.21630646  0.043678535 -0.041787634  0.18002057  1.00000000
## Education      -0.24491202  0.095442714  0.013316169 -0.26341476  0.32516624
##                  Education
## ID              0.21244579
## CS_helpful      0.06542384
## Recommend       0.12385279
## Come_again      0.08671100
## All_Products    0.07266003
## Profesionalism -0.28024764
## Limitation     -0.07321628
## Online_grocery  0.07302945
## delivery       -0.02544260
## Pick_up        -0.24491202
## Find_items      0.09544271
## other_shops     0.01331617
## Gender         -0.26341476
## Age             0.32516624
## Education       1.00000000
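The full matrix is hard to scan, and it includes ID, which is only a row number. The sketch below is a minimal example that uses a hypothetical helper, `top_correlations()`. It lists each pair of variables once (the upper triangle of the matrix), sorted by absolute correlation.

```r
# Hypothetical helper: the n variable pairs with the largest absolute correlation
top_correlations <- function(df, n = 5) {
  m <- cor(df, use = "complete.obs")
  idx <- which(upper.tri(m), arr.ind = TRUE)  # each pair once, diagonal skipped
  pairs <- data.frame(
    var1 = rownames(m)[idx[, "row"]],
    var2 = colnames(m)[idx[, "col"]],
    r    = m[idx]
  )
  head(pairs[order(-abs(pairs$r)), ], n)
}

# Toy stand-in: b is exactly 2 * a, so that pair ranks first
toy <- data.frame(a = c(1, 2, 3, 4), b = c(2, 4, 6, 8), c = c(4, 1, 3, 2))
top_correlations(toy, n = 2)
```

For the report, `top_correlations(select(numeric_data, -ID))` would drop the ID column before ranking.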

Scatterplot Example

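# ID is only a row number, so plot two survey items instead: CS_helpful and
# Limitation, the most strongly correlated pair in the matrix above (r = 0.61).
# jitter() offsets overlapping points because the answers are whole numbers.
plot(jitter(customer_data$CS_helpful), jitter(customer_data$Limitation),
     main = "Scatterplot of CS_helpful vs. Limitation",
     xlab = "CS_helpful",
     ylab = "Limitation",
     pch = 19)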

Boxplot Example

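# Exclude ID (row numbers 1-22), which would stretch the shared axis; las = 2 rotates labels
boxplot(select(numeric_data, -ID),
        main = "Boxplots of Numeric Variables (excluding ID)",
        las = 2)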

Interpretation

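The analysis helps show how customers may differ in their shopping behavior. By examining summary statistics, distributions, correlations, and visualizations, it becomes easier to see patterns that may support customer segmentation. For example, at least half of the customers gave the lowest code (1) for CS_helpful, Recommend, Come_again, Profesionalism, Limitation, and Find_items, since each of those items has a median of 1. The correlations also suggest related groups of answers. CS_helpful is positively correlated with Limitation (r = 0.61), delivery (0.59), and Profesionalism (0.51). All_Products is positively correlated with Find_items (0.54), and Come_again is negatively correlated with Pick_up (-0.52). Customers could also be compared across the Gender, Age, and Education codes. Because the dataset has only 22 customers, these patterns are a starting point rather than firm conclusions.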

This type of analysis is useful for businesses because it can support marketing decisions, promotions, and customer targeting. A company that knows which groups of customers are most likely to recommend it or come again, and how groups differ on delivery, pick-up, and online grocery shopping, can create better strategies to serve those customers.
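The report describes segmentation but does not form segments. One common technique for this, which is not part of the original analysis, is k-means clustering with base R's `kmeans()` on standardized columns. The sketch below is a minimal example. The helper name `segment_customers()`, the number of segments k, and the seed are all assumptions. K-means also treats the response codes as evenly spaced numbers, which may not hold for this data.

```r
# Hypothetical helper: k-means clustering of customers on standardized columns.
# k and seed are illustrative assumptions, not values from the report.
segment_customers <- function(df, k = 3, seed = 42) {
  set.seed(seed)                            # k-means starts from random centers
  scaled <- scale(df)                       # each column: mean 0, sd 1
  kmeans(scaled, centers = k, nstart = 25)  # keep the best of 25 random starts
}

# Toy stand-in: two obvious groups of customers
toy <- data.frame(q1 = c(1, 1, 2, 5, 5, 4), q2 = c(1, 2, 1, 5, 4, 5))
km <- segment_customers(toy, k = 2)
km$cluster         # segment label for each customer
table(km$cluster)  # segment sizes
```

On the real data, `segment_customers(select(customer_data, -ID))` would assign each of the 22 customers to a segment. With so few rows, it is worth comparing a few values of k before naming the segments. One option is to cluster only the response items and then compare the segments by Gender, Age, and Education.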

Conclusion

Overall, this dataset provides useful information for understanding grocery shopping behavior. R Markdown is a helpful tool because it allows the code, analysis, and interpretation to be shown together in one report. AI was useful in helping create the structure of this file, but the work still required editing, checking, and improving to make sure it matched the assignment.