This lab gave me a chance to use AI as a tool for writing an R Markdown file and organizing a data analysis project. At first, I thought using AI would make the process very easy, but I learned that the quality of the answer depends on the quality of the prompt. My first prompt gave me a basic response, but it was too general and did not fully match the lab instructions. After improving my prompt, the AI gave me a much more useful response with clearer code, better structure, and stronger explanations. This showed me that AI can save time, but I still need to guide it carefully and check that the work matches the assignment.
One challenge in this lab was making sure the R code would fit the dataset and the course expectations. AI can generate code quickly, but it does not always know the exact column names or the level of detail the professor wants. A major opportunity is that AI can help students organize their ideas, build a starting point for analysis, and explain the results in a simpler way. Overall, I think this newly designed lab was helpful because it taught me not only how to use R Markdown, but also how to use AI more effectively and responsibly in academic work.
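One way to catch the column-name problem early is to compare the names that generated code relies on against the names actually in the data. The sketch below is only an illustration: the small data frame stands in for the real file, and `expected_cols` is a hypothetical list of names that AI-written code might use.

```r
# Stand-in for the real file: the first three rows of three columns,
# copied from the glimpse() output in this report.
customer_data <- data.frame(
  ID = c(1, 2, 3),
  CS_helpful = c(2, 1, 2),
  Profesionalism = c(2, 1, 1)
)

# Hypothetical list of names that generated code refers to. It uses the
# standard spelling "Professionalism", which differs from the dataset's
# "Profesionalism".
expected_cols <- c("CS_helpful", "Professionalism")

# setdiff() returns the expected names that are absent from the data.
missing_cols <- setdiff(expected_cols, names(customer_data))

if (length(missing_cols) > 0) {
  message("Not found in the data: ", paste(missing_cols, collapse = ", "))
}
```

Running a check like this before the rest of the analysis turns a confusing "object not found" error later on into a clear message at the start.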
This report analyzes the grocery shopping dataset named
customer_segmentation.csv. The goal is to explore patterns
in customer behavior and identify possible customer segments.
Understanding customer segments can help businesses make better
marketing decisions, improve customer targeting, and better understand
shopping habits.
library(readr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
customer_data <- read_csv("customer_segmentation.csv")
## Rows: 22 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (15): ID, CS_helpful, Recommend, Come_again, All_Products, Profesionalis...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
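The message above names two ways to quiet the column specification output. The sketch below shows both, assuming a readr version that supports `show_col_types`. It writes a tiny stand-in CSV to a temporary file so that it runs on its own; in this report the path would be "customer_segmentation.csv".

```r
library(readr)

# Tiny stand-in file with three of the report's columns (values taken
# from the first three rows shown by glimpse()).
tmp <- tempfile(fileext = ".csv")
writeLines(c("ID,CS_helpful,Recommend", "1,2,2", "2,1,2", "3,2,1"), tmp)

# Option 1: let readr guess the column types but silence the message.
quiet_data <- read_csv(tmp, show_col_types = FALSE)

# Option 2: state the types explicitly. cols(.default = ...) applies one
# type to every column that is not named individually.
typed_data <- read_csv(tmp, col_types = cols(.default = col_double()))
```

Stating the types explicitly has the side benefit that readr reports parsing problems if a column ever contains something other than a number.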
glimpse(customer_data)
## Rows: 22
## Columns: 15
## $ ID <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, …
## $ CS_helpful <dbl> 2, 1, 2, 3, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 3…
## $ Recommend <dbl> 2, 2, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ Come_again <dbl> 2, 1, 1, 2, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 2…
## $ All_Products <dbl> 2, 1, 1, 4, 5, 2, 2, 2, 2, 1, 2, 2, 1, 2, 4, 2, 2, 1, 3…
## $ Profesionalism <dbl> 2, 1, 1, 1, 2, 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2…
## $ Limitation <dbl> 2, 1, 2, 2, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 3, 4…
## $ Online_grocery <dbl> 2, 2, 3, 3, 2, 1, 2, 1, 2, 3, 2, 3, 1, 3, 2, 3, 2, 3, 1…
## $ delivery <dbl> 3, 3, 3, 3, 3, 2, 2, 1, 1, 2, 2, 2, 2, 3, 2, 1, 3, 3, 3…
## $ Pick_up <dbl> 4, 3, 2, 2, 1, 1, 2, 2, 3, 2, 2, 3, 2, 3, 2, 3, 5, 3, 1…
## $ Find_items <dbl> 1, 1, 1, 2, 2, 1, 1, 2, 1, 1, 1, 1, 1, 3, 2, 1, 2, 1, 3…
## $ other_shops <dbl> 2, 2, 3, 2, 3, 4, 1, 4, 1, 1, 3, 3, 1, 1, 5, 5, 5, 2, 2…
## $ Gender <dbl> 1, 1, 1, 1, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 2…
## $ Age <dbl> 2, 2, 2, 3, 4, 2, 2, 2, 2, 2, 4, 3, 4, 3, 2, 3, 2, 2, 2…
## $ Education <dbl> 2, 2, 2, 5, 2, 5, 3, 2, 1, 2, 5, 1, 5, 5, 5, 5, 1, 5, 2…
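Every column was read as a number (`<dbl>`), but Gender, Age, and Education look like category codes rather than measurements. A minimal sketch of treating them as factors follows. It uses the first six rows from the output above as a stand-in and attaches no labels, because the report does not say what each code means.

```r
library(dplyr)

# First six rows of three columns, copied from the glimpse() output.
customer_data <- data.frame(
  ID = c(1, 2, 3, 4, 5, 6),
  Gender = c(1, 1, 1, 1, 2, 1),
  Age = c(2, 2, 2, 3, 4, 2)
)

# Convert the demographic codes to factors so R treats them as categories.
customer_data <- customer_data %>%
  mutate(across(c(Gender, Age), as.factor))

# count() tabulates respondents per code instead of averaging the codes.
customer_data %>% count(Gender)
```

With factors, a statistic such as the mean of Gender is no longer computed silently, which avoids reading meaning into an average of category codes.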
summary(customer_data)
## ID CS_helpful Recommend Come_again
## Min. : 1.00 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.: 6.25 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:1.000
## Median :11.50 Median :1.000 Median :1.000 Median :1.000
## Mean :11.50 Mean :1.591 Mean :1.318 Mean :1.455
## 3rd Qu.:16.75 3rd Qu.:2.000 3rd Qu.:1.000 3rd Qu.:2.000
## Max. :22.00 Max. :3.000 Max. :3.000 Max. :3.000
## All_Products Profesionalism Limitation Online_grocery delivery
## Min. :1.000 Min. :1.000 Min. :1.0 Min. :1.000 Min. :1.000
## 1st Qu.:1.250 1st Qu.:1.000 1st Qu.:1.0 1st Qu.:2.000 1st Qu.:2.000
## Median :2.000 Median :1.000 Median :1.0 Median :2.000 Median :3.000
## Mean :2.091 Mean :1.409 Mean :1.5 Mean :2.273 Mean :2.409
## 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.0 3rd Qu.:3.000 3rd Qu.:3.000
## Max. :5.000 Max. :3.000 Max. :4.0 Max. :3.000 Max. :3.000
## Pick_up Find_items other_shops Gender
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:2.000 1st Qu.:1.000 1st Qu.:1.250 1st Qu.:1.000
## Median :2.000 Median :1.000 Median :2.000 Median :1.000
## Mean :2.455 Mean :1.455 Mean :2.591 Mean :1.273
## 3rd Qu.:3.000 3rd Qu.:2.000 3rd Qu.:3.750 3rd Qu.:1.750
## Max. :5.000 Max. :3.000 Max. :5.000 Max. :2.000
## Age Education
## Min. :2.000 Min. :1.000
## 1st Qu.:2.000 1st Qu.:2.000
## Median :2.000 Median :2.500
## Mean :2.455 Mean :3.182
## 3rd Qu.:3.000 3rd Qu.:5.000
## Max. :4.000 Max. :5.000
head(customer_data)
## # A tibble: 6 × 15
## ID CS_helpful Recommend Come_again All_Products Profesionalism Limitation
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 2 2 2 2 2 2
## 2 2 1 2 1 1 1 1
## 3 3 2 1 1 1 1 2
## 4 4 3 3 2 4 1 2
## 5 5 2 1 3 5 2 1
## 6 6 1 1 3 2 1 1
## # ℹ 8 more variables: Online_grocery <dbl>, delivery <dbl>, Pick_up <dbl>,
## # Find_items <dbl>, other_shops <dbl>, Gender <dbl>, Age <dbl>,
## # Education <dbl>
colSums(is.na(customer_data))
##             ID     CS_helpful      Recommend     Come_again   All_Products
##              0              0              0              0              0
## Profesionalism     Limitation Online_grocery       delivery        Pick_up
##              0              0              0              0              0
##     Find_items    other_shops         Gender            Age      Education
##              0              0              0              0              0
numeric_data <- customer_data %>% select(where(is.numeric))
for (col in names(numeric_data)) {
  hist(numeric_data[[col]], main = paste("Histogram of", col), xlab = col)
}
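Because the responses are whole-number codes, a bar chart of counts can be easier to read than a histogram. The report loads ggplot2 but never uses it, so the sketch below shows one possible ggplot2 version for a single column. It uses the first ten values of All_Products from the glimpse() output as a stand-in for the full column.

```r
library(ggplot2)

# First ten values of All_Products, copied from the glimpse() output.
ratings <- data.frame(All_Products = c(2, 1, 1, 4, 5, 2, 2, 2, 2, 1))

# factor() keeps the x axis discrete, so each response code gets its own
# bar and no bars appear between whole-number ratings.
p <- ggplot(ratings, aes(x = factor(All_Products))) +
  geom_bar() +
  labs(title = "Responses for All_Products", x = "All_Products", y = "Count")

print(p)
```

The same pattern could be repeated for the other rating columns by changing the column name inside `aes()`.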
customer_data %>%
select(where(is.numeric)) %>%
summary()
## ID CS_helpful Recommend Come_again
## Min. : 1.00 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.: 6.25 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:1.000
## Median :11.50 Median :1.000 Median :1.000 Median :1.000
## Mean :11.50 Mean :1.591 Mean :1.318 Mean :1.455
## 3rd Qu.:16.75 3rd Qu.:2.000 3rd Qu.:1.000 3rd Qu.:2.000
## Max. :22.00 Max. :3.000 Max. :3.000 Max. :3.000
## All_Products Profesionalism Limitation Online_grocery delivery
## Min. :1.000 Min. :1.000 Min. :1.0 Min. :1.000 Min. :1.000
## 1st Qu.:1.250 1st Qu.:1.000 1st Qu.:1.0 1st Qu.:2.000 1st Qu.:2.000
## Median :2.000 Median :1.000 Median :1.0 Median :2.000 Median :3.000
## Mean :2.091 Mean :1.409 Mean :1.5 Mean :2.273 Mean :2.409
## 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.0 3rd Qu.:3.000 3rd Qu.:3.000
## Max. :5.000 Max. :3.000 Max. :4.0 Max. :3.000 Max. :3.000
## Pick_up Find_items other_shops Gender
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:2.000 1st Qu.:1.000 1st Qu.:1.250 1st Qu.:1.000
## Median :2.000 Median :1.000 Median :2.000 Median :1.000
## Mean :2.455 Mean :1.455 Mean :2.591 Mean :1.273
## 3rd Qu.:3.000 3rd Qu.:2.000 3rd Qu.:3.750 3rd Qu.:1.750
## Max. :5.000 Max. :3.000 Max. :5.000 Max. :2.000
## Age Education
## Min. :2.000 Min. :1.000
## 1st Qu.:2.000 1st Qu.:2.000
## Median :2.000 Median :2.500
## Mean :2.455 Mean :3.182
## 3rd Qu.:3.000 3rd Qu.:5.000
## Max. :4.000 Max. :5.000
numeric_data <- customer_data %>%
select(where(is.numeric))
cor(numeric_data, use = "complete.obs")
## ID CS_helpful Recommend Come_again All_Products
## ID 1.00000000 0.15482785 -0.08509414 -0.12908035 -0.11705779
## CS_helpful 0.15482785 1.00000000 0.48809623 0.27146195 0.29345435
## Recommend -0.08509414 0.48809623 1.00000000 0.38089069 0.02515624
## Come_again -0.12908035 0.27146195 0.38089069 1.00000000 0.36875582
## All_Products -0.11705779 0.29345435 0.02515624 0.36875582 1.00000000
## Profesionalism 0.25465839 0.51442802 0.39143306 0.42695809 0.08951478
## Limitation 0.19664246 0.60674478 0.04594474 0.00000000 0.05576720
## Online_grocery 0.23893106 0.20749595 0.29678764 -0.14514393 -0.14833305
## delivery 0.09489449 0.59036145 0.41510987 0.16766768 0.07197937
## Pick_up 0.15959528 -0.17854819 -0.08238912 -0.52135402 -0.25000740
## Find_items 0.24044075 0.29879792 -0.01996410 0.04367853 0.53916624
## other_shops 0.09671790 -0.30898381 -0.05968695 0.32594355 0.21734201
## Gender 0.08043618 0.06467921 0.01469318 0.32146531 0.14267528
## Age -0.10922184 -0.16766768 -0.11789474 0.12698413 0.30821382
## Education 0.21244579 0.06542384 0.12385279 0.08671100 0.07266003
## Profesionalism Limitation Online_grocery delivery
## ID 0.25465839 0.19664246 0.23893106 0.09489449
## CS_helpful 0.51442802 0.60674478 0.20749595 0.59036145
## Recommend 0.39143306 0.04594474 0.29678764 0.41510987
## Come_again 0.42695809 0.00000000 -0.14514393 0.16766768
## All_Products 0.08951478 0.05576720 -0.14833305 0.07197937
## Profesionalism 1.00000000 0.05030388 0.05734345 0.25471679
## Limitation 0.05030388 1.00000000 -0.15480679 0.36404687
## Online_grocery 0.05734345 -0.15480679 1.00000000 0.29971638
## delivery 0.25471679 0.36404687 0.29971638 1.00000000
## Pick_up -0.15959528 0.00000000 0.30963403 0.11717225
## Find_items -0.01092912 0.44257084 -0.15975979 0.28122157
## other_shops -0.19082180 -0.06351171 -0.11262158 -0.19968341
## Gender 0.45044262 0.00000000 -0.08663791 -0.06467921
## Age -0.22837293 -0.32166527 -0.06111323 -0.09581010
## Education -0.28024764 -0.07321628 0.07302945 -0.02544260
## Pick_up Find_items other_shops Gender Age
## ID 0.15959528 0.240440748 0.096717897 0.08043618 -0.10922184
## CS_helpful -0.17854819 0.298797921 -0.308983807 0.06467921 -0.16766768
## Recommend -0.08238912 -0.019964097 -0.059686954 0.01469318 -0.11789474
## Come_again -0.52135402 0.043678535 0.325943546 0.32146531 0.12698413
## All_Products -0.25000740 0.539166240 0.217342007 0.14267528 0.30821382
## Profesionalism -0.15959528 -0.010929125 -0.190821797 0.45044262 -0.22837293
## Limitation 0.00000000 0.442570837 -0.063511705 0.00000000 -0.32166527
## Online_grocery 0.30963403 -0.159759789 -0.112621585 -0.08663791 -0.06111323
## delivery 0.11717225 0.281221573 -0.199683413 -0.06467921 -0.09581010
## Pick_up 1.00000000 -0.103782087 -0.029202713 -0.46727535 -0.21630646
## Find_items -0.10378209 1.000000000 0.004599561 0.04246039 0.04367853
## other_shops -0.02920271 0.004599561 1.000000000 -0.11509630 -0.04178763
## Gender -0.46727535 0.042460389 -0.115096299 1.00000000 0.18002057
## Age -0.21630646 0.043678535 -0.041787634 0.18002057 1.00000000
## Education -0.24491202 0.095442714 0.013316169 -0.26341476 0.32516624
## Education
## ID 0.21244579
## CS_helpful 0.06542384
## Recommend 0.12385279
## Come_again 0.08671100
## All_Products 0.07266003
## Profesionalism -0.28024764
## Limitation -0.07321628
## Online_grocery 0.07302945
## delivery -0.02544260
## Pick_up -0.24491202
## Find_items 0.09544271
## other_shops 0.01331617
## Gender -0.26341476
## Age 0.32516624
## Education 1.00000000
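A 15 by 15 matrix is hard to scan by eye, and ID is only a row label, so its correlations carry no meaning. One possible approach is to drop ID, keep each pair once, and sort by strength. The sketch below uses the first ten rows of four columns as a stand-in, and it swaps in Spearman's rank correlation, which is often preferred for ordinal ratings. That is a choice made for this example, not the method used above.

```r
# First ten rows of four rating columns, copied from the glimpse() output.
# The report itself would use all 22 rows of numeric_data without ID.
ratings <- data.frame(
  CS_helpful = c(2, 1, 2, 3, 2, 1, 2, 1, 1, 1),
  Recommend  = c(2, 2, 1, 3, 1, 1, 1, 1, 1, 1),
  Come_again = c(2, 1, 1, 2, 3, 3, 1, 1, 1, 1),
  Pick_up    = c(4, 3, 2, 2, 1, 1, 2, 2, 3, 2)
)

# Rank-based correlation between every pair of columns.
cor_mat <- cor(ratings, method = "spearman")

# Read only the upper triangle so each pair appears once.
idx <- which(upper.tri(cor_mat), arr.ind = TRUE)
cor_pairs <- data.frame(
  var1 = rownames(cor_mat)[idx[, "row"]],
  var2 = colnames(cor_mat)[idx[, "col"]],
  r    = cor_mat[idx]
)

# Sort so the strongest relationships, positive or negative, come first.
cor_pairs <- cor_pairs[order(-abs(cor_pairs$r)), ]
print(cor_pairs)
```

The numbers from this ten-row stand-in will not match the full matrix above. The point is the layout, which puts the pairs worth discussing at the top.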
plot(numeric_data[[1]], numeric_data[[2]],
main = "Scatterplot of First Two Numeric Variables",
xlab = names(numeric_data)[1],
ylab = names(numeric_data)[2],
pch = 19)
boxplot(numeric_data,
main = "Boxplots of Numeric Variables")
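The analysis shows how customers differ in their survey responses. The dataset has 22 respondents and no missing values, and most ratings sit at the low end of the scale: the median is 1 for CS_helpful, Recommend, Come_again, Profesionalism, Limitation, and Find_items. The largest positive correlations are CS_helpful with Limitation (0.61), CS_helpful with delivery (0.59), and All_Products with Find_items (0.54), while the largest negative correlation is Come_again with Pick_up (-0.52). These relationships point to variables that may help separate customers into segments, but with only 22 rows they should be treated as tentative.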
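The report stops at exploration, so no segments are actually formed. As a rough illustration of what a next step might look like, the sketch below runs k-means clustering from base R on three rating columns. Several things here are assumptions: the choice of columns, the choice of two clusters, and the use of the first ten rows from the glimpse() output as a stand-in for the full data.

```r
# First ten rows of three rating columns, copied from the glimpse() output.
ratings <- data.frame(
  CS_helpful  = c(2, 1, 2, 3, 2, 1, 2, 1, 1, 1),
  Come_again  = c(2, 1, 1, 2, 3, 3, 1, 1, 1, 1),
  other_shops = c(2, 2, 3, 2, 3, 4, 1, 4, 1, 1)
)

# Standardize so every rating contributes on the same scale.
scaled <- scale(ratings)

# k-means starts from random centers, so fix the seed for repeatability.
set.seed(42)

# centers = 2 is an arbitrary choice for illustration, not a tuned value.
fit <- kmeans(scaled, centers = 2, nstart = 25)

# Attach the segment label and compare average ratings per segment.
ratings$segment <- fit$cluster
aggregate(. ~ segment, data = ratings, FUN = mean)
```

With so few respondents, any segments found this way would be a starting point for discussion rather than a firm result.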
This type of analysis is useful for businesses because it can support marketing decisions, promotions, and customer targeting. If a company understands which groups rate its service, delivery, or pick-up differently, it can create better strategies to serve those customers.
Overall, this dataset provides useful information for understanding grocery shopping behavior. R Markdown is a helpful tool because it allows the code, analysis, and interpretation to be shown together in one report. AI was useful in helping create the structure of this file, but the work still required editing, checking, and improving to make sure it matched the assignment.