Clustering is an unsupervised learning technique that groups observations by similarity.
Here we use it to segment customers by:
Purchase frequency
Average ticket (average amount spent per purchase)
These segments will help us design personalized marketing strategies.
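As a rough illustration of the idea, the sketch below runs k-means on a handful of made-up customers described by the two variables above. The data values, the column names `frequency` and `avg_ticket`, and the choice of two clusters are assumptions made for this example only; they are not taken from the supermarket data.

```r
# Toy customers (hypothetical values, not from supermarket.xlsx)
customers <- data.frame(
  frequency  = c(2, 3, 1, 2, 20, 22, 19, 21),        # purchases per customer
  avg_ticket = c(15, 18, 12, 20, 150, 160, 145, 155) # average spend per purchase
)

# Standardize both variables so neither dominates the distance calculation
scaled <- scale(customers)

# k-means with 2 clusters; nstart = 10 tries several random starts
set.seed(123)
km <- kmeans(scaled, centers = 2, nstart = 10)

# Attach the cluster label to each customer
customers$cluster <- km$cluster
table(customers$cluster)
```

Scaling first is a common choice when the variables are measured in different units, as a purchase count and a monetary amount are here.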
# install.packages("tidyverse")
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 4.0.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# install.packages("cluster")
library(cluster)
# install.packages("factoextra")
library(factoextra)
# install.packages("lubridate")
library(lubridate)  # already attached by tidyverse 2.0.0; loading it again is harmless
library(readxl)     # provides read_excel()
# Adjust the path to wherever supermarket.xlsx is stored on your machine
df <- read_excel("C:/Users/robie/Downloads/supermarket.xlsx")
## Warning: Expecting numeric in A522063 / R522063C1: got 'A563185'
## Warning: Expecting numeric in A522064 / R522064C1: got 'A563186'
## Warning: Expecting numeric in A522065 / R522065C1: got 'A563187'
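The warnings above suggest that the first column (BillNo) was guessed as numeric but contains a few text values such as 'A563185'. In that situation the affected cells are most likely stored as NA. The sketch below reproduces the effect on a small made-up vector; the variable names are hypothetical, and the commented-out `read_excel()` call is only one possible way to keep those IDs, assuming the eight columns appear in the order shown by `head(df)`.

```r
# Hypothetical invoice numbers; the last one mimics the value in the warning
bill_raw <- c("536365", "536366", "A563185")

# Coercing to numeric loses the non-numeric ID (it becomes NA)
bill_num <- suppressWarnings(as.numeric(bill_raw))
n_lost <- sum(is.na(bill_num))

# Keeping the column as text preserves every ID
bill_chr <- as.character(bill_raw)
n_kept <- sum(!is.na(bill_chr))

# Assumption: with the real file, readxl's col_types argument could request
# text for the first column (path is a placeholder for the file location):
# df <- read_excel(path, col_types = c("text", "text", "numeric", "date",
#                                      "date", "numeric", "numeric", "text"))
```

Whether those rows matter for the segmentation depends on what the 'A'-prefixed invoices represent, which the data shown here does not say.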
head(df)
## # A tibble: 6 × 8
##   BillNo Itemname         Quantity Date                Time                Price
##    <dbl> <chr>               <dbl> <dttm>              <dttm>              <dbl>
## 1 536365 WHITE HANGING H…        6 2010-12-01 00:00:00 1899-12-31 08:26:00  2.55
## 2 536365 WHITE METAL LAN…        6 2010-12-01 00:00:00 1899-12-31 08:26:00  3.39
## 3 536365 CREAM CUPID HEA…        8 2010-12-01 00:00:00 1899-12-31 08:26:00  2.75
## 4 536365 KNITTED UNION F…        6 2010-12-01 00:00:00 1899-12-31 08:26:00  3.39
## 5 536365 RED WOOLLY HOTT…        6 2010-12-01 00:00:00 1899-12-31 08:26:00  3.39
## 6 536365 SET 7 BABUSHKA …        2 2010-12-01 00:00:00 1899-12-31 08:26:00  7.65
## # ℹ 2 more variables: CustomerID <dbl>, Country <chr>
df$Total <- df$Quantity * df$Price
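The `Total` column holds the amount of each line item. One way to get from there to the two segmentation variables is sketched below on a small made-up table. It assumes that purchase frequency means the number of distinct invoices per customer and that average ticket means the mean invoice total; the toy values and the names `sales`, `invoices`, `features`, `ticket`, `frequency` and `avg_ticket` are hypothetical.

```r
library(dplyr)

# Toy line items (hypothetical values) using the same column names as df
sales <- data.frame(
  BillNo     = c(1, 1, 2, 3, 3, 4),
  CustomerID = c(10, 10, 10, 20, 20, 10),
  Quantity   = c(6, 2, 1, 3, 1, 10),
  Price      = c(2.5, 5, 10, 4, 8, 1)
)

# Line-item amount, as in the step above
sales$Total <- sales$Quantity * sales$Price

# Step 1: one row per invoice, with the invoice total (the "ticket")
invoices <- sales %>%
  group_by(CustomerID, BillNo) %>%
  summarise(ticket = sum(Total), .groups = "drop")

# Step 2: one row per customer, with the two clustering variables
features <- invoices %>%
  group_by(CustomerID) %>%
  summarise(
    frequency  = n(),          # number of invoices
    avg_ticket = mean(ticket)  # average invoice total
  )

features
```

Summing to the invoice level first keeps a basket with many cheap items from being counted as many small purchases.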