Paso 1. definir el area del negocio que buscamos
impactar o mejorar y su KPI R: El area de negocio que se busca impactar
es el departamento de ventas Paso 2. seleccionar la
plantilla (-s) para crear valor a partir de los datos de los
clientes
Vision |Segmentacion | personalizacion |
contextualizacion
R: Se selecciono la plantilla de segmentacion Paso 3.
generar ideas o conceptos especificos
Elaborar un market basket analysis para obtener informacion acerca de
los clientes y sus patrones de compra Paso 4. reunir
los datos requeridos
Se reunen los datos necesarios para crear el market basket, en est ecaso
informacion de los productos. Paso 5. plan de
ejecucion.
El departamento de ventas y marketing podra tomar una decision en base a
los reusltados obtenidos
#file.choose()
base_de_datos<-read.csv("/Users/andreapaolasosa/Desktop/ventas (1).csv")
summary(base_de_datos)
## BillNo Itemname Quantity Date
## Length:522064 Length:522064 Min. :-9600.00 Length:522064
## Class :character Class :character 1st Qu.: 1.00 Class :character
## Mode :character Mode :character Median : 3.00 Mode :character
## Mean : 10.09
## 3rd Qu.: 10.00
## Max. :80995.00
##
## Time Price CustomerID Country
## Length:522064 Min. :-11062.060 Min. :12346 Length:522064
## Class :character 1st Qu.: 1.250 1st Qu.:13950 Class :character
## Mode :character Median : 2.080 Median :15265 Mode :character
## Mean : 3.827 Mean :15317
## 3rd Qu.: 4.130 3rd Qu.:16837
## Max. : 13541.330 Max. :18287
## NA's :134041
str(base_de_datos)
## 'data.frame': 522064 obs. of 8 variables:
## $ BillNo : chr "536365" "536365" "536365" "536365" ...
## $ Itemname : chr "WHITE HANGING HEART T-LIGHT HOLDER" "WHITE METAL LANTERN" "CREAM CUPID HEARTS COAT HANGER" "KNITTED UNION FLAG HOT WATER BOTTLE" ...
## $ Quantity : int 6 6 8 6 6 2 6 6 6 32 ...
## $ Date : chr "01/12/2010" "01/12/2010" "01/12/2010" "01/12/2010" ...
## $ Time : chr "08:26:00 a. m." "08:26:00 a. m." "08:26:00 a. m." "08:26:00 a. m." ...
## $ Price : num 2.55 3.39 2.75 3.39 3.39 7.65 4.25 1.85 1.85 1.69 ...
## $ CustomerID: int 17850 17850 17850 17850 17850 17850 17850 17850 17850 13047 ...
## $ Country : chr "United Kingdom" "United Kingdom" "United Kingdom" "United Kingdom" ...
#install.packages("dplyr")
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
count(base_de_datos, Country, sort = TRUE)
## Country n
## 1 United Kingdom 487622
## 2 Germany 9042
## 3 France 8408
## 4 Spain 2485
## 5 Netherlands 2363
## 6 Belgium 2031
## 7 Switzerland 1967
## 8 Portugal 1501
## 9 Australia 1185
## 10 Norway 1072
## 11 Italy 758
## 12 Sweden 451
## 13 Unspecified 446
## 14 Austria 398
## 15 Poland 330
## 16 Japan 321
## 17 Israel 295
## 18 Hong Kong 284
## 19 Singapore 222
## 20 Iceland 182
## 21 USA 179
## 22 Greece 145
## 23 Malta 112
## 24 United Arab Emirates 68
## 25 RSA 58
## 26 Lebanon 45
## 27 Lithuania 35
## 28 Brazil 32
## 29 Bahrain 18
## 30 Saudi Arabia 9
sum(is.na(base_de_datos))
## [1] 134041
sapply(base_de_datos, function(x)sum(is.na(x)))
## BillNo Itemname Quantity Date Time Price CustomerID
## 0 0 0 0 0 0 134041
## Country
## 0
base_de_datos2 <- base_de_datos
base_de_datos2$Date <- as.Date(base_de_datos2$Date, format = "%d/%m/%Y")
tibble(base_de_datos2)
## # A tibble: 522,064 × 8
## BillNo Itemname Quant…¹ Date Time Price Custo…² Country
## <chr> <chr> <int> <date> <chr> <dbl> <int> <chr>
## 1 536365 "WHITE HANGING HEART T… 6 2010-12-01 08:2… 2.55 17850 United…
## 2 536365 "WHITE METAL LANTERN" 6 2010-12-01 08:2… 3.39 17850 United…
## 3 536365 "CREAM CUPID HEARTS CO… 8 2010-12-01 08:2… 2.75 17850 United…
## 4 536365 "KNITTED UNION FLAG HO… 6 2010-12-01 08:2… 3.39 17850 United…
## 5 536365 "RED WOOLLY HOTTIE WHI… 6 2010-12-01 08:2… 3.39 17850 United…
## 6 536365 "SET 7 BABUSHKA NESTIN… 2 2010-12-01 08:2… 7.65 17850 United…
## 7 536365 "GLASS STAR FROSTED T-… 6 2010-12-01 08:2… 4.25 17850 United…
## 8 536366 "HAND WARMER UNION JAC… 6 2010-12-01 08:2… 1.85 17850 United…
## 9 536366 "HAND WARMER RED POLKA… 6 2010-12-01 08:2… 1.85 17850 United…
## 10 536367 "ASSORTED COLOUR BIRD … 32 2010-12-01 08:3… 1.69 13047 United…
## # … with 522,054 more rows, and abbreviated variable names ¹Quantity,
## # ²CustomerID
base_de_datos4 <- base_de_datos2[order(base_de_datos2$BillNo),]
tail(base_de_datos4)
## BillNo Itemname Quantity Date
## 522062 581587 CHILDRENS CUTLERY DOLLY GIRL 4 2011-12-09
## 522063 581587 CHILDRENS CUTLERY CIRCUS PARADE 4 2011-12-09
## 522064 581587 BAKING SET 9 PIECE RETROSPOT 3 2011-12-09
## 288773 A563185 Adjust bad debt 1 2011-08-12
## 288774 A563186 Adjust bad debt 1 2011-08-12
## 288775 A563187 Adjust bad debt 1 2011-08-12
## Time Price CustomerID Country
## 522062 12:50:00 p. m. 4.15 12680 France
## 522063 12:50:00 p. m. 4.15 12680 France
## 522064 12:50:00 p. m. 4.95 12680 France
## 288773 02:50:00 p. m. 11062.06 NA United Kingdom
## 288774 02:51:00 p. m. -11062.06 NA United Kingdom
## 288775 02:52:00 p. m. -11062.06 NA United Kingdom
base_de_datos5 <- base_de_datos4
base_de_datos5 <- base_de_datos5[base_de_datos5$BillNo < 581588, ]
summary (base_de_datos)
## BillNo Itemname Quantity Date
## Length:522064 Length:522064 Min. :-9600.00 Length:522064
## Class :character Class :character 1st Qu.: 1.00 Class :character
## Mode :character Mode :character Median : 3.00 Mode :character
## Mean : 10.09
## 3rd Qu.: 10.00
## Max. :80995.00
##
## Time Price CustomerID Country
## Length:522064 Min. :-11062.060 Min. :12346 Length:522064
## Class :character 1st Qu.: 1.250 1st Qu.:13950 Class :character
## Mode :character Median : 2.080 Median :15265 Mode :character
## Mean : 3.827 Mean :15317
## 3rd Qu.: 4.130 3rd Qu.:16837
## Max. : 13541.330 Max. :18287
## NA's :134041
tail(base_de_datos5)
## BillNo Itemname Quantity Date
## 522059 581587 CHILDRENS CUTLERY SPACEBOY 4 2011-12-09
## 522060 581587 PACK OF 20 SPACEBOY NAPKINS 12 2011-12-09
## 522061 581587 CHILDREN'S APRON DOLLY GIRL 6 2011-12-09
## 522062 581587 CHILDRENS CUTLERY DOLLY GIRL 4 2011-12-09
## 522063 581587 CHILDRENS CUTLERY CIRCUS PARADE 4 2011-12-09
## 522064 581587 BAKING SET 9 PIECE RETROSPOT 3 2011-12-09
## Time Price CustomerID Country
## 522059 12:50:00 p. m. 4.15 12680 France
## 522060 12:50:00 p. m. 0.85 12680 France
## 522061 12:50:00 p. m. 2.10 12680 France
## 522062 12:50:00 p. m. 4.15 12680 France
## 522063 12:50:00 p. m. 4.15 12680 France
## 522064 12:50:00 p. m. 4.95 12680 France
#Eliminar todos los valores negativos
base_de_datos3 <- base_de_datos5
base_de_datos3 <- base_de_datos3[base_de_datos3$Price > 0,]
summary (base_de_datos3)
## BillNo Itemname Quantity Date
## Length:519550 Length:519550 Min. : 1.0 Min. :2010-12-01
## Class :character Class :character 1st Qu.: 1.0 1st Qu.:2011-03-28
## Mode :character Mode :character Median : 3.0 Median :2011-07-20
## Mean : 10.4 Mean :2011-07-04
## 3rd Qu.: 10.0 3rd Qu.:2011-10-19
## Max. :80995.0 Max. :2011-12-09
##
## Time Price CustomerID Country
## Length:519550 Min. : 0.001 Min. :12346 Length:519550
## Class :character 1st Qu.: 1.250 1st Qu.:13950 Class :character
## Mode :character Median : 2.080 Median :15265 Mode :character
## Mean : 3.867 Mean :15317
## 3rd Qu.: 4.130 3rd Qu.:16837
## Max. :13541.330 Max. :18287
## NA's :131565
#Exportar base de datos limpias
bd_limpia <- base_de_datos3
write.csv (bd_limpia, file="Ventas_bd_limpia2.csv", row.names= FALSE)
#Market Basket Analysis
#install.packages("plyr")
library(Matrix)
#install.packages("arules")
library(arules)
##
## Attaching package: 'arules'
## The following object is masked from 'package:dplyr':
##
## recode
## The following objects are masked from 'package:base':
##
## abbreviate, write
#install.packages("arulesViz")
library(arulesViz)
library(datasets)
bd_limpia <- bd_limpia[order(bd_limpia$BillNo),]
head(bd_limpia)
## BillNo Itemname Quantity Date Time
## 1 536365 WHITE HANGING HEART T-LIGHT HOLDER 6 2010-12-01 08:26:00 a. m.
## 2 536365 WHITE METAL LANTERN 6 2010-12-01 08:26:00 a. m.
## 3 536365 CREAM CUPID HEARTS COAT HANGER 8 2010-12-01 08:26:00 a. m.
## 4 536365 KNITTED UNION FLAG HOT WATER BOTTLE 6 2010-12-01 08:26:00 a. m.
## 5 536365 RED WOOLLY HOTTIE WHITE HEART 6 2010-12-01 08:26:00 a. m.
## 6 536365 SET 7 BABUSHKA NESTING BOXES 2 2010-12-01 08:26:00 a. m.
## Price CustomerID Country
## 1 2.55 17850 United Kingdom
## 2 3.39 17850 United Kingdom
## 3 2.75 17850 United Kingdom
## 4 3.39 17850 United Kingdom
## 5 3.39 17850 United Kingdom
## 6 7.65 17850 United Kingdom
tail(bd_limpia)
## BillNo Itemname Quantity Date
## 522059 581587 CHILDRENS CUTLERY SPACEBOY 4 2011-12-09
## 522060 581587 PACK OF 20 SPACEBOY NAPKINS 12 2011-12-09
## 522061 581587 CHILDREN'S APRON DOLLY GIRL 6 2011-12-09
## 522062 581587 CHILDRENS CUTLERY DOLLY GIRL 4 2011-12-09
## 522063 581587 CHILDRENS CUTLERY CIRCUS PARADE 4 2011-12-09
## 522064 581587 BAKING SET 9 PIECE RETROSPOT 3 2011-12-09
## Time Price CustomerID Country
## 522059 12:50:00 p. m. 4.15 12680 France
## 522060 12:50:00 p. m. 0.85 12680 France
## 522061 12:50:00 p. m. 2.10 12680 France
## 522062 12:50:00 p. m. 4.15 12680 France
## 522063 12:50:00 p. m. 4.15 12680 France
## 522064 12:50:00 p. m. 4.95 12680 France
#Generar basket
#install.packages("plyr")
library(plyr)
## ------------------------------------------------------------------------------
## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)
## ------------------------------------------------------------------------------
##
## Attaching package: 'plyr'
## The following objects are masked from 'package:dplyr':
##
## arrange, count, desc, failwith, id, mutate, rename, summarise,
## summarize
basket <- ddply(bd_limpia,c("BillNo"), function(bd_limpia)paste(bd_limpia$Itemname, collapse = ","))
basket$BillNo <- NULL
#Renombramos el nombre de la columna
colnames(basket) <- c("Itemname")
#Exportar basket
write.csv(basket, "basket4.csv", quote = FALSE, row.names = FALSE)
#Importar transacciones
#file.choose()
#tr <- read.transactions("/Users/andreapaolasosa/basket4.csv", format = "basket", sep = ",")
reglas.asociaciones <- apriori(tr, parameter = list(supp=0.001, conf=0.8, maxlen=10))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.8 0.1 1 none FALSE TRUE 5 0.001 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 19
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[8579 item(s), 19559 transaction(s)] done [0.23s].
## sorting and recoding items ... [2644 item(s)] done [0.01s].
## creating transaction tree ... done [0.01s].
## checking subsets of size 1 2 3 4 5 6 7 8 9 10
## Warning in apriori(tr, parameter = list(supp = 0.001, conf = 0.8, maxlen = 10)):
## Mining stopped (maxlen reached). Only patterns up to a length of 10 returned!
## done [4.74s].
## writing ... [6156148 rule(s)] done [1.65s].
## creating S4 object ... done [4.24s].
#summary(reglas.asociaciones)
#inspect(reglas.asociaciones)
reglas.asociaciones <- sort(reglas.asociaciones,by="confidence", decreasing = TRUE )
#summary(reglas.asociaciones)
#inspect(reglas.asociaciones)
top10reglas <- head(reglas.asociaciones, n =10, by="confidence")
inspect(top10reglas)
## lhs rhs support confidence coverage lift count
## [1] {SILVER MINI TAPE MEASURE} => {JUMBO BAG PINK VINTAGE PAISLEY} 0.001124802 1 0.001124802 27.24095 22
## [2] {SILVER MINI TAPE MEASURE} => {STRAWBERRY CHARLOTTE BAG} 0.001124802 1 0.001124802 32.32893 22
## [3] {SILVER MINI TAPE MEASURE} => {LUNCH BAG CARS BLUE} 0.001124802 1 0.001124802 20.24741 22
## [4] {SILVER MINI TAPE MEASURE} => {WOODLAND CHARLOTTE BAG} 0.001124802 1 0.001124802 28.84808 22
## [5] {SILVER MINI TAPE MEASURE} => {RED RETROSPOT CHARLOTTE BAG} 0.001124802 1 0.001124802 22.98355 22
## [6] {PINK POLKADOT BOWL,
## SET/20 FRUIT SALAD PAPER NAPKINS} => {STRAWBERRY CHARLOTTE BAG} 0.001022547 1 0.001022547 32.32893 20
## [7] {PINK POLKADOT BOWL,
## SET/20 FRUIT SALAD PAPER NAPKINS} => {LUNCH BAG CARS BLUE} 0.001022547 1 0.001022547 20.24741 20
## [8] {PINK POLKADOT BOWL,
## SET/20 FRUIT SALAD PAPER NAPKINS} => {WOODLAND CHARLOTTE BAG} 0.001022547 1 0.001022547 28.84808 20
## [9] {PINK POLKADOT BOWL,
## SET/20 FRUIT SALAD PAPER NAPKINS} => {RED RETROSPOT CHARLOTTE BAG} 0.001022547 1 0.001022547 22.98355 20
## [10] {SET/20 FRUIT SALAD PAPER NAPKINS,
## STRAWBERRY CHARLOTTE BAG} => {LUNCH BAG CARS BLUE} 0.001073675 1 0.001073675 20.24741 21
plot(top10reglas, method = "graph", engine = "htmlwidget")
Mediante el ejercicio realizado previamente, se puede generar un basket con las bases de datos que le limpiaron previamente. Este basket nos ayudara a identificar algun patron o relacion que exista el cual le ayudara a la tienda a poder tomar decisiones.