JuveYell

a

Inicio

bd <- read.csv("C:\\Users\\chema\\Downloads\\abarrotes.csv")

#Herramienta: “Generador de Valor de Datos” ##Paso 1. Definir el área de negocios que deseamos impactar o mejorar y su KPI. Área de Ventas. Incremento de ventas.

##Paso 2. Seleccionar plantilla (-s) para crear valor a partir de los datos de los clientes. Visión / Segmentación / Personalización / Contextualización

##Paso 3. Generar ideas o conceptos específicos. Generar un análisis de Market Basket para relacionar productos. Crear un nuevo acomodo de productos para generar nuevos comportamientos de compra en el consumidor.

##Paso 4. Reunir los datos requeridos. Se limpió base de datos.

##Paso 5. Plan de ejecución. Al realizar el market basket analysis podemos trazar una ruta de compra del consumidor, lo cual nos puede ayudar a crear un acomodo estrategicos de lso productos en anaquel, por lo que esto puede impulsar la compra del consumidor de distintos productos.

Entender la base de datos

summary(bd)
##  vcClaveTienda        DescGiro         Codigo.Barras            PLU        
##  Length:200625      Length:200625      Min.   :8.347e+05   Min.   : 1.00   
##  Class :character   Class :character   1st Qu.:7.501e+12   1st Qu.: 1.00   
##  Mode  :character   Mode  :character   Median :7.501e+12   Median : 1.00   
##                                        Mean   :5.950e+12   Mean   : 2.11   
##                                        3rd Qu.:7.501e+12   3rd Qu.: 1.00   
##                                        Max.   :1.750e+13   Max.   :30.00   
##                                                            NA's   :199188  
##     Fecha               Hora              Marca            Fabricante       
##  Length:200625      Length:200625      Length:200625      Length:200625     
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##    Producto             Precio          Ult.Costo         Unidades     
##  Length:200625      Min.   :-147.00   Min.   :  0.38   Min.   : 0.200  
##  Class :character   1st Qu.:  11.00   1st Qu.:  8.46   1st Qu.: 1.000  
##  Mode  :character   Median :  16.00   Median : 12.31   Median : 1.000  
##                     Mean   :  19.42   Mean   : 15.31   Mean   : 1.262  
##                     3rd Qu.:  25.00   3rd Qu.: 19.23   3rd Qu.: 1.000  
##                     Max.   :1000.00   Max.   :769.23   Max.   :96.000  
##                                                                        
##     F.Ticket      NombreDepartamento NombreFamilia      NombreCategoria   
##  Min.   :     1   Length:200625      Length:200625      Length:200625     
##  1st Qu.: 33964   Class :character   Class :character   Class :character  
##  Median :105993   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :193990                                                           
##  3rd Qu.:383005                                                           
##  Max.   :450040                                                           
##                                                                           
##     Estado              Mts.2      Tipo.ubicación         Giro          
##  Length:200625      Min.   :47.0   Length:200625      Length:200625     
##  Class :character   1st Qu.:53.0   Class :character   Class :character  
##  Mode  :character   Median :60.0   Mode  :character   Mode  :character  
##                     Mean   :56.6                                        
##                     3rd Qu.:60.0                                        
##                     Max.   :62.0                                        
##                                                                         
##  Hora.inicio        Hora.cierre       
##  Length:200625      Length:200625     
##  Class :character   Class :character  
##  Mode  :character   Mode  :character  
##                                       
##                                       
##                                       
## 

Observaciones

  1. Casi ningun registro cuenta con PLU
  2. Cambiar formato de fecha
  3. Cambiar formato de Hora
  4. Hay precios negativos
  5. Hay unidades menores a 1.

A la base de datos se le hicieron los siguientes cambios:

Tecnicas para limpieza de datos

Tecnica 1. Remover valores irrelevantes

Eliminar columnas

  bd1 <- bd
  bd1 <- subset (bd1, select = -c (PLU, Codigo.Barras))

  #Eliminar Renglones
  bd2 <- bd1
  bd2 <- bd2[bd2$Precio > 0,]
  summary(bd1)
##  vcClaveTienda        DescGiro            Fecha               Hora          
##  Length:200625      Length:200625      Length:200625      Length:200625     
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##     Marca            Fabricante          Producto             Precio       
##  Length:200625      Length:200625      Length:200625      Min.   :-147.00  
##  Class :character   Class :character   Class :character   1st Qu.:  11.00  
##  Mode  :character   Mode  :character   Mode  :character   Median :  16.00  
##                                                           Mean   :  19.42  
##                                                           3rd Qu.:  25.00  
##                                                           Max.   :1000.00  
##    Ult.Costo         Unidades         F.Ticket      NombreDepartamento
##  Min.   :  0.38   Min.   : 0.200   Min.   :     1   Length:200625     
##  1st Qu.:  8.46   1st Qu.: 1.000   1st Qu.: 33964   Class :character  
##  Median : 12.31   Median : 1.000   Median :105993   Mode  :character  
##  Mean   : 15.31   Mean   : 1.262   Mean   :193990                     
##  3rd Qu.: 19.23   3rd Qu.: 1.000   3rd Qu.:383005                     
##  Max.   :769.23   Max.   :96.000   Max.   :450040                     
##  NombreFamilia      NombreCategoria       Estado              Mts.2     
##  Length:200625      Length:200625      Length:200625      Min.   :47.0  
##  Class :character   Class :character   Class :character   1st Qu.:53.0  
##  Mode  :character   Mode  :character   Mode  :character   Median :60.0  
##                                                           Mean   :56.6  
##                                                           3rd Qu.:60.0  
##                                                           Max.   :62.0  
##  Tipo.ubicación         Giro           Hora.inicio        Hora.cierre       
##  Length:200625      Length:200625      Length:200625      Length:200625     
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
## 
  summary(bd2)
##  vcClaveTienda        DescGiro            Fecha               Hora          
##  Length:200478      Length:200478      Length:200478      Length:200478     
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##     Marca            Fabricante          Producto             Precio       
##  Length:200478      Length:200478      Length:200478      Min.   :   0.50  
##  Class :character   Class :character   Class :character   1st Qu.:  11.00  
##  Mode  :character   Mode  :character   Mode  :character   Median :  16.00  
##                                                           Mean   :  19.45  
##                                                           3rd Qu.:  25.00  
##                                                           Max.   :1000.00  
##    Ult.Costo         Unidades         F.Ticket      NombreDepartamento
##  Min.   :  0.38   Min.   : 0.200   Min.   :     1   Length:200478     
##  1st Qu.:  8.46   1st Qu.: 1.000   1st Qu.: 33977   Class :character  
##  Median : 12.31   Median : 1.000   Median :106034   Mode  :character  
##  Mean   : 15.31   Mean   : 1.261   Mean   :194096                     
##  3rd Qu.: 19.23   3rd Qu.: 1.000   3rd Qu.:383062                     
##  Max.   :769.23   Max.   :96.000   Max.   :450040                     
##  NombreFamilia      NombreCategoria       Estado              Mts.2     
##  Length:200478      Length:200478      Length:200478      Min.   :47.0  
##  Class :character   Class :character   Class :character   1st Qu.:53.0  
##  Mode  :character   Mode  :character   Mode  :character   Median :60.0  
##                                                           Mean   :56.6  
##                                                           3rd Qu.:60.0  
##                                                           Max.   :62.0  
##  Tipo.ubicación         Giro           Hora.inicio        Hora.cierre       
##  Length:200478      Length:200478      Length:200478      Length:200478     
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
## 

Tecnica 2

Cuantos renglones duplicados tenemos?

bd1[duplicated(bd1),]
##    vcClaveTienda  DescGiro    Fecha           Hora                      Marca
## 6          MX001 Abarrotes 19/06/20 08:16:21 a. m.                NUTRI LECHE
## 7          MX001 Abarrotes 19/06/20 08:23:33 a. m.                     DAN UP
## 8          MX001 Abarrotes 19/06/20 08:24:33 a. m.                      BIMBO
## 9          MX001 Abarrotes 19/06/20 08:24:33 a. m.                      PEPSI
## 10         MX001 Abarrotes 19/06/20 08:26:28 a. m. BLANCA NIEVES (DETERGENTE)
##                    Fabricante                           Producto Precio
## 6                     MEXILAC                Nutri Leche 1 Litro   16.0
## 7            DANONE DE MEXICO DANUP STRAWBERRY P/BEBER 350GR NAL   14.0
## 8                 GRUPO BIMBO                Rebanadas Bimbo 2Pz    5.0
## 9         PEPSI-COLA MEXICANA                   Pepsi N.R. 400Ml    8.0
## 10 FABRICA DE JABON LA CORONA      Detergente Blanca Nieves 500G   19.5
##    Ult.Costo Unidades F.Ticket NombreDepartamento          NombreFamilia
## 6      12.31        1        1          Abarrotes Lacteos y Refrigerados
## 7      14.00        1        2          Abarrotes Lacteos y Refrigerados
## 8       5.00        1        3          Abarrotes         Pan y Tortilla
## 9       8.00        1        3          Abarrotes                Bebidas
## 10     15.00        1        4          Abarrotes     Limpieza del Hogar
##              NombreCategoria     Estado Mts.2 Tipo.ubicación      Giro
## 6                      Leche Nuevo León    60        Esquina Abarrotes
## 7                     Yogurt Nuevo León    60        Esquina Abarrotes
## 8      Pan Dulce Empaquetado Nuevo León    60        Esquina Abarrotes
## 9  Refrescos Plástico (N.R.) Nuevo León    60        Esquina Abarrotes
## 10                Lavandería Nuevo León    60        Esquina Abarrotes
##    Hora.inicio Hora.cierre
## 6         8:00       22:00
## 7         8:00       22:00
## 8         8:00       22:00
## 9         8:00       22:00
## 10        8:00       22:00
  sum(duplicated(bd1))
## [1] 5
  #Eliminar renglones duplicados
  bd3 <- bd1
  library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
  bd3 <- distinct(bd3)

Tecnica 3. Errores tipograficos y errores similares

Precios en absoluto

bd4 <- bd3
  bd4$Precio <- abs(bd4$Precio)
  summary(bd4)
##  vcClaveTienda        DescGiro            Fecha               Hora          
##  Length:200620      Length:200620      Length:200620      Length:200620     
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##     Marca            Fabricante          Producto             Precio       
##  Length:200620      Length:200620      Length:200620      Min.   :   0.50  
##  Class :character   Class :character   Class :character   1st Qu.:  11.00  
##  Mode  :character   Mode  :character   Mode  :character   Median :  16.00  
##                                                           Mean   :  19.45  
##                                                           3rd Qu.:  25.00  
##                                                           Max.   :1000.00  
##    Ult.Costo         Unidades         F.Ticket      NombreDepartamento
##  Min.   :  0.38   Min.   : 0.200   Min.   :     1   Length:200620     
##  1st Qu.:  8.46   1st Qu.: 1.000   1st Qu.: 33967   Class :character  
##  Median : 12.31   Median : 1.000   Median :105996   Mode  :character  
##  Mean   : 15.31   Mean   : 1.262   Mean   :193994                     
##  3rd Qu.: 19.23   3rd Qu.: 1.000   3rd Qu.:383009                     
##  Max.   :769.23   Max.   :96.000   Max.   :450040                     
##  NombreFamilia      NombreCategoria       Estado              Mts.2     
##  Length:200620      Length:200620      Length:200620      Min.   :47.0  
##  Class :character   Class :character   Class :character   1st Qu.:53.0  
##  Mode  :character   Mode  :character   Mode  :character   Median :60.0  
##                                                           Mean   :56.6  
##                                                           3rd Qu.:60.0  
##                                                           Max.   :62.0  
##  Tipo.ubicación         Giro           Hora.inicio        Hora.cierre       
##  Length:200620      Length:200620      Length:200620      Length:200620     
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
## 
  # Cantidades en enteros
  bd5 <- bd4
  bd5$Unidades <- ceiling(bd5$Unidades)
  summary(bd5)  
##  vcClaveTienda        DescGiro            Fecha               Hora          
##  Length:200620      Length:200620      Length:200620      Length:200620     
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##     Marca            Fabricante          Producto             Precio       
##  Length:200620      Length:200620      Length:200620      Min.   :   0.50  
##  Class :character   Class :character   Class :character   1st Qu.:  11.00  
##  Mode  :character   Mode  :character   Mode  :character   Median :  16.00  
##                                                           Mean   :  19.45  
##                                                           3rd Qu.:  25.00  
##                                                           Max.   :1000.00  
##    Ult.Costo         Unidades         F.Ticket      NombreDepartamento
##  Min.   :  0.38   Min.   : 1.000   Min.   :     1   Length:200620     
##  1st Qu.:  8.46   1st Qu.: 1.000   1st Qu.: 33967   Class :character  
##  Median : 12.31   Median : 1.000   Median :105996   Mode  :character  
##  Mean   : 15.31   Mean   : 1.262   Mean   :193994                     
##  3rd Qu.: 19.23   3rd Qu.: 1.000   3rd Qu.:383009                     
##  Max.   :769.23   Max.   :96.000   Max.   :450040                     
##  NombreFamilia      NombreCategoria       Estado              Mts.2     
##  Length:200620      Length:200620      Length:200620      Min.   :47.0  
##  Class :character   Class :character   Class :character   1st Qu.:53.0  
##  Mode  :character   Mode  :character   Mode  :character   Median :60.0  
##                                                           Mean   :56.6  
##                                                           3rd Qu.:60.0  
##                                                           Max.   :62.0  
##  Tipo.ubicación         Giro           Hora.inicio        Hora.cierre       
##  Length:200620      Length:200620      Length:200620      Length:200620     
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
## 

TECNICA 4. Convertir tipos de datos

Convertir en caracter a fecha

  bd6 <- bd5
  bd6$Fecha <- as.Date(bd6$Fecha, format = "%d/%m/%Y")
  summary(bd6)
##  vcClaveTienda        DescGiro             Fecha                Hora          
##  Length:200620      Length:200620      Min.   :0020-05-01   Length:200620     
##  Class :character   Class :character   1st Qu.:0020-06-06   Class :character  
##  Mode  :character   Mode  :character   Median :0020-07-11   Mode  :character  
##                                        Mean   :0020-07-18                     
##                                        3rd Qu.:0020-08-29                     
##                                        Max.   :0020-11-11                     
##     Marca            Fabricante          Producto             Precio       
##  Length:200620      Length:200620      Length:200620      Min.   :   0.50  
##  Class :character   Class :character   Class :character   1st Qu.:  11.00  
##  Mode  :character   Mode  :character   Mode  :character   Median :  16.00  
##                                                           Mean   :  19.45  
##                                                           3rd Qu.:  25.00  
##                                                           Max.   :1000.00  
##    Ult.Costo         Unidades         F.Ticket      NombreDepartamento
##  Min.   :  0.38   Min.   : 1.000   Min.   :     1   Length:200620     
##  1st Qu.:  8.46   1st Qu.: 1.000   1st Qu.: 33967   Class :character  
##  Median : 12.31   Median : 1.000   Median :105996   Mode  :character  
##  Mean   : 15.31   Mean   : 1.262   Mean   :193994                     
##  3rd Qu.: 19.23   3rd Qu.: 1.000   3rd Qu.:383009                     
##  Max.   :769.23   Max.   :96.000   Max.   :450040                     
##  NombreFamilia      NombreCategoria       Estado              Mts.2     
##  Length:200620      Length:200620      Length:200620      Min.   :47.0  
##  Class :character   Class :character   Class :character   1st Qu.:53.0  
##  Mode  :character   Mode  :character   Mode  :character   Median :60.0  
##                                                           Mean   :56.6  
##                                                           3rd Qu.:60.0  
##                                                           Max.   :62.0  
##  Tipo.ubicación         Giro           Hora.inicio        Hora.cierre       
##  Length:200620      Length:200620      Length:200620      Length:200620     
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
## 
  tibble(bd6)
## # A tibble: 200,620 × 20
##    vcCla…¹ DescG…² Fecha      Hora  Marca Fabri…³ Produ…⁴ Precio Ult.C…⁵ Unida…⁶
##    <chr>   <chr>   <date>     <chr> <chr> <chr>   <chr>    <dbl>   <dbl>   <dbl>
##  1 MX001   Abarro… 0020-06-19 08:1… NUTR… MEXILAC Nutri …   16     12.3        1
##  2 MX001   Abarro… 0020-06-19 08:2… DAN … DANONE… DANUP …   14     14          1
##  3 MX001   Abarro… 0020-06-19 08:2… BIMBO GRUPO … Rebana…    5      5          1
##  4 MX001   Abarro… 0020-06-19 08:2… PEPSI PEPSI-… Pepsi …    8      8          1
##  5 MX001   Abarro… 0020-06-19 08:2… BLAN… FABRIC… Deterg…   19.5   15          1
##  6 MX001   Abarro… 0020-06-19 08:2… FLASH ALEN    Flash …    9.5    7.31       1
##  7 MX001   Abarro… 0020-06-19 08:2… VARI… DANONE… Danone…   11     11          1
##  8 MX001   Abarro… 0020-06-19 08:2… ZOTE  FABRIC… Jabon …    9.5    7.31       1
##  9 MX001   Abarro… 0020-06-19 08:2… ALWA… PROCTE… T Feme…   23.5   18.1        1
## 10 MX001   Abarro… 0020-06-19 03:2… JUMEX JUMEX   Jugo D…   12     12          1
## # … with 200,610 more rows, 10 more variables: F.Ticket <int>,
## #   NombreDepartamento <chr>, NombreFamilia <chr>, NombreCategoria <chr>,
## #   Estado <chr>, Mts.2 <int>, Tipo.ubicación <chr>, Giro <chr>,
## #   Hora.inicio <chr>, Hora.cierre <chr>, and abbreviated variable names
## #   ¹​vcClaveTienda, ²​DescGiro, ³​Fabricante, ⁴​Producto, ⁵​Ult.Costo, ⁶​Unidades

Convertir de caracter a entero

bd7 <- bd6
  
  bd7$Hora <- substr(bd7$Hora, start =1, stop = 2)
  tibble(bd7)
## # A tibble: 200,620 × 20
##    vcCla…¹ DescG…² Fecha      Hora  Marca Fabri…³ Produ…⁴ Precio Ult.C…⁵ Unida…⁶
##    <chr>   <chr>   <date>     <chr> <chr> <chr>   <chr>    <dbl>   <dbl>   <dbl>
##  1 MX001   Abarro… 0020-06-19 08    NUTR… MEXILAC Nutri …   16     12.3        1
##  2 MX001   Abarro… 0020-06-19 08    DAN … DANONE… DANUP …   14     14          1
##  3 MX001   Abarro… 0020-06-19 08    BIMBO GRUPO … Rebana…    5      5          1
##  4 MX001   Abarro… 0020-06-19 08    PEPSI PEPSI-… Pepsi …    8      8          1
##  5 MX001   Abarro… 0020-06-19 08    BLAN… FABRIC… Deterg…   19.5   15          1
##  6 MX001   Abarro… 0020-06-19 08    FLASH ALEN    Flash …    9.5    7.31       1
##  7 MX001   Abarro… 0020-06-19 08    VARI… DANONE… Danone…   11     11          1
##  8 MX001   Abarro… 0020-06-19 08    ZOTE  FABRIC… Jabon …    9.5    7.31       1
##  9 MX001   Abarro… 0020-06-19 08    ALWA… PROCTE… T Feme…   23.5   18.1        1
## 10 MX001   Abarro… 0020-06-19 03    JUMEX JUMEX   Jugo D…   12     12          1
## # … with 200,610 more rows, 10 more variables: F.Ticket <int>,
## #   NombreDepartamento <chr>, NombreFamilia <chr>, NombreCategoria <chr>,
## #   Estado <chr>, Mts.2 <int>, Tipo.ubicación <chr>, Giro <chr>,
## #   Hora.inicio <chr>, Hora.cierre <chr>, and abbreviated variable names
## #   ¹​vcClaveTienda, ²​DescGiro, ³​Fabricante, ⁴​Producto, ⁵​Ult.Costo, ⁶​Unidades
  bd7$Hora <- as.integer(bd7$Hora)
  str(bd7)
## 'data.frame':    200620 obs. of  20 variables:
##  $ vcClaveTienda     : chr  "MX001" "MX001" "MX001" "MX001" ...
##  $ DescGiro          : chr  "Abarrotes" "Abarrotes" "Abarrotes" "Abarrotes" ...
##  $ Fecha             : Date, format: "0020-06-19" "0020-06-19" ...
##  $ Hora              : int  8 8 8 8 8 8 8 8 8 3 ...
##  $ Marca             : chr  "NUTRI LECHE" "DAN UP" "BIMBO" "PEPSI" ...
##  $ Fabricante        : chr  "MEXILAC" "DANONE DE MEXICO" "GRUPO BIMBO" "PEPSI-COLA MEXICANA" ...
##  $ Producto          : chr  "Nutri Leche 1 Litro" "DANUP STRAWBERRY P/BEBER 350GR NAL" "Rebanadas Bimbo 2Pz" "Pepsi N.R. 400Ml" ...
##  $ Precio            : num  16 14 5 8 19.5 9.5 11 9.5 23.5 12 ...
##  $ Ult.Costo         : num  12.3 14 5 8 15 ...
##  $ Unidades          : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ F.Ticket          : int  1 2 3 3 4 4 4 4 4 5 ...
##  $ NombreDepartamento: chr  "Abarrotes" "Abarrotes" "Abarrotes" "Abarrotes" ...
##  $ NombreFamilia     : chr  "Lacteos y Refrigerados" "Lacteos y Refrigerados" "Pan y Tortilla" "Bebidas" ...
##  $ NombreCategoria   : chr  "Leche" "Yogurt" "Pan Dulce Empaquetado" "Refrescos Plástico (N.R.)" ...
##  $ Estado            : chr  "Nuevo León" "Nuevo León" "Nuevo León" "Nuevo León" ...
##  $ Mts.2             : int  60 60 60 60 60 60 60 60 60 60 ...
##  $ Tipo.ubicación    : chr  "Esquina" "Esquina" "Esquina" "Esquina" ...
##  $ Giro              : chr  "Abarrotes" "Abarrotes" "Abarrotes" "Abarrotes" ...
##  $ Hora.inicio       : chr  "8:00" "8:00" "8:00" "8:00" ...
##  $ Hora.cierre       : chr  "22:00" "22:00" "22:00" "22:00" ...

###Tecnica 5. Valores Faltantes Cuantos NA tengo en mi base de datos

sum(is.na(bd7))
## [1] 0
  sum(is.na(bd))
## [1] 199188

Borrar todos los registros NA de una tabla

  bd8 <- bd
  bd8 <- na.omit(bd8)
  summary(bd8)
##  vcClaveTienda        DescGiro         Codigo.Barras            PLU        
##  Length:1437        Length:1437        Min.   :6.750e+08   Min.   : 1.000  
##  Class :character   Class :character   1st Qu.:6.750e+08   1st Qu.: 1.000  
##  Mode  :character   Mode  :character   Median :6.750e+08   Median : 1.000  
##                                        Mean   :2.616e+11   Mean   : 2.112  
##                                        3rd Qu.:6.750e+08   3rd Qu.: 1.000  
##                                        Max.   :7.501e+12   Max.   :30.000  
##     Fecha               Hora              Marca            Fabricante       
##  Length:1437        Length:1437        Length:1437        Length:1437       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##    Producto             Precio        Ult.Costo        Unidades    
##  Length:1437        Min.   :30.00   Min.   : 1.00   Min.   :1.000  
##  Class :character   1st Qu.:90.00   1st Qu.:64.62   1st Qu.:1.000  
##  Mode  :character   Median :90.00   Median :64.62   Median :1.000  
##                     Mean   :87.94   Mean   :56.65   Mean   :1.124  
##                     3rd Qu.:90.00   3rd Qu.:64.62   3rd Qu.:1.000  
##                     Max.   :90.00   Max.   :64.62   Max.   :7.000  
##     F.Ticket      NombreDepartamento NombreFamilia      NombreCategoria   
##  Min.   :   772   Length:1437        Length:1437        Length:1437       
##  1st Qu.: 99955   Class :character   Class :character   Class :character  
##  Median :102493   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :100595                                                           
##  3rd Qu.:106546                                                           
##  Max.   :118356                                                           
##     Estado              Mts.2       Tipo.ubicación         Giro          
##  Length:1437        Min.   :58.00   Length:1437        Length:1437       
##  Class :character   1st Qu.:58.00   Class :character   Class :character  
##  Mode  :character   Median :58.00   Mode  :character   Mode  :character  
##                     Mean   :58.07                                        
##                     3rd Qu.:58.00                                        
##                     Max.   :60.00                                        
##  Hora.inicio        Hora.cierre       
##  Length:1437        Length:1437       
##  Class :character   Class :character  
##  Mode  :character   Mode  :character  
##                                       
##                                       
## 

###Reemplazar NA con ceros

  bd9 <- bd
  bd9[is.na(bd9)]<-0
  summary(bd9)
##  vcClaveTienda        DescGiro         Codigo.Barras            PLU          
##  Length:200625      Length:200625      Min.   :8.347e+05   Min.   : 0.00000  
##  Class :character   Class :character   1st Qu.:7.501e+12   1st Qu.: 0.00000  
##  Mode  :character   Mode  :character   Median :7.501e+12   Median : 0.00000  
##                                        Mean   :5.950e+12   Mean   : 0.01513  
##                                        3rd Qu.:7.501e+12   3rd Qu.: 0.00000  
##                                        Max.   :1.750e+13   Max.   :30.00000  
##     Fecha               Hora              Marca            Fabricante       
##  Length:200625      Length:200625      Length:200625      Length:200625     
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##    Producto             Precio          Ult.Costo         Unidades     
##  Length:200625      Min.   :-147.00   Min.   :  0.38   Min.   : 0.200  
##  Class :character   1st Qu.:  11.00   1st Qu.:  8.46   1st Qu.: 1.000  
##  Mode  :character   Median :  16.00   Median : 12.31   Median : 1.000  
##                     Mean   :  19.42   Mean   : 15.31   Mean   : 1.262  
##                     3rd Qu.:  25.00   3rd Qu.: 19.23   3rd Qu.: 1.000  
##                     Max.   :1000.00   Max.   :769.23   Max.   :96.000  
##     F.Ticket      NombreDepartamento NombreFamilia      NombreCategoria   
##  Min.   :     1   Length:200625      Length:200625      Length:200625     
##  1st Qu.: 33964   Class :character   Class :character   Class :character  
##  Median :105993   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :193990                                                           
##  3rd Qu.:383005                                                           
##  Max.   :450040                                                           
##     Estado              Mts.2      Tipo.ubicación         Giro          
##  Length:200625      Min.   :47.0   Length:200625      Length:200625     
##  Class :character   1st Qu.:53.0   Class :character   Class :character  
##  Mode  :character   Median :60.0   Mode  :character   Mode  :character  
##                     Mean   :56.6                                        
##                     3rd Qu.:60.0                                        
##                     Max.   :62.0                                        
##  Hora.inicio        Hora.cierre       
##  Length:200625      Length:200625     
##  Class :character   Class :character  
##  Mode  :character   Mode  :character  
##                                       
##                                       
## 

Reemplazar Na con el promedio

 bd10 <- bd
  bd10$PLU[is.na(bd10$PLU)]<-mean(bd10$PLU, na.rm = TRUE)
  summary(bd10)
##  vcClaveTienda        DescGiro         Codigo.Barras            PLU        
##  Length:200625      Length:200625      Min.   :8.347e+05   Min.   : 1.000  
##  Class :character   Class :character   1st Qu.:7.501e+12   1st Qu.: 2.112  
##  Mode  :character   Mode  :character   Median :7.501e+12   Median : 2.112  
##                                        Mean   :5.950e+12   Mean   : 2.112  
##                                        3rd Qu.:7.501e+12   3rd Qu.: 2.112  
##                                        Max.   :1.750e+13   Max.   :30.000  
##     Fecha               Hora              Marca            Fabricante       
##  Length:200625      Length:200625      Length:200625      Length:200625     
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##    Producto             Precio          Ult.Costo         Unidades     
##  Length:200625      Min.   :-147.00   Min.   :  0.38   Min.   : 0.200  
##  Class :character   1st Qu.:  11.00   1st Qu.:  8.46   1st Qu.: 1.000  
##  Mode  :character   Median :  16.00   Median : 12.31   Median : 1.000  
##                     Mean   :  19.42   Mean   : 15.31   Mean   : 1.262  
##                     3rd Qu.:  25.00   3rd Qu.: 19.23   3rd Qu.: 1.000  
##                     Max.   :1000.00   Max.   :769.23   Max.   :96.000  
##     F.Ticket      NombreDepartamento NombreFamilia      NombreCategoria   
##  Min.   :     1   Length:200625      Length:200625      Length:200625     
##  1st Qu.: 33964   Class :character   Class :character   Class :character  
##  Median :105993   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :193990                                                           
##  3rd Qu.:383005                                                           
##  Max.   :450040                                                           
##     Estado              Mts.2      Tipo.ubicación         Giro          
##  Length:200625      Min.   :47.0   Length:200625      Length:200625     
##  Class :character   1st Qu.:53.0   Class :character   Class :character  
##  Mode  :character   Median :60.0   Mode  :character   Mode  :character  
##                     Mean   :56.6                                        
##                     3rd Qu.:60.0                                        
##                     Max.   :62.0                                        
##  Hora.inicio        Hora.cierre       
##  Length:200625      Length:200625     
##  Class :character   Class :character  
##  Mode  :character   Mode  :character  
##                                       
##                                       
## 

Reemplazar negativos con cero

bd11 <- bd
  bd11[bd11 < 0] <- 0
  summary(bd11)
##  vcClaveTienda        DescGiro         Codigo.Barras            PLU        
##  Length:200625      Length:200625      Min.   :8.347e+05   Min.   : 1.00   
##  Class :character   Class :character   1st Qu.:7.501e+12   1st Qu.: 1.00   
##  Mode  :character   Mode  :character   Median :7.501e+12   Median : 1.00   
##                                        Mean   :5.950e+12   Mean   : 2.11   
##                                        3rd Qu.:7.501e+12   3rd Qu.: 1.00   
##                                        Max.   :1.750e+13   Max.   :30.00   
##                                                            NA's   :199188  
##     Fecha               Hora              Marca            Fabricante       
##  Length:200625      Length:200625      Length:200625      Length:200625     
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##    Producto             Precio          Ult.Costo         Unidades     
##  Length:200625      Min.   :   0.00   Min.   :  0.38   Min.   : 0.200  
##  Class :character   1st Qu.:  11.00   1st Qu.:  8.46   1st Qu.: 1.000  
##  Mode  :character   Median :  16.00   Median : 12.31   Median : 1.000  
##                     Mean   :  19.44   Mean   : 15.31   Mean   : 1.262  
##                     3rd Qu.:  25.00   3rd Qu.: 19.23   3rd Qu.: 1.000  
##                     Max.   :1000.00   Max.   :769.23   Max.   :96.000  
##                                                                        
##     F.Ticket      NombreDepartamento NombreFamilia      NombreCategoria   
##  Min.   :     1   Length:200625      Length:200625      Length:200625     
##  1st Qu.: 33964   Class :character   Class :character   Class :character  
##  Median :105993   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :193990                                                           
##  3rd Qu.:383005                                                           
##  Max.   :450040                                                           
##                                                                           
##     Estado              Mts.2      Tipo.ubicación         Giro          
##  Length:200625      Min.   :47.0   Length:200625      Length:200625     
##  Class :character   1st Qu.:53.0   Class :character   Class :character  
##  Mode  :character   Median :60.0   Mode  :character   Mode  :character  
##                     Mean   :56.6                                        
##                     3rd Qu.:60.0                                        
##                     Max.   :62.0                                        
##                                                                         
##  Hora.inicio        Hora.cierre       
##  Length:200625      Length:200625     
##  Class :character   Class :character  
##  Mode  :character   Mode  :character  
##                                       
##                                       
##                                       
## 

Tecnica 6. metodo estadistico

bd12 <- bd7
  boxplot(bd12$Precio, horizontal = TRUE)

  boxplot(bd12$Unidades, horizontal = TRUE)

Agregar columnas

library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
  bd12$dia_de_la_semana <- wday(bd12$Fecha)
  bd12$subtotal <- bd12$Precio * bd12$Unidades
  bd12$Utilidad <- bd12$Precio - bd12$Ult.Costo
  summary(bd12)
##  vcClaveTienda        DescGiro             Fecha                 Hora       
##  Length:200620      Length:200620      Min.   :0020-05-01   Min.   : 1.000  
##  Class :character   Class :character   1st Qu.:0020-06-06   1st Qu.: 5.000  
##  Mode  :character   Mode  :character   Median :0020-07-11   Median : 8.000  
##                                        Mean   :0020-07-18   Mean   : 7.299  
##                                        3rd Qu.:0020-08-29   3rd Qu.:10.000  
##                                        Max.   :0020-11-11   Max.   :12.000  
##     Marca            Fabricante          Producto             Precio       
##  Length:200620      Length:200620      Length:200620      Min.   :   0.50  
##  Class :character   Class :character   Class :character   1st Qu.:  11.00  
##  Mode  :character   Mode  :character   Mode  :character   Median :  16.00  
##                                                           Mean   :  19.45  
##                                                           3rd Qu.:  25.00  
##                                                           Max.   :1000.00  
##    Ult.Costo         Unidades         F.Ticket      NombreDepartamento
##  Min.   :  0.38   Min.   : 1.000   Min.   :     1   Length:200620     
##  1st Qu.:  8.46   1st Qu.: 1.000   1st Qu.: 33967   Class :character  
##  Median : 12.31   Median : 1.000   Median :105996   Mode  :character  
##  Mean   : 15.31   Mean   : 1.262   Mean   :193994                     
##  3rd Qu.: 19.23   3rd Qu.: 1.000   3rd Qu.:383009                     
##  Max.   :769.23   Max.   :96.000   Max.   :450040                     
##  NombreFamilia      NombreCategoria       Estado              Mts.2     
##  Length:200620      Length:200620      Length:200620      Min.   :47.0  
##  Class :character   Class :character   Class :character   1st Qu.:53.0  
##  Mode  :character   Mode  :character   Mode  :character   Median :60.0  
##                                                           Mean   :56.6  
##                                                           3rd Qu.:60.0  
##                                                           Max.   :62.0  
##  Tipo.ubicación         Giro           Hora.inicio        Hora.cierre       
##  Length:200620      Length:200620      Length:200620      Length:200620     
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##  dia_de_la_semana    subtotal          Utilidad      
##  Min.   :1.000    Min.   :   1.00   Min.   :  0.000  
##  1st Qu.:2.000    1st Qu.:  12.00   1st Qu.:  2.310  
##  Median :4.000    Median :  18.00   Median :  3.230  
##  Mean   :3.912    Mean   :  24.33   Mean   :  4.142  
##  3rd Qu.:6.000    3rd Qu.:  27.00   3rd Qu.:  5.420  
##  Max.   :7.000    Max.   :2496.00   Max.   :230.770

Exportar base de datos limpia

bd_limpia <- bd12
        write.csv(bd_limpia,file = "abarrotes_bd_limpia2.csv", row.names = FALSE)

#Market Basket Analysis

library(Matrix)
      library(datasets)

Ordenar de menor a mayor los Tickets

 bd_limpia <- bd_limpia[order(bd_limpia$F.Ticket),]
      head(bd_limpia)
##   vcClaveTienda  DescGiro      Fecha Hora                      Marca
## 1         MX001 Abarrotes 0020-06-19    8                NUTRI LECHE
## 2         MX001 Abarrotes 0020-06-19    8                     DAN UP
## 3         MX001 Abarrotes 0020-06-19    8                      BIMBO
## 4         MX001 Abarrotes 0020-06-19    8                      PEPSI
## 5         MX001 Abarrotes 0020-06-19    8 BLANCA NIEVES (DETERGENTE)
## 6         MX001 Abarrotes 0020-06-19    8                      FLASH
##                   Fabricante                           Producto Precio
## 1                    MEXILAC                Nutri Leche 1 Litro   16.0
## 2           DANONE DE MEXICO DANUP STRAWBERRY P/BEBER 350GR NAL   14.0
## 3                GRUPO BIMBO                Rebanadas Bimbo 2Pz    5.0
## 4        PEPSI-COLA MEXICANA                   Pepsi N.R. 400Ml    8.0
## 5 FABRICA DE JABON LA CORONA      Detergente Blanca Nieves 500G   19.5
## 6                       ALEN      Flash Xtra Brisa Marina 500Ml    9.5
##   Ult.Costo Unidades F.Ticket NombreDepartamento          NombreFamilia
## 1     12.31        1        1          Abarrotes Lacteos y Refrigerados
## 2     14.00        1        2          Abarrotes Lacteos y Refrigerados
## 3      5.00        1        3          Abarrotes         Pan y Tortilla
## 4      8.00        1        3          Abarrotes                Bebidas
## 5     15.00        1        4          Abarrotes     Limpieza del Hogar
## 6      7.31        1        4          Abarrotes     Limpieza del Hogar
##             NombreCategoria     Estado Mts.2 Tipo.ubicación      Giro
## 1                     Leche Nuevo León    60        Esquina Abarrotes
## 2                    Yogurt Nuevo León    60        Esquina Abarrotes
## 3     Pan Dulce Empaquetado Nuevo León    60        Esquina Abarrotes
## 4 Refrescos Plástico (N.R.) Nuevo León    60        Esquina Abarrotes
## 5                Lavandería Nuevo León    60        Esquina Abarrotes
## 6      Limpiadores Líquidos Nuevo León    60        Esquina Abarrotes
##   Hora.inicio Hora.cierre dia_de_la_semana subtotal Utilidad
## 1        8:00       22:00                6     16.0     3.69
## 2        8:00       22:00                6     14.0     0.00
## 3        8:00       22:00                6      5.0     0.00
## 4        8:00       22:00                6      8.0     0.00
## 5        8:00       22:00                6     19.5     4.50
## 6        8:00       22:00                6      9.5     2.19
        tail(bd_limpia)
##        vcClaveTienda   DescGiro      Fecha Hora          Marca
## 107394         MX004 Carnicería 0020-10-15   11         YEMINA
## 167771         MX004 Carnicería 0020-10-15   11     DEL FUERTE
## 149429         MX004 Carnicería 0020-10-15   11 COCA COLA ZERO
## 168750         MX004 Carnicería 0020-10-15   11       DIAMANTE
## 161193         MX004 Carnicería 0020-10-15   12          PEPSI
## 112970         MX004 Carnicería 0020-10-15   12      COCA COLA
##                  Fabricante                       Producto Precio Ult.Costo
## 107394               HERDEZ    PASTA SPAGHETTI YEMINA 200G      7      5.38
## 167771 ALIMENTOS DEL FUERTE PURE DE TOMATE DEL FUERTE 345G     12      9.23
## 149429            COCA COLA           COCA COLA ZERO 600ML     15     11.54
## 168750           EMPACADOS              ARROZ DIAMANTE225G     11      8.46
## 161193  PEPSI-COLA MEXICANA              PEPSI N. R. 500ML     10      7.69
## 112970            COCA COLA     COCA COLA RETORNABLE 500ML     10      7.69
##        Unidades F.Ticket NombreDepartamento        NombreFamilia
## 107394        2   450032          Abarrotes       Sopas y Pastas
## 167771        1   450032          Abarrotes Salsas y Sazonadores
## 149429        2   450034          Abarrotes              Bebidas
## 168750        1   450037          Abarrotes    Granos y Semillas
## 161193        1   450039          Abarrotes              Bebidas
## 112970        8   450040          Abarrotes              Bebidas
##                      NombreCategoria  Estado Mts.2 Tipo.ubicación      Giro
## 107394 Fideos, Spaguetti, Tallarines Sinaloa    53        Esquina Abarrotes
## 167771          Salsa para Spaguetti Sinaloa    53        Esquina Abarrotes
## 149429         Refrescos Retornables Sinaloa    53        Esquina Abarrotes
## 168750                         Arroz Sinaloa    53        Esquina Abarrotes
## 161193     Refrescos Plástico (N.R.) Sinaloa    53        Esquina Abarrotes
## 112970         Refrescos Retornables Sinaloa    53        Esquina Abarrotes
##        Hora.inicio Hora.cierre dia_de_la_semana subtotal Utilidad
## 107394        7:00       23:00                5       14     1.62
## 167771        7:00       23:00                5       12     2.77
## 149429        7:00       23:00                5       30     3.46
## 168750        7:00       23:00                5       11     2.54
## 161193        7:00       23:00                5       10     2.31
## 112970        7:00       23:00                5       80     2.31

Generar Basket

library(plyr)
## ------------------------------------------------------------------------------
## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)
## ------------------------------------------------------------------------------
## 
## Attaching package: 'plyr'
## The following objects are masked from 'package:dplyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize
        basket <- ddply(bd_limpia,c("F.Ticket"), function(bd_limpia)paste(bd_limpia$Marca, collapse = ","))

Eliminar Número de Ticket

basket$F.Ticket <- NULL #con esto eliminas columnas

Renombramos el nombre de columna

 colnames(basket) <- c("marca")

Exportamos basket

 write.csv(basket,"basket.csv", quote = FALSE, row.names = FALSE)
      library(arulesViz)
## Loading required package: arules
## 
## Attaching package: 'arules'
## The following object is masked from 'package:dplyr':
## 
##     recode
## The following objects are masked from 'package:base':
## 
##     abbreviate, write
        tr <- read.transactions("C:\\Users\\chema\\Downloads\\basket.csv")
## Warning in asMethod(object): removing duplicated items in transactions
        reglas.asociacion <- apriori(tr, parameter = list(supp=0.001, conf=0.2, maxlen=10))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.2    0.1    1 none FALSE            TRUE       5   0.001      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 115 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[27312 item(s), 115111 transaction(s)] done [0.18s].
## sorting and recoding items ... [191 item(s)] done [0.01s].
## creating transaction tree ... done [0.04s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [139 rule(s)] done [0.00s].
## creating S4 object  ... done [0.01s].
        #summary(reglas.asociacion)
        #inspect(reglas.asociacion)
        
        reglas.asociacion <- sort(reglas.asociacion, by="confidence", decreasing = TRUE)
        #summary(reglas.asociacion)
        #inspect(reglas.asociacion)
        
        top10reglas <- head(reglas.asociacion, n = 10, by= "confidence")
        plot(top10reglas, method = "graph", engine = "htmlwidget")

#Conclusión

En este análisis de Market Basket, obtuvimos una gráfica que nos sirvió para relacionar las ventas de los distintos productos.