Installing libraries

#install.packages("tidyverse")
#install.packages("dplyr")
#install.packages("ggplot2")
#install.packages("readxl")
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(ggplot2)
library(readxl)
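
The install.packages() calls above are commented out, so on a fresh machine they have to be re-enabled by hand. As a minimal sketch, the helper below reports which packages are still missing. The name missing_packages is hypothetical (it is not part of any package), and the package list is assumed from the library() calls used in this report.

```r
# Hypothetical helper: return the entries of `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Package names assumed from the library() calls in this report.
needed <- c("tidyverse", "readxl", "janitor")
to_install <- missing_packages(needed)

# Uncomment to install only what is missing:
# if (length(to_install) > 0) install.packages(to_install)
```

Checking first avoids reinstalling packages every time the document is knitted.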

Importing and inspecting the dataset

library(readxl)
Abarrotes_Ventas_2 <- read_excel("D:/Lesly Gómez/Descargas/Abarrotes_Ventas-2.xlsx")
View(Abarrotes_Ventas_2)
bd<-Abarrotes_Ventas_2
str(bd)
## tibble [200,620 × 22] (S3: tbl_df/tbl/data.frame)
##  $ vcClaveTienda     : chr [1:200620] "MX001" "MX001" "MX001" "MX001" ...
##  $ DescGiro          : chr [1:200620] "Abarrotes" "Abarrotes" "Abarrotes" "Abarrotes" ...
##  $ Codigo Barras     : num [1:200620] 7.5e+12 7.5e+12 7.5e+12 7.5e+12 7.5e+12 ...
##  $ PLU               : logi [1:200620] NA NA NA NA NA NA ...
##  $ Fecha             : POSIXct[1:200620], format: "2020-06-19" "2020-06-19" ...
##  $ Hora              : POSIXct[1:200620], format: "1899-12-31 08:16:21" "1899-12-31 08:23:33" ...
##  $ Marca             : chr [1:200620] "NUTRI LECHE" "DAN UP" "BIMBO" "PEPSI" ...
##  $ Fabricante        : chr [1:200620] "MEXILAC" "DANONE DE MEXICO" "GRUPO BIMBO" "PEPSI-COLA MEXICANA" ...
##  $ Producto          : chr [1:200620] "Nutri Leche 1 Litro" "DANUP STRAWBERRY P/BEBER 350GR NAL" "Rebanadas Bimbo 2Pz" "Pepsi N.R. 400Ml" ...
##  $ Precio            : num [1:200620] 16 14 5 8 19.5 9.5 11 9.5 23.5 12 ...
##  $ Ult.Costo         : num [1:200620] 12.3 14 5 8 15 ...
##  $ Unidades          : num [1:200620] 1 1 1 1 1 1 1 1 1 1 ...
##  $ F.Ticket          : num [1:200620] 1 2 3 3 4 4 4 4 4 5 ...
##  $ NombreDepartamento: chr [1:200620] "Abarrotes" "Abarrotes" "Abarrotes" "Abarrotes" ...
##  $ NombreFamilia     : chr [1:200620] "Lacteos y Refrigerados" "Lacteos y Refrigerados" "Pan y Tortilla" "Bebidas" ...
##  $ NombreCategoria   : chr [1:200620] "Leche" "Yogurt" "Pan Dulce Empaquetado" "Refrescos Plástico (N.R.)" ...
##  $ Estado            : chr [1:200620] "Nuevo León" "Nuevo León" "Nuevo León" "Nuevo León" ...
##  $ Mts 2             : num [1:200620] 60 60 60 60 60 60 60 60 60 60 ...
##  $ Tipo.ubicación    : chr [1:200620] "Esquina" "Esquina" "Esquina" "Esquina" ...
##  $ Giro              : chr [1:200620] "Abarrotes" "Abarrotes" "Abarrotes" "Abarrotes" ...
##  $ Hora inicio       : POSIXct[1:200620], format: "1899-12-31 08:00:00" "1899-12-31 08:00:00" ...
##  $ Hora cierre       : POSIXct[1:200620], format: "1899-12-31 22:00:00" "1899-12-31 22:00:00" ...
summary(bd)
##  vcClaveTienda        DescGiro         Codigo Barras         PLU         
##  Length:200620      Length:200620      Min.   :8.347e+05   Mode:logical  
##  Class :character   Class :character   1st Qu.:7.501e+12   TRUE:1437     
##  Mode  :character   Mode  :character   Median :7.501e+12   NA's:199183   
##                                        Mean   :5.950e+12                 
##                                        3rd Qu.:7.501e+12                 
##                                        Max.   :1.750e+13                 
##      Fecha                             Hora                       
##  Min.   :2020-05-01 00:00:00.00   Min.   :1899-12-31 00:00:00.00  
##  1st Qu.:2020-06-06 00:00:00.00   1st Qu.:1899-12-31 13:12:42.75  
##  Median :2020-07-11 00:00:00.00   Median :1899-12-31 17:35:59.00  
##  Mean   :2020-07-18 22:35:49.58   Mean   :1899-12-31 16:43:52.05  
##  3rd Qu.:2020-08-29 00:00:00.00   3rd Qu.:1899-12-31 20:47:06.00  
##  Max.   :2020-11-11 00:00:00.00   Max.   :1899-12-31 23:59:59.00  
##     Marca            Fabricante          Producto             Precio       
##  Length:200620      Length:200620      Length:200620      Min.   :-147.00  
##  Class :character   Class :character   Class :character   1st Qu.:  11.00  
##  Mode  :character   Mode  :character   Mode  :character   Median :  16.00  
##                                                           Mean   :  19.42  
##                                                           3rd Qu.:  25.00  
##                                                           Max.   :1000.00  
##    Ult.Costo         Unidades         F.Ticket      NombreDepartamento
##  Min.   :  0.38   Min.   : 0.200   Min.   :     1   Length:200620     
##  1st Qu.:  8.46   1st Qu.: 1.000   1st Qu.: 33967   Class :character  
##  Median : 12.31   Median : 1.000   Median :105996   Mode  :character  
##  Mean   : 15.31   Mean   : 1.262   Mean   :193994                     
##  3rd Qu.: 19.23   3rd Qu.: 1.000   3rd Qu.:383009                     
##  Max.   :769.23   Max.   :96.000   Max.   :450040                     
##  NombreFamilia      NombreCategoria       Estado              Mts 2     
##  Length:200620      Length:200620      Length:200620      Min.   :47.0  
##  Class :character   Class :character   Class :character   1st Qu.:53.0  
##  Mode  :character   Mode  :character   Mode  :character   Median :60.0  
##                                                           Mean   :56.6  
##                                                           3rd Qu.:60.0  
##                                                           Max.   :62.0  
##  Tipo.ubicación         Giro            Hora inicio                    
##  Length:200620      Length:200620      Min.   :1899-12-31 07:00:00.00  
##  Class :character   Class :character   1st Qu.:1899-12-31 07:00:00.00  
##  Mode  :character   Mode  :character   Median :1899-12-31 08:00:00.00  
##                                        Mean   :1899-12-31 07:35:49.71  
##                                        3rd Qu.:1899-12-31 08:00:00.00  
##                                        Max.   :1899-12-31 09:00:00.00  
##   Hora cierre                    
##  Min.   :1899-12-31 21:00:00.00  
##  1st Qu.:1899-12-31 22:00:00.00  
##  Median :1899-12-31 22:00:00.00  
##  Mean   :1899-12-31 22:23:11.42  
##  3rd Qu.:1899-12-31 23:00:00.00  
##  Max.   :1899-12-31 23:00:00.00

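Observations:
The PLU variable has 199,183 NAs; only 1,437 rows hold a value.
The Fecha variable is already stored as a date-time (POSIXct).
The Hora variable is stored as a time of day (POSIXct, attached to the placeholder date 1899-12-31).
The Precio variable contains negative values (minimum -147).
The Unidades variable contains fractional values (minimum 0.2).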
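
The observations above were read off the summary() output by eye. A small sketch of how the same checks could be made explicit follows; the toy data frame is invented for illustration and only borrows the column names from bd.

```r
# Toy data standing in for bd (illustrative values, not taken from the dataset).
toy <- data.frame(
  PLU      = c(NA, TRUE, NA, NA),
  Precio   = c(16, -147, 5, 19.5),
  Unidades = c(1, 0.2, 2, 1)
)

n_na_plu     <- sum(is.na(toy$PLU))                       # missing PLU values
n_neg_price  <- sum(toy$Precio < 0)                       # negative prices
n_frac_units <- sum(toy$Unidades != floor(toy$Unidades))  # non-integer quantities

c(na_plu = n_na_plu, negative_prices = n_neg_price, fractional_units = n_frac_units)
```

Counting the problem rows makes it easier to decide later whether to drop them or repair them.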

Dataset analysis

count(bd)
## # A tibble: 1 × 1
##        n
##    <int>
## 1 200620
count(bd,vcClaveTienda,sort = TRUE)
## # A tibble: 5 × 2
##   vcClaveTienda     n
##   <chr>         <int>
## 1 MX001         96464
## 2 MX004         83455
## 3 MX005         10021
## 4 MX002          6629
## 5 MX003          4051
count(bd,DescGiro,sort = TRUE)
## # A tibble: 3 × 2
##   DescGiro        n
##   <chr>       <int>
## 1 Abarrotes  100515
## 2 Carnicería  83455
## 3 Depósito    16650
count(bd,Marca,sort = TRUE)
## # A tibble: 540 × 2
##    Marca           n
##    <chr>       <int>
##  1 COCA COLA   18686
##  2 PEPSI       15966
##  3 TECATE      11674
##  4 BIMBO        8316
##  5 LALA         5866
##  6 MARINELA     3696
##  7 DORITOS      3142
##  8 CHEETOS      3130
##  9 NUTRI LECHE  3127
## 10 MARLBORO     2579
## # ℹ 530 more rows
count(bd,Fabricante,sort =TRUE)
## # A tibble: 241 × 2
##    Fabricante                          n
##    <chr>                           <int>
##  1 COCA COLA                       27519
##  2 PEPSI-COLA MEXICANA             22415
##  3 SABRITAS                        14296
##  4 CERVECERIA CUAUHTEMOC MOCTEZUMA 13681
##  5 GRUPO BIMBO                     13077
##  6 SIGMA ALIMENTOS                  8014
##  7 GRUPO INDUSTRIAL LALA            5868
##  8 GRUPO GAMESA                     5527
##  9 NESTLE                           3698
## 10 JUGOS DEL VALLE S.A. DE C.V.     3581
## # ℹ 231 more rows
count(bd,Producto,sort =TRUE)
## # A tibble: 3,404 × 2
##    Producto                        n
##    <chr>                       <int>
##  1 Pepsi N.R. 1.5L              5108
##  2 Coca Cola Retornable 2.5L    3771
##  3 Caguamon Tecate Light 1.2Lt  3471
##  4 Pepsi N. R. 2.5L             2899
##  5 Cerveza Tecate Light 340Ml   2619
##  6 Cerveza Tecate Light 16Oz    2315
##  7 Coca Cola Retornable 1.5L    2124
##  8 Pepsi N.R. 3L                1832
##  9 Coca Cola Retornable 500Ml   1659
## 10 PEPSI N.R. 1.5L              1631
## # ℹ 3,394 more rows
count(bd,NombreDepartamento,sort =TRUE)
## # A tibble: 9 × 2
##   NombreDepartamento        n
##   <chr>                 <int>
## 1 Abarrotes            198274
## 2 Bebes e Infantiles     1483
## 3 Ferretería              377
## 4 Farmacia                255
## 5 Vinos y Licores         104
## 6 Papelería                74
## 7 Mercería                 44
## 8 Productos a Eliminar      8
## 9 Carnes                    1
count(bd,NombreFamilia,sort =TRUE)
## # A tibble: 51 × 2
##    NombreFamilia              n
##    <chr>                  <int>
##  1 Bebidas                64917
##  2 Botanas                21583
##  3 Lacteos y Refrigerados 17657
##  4 Cerveza                14017
##  5 Pan y Tortilla         10501
##  6 Limpieza del Hogar      8723
##  7 Galletas                7487
##  8 Cigarros                6817
##  9 Cuidado Personal        5433
## 10 Salsas y Sazonadores    5320
## # ℹ 41 more rows
count(bd,NombreCategoria,sort =TRUE)
## # A tibble: 174 × 2
##    NombreCategoria               n
##    <chr>                     <int>
##  1 Refrescos Plástico (N.R.) 32861
##  2 Refrescos Retornables     13880
##  3 Frituras                  11082
##  4 Lata                       8150
##  5 Leche                      7053
##  6 Cajetilla                  6329
##  7 Botella                    5867
##  8 Productos sin Categoria    5455
##  9 Papas Fritas               5344
## 10 Jugos y Néctares           5295
## # ℹ 164 more rows
count(bd,Estado,sort =TRUE)
## # A tibble: 5 × 2
##   Estado           n
##   <chr>        <int>
## 1 Nuevo León   96464
## 2 Sinaloa      83455
## 3 Quintana Roo 10021
## 4 Jalisco       6629
## 5 Chiapas       4051
count(bd,Tipo.ubicación,sort =TRUE)
## # A tibble: 3 × 2
##   Tipo.ubicación      n
##   <chr>           <int>
## 1 Esquina        189940
## 2 Rotonda          6629
## 3 Entre calles     4051
tibble(bd)
## # A tibble: 200,620 × 22
##    vcClaveTienda DescGiro  `Codigo Barras` PLU   Fecha              
##    <chr>         <chr>               <dbl> <lgl> <dttm>             
##  1 MX001         Abarrotes   7501020540666 NA    2020-06-19 00:00:00
##  2 MX001         Abarrotes   7501032397906 NA    2020-06-19 00:00:00
##  3 MX001         Abarrotes   7501000112845 NA    2020-06-19 00:00:00
##  4 MX001         Abarrotes   7501031302741 NA    2020-06-19 00:00:00
##  5 MX001         Abarrotes   7501026027543 NA    2020-06-19 00:00:00
##  6 MX001         Abarrotes   7501025433024 NA    2020-06-19 00:00:00
##  7 MX001         Abarrotes   7501032332013 NA    2020-06-19 00:00:00
##  8 MX001         Abarrotes   7501026005688 NA    2020-06-19 00:00:00
##  9 MX001         Abarrotes   7506195178188 NA    2020-06-19 00:00:00
## 10 MX001         Abarrotes     32239052017 NA    2020-06-19 00:00:00
## # ℹ 200,610 more rows
## # ℹ 17 more variables: Hora <dttm>, Marca <chr>, Fabricante <chr>,
## #   Producto <chr>, Precio <dbl>, Ult.Costo <dbl>, Unidades <dbl>,
## #   F.Ticket <dbl>, NombreDepartamento <chr>, NombreFamilia <chr>,
## #   NombreCategoria <chr>, Estado <chr>, `Mts 2` <dbl>, Tipo.ubicación <chr>,
## #   Giro <chr>, `Hora inicio` <dttm>, `Hora cierre` <dttm>
head(bd)
## # A tibble: 6 × 22
##   vcClaveTienda DescGiro  `Codigo Barras` PLU   Fecha              
##   <chr>         <chr>               <dbl> <lgl> <dttm>             
## 1 MX001         Abarrotes   7501020540666 NA    2020-06-19 00:00:00
## 2 MX001         Abarrotes   7501032397906 NA    2020-06-19 00:00:00
## 3 MX001         Abarrotes   7501000112845 NA    2020-06-19 00:00:00
## 4 MX001         Abarrotes   7501031302741 NA    2020-06-19 00:00:00
## 5 MX001         Abarrotes   7501026027543 NA    2020-06-19 00:00:00
## 6 MX001         Abarrotes   7501025433024 NA    2020-06-19 00:00:00
## # ℹ 17 more variables: Hora <dttm>, Marca <chr>, Fabricante <chr>,
## #   Producto <chr>, Precio <dbl>, Ult.Costo <dbl>, Unidades <dbl>,
## #   F.Ticket <dbl>, NombreDepartamento <chr>, NombreFamilia <chr>,
## #   NombreCategoria <chr>, Estado <chr>, `Mts 2` <dbl>, Tipo.ubicación <chr>,
## #   Giro <chr>, `Hora inicio` <dttm>, `Hora cierre` <dttm>
tail (bd)
## # A tibble: 6 × 22
##   vcClaveTienda DescGiro `Codigo Barras` PLU   Fecha              
##   <chr>         <chr>              <dbl> <lgl> <dttm>             
## 1 MX005         Depósito   7622210464811 NA    2020-07-12 00:00:00
## 2 MX005         Depósito   7622210464811 NA    2020-10-23 00:00:00
## 3 MX005         Depósito   7622210464811 NA    2020-10-10 00:00:00
## 4 MX005         Depósito   7622210464811 NA    2020-10-10 00:00:00
## 5 MX005         Depósito   7622210464811 NA    2020-06-27 00:00:00
## 6 MX005         Depósito   7622210464811 NA    2020-06-26 00:00:00
## # ℹ 17 more variables: Hora <dttm>, Marca <chr>, Fabricante <chr>,
## #   Producto <chr>, Precio <dbl>, Ult.Costo <dbl>, Unidades <dbl>,
## #   F.Ticket <dbl>, NombreDepartamento <chr>, NombreFamilia <chr>,
## #   NombreCategoria <chr>, Estado <chr>, `Mts 2` <dbl>, Tipo.ubicación <chr>,
## #   Giro <chr>, `Hora inicio` <dttm>, `Hora cierre` <dttm>
#install.packages("janitor")
library(janitor)
## 
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
tabyl(bd, vcClaveTienda, NombreDepartamento)
##  vcClaveTienda Abarrotes Bebes e Infantiles Carnes Farmacia Ferretería Mercería
##          MX001     95410                515      1      147        245       28
##          MX002      6590                 21      0        4         10        0
##          MX003      4026                 15      0        2          8        0
##          MX004     82234                932      0      102        114       16
##          MX005     10014                  0      0        0          0        0
##  Papelería Productos a Eliminar Vinos y Licores
##         35                    3              80
##          0                    0               4
##          0                    0               0
##         32                    5              20
##          7                    0               0
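
For comparison, a store-by-department cross-tabulation like the tabyl() call above can also be sketched in base R with table(). The toy data below is invented; only the column names come from bd.

```r
# Toy data (illustrative values only).
toy <- data.frame(
  vcClaveTienda      = c("MX001", "MX001", "MX002", "MX004", "MX004"),
  NombreDepartamento = c("Abarrotes", "Farmacia", "Abarrotes", "Abarrotes", "Abarrotes")
)

# Rows are stores, columns are departments, cells are record counts.
counts <- table(toy$vcClaveTienda, toy$NombreDepartamento)

# Share of each department within a store (each row sums to 1).
row_share <- prop.table(counts, margin = 1)

counts
row_share
```

Row shares help when stores differ a lot in size, as MX001 and MX003 do in this dataset.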

Data cleaning techniques

Technique 1. Remove irrelevant values

Removing columns. First solution: drop the PLU column (a drastic solution)

bd1 <- bd
bd1 <- subset(bd1,select = -c(PLU))
summary(bd1)
##  vcClaveTienda        DescGiro         Codigo Barras      
##  Length:200620      Length:200620      Min.   :8.347e+05  
##  Class :character   Class :character   1st Qu.:7.501e+12  
##  Mode  :character   Mode  :character   Median :7.501e+12  
##                                        Mean   :5.950e+12  
##                                        3rd Qu.:7.501e+12  
##                                        Max.   :1.750e+13  
##      Fecha                             Hora                       
##  Min.   :2020-05-01 00:00:00.00   Min.   :1899-12-31 00:00:00.00  
##  1st Qu.:2020-06-06 00:00:00.00   1st Qu.:1899-12-31 13:12:42.75  
##  Median :2020-07-11 00:00:00.00   Median :1899-12-31 17:35:59.00  
##  Mean   :2020-07-18 22:35:49.58   Mean   :1899-12-31 16:43:52.05  
##  3rd Qu.:2020-08-29 00:00:00.00   3rd Qu.:1899-12-31 20:47:06.00  
##  Max.   :2020-11-11 00:00:00.00   Max.   :1899-12-31 23:59:59.00  
##     Marca            Fabricante          Producto             Precio       
##  Length:200620      Length:200620      Length:200620      Min.   :-147.00  
##  Class :character   Class :character   Class :character   1st Qu.:  11.00  
##  Mode  :character   Mode  :character   Mode  :character   Median :  16.00  
##                                                           Mean   :  19.42  
##                                                           3rd Qu.:  25.00  
##                                                           Max.   :1000.00  
##    Ult.Costo         Unidades         F.Ticket      NombreDepartamento
##  Min.   :  0.38   Min.   : 0.200   Min.   :     1   Length:200620     
##  1st Qu.:  8.46   1st Qu.: 1.000   1st Qu.: 33967   Class :character  
##  Median : 12.31   Median : 1.000   Median :105996   Mode  :character  
##  Mean   : 15.31   Mean   : 1.262   Mean   :193994                     
##  3rd Qu.: 19.23   3rd Qu.: 1.000   3rd Qu.:383009                     
##  Max.   :769.23   Max.   :96.000   Max.   :450040                     
##  NombreFamilia      NombreCategoria       Estado              Mts 2     
##  Length:200620      Length:200620      Length:200620      Min.   :47.0  
##  Class :character   Class :character   Class :character   1st Qu.:53.0  
##  Mode  :character   Mode  :character   Mode  :character   Median :60.0  
##                                                           Mean   :56.6  
##                                                           3rd Qu.:60.0  
##                                                           Max.   :62.0  
##  Tipo.ubicación         Giro            Hora inicio                    
##  Length:200620      Length:200620      Min.   :1899-12-31 07:00:00.00  
##  Class :character   Class :character   1st Qu.:1899-12-31 07:00:00.00  
##  Mode  :character   Mode  :character   Median :1899-12-31 08:00:00.00  
##                                        Mean   :1899-12-31 07:35:49.71  
##                                        3rd Qu.:1899-12-31 08:00:00.00  
##                                        Max.   :1899-12-31 09:00:00.00  
##   Hora cierre                    
##  Min.   :1899-12-31 21:00:00.00  
##  1st Qu.:1899-12-31 22:00:00.00  
##  Median :1899-12-31 22:00:00.00  
##  Mean   :1899-12-31 22:23:11.42  
##  3rd Qu.:1899-12-31 23:00:00.00  
##  Max.   :1899-12-31 23:00:00.00

subset() extracts part of a data frame; the minus sign in select = -c(...) drops the listed columns.
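
A minimal sketch of the column-dropping idiom on invented toy data, with an equivalent form that selects by name instead of relying on the minus sign:

```r
# Toy data (illustrative values only).
toy <- data.frame(
  vcClaveTienda = c("MX001", "MX002"),
  PLU           = c(NA, TRUE),
  Precio        = c(16, 14)
)

# A minus sign inside select drops the named columns.
without_plu <- subset(toy, select = -c(PLU))

# Equivalent form: keep every column whose name is not "PLU".
without_plu2 <- toy[, setdiff(names(toy), "PLU"), drop = FALSE]

names(without_plu)
```

The second form is handy when the column names to drop are held in a character vector.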

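Removing rows. Second solution: drop rows whose Precio is not positive. (PLU was already dropped from bd1, so rows can no longer be filtered on its NAs.)

bd2 <- bd1
bd2 <- bd2[bd2$Precio > 0, ]
summary(bd1$Precio)  # summarizes bd1, the unfiltered table, so the negative minimum still appears below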
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -147.00   11.00   16.00   19.42   25.00 1000.00
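
The summary printed above still has a negative minimum because it describes bd1, not the filtered bd2. A short sketch on invented values shows the filter and a check made on the filtered object:

```r
# Toy data (illustrative values only).
toy <- data.frame(Precio = c(16, -147, 5, 0, 19.5))

# Keep only rows with a strictly positive price.
toy_pos <- toy[toy$Precio > 0, , drop = FALSE]

# Summarize the filtered object to confirm the change took effect.
summary(toy_pos$Precio)

# Number of rows removed by the filter.
nrow(toy) - nrow(toy_pos)
```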

Technique 2. Remove duplicate values

How many duplicate rows/records do we have?

bd2[duplicated(bd2),] 
## # A tibble: 0 × 21
## # ℹ 21 variables: vcClaveTienda <chr>, DescGiro <chr>, Codigo Barras <dbl>,
## #   Fecha <dttm>, Hora <dttm>, Marca <chr>, Fabricante <chr>, Producto <chr>,
## #   Precio <dbl>, Ult.Costo <dbl>, Unidades <dbl>, F.Ticket <dbl>,
## #   NombreDepartamento <chr>, NombreFamilia <chr>, NombreCategoria <chr>,
## #   Estado <chr>, Mts 2 <dbl>, Tipo.ubicación <chr>, Giro <chr>,
## #   Hora inicio <dttm>, Hora cierre <dttm>
sum(duplicated(bd2)) 
## [1] 0

Remove duplicate records

bd3 <- bd2
library(dplyr)
bd3 <- distinct(bd3)

dplyr: performs common data-manipulation operations such as filtering rows, selecting specific columns, reordering rows, adding new columns, and summarizing data.
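
Since bd2 has no duplicates, the calls above have no visible effect. The sketch below uses invented data with one repeated row to show what each call returns; unique() is used here as a base R stand-in that behaves comparably to distinct() on whole rows.

```r
# Toy data with one exact duplicate (rows 2 and 3).
toy <- data.frame(
  F.Ticket = c(1, 2, 2, 3),
  Producto = c("Leche", "Pan", "Pan", "Leche")
)

dup_rows <- toy[duplicated(toy), ]  # the repeated rows (first occurrences excluded)
n_dup    <- sum(duplicated(toy))    # how many rows are repeats
toy_uniq <- unique(toy)             # table with the repeats removed

n_dup
toy_uniq
```

Note that duplicated() compares entire rows, so two sales of the same product on different tickets are not treated as duplicates.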

Technique 3. Typos and similar errors

Solution 1: Take the absolute value of prices (to handle the negative values, assuming they are data-entry slips)

bd4 <- bd1
bd4$Precio <- abs(bd4$Precio)
summary(bd4$Precio)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.50   11.00   16.00   19.45   25.00 1000.00

All prices are now positive.
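
Because abs() overwrites the sign silently, it may be worth setting the affected rows aside first. A sketch on invented values, assuming (as the text does) that negative prices are entry errors rather than refunds:

```r
# Toy data (illustrative values only).
toy <- data.frame(
  Producto = c("A", "B", "C"),
  Precio   = c(16, -147, 0.5)
)

# Keep a copy of the rows about to be changed, for later review.
negative_rows <- toy[toy$Precio < 0, ]

# Replace every price by its absolute value.
toy$Precio <- abs(toy$Precio)

negative_rows
summary(toy$Precio)
```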

Note on the duplicate check in Technique 2: the first line lists the duplicated rows and the second line counts them.

Solution 2: Whole-number quantities

bd5 <- bd4
bd5$Unidades <- -ceiling(bd5$Unidades)  # caution: the leading minus negates the rounded values, hence the negative summary below
summary(bd5$Unidades)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -96.000  -1.000  -1.000  -1.262  -1.000  -1.000
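
The summary above is entirely negative because of the minus sign in front of ceiling(). A sketch on invented quantities shows the difference between the two forms; which one was intended is an assumption, but whole-number quantities would normally stay positive.

```r
# Toy quantities (illustrative values only).
unidades <- c(1, 0.2, 2.5, 96)

rounded_up <- ceiling(unidades)   # 1 1 3 96: rounds each value up to a whole number
negated    <- -ceiling(unidades)  # -1 -1 -3 -96: same rounding, sign flipped

# Optionally store the result as integer type.
rounded_int <- as.integer(rounded_up)
```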

Technique 4. Convert data types

The $ sign selects a column and the % sign introduces a format code. Lowercase %y is a two-digit year; uppercase %Y is a four-digit year.

Solution 1: Convert from character to date (DOES NOT WORK)

bd6 <- bd5
bd6$Fecha <- as.Date(bd6$Fecha, "%d/%m/%Y")
tibble(bd6)
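
One likely reason the conversion fails, judging from the str(bd) output, is that Fecha was imported as POSIXct rather than character, so there is no text for the format string to parse. The sketch below contrasts the two cases on invented values:

```r
# Character input: the format string tells as.Date() how to read the text.
fecha_chr      <- c("19/06/2020", "01/05/2020")
fecha_from_chr <- as.Date(fecha_chr, format = "%d/%m/%Y")

# POSIXct input: no format is needed, only the time zone used to cut the date.
fecha_ct      <- as.POSIXct("2020-06-19 00:00:00", tz = "UTC")
fecha_from_ct <- as.Date(fecha_ct, tz = "UTC")

# Format codes also control printing: lowercase %y gives a two-digit year.
format(fecha_from_chr, "%d/%m/%y")
```

The UTC time zone is an assumption made for the toy value; the time zone of the imported column should be checked before converting.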

Solution 2: Convert from character to integer

bd7 <- bd5
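bd7$Hora <- substr(bd7$Hora,start=1, stop=2)  # caution: Hora is POSIXct, so characters 1-2 are the "18" of the placeholder year 1899, not the hour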
tibble(bd7)
## # A tibble: 200,620 × 21
##    vcClaveTienda DescGiro  `Codigo Barras` Fecha               Hora  Marca      
##    <chr>         <chr>               <dbl> <dttm>              <chr> <chr>      
##  1 MX001         Abarrotes   7501020540666 2020-06-19 00:00:00 18    NUTRI LECHE
##  2 MX001         Abarrotes   7501032397906 2020-06-19 00:00:00 18    DAN UP     
##  3 MX001         Abarrotes   7501000112845 2020-06-19 00:00:00 18    BIMBO      
##  4 MX001         Abarrotes   7501031302741 2020-06-19 00:00:00 18    PEPSI      
##  5 MX001         Abarrotes   7501026027543 2020-06-19 00:00:00 18    BLANCA NIE…
##  6 MX001         Abarrotes   7501025433024 2020-06-19 00:00:00 18    FLASH      
##  7 MX001         Abarrotes   7501032332013 2020-06-19 00:00:00 18    VARIOS DAN…
##  8 MX001         Abarrotes   7501026005688 2020-06-19 00:00:00 18    ZOTE       
##  9 MX001         Abarrotes   7506195178188 2020-06-19 00:00:00 18    ALWAYS     
## 10 MX001         Abarrotes     32239052017 2020-06-19 00:00:00 18    JUMEX      
## # ℹ 200,610 more rows
## # ℹ 15 more variables: Fabricante <chr>, Producto <chr>, Precio <dbl>,
## #   Ult.Costo <dbl>, Unidades <dbl>, F.Ticket <dbl>, NombreDepartamento <chr>,
## #   NombreFamilia <chr>, NombreCategoria <chr>, Estado <chr>, `Mts 2` <dbl>,
## #   Tipo.ubicación <chr>, Giro <chr>, `Hora inicio` <dttm>,
## #   `Hora cierre` <dttm>
bd7$Hora <- as.integer(bd7$Hora)
str(bd7) 
## tibble [200,620 × 21] (S3: tbl_df/tbl/data.frame)
##  $ vcClaveTienda     : chr [1:200620] "MX001" "MX001" "MX001" "MX001" ...
##  $ DescGiro          : chr [1:200620] "Abarrotes" "Abarrotes" "Abarrotes" "Abarrotes" ...
##  $ Codigo Barras     : num [1:200620] 7.5e+12 7.5e+12 7.5e+12 7.5e+12 7.5e+12 ...
##  $ Fecha             : POSIXct[1:200620], format: "2020-06-19" "2020-06-19" ...
##  $ Hora              : int [1:200620] 18 18 18 18 18 18 18 18 18 18 ...
##  $ Marca             : chr [1:200620] "NUTRI LECHE" "DAN UP" "BIMBO" "PEPSI" ...
##  $ Fabricante        : chr [1:200620] "MEXILAC" "DANONE DE MEXICO" "GRUPO BIMBO" "PEPSI-COLA MEXICANA" ...
##  $ Producto          : chr [1:200620] "Nutri Leche 1 Litro" "DANUP STRAWBERRY P/BEBER 350GR NAL" "Rebanadas Bimbo 2Pz" "Pepsi N.R. 400Ml" ...
##  $ Precio            : num [1:200620] 16 14 5 8 19.5 9.5 11 9.5 23.5 12 ...
##  $ Ult.Costo         : num [1:200620] 12.3 14 5 8 15 ...
##  $ Unidades          : num [1:200620] -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
##  $ F.Ticket          : num [1:200620] 1 2 3 3 4 4 4 4 4 5 ...
##  $ NombreDepartamento: chr [1:200620] "Abarrotes" "Abarrotes" "Abarrotes" "Abarrotes" ...
##  $ NombreFamilia     : chr [1:200620] "Lacteos y Refrigerados" "Lacteos y Refrigerados" "Pan y Tortilla" "Bebidas" ...
##  $ NombreCategoria   : chr [1:200620] "Leche" "Yogurt" "Pan Dulce Empaquetado" "Refrescos Plástico (N.R.)" ...
##  $ Estado            : chr [1:200620] "Nuevo León" "Nuevo León" "Nuevo León" "Nuevo León" ...
##  $ Mts 2             : num [1:200620] 60 60 60 60 60 60 60 60 60 60 ...
##  $ Tipo.ubicación    : chr [1:200620] "Esquina" "Esquina" "Esquina" "Esquina" ...
##  $ Giro              : chr [1:200620] "Abarrotes" "Abarrotes" "Abarrotes" "Abarrotes" ...
##  $ Hora inicio       : POSIXct[1:200620], format: "1899-12-31 08:00:00" "1899-12-31 08:00:00" ...
##  $ Hora cierre       : POSIXct[1:200620], format: "1899-12-31 22:00:00" "1899-12-31 22:00:00" ...
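
In the output above every Hora value is 18, because substr() converts the POSIXct value to text such as "1899-12-31 08:16:21" and takes its first two characters. A sketch of extracting the hour with a format code instead, on two invented timestamps in the same layout:

```r
# Toy timestamps (illustrative values, with the same placeholder date as the Hora column).
hora <- as.POSIXct(c("1899-12-31 08:16:21", "1899-12-31 17:35:59"), tz = "UTC")

# First two characters of the text form: the century digits of the year.
substr(hora, 1, 2)                          # "18" "18"

# %H asks for the hour of day directly.
hora_int <- as.integer(format(hora, "%H"))  # 8 17
```

The UTC time zone is an assumption for the toy values; format() uses whatever time zone the column carries.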

Technique 5. Missing values

How many NAs are there in the dataset?

sum(is.na(bd))
## [1] 199183
sum(is.na(bd7))
## [1] 0

How many NAs are there per variable?

sapply(bd7, function(x) sum(is.na(x)))
##      vcClaveTienda           DescGiro      Codigo Barras              Fecha 
##                  0                  0                  0                  0 
##               Hora              Marca         Fabricante           Producto 
##                  0                  0                  0                  0 
##             Precio          Ult.Costo           Unidades           F.Ticket 
##                  0                  0                  0                  0 
## NombreDepartamento      NombreFamilia    NombreCategoria             Estado 
##                  0                  0                  0                  0 
##              Mts 2     Tipo.ubicación               Giro        Hora inicio 
##                  0                  0                  0                  0 
##        Hora cierre 
##                  0
sapply(bd, function(x) sum(is.na(x)))
##      vcClaveTienda           DescGiro      Codigo Barras                PLU 
##                  0                  0                  0             199183 
##              Fecha               Hora              Marca         Fabricante 
##                  0                  0                  0                  0 
##           Producto             Precio          Ult.Costo           Unidades 
##                  0                  0                  0                  0 
##           F.Ticket NombreDepartamento      NombreFamilia    NombreCategoria 
##                  0                  0                  0                  0 
##             Estado              Mts 2     Tipo.ubicación               Giro 
##                  0                  0                  0                  0 
##        Hora inicio        Hora cierre 
##                  0                  0
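
The per-variable counts above can also be obtained with colSums(), and colMeans() gives the missing share of each column, which helps judge whether a column such as PLU is worth keeping. A sketch on invented data:

```r
# Toy data (illustrative values only).
toy <- data.frame(
  PLU    = c(NA, TRUE, NA),
  Precio = c(16, NA, 5),
  Marca  = c("BIMBO", "PEPSI", "LALA")
)

total_na  <- sum(is.na(toy))       # NAs in the whole table
na_by_col <- colSums(is.na(toy))   # NAs per column
na_share  <- colMeans(is.na(toy))  # fraction of each column that is missing

na_by_col
round(na_share, 2)
```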

Solution 1: Delete every record that contains an NA from a table

bd8<- bd7
bd8<- na.omit(bd8) 
summary(bd8)
##  vcClaveTienda        DescGiro         Codigo Barras      
##  Length:200620      Length:200620      Min.   :8.347e+05  
##  Class :character   Class :character   1st Qu.:7.501e+12  
##  Mode  :character   Mode  :character   Median :7.501e+12  
##                                        Mean   :5.950e+12  
##                                        3rd Qu.:7.501e+12  
##                                        Max.   :1.750e+13  
##      Fecha                             Hora       Marca          
##  Min.   :2020-05-01 00:00:00.00   Min.   :18   Length:200620     
##  1st Qu.:2020-06-06 00:00:00.00   1st Qu.:18   Class :character  
##  Median :2020-07-11 00:00:00.00   Median :18   Mode  :character  
##  Mean   :2020-07-18 22:35:49.58   Mean   :18                     
##  3rd Qu.:2020-08-29 00:00:00.00   3rd Qu.:18                     
##  Max.   :2020-11-11 00:00:00.00   Max.   :18                     
##   Fabricante          Producto             Precio          Ult.Costo     
##  Length:200620      Length:200620      Min.   :   0.50   Min.   :  0.38  
##  Class :character   Class :character   1st Qu.:  11.00   1st Qu.:  8.46  
##  Mode  :character   Mode  :character   Median :  16.00   Median : 12.31  
##                                        Mean   :  19.45   Mean   : 15.31  
##                                        3rd Qu.:  25.00   3rd Qu.: 19.23  
##                                        Max.   :1000.00   Max.   :769.23  
##     Unidades          F.Ticket      NombreDepartamento NombreFamilia     
##  Min.   :-96.000   Min.   :     1   Length:200620      Length:200620     
##  1st Qu.: -1.000   1st Qu.: 33967   Class :character   Class :character  
##  Median : -1.000   Median :105996   Mode  :character   Mode  :character  
##  Mean   : -1.262   Mean   :193994                                        
##  3rd Qu.: -1.000   3rd Qu.:383009                                        
##  Max.   : -1.000   Max.   :450040                                        
##  NombreCategoria       Estado              Mts 2      Tipo.ubicación    
##  Length:200620      Length:200620      Min.   :47.0   Length:200620     
##  Class :character   Class :character   1st Qu.:53.0   Class :character  
##  Mode  :character   Mode  :character   Median :60.0   Mode  :character  
##                                        Mean   :56.6                     
##                                        3rd Qu.:60.0                     
##                                        Max.   :62.0                     
##      Giro            Hora inicio                    
##  Length:200620      Min.   :1899-12-31 07:00:00.00  
##  Class :character   1st Qu.:1899-12-31 07:00:00.00  
##  Mode  :character   Median :1899-12-31 08:00:00.00  
##                     Mean   :1899-12-31 07:35:49.71  
##                     3rd Qu.:1899-12-31 08:00:00.00  
##                     Max.   :1899-12-31 09:00:00.00  
##   Hora cierre                    
##  Min.   :1899-12-31 21:00:00.00  
##  1st Qu.:1899-12-31 22:00:00.00  
##  Median :1899-12-31 22:00:00.00  
##  Mean   :1899-12-31 22:23:11.42  
##  3rd Qu.:1899-12-31 23:00:00.00  
##  Max.   :1899-12-31 23:00:00.00
bd15<- bd
bd15<- na.omit(bd15) 
summary(bd15)
##  vcClaveTienda        DescGiro         Codigo Barras         PLU         
##  Length:1437        Length:1437        Min.   :6.750e+08   Mode:logical  
##  Class :character   Class :character   1st Qu.:6.750e+08   TRUE:1437     
##  Mode  :character   Mode  :character   Median :6.750e+08                 
##                                        Mean   :2.616e+11                 
##                                        3rd Qu.:6.750e+08                 
##                                        Max.   :7.501e+12                 
##      Fecha                             Hora                       
##  Min.   :2020-06-06 00:00:00.00   Min.   :1899-12-31 00:01:22.00  
##  1st Qu.:2020-06-20 00:00:00.00   1st Qu.:1899-12-31 15:57:22.00  
##  Median :2020-07-10 00:00:00.00   Median :1899-12-31 18:49:20.00  
##  Mean   :2020-07-15 18:04:15.52   Mean   :1899-12-31 17:46:04.46  
##  3rd Qu.:2020-08-08 00:00:00.00   3rd Qu.:1899-12-31 21:09:03.00  
##  Max.   :2020-11-11 00:00:00.00   Max.   :1899-12-31 23:58:14.00  
##     Marca            Fabricante          Producto             Precio     
##  Length:1437        Length:1437        Length:1437        Min.   :30.00  
##  Class :character   Class :character   Class :character   1st Qu.:90.00  
##  Mode  :character   Mode  :character   Mode  :character   Median :90.00  
##                                                           Mean   :87.94  
##                                                           3rd Qu.:90.00  
##                                                           Max.   :90.00  
##    Ult.Costo        Unidades        F.Ticket      NombreDepartamento
##  Min.   : 1.00   Min.   :1.000   Min.   :   772   Length:1437       
##  1st Qu.:64.62   1st Qu.:1.000   1st Qu.: 99955   Class :character  
##  Median :64.62   Median :1.000   Median :102493   Mode  :character  
##  Mean   :56.65   Mean   :1.124   Mean   :100595                     
##  3rd Qu.:64.62   3rd Qu.:1.000   3rd Qu.:106546                     
##  Max.   :64.62   Max.   :7.000   Max.   :118356                     
##  NombreFamilia      NombreCategoria       Estado              Mts 2      
##  Length:1437        Length:1437        Length:1437        Min.   :58.00  
##  Class :character   Class :character   Class :character   1st Qu.:58.00  
##  Mode  :character   Mode  :character   Mode  :character   Median :58.00  
##                                                           Mean   :58.07  
##                                                           3rd Qu.:58.00  
##                                                           Max.   :60.00  
##  Tipo.ubicación         Giro            Hora inicio                 
##  Length:1437        Length:1437        Min.   :1899-12-31 08:00:00  
##  Class :character   Class :character   1st Qu.:1899-12-31 08:00:00  
##  Mode  :character   Mode  :character   Median :1899-12-31 08:00:00  
##                                        Mean   :1899-12-31 08:00:00  
##                                        3rd Qu.:1899-12-31 08:00:00  
##                                        Max.   :1899-12-31 08:00:00  
##   Hora cierre                    
##  Min.   :1899-12-31 21:00:00.00  
##  1st Qu.:1899-12-31 21:00:00.00  
##  Median :1899-12-31 21:00:00.00  
##  Mean   :1899-12-31 21:02:06.26  
##  3rd Qu.:1899-12-31 21:00:00.00  
##  Max.   :1899-12-31 22:00:00.00

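Use na.omit with care, and not as a default step. In this second section I ran it on the original data set (bd) because bd7 reported zero NAs, so applying it there accomplished nothing. On bd it drops every row that has an NA in any column, which cuts the data from 200,620 rows to 1,437.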
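Before calling na.omit, it can help to see which columns hold the missing values and how many rows would survive. The following is a minimal sketch on a toy data frame; the object `sales` and its values are made up for illustration and are not taken from the real file.

```r
# Toy data standing in for bd (hypothetical values, not the real file)
sales <- data.frame(
  PLU      = c(TRUE, NA, NA, NA),
  Precio   = c(90, 16, NA, 25),
  Unidades = c(1, 1, 2, 1)
)

# NAs per column: shows which column is responsible for most of the loss
na_per_column <- colSums(is.na(sales))
print(na_per_column)

# na.omit() keeps only rows that are complete in every column
rows_before <- nrow(sales)
rows_after  <- nrow(na.omit(sales))
cat("rows before:", rows_before, "rows after:", rows_after, "\n")

# Alternative: drop rows only when a column you actually need is missing
sales_with_price <- sales[!is.na(sales$Precio), ]
```

In this toy case a single sparse column (PLU) removes three of four rows, while filtering on Precio alone removes only one.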

Solution 2: Replace NA with 0

bd9<- bd
bd9[is.na(bd9)] <-0
summary(bd9)
##  vcClaveTienda        DescGiro         Codigo Barras          PLU         
##  Length:200620      Length:200620      Min.   :8.347e+05   Mode :logical  
##  Class :character   Class :character   1st Qu.:7.501e+12   FALSE:199183   
##  Mode  :character   Mode  :character   Median :7.501e+12   TRUE :1437     
##                                        Mean   :5.950e+12                  
##                                        3rd Qu.:7.501e+12                  
##                                        Max.   :1.750e+13                  
##      Fecha                             Hora                       
##  Min.   :2020-05-01 00:00:00.00   Min.   :1899-12-31 00:00:00.00  
##  1st Qu.:2020-06-06 00:00:00.00   1st Qu.:1899-12-31 13:12:42.75  
##  Median :2020-07-11 00:00:00.00   Median :1899-12-31 17:35:59.00  
##  Mean   :2020-07-18 22:35:49.58   Mean   :1899-12-31 16:43:52.05  
##  3rd Qu.:2020-08-29 00:00:00.00   3rd Qu.:1899-12-31 20:47:06.00  
##  Max.   :2020-11-11 00:00:00.00   Max.   :1899-12-31 23:59:59.00  
##     Marca            Fabricante          Producto             Precio       
##  Length:200620      Length:200620      Length:200620      Min.   :-147.00  
##  Class :character   Class :character   Class :character   1st Qu.:  11.00  
##  Mode  :character   Mode  :character   Mode  :character   Median :  16.00  
##                                                           Mean   :  19.42  
##                                                           3rd Qu.:  25.00  
##                                                           Max.   :1000.00  
##    Ult.Costo         Unidades         F.Ticket      NombreDepartamento
##  Min.   :  0.38   Min.   : 0.200   Min.   :     1   Length:200620     
##  1st Qu.:  8.46   1st Qu.: 1.000   1st Qu.: 33967   Class :character  
##  Median : 12.31   Median : 1.000   Median :105996   Mode  :character  
##  Mean   : 15.31   Mean   : 1.262   Mean   :193994                     
##  3rd Qu.: 19.23   3rd Qu.: 1.000   3rd Qu.:383009                     
##  Max.   :769.23   Max.   :96.000   Max.   :450040                     
##  NombreFamilia      NombreCategoria       Estado              Mts 2     
##  Length:200620      Length:200620      Length:200620      Min.   :47.0  
##  Class :character   Class :character   Class :character   1st Qu.:53.0  
##  Mode  :character   Mode  :character   Mode  :character   Median :60.0  
##                                                           Mean   :56.6  
##                                                           3rd Qu.:60.0  
##                                                           Max.   :62.0  
##  Tipo.ubicación         Giro            Hora inicio                    
##  Length:200620      Length:200620      Min.   :1899-12-31 07:00:00.00  
##  Class :character   Class :character   1st Qu.:1899-12-31 07:00:00.00  
##  Mode  :character   Mode  :character   Median :1899-12-31 08:00:00.00  
##                                        Mean   :1899-12-31 07:35:49.71  
##                                        3rd Qu.:1899-12-31 08:00:00.00  
##                                        Max.   :1899-12-31 09:00:00.00  
##   Hora cierre                    
##  Min.   :1899-12-31 21:00:00.00  
##  1st Qu.:1899-12-31 22:00:00.00  
##  Median :1899-12-31 22:00:00.00  
##  Mean   :1899-12-31 22:23:11.42  
##  3rd Qu.:1899-12-31 23:00:00.00  
##  Max.   :1899-12-31 23:00:00.00
sum(is.na(bd9))
## [1] 0
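The blanket assignment above writes 0 into every column that has an NA; in the summary, the logical PLU column ends up FALSE for 199,183 rows. If that is not intended, one option is to fill only selected columns. This is a sketch on toy data, and the choice of `Unidades` as the column to fill is an assumption made for illustration.

```r
# Toy data standing in for bd (hypothetical values)
sales <- data.frame(
  PLU      = c(TRUE, NA, NA),
  Unidades = c(1, NA, 2)
)

# Columns where 0 is a meaningful replacement (assumption for this sketch)
cols_to_fill <- c("Unidades")

for (col in cols_to_fill) {
  missing <- is.na(sales[[col]])
  sales[[col]][missing] <- 0
}

# PLU keeps its NAs; Unidades no longer has any
print(colSums(is.na(sales)))
```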

Solution 3: Replace NAs with the mean

bd10<- bd
bd10$PLU[is.na(bd10$PLU)]<-mean(bd10$PLU, na.rm=TRUE)
summary(bd10)
##  vcClaveTienda        DescGiro         Codigo Barras            PLU   
##  Length:200620      Length:200620      Min.   :8.347e+05   Min.   :1  
##  Class :character   Class :character   1st Qu.:7.501e+12   1st Qu.:1  
##  Mode  :character   Mode  :character   Median :7.501e+12   Median :1  
##                                        Mean   :5.950e+12   Mean   :1  
##                                        3rd Qu.:7.501e+12   3rd Qu.:1  
##                                        Max.   :1.750e+13   Max.   :1  
##      Fecha                             Hora                       
##  Min.   :2020-05-01 00:00:00.00   Min.   :1899-12-31 00:00:00.00  
##  1st Qu.:2020-06-06 00:00:00.00   1st Qu.:1899-12-31 13:12:42.75  
##  Median :2020-07-11 00:00:00.00   Median :1899-12-31 17:35:59.00  
##  Mean   :2020-07-18 22:35:49.58   Mean   :1899-12-31 16:43:52.05  
##  3rd Qu.:2020-08-29 00:00:00.00   3rd Qu.:1899-12-31 20:47:06.00  
##  Max.   :2020-11-11 00:00:00.00   Max.   :1899-12-31 23:59:59.00  
##     Marca            Fabricante          Producto             Precio       
##  Length:200620      Length:200620      Length:200620      Min.   :-147.00  
##  Class :character   Class :character   Class :character   1st Qu.:  11.00  
##  Mode  :character   Mode  :character   Mode  :character   Median :  16.00  
##                                                           Mean   :  19.42  
##                                                           3rd Qu.:  25.00  
##                                                           Max.   :1000.00  
##    Ult.Costo         Unidades         F.Ticket      NombreDepartamento
##  Min.   :  0.38   Min.   : 0.200   Min.   :     1   Length:200620     
##  1st Qu.:  8.46   1st Qu.: 1.000   1st Qu.: 33967   Class :character  
##  Median : 12.31   Median : 1.000   Median :105996   Mode  :character  
##  Mean   : 15.31   Mean   : 1.262   Mean   :193994                     
##  3rd Qu.: 19.23   3rd Qu.: 1.000   3rd Qu.:383009                     
##  Max.   :769.23   Max.   :96.000   Max.   :450040                     
##  NombreFamilia      NombreCategoria       Estado              Mts 2     
##  Length:200620      Length:200620      Length:200620      Min.   :47.0  
##  Class :character   Class :character   Class :character   1st Qu.:53.0  
##  Mode  :character   Mode  :character   Mode  :character   Median :60.0  
##                                                           Mean   :56.6  
##                                                           3rd Qu.:60.0  
##                                                           Max.   :62.0  
##  Tipo.ubicación         Giro            Hora inicio                    
##  Length:200620      Length:200620      Min.   :1899-12-31 07:00:00.00  
##  Class :character   Class :character   1st Qu.:1899-12-31 07:00:00.00  
##  Mode  :character   Mode  :character   Median :1899-12-31 08:00:00.00  
##                                        Mean   :1899-12-31 07:35:49.71  
##                                        3rd Qu.:1899-12-31 08:00:00.00  
##                                        Max.   :1899-12-31 09:00:00.00  
##   Hora cierre                    
##  Min.   :1899-12-31 21:00:00.00  
##  1st Qu.:1899-12-31 22:00:00.00  
##  Median :1899-12-31 22:00:00.00  
##  Mean   :1899-12-31 22:23:11.42  
##  3rd Qu.:1899-12-31 23:00:00.00  
##  Max.   :1899-12-31 23:00:00.00
sum(is.na(bd10))
## [1] 0
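In the summary above PLU is 1 everywhere. That is expected: PLU is a logical column whose non-missing values are all TRUE, so its mean is 1 and every NA is replaced by 1. The sketch below reproduces that effect and shows mean imputation on a numeric vector where it is more informative; the vectors are toy values, not data from the file.

```r
# Logical column with only TRUE and NA, like PLU in the summary above
plu <- c(TRUE, NA, NA, TRUE)
plu_mean <- mean(plu, na.rm = TRUE)   # TRUE counts as 1, so the mean is 1

# Mean imputation on a numeric vector (toy values)
precio <- c(10, NA, 30)
precio_imputed <- precio
precio_imputed[is.na(precio_imputed)] <- mean(precio, na.rm = TRUE)

print(plu_mean)
print(precio_imputed)
```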

Solution 4: Replace negatives with zeros (DOES NOT WORK)

bd11<-bd
bd11[bd11<0] <-0
summary(bd11)

This attempt fails with the error: "Error in [<-: Assigned data 0 must be compatible with existing data."
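A likely reason for the error is that the comparison runs over every column, including non-numeric ones, and a tibble refuses to write a numeric 0 into a column of another type. One way around it, sketched here under that assumption, is to restrict the replacement to numeric columns. The data frame `sales` is a toy stand-in, not the real data.

```r
# Toy data standing in for bd (hypothetical values)
sales <- data.frame(
  Marca    = c("PEPSI", "BIMBO", "FLASH"),
  Precio   = c(16, -147, 25),
  Unidades = c(1, 2, -1),
  stringsAsFactors = FALSE
)

# Flag the numeric columns only
is_num <- vapply(sales, is.numeric, logical(1))

# pmax(x, 0) raises every negative value to 0 and leaves NAs as NA
sales[is_num] <- lapply(sales[is_num], function(x) pmax(x, 0))

print(sales)
```

Whether zero is the right replacement depends on what the negatives mean (for example returns versus entry errors), which the data alone do not say.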

Technique 6. Statistical method

Box-and-whisker plot

bd12<-bd7
boxplot(bd12$Precio,horizontal=TRUE)

boxplot(bd12$Unidades,horizontal=TRUE)

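The units come out negative (?). In bd7 every value of Unidades is below zero (minimum -96, maximum -1), while in bd the same column ranges from 0.2 to 96, so the sign was apparently flipped in an earlier step.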
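The box plots give a visual check; the same information can be read numerically. The sketch below uses `boxplot.stats()` to list the points a box plot would draw as outliers and counts negative values. The vectors are toy values, and restoring the sign with `abs()` is only appropriate if the negatives are a sign error rather than real returns, which is an assumption here.

```r
# Toy vectors standing in for bd12$Precio and bd12$Unidades (hypothetical)
precio   <- c(11, 16, 16, 19, 25, 1000)
unidades <- c(-1, -1, -2, -96)

# Points beyond 1.5 times the hinge spread, i.e. what boxplot() marks as outliers
outliers <- boxplot.stats(precio)$out

# How many values are negative, and is the whole column negative?
n_negative   <- sum(unidades < 0)
all_negative <- all(unidades < 0)

# If every value is negative, treat it as a flipped sign (assumption)
unidades_fixed <- if (all_negative) abs(unidades) else unidades

print(outliers)
print(unidades_fixed)
```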

Add columns

#install.packages("lubridate")
library(lubridate)
bd12$Dia_de_la_semana <-wday(bd12$Fecha)
summary(bd12)
##  vcClaveTienda        DescGiro         Codigo Barras      
##  Length:200620      Length:200620      Min.   :8.347e+05  
##  Class :character   Class :character   1st Qu.:7.501e+12  
##  Mode  :character   Mode  :character   Median :7.501e+12  
##                                        Mean   :5.950e+12  
##                                        3rd Qu.:7.501e+12  
##                                        Max.   :1.750e+13  
##      Fecha                             Hora       Marca          
##  Min.   :2020-05-01 00:00:00.00   Min.   :18   Length:200620     
##  1st Qu.:2020-06-06 00:00:00.00   1st Qu.:18   Class :character  
##  Median :2020-07-11 00:00:00.00   Median :18   Mode  :character  
##  Mean   :2020-07-18 22:35:49.58   Mean   :18                     
##  3rd Qu.:2020-08-29 00:00:00.00   3rd Qu.:18                     
##  Max.   :2020-11-11 00:00:00.00   Max.   :18                     
##   Fabricante          Producto             Precio          Ult.Costo     
##  Length:200620      Length:200620      Min.   :   0.50   Min.   :  0.38  
##  Class :character   Class :character   1st Qu.:  11.00   1st Qu.:  8.46  
##  Mode  :character   Mode  :character   Median :  16.00   Median : 12.31  
##                                        Mean   :  19.45   Mean   : 15.31  
##                                        3rd Qu.:  25.00   3rd Qu.: 19.23  
##                                        Max.   :1000.00   Max.   :769.23  
##     Unidades          F.Ticket      NombreDepartamento NombreFamilia     
##  Min.   :-96.000   Min.   :     1   Length:200620      Length:200620     
##  1st Qu.: -1.000   1st Qu.: 33967   Class :character   Class :character  
##  Median : -1.000   Median :105996   Mode  :character   Mode  :character  
##  Mean   : -1.262   Mean   :193994                                        
##  3rd Qu.: -1.000   3rd Qu.:383009                                        
##  Max.   : -1.000   Max.   :450040                                        
##  NombreCategoria       Estado              Mts 2      Tipo.ubicación    
##  Length:200620      Length:200620      Min.   :47.0   Length:200620     
##  Class :character   Class :character   1st Qu.:53.0   Class :character  
##  Mode  :character   Mode  :character   Median :60.0   Mode  :character  
##                                        Mean   :56.6                     
##                                        3rd Qu.:60.0                     
##                                        Max.   :62.0                     
##      Giro            Hora inicio                    
##  Length:200620      Min.   :1899-12-31 07:00:00.00  
##  Class :character   1st Qu.:1899-12-31 07:00:00.00  
##  Mode  :character   Median :1899-12-31 08:00:00.00  
##                     Mean   :1899-12-31 07:35:49.71  
##                     3rd Qu.:1899-12-31 08:00:00.00  
##                     Max.   :1899-12-31 09:00:00.00  
##   Hora cierre                     Dia_de_la_semana
##  Min.   :1899-12-31 21:00:00.00   Min.   :1.000   
##  1st Qu.:1899-12-31 22:00:00.00   1st Qu.:2.000   
##  Median :1899-12-31 22:00:00.00   Median :4.000   
##  Mean   :1899-12-31 22:23:11.42   Mean   :3.912   
##  3rd Qu.:1899-12-31 23:00:00.00   3rd Qu.:6.000   
##  Max.   :1899-12-31 23:00:00.00   Max.   :7.000
bd12$subtotal<- bd12$Precio*bd12$Unidades
summary(bd12)
##  vcClaveTienda        DescGiro         Codigo Barras      
##  Length:200620      Length:200620      Min.   :8.347e+05  
##  Class :character   Class :character   1st Qu.:7.501e+12  
##  Mode  :character   Mode  :character   Median :7.501e+12  
##                                        Mean   :5.950e+12  
##                                        3rd Qu.:7.501e+12  
##                                        Max.   :1.750e+13  
##      Fecha                             Hora       Marca          
##  Min.   :2020-05-01 00:00:00.00   Min.   :18   Length:200620     
##  1st Qu.:2020-06-06 00:00:00.00   1st Qu.:18   Class :character  
##  Median :2020-07-11 00:00:00.00   Median :18   Mode  :character  
##  Mean   :2020-07-18 22:35:49.58   Mean   :18                     
##  3rd Qu.:2020-08-29 00:00:00.00   3rd Qu.:18                     
##  Max.   :2020-11-11 00:00:00.00   Max.   :18                     
##   Fabricante          Producto             Precio          Ult.Costo     
##  Length:200620      Length:200620      Min.   :   0.50   Min.   :  0.38  
##  Class :character   Class :character   1st Qu.:  11.00   1st Qu.:  8.46  
##  Mode  :character   Mode  :character   Median :  16.00   Median : 12.31  
##                                        Mean   :  19.45   Mean   : 15.31  
##                                        3rd Qu.:  25.00   3rd Qu.: 19.23  
##                                        Max.   :1000.00   Max.   :769.23  
##     Unidades          F.Ticket      NombreDepartamento NombreFamilia     
##  Min.   :-96.000   Min.   :     1   Length:200620      Length:200620     
##  1st Qu.: -1.000   1st Qu.: 33967   Class :character   Class :character  
##  Median : -1.000   Median :105996   Mode  :character   Mode  :character  
##  Mean   : -1.262   Mean   :193994                                        
##  3rd Qu.: -1.000   3rd Qu.:383009                                        
##  Max.   : -1.000   Max.   :450040                                        
##  NombreCategoria       Estado              Mts 2      Tipo.ubicación    
##  Length:200620      Length:200620      Min.   :47.0   Length:200620     
##  Class :character   Class :character   1st Qu.:53.0   Class :character  
##  Mode  :character   Mode  :character   Median :60.0   Mode  :character  
##                                        Mean   :56.6                     
##                                        3rd Qu.:60.0                     
##                                        Max.   :62.0                     
##      Giro            Hora inicio                    
##  Length:200620      Min.   :1899-12-31 07:00:00.00  
##  Class :character   1st Qu.:1899-12-31 07:00:00.00  
##  Mode  :character   Median :1899-12-31 08:00:00.00  
##                     Mean   :1899-12-31 07:35:49.71  
##                     3rd Qu.:1899-12-31 08:00:00.00  
##                     Max.   :1899-12-31 09:00:00.00  
##   Hora cierre                     Dia_de_la_semana    subtotal       
##  Min.   :1899-12-31 21:00:00.00   Min.   :1.000    Min.   :-2496.00  
##  1st Qu.:1899-12-31 22:00:00.00   1st Qu.:2.000    1st Qu.:  -27.00  
##  Median :1899-12-31 22:00:00.00   Median :4.000    Median :  -18.00  
##  Mean   :1899-12-31 22:23:11.42   Mean   :3.912    Mean   :  -24.33  
##  3rd Qu.:1899-12-31 23:00:00.00   3rd Qu.:6.000    3rd Qu.:  -12.00  
##  Max.   :1899-12-31 23:00:00.00   Max.   :7.000    Max.   :   -1.00
bd12$Utilidad <- bd12$Precio - bd12$Ult.Costo
summary(bd12)
##  vcClaveTienda        DescGiro         Codigo Barras      
##  Length:200620      Length:200620      Min.   :8.347e+05  
##  Class :character   Class :character   1st Qu.:7.501e+12  
##  Mode  :character   Mode  :character   Median :7.501e+12  
##                                        Mean   :5.950e+12  
##                                        3rd Qu.:7.501e+12  
##                                        Max.   :1.750e+13  
##      Fecha                             Hora       Marca          
##  Min.   :2020-05-01 00:00:00.00   Min.   :18   Length:200620     
##  1st Qu.:2020-06-06 00:00:00.00   1st Qu.:18   Class :character  
##  Median :2020-07-11 00:00:00.00   Median :18   Mode  :character  
##  Mean   :2020-07-18 22:35:49.58   Mean   :18                     
##  3rd Qu.:2020-08-29 00:00:00.00   3rd Qu.:18                     
##  Max.   :2020-11-11 00:00:00.00   Max.   :18                     
##   Fabricante          Producto             Precio          Ult.Costo     
##  Length:200620      Length:200620      Min.   :   0.50   Min.   :  0.38  
##  Class :character   Class :character   1st Qu.:  11.00   1st Qu.:  8.46  
##  Mode  :character   Mode  :character   Median :  16.00   Median : 12.31  
##                                        Mean   :  19.45   Mean   : 15.31  
##                                        3rd Qu.:  25.00   3rd Qu.: 19.23  
##                                        Max.   :1000.00   Max.   :769.23  
##     Unidades          F.Ticket      NombreDepartamento NombreFamilia     
##  Min.   :-96.000   Min.   :     1   Length:200620      Length:200620     
##  1st Qu.: -1.000   1st Qu.: 33967   Class :character   Class :character  
##  Median : -1.000   Median :105996   Mode  :character   Mode  :character  
##  Mean   : -1.262   Mean   :193994                                        
##  3rd Qu.: -1.000   3rd Qu.:383009                                        
##  Max.   : -1.000   Max.   :450040                                        
##  NombreCategoria       Estado              Mts 2      Tipo.ubicación    
##  Length:200620      Length:200620      Min.   :47.0   Length:200620     
##  Class :character   Class :character   1st Qu.:53.0   Class :character  
##  Mode  :character   Mode  :character   Median :60.0   Mode  :character  
##                                        Mean   :56.6                     
##                                        3rd Qu.:60.0                     
##                                        Max.   :62.0                     
##      Giro            Hora inicio                    
##  Length:200620      Min.   :1899-12-31 07:00:00.00  
##  Class :character   1st Qu.:1899-12-31 07:00:00.00  
##  Mode  :character   Median :1899-12-31 08:00:00.00  
##                     Mean   :1899-12-31 07:35:49.71  
##                     3rd Qu.:1899-12-31 08:00:00.00  
##                     Max.   :1899-12-31 09:00:00.00  
##   Hora cierre                     Dia_de_la_semana    subtotal       
##  Min.   :1899-12-31 21:00:00.00   Min.   :1.000    Min.   :-2496.00  
##  1st Qu.:1899-12-31 22:00:00.00   1st Qu.:2.000    1st Qu.:  -27.00  
##  Median :1899-12-31 22:00:00.00   Median :4.000    Median :  -18.00  
##  Mean   :1899-12-31 22:23:11.42   Mean   :3.912    Mean   :  -24.33  
##  3rd Qu.:1899-12-31 23:00:00.00   3rd Qu.:6.000    3rd Qu.:  -12.00  
##  Max.   :1899-12-31 23:00:00.00   Max.   :7.000    Max.   :   -1.00  
##     Utilidad      
##  Min.   :  0.000  
##  1st Qu.:  2.310  
##  Median :  3.230  
##  Mean   :  4.142  
##  3rd Qu.:  5.420  
##  Max.   :230.770

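In Dia_de_la_semana, 1 is Sunday, 2 is Monday, and so on. We added columns for the day of the week, the subtotal (Precio times Unidades), and the profit per unit (Utilidad, Precio minus Ult.Costo).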
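A small sketch of the same three columns on toy data, with readable day names added. It uses base R (`as.POSIXlt(...)$wday + 1`) to get the same Sunday = 1 numbering described above, so it runs without lubridate. The data frame `sales`, the `Dia_nombre` column, and the `Utilidad_total` column are hypothetical additions for illustration.

```r
# Toy data standing in for bd12 (hypothetical values)
sales <- data.frame(
  Fecha     = as.Date(c("2020-06-19", "2020-06-21")),
  Precio    = c(16, 25),
  Ult.Costo = c(12, 19),
  Unidades  = c(2, 1)
)

day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")

# POSIXlt counts Sunday as 0, so add 1 to get Sunday = 1 ... Saturday = 7
sales$Dia_de_la_semana <- as.POSIXlt(sales$Fecha)$wday + 1
sales$Dia_nombre <- factor(sales$Dia_de_la_semana, levels = 1:7, labels = day_names)

# Same derived columns as in the report
sales$subtotal <- sales$Precio * sales$Unidades
sales$Utilidad <- sales$Precio - sales$Ult.Costo

# Hypothetical extra column: profit for the whole line, not per unit
sales$Utilidad_total <- sales$Utilidad * sales$Unidades

print(sales)
```

Note that Utilidad as defined in the report is a per-unit margin; multiplying by Unidades gives the profit of the ticket line.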

Export the database

bd_limpia<- bd12
write.csv(bd_limpia,file="Abarrotes_bd_limpia_R.csv",row.names=FALSE) 

row.names=FALSE keeps R from writing the row numbers as an extra first column; the column titles are still written as the first row.
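The effect of `row.names` is easy to see by writing the same data both ways and reading the headers back. This is a sketch with a toy data frame and temporary files, not the real export.

```r
# Toy data (hypothetical values)
sales <- data.frame(Marca = c("PEPSI", "BIMBO"), Precio = c(16, 25))

with_rownames    <- tempfile(fileext = ".csv")
without_rownames <- tempfile(fileext = ".csv")

write.csv(sales, with_rownames)                        # default: row names written
write.csv(sales, without_rownames, row.names = FALSE)  # no row-number column

# Reading back: the default export has an extra, unnamed first column
cols_with    <- names(read.csv(with_rownames))
cols_without <- names(read.csv(without_rownames))

print(cols_with)
print(cols_without)
```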

Market basket analysis

Install packages

#install.packages("plyr")
library(plyr)
## ------------------------------------------------------------------------------
## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)
## ------------------------------------------------------------------------------
## 
## Attaching package: 'plyr'
## The following objects are masked from 'package:dplyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize
## The following object is masked from 'package:purrr':
## 
##     compact
#install.packages("Matrix")
library(Matrix)
## 
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack
#install.packages("arules")
library(arules)
## 
## Attaching package: 'arules'
## The following object is masked from 'package:dplyr':
## 
##     recode
## The following objects are masked from 'package:base':
## 
##     abbreviate, write
#install.packages("arulesViz")
library(arulesViz)
#install.packages("datasets")
library(datasets)

Sort the tickets in ascending order

bd_limpia <- bd_limpia[order(bd_limpia$F.Ticket),]
head(bd_limpia)
## # A tibble: 6 × 24
##   vcClaveTienda DescGiro  `Codigo Barras` Fecha                Hora Marca       
##   <chr>         <chr>               <dbl> <dttm>              <int> <chr>       
## 1 MX001         Abarrotes   7501020540666 2020-06-19 00:00:00    18 NUTRI LECHE 
## 2 MX001         Abarrotes   7501032397906 2020-06-19 00:00:00    18 DAN UP      
## 3 MX001         Abarrotes   7501000112845 2020-06-19 00:00:00    18 BIMBO       
## 4 MX001         Abarrotes   7501031302741 2020-06-19 00:00:00    18 PEPSI       
## 5 MX001         Abarrotes   7501026027543 2020-06-19 00:00:00    18 BLANCA NIEV…
## 6 MX001         Abarrotes   7501025433024 2020-06-19 00:00:00    18 FLASH       
## # ℹ 18 more variables: Fabricante <chr>, Producto <chr>, Precio <dbl>,
## #   Ult.Costo <dbl>, Unidades <dbl>, F.Ticket <dbl>, NombreDepartamento <chr>,
## #   NombreFamilia <chr>, NombreCategoria <chr>, Estado <chr>, `Mts 2` <dbl>,
## #   Tipo.ubicación <chr>, Giro <chr>, `Hora inicio` <dttm>,
## #   `Hora cierre` <dttm>, Dia_de_la_semana <dbl>, subtotal <dbl>,
## #   Utilidad <dbl>
tail(bd_limpia)
## # A tibble: 6 × 24
##   vcClaveTienda DescGiro   `Codigo Barras` Fecha                Hora Marca      
##   <chr>         <chr>                <dbl> <dttm>              <int> <chr>      
## 1 MX004         Carnicería     10248765241 2020-10-15 00:00:00    18 YEMINA     
## 2 MX004         Carnicería   7501079702855 2020-10-15 00:00:00    18 DEL FUERTE 
## 3 MX004         Carnicería   7501055320639 2020-10-15 00:00:00    18 COCA COLA …
## 4 MX004         Carnicería   7501214100256 2020-10-15 00:00:00    18 DIAMANTE   
## 5 MX004         Carnicería   7501031311620 2020-10-15 00:00:00    18 PEPSI      
## 6 MX004         Carnicería        75004699 2020-10-15 00:00:00    18 COCA COLA  
## # ℹ 18 more variables: Fabricante <chr>, Producto <chr>, Precio <dbl>,
## #   Ult.Costo <dbl>, Unidades <dbl>, F.Ticket <dbl>, NombreDepartamento <chr>,
## #   NombreFamilia <chr>, NombreCategoria <chr>, Estado <chr>, `Mts 2` <dbl>,
## #   Tipo.ubicación <chr>, Giro <chr>, `Hora inicio` <dttm>,
## #   `Hora cierre` <dttm>, Dia_de_la_semana <dbl>, subtotal <dbl>,
## #   Utilidad <dbl>

Generate the basket

basket<- ddply(bd_limpia,c("F.Ticket"),function(bd_limpia)paste(bd_limpia$Marca,collapse=","))
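The ddply call collapses all brands sold on one ticket into a single comma-separated string. For readers who want to see what that produces without plyr, here is a base R sketch of the same grouping with `aggregate()`; the data frame `sales` is a toy stand-in with made-up tickets.

```r
# Toy data standing in for bd_limpia (hypothetical tickets)
sales <- data.frame(
  F.Ticket = c(1, 1, 2, 3, 3, 3),
  Marca    = c("NUTRI LECHE", "DAN UP", "BIMBO", "PEPSI", "FLASH", "PEPSI"),
  stringsAsFactors = FALSE
)

# One row per ticket; brands joined with commas in their original order
basket_demo <- aggregate(Marca ~ F.Ticket, data = sales,
                         FUN = paste, collapse = ",")
basket_lines <- as.character(basket_demo$Marca)

print(basket_lines)
```

A brand bought twice on the same ticket appears twice in the string, as PEPSI does for ticket 3 here.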

Remove the ticket number

basket$F.Ticket<-NULL

Rename the column

colnames(basket) <- c ("Marca")

Export the basket

write.csv(basket,"basket.csv", quote=FALSE,row.names=FALSE)

Import transactions

#file.choose()
tr<-read.transactions("D:\\Lesly Gómez\\Documentos\\basket.csv", format= "basket", sep =",")
## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string
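The warning comes from `scan()`, which `read.transactions()` calls with its `quote` argument. One plausible cause, which is an assumption because the contents of basket.csv are not shown, is a brand name containing an apostrophe or a double quote: the export used `quote=FALSE`, so such a character opens a quoted string that never closes. The sketch below shows the mechanism with base R only; the line `"LAY'S,PEPSI"` is hypothetical.

```r
# Hypothetical basket line with an apostrophe inside a brand name
line <- "LAY'S,PEPSI"

# With quote characters enabled, the apostrophe starts a quoted string
items_quoted <- suppressWarnings(
  scan(text = line, what = "character", sep = ",", quote = "\"'", quiet = TRUE)
)

# With quoting disabled, the line splits cleanly on the commas
items_plain <- scan(text = line, what = "character", sep = ",",
                    quote = "", quiet = TRUE)

print(items_plain)

# Possible fix for the import above, not run against the real file:
# tr <- read.transactions("basket.csv", format = "basket", sep = ",", quote = "")
```

If the real file has no such characters, the warnings have another cause and this change will not help.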

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string

## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string
## Warning in asMethod(object): removing duplicated items in transactions
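
The repeated `EOF within quoted string` warnings come from `scan()`, which treats `'` and `"` as quote characters by default, so an item name containing an unmatched quote is a plausible cause. The last warning says that repeated items inside a basket were dropped. A minimal sketch of one way to avoid both warnings, assuming the `quote` and `rm.duplicates` arguments of `read.transactions()`; the file contents and the name `tr_demo` are made up for illustration:

```r
library(arules)

# Hypothetical toy basket file; these rows are illustrative, not the real data.
tmp <- tempfile(fileext = ".csv")
writeLines(c("COCA COLA,FANTA",
             "HELLMANN'S,BIMBO,BIMBO",
             "SALVO,FABULOSO"), tmp)

# quote = "" disables quote parsing, so a lone apostrophe is read literally.
# rm.duplicates = TRUE removes repeated items within a basket while reading.
tr_demo <- read.transactions(tmp, format = "basket", sep = ",",
                             quote = "", rm.duplicates = TRUE)
summary(tr_demo)
```

Disabling quotes is only appropriate if the basket file was exported without quoted fields.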

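Generate association rules

# Mine rules with minimum support 0.1%, minimum confidence 20%, and at most 10 items per rule
reglas.asociacion <- apriori(tr, parameter = list(supp = 0.001, conf = 0.2, maxlen = 10))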
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.2    0.1    1 none FALSE            TRUE       5   0.001      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 115 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[604 item(s), 115111 transaction(s)] done [0.03s].
## sorting and recoding items ... [207 item(s)] done [0.00s].
## creating transaction tree ... done [0.03s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [11 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
summary(reglas.asociacion)
## set of 11 rules
## 
## rule length distribution (lhs + rhs):sizes
##  2 
## 11 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       2       2       2       2       2       2 
## 
## summary of quality measures:
##     support           confidence        coverage             lift       
##  Min.   :0.001016   Min.   :0.2069   Min.   :0.003562   Min.   : 1.325  
##  1st Qu.:0.001103   1st Qu.:0.2356   1st Qu.:0.004504   1st Qu.: 1.787  
##  Median :0.001416   Median :0.2442   Median :0.005803   Median : 3.972  
##  Mean   :0.001519   Mean   :0.2536   Mean   :0.006054   Mean   :17.563  
##  3rd Qu.:0.001651   3rd Qu.:0.2685   3rd Qu.:0.006893   3rd Qu.:21.798  
##  Max.   :0.002745   Max.   :0.3098   Max.   :0.010503   Max.   :65.908  
##      count      
##  Min.   :117.0  
##  1st Qu.:127.0  
##  Median :163.0  
##  Mean   :174.9  
##  3rd Qu.:190.0  
##  Max.   :316.0  
## 
## mining info:
##  data ntransactions support confidence
##    tr        115111   0.001        0.2
##                                                                         call
##  apriori(data = tr, parameter = list(supp = 0.001, conf = 0.2, maxlen = 10))
inspect(reglas.asociacion)
##      lhs                  rhs         support     confidence coverage   
## [1]  {FANTA}           => {COCA COLA} 0.001051159 0.2439516  0.004308884
## [2]  {SALVO}           => {FABULOSO}  0.001103283 0.3097561  0.003561779
## [3]  {FABULOSO}        => {SALVO}     0.001103283 0.2347505  0.004699811
## [4]  {COCA COLA ZERO}  => {COCA COLA} 0.001416025 0.2969035  0.004769310
## [5]  {SPRITE}          => {COCA COLA} 0.001346526 0.2069426  0.006506763
## [6]  {PINOL}           => {CLORALEX}  0.001016410 0.2363636  0.004300197
## [7]  {BLUE HOUSE}      => {BIMBO}     0.001711392 0.2720994  0.006289581
## [8]  {HELLMANN´S}      => {BIMBO}     0.001537646 0.2649701  0.005803094
## [9]  {REYMA}           => {CONVERMEX} 0.002093631 0.2441743  0.008574333
## [10] {FUD}             => {BIMBO}     0.001589770 0.2183771  0.007279930
## [11] {COCA COLA LIGHT} => {COCA COLA} 0.002745176 0.2613730  0.010502906
##      lift      count
## [1]   1.561906 121  
## [2]  65.908196 127  
## [3]  65.908196 127  
## [4]   1.900932 163  
## [5]   1.324955 155  
## [6]  25.030409 117  
## [7]   4.078870 197  
## [8]   3.971997 177  
## [9]  18.564824 241  
## [10]  3.273552 183  
## [11]  1.673447 316
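
The quality columns are linked by simple ratios, which can be checked by hand for the rule {SALVO} => {FABULOSO}. The sketch below uses only numbers printed above; reading the coverage of {FABULOSO} => {SALVO} as the support of FABULOSO alone is the one interpretive step, and the variable names are illustrative:

```r
n          <- 115111        # transactions reported by apriori()
count      <- 127           # baskets containing both SALVO and FABULOSO
supp_salvo <- 0.003561779   # coverage of {SALVO} => {FABULOSO}, i.e. support of SALVO
supp_fab   <- 0.004699811   # coverage of {FABULOSO} => {SALVO}, i.e. support of FABULOSO

support    <- count / n               # share of baskets with both items
confidence <- support / supp_salvo    # share of SALVO baskets that also hold FABULOSO
lift       <- confidence / supp_fab   # confidence relative to FABULOSO's base rate

round(c(support = support, confidence = confidence, lift = lift), 4)
```

A lift far above 1, as here, means the two brands appear together much more often than their individual frequencies would suggest.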

Sort association rules

# Order the rules from highest to lowest confidence
reglas.asociacion <- sort(reglas.asociacion, by = "confidence", decreasing = TRUE)
summary(reglas.asociacion)
## set of 11 rules
## 
## rule length distribution (lhs + rhs):sizes
##  2 
## 11 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       2       2       2       2       2       2 
## 
## summary of quality measures:
##     support           confidence        coverage             lift       
##  Min.   :0.001016   Min.   :0.2069   Min.   :0.003562   Min.   : 1.325  
##  1st Qu.:0.001103   1st Qu.:0.2356   1st Qu.:0.004504   1st Qu.: 1.787  
##  Median :0.001416   Median :0.2442   Median :0.005803   Median : 3.972  
##  Mean   :0.001519   Mean   :0.2536   Mean   :0.006054   Mean   :17.563  
##  3rd Qu.:0.001651   3rd Qu.:0.2685   3rd Qu.:0.006893   3rd Qu.:21.798  
##  Max.   :0.002745   Max.   :0.3098   Max.   :0.010503   Max.   :65.908  
##      count      
##  Min.   :117.0  
##  1st Qu.:127.0  
##  Median :163.0  
##  Mean   :174.9  
##  3rd Qu.:190.0  
##  Max.   :316.0  
## 
## mining info:
##  data ntransactions support confidence
##    tr        115111   0.001        0.2
##                                                                         call
##  apriori(data = tr, parameter = list(supp = 0.001, conf = 0.2, maxlen = 10))
inspect(reglas.asociacion)
##      lhs                  rhs         support     confidence coverage   
## [1]  {SALVO}           => {FABULOSO}  0.001103283 0.3097561  0.003561779
## [2]  {COCA COLA ZERO}  => {COCA COLA} 0.001416025 0.2969035  0.004769310
## [3]  {BLUE HOUSE}      => {BIMBO}     0.001711392 0.2720994  0.006289581
## [4]  {HELLMANN´S}      => {BIMBO}     0.001537646 0.2649701  0.005803094
## [5]  {COCA COLA LIGHT} => {COCA COLA} 0.002745176 0.2613730  0.010502906
## [6]  {REYMA}           => {CONVERMEX} 0.002093631 0.2441743  0.008574333
## [7]  {FANTA}           => {COCA COLA} 0.001051159 0.2439516  0.004308884
## [8]  {PINOL}           => {CLORALEX}  0.001016410 0.2363636  0.004300197
## [9]  {FABULOSO}        => {SALVO}     0.001103283 0.2347505  0.004699811
## [10] {FUD}             => {BIMBO}     0.001589770 0.2183771  0.007279930
## [11] {SPRITE}          => {COCA COLA} 0.001346526 0.2069426  0.006506763
##      lift      count
## [1]  65.908196 127  
## [2]   1.900932 163  
## [3]   4.078870 197  
## [4]   3.971997 177  
## [5]   1.673447 316  
## [6]  18.564824 241  
## [7]   1.561906 121  
## [8]  25.030409 117  
## [9]  65.908196 127  
## [10]  3.273552 183  
## [11]  1.324955 155
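
Confidence and lift can rank rules quite differently: in the table above, {COCA COLA ZERO} => {COCA COLA} is second by confidence with a lift under 2, while {PINOL} => {CLORALEX} is eighth with a lift of about 25. A sketch of sorting by lift and keeping only rules above a lift threshold, using the `Groceries` data set shipped with arules as a stand-in for `tr`; the thresholds and the names `rules_demo`, `by_lift`, and `strong` are illustrative assumptions:

```r
library(arules)
data("Groceries")

# Mine a small rule set on the stand-in data (parameters chosen only for the demo)
rules_demo <- apriori(Groceries,
                      parameter = list(supp = 0.01, conf = 0.2, minlen = 2),
                      control = list(verbose = FALSE))

# Rank by lift instead of confidence
by_lift <- sort(rules_demo, by = "lift", decreasing = TRUE)

# Keep only rules whose lift exceeds 2
strong <- subset(by_lift, lift > 2)
inspect(head(strong, n = 5))
```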

Visualize association rules

#install.packages("arulesViz")
library(arulesViz)
# Keep the 10 rules with the highest confidence and draw them as an interactive graph
top10reglas <- head(reglas.asociacion, n = 10, by = "confidence")
plot(top10reglas, method = "graph", engine = "htmlwidget")
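
In arules, `head()` on a rule set accepts a `by` argument, so ranking and truncating can happen in one call even when the rule set has not been sorted beforehand. A small sketch on the `Groceries` data set bundled with arules (a stand-in for `tr`; `rules_demo`, `top_a`, and `top_b` are illustrative names):

```r
library(arules)
data("Groceries")

rules_demo <- apriori(Groceries,
                      parameter = list(supp = 0.01, conf = 0.2, minlen = 2),
                      control = list(verbose = FALSE))

# One call: order by confidence and keep the first 10 rules
top_a <- head(rules_demo, n = 10, by = "confidence")

# Two steps: sort first, then truncate
top_b <- head(sort(rules_demo, by = "confidence"), n = 10)

# Both selections should carry the same confidence values
all.equal(quality(top_a)$confidence, quality(top_b)$confidence)
```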