Installing libraries
#install.packages("tidyverse")
#install.packages("dplyr")
#install.packages("ggplot2")
#install.packages("readxl")
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(ggplot2)
library(readxl)
Importing and inspecting the data set
library(readxl)
Abarrotes_Ventas_2 <- read_excel("D:/Lesly Gómez/Descargas/Abarrotes_Ventas-2.xlsx")
View(Abarrotes_Ventas_2)
bd<-Abarrotes_Ventas_2
str(bd)
## tibble [200,620 × 22] (S3: tbl_df/tbl/data.frame)
## $ vcClaveTienda : chr [1:200620] "MX001" "MX001" "MX001" "MX001" ...
## $ DescGiro : chr [1:200620] "Abarrotes" "Abarrotes" "Abarrotes" "Abarrotes" ...
## $ Codigo Barras : num [1:200620] 7.5e+12 7.5e+12 7.5e+12 7.5e+12 7.5e+12 ...
## $ PLU : logi [1:200620] NA NA NA NA NA NA ...
## $ Fecha : POSIXct[1:200620], format: "2020-06-19" "2020-06-19" ...
## $ Hora : POSIXct[1:200620], format: "1899-12-31 08:16:21" "1899-12-31 08:23:33" ...
## $ Marca : chr [1:200620] "NUTRI LECHE" "DAN UP" "BIMBO" "PEPSI" ...
## $ Fabricante : chr [1:200620] "MEXILAC" "DANONE DE MEXICO" "GRUPO BIMBO" "PEPSI-COLA MEXICANA" ...
## $ Producto : chr [1:200620] "Nutri Leche 1 Litro" "DANUP STRAWBERRY P/BEBER 350GR NAL" "Rebanadas Bimbo 2Pz" "Pepsi N.R. 400Ml" ...
## $ Precio : num [1:200620] 16 14 5 8 19.5 9.5 11 9.5 23.5 12 ...
## $ Ult.Costo : num [1:200620] 12.3 14 5 8 15 ...
## $ Unidades : num [1:200620] 1 1 1 1 1 1 1 1 1 1 ...
## $ F.Ticket : num [1:200620] 1 2 3 3 4 4 4 4 4 5 ...
## $ NombreDepartamento: chr [1:200620] "Abarrotes" "Abarrotes" "Abarrotes" "Abarrotes" ...
## $ NombreFamilia : chr [1:200620] "Lacteos y Refrigerados" "Lacteos y Refrigerados" "Pan y Tortilla" "Bebidas" ...
## $ NombreCategoria : chr [1:200620] "Leche" "Yogurt" "Pan Dulce Empaquetado" "Refrescos Plástico (N.R.)" ...
## $ Estado : chr [1:200620] "Nuevo León" "Nuevo León" "Nuevo León" "Nuevo León" ...
## $ Mts 2 : num [1:200620] 60 60 60 60 60 60 60 60 60 60 ...
## $ Tipo.ubicación : chr [1:200620] "Esquina" "Esquina" "Esquina" "Esquina" ...
## $ Giro : chr [1:200620] "Abarrotes" "Abarrotes" "Abarrotes" "Abarrotes" ...
## $ Hora inicio : POSIXct[1:200620], format: "1899-12-31 08:00:00" "1899-12-31 08:00:00" ...
## $ Hora cierre : POSIXct[1:200620], format: "1899-12-31 22:00:00" "1899-12-31 22:00:00" ...
summary(bd)
## vcClaveTienda DescGiro Codigo Barras PLU
## Length:200620 Length:200620 Min. :8.347e+05 Mode:logical
## Class :character Class :character 1st Qu.:7.501e+12 TRUE:1437
## Mode :character Mode :character Median :7.501e+12 NA's:199183
## Mean :5.950e+12
## 3rd Qu.:7.501e+12
## Max. :1.750e+13
## Fecha Hora
## Min. :2020-05-01 00:00:00.00 Min. :1899-12-31 00:00:00.00
## 1st Qu.:2020-06-06 00:00:00.00 1st Qu.:1899-12-31 13:12:42.75
## Median :2020-07-11 00:00:00.00 Median :1899-12-31 17:35:59.00
## Mean :2020-07-18 22:35:49.58 Mean :1899-12-31 16:43:52.05
## 3rd Qu.:2020-08-29 00:00:00.00 3rd Qu.:1899-12-31 20:47:06.00
## Max. :2020-11-11 00:00:00.00 Max. :1899-12-31 23:59:59.00
## Marca Fabricante Producto Precio
## Length:200620 Length:200620 Length:200620 Min. :-147.00
## Class :character Class :character Class :character 1st Qu.: 11.00
## Mode :character Mode :character Mode :character Median : 16.00
## Mean : 19.42
## 3rd Qu.: 25.00
## Max. :1000.00
## Ult.Costo Unidades F.Ticket NombreDepartamento
## Min. : 0.38 Min. : 0.200 Min. : 1 Length:200620
## 1st Qu.: 8.46 1st Qu.: 1.000 1st Qu.: 33967 Class :character
## Median : 12.31 Median : 1.000 Median :105996 Mode :character
## Mean : 15.31 Mean : 1.262 Mean :193994
## 3rd Qu.: 19.23 3rd Qu.: 1.000 3rd Qu.:383009
## Max. :769.23 Max. :96.000 Max. :450040
## NombreFamilia NombreCategoria Estado Mts 2
## Length:200620 Length:200620 Length:200620 Min. :47.0
## Class :character Class :character Class :character 1st Qu.:53.0
## Mode :character Mode :character Mode :character Median :60.0
## Mean :56.6
## 3rd Qu.:60.0
## Max. :62.0
## Tipo.ubicación Giro Hora inicio
## Length:200620 Length:200620 Min. :1899-12-31 07:00:00.00
## Class :character Class :character 1st Qu.:1899-12-31 07:00:00.00
## Mode :character Mode :character Median :1899-12-31 08:00:00.00
## Mean :1899-12-31 07:35:49.71
## 3rd Qu.:1899-12-31 08:00:00.00
## Max. :1899-12-31 09:00:00.00
## Hora cierre
## Min. :1899-12-31 21:00:00.00
## 1st Qu.:1899-12-31 22:00:00.00
## Median :1899-12-31 22:00:00.00
## Mean :1899-12-31 22:23:11.42
## 3rd Qu.:1899-12-31 23:00:00.00
## Max. :1899-12-31 23:00:00.00
Observations
PLU has 199,183 NAs.
Fecha is stored as a date-time (POSIXct).
Hora is also stored as a date-time, with the placeholder date 1899-12-31.
Precio contains negative values.
Unidades contains fractional values.
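The observations about negative prices and fractional units can also be checked with code instead of reading them off summary(). The following is a minimal sketch on a small invented data frame (toy and its values are hypothetical stand-ins for bd, which is not loaded here).

```r
# Toy stand-in for bd; the values are invented for illustration only.
toy <- data.frame(
  Precio   = c(16, -147, 5, 19.5),
  Unidades = c(1, 0.2, 2, 1)
)

# Number of rows with a negative price.
n_negative <- sum(toy$Precio < 0)

# Number of rows whose units are not whole numbers
# (the remainder of a division by 1 is non-zero).
n_fractional <- sum(toy$Unidades %% 1 != 0)

n_negative
n_fractional
```

Counting first shows how many records each cleaning step will affect before anything is changed.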
count(bd)
## # A tibble: 1 × 1
## n
## <int>
## 1 200620
count(bd,vcClaveTienda,sort = TRUE)
## # A tibble: 5 × 2
## vcClaveTienda n
## <chr> <int>
## 1 MX001 96464
## 2 MX004 83455
## 3 MX005 10021
## 4 MX002 6629
## 5 MX003 4051
count(bd,DescGiro,sort = TRUE)
## # A tibble: 3 × 2
## DescGiro n
## <chr> <int>
## 1 Abarrotes 100515
## 2 Carnicería 83455
## 3 Depósito 16650
count(bd,Marca,sort = TRUE)
## # A tibble: 540 × 2
## Marca n
## <chr> <int>
## 1 COCA COLA 18686
## 2 PEPSI 15966
## 3 TECATE 11674
## 4 BIMBO 8316
## 5 LALA 5866
## 6 MARINELA 3696
## 7 DORITOS 3142
## 8 CHEETOS 3130
## 9 NUTRI LECHE 3127
## 10 MARLBORO 2579
## # ℹ 530 more rows
count(bd,Fabricante,sort =TRUE)
## # A tibble: 241 × 2
## Fabricante n
## <chr> <int>
## 1 COCA COLA 27519
## 2 PEPSI-COLA MEXICANA 22415
## 3 SABRITAS 14296
## 4 CERVECERIA CUAUHTEMOC MOCTEZUMA 13681
## 5 GRUPO BIMBO 13077
## 6 SIGMA ALIMENTOS 8014
## 7 GRUPO INDUSTRIAL LALA 5868
## 8 GRUPO GAMESA 5527
## 9 NESTLE 3698
## 10 JUGOS DEL VALLE S.A. DE C.V. 3581
## # ℹ 231 more rows
count(bd,Producto,sort =TRUE)
## # A tibble: 3,404 × 2
## Producto n
## <chr> <int>
## 1 Pepsi N.R. 1.5L 5108
## 2 Coca Cola Retornable 2.5L 3771
## 3 Caguamon Tecate Light 1.2Lt 3471
## 4 Pepsi N. R. 2.5L 2899
## 5 Cerveza Tecate Light 340Ml 2619
## 6 Cerveza Tecate Light 16Oz 2315
## 7 Coca Cola Retornable 1.5L 2124
## 8 Pepsi N.R. 3L 1832
## 9 Coca Cola Retornable 500Ml 1659
## 10 PEPSI N.R. 1.5L 1631
## # ℹ 3,394 more rows
count(bd,NombreDepartamento,sort =TRUE)
## # A tibble: 9 × 2
## NombreDepartamento n
## <chr> <int>
## 1 Abarrotes 198274
## 2 Bebes e Infantiles 1483
## 3 Ferretería 377
## 4 Farmacia 255
## 5 Vinos y Licores 104
## 6 Papelería 74
## 7 Mercería 44
## 8 Productos a Eliminar 8
## 9 Carnes 1
count(bd,NombreFamilia,sort =TRUE)
## # A tibble: 51 × 2
## NombreFamilia n
## <chr> <int>
## 1 Bebidas 64917
## 2 Botanas 21583
## 3 Lacteos y Refrigerados 17657
## 4 Cerveza 14017
## 5 Pan y Tortilla 10501
## 6 Limpieza del Hogar 8723
## 7 Galletas 7487
## 8 Cigarros 6817
## 9 Cuidado Personal 5433
## 10 Salsas y Sazonadores 5320
## # ℹ 41 more rows
count(bd,NombreCategoria,sort =TRUE)
## # A tibble: 174 × 2
## NombreCategoria n
## <chr> <int>
## 1 Refrescos Plástico (N.R.) 32861
## 2 Refrescos Retornables 13880
## 3 Frituras 11082
## 4 Lata 8150
## 5 Leche 7053
## 6 Cajetilla 6329
## 7 Botella 5867
## 8 Productos sin Categoria 5455
## 9 Papas Fritas 5344
## 10 Jugos y Néctares 5295
## # ℹ 164 more rows
count(bd,Estado,sort =TRUE)
## # A tibble: 5 × 2
## Estado n
## <chr> <int>
## 1 Nuevo León 96464
## 2 Sinaloa 83455
## 3 Quintana Roo 10021
## 4 Jalisco 6629
## 5 Chiapas 4051
count(bd,Tipo.ubicación,sort =TRUE)
## # A tibble: 3 × 2
## Tipo.ubicación n
## <chr> <int>
## 1 Esquina 189940
## 2 Rotonda 6629
## 3 Entre calles 4051
tibble(bd)
## # A tibble: 200,620 × 22
## vcClaveTienda DescGiro `Codigo Barras` PLU Fecha
## <chr> <chr> <dbl> <lgl> <dttm>
## 1 MX001 Abarrotes 7501020540666 NA 2020-06-19 00:00:00
## 2 MX001 Abarrotes 7501032397906 NA 2020-06-19 00:00:00
## 3 MX001 Abarrotes 7501000112845 NA 2020-06-19 00:00:00
## 4 MX001 Abarrotes 7501031302741 NA 2020-06-19 00:00:00
## 5 MX001 Abarrotes 7501026027543 NA 2020-06-19 00:00:00
## 6 MX001 Abarrotes 7501025433024 NA 2020-06-19 00:00:00
## 7 MX001 Abarrotes 7501032332013 NA 2020-06-19 00:00:00
## 8 MX001 Abarrotes 7501026005688 NA 2020-06-19 00:00:00
## 9 MX001 Abarrotes 7506195178188 NA 2020-06-19 00:00:00
## 10 MX001 Abarrotes 32239052017 NA 2020-06-19 00:00:00
## # ℹ 200,610 more rows
## # ℹ 17 more variables: Hora <dttm>, Marca <chr>, Fabricante <chr>,
## # Producto <chr>, Precio <dbl>, Ult.Costo <dbl>, Unidades <dbl>,
## # F.Ticket <dbl>, NombreDepartamento <chr>, NombreFamilia <chr>,
## # NombreCategoria <chr>, Estado <chr>, `Mts 2` <dbl>, Tipo.ubicación <chr>,
## # Giro <chr>, `Hora inicio` <dttm>, `Hora cierre` <dttm>
str(bd)
## tibble [200,620 × 22] (S3: tbl_df/tbl/data.frame)
## $ vcClaveTienda : chr [1:200620] "MX001" "MX001" "MX001" "MX001" ...
## $ DescGiro : chr [1:200620] "Abarrotes" "Abarrotes" "Abarrotes" "Abarrotes" ...
## $ Codigo Barras : num [1:200620] 7.5e+12 7.5e+12 7.5e+12 7.5e+12 7.5e+12 ...
## $ PLU : logi [1:200620] NA NA NA NA NA NA ...
## $ Fecha : POSIXct[1:200620], format: "2020-06-19" "2020-06-19" ...
## $ Hora : POSIXct[1:200620], format: "1899-12-31 08:16:21" "1899-12-31 08:23:33" ...
## $ Marca : chr [1:200620] "NUTRI LECHE" "DAN UP" "BIMBO" "PEPSI" ...
## $ Fabricante : chr [1:200620] "MEXILAC" "DANONE DE MEXICO" "GRUPO BIMBO" "PEPSI-COLA MEXICANA" ...
## $ Producto : chr [1:200620] "Nutri Leche 1 Litro" "DANUP STRAWBERRY P/BEBER 350GR NAL" "Rebanadas Bimbo 2Pz" "Pepsi N.R. 400Ml" ...
## $ Precio : num [1:200620] 16 14 5 8 19.5 9.5 11 9.5 23.5 12 ...
## $ Ult.Costo : num [1:200620] 12.3 14 5 8 15 ...
## $ Unidades : num [1:200620] 1 1 1 1 1 1 1 1 1 1 ...
## $ F.Ticket : num [1:200620] 1 2 3 3 4 4 4 4 4 5 ...
## $ NombreDepartamento: chr [1:200620] "Abarrotes" "Abarrotes" "Abarrotes" "Abarrotes" ...
## $ NombreFamilia : chr [1:200620] "Lacteos y Refrigerados" "Lacteos y Refrigerados" "Pan y Tortilla" "Bebidas" ...
## $ NombreCategoria : chr [1:200620] "Leche" "Yogurt" "Pan Dulce Empaquetado" "Refrescos Plástico (N.R.)" ...
## $ Estado : chr [1:200620] "Nuevo León" "Nuevo León" "Nuevo León" "Nuevo León" ...
## $ Mts 2 : num [1:200620] 60 60 60 60 60 60 60 60 60 60 ...
## $ Tipo.ubicación : chr [1:200620] "Esquina" "Esquina" "Esquina" "Esquina" ...
## $ Giro : chr [1:200620] "Abarrotes" "Abarrotes" "Abarrotes" "Abarrotes" ...
## $ Hora inicio : POSIXct[1:200620], format: "1899-12-31 08:00:00" "1899-12-31 08:00:00" ...
## $ Hora cierre : POSIXct[1:200620], format: "1899-12-31 22:00:00" "1899-12-31 22:00:00" ...
head(bd)
## # A tibble: 6 × 22
## vcClaveTienda DescGiro `Codigo Barras` PLU Fecha
## <chr> <chr> <dbl> <lgl> <dttm>
## 1 MX001 Abarrotes 7501020540666 NA 2020-06-19 00:00:00
## 2 MX001 Abarrotes 7501032397906 NA 2020-06-19 00:00:00
## 3 MX001 Abarrotes 7501000112845 NA 2020-06-19 00:00:00
## 4 MX001 Abarrotes 7501031302741 NA 2020-06-19 00:00:00
## 5 MX001 Abarrotes 7501026027543 NA 2020-06-19 00:00:00
## 6 MX001 Abarrotes 7501025433024 NA 2020-06-19 00:00:00
## # ℹ 17 more variables: Hora <dttm>, Marca <chr>, Fabricante <chr>,
## # Producto <chr>, Precio <dbl>, Ult.Costo <dbl>, Unidades <dbl>,
## # F.Ticket <dbl>, NombreDepartamento <chr>, NombreFamilia <chr>,
## # NombreCategoria <chr>, Estado <chr>, `Mts 2` <dbl>, Tipo.ubicación <chr>,
## # Giro <chr>, `Hora inicio` <dttm>, `Hora cierre` <dttm>
tail(bd)
## # A tibble: 6 × 22
## vcClaveTienda DescGiro `Codigo Barras` PLU Fecha
## <chr> <chr> <dbl> <lgl> <dttm>
## 1 MX005 Depósito 7622210464811 NA 2020-07-12 00:00:00
## 2 MX005 Depósito 7622210464811 NA 2020-10-23 00:00:00
## 3 MX005 Depósito 7622210464811 NA 2020-10-10 00:00:00
## 4 MX005 Depósito 7622210464811 NA 2020-10-10 00:00:00
## 5 MX005 Depósito 7622210464811 NA 2020-06-27 00:00:00
## 6 MX005 Depósito 7622210464811 NA 2020-06-26 00:00:00
## # ℹ 17 more variables: Hora <dttm>, Marca <chr>, Fabricante <chr>,
## # Producto <chr>, Precio <dbl>, Ult.Costo <dbl>, Unidades <dbl>,
## # F.Ticket <dbl>, NombreDepartamento <chr>, NombreFamilia <chr>,
## # NombreCategoria <chr>, Estado <chr>, `Mts 2` <dbl>, Tipo.ubicación <chr>,
## # Giro <chr>, `Hora inicio` <dttm>, `Hora cierre` <dttm>
#install.packages("janitor")
library(janitor)
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
tabyl(bd, vcClaveTienda, NombreDepartamento)
## vcClaveTienda Abarrotes Bebes e Infantiles Carnes Farmacia Ferretería Mercería
## MX001 95410 515 1 147 245 28
## MX002 6590 21 0 4 10 0
## MX003 4026 15 0 2 8 0
## MX004 82234 932 0 102 114 16
## MX005 10014 0 0 0 0 0
## Papelería Productos a Eliminar Vinos y Licores
## 35 3 80
## 0 0 4
## 0 0 0
## 32 5 20
## 7 0 0
Data-cleaning techniques
Removing columns
First solution: drop PLU (the radical solution)
bd1 <- bd
bd1 <- subset(bd1,select = -c(PLU))
summary(bd1)
## vcClaveTienda DescGiro Codigo Barras
## Length:200620 Length:200620 Min. :8.347e+05
## Class :character Class :character 1st Qu.:7.501e+12
## Mode :character Mode :character Median :7.501e+12
## Mean :5.950e+12
## 3rd Qu.:7.501e+12
## Max. :1.750e+13
## Fecha Hora
## Min. :2020-05-01 00:00:00.00 Min. :1899-12-31 00:00:00.00
## 1st Qu.:2020-06-06 00:00:00.00 1st Qu.:1899-12-31 13:12:42.75
## Median :2020-07-11 00:00:00.00 Median :1899-12-31 17:35:59.00
## Mean :2020-07-18 22:35:49.58 Mean :1899-12-31 16:43:52.05
## 3rd Qu.:2020-08-29 00:00:00.00 3rd Qu.:1899-12-31 20:47:06.00
## Max. :2020-11-11 00:00:00.00 Max. :1899-12-31 23:59:59.00
## Marca Fabricante Producto Precio
## Length:200620 Length:200620 Length:200620 Min. :-147.00
## Class :character Class :character Class :character 1st Qu.: 11.00
## Mode :character Mode :character Mode :character Median : 16.00
## Mean : 19.42
## 3rd Qu.: 25.00
## Max. :1000.00
## Ult.Costo Unidades F.Ticket NombreDepartamento
## Min. : 0.38 Min. : 0.200 Min. : 1 Length:200620
## 1st Qu.: 8.46 1st Qu.: 1.000 1st Qu.: 33967 Class :character
## Median : 12.31 Median : 1.000 Median :105996 Mode :character
## Mean : 15.31 Mean : 1.262 Mean :193994
## 3rd Qu.: 19.23 3rd Qu.: 1.000 3rd Qu.:383009
## Max. :769.23 Max. :96.000 Max. :450040
## NombreFamilia NombreCategoria Estado Mts 2
## Length:200620 Length:200620 Length:200620 Min. :47.0
## Class :character Class :character Class :character 1st Qu.:53.0
## Mode :character Mode :character Mode :character Median :60.0
## Mean :56.6
## 3rd Qu.:60.0
## Max. :62.0
## Tipo.ubicación Giro Hora inicio
## Length:200620 Length:200620 Min. :1899-12-31 07:00:00.00
## Class :character Class :character 1st Qu.:1899-12-31 07:00:00.00
## Mode :character Mode :character Median :1899-12-31 08:00:00.00
## Mean :1899-12-31 07:35:49.71
## 3rd Qu.:1899-12-31 08:00:00.00
## Max. :1899-12-31 09:00:00.00
## Hora cierre
## Min. :1899-12-31 21:00:00.00
## 1st Qu.:1899-12-31 22:00:00.00
## Median :1899-12-31 22:00:00.00
## Mean :1899-12-31 22:23:11.42
## 3rd Qu.:1899-12-31 23:00:00.00
## Max. :1899-12-31 23:00:00.00
subset() extracts part of a data frame; the minus sign in front of c() drops the selected columns.
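As an illustration of the column-dropping idiom, here is a minimal sketch on an invented data frame (toy is hypothetical). It compares subset() with plain name-based indexing; dplyr::select(toy, -PLU) is another way to get the same columns.

```r
# Toy data frame; values are invented for illustration only.
toy <- data.frame(
  id     = 1:3,
  PLU    = c(NA, NA, TRUE),
  Precio = c(16, 14, 5)
)

# Option A: subset() with a negated column selection.
no_plu_a <- subset(toy, select = -c(PLU))

# Option B: keep every column whose name is not "PLU".
no_plu_b <- toy[, setdiff(names(toy), "PLU"), drop = FALSE]

names(no_plu_a)
names(no_plu_b)
```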
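Removing rows
Second solution: drop the rows whose Precio is not positive (PLU was already dropped from bd1)
bd2 <- bd1
bd2 <- bd2[bd2$Precio > 0, ]
# Note: this summarizes bd1, the unfiltered data, so the negative minimum still appears;
# summary(bd2$Precio) would check the filtered data.
summary(bd1$Precio)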
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -147.00 11.00 16.00 19.42 25.00 1000.00
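To confirm that a row filter did what was intended, it helps to summarize the filtered object rather than the original one. A minimal sketch, assuming an invented data frame (toy and its values are hypothetical):

```r
# Toy data frame; values are invented for illustration only.
toy <- data.frame(
  Producto = c("a", "b", "c", "d"),
  Precio   = c(16, -147, 0, 9.5)
)

# Keep only the rows with a strictly positive price.
positive <- toy[toy$Precio > 0, ]

# How many rows were dropped, and the smallest remaining price.
rows_dropped <- nrow(toy) - nrow(positive)
min_price    <- min(positive$Precio)

rows_dropped
min_price
```

If the price column could contain NA, wrapping the condition in which() avoids carrying NA rows into the result.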
How many duplicated rows/records do we have?
bd2[duplicated(bd2),]
## # A tibble: 0 × 21
## # ℹ 21 variables: vcClaveTienda <chr>, DescGiro <chr>, Codigo Barras <dbl>,
## # Fecha <dttm>, Hora <dttm>, Marca <chr>, Fabricante <chr>, Producto <chr>,
## # Precio <dbl>, Ult.Costo <dbl>, Unidades <dbl>, F.Ticket <dbl>,
## # NombreDepartamento <chr>, NombreFamilia <chr>, NombreCategoria <chr>,
## # Estado <chr>, Mts 2 <dbl>, Tipo.ubicación <chr>, Giro <chr>,
## # Hora inicio <dttm>, Hora cierre <dttm>
sum(duplicated(bd2))
## [1] 0
Removing duplicated records
bd3 <- bd2
library(dplyr)
bd3 <- distinct(bd3)
dplyr: performs common data-manipulation operations such as filtering rows, selecting specific columns, reordering rows, adding new rows, and aggregating data.
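The duplicate check and removal can be tried on a small example where duplicates actually exist. This is a minimal base R sketch on invented data (toy is hypothetical); dplyr::distinct(toy) should keep the same rows as the indexing shown here.

```r
# Toy data frame with repeated rows; values are invented for illustration.
toy <- data.frame(
  ticket = c(1, 2, 2, 3, 3, 3),
  Precio = c(16, 14, 14, 5, 5, 5)
)

# duplicated() flags every repeat of an earlier row (not the first occurrence).
dup <- duplicated(toy)

# Number of duplicated rows.
n_dup <- sum(dup)

# Keep only the first occurrence of each row.
unique_rows <- toy[!dup, ]

n_dup
unique_rows
```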
Solution 1: absolute value of prices (for the negative values, assuming they were typing errors)
bd4 <- bd1
bd4$Precio <- abs(bd4$Precio)
summary(bd4$Precio)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.50 11.00 16.00 19.45 25.00 1000.00
All prices are now positive.
In the duplicates check above, the first line lists the duplicated rows and the second line counts them.
Solution 2: whole-number quantities
bd5 <- bd4
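# Note: the leading minus sign negates every value, so all units come out negative;
# ceiling(bd5$Unidades) alone rounds up and keeps the sign.
bd5$Unidades <- -ceiling(bd5$Unidades)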
summary(bd5$Unidades)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -96.000 -1.000 -1.000 -1.262 -1.000 -1.000
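The effect of the sign in front of ceiling() is easy to see on a few values. A minimal sketch with an invented vector (units is hypothetical):

```r
# Invented unit quantities, including a fractional one.
units <- c(0.2, 1, 1.5, 96)

# Rounding up to the next whole number keeps the values positive.
rounded_up <- ceiling(units)

# A leading minus sign flips the sign of every rounded value.
negated <- -ceiling(units)

rounded_up
negated
```

Whether rounding up is the right choice depends on what a fractional unit means in the data (for example, products sold by weight); that is a modelling decision, not something the code can settle.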
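The $ sign selects a column and the % sign introduces a format code; lowercase %y is a two-digit year and uppercase %Y is a four-digit year.
Solution 1: convert from character to date (DOES NOT WORK)
bd6 <- bd5
bd6$Fecha <- as.Date(bd6$Fecha, "%d/%m/%Y")
tibble(bd6)
This fails because Fecha is not character: str(bd) shows it is already POSIXct, so no format string is needed.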
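The two situations (text that must be parsed with a format string, and a value that is already a date-time) can be sketched as follows. The vectors are invented for illustration, and the UTC time zone is an assumption made so that the example gives the same result on any machine.

```r
# Case 1: dates stored as text in day/month/year order need a format string.
fecha_txt <- c("19/06/2020", "01/05/2020")
d_from_text <- as.Date(fecha_txt, format = "%d/%m/%Y")

# Case 2: a POSIXct value needs no format string, only a time zone.
fecha_ct <- as.POSIXct("2020-06-19 00:00:00", tz = "UTC")
d_from_ct <- as.Date(fecha_ct, tz = "UTC")

# %y prints a two-digit year, %Y a four-digit year.
two_digit  <- format(d_from_text[1], "%d/%m/%y")
four_digit <- format(d_from_text[1], "%d/%m/%Y")

d_from_text
d_from_ct
two_digit
four_digit
```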
Solution 2: convert from character to integer
bd7 <- bd5
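# Note: Hora is POSIXct, so characters 1-2 are the "18" of the placeholder year 1899,
# not the hour; format(bd7$Hora, "%H") would return the hour itself.
bd7$Hora <- substr(bd7$Hora,start=1, stop=2)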
tibble(bd7)
## # A tibble: 200,620 × 21
## vcClaveTienda DescGiro `Codigo Barras` Fecha Hora Marca
## <chr> <chr> <dbl> <dttm> <chr> <chr>
## 1 MX001 Abarrotes 7501020540666 2020-06-19 00:00:00 18 NUTRI LECHE
## 2 MX001 Abarrotes 7501032397906 2020-06-19 00:00:00 18 DAN UP
## 3 MX001 Abarrotes 7501000112845 2020-06-19 00:00:00 18 BIMBO
## 4 MX001 Abarrotes 7501031302741 2020-06-19 00:00:00 18 PEPSI
## 5 MX001 Abarrotes 7501026027543 2020-06-19 00:00:00 18 BLANCA NIE…
## 6 MX001 Abarrotes 7501025433024 2020-06-19 00:00:00 18 FLASH
## 7 MX001 Abarrotes 7501032332013 2020-06-19 00:00:00 18 VARIOS DAN…
## 8 MX001 Abarrotes 7501026005688 2020-06-19 00:00:00 18 ZOTE
## 9 MX001 Abarrotes 7506195178188 2020-06-19 00:00:00 18 ALWAYS
## 10 MX001 Abarrotes 32239052017 2020-06-19 00:00:00 18 JUMEX
## # ℹ 200,610 more rows
## # ℹ 15 more variables: Fabricante <chr>, Producto <chr>, Precio <dbl>,
## # Ult.Costo <dbl>, Unidades <dbl>, F.Ticket <dbl>, NombreDepartamento <chr>,
## # NombreFamilia <chr>, NombreCategoria <chr>, Estado <chr>, `Mts 2` <dbl>,
## # Tipo.ubicación <chr>, Giro <chr>, `Hora inicio` <dttm>,
## # `Hora cierre` <dttm>
bd7$Hora <- as.integer(bd7$Hora)
str(bd7)
## tibble [200,620 × 21] (S3: tbl_df/tbl/data.frame)
## $ vcClaveTienda : chr [1:200620] "MX001" "MX001" "MX001" "MX001" ...
## $ DescGiro : chr [1:200620] "Abarrotes" "Abarrotes" "Abarrotes" "Abarrotes" ...
## $ Codigo Barras : num [1:200620] 7.5e+12 7.5e+12 7.5e+12 7.5e+12 7.5e+12 ...
## $ Fecha : POSIXct[1:200620], format: "2020-06-19" "2020-06-19" ...
## $ Hora : int [1:200620] 18 18 18 18 18 18 18 18 18 18 ...
## $ Marca : chr [1:200620] "NUTRI LECHE" "DAN UP" "BIMBO" "PEPSI" ...
## $ Fabricante : chr [1:200620] "MEXILAC" "DANONE DE MEXICO" "GRUPO BIMBO" "PEPSI-COLA MEXICANA" ...
## $ Producto : chr [1:200620] "Nutri Leche 1 Litro" "DANUP STRAWBERRY P/BEBER 350GR NAL" "Rebanadas Bimbo 2Pz" "Pepsi N.R. 400Ml" ...
## $ Precio : num [1:200620] 16 14 5 8 19.5 9.5 11 9.5 23.5 12 ...
## $ Ult.Costo : num [1:200620] 12.3 14 5 8 15 ...
## $ Unidades : num [1:200620] -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
## $ F.Ticket : num [1:200620] 1 2 3 3 4 4 4 4 4 5 ...
## $ NombreDepartamento: chr [1:200620] "Abarrotes" "Abarrotes" "Abarrotes" "Abarrotes" ...
## $ NombreFamilia : chr [1:200620] "Lacteos y Refrigerados" "Lacteos y Refrigerados" "Pan y Tortilla" "Bebidas" ...
## $ NombreCategoria : chr [1:200620] "Leche" "Yogurt" "Pan Dulce Empaquetado" "Refrescos Plástico (N.R.)" ...
## $ Estado : chr [1:200620] "Nuevo León" "Nuevo León" "Nuevo León" "Nuevo León" ...
## $ Mts 2 : num [1:200620] 60 60 60 60 60 60 60 60 60 60 ...
## $ Tipo.ubicación : chr [1:200620] "Esquina" "Esquina" "Esquina" "Esquina" ...
## $ Giro : chr [1:200620] "Abarrotes" "Abarrotes" "Abarrotes" "Abarrotes" ...
## $ Hora inicio : POSIXct[1:200620], format: "1899-12-31 08:00:00" "1899-12-31 08:00:00" ...
## $ Hora cierre : POSIXct[1:200620], format: "1899-12-31 22:00:00" "1899-12-31 22:00:00" ...
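Every value of Hora in the output above is 18, which matches the first two characters of the placeholder year 1899 rather than an hour of the day. A minimal sketch of the difference, using two invented timestamps (hora is hypothetical, and UTC is an assumption to keep the example reproducible):

```r
# Two invented times carrying the 1899-12-31 placeholder date.
hora <- as.POSIXct(c("1899-12-31 08:16:21", "1899-12-31 17:35:59"), tz = "UTC")

# Taking characters 1-2 of the printed value returns the century, not the hour.
first_two <- substr(format(hora), 1, 2)

# Asking for the %H component returns the hour of the day.
hour_of_day <- as.integer(format(hora, "%H"))

first_two
hour_of_day
```

lubridate::hour() is another option for the same extraction, since the document already loads lubridate.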
How many NAs are there in the data set?
sum(is.na(bd))
## [1] 199183
sum(is.na(bd7))
## [1] 0
How many NAs are there per variable?
sapply(bd7, function(x) sum(is.na(x)))
## vcClaveTienda DescGiro Codigo Barras Fecha
## 0 0 0 0
## Hora Marca Fabricante Producto
## 0 0 0 0
## Precio Ult.Costo Unidades F.Ticket
## 0 0 0 0
## NombreDepartamento NombreFamilia NombreCategoria Estado
## 0 0 0 0
## Mts 2 Tipo.ubicación Giro Hora inicio
## 0 0 0 0
## Hora cierre
## 0
sapply(bd, function(x) sum(is.na(x)))
## vcClaveTienda DescGiro Codigo Barras PLU
## 0 0 0 199183
## Fecha Hora Marca Fabricante
## 0 0 0 0
## Producto Precio Ult.Costo Unidades
## 0 0 0 0
## F.Ticket NombreDepartamento NombreFamilia NombreCategoria
## 0 0 0 0
## Estado Mts 2 Tipo.ubicación Giro
## 0 0 0 0
## Hora inicio Hora cierre
## 0 0
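The per-variable counts are easier to judge as shares of the total. The sketch below uses the two figures reported above for bd (200,620 rows and 199,183 NAs in PLU) and then shows the same calculation for every column of a small invented data frame (toy is hypothetical).

```r
# Figures taken from the outputs above.
n_rows <- 200620
na_plu <- 199183

# Percentage of PLU values that are missing.
pct_na_plu <- round(100 * na_plu / n_rows, 2)

# Same idea for all columns at once: colMeans() of a logical matrix
# gives the fraction of TRUE values (here, the fraction of NAs).
toy <- data.frame(PLU = c(NA, TRUE, NA, NA), Precio = c(16, 14, 5, 8))
na_share <- colMeans(is.na(toy))

pct_na_plu
na_share
```

A column that is almost entirely missing is a reasonable candidate for removal, which supports the first solution in the cleaning section.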
Solution 1: drop every record that contains an NA
bd8<- bd7
bd8<- na.omit(bd8)
summary(bd8)
## vcClaveTienda DescGiro Codigo Barras
## Length:200620 Length:200620 Min. :8.347e+05
## Class :character Class :character 1st Qu.:7.501e+12
## Mode :character Mode :character Median :7.501e+12
## Mean :5.950e+12
## 3rd Qu.:7.501e+12
## Max. :1.750e+13
## Fecha Hora Marca
## Min. :2020-05-01 00:00:00.00 Min. :18 Length:200620
## 1st Qu.:2020-06-06 00:00:00.00 1st Qu.:18 Class :character
## Median :2020-07-11 00:00:00.00 Median :18 Mode :character
## Mean :2020-07-18 22:35:49.58 Mean :18
## 3rd Qu.:2020-08-29 00:00:00.00 3rd Qu.:18
## Max. :2020-11-11 00:00:00.00 Max. :18
## Fabricante Producto Precio Ult.Costo
## Length:200620 Length:200620 Min. : 0.50 Min. : 0.38
## Class :character Class :character 1st Qu.: 11.00 1st Qu.: 8.46
## Mode :character Mode :character Median : 16.00 Median : 12.31
## Mean : 19.45 Mean : 15.31
## 3rd Qu.: 25.00 3rd Qu.: 19.23
## Max. :1000.00 Max. :769.23
## Unidades F.Ticket NombreDepartamento NombreFamilia
## Min. :-96.000 Min. : 1 Length:200620 Length:200620
## 1st Qu.: -1.000 1st Qu.: 33967 Class :character Class :character
## Median : -1.000 Median :105996 Mode :character Mode :character
## Mean : -1.262 Mean :193994
## 3rd Qu.: -1.000 3rd Qu.:383009
## Max. : -1.000 Max. :450040
## NombreCategoria Estado Mts 2 Tipo.ubicación
## Length:200620 Length:200620 Min. :47.0 Length:200620
## Class :character Class :character 1st Qu.:53.0 Class :character
## Mode :character Mode :character Median :60.0 Mode :character
## Mean :56.6
## 3rd Qu.:60.0
## Max. :62.0
## Giro Hora inicio
## Length:200620 Min. :1899-12-31 07:00:00.00
## Class :character 1st Qu.:1899-12-31 07:00:00.00
## Mode :character Median :1899-12-31 08:00:00.00
## Mean :1899-12-31 07:35:49.71
## 3rd Qu.:1899-12-31 08:00:00.00
## Max. :1899-12-31 09:00:00.00
## Hora cierre
## Min. :1899-12-31 21:00:00.00
## 1st Qu.:1899-12-31 22:00:00.00
## Median :1899-12-31 22:00:00.00
## Mean :1899-12-31 22:23:11.42
## 3rd Qu.:1899-12-31 23:00:00.00
## Max. :1899-12-31 23:00:00.00
bd15<- bd
bd15<- na.omit(bd15)
summary(bd15)
## vcClaveTienda DescGiro Codigo Barras PLU
## Length:1437 Length:1437 Min. :6.750e+08 Mode:logical
## Class :character Class :character 1st Qu.:6.750e+08 TRUE:1437
## Mode :character Mode :character Median :6.750e+08
## Mean :2.616e+11
## 3rd Qu.:6.750e+08
## Max. :7.501e+12
## Fecha Hora
## Min. :2020-06-06 00:00:00.00 Min. :1899-12-31 00:01:22.00
## 1st Qu.:2020-06-20 00:00:00.00 1st Qu.:1899-12-31 15:57:22.00
## Median :2020-07-10 00:00:00.00 Median :1899-12-31 18:49:20.00
## Mean :2020-07-15 18:04:15.52 Mean :1899-12-31 17:46:04.46
## 3rd Qu.:2020-08-08 00:00:00.00 3rd Qu.:1899-12-31 21:09:03.00
## Max. :2020-11-11 00:00:00.00 Max. :1899-12-31 23:58:14.00
## Marca Fabricante Producto Precio
## Length:1437 Length:1437 Length:1437 Min. :30.00
## Class :character Class :character Class :character 1st Qu.:90.00
## Mode :character Mode :character Mode :character Median :90.00
## Mean :87.94
## 3rd Qu.:90.00
## Max. :90.00
## Ult.Costo Unidades F.Ticket NombreDepartamento
## Min. : 1.00 Min. :1.000 Min. : 772 Length:1437
## 1st Qu.:64.62 1st Qu.:1.000 1st Qu.: 99955 Class :character
## Median :64.62 Median :1.000 Median :102493 Mode :character
## Mean :56.65 Mean :1.124 Mean :100595
## 3rd Qu.:64.62 3rd Qu.:1.000 3rd Qu.:106546
## Max. :64.62 Max. :7.000 Max. :118356
## NombreFamilia NombreCategoria Estado Mts 2
## Length:1437 Length:1437 Length:1437 Min. :58.00
## Class :character Class :character Class :character 1st Qu.:58.00
## Mode :character Mode :character Mode :character Median :58.00
## Mean :58.07
## 3rd Qu.:58.00
## Max. :60.00
## Tipo.ubicación Giro Hora inicio
## Length:1437 Length:1437 Min. :1899-12-31 08:00:00
## Class :character Class :character 1st Qu.:1899-12-31 08:00:00
## Mode :character Mode :character Median :1899-12-31 08:00:00
## Mean :1899-12-31 08:00:00
## 3rd Qu.:1899-12-31 08:00:00
## Max. :1899-12-31 08:00:00
## Hora cierre
## Min. :1899-12-31 21:00:00.00
## 1st Qu.:1899-12-31 21:00:00.00
## Median :1899-12-31 21:00:00.00
## Mean :1899-12-31 21:02:06.26
## 3rd Qu.:1899-12-31 21:00:00.00
## Max. :1899-12-31 22:00:00.00
Use na.omit() with care, not routinely. In the second run I applied it to the original data set (bd), because bd7 had zero NAs and na.omit() had nothing to remove. On bd it keeps only the 1,437 rows where PLU is not NA.
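The reason na.omit() can discard almost everything is that one sparse column is enough to remove a row. A minimal sketch on invented data (toy is hypothetical) contrasts it with checking completeness only on the columns that matter:

```r
# Toy data frame: PLU is mostly missing, Precio has one NA.
toy <- data.frame(
  PLU    = c(NA, TRUE, NA, NA),
  Precio = c(16, 90, NA, 19.5)
)

# na.omit() drops every row that has an NA in any column.
rows_after_omit <- nrow(na.omit(toy))

# complete.cases() on selected columns only looks at those columns.
keep <- complete.cases(toy[, "Precio", drop = FALSE])
rows_after_selective <- nrow(toy[keep, ])

rows_after_omit
rows_after_selective
```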
Solution 2: replace NAs with 0
bd9<- bd
bd9[is.na(bd9)] <-0
summary(bd9)
## vcClaveTienda DescGiro Codigo Barras PLU
## Length:200620 Length:200620 Min. :8.347e+05 Mode :logical
## Class :character Class :character 1st Qu.:7.501e+12 FALSE:199183
## Mode :character Mode :character Median :7.501e+12 TRUE :1437
## Mean :5.950e+12
## 3rd Qu.:7.501e+12
## Max. :1.750e+13
## Fecha Hora
## Min. :2020-05-01 00:00:00.00 Min. :1899-12-31 00:00:00.00
## 1st Qu.:2020-06-06 00:00:00.00 1st Qu.:1899-12-31 13:12:42.75
## Median :2020-07-11 00:00:00.00 Median :1899-12-31 17:35:59.00
## Mean :2020-07-18 22:35:49.58 Mean :1899-12-31 16:43:52.05
## 3rd Qu.:2020-08-29 00:00:00.00 3rd Qu.:1899-12-31 20:47:06.00
## Max. :2020-11-11 00:00:00.00 Max. :1899-12-31 23:59:59.00
## Marca Fabricante Producto Precio
## Length:200620 Length:200620 Length:200620 Min. :-147.00
## Class :character Class :character Class :character 1st Qu.: 11.00
## Mode :character Mode :character Mode :character Median : 16.00
## Mean : 19.42
## 3rd Qu.: 25.00
## Max. :1000.00
## Ult.Costo Unidades F.Ticket NombreDepartamento
## Min. : 0.38 Min. : 0.200 Min. : 1 Length:200620
## 1st Qu.: 8.46 1st Qu.: 1.000 1st Qu.: 33967 Class :character
## Median : 12.31 Median : 1.000 Median :105996 Mode :character
## Mean : 15.31 Mean : 1.262 Mean :193994
## 3rd Qu.: 19.23 3rd Qu.: 1.000 3rd Qu.:383009
## Max. :769.23 Max. :96.000 Max. :450040
## NombreFamilia NombreCategoria Estado Mts 2
## Length:200620 Length:200620 Length:200620 Min. :47.0
## Class :character Class :character Class :character 1st Qu.:53.0
## Mode :character Mode :character Mode :character Median :60.0
## Mean :56.6
## 3rd Qu.:60.0
## Max. :62.0
## Tipo.ubicación Giro Hora inicio
## Length:200620 Length:200620 Min. :1899-12-31 07:00:00.00
## Class :character Class :character 1st Qu.:1899-12-31 07:00:00.00
## Mode :character Mode :character Median :1899-12-31 08:00:00.00
## Mean :1899-12-31 07:35:49.71
## 3rd Qu.:1899-12-31 08:00:00.00
## Max. :1899-12-31 09:00:00.00
## Hora cierre
## Min. :1899-12-31 21:00:00.00
## 1st Qu.:1899-12-31 22:00:00.00
## Median :1899-12-31 22:00:00.00
## Mean :1899-12-31 22:23:11.42
## 3rd Qu.:1899-12-31 23:00:00.00
## Max. :1899-12-31 23:00:00.00
sum(is.na(bd9))
## [1] 0
Solution 3: replace NAs with the mean
bd10<- bd
bd10$PLU[is.na(bd10$PLU)]<-mean(bd10$PLU, na.rm=TRUE)
summary(bd10)
## vcClaveTienda DescGiro Codigo Barras PLU
## Length:200620 Length:200620 Min. :8.347e+05 Min. :1
## Class :character Class :character 1st Qu.:7.501e+12 1st Qu.:1
## Mode :character Mode :character Median :7.501e+12 Median :1
## Mean :5.950e+12 Mean :1
## 3rd Qu.:7.501e+12 3rd Qu.:1
## Max. :1.750e+13 Max. :1
## Fecha Hora
## Min. :2020-05-01 00:00:00.00 Min. :1899-12-31 00:00:00.00
## 1st Qu.:2020-06-06 00:00:00.00 1st Qu.:1899-12-31 13:12:42.75
## Median :2020-07-11 00:00:00.00 Median :1899-12-31 17:35:59.00
## Mean :2020-07-18 22:35:49.58 Mean :1899-12-31 16:43:52.05
## 3rd Qu.:2020-08-29 00:00:00.00 3rd Qu.:1899-12-31 20:47:06.00
## Max. :2020-11-11 00:00:00.00 Max. :1899-12-31 23:59:59.00
## Marca Fabricante Producto Precio
## Length:200620 Length:200620 Length:200620 Min. :-147.00
## Class :character Class :character Class :character 1st Qu.: 11.00
## Mode :character Mode :character Mode :character Median : 16.00
## Mean : 19.42
## 3rd Qu.: 25.00
## Max. :1000.00
## Ult.Costo Unidades F.Ticket NombreDepartamento
## Min. : 0.38 Min. : 0.200 Min. : 1 Length:200620
## 1st Qu.: 8.46 1st Qu.: 1.000 1st Qu.: 33967 Class :character
## Median : 12.31 Median : 1.000 Median :105996 Mode :character
## Mean : 15.31 Mean : 1.262 Mean :193994
## 3rd Qu.: 19.23 3rd Qu.: 1.000 3rd Qu.:383009
## Max. :769.23 Max. :96.000 Max. :450040
## NombreFamilia NombreCategoria Estado Mts 2
## Length:200620 Length:200620 Length:200620 Min. :47.0
## Class :character Class :character Class :character 1st Qu.:53.0
## Mode :character Mode :character Mode :character Median :60.0
## Mean :56.6
## 3rd Qu.:60.0
## Max. :62.0
## Tipo.ubicación Giro Hora inicio
## Length:200620 Length:200620 Min. :1899-12-31 07:00:00.00
## Class :character Class :character 1st Qu.:1899-12-31 07:00:00.00
## Mode :character Mode :character Median :1899-12-31 08:00:00.00
## Mean :1899-12-31 07:35:49.71
## 3rd Qu.:1899-12-31 08:00:00.00
## Max. :1899-12-31 09:00:00.00
## Hora cierre
## Min. :1899-12-31 21:00:00.00
## 1st Qu.:1899-12-31 22:00:00.00
## Median :1899-12-31 22:00:00.00
## Mean :1899-12-31 22:23:11.42
## 3rd Qu.:1899-12-31 23:00:00.00
## Max. :1899-12-31 23:00:00.00
sum(is.na(bd10))
## [1] 0
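In the summary above PLU becomes 1 everywhere. That is consistent with PLU being logical with only TRUE values present: the mean of those values is 1, so every NA is filled with 1. Mean imputation is more informative on a numeric column. A minimal sketch with invented vectors (plu and precio are hypothetical):

```r
# Logical column where the only observed value is TRUE.
plu <- c(NA, TRUE, NA, TRUE)
mean_plu <- mean(plu, na.rm = TRUE)   # TRUE counts as 1

# Numeric column with one missing value.
precio <- c(16, NA, 5, 9)
precio[is.na(precio)] <- mean(precio, na.rm = TRUE)

mean_plu
precio
```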
Solution 4: replace negative values with zeros (DOES NOT WORK)
bd11 <- bd
bd11[bd11 < 0] <- 0
summary(bd11)
"Error in [<-: Assigned data 0 must be compatible with existing data."
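The error probably arises because bd mixes numeric columns with character, logical and date-time columns, and the value 0 cannot be assigned into all of them. One way around it, sketched here on an invented data frame (toy is hypothetical), is to apply the replacement only to the numeric columns:

```r
# Toy data frame mixing text and numbers; values are invented.
toy <- data.frame(
  Marca    = c("A", "B", "C"),
  Precio   = c(16, -147, 5),
  Unidades = c(1, 2, -1),
  stringsAsFactors = FALSE
)

# Names of the numeric columns only.
num_cols <- names(toy)[sapply(toy, is.numeric)]

# Replace negative values with 0, one numeric column at a time.
for (col in num_cols) {
  toy[[col]][toy[[col]] < 0] <- 0
}

toy
```

Whether zero is a sensible replacement for a negative price is a separate question; this only shows how to make the assignment type-safe.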
Box-and-whisker plots
bd12<-bd7
boxplot(bd12$Precio,horizontal=TRUE)
boxplot(bd12$Unidades,horizontal=TRUE)
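The units come out negative because of the minus sign in front of ceiling() where bd5 was built; bd7 and bd12 inherit those values.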
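The points a box plot draws beyond its whiskers can also be listed numerically with boxplot.stats(), which applies the same 1.5 × IQR rule that boxplot() uses by default. A minimal sketch with an invented price vector (precio is hypothetical):

```r
# Invented prices with one extreme value.
precio <- c(11, 16, 16, 19, 25, 1000)

# $out holds the values lying beyond the whiskers.
outliers <- boxplot.stats(precio)$out

outliers
```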
Adding columns
#install.packages("lubridate")
library(lubridate)
bd12$Dia_de_la_semana <-wday(bd12$Fecha)
summary(bd12)
## vcClaveTienda DescGiro Codigo Barras
## Length:200620 Length:200620 Min. :8.347e+05
## Class :character Class :character 1st Qu.:7.501e+12
## Mode :character Mode :character Median :7.501e+12
## Mean :5.950e+12
## 3rd Qu.:7.501e+12
## Max. :1.750e+13
## Fecha Hora Marca
## Min. :2020-05-01 00:00:00.00 Min. :18 Length:200620
## 1st Qu.:2020-06-06 00:00:00.00 1st Qu.:18 Class :character
## Median :2020-07-11 00:00:00.00 Median :18 Mode :character
## Mean :2020-07-18 22:35:49.58 Mean :18
## 3rd Qu.:2020-08-29 00:00:00.00 3rd Qu.:18
## Max. :2020-11-11 00:00:00.00 Max. :18
## Fabricante Producto Precio Ult.Costo
## Length:200620 Length:200620 Min. : 0.50 Min. : 0.38
## Class :character Class :character 1st Qu.: 11.00 1st Qu.: 8.46
## Mode :character Mode :character Median : 16.00 Median : 12.31
## Mean : 19.45 Mean : 15.31
## 3rd Qu.: 25.00 3rd Qu.: 19.23
## Max. :1000.00 Max. :769.23
## Unidades F.Ticket NombreDepartamento NombreFamilia
## Min. :-96.000 Min. : 1 Length:200620 Length:200620
## 1st Qu.: -1.000 1st Qu.: 33967 Class :character Class :character
## Median : -1.000 Median :105996 Mode :character Mode :character
## Mean : -1.262 Mean :193994
## 3rd Qu.: -1.000 3rd Qu.:383009
## Max. : -1.000 Max. :450040
## NombreCategoria Estado Mts 2 Tipo.ubicación
## Length:200620 Length:200620 Min. :47.0 Length:200620
## Class :character Class :character 1st Qu.:53.0 Class :character
## Mode :character Mode :character Median :60.0 Mode :character
## Mean :56.6
## 3rd Qu.:60.0
## Max. :62.0
## Giro Hora inicio
## Length:200620 Min. :1899-12-31 07:00:00.00
## Class :character 1st Qu.:1899-12-31 07:00:00.00
## Mode :character Median :1899-12-31 08:00:00.00
## Mean :1899-12-31 07:35:49.71
## 3rd Qu.:1899-12-31 08:00:00.00
## Max. :1899-12-31 09:00:00.00
## Hora cierre Dia_de_la_semana
## Min. :1899-12-31 21:00:00.00 Min. :1.000
## 1st Qu.:1899-12-31 22:00:00.00 1st Qu.:2.000
## Median :1899-12-31 22:00:00.00 Median :4.000
## Mean :1899-12-31 22:23:11.42 Mean :3.912
## 3rd Qu.:1899-12-31 23:00:00.00 3rd Qu.:6.000
## Max. :1899-12-31 23:00:00.00 Max. :7.000
bd12$subtotal<- bd12$Precio*bd12$Unidades
summary(bd12)
## vcClaveTienda DescGiro Codigo Barras
## Length:200620 Length:200620 Min. :8.347e+05
## Class :character Class :character 1st Qu.:7.501e+12
## Mode :character Mode :character Median :7.501e+12
## Mean :5.950e+12
## 3rd Qu.:7.501e+12
## Max. :1.750e+13
## Fecha Hora Marca
## Min. :2020-05-01 00:00:00.00 Min. :18 Length:200620
## 1st Qu.:2020-06-06 00:00:00.00 1st Qu.:18 Class :character
## Median :2020-07-11 00:00:00.00 Median :18 Mode :character
## Mean :2020-07-18 22:35:49.58 Mean :18
## 3rd Qu.:2020-08-29 00:00:00.00 3rd Qu.:18
## Max. :2020-11-11 00:00:00.00 Max. :18
## Fabricante Producto Precio Ult.Costo
## Length:200620 Length:200620 Min. : 0.50 Min. : 0.38
## Class :character Class :character 1st Qu.: 11.00 1st Qu.: 8.46
## Mode :character Mode :character Median : 16.00 Median : 12.31
## Mean : 19.45 Mean : 15.31
## 3rd Qu.: 25.00 3rd Qu.: 19.23
## Max. :1000.00 Max. :769.23
## Unidades F.Ticket NombreDepartamento NombreFamilia
## Min. :-96.000 Min. : 1 Length:200620 Length:200620
## 1st Qu.: -1.000 1st Qu.: 33967 Class :character Class :character
## Median : -1.000 Median :105996 Mode :character Mode :character
## Mean : -1.262 Mean :193994
## 3rd Qu.: -1.000 3rd Qu.:383009
## Max. : -1.000 Max. :450040
## NombreCategoria Estado Mts 2 Tipo.ubicación
## Length:200620 Length:200620 Min. :47.0 Length:200620
## Class :character Class :character 1st Qu.:53.0 Class :character
## Mode :character Mode :character Median :60.0 Mode :character
## Mean :56.6
## 3rd Qu.:60.0
## Max. :62.0
## Giro Hora inicio
## Length:200620 Min. :1899-12-31 07:00:00.00
## Class :character 1st Qu.:1899-12-31 07:00:00.00
## Mode :character Median :1899-12-31 08:00:00.00
## Mean :1899-12-31 07:35:49.71
## 3rd Qu.:1899-12-31 08:00:00.00
## Max. :1899-12-31 09:00:00.00
## Hora cierre Dia_de_la_semana subtotal
## Min. :1899-12-31 21:00:00.00 Min. :1.000 Min. :-2496.00
## 1st Qu.:1899-12-31 22:00:00.00 1st Qu.:2.000 1st Qu.: -27.00
## Median :1899-12-31 22:00:00.00 Median :4.000 Median : -18.00
## Mean :1899-12-31 22:23:11.42 Mean :3.912 Mean : -24.33
## 3rd Qu.:1899-12-31 23:00:00.00 3rd Qu.:6.000 3rd Qu.: -12.00
## Max. :1899-12-31 23:00:00.00 Max. :7.000 Max. : -1.00
bd12$Utilidad <- bd12$Precio - bd12$Ult.Costo
summary(bd12)
## vcClaveTienda DescGiro Codigo Barras
## Length:200620 Length:200620 Min. :8.347e+05
## Class :character Class :character 1st Qu.:7.501e+12
## Mode :character Mode :character Median :7.501e+12
## Mean :5.950e+12
## 3rd Qu.:7.501e+12
## Max. :1.750e+13
## Fecha Hora Marca
## Min. :2020-05-01 00:00:00.00 Min. :18 Length:200620
## 1st Qu.:2020-06-06 00:00:00.00 1st Qu.:18 Class :character
## Median :2020-07-11 00:00:00.00 Median :18 Mode :character
## Mean :2020-07-18 22:35:49.58 Mean :18
## 3rd Qu.:2020-08-29 00:00:00.00 3rd Qu.:18
## Max. :2020-11-11 00:00:00.00 Max. :18
## Fabricante Producto Precio Ult.Costo
## Length:200620 Length:200620 Min. : 0.50 Min. : 0.38
## Class :character Class :character 1st Qu.: 11.00 1st Qu.: 8.46
## Mode :character Mode :character Median : 16.00 Median : 12.31
## Mean : 19.45 Mean : 15.31
## 3rd Qu.: 25.00 3rd Qu.: 19.23
## Max. :1000.00 Max. :769.23
## Unidades F.Ticket NombreDepartamento NombreFamilia
## Min. :-96.000 Min. : 1 Length:200620 Length:200620
## 1st Qu.: -1.000 1st Qu.: 33967 Class :character Class :character
## Median : -1.000 Median :105996 Mode :character Mode :character
## Mean : -1.262 Mean :193994
## 3rd Qu.: -1.000 3rd Qu.:383009
## Max. : -1.000 Max. :450040
## NombreCategoria Estado Mts 2 Tipo.ubicación
## Length:200620 Length:200620 Min. :47.0 Length:200620
## Class :character Class :character 1st Qu.:53.0 Class :character
## Mode :character Mode :character Median :60.0 Mode :character
## Mean :56.6
## 3rd Qu.:60.0
## Max. :62.0
## Giro Hora inicio
## Length:200620 Min. :1899-12-31 07:00:00.00
## Class :character 1st Qu.:1899-12-31 07:00:00.00
## Mode :character Median :1899-12-31 08:00:00.00
## Mean :1899-12-31 07:35:49.71
## 3rd Qu.:1899-12-31 08:00:00.00
## Max. :1899-12-31 09:00:00.00
## Hora cierre Dia_de_la_semana subtotal
## Min. :1899-12-31 21:00:00.00 Min. :1.000 Min. :-2496.00
## 1st Qu.:1899-12-31 22:00:00.00 1st Qu.:2.000 1st Qu.: -27.00
## Median :1899-12-31 22:00:00.00 Median :4.000 Median : -18.00
## Mean :1899-12-31 22:23:11.42 Mean :3.912 Mean : -24.33
## 3rd Qu.:1899-12-31 23:00:00.00 3rd Qu.:6.000 3rd Qu.: -12.00
## Max. :1899-12-31 23:00:00.00 Max. :7.000 Max. : -1.00
## Utilidad
## Min. : 0.000
## 1st Qu.: 2.310
## Median : 3.230
## Mean : 4.142
## 3rd Qu.: 5.420
## Max. :230.770
In Dia_de_la_semana, 1 is Sunday, 2 is Monday, and so on. We added columns for the day of the week, the subtotal, and the profit (Utilidad).
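As a small check of the numbering, the sketch below applies `wday()` to a single date and shows two optional arguments, `week_start` and `label`. The second half uses made-up prices, costs, and units; taking `abs()` of the units assumes that the negative values simply encode units sold, which this document does not confirm.

```r
library(lubridate)

# 2020-06-19 was a Friday.
fecha <- as.Date("2020-06-19")

# Default numbering: 1 = Sunday, ..., 7 = Saturday.
dia_num <- wday(fecha)

# week_start = 1 makes Monday day 1 instead.
dia_num_lunes <- wday(fecha, week_start = 1)

# label = TRUE returns the day name (locale-dependent) instead of a number.
dia_nombre <- wday(fecha, label = TRUE, abbr = FALSE)

# Toy values, made up for illustration.
precio   <- c(16, 25)
costo    <- c(12.31, 19.23)
unidades <- c(-1, -2)   # stored as negatives, as in the summary output

utilidad_unitaria <- precio - costo          # per-unit margin, like Utilidad
subtotal_positivo <- precio * abs(unidades)  # assumption: negatives mean units sold
```

Note that `Utilidad` computed this way is a margin per unit; multiplying by the units would give the margin per ticket line.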
bd_limpia<- bd12
write.csv(bd_limpia,file="Abarrotes_bd_limpia_R.csv",row.names=FALSE)
row.names = FALSE keeps R from writing the row numbers as an extra first column.
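To see what `row.names = FALSE` changes, the sketch below writes the same made-up data frame twice to temporary files and compares the header lines. The column names are written in both cases; only the leading row-number column differs.

```r
# Toy data, made up for illustration.
demo <- data.frame(Marca = c("BIMBO", "PEPSI"), Precio = c(16, 11))

con_filas <- tempfile(fileext = ".csv")
sin_filas <- tempfile(fileext = ".csv")

write.csv(demo, con_filas)                     # default: row names included
write.csv(demo, sin_filas, row.names = FALSE)  # row names omitted

# First line of each file (the header).
encabezado_con <- readLines(con_filas, n = 1)  # starts with an empty "" column
encabezado_sin <- readLines(sin_filas, n = 1)  # only the real column names
```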
Install packages
#install.packages("plyr")
library(plyr)
## ------------------------------------------------------------------------------
## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)
## ------------------------------------------------------------------------------
##
## Attaching package: 'plyr'
## The following objects are masked from 'package:dplyr':
##
## arrange, count, desc, failwith, id, mutate, rename, summarise,
## summarize
## The following object is masked from 'package:purrr':
##
## compact
#install.packages("Matrix")
library(Matrix)
##
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
##
## expand, pack, unpack
#install.packages("arules")
library(arules)
##
## Attaching package: 'arules'
## The following object is masked from 'package:dplyr':
##
## recode
## The following objects are masked from 'package:base':
##
## abbreviate, write
#install.packages("arulesViz")
library(arulesViz)
#install.packages("datasets")
library(datasets)
Sort the tickets in ascending order
bd_limpia <- bd_limpia[order(bd_limpia$F.Ticket),]
head(bd_limpia)
## # A tibble: 6 × 24
## vcClaveTienda DescGiro `Codigo Barras` Fecha Hora Marca
## <chr> <chr> <dbl> <dttm> <int> <chr>
## 1 MX001 Abarrotes 7501020540666 2020-06-19 00:00:00 18 NUTRI LECHE
## 2 MX001 Abarrotes 7501032397906 2020-06-19 00:00:00 18 DAN UP
## 3 MX001 Abarrotes 7501000112845 2020-06-19 00:00:00 18 BIMBO
## 4 MX001 Abarrotes 7501031302741 2020-06-19 00:00:00 18 PEPSI
## 5 MX001 Abarrotes 7501026027543 2020-06-19 00:00:00 18 BLANCA NIEV…
## 6 MX001 Abarrotes 7501025433024 2020-06-19 00:00:00 18 FLASH
## # ℹ 18 more variables: Fabricante <chr>, Producto <chr>, Precio <dbl>,
## # Ult.Costo <dbl>, Unidades <dbl>, F.Ticket <dbl>, NombreDepartamento <chr>,
## # NombreFamilia <chr>, NombreCategoria <chr>, Estado <chr>, `Mts 2` <dbl>,
## # Tipo.ubicación <chr>, Giro <chr>, `Hora inicio` <dttm>,
## # `Hora cierre` <dttm>, Dia_de_la_semana <dbl>, subtotal <dbl>,
## # Utilidad <dbl>
tail(bd_limpia)
## # A tibble: 6 × 24
## vcClaveTienda DescGiro `Codigo Barras` Fecha Hora Marca
## <chr> <chr> <dbl> <dttm> <int> <chr>
## 1 MX004 Carnicería 10248765241 2020-10-15 00:00:00 18 YEMINA
## 2 MX004 Carnicería 7501079702855 2020-10-15 00:00:00 18 DEL FUERTE
## 3 MX004 Carnicería 7501055320639 2020-10-15 00:00:00 18 COCA COLA …
## 4 MX004 Carnicería 7501214100256 2020-10-15 00:00:00 18 DIAMANTE
## 5 MX004 Carnicería 7501031311620 2020-10-15 00:00:00 18 PEPSI
## 6 MX004 Carnicería 75004699 2020-10-15 00:00:00 18 COCA COLA
## # ℹ 18 more variables: Fabricante <chr>, Producto <chr>, Precio <dbl>,
## # Ult.Costo <dbl>, Unidades <dbl>, F.Ticket <dbl>, NombreDepartamento <chr>,
## # NombreFamilia <chr>, NombreCategoria <chr>, Estado <chr>, `Mts 2` <dbl>,
## # Tipo.ubicación <chr>, Giro <chr>, `Hora inicio` <dttm>,
## # `Hora cierre` <dttm>, Dia_de_la_semana <dbl>, subtotal <dbl>,
## # Utilidad <dbl>
Generate the basket
basket <- ddply(bd_limpia, c("F.Ticket"), function(df) paste(df$Marca, collapse = ","))
Remove the ticket number
basket$F.Ticket <- NULL
Rename the column
colnames(basket) <- c("Marca")
Export the basket
write.csv(basket, "basket.csv", quote = FALSE, row.names = FALSE)
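`ddply` comes from plyr, and loading plyr after dplyr triggers the masking warning shown earlier. As a sketch of an alternative (not the approach used in this document), the same one-row-per-ticket basket can be built with base R's `aggregate()`. The toy data frame `ventas` is hypothetical.

```r
# Toy ticket lines, made up for illustration.
ventas <- data.frame(
  F.Ticket = c(1, 1, 2, 3, 3, 3),
  Marca    = c("BIMBO", "PEPSI", "FANTA", "SALVO", "FABULOSO", "BIMBO"),
  stringsAsFactors = FALSE
)

# Collapse the brands of each ticket into one comma-separated string.
basket_demo <- aggregate(
  Marca ~ F.Ticket,
  data = ventas,
  FUN  = function(x) paste(x, collapse = ",")
)

# Drop the ticket number so only the basket strings remain.
basket_demo$F.Ticket <- NULL
basket_demo
```

This leaves a single `Marca` column, so the renaming step is not needed in this variant.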
Import the transactions
#file.choose()
tr <- read.transactions("D:\\Lesly Gómez\\Documentos\\basket.csv", format = "basket", sep = ",")
## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string
## Warning in asMethod(object): removing duplicated items in transactions
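The "EOF within quoted string" warnings are raised by `scan()`, which `read.transactions` calls with a `quote` argument. A likely cause (an assumption, since the raw file is not shown here) is a brand name containing an apostrophe or quotation mark, which `scan()` treats as the start of a quoted string. The sketch below illustrates the mechanism on a made-up line using base R only. `read.transactions` also accepts `quote` and `skip` arguments, so passing `quote = ""` and `skip = 1` (to skip the `Marca` header line written by `write.csv`) may be worth trying.

```r
# Made-up basket line; the apostrophe is an assumption about the real data.
linea <- "HELLMANN'S,BIMBO"

# With quoting disabled, the apostrophe is an ordinary character
# and the line splits cleanly on the comma.
items <- scan(text = linea, what = "character", sep = ",",
              quote = "", quiet = TRUE)
items

# With the quote set "\"'" the apostrophe should open a quoted string that
# never closes; warnings are suppressed here to keep the output short.
items_con_comillas <- suppressWarnings(
  scan(text = linea, what = "character", sep = ",",
       quote = "\"'", quiet = TRUE)
)
length(items_con_comillas)
```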
Generate association rules
reglas.asociacion <- apriori(tr, parameter = list(supp = 0.001, conf = 0.2, maxlen = 10))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.2 0.1 1 none FALSE TRUE 5 0.001 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 115
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[604 item(s), 115111 transaction(s)] done [0.03s].
## sorting and recoding items ... [207 item(s)] done [0.00s].
## creating transaction tree ... done [0.03s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [11 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
summary(reglas.asociacion)
## set of 11 rules
##
## rule length distribution (lhs + rhs):sizes
## 2
## 11
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2 2 2 2 2 2
##
## summary of quality measures:
## support confidence coverage lift
## Min. :0.001016 Min. :0.2069 Min. :0.003562 Min. : 1.325
## 1st Qu.:0.001103 1st Qu.:0.2356 1st Qu.:0.004504 1st Qu.: 1.787
## Median :0.001416 Median :0.2442 Median :0.005803 Median : 3.972
## Mean :0.001519 Mean :0.2536 Mean :0.006054 Mean :17.563
## 3rd Qu.:0.001651 3rd Qu.:0.2685 3rd Qu.:0.006893 3rd Qu.:21.798
## Max. :0.002745 Max. :0.3098 Max. :0.010503 Max. :65.908
## count
## Min. :117.0
## 1st Qu.:127.0
## Median :163.0
## Mean :174.9
## 3rd Qu.:190.0
## Max. :316.0
##
## mining info:
## data ntransactions support confidence
## tr 115111 0.001 0.2
## call
## apriori(data = tr, parameter = list(supp = 0.001, conf = 0.2, maxlen = 10))
inspect(reglas.asociacion)
## lhs rhs support confidence coverage
## [1] {FANTA} => {COCA COLA} 0.001051159 0.2439516 0.004308884
## [2] {SALVO} => {FABULOSO} 0.001103283 0.3097561 0.003561779
## [3] {FABULOSO} => {SALVO} 0.001103283 0.2347505 0.004699811
## [4] {COCA COLA ZERO} => {COCA COLA} 0.001416025 0.2969035 0.004769310
## [5] {SPRITE} => {COCA COLA} 0.001346526 0.2069426 0.006506763
## [6] {PINOL} => {CLORALEX} 0.001016410 0.2363636 0.004300197
## [7] {BLUE HOUSE} => {BIMBO} 0.001711392 0.2720994 0.006289581
## [8] {HELLMANN´S} => {BIMBO} 0.001537646 0.2649701 0.005803094
## [9] {REYMA} => {CONVERMEX} 0.002093631 0.2441743 0.008574333
## [10] {FUD} => {BIMBO} 0.001589770 0.2183771 0.007279930
## [11] {COCA COLA LIGHT} => {COCA COLA} 0.002745176 0.2613730 0.010502906
## lift count
## [1] 1.561906 121
## [2] 65.908196 127
## [3] 65.908196 127
## [4] 1.900932 163
## [5] 1.324955 155
## [6] 25.030409 117
## [7] 4.078870 197
## [8] 3.971997 177
## [9] 18.564824 241
## [10] 3.273552 183
## [11] 1.673447 316
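The quality measures in this table are related by simple arithmetic. As a worked check, the sketch below recomputes support, confidence, and lift for {SALVO} => {FABULOSO} from numbers printed above. The support of FABULOSO alone is taken from the coverage of the reverse rule, {FABULOSO} => {SALVO}.

```r
# Values copied from the printed output for {SALVO} => {FABULOSO}.
n_transacciones <- 115111       # ntransactions in the mining info
conteo_regla    <- 127          # count: tickets with both SALVO and FABULOSO
cobertura_lhs   <- 0.003561779  # coverage: share of tickets with SALVO
soporte_rhs     <- 0.004699811  # share of tickets with FABULOSO
                                # (coverage of the reverse rule)

soporte   <- conteo_regla / n_transacciones  # share of tickets with both items
confianza <- soporte / cobertura_lhs         # P(FABULOSO | SALVO)
lift      <- confianza / soporte_rhs         # confidence relative to baseline

round(c(soporte = soporte, confianza = confianza, lift = lift), 4)
```

A lift near 66 means that, in this data, a ticket containing SALVO is roughly 66 times more likely to contain FABULOSO than an arbitrary ticket, even though both brands are rare overall.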
Sort the association rules
reglas.asociacion <- sort(reglas.asociacion, by = "confidence", decreasing = TRUE)
summary(reglas.asociacion)
## set of 11 rules
##
## rule length distribution (lhs + rhs):sizes
## 2
## 11
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2 2 2 2 2 2
##
## summary of quality measures:
## support confidence coverage lift
## Min. :0.001016 Min. :0.2069 Min. :0.003562 Min. : 1.325
## 1st Qu.:0.001103 1st Qu.:0.2356 1st Qu.:0.004504 1st Qu.: 1.787
## Median :0.001416 Median :0.2442 Median :0.005803 Median : 3.972
## Mean :0.001519 Mean :0.2536 Mean :0.006054 Mean :17.563
## 3rd Qu.:0.001651 3rd Qu.:0.2685 3rd Qu.:0.006893 3rd Qu.:21.798
## Max. :0.002745 Max. :0.3098 Max. :0.010503 Max. :65.908
## count
## Min. :117.0
## 1st Qu.:127.0
## Median :163.0
## Mean :174.9
## 3rd Qu.:190.0
## Max. :316.0
##
## mining info:
## data ntransactions support confidence
## tr 115111 0.001 0.2
## call
## apriori(data = tr, parameter = list(supp = 0.001, conf = 0.2, maxlen = 10))
inspect(reglas.asociacion)
## lhs rhs support confidence coverage
## [1] {SALVO} => {FABULOSO} 0.001103283 0.3097561 0.003561779
## [2] {COCA COLA ZERO} => {COCA COLA} 0.001416025 0.2969035 0.004769310
## [3] {BLUE HOUSE} => {BIMBO} 0.001711392 0.2720994 0.006289581
## [4] {HELLMANN´S} => {BIMBO} 0.001537646 0.2649701 0.005803094
## [5] {COCA COLA LIGHT} => {COCA COLA} 0.002745176 0.2613730 0.010502906
## [6] {REYMA} => {CONVERMEX} 0.002093631 0.2441743 0.008574333
## [7] {FANTA} => {COCA COLA} 0.001051159 0.2439516 0.004308884
## [8] {PINOL} => {CLORALEX} 0.001016410 0.2363636 0.004300197
## [9] {FABULOSO} => {SALVO} 0.001103283 0.2347505 0.004699811
## [10] {FUD} => {BIMBO} 0.001589770 0.2183771 0.007279930
## [11] {SPRITE} => {COCA COLA} 0.001346526 0.2069426 0.006506763
## lift count
## [1] 65.908196 127
## [2] 1.900932 163
## [3] 4.078870 197
## [4] 3.971997 177
## [5] 1.673447 316
## [6] 18.564824 241
## [7] 1.561906 121
## [8] 25.030409 117
## [9] 65.908196 127
## [10] 3.273552 183
## [11] 1.324955 155
Visualize the association rules
#install.packages("arulesViz")
library(arulesViz)
top10reglas <- head(reglas.asociacion, n = 10, by = "confidence")
plot(top10reglas, method = "graph", engine = "htmlwidget")