“These big data tools allow companies
to make new kinds of predictions, discover unexpected patterns in
business activity, and unlock new sources of value.”
David L. Rogers (2016)
We will follow 5 steps to extract value from the data we have.
The department we aim to impact is Marketing, because our goal
is to increase sales through marketing strategies.
Key performance indicator: sales per month
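Here is a minimal sketch of how the sales-per-month KPI could be computed once the data is clean. It assumes sales are measured as Precio * Unidades summed per calendar month. That definition and the three example rows are assumptions, not values taken from the dataset.

```r
# Hypothetical example rows using the dataset's column names
ventas <- data.frame(
  Fecha    = as.Date(c("2020-06-19", "2020-06-20", "2020-07-01")),
  Precio   = c(16, 14, 8),
  Unidades = c(1, 2, 3)
)

# Assumed sales amount per row: price times units
ventas$SubTotal <- ventas$Precio * ventas$Unidades

# Month key in "YYYY-MM" form
ventas$Mes <- format(ventas$Fecha, "%Y-%m")

# KPI: total sales per month
ventas_mes <- aggregate(SubTotal ~ Mes, data = ventas, FUN = sum)
print(ventas_mes)
```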
Segmentation
Segmenting the products will help us identify which products are the most profitable.
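Here is a minimal sketch of a simple profitability ranking of products. It assumes profit per row is (Precio - Ult.Costo) * Unidades. That formula and the toy rows are illustrative assumptions, not the author's definition.

```r
# Hypothetical rows; Ult.Costo is treated as the unit cost (assumption)
productos <- data.frame(
  Producto  = c("Nutri Leche 1 Litro", "Pepsi N.R. 400Ml", "Nutri Leche 1 Litro"),
  Precio    = c(16, 8, 16),
  Ult.Costo = c(12.31, 8, 12.31),
  Unidades  = c(1, 2, 3)
)

# Assumed profit per row
productos$Utilidad <- (productos$Precio - productos$Ult.Costo) * productos$Unidades

# Total profit per product, most profitable first
rentabilidad <- aggregate(Utilidad ~ Producto, data = productos, FUN = sum)
rentabilidad <- rentabilidad[order(-rentabilidad$Utilidad), ]
print(rentabilidad)
```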
Hypothesis
If we can identify the factors that influence consumer behavior
at the moment of purchase, we will be able to anticipate their
future purchases.
Remove columns: Codigo.Barras (barcode) and PLU
Remove rows with a price of 0
Remove duplicate rows
Convert prices to absolute values
Convert quantities to integers
Replace NA with zeros
Replace negative values with zeros
Add columns: day of the week, subtotal, profit
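The cleaning steps above can be sketched in base R on a small hypothetical data frame. It contains one duplicate row, one negative price, one zero price, and one fractional quantity. Several choices here are assumptions: the profit formula (Precio - Ult.Costo) * Unidades, rounding quantities up with ceiling(), and numbering weekdays with Sunday = 1. This is not the exact code used on the real data below.

```r
# Hypothetical toy rows that exercise each cleaning step
bd_demo <- data.frame(
  Codigo.Barras = c(7501021000000, 7501021000000, 7622210000000, NA),
  PLU           = c(NA, NA, NA, 1L),
  Fecha         = c("6/19/2020", "6/19/2020", "7/12/2020", "7/13/2020"),
  Precio        = c(16, 16, -9, 0),
  Ult.Costo     = c(12.31, 12.31, 6.92, 5),
  Unidades      = c(1, 1, 0.2, 2)
)

# 1. Drop the barcode and PLU columns
bd_demo <- bd_demo[, !(names(bd_demo) %in% c("Codigo.Barras", "PLU"))]
# 2. Drop rows whose price is exactly 0
bd_demo <- bd_demo[bd_demo$Precio != 0, ]
# 3. Drop exact duplicate rows
bd_demo <- unique(bd_demo)
# 4. Negative prices become positive
bd_demo$Precio <- abs(bd_demo$Precio)
# 5. Round quantities up to whole units (0.2 -> 1)
bd_demo$Unidades <- ceiling(bd_demo$Unidades)
# 6. Replace NA with 0 in numeric columns
num <- vapply(bd_demo, is.numeric, logical(1))
bd_demo[num] <- lapply(bd_demo[num], function(x) replace(x, is.na(x), 0))
# 7. Replace negative values with 0 in numeric columns
bd_demo[num] <- lapply(bd_demo[num], function(x) pmax(x, 0))
# 8. Derived columns
bd_demo$Fecha <- as.Date(bd_demo$Fecha, format = "%m/%d/%Y")
bd_demo$Dia_de_la_semana <- as.POSIXlt(bd_demo$Fecha)$wday + 1L  # 1 = Sunday
bd_demo$SubTotal <- bd_demo$Precio * bd_demo$Unidades
bd_demo$Utilidad <- (bd_demo$Precio - bd_demo$Ult.Costo) * bd_demo$Unidades
print(bd_demo)
```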
The marketing team will run a sales analysis to find
relationships between products, which will allow it to:
Create promotions
Identify the best placement for complementary products.
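A simple starting point for finding relationships between products is to count how often two products appear on the same ticket. Here is a minimal base-R sketch. It assumes the pair (vcClaveTienda, F.Ticket) identifies one purchase, and the rows are hypothetical. Association-rule methods such as apriori would be the fuller version of this idea.

```r
# Hypothetical ticket lines
tickets_demo <- data.frame(
  vcClaveTienda = c("MX001", "MX001", "MX001", "MX001", "MX001", "MX002", "MX002"),
  F.Ticket      = c(3, 3, 4, 4, 4, 3, 3),
  Producto      = c("Pan", "Refresco", "Pan", "Refresco", "Leche", "Pan", "Leche")
)

# Assumed purchase identifier: store + ticket number
tickets_demo$ticket_id <- paste(tickets_demo$vcClaveTienda, tickets_demo$F.Ticket, sep = "-")

# One sorted, de-duplicated basket of products per ticket
canastas <- lapply(split(tickets_demo$Producto, tickets_demo$ticket_id),
                   function(p) sort(unique(p)))

# Every unordered product pair inside each basket
pares <- unlist(lapply(canastas, function(p) {
  if (length(p) < 2) return(character(0))
  combn(p, 2, FUN = paste, collapse = " + ")
}), use.names = FALSE)

# How many tickets contain each pair, most frequent first
conteo <- sort(table(pares), decreasing = TRUE)
print(conteo)
```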
Preliminary steps
The following changes were made to the database:
The date format was changed to Short Date
The first 5 records were duplicated
The time format was changed to Time (Spanish, Mexico)
The barcode format was changed so that the full code is displayed
The file was saved as CSV UTF-8 (comma delimited)
# file.choose()
bd <- read.csv("/Users/anita3/Downloads/Abarrotes_Ventas.csv")
# install.packages("dplyr")
library(dplyr)
# install.packages("tidyverse")
library(tidyverse)
# install.packages("janitor")
library(janitor)
count(bd, vcClaveTienda, sort = TRUE)
count(bd, DescGiro, sort = TRUE)
count(bd, Marca, sort = TRUE)
count(bd, Fabricante, sort = TRUE)
count(bd, Producto, sort = TRUE)
count(bd, NombreDepartamento, sort = TRUE)
count(bd, NombreFamilia, sort = TRUE)
count(bd, NombreCategoria, sort = TRUE)
count(bd, Estado, sort = TRUE)
count(bd, Mts.2, sort = TRUE)
count(bd, Tipo.ubicación, sort = TRUE)
count(bd, Giro, sort = TRUE)
count(bd, Hora.inicio, sort = TRUE)
count(bd, Hora.cierre, sort = TRUE)
#install.packages("tidyverse")
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.0 ✔ stringr 1.4.1
## ✔ readr 2.1.2 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
#install.packages("janitor")
library(janitor)
##
## Attaching package: 'janitor'
##
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
summary(bd)
## vcClaveTienda DescGiro Codigo.Barras PLU
## Length:200625 Length:200625 Min. :8.347e+05 Min. : 1.00
## Class :character Class :character 1st Qu.:7.501e+12 1st Qu.: 1.00
## Mode :character Mode :character Median :7.501e+12 Median : 1.00
## Mean :5.950e+12 Mean : 2.11
## 3rd Qu.:7.501e+12 3rd Qu.: 1.00
## Max. :1.750e+13 Max. :30.00
## NA's :199188
## Fecha Hora Marca Fabricante
## Length:200625 Length:200625 Length:200625 Length:200625
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## Producto Precio Ult.Costo Unidades
## Length:200625 Min. :-147.00 Min. : 0.38 Min. : 0.200
## Class :character 1st Qu.: 11.00 1st Qu.: 8.46 1st Qu.: 1.000
## Mode :character Median : 16.00 Median : 12.31 Median : 1.000
## Mean : 19.42 Mean : 15.31 Mean : 1.262
## 3rd Qu.: 25.00 3rd Qu.: 19.23 3rd Qu.: 1.000
## Max. :1000.00 Max. :769.23 Max. :96.000
##
## F.Ticket NombreDepartamento NombreFamilia NombreCategoria
## Min. : 1 Length:200625 Length:200625 Length:200625
## 1st Qu.: 33964 Class :character Class :character Class :character
## Median :105993 Mode :character Mode :character Mode :character
## Mean :193990
## 3rd Qu.:383005
## Max. :450040
##
## Estado Mts.2 Tipo.ubicación Giro
## Length:200625 Min. :47.0 Length:200625 Length:200625
## Class :character 1st Qu.:53.0 Class :character Class :character
## Mode :character Median :60.0 Mode :character Mode :character
## Mean :56.6
## 3rd Qu.:60.0
## Max. :62.0
##
## Hora.inicio Hora.cierre
## Length:200625 Length:200625
## Class :character Class :character
## Mode :character Mode :character
##
##
##
##
tibble(bd)
## # A tibble: 200,625 × 22
## vcClaveTienda DescGiro Codig…¹ PLU Fecha Hora Marca Fabri…² Produ…³ Precio
## <chr> <chr> <dbl> <int> <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 MX001 Abarrot… 7.50e12 NA 6/19… 08:1… NUTR… MEXILAC Nutri … 16
## 2 MX001 Abarrot… 7.50e12 NA 6/19… 08:2… DAN … DANONE… DANUP … 14
## 3 MX001 Abarrot… 7.50e12 NA 6/19… 08:2… BIMBO GRUPO … Rebana… 5
## 4 MX001 Abarrot… 7.50e12 NA 6/19… 08:2… PEPSI PEPSI-… Pepsi … 8
## 5 MX001 Abarrot… 7.50e12 NA 6/19… 08:2… BLAN… FABRIC… Deterg… 19.5
## 6 MX001 Abarrot… 7.50e12 NA 6/19… 08:1… NUTR… MEXILAC Nutri … 16
## 7 MX001 Abarrot… 7.50e12 NA 6/19… 08:2… DAN … DANONE… DANUP … 14
## 8 MX001 Abarrot… 7.50e12 NA 6/19… 08:2… BIMBO GRUPO … Rebana… 5
## 9 MX001 Abarrot… 7.50e12 NA 6/19… 08:2… PEPSI PEPSI-… Pepsi … 8
## 10 MX001 Abarrot… 7.50e12 NA 6/19… 08:2… BLAN… FABRIC… Deterg… 19.5
## # … with 200,615 more rows, 12 more variables: Ult.Costo <dbl>, Unidades <dbl>,
## # F.Ticket <int>, NombreDepartamento <chr>, NombreFamilia <chr>,
## # NombreCategoria <chr>, Estado <chr>, Mts.2 <int>, Tipo.ubicación <chr>,
## # Giro <chr>, Hora.inicio <chr>, Hora.cierre <chr>, and abbreviated variable
## # names ¹Codigo.Barras, ²Fabricante, ³Producto
str(bd)
## 'data.frame': 200625 obs. of 22 variables:
## $ vcClaveTienda : chr "MX001" "MX001" "MX001" "MX001" ...
## $ DescGiro : chr "Abarrotes" "Abarrotes" "Abarrotes" "Abarrotes" ...
## $ Codigo.Barras : num 7.5e+12 7.5e+12 7.5e+12 7.5e+12 7.5e+12 ...
## $ PLU : int NA NA NA NA NA NA NA NA NA NA ...
## $ Fecha : chr "6/19/2020" "6/19/2020" "6/19/2020" "6/19/2020" ...
## $ Hora : chr "08:16:21 a. m." "08:23:33 a. m." "08:24:33 a. m." "08:24:33 a. m." ...
## $ Marca : chr "NUTRI LECHE" "DAN UP" "BIMBO" "PEPSI" ...
## $ Fabricante : chr "MEXILAC" "DANONE DE MEXICO" "GRUPO BIMBO" "PEPSI-COLA MEXICANA" ...
## $ Producto : chr "Nutri Leche 1 Litro" "DANUP STRAWBERRY P/BEBER 350GR NAL" "Rebanadas Bimbo 2Pz" "Pepsi N.R. 400Ml" ...
## $ Precio : num 16 14 5 8 19.5 16 14 5 8 19.5 ...
## $ Ult.Costo : num 12.3 14 5 8 15 ...
## $ Unidades : num 1 1 1 1 1 1 1 1 1 1 ...
## $ F.Ticket : int 1 2 3 3 4 1 2 3 3 4 ...
## $ NombreDepartamento: chr "Abarrotes" "Abarrotes" "Abarrotes" "Abarrotes" ...
## $ NombreFamilia : chr "Lacteos y Refrigerados" "Lacteos y Refrigerados" "Pan y Tortilla" "Bebidas" ...
## $ NombreCategoria : chr "Leche" "Yogurt" "Pan Dulce Empaquetado" "Refrescos Plástico (N.R.)" ...
## $ Estado : chr "Nuevo León" "Nuevo León" "Nuevo León" "Nuevo León" ...
## $ Mts.2 : int 60 60 60 60 60 60 60 60 60 60 ...
## $ Tipo.ubicación : chr "Esquina" "Esquina" "Esquina" "Esquina" ...
## $ Giro : chr "Abarrotes" "Abarrotes" "Abarrotes" "Abarrotes" ...
## $ Hora.inicio : chr "8:00" "8:00" "8:00" "8:00" ...
## $ Hora.cierre : chr "22:00" "22:00" "22:00" "22:00" ...
head(bd)
## vcClaveTienda DescGiro Codigo.Barras PLU Fecha Hora
## 1 MX001 Abarrotes 7.501021e+12 NA 6/19/2020 08:16:21 a. m.
## 2 MX001 Abarrotes 7.501032e+12 NA 6/19/2020 08:23:33 a. m.
## 3 MX001 Abarrotes 7.501000e+12 NA 6/19/2020 08:24:33 a. m.
## 4 MX001 Abarrotes 7.501031e+12 NA 6/19/2020 08:24:33 a. m.
## 5 MX001 Abarrotes 7.501026e+12 NA 6/19/2020 08:26:28 a. m.
## 6 MX001 Abarrotes 7.501021e+12 NA 6/19/2020 08:16:21 a. m.
## Marca Fabricante
## 1 NUTRI LECHE MEXILAC
## 2 DAN UP DANONE DE MEXICO
## 3 BIMBO GRUPO BIMBO
## 4 PEPSI PEPSI-COLA MEXICANA
## 5 BLANCA NIEVES (DETERGENTE) FABRICA DE JABON LA CORONA
## 6 NUTRI LECHE MEXILAC
## Producto Precio Ult.Costo Unidades F.Ticket
## 1 Nutri Leche 1 Litro 16.0 12.31 1 1
## 2 DANUP STRAWBERRY P/BEBER 350GR NAL 14.0 14.00 1 2
## 3 Rebanadas Bimbo 2Pz 5.0 5.00 1 3
## 4 Pepsi N.R. 400Ml 8.0 8.00 1 3
## 5 Detergente Blanca Nieves 500G 19.5 15.00 1 4
## 6 Nutri Leche 1 Litro 16.0 12.31 1 1
## NombreDepartamento NombreFamilia NombreCategoria
## 1 Abarrotes Lacteos y Refrigerados Leche
## 2 Abarrotes Lacteos y Refrigerados Yogurt
## 3 Abarrotes Pan y Tortilla Pan Dulce Empaquetado
## 4 Abarrotes Bebidas Refrescos Plástico (N.R.)
## 5 Abarrotes Limpieza del Hogar Lavandería
## 6 Abarrotes Lacteos y Refrigerados Leche
## Estado Mts.2 Tipo.ubicación Giro Hora.inicio Hora.cierre
## 1 Nuevo León 60 Esquina Abarrotes 8:00 22:00
## 2 Nuevo León 60 Esquina Abarrotes 8:00 22:00
## 3 Nuevo León 60 Esquina Abarrotes 8:00 22:00
## 4 Nuevo León 60 Esquina Abarrotes 8:00 22:00
## 5 Nuevo León 60 Esquina Abarrotes 8:00 22:00
## 6 Nuevo León 60 Esquina Abarrotes 8:00 22:00
head(bd, n=7)
## vcClaveTienda DescGiro Codigo.Barras PLU Fecha Hora
## 1 MX001 Abarrotes 7.501021e+12 NA 6/19/2020 08:16:21 a. m.
## 2 MX001 Abarrotes 7.501032e+12 NA 6/19/2020 08:23:33 a. m.
## 3 MX001 Abarrotes 7.501000e+12 NA 6/19/2020 08:24:33 a. m.
## 4 MX001 Abarrotes 7.501031e+12 NA 6/19/2020 08:24:33 a. m.
## 5 MX001 Abarrotes 7.501026e+12 NA 6/19/2020 08:26:28 a. m.
## 6 MX001 Abarrotes 7.501021e+12 NA 6/19/2020 08:16:21 a. m.
## 7 MX001 Abarrotes 7.501032e+12 NA 6/19/2020 08:23:33 a. m.
## Marca Fabricante
## 1 NUTRI LECHE MEXILAC
## 2 DAN UP DANONE DE MEXICO
## 3 BIMBO GRUPO BIMBO
## 4 PEPSI PEPSI-COLA MEXICANA
## 5 BLANCA NIEVES (DETERGENTE) FABRICA DE JABON LA CORONA
## 6 NUTRI LECHE MEXILAC
## 7 DAN UP DANONE DE MEXICO
## Producto Precio Ult.Costo Unidades F.Ticket
## 1 Nutri Leche 1 Litro 16.0 12.31 1 1
## 2 DANUP STRAWBERRY P/BEBER 350GR NAL 14.0 14.00 1 2
## 3 Rebanadas Bimbo 2Pz 5.0 5.00 1 3
## 4 Pepsi N.R. 400Ml 8.0 8.00 1 3
## 5 Detergente Blanca Nieves 500G 19.5 15.00 1 4
## 6 Nutri Leche 1 Litro 16.0 12.31 1 1
## 7 DANUP STRAWBERRY P/BEBER 350GR NAL 14.0 14.00 1 2
## NombreDepartamento NombreFamilia NombreCategoria
## 1 Abarrotes Lacteos y Refrigerados Leche
## 2 Abarrotes Lacteos y Refrigerados Yogurt
## 3 Abarrotes Pan y Tortilla Pan Dulce Empaquetado
## 4 Abarrotes Bebidas Refrescos Plástico (N.R.)
## 5 Abarrotes Limpieza del Hogar Lavandería
## 6 Abarrotes Lacteos y Refrigerados Leche
## 7 Abarrotes Lacteos y Refrigerados Yogurt
## Estado Mts.2 Tipo.ubicación Giro Hora.inicio Hora.cierre
## 1 Nuevo León 60 Esquina Abarrotes 8:00 22:00
## 2 Nuevo León 60 Esquina Abarrotes 8:00 22:00
## 3 Nuevo León 60 Esquina Abarrotes 8:00 22:00
## 4 Nuevo León 60 Esquina Abarrotes 8:00 22:00
## 5 Nuevo León 60 Esquina Abarrotes 8:00 22:00
## 6 Nuevo León 60 Esquina Abarrotes 8:00 22:00
## 7 Nuevo León 60 Esquina Abarrotes 8:00 22:00
tail(bd)
## vcClaveTienda DescGiro Codigo.Barras PLU Fecha Hora
## 200620 MX005 Depósito 7.62221e+12 NA 7/12/2020 01:08:25 a. m.
## 200621 MX005 Depósito 7.62221e+12 NA 10/23/2020 10:17:37 p. m.
## 200622 MX005 Depósito 7.62221e+12 NA 10/10/2020 08:30:20 p. m.
## 200623 MX005 Depósito 7.62221e+12 NA 10/10/2020 10:40:43 p. m.
## 200624 MX005 Depósito 7.62221e+12 NA 6/27/2020 10:30:19 p. m.
## 200625 MX005 Depósito 7.62221e+12 NA 6/26/2020 11:43:34 p. m.
## Marca Fabricante Producto Precio
## 200620 TRIDENT XTRA CARE CADBURY ADAMS Trident Xtracare Freshmint 16.32G 9
## 200621 TRIDENT XTRA CARE CADBURY ADAMS Trident Xtracare Freshmint 16.32G 9
## 200622 TRIDENT XTRA CARE CADBURY ADAMS Trident Xtracare Freshmint 16.32G 9
## 200623 TRIDENT XTRA CARE CADBURY ADAMS Trident Xtracare Freshmint 16.32G 9
## 200624 TRIDENT XTRA CARE CADBURY ADAMS Trident Xtracare Freshmint 16.32G 9
## 200625 TRIDENT XTRA CARE CADBURY ADAMS Trident Xtracare Freshmint 16.32G 9
## Ult.Costo Unidades F.Ticket NombreDepartamento NombreFamilia
## 200620 6.92 1 103100 Abarrotes Dulcería
## 200621 6.92 1 116598 Abarrotes Dulcería
## 200622 6.92 1 114886 Abarrotes Dulcería
## 200623 6.92 1 114955 Abarrotes Dulcería
## 200624 6.92 1 101121 Abarrotes Dulcería
## 200625 6.92 1 100879 Abarrotes Dulcería
## NombreCategoria Estado Mts.2 Tipo.ubicación Giro Hora.inicio
## 200620 Gomas de Mazcar Quintana Roo 58 Esquina Mini súper 8:00
## 200621 Gomas de Mazcar Quintana Roo 58 Esquina Mini súper 8:00
## 200622 Gomas de Mazcar Quintana Roo 58 Esquina Mini súper 8:00
## 200623 Gomas de Mazcar Quintana Roo 58 Esquina Mini súper 8:00
## 200624 Gomas de Mazcar Quintana Roo 58 Esquina Mini súper 8:00
## 200625 Gomas de Mazcar Quintana Roo 58 Esquina Mini súper 8:00
## Hora.cierre
## 200620 21:00
## 200621 21:00
## 200622 21:00
## 200623 21:00
## 200624 21:00
## 200625 21:00
tabyl(bd, vcClaveTienda, NombreDepartamento)
## vcClaveTienda Abarrotes Bebes e Infantiles Carnes Farmacia Ferretería Mercería
## MX001 95415 515 1 147 245 28
## MX002 6590 21 0 4 10 0
## MX003 4026 15 0 2 8 0
## MX004 82234 932 0 102 114 16
## MX005 10014 0 0 0 0 0
## Papelería Productos a Eliminar Vinos y Licores
## 35 3 80
## 0 0 4
## 0 0 0
## 32 5 20
## 7 0 0
bd1 <- bd
# Drop the PLU and barcode columns
bd1 <- subset(bd1, select = -c(PLU, Codigo.Barras))
bd2 <- bd1
# Keep only rows with a positive price
bd2 <- bd2[bd2$Precio > 0, ]
summary(bd1)
## vcClaveTienda DescGiro Fecha Hora
## Length:200625 Length:200625 Length:200625 Length:200625
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Marca Fabricante Producto Precio
## Length:200625 Length:200625 Length:200625 Min. :-147.00
## Class :character Class :character Class :character 1st Qu.: 11.00
## Mode :character Mode :character Mode :character Median : 16.00
## Mean : 19.42
## 3rd Qu.: 25.00
## Max. :1000.00
## Ult.Costo Unidades F.Ticket NombreDepartamento
## Min. : 0.38 Min. : 0.200 Min. : 1 Length:200625
## 1st Qu.: 8.46 1st Qu.: 1.000 1st Qu.: 33964 Class :character
## Median : 12.31 Median : 1.000 Median :105993 Mode :character
## Mean : 15.31 Mean : 1.262 Mean :193990
## 3rd Qu.: 19.23 3rd Qu.: 1.000 3rd Qu.:383005
## Max. :769.23 Max. :96.000 Max. :450040
## NombreFamilia NombreCategoria Estado Mts.2
## Length:200625 Length:200625 Length:200625 Min. :47.0
## Class :character Class :character Class :character 1st Qu.:53.0
## Mode :character Mode :character Mode :character Median :60.0
## Mean :56.6
## 3rd Qu.:60.0
## Max. :62.0
## Tipo.ubicación Giro Hora.inicio Hora.cierre
## Length:200625 Length:200625 Length:200625 Length:200625
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
summary(bd2)
## vcClaveTienda DescGiro Fecha Hora
## Length:200478 Length:200478 Length:200478 Length:200478
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Marca Fabricante Producto Precio
## Length:200478 Length:200478 Length:200478 Min. : 0.50
## Class :character Class :character Class :character 1st Qu.: 11.00
## Mode :character Mode :character Mode :character Median : 16.00
## Mean : 19.45
## 3rd Qu.: 25.00
## Max. :1000.00
## Ult.Costo Unidades F.Ticket NombreDepartamento
## Min. : 0.38 Min. : 0.200 Min. : 1 Length:200478
## 1st Qu.: 8.46 1st Qu.: 1.000 1st Qu.: 33977 Class :character
## Median : 12.31 Median : 1.000 Median :106034 Mode :character
## Mean : 15.31 Mean : 1.261 Mean :194096
## 3rd Qu.: 19.23 3rd Qu.: 1.000 3rd Qu.:383062
## Max. :769.23 Max. :96.000 Max. :450040
## NombreFamilia NombreCategoria Estado Mts.2
## Length:200478 Length:200478 Length:200478 Min. :47.0
## Class :character Class :character Class :character 1st Qu.:53.0
## Mode :character Mode :character Mode :character Median :60.0
## Mean :56.6
## 3rd Qu.:60.0
## Max. :62.0
## Tipo.ubicación Giro Hora.inicio Hora.cierre
## Length:200478 Length:200478 Length:200478 Length:200478
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
We will not use this filtered version (bd2); instead, negative prices will be converted to their absolute values.
bd1[duplicated(bd1),]
## vcClaveTienda DescGiro Fecha Hora Marca
## 6 MX001 Abarrotes 6/19/2020 08:16:21 a. m. NUTRI LECHE
## 7 MX001 Abarrotes 6/19/2020 08:23:33 a. m. DAN UP
## 8 MX001 Abarrotes 6/19/2020 08:24:33 a. m. BIMBO
## 9 MX001 Abarrotes 6/19/2020 08:24:33 a. m. PEPSI
## 10 MX001 Abarrotes 6/19/2020 08:26:28 a. m. BLANCA NIEVES (DETERGENTE)
## Fabricante Producto Precio
## 6 MEXILAC Nutri Leche 1 Litro 16.0
## 7 DANONE DE MEXICO DANUP STRAWBERRY P/BEBER 350GR NAL 14.0
## 8 GRUPO BIMBO Rebanadas Bimbo 2Pz 5.0
## 9 PEPSI-COLA MEXICANA Pepsi N.R. 400Ml 8.0
## 10 FABRICA DE JABON LA CORONA Detergente Blanca Nieves 500G 19.5
## Ult.Costo Unidades F.Ticket NombreDepartamento NombreFamilia
## 6 12.31 1 1 Abarrotes Lacteos y Refrigerados
## 7 14.00 1 2 Abarrotes Lacteos y Refrigerados
## 8 5.00 1 3 Abarrotes Pan y Tortilla
## 9 8.00 1 3 Abarrotes Bebidas
## 10 15.00 1 4 Abarrotes Limpieza del Hogar
## NombreCategoria Estado Mts.2 Tipo.ubicación Giro
## 6 Leche Nuevo León 60 Esquina Abarrotes
## 7 Yogurt Nuevo León 60 Esquina Abarrotes
## 8 Pan Dulce Empaquetado Nuevo León 60 Esquina Abarrotes
## 9 Refrescos Plástico (N.R.) Nuevo León 60 Esquina Abarrotes
## 10 Lavandería Nuevo León 60 Esquina Abarrotes
## Hora.inicio Hora.cierre
## 6 8:00 22:00
## 7 8:00 22:00
## 8 8:00 22:00
## 9 8:00 22:00
## 10 8:00 22:00
sum(duplicated(bd1))
## [1] 5
bd3 <- bd1
bd3 <- distinct(bd3)
bd4 <- bd3
bd4$Precio <- abs(bd4$Precio)
summary(bd4)
## vcClaveTienda DescGiro Fecha Hora
## Length:200620 Length:200620 Length:200620 Length:200620
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Marca Fabricante Producto Precio
## Length:200620 Length:200620 Length:200620 Min. : 0.50
## Class :character Class :character Class :character 1st Qu.: 11.00
## Mode :character Mode :character Mode :character Median : 16.00
## Mean : 19.45
## 3rd Qu.: 25.00
## Max. :1000.00
## Ult.Costo Unidades F.Ticket NombreDepartamento
## Min. : 0.38 Min. : 0.200 Min. : 1 Length:200620
## 1st Qu.: 8.46 1st Qu.: 1.000 1st Qu.: 33967 Class :character
## Median : 12.31 Median : 1.000 Median :105996 Mode :character
## Mean : 15.31 Mean : 1.262 Mean :193994
## 3rd Qu.: 19.23 3rd Qu.: 1.000 3rd Qu.:383008
## Max. :769.23 Max. :96.000 Max. :450040
## NombreFamilia NombreCategoria Estado Mts.2
## Length:200620 Length:200620 Length:200620 Min. :47.0
## Class :character Class :character Class :character 1st Qu.:53.0
## Mode :character Mode :character Mode :character Median :60.0
## Mean :56.6
## 3rd Qu.:60.0
## Max. :62.0
## Tipo.ubicación Giro Hora.inicio Hora.cierre
## Length:200620 Length:200620 Length:200620 Length:200620
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
bd5 <- bd4
# Round quantities up to whole units (ceiling() turns 0.2 into 1)
bd5$Unidades <- ceiling(bd5$Unidades)
summary(bd5)
## vcClaveTienda DescGiro Fecha Hora
## Length:200620 Length:200620 Length:200620 Length:200620
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Marca Fabricante Producto Precio
## Length:200620 Length:200620 Length:200620 Min. : 0.50
## Class :character Class :character Class :character 1st Qu.: 11.00
## Mode :character Mode :character Mode :character Median : 16.00
## Mean : 19.45
## 3rd Qu.: 25.00
## Max. :1000.00
## Ult.Costo Unidades F.Ticket NombreDepartamento
## Min. : 0.38 Min. : 0.200 Min. : 1 Length:200620
## 1st Qu.: 8.46 1st Qu.: 1.000 1st Qu.: 33967 Class :character
## Median : 12.31 Median : 1.000 Median :105996 Mode :character
## Mean : 15.31 Mean : 1.262 Mean :193994
## 3rd Qu.: 19.23 3rd Qu.: 1.000 3rd Qu.:383008
## Max. :769.23 Max. :96.000 Max. :450040
## NombreFamilia NombreCategoria Estado Mts.2
## Length:200620 Length:200620 Length:200620 Min. :47.0
## Class :character Class :character Class :character 1st Qu.:53.0
## Mode :character Mode :character Mode :character Median :60.0
## Mean :56.6
## 3rd Qu.:60.0
## Max. :62.0
## Tipo.ubicación Giro Hora.inicio Hora.cierre
## Length:200620 Length:200620 Length:200620 Length:200620
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
bd6 <- bd5
bd6$Fecha <- as.Date(bd6$Fecha, format = "%m/%d/%Y")
tibble(bd6)
## # A tibble: 200,620 × 20
## vcCla…¹ DescG…² Fecha Hora Marca Fabri…³ Produ…⁴ Precio Ult.C…⁵ Unida…⁶
## <chr> <chr> <date> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 MX001 Abarro… 2020-06-19 08:1… NUTR… MEXILAC Nutri … 16 12.3 1
## 2 MX001 Abarro… 2020-06-19 08:2… DAN … DANONE… DANUP … 14 14 1
## 3 MX001 Abarro… 2020-06-19 08:2… BIMBO GRUPO … Rebana… 5 5 1
## 4 MX001 Abarro… 2020-06-19 08:2… PEPSI PEPSI-… Pepsi … 8 8 1
## 5 MX001 Abarro… 2020-06-19 08:2… BLAN… FABRIC… Deterg… 19.5 15 1
## 6 MX001 Abarro… 2020-06-19 08:2… FLASH ALEN Flash … 9.5 7.31 1
## 7 MX001 Abarro… 2020-06-19 08:2… VARI… DANONE… Danone… 11 11 1
## 8 MX001 Abarro… 2020-06-19 08:2… ZOTE FABRIC… Jabon … 9.5 7.31 1
## 9 MX001 Abarro… 2020-06-19 08:2… ALWA… PROCTE… T Feme… 23.5 18.1 1
## 10 MX001 Abarro… 2020-06-19 03:2… JUMEX JUMEX Jugo D… 12 12 1
## # … with 200,610 more rows, 10 more variables: F.Ticket <int>,
## # NombreDepartamento <chr>, NombreFamilia <chr>, NombreCategoria <chr>,
## # Estado <chr>, Mts.2 <int>, Tipo.ubicación <chr>, Giro <chr>,
## # Hora.inicio <chr>, Hora.cierre <chr>, and abbreviated variable names
## # ¹vcClaveTienda, ²DescGiro, ³Fabricante, ⁴Producto, ⁵Ult.Costo, ⁶Unidades
bd7 <- bd6
# Keep only the two hour digits (12-hour clock; the a. m./p. m. suffix is dropped)
bd7$Hora <- substr(bd7$Hora, start = 1, stop = 2)
tibble(bd7)
## # A tibble: 200,620 × 20
## vcCla…¹ DescG…² Fecha Hora Marca Fabri…³ Produ…⁴ Precio Ult.C…⁵ Unida…⁶
## <chr> <chr> <date> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 MX001 Abarro… 2020-06-19 08 NUTR… MEXILAC Nutri … 16 12.3 1
## 2 MX001 Abarro… 2020-06-19 08 DAN … DANONE… DANUP … 14 14 1
## 3 MX001 Abarro… 2020-06-19 08 BIMBO GRUPO … Rebana… 5 5 1
## 4 MX001 Abarro… 2020-06-19 08 PEPSI PEPSI-… Pepsi … 8 8 1
## 5 MX001 Abarro… 2020-06-19 08 BLAN… FABRIC… Deterg… 19.5 15 1
## 6 MX001 Abarro… 2020-06-19 08 FLASH ALEN Flash … 9.5 7.31 1
## 7 MX001 Abarro… 2020-06-19 08 VARI… DANONE… Danone… 11 11 1
## 8 MX001 Abarro… 2020-06-19 08 ZOTE FABRIC… Jabon … 9.5 7.31 1
## 9 MX001 Abarro… 2020-06-19 08 ALWA… PROCTE… T Feme… 23.5 18.1 1
## 10 MX001 Abarro… 2020-06-19 03 JUMEX JUMEX Jugo D… 12 12 1
## # … with 200,610 more rows, 10 more variables: F.Ticket <int>,
## # NombreDepartamento <chr>, NombreFamilia <chr>, NombreCategoria <chr>,
## # Estado <chr>, Mts.2 <int>, Tipo.ubicación <chr>, Giro <chr>,
## # Hora.inicio <chr>, Hora.cierre <chr>, and abbreviated variable names
## # ¹vcClaveTienda, ²DescGiro, ³Fabricante, ⁴Producto, ⁵Ult.Costo, ⁶Unidades
bd7$Hora <- as.integer(bd7$Hora)
str(bd7)
## 'data.frame': 200620 obs. of 20 variables:
## $ vcClaveTienda : chr "MX001" "MX001" "MX001" "MX001" ...
## $ DescGiro : chr "Abarrotes" "Abarrotes" "Abarrotes" "Abarrotes" ...
## $ Fecha : Date, format: "2020-06-19" "2020-06-19" ...
## $ Hora : int 8 8 8 8 8 8 8 8 8 3 ...
## $ Marca : chr "NUTRI LECHE" "DAN UP" "BIMBO" "PEPSI" ...
## $ Fabricante : chr "MEXILAC" "DANONE DE MEXICO" "GRUPO BIMBO" "PEPSI-COLA MEXICANA" ...
## $ Producto : chr "Nutri Leche 1 Litro" "DANUP STRAWBERRY P/BEBER 350GR NAL" "Rebanadas Bimbo 2Pz" "Pepsi N.R. 400Ml" ...
## $ Precio : num 16 14 5 8 19.5 9.5 11 9.5 23.5 12 ...
## $ Ult.Costo : num 12.3 14 5 8 15 ...
## $ Unidades : num 1 1 1 1 1 1 1 1 1 1 ...
## $ F.Ticket : int 1 2 3 3 4 4 4 4 4 5 ...
## $ NombreDepartamento: chr "Abarrotes" "Abarrotes" "Abarrotes" "Abarrotes" ...
## $ NombreFamilia : chr "Lacteos y Refrigerados" "Lacteos y Refrigerados" "Pan y Tortilla" "Bebidas" ...
## $ NombreCategoria : chr "Leche" "Yogurt" "Pan Dulce Empaquetado" "Refrescos Plástico (N.R.)" ...
## $ Estado : chr "Nuevo León" "Nuevo León" "Nuevo León" "Nuevo León" ...
## $ Mts.2 : int 60 60 60 60 60 60 60 60 60 60 ...
## $ Tipo.ubicación : chr "Esquina" "Esquina" "Esquina" "Esquina" ...
## $ Giro : chr "Abarrotes" "Abarrotes" "Abarrotes" "Abarrotes" ...
## $ Hora.inicio : chr "8:00" "8:00" "8:00" "8:00" ...
## $ Hora.cierre : chr "22:00" "22:00" "22:00" "22:00" ...
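Keeping only the first two characters of Hora drops the a. m./p. m. marker. As a result, 10:17:37 a. m. and 10:17:37 p. m. both become 10, and the later summary shows Hora ranging only from 1 to 12. Here is a minimal sketch of a 24-hour conversion. It assumes every value starts with hh:mm:ss followed by an a. m. or p. m. suffix, as in the rows printed above. `hora_24` is a hypothetical helper name.

```r
# Hypothetical helper: "hh:mm:ss a. m." / "hh:mm:ss p. m." -> hour 0-23
hora_24 <- function(h) {
  hh <- as.integer(substr(h, 1, 2)) %% 12L   # 12 a. m. -> 0; 12 p. m. -> 0 before the p. m. shift
  sufijo <- substr(h, 9, nchar(h))            # text after "hh:mm:ss", e.g. " a. m."
  es_pm <- grepl("p", sufijo, fixed = TRUE)   # assumption: only the p. m. suffix contains "p"
  hh + ifelse(es_pm, 12L, 0L)
}

horas <- c("08:16:21 a. m.", "10:17:37 p. m.", "12:05:00 p. m.", "12:30:00 a. m.")
print(hora_24(horas))  # 8 22 12 0
```

Checking the suffix instead of matching the exact "p. m." text keeps the helper working if the export uses a non-breaking space inside "p. m.".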
sum(is.na(bd7))
## [1] 0
sum(is.na(bd))
## [1] 199188
sapply(bd7, function(x) sum(is.na(x)))
## vcClaveTienda DescGiro Fecha Hora
## 0 0 0 0
## Marca Fabricante Producto Precio
## 0 0 0 0
## Ult.Costo Unidades F.Ticket NombreDepartamento
## 0 0 0 0
## NombreFamilia NombreCategoria Estado Mts.2
## 0 0 0 0
## Tipo.ubicación Giro Hora.inicio Hora.cierre
## 0 0 0 0
sapply(bd, function(x) sum(is.na(x)))
## vcClaveTienda DescGiro Codigo.Barras PLU
## 0 0 0 199188
## Fecha Hora Marca Fabricante
## 0 0 0 0
## Producto Precio Ult.Costo Unidades
## 0 0 0 0
## F.Ticket NombreDepartamento NombreFamilia NombreCategoria
## 0 0 0 0
## Estado Mts.2 Tipo.ubicación Giro
## 0 0 0 0
## Hora.inicio Hora.cierre
## 0 0
bd8 <- bd
# Drop every row containing any NA (PLU is NA in most rows, so few rows remain)
bd8 <- na.omit(bd8)
summary(bd8)
## vcClaveTienda DescGiro Codigo.Barras PLU
## Length:1437 Length:1437 Min. :6.750e+08 Min. : 1.000
## Class :character Class :character 1st Qu.:6.750e+08 1st Qu.: 1.000
## Mode :character Mode :character Median :6.750e+08 Median : 1.000
## Mean :2.616e+11 Mean : 2.112
## 3rd Qu.:6.750e+08 3rd Qu.: 1.000
## Max. :7.501e+12 Max. :30.000
## Fecha Hora Marca Fabricante
## Length:1437 Length:1437 Length:1437 Length:1437
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Producto Precio Ult.Costo Unidades
## Length:1437 Min. :30.00 Min. : 1.00 Min. :1.000
## Class :character 1st Qu.:90.00 1st Qu.:64.62 1st Qu.:1.000
## Mode :character Median :90.00 Median :64.62 Median :1.000
## Mean :87.94 Mean :56.65 Mean :1.124
## 3rd Qu.:90.00 3rd Qu.:64.62 3rd Qu.:1.000
## Max. :90.00 Max. :64.62 Max. :7.000
## F.Ticket NombreDepartamento NombreFamilia NombreCategoria
## Min. : 772 Length:1437 Length:1437 Length:1437
## 1st Qu.: 99955 Class :character Class :character Class :character
## Median :102493 Mode :character Mode :character Mode :character
## Mean :100595
## 3rd Qu.:106546
## Max. :118356
## Estado Mts.2 Tipo.ubicación Giro
## Length:1437 Min. :58.00 Length:1437 Length:1437
## Class :character 1st Qu.:58.00 Class :character Class :character
## Mode :character Median :58.00 Mode :character Mode :character
## Mean :58.07
## 3rd Qu.:58.00
## Max. :60.00
## Hora.inicio Hora.cierre
## Length:1437 Length:1437
## Class :character Class :character
## Mode :character Mode :character
##
##
##
bd9 <- bd
bd9[is.na(bd9)]<-0
summary(bd9)
## vcClaveTienda DescGiro Codigo.Barras PLU
## Length:200625 Length:200625 Min. :8.347e+05 Min. : 0.00000
## Class :character Class :character 1st Qu.:7.501e+12 1st Qu.: 0.00000
## Mode :character Mode :character Median :7.501e+12 Median : 0.00000
## Mean :5.950e+12 Mean : 0.01513
## 3rd Qu.:7.501e+12 3rd Qu.: 0.00000
## Max. :1.750e+13 Max. :30.00000
## Fecha Hora Marca Fabricante
## Length:200625 Length:200625 Length:200625 Length:200625
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Producto Precio Ult.Costo Unidades
## Length:200625 Min. :-147.00 Min. : 0.38 Min. : 0.200
## Class :character 1st Qu.: 11.00 1st Qu.: 8.46 1st Qu.: 1.000
## Mode :character Median : 16.00 Median : 12.31 Median : 1.000
## Mean : 19.42 Mean : 15.31 Mean : 1.262
## 3rd Qu.: 25.00 3rd Qu.: 19.23 3rd Qu.: 1.000
## Max. :1000.00 Max. :769.23 Max. :96.000
## F.Ticket NombreDepartamento NombreFamilia NombreCategoria
## Min. : 1 Length:200625 Length:200625 Length:200625
## 1st Qu.: 33964 Class :character Class :character Class :character
## Median :105993 Mode :character Mode :character Mode :character
## Mean :193990
## 3rd Qu.:383005
## Max. :450040
## Estado Mts.2 Tipo.ubicación Giro
## Length:200625 Min. :47.0 Length:200625 Length:200625
## Class :character 1st Qu.:53.0 Class :character Class :character
## Mode :character Median :60.0 Mode :character Mode :character
## Mean :56.6
## 3rd Qu.:60.0
## Max. :62.0
## Hora.inicio Hora.cierre
## Length:200625 Length:200625
## Class :character Class :character
## Mode :character Mode :character
##
##
##
bd10 <- bd
bd10$PLU[is.na(bd10$PLU)] <- mean(bd10$PLU, na.rm = TRUE)
summary(bd10)
## vcClaveTienda DescGiro Codigo.Barras PLU
## Length:200625 Length:200625 Min. :8.347e+05 Min. : 1.000
## Class :character Class :character 1st Qu.:7.501e+12 1st Qu.: 2.112
## Mode :character Mode :character Median :7.501e+12 Median : 2.112
## Mean :5.950e+12 Mean : 2.112
## 3rd Qu.:7.501e+12 3rd Qu.: 2.112
## Max. :1.750e+13 Max. :30.000
## Fecha Hora Marca Fabricante
## Length:200625 Length:200625 Length:200625 Length:200625
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Producto Precio Ult.Costo Unidades
## Length:200625 Min. :-147.00 Min. : 0.38 Min. : 0.200
## Class :character 1st Qu.: 11.00 1st Qu.: 8.46 1st Qu.: 1.000
## Mode :character Median : 16.00 Median : 12.31 Median : 1.000
## Mean : 19.42 Mean : 15.31 Mean : 1.262
## 3rd Qu.: 25.00 3rd Qu.: 19.23 3rd Qu.: 1.000
## Max. :1000.00 Max. :769.23 Max. :96.000
## F.Ticket NombreDepartamento NombreFamilia NombreCategoria
## Min. : 1 Length:200625 Length:200625 Length:200625
## 1st Qu.: 33964 Class :character Class :character Class :character
## Median :105993 Mode :character Mode :character Mode :character
## Mean :193990
## 3rd Qu.:383005
## Max. :450040
## Estado Mts.2 Tipo.ubicación Giro
## Length:200625 Min. :47.0 Length:200625 Length:200625
## Class :character 1st Qu.:53.0 Class :character Class :character
## Mode :character Median :60.0 Mode :character Mode :character
## Mean :56.6
## 3rd Qu.:60.0
## Max. :62.0
## Hora.inicio Hora.cierre
## Length:200625 Length:200625
## Class :character Class :character
## Mode :character Mode :character
##
##
##
bd11 <- bd
# Set negative values to 0 (the comparison is applied to every column)
bd11[bd11 < 0] <- 0
summary(bd11)
## vcClaveTienda DescGiro Codigo.Barras PLU
## Length:200625 Length:200625 Min. :8.347e+05 Min. : 1.00
## Class :character Class :character 1st Qu.:7.501e+12 1st Qu.: 1.00
## Mode :character Mode :character Median :7.501e+12 Median : 1.00
## Mean :5.950e+12 Mean : 2.11
## 3rd Qu.:7.501e+12 3rd Qu.: 1.00
## Max. :1.750e+13 Max. :30.00
## NA's :199188
## Fecha Hora Marca Fabricante
## Length:200625 Length:200625 Length:200625 Length:200625
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## Producto Precio Ult.Costo Unidades
## Length:200625 Min. : 0.00 Min. : 0.38 Min. : 0.200
## Class :character 1st Qu.: 11.00 1st Qu.: 8.46 1st Qu.: 1.000
## Mode :character Median : 16.00 Median : 12.31 Median : 1.000
## Mean : 19.44 Mean : 15.31 Mean : 1.262
## 3rd Qu.: 25.00 3rd Qu.: 19.23 3rd Qu.: 1.000
## Max. :1000.00 Max. :769.23 Max. :96.000
##
## F.Ticket NombreDepartamento NombreFamilia NombreCategoria
## Min. : 1 Length:200625 Length:200625 Length:200625
## 1st Qu.: 33964 Class :character Class :character Class :character
## Median :105993 Mode :character Mode :character Mode :character
## Mean :193990
## 3rd Qu.:383005
## Max. :450040
##
## Estado Mts.2 Tipo.ubicación Giro
## Length:200625 Min. :47.0 Length:200625 Length:200625
## Class :character 1st Qu.:53.0 Class :character Class :character
## Mode :character Median :60.0 Mode :character Mode :character
## Mean :56.6
## 3rd Qu.:60.0
## Max. :62.0
##
## Hora.inicio Hora.cierre
## Length:200625 Length:200625
## Class :character Class :character
## Mode :character Mode :character
##
##
##
##
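`bd11[bd11 < 0] <- 0` compares every column with 0, including character columns. For those columns R coerces 0 to the string "0" and compares the values as text. Here is a minimal sketch that restricts the replacement to numeric columns and leaves NA untouched. The toy data frame `demo` is hypothetical.

```r
# Hypothetical data with a character column, a negative price and an NA
demo <- data.frame(
  vcClaveTienda = c("MX001", "MX002", "MX003"),
  Precio        = c(16, -147, 8),
  PLU           = c(NA, 2L, 1L)
)

# Only numeric columns are touched; which() skips NA, so NA stays NA
num <- vapply(demo, is.numeric, logical(1))
demo[num] <- lapply(demo[num], function(x) replace(x, which(x < 0), 0))
print(demo)
```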
bd12 <- bd7
boxplot(bd12$Precio, horizontal = TRUE)
boxplot(bd12$Unidades, horizontal = TRUE)
#install.packages("lubridate")
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
# Day of the week as a number; with lubridate's default week_start, 1 = Sunday and 7 = Saturday
bd12$Dia_de_la_semana <- wday(bd12$Fecha)
summary(bd12)
## vcClaveTienda DescGiro Fecha Hora
## Length:200620 Length:200620 Min. :2020-05-01 Min. : 1.000
## Class :character Class :character 1st Qu.:2020-06-06 1st Qu.: 5.000
## Mode :character Mode :character Median :2020-07-11 Median : 8.000
## Mean :2020-07-18 Mean : 7.299
## 3rd Qu.:2020-08-29 3rd Qu.:10.000
## Max. :2020-11-11 Max. :12.000
## Marca Fabricante Producto Precio
## Length:200620 Length:200620 Length:200620 Min. : 0.50
## Class :character Class :character Class :character 1st Qu.: 11.00
## Mode :character Mode :character Mode :character Median : 16.00
## Mean : 19.45
## 3rd Qu.: 25.00
## Max. :1000.00
## Ult.Costo Unidades F.Ticket NombreDepartamento
## Min. : 0.38 Min. : 0.200 Min. : 1 Length:200620
## 1st Qu.: 8.46 1st Qu.: 1.000 1st Qu.: 33967 Class :character
## Median : 12.31 Median : 1.000 Median :105996 Mode :character
## Mean : 15.31 Mean : 1.262 Mean :193994
## 3rd Qu.: 19.23 3rd Qu.: 1.000 3rd Qu.:383008
## Max. :769.23 Max. :96.000 Max. :450040
## NombreFamilia NombreCategoria Estado Mts.2
## Length:200620 Length:200620 Length:200620 Min. :47.0
## Class :character Class :character Class :character 1st Qu.:53.0
## Mode :character Mode :character Mode :character Median :60.0
## Mean :56.6
## 3rd Qu.:60.0
## Max. :62.0
## Tipo.ubicación Giro Hora.inicio Hora.cierre
## Length:200620 Length:200620 Length:200620 Length:200620
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Dia_de_la_semana
## Min. :1.000
## 1st Qu.:2.000
## Median :4.000
## Mean :3.912
## 3rd Qu.:6.000
## Max. :7.000
bd12$SubTotal <- bd12$Precio * bd12$Unidades
summary(bd12)
## vcClaveTienda DescGiro Fecha Hora
## Length:200620 Length:200620 Min. :2020-05-01 Min. : 1.000
## Class :character Class :character 1st Qu.:2020-06-06 1st Qu.: 5.000
## Mode :character Mode :character Median :2020-07-11 Median : 8.000
## Mean :2020-07-18 Mean : 7.299
## 3rd Qu.:2020-08-29 3rd Qu.:10.000
## Max. :2020-11-11 Max. :12.000
## Marca Fabricante Producto Precio
## Length:200620 Length:200620 Length:200620 Min. : 0.50
## Class :character Class :character Class :character 1st Qu.: 11.00
## Mode :character Mode :character Mode :character Median : 16.00
## Mean : 19.45
## 3rd Qu.: 25.00
## Max. :1000.00
## Ult.Costo Unidades F.Ticket NombreDepartamento
## Min. : 0.38 Min. : 0.200 Min. : 1 Length:200620
## 1st Qu.: 8.46 1st Qu.: 1.000 1st Qu.: 33967 Class :character
## Median : 12.31 Median : 1.000 Median :105996 Mode :character
## Mean : 15.31 Mean : 1.262 Mean :193994
## 3rd Qu.: 19.23 3rd Qu.: 1.000 3rd Qu.:383008
## Max. :769.23 Max. :96.000 Max. :450040
## NombreFamilia NombreCategoria Estado Mts.2
## Length:200620 Length:200620 Length:200620 Min. :47.0
## Class :character Class :character Class :character 1st Qu.:53.0
## Mode :character Mode :character Mode :character Median :60.0
## Mean :56.6
## 3rd Qu.:60.0
## Max. :62.0
## Tipo.ubicación Giro Hora.inicio Hora.cierre
## Length:200620 Length:200620 Length:200620 Length:200620
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Dia_de_la_semana SubTotal
## Min. :1.000 Min. : 1.00
## 1st Qu.:2.000 1st Qu.: 12.00
## Median :4.000 Median : 18.00
## Mean :3.912 Mean : 24.33
## 3rd Qu.:6.000 3rd Qu.: 27.00
## Max. :7.000 Max. :2496.00
# Unit margin: price minus last cost, per unit sold (not multiplied by Unidades)
bd12$Utilidad <- bd12$Precio - bd12$Ult.Costo
summary(bd12)
## vcClaveTienda DescGiro Fecha Hora
## Length:200620 Length:200620 Min. :2020-05-01 Min. : 1.000
## Class :character Class :character 1st Qu.:2020-06-06 1st Qu.: 5.000
## Mode :character Mode :character Median :2020-07-11 Median : 8.000
## Mean :2020-07-18 Mean : 7.299
## 3rd Qu.:2020-08-29 3rd Qu.:10.000
## Max. :2020-11-11 Max. :12.000
## Marca Fabricante Producto Precio
## Length:200620 Length:200620 Length:200620 Min. : 0.50
## Class :character Class :character Class :character 1st Qu.: 11.00
## Mode :character Mode :character Mode :character Median : 16.00
## Mean : 19.45
## 3rd Qu.: 25.00
## Max. :1000.00
## Ult.Costo Unidades F.Ticket NombreDepartamento
## Min. : 0.38 Min. : 0.200 Min. : 1 Length:200620
## 1st Qu.: 8.46 1st Qu.: 1.000 1st Qu.: 33967 Class :character
## Median : 12.31 Median : 1.000 Median :105996 Mode :character
## Mean : 15.31 Mean : 1.262 Mean :193994
## 3rd Qu.: 19.23 3rd Qu.: 1.000 3rd Qu.:383008
## Max. :769.23 Max. :96.000 Max. :450040
## NombreFamilia NombreCategoria Estado Mts.2
## Length:200620 Length:200620 Length:200620 Min. :47.0
## Class :character Class :character Class :character 1st Qu.:53.0
## Mode :character Mode :character Mode :character Median :60.0
## Mean :56.6
## 3rd Qu.:60.0
## Max. :62.0
## Tipo.ubicación Giro Hora.inicio Hora.cierre
## Length:200620 Length:200620 Length:200620 Length:200620
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Dia_de_la_semana SubTotal Utilidad
## Min. :1.000 Min. : 1.00 Min. : 0.000
## 1st Qu.:2.000 1st Qu.: 12.00 1st Qu.: 2.310
## Median :4.000 Median : 18.00 Median : 3.230
## Mean :3.912 Mean : 24.33 Mean : 4.142
## 3rd Qu.:6.000 3rd Qu.: 27.00 3rd Qu.: 5.420
## Max. :7.000 Max. :2496.00 Max. :230.770
bd_limpia <- bd12
write.csv(bd_limpia, file = "abarrotes_bd_limpia.csv", row.names = FALSE)
#install.packages("plyr")
library(Matrix)
#install.packages("arules")
library(arules)
#install.packages("arulesViz")
library(arulesViz)
library(datasets)
bd_limpia <- bd_limpia[order(bd_limpia$F.Ticket),]
head(bd_limpia)
## vcClaveTienda DescGiro Fecha Hora Marca
## 1 MX001 Abarrotes 2020-06-19 8 NUTRI LECHE
## 2 MX001 Abarrotes 2020-06-19 8 DAN UP
## 3 MX001 Abarrotes 2020-06-19 8 BIMBO
## 4 MX001 Abarrotes 2020-06-19 8 PEPSI
## 5 MX001 Abarrotes 2020-06-19 8 BLANCA NIEVES (DETERGENTE)
## 6 MX001 Abarrotes 2020-06-19 8 FLASH
## Fabricante Producto Precio
## 1 MEXILAC Nutri Leche 1 Litro 16.0
## 2 DANONE DE MEXICO DANUP STRAWBERRY P/BEBER 350GR NAL 14.0
## 3 GRUPO BIMBO Rebanadas Bimbo 2Pz 5.0
## 4 PEPSI-COLA MEXICANA Pepsi N.R. 400Ml 8.0
## 5 FABRICA DE JABON LA CORONA Detergente Blanca Nieves 500G 19.5
## 6 ALEN Flash Xtra Brisa Marina 500Ml 9.5
## Ult.Costo Unidades F.Ticket NombreDepartamento NombreFamilia
## 1 12.31 1 1 Abarrotes Lacteos y Refrigerados
## 2 14.00 1 2 Abarrotes Lacteos y Refrigerados
## 3 5.00 1 3 Abarrotes Pan y Tortilla
## 4 8.00 1 3 Abarrotes Bebidas
## 5 15.00 1 4 Abarrotes Limpieza del Hogar
## 6 7.31 1 4 Abarrotes Limpieza del Hogar
## NombreCategoria Estado Mts.2 Tipo.ubicación Giro
## 1 Leche Nuevo León 60 Esquina Abarrotes
## 2 Yogurt Nuevo León 60 Esquina Abarrotes
## 3 Pan Dulce Empaquetado Nuevo León 60 Esquina Abarrotes
## 4 Refrescos Plástico (N.R.) Nuevo León 60 Esquina Abarrotes
## 5 Lavandería Nuevo León 60 Esquina Abarrotes
## 6 Limpiadores Líquidos Nuevo León 60 Esquina Abarrotes
## Hora.inicio Hora.cierre Dia_de_la_semana SubTotal Utilidad
## 1 8:00 22:00 6 16.0 3.69
## 2 8:00 22:00 6 14.0 0.00
## 3 8:00 22:00 6 5.0 0.00
## 4 8:00 22:00 6 8.0 0.00
## 5 8:00 22:00 6 19.5 4.50
## 6 8:00 22:00 6 9.5 2.19
tail(bd_limpia)
## vcClaveTienda DescGiro Fecha Hora Marca
## 107394 MX004 Carnicería 2020-10-15 11 YEMINA
## 167771 MX004 Carnicería 2020-10-15 11 DEL FUERTE
## 149429 MX004 Carnicería 2020-10-15 11 COCA COLA ZERO
## 168750 MX004 Carnicería 2020-10-15 11 DIAMANTE
## 161193 MX004 Carnicería 2020-10-15 12 PEPSI
## 112970 MX004 Carnicería 2020-10-15 12 COCA COLA
## Fabricante Producto Precio Ult.Costo
## 107394 HERDEZ PASTA SPAGHETTI YEMINA 200G 7 5.38
## 167771 ALIMENTOS DEL FUERTE PURE DE TOMATE DEL FUERTE 345G 12 9.23
## 149429 COCA COLA COCA COLA ZERO 600ML 15 11.54
## 168750 EMPACADOS ARROZ DIAMANTE225G 11 8.46
## 161193 PEPSI-COLA MEXICANA PEPSI N. R. 500ML 10 7.69
## 112970 COCA COLA COCA COLA RETORNABLE 500ML 10 7.69
## Unidades F.Ticket NombreDepartamento NombreFamilia
## 107394 2 450032 Abarrotes Sopas y Pastas
## 167771 1 450032 Abarrotes Salsas y Sazonadores
## 149429 2 450034 Abarrotes Bebidas
## 168750 1 450037 Abarrotes Granos y Semillas
## 161193 1 450039 Abarrotes Bebidas
## 112970 8 450040 Abarrotes Bebidas
## NombreCategoria Estado Mts.2 Tipo.ubicación Giro
## 107394 Fideos, Spaguetti, Tallarines Sinaloa 53 Esquina Abarrotes
## 167771 Salsa para Spaguetti Sinaloa 53 Esquina Abarrotes
## 149429 Refrescos Retornables Sinaloa 53 Esquina Abarrotes
## 168750 Arroz Sinaloa 53 Esquina Abarrotes
## 161193 Refrescos Plástico (N.R.) Sinaloa 53 Esquina Abarrotes
## 112970 Refrescos Retornables Sinaloa 53 Esquina Abarrotes
## Hora.inicio Hora.cierre Dia_de_la_semana SubTotal Utilidad
## 107394 7:00 23:00 5 14 1.62
## 167771 7:00 23:00 5 12 2.77
## 149429 7:00 23:00 5 30 3.46
## 168750 7:00 23:00 5 11 2.54
## 161193 7:00 23:00 5 10 2.31
## 112970 7:00 23:00 5 80 2.31
#install.packages("plyr")
library(plyr)
## ------------------------------------------------------------------------------
## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)
## ------------------------------------------------------------------------------
##
## Attaching package: 'plyr'
## The following objects are masked from 'package:dplyr':
##
## arrange, count, desc, failwith, id, mutate, rename, summarise,
## summarize
## The following object is masked from 'package:purrr':
##
## compact
# One row per ticket with the brands in that ticket, separated by commas
basket <- ddply(bd_limpia, c("F.Ticket"),
                function(ticket) paste(ticket$Marca, collapse = ","))
basket$F.Ticket <- NULL
colnames(basket) <- c("Marca")
write.csv(basket,"basket.csv", quote = FALSE, row.names = FALSE)
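The same baskets can be built in base R with `split()`, without plyr and therefore without the masking warning shown above. A minimal sketch; `make_baskets()` is a hypothetical helper. It also drops repeated brands within a ticket, which add nothing to an association rule, and `writeLines()` writes no header row, so every line of the file is a basket. In either version, a brand name that itself contains a comma would be split into two items when the file is read back with `sep = ","`.

```r
# Hypothetical helper: one string per ticket with its distinct brands,
# comma-separated (same layout as basket.csv, without plyr)
make_baskets <- function(df, ticket_col = "F.Ticket", item_col = "Marca") {
  items_by_ticket <- split(df[[item_col]], df[[ticket_col]])
  vapply(items_by_ticket,
         function(items) paste(unique(items), collapse = ","),
         character(1))
}

# Usage (not run here):
# writeLines(make_baskets(bd_limpia), "basket.csv")
```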
#install.packages("arules")
library(arules)
library(arulesViz)
#file.choose()
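# Read the file written above; skip = 1 drops the "Marca" header line that
# write.csv() adds, which would otherwise be read as a one-item basket
tr <- read.transactions("basket.csv", format = "basket", sep = ",", skip = 1)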
reglas.asociacion <- apriori(tr, parameter = list(supp=0.001, conf=0.2, maxlen=10))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.2 0.1 1 none FALSE TRUE 5 0.001 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 115
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[604 item(s), 115111 transaction(s)] done [0.02s].
## sorting and recoding items ... [207 item(s)] done [0.00s].
## creating transaction tree ... done [0.03s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [11 rule(s)] done [0.00s].
## creating S4 object ... done [0.01s].
summary(reglas.asociacion)
## set of 11 rules
##
## rule length distribution (lhs + rhs):sizes
## 2
## 11
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2 2 2 2 2 2
##
## summary of quality measures:
## support confidence coverage lift
## Min. :0.001016 Min. :0.2069 Min. :0.003562 Min. : 1.325
## 1st Qu.:0.001103 1st Qu.:0.2356 1st Qu.:0.004504 1st Qu.: 1.787
## Median :0.001416 Median :0.2442 Median :0.005803 Median : 3.972
## Mean :0.001519 Mean :0.2536 Mean :0.006054 Mean :17.563
## 3rd Qu.:0.001651 3rd Qu.:0.2685 3rd Qu.:0.006893 3rd Qu.:21.798
## Max. :0.002745 Max. :0.3098 Max. :0.010503 Max. :65.908
## count
## Min. :117.0
## 1st Qu.:127.0
## Median :163.0
## Mean :174.9
## 3rd Qu.:190.0
## Max. :316.0
##
## mining info:
## data ntransactions support confidence
## tr 115111 0.001 0.2
## call
## apriori(data = tr, parameter = list(supp = 0.001, conf = 0.2, maxlen = 10))
inspect(reglas.asociacion)
## lhs rhs support confidence coverage
## [1] {FANTA} => {COCA COLA} 0.001051159 0.2439516 0.004308884
## [2] {SALVO} => {FABULOSO} 0.001103283 0.3097561 0.003561779
## [3] {FABULOSO} => {SALVO} 0.001103283 0.2347505 0.004699811
## [4] {COCA COLA ZERO} => {COCA COLA} 0.001416025 0.2969035 0.004769310
## [5] {SPRITE} => {COCA COLA} 0.001346526 0.2069426 0.006506763
## [6] {PINOL} => {CLORALEX} 0.001016410 0.2363636 0.004300197
## [7] {BLUE HOUSE} => {BIMBO} 0.001711392 0.2720994 0.006289581
## [8] {HELLMANN´S} => {BIMBO} 0.001537646 0.2649701 0.005803094
## [9] {REYMA} => {CONVERMEX} 0.002093631 0.2441743 0.008574333
## [10] {FUD} => {BIMBO} 0.001589770 0.2183771 0.007279930
## [11] {COCA COLA LIGHT} => {COCA COLA} 0.002745176 0.2613730 0.010502906
## lift count
## [1] 1.561906 121
## [2] 65.908196 127
## [3] 65.908196 127
## [4] 1.900932 163
## [5] 1.324955 155
## [6] 25.030409 117
## [7] 4.078870 197
## [8] 3.971997 177
## [9] 18.564824 241
## [10] 3.273552 183
## [11] 1.673447 316
top10reglas <- head(reglas.asociacion, n=10, by= "confidence")
plot(top10reglas, method = "graph", engine = "htmlwidget",
nodeCol = gray.colors(10))
Data-cleaning tools help us find duplicate records, missing values,
typos and similar errors so that they can be corrected. A clean
database is essential for decision-making, because data are a key
intangible asset for creating value.
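A minimal base-R sketch of the checks just described: it counts fully duplicated rows, missing values per column and negative values in the numeric columns. `data_quality_report()` is a hypothetical helper, not part of the original analysis; typos in text columns still have to be spotted by hand, for example with frequency tables such as `count(bd, Marca, sort = TRUE)`.

```r
# Hypothetical helper: quick data-quality checks before cleaning
data_quality_report <- function(df) {
  num_cols <- vapply(df, is.numeric, logical(1))
  list(
    duplicated_rows = sum(duplicated(df)),                    # rows identical to an earlier row
    missing_by_col  = colSums(is.na(df)),                     # NA count per column
    negative_by_col = colSums(df[num_cols] < 0, na.rm = TRUE) # negatives in numeric columns
  )
}

# Usage (not run here):
# data_quality_report(bd)
```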
The Market Basket Analysis allowed us to find associations between
products and to see which combinations customers prefer. One of the
associations found is that customers who buy Hellmann’s, Fud or Blue
House also tend to buy Bimbo. From this we can infer that these
customers are likely making sandwiches, so a sandwich combo could be
created.
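The `inspect()` output above reports, for each rule, its support (share of tickets containing both sides), confidence (share of tickets with the left-hand side that also contain the right-hand side), coverage (share of tickets with the left-hand side) and lift (confidence divided by the support of the right-hand side; values above 1 mean the pair appears together more often than if the purchases were independent). For example, for rule [2], {SALVO} => {FABULOSO}, confidence = 0.001103 / 0.003562 ≈ 0.310, and since FABULOSO alone appears in 0.0047 of the tickets (the coverage of rule [3]), lift ≈ 0.310 / 0.0047 ≈ 65.9. The sketch below recomputes these measures by hand on a toy list of baskets; `rule_measures()` and the baskets are hypothetical, for illustration only, and this is not how arules computes them internally.

```r
# Hypothetical helper: quality measures of the rule {lhs} => {rhs}, computed
# by hand over a list of baskets (one character vector of brands per ticket)
rule_measures <- function(baskets, lhs, rhs) {
  n <- length(baskets)
  has_lhs <- vapply(baskets, function(b) all(lhs %in% b), logical(1))
  has_rhs <- vapply(baskets, function(b) all(rhs %in% b), logical(1))
  support    <- sum(has_lhs & has_rhs) / n      # P(lhs and rhs)
  coverage   <- sum(has_lhs) / n                # P(lhs)
  confidence <- support / coverage              # P(rhs | lhs)
  lift       <- confidence / (sum(has_rhs) / n) # P(rhs | lhs) / P(rhs)
  c(support = support, confidence = confidence, coverage = coverage, lift = lift)
}

# Toy baskets (hypothetical data)
baskets <- list(c("BIMBO", "FUD"), c("BIMBO"), c("FUD", "BIMBO"), c("COCA COLA"), c("FUD"))
rule_measures(baskets, lhs = "FUD", rhs = "BIMBO")
```

For {FUD} => {BIMBO} on these five toy baskets this gives support 0.4, confidence 2/3, coverage 0.6 and lift 10/9 ≈ 1.11.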
Together, these two techniques let us design more effective marketing
strategies based on customers' buying patterns.
According to David L. Rogers, ambient temperature can influence our
buying behavior.
In this section we run an analysis to test this theory.
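One way to make the test explicit is to put monthly sales and monthly mean temperature side by side and compute their Pearson correlation. A minimal sketch, assuming one table with columns `mes` and `SubTotal` (like `utilidad_mes_mty`) and another with `mes` and `tmpc`, a hypothetical monthly temperature table whose `mes` uses the same `"%m"` format. With only seven months (May to November 2020) the coefficient is fragile, and a correlation alone does not show that temperature causes the change in sales.

```r
# Sketch: Pearson correlation between monthly sales and monthly mean temperature.
# sales: columns `mes` and `SubTotal` (e.g. utilidad_mes_mty)
# temps: columns `mes` and `tmpc` (hypothetical monthly temperature table)
sales_temp_correlation <- function(sales, temps) {
  both <- merge(sales[, c("mes", "SubTotal")], temps[, c("mes", "tmpc")], by = "mes")
  test <- cor.test(both$SubTotal, both$tmpc, method = "pearson")
  list(
    n_months = nrow(both),              # months present in both tables
    pearson  = unname(test$estimate),   # in [-1, 1]; > 0 means more sales in warmer months
    p_value  = test$p.value             # with few months, treat with caution
  )
}
```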
# Add month and year columns derived from the sale date
bd_mes <- bd_limpia
bd_mes$mes <- format(bd_mes$Fecha, "%m")
bd_mes$año <- format(bd_mes$Fecha, "%Y")
bd_mes <- filter(bd_mes, año == "2020")
# Total sales (SubTotal) per month and state; despite its name, utilidad_mes holds sales, not profit
utilidad_mes <- aggregate(SubTotal ~ mes + Estado, data = bd_mes, sum)
# Keep only Nuevo León
utilidad_mes_mty <- filter(utilidad_mes, Estado == "Nuevo León")
plot(as.integer(utilidad_mes_mty$mes), utilidad_mes_mty$SubTotal, type = "b",
     main = "Monthly sales, Nuevo León", xlab = "Month", ylab = "Sales")
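#install.packages("riem")
library(riem)
# Browse the IEM networks and the Mexican ASOS stations to find the station code
view(riem_networks())
view(riem_stations("MX__ASOS"))
# MMMY: Monterrey's international airport; download its METAR observations
monterrey <- riem_measures("MMMY")
str(monterrey)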
## tibble [77,687 × 32] (S3: tbl_df/tbl/data.frame)
## $ station : chr [1:77687] "MMMY" "MMMY" "MMMY" "MMMY" ...
## $ valid : POSIXct[1:77687], format: "2014-01-01 00:16:00" "2014-01-01 00:49:00" ...
## $ lon : num [1:77687] -100 -100 -100 -100 -100 ...
## $ lat : num [1:77687] 25.8 25.8 25.8 25.8 25.8 ...
## $ tmpf : num [1:77687] 48.2 48.2 48.2 46.4 46.4 46.4 46.4 46.4 46.4 46.4 ...
## $ dwpf : num [1:77687] 46.4 46.4 46.4 46.4 46.4 44.6 44.6 44.6 44.6 44.6 ...
## $ relh : num [1:77687] 93.5 93.5 93.5 100 100 ...
## $ drct : num [1:77687] 0 120 120 120 110 100 110 130 60 0 ...
## $ sknt : num [1:77687] 0 3 5 6 5 5 4 3 3 0 ...
## $ p01i : num [1:77687] 0 0 0 0 0 0 0 0 0 0 ...
## $ alti : num [1:77687] 30.3 30.3 30.3 30.3 30.3 ...
## $ mslp : num [1:77687] NA NA NA NA NA ...
## $ vsby : num [1:77687] 4 3 1 0.25 0.12 0.12 0.06 0.06 0.06 0.12 ...
## $ gust : num [1:77687] NA NA NA NA NA NA NA NA NA NA ...
## $ skyc1 : chr [1:77687] "SCT" "SCT" "SCT" "VV " ...
## $ skyc2 : chr [1:77687] "BKN" "BKN" "BKN" NA ...
## $ skyc3 : chr [1:77687] "OVC" "OVC" "OVC" NA ...
## $ skyc4 : chr [1:77687] NA NA NA NA ...
## $ skyl1 : num [1:77687] 700 300 200 200 100 100 100 100 100 100 ...
## $ skyl2 : num [1:77687] 1200 400 300 NA NA NA NA NA NA NA ...
## $ skyl3 : num [1:77687] 4000 900 500 NA NA NA NA NA NA NA ...
## $ skyl4 : num [1:77687] NA NA NA NA NA NA NA NA NA NA ...
## $ wxcodes : chr [1:77687] NA "BR" "BR" "FG" ...
## $ ice_accretion_1hr: logi [1:77687] NA NA NA NA NA NA ...
## $ ice_accretion_3hr: logi [1:77687] NA NA NA NA NA NA ...
## $ ice_accretion_6hr: logi [1:77687] NA NA NA NA NA NA ...
## $ peak_wind_gust : logi [1:77687] NA NA NA NA NA NA ...
## $ peak_wind_drct : logi [1:77687] NA NA NA NA NA NA ...
## $ peak_wind_time : logi [1:77687] NA NA NA NA NA NA ...
## $ feel : num [1:77687] 48.2 47.2 45.6 42.9 43.5 ...
## $ metar : chr [1:77687] "MMMY 010016Z 00000KT 4SM SCT007 BKN012 OVC040 09/08 A3028 RMK 8/5// BR" "MMMY 010049Z 12003KT 3SM BR SCT003 BKN004 OVC009 09/08 A3028 RMK 8/5// -DZ OCNL" "MMMY 010116Z 12005KT 1SM BR SCT002 BKN003 OVC005 09/08 A3028 RMK 8/6// -DZ OCNL" "MMMY 010120Z 12006KT 1/4SM FG VV002 08/08 A3029 RMK 8//// BC FG MOV SE/NW" ...
## $ snowdepth : logi [1:77687] NA NA NA NA NA NA ...
summary(monterrey)
## station valid lon
## Length:77687 Min. :2014-01-01 00:16:00.00 Min. :-100.1
## Class :character 1st Qu.:2016-03-08 19:10:00.00 1st Qu.:-100.1
## Mode :character Median :2018-05-01 13:40:00.00 Median :-100.1
## Mean :2018-05-05 23:02:08.88 Mean :-100.1
## 3rd Qu.:2020-06-25 12:10:30.00 3rd Qu.:-100.1
## Max. :2022-09-08 23:40:00.00 Max. :-100.1
##
## lat tmpf dwpf relh
## Min. :25.78 Min. : 23.00 Min. :-5.80 Min. : 2.32
## 1st Qu.:25.78 1st Qu.: 64.40 1st Qu.:51.80 1st Qu.: 48.05
## Median :25.78 Median : 73.40 Median :62.60 Median : 69.14
## Mean :25.78 Mean : 72.47 Mean :57.94 Mean : 65.04
## 3rd Qu.:25.78 3rd Qu.: 80.60 3rd Qu.:68.00 3rd Qu.: 83.32
## Max. :25.78 Max. :111.20 Max. :86.00 Max. :163.20
## NA's :89 NA's :1686 NA's :1741
## drct sknt p01i alti mslp
## Min. : 0.0 Min. : 0.00 Min. :0 Min. : 0.04 Min. : 913.2
## 1st Qu.: 70.0 1st Qu.: 4.00 1st Qu.:0 1st Qu.:29.88 1st Qu.:1011.4
## Median :110.0 Median : 5.00 Median :0 Median :29.97 Median :1014.5
## Mean :130.9 Mean : 5.82 Mean :0 Mean :29.98 Mean :1015.3
## 3rd Qu.:160.0 3rd Qu.: 8.00 3rd Qu.:0 3rd Qu.:30.07 3rd Qu.:1018.4
## Max. :360.0 Max. :98.00 Max. :0 Max. :30.81 Max. :1103.4
## NA's :72 NA's :72 NA's :26 NA's :66688
## vsby gust skyc1 skyc2
## Min. : 0.000 Min. : 13.00 Length:77687 Length:77687
## 1st Qu.: 6.000 1st Qu.: 20.00 Class :character Class :character
## Median :10.000 Median : 24.00 Mode :character Mode :character
## Mean : 9.112 Mean : 24.65
## 3rd Qu.:12.000 3rd Qu.: 28.00
## Max. :40.000 Max. :210.00
## NA's :31 NA's :75232
## skyc3 skyc4 skyl1 skyl2
## Length:77687 Length:77687 Min. : 0 Min. : 0
## Class :character Class :character 1st Qu.: 1500 1st Qu.: 2000
## Mode :character Mode :character Median : 3000 Median : 6000
## Mean : 5390 Mean : 8012
## 3rd Qu.: 7000 3rd Qu.:10000
## Max. :37000 Max. :30000
## NA's :22959 NA's :51468
## skyl3 skyl4 wxcodes ice_accretion_1hr
## Min. : 400 Min. : 3000 Length:77687 Mode:logical
## 1st Qu.: 8000 1st Qu.:20000 Class :character NA's:77687
## Median :16000 Median :20000 Mode :character
## Mean :14777 Mean :20656
## 3rd Qu.:20000 3rd Qu.:25000
## Max. :30000 Max. :25000
## NA's :72956 NA's :77492
## ice_accretion_3hr ice_accretion_6hr peak_wind_gust peak_wind_drct
## Mode:logical Mode:logical Mode:logical Mode:logical
## NA's:77687 NA's:77687 NA's:77687 NA's:77687
##
##
##
##
##
## peak_wind_time feel metar snowdepth
## Mode:logical Min. : 9.11 Length:77687 Mode:logical
## NA's:77687 1st Qu.: 64.40 Class :character NA's:77687
## Median : 73.40 Mode :character
## Mean : 73.14
## 3rd Qu.: 83.29
## Max. :131.06
## NA's :1744
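# Weather observations from May to November 2020, the months covered by the sales data
este_mes <- subset(monterrey, valid >= as.POSIXct("2020-05-01 00:00") & valid <= as.POSIXct("2020-11-30 23:59"))
plot(este_mes$valid, este_mes$relh, xlab = "Date", ylab = "Relative humidity (%)")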
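# Daily means of the numeric columns. The dplyr:: prefixes matter: plyr, attached
# after dplyr, masks mutate() and summarise(), and with plyr's version `date`
# stayed a date-time, so the grouping was per observation instead of per day
promedio <- monterrey %>%
  dplyr::mutate(date = as.Date(valid)) %>%
  dplyr::group_by(date) %>%
  dplyr::summarise(dplyr::across(where(is.numeric), ~ mean(.x, na.rm = TRUE)))
promedio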
## # A tibble: 77,687 × 18
## date lon lat tmpf dwpf relh drct sknt p01i alti
## <dttm> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2014-01-01 00:16:00 -100. 25.8 48.2 46.4 93.4 0 0 0 30.3
## 2 2014-01-01 00:49:00 -100. 25.8 48.2 46.4 93.4 120 3 0 30.3
## 3 2014-01-01 01:16:00 -100. 25.8 48.2 46.4 93.4 120 5 0 30.3
## 4 2014-01-01 01:20:00 -100. 25.8 46.4 46.4 100 120 6 0 30.3
## 5 2014-01-01 01:27:00 -100. 25.8 46.4 46.4 100 110 5 0 30.3
## 6 2014-01-01 01:46:00 -100. 25.8 46.4 44.6 93.4 100 5 0 30.3
## 7 2014-01-01 02:40:00 -100. 25.8 46.4 44.6 93.4 110 4 0 30.3
## 8 2014-01-01 03:40:00 -100. 25.8 46.4 44.6 93.4 130 3 0 30.3
## 9 2014-01-01 04:40:00 -100. 25.8 46.4 44.6 93.4 60 3 0 30.3
## 10 2014-01-01 05:40:00 -100. 25.8 46.4 44.6 93.4 0 0 0 30.3
## # … with 77,677 more rows, and 8 more variables: mslp <dbl>, vsby <dbl>,
## # gust <dbl>, skyl1 <dbl>, skyl2 <dbl>, skyl3 <dbl>, skyl4 <dbl>, feel <dbl>
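The result above still has 77,687 rows and a date-time `date` column, one row per observation, so it was not collapsed to daily values. A base-R alternative for the daily averaging uses `aggregate()`, which does not depend on which package's `mutate()` is attached. A minimal sketch: `daily_mean_tmpf()` is a hypothetical helper that only averages `tmpf`, and it assumes the `valid` timestamps are in UTC (as we understand riem returns them), so the days are UTC days rather than local Monterrey days.

```r
# Hypothetical helper: daily mean temperature (degrees F) from METAR observations.
# obs needs a POSIXct column `valid` and a numeric column `tmpf`.
daily_mean_tmpf <- function(obs) {
  obs$date <- as.Date(obs$valid, tz = "UTC")      # calendar day (UTC) of each reading
  aggregate(tmpf ~ date, data = obs, FUN = mean)  # rows with NA tmpf are dropped
}

# Usage (not run here):
# diario <- daily_mean_tmpf(monterrey)
# diario$tmpc <- (diario$tmpf - 32) / 1.8
```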
centigrados <- promedio
centigrados$tmpc <- (centigrados$tmpf-32)/1.8
str(centigrados)
## tibble [77,687 × 19] (S3: tbl_df/tbl/data.frame)
## $ date : POSIXct[1:77687], format: "2014-01-01 00:16:00" "2014-01-01 00:49:00" ...
## $ lon : num [1:77687] -100 -100 -100 -100 -100 ...
## $ lat : num [1:77687] 25.8 25.8 25.8 25.8 25.8 ...
## $ tmpf : num [1:77687] 48.2 48.2 48.2 46.4 46.4 46.4 46.4 46.4 46.4 46.4 ...
## $ dwpf : num [1:77687] 46.4 46.4 46.4 46.4 46.4 44.6 44.6 44.6 44.6 44.6 ...
## $ relh : num [1:77687] 93.5 93.5 93.5 100 100 ...
## $ drct : num [1:77687] 0 120 120 120 110 100 110 130 60 0 ...
## $ sknt : num [1:77687] 0 3 5 6 5 5 4 3 3 0 ...
## $ p01i : num [1:77687] 0 0 0 0 0 0 0 0 0 0 ...
## $ alti : num [1:77687] 30.3 30.3 30.3 30.3 30.3 ...
## $ mslp : num [1:77687] NaN NaN NaN NaN NaN ...
## $ vsby : num [1:77687] 4 3 1 0.25 0.12 0.12 0.06 0.06 0.06 0.12 ...
## $ gust : num [1:77687] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
## $ skyl1: num [1:77687] 700 300 200 200 100 100 100 100 100 100 ...
## $ skyl2: num [1:77687] 1200 400 300 NaN NaN NaN NaN NaN NaN NaN ...
## $ skyl3: num [1:77687] 4000 900 500 NaN NaN NaN NaN NaN NaN NaN ...
## $ skyl4: num [1:77687] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
## $ feel : num [1:77687] 48.2 47.2 45.6 42.9 43.5 ...
## $ tmpc : num [1:77687] 9 9 9 8 8 8 8 8 8 8 ...
centigrados$feelc <-(centigrados$feel-32)/1.8
str(centigrados)
## tibble [77,687 × 20] (S3: tbl_df/tbl/data.frame)
## $ date : POSIXct[1:77687], format: "2014-01-01 00:16:00" "2014-01-01 00:49:00" ...
## $ lon : num [1:77687] -100 -100 -100 -100 -100 ...
## $ lat : num [1:77687] 25.8 25.8 25.8 25.8 25.8 ...
## $ tmpf : num [1:77687] 48.2 48.2 48.2 46.4 46.4 46.4 46.4 46.4 46.4 46.4 ...
## $ dwpf : num [1:77687] 46.4 46.4 46.4 46.4 46.4 44.6 44.6 44.6 44.6 44.6 ...
## $ relh : num [1:77687] 93.5 93.5 93.5 100 100 ...
## $ drct : num [1:77687] 0 120 120 120 110 100 110 130 60 0 ...
## $ sknt : num [1:77687] 0 3 5 6 5 5 4 3 3 0 ...
## $ p01i : num [1:77687] 0 0 0 0 0 0 0 0 0 0 ...
## $ alti : num [1:77687] 30.3 30.3 30.3 30.3 30.3 ...
## $ mslp : num [1:77687] NaN NaN NaN NaN NaN ...
## $ vsby : num [1:77687] 4 3 1 0.25 0.12 0.12 0.06 0.06 0.06 0.12 ...
## $ gust : num [1:77687] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
## $ skyl1: num [1:77687] 700 300 200 200 100 100 100 100 100 100 ...
## $ skyl2: num [1:77687] 1200 400 300 NaN NaN NaN NaN NaN NaN NaN ...
## $ skyl3: num [1:77687] 4000 900 500 NaN NaN NaN NaN NaN NaN NaN ...
## $ skyl4: num [1:77687] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
## $ feel : num [1:77687] 48.2 47.2 45.6 42.9 43.5 ...
## $ tmpc : num [1:77687] 9 9 9 8 8 8 8 8 8 8 ...
## $ feelc: num [1:77687] 9 8.45 7.57 6.04 6.39 ...
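Both conversions apply the standard formula °C = (°F - 32) / 1.8. A one-line helper (hypothetical name `f_to_c()`) avoids repeating it for each column. As a check, the first reading, 48.2 °F, gives (48.2 - 32) / 1.8 = 16.2 / 1.8 = 9 °C, which matches the first value of `tmpc` above.

```r
# Hypothetical helper: converts degrees Fahrenheit to degrees Celsius
f_to_c <- function(temp_f) (temp_f - 32) / 1.8

# Usage (not run here): convert every Fahrenheit column in one place
# centigrados$tmpc  <- f_to_c(centigrados$tmpf)
# centigrados$feelc <- f_to_c(centigrados$feel)
```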
# Keep May to November 2020, the months covered by the sales data
este_año <- centigrados[centigrados$date >= "2020-05-01" & centigrados$date <= "2020-11-30", ]
plot(este_año$date, este_año$tmpc, type = "l",
     main = "Mean temperature in Monterrey, May to November 2020",
     xlab = "Date", ylab = "°C")
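As the previous charts show, sales fell in November, when temperatures were lower. This supports the theory that weather affects our buying behavior, with two caveats: the sales data end on 2020-11-11, so November's total covers only 11 days, and a coincidence between lower sales and lower temperatures in a single month does not by itself show that temperature caused the drop.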
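To compare months on an equal footing, each month's sales can be divided by the number of days on which there was at least one sale, so that a partially covered month such as November 2020 is not penalized. A minimal sketch; `daily_sales_by_month()` is a hypothetical helper and assumes `Fecha` is a `Date` column, as it is in `bd_limpia`.

```r
# Hypothetical helper: total and average daily sales per month.
# df needs a Date column `Fecha` and a numeric column `SubTotal`.
daily_sales_by_month <- function(df) {
  month <- format(df$Fecha, "%Y-%m")                               # e.g. "2020-11"
  total <- tapply(df$SubTotal, month, sum)                         # sales per month
  days  <- tapply(df$Fecha, month, function(d) length(unique(d)))  # days with >= 1 sale
  data.frame(
    month           = names(total),
    total_sales     = as.numeric(total),
    days_with_sales = as.integer(days),
    avg_daily_sales = as.numeric(total) / as.numeric(days)
  )
}

# Usage (not run here):
# daily_sales_by_month(filter(bd_limpia, Estado == "Nuevo León"))
```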