Observations

The following changes were made to the database before importing it: the date column was set to short-date format; the first 5 records were duplicated; the time column was set to the Spanish (Mexico) time format; the barcode column was formatted so that the full code is displayed; and the file was saved as CSV UTF-8 (comma delimited).

Import the database

bd <- read.csv("/Users/ivannagarza/Desktop/TEC/7 SEMESTRE/MODULO3/Abarrotes_Ventas-3.csv")
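As a hedged alternative for the import, assuming the goal is to keep barcodes complete: `read.csv` parses `Codigo.Barras` as a number by default, so it prints in scientific notation. Reading that column as text avoids this. The small CSV below is made-up data written to a temporary file, not the real data set.

```r
# Made-up two-row file standing in for the real CSV (illustrative only).
tmp <- tempfile(fileext = ".csv")
writeLines(c("vcClaveTienda,Codigo.Barras,Precio",
             "MX001,7501021000012,16",
             "MX001,0075010320005,14"), tmp)

# A named colClasses entry forces Codigo.Barras to stay text,
# so every digit (and any leading zero) survives the import.
demo <- read.csv(tmp, colClasses = c(Codigo.Barras = "character"))
str(demo)
```

The remaining columns are still type-converted automatically, because only the named column is overridden.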

Understand the database

Step I.

View the summary of the database to identify which variables are numeric and which are text.

resumen <- summary(bd)
resumen
##  vcClaveTienda        DescGiro         Codigo.Barras            PLU        
##  Length:200625      Length:200625      Min.   :8.347e+05   Min.   : 1.00   
##  Class :character   Class :character   1st Qu.:7.501e+12   1st Qu.: 1.00   
##  Mode  :character   Mode  :character   Median :7.501e+12   Median : 1.00   
##                                        Mean   :5.950e+12   Mean   : 2.11   
##                                        3rd Qu.:7.501e+12   3rd Qu.: 1.00   
##                                        Max.   :1.750e+13   Max.   :30.00   
##                                                            NA's   :199188  
##     Fecha               Hora              Marca            Fabricante       
##  Length:200625      Length:200625      Length:200625      Length:200625     
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##    Producto             Precio          Ult.Costo         Unidades     
##  Length:200625      Min.   :-147.00   Min.   :  0.38   Min.   : 0.200  
##  Class :character   1st Qu.:  11.00   1st Qu.:  8.46   1st Qu.: 1.000  
##  Mode  :character   Median :  16.00   Median : 12.31   Median : 1.000  
##                     Mean   :  19.42   Mean   : 15.31   Mean   : 1.262  
##                     3rd Qu.:  25.00   3rd Qu.: 19.23   3rd Qu.: 1.000  
##                     Max.   :1000.00   Max.   :769.23   Max.   :96.000  
##                                                                        
##     F.Ticket      NombreDepartamento NombreFamilia      NombreCategoria   
##  Min.   :     1   Length:200625      Length:200625      Length:200625     
##  1st Qu.: 33964   Class :character   Class :character   Class :character  
##  Median :105993   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :193990                                                           
##  3rd Qu.:383005                                                           
##  Max.   :450040                                                           
##                                                                           
##     Estado              Mts.2      Tipo.ubicación         Giro          
##  Length:200625      Min.   :47.0   Length:200625      Length:200625     
##  Class :character   1st Qu.:53.0   Class :character   Class :character  
##  Mode  :character   Median :60.0   Mode  :character   Mode  :character  
##                     Mean   :56.6                                        
##                     3rd Qu.:60.0                                        
##                     Max.   :62.0                                        
##                                                                         
##  Hora.inicio        Hora.cierre       
##  Length:200625      Length:200625     
##  Class :character   Class :character  
##  Mode  :character   Mode  :character  
##                                       
##                                       
##                                       
## 
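The split between numeric and text variables can also be read off programmatically instead of by eye. This is a minimal sketch on a toy data frame with hypothetical values; `demo`, `tipos`, `numericas` and `texto` are illustrative names, not objects from the analysis.

```r
# Toy data frame mirroring a few columns of bd (values are made up).
demo <- data.frame(Marca = c("BIMBO", "PEPSI"),
                   Precio = c(5, 8),
                   F.Ticket = c(3L, 3L),
                   stringsAsFactors = FALSE)

# One class per column, then split the column names into two groups.
tipos <- sapply(demo, class)
numericas <- names(tipos)[tipos %in% c("numeric", "integer")]
texto <- names(tipos)[tipos == "character"]
numericas
texto
```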

Step II.

Count how often each distinct value appears in the text variables, after installing and loading the required packages.

# install.packages("dplyr")
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

count(bd, vcClaveTienda, sort = TRUE)
count(bd, DescGiro, sort = TRUE)
count(bd, Marca, sort = TRUE)
count(bd, Fabricante, sort = TRUE)
count(bd, NombreDepartamento, sort = TRUE)
count(bd, NombreFamilia, sort = TRUE)
count(bd, NombreCategoria, sort = TRUE)
count(bd, Estado, sort = TRUE)
count(bd, Mts.2, sort = TRUE)
count(bd, Tipo.ubicación, sort = TRUE)
count(bd, Giro, sort = TRUE)
count(bd, Hora.inicio, sort = TRUE)
count(bd, Hora.cierre, sort = TRUE)
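The repeated calls above could be collapsed into one pass over every text column. The sketch below uses base R on a toy data frame with made-up values; `demo`, `cols_texto` and `conteos` are hypothetical names.

```r
# Toy data frame (values are made up for illustration).
demo <- data.frame(vcClaveTienda = c("MX001", "MX001", "MX005"),
                   Marca = c("BIMBO", "PEPSI", "BIMBO"),
                   Precio = c(5, 8, 5),
                   stringsAsFactors = FALSE)

# Pick the character columns, then build one sorted frequency table per column.
cols_texto <- names(demo)[sapply(demo, is.character)]
conteos <- lapply(demo[cols_texto], function(x) sort(table(x), decreasing = TRUE))
conteos$Marca
```

Each element of `conteos` plays the role of one `count(..., sort = TRUE)` call, with the most frequent value first.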

Step III.

Install and load the packages needed to keep exploring the database, using functions that show the first rows, the last rows, and a chosen number of rows.

# install.packages("tidyverse")
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.8     ✔ stringr 1.4.1
## ✔ tidyr   1.2.0     ✔ forcats 0.5.2
## ✔ readr   2.1.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
# tibble(bd)

str(bd)
## 'data.frame':    200625 obs. of  22 variables:
##  $ vcClaveTienda     : chr  "MX001" "MX001" "MX001" "MX001" ...
##  $ DescGiro          : chr  "Abarrotes" "Abarrotes" "Abarrotes" "Abarrotes" ...
##  $ Codigo.Barras     : num  7.5e+12 7.5e+12 7.5e+12 7.5e+12 7.5e+12 ...
##  $ PLU               : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ Fecha             : chr  "6/19/2020" "6/19/2020" "6/19/2020" "6/19/2020" ...
##  $ Hora              : chr  "08:16:21 a. m." "08:23:33 a. m." "08:24:33 a. m." "08:24:33 a. m." ...
##  $ Marca             : chr  "NUTRI LECHE" "DAN UP" "BIMBO" "PEPSI" ...
##  $ Fabricante        : chr  "MEXILAC" "DANONE DE MEXICO" "GRUPO BIMBO" "PEPSI-COLA MEXICANA" ...
##  $ Producto          : chr  "Nutri Leche 1 Litro" "DANUP STRAWBERRY P/BEBER 350GR NAL" "Rebanadas Bimbo 2Pz" "Pepsi N.R. 400Ml" ...
##  $ Precio            : num  16 14 5 8 19.5 16 14 5 8 19.5 ...
##  $ Ult.Costo         : num  12.3 14 5 8 15 ...
##  $ Unidades          : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ F.Ticket          : int  1 2 3 3 4 1 2 3 3 4 ...
##  $ NombreDepartamento: chr  "Abarrotes" "Abarrotes" "Abarrotes" "Abarrotes" ...
##  $ NombreFamilia     : chr  "Lacteos y Refrigerados" "Lacteos y Refrigerados" "Pan y Tortilla" "Bebidas" ...
##  $ NombreCategoria   : chr  "Leche" "Yogurt" "Pan Dulce Empaquetado" "Refrescos Plástico (N.R.)" ...
##  $ Estado            : chr  "Nuevo León" "Nuevo León" "Nuevo León" "Nuevo León" ...
##  $ Mts.2             : int  60 60 60 60 60 60 60 60 60 60 ...
##  $ Tipo.ubicación    : chr  "Esquina" "Esquina" "Esquina" "Esquina" ...
##  $ Giro              : chr  "Abarrotes" "Abarrotes" "Abarrotes" "Abarrotes" ...
##  $ Hora.inicio       : chr  "8:00" "8:00" "8:00" "8:00" ...
##  $ Hora.cierre       : chr  "22:00" "22:00" "22:00" "22:00" ...
head(bd)
##   vcClaveTienda  DescGiro Codigo.Barras PLU     Fecha           Hora
## 1         MX001 Abarrotes  7.501021e+12  NA 6/19/2020 08:16:21 a. m.
## 2         MX001 Abarrotes  7.501032e+12  NA 6/19/2020 08:23:33 a. m.
## 3         MX001 Abarrotes  7.501000e+12  NA 6/19/2020 08:24:33 a. m.
## 4         MX001 Abarrotes  7.501031e+12  NA 6/19/2020 08:24:33 a. m.
## 5         MX001 Abarrotes  7.501026e+12  NA 6/19/2020 08:26:28 a. m.
## 6         MX001 Abarrotes  7.501021e+12  NA 6/19/2020 08:16:21 a. m.
##                        Marca                 Fabricante
## 1                NUTRI LECHE                    MEXILAC
## 2                     DAN UP           DANONE DE MEXICO
## 3                      BIMBO                GRUPO BIMBO
## 4                      PEPSI        PEPSI-COLA MEXICANA
## 5 BLANCA NIEVES (DETERGENTE) FABRICA DE JABON LA CORONA
## 6                NUTRI LECHE                    MEXILAC
##                             Producto Precio Ult.Costo Unidades F.Ticket
## 1                Nutri Leche 1 Litro   16.0     12.31        1        1
## 2 DANUP STRAWBERRY P/BEBER 350GR NAL   14.0     14.00        1        2
## 3                Rebanadas Bimbo 2Pz    5.0      5.00        1        3
## 4                   Pepsi N.R. 400Ml    8.0      8.00        1        3
## 5      Detergente Blanca Nieves 500G   19.5     15.00        1        4
## 6                Nutri Leche 1 Litro   16.0     12.31        1        1
##   NombreDepartamento          NombreFamilia           NombreCategoria
## 1          Abarrotes Lacteos y Refrigerados                     Leche
## 2          Abarrotes Lacteos y Refrigerados                    Yogurt
## 3          Abarrotes         Pan y Tortilla     Pan Dulce Empaquetado
## 4          Abarrotes                Bebidas Refrescos Plástico (N.R.)
## 5          Abarrotes     Limpieza del Hogar                Lavandería
## 6          Abarrotes Lacteos y Refrigerados                     Leche
##       Estado Mts.2 Tipo.ubicación      Giro Hora.inicio Hora.cierre
## 1 Nuevo León    60        Esquina Abarrotes        8:00       22:00
## 2 Nuevo León    60        Esquina Abarrotes        8:00       22:00
## 3 Nuevo León    60        Esquina Abarrotes        8:00       22:00
## 4 Nuevo León    60        Esquina Abarrotes        8:00       22:00
## 5 Nuevo León    60        Esquina Abarrotes        8:00       22:00
## 6 Nuevo León    60        Esquina Abarrotes        8:00       22:00
head(bd, n = 7)
##   vcClaveTienda  DescGiro Codigo.Barras PLU     Fecha           Hora
## 1         MX001 Abarrotes  7.501021e+12  NA 6/19/2020 08:16:21 a. m.
## 2         MX001 Abarrotes  7.501032e+12  NA 6/19/2020 08:23:33 a. m.
## 3         MX001 Abarrotes  7.501000e+12  NA 6/19/2020 08:24:33 a. m.
## 4         MX001 Abarrotes  7.501031e+12  NA 6/19/2020 08:24:33 a. m.
## 5         MX001 Abarrotes  7.501026e+12  NA 6/19/2020 08:26:28 a. m.
## 6         MX001 Abarrotes  7.501021e+12  NA 6/19/2020 08:16:21 a. m.
## 7         MX001 Abarrotes  7.501032e+12  NA 6/19/2020 08:23:33 a. m.
##                        Marca                 Fabricante
## 1                NUTRI LECHE                    MEXILAC
## 2                     DAN UP           DANONE DE MEXICO
## 3                      BIMBO                GRUPO BIMBO
## 4                      PEPSI        PEPSI-COLA MEXICANA
## 5 BLANCA NIEVES (DETERGENTE) FABRICA DE JABON LA CORONA
## 6                NUTRI LECHE                    MEXILAC
## 7                     DAN UP           DANONE DE MEXICO
##                             Producto Precio Ult.Costo Unidades F.Ticket
## 1                Nutri Leche 1 Litro   16.0     12.31        1        1
## 2 DANUP STRAWBERRY P/BEBER 350GR NAL   14.0     14.00        1        2
## 3                Rebanadas Bimbo 2Pz    5.0      5.00        1        3
## 4                   Pepsi N.R. 400Ml    8.0      8.00        1        3
## 5      Detergente Blanca Nieves 500G   19.5     15.00        1        4
## 6                Nutri Leche 1 Litro   16.0     12.31        1        1
## 7 DANUP STRAWBERRY P/BEBER 350GR NAL   14.0     14.00        1        2
##   NombreDepartamento          NombreFamilia           NombreCategoria
## 1          Abarrotes Lacteos y Refrigerados                     Leche
## 2          Abarrotes Lacteos y Refrigerados                    Yogurt
## 3          Abarrotes         Pan y Tortilla     Pan Dulce Empaquetado
## 4          Abarrotes                Bebidas Refrescos Plástico (N.R.)
## 5          Abarrotes     Limpieza del Hogar                Lavandería
## 6          Abarrotes Lacteos y Refrigerados                     Leche
## 7          Abarrotes Lacteos y Refrigerados                    Yogurt
##       Estado Mts.2 Tipo.ubicación      Giro Hora.inicio Hora.cierre
## 1 Nuevo León    60        Esquina Abarrotes        8:00       22:00
## 2 Nuevo León    60        Esquina Abarrotes        8:00       22:00
## 3 Nuevo León    60        Esquina Abarrotes        8:00       22:00
## 4 Nuevo León    60        Esquina Abarrotes        8:00       22:00
## 5 Nuevo León    60        Esquina Abarrotes        8:00       22:00
## 6 Nuevo León    60        Esquina Abarrotes        8:00       22:00
## 7 Nuevo León    60        Esquina Abarrotes        8:00       22:00
tail(bd)
##        vcClaveTienda DescGiro Codigo.Barras PLU      Fecha           Hora
## 200620         MX005 Depósito   7.62221e+12  NA  7/12/2020 01:08:25 a. m.
## 200621         MX005 Depósito   7.62221e+12  NA 10/23/2020 10:17:37 p. m.
## 200622         MX005 Depósito   7.62221e+12  NA 10/10/2020 08:30:20 p. m.
## 200623         MX005 Depósito   7.62221e+12  NA 10/10/2020 10:40:43 p. m.
## 200624         MX005 Depósito   7.62221e+12  NA  6/27/2020 10:30:19 p. m.
## 200625         MX005 Depósito   7.62221e+12  NA  6/26/2020 11:43:34 p. m.
##                    Marca    Fabricante                          Producto Precio
## 200620 TRIDENT XTRA CARE CADBURY ADAMS Trident Xtracare Freshmint 16.32G      9
## 200621 TRIDENT XTRA CARE CADBURY ADAMS Trident Xtracare Freshmint 16.32G      9
## 200622 TRIDENT XTRA CARE CADBURY ADAMS Trident Xtracare Freshmint 16.32G      9
## 200623 TRIDENT XTRA CARE CADBURY ADAMS Trident Xtracare Freshmint 16.32G      9
## 200624 TRIDENT XTRA CARE CADBURY ADAMS Trident Xtracare Freshmint 16.32G      9
## 200625 TRIDENT XTRA CARE CADBURY ADAMS Trident Xtracare Freshmint 16.32G      9
##        Ult.Costo Unidades F.Ticket NombreDepartamento NombreFamilia
## 200620      6.92        1   103100          Abarrotes      Dulcería
## 200621      6.92        1   116598          Abarrotes      Dulcería
## 200622      6.92        1   114886          Abarrotes      Dulcería
## 200623      6.92        1   114955          Abarrotes      Dulcería
## 200624      6.92        1   101121          Abarrotes      Dulcería
## 200625      6.92        1   100879          Abarrotes      Dulcería
##        NombreCategoria       Estado Mts.2 Tipo.ubicación       Giro Hora.inicio
## 200620 Gomas de Mazcar Quintana Roo    58        Esquina Mini súper        8:00
## 200621 Gomas de Mazcar Quintana Roo    58        Esquina Mini súper        8:00
## 200622 Gomas de Mazcar Quintana Roo    58        Esquina Mini súper        8:00
## 200623 Gomas de Mazcar Quintana Roo    58        Esquina Mini súper        8:00
## 200624 Gomas de Mazcar Quintana Roo    58        Esquina Mini súper        8:00
## 200625 Gomas de Mazcar Quintana Roo    58        Esquina Mini súper        8:00
##        Hora.cierre
## 200620       21:00
## 200621       21:00
## 200622       21:00
## 200623       21:00
## 200624       21:00
## 200625       21:00
# install.packages("janitor")
library(janitor)
## 
## Attaching package: 'janitor'
## 
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
# tabyl(bd, vcClaveTienda, NombreDepartamento)

OBSERVATIONS

1. Almost no record has a PLU.
2. The date format needs to be changed.
3. The time format needs to be changed.
4. There are negative prices.
5. Some records have fewer than 1 unit.
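Observations of this kind can be quantified with a few one-line checks. In the summary of `bd`, 199188 of the 200625 rows have no PLU, which is about 99%. The sketch below reproduces the same checks on a toy data frame with made-up values; the variable names are hypothetical.

```r
# Toy data frame (values are made up for illustration).
demo <- data.frame(PLU = c(NA, NA, 1L, NA),
                   Precio = c(16, -14, 5, 8),
                   Unidades = c(1, 0.2, 1, 2))

prop_sin_plu <- mean(is.na(demo$PLU))   # share of rows without a PLU
n_precio_neg <- sum(demo$Precio < 0)    # rows with a negative price
n_unid_frac  <- sum(demo$Unidades < 1)  # rows with fewer than one unit
c(prop_sin_plu, n_precio_neg, n_unid_frac)
```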

DATA CLEANING TECHNIQUES

TECHNIQUE 1.

Remove irrelevant values

Drop columns

bd1 <- bd
bd1 <- subset(bd1, select = -c(PLU, Codigo.Barras))

Drop rows

bd2 <- bd1
bd2 <- bd2[bd2$Precio > 0, ]
summary(bd1)
##  vcClaveTienda        DescGiro            Fecha               Hora          
##  Length:200625      Length:200625      Length:200625      Length:200625     
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##     Marca            Fabricante          Producto             Precio       
##  Length:200625      Length:200625      Length:200625      Min.   :-147.00  
##  Class :character   Class :character   Class :character   1st Qu.:  11.00  
##  Mode  :character   Mode  :character   Mode  :character   Median :  16.00  
##                                                           Mean   :  19.42  
##                                                           3rd Qu.:  25.00  
##                                                           Max.   :1000.00  
##    Ult.Costo         Unidades         F.Ticket      NombreDepartamento
##  Min.   :  0.38   Min.   : 0.200   Min.   :     1   Length:200625     
##  1st Qu.:  8.46   1st Qu.: 1.000   1st Qu.: 33964   Class :character  
##  Median : 12.31   Median : 1.000   Median :105993   Mode  :character  
##  Mean   : 15.31   Mean   : 1.262   Mean   :193990                     
##  3rd Qu.: 19.23   3rd Qu.: 1.000   3rd Qu.:383005                     
##  Max.   :769.23   Max.   :96.000   Max.   :450040                     
##  NombreFamilia      NombreCategoria       Estado              Mts.2     
##  Length:200625      Length:200625      Length:200625      Min.   :47.0  
##  Class :character   Class :character   Class :character   1st Qu.:53.0  
##  Mode  :character   Mode  :character   Mode  :character   Median :60.0  
##                                                           Mean   :56.6  
##                                                           3rd Qu.:60.0  
##                                                           Max.   :62.0  
##  Tipo.ubicación         Giro           Hora.inicio        Hora.cierre       
##  Length:200625      Length:200625      Length:200625      Length:200625     
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
## 
summary (bd2)
##  vcClaveTienda        DescGiro            Fecha               Hora          
##  Length:200478      Length:200478      Length:200478      Length:200478     
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##     Marca            Fabricante          Producto             Precio       
##  Length:200478      Length:200478      Length:200478      Min.   :   0.50  
##  Class :character   Class :character   Class :character   1st Qu.:  11.00  
##  Mode  :character   Mode  :character   Mode  :character   Median :  16.00  
##                                                           Mean   :  19.45  
##                                                           3rd Qu.:  25.00  
##                                                           Max.   :1000.00  
##    Ult.Costo         Unidades         F.Ticket      NombreDepartamento
##  Min.   :  0.38   Min.   : 0.200   Min.   :     1   Length:200478     
##  1st Qu.:  8.46   1st Qu.: 1.000   1st Qu.: 33977   Class :character  
##  Median : 12.31   Median : 1.000   Median :106034   Mode  :character  
##  Mean   : 15.31   Mean   : 1.261   Mean   :194096                     
##  3rd Qu.: 19.23   3rd Qu.: 1.000   3rd Qu.:383062                     
##  Max.   :769.23   Max.   :96.000   Max.   :450040                     
##  NombreFamilia      NombreCategoria       Estado              Mts.2     
##  Length:200478      Length:200478      Length:200478      Min.   :47.0  
##  Class :character   Class :character   Class :character   1st Qu.:53.0  
##  Mode  :character   Mode  :character   Mode  :character   Median :60.0  
##                                                           Mean   :56.6  
##                                                           3rd Qu.:60.0  
##                                                           Max.   :62.0  
##  Tipo.ubicación         Giro           Hora.inicio        Hora.cierre       
##  Length:200478      Length:200478      Length:200478      Length:200478     
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
## 

We will not use this approach; instead, negative prices will be converted to their absolute value.
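The difference between the two options can be seen on a small made-up vector: filtering drops the affected rows (200478 of the 200625 rows remain in `bd2`), while the absolute value keeps every row and only flips the sign. Taking the absolute value assumes the minus sign is a data-entry error rather than, say, a refund; that assumption is not verified here.

```r
# Made-up prices with one negative entry.
precio <- c(16, -14, 5)

filtrado <- precio[precio > 0]  # drops the negative entry
absoluto <- abs(precio)         # keeps the entry, flips its sign
length(filtrado)
absoluto
```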

TECHNIQUE 2.

Remove duplicate values

bd1[duplicated(bd1),]
##    vcClaveTienda  DescGiro     Fecha           Hora                      Marca
## 6          MX001 Abarrotes 6/19/2020 08:16:21 a. m.                NUTRI LECHE
## 7          MX001 Abarrotes 6/19/2020 08:23:33 a. m.                     DAN UP
## 8          MX001 Abarrotes 6/19/2020 08:24:33 a. m.                      BIMBO
## 9          MX001 Abarrotes 6/19/2020 08:24:33 a. m.                      PEPSI
## 10         MX001 Abarrotes 6/19/2020 08:26:28 a. m. BLANCA NIEVES (DETERGENTE)
##                    Fabricante                           Producto Precio
## 6                     MEXILAC                Nutri Leche 1 Litro   16.0
## 7            DANONE DE MEXICO DANUP STRAWBERRY P/BEBER 350GR NAL   14.0
## 8                 GRUPO BIMBO                Rebanadas Bimbo 2Pz    5.0
## 9         PEPSI-COLA MEXICANA                   Pepsi N.R. 400Ml    8.0
## 10 FABRICA DE JABON LA CORONA      Detergente Blanca Nieves 500G   19.5
##    Ult.Costo Unidades F.Ticket NombreDepartamento          NombreFamilia
## 6      12.31        1        1          Abarrotes Lacteos y Refrigerados
## 7      14.00        1        2          Abarrotes Lacteos y Refrigerados
## 8       5.00        1        3          Abarrotes         Pan y Tortilla
## 9       8.00        1        3          Abarrotes                Bebidas
## 10     15.00        1        4          Abarrotes     Limpieza del Hogar
##              NombreCategoria     Estado Mts.2 Tipo.ubicación      Giro
## 6                      Leche Nuevo León    60        Esquina Abarrotes
## 7                     Yogurt Nuevo León    60        Esquina Abarrotes
## 8      Pan Dulce Empaquetado Nuevo León    60        Esquina Abarrotes
## 9  Refrescos Plástico (N.R.) Nuevo León    60        Esquina Abarrotes
## 10                Lavandería Nuevo León    60        Esquina Abarrotes
##    Hora.inicio Hora.cierre
## 6         8:00       22:00
## 7         8:00       22:00
## 8         8:00       22:00
## 9         8:00       22:00
## 10        8:00       22:00
sum(duplicated(bd1))  
## [1] 5

Remove duplicate rows

bd3 <- bd1
library (dplyr)
bd3 <- distinct(bd3)
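A quick sanity check after deduplicating is that the number of rows removed equals the number of duplicates found beforehand (here 5, taking `bd1` from 200625 rows to 200620). The sketch below shows the check on a toy data frame with made-up rows, using base R's `duplicated()` as a stand-in for `distinct()`.

```r
# Toy data frame where the third row repeats the first (made-up values).
demo <- data.frame(F.Ticket = c(1L, 2L, 1L),
                   Producto = c("Nutri Leche 1 Litro", "Pepsi N.R. 400Ml",
                                "Nutri Leche 1 Litro"),
                   stringsAsFactors = FALSE)

n_dup <- sum(duplicated(demo))         # duplicates found before cleaning
sin_dup <- demo[!duplicated(demo), ]   # keep the first copy of each row
nrow(demo) - nrow(sin_dup) == n_dup    # rows removed should match n_dup
```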

TECHNIQUE 3.

Typographical and similar errors

Prices as absolute values

bd4 <- bd3
bd4$Precio <- abs(bd4$Precio)
summary (bd4)
##  vcClaveTienda        DescGiro            Fecha               Hora          
##  Length:200620      Length:200620      Length:200620      Length:200620     
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##     Marca            Fabricante          Producto             Precio       
##  Length:200620      Length:200620      Length:200620      Min.   :   0.50  
##  Class :character   Class :character   Class :character   1st Qu.:  11.00  
##  Mode  :character   Mode  :character   Mode  :character   Median :  16.00  
##                                                           Mean   :  19.45  
##                                                           3rd Qu.:  25.00  
##                                                           Max.   :1000.00  
##    Ult.Costo         Unidades         F.Ticket      NombreDepartamento
##  Min.   :  0.38   Min.   : 0.200   Min.   :     1   Length:200620     
##  1st Qu.:  8.46   1st Qu.: 1.000   1st Qu.: 33967   Class :character  
##  Median : 12.31   Median : 1.000   Median :105996   Mode  :character  
##  Mean   : 15.31   Mean   : 1.262   Mean   :193994                     
##  3rd Qu.: 19.23   3rd Qu.: 1.000   3rd Qu.:383008                     
##  Max.   :769.23   Max.   :96.000   Max.   :450040                     
##  NombreFamilia      NombreCategoria       Estado              Mts.2     
##  Length:200620      Length:200620      Length:200620      Min.   :47.0  
##  Class :character   Class :character   Class :character   1st Qu.:53.0  
##  Mode  :character   Mode  :character   Mode  :character   Median :60.0  
##                                                           Mean   :56.6  
##                                                           3rd Qu.:60.0  
##                                                           Max.   :62.0  
##  Tipo.ubicación         Giro           Hora.inicio        Hora.cierre       
##  Length:200620      Length:200620      Length:200620      Length:200620     
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
## 

Quantities as integers

bd5 <- bd4
bd5$Unidades <- ceiling (bd5$Unidades)
summary (bd5)    
##  vcClaveTienda        DescGiro            Fecha               Hora          
##  Length:200620      Length:200620      Length:200620      Length:200620     
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##     Marca            Fabricante          Producto             Precio       
##  Length:200620      Length:200620      Length:200620      Min.   :   0.50  
##  Class :character   Class :character   Class :character   1st Qu.:  11.00  
##  Mode  :character   Mode  :character   Mode  :character   Median :  16.00  
##                                                           Mean   :  19.45  
##                                                           3rd Qu.:  25.00  
##                                                           Max.   :1000.00  
##    Ult.Costo         Unidades         F.Ticket      NombreDepartamento
##  Min.   :  0.38   Min.   : 1.000   Min.   :     1   Length:200620     
##  1st Qu.:  8.46   1st Qu.: 1.000   1st Qu.: 33967   Class :character  
##  Median : 12.31   Median : 1.000   Median :105996   Mode  :character  
##  Mean   : 15.31   Mean   : 1.262   Mean   :193994                     
##  3rd Qu.: 19.23   3rd Qu.: 1.000   3rd Qu.:383008                     
##  Max.   :769.23   Max.   :96.000   Max.   :450040                     
##  NombreFamilia      NombreCategoria       Estado              Mts.2     
##  Length:200620      Length:200620      Length:200620      Min.   :47.0  
##  Class :character   Class :character   Class :character   1st Qu.:53.0  
##  Mode  :character   Mode  :character   Mode  :character   Median :60.0  
##                                                           Mean   :56.6  
##                                                           3rd Qu.:60.0  
##                                                           Max.   :62.0  
##  Tipo.ubicación         Giro           Hora.inicio        Hora.cierre       
##  Length:200620      Length:200620      Length:200620      Length:200620     
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
## 
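`ceiling()` rounds up to the next whole number but keeps the values stored as doubles. If an integer column is wanted, the result can additionally be passed through `as.integer()`. This is a minimal sketch on made-up quantities; note that rounding 0.2 up to 1 assumes fractional units are entry errors rather than products sold by weight.

```r
# Made-up quantities, including a fractional one.
unidades <- c(0.2, 1, 1.5, 96)

# Round up, then store the result with integer type.
redondeadas <- as.integer(ceiling(unidades))
redondeadas
class(redondeadas)
```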

TECHNIQUE 4.

Convert data types

Convert from character to date

bd6 <- bd5
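# Caution: Fecha is stored as month/day/year (e.g. "6/19/2020"), so "%d/%m/%Y"
# yields NA whenever the second field exceeds 12; "%m/%d/%Y" matches the data.
bd6$Fecha <- as.Date(bd6$Fecha, format = "%d/%m/%Y")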
tibble(bd6)
## # A tibble: 200,620 × 20
##    vcCla…¹ DescG…² Fecha  Hora      Marca Fabri…³ Produ…⁴ Precio Ult.C…⁵ Unida…⁶
##    <chr>   <chr>   <date> <chr>     <chr> <chr>   <chr>    <dbl>   <dbl>   <dbl>
##  1 MX001   Abarro… NA     08:16:21… NUTR… MEXILAC Nutri …   16     12.3        1
##  2 MX001   Abarro… NA     08:23:33… DAN … DANONE… DANUP …   14     14          1
##  3 MX001   Abarro… NA     08:24:33… BIMBO GRUPO … Rebana…    5      5          1
##  4 MX001   Abarro… NA     08:24:33… PEPSI PEPSI-… Pepsi …    8      8          1
##  5 MX001   Abarro… NA     08:26:28… BLAN… FABRIC… Deterg…   19.5   15          1
##  6 MX001   Abarro… NA     08:26:28… FLASH ALEN    Flash …    9.5    7.31       1
##  7 MX001   Abarro… NA     08:26:28… VARI… DANONE… Danone…   11     11          1
##  8 MX001   Abarro… NA     08:26:28… ZOTE  FABRIC… Jabon …    9.5    7.31       1
##  9 MX001   Abarro… NA     08:26:28… ALWA… PROCTE… T Feme…   23.5   18.1        1
## 10 MX001   Abarro… NA     03:24:02… JUMEX JUMEX   Jugo D…   12     12          1
## # … with 200,610 more rows, 10 more variables: F.Ticket <int>,
## #   NombreDepartamento <chr>, NombreFamilia <chr>, NombreCategoria <chr>,
## #   Estado <chr>, Mts.2 <int>, Tipo.ubicación <chr>, Giro <chr>,
## #   Hora.inicio <chr>, Hora.cierre <chr>, and abbreviated variable names
## #   ¹​vcClaveTienda, ²​DescGiro, ³​Fabricante, ⁴​Producto, ⁵​Ult.Costo, ⁶​Unidades
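The `NA` values in `Fecha` come from a mismatch between the format string and the data: the dates are written month/day/year (for example "6/19/2020"), but `"%d/%m/%Y"` reads them as day/month/year. The sketch below, on three dates taken from the `head()` and `tail()` output, shows both the failure and a format that matches; `fechas`, `mal` and `corregidas` are illustrative names.

```r
# Three dates as they appear in the raw data (month/day/year).
fechas <- c("6/19/2020", "7/12/2020", "10/23/2020")

# Day/month/year: 19 and 23 are not valid months, so those entries become NA,
# and "7/12/2020" is silently read as 7 December instead of 12 July.
mal <- as.Date(fechas, format = "%d/%m/%Y")
mal

# Month/day/year matches how the dates are actually written.
corregidas <- as.Date(fechas, format = "%m/%d/%Y")
corregidas
```

The silent swap in the second entry is the more dangerous case, since it produces a valid but wrong date rather than an `NA`.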

Convert from character to integer

bd7 <- bd6 
bd7$Hora <- substr(bd7$Hora, start = 1, stop = 2)
tibble (bd7)
## # A tibble: 200,620 × 20
##    vcCla…¹ DescG…² Fecha  Hora  Marca     Fabri…³ Produ…⁴ Precio Ult.C…⁵ Unida…⁶
##    <chr>   <chr>   <date> <chr> <chr>     <chr>   <chr>    <dbl>   <dbl>   <dbl>
##  1 MX001   Abarro… NA     08    NUTRI LE… MEXILAC Nutri …   16     12.3        1
##  2 MX001   Abarro… NA     08    DAN UP    DANONE… DANUP …   14     14          1
##  3 MX001   Abarro… NA     08    BIMBO     GRUPO … Rebana…    5      5          1
##  4 MX001   Abarro… NA     08    PEPSI     PEPSI-… Pepsi …    8      8          1
##  5 MX001   Abarro… NA     08    BLANCA N… FABRIC… Deterg…   19.5   15          1
##  6 MX001   Abarro… NA     08    FLASH     ALEN    Flash …    9.5    7.31       1
##  7 MX001   Abarro… NA     08    VARIOS D… DANONE… Danone…   11     11          1
##  8 MX001   Abarro… NA     08    ZOTE      FABRIC… Jabon …    9.5    7.31       1
##  9 MX001   Abarro… NA     08    ALWAYS    PROCTE… T Feme…   23.5   18.1        1
## 10 MX001   Abarro… NA     03    JUMEX     JUMEX   Jugo D…   12     12          1
## # … with 200,610 more rows, 10 more variables: F.Ticket <int>,
## #   NombreDepartamento <chr>, NombreFamilia <chr>, NombreCategoria <chr>,
## #   Estado <chr>, Mts.2 <int>, Tipo.ubicación <chr>, Giro <chr>,
## #   Hora.inicio <chr>, Hora.cierre <chr>, and abbreviated variable names
## #   ¹​vcClaveTienda, ²​DescGiro, ³​Fabricante, ⁴​Producto, ⁵​Ult.Costo, ⁶​Unidades
bd7$Hora <- as.integer(bd7$Hora)
str(bd7)   
## 'data.frame':    200620 obs. of  20 variables:
##  $ vcClaveTienda     : chr  "MX001" "MX001" "MX001" "MX001" ...
##  $ DescGiro          : chr  "Abarrotes" "Abarrotes" "Abarrotes" "Abarrotes" ...
##  $ Fecha             : Date, format: NA NA ...
##  $ Hora              : int  8 8 8 8 8 8 8 8 8 3 ...
##  $ Marca             : chr  "NUTRI LECHE" "DAN UP" "BIMBO" "PEPSI" ...
##  $ Fabricante        : chr  "MEXILAC" "DANONE DE MEXICO" "GRUPO BIMBO" "PEPSI-COLA MEXICANA" ...
##  $ Producto          : chr  "Nutri Leche 1 Litro" "DANUP STRAWBERRY P/BEBER 350GR NAL" "Rebanadas Bimbo 2Pz" "Pepsi N.R. 400Ml" ...
##  $ Precio            : num  16 14 5 8 19.5 9.5 11 9.5 23.5 12 ...
##  $ Ult.Costo         : num  12.3 14 5 8 15 ...
##  $ Unidades          : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ F.Ticket          : int  1 2 3 3 4 4 4 4 4 5 ...
##  $ NombreDepartamento: chr  "Abarrotes" "Abarrotes" "Abarrotes" "Abarrotes" ...
##  $ NombreFamilia     : chr  "Lacteos y Refrigerados" "Lacteos y Refrigerados" "Pan y Tortilla" "Bebidas" ...
##  $ NombreCategoria   : chr  "Leche" "Yogurt" "Pan Dulce Empaquetado" "Refrescos Plástico (N.R.)" ...
##  $ Estado            : chr  "Nuevo León" "Nuevo León" "Nuevo León" "Nuevo León" ...
##  $ Mts.2             : int  60 60 60 60 60 60 60 60 60 60 ...
##  $ Tipo.ubicación    : chr  "Esquina" "Esquina" "Esquina" "Esquina" ...
##  $ Giro              : chr  "Abarrotes" "Abarrotes" "Abarrotes" "Abarrotes" ...
##  $ Hora.inicio       : chr  "8:00" "8:00" "8:00" "8:00" ...
##  $ Hora.cierre       : chr  "22:00" "22:00" "22:00" "22:00" ...
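
The conversion above keeps the first two characters of each time string and casts them to integer. The following is a minimal sketch on made-up vectors (`hours` and `hours_12` are hypothetical, not taken from the dataset). The second half assumes a 12-hour format with an AM/PM marker, in which case `substr` alone would discard the marker; whether the real `Hora` column uses that format is not confirmed here.

```r
# Hypothetical time strings; the real column format may differ.
hours <- c("08:15:00", "03:40:10", "12:05:59")

# Keep the first two characters (the hour) and convert to integer.
hour_int <- as.integer(substr(hours, start = 1, stop = 2))
print(hour_int)

# Assumed variant: 12-hour strings that carry an AM/PM marker.
hours_12 <- c("08:15:00 AM", "03:40:10 PM", "12:05:59 AM")
hour_12  <- as.integer(substr(hours_12, start = 1, stop = 2))
is_pm    <- grepl("PM", hours_12, fixed = TRUE)

# 12 AM maps to 0 and 12 PM maps to 12; other PM hours gain 12.
hour_24 <- (hour_12 %% 12L) + ifelse(is_pm, 12L, 0L)
print(hour_24)
```

If the marker is present in the source data, converting to a 24-hour value before casting keeps morning and evening sales apart.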

TECHNIQUE 5.

Missing values

How many NA values are in the database?

sum(is.na(bd7))
## [1] 113205
sum(is.na(bd))   
## [1] 199188

How many NA values are there per variable?

sapply(bd7, function(x) sum (is.na(x)))
##      vcClaveTienda           DescGiro              Fecha               Hora 
##                  0                  0             113205                  0 
##              Marca         Fabricante           Producto             Precio 
##                  0                  0                  0                  0 
##          Ult.Costo           Unidades           F.Ticket NombreDepartamento 
##                  0                  0                  0                  0 
##      NombreFamilia    NombreCategoria             Estado              Mts.2 
##                  0                  0                  0                  0 
##     Tipo.ubicación               Giro        Hora.inicio        Hora.cierre 
##                  0                  0                  0                  0
sapply(bd, function(x) sum(is.na(x)))   
##      vcClaveTienda           DescGiro      Codigo.Barras                PLU 
##                  0                  0                  0             199188 
##              Fecha               Hora              Marca         Fabricante 
##                  0                  0                  0                  0 
##           Producto             Precio          Ult.Costo           Unidades 
##                  0                  0                  0                  0 
##           F.Ticket NombreDepartamento      NombreFamilia    NombreCategoria 
##                  0                  0                  0                  0 
##             Estado              Mts.2     Tipo.ubicación               Giro 
##                  0                  0                  0                  0 
##        Hora.inicio        Hora.cierre 
##                  0                  0
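
The per-variable counts above can also be obtained with `colSums()` on the logical matrix returned by `is.na()`. This is a small sketch on a toy data frame (`df` and its values are hypothetical); it adds the share of missing values per column, which helps decide whether a column is worth keeping.

```r
# Toy data frame (hypothetical values) with NAs in two columns.
df <- data.frame(
  precio = c(10, NA, 5.5, 8),
  plu    = c(NA, NA, 1, NA),
  marca  = c("A", "B", "C", "D")
)

total_na   <- sum(is.na(df))                        # NAs in the whole table
na_per_col <- colSums(is.na(df))                    # NAs per column (named vector)
na_pct     <- round(100 * colMeans(is.na(df)), 1)   # percentage of NAs per column

print(total_na)
print(na_per_col)
print(na_pct)
```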

Remove all records with NA values from a table

bd8 <- bd
bd8 <- na.omit(bd8)
summary(bd8)
##  vcClaveTienda        DescGiro         Codigo.Barras            PLU        
##  Length:1437        Length:1437        Min.   :6.750e+08   Min.   : 1.000  
##  Class :character   Class :character   1st Qu.:6.750e+08   1st Qu.: 1.000  
##  Mode  :character   Mode  :character   Median :6.750e+08   Median : 1.000  
##                                        Mean   :2.616e+11   Mean   : 2.112  
##                                        3rd Qu.:6.750e+08   3rd Qu.: 1.000  
##                                        Max.   :7.501e+12   Max.   :30.000  
##     Fecha               Hora              Marca            Fabricante       
##  Length:1437        Length:1437        Length:1437        Length:1437       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##    Producto             Precio        Ult.Costo        Unidades    
##  Length:1437        Min.   :30.00   Min.   : 1.00   Min.   :1.000  
##  Class :character   1st Qu.:90.00   1st Qu.:64.62   1st Qu.:1.000  
##  Mode  :character   Median :90.00   Median :64.62   Median :1.000  
##                     Mean   :87.94   Mean   :56.65   Mean   :1.124  
##                     3rd Qu.:90.00   3rd Qu.:64.62   3rd Qu.:1.000  
##                     Max.   :90.00   Max.   :64.62   Max.   :7.000  
##     F.Ticket      NombreDepartamento NombreFamilia      NombreCategoria   
##  Min.   :   772   Length:1437        Length:1437        Length:1437       
##  1st Qu.: 99955   Class :character   Class :character   Class :character  
##  Median :102493   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :100595                                                           
##  3rd Qu.:106546                                                           
##  Max.   :118356                                                           
##     Estado              Mts.2       Tipo.ubicación         Giro          
##  Length:1437        Min.   :58.00   Length:1437        Length:1437       
##  Class :character   1st Qu.:58.00   Class :character   Class :character  
##  Mode  :character   Median :58.00   Mode  :character   Mode  :character  
##                     Mean   :58.07                                        
##                     3rd Qu.:58.00                                        
##                     Max.   :60.00                                        
##  Hora.inicio        Hora.cierre       
##  Length:1437        Length:1437       
##  Class :character   Class :character  
##  Mode  :character   Mode  :character  
##                                       
##                                       
## 
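
`na.omit()` drops every row that has an NA in any column. In the output above only 1437 of the 200625 rows survive, because `PLU` is missing in 199188 of them. A hedged alternative, sketched here on a toy data frame (`df` is hypothetical), is to require completeness only in the columns that matter for the analysis.

```r
# Toy data frame (hypothetical values).
df <- data.frame(
  precio = c(10, NA, 5.5, 8),
  plu    = c(NA, NA, 1, NA)
)

# na.omit() keeps only rows with no NA in any column.
all_complete <- na.omit(df)

# Alternative: require completeness only in the 'precio' column.
price_complete <- df[complete.cases(df["precio"]), ]

print(nrow(all_complete))
print(nrow(price_complete))
```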

Replace NA with zeros

bd9 <- bd 
bd9 [is.na(bd9)]<-0 
summary (bd9)
##  vcClaveTienda        DescGiro         Codigo.Barras            PLU          
##  Length:200625      Length:200625      Min.   :8.347e+05   Min.   : 0.00000  
##  Class :character   Class :character   1st Qu.:7.501e+12   1st Qu.: 0.00000  
##  Mode  :character   Mode  :character   Median :7.501e+12   Median : 0.00000  
##                                        Mean   :5.950e+12   Mean   : 0.01513  
##                                        3rd Qu.:7.501e+12   3rd Qu.: 0.00000  
##                                        Max.   :1.750e+13   Max.   :30.00000  
##     Fecha               Hora              Marca            Fabricante       
##  Length:200625      Length:200625      Length:200625      Length:200625     
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##    Producto             Precio          Ult.Costo         Unidades     
##  Length:200625      Min.   :-147.00   Min.   :  0.38   Min.   : 0.200  
##  Class :character   1st Qu.:  11.00   1st Qu.:  8.46   1st Qu.: 1.000  
##  Mode  :character   Median :  16.00   Median : 12.31   Median : 1.000  
##                     Mean   :  19.42   Mean   : 15.31   Mean   : 1.262  
##                     3rd Qu.:  25.00   3rd Qu.: 19.23   3rd Qu.: 1.000  
##                     Max.   :1000.00   Max.   :769.23   Max.   :96.000  
##     F.Ticket      NombreDepartamento NombreFamilia      NombreCategoria   
##  Min.   :     1   Length:200625      Length:200625      Length:200625     
##  1st Qu.: 33964   Class :character   Class :character   Class :character  
##  Median :105993   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :193990                                                           
##  3rd Qu.:383005                                                           
##  Max.   :450040                                                           
##     Estado              Mts.2      Tipo.ubicación         Giro          
##  Length:200625      Min.   :47.0   Length:200625      Length:200625     
##  Class :character   1st Qu.:53.0   Class :character   Class :character  
##  Mode  :character   Median :60.0   Mode  :character   Mode  :character  
##                     Mean   :56.6                                        
##                     3rd Qu.:60.0                                        
##                     Max.   :62.0                                        
##  Hora.inicio        Hora.cierre       
##  Length:200625      Length:200625     
##  Class :character   Class :character  
##  Mode  :character   Mode  :character  
##                                       
##                                       
## 
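
The assignment `bd9[is.na(bd9)] <- 0` acts on every column of the table. A more targeted sketch, on a toy data frame (`df` is hypothetical), replaces NAs in a single numeric column and leaves the other columns untouched.

```r
# Toy data frame (hypothetical values) with NAs in a numeric and a text column.
df <- data.frame(
  plu   = c(NA, 2, NA, 30),
  marca = c("A", NA, "C", "D")
)

# Replace NAs only in the numeric column; the text column keeps its NA.
df$plu[is.na(df$plu)] <- 0

print(df)
```

Note that zero-filling shifts the column statistics: in the output above the mean of `PLU` falls from 2.11 to about 0.015.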

Replace NA with the mean

bd10 <- bd
bd10$PLU[is.na(bd10$PLU)] <- mean(bd10$PLU, na.rm = TRUE)
summary (bd10)    
##  vcClaveTienda        DescGiro         Codigo.Barras            PLU        
##  Length:200625      Length:200625      Min.   :8.347e+05   Min.   : 1.000  
##  Class :character   Class :character   1st Qu.:7.501e+12   1st Qu.: 2.112  
##  Mode  :character   Mode  :character   Median :7.501e+12   Median : 2.112  
##                                        Mean   :5.950e+12   Mean   : 2.112  
##                                        3rd Qu.:7.501e+12   3rd Qu.: 2.112  
##                                        Max.   :1.750e+13   Max.   :30.000  
##     Fecha               Hora              Marca            Fabricante       
##  Length:200625      Length:200625      Length:200625      Length:200625     
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##    Producto             Precio          Ult.Costo         Unidades     
##  Length:200625      Min.   :-147.00   Min.   :  0.38   Min.   : 0.200  
##  Class :character   1st Qu.:  11.00   1st Qu.:  8.46   1st Qu.: 1.000  
##  Mode  :character   Median :  16.00   Median : 12.31   Median : 1.000  
##                     Mean   :  19.42   Mean   : 15.31   Mean   : 1.262  
##                     3rd Qu.:  25.00   3rd Qu.: 19.23   3rd Qu.: 1.000  
##                     Max.   :1000.00   Max.   :769.23   Max.   :96.000  
##     F.Ticket      NombreDepartamento NombreFamilia      NombreCategoria   
##  Min.   :     1   Length:200625      Length:200625      Length:200625     
##  1st Qu.: 33964   Class :character   Class :character   Class :character  
##  Median :105993   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :193990                                                           
##  3rd Qu.:383005                                                           
##  Max.   :450040                                                           
##     Estado              Mts.2      Tipo.ubicación         Giro          
##  Length:200625      Min.   :47.0   Length:200625      Length:200625     
##  Class :character   1st Qu.:53.0   Class :character   Class :character  
##  Mode  :character   Median :60.0   Mode  :character   Mode  :character  
##                     Mean   :56.6                                        
##                     3rd Qu.:60.0                                        
##                     Max.   :62.0                                        
##  Hora.inicio        Hora.cierre       
##  Length:200625      Length:200625     
##  Class :character   Class :character  
##  Mode  :character   Mode  :character  
##                                       
##                                       
## 
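
After mean imputation the quartiles and the median of `PLU` all equal 2.112, since more than 99% of its values were filled in. The sketch below uses a toy vector (`x` is hypothetical) to compare mean and median imputation; the median is a common alternative when a few large values pull the mean upward.

```r
# Toy vector (hypothetical values) with two missing entries.
x <- c(1, NA, 1, 30, NA)

mean_x   <- mean(x, na.rm = TRUE)     # (1 + 1 + 30) / 3
median_x <- median(x, na.rm = TRUE)   # middle of 1, 1, 30

# Fill the NAs with each statistic.
x_mean   <- ifelse(is.na(x), mean_x, x)
x_median <- ifelse(is.na(x), median_x, x)

print(x_mean)
print(x_median)
```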

Replace negative values with zero

bd11 <- bd
bd11[bd11 < 0]<- 0 
summary (bd11)
##  vcClaveTienda        DescGiro         Codigo.Barras            PLU        
##  Length:200625      Length:200625      Min.   :8.347e+05   Min.   : 1.00   
##  Class :character   Class :character   1st Qu.:7.501e+12   1st Qu.: 1.00   
##  Mode  :character   Mode  :character   Median :7.501e+12   Median : 1.00   
##                                        Mean   :5.950e+12   Mean   : 2.11   
##                                        3rd Qu.:7.501e+12   3rd Qu.: 1.00   
##                                        Max.   :1.750e+13   Max.   :30.00   
##                                                            NA's   :199188  
##     Fecha               Hora              Marca            Fabricante       
##  Length:200625      Length:200625      Length:200625      Length:200625     
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##    Producto             Precio          Ult.Costo         Unidades     
##  Length:200625      Min.   :   0.00   Min.   :  0.38   Min.   : 0.200  
##  Class :character   1st Qu.:  11.00   1st Qu.:  8.46   1st Qu.: 1.000  
##  Mode  :character   Median :  16.00   Median : 12.31   Median : 1.000  
##                     Mean   :  19.44   Mean   : 15.31   Mean   : 1.262  
##                     3rd Qu.:  25.00   3rd Qu.: 19.23   3rd Qu.: 1.000  
##                     Max.   :1000.00   Max.   :769.23   Max.   :96.000  
##                                                                        
##     F.Ticket      NombreDepartamento NombreFamilia      NombreCategoria   
##  Min.   :     1   Length:200625      Length:200625      Length:200625     
##  1st Qu.: 33964   Class :character   Class :character   Class :character  
##  Median :105993   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :193990                                                           
##  3rd Qu.:383005                                                           
##  Max.   :450040                                                           
##                                                                           
##     Estado              Mts.2      Tipo.ubicación         Giro          
##  Length:200625      Min.   :47.0   Length:200625      Length:200625     
##  Class :character   1st Qu.:53.0   Class :character   Class :character  
##  Mode  :character   Median :60.0   Mode  :character   Mode  :character  
##                     Mean   :56.6                                        
##                     3rd Qu.:60.0                                        
##                     Max.   :62.0                                        
##                                                                         
##  Hora.inicio        Hora.cierre       
##  Length:200625      Length:200625     
##  Class :character   Class :character  
##  Mode  :character   Mode  :character  
##                                       
##                                       
##                                       
## 
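
The comparison `bd11 < 0` is evaluated over the whole table, text columns included. A minimal sketch that limits the operation to one numeric vector (`precio` here is a hypothetical toy vector) uses `pmax()`, which takes the element-wise maximum and keeps NAs as NA.

```r
# Toy price vector (hypothetical values) with one negative and one NA.
precio <- c(16, -147, 5, NA, 0.5)

# Element-wise maximum against 0: negatives become 0, NA stays NA.
precio_no_neg <- pmax(precio, 0)

print(precio_no_neg)
```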

TECHNIQUE 6.

Statistical method

bd12 <- bd7
boxplot(bd12$Precio, horizontal = TRUE)   

boxplot(bd12$Unidades, horizontal = TRUE)
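
The points a boxplot draws beyond its whiskers follow the 1.5 × IQR rule. The sketch below reproduces that rule numerically on a toy vector (`precio` is hypothetical) and compares it with `boxplot.stats()`, the function `boxplot()` relies on. `boxplot.stats()` works with hinges rather than `quantile()` quartiles, so the two can differ slightly on other data.

```r
# Toy price vector (hypothetical values) with one extreme value.
precio <- c(11, 16, 25, 12, 19, 14, 1000)

# Quartiles and interquartile range.
q   <- quantile(precio, probs = c(0.25, 0.75))
iqr <- q[[2]] - q[[1]]

# Fences at 1.5 * IQR beyond the quartiles.
lower <- q[[1]] - 1.5 * iqr
upper <- q[[2]] + 1.5 * iqr
outliers <- precio[precio < lower | precio > upper]

print(outliers)
print(boxplot.stats(precio)$out)   # values boxplot() would draw as points
```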

Add columns

# install.packages ("lubridate")
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
bd12$Dia_de_la_semana <- wday (bd12$Fecha)
summary (bd12)
##  vcClaveTienda        DescGiro             Fecha                 Hora       
##  Length:200620      Length:200620      Min.   :2020-01-05   Min.   : 1.000  
##  Class :character   Class :character   1st Qu.:2020-03-11   1st Qu.: 5.000  
##  Mode  :character   Mode  :character   Median :2020-06-10   Median : 8.000  
##                                        Mean   :2020-06-20   Mean   : 7.299  
##                                        3rd Qu.:2020-09-10   3rd Qu.:10.000  
##                                        Max.   :2020-12-10   Max.   :12.000  
##                                        NA's   :113205                       
##     Marca            Fabricante          Producto             Precio       
##  Length:200620      Length:200620      Length:200620      Min.   :   0.50  
##  Class :character   Class :character   Class :character   1st Qu.:  11.00  
##  Mode  :character   Mode  :character   Mode  :character   Median :  16.00  
##                                                           Mean   :  19.45  
##                                                           3rd Qu.:  25.00  
##                                                           Max.   :1000.00  
##                                                                            
##    Ult.Costo         Unidades         F.Ticket      NombreDepartamento
##  Min.   :  0.38   Min.   : 1.000   Min.   :     1   Length:200620     
##  1st Qu.:  8.46   1st Qu.: 1.000   1st Qu.: 33967   Class :character  
##  Median : 12.31   Median : 1.000   Median :105996   Mode  :character  
##  Mean   : 15.31   Mean   : 1.262   Mean   :193994                     
##  3rd Qu.: 19.23   3rd Qu.: 1.000   3rd Qu.:383008                     
##  Max.   :769.23   Max.   :96.000   Max.   :450040                     
##                                                                       
##  NombreFamilia      NombreCategoria       Estado              Mts.2     
##  Length:200620      Length:200620      Length:200620      Min.   :47.0  
##  Class :character   Class :character   Class :character   1st Qu.:53.0  
##  Mode  :character   Mode  :character   Mode  :character   Median :60.0  
##                                                           Mean   :56.6  
##                                                           3rd Qu.:60.0  
##                                                           Max.   :62.0  
##                                                                         
##  Tipo.ubicación         Giro           Hora.inicio        Hora.cierre       
##  Length:200620      Length:200620      Length:200620      Length:200620     
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##  Dia_de_la_semana
##  Min.   :1.00    
##  1st Qu.:2.00    
##  Median :4.00    
##  Mean   :3.96    
##  3rd Qu.:6.00    
##  Max.   :7.00    
##  NA's   :113205
# 1 = Sunday, 7 = Saturday
bd12$Subtotal <- bd12$Precio * bd12$Unidades
summary (bd12)
##  vcClaveTienda        DescGiro             Fecha                 Hora       
##  Length:200620      Length:200620      Min.   :2020-01-05   Min.   : 1.000  
##  Class :character   Class :character   1st Qu.:2020-03-11   1st Qu.: 5.000  
##  Mode  :character   Mode  :character   Median :2020-06-10   Median : 8.000  
##                                        Mean   :2020-06-20   Mean   : 7.299  
##                                        3rd Qu.:2020-09-10   3rd Qu.:10.000  
##                                        Max.   :2020-12-10   Max.   :12.000  
##                                        NA's   :113205                       
##     Marca            Fabricante          Producto             Precio       
##  Length:200620      Length:200620      Length:200620      Min.   :   0.50  
##  Class :character   Class :character   Class :character   1st Qu.:  11.00  
##  Mode  :character   Mode  :character   Mode  :character   Median :  16.00  
##                                                           Mean   :  19.45  
##                                                           3rd Qu.:  25.00  
##                                                           Max.   :1000.00  
##                                                                            
##    Ult.Costo         Unidades         F.Ticket      NombreDepartamento
##  Min.   :  0.38   Min.   : 1.000   Min.   :     1   Length:200620     
##  1st Qu.:  8.46   1st Qu.: 1.000   1st Qu.: 33967   Class :character  
##  Median : 12.31   Median : 1.000   Median :105996   Mode  :character  
##  Mean   : 15.31   Mean   : 1.262   Mean   :193994                     
##  3rd Qu.: 19.23   3rd Qu.: 1.000   3rd Qu.:383008                     
##  Max.   :769.23   Max.   :96.000   Max.   :450040                     
##                                                                       
##  NombreFamilia      NombreCategoria       Estado              Mts.2     
##  Length:200620      Length:200620      Length:200620      Min.   :47.0  
##  Class :character   Class :character   Class :character   1st Qu.:53.0  
##  Mode  :character   Mode  :character   Mode  :character   Median :60.0  
##                                                           Mean   :56.6  
##                                                           3rd Qu.:60.0  
##                                                           Max.   :62.0  
##                                                                         
##  Tipo.ubicación         Giro           Hora.inicio        Hora.cierre       
##  Length:200620      Length:200620      Length:200620      Length:200620     
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##  Dia_de_la_semana    Subtotal      
##  Min.   :1.00     Min.   :   1.00  
##  1st Qu.:2.00     1st Qu.:  12.00  
##  Median :4.00     Median :  18.00  
##  Mean   :3.96     Mean   :  24.33  
##  3rd Qu.:6.00     3rd Qu.:  27.00  
##  Max.   :7.00     Max.   :2496.00  
##  NA's   :113205
bd12$Utilidad <- bd12$Precio - bd12$Ult.Costo      
summary (bd12)  
##  vcClaveTienda        DescGiro             Fecha                 Hora       
##  Length:200620      Length:200620      Min.   :2020-01-05   Min.   : 1.000  
##  Class :character   Class :character   1st Qu.:2020-03-11   1st Qu.: 5.000  
##  Mode  :character   Mode  :character   Median :2020-06-10   Median : 8.000  
##                                        Mean   :2020-06-20   Mean   : 7.299  
##                                        3rd Qu.:2020-09-10   3rd Qu.:10.000  
##                                        Max.   :2020-12-10   Max.   :12.000  
##                                        NA's   :113205                       
##     Marca            Fabricante          Producto             Precio       
##  Length:200620      Length:200620      Length:200620      Min.   :   0.50  
##  Class :character   Class :character   Class :character   1st Qu.:  11.00  
##  Mode  :character   Mode  :character   Mode  :character   Median :  16.00  
##                                                           Mean   :  19.45  
##                                                           3rd Qu.:  25.00  
##                                                           Max.   :1000.00  
##                                                                            
##    Ult.Costo         Unidades         F.Ticket      NombreDepartamento
##  Min.   :  0.38   Min.   : 1.000   Min.   :     1   Length:200620     
##  1st Qu.:  8.46   1st Qu.: 1.000   1st Qu.: 33967   Class :character  
##  Median : 12.31   Median : 1.000   Median :105996   Mode  :character  
##  Mean   : 15.31   Mean   : 1.262   Mean   :193994                     
##  3rd Qu.: 19.23   3rd Qu.: 1.000   3rd Qu.:383008                     
##  Max.   :769.23   Max.   :96.000   Max.   :450040                     
##                                                                       
##  NombreFamilia      NombreCategoria       Estado              Mts.2     
##  Length:200620      Length:200620      Length:200620      Min.   :47.0  
##  Class :character   Class :character   Class :character   1st Qu.:53.0  
##  Mode  :character   Mode  :character   Mode  :character   Median :60.0  
##                                                           Mean   :56.6  
##                                                           3rd Qu.:60.0  
##                                                           Max.   :62.0  
##                                                                         
##  Tipo.ubicación         Giro           Hora.inicio        Hora.cierre       
##  Length:200620      Length:200620      Length:200620      Length:200620     
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##  Dia_de_la_semana    Subtotal          Utilidad      
##  Min.   :1.00     Min.   :   1.00   Min.   :  0.000  
##  1st Qu.:2.00     1st Qu.:  12.00   1st Qu.:  2.310  
##  Median :4.00     Median :  18.00   Median :  3.230  
##  Mean   :3.96     Mean   :  24.33   Mean   :  4.142  
##  3rd Qu.:6.00     3rd Qu.:  27.00   3rd Qu.:  5.420  
##  Max.   :7.00     Max.   :2496.00   Max.   :230.770  
##  NA's   :113205
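
The three derived columns can be checked on a small table. This sketch uses a toy data frame (`ventas` and its values are hypothetical) and a base-R equivalent of `wday()` with the same numbering, 1 = Sunday through 7 = Saturday. As in the summary above, rows with a missing `Fecha` get an NA weekday. `Utilidad_total` is an assumed extra column, not part of the original analysis, showing profit per line rather than per unit.

```r
# Toy sales table (hypothetical values); column names follow the dataset.
ventas <- data.frame(
  Fecha     = as.Date(c("2020-01-05", "2020-06-10", NA)),
  Precio    = c(16, 9.5, 12),
  Ult.Costo = c(12.31, 7.31, 12),
  Unidades  = c(1, 3, 2)
)

# Base-R weekday: POSIXlt counts 0 = Sunday, so add 1 to get 1 = Sunday.
ventas$Dia_de_la_semana <- as.POSIXlt(ventas$Fecha)$wday + 1

ventas$Subtotal <- ventas$Precio * ventas$Unidades     # revenue per line
ventas$Utilidad <- ventas$Precio - ventas$Ult.Costo    # margin per unit

# Assumed extra column: margin for the whole line.
ventas$Utilidad_total <- ventas$Utilidad * ventas$Unidades

print(ventas)
```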

Export the clean database

bd_limpia <-bd12
write.csv(bd_limpia, file ="Abarrotes base de datos limpia.csv", row.names = FALSE)
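
A quick way to confirm that an export worked is to read the file back and compare its shape with the original. The sketch below does this with a toy data frame and a temporary file (`df` and `path` are hypothetical, used only for illustration).

```r
# Toy data frame (hypothetical values).
df <- data.frame(
  Producto = c("Producto A", "Producto B"),
  Precio   = c(8, 9.5)
)

# Write to a temporary file without row names, then read it back.
path <- tempfile(fileext = ".csv")
write.csv(df, file = path, row.names = FALSE)
reread <- read.csv(path)

# The column names and row count should match the original.
print(identical(names(reread), names(df)))
print(nrow(reread) == nrow(df))
```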