Software

All development, from data loading to visualization creation, was done in R and RStudio using the following libraries: igraph, visNetwork, data.table, tidyverse

Data

The data base is the collection of the transactions generated in January 2018 in a local supermarket in the city of Bogotá, Colombia.

It seeks to determine which products are bought together most frequently in order to evaluate possible sales strategies.

glimpse(data)
## Rows: 455
## Columns: 7
## $ CategoriaProd  <chr> "BEBIDAS", "BEBIDAS", "BEBIDAS", "BEBIDAS", "BEBIDAS", ~
## $ Id_Producto    <int> 60976, 252673, 252673, 252673, 252673, 252673, 255470, ~
## $ NombrePLU      <chr> "REFRESCO TAMPICO CITRUS GARRAFA X2000ml", "OF JUGO HIT~
## $ Id_Transaccion <int> 269, 23, 144, 204, 215, 218, 35, 85, 169, 135, 165, 607~
## $ Sucursal       <chr> "S061", "S037", "S001", "S075", "S030", "S075", "S070",~
## $ ID_Cliente     <chr> "02-52932262", "02-40611417", "02-1033805875", "02-1026~
## $ Transacciones  <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1~

Filters

Due to the large volume of data and the hardware limitations, only 5 product categories will be taken into account: Despensa (Pantry), Lácteos (Dairy), Fruver (Futas and Vegetables), Bebidas (Beverages) y Cuidado Personal( Personal Care).

Within each category, only the top 70 products (SKU) with the highest sales frequency are taken

data <- fread("Data Visualization/Week 3/Assignment/data/01_Enero2018_Compras_Totales.csv", 
              col.names = c("Id_Jerarquia","CategoriaProd","Id_Producto","NombrePLU","IdProveedor","Prove_Nombre","ID_Cliente",
                            "Id_Transaccion","Venta_Neta","Venta_Bruta", "Transacciones", "Sucursal","Fecha", "Hora")) %>% 
  filter(CategoriaProd %in% c("DESPENSA", "LACTEOS", "FRUVER", "BEBIDAS", "CUIDADO PERSONAL")) %>% 
  group_by(CategoriaProd,Id_Producto) %>% 
  summarise(Recurrencia = n()) %>% 
  group_by(CategoriaProd) %>% 
  top_n(70, Recurrencia)

Network Visualization

Data Preparation

nodes <- data %>% ungroup() %>% 
  group_by(Id_Producto, NombrePLU, CategoriaProd) %>% 
  summarise(Volumen = sum(Transacciones)) %>% 
  ungroup() %>% 
  mutate(id = Id_Producto, label = NombrePLU) %>% 
  select(id, label, CategoriaProd, Volumen) %>% 
  mutate(label = str_to_title(gsub("colsubsidio", "", 
                                   gsub("colsubs", "", 
                                        gsub("colsub", "", label, ignore.case = T), ignore.case = T), ignore.case = T)),
         title = label) %>% 
  rename(group = CategoriaProd, value = Volumen)

edges <- data %>%  ungroup() %>% inner_join(data, by = c("Id_Transaccion"="Id_Transaccion")) %>% 
  filter(Id_Producto.x != Id_Producto.y) %>%
  mutate(from = Id_Producto.x, to = Id_Producto.y) %>% 
  group_by(from, to) %>% 
  summarise(Transacciones = sum(Transacciones.x)) %>% 
  mutate(width  =Transacciones)

edges <- igraph::as_data_frame(simplify(graph_from_data_frame(edges, directed=FALSE)))

Network

visNetwork(nodes, edges,
           main = "Market Basket: Bought Together Products") %>% 
  visOptions(highlightNearest = list(enabled = T, degree = 2, hover = T), selectedBy = "group") %>% 
  visInteraction(hideEdgesOnDrag = TRUE, navigationButtons = TRUE, tooltipDelay = 1) %>%
  visLegend()
  • The network shows that there are many interactions between product categories, there is diversity in the market basket in terms of the categories bought together.

  • Each path that is built in the network can be considered as a commercial sales strategy