All development, from data loading to visualization creation, was done in R and RStudio using the following libraries: igraph, visNetwork, data.table, tidyverse
The data base is the collection of the transactions generated in January 2018 in a local supermarket in the city of Bogotá, Colombia.
It seeks to determine which products are bought together most frequently in order to evaluate possible sales strategies.
glimpse(data)
## Rows: 455
## Columns: 7
## $ CategoriaProd <chr> "BEBIDAS", "BEBIDAS", "BEBIDAS", "BEBIDAS", "BEBIDAS", ~
## $ Id_Producto <int> 60976, 252673, 252673, 252673, 252673, 252673, 255470, ~
## $ NombrePLU <chr> "REFRESCO TAMPICO CITRUS GARRAFA X2000ml", "OF JUGO HIT~
## $ Id_Transaccion <int> 269, 23, 144, 204, 215, 218, 35, 85, 169, 135, 165, 607~
## $ Sucursal <chr> "S061", "S037", "S001", "S075", "S030", "S075", "S070",~
## $ ID_Cliente <chr> "02-52932262", "02-40611417", "02-1033805875", "02-1026~
## $ Transacciones <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1~
Due to the large volume of data and the hardware limitations, only 5 product categories will be taken into account: Despensa (Pantry), Lácteos (Dairy), Fruver (Futas and Vegetables), Bebidas (Beverages) y Cuidado Personal( Personal Care).
Within each category, only the top 70 products (SKU) with the highest sales frequency are taken
data <- fread("Data Visualization/Week 3/Assignment/data/01_Enero2018_Compras_Totales.csv",
col.names = c("Id_Jerarquia","CategoriaProd","Id_Producto","NombrePLU","IdProveedor","Prove_Nombre","ID_Cliente",
"Id_Transaccion","Venta_Neta","Venta_Bruta", "Transacciones", "Sucursal","Fecha", "Hora")) %>%
filter(CategoriaProd %in% c("DESPENSA", "LACTEOS", "FRUVER", "BEBIDAS", "CUIDADO PERSONAL")) %>%
group_by(CategoriaProd,Id_Producto) %>%
summarise(Recurrencia = n()) %>%
group_by(CategoriaProd) %>%
top_n(70, Recurrencia)
nodes <- data %>% ungroup() %>%
group_by(Id_Producto, NombrePLU, CategoriaProd) %>%
summarise(Volumen = sum(Transacciones)) %>%
ungroup() %>%
mutate(id = Id_Producto, label = NombrePLU) %>%
select(id, label, CategoriaProd, Volumen) %>%
mutate(label = str_to_title(gsub("colsubsidio", "",
gsub("colsubs", "",
gsub("colsub", "", label, ignore.case = T), ignore.case = T), ignore.case = T)),
title = label) %>%
rename(group = CategoriaProd, value = Volumen)
edges <- data %>% ungroup() %>% inner_join(data, by = c("Id_Transaccion"="Id_Transaccion")) %>%
filter(Id_Producto.x != Id_Producto.y) %>%
mutate(from = Id_Producto.x, to = Id_Producto.y) %>%
group_by(from, to) %>%
summarise(Transacciones = sum(Transacciones.x)) %>%
mutate(width =Transacciones)
edges <- igraph::as_data_frame(simplify(graph_from_data_frame(edges, directed=FALSE)))
visNetwork(nodes, edges,
main = "Market Basket: Bought Together Products") %>%
visOptions(highlightNearest = list(enabled = T, degree = 2, hover = T), selectedBy = "group") %>%
visInteraction(hideEdgesOnDrag = TRUE, navigationButtons = TRUE, tooltipDelay = 1) %>%
visLegend()
The network shows that there are many interactions between product categories, there is diversity in the market basket in terms of the categories bought together.
Each path that is built in the network can be considered as a commercial sales strategy