For this project I explored the which suppliers had the most balanced sales across warehouse and retail channels? Using the Warehouse and Retail Sales data set from Data.gov, I focused on supplier, retail sales, retail transfers, and warehouse sales to measure how each supplier’s sales were distributed.
For me to analysis the data I had to clean by filtering out NA values and other values that would impact my results. I also had to calculate the average mean for retail, transfer, and warehouse sales per supplier. I then identify the most balanced suppliers and created a visualization of the top eight using a horizontal segment and point plot.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.6
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.2 ✔ tibble 3.3.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.2
## ✔ purrr 1.2.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(dplyr)
setwd("C:/Users/SwagD/Desktop/Data 101")
data <- read_csv("Warehouse_and_Retail_Sales.csv")
## Rows: 307645 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): SUPPLIER, ITEM CODE, ITEM DESCRIPTION, ITEM TYPE
## dbl (5): YEAR, MONTH, RETAIL SALES, RETAIL TRANSFERS, WAREHOUSE SALES
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
str(data)
## spc_tbl_ [307,645 × 9] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ YEAR : num [1:307645] 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 ...
## $ MONTH : num [1:307645] 1 1 1 1 1 1 1 1 1 1 ...
## $ SUPPLIER : chr [1:307645] "REPUBLIC NATIONAL DISTRIBUTING CO" "PWSWN INC" "RELIABLE CHURCHILL LLLP" "LANTERNA DISTRIBUTORS INC" ...
## $ ITEM CODE : chr [1:307645] "100009" "100024" "1001" "100145" ...
## $ ITEM DESCRIPTION: chr [1:307645] "BOOTLEG RED - 750ML" "MOMENT DE PLAISIR - 750ML" "S SMITH ORGANIC PEAR CIDER - 18.7OZ" "SCHLINK HAUS KABINETT - 750ML" ...
## $ ITEM TYPE : chr [1:307645] "WINE" "WINE" "BEER" "WINE" ...
## $ RETAIL SALES : num [1:307645] 0 0 0 0 0.82 2.76 0.08 0 6.41 0.33 ...
## $ RETAIL TRANSFERS: num [1:307645] 0 1 0 0 0 0 1 0 4 1 ...
## $ WAREHOUSE SALES : num [1:307645] 2 4 1 1 0 6 1 2 0 2 ...
## - attr(*, "spec")=
## .. cols(
## .. YEAR = col_double(),
## .. MONTH = col_double(),
## .. SUPPLIER = col_character(),
## .. `ITEM CODE` = col_character(),
## .. `ITEM DESCRIPTION` = col_character(),
## .. `ITEM TYPE` = col_character(),
## .. `RETAIL SALES` = col_double(),
## .. `RETAIL TRANSFERS` = col_double(),
## .. `WAREHOUSE SALES` = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
head(data)
## # A tibble: 6 × 9
## YEAR MONTH SUPPLIER `ITEM CODE` `ITEM DESCRIPTION` `ITEM TYPE` `RETAIL SALES`
## <dbl> <dbl> <chr> <chr> <chr> <chr> <dbl>
## 1 2020 1 REPUBLI… 100009 BOOTLEG RED - 750… WINE 0
## 2 2020 1 PWSWN I… 100024 MOMENT DE PLAISIR… WINE 0
## 3 2020 1 RELIABL… 1001 S SMITH ORGANIC P… BEER 0
## 4 2020 1 LANTERN… 100145 SCHLINK HAUS KABI… WINE 0
## 5 2020 1 DIONYSO… 100293 SANTORINI GAVALA … WINE 0.82
## 6 2020 1 KYSELA … 100641 CORTENOVA VENETO … WINE 2.76
## # ℹ 2 more variables: `RETAIL TRANSFERS` <dbl>, `WAREHOUSE SALES` <dbl>
names(data) <- gsub("[(). \\-]", "_", names(data))
names(data) <- gsub("_$", "", names(data))
names(data) <- tolower(names(data))
head(data)
## # A tibble: 6 × 9
## year month supplier item_code item_description item_type retail_sales
## <dbl> <dbl> <chr> <chr> <chr> <chr> <dbl>
## 1 2020 1 REPUBLIC NATION… 100009 BOOTLEG RED - 7… WINE 0
## 2 2020 1 PWSWN INC 100024 MOMENT DE PLAIS… WINE 0
## 3 2020 1 RELIABLE CHURCH… 1001 S SMITH ORGANIC… BEER 0
## 4 2020 1 LANTERNA DISTRI… 100145 SCHLINK HAUS KA… WINE 0
## 5 2020 1 DIONYSOS IMPORT… 100293 SANTORINI GAVAL… WINE 0.82
## 6 2020 1 KYSELA PERE ET … 100641 CORTENOVA VENET… WINE 2.76
## # ℹ 2 more variables: retail_transfers <dbl>, warehouse_sales <dbl>
supplier_avgs <- data |>
group_by(supplier) |>
summarize(
avg_retail_sales = mean(retail_sales, na.rm = TRUE),
avg_retail_transfers = mean(retail_transfers, na.rm = TRUE),
avg_warehouse_sales = mean(warehouse_sales, na.rm = TRUE)
)
head(supplier_avgs)
## # A tibble: 6 × 4
## supplier avg_retail_sales avg_retail_transfers avg_warehouse_sales
## <chr> <dbl> <dbl> <dbl>
## 1 8 VINI INC 0.281 0.222 0.111
## 2 A HARDY USA LTD 0.14 0 0
## 3 A I G WINE & SPIRITS 0.186 0.0693 2.77
## 4 A VINTNERS SELECTIO… 0.949 0.824 3.53
## 5 A&E INC 0.274 0.00190 0
## 6 A&W BORDERS LLC 0.186 0 0
supplier_balanced <- supplier_avgs |>
mutate(
spread = max(avg_retail_sales, avg_retail_transfers, avg_warehouse_sales) -
min(avg_retail_sales, avg_retail_transfers, avg_warehouse_sales)
) |>
arrange(spread)
head(supplier_balanced)
## # A tibble: 6 × 5
## supplier avg_retail_sales avg_retail_transfers avg_warehouse_sales spread
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 8 VINI INC 0.281 0.222 0.111 4893.
## 2 A HARDY USA … 0.14 0 0 4893.
## 3 A I G WINE &… 0.186 0.0693 2.77 4893.
## 4 A VINTNERS S… 0.949 0.824 3.53 4893.
## 5 A&E INC 0.274 0.00190 0 4893.
## 6 A&W BORDERS … 0.186 0 0 4893.
top_suppliers <- supplier_balanced |>
slice(1:8)
ggplot(top_suppliers, aes(x = reorder(supplier, spread), y = spread)) +
geom_segment(aes(xend = reorder(supplier, spread), y = 0, yend = spread),
color = "skyblue", linewidth = 2) +
geom_point(color = "green", size = 4) +
coord_flip() +
labs(
title = "Top 8 Most Balanced Suppliers",
x = "Supplier",
y = "Spread Amount"
)
The conclusion I came to is that some suppliers showed balanced sales between their retail stores and their warehouses, while others relied more heavily on one type of income. This is important information to take note of as balanced suppliers may before more consistently. More research into how external factors like product type, season, or region impact sales could allow for more information.