Introduction

For this project I explored the which suppliers had the most balanced sales across warehouse and retail channels? Using the Warehouse and Retail Sales data set from Data.gov, I focused on supplier, retail sales, retail transfers, and warehouse sales to measure how each supplier’s sales were distributed.

Data Analysis

For me to analysis the data I had to clean by filtering out NA values and other values that would impact my results. I also had to calculate the average mean for retail, transfer, and warehouse sales per supplier. I then identify the most balanced suppliers and created a visualization of the top eight using a horizontal segment and point plot.

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.6
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.2     ✔ tibble    3.3.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.2
## ✔ purrr     1.2.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(dplyr)
setwd("C:/Users/SwagD/Desktop/Data 101")
data <- read_csv("Warehouse_and_Retail_Sales.csv")
## Rows: 307645 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): SUPPLIER, ITEM CODE, ITEM DESCRIPTION, ITEM TYPE
## dbl (5): YEAR, MONTH, RETAIL SALES, RETAIL TRANSFERS, WAREHOUSE SALES
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
str(data)
## spc_tbl_ [307,645 × 9] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ YEAR            : num [1:307645] 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 ...
##  $ MONTH           : num [1:307645] 1 1 1 1 1 1 1 1 1 1 ...
##  $ SUPPLIER        : chr [1:307645] "REPUBLIC NATIONAL DISTRIBUTING CO" "PWSWN INC" "RELIABLE CHURCHILL LLLP" "LANTERNA DISTRIBUTORS INC" ...
##  $ ITEM CODE       : chr [1:307645] "100009" "100024" "1001" "100145" ...
##  $ ITEM DESCRIPTION: chr [1:307645] "BOOTLEG RED - 750ML" "MOMENT DE PLAISIR - 750ML" "S SMITH ORGANIC PEAR CIDER - 18.7OZ" "SCHLINK HAUS KABINETT - 750ML" ...
##  $ ITEM TYPE       : chr [1:307645] "WINE" "WINE" "BEER" "WINE" ...
##  $ RETAIL SALES    : num [1:307645] 0 0 0 0 0.82 2.76 0.08 0 6.41 0.33 ...
##  $ RETAIL TRANSFERS: num [1:307645] 0 1 0 0 0 0 1 0 4 1 ...
##  $ WAREHOUSE SALES : num [1:307645] 2 4 1 1 0 6 1 2 0 2 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   YEAR = col_double(),
##   ..   MONTH = col_double(),
##   ..   SUPPLIER = col_character(),
##   ..   `ITEM CODE` = col_character(),
##   ..   `ITEM DESCRIPTION` = col_character(),
##   ..   `ITEM TYPE` = col_character(),
##   ..   `RETAIL SALES` = col_double(),
##   ..   `RETAIL TRANSFERS` = col_double(),
##   ..   `WAREHOUSE SALES` = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>
head(data)
## # A tibble: 6 × 9
##    YEAR MONTH SUPPLIER `ITEM CODE` `ITEM DESCRIPTION` `ITEM TYPE` `RETAIL SALES`
##   <dbl> <dbl> <chr>    <chr>       <chr>              <chr>                <dbl>
## 1  2020     1 REPUBLI… 100009      BOOTLEG RED - 750… WINE                  0   
## 2  2020     1 PWSWN I… 100024      MOMENT DE PLAISIR… WINE                  0   
## 3  2020     1 RELIABL… 1001        S SMITH ORGANIC P… BEER                  0   
## 4  2020     1 LANTERN… 100145      SCHLINK HAUS KABI… WINE                  0   
## 5  2020     1 DIONYSO… 100293      SANTORINI GAVALA … WINE                  0.82
## 6  2020     1 KYSELA … 100641      CORTENOVA VENETO … WINE                  2.76
## # ℹ 2 more variables: `RETAIL TRANSFERS` <dbl>, `WAREHOUSE SALES` <dbl>
names(data) <- gsub("[(). \\-]", "_", names(data))  
names(data) <- gsub("_$", "", names(data))          
names(data) <- tolower(names(data))                
head(data)
## # A tibble: 6 × 9
##    year month supplier         item_code item_description item_type retail_sales
##   <dbl> <dbl> <chr>            <chr>     <chr>            <chr>            <dbl>
## 1  2020     1 REPUBLIC NATION… 100009    BOOTLEG RED - 7… WINE              0   
## 2  2020     1 PWSWN INC        100024    MOMENT DE PLAIS… WINE              0   
## 3  2020     1 RELIABLE CHURCH… 1001      S SMITH ORGANIC… BEER              0   
## 4  2020     1 LANTERNA DISTRI… 100145    SCHLINK HAUS KA… WINE              0   
## 5  2020     1 DIONYSOS IMPORT… 100293    SANTORINI GAVAL… WINE              0.82
## 6  2020     1 KYSELA PERE ET … 100641    CORTENOVA VENET… WINE              2.76
## # ℹ 2 more variables: retail_transfers <dbl>, warehouse_sales <dbl>
supplier_avgs <- data |>
  group_by(supplier) |>
  summarize(
    avg_retail_sales = mean(retail_sales, na.rm = TRUE),
    avg_retail_transfers = mean(retail_transfers, na.rm = TRUE),
    avg_warehouse_sales = mean(warehouse_sales, na.rm = TRUE)
  )

head(supplier_avgs)
## # A tibble: 6 × 4
##   supplier             avg_retail_sales avg_retail_transfers avg_warehouse_sales
##   <chr>                           <dbl>                <dbl>               <dbl>
## 1 8 VINI INC                      0.281              0.222                 0.111
## 2 A HARDY USA LTD                 0.14               0                     0    
## 3 A I G WINE & SPIRITS            0.186              0.0693                2.77 
## 4 A VINTNERS SELECTIO…            0.949              0.824                 3.53 
## 5 A&E INC                         0.274              0.00190               0    
## 6 A&W BORDERS LLC                 0.186              0                     0
supplier_balanced <- supplier_avgs |>
  mutate(
    spread = max(avg_retail_sales, avg_retail_transfers, avg_warehouse_sales) -
             min(avg_retail_sales, avg_retail_transfers, avg_warehouse_sales)
  ) |>
  arrange(spread)

head(supplier_balanced)
## # A tibble: 6 × 5
##   supplier      avg_retail_sales avg_retail_transfers avg_warehouse_sales spread
##   <chr>                    <dbl>                <dbl>               <dbl>  <dbl>
## 1 8 VINI INC               0.281              0.222                 0.111  4893.
## 2 A HARDY USA …            0.14               0                     0      4893.
## 3 A I G WINE &…            0.186              0.0693                2.77   4893.
## 4 A VINTNERS S…            0.949              0.824                 3.53   4893.
## 5 A&E INC                  0.274              0.00190               0      4893.
## 6 A&W BORDERS …            0.186              0                     0      4893.
top_suppliers <- supplier_balanced |>
  slice(1:8)

ggplot(top_suppliers, aes(x = reorder(supplier, spread), y = spread)) +
  geom_segment(aes(xend = reorder(supplier, spread), y = 0, yend = spread),
               color = "skyblue", linewidth = 2) +
  geom_point(color = "green", size = 4) +
  coord_flip() +
  labs(
    title = "Top 8 Most Balanced Suppliers",
    x = "Supplier",
    y = "Spread Amount"
  )

Reflection

The conclusion I came to is that some suppliers showed balanced sales between their retail stores and their warehouses, while others relied more heavily on one type of income. This is important information to take note of as balanced suppliers may before more consistently. More research into how external factors like product type, season, or region impact sales could allow for more information.