Abstract

This study jumps into the complexity of supply chain management, utilizing a complete dataset from DataCo Global, now Sword Venture. The analysis focuses on patterns and trends within the supply chain by examining sales volumes, profit ratios, and sales fluctuations between 2015 and 2018. Using R programming and different packages such as tidyverse and lubridate, the study navigates through data cleaning, exploratory analysis, and visualization to answer business questions. The dataset include over 180,000 rows across 53 columns, provides ground for detailed insights into product categories and sales trends. The findings aim to offer a practice of understanding of supply chain dynamics.

Introduction

In this project, I am curious about finding out what sells the most, what brings in the most money, and how sales change over time. Understanding this is important for businesses to make better decisions. I will be using numbers and trends from the data to figure out what is going on in supply chains and how they can be improved.

Intro to the data

The dataset I will be exploring was sourced from Kaggle., contributed by Fabian Constante from the Polytechnic Institute of Leiria’s Higher School of Technology and Management. Initially created by DataCo Global, now known as Sword Venture, this dataset provides a rich field for applying machine learning algorithms and R software. Sword Venture specializes in designing, implementing, and supporting practical technology and data solutions to drive informed business decisions.

Last updated four years ago.
The dataset spans from January 1, 2015, to January 31, 2018.
It is not confined to a specific geographic location.
It includes 180,519 rows and 53 columns
Has various data types: character, double, and logical.

We start with the \(glimpse\) function to inspect the dataset’s columns, rows, and data types. I chose to select only a few columns to provide a concise overview.

glimpse(supplyChain_md1)

## Rows: 180,519
## Columns: 7
## $ Product.Name       <chr> "Smart watch ", "Smart watch ", "Smart watch ", "Sm…
## $ Category.Name      <chr> "Sporting Goods", "Sporting Goods", "Sporting Goods…
## $ Sales.per.customer <dbl> 314.64, 311.36, 309.72, 304.81, 298.25, 294.98, 288…
## $ Delivery.Status    <chr> "Advance shipping", "Late delivery", "Shipping on t…
## $ Customer.Country   <chr> "Puerto Rico", "Puerto Rico", "EE. UU.", "EE. UU.",…
## $ Market             <chr> "Pacific Asia", "Pacific Asia", "Pacific Asia", "Pa…
## $ Order.Item.Total   <dbl> 314.64, 311.36, 309.72, 304.81, 298.25, 294.98, 288…

Data Cleaning Process

Modifying Data Types

Prior to using the dataset, it’s important to perform data cleaning.
The column “order date (Date Orders)”, initially recognized as “character” data, will be converted to the “date” format with the following structure:

supplyChain_md$OrderDate <- 
  as.Date(supplyChain_md$order.date..DateOrders., format="%m/%d/%Y %H:%M")

Having completed the preparation phase, we’re now ready to proceed with addressing the business questions and delving into the analysis.

Business Questions

Which product category registers the highest sales volumes?

(Sales Volume by Product Category)

Which product category has the highest average profit ratios?

(Average Profit Ratios)

During the dataset’s timeframe, when did the most significant increases and decreases in sales occur?

(Sales Trends Over Time)

Analysis

Sales Volume by Product Category:

I created a new table under sales_cat_products with with Category Name, Product Price, and Order Item Quantity to make it easier to work with. In the dataset, there are a total of 118 distinct products, so chose to stick with the top 10 products.

Group sales_cat_products by Category.Name.
Summarize to find total quantity sold per category.
Sort categories in descending order by quantity sold.
Select the top 10 categories based on quantity sold.

top_categories <- sales_cat_products %>%
  group_by(Category.Name) %>%
  summarise(Total.Quantity.Sold = sum(Order.Item.Quantity)) %>%
  arrange(desc(Total.Quantity.Sold)) %>%
  slice_head(n = 10)

Group the sales data by each category
Calculate the total quantity sold.

category_summary <- sales_cat_products %>%
  group_by(Category.Name) %>%
  summarise(Total.Quantity.Sold = sum(Order.Item.Quantity))

Filter to find the most and the least sold category.

most_sold_category <- category_summary %>% 
  filter(Total.Quantity.Sold == max(Total.Quantity.Sold))
least_sold_category <- category_summary %>% 
  filter(Total.Quantity.Sold == min(Total.Quantity.Sold))

Create bar chart of top_categories.
- X-axis: categories, ordered by sales.
- Y-axis: total quantity sold.
- Flip axes for readability.
- Apply minimal theme.
- Add chart labels.

bar_top_cat <- ggplot(top_categories, aes(x = reorder(Category.Name, Total.Quantity.Sold), y = Total.Quantity.Sold)) +
  geom_bar(stat = "identity", fill="lightblue") +
  coord_flip() + 
  theme_minimal() + 
  labs(title = "Top 10 Categories by Total Quantity Sold", 
       x = "Category", 
       y = "Total Quantity Sold")

Average Profit Ratios:

Add Profit.Ratio column.

category_profit_ratio <- supplyChain_md %>%
  mutate(Profit.Ratio = (Order.Profit.Per.Order / Order.Item.Total) * 100)

Group category_profit_ratio by Category.Name.
Calculate average Profit.Ratio per category.

category_avg_profit_ratio <- category_profit_ratio %>%
  group_by(Category.Name) %>%
  summarize(Avg.Profit.Ratio = mean(Profit.Ratio, na.rm = TRUE))

Sorting the categories by their average profit ratio in descending order (from highest to lowest).
Picking the top 10 categories with the highest average profit ratios.

top_10_categories <- category_avg_profit_ratio %>%
  arrange(desc(Avg.Profit.Ratio)) %>%
  top_n(10, Avg.Profit.Ratio)

Making a bar chart for top 10 categories.
Categories on x-axis, average profit ratio on y-axis.
Bars colored from grey to blue based on profit ratio.
Flipping the chart for easy reading.
Simple design with a clear title and labeled axes.

linear_bar_chart <- ggplot(top_10_categories, aes(x = reorder(Category.Name, Avg.Profit.Ratio), y = Avg.Profit.Ratio)) +
  geom_bar(stat = "identity", aes(fill = Avg.Profit.Ratio), 
  show.legend = FALSE) +
  coord_flip() + 
  scale_fill_gradient(low = "grey", high = "blue") + 
  theme_minimal() +
  labs(title = "Top 10 Categories by Average Profit Ratio", 
       x = "Category", 
       y = "Average Profit Ratio")

Sales Trends Over Time

Grouping data by month: Each order date is rounded down to the first of the month.
Calculating total sales: Adding up the prices of all items sold in each month.

sales_over_time <- supplyChain_md %>%
  group_by(OrderDate = floor_date(OrderDate, "month")) %>%
  summarise(TotalSales = sum(Order.Item.Product.Price))

Calculate changes over time

#sales_over_time <- sales_over_time %>%
#  mutate(ChangeInSales = TotalSales - lag(TotalSales))

Identify the most outstanding increases and decreases

most_significant_increase <- sales_over_time %>%
  filter(TotalSales == max(TotalSales, na.rm = TRUE))

most_significant_decrease <- sales_over_time %>%
  filter(TotalSales == min(TotalSales, na.rm = TRUE))

Create values for Inline R Code in text.

decrease_date <- most_significant_decrease[1, 1]
decrease_value <- sprintf("%.2f", most_significant_decrease[1, 2])

increase_date <- most_significant_increase[1, 1]
increase_value <- sprintf("%.2f", most_significant_increase[1, 2])

Make a line chart of OrderDate and ChangeInSales.
Add points to the data of the biggest increase and decrease.

point_sales_cat <- 
  ggplot(sales_over_time, aes(x = OrderDate, y = TotalSales)) +
  geom_line() +
  geom_point(data = most_significant_increase, color = "blue", size = 3) +
  geom_point(data = most_significant_decrease, color = "red", size = 3) +
  labs(title = "Changes in Sales Over Time", 
       x = "Date", 
       y = "Change in Sales") +
  scale_y_continuous(labels = scales::comma)

Results

Which product category registers the highest sales volumes?

print(bar_top_cat)

Which product category has the highest average profit ratios?

Profit ratio is a financial metric that compares profit to sales and is typically expressed as a percentage.

print(linear_bar_chart)

When did the most significant increases and decreases in sales occur?

print(point_sales_cat)

The most significant increase was in 2017-10-01 with 1059476.44, and the most significant decrease was in 2018-01-01 with 331650.12.

Conclusion

The analysis of the DataCo Global supply chain dataset has revealed insights into the dynamics of product sales and profitability within the supply chain. My key findings include identifying product categories with the highest and lowest sales volumes, as well as those with the most favorable profit ratios. The study also highlighted periods of notable sales increases and decreases, offering a perspective on supply chain dynamics. The insights was created from a middle sized dataset and with analytical processes, we can see the importance of data-driven approaches in understanding and optimizing operations. The findings can in some cases enable businesses to fine-tune their operations for increased efficiency and profitability.

Supply Chain Analysis

Oliver Tenmark | ANCL 320

2023-12-17

Abstract

Introduction

Intro to the data

Data Cleaning Process

Modifying Data Types

Business Questions

Which product category registers the highest sales volumes?

Which product category has the highest average profit ratios?

During the dataset’s timeframe, when did the most significant increases and decreases in sales occur?

Analysis

Sales Volume by Product Category:

Average Profit Ratios:

Sales Trends Over Time

Results

Which product category registers the highest sales volumes?

Which product category has the highest average profit ratios?

When did the most significant increases and decreases in sales occur?

Conclusion