Supply Chain Analysis
Supply chain analytics is a valuable part of data-driven decision-making in various industries such as manufacturing, retail, healthcare, and logistics. It is the process of collecting, analyzing and interpreting data related to the movement of products and services from suppliers to customers.
Here is a dataset we collected from a Fashion and Beauty startup. The dataset is based on the supply chain of Makeup products. We will try to analyse the product sales from the dataset and find insight that can be use for further analysisi. Business questions that we want to answers are:
Data input and Package Installation
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
supply <- read.csv("Data Input/supply_chain_data.csv", stringsAsFactors = T)
glimpse(supply)
## Rows: 100
## Columns: 24
## $ Product.type <fct> haircare, skincare, haircare, skincare, skinca…
## $ SKU <fct> SKU0, SKU1, SKU2, SKU3, SKU4, SKU5, SKU6, SKU7…
## $ Price <dbl> 69.808006, 14.843523, 11.319683, 61.163343, 4.…
## $ Availability <int> 55, 95, 34, 68, 26, 87, 48, 59, 78, 35, 11, 95…
## $ Number.of.products.sold <int> 802, 736, 8, 83, 871, 147, 65, 426, 150, 980, …
## $ Revenue.generated <dbl> 8661.997, 7460.900, 9577.750, 7766.836, 2686.5…
## $ Customer.demographics <fct> Non-binary, Female, Unknown, Non-binary, Non-b…
## $ Stock.levels <int> 58, 53, 1, 23, 5, 90, 11, 93, 5, 14, 51, 46, 1…
## $ Lead.times <int> 7, 30, 10, 13, 3, 27, 15, 17, 10, 27, 13, 23, …
## $ Order.quantities <int> 96, 37, 88, 59, 56, 66, 58, 11, 15, 83, 80, 60…
## $ Shipping.times <int> 4, 2, 2, 6, 8, 3, 8, 1, 7, 1, 2, 1, 4, 9, 5, 7…
## $ Shipping.carriers <fct> Carrier B, Carrier A, Carrier B, Carrier C, Ca…
## $ Shipping.costs <dbl> 2.956572, 9.716575, 8.054479, 1.729569, 3.8905…
## $ Supplier.name <fct> Supplier 3, Supplier 3, Supplier 1, Supplier 5…
## $ Location <fct> Mumbai, Mumbai, Mumbai, Kolkata, Delhi, Bangal…
## $ Lead.time <int> 29, 23, 12, 24, 5, 10, 14, 22, 13, 29, 18, 28,…
## $ Production.volumes <int> 215, 517, 971, 937, 414, 104, 314, 564, 769, 9…
## $ Manufacturing.lead.time <int> 29, 30, 27, 18, 3, 17, 24, 1, 8, 23, 5, 11, 3,…
## $ Manufacturing.costs <dbl> 46.279879, 33.616769, 30.688019, 35.624741, 92…
## $ Inspection.results <fct> Pending, Pending, Pending, Fail, Fail, Fail, P…
## $ Defect.rates <dbl> 0.22641036, 4.85406803, 4.58059262, 4.74664862…
## $ Transportation.modes <fct> Road, Road, Air, Rail, Air, Road, Sea, Road, S…
## $ Routes <fct> Route B, Route B, Route C, Route A, Route A, R…
## $ Costs <dbl> 187.7521, 503.0656, 141.9203, 254.7762, 923.44…
The first thing we check if there is missing value in our data.
anyNA(supply)
## [1] FALSE
We already take a look of our data with glimpse function, and the result are :
Number of columns : 24
Number of rows : 100
There is no missing value.
Our data is ready to be use, and now we move to the next part.
Data Exploration
Questions 1. Which product has generated the most revenue and numbers of producst sold?
most_revenue <- supply %>%
group_by(Product.type) %>%
summarise(sum_rev = sum(Revenue.generated )) %>%
arrange(-sum_rev)
most_revenue
## # A tibble: 3 × 2
## Product.type sum_rev
## <fct> <dbl>
## 1 skincare 241628.
## 2 haircare 174455.
## 3 cosmetics 161521.
most_sold <- supply %>%
group_by(Product.type) %>%
summarise(sum_sold = sum(Number.of.products.sold)) %>%
arrange(-sum_sold)
most_sold
## # A tibble: 3 × 2
## Product.type sum_sold
## <fct> <int>
## 1 skincare 20731
## 2 haircare 13611
## 3 cosmetics 11757
Lets change into percentage so that we can see the comparison better.
rev_percent <- most_revenue %>%
group_by(Product.type) %>%
summarise(revenue = sum(sum_rev) / sum(supply$Revenue.generated) * 100) %>%
mutate(revenue = round(revenue, 2)) %>%
arrange(-revenue)
rev_percent
## # A tibble: 3 × 2
## Product.type revenue
## <fct> <dbl>
## 1 skincare 41.8
## 2 haircare 30.2
## 3 cosmetics 28.0
sold_percent <- most_sold %>%
group_by(Product.type) %>%
summarise(product.sold = sum(sum_sold) / sum(supply$Number.of.products.sold) * 100) %>%
mutate(product.sold = round(product.sold, 2)) %>%
arrange(-product.sold)
sold_percent
## # A tibble: 3 × 2
## Product.type product.sold
## <fct> <dbl>
## 1 skincare 45.0
## 2 haircare 29.5
## 3 cosmetics 25.5
Based on our analysis, product type with most revenue generated and product sold is skincare.
Questions 2. Customer demographics
customer_gender<-supply %>%
select(Customer.demographics) %>%
group_by(Customer.demographics) %>%
summarise(TotalCust=n()) %>%
arrange(desc(TotalCust))
customer_gender
## # A tibble: 4 × 2
## Customer.demographics TotalCust
## <fct> <int>
## 1 Unknown 31
## 2 Female 25
## 3 Non-binary 23
## 4 Male 21
customer_location<-supply %>%
select(Location) %>%
group_by(Location) %>%
summarise(Loc=n()) %>%
arrange(desc(Loc))
customer_location
## # A tibble: 5 × 2
## Location Loc
## <fct> <int>
## 1 Kolkata 25
## 2 Mumbai 22
## 3 Chennai 20
## 4 Bangalore 18
## 5 Delhi 15
Based on the result, we found out that most of our customers considered themselves as “unknown”. Maybe that could give us some insight that in todays market, especially for Fashion and Make up products, genders are no longer relevant. Our customers doesnt want to be labeled as male or female.
Regarding the location, Kolkata is city with the highest customer population in our data. This result can give valuable information regarding market expansion or promotion.
Questions 3. Which transportation modes are mostly use for delivering the products??
head(supply)
## Product.type SKU Price Availability Number.of.products.sold
## 1 haircare SKU0 69.808006 55 802
## 2 skincare SKU1 14.843523 95 736
## 3 haircare SKU2 11.319683 34 8
## 4 skincare SKU3 61.163343 68 83
## 5 skincare SKU4 4.805496 26 871
## 6 haircare SKU5 1.699976 87 147
## Revenue.generated Customer.demographics Stock.levels Lead.times
## 1 8661.997 Non-binary 58 7
## 2 7460.900 Female 53 30
## 3 9577.750 Unknown 1 10
## 4 7766.836 Non-binary 23 13
## 5 2686.505 Non-binary 5 3
## 6 2828.349 Non-binary 90 27
## Order.quantities Shipping.times Shipping.carriers Shipping.costs
## 1 96 4 Carrier B 2.956572
## 2 37 2 Carrier A 9.716575
## 3 88 2 Carrier B 8.054479
## 4 59 6 Carrier C 1.729569
## 5 56 8 Carrier A 3.890548
## 6 66 3 Carrier B 4.444099
## Supplier.name Location Lead.time Production.volumes Manufacturing.lead.time
## 1 Supplier 3 Mumbai 29 215 29
## 2 Supplier 3 Mumbai 23 517 30
## 3 Supplier 1 Mumbai 12 971 27
## 4 Supplier 5 Kolkata 24 937 18
## 5 Supplier 1 Delhi 5 414 3
## 6 Supplier 4 Bangalore 10 104 17
## Manufacturing.costs Inspection.results Defect.rates Transportation.modes
## 1 46.27988 Pending 0.2264104 Road
## 2 33.61677 Pending 4.8540680 Road
## 3 30.68802 Pending 4.5805926 Air
## 4 35.62474 Fail 4.7466486 Rail
## 5 92.06516 Fail 3.1455795 Air
## 6 56.76648 Fail 2.7791935 Road
## Routes Costs
## 1 Route B 187.7521
## 2 Route B 503.0656
## 3 Route C 141.9203
## 4 Route A 254.7762
## 5 Route A 923.4406
## 6 Route A 235.4612
cost_delivery <- supply %>%
group_by(Shipping.carriers) %>%
summarise(total_shipping = n(),
avg_cost= mean(Costs)) %>%
arrange(desc(total_shipping))
cost_delivery
## # A tibble: 3 × 3
## Shipping.carriers total_shipping avg_cost
## <fct> <int> <dbl>
## 1 Carrier B 43 528.
## 2 Carrier C 29 561.
## 3 Carrier A 28 497.
We can see that our product mostly shipped by Courrier B with 43 shipping. However,there are some interesting insight from the cost perspective. We found out that Carrier B is not the highet average cost carriers, Carrier C outnumbered them in cost area. With this insight, maybe we can give some recommendatios to look for cheaper options othen than Carrier C to lower the shipping cost.
All questions has been answered already. I hope this project can give helpful insight and information regarding simpe exploratory data analysis and the study case aplication to help solving business questions.