Supply Chain Analysis

Supply chain analytics is a valuable part of data-driven decision-making in various industries such as manufacturing, retail, healthcare, and logistics. It is the process of collecting, analyzing and interpreting data related to the movement of products and services from suppliers to customers.

Here is a dataset we collected from a Fashion and Beauty startup. The dataset is based on the supply chain of Makeup products. We will try to analyse the product sales from the dataset and find insight that can be use for further analysisi. Business questions that we want to answers are:

  1. Which product has generated the most revenue and numbers of producst sold?
  2. Customer demographics
  3. Which transportation modes are mostly use for delivering the products?

Data input and Package Installation

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
supply <- read.csv("Data Input/supply_chain_data.csv", stringsAsFactors = T)
glimpse(supply)
## Rows: 100
## Columns: 24
## $ Product.type            <fct> haircare, skincare, haircare, skincare, skinca…
## $ SKU                     <fct> SKU0, SKU1, SKU2, SKU3, SKU4, SKU5, SKU6, SKU7…
## $ Price                   <dbl> 69.808006, 14.843523, 11.319683, 61.163343, 4.…
## $ Availability            <int> 55, 95, 34, 68, 26, 87, 48, 59, 78, 35, 11, 95…
## $ Number.of.products.sold <int> 802, 736, 8, 83, 871, 147, 65, 426, 150, 980, …
## $ Revenue.generated       <dbl> 8661.997, 7460.900, 9577.750, 7766.836, 2686.5…
## $ Customer.demographics   <fct> Non-binary, Female, Unknown, Non-binary, Non-b…
## $ Stock.levels            <int> 58, 53, 1, 23, 5, 90, 11, 93, 5, 14, 51, 46, 1…
## $ Lead.times              <int> 7, 30, 10, 13, 3, 27, 15, 17, 10, 27, 13, 23, …
## $ Order.quantities        <int> 96, 37, 88, 59, 56, 66, 58, 11, 15, 83, 80, 60…
## $ Shipping.times          <int> 4, 2, 2, 6, 8, 3, 8, 1, 7, 1, 2, 1, 4, 9, 5, 7…
## $ Shipping.carriers       <fct> Carrier B, Carrier A, Carrier B, Carrier C, Ca…
## $ Shipping.costs          <dbl> 2.956572, 9.716575, 8.054479, 1.729569, 3.8905…
## $ Supplier.name           <fct> Supplier 3, Supplier 3, Supplier 1, Supplier 5…
## $ Location                <fct> Mumbai, Mumbai, Mumbai, Kolkata, Delhi, Bangal…
## $ Lead.time               <int> 29, 23, 12, 24, 5, 10, 14, 22, 13, 29, 18, 28,…
## $ Production.volumes      <int> 215, 517, 971, 937, 414, 104, 314, 564, 769, 9…
## $ Manufacturing.lead.time <int> 29, 30, 27, 18, 3, 17, 24, 1, 8, 23, 5, 11, 3,…
## $ Manufacturing.costs     <dbl> 46.279879, 33.616769, 30.688019, 35.624741, 92…
## $ Inspection.results      <fct> Pending, Pending, Pending, Fail, Fail, Fail, P…
## $ Defect.rates            <dbl> 0.22641036, 4.85406803, 4.58059262, 4.74664862…
## $ Transportation.modes    <fct> Road, Road, Air, Rail, Air, Road, Sea, Road, S…
## $ Routes                  <fct> Route B, Route B, Route C, Route A, Route A, R…
## $ Costs                   <dbl> 187.7521, 503.0656, 141.9203, 254.7762, 923.44…

The first thing we check if there is missing value in our data.

anyNA(supply)
## [1] FALSE

We already take a look of our data with glimpse function, and the result are :

  1. Number of columns : 24

  2. Number of rows : 100

  3. There is no missing value.

Our data is ready to be use, and now we move to the next part.

Data Exploration

Questions 1. Which product has generated the most revenue and numbers of producst sold?

most_revenue <-  supply %>% 
  group_by(Product.type) %>% 
  summarise(sum_rev = sum(Revenue.generated )) %>% 
  arrange(-sum_rev)
  

most_revenue
## # A tibble: 3 × 2
##   Product.type sum_rev
##   <fct>          <dbl>
## 1 skincare     241628.
## 2 haircare     174455.
## 3 cosmetics    161521.
most_sold <-  supply %>% 
  group_by(Product.type) %>% 
  summarise(sum_sold = sum(Number.of.products.sold)) %>% 
  arrange(-sum_sold)
  

most_sold
## # A tibble: 3 × 2
##   Product.type sum_sold
##   <fct>           <int>
## 1 skincare        20731
## 2 haircare        13611
## 3 cosmetics       11757

Lets change into percentage so that we can see the comparison better.

rev_percent <- most_revenue %>%
  group_by(Product.type) %>%
  summarise(revenue = sum(sum_rev) / sum(supply$Revenue.generated) * 100) %>% 
  mutate(revenue = round(revenue, 2)) %>% 
  arrange(-revenue)

rev_percent
## # A tibble: 3 × 2
##   Product.type revenue
##   <fct>          <dbl>
## 1 skincare        41.8
## 2 haircare        30.2
## 3 cosmetics       28.0
sold_percent <- most_sold %>%
  group_by(Product.type) %>%
  summarise(product.sold = sum(sum_sold) / sum(supply$Number.of.products.sold) * 100) %>% 
  mutate(product.sold = round(product.sold, 2)) %>% 
  arrange(-product.sold)

sold_percent
## # A tibble: 3 × 2
##   Product.type product.sold
##   <fct>               <dbl>
## 1 skincare             45.0
## 2 haircare             29.5
## 3 cosmetics            25.5

Based on our analysis, product type with most revenue generated and product sold is skincare.

Questions 2. Customer demographics

customer_gender<-supply %>% 
  select(Customer.demographics) %>% 
  group_by(Customer.demographics) %>% 
  summarise(TotalCust=n()) %>% 
  arrange(desc(TotalCust))

customer_gender
## # A tibble: 4 × 2
##   Customer.demographics TotalCust
##   <fct>                     <int>
## 1 Unknown                      31
## 2 Female                       25
## 3 Non-binary                   23
## 4 Male                         21
customer_location<-supply %>% 
  select(Location) %>% 
  group_by(Location) %>% 
  summarise(Loc=n()) %>% 
  arrange(desc(Loc)) 

customer_location
## # A tibble: 5 × 2
##   Location    Loc
##   <fct>     <int>
## 1 Kolkata      25
## 2 Mumbai       22
## 3 Chennai      20
## 4 Bangalore    18
## 5 Delhi        15

Based on the result, we found out that most of our customers considered themselves as “unknown”. Maybe that could give us some insight that in todays market, especially for Fashion and Make up products, genders are no longer relevant. Our customers doesnt want to be labeled as male or female.

Regarding the location, Kolkata is city with the highest customer population in our data. This result can give valuable information regarding market expansion or promotion.

Questions 3. Which transportation modes are mostly use for delivering the products??

head(supply)
##   Product.type  SKU     Price Availability Number.of.products.sold
## 1     haircare SKU0 69.808006           55                     802
## 2     skincare SKU1 14.843523           95                     736
## 3     haircare SKU2 11.319683           34                       8
## 4     skincare SKU3 61.163343           68                      83
## 5     skincare SKU4  4.805496           26                     871
## 6     haircare SKU5  1.699976           87                     147
##   Revenue.generated Customer.demographics Stock.levels Lead.times
## 1          8661.997            Non-binary           58          7
## 2          7460.900                Female           53         30
## 3          9577.750               Unknown            1         10
## 4          7766.836            Non-binary           23         13
## 5          2686.505            Non-binary            5          3
## 6          2828.349            Non-binary           90         27
##   Order.quantities Shipping.times Shipping.carriers Shipping.costs
## 1               96              4         Carrier B       2.956572
## 2               37              2         Carrier A       9.716575
## 3               88              2         Carrier B       8.054479
## 4               59              6         Carrier C       1.729569
## 5               56              8         Carrier A       3.890548
## 6               66              3         Carrier B       4.444099
##   Supplier.name  Location Lead.time Production.volumes Manufacturing.lead.time
## 1    Supplier 3    Mumbai        29                215                      29
## 2    Supplier 3    Mumbai        23                517                      30
## 3    Supplier 1    Mumbai        12                971                      27
## 4    Supplier 5   Kolkata        24                937                      18
## 5    Supplier 1     Delhi         5                414                       3
## 6    Supplier 4 Bangalore        10                104                      17
##   Manufacturing.costs Inspection.results Defect.rates Transportation.modes
## 1            46.27988            Pending    0.2264104                 Road
## 2            33.61677            Pending    4.8540680                 Road
## 3            30.68802            Pending    4.5805926                  Air
## 4            35.62474               Fail    4.7466486                 Rail
## 5            92.06516               Fail    3.1455795                  Air
## 6            56.76648               Fail    2.7791935                 Road
##    Routes    Costs
## 1 Route B 187.7521
## 2 Route B 503.0656
## 3 Route C 141.9203
## 4 Route A 254.7762
## 5 Route A 923.4406
## 6 Route A 235.4612
cost_delivery <- supply %>% 
group_by(Shipping.carriers) %>% 
summarise(total_shipping = n(),
          avg_cost= mean(Costs)) %>% 
arrange(desc(total_shipping))

cost_delivery
## # A tibble: 3 × 3
##   Shipping.carriers total_shipping avg_cost
##   <fct>                      <int>    <dbl>
## 1 Carrier B                     43     528.
## 2 Carrier C                     29     561.
## 3 Carrier A                     28     497.

We can see that our product mostly shipped by Courrier B with 43 shipping. However,there are some interesting insight from the cost perspective. We found out that Carrier B is not the highet average cost carriers, Carrier C outnumbered them in cost area. With this insight, maybe we can give some recommendatios to look for cheaper options othen than Carrier C to lower the shipping cost.

All questions has been answered already. I hope this project can give helpful insight and information regarding simpe exploratory data analysis and the study case aplication to help solving business questions.