Project Overview

Supply chain disruptions caused by the COVID-19 pandemic pose a threat to the stability of ABC’s market leadership in the personal computer industry. Exploration and analysis by Prestige Worldwide Consulting can help ABC identify problems in the supply chain, anticipate and mitigate liabilities, optimize manufacturing and shipping throughput, and increase ABC’s resiliency to supply chain volatility in the uncertain post-COVID world.

Objectives

  • Clean data
  • Visualize data distributions
  • Explore relationships between variables
  • Impute more accurate values to contaminated data

Table of contents

Creation of Variables

To begin, we calculated the Quarter and Year of each Receipt Date in order to facilitate further data analysis.

We then calculated manufacturing lead time as the elapsed time between the date the purchase order was downloaded at the ODM site and the date the product was shipped from the site. We also calculated in-transit lead time as the elapsed time between the date the product was shipped from the original design manufacturers (ODM) site and the date it was received at the warehouse. Both values are given in days. Graphical exploration of these calculated values revealed impossible negative values as well as some suspiciously high values that we addressed later in the analysis.

Cleaning the Data

Missing Values

With these additional variables created, we then inspected the data for missingness.

## [1] 145
## [1] 1.589215

Because these observations are missing ship and receipt dates, we were unable to calculate lead times for these observations. The number of rows missing all of these values is 145, representing 1.6% of the entire 9124-observation data set, so we felt comfortable excluding them from our analysis leaving us with 8979 observations available for further analysis.

Impossible and Suspicious Values

We then turned our attention to the impossible and suspicious manufacturing and in-transit lead times contaminating the data. After identifying and excluding impossible values less than zero, we noticed the distribution of manufacturing lead times was right-skewed, so we applied a cube-root transformation to normalize the data before identifying and excluding suspicious outliers. The total of negative and unusual values was 348. Removing these along with the original 145 missing observations left us with 8631 out of 9124 observations, representing 94.6% of the original data set. We then visualized the distribution of acceptable manufacturing lead times, annotated with the mode, median, and mean.

We applied the same approach as above to the in-transit lead times. Due to the selection of a logarithmic transformation to normalize, we first excluded all values less than or equal to zero, then excluded outliers after applying the transformation. Removing 118 variables beyond the originally-removed 145 missing values left us with 8861 out of 9124 observations, representing 97.1% of the original data set. We then visualized the distribution of acceptable in-transit lead times, annotated with the mode, median, and mean.

Note: In this case, mode and median were equal to each other.

Data Exploration

With the data cleaned, we estimated lead times according to each unique combination of Product (LOB), Origin, and Quarter (as well as Shipping Mode, for in-transit lead times) in order to more accurately reflect the unique circumstances and challenges facing each piece of the supply chain.

## , ,  = Q1
## 
##            
##             Site A Site B Site C Site D
##   Product A   1428      0      0      0
##   Product B    193      0    754   1301
##   Product C     82      0      0      0
## 
## , ,  = Q2
## 
##            
##             Site A Site B Site C Site D
##   Product A   1116      0      0      0
##   Product B     56    661    690    752
##   Product C    156      0      0      0
## 
## , ,  = Q3
## 
##            
##             Site A Site B Site C Site D
##   Product A    243      0      0      0
##   Product B      0    819    122    245
##   Product C     13      0      0      0
## # A tibble: 16 × 4
## # Groups:   LOB, Origin [6]
##    LOB       Origin Quarter mean_value
##    <chr>     <chr>  <chr>        <dbl>
##  1 Product A Site A Q1            6.65
##  2 Product A Site A Q2            9.97
##  3 Product A Site A Q3            4.35
##  4 Product B Site A Q1           10.4 
##  5 Product B Site A Q2            2.12
##  6 Product B Site B Q2           12.5 
##  7 Product B Site B Q3           12.5 
##  8 Product B Site C Q1            6.95
##  9 Product B Site C Q2            4.51
## 10 Product B Site C Q3            4.11
## 11 Product B Site D Q1            6.89
## 12 Product B Site D Q2            5.93
## 13 Product B Site D Q3            4.19
## 14 Product C Site A Q1           13.1 
## 15 Product C Site A Q2           11.8 
## 16 Product C Site A Q3            7.46
## # A tibble: 6 × 3
## # Groups:   LOB [3]
##   LOB       Origin mean_value
##   <chr>     <chr>       <dbl>
## 1 Product A Site A       7.78
## 2 Product B Site A       8.52
## 3 Product B Site B      12.5 
## 4 Product B Site C       5.65
## 5 Product B Site D       6.29
## 6 Product C Site A      12.0
## # A tibble: 31 × 5
## # Groups:   LOB, Origin, Ship Mode [11]
##    LOB       Origin `Ship Mode` Quarter mean_value
##    <chr>     <chr>  <chr>       <chr>        <dbl>
##  1 Product A Site A AIR         Q1            8.20
##  2 Product A Site A AIR         Q2            8.02
##  3 Product A Site A AIR         Q3            7.55
##  4 Product A Site A AIR         Q4           16.8 
##  5 Product A Site A FASTBOAT    Q1           23.9 
##  6 Product A Site A FASTBOAT    Q2           31.2 
##  7 Product A Site A FASTBOAT    Q3           21.9 
##  8 Product A Site A FASTBOAT    Q4           34   
##  9 Product A Site A OCEAN       Q1           29.6 
## 10 Product A Site A OCEAN       Q2           34.6 
## # ℹ 21 more rows

We then replaced irregular values with corrected values based on the average of the subgroup with the same characteristics as the irregular value. We have now replaced all out-of-range values with appropriate estimations, and our data set once again has 8979 rows, or 98.4%, of the original 9124 observations.

Correlations

We explored the relationships in the data by producing pairwise plots between each of the variables.