ABC, a major technological company, is facing supply chain disruptions due to the COVID-19 pandemic. They need to analyze the relationship between ODM sites, ship modes, and logistics lead times to mitigate risks. ABC operates with multiple ODMs, shipping modes, and warehouses. The pandemic has caused uncertainty in lead times, leading to early or late deliveries. This makes it challenging for ABC to set accurate expectations for customers. The project focuses on data preparation and exploration to understand supply chain dynamics and make informed decisions. The goal is to reduce lead times and maintain a competitive edge. In this project, the focus is on data preparation and exploration to gain insights into the relationship between ODM sites, ship modes, and logistics lead times. The analysis aims to help ABC mitigate risks and make informed decisions regarding supply chain management.
Read Raw-Data and Calendar worksheets from
RawData.xlsx
## # A tibble: 9,124 × 6
## LOB Origin `Ship Mode` `PO Download Date` `Ship Date`
## <chr> <chr> <chr> <dttm> <dttm>
## 1 Product B Site C AIR 2020-07-25 00:00:00 2020-07-29 00:00:00
## 2 Product B Site B OCEAN 2020-06-20 00:00:00 2020-07-01 00:00:00
## 3 Product B Site C AIR 2020-01-25 00:00:00 2020-02-01 00:00:00
## 4 Product B Site C AIR 2020-07-18 00:00:00 2020-07-24 00:00:00
## 5 Product A Site A OCEAN 2020-05-01 00:00:00 2020-05-27 00:00:00
## 6 Product C Site A AIR 2020-01-11 00:00:00 2020-01-31 00:00:00
## 7 Product A Site A AIR 2020-07-16 00:00:00 2020-07-25 00:00:00
## 8 Product B Site D GROUND 2020-07-28 00:00:00 2020-07-31 00:00:00
## 9 Product B Site C AIR 2020-03-28 00:00:00 2020-04-07 00:00:00
## 10 Product B Site D GROUND 2020-01-24 00:00:00 2020-01-30 00:00:00
## # ℹ 9,114 more rows
## # ℹ 1 more variable: `Receipt Date` <dttm>
## # A tibble: 6 × 5
## S.No Start_Date End_date Quarter Year
## <dbl> <dttm> <dttm> <chr> <dbl>
## 1 1 2019-08-03 00:00:00 2019-11-01 00:00:00 Q3 2019
## 2 2 2019-11-02 00:00:00 2020-01-31 00:00:00 Q4 2019
## 3 3 2020-02-01 00:00:00 2020-05-01 00:00:00 Q1 2020
## 4 4 2020-05-02 00:00:00 2020-07-31 00:00:00 Q2 2020
## 5 5 2020-08-01 00:00:00 2020-10-30 00:00:00 Q3 2020
## 6 6 2020-10-31 00:00:00 2021-01-29 00:00:00 Q4 2020
Adding Quarter and Year columns to the
Raw-Data sheet by matching Receipt Date that
in the date ranges of Calendar sheet
## # A tibble: 6 × 8
## LOB Origin `Ship Mode` `PO Download Date` `Ship Date` `Receipt Date` Quarter
## <chr> <chr> <chr> <date> <date> <date> <chr>
## 1 Prod… Site C AIR 2020-07-25 2020-07-29 2020-08-05 Q3
## 2 Prod… Site B OCEAN 2020-06-20 2020-07-01 2020-07-28 Q2
## 3 Prod… Site C AIR 2020-01-25 2020-02-01 2020-02-11 Q1
## 4 Prod… Site C AIR 2020-07-18 2020-07-24 2020-08-01 Q3
## 5 Prod… Site A OCEAN 2020-05-01 2020-05-27 2020-07-08 Q2
## 6 Prod… Site A AIR 2020-01-11 2020-01-31 2020-02-03 Q1
## # ℹ 1 more variable: Year <dbl>
Calculate In-transit Lead Time and
Manufacturing Lead Time for each row in the joined
dataset
## # A tibble: 6 × 10
## LOB Origin `Ship Mode` `PO Download Date` `Ship Date` `Receipt Date` Quarter
## <chr> <chr> <chr> <date> <date> <date> <chr>
## 1 Prod… Site C AIR 2020-07-25 2020-07-29 2020-08-05 Q3
## 2 Prod… Site B OCEAN 2020-06-20 2020-07-01 2020-07-28 Q2
## 3 Prod… Site C AIR 2020-01-25 2020-02-01 2020-02-11 Q1
## 4 Prod… Site C AIR 2020-07-18 2020-07-24 2020-08-01 Q3
## 5 Prod… Site A OCEAN 2020-05-01 2020-05-27 2020-07-08 Q2
## 6 Prod… Site A AIR 2020-01-11 2020-01-31 2020-02-03 Q1
## # ℹ 3 more variables: Year <dbl>, `In-transit Lead Time` <dbl>,
## # `Manufacturing Lead Time` <dbl>
As we can see from the data frame, there are some unusual values
(negative values, and over 100 days), and missing values of
In-transit Lead Time and
Manufacturing Lead Time
## [1] 7 27 10 8 42 3 6 5 9 19 23 12 4 17 50
## [16] 39 25 20 21 2 48 31 29 11 26 34 43 1 22 37
## [31] 32 30 24 28 33 15 57 NA 18 16 14 59 44 35 52
## [46] 40 38 41 13 36 0 55 45 47 58 54 46 53 71 56
## [61] -167 -190 -221 -225 -235 -136 -201 -116 -205 -121 -129 -182 -200 -219 -217
## [76] -109 -171 -159 -107 -120 -104 -142 -202 -150 -189 -187 -203 -164 -191 -208
## [91] -173 -186 -125 -108 -168 -147 -175 -197 -162 -115 -153 -113 -87 -138 -185
## [106] -124 -212 -84 -184 -137 -135 -117 -183 -148 -141 -132 -207 -111 -122 -130
## [121] -176 -211 -112 -152 -133 -174 -88 -94 -118 -218 49 64 85 86 51
## [136] 70 67 63
## [1] 4 11 7 6 26 20 9 3 10 2 5 12 15 8 16 25 18 24
## [19] 1 30 14 13 34 31 19 0 NA -19 -14 -20 -21 -16 -15 -13 -18 -6
## [37] -17 -12 -1 -28 -10 -4 -9 -24 -22 -26 -7 -11 -5 -2 -25 -8 -3 -23
## [55] -27 172 215 222 226 237 140 204 141 216 132 123 178 224 231 105 185 106
## [73] 113 111 234 201 213 197 189 203 227 179 187 119 175 143 198 159 117 155
## [91] 125 137 170 162 208 120 212 174 144 180 147 156 184 181 115 127 220 122
## [109] 176 195 217 202 135 173 200 110 138 116 32 23 22 17 29 50 39 27
## [127] 35 33 49 43 28 36 51 37 21 48 41 45
## [1] 145
## [1] 145
Cleaning data. Delete the NAs values and Impute new values for all other unusual values
## # A tibble: 6 × 10
## LOB Origin `Ship Mode` `PO Download Date` `Ship Date` `Receipt Date` Quarter
## <chr> <chr> <chr> <date> <date> <date> <chr>
## 1 Prod… Site C AIR 2020-07-25 2020-07-29 2020-08-05 Q3
## 2 Prod… Site B OCEAN 2020-06-20 2020-07-01 2020-07-28 Q2
## 3 Prod… Site C AIR 2020-01-25 2020-02-01 2020-02-11 Q1
## 4 Prod… Site C AIR 2020-07-18 2020-07-24 2020-08-01 Q3
## 5 Prod… Site A OCEAN 2020-05-01 2020-05-27 2020-07-08 Q2
## 6 Prod… Site A AIR 2020-01-11 2020-01-31 2020-02-03 Q1
## # ℹ 3 more variables: Year <dbl>, `In-transit Lead Time` <dbl>,
## # `Manufacturing Lead Time` <dbl>
Now, NAs values and unusual values are not included in my dataset
clean_raw_data
Exploring the columns In-transit Lead Time and
Manufacturing Lead Time
## # A tibble: 4 × 3
## `Ship Mode` avg_Intransit_lead_time avg_manufacturing_lead_time
## <chr> <dbl> <dbl>
## 1 OCEAN 29.9 12.1
## 2 FASTBOAT 26.0 9.45
## 3 AIR 8.39 5.87
## 4 GROUND 3.77 6.05
We can see the outliers of the In-transit Lead Time,
also shipped by OCEAN took the longest days
Calculate and report correlations between all variables.
## Ship Mode Quarter LOB Origin
## 0.508419525 0.269427191 0.009724228 -0.508065586
We can see the Ship Mode has the most relative to
In-transit Lead Time
In-transit Lead Time as the response and all other
variables as the predictors. Use the summary() function to print the
results##
## Call:
## lm(formula = `In-transit Lead Time` ~ ., data = num_raw_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -35.921 -3.885 -1.823 3.870 50.079
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.03731 0.28818 24.42 <2e-16 ***
## LOB 7.74209 0.18987 40.78 <2e-16 ***
## Origin -8.17025 0.07637 -106.98 <2e-16 ***
## `Ship Mode` 5.43146 0.06466 84.00 <2e-16 ***
## Quarter -0.15587 0.10602 -1.47 0.142
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.859 on 8974 degrees of freedom
## Multiple R-squared: 0.6936, Adjusted R-squared: 0.6934
## F-statistic: 5078 on 4 and 8974 DF, p-value: < 2.2e-16
There is a relationship between predictors and the response variable
by a p-value < 0.05. Thus, LOB, Origin, and
Ship Mode have a statistically significant relationship
with In-transit Lead Time