Details

3PL- Data to Dollars Central NJ Data Science MEETUP.

Background of the Project

3PL Dataset Aalysis for Warehouse Management sytem.

Data setwd(“H:/Datascience”)

We received three files for Analysis

list.files(“../input”) %>% as.list %>% pander

ALL_EA_locations.xlsx CNJ.Labor_PICKING.xlsx CNJ.Labor_PUTAWAY.xlsx

Goal of the Project

3PL dataset Analysis to measure and lower cost in the supplychain/distribution sectpr while developing insights about total costs in serving clients.

** Required Package**.

#Load the required Packages.
library(needs)

#Load the required Packages.
needs(readxl,dplyr, tidyr, stringr, lubridate, readr, ggplot2,
      MASS,
      pander, formattable,knitr, viridis,gridExtra,grid)

Data Observation

Getting the data from the source

#Read the Dataset from Excel File
# install.packages("devtools")
#devtools::install_github("hadley/readxl")
#Read the Excel files provided
# Read and store Labor_Picking.xlsx as pick dataset
pick = read_excel('H:/Datascience/Meetup/3PL/data/CNJ.Labor_PICKING.xlsx', sheet=2)
head(pick)
## Source: local data frame [6 x 12]
## 
##   JOB_TP_CODE DOC_NUM DOC_LINE_NUM  OP_CODE TER_CODE        LAB_DATE
##         (chr)   (dbl)        (dbl)    (chr)    (chr)           (chr)
## 1          PI 2078687           25 NDEGAONA  2250491 160509 15:05:28
## 2          PI 2078687            8 NDEGAONA  2250491 160509 15:05:53
## 3          PI 2078687           26 NDEGAONA  2250491 160509 15:05:18
## 4          PI 2078687           27 NDEGAONA  2250491 160509 15:05:39
## 5          PI 2078687           28 NDEGAONA  2250491 160509 16:05:03
## 6          PI 2078687            9 NDEGAONA  2250491 160509 16:05:33
## Variables not shown: START_DATE (chr), END_DATE (chr), LAB_ACTUAL_TIME
##   (chr), LOC_CODE_TO (chr), WHSE_CODE_TO (chr), LOC_CODE_FROM (chr)
# Read and store Labor_Putaway.xlsx as put dataset
put = read_excel('H:/Datascience/Meetup/3PL/data/CNJ.Labor_PUTAWAY.xlsx', sheet=2)
head(put)
## Source: local data frame [6 x 11]
## 
##   JOB_TP_CODE DOC_NUM DOC_LINE_NUM OP_CODE TER_CODE        LAB_DATE
##         (chr)   (dbl)        (dbl)   (chr)    (chr)           (chr)
## 1          PU  266989            3  GLOPEZ  2250299 160509 16:05:01
## 2          PU  266989            4  GLOPEZ  2250299 160509 16:05:21
## 3          PU  266989            5  GLOPEZ  2250299 160509 16:05:30
## 4          PU  266989            6  GLOPEZ  2250299 160509 16:05:58
## 5          PU  266989            7  GLOPEZ  2250299 160509 16:05:32
## 6          PU  266989            8  GLOPEZ  2250299 160509 16:05:26
## Variables not shown: START_DATE (chr), END_DATE (chr), ACTUAL_TIME (chr),
##   WHSE_CODE (chr), LOC_CODE_TO (chr)
#Read the EA_Location workbook with assigned sheetIndex
EA_locations  <- read_excel("H:/Datascience/Meetup/3PL/data/ALL_EA_locations.xlsx", sheet = 1, na ="NA")
head(EA_locations)
## Source: local data frame [6 x 4]
## 
##   WHSE_CODE LOC_CODE LOC_DES LOC_TP_CODE
##       (chr)    (chr)   (chr)       (chr)
## 1        EA  DOOR001 DOOR001        STAG
## 2        EA  DOOR002 DOOR002        STAG
## 3        EA  50A040A 50A040A        BULK
## 4        EA  50A040B 50A040B        BULK
## 5        EA  50A040C 50A040C        RACK
## 6        EA  50A040D 50A040D        RACK

Eploratory-data-analysis

** Since most of the columns are similar we will append the Datasets for Anlaysis** ## Format the given columns in the dataset

# Format and rename 'LOC_CODE_FROM' to'LOC_CODE', ''WHSE_CODE_TO', to 'WHSE_CODE', 'LOC_CODE_TO' to 'LOC_CODE'.
# Ysing package LubriDate format the columns LAB_DATE, START_DATE AND END_DATE COLUMNS TO GET DURATION BETWEEN THE START AND END TIME.
# Split the date and time combined columns in seperate date and hours holumns

## Finally write the changes and save the dataset in CSV format
supplychaindata <- read.csv("H:/Datascience/supplychaindata.csv", header = TRUE, sep = ",")
str(supplychaindata)
## 'data.frame':    223724 obs. of  24 variables:
##  $ JOB_TP_CODE    : Factor w/ 2 levels "PI","PU": 2 2 2 2 2 2 2 2 2 2 ...
##  $ DOC_NUM        : int  265794 265794 265794 265794 265794 265794 265794 265794 265794 265794 ...
##  $ DOC_LINE_NUM   : int  4 5 6 3 7 8 9 10 11 12 ...
##  $ OP_CODE        : Factor w/ 27 levels "ABARRIOS","AJEMMY",..: 21 21 21 21 21 21 21 21 21 21 ...
##  $ TER_CODE       : int  1875722 1875722 1875722 1875722 1875722 1875722 1875722 1875722 1875722 1875722 ...
##  $ LAB_DATE       : Factor w/ 3865 levels "1/10/2015 10:01",..: 2695 2695 2695 2695 2695 2695 2695 2695 2695 2695 ...
##  $ START_DATE     : Factor w/ 59134 levels "1/10/2015 10:01",..: 35292 35292 35292 35292 35292 35292 35292 35292 35292 35292 ...
##  $ END_DATE       : Factor w/ 54411 levels "1/10/2015 10:01",..: 32427 32427 32427 32427 32427 32427 32427 32427 32427 32427 ...
##  $ WHSE_CODE      : Factor w/ 1 level "EA": 1 1 1 1 1 1 1 1 1 1 ...
##  $ LOC_CODE       : Factor w/ 13839 levels "02-20201","02-20406",..: 1006 1006 1006 1006 1006 1006 1006 1006 1006 1006 ...
##  $ IND_SRC        : Factor w/ 2 levels "PICKING","PUTAWAY": 2 2 2 2 2 2 2 2 2 2 ...
##  $ D_DURATION_SECS: int  3 3 3 15 3 3 3 4 3 4 ...
##  $ D_BUILDING     : Factor w/ 11 levels "2","3","36","4",..: 6 6 6 6 6 6 6 6 6 6 ...
##  $ D_AISLE        : Factor w/ 24 levels "-","A","B","C",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ D_SPACE        : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ D_LEVEL        : Factor w/ 21 levels "0","1","2","3",..: 18 18 18 18 18 18 18 18 18 18 ...
##  $ D_LEFT_AISLE   : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ D_RIGHT_AISLE  : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ LAB_HOURS      : Factor w/ 152 levels "10:01","10:02",..: 53 53 53 53 53 53 53 53 53 53 ...
##  $ START_HOURS    : Factor w/ 830 levels "10:00","10:01",..: 246 246 246 246 246 246 246 246 246 246 ...
##  $ END_HOURS      : Factor w/ 831 levels "10:00","10:01",..: 246 246 246 246 246 246 246 246 246 246 ...
##  $ LAB_DATES      : Factor w/ 402 levels "1/10/2015","1/11/2016",..: 288 288 288 288 288 288 288 288 288 288 ...
##  $ START_DATES    : Factor w/ 402 levels "1/10/2015","1/11/2016",..: 288 288 288 288 288 288 288 288 288 288 ...
##  $ END_DATES      : Factor w/ 402 levels "1/10/2015","1/11/2016",..: 288 288 288 288 288 288 288 288 288 288 ...

observation :

Two types of job Activities recorded in JOB_TP_CODE: PI - Picking Activity (no of observations ~ 99534) PU - Putaway Activity.(no of observations ~ 126088) Five types of Activity locations are recorded in LOC_CODE: BULK - BUST - RACK - REPK and STAGING

** Visual Analysis**

#Lets check for Major Activity type and Major Activity Location**

par(mfrow = c(1,2))
activity.type <- table(supplychaindata$JOB_TP_CODE)
barplot(activity.type, space = F, col = rainbow(4), xpd = FALSE,
        main = "PUTAWAY is the Major activity", legend = c("PICKING","PUTAWAY"),
        xlab = "Job type",
        ylab = "Activity Records")
activity.location <- table(EA_locations$LOC_TP_CODE)
barplot(activity.location,space = F, col = rainbow(4), xpd = FALSE,
        main = "We can see Major activity is done at RACKs", legend = c("BULK","BUST","RACK","REPK","STAG"),
        xlab = "Location  type",
        ylab = "Activity Records")

#OBS for PUTAWAY is greater then PICKING may be because of the previous reserves in the Warehouse.

We calculate the average mean time for “PICKUP” and “PUTAWAY” Job activity

mean(supplychaindata$JOB_TP_CODE == "PI")
## [1] 0.4448964
mean(supplychaindata$JOB_TP_CODE == "PU")
## [1] 0.5551036
#We see that Mean value observations for PICKUP takes more then the PUTAWAY activity.

Consider the operators working in the ware house Lets take look at OP_CODE and time taken by each employee to finish the task

operator <- table(supplychaindata$OP_CODE)
labels <- names(operator)
mp <- barplot(operator, axes = FALSE,axisnames = FALSE, col = "darkblue",main = "Operators by OP_CODE",xlab = "",ylab = "Frequency")
text(mp,par("usr")[3], labels = labels, srt = 45, adj = c(1.1,1.1), xpd = TRUE, cex = .9)
axis(2)

## We can see that operator NDEGAONA is performing the highes task followed by others.
## One the lowest side there are about 7-8 operaters performing very low number of tasks. We need to figure out the amount of jobs done by each operater w.r.t to time. And find out the reason of difference between Highest and lowest performer.

Lets take look at the time taken by each operator to complete the task

#Lets compare the time differnce between the highest and lowest performer
# HIGEST PERFORMER == NDEGAONA
ggplot(supplychaindata) + aes(x = D_DURATION_SECS , y = OP_CODE, main = "Maxm tasks by OP_CODE") + geom_point(size = 3, alpha = 0.5, color = "deeppink3") 

Location Analysis

#Looking at the Location 
layout(matrix(c(1,1,2,3), 2, 2, byrow = TRUE))
#par(mfrow = c(3,1))
building <- table(supplychaindata$D_BUILDING)
barplot(building,col = rainbow(4), xpd = FALSE,
        xlab = "BUILDING")
aisle <- table(supplychaindata$D_AISLE)
barplot(aisle,col = rainbow(4), xpd = FALSE, xlab = "AISLE BY ALPHABET")
aisle_sides <- table(supplychaindata$D_LEFT_AISLE == "1",supplychaindata$D_RIGHT_AISLE == "1")
barplot(aisle_sides,col = rainbow(4), xpd = FALSE, names.arg = c("Right","Left"), xlab = "AISLE SIDES")

# As per data most work is done at Building 51 and 50.
# Missing a lot of data on Aisle record, most jobs are taken place at Aisle G and H.
# Slightly job taking place on Right side of the Aisle is more.

Time Analysis

#Subsetting Month and year from LAB_DATE
LAB_DATES <- format(as.POSIXct(strptime(supplychaindata$LAB_DATE,"%m/%d/%Y %H:%M",tz="EST")),format = "%Y/%m/%d")
monthyear <- strftime(LAB_DATES,"%m-%Y")
year <- strftime(LAB_DATES,"%Y")
month <- strftime(LAB_DATES,"%m")

# day of week from LAB_DATE
dow <- function(x) format(as.Date(x), "%A")
weekday <- dow(supplychaindata$LAB_DATES)
a <- ggplot(supplychaindata) + aes(x = year,fill = "red", main = "BY YEAR") + geom_bar(position = "dodge",col = "black") 
b <- ggplot(supplychaindata) + aes(x = month,fill = "red",main = "BY MONTH") + geom_bar(position = "dodge",col = "black") + scale_x_discrete(labels=month.abb) + theme(axis.text.x = element_text(angle = 90, hjust = 1))
c <- ggplot(supplychaindata) + aes(x = monthyear,fill = "red",main = "BY MONTH AND YEAR") + geom_bar(position = "dodge",col = "black") + theme(axis.text.x = element_text(angle = 90, hjust = 1))
d <- ggplot(supplychaindata) + aes(x = weekday,fill = "red",main = "BY WEEKDAY") + geom_bar(position = "dodge",col = "black") + theme(axis.text.x = element_text(angle = 90, hjust = 1)) 
grid.arrange(a,b,ncol = 2)

grid.arrange(d,c,ncol = 2)
## Warning: Removed 137310 rows containing non-finite values (stat_count).

Weeday job Analysis

dow <- function(x) format(as.Date(x), "%A")
weekday <- dow(supplychaindata$LAB_DATES)
ggplot(supplychaindata,aes(x = weekday)) + geom_density()
## Warning: Removed 137310 rows containing non-finite values (stat_density).

# More work seems to be done on Fridays and distribution form is equal.

Final Conclusion –The above analysis needs more concentration on timestamp and making an accurate coordinate plot of awarehous’s location layout based on existing locations and timestamp info. –Solution as based on the data provided can be acombination of following points : 1. PICKUP and PUTAWAY performance measures. 2. Flow optimization. 3. Cross docking details. 4. PIckup AND putaway Prioritization. 5. Inbound and Outbound centroids. 6. Aisle Labelling. 7. More data required w.r.t warehouse infrastructure and floor design.