For the Fresh Food Now case study, I obtained the data files and conducted both graphical and statistical analyses to highlight key metrics in the data and how they vary relative to one another. From these analyses we can identify outliers, visualize relationships, and better understand the data set produced from the case study.
To import the SKU-level data from the Fresh Food Now case study for analysis, we can use the following code in RStudio to pull the file from my EDA GitHub repository:
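# Read the SKU master file from GitHub
dat <- read.csv("https://raw.githubusercontent.com/CGarcia44/EDA-/main/6%20SKU%20Master.csv")
# Drop rows with any missing values
dat <- na.omit(dat)
# Treat unit of measurement as a categorical variable
dat$Uom <- as.factor(dat$Uom)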
From the case study, I pulled raw data from the SKUMaster.csv file to analyze the following metrics:
Unit of Measurement (UOM)
UOM Cube (Volume)
UOM Weight
Units per Case
Commodity
Flow
After importing the data, we filter out potential outliers to limit their influence on later analyses, keeping only:
Observations whose UoM Cube is greater than 0 and less than 2:
\(0 < \text{UoM Cube} < 2\)
dat <- dat[dat$UomCube > 0 & dat$UomCube < 2,]
Observations whose UoM Weight is greater than 0 and less than 50:
\(0 < \text{UoM Weight} < 50\)
dat <- dat[dat$UomWeight > 0 & dat$UomWeight < 50,]
Observations with one of the following UoM factor levels:
Case (CA)
Each (EA)
Pallet (PL)
Pound (LB)
dat <- dat[dat$Uom %in% c("CA", "EA", "PL", "LB"),]
dat <- droplevels(dat)
For the initial analyses, I produced a set of graphs to visualize and compare the metrics listed above; the results and meaning of each are explained below.
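Before turning to the graphs, it can help to check how many rows each filter rejects, since a row only needs to fail one condition to be dropped. A minimal sketch on a small made-up data frame (the values are illustrative, not case-study data; only the column names follow the SKU master file):

```r
# Toy rows (made up for illustration, not case-study data)
toy <- data.frame(
  Uom = factor(c("CA", "EA", "PL", "LB", "BX", "CA", "EA")),
  UomCube = c(0.5, 0, 1.2, 3.0, 0.8, 0.3, 0.1),
  UomWeight = c(10, 2, 60, 5, 4, 0, 1)
)

valid_uom <- c("CA", "EA", "PL", "LB")

# Each condition used in the report, evaluated separately
cube_ok   <- toy$UomCube > 0 & toy$UomCube < 2
weight_ok <- toy$UomWeight > 0 & toy$UomWeight < 50
uom_ok    <- toy$Uom %in% valid_uom

# Rows rejected by each condition on its own (a row can fail more than one)
rejected <- c(cube = sum(!cube_ok), weight = sum(!weight_ok), uom = sum(!uom_ok))
print(rejected)

# Applying all three conditions at once keeps the same rows as the sequential filters
filtered <- droplevels(toy[cube_ok & weight_ok & uom_ok, ])
print(filtered)
```

On the real data, the same counts computed on `dat` right after `na.omit()` would show which filter removes the most SKUs.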
[Figure: Boxplot of UoM Weight; the 75 outlying weights it flags (44.40 to 49.90) are listed in the supplemental code output.]
From this boxplot we can read the median, the quartiles (the edges of the box), the whisker range, and the outliers present in the UoM Weight data. Saving the outliers to a separate object (Outliers) documents them and lets us investigate the root cause of each, for example an out-of-control process.
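The outliers returned by boxplot.stats() follow Tukey's rule: a value is flagged when it lies more than 1.5 times the hinge spread below the lower hinge or above the upper hinge. A minimal sketch that reproduces the rule by hand on made-up weights (tukey_outliers is a hypothetical helper, not part of the original code):

```r
# Toy weights (made up for illustration, not case-study data)
w <- c(1.2, 2.5, 3.1, 3.8, 4.0, 4.4, 5.2, 6.0, 45.0, 48.3)

# Hypothetical helper: flag values outside the Tukey fences
tukey_outliers <- function(x, coef = 1.5) {
  h <- fivenum(x)        # min, lower hinge, median, upper hinge, max
  spread <- h[4] - h[2]  # hinge spread, close to the interquartile range
  x[x < h[2] - coef * spread | x > h[4] + coef * spread]
}

print(tukey_outliers(w))
print(boxplot.stats(w)$out)  # should match the hand-rolled version
```

Because the fences depend on the hinges rather than on the mean and standard deviation, a handful of extreme weights does not shift them much.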
The scatter plot of Units Per Case against UoM Weight shows how densely SKUs cluster at each weight and how weight relates to the number of units per case used for storage and ordering, which can indicate product size and quantity.
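To put a number on the relationship the scatter plot shows, a rank correlation is one option. Spearman's method is a choice made here, not part of the original analysis, because it does not assume the relationship is linear. A sketch on made-up values:

```r
# Toy values (made up for illustration, not case-study data)
units_per_case <- c(1, 6, 12, 12, 24, 48)
uom_weight     <- c(40, 20, 10, 12, 5, 2)

# Spearman correlation: Pearson correlation of the ranks (ties get average ranks)
rho <- cor(units_per_case, uom_weight, method = "spearman")
print(rho)
```

On the real data the call would be `cor(dat$UnitsPerCase, dat$UomWeight, method = "spearman")`; values near -1 or 1 indicate a strong monotone relationship and values near 0 a weak one.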
The UoM frequency plot shows how many products use each unit of measurement within Fresh Food Now's supply chain and inventory.
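Calling plot() on a factor draws a bar chart of the count of each level, so the same numbers can be read directly with table(). A sketch with made-up UoM codes, which also derives the y-axis limit from the data as an alternative to the fixed ylim = c(0,3000) in the supplemental code:

```r
# Toy UoM codes (made up for illustration, not case-study data)
uom <- factor(c("CA", "EA", "CA", "PL", "LB", "CA", "EA"))

counts <- table(uom)          # frequency of each UoM level
shares <- prop.table(counts)  # the same counts as proportions of the total
print(counts)
print(round(shares, 2))

# 10% headroom above the tallest bar, usable as ylim = c(0, upper) in plot(uom, ...)
upper <- max(counts) * 1.1
```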
Plotting the cubic volume (UoM Cube) of each SKU grouped by its Flow value summarizes how product volume is distributed across the flow channels. Grouping by category in this way also shows which channel each product travels through and flags outliers, which could result from excess material handling or forecast compensation.
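The line inside each box is the median UoM Cube for that Flow value, which tapply() can compute directly. A sketch with made-up rows; the Flow labels F1 and F2 are hypothetical placeholders, since the actual labels in the SKU master file are not listed here:

```r
# Toy rows; Flow labels "F1"/"F2" are hypothetical placeholders
toy <- data.frame(
  Flow = c("F1", "F1", "F1", "F2", "F2", "F2"),
  UomCube = c(0.2, 0.4, 0.6, 1.0, 1.2, 1.8)
)

# Median cube per flow channel (the center line of each box)
med_cube <- tapply(toy$UomCube, toy$Flow, median)

# Number of SKUs behind each box
n_per_flow <- table(toy$Flow)
print(med_cube)
print(n_per_flow)
```

With a formula such as `UomCube ~ Flow`, boxplot() places the groups (Flow) on the x-axis and the values (UoM Cube) on the y-axis, so the axis labels should follow that order.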
Next, I turned my attention to products that pass through the Direct to Store supply chain channel to better estimate how each product flows from supplier to customer.
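Restricting the data to a single channel only requires a subset on the Flow column. A minimal sketch, assuming a hypothetical Flow label "DTS" for Direct to Store; running `unique(dat$Flow)` on the real data would show the actual label to use:

```r
# Hypothetical label: the real Direct to Store value in the Flow column may differ
dts_label <- "DTS"

# Toy rows (made up for illustration, not case-study data)
toy <- data.frame(
  Flow = c("DTS", "WH", "DTS", "WH"),
  Uom = factor(c("CA", "EA", "EA", "PL")),
  UomWeight = c(12, 3, 1.5, 40)
)

# Keep only the channel of interest and drop UoM levels that no longer appear
dts <- droplevels(toy[toy$Flow == dts_label, ])
print(dts)
```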
A boxplot of UoM Weight for each unit of measurement summarizes the median, quartiles, and whisker range of each group and flags potential outliers, which can then be either kept in or excluded from further statistical analysis. Much like points outside the limits of a control chart, tracing these outliers within the Direct to Store supply chain can help identify the root cause if the process is out of control.
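Calling boxplot() with plot = FALSE returns the statistics behind each box, including the group each outlier belongs to, which is a convenient starting point for tracing outliers back to a unit of measurement. A sketch on made-up weights:

```r
# Toy rows (made up for illustration, not case-study data)
toy <- data.frame(
  Uom = factor(c("CA", "CA", "CA", "CA", "CA", "EA", "EA", "EA", "EA", "EA")),
  UomWeight = c(10, 12, 14, 16, 40, 1, 2, 3, 4, 5)
)

b <- boxplot(UomWeight ~ Uom, data = toy, plot = FALSE)

# Rows of b$stats: lower whisker, lower hinge, median, upper hinge, upper whisker
print(b$stats)

# Each outlier paired with the UoM group it came from
outlier_table <- data.frame(Uom = b$names[b$group], UomWeight = b$out)
print(outlier_table)
```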
A histogram of UoM Weight within the Direct to Store supply chain shows how often each weight range occurs, which may point to material handling changes that would move products through the supply chain more efficiently based on their weight and frequency.
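hist() can also return its bin counts without drawing (plot = FALSE), which makes the frequencies behind the bars easy to inspect. A sketch with made-up weights and 5-unit bins spanning the 0 to 50 range kept by the earlier filter (the bin width is an arbitrary choice for illustration):

```r
# Toy weights (made up for illustration, not case-study data)
w <- c(0.5, 1.5, 2.5, 2.7, 8, 9.5, 46)

# Fixed 5-unit bins from 0 to 50; bins are right-closed and the first also includes 0
h <- hist(w, breaks = seq(0, 50, by = 5), plot = FALSE)

# Pair each bin's range with its count
bins <- data.frame(from = head(h$breaks, -1), to = tail(h$breaks, -1), count = h$counts)
print(bins)
```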
A strip chart of UoM Weight for Case (CA) and Each (EA) units shows how densely the weights cluster within each unit of measurement and how the two compare. Relating these weights to the material handling methods and labor required of each employee helps evaluate both the ergonomics and the material handling capabilities of the facility.
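SKUs can share identical weights, so points in a strip chart may sit on top of each other. One option, not used in the original plot, is method = "jitter", which offsets points slightly so overlapping values stay visible. A sketch on made-up rows:

```r
# Toy rows (made up for illustration, not case-study data)
toy <- data.frame(
  Uom = factor(c("CA", "CA", "CA", "EA", "EA", "PL")),
  UomWeight = c(12, 12, 30, 1, 1, 45)
)

# Keep Case and Each rows only, then drop the now-empty PL level
ce <- droplevels(toy[toy$Uom %in% c("EA", "CA"), ])

# Draw to a null device when not running interactively (e.g. under Rscript)
if (!interactive()) pdf(NULL)

# method = "jitter" spreads identical weights apart so stacked points stay visible
stripchart(UomWeight ~ Uom, data = ce,
           method = "jitter", jitter = 0.2,
           pch = 21, bg = "lightpink",
           xlab = "UoM Weight", ylab = "Unit of Measurement [UoM]")

if (!interactive()) invisible(dev.off())
```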
#EDA HW 6 Supplemental Code: Caleb Garcia#
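# Read the SKU master file and drop rows with missing values
dat <- read.csv("https://raw.githubusercontent.com/CGarcia44/EDA-/main/6%20SKU%20Master.csv")
dat <- na.omit(dat)
# Treat unit of measurement as a factor
dat$Uom <- as.factor(dat$Uom)
# Keep plausible cube and weight values and the four UoM levels of interest
dat <- dat[dat$UomCube > 0 & dat$UomCube < 2,]
dat <- dat[dat$UomWeight > 0 & dat$UomWeight < 50,]
dat <- dat[dat$Uom %in% c("CA", "EA", "PL", "LB"),]
# Remove factor levels left empty by the filters
dat <- droplevels(dat)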
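# Single vertical boxplot: the y-axis shows the weight values themselves
boxplot(dat$UomWeight,
main = "Boxplot of UoM Weight",
ylab = "UoM Weight",
font.axis = 3,
col = "cyan",
pch = 21,
bg = c("cyan"),
cex = 0.75)
# Values beyond 1.5 times the hinge spread, i.e. the points drawn past the whiskers
Outliers <- boxplot.stats(dat$UomWeight)$out
Outliers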
## [1] 49.00 46.00 45.00 46.20 45.00 45.00 46.00 46.90 45.00 46.00 48.00 48.00
## [13] 48.00 49.00 47.00 44.50 44.50 44.40 45.25 44.85 45.90 45.00 45.20 47.00
## [25] 46.35 47.00 45.00 44.70 45.70 48.60 46.35 45.20 47.10 46.80 46.80 44.70
## [37] 46.35 45.60 49.35 45.40 47.30 48.80 44.80 44.80 45.50 45.50 45.50 45.00
## [49] 45.30 49.90 48.00 45.65 45.50 45.00 45.00 45.35 45.00 45.80 47.40 45.10
## [61] 45.00 48.00 46.50 48.00 45.00 45.00 48.00 48.00 47.00 46.00 45.00 45.00
## [73] 45.15 48.00 48.30
plot(dat$UnitsPerCase,dat$UomWeight,
main = "Scatter Plot of Units Per Case by UoM Weight",
xlab = "Units Per Case",
ylab = "UoM Weight",
pch = 21,
cex = 0.5,
bg = 3,
font.axis = 2
)
plot(dat$Uom,
main = "Plot of UoM Frequency",
xlab = "Unit of Measurement [UoM]",
ylab = "Frequency",
col = c("red", "green", "cyan", "blue"),
ylim = c(0,3000)
)
# With a formula, groups (Flow) go on the x-axis and UoM Cube values on the y-axis
boxplot(dat$UomCube~dat$Flow,
main = "Boxplot of UoM Cube by Flow",
xlab = "Flow",
ylab = "UoM Cube",
col = c("lightpink", "lightblue", "lightgreen"),
pch = 21,
bg = c("lightpink", "lightblue", "lightgreen")
)
boxplot(dat$UomWeight~dat$Uom,
main = "Boxplot of UoM Weight per UoM",
xlab = "UoM",
ylab = "UoM Weight",
col = c("light green", "yellow", "lightblue", "lightpink"),
pch = 21,
bg = c("light green", "yellow", "lightblue", "lightpink")
)
hist(dat$UomWeight,
main = "Histogram of UoM Weight",
xlab = "UoM Weight",
ylab = "Frequency",
col = rainbow(20),
ylim = c(0,2000)
)
dat <- dat[dat$Uom %in% c("EA","CA"),]
dat <- droplevels(dat)
stripchart(dat$UomWeight~dat$Uom,
main = "UoM Weight by UoM",
xlab = "Weight",
ylab = "Unit of Measurement [UoM]",
pch = 21,
bg = c("lightpink")
)