Load in the SkuMaster.csv dataset.This dataset shows things such as shelf life, Lead time, method of transport, and method of packaging plus more
data1 <- (read.csv("https://raw.githubusercontent.com/vernonkat/Coursework/main/HW3SKU%20Master.csv",fileEncoding="UTF-8-BOM"))
We would only like to consider those observations for which the Cubic Feet per Unit of Measure (CA = Case, EA = Each, PL = Pallet, LB = Pound) is greater than zero and less than two, and for which the Weight per UOM is greater than zero and less than 50. We will also drop all rows containing NA values.
data2 <- data1[data1$UomCube<2 & data1$UomCube>0 & data1$UomWeight<50 & data1$UomWeight>0, ]
data3<-data2[data2$Uom=="CA" | data2$Uom=="EA" | data2$Uom== "PL" | data2$Uom=="LB", ]
library(tidyr)
dafr <- data3 %>% drop_na()
dafr<-droplevels(dafr)
We will create a boxplot on Weight per UOM. This gives us the spread of weight for all packages.
boxplot(dafr$UomWeight,xlab="Weight per UOM",main="UoM Weight",col="light Blue",horizontal=TRUE)
There are 75 outliers from this boxplot (record errors have been removed). This tells us that our weight is definitely right skewed, with an average weight solidly at 10.
To see if there is a relationship between the number of units in a package and the weight of the package we developed a scatterplot on the Units per Case and the Weight per UOM.
plot(dafr$UomWeight,dafr$UnitsPerCase,xlab="Weight per UOM",ylab="Units per Case",main="Units per Case by Weight per UoM",pch = 5)
As you can see, there does not appear to be a significant relationship between the two. You can see most of the points fall well below 200 Units per Case, and we can see the heavier blackness around the weight of 10, which aligns with our previous boxplot.
We can zoom in on the smaller Units to get a better sense of the realationship between weight and number of units.
zoomdafr <-dafr[dafr$UnitsPerCase<2000, ]
plot(zoomdafr$UomWeight,zoomdafr$UnitsPerCase,xlab="Weight per UOM",ylab="Units per Case",main="Units per Case by Weight per UoM\n(Units per Case Below 2000)",pch = 5)
Even disregarding the larger points, we can see that there is not a relationship between the two beyond a skew towards the smaller numbers on both axi. This zoomed in graph does additionally show that there are a consistent number of packages that are precisely 500 or 1000 in number of units.
Here we examine a plot showing the frequency/count with which the levels of the factor Commodity occur.
library(ggplot2)
ggplot(data=dafr, aes(x=Commodity)) +
geom_bar() +
ggtitle("Frequency of commmodity") +
theme(axis.text.x = element_text(angle = 90))
From this chart we can see that BVO5 is the highest, with GR00 coming in second. BD00 and then PK00 coming third and fourth, respectively.
This is perhaps better shown in a Pareto Chart:
library(qcc)
pchart <- pareto.chart(table(dafr$Commodity),cex.names=.5,main="Frequency of Commodities")
We then look at a plot showing the frequency/count with which the levels of the Units of Measure occur.
plot(dafr$Uom,
type = "p",
main="UOM Count",
font.main=2,
ylab="Count",
xlab="UOM",
pch=1,
col="light blue",
ylim=c(0,3000)
)
As we can see, most commonly appearing was Cases, followed by Each, by the Pound, and then by the Pallet.
Here we eaxmine a side-by-side boxplot of Cubic Feet by UoM by the types of Flow. This can tell us a relationship between how much space is usually taken up by each package, depending on the method of delivery to the store.
boxplot((dafr$UomCube)~(dafr$Flow),data=dafr,xlab="Flow",ylab="Cubic Feet by Uom",main="Cubic Feet by UoM by the Types of Flow",col=rainbow(4))
As shown, the Diredt to Store method, shown in green, is much much lower than then other two methods. We can state that direct to store has significantly lower physical sizes than the other methods of shipping.
We now restrict the attention to only those products that pass through the Direct to Store supply chain channel.
dafr2 <- dafr[dafr$Flow==c("DD"), ]
dafr2<-droplevels(dafr2)
Performing the same boxplot analysis on only the Direct to Store method shows a much smaller range of data, as well a very tight concentration in the lower weight
boxplot(dafr2$UomWeight,xlab="Weight per UoM",col="yellow",main="Boxplot of Weight per UoM",horizontal=TRUE)
Create a histogram on Weight per UoM
hist(dafr2$UomWeight,xlab="UoM Weight",main="Histogram of UoM Weight",col="Light Green",ylim=c(0,20))
As seen in the boxplot, the frequency graph also shows a concentration of lower weights.
We then wanted to see which SKU had the highest weight.
dotchart(dafr2$UomWeight,dafr2$SkuNbr,ylab="",xlab="Weight per Uom",pch=5,main="Weights by UoM per SKU",cex=.75)
By the graph above, it seems 09990402 is the highest weighted item.
We can also see below that the Cases in general had the highest weights, while the “Each” items have a concentration in the lower weights.
stripchart(dafr2$UomWeight~dafr2$Uom,xlab="UomWeight",ylab="Uom",pch=2,method="jitter",main="Weight per Uom by the Units of Measure")