Data source: https://raw.githubusercontent.com/Garcia-Ry/IE-4331-EDA/main/SKU%20Master.csv
Filter the data according to the following:
Consider only those observations for which the Cubic Feet per UOM is greater than zero and less than two, and for which the Weight per UOM is greater than zero and less than 50. Only the factor levels Case (CA), Each (EA), Pallet (PL), and Pound (LB) of the Units of Measure (UoM) are admissible; all observations with other designations should be omitted. In addition, all rows with NA should be dropped.
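The filter rules above can be sketched in base R on a small hypothetical data frame. The name `sku` and all of its values are made up for illustration; only the column names `UomCube`, `UomWeight`, and `Uom` mirror the SKU Master file. The key detail is that row-wise conditions need the vectorised `&`. The scalar `&&` looks at a single value only, and recent R versions raise an error when it is given a longer vector.

```r
# Hypothetical toy data; only the column names mirror the SKU Master file
sku <- data.frame(
  UomCube   = c(0.5, 1.5, 2.5, 0, 1.0, NA, 0.8),
  UomWeight = c(10, 60, 5, 8, 20, 12, 30),
  Uom       = c("CA", "EA", "PL", "LB", "LB", "EA", "BX")
)

# Vectorised '&' evaluates every row; '&&' would only consider a single element
keep <- sku$UomCube > 0 & sku$UomCube < 2 &
        sku$UomWeight > 0 & sku$UomWeight < 50 &
        sku$Uom %in% c("CA", "EA", "PL", "LB")

# which() discards the NA produced by the missing UomCube instead of creating an all-NA row
filtered <- sku[which(keep), ]

# Drop any remaining rows that contain NA in some column, as the assignment requires
filtered <- na.omit(filtered)
print(filtered)
```

With this toy data only rows 1 and 5 survive. Using `which()` is one way to avoid the all-NA rows that indexing with an `NA` condition would otherwise create.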
Filter the dataframe to keep only the variables UnitsPerCase, LeadTime, UoMCube, UoMWeight, and ShelfLifeDays.
After completing the given instructions, the table should look similar to:
Create a correlation plot for all the variables. Show the numerical correlation on this plot and keep only the upper portion.
Test for the significance of all correlations using the Pearson method.
##               UnitsPerCase LeadTime UomCube UomWeight ShelfLifeDays
## UnitsPerCase          1.00     0.05    0.01      0.04          0.03
## LeadTime              0.05     1.00   -0.01      0.04          0.04
## UomCube               0.01    -0.01    1.00      0.32          0.01
## UomWeight             0.04     0.04    0.32      1.00         -0.06
## ShelfLifeDays         0.03     0.04    0.01     -0.06          1.00
##
## n= 5158
##
##
## P
##               UnitsPerCase LeadTime UomCube UomWeight ShelfLifeDays
## UnitsPerCase                 0.0004  0.3770    0.0020        0.0394
## LeadTime            0.0004           0.5436    0.0038        0.0074
## UomCube             0.3770   0.5436            0.0000        0.5140
## UomWeight           0.0020   0.0038  0.0000                  0.0000
## ShelfLifeDays       0.0394   0.0074  0.5140    0.0000
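At the 0.05 level, seven of the ten pairs are statistically significant. The exceptions are the pairs of UomCube with UnitsPerCase (p = 0.3770), LeadTime (p = 0.5436), and ShelfLifeDays (p = 0.5140). With n = 5158, however, even very small correlations reach significance. UomCube and UomWeight (r = 0.32) is therefore the only pair whose linear correlation is large enough to matter in practice; every other |r| is 0.06 or less.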
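The same kind of table can be sketched in base R without Hmisc. In this sketch, `cor()` gives the correlation matrix, `lower.tri()` blanks out the lower portion, and `cor.test()` (Pearson by default) supplies a p-value for each pair. The data frame `sim` and its columns `x1`, `x2`, and `x3` are hypothetical simulated data, not the SKU file. For data without missing values, the p-values should agree with those printed by `rcorr()`, since both rest on the usual t-test for a Pearson correlation. That agreement is an assumption here; it has not been checked against the SKU data.

```r
# Hypothetical simulated data standing in for dat2; x2 is built to correlate with x1
set.seed(1)
n <- 200
sim <- data.frame(x1 = rnorm(n))
sim$x2 <- 0.3 * sim$x1 + rnorm(n)
sim$x3 <- rnorm(n)

r <- cor(sim)  # Pearson correlation matrix

# Pairwise p-values, filled in for the upper triangle only
p <- matrix(NA_real_, ncol(sim), ncol(sim), dimnames = dimnames(r))
for (i in seq_len(ncol(sim) - 1)) {
  for (j in (i + 1):ncol(sim)) {
    p[i, j] <- cor.test(sim[[i]], sim[[j]])$p.value  # method = "pearson" is the default
  }
}

# Keep only the upper portion (diagonal included) of the correlation matrix
r_upper <- r
r_upper[lower.tri(r_upper)] <- NA

print(round(r_upper, 2))
print(signif(p, 3))
```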
cor.test(dat2$UomWeight, dat2$ShelfLifeDays)
##
## Pearson's product-moment correlation
##
## data: dat2$UomWeight and dat2$ShelfLifeDays
## t = -4.1199, df = 5156, p-value = 3.849e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.08444154 -0.03003769
## sample estimates:
## cor
## -0.05728214
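The t statistic, p-value, and confidence interval above can be recomputed from r and n alone. The statistic is t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. The 95% interval comes from the Fisher z-transformation, which is how `cor.test()` builds its Pearson interval. The sketch below plugs in the values printed above; the variable names are only for illustration.

```r
# Values copied from the cor.test() output above
r  <- -0.05728214
n  <- 5158
df <- n - 2

# t statistic and two-sided p-value for H0: true correlation = 0
t_stat <- r * sqrt(df / (1 - r^2))
p_val  <- 2 * pt(-abs(t_stat), df)

# 95% confidence interval via the Fisher z-transformation
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

print(c(t = t_stat, p = p_val))
print(ci)
```

Because t grows with sqrt(n - 2), a correlation as small as -0.057 is already significant with n = 5158. With a sample this large, the p-value says little about how strong the relationship is.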
cor.test(dat2$UomWeight, dat2$ShelfLifeDays, method = "spearman")
##
## Spearman's rank correlation rho
##
## data: dat2$UomWeight and dat2$ShelfLifeDays
## S = 2.3311e+10, p-value = 0.1673
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## -0.01922875
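Both tests estimate a negative correlation between UomWeight and ShelfLifeDays, and both estimates are very weak. The Pearson correlation (r = -0.057) is statistically significant (p = 3.85e-05), whereas the Spearman correlation (rho = -0.019) is not (p = 0.167). The linear relationship is therefore somewhat stronger than the monotonic one, although neither is large enough to matter in practice.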
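Spearman's rho is simply Pearson's r computed on the ranks. One way the Pearson coefficient can end up larger in magnitude than Spearman's is when a handful of extreme values dominate the linear fit. The toy vectors `x` and `y` below are hypothetical, not taken from the SKU file. This is offered only as a possible explanation; the SKU data were not checked for such points. The sketch also confirms that, without ties, the `S` statistic reported by `cor.test()` equals the sum of squared rank differences.

```r
# Hypothetical toy data: nine unremarkable points plus one extreme point at (50, 50)
x <- c(1:9, 50)
y <- c(5, 3, 8, 1, 7, 2, 9, 4, 6, 50)

pearson  <- cor(x, y)                       # uses the raw values
spearman <- cor(x, y, method = "spearman")  # uses only the ranks

# Spearman's rho is Pearson's r applied to the ranks
stopifnot(isTRUE(all.equal(spearman, cor(rank(x), rank(y)))))

# Without ties, cor.test()'s S statistic is the sum of squared rank differences
S <- unname(cor.test(x, y, method = "spearman")$statistic)
stopifnot(isTRUE(all.equal(S, sum((rank(x) - rank(y))^2))))

print(round(c(pearson = pearson, spearman = spearman), 3))
```

Here the single extreme point pushes Pearson's r to about 0.97, while Spearman's rho, which only sees that point as the largest rank, stays near 0.39.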
cor.test(dat2$UomWeight, dat2$UomCube)
##
## Pearson's product-moment correlation
##
## data: dat2$UomWeight and dat2$UomCube
## t = 24.516, df = 5156, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2984546 0.3473426
## sample estimates:
## cor
## 0.3231141
cor.test(dat2$UomWeight, dat2$UomCube, method = "spearman")
##
## Spearman's rank correlation rho
##
## data: dat2$UomWeight and dat2$UomCube
## S = 1.5426e+10, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.3255434
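Both tests show a weak positive correlation between UomWeight and UomCube (Pearson r = 0.323, Spearman rho = 0.326), and both are highly significant (p < 2.2e-16). The two coefficients are nearly identical, so the linear and rank-based measures agree.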
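To put "weak" in numbers, squaring Pearson's r gives the proportion of variance in one variable that a straight-line fit on the other accounts for. This is the standard coefficient-of-determination reading of r^2. The sketch below reuses the two Pearson coefficients printed in this section; the variable names are only for illustration.

```r
# Pearson coefficients copied from the cor.test() outputs in this section
r_weight_cube  <- 0.3231141
r_weight_shelf <- -0.05728214

# r^2 = share of variance in UomWeight linearly associated with the other variable
shared <- c(weight_cube = r_weight_cube^2, weight_shelf = r_weight_shelf^2)
print(round(100 * shared, 1))  # expressed as percentages
```

Roughly 10% of the variation in UomWeight is linearly associated with UomCube, compared with about 0.3% for ShelfLifeDays.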
#IE EDA - Assignment 8
# Luis Araiza and Ryan Garcia
###### Creating Data frame from Assignment 3 ######
dat<-read.csv("https://raw.githubusercontent.com/Garcia-Ry/IE-4331-EDA/main/SKU%20Master.csv")
head(dat)
colnames(dat)
colnames(dat)[grepl("SkuNbr$", colnames(dat))]<-"SkuNbr" #Rename SkuNbr column (strips byte-order-mark residue such as "ï..SkuNbr")
colnames(dat)
# dat$SkuNbr=as.numeric(dat$SkuNbr) #SkuNbr is character because of some with letters
dat$Flow=as.factor(dat$Flow)
dat$SkuNbr=suppressWarnings(as.numeric(dat$SkuNbr))
dat$Whs=as.factor(dat$Whs)
dat$Uom=as.factor(dat$Uom)
dat$Commodity=as.factor(dat$Commodity)
head(dat)
#Only Consider if UomCube is greater than zero and less than two
dat<-dat[dat$UomCube>0 & dat$UomCube<2,] #Vectorised '&' checks every row; '&&' only uses one element
head(dat)
#Only Consider if UomWeight is greater than zero and less than fifty
dat<-dat[dat$UomWeight>0 & dat$UomWeight<50,]
head(dat)
#Only consider if Uom is CA, EA, PL, or LB
dat<-dat[dat$Uom=="CA"|dat$Uom=="EA"|dat$Uom=="PL"|dat$Uom=="LB",]
head(dat)
#Omit all NA rows
dat<-na.omit(dat)
head(dat)
dat<-droplevels(dat) #Drops unused factor levels from the data frame
head(dat)
###### Begin New Material for Assignment 8 ######
## 1. Filter the dataframe to keep only the variables UnitsPerCase, LeadTime,
## UoMCube, UoMWeight, and ShelfLifeDays
dat2<-dat[,c("UnitsPerCase","LeadTime","UomCube","UomWeight","ShelfLifeDays")] #Select by name rather than position
head(dat2)
## 2. Create a correlation plot for all the variables. Show the numerical
## correlation on this plot and keep only the upper portion
library(corrplot)
corrplot(cor(dat2),type="upper",method="number") #method="number" prints each correlation coefficient
## 3. Test for the significance of all correlations
library(Hmisc)
rcorr(as.matrix(dat2),type="pearson") #Pearson correlations with their p-values
## 4. Examine variables UoMWeight and ShelfLifeDays for a linear relationship
## (Pearson's) and test for the significance of this correlation.
cor.test(dat2$UomWeight, dat2$ShelfLifeDays)
## 5. Examine variables UoMWeight and ShelfLifeDays for a monotonic relationship
## (Spearman's) and test for the significance of this correlation.
cor.test(dat2$UomWeight, dat2$ShelfLifeDays, method = "spearman")
## 6. Comment/compare the findings from questions 4 and 5
# Both tests estimate a weak negative correlation. Pearson's r = -0.057 is
# significant (p = 3.85e-05), but Spearman's rho = -0.019 is not (p = 0.167),
# so the linear relationship is somewhat stronger than the monotonic one.
## 7. Examine variables UoMWeight and UoMCube for a linear relationship
## (Pearson's) and test for the significance of this correlation.
cor.test(dat2$UomWeight, dat2$UomCube)
## 8. Examine variables UoMWeight and UoMCube for a monotonic relationship
## (Spearman's) and test for the significance of this correlation.
cor.test(dat2$UomWeight, dat2$UomCube, method = "spearman")
## 9. Comment/compare the findings from questions 7 and 8
# Both tests show a weak positive correlation (Pearson r = 0.323, Spearman
# rho = 0.326), and both are highly significant (p < 2.2e-16).