Filter the dataframe to keep only the variables UnitsPerCase, LeadTime, UoMCube, UoMWeight, and ShelfLifeDays
#Load in the SkuMaster.csv dataset.
data1 <- (read.csv("https://raw.githubusercontent.com/vernonkat/Coursework/main/HW3SKU%20Master.csv",fileEncoding="UTF-8-BOM"))
# We would only like to consider those observations for which the Cubic Feet per UOM is greater than zero and less than two,
# and for which the Weight per UOM is greater than zero and less than 50.
data2 <- data1[data1$UomCube<2 & data1$UomCube>0 & data1$UomWeight<50 & data1$UomWeight>0, ]
# Note also that only the factor levels of Case(CA), Each(EA), Pallet(PL), and Pound(LB) for the Units of Measure (UoM) are admissible,
# all observations with other designations should be omitted.
data3 <- data2[data2$Uom%in%c("CA","EA","PL","LB"), ]
# In addition, all rows with NA should be dropped.
library(tidyr)
dafr <- data3 %>% drop_na()
dafr<-droplevels(dafr)
dafr <- dafr[ ,c(6,7,8,9,11)]
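Selecting by position works only while the column order of the CSV stays the same. A minimal sketch of selecting by name instead is shown below. It uses a small made-up data frame; the `SkuNbr` column and all values are hypothetical, and only the five variable names come from the output in this report.

```r
# Toy stand-in for the filtered data; SkuNbr and every value here are invented.
toy <- data.frame(
  SkuNbr        = c("A1", "B2", "C3"),
  UnitsPerCase  = c(12, 24, 6),
  LeadTime      = c(7, 14, 21),
  UomCube       = c(0.5, 1.2, 0.8),
  UomWeight     = c(10, 35, 4),
  Uom           = c("CA", "EA", "LB"),
  ShelfLifeDays = c(90, 365, 30),
  stringsAsFactors = FALSE
)

# Keep only the five analysis variables, referenced by name.
keep <- c("UnitsPerCase", "LeadTime", "UomCube", "UomWeight", "ShelfLifeDays")
toy_num <- toy[, keep]

str(toy_num)
```

On the real data the same idea would read `dafr[, keep]`, assuming the column names match those shown in the rcorr output.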
Create a correlation plot for all the variables. Show the numerical correlation on this plot and keep only the upper portion
library(corrplot)
## corrplot 0.92 loaded
corrplot(cor(dafr),type="upper",method="number")
Test for the significance of all correlations
library(Hmisc)
rcorr(as.matrix(dafr))
##               UnitsPerCase LeadTime UomCube UomWeight ShelfLifeDays
## UnitsPerCase          1.00     0.06    0.07      0.03          0.02
## LeadTime              0.06     1.00    0.14      0.08          0.03
## UomCube               0.07     0.14    1.00      0.34          0.14
## UomWeight             0.03     0.08    0.34      1.00         -0.05
## ShelfLifeDays         0.02     0.03    0.14     -0.05          1.00
##
## n= 4804
##
##
## P
##               UnitsPerCase LeadTime UomCube UomWeight ShelfLifeDays
## UnitsPerCase               0.0001   0.0000  0.0438    0.2979
## LeadTime      0.0001                0.0000  0.0000    0.0449
## UomCube       0.0000       0.0000           0.0000    0.0000
## UomWeight     0.0438       0.0000   0.0000            0.0002
## ShelfLifeDays 0.2979       0.0449   0.0000  0.0002
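At alpha = 0.05, the only relationship that is not significant is ShelfLifeDays with UnitsPerCase (p = 0.2979). ShelfLifeDays with LeadTime (p = 0.0449) and UomWeight with UnitsPerCase (p = 0.0438) are only marginally significant at that level and would not be significant at alpha = 0.01.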
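Which pairs count as non-significant depends on the chosen alpha. The sketch below re-enters the p-values printed in the P matrix above by hand and lists the pairs at or above a given threshold. The helper `nonsig_pairs` is a hypothetical name introduced for illustration, not part of Hmisc.

```r
vars <- c("UnitsPerCase", "LeadTime", "UomCube", "UomWeight", "ShelfLifeDays")

# Upper triangle of the P matrix, typed in from the rcorr output (column by column).
P <- matrix(NA_real_, 5, 5, dimnames = list(vars, vars))
P[upper.tri(P)] <- c(0.0001,
                     0.0000, 0.0000,
                     0.0438, 0.0000, 0.0000,
                     0.2979, 0.0449, 0.0000, 0.0002)

# Hypothetical helper: list variable pairs whose p-value is at or above alpha.
nonsig_pairs <- function(P, alpha) {
  idx <- which(upper.tri(P) & P >= alpha, arr.ind = TRUE)
  data.frame(var1 = rownames(P)[idx[, "row"]],
             var2 = colnames(P)[idx[, "col"]],
             p    = P[idx],
             stringsAsFactors = FALSE)
}

print(nonsig_pairs(P, alpha = 0.05))
print(nonsig_pairs(P, alpha = 0.01))
```

With the live object, the same helper could be applied to `rcorr(as.matrix(dafr))$P`, which holds the unrounded p-values.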
Examine variables UoMWeight and ShelfLifeDays for a linear relationship (Pearson's) and test for the significance of this correlation.
dafr$ShelfLifeDays <- as.numeric(dafr$ShelfLifeDays)
cor.test(dafr$UomWeight,dafr$ShelfLifeDays)
##
## Pearson's product-moment correlation
##
## data: dafr$UomWeight and dafr$ShelfLifeDays
## t = -3.7576, df = 4802, p-value = 0.0001736
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.08229825 -0.02590562
## sample estimates:
## cor
## -0.05414511
This correlation test using Pearson’s method (the default for cor.test) gives a correlation of -0.054 and a p-value of 0.0001736, which is below any conventional alpha. The two variables are therefore significantly correlated, though the relationship is very weak.
Examine variables UoMWeight and ShelfLifeDays for a monotonic relationship (Spearman's) and test for the significance of this correlation.
cor.test(dafr$UomWeight,dafr$ShelfLifeDays,method="spearman")
## Warning in cor.test.default(dafr$UomWeight, dafr$ShelfLifeDays, method =
## "spearman"): Cannot compute exact p-value with ties
##
## Spearman's rank correlation rho
##
## data: dafr$UomWeight and dafr$ShelfLifeDays
## S = 1.8657e+10, p-value = 0.5021
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## -0.009686691
The nonparametric correlation test gives a rho of -0.0097 (a very weak relationship) and a p-value of 0.5021, which is well above any conventional alpha. We therefore fail to reject the null hypothesis: there is no evidence of a monotonic association between the two variables.
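For intuition about what the Spearman test measures, rho is simply Pearson's correlation computed on the ranks of the data, with tied values given average ranks. The sketch below checks this on a small made-up sample that contains ties. Passing `exact = FALSE` asks cor.test for the asymptotic p-value directly, which should avoid the ties warning seen above.

```r
# Made-up data with ties in both variables.
x <- c(2, 4, 4, 7, 9, 12, 12, 15)
y <- c(30, 10, 25, 25, 8, 14, 3, 5)

rho_builtin <- cor(x, y, method = "spearman")
rho_manual  <- cor(rank(x), rank(y))   # Pearson on average ranks

# Asymptotic Spearman test; no exact p-value is attempted.
res <- cor.test(x, y, method = "spearman", exact = FALSE)

print(c(builtin = rho_builtin, manual = rho_manual, test = unname(res$estimate)))
```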
Comment/compare the findings from questions 4 and 5
The two tests gave very different results. Pearson’s correlation was statistically significant, whereas Spearman’s was not. Both coefficients are very weak in absolute terms (-0.054 versus -0.0097), although Pearson’s is more than five times larger in magnitude.
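One common reason for Pearson and Spearman to disagree is a small number of extreme values, since Pearson works on the raw values while Spearman only sees their ranks. Whether that is what happens in this dataset is not established by the output above; the sketch below only illustrates the mechanism on made-up numbers.

```r
# 19 points on a perfectly decreasing line, plus one extreme point at (100, 100).
x <- c(1:19, 100)
y <- c(19:1, 100)

pearson  <- cor(x, y, method = "pearson")
spearman <- cor(x, y, method = "spearman")

# The single outlier pulls Pearson strongly positive,
# while the rank-based Spearman stays negative.
print(round(c(pearson = pearson, spearman = spearman), 2))
```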
Examine variables UoMWeight and UoMCube for a linear relationship (Pearson's) and test for the significance of this correlation.
cor.test(dafr$UomWeight,dafr$UomCube,method="pearson")
##
## Pearson's product-moment correlation
##
## data: dafr$UomWeight and dafr$UomCube
## t = 24.762, df = 4802, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.3111763 0.3613351
## sample estimates:
## cor
## 0.3364943
Here the relationship between the two variables is highly significant, with a p-value below 2.2e-16, and moderate in strength, with a positive correlation of about 0.34.
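Both Pearson results reported so far can be reproduced from r and n alone, using t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. This also shows why, with n = 4804, even a correlation as small as -0.054 reaches significance. The function name `t_from_r` is an illustrative assumption; the r values are copied from the cor.test outputs above.

```r
# Hypothetical helper: t statistic and two-sided p-value for a Pearson r.
t_from_r <- function(r, n) {
  df     <- n - 2
  t_stat <- r * sqrt(df) / sqrt(1 - r^2)
  p_val  <- 2 * pt(-abs(t_stat), df)
  c(t = t_stat, df = df, p = p_val)
}

n <- 4804
weak     <- t_from_r(-0.05414511, n)  # UomWeight vs ShelfLifeDays
moderate <- t_from_r(0.3364943, n)    # UomWeight vs UomCube

print(signif(weak, 5))
print(signif(moderate, 5))
```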
Examine variables UoMWeight and UoMCube for a monotonic relationship (Spearman's) and test for the significance of this correlation.
cor.test(dafr$UomWeight,dafr$UomCube,method="spearman")
## Warning in cor.test.default(dafr$UomWeight, dafr$UomCube, method = "spearman"):
## Cannot compute exact p-value with ties
##
## Spearman's rank correlation rho
##
## data: dafr$UomWeight and dafr$UomCube
## S = 1.244e+10, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.3267811
Here the relationship between the two variables is again highly significant, with a p-value below 2.2e-16, and moderate in strength, with a rho of about 0.33.
Comment/compare the findings from questions 7 and 8
This time we got very similar results from the two tests. It would appear that there is almost no difference between the parametric and nonparametric tests for these two variables.
When comparing UomWeight and ShelfLifeDays, the Spearman’s test gave very different results from the Pearson’s test, but when comparing UomWeight and UomCube the results were very similar. It may therefore be worth checking whether ShelfLifeDays is normally distributed before continuing the analysis in this way.
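A minimal sketch of such a check, using the Shapiro-Wilk test from base R, is shown below. The data are synthetic right-skewed values standing in for ShelfLifeDays, not the real column. Note that shapiro.test accepts between 3 and 5000 observations, so the 4804 rows here would fit.

```r
# Synthetic, deterministic right-skewed values (exponential quantiles);
# a stand-in for dafr$ShelfLifeDays, not the real data.
skewed <- qexp(ppoints(500), rate = 1 / 365)

# Shapiro-Wilk test: a small p-value is evidence against normality.
sw <- shapiro.test(skewed)
print(sw)

# A mean well above the median is a quick sign of right skew.
print(c(mean = mean(skewed), median = median(skewed)))
```

On the real data the call would be `shapiro.test(dafr$ShelfLifeDays)`. With thousands of rows the test tends to flag even small departures from normality, so a Q-Q plot (`qqnorm` followed by `qqline`) is a useful companion.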