I want to finish my Customer Profit statistics project with some analysis using non-parametric methods. I am going to try to repeat many of the analyses I have already made (except for Matched Pairs).
The hypotheses are
$$
H_0: m_M = m_W \\
H_a: m_M \neq m_W
$$
Visualize the data
ggplot(data = data, aes(x = Ship_Mode, y = Profit)) +
  geom_boxplot()
wilcox.test(data$Profit, data$Shipping_Cost, paired = TRUE)
##
## Wilcoxon signed rank test with continuity correction
##
## data: data$Profit and data$Shipping_Cost
## V = 34980, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
With p < 2.2e-16, I am able to reject the null hypothesis: the paired differences between Profit and Shipping_Cost are not centered at zero.
# Treat missing values as zero before comparing columns
data$Profit[is.na(data$Profit)] <- 0
data$Unit_Price[is.na(data$Unit_Price)] <- 0
data$Shipping_Cost[is.na(data$Shipping_Cost)] <- 0
# Label each row by whether the unit price exceeds the shipping cost
data[which(data$Unit_Price <= data$Shipping_Cost), "PROFIT"] <- "Less"
data[which(data$Unit_Price > data$Shipping_Cost), "PROFIT"] <- "More"
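As a rough sketch, the same two-way labelling can be written with `ifelse()`. The `toy` data frame and its values are made up for illustration and are not the project's data; only the column names mirror the ones used above.

```r
# Hypothetical toy data, not the project's data set
toy <- data.frame(
  Unit_Price    = c(5.0, 12.5, NA, 3.2),
  Shipping_Cost = c(7.0, 4.0, 2.5, 3.2)
)

# Same NA handling as above: a missing price is treated as zero
toy$Unit_Price[is.na(toy$Unit_Price)] <- 0

# "More" when the unit price exceeds the shipping cost, otherwise "Less"
toy$PROFIT <- ifelse(toy$Unit_Price > toy$Shipping_Cost, "More", "Less")

table(toy$PROFIT)
```

Note that replacing a missing value with zero puts that row in the "Less" group, so the choice of NA handling affects the group sizes.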
Let’s summarize the data.
Median Value
by(data$Profit,data$PROFIT, median)
## data$PROFIT: Less
## [1] 1934
## ------------------------------------------------------------
## data$PROFIT: More
## [1] 1928
table(data$PROFIT)
##
## Less More
## 145 119
Let’s run the Wilcoxon rank sum test.
wilcox.test(data$Profit~data$PROFIT, data = data)
##
## Wilcoxon rank sum test with continuity correction
##
## data: data$Profit by data$PROFIT
## W = 9852.5, p-value = 0.04727
## alternative hypothesis: true location shift is not equal to 0
The p-value of 0.04727 is just below 0.05, so at the 5% significance level I reject the null hypothesis, though only narrowly.
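To make the W statistic less of a black box, here is a minimal sketch of how it can be computed by hand. The `toy` data frame is hypothetical (small and tie-free so the arithmetic is easy to follow) and is not the project's data.

```r
# Hypothetical toy data, not the project's data set
toy <- data.frame(
  profit = c(12, 15, 9, 20, 31, 25, 18, 40),
  group  = factor(rep(c("Less", "More"), each = 4))
)

# Rank all observations together, ignoring the group labels
r <- rank(toy$profit)

# W is the rank sum of the first group minus its smallest possible value
n_less   <- sum(toy$group == "Less")
w_manual <- sum(r[toy$group == "Less"]) - n_less * (n_less + 1) / 2

# Built-in test for comparison
res <- wilcox.test(profit ~ group, data = toy)

w_manual
unname(res$statistic)
```

Both values should agree. W counts the pairs in which an observation from the first group exceeds one from the second group.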
For the paired test, I could either subtract the two columns myself or pass both columns with paired = TRUE. I’ll show that both give the same result.
wilcox.test(data$Profit, data$Shipping_Cost, paired = TRUE)
##
## Wilcoxon signed rank test with continuity correction
##
## data: data$Profit and data$Shipping_Cost
## V = 34980, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
wilcox.test(data$Profit - data$Shipping_Cost)
##
## Wilcoxon signed rank test with continuity correction
##
## data: data$Profit - data$Shipping_Cost
## V = 34980, p-value < 2.2e-16
## alternative hypothesis: true location is not equal to 0
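The matching V values come from the fact that the signed rank test only ever looks at the differences. A small sketch under assumed data: the vectors `x` and `y` below are invented for illustration, chosen without ties or zero differences.

```r
# Hypothetical paired measurements, not the project's data
x <- c(10.5, 12.1, 9.8, 14.0, 11.2, 13.3)
y <- c(9.0, 12.6, 7.5, 10.1, 11.9, 10.0)

d <- x - y                 # paired differences
r <- rank(abs(d))          # rank the absolute differences
v_manual <- sum(r[d > 0])  # V = sum of the ranks of the positive differences

# The paired call and the one-sample call on the differences
paired   <- wilcox.test(x, y, paired = TRUE)
one_samp <- wilcox.test(d)

v_manual
unname(paired$statistic)
unname(one_samp$statistic)
```

All three values should be identical, and the two test objects should report the same p-value.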
Just like ANOVA, the Kruskal-Wallis test compares a response across several categories.
data$Profit[is.na(data$Profit)] <- 0
data$Unit_Price[is.na(data$Unit_Price)] <- 0
data[which(data$Profit < data$Unit_Price),"PROFIT"] = "Less"
data[which(data$Profit > data$Unit_Price),"PROFIT"] = "More"
data[which(data$Profit == data$Unit_Price),"PROFIT"] = "Equal"
Let’s look at the median and see whether the Profit stays the same.
by(data$Profit,data$PROFIT, median)
## data$PROFIT: More
## [1] 1931
kruskal.test(data$Profit ~ data$Shipping_Cost, data = data)
##
## Kruskal-Wallis rank sum test
##
## data: data$Profit by data$Shipping_Cost
## Kruskal-Wallis chi-squared = 47.591, df = 15, p-value = 2.959e-05
With p = 2.959e-05, I am able to reject the null hypothesis that Profit has the same distribution across the Shipping_Cost groups.
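The degrees of freedom equal the number of groups minus one, so df = 15 in the output above means the test treated 16 distinct Shipping_Cost values as groups. Here is a minimal sketch of the statistic computed by hand, assuming no ties. The `toy` data and the shipping mode names are hypothetical, not taken from the project's data.

```r
# Hypothetical toy data with three groups, not the project's data set
toy <- data.frame(
  profit = c(12, 15, 9, 20, 31, 25, 18, 40, 22),
  mode   = factor(rep(c("Air", "Truck", "Express"), each = 3))
)

r <- rank(toy$profit)  # rank all observations together
n <- nrow(toy)

rank_sums <- tapply(r, toy$mode, sum)     # rank sum per group
group_n   <- tapply(r, toy$mode, length)  # group sizes

# Kruskal-Wallis H (no tie correction needed for this toy data)
h_manual <- 12 / (n * (n + 1)) * sum(rank_sums^2 / group_n) - 3 * (n + 1)

res <- kruskal.test(profit ~ mode, data = toy)

h_manual
unname(res$statistic)
unname(res$parameter)  # degrees of freedom = number of groups - 1
```

With tied values, `kruskal.test()` applies an extra correction factor, so the hand calculation above would match only approximately.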
Visualize the data by boxplot
boxplot(data$Profit~data$Ship_Mode)
Visualize the data by ggplot
ggplot(data = data, aes(x = Profit, y = Customer_Name)) +
  geom_boxplot()
With two quantitative variables, perform a Spearman rank correlation test.
cor.test(data$Profit, data$Unit_Price, method = "spearman")
## Warning in cor.test.default(data$Profit, data$Unit_Price, method = "spearman"):
## Cannot compute exact p-value with ties
##
## Spearman's rank correlation rho
##
## data: data$Profit and data$Unit_Price
## S = 3054926, p-value = 0.951
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.003800163
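For intuition, Spearman's rho is simply the Pearson correlation of the ranks. A short sketch with invented, tie-free vectors `x` and `y` (not the project's data):

```r
# Hypothetical toy vectors, not the project's data
x <- c(3, 8, 1, 10, 6, 4)
y <- c(2.5, 7.0, 3.1, 9.4, 4.2, 5.0)

# Spearman's rho is the Pearson correlation computed on the ranks
rho_manual <- cor(rank(x), rank(y))

# With no ties, the S statistic is the sum of squared rank differences
s_manual <- sum((rank(x) - rank(y))^2)

res <- cor.test(x, y, method = "spearman")

rho_manual
unname(res$estimate)
s_manual
unname(res$statistic)
```

Because only ranks are used, the test picks up monotonic association and is not affected by how far apart the raw values are.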
Visualize the data
ggplot(data = data, aes(x = Profit, y = Customer_Name, color = Shipping_Cost)) +
  geom_point()
The hypotheses are about $\rho_S$, the Spearman rank correlation.
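$$
H_0: \rho_S = 0 \\
H_a: \rho_S \neq 0
$$
There is essentially no correlation: the estimated rho is about 0.0038 and the p-value is 0.951, so I fail to reject the null hypothesis that the Spearman correlation is zero. Spearman’s test only looks at monotonic association, so it would not tell us whether a relationship is linear in any case.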