Customer Profit

I want to finish my Customer Profit statistics project with some analyses using non-parametric methods. I am going to repeat many of the analyses I have already made (except for Matched Pairs).

The hypotheses are

$$
\begin{aligned}
H_0&: m_M = m_W \\
H_a&: m_M \neq m_W
\end{aligned}
$$

Visualize the data

library(ggplot2)
# Profit distribution for each shipping mode
ggplot(data = data, aes(x = Ship_Mode, y = Profit)) +
  geom_boxplot()

wilcox.test(data$Profit, data$Shipping_Cost, paired = TRUE)
## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  data$Profit and data$Shipping_Cost
## V = 34980, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0

Since the p-value is below 2.2e-16, I reject the null hypothesis.

Use a categorical variable and a quantitative variable to compare two medians with the Wilcoxon rank sum test.

# Replace missing values with 0
data$Profit[is.na(data$Profit)] <- 0
data$Unit_Price[is.na(data$Unit_Price)] <- 0
data$Shipping_Cost[is.na(data$Shipping_Cost)] <- 0
# Label each row by whether Unit_Price exceeds Shipping_Cost
data[which(data$Unit_Price <= data$Shipping_Cost), "PROFIT"] <- "Less"
data[which(data$Unit_Price > data$Shipping_Cost), "PROFIT"] <- "More"

Let’s summarize the data.

Median Value

by(data$Profit,data$PROFIT, median)
## data$PROFIT: Less
## [1] 1934
## ------------------------------------------------------------ 
## data$PROFIT: More
## [1] 1928
table(data$PROFIT)
## 
## Less More 
##  145  119

Let’s run the Wilcoxon rank sum test.

wilcox.test(data$Profit~data$PROFIT, data = data)
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  data$Profit by data$PROFIT
## W = 9852.5, p-value = 0.04727
## alternative hypothesis: true location shift is not equal to 0

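Since p = 0.04727 is below 0.05, I reject the null hypothesis at the 5% significance level, although the evidence is borderline and the two group medians (1934 vs. 1928) are very close.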
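To show where the W statistic comes from, here is a minimal sketch on small made-up vectors. `less` and `more` are hypothetical stand-ins for the two PROFIT groups, not the project data. The idea: pool both groups, rank them, and subtract n1(n1 + 1)/2 from the first group's rank sum. With the formula interface, the first group is the first factor level, here "Less".

```r
# Hypothetical toy data standing in for the "Less" and "More" groups
less <- c(12, 15, 9, 20, 17)
more <- c(8, 11, 14, 7, 10, 13)

# Rank all observations together (rank() averages tied ranks)
r <- rank(c(less, more))
n1 <- length(less)

# W = rank sum of the first group minus its smallest possible value
W_manual <- sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2

# Compare with R's built-in rank sum test
res <- wilcox.test(less, more)
print(W_manual)
print(unname(res$statistic))
```

With small samples and no ties, `wilcox.test()` uses the exact null distribution. The project groups have more than 50 observations each, so R switches to a normal approximation with a continuity correction, as the output above shows.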

Wilcoxon Signed Rank Test

I can either subtract the two columns and test the differences, or pass both columns with paired = TRUE. Both give the same result:

wilcox.test(data$Profit, data$Shipping_Cost, paired = TRUE)
## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  data$Profit and data$Shipping_Cost
## V = 34980, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
wilcox.test(data$Profit - data$Shipping_Cost)
## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  data$Profit - data$Shipping_Cost
## V = 34980, p-value < 2.2e-16
## alternative hypothesis: true location is not equal to 0
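To make the equivalence concrete, here is a minimal sketch on hypothetical paired vectors (`profit` and `shipping` are made-up values, not the project columns). V is the sum of the ranks of |d| taken over the positive differences d, and passing the difference vector alone yields the same V as the paired call.

```r
# Hypothetical paired toy data (e.g. profit and shipping cost per order)
profit   <- c(30, 12, 45, 8, 25, 40)
shipping <- c(5, 14, 10, 2, 6, 9)

# Differences; zero differences would be dropped, as wilcox.test() does
d <- profit - shipping
d <- d[d != 0]

# Rank |d|, then add up the ranks belonging to positive differences
V_manual <- sum(rank(abs(d))[d > 0])

# Both built-in forms of the signed rank test
res_paired <- wilcox.test(profit, shipping, paired = TRUE)
res_diff   <- wilcox.test(profit - shipping)
print(V_manual)
print(unname(res_paired$statistic))
print(unname(res_diff$statistic))
```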

Kruskal-Wallis

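Like a one-way ANOVA, the Kruskal-Wallis test compares a quantitative variable across the groups of a categorical variable, and it works for two or more groups.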

# Replace missing values with 0
data$Profit[is.na(data$Profit)] <- 0
data$Unit_Price[is.na(data$Unit_Price)] <- 0
# Relabel each row by comparing Profit with Unit_Price
data[which(data$Profit < data$Unit_Price), "PROFIT"] <- "Less"
data[which(data$Profit > data$Unit_Price), "PROFIT"] <- "More"
data[which(data$Profit == data$Unit_Price), "PROFIT"] <- "Equal"

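Let’s look at the median Profit in each group. Every row ends up in the "More" group, so the Kruskal-Wallis test below groups Profit by each distinct Shipping_Cost value instead; the 16 distinct values give df = 15.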

by(data$Profit,data$PROFIT, median)
## data$PROFIT: More
## [1] 1931
kruskal.test(data$Profit ~ data$Shipping_Cost, data = data)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  data$Profit by data$Shipping_Cost
## Kruskal-Wallis chi-squared = 47.591, df = 15, p-value = 2.959e-05

With p = 2.959e-05, I reject the null hypothesis that Profit has the same distribution in every Shipping_Cost group.
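As a rough illustration of the statistic, here is a minimal sketch on hypothetical data (`y` and the groups "A", "B", "C" are made up). It ranks all observations together and applies H = 12/(N(N + 1)) * sum(R_i^2 / n_i) - 3(N + 1), where R_i and n_i are each group's rank sum and size. This simple formula assumes no ties; `kruskal.test()` additionally divides by a tie correction, so on data with ties (like the project data) the two would differ slightly.

```r
# Hypothetical toy data: a quantitative response in three groups
y <- c(2.1, 3.4, 1.9, 4.5, 5.0, 3.8, 6.2, 5.5, 7.1)
g <- factor(rep(c("A", "B", "C"), each = 3))

N <- length(y)
r <- rank(y)  # rank all observations together

# Rank sum and size of each group
R_i <- tapply(r, g, sum)
n_i <- tapply(r, g, length)

# Kruskal-Wallis H without a tie correction (this toy data has no ties)
H_manual <- 12 / (N * (N + 1)) * sum(R_i^2 / n_i) - 3 * (N + 1)

res <- kruskal.test(y ~ g)
print(H_manual)
print(unname(res$statistic))
print(res$parameter)  # degrees of freedom = number of groups - 1
```

The degrees of freedom are the number of groups minus one, which is how the df = 15 in the output above corresponds to 16 Shipping_Cost groups.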

Visualize the data with a boxplot

boxplot(data$Profit~data$Ship_Mode)

Visualize the data with ggplot

ggplot(data = data, aes(x = Profit, y = Customer_Name)) +
  geom_boxplot()

Spearman Rank Correlation

With two quantitative variables, perform a Spearman rank correlation test.

cor.test(data$Profit, data$Unit_Price, method = "spearman")
## Warning in cor.test.default(data$Profit, data$Unit_Price, method = "spearman"):
## Cannot compute exact p-value with ties
## 
##  Spearman's rank correlation rho
## 
## data:  data$Profit and data$Unit_Price
## S = 3054926, p-value = 0.951
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##         rho 
## 0.003800163
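Spearman's rho is simply Pearson's correlation computed on the ranks of the two variables. The minimal sketch below checks this on hypothetical vectors `x` and `y` (made-up values standing in for Profit and Unit_Price). The toy data has no ties, so `cor.test()` can compute an exact p-value; the warning in the output above appears because the project columns contain tied values, and R falls back to an approximation.

```r
# Hypothetical toy data standing in for Profit and Unit_Price
x <- c(10, 20, 30, 40, 50)
y <- c(3, 1, 4, 5, 9)

# Spearman's rho = Pearson correlation of the ranks
rho_manual <- cor(rank(x), rank(y))

# Built-in versions
rho_builtin <- cor(x, y, method = "spearman")
res <- cor.test(x, y, method = "spearman")
print(rho_manual)
print(unname(res$estimate))
```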

Visualize the data

# Profit for each customer, colored by shipping cost
ggplot(data = data, aes(x = Profit, y = Customer_Name, color = Shipping_Cost)) +
  geom_point()

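The hypotheses concern ρS, the Spearman rank correlation.

$$
\begin{aligned}
H_0&: \rho_S = 0 \\
H_a&: \rho_S \neq 0
\end{aligned}
$$

The estimated correlation is essentially zero (rho = 0.0038) and the p-value is 0.951, so I fail to reject the null hypothesis: there is no evidence of a monotonic relationship between Profit and Unit_Price. Because Spearman's rho measures monotonic association, it does not by itself tell us whether any relationship is linear.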