I am going to work through the homework and project by giving some coded examples of these non-parametric tests.
To keep this assignment simple, we are going to use the built-in dataset ChickWeight, which ships with R's datasets package (ggplot2 is only needed for the plot at the end).
head(data)
## Weight Time Chick Diet
## 1 42 0 1 1
## 2 51 2 1 1
## 3 59 4 1 1
## 4 64 6 1 1
## 5 76 8 1 1
## 6 93 10 1 1
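The write-up does not show how `data` was created. A minimal sketch of one plausible setup follows; it assumes `data` is a plain copy of `ChickWeight` with the `weight` column renamed to `Weight` to match the printed header. That rename is an assumption, not something the source confirms.

```r
# Assumption: `data` is a copy of the built-in ChickWeight data frame
# with the lowercase `weight` column renamed to `Weight`.
data <- as.data.frame(datasets::ChickWeight)
names(data)[names(data) == "weight"] <- "Weight"

str(data)    # four columns: Weight, Time, Chick, Diet
head(data)
```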
I am going to look at the difference between weight and time and see whether the chick makes a difference.
# keep two chicks to compare (IDs 1 and 2 are an illustrative choice)
df <- data[data$Chick %in% c(1, 2), ]
# per-row difference between weight and time
df$ChickWeight.Difference <- df$Weight - df$Time
With the data cleaned up, we run the Wilcoxon signed-rank test.
wilcox.test(data$Time, data$Chick, data = df3, paired=TRUE)
## Warning in wilcox.test.default(data$Time, data$Chick, data = df3, paired =
## TRUE): cannot compute exact p-value with ties
##
## Wilcoxon signed rank test with continuity correction
##
## data: data$Time and data$Chick
## V = 378.5, p-value = 5.994e-05
## alternative hypothesis: true location shift is not equal to 0
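The p-value (5.994e-05) is far below 0.05, so we are able to reject the null hypothesis that the location shift between the paired values is zero.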
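To make the V statistic less of a black box, here is a small sketch that computes it by hand and compares it with `wilcox.test()`. The `before` and `after` vectors are hypothetical values chosen for illustration; they are not taken from the data above.

```r
# Hypothetical paired measurements (illustration only).
before <- c(10, 12, 9, 14, 11, 13)
after  <- c(12, 15, 8, 18, 16, 19)

d <- before - after      # paired differences
d <- d[d != 0]           # zero differences are dropped
r <- rank(abs(d))        # rank the absolute differences
V <- sum(r[d > 0])       # V = sum of the ranks of the positive differences

res <- wilcox.test(before, after, paired = TRUE)
V
unname(res$statistic)    # matches the hand-computed V
res$p.value
```

A small V (or a very large one) means the differences fall mostly on one side of zero, which is what drives a small p-value.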
by(data$Chick, data$Time, median)
## data$Time: 0
## [1] 1
## ------------------------------------------------------------
## data$Time: 2
## [1] 1
## ------------------------------------------------------------
## data$Time: 4
## [1] 1
## ------------------------------------------------------------
## data$Time: 6
## [1] 1
## ------------------------------------------------------------
## data$Time: 7
## [1] 1
## ------------------------------------------------------------
## data$Time: 8
## [1] 1
## ------------------------------------------------------------
## data$Time: 10
## [1] 1
Visualize the data with a boxplot
# weight is the response, time is the grouping variable
boxplot(Weight ~ Time, data = data)
by(data$Weight, data$Time, median)
## data$Time: 0
## [1] 51
## ------------------------------------------------------------
## data$Time: 2
## [1] 59
## ------------------------------------------------------------
## data$Time: 4
## [1] 64
## ------------------------------------------------------------
## data$Time: 6
## [1] 78.5
## ------------------------------------------------------------
## data$Time: 7
## [1] 76
## ------------------------------------------------------------
## data$Time: 8
## [1] 84.5
## ------------------------------------------------------------
## data$Time: 10
## [1] 93
kruskal.test(Weight ~ Time, data = data)
##
## Kruskal-Wallis rank sum test
##
## data: Weight by Time
## Kruskal-Wallis chi-squared = 3.1178, df = 6, p-value = 0.7939
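Here the p-value (0.7939) is well above 0.05, so I am not able to reject the null hypothesis that weight has the same distribution at every time point.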
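As a rough sketch of what `kruskal.test()` computes, the H statistic can be rebuilt from rank sums and checked against the built-in function. The vectors `w` and `g` are hypothetical and contain no ties, so the tie correction is not needed; the 0.05 cut-off is the usual convention and is assumed here.

```r
# Hypothetical weights in three groups (illustration only, no tied values).
w <- c(42, 51, 59, 64, 76, 93, 48, 55, 70)
g <- factor(rep(c("a", "b", "c"), each = 3))

N  <- length(w)
r  <- rank(w)                 # ranks over the pooled sample
Rj <- tapply(r, g, sum)       # rank sum per group
nj <- tapply(r, g, length)    # group sizes
H  <- 12 / (N * (N + 1)) * sum(Rj^2 / nj) - 3 * (N + 1)
p  <- pchisq(H, df = nlevels(g) - 1, lower.tail = FALSE)

res <- kruskal.test(w ~ g)
c(H = H, p = p)
c(unname(res$statistic), res$p.value)   # agrees with the hand computation
p < 0.05                                # reject the null only when this is TRUE
```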
Visualize the data with ggplot
library(ggplot2)
# refer to columns by name inside aes() rather than with data$...
ggplot(data = data, aes(x = Weight, y = Time, color = Chick)) +
  geom_point()