Suppose we record the number of errors made by each of 10 subjects in translating two English passages of equal length into French. We wish to test, at the 5% significance level, whether the two sets of scores differ significantly. Since we are not predicting the direction of any difference, a non-directional (two-tailed) test is appropriate. Because the scores are paired by subject, we use the Wilcoxon signed-rank test.
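library(dplyr) # provides tbl_df, mutate, filter, summarize and %>%
a <- c(8,7,4,2,4,10,17,3,2,11) # errors per subject, first passage
b <- c(10,6,4,5,7,11,15,6,3,14) # errors per subject, second passage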
data <- data.frame(a,b)
data <- tbl_df(data)
data
## Source: local data frame [10 x 2]
##
## a b
## 1 8 10
## 2 7 6
## 3 4 4
## 4 2 5
## 5 4 7
## 6 10 11
## 7 17 15
## 8 3 6
## 9 2 3
## 10 11 14
Find the difference between each pair of scores.
data <- mutate(data, diffs = a-b) # Add a column of differences 'a' - 'b'
data
## Source: local data frame [10 x 3]
##
## a b diffs
## 1 8 10 -2
## 2 7 6 1
## 3 4 4 0
## 4 2 5 -3
## 5 4 7 -3
## 6 10 11 -1
## 7 17 15 2
## 8 3 6 -3
## 9 2 3 -1
## 10 11 14 -3
Remove all zero differences, since they carry no sign. This leaves n = 9 pairs.
data <- filter(data, diffs != 0)
data
## Source: local data frame [9 x 3]
##
## a b diffs
## 1 8 10 -2
## 2 7 6 1
## 3 2 5 -3
## 4 4 7 -3
## 5 10 11 -1
## 6 17 15 2
## 7 3 6 -3
## 8 2 3 -1
## 9 11 14 -3
Rank the differences by their absolute value. Tied values receive the average of the ranks they span.
data <- mutate(data, ranks = rank(abs(diffs)))
data
## Source: local data frame [9 x 4]
##
## a b diffs ranks
## 1 8 10 -2 4.5
## 2 7 6 1 2.0
## 3 2 5 -3 7.5
## 4 4 7 -3 7.5
## 5 10 11 -1 2.0
## 6 17 15 2 4.5
## 7 3 6 -3 7.5
## 8 2 3 -1 2.0
## 9 11 14 -3 7.5
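The fractional ranks come from ties: tied absolute differences share the average of the positions they would otherwise occupy. A small base R sketch (the vector name abs_diffs is introduced here only for illustration) shows this default behaviour of rank:

```r
# Absolute differences after dropping the zero (illustrative copy of the data above)
abs_diffs <- c(2, 1, 3, 3, 1, 2, 3, 1, 3)

# rank() uses ties.method = "average" by default
ranks <- rank(abs_diffs)
ranks

# The ranks always sum to n(n + 1)/2, here 9 * 10 / 2 = 45
n <- length(abs_diffs)
sum(ranks) == n * (n + 1) / 2
```

The three 1s would occupy positions 1 to 3 and so each gets rank 2, the two 2s share 4.5, and the four 3s share 7.5.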
Give each rank the sign of its difference.
data$ranks <- ifelse(data$diffs < 0, data$ranks*-1, data$ranks)
data
## Source: local data frame [9 x 4]
##
## a b diffs ranks
## 1 8 10 -2 -4.5
## 2 7 6 1 2.0
## 3 2 5 -3 -7.5
## 4 4 7 -3 -7.5
## 5 10 11 -1 -2.0
## 6 17 15 2 4.5
## 7 3 6 -3 -7.5
## 8 2 3 -1 -2.0
## 9 11 14 -3 -7.5
The sums of the positive ranks and the negative ranks are calculated separately.
sumPos <- filter(data, ranks > 0) %>% summarize(Sum_of_Positives = sum(ranks))
sumNeg <- filter(data, ranks < 0) %>% summarize(Sum_of_Negatives = sum(ranks))
data.frame(sumPos, sumNeg)
## Sum_of_Positives Sum_of_Negatives
## 1 6.5 -38.5
The smaller of the two sums in absolute value, 6.5, is taken as the test statistic W.
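Compare W with the critical value. For n = 9 non-zero differences and a two-tailed test at the 5% level, the critical value is 5, and the null hypothesis is rejected only if W is less than or equal to it. Since 6.5 > 5, we do not reject the null hypothesis.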
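The critical value is usually read from a printed table, but it can also be recovered from the exact null distribution of the statistic. The following is a sketch using base R's psignrank, assuming no ties (as the tabulated values do); the names n, alpha, w and crit are illustrative.

```r
n <- 9          # number of non-zero differences
alpha <- 0.05   # two-tailed significance level

# P(W <= w) under the null hypothesis for every possible w = 0, ..., n(n + 1)/2
w <- 0:(n * (n + 1) / 2)
lower_tail <- psignrank(w, n)

# Largest w whose lower-tail probability does not exceed alpha / 2
crit <- max(w[lower_tail <= alpha / 2])
crit
```

Splitting alpha equally between the two tails is the usual convention for a non-directional test.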
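R's built-in wilcox.test runs the same test directly. It reports the statistic as V, the sum of the positive ranks, and its p-value is above 0.05, in agreement with the conclusion above.
wilcox.test(a, b, paired = TRUE)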
## Warning in wilcox.test.default(a, b, paired = TRUE): cannot compute exact
## p-value with ties
## Warning in wilcox.test.default(a, b, paired = TRUE): cannot compute exact
## p-value with zeroes
##
## Wilcoxon signed rank test with continuity correction
##
## data: a and b
## V = 6.5, p-value = 0.06275
## alternative hypothesis: true location shift is not equal to 0
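Because of the ties and the zero difference, wilcox.test falls back on a normal approximation with a continuity correction. As a rough check, the reported p-value can be reproduced by hand. This sketch assumes the usual tie-corrected variance formula for the signed-rank statistic; all variable names are illustrative.

```r
a <- c(8, 7, 4, 2, 4, 10, 17, 3, 2, 11)
b <- c(10, 6, 4, 5, 7, 11, 15, 6, 3, 14)

d <- a - b
d <- d[d != 0]        # drop zero differences
n <- length(d)
r <- rank(abs(d))     # average ranks for ties
V <- sum(r[d > 0])    # sum of the positive ranks

mu <- n * (n + 1) / 4                 # mean of V under the null hypothesis
ties <- table(r)                      # size of each group of tied ranks
sigma <- sqrt(n * (n + 1) * (2 * n + 1) / 24 - sum(ties^3 - ties) / 48)

# Continuity correction: move V half a unit towards the mean
z <- (V - mu - sign(V - mu) * 0.5) / sigma
p_value <- 2 * pnorm(-abs(z))         # two-sided p-value
p_value
```

The result should agree with the p-value printed by wilcox.test above to the displayed precision.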