testScores <- c(79,74,88,80,80,66,65,86,84,80,78,72,71,74,86,96,77,81,76,80,
76,75,78,87,87,74,85,84,76,77,76,74,85,74,76,77,76,74,81,76)
(a) Summarize the data. Is there any initial evidence that the hypothesis is true?
summary(testScores)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 65.00 74.75 77.00 78.53 81.75 96.00
The summary above reports a median test score of 77.00.
This gives some initial evidence for our hypothesis that the median exam score increased, since the previous median
was 7 points lower (70.0 vs. 77.0).
(b) State the null and alternative hypothesis that we wish to test here. Also state the level \(\alpha\) that you will be using for this test.
\(H_{0}\): After revision of course content, the median exam score did not change (median = 70.00)
\(H_{1}\): After revision of course content, the median exam score increased (median > 70.00)
\(\alpha\) = .05
(c) Test the hypothesis by using the binomial test.
binom.test(sum((testScores>70)), length(testScores), .50, alternative="greater")
##
## Exact binomial test
##
## data: sum((testScores > 70)) and length(testScores)
## number of successes = 38, number of trials = 40, p-value =
## 7.467e-10
## alternative hypothesis: true probability of success is greater than 0.5
## 95 percent confidence interval:
## 0.850848 1.000000
## sample estimates:
## probability of success
## 0.95
(d) State your conclusion and p-value.
We reject the null hypothesis that the median exam score is 70 (one-sided exact binomial test, p = 7.467e-10 < .05) and conclude that the median exam score increased in the semester following the course revision.
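As a check on the reported p-value, the one-sided binomial probability can be recomputed directly from the binomial distribution. This is a minimal sketch: the names `n`, `x`, and `pByHand` are introduced here for illustration and are not part of the original solution.

```r
testScores <- c(79,74,88,80,80,66,65,86,84,80,78,72,71,74,86,96,77,81,76,80,
                76,75,78,87,87,74,85,84,76,77,76,74,85,74,76,77,76,74,81,76)

n <- length(testScores)      # number of trials
x <- sum(testScores > 70)    # number of scores above the hypothesized median

# Under H0 each score exceeds 70 with probability 0.5, so X ~ Binomial(n, 0.5).
# One-sided p-value: P(X >= x) = 1 - P(X <= x - 1)
pByHand <- pbinom(x - 1, size = n, prob = 0.5, lower.tail = FALSE)
pByHand
```

No score is exactly 70 in this sample, so every observation counts as either a success or a failure.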
library(gtools)
combinations(8,4)
## [,1] [,2] [,3] [,4]
## [1,] 1 2 3 4
## [2,] 1 2 3 5
## [3,] 1 2 3 6
## [4,] 1 2 3 7
## [5,] 1 2 3 8
## [6,] 1 2 4 5
## [7,] 1 2 4 6
## [8,] 1 2 4 7
## [9,] 1 2 4 8
## [10,] 1 2 5 6
## [11,] 1 2 5 7
## [12,] 1 2 5 8
## [13,] 1 2 6 7
## [14,] 1 2 6 8
## [15,] 1 2 7 8
## [16,] 1 3 4 5
## [17,] 1 3 4 6
## [18,] 1 3 4 7
## [19,] 1 3 4 8
## [20,] 1 3 5 6
## [21,] 1 3 5 7
## [22,] 1 3 5 8
## [23,] 1 3 6 7
## [24,] 1 3 6 8
## [25,] 1 3 7 8
## [26,] 1 4 5 6
## [27,] 1 4 5 7
## [28,] 1 4 5 8
## [29,] 1 4 6 7
## [30,] 1 4 6 8
## [31,] 1 4 7 8
## [32,] 1 5 6 7
## [33,] 1 5 6 8
## [34,] 1 5 7 8
## [35,] 1 6 7 8
## [36,] 2 3 4 5
## [37,] 2 3 4 6
## [38,] 2 3 4 7
## [39,] 2 3 4 8
## [40,] 2 3 5 6
## [41,] 2 3 5 7
## [42,] 2 3 5 8
## [43,] 2 3 6 7
## [44,] 2 3 6 8
## [45,] 2 3 7 8
## [46,] 2 4 5 6
## [47,] 2 4 5 7
## [48,] 2 4 5 8
## [49,] 2 4 6 7
## [50,] 2 4 6 8
## [51,] 2 4 7 8
## [52,] 2 5 6 7
## [53,] 2 5 6 8
## [54,] 2 5 7 8
## [55,] 2 6 7 8
## [56,] 3 4 5 6
## [57,] 3 4 5 7
## [58,] 3 4 5 8
## [59,] 3 4 6 7
## [60,] 3 4 6 8
## [61,] 3 4 7 8
## [62,] 3 5 6 7
## [63,] 3 5 6 8
## [64,] 3 5 7 8
## [65,] 3 6 7 8
## [66,] 4 5 6 7
## [67,] 4 5 6 8
## [68,] 4 5 7 8
## [69,] 4 6 7 8
## [70,] 5 6 7 8
p.val<-1/70; p.val
## [1] 0.01428571
One-sided p-value: the probability, under the null hypothesis, of an outcome at least as extreme as the one we observed. Our observed data
correspond to combination 70 for treatment 1: treatment 1 takes the values [5,6,7,8], so treatment 2 must take the values [1,2,3,4].
In each of combinations 1-69, at least one observation from treatment 2 is greater than at least one observation from treatment 1, so none of them is as extreme as the observed outcome.
P(event >= combination [5,6,7,8]) = 1/70 = 0.014
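The same count can be reproduced without listing the combinations by eye. The sketch below uses base R's `combn` in place of `gtools::combinations` and assumes the test statistic is the rank sum of treatment 1; the names `assignments`, `rankSums`, `observed`, and `pOneSided` are illustrative.

```r
# All choose(8, 4) = 70 ways to give four of the ranks 1..8 to treatment 1
assignments <- combn(8, 4)          # 4 x 70 matrix, one column per assignment
rankSums <- colSums(assignments)    # treatment 1 rank sum for each assignment

# Observed assignment: treatment 1 holds ranks 5, 6, 7, 8
observed <- sum(5:8)

# One-sided p-value: share of assignments with a rank sum at least as large
pOneSided <- mean(rankSums >= observed)
pOneSided
```

Only the observed assignment reaches the maximum rank sum, which is why the p-value is 1 out of 70.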
siblings<-data.frame(hometown=c(rep("rural",24),rep("urban",17)),
siblings=c(3,2,1,1,2,1,3,2,2,2,2,5,1,4,1,1,1,1,6,2,2,2,1,1,1,0,1,1,0,0,1,1,1,8,1,1,1,0,1,1,2))
(a) Test for a significant difference between rural and urban areas using the Wilcoxon rank-sum test.(Make sure to state your hypothesis, p-value, and conclusion!)
wilcox.test(siblings ~ hometown, data=siblings)
## Warning in wilcox.test.default(x = c(3, 2, 1, 1, 2, 1, 3, 2, 2, 2, 2, 5, :
## cannot compute exact p-value with ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: siblings by hometown
## W = 314.5, p-value = 0.001598
## alternative hypothesis: true location shift is not equal to 0
\(H_{0}\): There is no difference in the median number of siblings reported among rural and urban students. Median number of siblings is the
same for both groups.
\(H_{1}\): There is a difference in the median number of siblings reported among rural and urban students. Median number of siblings is not
the same for both groups.
P-value: 0.001598
Using the Wilcoxon rank-sum test, we reject the null hypothesis at \(\alpha\) = .05 (p = 0.001598) and conclude that there is a
statistically significant difference in the median number of siblings between rural and urban students.
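To see where the reported W comes from, it can be rebuilt from the midranks. For a formula call, `wilcox.test` reports the statistic for the first factor level (here "rural", by alphabetical order): the rank sum of that group minus n(n+1)/2. The names `r`, `nRural`, `rankSumRural`, and `W` below are illustrative.

```r
siblings <- data.frame(
  hometown = c(rep("rural", 24), rep("urban", 17)),
  siblings = c(3,2,1,1,2,1,3,2,2,2,2,5,1,4,1,1,1,1,6,2,2,2,1,1,
               1,0,1,1,0,0,1,1,1,8,1,1,1,0,1,1,2)
)

# Midranks over the pooled sample (tied values share their average rank)
r <- rank(siblings$siblings)

nRural <- sum(siblings$hometown == "rural")
rankSumRural <- sum(r[siblings$hometown == "rural"])

# W = rank sum of the first group minus its smallest possible rank sum
W <- rankSumRural - nRural * (nRural + 1) / 2
W
```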
(b) Test for a significant difference using a permutation test. (Hint: Is checking all possible permutations feasible here?)
siblings$rank<-rank(siblings$siblings)
W2<-sum(siblings$rank[siblings$hometown=="urban"])
W2
## [1] 246.5
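set.seed(1234)
nsims <- 10000
rankSumPerms <- rep(NA, nsims)
for (i in 1:nsims) {
  # Draw 17 of the ranks 1..41 at random as the permuted urban group
  # (this treats the ranks as untied, whereas W2 above is built from midranks)
  rankSumPerms[i] <- sum(sample(1:41, 17, replace = FALSE))
}
# Two-sided Monte Carlo p-value: double the lower-tail proportion
(sum(rankSumPerms <= W2) / nsims) * 2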
## [1] 0.0026
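Checking all possible permutations is not feasible here: there are choose(41, 17), or about 1.5e11, ways to assign 17 of the 41 observations to the urban group. Based on 10,000 random permutations instead, the two-sided p-value is 0.0026, so we again reject the null hypothesis of no difference between rural and urban students.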
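The simulation above permutes the untied ranks 1:41, while W2 is computed from midranks. A variant that permutes the observed midranks keeps the ties consistent between the observed statistic and its reference distribution. This is a sketch of that alternative, not the original solution; the names `nPerms`, `midranks`, `permSums`, and `pPerm` are illustrative, and its p-value will differ slightly from 0.0026.

```r
siblings <- data.frame(
  hometown = c(rep("rural", 24), rep("urban", 17)),
  siblings = c(3,2,1,1,2,1,3,2,2,2,2,5,1,4,1,1,1,1,6,2,2,2,1,1,
               1,0,1,1,0,0,1,1,1,8,1,1,1,0,1,1,2)
)

# Number of distinct ways to choose the 17 urban observations out of 41
nPerms <- choose(41, 17)

# Midranks and the observed urban rank sum
midranks <- rank(siblings$siblings)
W2 <- sum(midranks[siblings$hometown == "urban"])

set.seed(1234)
nsims <- 10000
# Each replicate reassigns 17 of the observed midranks to the urban group
permSums <- replicate(nsims, sum(sample(midranks, 17, replace = FALSE)))

# Two-sided Monte Carlo p-value (doubled lower tail, capped at 1)
pPerm <- min(1, 2 * mean(permSums <= W2))
pPerm
```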
pets<-data.frame(animal=c(rep("cat",6),rep("dog",6)),values=c(100,830,86,89,670,69,8,2,6,9,87,82))
wilcox.test(values ~ animal, data=pets)
##
## Wilcoxon rank sum test
##
## data: values by animal
## W = 33, p-value = 0.01515
## alternative hypothesis: true location shift is not equal to 0
t.test(values ~ animal, data = pets)
##
## Welch Two Sample t-test
##
## data: values by animal
## t = 1.9296, df = 5.1365, p-value = 0.11
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -88.4516 638.4516
## sample estimates:
## mean in group cat mean in group dog
## 307.33333 32.33333
Wilcoxon P-value: 0.01515
T-test P-value: 0.11
trt1 <- c(21.9,20.2,19.4,20.3,19.6,20.4,18.4,20.1,22.0,18.9)
trt2 <- c(20.2,13.8,21.8,19.2,19.6,25.5,17.0,17.6,19.5,22.2)
ansari.test(trt1,trt2)
## Warning in ansari.test.default(trt1, trt2): cannot compute exact p-value
## with ties
##
## Ansari-Bradley test
##
## data: trt1 and trt2
## AB = 64, p-value = 0.1707
## alternative hypothesis: true ratio of scales is not equal to 1
\(H_{0}\): Variance(Trt1) = Variance(Trt2)
\(H_{1}\): Variance(Trt1) \(\neq\) Variance(Trt2)
P-value = 0.1707
Since p > 0.05, we fail to reject the null hypothesis: there is not sufficient evidence of a statistically significant
difference in variability (scale) between the two treatments.
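For context, the sample variances can be inspected alongside the test. This is only a descriptive check, not part of the Ansari-Bradley procedure; the names `v1` and `v2` are illustrative.

```r
trt1 <- c(21.9,20.2,19.4,20.3,19.6,20.4,18.4,20.1,22.0,18.9)
trt2 <- c(20.2,13.8,21.8,19.2,19.6,25.5,17.0,17.6,19.5,22.2)

# Sample variances of the two treatments
v1 <- var(trt1)
v2 <- var(trt2)
c(trt1 = v1, trt2 = v2)
```

Treatment 2 has the larger sample variance, but with 10 observations per group the rank-based test does not find the difference in scale significant at the .05 level.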