Homework #1 RMarkdown

QUESTION 1

Suppose the exam was given in the semester after the course content was revised, and the previous median exam score was 70. We would like to know whether or not the median score has increased.

testScores <- c(79,74,88,80,80,66,65,86,84,80,78,72,71,74,86,96,77,81,76,80,
                76,75,78,87,87,74,85,84,76,77,76,74,85,74,76,77,76,74,81,76)

(a) Summarize the data. Is there any initial evidence that the hypothesis is true?

summary(testScores)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   65.00   74.75   77.00   78.53   81.75   96.00

        The summary above reports a sample median of 77.00, which is 7 points higher than the previous median of 70.00.
        This is initial evidence that the median exam score may have increased after the course revision.

(b) State the null and alternative hypothesis that we wish to test here. Also state the level \(\alpha\) that you will be using for this test.
        \(H_{0}\): After revision of course content, the median exam score did not change (median = 70.00)
        \(H_{1}\): After revision of course content, the median exam score increased (median > 70.00)
        \(\alpha\) = .05

(c) Test the hypothesis by using the binomial test.

binom.test(sum((testScores>70)), length(testScores), .50, alternative="greater")

## 
##  Exact binomial test
## 
## data:  sum((testScores > 70)) and length(testScores)
## number of successes = 38, number of trials = 40, p-value =
## 7.467e-10
## alternative hypothesis: true probability of success is greater than 0.5
## 95 percent confidence interval:
##  0.850848 1.000000
## sample estimates:
## probability of success 
##                   0.95

(d) State your conclusion and p-value.
We reject the null hypothesis that the median exam score remained at 70 in the semester following the course revision (p = 7.467e-10 < .05). The data provide strong evidence that the median exam score increased.
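
The p-value from part (c) can be reproduced directly from the Binomial(40, 0.5) distribution. The following is a minimal sketch using the same testScores vector; the names nAbove, n, and pByHand are introduced here for illustration only.

```r
testScores <- c(79,74,88,80,80,66,65,86,84,80,78,72,71,74,86,96,77,81,76,80,
                76,75,78,87,87,74,85,84,76,77,76,74,85,74,76,77,76,74,81,76)

# Number of scores strictly above the hypothesized median of 70
nAbove <- sum(testScores > 70)
n <- length(testScores)

# Under H0 each score exceeds 70 with probability 0.5,
# so the count above 70 follows a Binomial(n, 0.5) distribution.
# One-sided p-value: P(X >= nAbove)
pByHand <- sum(dbinom(nAbove:n, size = n, prob = 0.5))
pByHand
```

With 38 of 40 scores above 70, this is (780 + 40 + 1) / 2^40, which agrees with the 7.467e-10 reported by binom.test.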

QUESTION 2

A certain data set has eight distinct observations, four from each treatment, and all of the observations from treatment 1 are bigger than the observations from treatment 2. What is the one-sided p-value associated with the permutation test? Show how you arrived at this p-value.

library(gtools)
combinations(8,4)

##       [,1] [,2] [,3] [,4]
##  [1,]    1    2    3    4
##  [2,]    1    2    3    5
##  [3,]    1    2    3    6
##  [4,]    1    2    3    7
##  [5,]    1    2    3    8
##  [6,]    1    2    4    5
##  [7,]    1    2    4    6
##  [8,]    1    2    4    7
##  [9,]    1    2    4    8
## [10,]    1    2    5    6
## [11,]    1    2    5    7
## [12,]    1    2    5    8
## [13,]    1    2    6    7
## [14,]    1    2    6    8
## [15,]    1    2    7    8
## [16,]    1    3    4    5
## [17,]    1    3    4    6
## [18,]    1    3    4    7
## [19,]    1    3    4    8
## [20,]    1    3    5    6
## [21,]    1    3    5    7
## [22,]    1    3    5    8
## [23,]    1    3    6    7
## [24,]    1    3    6    8
## [25,]    1    3    7    8
## [26,]    1    4    5    6
## [27,]    1    4    5    7
## [28,]    1    4    5    8
## [29,]    1    4    6    7
## [30,]    1    4    6    8
## [31,]    1    4    7    8
## [32,]    1    5    6    7
## [33,]    1    5    6    8
## [34,]    1    5    7    8
## [35,]    1    6    7    8
## [36,]    2    3    4    5
## [37,]    2    3    4    6
## [38,]    2    3    4    7
## [39,]    2    3    4    8
## [40,]    2    3    5    6
## [41,]    2    3    5    7
## [42,]    2    3    5    8
## [43,]    2    3    6    7
## [44,]    2    3    6    8
## [45,]    2    3    7    8
## [46,]    2    4    5    6
## [47,]    2    4    5    7
## [48,]    2    4    5    8
## [49,]    2    4    6    7
## [50,]    2    4    6    8
## [51,]    2    4    7    8
## [52,]    2    5    6    7
## [53,]    2    5    6    8
## [54,]    2    5    7    8
## [55,]    2    6    7    8
## [56,]    3    4    5    6
## [57,]    3    4    5    7
## [58,]    3    4    5    8
## [59,]    3    4    6    7
## [60,]    3    4    6    8
## [61,]    3    4    7    8
## [62,]    3    5    6    7
## [63,]    3    5    6    8
## [64,]    3    5    7    8
## [65,]    3    6    7    8
## [66,]    4    5    6    7
## [67,]    4    5    6    8
## [68,]    4    5    7    8
## [69,]    4    6    7    8
## [70,]    5    6    7    8

p.val<-1/70; p.val

## [1] 0.01428571

        One-sided p-value: the probability, under the null hypothesis, of an arrangement at least as extreme as the one we observed.
        There are choose(8,4) = 70 equally likely ways to assign four of the eight observations to treatment 1. Labeling the observations
        1 to 8 from smallest to largest, our observed data correspond to combination 70: treatment 1 takes values [5,6,7,8] and treatment 2
        takes values [1,2,3,4]. In each of combinations 1-69, at least one observation from treatment 2 is greater than an observation
        from treatment 1, so none of them is as extreme as the observed arrangement.
        P(event >= combination [5,6,7,8]) = 1/70 = 0.014
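
The count of 1 out of 70 can also be checked by enumeration. This is a sketch on hypothetical data (the values 1 to 8 stand in for any eight distinct observations), using base R's combn and the sum of the treatment 1 values as the test statistic; all variable names are illustrative.

```r
# Hypothetical data: eight distinct values, with treatment 1 holding the four largest
values <- 1:8
observedTrt1 <- 5:8

# Each column of combn() is one possible set of four values assigned to treatment 1
allSplits <- combn(values, 4)

# Test statistic: sum of the treatment 1 values
# (equivalent to the difference in means, because the group sizes are fixed)
permSums <- colSums(allSplits)
observedSum <- sum(observedTrt1)

# One-sided p-value: share of splits at least as extreme as the observed one
pExact <- mean(permSums >= observedSum)
pExact
```

Only the observed split reaches the maximum sum, so the proportion is 1/70.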

QUESTION 3

Students in the introductory statistics class were asked how many brothers and sisters they have and whether their hometown is urban or rural.

siblings<-data.frame(hometown=c(rep("rural",24),rep("urban",17)),
                     siblings=c(3,2,1,1,2,1,3,2,2,2,2,5,1,4,1,1,1,1,6,2,2,2,1,1,1,0,1,1,0,0,1,1,1,8,1,1,1,0,1,1,2))

(a) Test for a significant difference between rural and urban areas using the Wilcoxon rank-sum test. (Make sure to state your hypotheses, p-value, and conclusion!)

wilcox.test(siblings ~ hometown, data=siblings)

## Warning in wilcox.test.default(x = c(3, 2, 1, 1, 2, 1, 3, 2, 2, 2, 2, 5, :
## cannot compute exact p-value with ties

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  siblings by hometown
## W = 314.5, p-value = 0.001598
## alternative hypothesis: true location shift is not equal to 0

        \(H_{0}\): There is no difference in the median number of siblings between rural and urban students. The median number of siblings is the
        same for both groups.
        \(H_{1}\): There is a difference in the median number of siblings between rural and urban students. The median number of siblings is not
        the same for both groups.
        P-value: 0.001598

Using the Wilcoxon rank-sum test, we reject the null hypothesis (p = 0.001598 < .05): there is sufficient evidence of a
statistically significant difference in the median number of siblings between rural and urban students.

(b) Test for a significant difference using a permutation test. (Hint: Is checking all possible permutations feasible here?)

siblings$rank<-rank(siblings$siblings)
W2<-sum(siblings$rank[siblings$hometown=="urban"])
W2

## [1] 246.5
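
The urban rank sum W2 is tied to the W statistic that wilcox.test reported in part (a). As a sketch, assuming wilcox.test takes the first factor level (rural) as its reference group, W is the rural rank sum minus its smallest possible value, n(n+1)/2. The names ranks, nRural, and wRural are illustrative.

```r
siblings <- data.frame(hometown = c(rep("rural", 24), rep("urban", 17)),
                       siblings = c(3,2,1,1,2,1,3,2,2,2,2,5,1,4,1,1,1,1,6,2,2,2,1,1,
                                    1,0,1,1,0,0,1,1,1,8,1,1,1,0,1,1,2))

# Midranks over the pooled sample (tied values share the average rank)
ranks <- rank(siblings$siblings)
nRural <- sum(siblings$hometown == "rural")

rankSumRural <- sum(ranks[siblings$hometown == "rural"])
rankSumUrban <- sum(ranks[siblings$hometown == "urban"])

# W for the rural group: rank sum minus the minimum possible rank sum
wRural <- rankSumRural - nRural * (nRural + 1) / 2
c(rankSumUrban = rankSumUrban, rankSumRural = rankSumRural, W = wRural)
```

The two rank sums add up to 41 * 42 / 2 = 861, and the rural value of W matches the 314.5 in the wilcox.test output.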

set.seed(1234)
nsims<-10000
rankSumPerms<-rep(NA,nsims)
for (i in 1:nsims){
  rankSumPerms[i]<- sum(sample(1:41,17,replace=FALSE))
}

(sum(rankSumPerms<=W2)/nsims)*2

## [1] 0.0026

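Checking all possible permutations is not feasible here: there are choose(41,17) \(\approx 1.5 \times 10^{11}\) ways to assign 17 of the 41 students to the urban group, so the permutation distribution is approximated with 10,000 random permutations instead. The resulting two-sided p-value is 0.0026 < .05, so we again reject the null hypothesis of no difference between rural and urban students.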
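
The simulation above draws from the integers 1 to 41, which treats the ranks as untied, while the observed W2 is built from midranks. A variant that permutes the observed midranks themselves is sketched below; it is an alternative under that assumption, not the code that produced the 0.0026 above, and the names permSums and pTwoSided are illustrative.

```r
siblings <- data.frame(hometown = c(rep("rural", 24), rep("urban", 17)),
                       siblings = c(3,2,1,1,2,1,3,2,2,2,2,5,1,4,1,1,1,1,6,2,2,2,1,1,
                                    1,0,1,1,0,0,1,1,1,8,1,1,1,0,1,1,2))

ranks <- rank(siblings$siblings)                 # midranks, ties averaged
nUrban <- sum(siblings$hometown == "urban")
observed <- sum(ranks[siblings$hometown == "urban"])

# Number of distinct ways to choose the urban group: far too many to enumerate
nSplits <- choose(length(ranks), nUrban)

set.seed(1234)
nsims <- 10000
# Each replicate assigns nUrban of the observed midranks to the urban group at random
permSums <- replicate(nsims, sum(sample(ranks, nUrban, replace = FALSE)))

# Two-sided Monte Carlo p-value: double the lower tail, capped at 1
pTwoSided <- min(1, 2 * mean(permSums <= observed))
pTwoSided
```

Because the permuted values are the same midranks that make up the observed statistic, the reference distribution accounts for the ties in the data.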

QUESTION 4: Create a fictitious data set where the Wilcoxon rank-sum test and the two-sample t-test lead to different conclusions at the 5% level of significance.

pets<-data.frame(animal=c(rep("cat",6),rep("dog",6)),values=c(100,830,86,89,670,69,8,2,6,9,87,82))
wilcox.test(values ~ animal, data=pets)

## 
##  Wilcoxon rank sum test
## 
## data:  values by animal
## W = 33, p-value = 0.01515
## alternative hypothesis: true location shift is not equal to 0

t.test(values ~ animal, data = pets)

## 
##  Welch Two Sample t-test
## 
## data:  values by animal
## t = 1.9296, df = 5.1365, p-value = 0.11
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -88.4516 638.4516
## sample estimates:
## mean in group cat mean in group dog 
##         307.33333          32.33333

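Wilcoxon p-value: 0.01515 (reject the null hypothesis at the 5% level)
t-test p-value: 0.11 (fail to reject the null hypothesis at the 5% level)

The two tests disagree because the two large cat values (830 and 670) inflate the variance used by the t-test, while the rank-sum test depends only on the ordering of the values.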
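
A short sketch of why the two tests can disagree on this data set: the group standard deviations show how the two large cat values affect the t-test, while the p-values are extracted side by side. The names pWilcox and pT are illustrative.

```r
pets <- data.frame(animal = c(rep("cat", 6), rep("dog", 6)),
                   values = c(100, 830, 86, 89, 670, 69, 8, 2, 6, 9, 87, 82))

# Group means and standard deviations; 830 and 670 dominate the cat group's spread
groupMeans <- tapply(pets$values, pets$animal, mean)
groupSDs <- tapply(pets$values, pets$animal, sd)
rbind(mean = groupMeans, sd = groupSDs)

# The rank-sum test uses only the ordering of the values,
# while the Welch t-test uses the means and variances directly
pWilcox <- wilcox.test(values ~ animal, data = pets)$p.value
pT <- t.test(values ~ animal, data = pets)$p.value
c(wilcoxon = pWilcox, t = pT)
```

Replacing 830 and 670 with even larger values would leave the ranks, and hence the Wilcoxon result, unchanged.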

QUESTION 5:

The simulated data below are from two normal distributions with equal means and possibly unequal variances. Test for differences in the scale parameters using the Ansari-Bradley test. Make sure to state the hypotheses, your conclusion, and your p-value.

trt1 <- c(21.9,20.2,19.4,20.3,19.6,20.4,18.4,20.1,22.0,18.9)
trt2 <- c(20.2,13.8,21.8,19.2,19.6,25.5,17.0,17.6,19.5,22.2)

ansari.test(trt1,trt2)

## Warning in ansari.test.default(trt1, trt2): cannot compute exact p-value
## with ties

## 
##  Ansari-Bradley test
## 
## data:  trt1 and trt2
## AB = 64, p-value = 0.1707
## alternative hypothesis: true ratio of scales is not equal to 1

        \(H_{0}\): Variance(Trt1) \(=\) Variance(Trt2) (the ratio of scales is equal to 1)
        \(H_{1}\): Variance(Trt1) \(\neq\) Variance(Trt2) (the ratio of scales is not equal to 1)
        P-value = 0.1707
        Since p > 0.05, we fail to reject the null hypothesis: there is not sufficient evidence of a statistically significant
        difference in the treatment variances.
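
As a descriptive check alongside the test, the sample standard deviations can be compared directly. In this sketch the `exact = FALSE` argument of ansari.test asks for the normal approximation explicitly, which should avoid the ties warning; the names spread and abResult are illustrative.

```r
trt1 <- c(21.9, 20.2, 19.4, 20.3, 19.6, 20.4, 18.4, 20.1, 22.0, 18.9)
trt2 <- c(20.2, 13.8, 21.8, 19.2, 19.6, 25.5, 17.0, 17.6, 19.5, 22.2)

# Sample standard deviations of the two treatments
spread <- c(sd1 = sd(trt1), sd2 = sd(trt2))
spread

# Ansari-Bradley test using the normal approximation,
# since exact p-values are unavailable when the data contain ties
abResult <- ansari.test(trt1, trt2, exact = FALSE)
abResult$p.value
```

The second sample is more spread out, but with only ten observations per group the test does not detect a significant difference in scale.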