Problem 1

Part A.

Based on the summary statistics below, the 40 simulated exam scores have a median of 77 and a mean of 78.53.

From this, I would argue there is initial evidence against the null hypothesis: the sample median (77) is above 70, which suggests the median score has increased since the course content was revised. Looking a bit further, 38 of the 40 scores are greater than 70.

testScores <- c(79,74,88,80,80,66,65,86,84,80,78,72,71,74,86,96,77,81,76,80,
76,75,78,87,87,74,85,84,76,77,76,74,85,74,76,77,76,74,81,76)

summary(testScores)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   65.00   74.75   77.00   78.53   81.75   96.00

length(testScores)

## [1] 40

# count the scores strictly greater than 70
sum(testScores > 70)

## [1] 38

Part B.

Hypothesis test: let $\theta$ be the change in the median score from 70.

$H_0: \theta = 0$

$H_A: \theta > 0$

We are testing whether the median is greater than 70. Let $p$ denote the probability that an observation is greater than 70; if the median is still 70, then $p = 0.50$.

So, we can rewrite our hypotheses.

$H_0: p = 0.50$

$H_A: p > 0.50$

I will use $\alpha = 0.05$ because Hadley Wickham.

Part C.

binom.test(38, 40, p=0.5, alternative = "greater")

## 
##  Exact binomial test
## 
## data:  38 and 40
## number of successes = 38, number of trials = 40, p-value =
## 7.467e-10
## alternative hypothesis: true probability of success is greater than 0.5
## 95 percent confidence interval:
##  0.850848 1.000000
## sample estimates:
## probability of success 
##                   0.95
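As a check on the output above, the one-sided p-value is the upper tail of a Binomial(40, 0.5) distribution, $P(B \ge 38)$. A minimal sketch of that calculation follows; the variable names `n`, `k`, `p_tail`, and `p_tail_cdf` are introduced here for illustration only.

```r
# Sign-test p-value by hand: under H0, the number of scores above 70
# follows a Binomial(n = 40, p = 0.5) distribution.
n <- 40   # number of scores
k <- 38   # number of scores greater than 70

# P(B >= 38) = P(B = 38) + P(B = 39) + P(B = 40)
p_tail <- sum(dbinom(k:n, size = n, prob = 0.5))

# The same tail via the CDF: P(B > 37)
p_tail_cdf <- pbinom(k - 1, size = n, prob = 0.5, lower.tail = FALSE)

p_tail
```

Both expressions equal $(780 + 40 + 1)/2^{40}$, which is the 7.467e-10 reported by binom.test.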

Part D.

With a p-value of 7.467e-10 < $\alpha = 0.05$, we reject $H_0: p = 0.50$. That is, there is strong evidence that the median score is greater than 70, i.e., that it has increased.

Problem 2.

Our p-value is 1/70, or about 0.01429.

There are $\binom{8}{4} = 70$ ways to choose which 4 of the 8 observations form the first group. Similar to the example in class, of the differences across these 70 equally likely arrangements, exactly 1 is the most extreme. So, we have probability 1/70.

library(gtools)
perms<-combinations(8,4)
dim(perms)

## [1] 70  4

1/70

## [1] 0.01428571
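The same count can be obtained in base R without gtools. The sketch below only counts the arrangements; the names `n_splits`, `splits`, and `p_value` are illustrative, and it assumes (as stated above) that exactly one arrangement is the most extreme.

```r
# Number of ways to choose which 4 of the 8 observations form group 1
n_splits <- choose(8, 4)

# Enumerate the index sets explicitly: a 4 x 70 matrix, one column per split
splits <- combn(8, 4)
dim(splits)

# If exactly one of the equally likely splits is the most extreme,
# the permutation p-value is 1 / (number of splits)
p_value <- 1 / n_splits
p_value
```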

Problem 3

Part A.

Let $\theta$ denote the difference in location (number of siblings) between rural and urban areas.

Then our hypotheses, at $\alpha = 0.05$, are as follows:

$H_0: \theta = 0$

$H_A: \theta \neq 0$

Based on the code below, our p-value is 0.001598. Since 0.001598 < $\alpha = 0.05$, we reject $H_0$. That is, the Wilcoxon rank-sum test gives enough evidence of a difference in the number of siblings between rural and urban areas.

siblings<-data.frame(hometown=c(rep("rural",24),rep("urban",17)),
siblings=c(3,2,1,1,2,1,3,2,2,2,2,5,1,4,1,1,1,1,6,2,2,
2,1,1,1,0,1,1,0,0,1,1,1,8,1,1,1,0,1,1,2))

siblings$hometown2 <- as.numeric(factor(siblings$hometown))

wilcox.test(siblings$siblings~siblings$hometown2)

## Warning in wilcox.test.default(x = c(3, 2, 1, 1, 2, 1, 3, 2, 2, 2, 2, 5, :
## cannot compute exact p-value with ties

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  siblings$siblings by siblings$hometown2
## W = 314.5, p-value = 0.001598
## alternative hypothesis: true location shift is not equal to 0

Part B.

Enumerating all $\binom{41}{17}$ possible permutations is not feasible here. So, we draw a random sample of 10,000 permutations and get a p-value (0.0026) very similar to that of wilcox.test, arriving at the same conclusion.

siblings$rank <- rank(siblings$siblings)
W2<-sum(siblings$rank[siblings$hometown2==2])

W2

## [1] 246.5

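set.seed(1234)
nsims <- 10000
ranksumperm <- rep(NA,nsims)
for (i in 1:nsims) {
  # rank sum of a random group of 17 drawn from the ranks 1..41 (ties are ignored here)
  ranksumperm[i] <-sum(sample(1:41,17, replace = FALSE))
}

# two-sided p-value: double the lower-tail proportion
(sum(ranksumperm<=W2)/nsims)*2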

## [1] 0.0026
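The simulation above draws rank sums from the integers 1 to 41, which treats the data as if there were no ties. A possible variant, sketched below, permutes the observed midranks instead so that the tied values are respected. This is an illustrative alternative rather than the analysis reported above; the names `hometown`, `sibs`, `ranks`, `W_obs`, `perm`, and `p_perm` are introduced here, and no output is shown because it has not been run as part of this report.

```r
# Same data as above, kept as plain vectors so the block is self-contained
hometown <- c(rep("rural", 24), rep("urban", 17))
sibs <- c(3,2,1,1,2,1,3,2,2,2,2,5,1,4,1,1,1,1,6,2,2,
          2,1,1,1,0,1,1,0,0,1,1,1,8,1,1,1,0,1,1,2)

ranks <- rank(sibs)                        # midranks for tied values
W_obs <- sum(ranks[hometown == "urban"])   # observed urban rank sum

set.seed(1234)
nsims <- 10000
# Each replicate: rank sum of 17 midranks drawn without replacement
perm <- replicate(nsims, sum(sample(ranks, 17, replace = FALSE)))

# Two-sided p-value: double the lower-tail proportion, capped at 1
p_perm <- min(1, 2 * mean(perm <= W_obs))
p_perm
```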

Problem 4

The two-sample t-test and the Wilcoxon rank-sum test are run below on the same two groups.

Our hypotheses are almost the same, where $\theta$ denotes the difference in means for the two-sample t-test and the location shift (difference in medians) for the Wilcoxon rank-sum test; in both cases $H_0: \theta = 0$ and $H_A: \theta \neq 0$.

With the two-sample t-test, we obtain a p-value of 0.3315, so we fail to reject $H_0$. This is because we have thrown an outlier (100000) into group two: it pulls that group's mean far to the right, but it also inflates the variance, so the difference in means is small relative to its standard error.

With the Wilcoxon rank-sum test, the p-value is 0.02155 < $\alpha = 0.05$. So, we reject $H_0$ and argue that the true location shift is not equal to 0.

g1 <- c(38.9, 61.2, 73.3, 21.8, 63.4, 64.6, 48.4, 48.8, 48.5)
g2 <- c(100000, 60, 63.4, 76, 89.4, 73.3, 67.3, 61.3, 62.4) 
# Create a data frame
my_data <- data.frame( 
                group = rep(c("G1", "G2"), each = 9),
                tally = c(g1,  g2)
                )

ttest <- t.test(tally ~ group, data = my_data, var.equal = TRUE)
ttest

## 
##  Two Sample t-test
## 
## data:  tally by group
## t = -1.0015, df = 16, p-value = 0.3315
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -34658.69  12417.76
## sample estimates:
## mean in group G1 mean in group G2 
##            52.10         11172.57

wilcox.test(g1,g2)

## Warning in wilcox.test.default(g1, g2): cannot compute exact p-value with
## ties

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  g1 and g2
## W = 14, p-value = 0.02155
## alternative hypothesis: true location shift is not equal to 0
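To illustrate why the rank-based test is unaffected by the size of the outlier, the sketch below replaces 100000 with a hypothetical value of 100, which is still the largest observation. Since the ranks do not change, the Wilcoxon statistic and p-value should be identical, while the t-test result changes. The names `g2_mild`, `w_orig`, `w_mild`, `t_orig`, and `t_mild` are introduced here for illustration.

```r
g1 <- c(38.9, 61.2, 73.3, 21.8, 63.4, 64.6, 48.4, 48.8, 48.5)
g2 <- c(100000, 60, 63.4, 76, 89.4, 73.3, 67.3, 61.3, 62.4)

# Hypothetical milder outlier: still the maximum, so every rank is unchanged
g2_mild <- g2
g2_mild[1] <- 100

# Rank-sum test on both versions (the ties warning is suppressed)
w_orig <- suppressWarnings(wilcox.test(g1, g2))
w_mild <- suppressWarnings(wilcox.test(g1, g2_mild))

# Pooled-variance t-test on both versions
t_orig <- t.test(g1, g2, var.equal = TRUE)
t_mild <- t.test(g1, g2_mild, var.equal = TRUE)

c(w_orig$statistic, w_mild$statistic)   # same W in both cases
c(t_orig$p.value, t_mild$p.value)       # t-test p-values differ
```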

Problem 5

The parameter of interest is $\gamma^2 = \frac{V(X)}{V(Y)}$

$H_0 : \gamma^2 = 1$

$H_1 : \gamma^2 \neq 1$

Below we conduct the Ansari-Bradley test for a difference in the scale parameters.

We obtain a p-value of 0.1707, which is greater than $\alpha = 0.05$. So, we fail to reject $H_0$. That is, there is not enough evidence to conclude that the true ratio of scales differs from 1.

#Treatment 1 data
trt1 <- c(21.9,20.2,19.4,20.3,19.6,20.4,18.4,20.1,22.0,18.9)
#Treatment 2 data
trt2 <- c(20.2,13.8,21.8,19.2,19.6,25.5,17.0,17.6,19.5,22.2)

ansari.test(trt1,trt2)

## Warning in ansari.test.default(trt1, trt2): cannot compute exact p-value
## with ties

## 
##  Ansari-Bradley test
## 
## data:  trt1 and trt2
## AB = 64, p-value = 0.1707
## alternative hypothesis: true ratio of scales is not equal to 1
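For reference, the AB statistic reported above can be reproduced by hand. The sketch below assumes the usual definition (rank the pooled sample, score each observation by its distance from the nearer end, and sum the scores of the first sample); the names `N`, `r`, `scores`, `ab_manual`, and `ab_builtin` are introduced here for illustration.

```r
trt1 <- c(21.9,20.2,19.4,20.3,19.6,20.4,18.4,20.1,22.0,18.9)
trt2 <- c(20.2,13.8,21.8,19.2,19.6,25.5,17.0,17.6,19.5,22.2)

N <- length(trt1) + length(trt2)
r <- rank(c(trt1, trt2))        # pooled ranks, midranks for ties

# Score = distance from the nearer end of the ordered pooled sample,
# so extreme observations get small scores
scores <- pmin(r, N - r + 1)

# AB statistic: sum of the scores belonging to the first sample
ab_manual <- sum(scores[seq_along(trt1)])

# Compare with the statistic reported by ansari.test
ab_builtin <- unname(suppressWarnings(ansari.test(trt1, trt2))$statistic)
c(manual = ab_manual, builtin = ab_builtin)
```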

STAT 488 Non Parametric HW1

Kajal Chokshi

1/21/2019
