(1)

(5 points) Parametric tests are usually more powerful than non-parametric tests. Why don’t we always use parametric tests?

Parametric tests assume the data come from a specific theoretical distribution, such as the normal. When that distributional assumption is not plausible, we should not use a parametric test. For example, we may have a small sample with some outliers drawn from a population we know nothing about. Non-parametric tests “relax” those assumptions, i.e. they do not require a particular underlying distribution.
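
As a quick made-up illustration (these numbers are mine, not from the assignment): with a small sample containing an outlier, the parametric test and its rank-based counterpart can tell different stories.

x = c(1.2, 0.8, 1.5, 0.9, 1.1, 14)  # hypothetical data with one extreme outlier
t.test(x, mu = 1)       # parametric: leans on approximate normality; the outlier inflates the SD
wilcox.test(x, mu = 1)  # non-parametric: only uses the signs/ranks of (x - 1)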

(2)

(5 points) Explain the differences between the Pearson and Spearman correlation.

Pearson correlation is a measure of the “linear” association of two variables: the covariance between the variables divided by the product of their standard deviations. Spearman correlation is the same calculation, but applied to the ranks instead of the actual values. Because Spearman correlation is calculated from ranks, it is not influenced by the size of the differences in the data (outliers), which is why it measures the degree to which the relationship is monotone. A perfect Spearman correlation of \(\pm1\) means the data are strictly increasing or decreasing, but it says nothing about the size of the increments (e.g., data that increase in a linear fashion vs. an exponential fashion can easily have the same Spearman correlation). Note that for both we assume the observations themselves are independent.
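
A small made-up example of that last point: data that grow exponentially are perfectly monotone (Spearman correlation of 1) but not perfectly linear (Pearson below 1).

x = 1:20                        # hypothetical illustration, not assignment data
y = exp(x/4)
cor(x, y)                       # Pearson: below 1, since the trend is not linear
cor(x, y, method = "spearman")  # Spearman: exactly 1, since the trend is perfectly monotone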

(3)

Consider the following plot containing 5 data points.

(Yes, I know these points aren’t exactly the plotted ones, but the ranks will be the same as in the question.)

df.3 = data.frame(x = c(1,2,3,4,5), y = c(0,25,-25, 50,200))
plot(df.3$x,df.3$y)

(A)

(10 points) The Pearson correlation between x and y is 0.7748. Using the provided plot, calculate Kendall’s τ and Spearman’s correlation of the variables x and y.

df.3$x.rank = rank(df.3$x)
df.3$y.rank = rank(df.3$y)

cor(df.3$x,df.3$y) #Pearson approximate
## [1] 0.7602631
cor(df.3$x,df.3$y, method = "spearman") 
## [1] 0.7
cor(df.3$x.rank,df.3$y.rank) #Spearman because it's the same as Pearson with Ranks
## [1] 0.7

Spearman by hand, since I don’t know whether you wanted us to do it manually:

x.rbar = mean(df.3$x.rank) #3
y.rbar = mean(df.3$y.rank) #3

#Spearman rho:
sum((df.3$x.rank-x.rbar)*(df.3$y.rank-y.rbar)) / sqrt(sum((df.3$x.rank-x.rbar)^2)*sum((df.3$y.rank-y.rbar)^2))
## [1] 0.7

Kendall’s \(\tau\) formula and a data frame to calculate it by hand as well

#using my made up data (no ranks):
df.3$Pair.Number = c('A','B','C','D','E')
df.3[c(-3,-4)]
##   x   y Pair.Number
## 1 1   0           A
## 2 2  25           B
## 3 3 -25           C
## 4 4  50           D
## 5 5 200           E

Concordant pairs (8): (A,B) (A,D) (A,E) (B,D) (B,E) (C,D) (C,E) (D,E)
Discordant pairs (2): (A,C) (B,C)
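
With \(N_c = 8\) concordant and \(N_d = 2\) discordant pairs out of \(\binom{5}{2} = 10\) total (and no ties), the formula gives

\[r_\tau = \frac{N_c - N_d}{\binom{n}{2}} = \frac{8-2}{10} = 0.6,\]

which is the same as the \(2\cdot\frac{8}{10} - 1\) form computed below.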

Calculating \(r_\tau\)

2*(8/10) - 1
## [1] 0.6
cor(df.3$x,df.3$y, method = "kendall")
## [1] 0.6

(B)

(5 points) Of Pearson, Spearman, and Kendall’s τ, which measure(s) of correlation are appropriate to use here?

I would use Kendall’s \(\tau\) in this situation. I would avoid Pearson because we have no reason to assume the relationship is linear, and the last point (where \(x\approx5\)) could be an outlier/leverage point, especially with such a small sample size.

Spearman and Kendall’s \(\tau\) both avoid the linearity assumption and would be appropriate in this situation. I would still lean away from Spearman because it can be more influenced by error and discrepancies in the data (see reference).

Finally, Kendall’s \(\tau\) is my top choice because its p-values are more accurate with smaller sample sizes (like the 5 points here). The information for my last two answers comes from: https://www.statisticssolutions.com/kendalls-tau-and-spearmans-rank-correlation-coefficient/

(4)

Consider the following set of paired data:

df.4 = data.frame(before = c(102, 86, 127, 140, 80), after = c(124, 110, 123, 134, 154))
df.4$diff = df.4$before - df.4$after
boxplot(df.4$diff)

(A)

(10 points) Perform a paired comparison permutation test to test the null that the means of the two groups are the same versus the two-sided alternative. State the hypothesis, the p-value, the conclusion, and your assumptions.

\(H_0:\theta_{before} = \theta_{after}\) vs. \(H_a:\theta_{before} \neq \theta_{after}\), where \(\theta\) is the median. I know the question asks for means, but the [non-parametric] permutation test is for the medians, which is what I read in the class notes.

Assuming the data is paired (dependent) and \(\alpha = .05\)

With paired two-sample tests, I typically prefer to write the hypotheses as \(H_0: \mu_{diff} = 0\) vs. \(H_a: \mu_{diff} \neq 0\), where \(\mu_{diff}\) is the mean of the pairwise differences.

D.obs = mean(df.4$diff) #-22
signs = expand.grid(rep(list(c(-1,1)),5))
perms = as.matrix(signs)%*%as.matrix(df.4$diff)/5
sort(perms)
##  [1] -26.0 -24.4 -23.6 -22.0 -17.2 -16.4 -15.6 -14.8 -14.8 -14.0 -13.2
## [12] -12.4  -7.6  -6.0  -5.2  -3.6   3.6   5.2   6.0   7.6  12.4  13.2
## [23]  14.0  14.8  14.8  15.6  16.4  17.2  22.0  23.6  24.4  26.0

We count the number of permuted means that are as extreme as or more extreme than our observed mean difference of -22. Since the test is two-sided, there are 8 (four at or below -22 and four at or above 22).

P-value

8/(2^5)
## [1] 0.25
#Checking my work
library(exactRankTests)
##  Package 'exactRankTests' is no longer under development.
##  Please consider using package 'coin' instead.
perm.test(df.4$before, df.4$after, paired = T)
## 
##  1-sample Permutation Test
## 
## data:  df.4$before and df.4$after
## T = 10, p-value = 0.25
## alternative hypothesis: true mu is not equal to 0

Since our p-value is greater than \(\alpha\), we fail to reject the null and cannot conclude that the medians of the two groups are different.

(B)

(5 points) Assuming you calculated the p-value correctly, is this p-value here exact or an approximation? Why don’t we always calculate the exact p-value?

This is exact because we enumerated all possible (\(2^5\)) sign permutations of the differences. We don’t calculate the exact p-value when the sample is so large that enumerating all \(2^n\) permutations would take too much time to compute.
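
When full enumeration is infeasible, a common alternative is to approximate the permutation p-value by randomly sampling sign assignments instead; here is a minimal sketch (my own addition, reusing the differences from part A):

set.seed(1)  # arbitrary seed for reproducibility
B = 10000
rand.means = replicate(B, mean(sample(c(-1, 1), 5, replace = TRUE) * df.4$diff))
mean(abs(rand.means) >= abs(mean(df.4$diff)))  # approximate two-sided p-value, close to 0.25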

(5)

(10 points) Below you will see 12 data points drawn using a simple random sample from a population. Using a non-parametric test, test the null hypothesis that the median of that population is 50 versus the alternative that the median is greater than 50. State the null and alternative hypothesis, the p-value, the conclusion, and your assumptions.

\(H_0:\theta = 50\) vs. \(H_a:\theta > 50\), where \(\theta\) is the median, and \(\alpha = .05\). Assuming the data are independent and identically distributed (randomly sampled from the same mystery population).

dat.5 = c(124, -1, -114, 142, -77, 124, -28, 101, -99, 18, 26, 212)
boxplot(dat.5)

#Sign Test:
dat.5.shifted = dat.5 - 50
table(sign(dat.5.shifted))
## 
## -1  1 
##  7  5
#P(Binomial(12, 0.5) >= 5): the chance of 5 or more positive signs under the null
pv = 1-pbinom(4,12,.5); pv
## [1] 0.8061523

With a p-value of 0.8062 > \(\alpha\), we fail to reject the null and cannot conclude that the median is greater than 50.
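
As a quick check (not required), the same one-sided sign-test p-value falls out of binom.test:

binom.test(5, 12, p = 0.5, alternative = "greater")  # P(X >= 5 | n = 12, p = 0.5), about 0.806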

(6)

The following question uses the “iris” data set in R.

(A)

(10 points) Using the “iris” data set in R, perform a Kruskal-Wallis oneway analysis of variance using species as the group and petal width as the response. State the null and alternative hypothesis, the p-value, and your conclusion.

\(H_0:\theta_1=\theta_2=\theta_3\) vs. \(H_a:\) at least one \(\theta\) is not the same,

where the \(\theta\)s are the median petal widths of each species, and \(\alpha=.05\).

table(iris$Species)
## 
##     setosa versicolor  virginica 
##         50         50         50
k = kruskal.test(iris$Petal.Width~iris$Species); k
## 
##  Kruskal-Wallis rank sum test
## 
## data:  iris$Petal.Width by iris$Species
## Kruskal-Wallis chi-squared = 131.19, df = 2, p-value < 2.2e-16

Since our p-value of about \(3.26\times10^{-29}\) (reported as < 2.2e-16) is less than \(\alpha\), we reject the null and conclude that at least one median petal width is different. Now we will perform paired comparisons in part B.

(B)

(15 points) If necessary, perform paired comparisons to see if there are any significant pair wise differences between the groups using 1) Bonferroni correction and 2) Tukey’s honest significant different (HSD) and 3) Least significant difference (LSD).

Bonferroni

pairwise.wilcox.test(iris$Petal.Width, iris$Species, p.adj="bonf", paired = TRUE)
## 
##  Pairwise comparisons using Wilcoxon signed rank test 
## 
## data:  iris$Petal.Width and iris$Species 
## 
##            setosa  versicolor
## versicolor 2.2e-09 -         
## virginica  2.2e-09 3.3e-09   
## 
## P value adjustment method: bonferroni

The Bonferroni-adjusted pairwise comparisons show a significant difference between all three pairs of groups, which matches what I saw in my quick glimpse of the histograms of the data.

Tukey HSD

#I won't use the ANOVE method, since it relies on the parametric Anova test, instead I found a Non Parametric version of HSD used below.
#TukeyHSD(aov(Petal.Width ~ Species, iris)) - If parametric, this is what I would have done.

library(MultNonParam)
tukey.kruskal.test(iris$Petal.Width, iris$Species)
##  1-2  1-3  2-3 
## TRUE TRUE TRUE

The Tukey pairwise comparisons match my Bonferroni conclusion above.

LSD

iris$ranks = rank(iris$Sepal.Width)
#Mean ranks of all three species
versicolor = subset(iris, iris$Species == "versicolor")
R1bar = mean(versicolor$ranks); R1bar
## [1] 46.08
n1 = nrow(versicolor)

setosa = subset(iris, iris$Species == "setosa")
R2bar = mean(setosa$ranks); R2bar
## [1] 113.46
n2 = nrow(setosa)

virginica = subset(iris, iris$Species == "virginica")
R3bar = mean(virginica$ranks); R3bar
## [1] 66.96
n3 = nrow(virginica)

#Cutoff
N = nrow(iris)
-qnorm(.05/2)*sqrt((N*(N+1))/12*(1/n1 + 1/n2))
## [1] 17.03027
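
For reference, this cutoff is the large-sample version of the Kruskal-Wallis pairwise-comparison bound that the code above implements:

\[z_{\alpha/2}\sqrt{\frac{N(N+1)}{12}\left(\frac{1}{n_i}+\frac{1}{n_j}\right)} = 1.96\sqrt{\frac{150\cdot151}{12}\left(\frac{1}{50}+\frac{1}{50}\right)} \approx 17.03\]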

Comparing versicolor and setosa

abs(R1bar-R2bar) > -qnorm(.05/2)*sqrt((N*(N+1))/12*(1/n1 + 1/n2))
## [1] TRUE

Comparing versicolor and virginica

abs(R1bar-R3bar) > -qnorm(.05/2)*sqrt((N*(N+1))/12*(1/n1 + 1/n3))
## [1] TRUE

Comparing setosa and virginica

abs(R2bar-R3bar) > -qnorm(.05/2)*sqrt((N*(N+1))/12*(1/n2 + 1/n3))
## [1] TRUE

So all three pairs appear to be significantly different by Fisher’s LSD, which makes sense: since even the very conservative Bonferroni correction found every pair significant, I expected LSD and HSD to show the same pairwise results.

(7)

Below you will see a data set with 9 observations across three treatments and three blocks.

Input = ("
y trt block
1 21.308064 1 1
2  6.800487 2 1
3  2.742283 3 1
4 34.083525 1 2
5 29.941135 2 2
6 11.941155 3 2
7 51.397883 1 3
8 36.453405 2 3
9 20.557651 3 3
")
df.7 = read.table(textConnection(Input),header=TRUE)

(A)

(10 points) First perform a Kruskal-Wallis test to look for significant differences across the treatment groups. Is the test significant at the 0.05 level? Is this test appropriate here?

kruskal.test(df.7$y~df.7$trt)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  df.7$y by df.7$trt
## Kruskal-Wallis chi-squared = 3.8222, df = 2, p-value = 0.1479

This test does not yield a significant result at the 0.05 level; however, it is not appropriate in this setting because much of the variability may be explained by the blocking.
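
As a quick optional look at why the blocking matters here (my own check, not asked for), the block means are quite spread out:

boxplot(y ~ block, data = df.7, xlab = "block", ylab = "y")  # block 3 sits well above block 1
aggregate(y ~ block, data = df.7, FUN = mean)                # block means of roughly 10.3, 25.3, 36.1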

(B)

(10 points) Now perform a Friedman test to look for significant differences between the treatment groups. Is the test significant at the 0.05 levels? Is this test appropriate here?

fr = friedman.test(df.7$y, df.7$trt, df.7$block); fr
## 
##  Friedman rank sum test
## 
## data:  df.7$y, df.7$trt and df.7$block
## Friedman chi-squared = 6, df = 2, p-value = 0.04979

This test is significant at the 0.05 level and is the more appropriate test because of the blocking. However, I am skeptical about the significance since the p-value is only barely below the 0.05 threshold; I would like to do further analysis or collect more data if possible.
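
One possible follow-up, a sketch assuming the coin package (the one recommended by the earlier exactRankTests warning) is installed, is its version of the Friedman test, which should reproduce the same chi-squared statistic:

library(coin)
df.7$trt = factor(df.7$trt)      # coin expects factors for the treatment and block
df.7$block = factor(df.7$block)
friedman_test(y ~ trt | block, data = df.7)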

(C)

(5 points) If we have blocking of the data, when is it ok, theoretically speaking, to ignore the blocking effect in the model?

If a Kruskal-Wallis test already yields a significant result, we can ignore the blocking effect, because accounting for the blocks would only make the test more significant (by separating out the sum of squares attributed to the blocks).