Parametric tests can be more powerful than non-parametric tests because they exploit assumptions about the parameters of the underlying population distribution from which a sample is drawn. When those assumptions hold, or are at least reasonably satisfied, parametric tests should be more powerful than their non-parametric counterparts. Assessing the validity of those assumptions can be difficult, however, and conducting a parametric test when its assumptions are not reasonably satisfied may lead to unsound conclusions because the analysis has been carried out on false premises.
Non-parametric testing (sometimes referred to as distribution-free testing) relaxes the assumptions about the population distribution from which a sample comes. Fewer, less restrictive assumptions make these methods more widely applicable, and non-parametric results typically remain valid in the presence of outliers, influential points, small sample sizes, non-normality, and similar problems. In settings with any of those data maladies, non-parametric alternatives typically outperform their situationally shoddy parametric counterparts. Parametric tests cannot always be used because the underlying assumptions are not always known to be true.
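To make that trade-off concrete, here is a minimal simulation sketch (not part of the original analysis; the heavy-tailed Cauchy setting, sample sizes, and seed are assumptions chosen purely for illustration) comparing the rejection rates of a two-sample t-test and a Wilcoxon rank-sum test when the normality assumption fails.

# Empirical power of the t-test vs. the Wilcoxon rank-sum test on heavy-tailed data
set.seed(1)                                    # arbitrary seed for reproducibility
pvals <- replicate(1000, {
  g1 <- rcauchy(20)                            # heavy-tailed sample, location 0
  g2 <- rcauchy(20, location = 1)              # same shape, shifted location
  c(t = t.test(g1, g2)$p.value,
    w = wilcox.test(g1, g2)$p.value)
})
rowMeans(pvals < 0.05)                         # rejection rate of each test at alpha = 0.05

In this kind of setting the rank-based test tends to reject the false null more often than the t-test, illustrating the point above.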
data("anscombe")
kable(anscombe)
| x1 | x2 | x3 | x4 | y1 | y2 | y3 | y4 |
|---|---|---|---|---|---|---|---|
| 10 | 10 | 10 | 8 | 8.04 | 9.14 | 7.46 | 6.58 |
| 8 | 8 | 8 | 8 | 6.95 | 8.14 | 6.77 | 5.76 |
| 13 | 13 | 13 | 8 | 7.58 | 8.74 | 12.74 | 7.71 |
| 9 | 9 | 9 | 8 | 8.81 | 8.77 | 7.11 | 8.84 |
| 11 | 11 | 11 | 8 | 8.33 | 9.26 | 7.81 | 8.47 |
| 14 | 14 | 14 | 8 | 9.96 | 8.10 | 8.84 | 7.04 |
| 6 | 6 | 6 | 8 | 7.24 | 6.13 | 6.08 | 5.25 |
| 4 | 4 | 4 | 19 | 4.26 | 3.10 | 5.39 | 12.50 |
| 12 | 12 | 12 | 8 | 10.84 | 9.13 | 8.15 | 5.56 |
| 7 | 7 | 7 | 8 | 4.82 | 7.26 | 6.42 | 7.91 |
| 5 | 5 | 5 | 8 | 5.68 | 4.74 | 5.73 | 6.89 |
library(corrplot)     # corrplot.mixed()
library(GGally)       # ggpairs()

# Full correlation matrix; the x-vs-y block sits in rows 1:4, columns 5:8
rr <- cor(anscombe); r <- rr[1:4, 5:8]
corrplot.mixed(rr, lower = "number",
               upper = "square",
               lower.col = "black",
               title = "Pearson Correlation of Anscombe's Quartet")
ggpairs(anscombe,
        columns = c(1, 5, 2, 6, 3, 7, 4, 8),   # pair up x1/y1, x2/y2, x3/y3, x4/y4
        title = "Pearson Correlation of Anscombe's Quartet",
        axisLabels = "show")
Assuming continuous variables, independence between observations, normality, homoskedasticity, and the absence of outliers, the Pearson correlation is a measure of the linear association between two variables (X & Y) taking values in [-1, 1]. It is calculated by dividing the covariance of X and Y by the product of their standard deviations. If the relationship between X & Y is anything other than linear, the Pearson correlation is a vague metric and can be misleading. Anscombe’s quartet is composed of four artificially constructed datasets, each with nearly identical descriptive statistics, including a Pearson correlation of 0.816, which indicates a strong, positive, linear relationship between X & Y. When the data are visualized, however, it is clear that each X/Y pair follows a distinctly different pattern (curvature, an outlier, a single high-leverage point) despite the Pearson correlation coefficients being identical (to 3 decimal places!) for every pair.
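In symbols (added here for reference), the sample Pearson correlation described above is

\[ r_{XY} = \frac{\operatorname{cov}(X, Y)}{s_{X}\, s_{Y}} = \frac{\sum_{i=1}^{n}(x_{i} - \bar{x})(y_{i} - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_{i} - \bar{x})^{2}}\sqrt{\sum_{i=1}^{n}(y_{i} - \bar{y})^{2}}} \]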
When nonlinear relationships are present in the data, the Pearson product-moment correlation is difficult to interpret, and different descriptive measures are called for. Rather than using the raw quantities from the sample, an alternative measure of correlation ranks each variable separately before computing the covariance and standard deviations of X and Y. This measure, known as the Spearman correlation, quantifies the strength & direction of the monotonic relationship between X and Y; that is, Spearman correlation measures the tendency of Y to increase as X increases or to decrease as X decreases, rather than purely measuring the linear relationship between X & Y.
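The description above amounts to computing the Pearson correlation on the ranks; a quick sketch of that equivalence (using the first Anscombe pair purely for illustration):

# Spearman's rho equals the Pearson correlation of the ranked data
with(anscombe, cor(x1, y1, method = "spearman"))
with(anscombe, cor(rank(x1), rank(y1), method = "pearson"))   # same value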
# Approximate y values read off the plot (eye-balled)
# The exact values do not matter for Spearman/Kendall correlation:
# only the ranks are used, and the ranks are assumed to be correct
# despite the uncertainty in the true y values
df <- data.frame(x = c(1, 2, 3, 4, 5),
                 y = c(0, 25, -20, 50, 200))
df$rx <- rank(df$x); df$ry <- rank(df$y)
kable(df)
| x | y | rx | ry |
|---|---|---|---|
| 1 | 0 | 1 | 2 |
| 2 | 25 | 2 | 3 |
| 3 | -20 | 3 | 1 |
| 4 | 50 | 4 | 4 |
| 5 | 200 | 5 | 5 |
# Spearman Correlation Coefficient
cor(df$x, df$y, method = "spearman")
## [1] 0.7
# Kendall Correlation Coefficient
cor(df$x, df$y, method = "kendall")
## [1] 0.6
# Spearman Correlation Coefficient
cor.test(df$x, df$y, method = "spearman")
##
## Spearman's rank correlation rho
##
## data: df$x and df$y
## S = 6, p-value = 0.2333
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.7
# Kendall Correlation Coefficient
cor.test(df$x, df$y, method = "kendall")
##
## Kendall's rank correlation tau
##
## data: df$x and df$y
## T = 8, p-value = 0.2333
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
## tau
## 0.6
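As a check on the output above, both coefficients can be reproduced by hand from the ranks: the S = 6 reported by cor.test is exactly \(\sum_{i} d_{i}^{2}\), the sum of squared rank differences, and T = 8 is the number of concordant pairs.

# Spearman's rho from the squared rank differences: rho = 1 - 6*sum(d^2) / (n*(n^2 - 1))
d <- df$rx - df$ry
1 - 6 * sum(d^2) / (nrow(df) * (nrow(df)^2 - 1))    # 0.7, with sum(d^2) = 6 (the S above)

# Kendall's tau from concordant (C) and discordant (D) pairs: tau = (C - D) / choose(n, 2)
idx <- combn(nrow(df), 2)                            # all pairs of observations i < j
s   <- sign(df$x[idx[2, ]] - df$x[idx[1, ]]) * sign(df$y[idx[2, ]] - df$y[idx[1, ]])
(sum(s > 0) - sum(s < 0)) / choose(nrow(df), 2)      # (8 - 2) / 10 = 0.6, with C = 8 (the T above)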
Spearman’s \(\rho\) & Kendall’s \(\tau\) are more appropriate to use here since the plot appears to show a non-linear relationship with a small sample size; the Pearson product-moment correlation may be misleading when a descriptive statistic for linearity is applied to a seemingly non-linear situation.
df <- data.frame(before = c(102, 86, 127, 140, 80),
                 after  = c(124, 110, 123, 134, 154))
df$diff <- df$before - df$after
kable(df)
| before | after | diff |
|---|---|---|
| 102 | 124 | -22 |
| 86 | 110 | -24 |
| 127 | 123 | 4 |
| 140 | 134 | 6 |
| 80 | 154 | -74 |
\(H_{0}: \mu_{b} = \mu_{a} ; \mu_{d} = 0\)
\(H_{A}: \mu_{b} \neq \mu_{a}; \mu_{d} \neq 0\)
Assumptions: The only assumption is that, under the null hypothesis, the distribution of “before” is the same as the distribution of “after”.
n <- 5; xbar <- mean(df$diff)                  # observed mean difference (-22)
# All 2^5 possible sign assignments for the five paired differences
signs <- expand.grid(rep(list(c(-1, 1)), 5))
# Mean difference under each sign assignment (the exact permutation distribution)
perms <- as.matrix(signs) %*% as.matrix(df$diff) / n
perms <- sort(perms)
# p-value: proportion of permutation means at least as small as the observed mean
1 - sum(perms > xbar)/length(perms)
## [1] 0.125
# p-value, different but the same!
2*(2/(2^5))
## [1] 0.125
P-value: 0.125
Conclusion: With a relatively high p-value, greater than the generally accepted but arbitrary cutoff of \(\alpha = 0.05\), we fail to reject the null hypothesis; there is not statistically significant evidence that the means before and after the event differ.
The p-value found in the paired-comparison permutation test (0.125) was calculated exactly using every possible sign assignment; this cannot always be done, because enumerating every permutation becomes too computationally intensive as the sample size increases.
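In those cases a common workaround, sketched below (not part of the original analysis; the resample count and seed are arbitrary), is to approximate the permutation distribution by drawing random sign assignments rather than enumerating all \(2^{n}\) of them.

# Monte Carlo approximation of the sign-flipping permutation distribution
set.seed(42)                                   # arbitrary seed for reproducibility
B <- 10000                                     # number of random sign assignments
perm_means <- replicate(B, mean(sample(c(-1, 1), n, replace = TRUE) * df$diff))
# Proportion of resampled means at least as small as the observed mean,
# mirroring the exact calculation above (should land close to 0.125)
mean(perm_means <= xbar)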
x <- c(124,-1,-114,142,-77,124,-28,101,-99,18,26,212)
Binomial Exact Test
\(H_{0}: \theta = 50; p = 0.50\)
\(H_{A}: \theta > 50; p > 0.50\)
Assumptions: Under the null we assume observations are mutually independent & the probability of an observation being greater than 50 is the same for all observations.
# Number of observations above (xu) and at or below (xl) the hypothesized median of 50
xu <- length(x[x > 50])
xl <- length(x[x <= 50])
bin_test <- binom.test(x = xu,
                       n = length(x),
                       p = 0.5,
                       alternative = "greater")
Conclusion: With a p-value of 0.8061523, we fail to reject the null hypothesis; there is not statistically significant evidence that the median is greater than 50.
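As a quick check (a sketch, not part of the original analysis), the same one-sided p-value falls straight out of the binomial distribution: it is the probability of seeing at least xu of the 12 observations above 50 when each exceeds 50 with probability 0.5.

# Upper-tail binomial probability P(X >= xu) with X ~ Binomial(12, 0.5)
pbinom(xu - 1, size = length(x), prob = 0.5, lower.tail = FALSE)   # matches binom.test above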
\(H_{0}:\) The population distribution functions for each of the three iris species are identical.
\(H_{A}:\) At least one of the populations tends to yield larger observations than at least one of the other populations.
t <- kruskal.test(Petal.Width ~ Species, data = iris); t
##
## Kruskal-Wallis rank sum test
##
## data: Petal.Width by Species
## Kruskal-Wallis chi-squared = 131.19, df = 2, p-value < 2.2e-16
P-Value: \(3.2617956 \times 10^{-29}\)
Conclusion: With a p-value of \(3.2617956 \times 10^{-29}\), we reject the null hypothesis with statistically significant evidence in the direction of the alternative. That is, the results of the Kruskal-Wallis test indicate that at least one iris species tends to yield wider petals than at least one of the other species.
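For intuition about where that statistic comes from, the sketch below recomputes the Kruskal-Wallis statistic \(H = \frac{12}{N(N+1)}\sum_{j}\frac{R_{j}^{2}}{n_{j}} - 3(N+1)\) directly from the pooled ranks; kruskal.test additionally applies a correction for the many tied petal widths, so its reported value is slightly larger than this uncorrected one.

# Uncorrected Kruskal-Wallis statistic from pooled ranks
rk <- rank(iris$Petal.Width)              # pooled ranks across all 150 flowers (ties get average ranks)
N  <- length(rk)
Rj <- tapply(rk, iris$Species, sum)       # rank sum per species
nj <- tapply(rk, iris$Species, length)    # 50 flowers per species
12 / (N * (N + 1)) * sum(Rj^2 / nj) - 3 * (N + 1)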
Having rejected the null hypothesis in part a, it seems necessary to investigate the pairwise differences between iris species further.
pairwise.wilcox.test(x = iris$Petal.Width,
                     g = iris$Species,
                     p.adjust.method = "bonferroni",
                     paired = TRUE)
##
## Pairwise comparisons using Wilcoxon signed rank test
##
## data: iris$Petal.Width and iris$Species
##
## setosa versicolor
## versicolor 2.2e-09 -
## virginica 2.2e-09 3.3e-09
##
## P value adjustment method: bonferroni
# Parametric comparison for contrast: one-way ANOVA followed by Tukey's HSD
fit <- lm(Petal.Width ~ Species, data = iris)
av <- aov(fit)
tukey <- TukeyHSD(x = av, conf.level = 0.95)
plot(tukey)
tukey
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = fit)
##
## $Species
## diff lwr upr p adj
## versicolor-setosa 1.08 0.9830903 1.1769097 0
## virginica-setosa 1.78 1.6830903 1.8769097 0
## virginica-versicolor 0.70 0.6030903 0.7969097 0
library(agricolae)    # LSD.test()
LSD.test(y = av,
         trt = "Species",
         console = TRUE,
         group = TRUE,
         main = "Fisher's LSD")
##
## Study: Fisher's LSD
##
## LSD t Test for Petal.Width
##
## Mean Square Error: 0.04188163
##
## Species, means and individual ( 95 %) CI
##
## Petal.Width std r LCL UCL Min Max
## setosa 0.246 0.1053856 50 0.1888041 0.3031959 0.1 0.6
## versicolor 1.326 0.1977527 50 1.2688041 1.3831959 1.0 1.8
## virginica 2.026 0.2746501 50 1.9688041 2.0831959 1.4 2.5
##
## Alpha: 0.05 ; DF Error: 147
## Critical Value of t: 1.976233
##
## least Significant Difference: 0.08088724
##
## Treatments with the same letter are not significantly different.
##
## Petal.Width groups
## virginica 2.026 a
## versicolor 1.326 b
## setosa 0.246 c
Analysis of the pairwise differences in petal width by iris species using Bonferroni-corrected p-values, Tukey’s HSD, & Fisher’s LSD yielded similar results in each instance: each species appears to have a significantly different petal width from the other two.
df <- data.frame(Y = c(21.308064, 6.800487, 2.742283,
                       34.083525, 29.941135, 11.941155,
                       51.397883, 36.453405, 20.557651),
                 trt = rep(c(1, 2, 3), times = 3),
                 blk = c(1, 1, 1, 2, 2, 2, 3, 3, 3))
kable(df)
| Y | trt | blk |
|---|---|---|
| 21.308064 | 1 | 1 |
| 6.800487 | 2 | 1 |
| 2.742283 | 3 | 1 |
| 34.083525 | 1 | 2 |
| 29.941135 | 2 | 2 |
| 11.941155 | 3 | 2 |
| 51.397883 | 1 | 3 |
| 36.453405 | 2 | 3 |
| 20.557651 | 3 | 3 |
\(H_{0}:\) The population distribution functions for each of the three groups are identical.
\(H_{A}:\) At least one of the populations tends to yield larger observations than at least one of the other populations.
boxplot(Y ~ trt, data = df, horizontal = TRUE, border = "red")
t <- kruskal.test(Y ~ trt, data = df); t
##
## Kruskal-Wallis rank sum test
##
## data: Y by trt
## Kruskal-Wallis chi-squared = 3.8222, df = 2, p-value = 0.1479
P-Value: 0.1479159
Conclusion: With a p-value of 0.1479159, we fail to reject the null hypothesis, lacking statistically significant evidence in the direction of the alternative. That is, the Kruskal-Wallis test provides no evidence that the population distributions of the groups differ; this test, however, is not appropriate for settings in which more than one factor is involved (it ignores the blocking), so its result may be misleading here.
f <- friedman.test(y = df$Y, groups = df$trt, blocks = df$blk); f
##
## Friedman rank sum test
##
## data: df$Y, df$trt and df$blk
## Friedman chi-squared = 6, df = 2, p-value = 0.04979
Conclusion: With a p-value of 0.0497871, just slightly less than \(\alpha = 0.05\), we narrowly reject the null hypothesis with statistically significant evidence in the direction of the alternative hypothesis; that is, when accounting for the treatments and blocking with the Friedman test, we conclude that the population distribution for at least one of the treatment groups differs from the others.
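As a sanity check on that statistic, the Friedman chi-squared value can be reproduced directly from the within-block ranks: in every block, treatment 1 ranks highest and treatment 3 lowest, giving rank sums of 9, 6, and 3.

# Reproducing the Friedman chi-squared statistic from within-block ranks
b <- 3; k <- 3                                          # number of blocks and treatments
rk <- ave(df$Y, df$blk, FUN = rank)                     # rank the treatments within each block
Rj <- tapply(rk, df$trt, sum)                           # rank sums per treatment: 9, 6, 3
12 / (b * k * (k + 1)) * sum(Rj^2) - 3 * b * (k + 1)    # 6, matching the output above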
Theoretically, since blocking is used when treatment effects are thought to depend in some way on the experimental unit to which they are applied (i.e., when nuisance factors appear to be present), it would be acceptable to ignore the blocking effect in the model when nuisance factors are known not to be at play, or when their effect is deemed negligible.