The data in the table are dry matter content (in kilograms) of hay obtained from experimental plots. The experiment was designed as a randomized complete block, and the treatments are three cutting dates.
dat<-data.frame(kg=c(1.5,2.1,1.9,2.8,1.4,1.8,
1.8,2.0,2.0,2.7,1.6,2.3,
1.9,2.5,2.5,2.6,2.1,2.4),
date=factor(rep(c("Sept 1","Sept 15","Sept30"),each=6)),
block=factor(rep(1:6,3)))
dat = dat[order(dat$block),]
Test for differences among cutting dates using a permutation test for a randomized complete block design.
#observed F
lm1 = lm(kg ~ date + block, data = dat)
Fobs = anova(lm1)$"F value"[1]
#permutation test
set.seed(1234)
nsim = 1000
F.Vec = rep(NA, nsim)
for (i in 1:nsim){
dat.perm = dat
dat.perm$kg = unlist (tapply(dat.perm$kg, dat.perm$block, function(x){sample(x, length(x), replace = FALSE)} ))
lm.out = lm(kg ~ date + block, data = dat.perm)
F.Vec[i] = anova(lm.out)$"F value"[1]
}
#Count the number of F from the Perm set > F obs
p = sum(F.Vec > Fobs)/nsim
With a p-value of 0.03 < \(\alpha = .05\) we reject the null, and conclude there is a difference among cutting dates.
Apply Friedman’s test to the data from the previous problem.
fr = friedman.test(dat$kg, dat$date, dat$block); fr
##
## Friedman rank sum test
##
## data: dat$kg, dat$date and dat$block
## Friedman chi-squared = 4.3333, df = 2, p-value = 0.1146
With Friedman’s test, a p-value of 0.1145588 > 0.05 we fail to reject the null, and cannot conclude there is a difference among cutting dates.
Analyze using ANOVA for a randomized complete block and compare with the previous answers.
a = anova(lm1);a
## Analysis of Variance Table
##
## Response: kg
## Df Sum Sq Mean Sq F value Pr(>F)
## date 2 0.53444 0.26722 6.9308 0.012927 *
## block 5 2.00944 0.40189 10.4236 0.001022 **
## Residuals 10 0.38556 0.03856
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The parametric Anova F test yields a p-value of 0.0129269 < \(\alpha = 0.05\), so we would reject the null hypothesis and conclude there is a difference in among cutting dates. We would need to do further analyses to see which dates are different.
Perform multiple comparisons to test for significant differences between paris of groups using Tukey’s HSD procedure.
aov = aov(lm1)
TukeyHSD(aov)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = lm1)
##
## $date
## diff lwr upr p adj
## Sept 15-Sept 1 0.1500000 -0.16076969 0.4607697 0.4149464
## Sept30-Sept 1 0.4166667 0.10589698 0.7274364 0.0108584
## Sept30-Sept 15 0.2666667 -0.04410302 0.5774364 0.0938611
##
## $block
## diff lwr upr p adj
## 2-1 0.46666667 -0.090188952 1.02352229 0.1167609
## 3-1 0.40000000 -0.156855619 0.95685562 0.2126457
## 4-1 0.96666667 0.409811048 1.52352229 0.0012854
## 5-1 -0.03333333 -0.590188952 0.52352229 0.9999268
## 6-1 0.43333333 -0.123522286 0.99018895 0.1583077
## 3-2 -0.06666667 -0.623522286 0.49018895 0.9978919
## 4-2 0.50000000 -0.056855619 1.05685562 0.0855800
## 5-2 -0.50000000 -1.056855619 0.05685562 0.0855800
## 6-2 -0.03333333 -0.590188952 0.52352229 0.9999268
## 4-3 0.56666667 0.009811048 1.12352229 0.0455544
## 5-3 -0.43333333 -0.990188952 0.12352229 0.1583077
## 6-3 0.03333333 -0.523522286 0.59018895 0.9999268
## 5-4 -1.00000000 -1.556855619 -0.44314438 0.0009844
## 6-4 -0.53333333 -1.090188952 0.02352229 0.0624913
## 6-5 0.46666667 -0.090188952 1.02352229 0.1167609
From the Tukey HSD we see that the only significantly different dates are Sept 30 and Sept 1.
From book: Page 301, Problem 1.
dat2 = data.frame(subject = rep(c(1:4),3), resistance = c(-0.08, .21,.5,.14,.01,.17,-.11,.07, 0.06,.19,.34,.14), ozone = c(rep("After .1ppm",4), rep("After .6ppm",4), rep("After 1.0ppm",4)))
f2 = friedman.test(dat2$resistance, dat2$ozone, dat2$subject)
With a p-value of 0.1652989 > 0.05, we fail to reject the null and cannot conclude that there is a difference in airway resistance (resoitory function) for the different ozone exposures.
From book: Page 301, Problem 3.
Technically you can apply Friedman’s test by choosing any blocking scheme, the blocks may or may not be significantly different. However, I do not think you should because with blocking (Friedman’s test) we are separating out the variablity of the data with the blocks, but if the blocks aren’t meaningful then there is no reason to do that.
(GRAD ONLY) From book: Page 302, Problem 7.
p=3; n=3
Rsum = rep(NA,6^3)
library(gtools)
perm = permutations(3,3)
counter = 1
for (i in 1:6)
{
for (j in 1:6)
{
for (k in 1:6)
{
counter = counter + 1
matrix = matrix(data = c(perm[i,1],perm[i,2],perm[i,3],perm[j,1],perm[j,2],perm[j,3],perm[k,1],perm[k,2],perm[k,3]), nrow = 3, ncol = 3)
Rsum[counter] = sum(apply(matrix, 1, sum)^2)
}
}
}
S = 12/(n*p*(p+1)) *(Rsum) - 3*n*(p+1)
hist(S)
This was hard!
(GRAD ONLY) Generate the permutation distribution for Spearman’s rs and Kendall’s τ based on the following data:
height<-c(68,70,74)
weight<-c(145,155,160)
cor.test(height, weight, method = "spearman")
##
## Spearman's rank correlation rho
##
## data: height and weight
## S = 0, p-value = 0.3333
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 1
#Use same perms from #4
perms = permutations(3,3)
d = dim(perms)[1]
cor.null = rep(NA, d)
ken.null = rep(NA, d)
for (i in 1:d) {
cor.null[i] = cor(height, weight[perms[i,]], method = "spearman")
ken.null[i] = cor(height, weight[perms[i,]], method = "kendall")
}
hist(cor.null)
hist(ken.null)
The data below are ages (in days) of concrete cylinders and the compressive strengths of the cylinders.
age<-c(3,7,15,24,85,180,360)
strength<-c(2500,3200,4300,5300,5900,6700,6900)
Plot the data to show a non-linear relationship. Compute Pearson’s correlation, Spearman’s correlation, and Kendall’s τ.
plot(strength ~ age)
cor(strength, age)
## [1] 0.7858418
cor(strength, age, method = "spearman")
## [1] 1
cor(strength, age, method = "kendall")
## [1] 1
(UNDERGRAD ONLY) Test for significant association using the Spearman correlation from the previous part.
(GRAD ONLY) Test for significant association using each of the measures from the previous part.
cor.test(strength, age)
##
## Pearson's product-moment correlation
##
## data: strength and age
## t = 2.8414, df = 5, p-value = 0.03619
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.08030986 0.96677652
## sample estimates:
## cor
## 0.7858418
\[H_0:\rho = 0\] \[H_0:\rho \neq{0}\]
If we were testing significance of the Pearson correlation coefficient we would reject the null and conclude that there is a linear association between these two variables, however this is NOT an appropriate test because the data is not linear.
spear = cor.test(strength, age, method = "spearman"); spear
##
## Spearman's rank correlation rho
##
## data: strength and age
## S = 1.2434e-14, p-value = 0.0003968
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 1
\[H_0:\rho = 0\] \[H_0:\rho \neq{0}\]
Since the data is monotonic, Spearman correlation is an appropriate test. With a p-value of 3.96825410^{-4} < \(\alpha = .01\), we reject the null and conclude there is a mononic association between weight and age of the compressive strengths of concrete cylinders (assuming the observations are from different cylinders; uncorrelated).
kend = cor.test(strength, age, method = "kendall"); kend
##
## Kendall's rank correlation tau
##
## data: strength and age
## T = 21, p-value = 0.0003968
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
## tau
## 1
With a p-value of 3.96825410^{-4} < \(\alpha = .01\), we reject the null and conclude there is some association between weight and age of the compressive strengths of concrete cylinders.