## Question 1
# The data in the table are dry matter content (in kilograms) of hay obtained from experimental plots. The experiment was designed as a randomized complete block, and the treatments are three cutting dates.
# (a) Test for differences among cutting dates using a permutation test for a randomized complete block design.
# (b) Apply Friedman’s test to the data from the previous problem.
# (c) Analyze using ANOVA for a randomized complete block and compare with the previous answers.
# (d) Perform multiple comparisons to test for significant differences between pairs of groups using Tukey’s HSD procedure.

# Part A
dat1 <- data.frame(kg=c(1.5,2.1,1.9,2.8,1.4,1.8,
                        1.8,2.0,2.0,2.7,1.6,2.3,
                        1.9,2.5,2.5,2.6,2.1,2.4),
                   date=factor(rep(c("Sept 1","Sept 15","Sept 30"),each=6)),
                   block=factor(rep(1:6,3)))
dat1 <- dat1[order(dat1$block),]
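
# Quick sanity check: each cutting date should appear exactly once in every
# block, as required for a randomized complete block design.
table(dat1$date, dat1$block)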

# Observed F statistic for the cutting-date effect from the RCB ANOVA
Fstat.1 <- anova(lm(kg~date+block,data=dat1))$'F value'[1]

perm1 <- dat1
# Demonstrate shuffling the observations within each block (returns a list,
# one shuffled vector per block)
tapply(perm1$kg, perm1$block, function(x) {sample(x, length(x), replace=FALSE)} )
## $`1`
## [1] 1.8 1.9 1.5
## 
## $`2`
## [1] 2.0 2.5 2.1
## 
## $`3`
## [1] 1.9 2.0 2.5
## 
## $`4`
## [1] 2.8 2.7 2.6
## 
## $`5`
## [1] 2.1 1.6 1.4
## 
## $`6`
## [1] 1.8 2.3 2.4
# unlist() flattens the shuffled values back into a single vector in block order
unlist(tapply(perm1$kg, perm1$block, function(x) {sample(x, length(x), replace=FALSE)} ))
##  11  12  13  21  22  23  31  32  33  41  42  43  51  52  53  61  62  63 
## 1.8 1.9 1.5 2.0 2.5 2.1 2.0 2.5 1.9 2.7 2.8 2.6 2.1 1.4 1.6 2.3 2.4 1.8
set.seed(1234)
nsim<-1000
fVec<-rep(NA,nsim)
for (i in 1:nsim){
  # Shuffle within blocks; perm1 is sorted by block, so the unlisted vector
  # lines up with perm1's rows
  perm1$kg <- unlist(tapply(perm1$kg, perm1$block, function(x) {sample(x, length(x), replace=FALSE)} ))
  fVec[i] <- anova(lm(kg~date+block, data=perm1))$'F value'[1]
}

# Permutation p-value: proportion of permuted F statistics at least as
# large as the observed F
sum(fVec >= Fstat.1)/nsim
## [1] 0.029
# p-value: 0.029
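
# A common variant counts the observed statistic as one of the permutations;
# with nsim = 1000 the two conventions differ negligibly.
(sum(fVec >= Fstat.1) + 1)/(nsim + 1)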


# Part B
friedman.test(dat1$kg, dat1$date, dat1$block)
## 
##  Friedman rank sum test
## 
## data:  dat1$kg, dat1$date and dat1$block
## Friedman chi-squared = 4.3333, df = 2, p-value = 0.1146
# Friedman chi-squared = 4.3333, df = 2, p-value = 0.1146
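
# Equivalent call using the formula interface (response ~ treatment | block):
friedman.test(kg ~ date | block, data = dat1)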

# Part C
Fstat.1
## [1] 6.930836
Pval.1 <- anova(lm(kg~date+block,data=dat1))$'Pr(>F)'[1]
# ANOVA p-value: 0.0129
# The ANOVA p-value from Part C and the permutation-test p-value from Part A both suggest rejecting the null hypothesis of no difference among cutting dates (p = 0.013 and p = 0.029, respectively). In contrast, Friedman's rank sum test fails to reject (p = 0.115), reflecting the information lost when the observations are reduced to within-block ranks.

# Part D

TukeyHSD(aov(kg~date+block,data=dat1))
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = kg ~ date + block, data = dat1)
## 
## $date
##                      diff         lwr       upr     p adj
## Sept 15-Sept 1  0.1500000 -0.16076969 0.4607697 0.4149464
## Sept 30-Sept 1  0.4166667  0.10589698 0.7274364 0.0108584
## Sept 30-Sept 15 0.2666667 -0.04410302 0.5774364 0.0938611
## 
## $block
##            diff          lwr         upr     p adj
## 2-1  0.46666667 -0.090188952  1.02352229 0.1167609
## 3-1  0.40000000 -0.156855619  0.95685562 0.2126457
## 4-1  0.96666667  0.409811048  1.52352229 0.0012854
## 5-1 -0.03333333 -0.590188952  0.52352229 0.9999268
## 6-1  0.43333333 -0.123522286  0.99018895 0.1583077
## 3-2 -0.06666667 -0.623522286  0.49018895 0.9978919
## 4-2  0.50000000 -0.056855619  1.05685562 0.0855800
## 5-2 -0.50000000 -1.056855619  0.05685562 0.0855800
## 6-2 -0.03333333 -0.590188952  0.52352229 0.9999268
## 4-3  0.56666667  0.009811048  1.12352229 0.0455544
## 5-3 -0.43333333 -0.990188952  0.12352229 0.1583077
## 6-3  0.03333333 -0.523522286  0.59018895 0.9999268
## 5-4 -1.00000000 -1.556855619 -0.44314438 0.0009844
## 6-4 -0.53333333 -1.090188952  0.02352229 0.0624913
## 6-5  0.46666667 -0.090188952  1.02352229 0.1167609
# Using Tukey's HSD, we find a statistically significant difference between the September 30 and September 1 cutting dates (p = 0.0109); the other two pairwise date comparisons are not significant at the 0.05 level.
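
# Optional visual summary: the plot method for TukeyHSD objects draws the
# simultaneous confidence intervals; intervals that cross zero correspond
# to non-significant comparisons.
plot(TukeyHSD(aov(kg ~ date + block, data = dat1)), las = 1)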



## Question 2
# Pg. 301, Ex. 1: Use procedure 7.6 to test the null hypothesis.
dat2 <- data.frame(ppm=c(-.08, .01, .06, .21, .17, .19, .50, -.11, .34, .14, .07, .14), 
                   subject=factor(rep(c(1:4), each=3)),
                   expose=factor(rep(c(".1 ppm", ".6 ppm", "1.0 ppm"), 4)))

dat2 <- dat2[order(dat2$subject),]
# Within-subject ranks, with ties given the average rank; equivalent to
# ave(dat2$ppm, dat2$subject, FUN = rank)
dat2$rank <- c(1, 2, 3, 3, 1, 2, 3, 1, 2, 2.5, 1, 2.5)
dat2 <- dat2[order(dat2$expose),]


# Procedure 7.6: Friedman's statistic adjusted for ties (Hollander & Wolfe),
#   S' = 12 * sum_j (R_j - n(k+1)/2)^2 /
#        [ n*k*(k+1) - (1/(k-1)) * sum_i (sum_j t_ij^3 - k) ]
n <- 4; k <- 3
val <- n*(k+1)/2              # expected treatment rank sum under H0
Tij <- ((2^3) + 1^3) - k      # tie term: subject 4 has one tie group of size 2, so sum(t^3) - k = 9 - 3 = 6
# Squared deviations of the three treatment rank sums from val
R1 <- ((sum(dat2[1:4,4])) - val)^2; R2 <- ((sum(dat2[5:8,4])) - val)^2; R3 <- ((sum(dat2[9:12,4])) - val)^2

numerator <- 12*(R1+R2+R3); denominator <- n*k*(k+1) - (1/(k-1))*Tij

s.prime <- numerator/denominator
# S' = 3.6
pval.2 <- 1-pchisq(s.prime, 2)
# approximate chi-square p-value: 0.1653
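
# Cross-check: R's built-in friedman.test applies the same tie correction,
# so it should reproduce the statistic and approximate p-value above.
friedman.test(dat2$ppm, dat2$expose, dat2$subject)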

library(NSM3)
## Loading required package: combinat
## 
## Attaching package: 'combinat'
## The following object is masked from 'package:utils':
## 
##     combn
## Loading required package: MASS
## Loading required package: partitions
## Loading required package: survival
## fANCOVA 0.5-1 loaded
# cFrd(alpha, k, n) returns the exact critical value S_alpha; note its first
# argument is a significance level, and here the chi-square p-value (0.165)
# is being used as that level.
cFrd(pval.2, k, n)$'cutoff.U'[1]
## [1] 4.5
# Since S' = 3.6 < 4.5, we fail to reject the null hypothesis even at
# alpha = 0.165; a smaller alpha such as 0.05 only raises the cutoff, so the
# conclusion is the same at conventional levels.

## Question 3
# Pg. 301, Ex. 3: Could Friedman’s test be applied to data from a one-way layout in which there are the same number, n, of observations from each of the k treatments? Explain. Should Friedman’s test be applied to such data? Explain.

# Technically, yes: Friedman's test could be applied to a one-way layout by arbitrarily grouping the observations into "blocks" containing one observation from each of the k treatments, and the test would run. But it should not be applied this way: the blocking would be an artifact of the analysis rather than of the design, so the within-block rankings would compare unrelated observations and discard legitimate information. For a true one-way layout with n observations per treatment, a rank test built for that design, such as the Kruskal-Wallis test, is the appropriate choice; see the sketch below.
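
# A small illustration with made-up data (values are purely hypothetical):
# forcing a one-way sample into arbitrary "blocks" lets friedman.test run,
# but kruskal.test is the test actually matched to the design.
set.seed(99)
oneway <- data.frame(y = rnorm(12),
                     trt = factor(rep(1:3, each = 4)),
                     fake.block = factor(rep(1:4, 3)))    # arbitrary labels
friedman.test(oneway$y, oneway$trt, oneway$fake.block)    # runs, but blocks are meaningless
kruskal.test(y ~ trt, data = oneway)                      # appropriate one-way rank test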


## Question 4
# Pg 302, Ex. 7: Suppose k=3 and n=3. Obtain the form of the exact null distribution of S for the case of no-tied observations.
k <- 3; n <- 3

library(gtools)
dat.permute <- permutations(k, k)   # all k! = 6 rank orderings within a single block
total.obs <- (factorial(k))^n       # 6^3 = 216 equally likely rank configurations
R.j <- rep(NA, total.obs)           # will hold sum(R_j^2) for each configuration

start <- 0   # incremented before each assignment, so indices run 1:216
for (A in 1:6) {
  for (B in 1:6) {
    for (C in 1:6) {
      start <- start + 1
      # Columns are blocks, rows are treatments (column-major fill), so
      # row sums give the treatment rank sums R_j
      data.shell <- matrix(c(dat.permute[A,], dat.permute[B,], dat.permute[C,]),
                           nrow=k, ncol=n)
      R.j[start] <- sum(apply(data.shell, 1, sum)^2)
    }
  }
}
# R.j now contains all 216 equally likely values of sum(R_j^2)

# For n = k = 3: S = 12/(nk(k+1)) * sum(R_j^2) - 3n(k+1) = (1/3)*sum(R_j^2) - 36
S.null.dist <- ((1/3)*R.j) - 36
hist(S.null.dist)
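
# The exact "form" of the null distribution is clearer as a table of support
# points and their probabilities than as a histogram:
round(table(S.null.dist) / total.obs, 4)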

## Question 5 (GRAD ONLY)
# Generate the permutation distribution for Spearman’s Rs and Kendall’s τ based on the following data:
height<-c(68,70,74)
weight<-c(145,155,160)

perm.n <- permutations(3,3)
dim.n <- dim(perm.n)[1]

# Permutation distribution of Spearman's r_s: correlate height with every
# rearrangement of weight
Spm.corr <- rep(NA, dim.n)
for (i in 1:dim.n) {
  Spm.corr[i] <- cor(height, weight[perm.n[i,]], method="spearman")
}

# Same six permutations for Kendall's tau
Kndl.tau <- rep(NA, dim.n)
for (i in 1:dim.n) {
  Kndl.tau[i] <- cor(height, weight[perm.n[i,]], method="kendall")
}

# Spearman's
hist(Spm.corr)

# Kendall's
hist(Kndl.tau)
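
# With n = 3 there are only 3! = 6 permutations, so the exact permutation
# distributions are easy to tabulate directly alongside the histograms:
table(Spm.corr) / dim.n
table(Kndl.tau) / dim.n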

## Question 6
# The data below are ages (in days) of concrete cylinders and the compressive strengths of the cylinders.
# (a) Plot the data to show a non-linear relationship. Compute Pearson’s correlation, Spearman’s correlation, and Kendall’s τ.
# (b) (UNDERGRAD ONLY) Test for significant association using the Spearman correlation from the previous part.
# (c) (GRAD ONLY) Test for significant association using each of the measures from the previous part.

age <- c(3, 7, 15, 24, 85, 180, 360)
strength <- c(2500, 3200, 4300, 5300, 5900, 6700, 6900)

# Part A
# The scatterplot shows a strong but clearly non-linear (monotone, concave) relationship
plot(strength~age)
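
# Extra check (not part of the question): plotting against log(age) makes
# the monotone trend nearly linear, confirming the curvature above.
plot(strength ~ log(age))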

pearson.corr <- cor.test(strength, age); pearson.corr
## 
##  Pearson's product-moment correlation
## 
## data:  strength and age
## t = 2.8414, df = 5, p-value = 0.03619
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.08030986 0.96677652
## sample estimates:
##       cor 
## 0.7858418
spearman.corr <- cor.test(strength, age, method="spearman"); spearman.corr
## 
##  Spearman's rank correlation rho
## 
## data:  strength and age
## S = 1.2434e-14, p-value = 0.0003968
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho 
##   1
kendall.tau <- cor.test(strength, age, method="kendall"); kendall.tau
## 
##  Kendall's rank correlation tau
## 
## data:  strength and age
## T = 21, p-value = 0.0003968
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
## tau 
##   1
# Part C
pearson.corr$"p.value"
## [1] 0.03618723
# The p-value of 0.036 is significant, but the plot makes clear that the relationship is monotonic rather than linear. Pearson's correlation measures only linear association and its test relies on approximate normality, so distribution-free, rank-based measures are the appropriate way to test for association here.
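
# Spearman's rho is just Pearson's correlation computed on the ranks, which
# is why it is unaffected by the monotone nonlinearity seen here:
cor(rank(strength), rank(age))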

spearman.corr$"p.value"
## [1] 0.0003968254
# We reject the null hypothesis and conclude that there is a monotonic association between the age and strength of the concrete cylinders (p = 0.0003968).

kendall.tau$"p.value"
## [1] 0.0003968254
# Likewise, we reject the null hypothesis and conclude that there is a statistically significant monotone association between age and strength (p = 0.0003968); with perfectly concordant ranks, both rho and tau attain their maximum value of 1.