1.1 Determining the sample size using power analysis

The number of populations (treatments) is three (k = 3).

Since k is odd and we assume maximum variability among the group means, Cohen's effect size f is obtained from d as

\(f=\frac{d\sqrt{k^{2}-1}}{2k}\)

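where

f = Cohen's effect size for one-way ANOVA

d = effect size, the standardized difference between the largest and smallest group means, set to 0.9

α = 0.05 (significance level)

power = 0.55 (target power)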
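As a worked check of the conversion from d to f, the small helper below implements Cohen's maximum-variability relation. The function name `d_to_f` is hypothetical, and the even-k branch (f = d/2) is included only for completeness; this study uses the odd case with k = 3.

```r
# Hypothetical helper: convert Cohen's d (standardized range of the group
# means) to Cohen's f, assuming the maximum-variability pattern of means.
d_to_f <- function(d, k) {
  if (k %% 2 == 1) {
    d * sqrt(k^2 - 1) / (2 * k)   # odd k, as in this study (k = 3)
  } else {
    d / 2                         # even k: half the means at each extreme
  }
}

f <- d_to_f(d = 0.9, k = 3)
print(f)   # 0.4242641
```

The value matches the `f` reported by `pwr.anova.test` in the output that follows.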

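library(pwr)                      # power analysis for ANOVA
d <- 0.9                          # effect size d
f <- d * sqrt(3^2 - 1) / (2 * 3)  # Cohen's f for k = 3 (odd k, maximum variability)
pwr.anova.test(k = 3, n = NULL, f = f, sig.level = 0.05, power = 0.55)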

## 
##      Balanced one-way analysis of variance power calculation 
## 
##               k = 3
##               n = 11.35348
##               f = 0.4242641
##       sig.level = 0.05
##           power = 0.55
## 
## NOTE: n is number in each group

Rounding n = 11.35 up to the next whole number, we need to collect 12 samples per group.

Since there are three populations, the experiment requires a total of 3 × 12 = 36 observations.
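The sample size can also be cross-checked without the pwr package. The sketch below assumes the same noncentral-F model that `pwr.anova.test` uses (noncentrality λ = k·n·f²) and searches for the smallest whole number of observations per group that reaches the target power. The helper name `anova_power` is hypothetical.

```r
# Hypothetical helper: power of a balanced one-way ANOVA with k groups,
# n observations per group and Cohen's effect size f (noncentral F model).
anova_power <- function(n, k, f, alpha = 0.05) {
  df1 <- k - 1
  df2 <- k * (n - 1)
  lambda <- k * n * f^2                              # noncentrality parameter
  f_crit <- qf(alpha, df1, df2, lower.tail = FALSE)  # critical F under H0
  pf(f_crit, df1, df2, ncp = lambda, lower.tail = FALSE)
}

k <- 3
f <- 0.9 * sqrt(k^2 - 1) / (2 * k)
target_power <- 0.55

# Smallest integer n per group whose power meets the target
n_per_group <- 2
while (anova_power(n_per_group, k, f) < target_power) {
  n_per_group <- n_per_group + 1
}
total_n <- k * n_per_group

cat("n per group:", n_per_group, "total:", total_n, "\n")
```

Searching over integers gives the rounded-up answer directly, instead of solving for a fractional n and rounding afterwards.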

1.2 Completely randomized run order for sample data collection

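library(agricolae)
treatment1 <- c("golf", "tennis", "rock")           # the three treatments
# Completely randomized design: 12 replicates per treatment, fixed seed
design <- design.crd(trt = treatment1, r = 12, seed = 12341234)
design$book                                         # randomized field book (run order)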

##    plots  r treatment1
## 1    101  1       rock
## 2    102  1     tennis
## 3    103  1       golf
## 4    104  2     tennis
## 5    105  2       golf
## 6    106  2       rock
## 7    107  3     tennis
## 8    108  3       rock
## 9    109  4       rock
## 10   110  4     tennis
## 11   111  5     tennis
## 12   112  6     tennis
## 13   113  5       rock
## 14   114  6       rock
## 15   115  3       golf
## 16   116  4       golf
## 17   117  7     tennis
## 18   118  5       golf
## 19   119  7       rock
## 20   120  8     tennis
## 21   121  8       rock
## 22   122  6       golf
## 23   123  7       golf
## 24   124  9       rock
## 25   125  8       golf
## 26   126  9       golf
## 27   127 10       rock
## 28   128  9     tennis
## 29   129 10       golf
## 30   130 10     tennis
## 31   131 11     tennis
## 32   132 11       golf
## 33   133 12       golf
## 34   134 11       rock
## 35   135 12       rock
## 36   136 12     tennis
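The idea behind `design.crd` can be illustrated in base R: list the 36 runs (12 per treatment) and randomly permute them. This is only a sketch; it uses base R's `sample()`, so it will not reproduce agricolae's run order for seed 12341234, and the `book` layout here is a simplified stand-in for `design$book`.

```r
treatments <- c("golf", "tennis", "rock")
r <- 12                                  # replicates per treatment

set.seed(12341234)                       # makes this sketch reproducible
runs <- rep(treatments, each = r)        # 36 runs, 12 of each treatment
run_order <- sample(runs)                # random permutation = CRD run order

# Number the plots 101, 102, ... in run order, as design.crd does
book <- data.frame(plots = 100 + seq_along(run_order),
                   treatment = run_order)
head(book)
```

Every run is equally likely to be assigned any treatment, which is what makes the design completely randomized.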

1.3 COLLECTION OF DATA

# read.csv() already returns a data.frame, so no conversion is needed
dat <- read.csv("part1.csv", header = TRUE, sep = ",")
str(dat)

## 'data.frame':    36 obs. of  5 variables:
##  $ X        : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ plots    : int  101 102 103 104 105 106 107 108 109 110 ...
##  $ r        : int  1 1 1 2 2 2 3 3 4 4 ...
##  $ treatment: chr  "rock " "tennis " "golf " "tennis " ...
##  $ obs      : int  46 60 70 71 90 40 84 36 34 75 ...

dat

##     X plots  r treatment obs
## 1   1   101  1     rock   46
## 2   2   102  1   tennis   60
## 3   3   103  1     golf   70
## 4   4   104  2   tennis   71
## 5   5   105  2     golf   90
## 6   6   106  2     rock   40
## 7   7   107  3   tennis   84
## 8   8   108  3     rock   36
## 9   9   109  4     rock   34
## 10 10   110  4   tennis   75
## 11 11   111  5   tennis   60
## 12 12   112  6   tennis   78
## 13 13   113  5     rock   68
## 14 14   114  6     rock   34
## 15 15   115  3     golf   52
## 16 16   116  4     golf   57
## 17 17   117  7   tennis   78
## 18 18   118  5     golf   75
## 19 19   119  7     rock   20
## 20 20   120  8   tennis   75
## 21 21   121  8     rock   45
## 22 22   122  6     golf   51
## 23 23   123  7     golf   81
## 24 24   124  9     rock   55
## 25 25   125  8     golf   62
## 26 26   126  9     golf   49
## 27 27   127 10     rock   56
## 28 28   128  9   tennis   69
## 29 29   129 10     golf   51
## 30 30   130 10   tennis   65
## 31 31   131 11   tennis   68
## 32 32   132 11     golf   53
## 33 33   133 12     golf   80
## 34 34   134 11     rock   48
## 35 35   135 12     rock   49
## 36 36   136 12   tennis   66
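The `str()` output shows that the treatment labels were stored with a trailing space (`"rock "`, `"tennis "`, `"golf "`), which is why the Tukey output later prints labels such as `rock -golf`. A minimal cleanup sketch trims the whitespace before converting to a factor; the short vector below is a hypothetical stand-in for `dat$treatment`.

```r
# Stand-in for dat$treatment as read from part1.csv (note the trailing spaces)
treatment_raw <- c("rock ", "tennis ", "golf ", "tennis ")

treatment <- factor(trimws(treatment_raw))   # strip whitespace, then factor
levels(treatment)
```

Alternatively, passing `strip.white = TRUE` to `read.csv()` removes the spaces at read time.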

1.4 HYPOTHESIS TESTING

NULL HYPOTHESIS:

\(H_{0}:\mu_{1}=\mu_{2}=\mu_{3}\)

ALTERNATIVE HYPOTHESIS:

\(H_{a}\): at least one of the means differs from the others

where

\(\mu_{1}\) = mean response for the rock

\(\mu_{2}\) = mean response for the tennis ball

\(\mu_{3}\) = mean response for the golf ball

dat$treatment <- as.factor(dat$treatment)   # treatment as a categorical factor
# One-way ANOVA of the response on treatment
model <- aov(dat$obs ~ dat$treatment, data = dat)
summary(model)

##               Df Sum Sq Mean Sq F value   Pr(>F)    
## dat$treatment  2   4578  2289.0   16.37 1.15e-05 ***
## Residuals     33   4615   139.8                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Since the p-value (1.15e-05) is well below the significance level α = 0.05, we reject the null hypothesis and conclude that at least one of the means differs.
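To make the F test concrete, the sketch below recomputes the ANOVA table by hand from the 36 observations in the data listing (typed in directly, since part1.csv is not reproduced here). It uses only base R and the standard decomposition into between-treatment and within-treatment sums of squares.

```r
# Observations copied from the data listing, grouped by treatment
obs <- list(
  rock   = c(46, 40, 36, 34, 68, 34, 20, 45, 55, 56, 48, 49),
  tennis = c(60, 71, 84, 75, 60, 78, 78, 75, 69, 65, 68, 66),
  golf   = c(70, 90, 52, 57, 75, 51, 81, 62, 49, 51, 53, 80)
)

k <- length(obs)                         # number of treatments
n <- lengths(obs)                        # observations per treatment
N <- sum(n)                              # total observations
grand_mean <- mean(unlist(obs))
group_means <- sapply(obs, mean)

ss_trt <- sum(n * (group_means - grand_mean)^2)               # between groups
ss_err <- sum(sapply(obs, function(y) sum((y - mean(y))^2)))  # within groups

ms_trt <- ss_trt / (k - 1)
ms_err <- ss_err / (N - k)
f_stat <- ms_trt / ms_err
p_value <- pf(f_stat, k - 1, N - k, lower.tail = FALSE)

round(c(SS_trt = ss_trt, SS_err = ss_err, F = f_stat), 2)
p_value
```

The sums of squares (4578 and 4614.75) and F = 16.37 agree with the `summary(model)` table, where the residual sum of squares is shown rounded to 4615.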

To check model adequacy, we examine the diagnostic plots of the fitted model:

plot(model)

The residuals-versus-fitted plot shows a similar spread of residuals for each of the three treatments, so the constant-variance assumption appears reasonable.

In the normal Q-Q plot, the residuals fall approximately along a straight line, suggesting that the normality assumption is reasonable.
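The graphical checks can be supplemented with formal tests. As a sketch, Bartlett's test (`bartlett.test`) checks equality of the group variances and the Shapiro-Wilk test (`shapiro.test`) checks normality of the residuals; both are in base R's stats package. These tests were not part of the original analysis, so no results are claimed here. The data are re-entered from the listing, and the residuals are computed as deviations from each group mean, which for a one-way ANOVA equals `residuals(model)`.

```r
# Data re-entered from the listing, grouped by treatment
y <- c(46, 40, 36, 34, 68, 34, 20, 45, 55, 56, 48, 49,    # rock
       60, 71, 84, 75, 60, 78, 78, 75, 69, 65, 68, 66,    # tennis
       70, 90, 52, 57, 75, 51, 81, 62, 49, 51, 53, 80)    # golf
g <- factor(rep(c("rock", "tennis", "golf"), each = 12))

# Residuals of the one-way model: each observation minus its group mean
res <- y - ave(y, g)

bartlett.test(y ~ g)     # H0: equal variances across the three treatments
shapiro.test(res)        # H0: residuals are normally distributed
```

A small p-value from either test would signal that the corresponding assumption deserves a closer look.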

1.5 PAIRWISE COMPARISONS

TukeyHSD(model)

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = dat$obs ~ dat$treatment, data = dat)
## 
## $`dat$treatment`
##                diff        lwr       upr     p adj
## rock -golf    -20.0 -31.846217 -8.153783 0.0006403
## tennis -golf    6.5  -5.346217 18.346217 0.3802940
## tennis -rock   26.5  14.653783 38.346217 0.0000128

plot(TukeyHSD(model))

From the Tukey HSD plot, the difference between the tennis-ball and golf-ball means is not statistically significant (adjusted p = 0.38), because zero lies inside its 95% family-wise confidence interval (-5.35, 18.35). This does not prove that the two means are equal; the data simply cannot distinguish them.

In contrast, the rock-golf and tennis-rock differences are significant (adjusted p = 0.00064 and 0.000013), because zero lies outside both of their 95% confidence intervals.
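The Tukey intervals can be reproduced from the ANOVA quantities, which shows where their width comes from. For two groups of size n, the half-width is q(0.95; k, N - k) × √(MS_E/2 × (1/n + 1/n)), with q taken from the studentized range distribution (`qtukey` in base R). The sketch plugs in k = 3, n = 12, 33 residual degrees of freedom and MS_E = 4614.75/33 (the residual mean square, shown rounded as 139.8 in the ANOVA table); the three differences are the group-mean differences from the Tukey output.

```r
k      <- 3                   # number of treatments
n      <- 12                  # observations per treatment
df_err <- 33                  # residual degrees of freedom (N - k)
mse    <- 4614.75 / df_err    # residual mean square MS_E

# Studentized-range critical value for a 95% family-wise level
q_crit <- qtukey(0.95, nmeans = k, df = df_err)
half_width <- q_crit * sqrt(mse / 2 * (1 / n + 1 / n))

diffs <- c("rock-golf" = -20.0, "tennis-golf" = 6.5, "tennis-rock" = 26.5)
ci <- cbind(diff = diffs, lwr = diffs - half_width, upr = diffs + half_width)
ci
# An interval that excludes 0 indicates a significant pairwise difference
ci[, "lwr"] > 0 | ci[, "upr"] < 0
```

Because the design is balanced, all three intervals share the same half-width (about 11.85), so only the centers differ.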

DESIGN OF EXPERIMENT PROJECT, 2022

Ayodeji Ayoola, Jahir Ahmed, Yashwanth Dommaraju

Part 1