INTRODUCTION
This was a design of experiment to investigate using a statapult significant factors that influence the distance in which a ball type was thrown.
The statapult has three parameters to consider for our experiment which was
• Pin Elevation
• Bungee Position
• Release Angle
Note
The release angle was varied from 90 to 180 degrees.
The Pin elevation had 4 discrete settings, numbered from top to bottom
The bungee position had 4 discrete settings, numbered from top to bottom
Additionally,
Also as seen below, three ball types were used, which were
Golf ball
Tennis ball
Rock
Total number of population is three (K=3)
Since our K is odd and with maximum variability, the formula of effect f is
\(f=(d*\sqrt{k^{2}}-1)/2K\)
where f = effect
d = effect size and the value of d is 0.9
alpha=0.05
power=0.55
d<-0.9
f<-d*sqrt(3^2-1)/(2*3)
library(pwr)
pwr.anova.test(k=3,n=NULL,f=f,sig.level=0.05,power=0.55)
##
## Balanced one-way analysis of variance power calculation
##
## k = 3
## n = 11.35348
## f = 0.4242641
## sig.level = 0.05
## power = 0.55
##
## NOTE: n is number in each group
We would need to collect 12 samples per group
Also, since we have three different populations, a total number of 36 observations is needed to design this experiment.
library(agricolae)
treatment1<-c("golf","tennis","rock")
design<-design.crd(trt=treatment1,r=12,seed=12341234)
design$book
## plots r treatment1
## 1 101 1 rock
## 2 102 1 tennis
## 3 103 1 golf
## 4 104 2 tennis
## 5 105 2 golf
## 6 106 2 rock
## 7 107 3 tennis
## 8 108 3 rock
## 9 109 4 rock
## 10 110 4 tennis
## 11 111 5 tennis
## 12 112 6 tennis
## 13 113 5 rock
## 14 114 6 rock
## 15 115 3 golf
## 16 116 4 golf
## 17 117 7 tennis
## 18 118 5 golf
## 19 119 7 rock
## 20 120 8 tennis
## 21 121 8 rock
## 22 122 6 golf
## 23 123 7 golf
## 24 124 9 rock
## 25 125 8 golf
## 26 126 9 golf
## 27 127 10 rock
## 28 128 9 tennis
## 29 129 10 golf
## 30 130 10 tennis
## 31 131 11 tennis
## 32 132 11 golf
## 33 133 12 golf
## 34 134 11 rock
## 35 135 12 rock
## 36 136 12 tennis
dat<-read.csv("part1.csv",TRUE,",")
dat<-as.data.frame(dat)
str(dat)
## 'data.frame': 36 obs. of 5 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ plots : int 101 102 103 104 105 106 107 108 109 110 ...
## $ r : int 1 1 1 2 2 2 3 3 4 4 ...
## $ treatment: chr "rock " "tennis " "golf " "tennis " ...
## $ obs : int 46 60 70 71 90 40 84 36 34 75 ...
dat
## X plots r treatment obs
## 1 1 101 1 rock 46
## 2 2 102 1 tennis 60
## 3 3 103 1 golf 70
## 4 4 104 2 tennis 71
## 5 5 105 2 golf 90
## 6 6 106 2 rock 40
## 7 7 107 3 tennis 84
## 8 8 108 3 rock 36
## 9 9 109 4 rock 34
## 10 10 110 4 tennis 75
## 11 11 111 5 tennis 60
## 12 12 112 6 tennis 78
## 13 13 113 5 rock 68
## 14 14 114 6 rock 34
## 15 15 115 3 golf 52
## 16 16 116 4 golf 57
## 17 17 117 7 tennis 78
## 18 18 118 5 golf 75
## 19 19 119 7 rock 20
## 20 20 120 8 tennis 75
## 21 21 121 8 rock 45
## 22 22 122 6 golf 51
## 23 23 123 7 golf 81
## 24 24 124 9 rock 55
## 25 25 125 8 golf 62
## 26 26 126 9 golf 49
## 27 27 127 10 rock 56
## 28 28 128 9 tennis 69
## 29 29 129 10 golf 51
## 30 30 130 10 tennis 65
## 31 31 131 11 tennis 68
## 32 32 132 11 golf 53
## 33 33 133 12 golf 80
## 34 34 134 11 rock 48
## 35 35 135 12 rock 49
## 36 36 136 12 tennis 66
NULL HYPOTHESIS;
\(H_{0}:\mu_{1}=\mu_{2}=\mu_{3}\)
ALTERNATIVE HYPOTHESIS
\(H_{a}\) - Atleast one of the above means differs
Where
\(\mu_{1}\)- mean of rock
\(\mu_{2}\)- mean of Tennis ball
\(\mu_{3}\)- mean of Golf ball
dat$treatment<-as.factor(dat$treatment)
model<-aov(dat$obs~dat$treatment,data=dat)
summary(model)
## Df Sum Sq Mean Sq F value Pr(>F)
## dat$treatment 2 4578 2289.0 16.37 1.15e-05 ***
## Residuals 33 4615 139.8
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since the p-value of our model (1.15e-05) is less than our reference level of significance(0.05).
We are rejecting the null hypothesis and stating that at-least one of the means differs
plot(model)
Since the residual plots have the same spread and width, this Implies that we can assume constant variance between the three balls.
Also, from the normal probability plot, the data points fairly follows a straight line, indicating the data is normally distributed.
TukeyHSD(model)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = dat$obs ~ dat$treatment, data = dat)
##
## $`dat$treatment`
## diff lwr upr p adj
## rock -golf -20.0 -31.846217 -8.153783 0.0006403
## tennis -golf 6.5 -5.346217 18.346217 0.3802940
## tennis -rock 26.5 14.653783 38.346217 0.0000128
plot(TukeyHSD(model))
From the TukeyHSD plot, we can accurately say that the means of the pair tennis and golf ball are the same because zero lies in the 95 percent confidence interval range.
Also, The means of the rock and golf and also the means of tennis and rock differs significantly because 0 is not in their respective 95percent confidence interval range.
PART 2-A
Since this is a mixed effect model where the
Pin elevation factor is the fixed effect factor
release angle is the random effect factor
The model equation can be written as.
\(Y_{ijk}=\mu+\alpha_{i}+\beta_{j}+\alpha\beta_{ij}+\varepsilon_{ijk}\)
where
\(\alpha_{i}\) = Fixed effect of pin elevation
\(\beta_{j}\)= Random effect of release angle
\(\alpha\beta_{ij}\)= Interaction between pin elevation and release angle
\(\epsilon_{ijk}\) ~ \(N(0,\sigma^{2})\)= Random error
MAIN EFFECT HYPOTHESIS ( PIN ELEVATION)
Null hypothesis
\(\alpha_{i}=0\) for all i
Alternative Hypothesis
\(\alpha_{i}\neq 0\) for some i
MAIN EFFECT HYPOTHESIS ( RELEASE ANGLE)
Null Hypothesis
\(\sigma_{\beta}^{2}=0\)
Alternative hypothesis
\(\sigma_{\beta}^{2}\neq 0\) for some j
INTERACTION EFFECT (PIN ELEVATION AND RELEASE ANGLE)
Null hypothesis
\(\sigma^{2}_{\alpha\beta}=0\)
Alternative hypothesis
\(\sigma^{2}_{\alpha\beta}\neq 0\)
Also
I=levels of factor A
J=levels of factor B
K=3
For our design
“1”- pin elevation for Factor A ( bottom most location)
“2”- pin elevation for Factor A ( third position from the bottom most location)
“1”- release angle for factor B - 90 degrees
“2”- release angle for factor B- 110 degrees
“3”- release angle for factor B- 120 degrees
Randomizing the design layout
library(agricolae)
trt<-c(2,3)
seednumber<-123456
design<-design.ab(trt=trt,r=3,design="crd",seed=seednumber)
design$book
## plots r A B
## 1 101 1 2 1
## 2 102 1 1 2
## 3 103 1 1 3
## 4 104 2 1 3
## 5 105 1 2 2
## 6 106 3 1 3
## 7 107 2 2 2
## 8 108 1 1 1
## 9 109 2 1 2
## 10 110 3 1 2
## 11 111 1 2 3
## 12 112 3 2 2
## 13 113 2 1 1
## 14 114 2 2 1
## 15 115 2 2 3
## 16 116 3 1 1
## 17 117 3 2 1
## 18 118 3 2 3
Sample data would be collected above in the randomized order shown
Sample Data collection
dat<-read.csv("part2aa.csv",TRUE,",")
dat
## No plots r A B obs
## 1 1 101 1 2 1 26
## 2 2 102 1 1 2 33
## 3 3 103 1 1 3 22
## 4 4 104 2 1 3 42
## 5 5 105 1 2 2 35
## 6 6 106 3 1 3 39
## 7 7 107 2 2 2 35
## 8 8 108 1 1 1 17
## 9 9 109 2 1 2 30
## 10 10 110 3 1 2 45
## 11 11 111 1 2 3 55
## 12 12 112 3 2 2 45
## 13 13 113 2 1 1 20
## 14 14 114 2 2 1 25
## 15 15 115 2 2 3 57
## 16 16 116 3 1 1 19
## 17 17 117 3 2 1 25
## 18 18 118 3 2 3 57
library(GAD)
dat$A<-as.fixed(dat$A)
dat$B<-as.random(dat$B)
NOTE
A- pin elevation
B- release angle
From the syntax below
model2<-aov(obs~A+B+A*B,data=dat)
GAD::gad(model2)
## Analysis of Variance Table
##
## Response: obs
## Df Sum Sq Mean Sq F value Pr(>F)
## A 1 480.50 480.50 3.0000 0.22540
## B 2 1682.33 841.17 23.2938 7.383e-05 ***
## A:B 2 320.33 160.17 4.4354 0.03613 *
## Residual 12 433.33 36.11
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
FROM THE MODEL TEST
We can see that the p-value of the interaction effect is 0.03613 which is significant at alpha is 0.05. Because of this we are rejecting the null hypothesis and saying that there is significant interaction between A (pin elevation) and B (release angle).
Because we are rejecting the null hypothesis of the higher order effect in this model (interaction effect), we would stop here. We don’t need to check for the significance of the MAIN EFFECTS in this model.
THE INTERACTION PLOT CAN BE VIEWED BELOW
interaction.plot(dat$A,dat$B,dat$obs)
plot(model2)
From the residuals vs Fitted values plot, we see that the spread of the residuals points varies and don’t follow the same spread. This indicates that the constant variance variance assumption is questionable and a further analysis should be done in this case. Because violation of constant variance assumption is a strong one.
We can see a fairly linear trend of the normal qq plot of the residuals indicate that the data have a fair amount of normal distribution.
This is an un-replicated \(2^{4}\) design with four factors and each factor has a low level (-1) and a high level(+1). The four factors are:
Factor A is Pin Elevation and it has two levels where the low level is (-1) and it is at position 1 and the high level (+1) and it is at position 3.
Factor B is Bungee Position and it has two levels where position 2 is at low level (-1) and position 3 is at high level (+1).
Factor C is Release Angle and it has two levels where 90 degrees is at low level (-1) and 110 degrees is at high level (+1).
Factor D is Ball Type and it has two levels where the yellow ball is at low level (-1) and the red ball is at high level (+1).

Proposing a data collection layout with a randomized order we have
library(agricolae)
trts <- c(2,2,2,2)
design <- design.ab(trt=trts,r=1,design = "crd",seed = 11345)
design$book
## plots r A B C D
## 1 101 1 1 1 1 1
## 2 102 1 2 2 2 2
## 3 103 1 2 1 1 2
## 4 104 1 2 2 2 1
## 5 105 1 2 2 1 1
## 6 106 1 1 2 2 1
## 7 107 1 1 1 1 2
## 8 108 1 1 1 2 2
## 9 109 1 1 1 2 1
## 10 110 1 2 1 2 1
## 11 111 1 1 2 2 2
## 12 112 1 1 2 1 1
## 13 113 1 1 2 1 2
## 14 114 1 2 2 1 2
## 15 115 1 2 1 1 1
## 16 116 1 2 1 2 2
dat<-read.csv("part.csv",TRUE,",")
dat
## X plots r A B C D values
## 1 1 101 1 -1 -1 -1 -1 35
## 2 2 102 1 1 1 1 1 62
## 3 3 103 1 1 -1 -1 1 31
## 4 4 104 1 1 1 1 -1 60
## 5 5 105 1 1 1 -1 -1 34
## 6 6 106 1 -1 1 1 -1 46
## 7 7 107 1 -1 -1 -1 1 29
## 8 8 108 1 -1 -1 1 1 42
## 9 9 109 1 -1 -1 1 -1 44
## 10 10 110 1 1 -1 1 -1 64
## 11 11 111 1 -1 1 1 1 29
## 12 12 112 1 -1 1 -1 -1 35
## 13 13 113 1 -1 1 -1 1 34
## 14 14 114 1 1 1 -1 1 37
## 15 15 115 1 1 -1 -1 -1 48
## 16 16 116 1 1 -1 1 1 57
Model Equation
\(Y_{ikjl}=\mu+\alpha_{i}+\beta_{j}+\alpha\beta_{ij}+\gamma_{k}+\alpha\gamma_{ik}+\beta\gamma_{jk}+\alpha\beta\gamma_{ijk}+\delta_{l}+\alpha\delta_{il}+\beta\delta_{jl}+\gamma\delta_{kl}+\alpha\beta\delta_{ijl}+\alpha\gamma\delta_{ikl}+\beta\gamma\delta_{jkl}+\alpha\beta\gamma\delta_{ijkl}+\epsilon_{ijkl}\)
where
\(\alpha_{i}\)= Factor A (Pin elevation)
\(\beta_{j}\)= Factor B ( Bungee Position)
\(\gamma_{k}\)=Factor C ( Release Angle)
\(\delta\)= Factor D ( Ball type)
\(\epsilon_{ijkl}\) = Random error term
library(DoE.base)
model<-lm(values~A*B*C*D,data=dat)
halfnormal(model)
From the half normal plots, we can see that the significant factors are A,C,A:C. This essentially means that only Pin elevation, Release angle and the interaction between Pin elevation and release angle are significant.
Performing analysis of variance to determine the final model we have
Main effect ( Pin elevation)
Null hypothesis : \(\alpha_{i}=0\) For all i
Alternative hypothesis: \(\alpha\neq 0\) for some i
Main effect (Release angle)
Null hypothesis- \(\gamma_{k}=0\) for all k
Alternative hypothesis- \(\gamma_{k}=0\) for some k
Interaction effect (Pin elevation & Release angle)
Null hypothesis- \(\alpha\gamma_{ik}=0\)
Alternative hypothesis- \(\alpha\gamma_{ik}\neq 0\)
dat2<-dat[,c("A","C","values")]
dat2
## A C values
## 1 -1 -1 35
## 2 1 1 62
## 3 1 -1 31
## 4 1 1 60
## 5 1 -1 34
## 6 -1 1 46
## 7 -1 -1 29
## 8 -1 1 42
## 9 -1 1 44
## 10 1 1 64
## 11 -1 1 29
## 12 -1 -1 35
## 13 -1 -1 34
## 14 1 -1 37
## 15 1 -1 48
## 16 1 1 57
library(GAD)
dat2$A<-as.fixed(dat2$A)
dat2$C<-as.fixed(dat2$C)
model1<-aov(values~A*C,data = dat2)
GAD::gad(model1)
## Analysis of Variance Table
##
## Response: values
## Df Sum Sq Mean Sq F value Pr(>F)
## A 1 612.56 612.56 18.6923 0.0009901 ***
## C 1 915.06 915.06 27.9231 0.0001933 ***
## A:C 1 264.06 264.06 8.0579 0.0149345 *
## Residual 12 393.25 32.77
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Based on the ANOVA values
We can see that the p-value of factor A, factor C and factor A:C (0.0011619, 0.0002014,0.0166751) are all significant because they are all less than our reference significant level of alpha(0.05)
Therefore we are rejecting the null hypothesis claiming that they have significant effect on the model.
The final model can be written has
\(Y_{ikl}=\mu+\alpha_{i}+\gamma_{k}+\alpha\gamma_{ik}+\epsilon_{ikl}\)
\(\alpha_{i}\)= Factor A (Pin elevation)
\(\gamma_{k}\)= Factor C ( Release Angle)
\(\alpha\gamma_{ik}\) = interaction term between factor A and factor C
\(\epsilon_{ikl}\)= Standard error term
Group 4 members would like to show a big appreciation to Dr. Timothy Matis and the Teaching assistant Andrea Arriet for their guidance throughout, up to the completion of this project.
Montgomery Douglas C. Design of experiment
IE 5342 Course materials and videos by Dr Timothy matis
R studio cheat sheets