INTRODUCTION

This was a design of experiment to investigate using a statapult significant factors that influence the distance in which a ball type was thrown.The statapult has three parameters to consider for our experiment which was:

Pin Elevation

• Bungee Position

• Release Angle

Note

The release angle was varied from 90 to 180 degrees. The Pin elevation had 4 discrete settings, numbered from top to bottom. The bungee position had 4 discrete settings, numbered from top to bottom.

Additionally, Also as seen below, three ball types were used, which were

Golf ball

Tennis ball

Rock

Part 1

a) Determining the sample size using power calculation analysis:

Total number of population is three (K=3)

Since our K is odd and with maximum variability, the formula of effect f is

\(f=(d*\sqrt{k^{2}}-1)/2K\)

where f = effect, d = effect size and the value of d is 0.9, alpha=0.05, power=0.55

d<-0.9
f<-d*sqrt(3^2-1)/(2*3)
library(pwr)
pwr.anova.test(k=3,n=NULL,f=f,sig.level=0.05,power=0.55)
## 
##      Balanced one-way analysis of variance power calculation 
## 
##               k = 3
##               n = 11.35348
##               f = 0.4242641
##       sig.level = 0.05
##           power = 0.55
## 
## NOTE: n is number in each group

Outcome: We would need to collect 12 samples per group. Also, since we have three different populations, a total number of 36 observations is needed to design this experiment.

b) Proposal a layout using the number of samples from part(a) with randomized run-order:

library(agricolae)
treatment1<-c("golf","tennis","rock")
design<-design.crd(trt=treatment1,r=12,seed=12341234)
design$book
##    plots  r treatment1
## 1    101  1       rock
## 2    102  1     tennis
## 3    103  1       golf
## 4    104  2     tennis
## 5    105  2       golf
## 6    106  2       rock
## 7    107  3     tennis
## 8    108  3       rock
## 9    109  4       rock
## 10   110  4     tennis
## 11   111  5     tennis
## 12   112  6     tennis
## 13   113  5       rock
## 14   114  6       rock
## 15   115  3       golf
## 16   116  4       golf
## 17   117  7     tennis
## 18   118  5       golf
## 19   119  7       rock
## 20   120  8     tennis
## 21   121  8       rock
## 22   122  6       golf
## 23   123  7       golf
## 24   124  9       rock
## 25   125  8       golf
## 26   126  9       golf
## 27   127 10       rock
## 28   128  9     tennis
## 29   129 10       golf
## 30   130 10     tennis
## 31   131 11     tennis
## 32   132 11       golf
## 33   133 12       golf
## 34   134 11       rock
## 35   135 12       rock
## 36   136 12     tennis

c) Collect data and record observations on layout proposed in previous part:

dat<-read.csv("part1.csv",TRUE,",")
dat<-as.data.frame(dat)
str(dat)
## 'data.frame':    36 obs. of  5 variables:
##  $ X        : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ plots    : int  101 102 103 104 105 106 107 108 109 110 ...
##  $ r        : int  1 1 1 2 2 2 3 3 4 4 ...
##  $ treatment: chr  "rock " "tennis " "golf " "tennis " ...
##  $ obs      : int  46 60 70 71 90 40 84 36 34 75 ...
dat
##     X plots  r treatment obs
## 1   1   101  1     rock   46
## 2   2   102  1   tennis   60
## 3   3   103  1     golf   70
## 4   4   104  2   tennis   71
## 5   5   105  2     golf   90
## 6   6   106  2     rock   40
## 7   7   107  3   tennis   84
## 8   8   108  3     rock   36
## 9   9   109  4     rock   34
## 10 10   110  4   tennis   75
## 11 11   111  5   tennis   60
## 12 12   112  6   tennis   78
## 13 13   113  5     rock   68
## 14 14   114  6     rock   34
## 15 15   115  3     golf   52
## 16 16   116  4     golf   57
## 17 17   117  7   tennis   78
## 18 18   118  5     golf   75
## 19 19   119  7     rock   20
## 20 20   120  8   tennis   75
## 21 21   121  8     rock   45
## 22 22   122  6     golf   51
## 23 23   123  7     golf   81
## 24 24   124  9     rock   55
## 25 25   125  8     golf   62
## 26 26   126  9     golf   49
## 27 27   127 10     rock   56
## 28 28   128  9   tennis   69
## 29 29   129 10     golf   51
## 30 30   130 10   tennis   65
## 31 31   131 11   tennis   68
## 32 32   132 11     golf   53
## 33 33   133 12     golf   80
## 34 34   134 11     rock   48
## 35 35   135 12     rock   49
## 36 36   136 12   tennis   66

d) Hypothesis test and check residuals:

Null Hypothesis:

\(H_{0}:\mu_{1}=\mu_{2}=\mu_{3}\)

Alternative Hypothesis:

\(H_{a}\) - Atleast one of the above means differs

Where

\(\mu_{1}\)- mean of rock

\(\mu_{2}\)- mean of Tennis ball

\(\mu_{3}\)- mean of Golf ball

dat$treatment<-as.factor(dat$treatment)
model<-aov(dat$obs~dat$treatment,data=dat)
summary(model)
##               Df Sum Sq Mean Sq F value   Pr(>F)    
## dat$treatment  2   4578  2289.0   16.37 1.15e-05 ***
## Residuals     33   4615   139.8                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Comment: Since the p-value of our model (1.15e-05) is less than our reference level of significance(0.05). We are rejecting the null hypothesis and stating that at-least one of the means differs.

Checking plot for model adequacy below we have

plot(model)

Comment on residuals: Since the residual plots have the same spread and width, this Implies that we can assume constant variance between the three balls. Also, from the normal probability plot, the data points fairly follows a straight line, indicating the data is normally distributed.

e) Pairwise Comparison:

TukeyHSD(model)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = dat$obs ~ dat$treatment, data = dat)
## 
## $`dat$treatment`
##                diff        lwr       upr     p adj
## rock -golf    -20.0 -31.846217 -8.153783 0.0006403
## tennis -golf    6.5  -5.346217 18.346217 0.3802940
## tennis -rock   26.5  14.653783 38.346217 0.0000128
plot(TukeyHSD(model))

Comment: From the Tukey HSD plot, we can accurately say that the means of the pair tennis and golf ball are the same because zero lies in the 95 percent confidence interval range. Also, The means of the rock and golf and also the means of tennis and rock differs significantly because 0 is not in their respective 95percent confidence interval range.

f) Findings & Recommendation:

Major findings:

  • our anova model had an equal variance based on the residual plots and also our normal probability plots show normal distribution. Hence our model for this experiment was adequate.

  • Based on the Tukey HSD plots the means of the pair tennis and golf ball are the same but the means of the rock and golf ball and the mean of tennis and rock balls differs significantly.

  • So we reject the null hypothesis for means of the rock and golf ball and also the mean of tennis ball and mean of rock based on TukeyHSD plots.

  • Also we are failing to reject means of the pair tennis and golf ball based on Tukey HSD plots.

Recommendations:

Band elasticity changed with each toss. Our result may have been affected. Also, the band we utilized in the experiment was 3 bands connected together. In our testing, we noticed the knots loosening. So, the band tension likely changed with each toss. Changing band tension may be a nuisance variable in the experiment. To avoid this, use a single band instead of many ties.

We could also do a block-experiment. We can split our data into 6 blocks. Thus, we can ensure that the change in tension doesn’t affect our final model.

PART 2

a) Stating the model equation and level of significance to be used

Since this is a mixed effect model where the Pin elevation factor is the fixed effect factor release angle is the random effect factor The model equation can be written as.

\(Y_{ijk}=\mu+\alpha_{i}+\beta_{j}+\alpha\beta_{ij}+\varepsilon_{ijk}\)

where

\(\alpha_{i}\) = Fixed effect of pin elevation

\(\beta_{j}\)= Random effect of release angle

\(\alpha\beta_{ij}\)= Interaction between pin elevation and release angle

\(\epsilon_{ijk}\) ~ \(N(0,\sigma^{2})\)= Random error

Hypothesis:

Main Effect Hypothesis (Pin Elevation)

Null hypothesis

\(\alpha_{i}=0\) for all i

Alternative Hypothesis

\(\alpha_{i}\neq 0\) for some i

Main Effect Hypothesis (Release Angle):

Null Hypothesis

\(\sigma_{\beta}^{2}=0\)

Alternative hypothesis

\(\sigma_{\beta}^{2}\neq 0\) for some j

Interaction Effect (Pin Elevation and Release Angle)

Null hypothesis

\(\sigma^{2}_{\alpha\beta}=0\)

Alternative hypothesis

\(\sigma^{2}_{\alpha\beta}\neq 0\)

Also

I=levels of factor A, J=levels of factor B, K=3

For our design

“1”- pin elevation for Factor A ( bottom most location)

“2”- pin elevation for Factor A ( third position from the bottom most location)

“1”- release angle for factor B - 90 degrees

“2”- release angle for factor B- 110 degrees

“3”- release angle for factor B- 120 degrees

b) Proposing a layout with randomized run order

Randomizing the design layout

library(agricolae)
trt<-c(2,3)
seednumber<-123456
design<-design.ab(trt=trt,r=3,design="crd",seed=seednumber)
design$book
##    plots r A B
## 1    101 1 2 1
## 2    102 1 1 2
## 3    103 1 1 3
## 4    104 2 1 3
## 5    105 1 2 2
## 6    106 3 1 3
## 7    107 2 2 2
## 8    108 1 1 1
## 9    109 2 1 2
## 10   110 3 1 2
## 11   111 1 2 3
## 12   112 3 2 2
## 13   113 2 1 1
## 14   114 2 2 1
## 15   115 2 2 3
## 16   116 3 1 1
## 17   117 3 2 1
## 18   118 3 2 3

c) Collecting the data and recording observations on the layout in Part-a

Sample Data collection

dat<-read.csv("part2aa.csv",TRUE,",")
dat
##    No plots r A B obs
## 1   1   101 1 2 1  26
## 2   2   102 1 1 2  33
## 3   3   103 1 1 3  22
## 4   4   104 2 1 3  42
## 5   5   105 1 2 2  35
## 6   6   106 3 1 3  39
## 7   7   107 2 2 2  35
## 8   8   108 1 1 1  17
## 9   9   109 2 1 2  30
## 10 10   110 3 1 2  45
## 11 11   111 1 2 3  55
## 12 12   112 3 2 2  45
## 13 13   113 2 1 1  20
## 14 14   114 2 2 1  25
## 15 15   115 2 2 3  57
## 16 16   116 3 1 1  19
## 17 17   117 3 2 1  25
## 18 18   118 3 2 3  57
library(GAD)
dat$A<-as.fixed(dat$A)
dat$B<-as.random(dat$B)

d) Testing the Hypothesis and Stating Conclusions

Running ANOVA

Note: A- pin elevation, B- release angle

From the syntax below

model2<-aov(obs~A+B+A*B,data=dat)
GAD::gad(model2)
## Analysis of Variance Table
## 
## Response: obs
##          Df  Sum Sq Mean Sq F value    Pr(>F)    
## A         1  480.50  480.50  3.0000   0.22540    
## B         2 1682.33  841.17 23.2938 7.383e-05 ***
## A:B       2  320.33  160.17  4.4354   0.03613 *  
## Residual 12  433.33   36.11                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

From the model Test:

We can see that the p-value of the interaction effect is 0.03613 which is significant at alpha is 0.05. Because of this we are rejecting the null hypothesis and saying that there is significant interaction between A (pin elevation) and B (release angle).

Because we are rejecting the null hypothesis of the higher order effect in this model (interaction effect), we would stop here. We don’t need to check for the significance of the MAIN EFFECTS in this model.

The Interaction plot can be viewed below:

interaction.plot(dat$A,dat$B,dat$obs)

Comment: Since the line between the interaction of factor A and B are not parallel. It shows an interaction effect is positive.

Viewing the residual plots:

plot(model2)

From the residuals vs Fitted values plot, we see that the spread of the residuals points varies and don’t follow the same spread. This indicates that the constant variance variance assumption is questionable and a further analysis should be done in this case. Because violation of constant variance assumption is a strong one.

We can see a fairly linear trend of the normal qq plot of the residuals indicate that the data have a fair amount of normal distribution.

PART 3

This is an un-replicated \(2^{4}\) design with four factors and each factor has a low level (-1) and a high level(+1). The four factors are:

a) Proposing a data collection layout with randomized run order:

library(agricolae)
trts <- c(2,2,2,2)
design <- design.ab(trt=trts,r=1,design = "crd",seed = 11345)
design$book
##    plots r A B C D
## 1    101 1 1 1 1 1
## 2    102 1 2 2 2 2
## 3    103 1 2 1 1 2
## 4    104 1 2 2 2 1
## 5    105 1 2 2 1 1
## 6    106 1 1 2 2 1
## 7    107 1 1 1 1 2
## 8    108 1 1 1 2 2
## 9    109 1 1 1 2 1
## 10   110 1 2 1 2 1
## 11   111 1 1 2 2 2
## 12   112 1 1 2 1 1
## 13   113 1 1 2 1 2
## 14   114 1 2 2 1 2
## 15   115 1 2 1 1 1
## 16   116 1 2 1 2 2

b) Data collection and recording observations:

dat<-read.csv("part.csv",TRUE,",")
dat
##     X plots r  A  B  C  D values
## 1   1   101 1 -1 -1 -1 -1     35
## 2   2   102 1  1  1  1  1     62
## 3   3   103 1  1 -1 -1  1     31
## 4   4   104 1  1  1  1 -1     60
## 5   5   105 1  1  1 -1 -1     34
## 6   6   106 1 -1  1  1 -1     46
## 7   7   107 1 -1 -1 -1  1     29
## 8   8   108 1 -1 -1  1  1     42
## 9   9   109 1 -1 -1  1 -1     44
## 10 10   110 1  1 -1  1 -1     64
## 11 11   111 1 -1  1  1  1     29
## 12 12   112 1 -1  1 -1 -1     35
## 13 13   113 1 -1  1 -1  1     34
## 14 14   114 1  1  1 -1  1     37
## 15 15   115 1  1 -1 -1 -1     48
## 16 16   116 1  1 -1  1  1     57

c) Stating the model equation and significant factors:

Model Equation

\(Y_{ikjl}=\mu+\alpha_{i}+\beta_{j}+\alpha\beta_{ij}+\gamma_{k}+\alpha\gamma_{ik}+\beta\gamma_{jk}+\alpha\beta\gamma_{ijk}+\delta_{l}+\alpha\delta_{il}+\beta\delta_{jl}+\gamma\delta_{kl}+\alpha\beta\delta_{ijl}+\alpha\gamma\delta_{ikl}+\beta\gamma\delta_{jkl}+\alpha\beta\gamma\delta_{ijkl}+\epsilon_{ijkl}\)

where

\(\alpha_{i}\)= Factor A (Pin elevation)

\(\beta_{j}\)= Factor B ( Bungee Position)

\(\gamma_{k}\)=Factor C ( Release Angle)

\(\delta\)= Factor D ( Ball type)

\(\epsilon_{ijkl}\) = Random error term

library(DoE.base)
model<-lm(values~A*B*C*D,data=dat)
halfnormal(model)

Comments: From the half normal plots, we can see that the significant factors are A,C,A:C. This essentially means that only Pin elevation, Release angle and the interaction between Pin elevation and release angle are significant.

d) Performing an ANOVA analysis to determine final model equation using alpha=0.05

Performing analysis of variance to determine the final model we have

Main effect ( Pin elevation)

Null hypothesis : \(\alpha_{i}=0\) For all i

Alternative hypothesis: \(\alpha\neq 0\) for some i

Main effect (Release angle)

Null hypothesis- \(\gamma_{k}=0\) for all k

Alternative hypothesis- \(\gamma_{k}=0\) for some k

Interaction effect (Pin elevation & Release angle)

Null hypothesis- \(\alpha\gamma_{ik}=0\)

Alternative hypothesis- \(\alpha\gamma_{ik}\neq 0\)

Running the new model:

dat2<-dat[,c("A","C","values")]
dat2
##     A  C values
## 1  -1 -1     35
## 2   1  1     62
## 3   1 -1     31
## 4   1  1     60
## 5   1 -1     34
## 6  -1  1     46
## 7  -1 -1     29
## 8  -1  1     42
## 9  -1  1     44
## 10  1  1     64
## 11 -1  1     29
## 12 -1 -1     35
## 13 -1 -1     34
## 14  1 -1     37
## 15  1 -1     48
## 16  1  1     57
library(GAD)
dat2$A<-as.fixed(dat2$A)
dat2$C<-as.fixed(dat2$C)
model1<-aov(values~A*C,data = dat2)
GAD::gad(model1)
## Analysis of Variance Table
## 
## Response: values
##          Df Sum Sq Mean Sq F value    Pr(>F)    
## A         1 612.56  612.56 18.6923 0.0009901 ***
## C         1 915.06  915.06 27.9231 0.0001933 ***
## A:C       1 264.06  264.06  8.0579 0.0149345 *  
## Residual 12 393.25   32.77                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Comments on Final Model: Based on the ANOVA values We can see that the p-value of factor A, factor C and factor A:C (0.0011619, 0.0002014,0.0166751) are all significant because they are all less than our reference significant level of alpha(0.05), Therefore we are rejecting the null hypothesis claiming that they have significant effect on the model.

The final model can be written has

\(Y_{ikl}=\mu+\alpha_{i}+\gamma_{k}+\alpha\gamma_{ik}+\epsilon_{ikl}\)

\(\alpha_{i}\)= Factor A (Pin elevation)

\(\gamma_{k}\)= Factor C ( Release Angle)

\(\alpha\gamma_{ik}\) = interaction term between factor A and factor C

\(\epsilon_{ikl}\)= Standard error term

Acknowledgement:

Group 4 members would like to show a big appreciation to Dr. Timothy Matis and the Teaching assistant Andrea Arriet for their guidance throughout the completion of this project.

References:

  1. Montgomery Douglas C. Design of experiment

  2. IE 5342 Course materials and videos by Dr Timothy Matis

  3. R studio cheat sheets