This section of the project is intended to employ design of experiment tools and approaches to make the best selections possible based on statistical data. Section 1 determines the number of samples required to detect a mean difference with a medium effect and a probability of 0.55. In section 2, we collected data based on the randomized order provided by Rstudio, a computer language used to run a completely randomized design (CRD function) with an alpha of 0.05. We then presented a layout for our data, which is tabulated, neat, ranked, and well arranged. In Section 3, we ran a hypothesis test and examined our residuals vs fitted values and other residuals plots to ensure that nothing out of the ordinary was there.
Both the Pin Elevation and the Bungee Position should be set to the fourth, or highest, setting. The Release Angle should be 90 degrees, and the arm should be fully extended before releasing. We intend to test this hypothesis using a completely randomized design with an alpha of roughly 0.05.
ANALYSIS AND DISCUSSION
Section A: Hypothesis and sample size
Determine how many samples should be collected to detect a mean difference with a medium effect (i.e.90% of the standard deviation) and a pattern of maximum variability with a probability of 55%.
Hypothesis:
Null Hypothesis:\(H_0=\mu_y=\mu_r=\mu_b\)
Alternative Hypothesis:\(H_a=\)atleast one \(\mu_i\)differs
Note:
\(\mu_y=\) mean distance recorded using yellow ball
\(\mu_r=\)mean distance recorded using red ball
\(\mu_b=\)mean distance recorded using black ball
{library(pwr)}
pwr.anova.test(k=3,n=NULL,f=0.424,sig.level = 0.05,power=0.55)
##
## Balanced one-way analysis of variance power calculation
##
## k = 3
## n = 11.36625
## f = 0.424
## sig.level = 0.05
## power = 0.55
##
## NOTE: n is number in each group
By the above analysis we can interpret that the n=11.36625.
Therefore we can say that we need 12 observations for each ball sample.
Section B: Randomized Run and Layout
library(agricolae)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
trt<-c("yellow","red","black")
crd<-design.crd(trt,r=12,seed=1234)
crd$book
## plots r trt
## 1 101 1 red
## 2 102 1 yellow
## 3 103 2 yellow
## 4 104 1 black
## 5 105 2 red
## 6 106 3 yellow
## 7 107 3 red
## 8 108 2 black
## 9 109 4 red
## 10 110 4 yellow
## 11 111 5 red
## 12 112 6 red
## 13 113 3 black
## 14 114 5 yellow
## 15 115 7 red
## 16 116 6 yellow
## 17 117 4 black
## 18 118 5 black
## 19 119 8 red
## 20 120 7 yellow
## 21 121 8 yellow
## 22 122 6 black
## 23 123 9 yellow
## 24 124 7 black
## 25 125 10 yellow
## 26 126 9 red
## 27 127 10 red
## 28 128 8 black
## 29 129 11 yellow
## 30 130 12 yellow
## 31 131 11 red
## 32 132 9 black
## 33 133 10 black
## 34 134 11 black
## 35 135 12 red
## 36 136 12 black
Section C: Collecting Data
BalCol<-c(rep(c("red"),12),rep(c("yellow"),12),rep(c("black"),12))
red<-c(63,67,70,63,77,75,63,71,65,72,78,71)
yellow<-c(52,93,78,87,70,82,76,84,81,83,80,87)
black<-c(121,61,55,62,53,101,79,74,78,65,61,70)
Obs1<-c(red,yellow,black)
boxplot(Obs1~BalCol)
mean1<-c(rep(mean(red),12),rep(mean(yellow),12),rep(mean(black),12))
R1<-Obs1-mean1
qqnorm(R1)
qqline(R1)
plot(mean1,R1)
The data is largely normal, but the normalcy plot reveals several outliers. The data’s variance also doesn’t appear to be distributed equally.
library(MASS)
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
boxcox(Obs1~BalCol)
Obs1<-Obs1^(-1)
boxcox(Obs1~BalCol)
boxplot(Obs1~BalCol)
red<-red^(-1)
yellow<-yellow^(-1)
black<-black^(-1)
mean1<-c(rep(mean(red),12),rep(mean(yellow),12),rep(mean(black),12))
R1<-Obs1-mean1
qqnorm(R1)
qqline(R1)
plot(mean1,R1)
library(GAD)
## Warning: package 'GAD' was built under R version 4.2.2
## Loading required package: matrixStats
##
## Attaching package: 'matrixStats'
## The following object is masked from 'package:dplyr':
##
## count
## Loading required package: R.methodsS3
## R.methodsS3 v1.8.2 (2022-06-13 22:00:14 UTC) successfully loaded. See ?R.methodsS3 for help.
BalCol<-as.fixed(BalCol)
model1<-lm(Obs1~BalCol)
gad(model1)
## Analysis of Variance Table
##
## Response: Obs1
## Df Sum Sq Mean Sq F value Pr(>F)
## BalCol 2 1.9806e-05 9.9030e-06 1.8411 0.1746
## Residual 33 1.7750e-04 5.3787e-06
The P-value we got is equal to 0.1746. This value is greater than the alpha value which is 0.05.
Due to this we fail to reject the null hypothesis
We do not need to investigate pairwise comparison since we failed to reject the null hypothesis.
In addition to the statapult experimental research, we are conducting a designed experiment to assess the influence of Pin Elevation and Release Angle on the distance thrown when the Bungee Position is fixed at the second position.
As a fixed effect, Pin Elevation settings one and three will be studied, as will Release Angle values corresponding to 90, 120, and 170 degrees as a random effect. The design is three times reproduced.
We identify Two factors: Factor A: Release Angle with 3 levels ( 90, 120, 170) therefore the degrees of freedom for factor A , will be i-1=3-1=2.
Factor B: Pin elevation with 2 levels ( 1 and 3) therefore the degrees of freedom for factor B, will be j-1=2-1=1.
Section A: Model equation and Hypothesis
State model equation with the null and alternative hypotheses to be tested. In addition, state the level of significance that will be used in your analysis.
Model Equation:
\[ Y_{ijkl}=\mu+\alpha_i+\beta_j+\alpha\beta_{ij}+\epsilon_{ijkl} \]
Hypothesis:
Null Hypothesis: \(\alpha\beta_{ij}=0\) For all {i,j}
Alternative Hypothesis: \(\alpha\beta_{ij}\not=0\) for some{i,j}
Null Hypothesis: \(\alpha_i=0\)
Alternative Hypothesis: \(\alpha_i\not=0\)
Null Hypothesis: \(\beta_j=0\)
Alternative Hypothesis: \(\beta_j\not=0\)
Section B: Randomized Design Layout
library(agricolae)
trt1<-c(2,3)
Design<-design.ab(trt = trt1,r=3,design = "crd",seed = 10000)
Design$book
## plots r A B
## 1 101 1 2 2
## 2 102 1 1 2
## 3 103 1 1 3
## 4 104 2 1 2
## 5 105 1 2 3
## 6 106 3 1 2
## 7 107 1 2 1
## 8 108 2 2 3
## 9 109 2 2 1
## 10 110 1 1 1
## 11 111 2 1 3
## 12 112 2 2 2
## 13 113 2 1 1
## 14 114 3 2 2
## 15 115 3 1 1
## 16 116 3 2 3
## 17 117 3 2 1
## 18 118 3 1 3
A=1 pin elevation 1
A=2 pin elevation 2
B=1 90 degree
B=2 120 degree
B=3 140 degree
Section C: Data Collection
P<-rep(c(1,3),9)
Ang<-c(rep(90,6),rep(120,6),rep(140,6))
Obs2<-c(72,46,45,43,63,51,64,71,60,49,45,77,43,61,36,71,53,47)
data2<-data.frame(P,Ang,Obs2)
Section D: Hypothesis test
P<-as.fixed(P)
Ang<-as.random(Ang)
model2<-aov(Obs2~P+Ang+P*Ang)
gad(model2)
## Analysis of Variance Table
##
## Response: Obs2
## Df Sum Sq Mean Sq F value Pr(>F)
## P 1 68.06 68.06 0.1952 0.70183
## Ang 2 290.11 145.06 1.1740 0.34224
## P:Ang 2 697.44 348.72 2.8224 0.09894 .
## Residual 12 1482.67 123.56
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(model2)
We got the P value for the interaction as 0.09894. This value is greater than the alpha value which is 0.05
The pin elevation and the release angle is not significant as the p value we got is greater than the alpha value=0.05.
2k Factorial Design
We are Performing a designed experiment to determine the effect of the available factors of Pin Elevation (A), Bungee Position (B), Release Angle(C), and Ball Type (D) on distance in which a ball is thrown. we will Design this experiment as a single replicate of a 24 factorial design with the low and high level of the factors being as follows:
| Factor | Low Level (-1) | High Level (+1) |
|---|---|---|
| A :Pin Elevation | Position 1 | Position 3 |
| B :Bungee Position | Position 2 | Position 3 |
| C :Release Angle | 100 degrees | 120 degrees |
| D :Ball Type | Yellow | Red |
Section A: Randomized Design and layout
trts <- c(2,2,2,2)
design <- design.ab(trt=trts,r=1,design="crd",seed=1000)
design$book
## plots r A B C D
## 1 101 1 2 1 2 1
## 2 102 1 1 1 1 2
## 3 103 1 2 1 1 1
## 4 104 1 2 2 2 1
## 5 105 1 1 1 2 1
## 6 106 1 1 2 2 1
## 7 107 1 2 2 1 2
## 8 108 1 1 2 1 2
## 9 109 1 2 2 1 1
## 10 110 1 2 2 2 2
## 11 111 1 2 1 1 2
## 12 112 1 2 1 2 2
## 13 113 1 1 2 1 1
## 14 114 1 1 1 1 1
## 15 115 1 1 1 2 2
## 16 116 1 1 2 2 2
In this 1 means (-1) and 2 means (+1)
A =(+1) = Pin Elevation 3
A = (-1) = Pin Elevation 1
B = (+1)= Bungee Position 3
B = (-1) = Bungee Position 2
C = (+1)= 110 degree
C = (-1) = 90 degree
D = (+1) = Red
D = (-1) = Yellow
Section B: Data Collection
obs3<-c(44, 18, 59, 46, 17, 24, 61, 24, 57, 63, 61, 53, 22, 17, 19, 26)
Section C: Model Equation
Model Equation:
\[Y_{ijkl}=\mu + \alpha_i + \beta_j + \gamma_k + \lambda_l + \alpha\beta_{ij} + \alpha\gamma_{ik} + \alpha\lambda_{il} + \beta\gamma_{jk} + \beta\lambda_{jl} + \gamma\lambda_{kl} + \alpha\beta\gamma_{ijk} + \alpha\beta\lambda_{ijl} + \beta\gamma\lambda_{jkl} + \alpha\beta\gamma\lambda_{ijkl} + \epsilon_{ijklm}\]
library(DoE.base)
## Warning: package 'DoE.base' was built under R version 4.2.2
## Loading required package: grid
## Loading required package: conf.design
## Registered S3 method overwritten by 'DoE.base':
## method from
## factorize.factor conf.design
##
## Attaching package: 'DoE.base'
## The following objects are masked from 'package:stats':
##
## aov, lm
## The following object is masked from 'package:graphics':
##
## plot.design
## The following object is masked from 'package:base':
##
## lengths
A<-c(1,-1,1,1,-1,-1,1,-1,1,1,1,1,-1,-1,-1,-1)
B<-c(-1,-1,-1,1,-1,1,1,1,1,1,-1,-1,1,-1,-1,1)
C<-c(1,-1,-1,1,1,1,-1,-1,-1,1,-1,1,-1,-1,1,1)
D<-c(-1,1,-1,-1,-1,-1,1,1,-1,1,1,1,-1,-1,1,1)
obs3<-c(34,18,39,26,11,24,21,24,37,33,21,33,24,17,19,26)
dat<-data.frame(A,B,C,D,obs3)
dat
## A B C D obs3
## 1 1 -1 1 -1 34
## 2 -1 -1 -1 1 18
## 3 1 -1 -1 -1 39
## 4 1 1 1 -1 26
## 5 -1 -1 1 -1 11
## 6 -1 1 1 -1 24
## 7 1 1 -1 1 21
## 8 -1 1 -1 1 24
## 9 1 1 -1 -1 37
## 10 1 1 1 1 33
## 11 1 -1 -1 1 21
## 12 1 -1 1 1 33
## 13 -1 1 -1 -1 24
## 14 -1 -1 -1 -1 17
## 15 -1 -1 1 1 19
## 16 -1 1 1 1 26
mod3<-lm(obs3~A*B*C*D,data = dat)
coef(mod3)
## (Intercept) A B C D A:B
## 25.4375 5.0625 1.4375 0.3125 -1.0625 -2.6875
## A:C B:C A:D B:D C:D A:B:C
## 0.6875 0.0625 -2.4375 0.1875 3.0625 -0.8125
## A:B:D A:C:D B:C:D A:B:C:D
## 1.0625 1.9375 0.0625 0.6875
halfnormal(mod3)
##
## Significant effects (alpha=0.05, Lenth method):
## [1] A C:D
From the halfnormal plots, we could see that the significant factors are A,C:D. This means that now the model is just run with only these effects and the other effects are clubbed into the error term.
dat$A<-as.fixed(dat$A)
dat$C<-as.fixed(dat$C)
dat$D<-as.fixed(dat$D)
mod3b<-aov(obs3~C*D+A,data=dat)
GAD::gad(mod3b)
## Analysis of Variance Table
##
## Response: obs3
## Df Sum Sq Mean Sq F value Pr(>F)
## C 1 1.56 1.56 0.0494 0.828247
## D 1 18.06 18.06 0.5706 0.465875
## A 1 410.06 410.06 12.9548 0.004175 **
## C:D 1 150.06 150.06 4.7408 0.052106 .
## Residual 11 348.19 31.65
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
From the ANOVA table we can see that the p value for Factor A = 0.004175 which is less than the alpha value=0.05. This shows that the factor is significant.