We have designed an experiment using a Statapult to find the significant factors that affect the distance in which the ball is thrown. The Statapult has three parameters i.e.
• Pin Elevation
• Bungee Position
• Release Angle
Parameters
There are four discrete settings for both of the Pin Elevation and Bungee Position, numbered from the bottom up. The Release Angle is a continuous variable from 90 to 180 degrees. There are additionally three types of balls that are used.
Tyoes of Balls for the Experiment
Pin Elevation: Kept at Fourth Setting (Highest Setting)
Bungee Position: Kept at Fourth Setting (Highest Setting)
Release Angle: 90 Degrees
To test this hypothesis, we used a completely randomized design with an alpha around 0.05
How many samples should be collected to detect a mean difference with a large effect (i.e. 90% of the standard deviation) and a pattern of maximum variability with a probability of 55%.
We used pwr.anova.test, where k, is the number of populations, and f is the effect to find the necessary number of samples to conduct the experiment.
Since the value of K is 3 (Population Size) which is an odd number and using maximum variability, the formula of effect f would be:
\[ \frac{d*\sqrt{k^2-1}}{2k} \]
Therefore, using pwr.t.test to determine no of samples required:
alpha=0.05
power=0.55
d<-0.9
f1 = d*sqrt(3^2-1)/(2*3)
library(pwr)
pwr.anova.test(k = 3,n=NULL,f = f1, sig.level = alpha,power = power)
##
## Balanced one-way analysis of variance power calculation
##
## k = 3
## n = 11.35348
## f = 0.4242641
## sig.level = 0.05
## power = 0.55
##
## NOTE: n is number in each group
--> The number of samples required per group is 12 , hence we need to collect a total of 36 observations since we have 3 different populations.
Propose a layout using the number of samples from part (a) with randomized run order:
--> A completely randomized design was constructed using design.crd and the resulting order of balls was obtained:library(agricolae)
design <- design.crd(trt = c("Golf", "Tennis", "Stone") ,r = 12,seed = 84544)
design$book
## plots r c("Golf", "Tennis", "Stone")
## 1 101 1 Tennis
## 2 102 1 Golf
## 3 103 2 Tennis
## 4 104 2 Golf
## 5 105 1 Stone
## 6 106 3 Tennis
## 7 107 3 Golf
## 8 108 2 Stone
## 9 109 4 Golf
## 10 110 3 Stone
## 11 111 4 Tennis
## 12 112 4 Stone
## 13 113 5 Stone
## 14 114 6 Stone
## 15 115 5 Golf
## 16 116 6 Golf
## 17 117 7 Golf
## 18 118 7 Stone
## 19 119 8 Stone
## 20 120 5 Tennis
## 21 121 8 Golf
## 22 122 9 Stone
## 23 123 9 Golf
## 24 124 10 Stone
## 25 125 10 Golf
## 26 126 6 Tennis
## 27 127 11 Golf
## 28 128 7 Tennis
## 29 129 8 Tennis
## 30 130 9 Tennis
## 31 131 10 Tennis
## 32 132 11 Stone
## 33 133 11 Tennis
## 34 134 12 Golf
## 35 135 12 Tennis
## 36 136 12 Stone
Collect data and record observations on layout proposed in part b:
--> After we retrieved the order we collected the datalibrary("readxl")
data <- read_excel("D:\\00. Classes\\1. Fall 2022\\2. 5342 - Statistics & QA - [Design of Experiments]\\PROJ\\Part 1.xlsx")
data <- as.data.frame(data)
str(data)
## 'data.frame': 36 obs. of 4 variables:
## $ Plots : num 101 102 103 104 105 106 107 108 109 110 ...
## $ r : num 1 1 2 2 1 3 3 2 4 3 ...
## $ trt : chr "Tennis" "Golf" "Tennis" "Golf" ...
## $ Distance: num 65 69 70 83 45 69 51 49 56 35 ...
data
## Plots r trt Distance
## 1 101 1 Tennis 65
## 2 102 1 Golf 69
## 3 103 2 Tennis 70
## 4 104 2 Golf 83
## 5 105 1 Stone 45
## 6 106 3 Tennis 69
## 7 107 3 Golf 51
## 8 108 2 Stone 49
## 9 109 4 Golf 56
## 10 110 3 Stone 35
## 11 111 4 Tennis 48
## 12 112 4 Stone 47
## 13 113 5 Stone 67
## 14 114 6 Stone 48
## 15 115 5 Golf 85
## 16 116 6 Golf 50
## 17 117 7 Golf 80
## 18 118 7 Stone 61
## 19 119 8 Stone 48
## 20 120 5 Tennis 45
## 21 121 8 Golf 61
## 22 122 9 Stone 40
## 23 123 9 Golf 48
## 24 124 10 Stone 55
## 25 125 10 Golf 51
## 26 126 6 Tennis 42
## 27 127 11 Golf 52
## 28 128 7 Tennis 50
## 29 129 8 Tennis 45
## 30 130 9 Tennis 42
## 31 131 10 Tennis 49
## 32 132 11 Stone 41
## 33 133 11 Tennis 53
## 34 134 12 Golf 78
## 35 135 12 Tennis 48
## 36 136 12 Stone 51
Perform hypothesis test and check residuals. Be sure to comment and take corrective action if necessary:
HypothesisNull: \[H_O: \mu_{1}=\mu_{2}=\mu_{3}=\mu\]
Alternate: \[H_a:Atleast \space one\space \mu_{i}\space differs\]
where,
\(\mu_{1}\)= Mean of Tennis Ball
\(\mu_{2}\)= Mean of Golf Ball
\(\mu_{3}\)= Mean of Stone
data$trt<-as.factor(data$trt)
model1 <- aov(data$Distance~data$trt,data = data)
summary(model1)
## Df Sum Sq Mean Sq F value Pr(>F)
## data$trt 2 1441 720.7 5.556 0.00833 **
## Residuals 33 4281 129.7
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
--> The P value (0.00833) is smaller than the 0.05.Hence we reject Null Hypothesis, claiming that at least one of the mean differs
Anova Model Adequacy
library(ggplot2)
library(ggfortify)
autoplot(model1,col="blue")
Conclusion:
--> The residual plots are of roughly the same width, implying that the variance is nearly constant between the three ball types. Also,from Normal Probability Plot the samples follow a straight line indicating normal distribution. Hence no need of corrective actionIf the null hypothesis is rejected, investigate pairwise comparisons:
TukeyHSD(model1)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = data$Distance ~ data$trt, data = data)
##
## $`data$trt`
## diff lwr upr p adj
## Stone-Golf -14.75 -26.160138 -3.33986215 0.0089206
## Tennis-Golf -11.50 -22.910138 -0.08986215 0.0479000
## Tennis-Stone 3.25 -8.160138 14.66013785 0.7657771
plot(TukeyHSD(model1))
Conclusion:
--> From TukeysHSD plot, we can claim that means for the pair of-Tennis and Stone are similar because zero lies in the 95% confidence interval range. The mean value of Tennis differs from Golf and similarly mean for Stone and Golf pair also differ significantly because 0 is not in the 95% confidence interval range.&
--> The Pair Wise comparison Test also indicates the difference in mean levels of treatment i.e. Pairwise means for Tennis and Golf as well as pair wise mean of Stone and Golf differs significantlya) State model equation with the null and alternative hypotheses to be tested. In addition, state the level of significance that will be used in your analysis.
Since Settings one and three of Pin Elevation should be investigated as a fixed effect, and Pin setting as a random effect, it corresponds to Mixed Effects Model.
Type of Ball (Tennis) and Bungee Position (Second Position) are kept constant for this experiment.
Model Equation\[ Y_ijk = \mu+\alpha_{i}+\beta_{j}+(\alpha\beta)_{ij}+\epsilon_{ijk} \]
where,
\(\alpha_{i}\) = The effect of pin elevation for i=1(Setting 1) and i=2(Setting 2)
\(\beta_{j}\) = The effect of release angle for j=1(90 deg), j=2(110 deg) and j=3(120 deg)
Hypothesis:Interaction Effect:
Null:\[H_o:(\alpha\beta)_{ij}=0\space\space\forall\space\space"i,j"\]
Alternate:
\[H_a:(\alpha\beta)_{i,j}\neq0\space\space\exists\space"i,j"\]
Main Effects:
Null:\[H_o:\alpha_{i}=0\space\space\forall\space"i"\]
\[ H_O:\beta_j=0\space\space\forall\space"j" \]
Alternate:
\[ H_a:\alpha_i\neq0\space\space\exists\space"i" \]
\[ H_a:\beta_j\neq0 \space\space\exists\space"j" \]
--> We will use level of significance as α = 0.05b) Propose a layout with a randomized run order
--> A randomized design was constructed using design.ab and the resulting order of balls was obtained: The number of replicates are K=3trts <- c(2,3)
seedNum <- 1234567
experiment <- design.ab(trt = trts, r=3,design="crd",seed = seedNum)
experiment$book
## plots r A B
## 1 101 1 1 2
## 2 102 1 1 3
## 3 103 1 2 3
## 4 104 1 2 2
## 5 105 2 1 3
## 6 106 2 2 3
## 7 107 1 2 1
## 8 108 2 2 1
## 9 109 1 1 1
## 10 110 2 1 2
## 11 111 3 2 1
## 12 112 2 1 1
## 13 113 3 2 3
## 14 114 2 2 2
## 15 115 3 1 2
## 16 116 3 2 2
## 17 117 3 1 1
## 18 118 3 1 3
Collect data and record observations on layout proposed in part b:
--> After we retrieved the order we collected the datalibrary("readxl")
data <- read_excel("D:\\00. Classes\\1. Fall 2022\\2. 5342 - Statistics & QA - [Design of Experiments]\\PROJ\\Part2.xlsx")
data <- as.data.frame(data)
str(data)
## 'data.frame': 18 obs. of 3 variables:
## $ A : chr "Setting 1" "Setting 1" "Setting 3" "Setting 3" ...
## $ B : chr "110 Degrees" "120 Degrees" "120 Degrees" "110 Degrees" ...
## $ Distance: num 37 31 49 50 28 48 44 48 28 35 ...
data
## A B Distance
## 1 Setting 1 110 Degrees 37
## 2 Setting 1 120 Degrees 31
## 3 Setting 3 120 Degrees 49
## 4 Setting 3 110 Degrees 50
## 5 Setting 1 120 Degrees 28
## 6 Setting 3 120 Degrees 48
## 7 Setting 3 90 Degrees 44
## 8 Setting 3 90 Degrees 48
## 9 Setting 1 90 Degrees 28
## 10 Setting 1 110 Degrees 35
## 11 Setting 3 90 Degrees 48
## 12 Setting 1 90 Degrees 31
## 13 Setting 3 120 Degrees 51
## 14 Setting 3 110 Degrees 54
## 15 Setting 1 110 Degrees 31
## 16 Setting 3 110 Degrees 48
## 17 Setting 1 90 Degrees 28
## 18 Setting 1 120 Degrees 24
a) Test the hypotheses and state conclusions, determining those effects that are significant. Show any plots that might be useful/necessary to show your findings. You may also show residual plots and make appropriate comments, but do not transform the data (i.e. use the raw data regardless of normality and variance constancy)
Converting Factor A into Fixed Effect and Factor B into Random Effect before running the Model:
library(GAD)
data$A<-as.fixed(data$A)
data$B<-as.random(data$B)
str(data)
## 'data.frame': 18 obs. of 3 variables:
## $ A : Factor w/ 2 levels "Setting 1","Setting 3": 1 1 2 2 1 2 2 2 1 1 ...
## $ B : Factor w/ 3 levels "110 Degrees",..: 1 2 2 1 2 2 3 3 3 1 ...
## $ Distance: num 37 31 49 50 28 48 44 48 28 35 ...
Running the Model:
model <- aov(data$Distance~data$A+data$B+data$A*data$B,data = data)
GAD::gad(model)
## Analysis of Variance Table
##
## Response: data$Distance
## Df Sum Sq Mean Sq F value Pr(>F)
## data$A 1 1549.39 1549.39 134.082 0.007376 **
## data$B 2 76.44 38.22 5.504 0.020129 *
## data$A:data$B 2 23.11 11.56 1.664 0.230237
## Residual 12 83.33 6.94
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Conclusion:
--> From above test, the P value for interaction effect is 0.230 and we fail to reject the null hypothesis for interaction effects, stating that there is no interaction between the two factors (pin elevation and release angle).Whereas,
--> From above test, the P value for main effect (Pin Elevation) is 0.0073 and (Pin Setting) is 0.0201 which is less than 0.05 significant value and therefore we reject the null hypothesis for main effect.Checking Residual Plots:
library(ggplot2)
library(ggfortify)
autoplot(model,col="blue")
Comment on Model Adequacy:
--> We can see that the data looks fairly normally distributed. Also, the residual vs fitted plot shows same width or square shape which implies that the variances are constant and so the model is adequate.The low and high level of the factors being as follows:
trts <- c(2,2,2,2)
design <- design.ab(trt = trts ,r =1 , design = "crd" , seed = 1234)
design$book
## plots r A B C D
## 1 101 1 1 2 2 1
## 2 102 1 2 2 2 1
## 3 103 1 2 2 2 2
## 4 104 1 1 1 2 1
## 5 105 1 2 1 1 2
## 6 106 1 2 1 2 1
## 7 107 1 1 2 2 2
## 8 108 1 1 1 2 2
## 9 109 1 2 2 1 2
## 10 110 1 1 2 1 2
## 11 111 1 1 2 1 1
## 12 112 1 2 2 1 1
## 13 113 1 2 1 2 2
## 14 114 1 1 1 1 1
## 15 115 1 2 1 1 1
## 16 116 1 1 1 1 2
b) Collect data and record observations
--> After we retrieved the order we collected the data as follows:library("readxl")
data <- read_excel("D:\\00. Classes\\1. Fall 2022\\2. 5342 - Statistics & QA - [Design of Experiments]\\PROJ\\Part3.xlsx")
data <- as.data.frame(data)
data
## plots r A B C D Distance
## 1 101 1 -1 1 1 1 17
## 2 102 1 1 1 1 1 31
## 3 103 1 1 1 1 -1 33
## 4 104 1 -1 -1 1 1 10
## 5 105 1 1 -1 -1 -1 35
## 6 106 1 1 -1 1 1 34
## 7 107 1 -1 1 1 -1 23
## 8 108 1 -1 -1 1 -1 15
## 9 109 1 1 1 -1 -1 42
## 10 110 1 -1 1 -1 -1 28
## 11 111 1 -1 1 -1 1 24
## 12 112 1 1 1 -1 1 37
## 13 113 1 1 -1 1 -1 39
## 14 114 1 -1 -1 -1 1 16
## 15 115 1 1 -1 -1 1 33
## 16 116 1 -1 -1 -1 -1 18
\[ Y_{ijkl}=\mu+\alpha_i+\beta_j+(\alpha\beta)_ij+\gamma_k+(\alpha\gamma)_ik+(\beta\gamma)_jk+(\alpha\beta\gamma)_ijk+\delta_l+(\alpha\delta)_il+(\beta\delta)_jl+(\gamma\delta)_kl+(\alpha\beta\delta)_ijl+(\alpha\gamma\delta)_ikl+(\beta\gamma\delta)_jkl+(\alpha\beta\gamma\delta)_ijkl+\epsilon_ijkl \]
where,
\(\alpha_i\)= Factor A (Pin Elevation)
\(\beta_j\)= Factor B (Bungee Position)
\(\gamma_k\)= Factor C (Release Angle)
\(\delta_l\)= Factor D (Ball Type)
Significant Factors/Interactionslibrary(DoE.base)
model<-lm(Distance~A*B*C*D,data = data)
halfnormal(model)
coef(model)
## (Intercept) A B C D A:B
## 27.1875 8.3125 2.1875 -1.9375 -1.9375 -1.9375
## A:C B:C A:D B:D C:D A:B:C
## 0.6875 -1.4375 0.1875 -0.1875 -0.3125 -1.0625
## A:B:D A:C:D B:C:D A:B:C:D
## 0.1875 0.3125 0.4375 0.3125
--> From the half normal plots, we can see that the significant factors are A,B,A:B,D & C. This means that all main effects i.e. Pin elevation, Release angle, Bungee Pos and Ball Type are significant and One interaction term between Pin elevation and Bungee Position are significant.
ANOVA is run by keeping the significant terms only and clubbing other insignificant factors/interactions into error term as analyzed from Part 3 above.
Hypothesis:Main Effects:
Null: \(H_O\)
\[ \alpha_i=0\space\forall\space"i" \]
\[ \beta_j=0\space\forall"j" \]
\[ \gamma_k=0\space\forall"k" \]
\[ \delta_l=0\space\forall\space"l" \]
Alternate: \(H_a\)
\[ \alpha_i\neq0\space\exists\space"i" \]
\[ \beta_j\neq0\space\exists"j" \]
\[ \gamma_k\neq0\space\exists"k" \]
\[ \delta_l\neq0\space\exists\space"l" \]
Interaction Effect:
Null: \(H_o\)
\[ (\alpha\beta)_{ij}=0\space\space\forall\space"i,j" \]
Alternate: \(H_a\)
\[ (\alpha\beta)_{ij}\neq0\space\space\exists\space"i,j" \]
Running the Model:
A<-c(-1,1,1,-1,1,1,-1,-1,1,-1,-1,1,1,-1,1,-1)
B<-c(1,1,1,-1,-1,-1,1,-1,1,1,1,1,-1,-1,-1,-1)
C<-c(1,1,1,1,-1,1,1,1,-1,-1,-1,-1,1,-1,-1,-1)
D<-c(1,1,-1,1,-1,1,-1,-1,-1,-1,1,1,-1,1,1,-1)
Distance<-c(17,31,33,10,35,34,23,15,42,28,24,37,39,16,33,18)
A<-as.factor(A)
B<-as.factor(B)
C<-as.factor(C)
D<-as.factor(D)
model1<-aov(Distance~A+B+C+D+A*B,data = data)
summary(model1)
## Df Sum Sq Mean Sq F value Pr(>F)
## A 1 1105.6 1105.6 162.284 1.66e-07 ***
## B 1 76.6 76.6 11.239 0.00734 **
## C 1 60.1 60.1 8.817 0.01406 *
## D 1 60.1 60.1 8.817 0.01406 *
## A:B 1 60.1 60.1 8.817 0.01406 *
## Residuals 10 68.1 6.8
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Conclusion:
--> After performing ANOVA analysis, we observe that the p-value for factor A is 1.66e-7, p-value for Factor B is 0.00734, p-value for interaction AB, Factor C & D is 0.01406. These results shows that these factors & interaction are significant since these values are less than the α value of 0.05 t/f we reject the null hypothesis and claim that they have a significant effect on the model.Also,
Final Model Equation:\[ Y_ijkl= \mu+\alpha_i+\beta_j+(\alpha\beta)_{ij}+\gamma_k+\delta_l+\epsilon_ijkl \]
\(\alpha_i\)= Factor A (Pin Elevation)
\(\beta_j\)= Factor B (Bungee Position)
\(\gamma_k\)= Factor C (Release Angle)
\(\delta_l\)= Factor D (Ball Type)
\((\alpha\beta)_ij\)= Interaction AB (Pin Elevation & Bungee Pos)
\(\epsilon_ijkl\)= Standard Error Term
getwd()
#PARTA:
alpha=0.05
power1=0.55
d<-0.9
f1 = d*sqrt(3^2-1)/(2*3)
library(pwr)
?pwr.anova.test
pwr.anova.test(k = 3,n=NULL,f = f1, sig.level = alpha,power = power1)
#here n=12
#PARTB:
#Now determining the order in which data will be collected for CRD:
library(agricolae)
?design.crd
design <- design.crd(trt = c("Golf", "Tennis", "Stone") ,r = 12,seed = 84544)
design$book
#Part1C:
dat<-design$book
str(dat)
library("readxl")
data <- read_excel("D:\\00. Classes\\1. Fall 2022\\2. 5342 - Statistics & QA - [Design of Experiments]\\PROJ\\Part 1.xlsx")
data <- as.data.frame(data)
str(data)
data
#Running the Model:
data$trt<-as.factor(data$trt)
model1 <- aov(data$Distance~data$trt,data = data)
summary(model1)
library(ggplot2)
library(ggfortify)
autoplot(model1,col="blue")
#Tukey Test:
TukeyHSD(model1)
plot(TukeyHSD(model1))
#PART 2B:
library(agricolae)
trts <- c(2,3)
seedNum <- 1234567
experiment <- design.ab(trt = trts, r=3,design="crd",seed = seedNum)
experiment$book
library(tinytex)
#Collection of Data/Reading Data:
library("readxl")
data <- read_excel("D:\\00. Classes\\1. Fall 2022\\2. 5342 - Statistics & QA - [Design of Experiments]\\PROJ\\Part2.xlsx")
data <- as.data.frame(data)
str(data)
data
#Running the Model:
library(GAD)
data$A<-as.fixed(data$A)
data$B<-as.random(data$B)
str(data)
model <- aov(data$Distance~data$A+data$B+data$A*data$B,data = data)
GAD::gad(model)
#Plotting Residuals
library(ggplot2)
library(ggfortify)
autoplot(model,col="blue")
#PART 3:
trts <- c(2,2,2,2)
design <- design.ab(trt = trts ,r =1 , design = "crd" , seed = 1234)
design$book
#Reading the Data/Data Collection:
library("readxl")
data <- read_excel("D:\\00. Classes\\1. Fall 2022\\2. 5342 - Statistics & QA - [Design of Experiments]\\PROJ\\Part3.xlsx")
data <- as.data.frame(data)
data
#Finding Significant Interactions/Factors:
library(DoE.base)
model<-lm(Distance~A*B*C*D,data = data)
halfnormal(model)
coef(model)
#Performing Model:
A<-c(-1,1,1,-1,1,1,-1,-1,1,-1,-1,1,1,-1,1,-1)
B<-c(1,1,1,-1,-1,-1,1,-1,1,1,1,1,-1,-1,-1,-1)
C<-c(1,1,1,1,-1,1,1,1,-1,-1,-1,-1,1,-1,-1,-1)
D<-c(1,1,-1,1,-1,1,-1,-1,-1,-1,1,1,-1,1,1,-1)
Distance<-c(17,31,33,10,35,34,23,15,42,28,24,37,39,16,33,18)
A<-as.factor(A)
B<-as.factor(B)
C<-as.factor(C)
D<-as.factor(D)
model1<-aov(Distance~A+B+C+D+A*B,data = data)
summary(model1)
pwr
agricolae
ggfortify
ggplot
GAD
DoE.base