We designed an experiment using a Statapult to identify the significant factors that affect the distance the ball is thrown. The Statapult has three parameters:
• Pin Elevation
• Bungee Position
• Release Angle
Parameters
Pin Elevation and Bungee Position each have four discrete settings, numbered from the bottom up. The Release Angle is a continuous variable ranging from 90 to 180 degrees. In addition, three types of balls are used.
Types of Balls for the Experiment
Pin Elevation: Kept at Fourth Setting (Highest Setting)
Bungee Position: Kept at Fourth Setting (Highest Setting)
Release Angle: 90 Degrees
To test this hypothesis, we used a completely randomized design with a significance level (alpha) of 0.05.
How many samples should be collected to detect a mean difference with a large effect (i.e. 90% of the standard deviation) under a pattern of maximum variability, with a probability (power) of 55%?
Since the number of populations k is 3 (an odd number) and we assume a pattern of maximum variability, the effect size f is given by:
\[ f = \frac{d\sqrt{k^2-1}}{2k} \]
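As a quick arithmetic check, substituting the values used in this study (d = 0.9 and k = 3) into the formula gives:
\[ f = \frac{0.9\sqrt{3^2-1}}{2 \cdot 3} = \frac{0.9\sqrt{8}}{6} \approx 0.424 \]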
Therefore, using pwr.anova.test to determine the number of samples required:
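library(pwr)

alpha <- 0.05   # significance level
power <- 0.55   # target power
d <- 0.9        # mean difference in standard-deviation units
k <- 3          # number of ball types (groups)
f1 <- d * sqrt(k^2 - 1) / (2 * k)   # effect size f for maximum variability, odd k
pwr.anova.test(k = k, n = NULL, f = f1, sig.level = alpha, power = power)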
##
## Balanced one-way analysis of variance power calculation
##
## k = 3
## n = 11.35348
## f = 0.4242641
## sig.level = 0.05
## power = 0.55
##
## NOTE: n is number in each group
--> The required number of samples per group is 12 (n = 11.35 rounded up to the next whole number). Since we have 3 different populations, we need to collect a total of 36 observations.
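The rounding step can be cross-checked without the pwr package. The sketch below assumes the usual noncentral-F formulation of power for a balanced one-way ANOVA, with noncentrality parameter k·n·f². The helper name `anova_power` is hypothetical and is not part of any package. It searches for the smallest whole number of runs per group that reaches the target power.

```r
# Sketch: power of a balanced one-way ANOVA from the noncentral F distribution.
# anova_power() is a hypothetical helper written for this illustration.
anova_power <- function(n, k, f, alpha) {
  df1 <- k - 1              # numerator degrees of freedom
  df2 <- k * (n - 1)        # denominator degrees of freedom
  ncp <- k * n * f^2        # noncentrality parameter (assumed formulation)
  f_crit <- qf(1 - alpha, df1, df2)
  pf(f_crit, df1, df2, ncp = ncp, lower.tail = FALSE)
}

k <- 3
f <- 0.9 * sqrt(k^2 - 1) / (2 * k)

# Smallest whole number of runs per group that reaches the target power
n_per_group <- 2
while (anova_power(n_per_group, k, f, alpha = 0.05) < 0.55) {
  n_per_group <- n_per_group + 1
}
total_runs <- k * n_per_group
c(n_per_group = n_per_group, total_runs = total_runs)
```

Searching over whole numbers avoids a separate rounding step, since a fractional number of runs cannot be collected.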
Propose a layout using the number of samples from part (a) with randomized run order:
library(agricolae)
design <- design.crd(trt = c("Golf", "Tennis", "Stone") ,r = 12,seed = 84544)
design$book
## plots r c("Golf", "Tennis", "Stone")
## 1 101 1 Tennis
## 2 102 1 Golf
## 3 103 2 Tennis
## 4 104 2 Golf
## 5 105 1 Stone
## 6 106 3 Tennis
## 7 107 3 Golf
## 8 108 2 Stone
## 9 109 4 Golf
## 10 110 3 Stone
## 11 111 4 Tennis
## 12 112 4 Stone
## 13 113 5 Stone
## 14 114 6 Stone
## 15 115 5 Golf
## 16 116 6 Golf
## 17 117 7 Golf
## 18 118 7 Stone
## 19 119 8 Stone
## 20 120 5 Tennis
## 21 121 8 Golf
## 22 122 9 Stone
## 23 123 9 Golf
## 24 124 10 Stone
## 25 125 10 Golf
## 26 126 6 Tennis
## 27 127 11 Golf
## 28 128 7 Tennis
## 29 129 8 Tennis
## 30 130 9 Tennis
## 31 131 10 Tennis
## 32 132 11 Stone
## 33 133 11 Tennis
## 34 134 12 Golf
## 35 135 12 Tennis
## 36 136 12 Stone
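Conceptually, a completely randomized design is a random permutation of the treatment labels across the run order. The base R sketch below illustrates that idea. The seed and object names are assumptions chosen for illustration, and it will not reproduce the exact ordering generated by design.crd above.

```r
# Sketch: a completely randomized design as a random permutation of labels.
treatments <- c("Golf", "Tennis", "Stone")
reps <- 12

set.seed(2022)  # hypothetical seed, used only to make this sketch repeatable
run_order <- sample(rep(treatments, each = reps))

layout <- data.frame(run = seq_along(run_order), trt = run_order)
table(layout$trt)   # each ball type appears `reps` times
head(layout)        # first few runs of the randomized order
```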
Collect data and record observations on the layout proposed in part (b):
library("readxl")
data <- read_excel("D:\\00. Classes\\1. Fall 2022\\2. 5342 - Statistics & QA - [Design of Experiments]\\PROJ\\Part 1.xlsx")
data <- as.data.frame(data)
str(data)
## 'data.frame': 36 obs. of 4 variables:
## $ Plots : num 101 102 103 104 105 106 107 108 109 110 ...
## $ r : num 1 1 2 2 1 3 3 2 4 3 ...
## $ trt : chr "Tennis" "Golf" "Tennis" "Golf" ...
## $ Distance: num 65 69 70 83 45 69 51 49 56 35 ...
data
## Plots r trt Distance
## 1 101 1 Tennis 65
## 2 102 1 Golf 69
## 3 103 2 Tennis 70
## 4 104 2 Golf 83
## 5 105 1 Stone 45
## 6 106 3 Tennis 69
## 7 107 3 Golf 51
## 8 108 2 Stone 49
## 9 109 4 Golf 56
## 10 110 3 Stone 35
## 11 111 4 Tennis 48
## 12 112 4 Stone 47
## 13 113 5 Stone 67
## 14 114 6 Stone 48
## 15 115 5 Golf 85
## 16 116 6 Golf 50
## 17 117 7 Golf 80
## 18 118 7 Stone 61
## 19 119 8 Stone 48
## 20 120 5 Tennis 45
## 21 121 8 Golf 61
## 22 122 9 Stone 40
## 23 123 9 Golf 48
## 24 124 10 Stone 55
## 25 125 10 Golf 51
## 26 126 6 Tennis 42
## 27 127 11 Golf 52
## 28 128 7 Tennis 50
## 29 129 8 Tennis 45
## 30 130 9 Tennis 42
## 31 131 10 Tennis 49
## 32 132 11 Stone 41
## 33 133 11 Tennis 53
## 34 134 12 Golf 78
## 35 135 12 Tennis 48
## 36 136 12 Stone 51
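A per-ball summary can be a useful first look before the formal test. The sketch below regroups the 36 recorded distances by ball type (values copied from the table above; the object names are illustrative) and computes the group means and standard deviations in base R.

```r
# Recorded distances regrouped by ball type (copied from the data table)
distance_by_ball <- list(
  Golf   = c(69, 83, 51, 56, 85, 50, 80, 61, 48, 51, 52, 78),
  Tennis = c(65, 70, 69, 48, 45, 42, 50, 45, 42, 49, 53, 48),
  Stone  = c(45, 49, 35, 47, 67, 48, 61, 48, 40, 55, 41, 51)
)

group_means <- sapply(distance_by_ball, mean)  # mean distance per ball type
group_sds   <- sapply(distance_by_ball, sd)    # spread per ball type

round(group_means, 2)
round(group_sds, 2)
```

With these data the group means come out to about 63.67 for Golf, 52.17 for Tennis and 48.92 for Stone.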
Perform hypothesis test and check residuals. Be sure to comment and take corrective action if necessary:
Hypothesis
Null: \[H_0: \mu_{1}=\mu_{2}=\mu_{3}=\mu\]
Alternate: \[H_a: \text{at least one } \mu_{i} \text{ differs}\]
where,
\(\mu_{1}\)= Mean of Tennis Ball
\(\mu_{2}\)= Mean of Golf Ball
\(\mu_{3}\)= Mean of Stone
data$trt<-as.factor(data$trt)
model1 <- aov(data$Distance~data$trt,data = data)
summary(model1)
## Df Sum Sq Mean Sq F value Pr(>F)
## data$trt 2 1441 720.7 5.556 0.00833 **
## Residuals 33 4281 129.7
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
--> The p-value (0.00833) is smaller than 0.05. Hence we reject the null hypothesis and conclude that at least one of the means differs.
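The F statistic and p-value in the ANOVA table can be reproduced from the mean squares alone. The sketch below uses the rounded values printed in the table, so the results agree with the table only up to rounding. The variable names are illustrative.

```r
# Values read from the ANOVA table above (rounded as printed)
ms_trt <- 720.7   # treatment mean square
ms_res <- 129.7   # residual mean square
df_trt <- 2       # treatment degrees of freedom (k - 1)
df_res <- 33      # residual degrees of freedom (N - k)

f_stat  <- ms_trt / ms_res                               # observed F ratio
p_value <- pf(f_stat, df_trt, df_res, lower.tail = FALSE) # upper-tail probability
f_crit  <- qf(0.95, df_trt, df_res)                       # critical value at alpha = 0.05

c(f_stat = f_stat, f_crit = f_crit, p_value = p_value)
```

Rejecting the null hypothesis when the p-value is below 0.05 is equivalent to the observed F ratio exceeding the critical value.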
ANOVA Model Adequacy
plot(model1,col="blue")
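The diagnostic plots can optionally be supplemented with formal tests. The sketch below is one possible approach and not part of the original analysis: it rebuilds the data grouped by ball type (values copied from the data table; object names are illustrative), refits the one-way model, and applies the Shapiro-Wilk test to the residuals and Bartlett's test to the group variances. Small p-values would point toward non-normal residuals or unequal variances.

```r
# Recorded distances grouped by ball type (copied from the data table)
ball_data <- data.frame(
  trt = factor(rep(c("Golf", "Tennis", "Stone"), each = 12)),
  Distance = c(
    69, 83, 51, 56, 85, 50, 80, 61, 48, 51, 52, 78,   # Golf
    65, 70, 69, 48, 45, 42, 50, 45, 42, 49, 53, 48,   # Tennis
    45, 49, 35, 47, 67, 48, 61, 48, 40, 55, 41, 51    # Stone
  )
)

fit <- aov(Distance ~ trt, data = ball_data)

# Normality of residuals (Shapiro-Wilk) and equality of variances (Bartlett)
normality_check <- shapiro.test(residuals(fit))
variance_check  <- bartlett.test(Distance ~ trt, data = ball_data)

normality_check$p.value
variance_check$p.value
```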
Conclusion:
--> In the residual plots, the spread of the residuals is roughly the same width for each group, implying that the variance is nearly constant across the three ball types. Also, in the normal probability plot the residuals follow a straight line, indicating a normal distribution. Hence there is no need for corrective action.
If the null hypothesis is rejected, investigate pairwise comparisons:
TukeyHSD(model1)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = data$Distance ~ data$trt, data = data)
##
## $`data$trt`
## diff lwr upr p adj
## Stone-Golf -14.75 -26.160138 -3.33986215 0.0089206
## Tennis-Golf -11.50 -22.910138 -0.08986215 0.0479000
## Tennis-Stone 3.25 -8.160138 14.66013785 0.7657771
plot(TukeyHSD(model1))
Conclusion:
--> From the Tukey HSD plot, we can claim that the means for the Tennis and Stone pair are similar because zero lies within the 95% confidence interval. The mean for Tennis differs significantly from the mean for Golf, and the mean for Stone also differs significantly from the mean for Golf, because zero is not in the 95% confidence interval for either pair.
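The interval limits in the Tukey output can be rebuilt from the residual mean square and the studentized range distribution. The sketch below assumes the balanced-design form of the interval, difference ± q · sqrt(MSE / n), and uses the rounded residual mean square from the ANOVA table, so it matches the output only up to rounding. The variable names are illustrative.

```r
# Inputs read from the ANOVA and Tukey output above
mse         <- 129.7   # residual mean square (rounded as printed)
df_res      <- 33      # residual degrees of freedom
n_per_group <- 12      # runs per ball type
k           <- 3       # number of ball types

# Studentized range critical value and the common interval half-width
q_crit     <- qtukey(0.95, nmeans = k, df = df_res)
half_width <- q_crit * sqrt(mse / n_per_group)

diffs <- c("Stone-Golf" = -14.75, "Tennis-Golf" = -11.50, "Tennis-Stone" = 3.25)
intervals <- cbind(diff = diffs, lwr = diffs - half_width, upr = diffs + half_width)

# A pair differs significantly when its interval excludes zero
excludes_zero <- intervals[, "lwr"] > 0 | intervals[, "upr"] < 0
cbind(round(intervals, 2), excludes_zero)
```

Because the design is balanced, all three pairs share the same half-width, and only the interval for Tennis-Stone contains zero.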