INTRODUCTION:

We designed an experiment using a Statapult to identify the factors that significantly affect the distance the ball is thrown. The Statapult has three adjustable parameters:

• Pin Elevation

• Bungee Position

• Release Angle

Parameters

Pin Elevation and Bungee Position each have four discrete settings, numbered from the bottom up. The Release Angle is a continuous variable ranging from 90 to 180 degrees. In addition, three types of balls are used.

1 PART 1:

Perform a designed experiment to determine the effect of the type of ball on the distance the ball is thrown.

Types of Balls for the Experiment

Pin Elevation: Kept at Fourth Setting (Highest Setting)

Bungee Position: Kept at Fourth Setting (Highest Setting)

Release Angle: 90 Degrees

To test whether the type of ball affects the distance, we used a completely randomized design with a significance level (alpha) of 0.05.

1.1 Determining the Sample Size:

How many samples should be collected to detect a mean difference with a large effect (i.e. 90% of the standard deviation), assuming a pattern of maximum variability, with a probability (power) of 55%?

Since the number of populations k is 3 (the three ball types), which is odd, and we assume maximum variability, the effect size f is given by:

\[ f = \frac{d\sqrt{k^2-1}}{2k} \]
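
As a quick check, substituting the values used in this experiment (d = 0.9, k = 3) into the formula gives:

\[ f = \frac{0.9\sqrt{3^2-1}}{2\cdot 3} = \frac{0.9\sqrt{8}}{6} \approx 0.424 \]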

Therefore, using pwr.anova.test to determine the number of samples required:

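library(pwr)

alpha <- 0.05   # significance level
power <- 0.55   # target power
d <- 0.9        # effect size in standard deviations
# Effect size f for odd k (k = 3) under maximum variability
f1 <- d * sqrt(3^2 - 1) / (2 * 3)

# Solve for n, the number of observations per group
pwr.anova.test(k = 3, n = NULL, f = f1, sig.level = alpha, power = power)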
## 
##      Balanced one-way analysis of variance power calculation 
## 
##               k = 3
##               n = 11.35348
##               f = 0.4242641
##       sig.level = 0.05
##           power = 0.55
## 
## NOTE: n is number in each group
--> Rounding n = 11.35 up gives 12 samples per group. With 3 populations, we need to collect a total of 36 observations.
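
The rounding step can be sanity-checked without the pwr package. The sketch below is an illustration in base R, not part of the original analysis: it computes the power of a balanced one-way ANOVA from the noncentral F distribution for n = 11 and n = 12 per group. The helper name `anova_power` is hypothetical, and the noncentrality parameter k·n·f² is the usual convention for Cohen's f.

```r
# Power of a balanced one-way ANOVA from the noncentral F distribution.
# anova_power is a hypothetical helper written for this illustration.
anova_power <- function(n, k, f, sig.level = 0.05) {
  df1 <- k - 1                            # treatment degrees of freedom
  df2 <- k * (n - 1)                      # residual degrees of freedom
  ncp <- k * n * f^2                      # noncentrality parameter
  f_crit <- qf(1 - sig.level, df1, df2)   # rejection threshold under H0
  pf(f_crit, df1, df2, ncp = ncp, lower.tail = FALSE)
}

f1 <- 0.9 * sqrt(3^2 - 1) / (2 * 3)
power_11 <- anova_power(n = 11, k = 3, f = f1)
power_12 <- anova_power(n = 12, k = 3, f = f1)
print(c(n11 = power_11, n12 = power_12))
```

If the sketch is right, 11 samples per group should fall just short of the 55% target and 12 should exceed it, which is why the fractional result is rounded up rather than to the nearest integer.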

1.2 Randomized Run Order:

Propose a layout using the number of samples from Section 1.1 with a randomized run order:

library(agricolae)
design <- design.crd(trt = c("Golf", "Tennis", "Stone") ,r = 12,seed = 84544)
design$book
##    plots  r c("Golf", "Tennis", "Stone")
## 1    101  1                       Tennis
## 2    102  1                         Golf
## 3    103  2                       Tennis
## 4    104  2                         Golf
## 5    105  1                        Stone
## 6    106  3                       Tennis
## 7    107  3                         Golf
## 8    108  2                        Stone
## 9    109  4                         Golf
## 10   110  3                        Stone
## 11   111  4                       Tennis
## 12   112  4                        Stone
## 13   113  5                        Stone
## 14   114  6                        Stone
## 15   115  5                         Golf
## 16   116  6                         Golf
## 17   117  7                         Golf
## 18   118  7                        Stone
## 19   119  8                        Stone
## 20   120  5                       Tennis
## 21   121  8                         Golf
## 22   122  9                        Stone
## 23   123  9                         Golf
## 24   124 10                        Stone
## 25   125 10                         Golf
## 26   126  6                       Tennis
## 27   127 11                         Golf
## 28   128  7                       Tennis
## 29   129  8                       Tennis
## 30   130  9                       Tennis
## 31   131 10                       Tennis
## 32   132 11                        Stone
## 33   133 11                       Tennis
## 34   134 12                         Golf
## 35   135 12                       Tennis
## 36   136 12                        Stone
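
For readers without the agricolae package, a completely randomized run order can also be sketched in base R by shuffling the 36 treatment labels. This is an illustrative alternative, not the method used above, and it will not reproduce the same run order even with the same seed value; the names `treatments` and `run_order` are made up for the example.

```r
set.seed(84544)  # same seed value as above; base R will not reproduce the agricolae order

# 12 replicates of each of the three ball types
treatments <- rep(c("Golf", "Tennis", "Stone"), each = 12)

run_order <- data.frame(
  run = seq_along(treatments),
  trt = sample(treatments)   # random permutation of the 36 treatment labels
)

print(table(run_order$trt))  # each ball type should appear 12 times
print(head(run_order))
```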

1.3 Data Collection:

Collect data and record observations using the layout proposed in Section 1.2:

library("readxl")
data <- read_excel("D:\\00. Classes\\1. Fall 2022\\2. 5342 - Statistics & QA - [Design of Experiments]\\PROJ\\Part 1.xlsx")
data <- as.data.frame(data)
str(data)
## 'data.frame':    36 obs. of  4 variables:
##  $ Plots   : num  101 102 103 104 105 106 107 108 109 110 ...
##  $ r       : num  1 1 2 2 1 3 3 2 4 3 ...
##  $ trt     : chr  "Tennis" "Golf" "Tennis" "Golf" ...
##  $ Distance: num  65 69 70 83 45 69 51 49 56 35 ...
data
##    Plots  r    trt Distance
## 1    101  1 Tennis       65
## 2    102  1   Golf       69
## 3    103  2 Tennis       70
## 4    104  2   Golf       83
## 5    105  1  Stone       45
## 6    106  3 Tennis       69
## 7    107  3   Golf       51
## 8    108  2  Stone       49
## 9    109  4   Golf       56
## 10   110  3  Stone       35
## 11   111  4 Tennis       48
## 12   112  4  Stone       47
## 13   113  5  Stone       67
## 14   114  6  Stone       48
## 15   115  5   Golf       85
## 16   116  6   Golf       50
## 17   117  7   Golf       80
## 18   118  7  Stone       61
## 19   119  8  Stone       48
## 20   120  5 Tennis       45
## 21   121  8   Golf       61
## 22   122  9  Stone       40
## 23   123  9   Golf       48
## 24   124 10  Stone       55
## 25   125 10   Golf       51
## 26   126  6 Tennis       42
## 27   127 11   Golf       52
## 28   128  7 Tennis       50
## 29   129  8 Tennis       45
## 30   130  9 Tennis       42
## 31   131 10 Tennis       49
## 32   132 11  Stone       41
## 33   133 11 Tennis       53
## 34   134 12   Golf       78
## 35   135 12 Tennis       48
## 36   136 12  Stone       51

1.4 Hypothesis Testing:

Perform the hypothesis test and check the residuals. Comment and take corrective action if necessary:

Hypothesis

Null: \[H_0: \mu_{1}=\mu_{2}=\mu_{3}=\mu\]

Alternate: \[H_a: \text{at least one } \mu_{i} \text{ differs}\]

where,

\(\mu_{1}\) = mean distance for the tennis ball

\(\mu_{2}\) = mean distance for the golf ball

\(\mu_{3}\) = mean distance for the stone

data$trt<-as.factor(data$trt)
model1 <- aov(data$Distance~data$trt,data = data)
summary(model1)
##             Df Sum Sq Mean Sq F value  Pr(>F)   
## data$trt     2   1441   720.7   5.556 0.00833 **
## Residuals   33   4281   129.7                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
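
The F value in the table above is the ratio of the treatment mean square to the residual mean square, so it can be checked by hand from the reported (rounded) values, with 2 and 33 degrees of freedom:

\[ F = \frac{MS_{trt}}{MS_{res}} = \frac{720.7}{129.7} \approx 5.56 \]
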
--> The p-value (0.00833) is smaller than 0.05. Hence we reject the null hypothesis and conclude that at least one of the means differs.

ANOVA Model Adequacy

plot(model1,col="blue")

Conclusion:

--> The residuals have roughly the same spread across the three ball types, suggesting that the variance is approximately constant. In addition, the points on the normal probability plot fall close to a straight line, indicating that the residuals are approximately normally distributed. Hence no corrective action is needed.
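
The visual checks could be supplemented with formal tests. The sketch below is an optional addition, not part of the original analysis: it re-enters the distances from the data listing in Section 1.3 (so that it runs without the Excel file), refits the model, and applies the Shapiro-Wilk and Bartlett tests from base R. The object names `throws`, `fit`, `normality` and `homogeneity` are chosen for illustration.

```r
# Distances transcribed from the data listing in Section 1.3, grouped by ball type
golf   <- c(69, 83, 51, 56, 85, 50, 80, 61, 48, 51, 52, 78)
tennis <- c(65, 70, 69, 48, 45, 42, 50, 45, 42, 49, 53, 48)
stone  <- c(45, 49, 35, 47, 67, 48, 61, 48, 40, 55, 41, 51)

throws <- data.frame(
  Distance = c(golf, tennis, stone),
  trt = factor(rep(c("Golf", "Tennis", "Stone"), each = 12))
)
fit <- aov(Distance ~ trt, data = throws)

# Shapiro-Wilk test: H0 is that the residuals are normally distributed
normality <- shapiro.test(residuals(fit))
# Bartlett test: H0 is that the three ball types share a common variance
homogeneity <- bartlett.test(Distance ~ trt, data = throws)

print(normality$p.value)
print(homogeneity$p.value)
```

P-values above 0.05 in both tests would be consistent with the conclusion drawn from the plots; a small p-value in either would suggest considering a transformation of the response or a nonparametric alternative.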

1.5 Pairwise Comparisons:

If the null hypothesis is rejected, investigate pairwise comparisons:

TukeyHSD(model1)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = data$Distance ~ data$trt, data = data)
## 
## $`data$trt`
##                diff        lwr         upr     p adj
## Stone-Golf   -14.75 -26.160138 -3.33986215 0.0089206
## Tennis-Golf  -11.50 -22.910138 -0.08986215 0.0479000
## Tennis-Stone   3.25  -8.160138 14.66013785 0.7657771
plot(TukeyHSD(model1))

Conclusion:

--> From the Tukey HSD plot, the means for Tennis and Stone are not significantly different, because zero lies within the 95% confidence interval (p adj = 0.766). The mean for Golf differs significantly from both Stone (p adj = 0.0089) and Tennis (p adj = 0.0479), because zero lies outside those intervals. The Tennis-Golf difference is marginal, with the upper bound of its interval (-0.09) only just below zero.
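
To show where the interval bounds come from, the sketch below rebuilds them from the summary numbers already reported: the residual mean square (129.7) and residual degrees of freedom (33) from the ANOVA table, 12 observations per group, and the three mean differences. It assumes the standard balanced-design formula, in which the half-width is the studentized range quantile times sqrt(MSE / n). Because the inputs are rounded, the results should agree with the TukeyHSD output only to about two decimal places.

```r
mse <- 129.7        # residual mean square from the ANOVA table (rounded)
df_resid <- 33      # residual degrees of freedom
n_per_group <- 12   # observations per ball type
k <- 3              # number of ball types

# Half-width of each 95% family-wise interval for a balanced design
margin <- qtukey(0.95, nmeans = k, df = df_resid) * sqrt(mse / n_per_group)

# Mean differences as reported in the TukeyHSD output
diffs <- c("Stone-Golf" = -14.75, "Tennis-Golf" = -11.50, "Tennis-Stone" = 3.25)
intervals <- cbind(diff = diffs, lwr = diffs - margin, upr = diffs + margin)
print(round(intervals, 2))
```

All three intervals share the same half-width because the design is balanced, so a pair is declared different exactly when its absolute mean difference exceeds that half-width.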

1.6 Findings and Recommendations:

--> The p-value obtained from the ANOVA indicates that ball type does have an effect on the distance travelled.

--> The pairwise comparison test also shows which treatment means differ: the mean for Golf differs significantly from the means for both Tennis and Stone, while the means for Tennis and Stone do not differ significantly.

2 PART 2: