The experiment was performed using a Statapult, with different type of balls. The Statapult has three factors that could be consider as variables they are nothing but the Pin Elevation, the Bungee Position, and the Release Angle with the Type of Balls being the other factor could be consider as variable.

The Project was performed in three parts.

Part 01: By varying the Type of Balls, and keeping the Pin Elevation and Bungee Position fixed, and Release Angle stretched to \(90^o\).

Part 02: By considering the Pin Elevation and Release Angle as variables, with a red ball when the Bungee Position is fixed.

Part 03: By considering all factors from the Statapult as variables, with different types of balls.

After determining the necessary sample size, a random data collecting plan is constructed. The gathered data is processed to separate the significant aspects from the insignificant ones, providing us with an assessment of the factors influencing the distance to which the ball is thrown. The ultimate conclusion about the key elements is reached and presented.

Part - 01

Perform a designed experiment to determine the effect of the type of ball on the distance in which the ball is thrown.

The Pin Elevation and Bungee Position should both be at their fourth setting, i.e., highest setting. The Release Angle should be at \(90^o\), with the arm pulled fully back before releasing. To test this hypothesis, we wish to use a completely randomized design with an \(\alpha\) around 0.05.

a) Determine how many samples should be collected to detect a mean difference with a large effect (i.e. 90% of the standard deviation) and a pattern of maximum variability with a probability of 55%.

The sample size determination involves various Arguments:

k: Number of replications of each type of ball.

n: Number of Observations of each type of ball.

f: Effect size (which depends on the variability of the design).

sig.level: The level of significant error we accept.

power: The probability of rejecting the Null hypothesis, if the means differences exceeds the effect.

For the given parameters.

\(d = \frac{differenc \space in \space mean}{sigma}\)

\(Effect Size (f) = \frac{d\sqrt{k^2 - 1}}{2k}\)

Sample size determination

library(pwr)
pwr.anova.test(k = 3, n = NULL, f = ((0.9*sqrt((3^2)-1))/(2*3)), sig.level = 0.05, power = 0.55)

## 
##      Balanced one-way analysis of variance power calculation 
## 
##               k = 3
##               n = 11.35348
##               f = 0.4242641
##       sig.level = 0.05
##           power = 0.55
## 
## NOTE: n is number in each group

plot(pwr.anova.test(k = 3,n=NULL,f=((0.9*sqrt((3^2)-1))/(2*3)),sig.level=0.05,power=0.55))

The power test to detect a mean difference with a medium effect, maximum variability and the significance level of 0.05, Gives us the number of samples to collect n = 12 of the each type of ball.

b) Proposing a layout using the number of samples with randomized run order.

The layout generation depends on the type of the design we require, the two major design generators for the Complete Random Design (CRD), Complete Random Block Design (CRBD). Choosing the particular design depends the source of nuisance, if we have any external different sources of nuisance that we need to be considered we choose crbd, but in our experiment we don’t have any source of nuisance that effects our results, hence we choose CRD to generate the layout.

Arguments of the layout generation.

Treatment (trt): Number of factors

Replications (r): The number of replications of each type of ball

Seed: Random run code number

Proposed Layout

balls<-c("RED","YELLOW","STONE")
library(agricolae)
design<-design.crd(trt = balls, r = 12, seed = 2534356)
design$book

##    plots  r  balls
## 1    101  1  STONE
## 2    102  1 YELLOW
## 3    103  1    RED
## 4    104  2 YELLOW
## 5    105  3 YELLOW
## 6    106  2    RED
## 7    107  2  STONE
## 8    108  3    RED
## 9    109  3  STONE
## 10   110  4 YELLOW
## 11   111  5 YELLOW
## 12   112  4    RED
## 13   113  5    RED
## 14   114  6    RED
## 15   115  4  STONE
## 16   116  7    RED
## 17   117  5  STONE
## 18   118  8    RED
## 19   119  6  STONE
## 20   120  7  STONE
## 21   121  6 YELLOW
## 22   122  7 YELLOW
## 23   123  8  STONE
## 24   124  9    RED
## 25   125  9  STONE
## 26   126  8 YELLOW
## 27   127  9 YELLOW
## 28   128 10  STONE
## 29   129 11  STONE
## 30   130 10    RED
## 31   131 11    RED
## 32   132 12    RED
## 33   133 10 YELLOW
## 34   134 11 YELLOW
## 35   135 12 YELLOW
## 36   136 12  STONE

Collecting data and record observations on proposed layout in b).

The data is collected according to the pattern of random generated layout in b), following the required conditions of the Pin Elevation and Bungee Position fixed and the Release Angle at \(90^o\).

Drawing the collected data and Formation of data frames.

library(readxl)
dat<-read_excel("C:/Users/Saipa/OneDrive/Desktop/DOE/ProjectData.xlsx")
print(dat)

## # A tibble: 12 x 3
##    STONE   RED YELLOW
##    <dbl> <dbl>  <dbl>
##  1    44    49     94
##  2    58    53     86
##  3    92    35     49
##  4    60    54     45
##  5    70    64     52
##  6    49    52     68
##  7    37    47     43
##  8    37    42     54
##  9    46    40     44
## 10    97    48     51
## 11    48    45     41
## 12    94    71     41

library(tidyr)
dat2<-pivot_longer(data = dat, c(RED,YELLOW,STONE))
colnames(dat2)<- c("Type_of_Balls", "Observations")
dat2$Type_of_Balls<-as.factor(dat2$Type_of_Balls)
print(dat2)

## # A tibble: 36 x 2
##    Type_of_Balls Observations
##    <fct>                <dbl>
##  1 RED                     49
##  2 YELLOW                  94
##  3 STONE                   44
##  4 RED                     53
##  5 YELLOW                  86
##  6 STONE                   58
##  7 RED                     35
##  8 YELLOW                  49
##  9 STONE                   92
## 10 RED                     54
## # ... with 26 more rows

d) Performing hypothesis test and check residuals. Be sure to comment and take corrective action if necessary.

Hypothesis to be Tested

Null Hypothesis (\(H_o\)) : \(\mu_1\)=\(\mu_2\) = \(\mu_3\)
Alternative Hypothesis (\(H_a\)) : Atleast one of \(\mu_i\) differs.

where;

\(\mu_1\) = Mean of Red Ball.

\(\mu_2\) = Mean of Yellow Ball.

\(\mu_3\) = Mean of Stone Ball.

The hypothesis could be tested using Analysis of Varience (anova) or Non Parmetric test (Kruskal-Wallis rank sum test), but the Non Parametric testing reduces the power of the Hypothesis. Hence ideal option for hypothesis testing is anova.

The anova has few of the strong assumptions about the data which has to be fulfilled before the anova is used to test the hypothesis, if this assumptions are not meet the results of the anova will be effects and may lead to wrong conclusion.

Assumptions of ANOVA.

The assumption on normal distribution of observations of each type of ball (This assumption is not strong, could be exempted to certain extent).

The assumption on constant variance (This is strong assumption, can’t be voliated).

Intially choosing ANOVA to test the Hypothesis.

First Checking for Normality Assumption

Normal Probability plots

library(ggplot2)

ggplot(mapping = aes(sample = dat$RED)) +
  stat_qq(size = 2, col = "red") +
  stat_qq_line(size = 1, col = "blue") +
  xlab("Theoretical Quantiles") + ylab("Sample Quantiles") +
  ggtitle("Normal Q - Q plot of Red Ball", ) + 
  theme(plot.title = element_text(hjust = 0.5)) +
  theme(axis.text=element_text(size=10),
        axis.title=element_text(size=10),
        plot.title = element_text(size=14, colour = "red" ))

ggplot(mapping = aes(sample = dat$YELLOW)) +
  stat_qq(size = 2, col = "yellow2") +
  stat_qq_line(size = 1, col = "blue") +
  xlab("Theoretical Quantiles") + ylab("Sample Quantiles") +
  ggtitle("Normal Q - Q plot of Yellow Ball", ) + 
  theme(plot.title = element_text(hjust = 0.5)) +
  theme(axis.text=element_text(size=10),
        axis.title=element_text(size=10),
        plot.title = element_text(size=14, colour = "yellow2"))

ggplot(mapping = aes(sample = dat$STONE)) +
  stat_qq(size = 2, col = "black") +
  stat_qq_line(size = 1, col = "blue") +
  xlab("Theoretical Quantiles") + ylab("Sample Quantiles") +
  ggtitle("Normal Q - Q plot of Stone Ball", ) + 
  theme(plot.title = element_text(hjust = 0.5)) +
  theme(axis.text=element_text(size=10),
        axis.title=element_text(size=10),
        plot.title = element_text(size=14, colour = "black"))

\(\underline{Comments}\): The Normal Probability plots of the all type of balls looks normally distributed; but Yellow and red balls has tails slightly out drifted. Could be considered the observations are normally distributed.

Checking for constant variance

The boxplot is best and easy way to check for the constant varience, by comparing the median and quartile ranges.

Box plot

library(ggplot2)
ggplot(dat2, aes(x = Type_of_Balls, y = Observations, fill = Type_of_Balls)) +  
  geom_boxplot() + scale_fill_manual(breaks = dat2$Type_of_Balls,
                    values = c("RED","YELLOW","BLACK"))

\(\underline{Comments}\): Clearly there is no constant variance seen. Every ball has different quantile ranges , so obviously we will have different variances across each type of ball.Therefore we can’t go with parametric method(Anova) as it has strong assumption on constant variance. But we will try to transform the data and look for constant variance so that there is a chance that we can still go with anova if we got our variance as constant.

The above box plot could not give us the favorable results to use anova for Hypothesis testing, hence we will try transforming the data to stabilize the variance using the Boxcox transformation.

Box cox Transformation

library(MASS)
boxcox(dat2$Observations~dat2$Type_of_Balls)

The Box cox transformation gives us the \(\lambda\) within the 95% of confidence interval, choosing the \(\lambda\) as -1 at the peak of curve.

Red<-c(dat$RED)^-1
Yellow<-c(dat$YELLOW)^-1
Stone<-c(dat$STONE)^-1

dat3<-cbind.data.frame(Red, Yellow, Stone)

library(tidyr)
dat4<-pivot_longer(data = dat3, c(Red,Yellow,Stone))
colnames(dat4)<- c("Type_of_Balls", "Observations")
dat4$Type_of_Balls<-as.factor(dat4$Type_of_Balls)
print(dat4)

## # A tibble: 36 x 2
##    Type_of_Balls Observations
##    <fct>                <dbl>
##  1 Red                 0.0204
##  2 Yellow              0.0106
##  3 Stone               0.0227
##  4 Red                 0.0189
##  5 Yellow              0.0116
##  6 Stone               0.0172
##  7 Red                 0.0286
##  8 Yellow              0.0204
##  9 Stone               0.0109
## 10 Red                 0.0185
## # ... with 26 more rows

Checking for constant variance

Box plo

Boxplot of newly Transformed Data

library(ggplot2)
ggplot(dat4, aes(x = Type_of_Balls, y = Observations, fill = Type_of_Balls)) +  
  geom_boxplot() + scale_fill_manual(breaks = dat4$Type_of_Balls,
                                     values = c("red","yellow","black"))

\(\underline{Comments}\): Box-cox also didn’t helped us(There is no constant variance by transforming the data also) so there fore we can’t perform anova we need to go for non- parametric method to check(Conclude) for our hypothesis.

Non-parametric test

kruskal.test(dat2$Observations~dat2$Type_of_Balls)

## 
##  Kruskal-Wallis rank sum test
## 
## data:  dat2$Observations by dat2$Type_of_Balls
## Kruskal-Wallis chi-squared = 0.89139, df = 2, p-value = 0.6404

\(\underline{Comments}\): The Kruskal-Wallis rank sum test gives us \(p-value = 0.6404\) which is very greater in comparison with significant level \(\alpha\) = 0.05, Hence we could say, we don’t have sufficient strong evidence to reject the Null Hypothesis.Hence, we need to accept(Null Hypothesis) that means of different types of balls are equal.

\(\underline{Conculsion}\): Concluding from the analysis, we could say that the Type of The balls has no significant effect on the distance in which the ball is thrown} for the fixed Pin Elevation and Bungee Position, with Release angle stretched to of \(90^o\).

Investigating further about the model adequacy, to check for the outliners using the residuals.

Residual plots

aov_model<-aov(Observations~Type_of_Balls, data = dat2)
residules<-resid(aov_model)
plot(aov_model, col = "Blue")

\(\underline{Comments}\): The residuals plots gives us that the model is inadequate with the few of the outlines, but the normal probability plots of the residuals looks almost normally distributed with some points out drifted.

As we failed to reject Null-Hypothesis but we will provide Additional evidence from Tukey’s Honest Significant Differences Test. If we can see ZERO in our confidence interval it means that pairs are not differing in mean that they have almost same means.We know that it is Parametric method (we can’t perform it on our data) but we are just using it as further evidence that’s it.

Pairwise comparisons.

Tukey’s Honest Significant Differences

TukeyHSD(aov_model, conf.level = 0.95)

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Observations ~ Type_of_Balls, data = dat2)
## 
## $Type_of_Balls
##                   diff        lwr      upr     p adj
## STONE-RED    11.000000  -6.416256 28.41626 0.2814358
## YELLOW-RED    5.666667 -11.749589 23.08292 0.7066231
## YELLOW-STONE -5.333333 -22.749589 12.08292 0.7349004

plot(TukeyHSD(aov_model, conf.level = 0.95), col = "green")

\(\underline{Comments}\): As we assumed we got the result. None of the pairs are differing in the means.As we can conclude it from the above plot zero is in every Confidence interval of the pairs it means that they are not differing in means.

\(\underline{Conculsion}\): Concluding from the analysis, we could say that the Type of The balls has no significant effect on the distance in which the ball is thrown for the fixed Pin Elevation and Bungee Position, with Release angle stretched to of \(90^o\).

Source Code

# Power Test
library(pwr)
pwr.anova.test(k = 3, n = NULL, f = ((0.9*sqrt((3^2)-1))/(2*3)), sig.level = 0.05, power = 0.55)
plot(pwr.anova.test(k = 3,n=NULL,f=((0.9*sqrt((3^2)-1))/(2*3)),sig.level=0.05,power=0.55))
## The number of samples to collect n = 36(12 from each group).

# Laying out the number of samples from part (a) with randomized run order.
balls<-c("RED","YELLOW","STONE")
library(agricolae)
design<-design.crd(trt = balls, r = 12, seed = 2534356)
design$book

#Formation of data frame for the collected data for analysis
library(readxl)
dat<-read_excel("C:/Users/Saipa/OneDrive/Desktop/DOE/ProjectData.xlsx")
print(dat)
library(tidyr)
dat2<-pivot_longer(data = dat, c(RED,YELLOW,STONE))
colnames(dat2)<- c("Type_of_Balls", "Observations")
dat2$Type_of_Balls<-as.factor(dat2$Type_of_Balls)
print(dat2) 


# Normal Probability Plots.
library(ggplot2)

ggplot(mapping = aes(sample = dat$RED)) +
  stat_qq(size = 2, col = "red") +
  stat_qq_line(size = 1, col = "blue") +
  xlab("Theoretical Quantiles") + ylab("Sample Quantiles") +
  ggtitle("Normal Q - Q plot of Red Ball", ) + 
  theme(plot.title = element_text(hjust = 0.5)) +
  theme(axis.text=element_text(size=10),
        axis.title=element_text(size=10),
        plot.title = element_text(size=14, colour = "red" ))

ggplot(mapping = aes(sample = dat$YELLOW)) +
  stat_qq(size = 2, col = "yellow2") +
  stat_qq_line(size = 1, col = "blue") +
  xlab("Theoretical Quantiles") + ylab("Sample Quantiles") +
  ggtitle("Normal Q - Q plot of Yellow Ball", ) + 
  theme(plot.title = element_text(hjust = 0.5)) +
  theme(axis.text=element_text(size=10),
        axis.title=element_text(size=10),
        plot.title = element_text(size=14, colour = "yellow2"))

ggplot(mapping = aes(sample = dat$STONE)) +
  stat_qq(size = 2, col = "black") +
  stat_qq_line(size = 1, col = "blue") +
  xlab("Theoretical Quantiles") + ylab("Sample Quantiles") +
  ggtitle("Normal Q - Q plot of Stone Ball", ) + 
  theme(plot.title = element_text(hjust = 0.5)) +
  theme(axis.text=element_text(size=10),
        axis.title=element_text(size=10),
        plot.title = element_text(size=14, colour = "black"))


# Box plot
library(ggplot2)
ggplot(dat2, aes(x = Type_of_Balls, y = Observations, fill = Type_of_Balls)) +  
  geom_boxplot() + scale_fill_manual(breaks = dat2$Type_of_Balls,
                    values = c("RED","YELLOW","BLACK"))

# Transforming the data
# Box-cox Transformation
library(MASS)
boxcox(dat2$Observations~dat2$Type_of_Balls)
# $\lambda = -1$
Red<-c(dat$RED)^-1
Yellow<-c(dat$YELLOW)^-1
Stone<-c(dat$STONE)^-1

dat3<-cbind.data.frame(Red, Yellow, Stone)

library(tidyr)
dat4<-pivot_longer(data = dat3, c(Red,Yellow,Stone))
colnames(dat4)<- c("Type_of_Balls", "Observations")
dat4$Type_of_Balls<-as.factor(dat4$Type_of_Balls)
print(dat4)


library(ggplot2)
ggplot(dat4, aes(x = Type_of_Balls, y = Observations, fill = Type_of_Balls)) +  
  geom_boxplot() + scale_fill_manual(breaks = dat4$Type_of_Balls,
                                     values = c("red","yellow","black"))

## Non parametric test
kruskal.test(dat2$Observations~dat2$Type_of_Balls)

# Residuals plots
aov_model<-aov(Observations~Type_of_Balls, data = dat2)
residules<-resid(aov_model)
plot(aov_model, col = "Blue")

# Tukey's HSD Test
TukeyHSD(aov_model, conf.level = 0.95)
plot(TukeyHSD(aov_model, conf.level = 0.95), col = "green")

Part - 02.

Perform a designed experiment to determine the effect of Pin Elevation and Release Angle on distance in which a red ball is thrown when the Bungee Position is fixed at the second position.

Settings one and three of Pin Elevation should be investigated as a fixed effect, as well as settings of the Release Angle corresponding to 90, 110, and 120 degrees as a random effect. The design should be replicated three times.

The given setting of Pin Elevation and Release Angle gives us ideal choose of considered them as factors and the factorial design could be used for analysis the whole experiment, the factors with different setting in them is treated as levels in the factors. The Pin Elevation(A) has setting one and three which could be considered as two levels, similarly the Release Angle(B) which has three levels representing three different angles.

a) Proposing a layout with a randomized run order

The factorial desing layout with no source of nuisance.

The Arguments of the layout generation for the factorial design are:

Treatments (trt): The number of factor levels in each factor

Replications (r): The number of times each factor level combinations need to be replicated

Layout generation

trts<-c(2,3)
library(agricolae)
design.ab(trt=trts,r=3,design="crd",seed=63737)

## $parameters
## $parameters$design
## [1] "factorial"
## 
## $parameters$trt
## [1] "1 1" "1 2" "1 3" "2 1" "2 2" "2 3"
## 
## $parameters$r
## [1] 3 3 3 3 3 3
## 
## $parameters$serie
## [1] 2
## 
## $parameters$seed
## [1] 63737
## 
## $parameters$kinds
## [1] "Super-Duper"
## 
## $parameters[[7]]
## [1] TRUE
## 
## $parameters$applied
## [1] "crd"
## 
## 
## $book
##    plots r A B
## 1    101 1 2 2
## 2    102 1 1 2
## 3    103 1 2 3
## 4    104 1 1 3
## 5    105 1 1 1
## 6    106 2 1 3
## 7    107 2 2 3
## 8    108 2 2 2
## 9    109 2 1 1
## 10   110 3 1 1
## 11   111 2 1 2
## 12   112 3 2 2
## 13   113 3 1 3
## 14   114 3 2 3
## 15   115 1 2 1
## 16   116 2 2 1
## 17   117 3 2 1
## 18   118 3 1 2

# A correspond to Pin Elevation(2levels) ie (1st, 3rd positions)
# B corresponds to Release Angle(3levels) ie (90,110,120)

Where;

A: Pin Elevation; \(1: 1^{st}position\), \(2: 3^{rd}position\).

B: Release angle; \(1: 90^{0}\), \(2: 110^{0}\),\(3: 120^{0}\).

c) Collect data and record observations on the layout proposed in part (a)

The observations are collected according to the layout generated with different factor level combinations and required number of replications.

Entering the collected data for analysis

library(readxl)
Pdat<-read_excel("C:/Users/Saipa/OneDrive/Desktop/DOE/Projectdata2.xlsx")
print(Pdat)

## # A tibble: 18 x 3
##    Position      Angle Observation
##    <chr>         <dbl>       <dbl>
##  1 3rd elevation   110          39
##  2 1st elevation   110          57
##  3 3rd elevation   120          18
##  4 1st elevation   120          37
##  5 1st elevation    90          45
##  6 1st elevation   120          39
##  7 3rd elevation   120          12
##  8 3rd elevation   110          25
##  9 1st elevation    90          45
## 10 1st elevation    90          43
## 11 1st elevation   110          45
## 12 3rd elevation   110          28
## 13 1st elevation   120          28
## 14 3rd elevation   120           7
## 15 3rd elevation    90          18
## 16 3rd elevation    90          23
## 17 3rd elevation    90          25
## 18 1st elevation   110          48

\(\underline{Comments}\): The normal probability plot and the box plot of the observation can’t be plot because of each factor level combination has only three values, this sample size is small to plot and draw conclusion on the normality and the constant variance respectively.

b) State model equation with the null and alternative hypotheses to be tested. In addition, state the level of significance that will be used in your analysis.

Model Equation.

\(y_{ijk} = \mu + \alpha_i + \beta_j + \alpha\beta_{ij} + \epsilon_{ijk}\)

Where;

\(y_{ijk}\) = Observed response of each factor level combination

\(\mu\) = Grand Mean (Mean of entire populations)

Controllable errors

\(\alpha_i\) = Main effect of the factor Pin Elevations

\(\beta_j\) = Main effect of factor Release Angle

\(\alpha\beta_{ij}\) = Interactions effect between the factors

Uncontrollable errors

\(\epsilon_{ijk}\) = Random Error which is distributed Normally (0,1)

The hypothesis is tested for the Controllable errors, so that we could work on it to reduce the error based on the obtained results of analysis.

Hypothesis to be tested:

Factor - 1 : Pin Elevation

Null Hypothesis (\(H_o\)):

\(\alpha_i = 0\space\forall\space i\)

Alternate Hypothesis (\(H_a\)):

\(\alpha_i \neq 0\space some \space i\)

Factor - 2 : Release Angle.

Null Hypothesis (\(H_o\)):

\(\beta_j = 0\space\forall\space j\)

Alternate Hypothesis (\(H_a\)):

\(\beta \neq 0\space some \space j\)

Interaction.

Null Hypothesis (\(H_o\)):

\(\alpha\beta_{ij} = 0\space\forall\space ij\)

Alternate Hypothesis (\(H_a\)):

\(\alpha\beta_{ij} \neq 0 \space some \space ij\)

Level of significance = 0.05.

The Hypothesis is tested considering the higher order interaction first, then the lower order interactions and the main effects; this gives us the idea that which interactions are affecting the results of ANOVA.

d) Testing the hypotheses and state conclusions, determining those effects that are significant. Showing any plots that might be useful/necessary to show the findings. You may also show residual plots and make appropriate comments, but do not transform the data (i.e. use the raw data regardless of normality and variance constancy).

Testing the hypothesis.

library("GAD")
Pdat$Angle<-as.random(Pdat$Angle)
Pdat$Position<-as.fixed(Pdat$Position)
model<-aov(Observation~Position*Angle,data=Pdat)
gad(model)

## Analysis of Variance Table
## 
## Response: Observation
##                Df  Sum Sq Mean Sq  F value    Pr(>F)    
## Position        1 2048.00 2048.00 455.1111 0.0021901 ** 
## Angle           2  856.33  428.17  14.9072 0.0005586 ***
## Position:Angle  2    9.00    4.50   0.1567 0.8567043    
## Residual       12  344.67   28.72                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

\(\underline{Comments}\):Fail to reject Null-hypothesis that interaction term is not significant.By interaction plot also you can actually see from Angle 110 to 120 as that observation are in decreasing pattern for both of them ie interaction of both of them is not explaining any thing about the model.

Interaction Plots

library(ggplot2)
ggplot() +
  aes(x =Pdat$Angle, color = Pdat$Position, group = Pdat$Position, y = Pdat$Observation) +
  stat_summary(fun= mean, geom = "point") +
  stat_summary(fun= mean, geom = "line")

\(\underline{Comments}\):Fail to reject Null-hypothesis that interaction term is not significant. The results give us that the interaction between the Pin Elevation and Release Angle are not significant,By interaction plot also you can actually see from Angle 110 to 120 as that observation are in decreasing pattern for both of them ie interaction of both of them is not explaining any thing about the model. So we failed to reject Null hypothesis for interaction term.

Now we need to Test for main effects if they make sence or not

model1<-aov(Observation~Position+Angle,data=Pdat)
gad(model1)

## Analysis of Variance Table
## 
## Response: Observation
##          Df  Sum Sq Mean Sq F value    Pr(>F)    
## Position  1 2048.00 2048.00  81.071 3.371e-07 ***
## Angle     2  856.33  428.17  16.949 0.0001822 ***
## Residual 14  353.67   25.26                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

\(\underline{Comments} , \underline{Conculsion}\):Main effects make sense they are affecting the Model as p value is very small for both of them.We can reject the null Hypothesis.The analysis gives us that The two Main Effects Pin Elevation and the Release angle has significant effect on the distance to which the ball is thrown.

Residuals plots

plot(model1)

\(\underline{Comments} , \underline{Conculsion}\):First of all they are no enough points in-order to comment on Normality and Model Adequacy. In residual vs Fitted graph we can see that there is no strong pattern observed ie residuals are randomly distributed across zero residual Line. ie We can assume the model is Adequate(Constant Variance)(Judgemental Call only Assumption). Normal probability plot of residuals is almost normally distributed with one tail out drifted(13th observation.)

Source Code

# Design Layout and Loading Data
trts<-c(2,3)
library(agricolae)
design.ab(trt=trts,r=3,design="crd",seed=63737)
# A correspond to Pin Elevation(2levels) ie (1st, 3rd positions)
# B corresponds to Release Angle(3levels) ie (90,110,120)

library(readxl)
Pdat<-read_excel("C:/Users/Saipa/OneDrive/Desktop/DOE/Projectdata2.xlsx")
print(Pdat)

# Model
library("GAD")
Pdat$Angle<-as.random(Pdat$Angle)
Pdat$Position<-as.fixed(Pdat$Position)
model<-aov(Observation~Position*Angle,data=Pdat)
gad(model)

# Interaction Plots
library(ggplot2)
ggplot() +
  aes(x =Pdat$Angle, color = Pdat$Position, group = Pdat$Position, y = Pdat$Observation) +
  stat_summary(fun= mean, geom = "point") +
  stat_summary(fun= mean, geom = "line")

# Model Summary
model1<-aov(Observation~Position+Angle,data=Pdat)
gad(model1)

# Residual Plots
plot(model1)

Part - 03

Perform a designed experiment to determine the effect of the available factors of Pin Elevation, Bungee Position, Release Angle, and Ball Type on distance in which a ball is thrown. Design this experiment as a single replicate of a \(2^{4}\) factorial design with the low and high level of the factors being as follows.

Pin Elevation - Position 1 (-1), Position 3 (+1);

Bungee Position - Position 2 (-1), Position 3 (+1);

Release Angle - \(90^{0} (-1),\space 110^{0} (+1)\);

Ball Type - Yellow (-1),Red (+1).

a) Propose a data collection layout with a randomized run order.

The factorial design each factor with two levels, complete random design and single replicate.

Layout generation

library(agricolae)
trts<-c(2,2,2,2)
design.ab(trt=trts,r=1,design="crd",seed=25586)

## $parameters
## $parameters$design
## [1] "factorial"
## 
## $parameters$trt
##  [1] "1 1 1 1" "1 1 1 2" "1 1 2 1" "1 1 2 2" "1 2 1 1" "1 2 1 2" "1 2 2 1"
##  [8] "1 2 2 2" "2 1 1 1" "2 1 1 2" "2 1 2 1" "2 1 2 2" "2 2 1 1" "2 2 1 2"
## [15] "2 2 2 1" "2 2 2 2"
## 
## $parameters$r
##  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 
## $parameters$serie
## [1] 2
## 
## $parameters$seed
## [1] 25586
## 
## $parameters$kinds
## [1] "Super-Duper"
## 
## $parameters[[7]]
## [1] TRUE
## 
## $parameters$applied
## [1] "crd"
## 
## 
## $book
##    plots r A B C D
## 1    101 1 1 2 1 2
## 2    102 1 2 1 2 2
## 3    103 1 2 2 2 2
## 4    104 1 2 2 1 1
## 5    105 1 2 1 1 2
## 6    106 1 2 1 2 1
## 7    107 1 1 1 1 1
## 8    108 1 1 1 2 2
## 9    109 1 1 1 1 2
## 10   110 1 1 2 2 1
## 11   111 1 1 2 1 1
## 12   112 1 2 2 2 1
## 13   113 1 2 1 1 1
## 14   114 1 2 2 1 2
## 15   115 1 1 2 2 2
## 16   116 1 1 1 2 1

A: Ball Type - Yellow (-1),Red (1)

B: Release Angle - \(90^{0} (-1), 110^{0} (+1)\)

C: Bungee Position - Position 2 (-1), Position 3 (+1)

D: Pin Elevation - Position 1 (-1), Position 3 (+1)

b) Collect data and record observations

Data Entry

library("readxl")
Fdata3<-read_excel("C:/Users/Saipa/OneDrive/Desktop/DOE/Projectdata3.xlsx")
print(Fdata3)

## # A tibble: 16 x 5
##    `Ball Type` `Release Angle` `Bungee Position` `Pin Elevation`   Obs
##          <dbl>           <dbl>             <dbl>           <dbl> <dbl>
##  1          -1               1                -1               1    28
##  2           1              -1                 1               1    23
##  3           1               1                 1               1    19
##  4           1               1                -1              -1    54
##  5           1              -1                -1               1    27
##  6           1              -1                 1              -1    54
##  7          -1              -1                -1              -1    37
##  8          -1              -1                 1               1    28
##  9          -1              -1                -1               1    33
## 10          -1               1                 1              -1    52
## 11          -1               1                -1              -1    56
## 12           1               1                 1              -1    42
## 13           1              -1                -1              -1    40
## 14           1               1                -1               1    20
## 15          -1               1                 1               1    15
## 16          -1              -1                 1              -1    43

#colnames(Fdata3)<-c("A","B","C","D","Obs")
#print(Fdata3)
A<-Fdata3$`Ball Type`
B<-Fdata3$`Release Angle`
C<-Fdata3$`Bungee Position`
D<-Fdata3$`Pin Elevation`
Obs<-Fdata3$Obs

c) State model equation and determine what factors/interactions appear to be significant (show any plots that were used in making this determination).

Model equation.

\(y_{ijklm}= \mu + \alpha_i + \beta_j + \gamma_k + \delta_l + \alpha\beta_{ij} + \alpha\gamma_{ik} + \alpha\delta_{il} + \beta\gamma_{jk} + \beta\delta_{jl} + \gamma\delta_{kl} + \alpha\beta\gamma_{ijk} + \alpha\beta\delta_{ijl} + \beta\gamma\delta_{jkl} + \alpha\beta\gamma\delta_{ijkl} + \epsilon_{ijklm}\)

where; \(\mu\): Grand mean

Controllable Error

\(\alpha_i, \beta_j, \gamma_k, \delta_l\): Main effects of factors A, B, C and D respectively

\(\alpha\beta_{ij}, \alpha\gamma_{ik}, \alpha\delta_{il}, \beta\gamma_{jk}, \beta\delta_{jl}, \gamma\delta_{kl}, \alpha\beta\gamma_{ijk}, \alpha\beta\delta_{ijl}, \beta\gamma\delta_{jkl}, \alpha\beta\gamma\delta_{ijkl}\): This are the two factor, three factor and four factor interactions.

Uncontrollable Error

\(\epsilon_{ijklm}\): Random error which is distributed normal (0,1).

Finding the significant factors

For the single replicate of this design we can find the factors and terms which are significantly differing from the Half-Normal Plot.

library(DoE.base)
fmodel<-aov(Obs~A*B*C*D)
coef(fmodel)

## (Intercept)           A           B           C           D         A:B 
##     35.6875     -0.8125      0.0625     -1.1875    -11.5625     -1.1875 
##         A:C         B:C         A:D         B:D         C:D       A:B:C 
##      0.8125     -2.5625     -1.0625     -3.6875     -1.6875     -0.3125 
##       A:B:D       A:C:D       B:C:D     A:B:C:D 
##      2.0625      0.8125      1.9375      1.6875

summary(fmodel)

##             Df Sum Sq Mean Sq
## A            1   10.6    10.6
## B            1    0.1     0.1
## C            1   22.6    22.6
## D            1 2139.1  2139.1
## A:B          1   22.6    22.6
## A:C          1   10.6    10.6
## B:C          1  105.1   105.1
## A:D          1   18.1    18.1
## B:D          1  217.6   217.6
## C:D          1   45.6    45.6
## A:B:C        1    1.6     1.6
## A:B:D        1   68.1    68.1
## A:C:D        1   10.6    10.6
## B:C:D        1   60.1    60.1
## A:B:C:D      1   45.6    45.6

halfnormal(fmodel)

\(\underline{Comments}\):From the half normal plot ,we got only Pin Elevation(D) as the significant factor. This factor are used for the further analysis and rest of the factors are considered as error factors.

d) After using insignificant factors/interactions to create an error term, perform ANOVA to determine a final model equation using an alpha = 0.05.

FINAL ANOVA MODEL.

fmodel2<-aov(Obs~D)
summary(fmodel2)

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## D            1 2139.1  2139.1   46.91 7.94e-06 ***
## Residuals   14  638.4    45.6                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

\(\underline{Comments}\): After removing all insignificant terms from the model and we can say that main effect D is only significantly differing at \(\alpha = 0.05\) as we got \(P-value = 7.94e^{-06}\) .

\(\underline{Conclusion}\): The analysis gave us that, The distance to which the particular type of ball is thrown is depending only on Pin Elevation only. Remaining all the factors are just considered as error factors i.e they are not effecting the distance thrown(insignificant terms or factors).

Soure Code

# Layout generation
library(agricolae)
trts<-c(2,2,2,2)
design.ab(trt=trts,r=1,design="crd",seed=25586)

# Data Entry
library("readxl")
Fdata3<-read_excel("C:/Users/Saipa/OneDrive/Desktop/DOE/Projectdata3.xlsx")
print(Fdata3)
#colnames(Fdata3)<-c("A","B","C","D","Obs")
#print(Fdata3)
A<-Fdata3$`Ball Type`
B<-Fdata3$`Release Angle`
C<-Fdata3$`Bungee Position`
D<-Fdata3$`Pin Elevation`
Obs<-Fdata3$Obs

# Checking For Significant Factors
library(DoE.base)
fmodel<-aov(Obs~A*B*C*D)
coef(fmodel)
summary(fmodel)
halfnormal(fmodel)

# Final Model
fmodel2<-aov(Obs~D)
summary(fmodel2)

DOE Project

Sai Pavan #R11778698,Rishidhar Rao #R11842808, Hemanth Reddy #R11842807

2022-12-04

The Project was performed in three parts.

Part 01: By varying the Type of Balls, and keeping the Pin Elevation and Bungee Position fixed, and Release Angle stretched to \(90^o\).

Part 02: By considering the Pin Elevation and Release Angle as variables, with a red ball when the Bungee Position is fixed.

Part 03: By considering all factors from the Statapult as variables, with different types of balls.

Part - 01

Perform a designed experiment to determine the effect of the type of ball on the distance in which the ball is thrown.

The Pin Elevation and Bungee Position should both be at their fourth setting, i.e., highest setting. The Release Angle should be at \(90^o\), with the arm pulled fully back before releasing. To test this hypothesis, we wish to use a completely randomized design with an \(\alpha\) around 0.05.

a) Determine how many samples should be collected to detect a mean difference with a large effect (i.e. 90% of the standard deviation) and a pattern of maximum variability with a probability of 55%.

The sample size determination involves various Arguments:

For the given parameters.

Sample size determination

b) Proposing a layout using the number of samples with randomized run order.

Arguments of the layout generation.

Proposed Layout

Collecting data and record observations on proposed layout in b).

The data is collected according to the pattern of random generated layout in b), following the required conditions of the Pin Elevation and Bungee Position fixed and the Release Angle at \(90^o\).

Drawing the collected data and Formation of data frames.

d) Performing hypothesis test and check residuals. Be sure to comment and take corrective action if necessary.

Hypothesis to be Tested

The hypothesis could be tested using Analysis of Varience (anova) or Non Parmetric test (Kruskal-Wallis rank sum test), but the Non Parametric testing reduces the power of the Hypothesis. Hence ideal option for hypothesis testing is anova.

The anova has few of the strong assumptions about the data which has to be fulfilled before the anova is used to test the hypothesis, if this assumptions are not meet the results of the anova will be effects and may lead to wrong conclusion.

Assumptions of ANOVA.

Intially choosing ANOVA to test the Hypothesis.

First Checking for Normality Assumption

Normal Probability plots

Checking for constant variance

The boxplot is best and easy way to check for the constant varience, by comparing the median and quartile ranges.

Box plot

The above box plot could not give us the favorable results to use anova for Hypothesis testing, hence we will try transforming the data to stabilize the variance using the Boxcox transformation.

Box cox Transformation

The Box cox transformation gives us the \(\lambda\) within the 95% of confidence interval, choosing the \(\lambda\) as -1 at the peak of curve.

Checking for constant variance

Box plo

Boxplot of newly Transformed Data

Non-parametric test

Investigating further about the model adequacy, to check for the outliners using the residuals.

Residual plots

Pairwise comparisons.

Tukey’s Honest Significant Differences

Source Code

Part - 02.

Perform a designed experiment to determine the effect of Pin Elevation and Release Angle on distance in which a red ball is thrown when the Bungee Position is fixed at the second position.

Settings one and three of Pin Elevation should be investigated as a fixed effect, as well as settings of the Release Angle corresponding to 90, 110, and 120 degrees as a random effect. The design should be replicated three times.

a) Proposing a layout with a randomized run order

The factorial desing layout with no source of nuisance.

The Arguments of the layout generation for the factorial design are:

Layout generation

Where;

c) Collect data and record observations on the layout proposed in part (a)

The observations are collected according to the layout generated with different factor level combinations and required number of replications.

Entering the collected data for analysis

b) State model equation with the null and alternative hypotheses to be tested. In addition, state the level of significance that will be used in your analysis.

Model Equation.

\(y_{ijk} = \mu + \alpha_i + \beta_j + \alpha\beta_{ij} + \epsilon_{ijk}\)

Where;

Controllable errors

Uncontrollable errors

The hypothesis is tested for the Controllable errors, so that we could work on it to reduce the error based on the obtained results of analysis.

Hypothesis to be tested:

Factor - 1 : Pin Elevation

Null Hypothesis (\(H_o\)):

Alternate Hypothesis (\(H_a\)):

Factor - 2 : Release Angle.

Null Hypothesis (\(H_o\)):

Alternate Hypothesis (\(H_a\)):

Interaction.

Null Hypothesis (\(H_o\)):

Alternate Hypothesis (\(H_a\)):

Level of significance = 0.05.

The Hypothesis is tested considering the higher order interaction first, then the lower order interactions and the main effects; this gives us the idea that which interactions are affecting the results of ANOVA.

Testing the hypothesis.

Interaction Plots

Now we need to Test for main effects if they make sence or not

Residuals plots

Source Code

Part - 03

a) Propose a data collection layout with a randomized run order.