Libraries

library(agricolae)
## Warning: package 'agricolae' was built under R version 4.2.2
library(GAD)
## Loading required package: matrixStats
## Loading required package: R.methodsS3
## R.methodsS3 v1.8.2 (2022-06-13 22:00:14 UTC) successfully loaded. See ?R.methodsS3 for help.
library(pwr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following object is masked from 'package:matrixStats':
## 
##     count
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(DoE.base)
## Warning: package 'DoE.base' was built under R version 4.2.2
## Loading required package: grid
## Loading required package: conf.design
## Registered S3 method overwritten by 'DoE.base':
##   method           from       
##   factorize.factor conf.design
## 
## Attaching package: 'DoE.base'
## The following objects are masked from 'package:stats':
## 
##     aov, lm
## The following object is masked from 'package:graphics':
## 
##     plot.design
## The following object is masked from 'package:base':
## 
##     lengths
library(MASS)
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
## 
##     select

Part I

In the first part of the project we are testing the significance of the effect of the projectile on the travel distance. For this, we performed a Power Analysis to calculate the sample size and an ANOVA analysis in order to test he Statistical Hypothesis.

Part a - Power Analysis

pwr.anova.test(k=3,n=NULL,f=0.45*(sqrt(8)/3), sig.level=0.05, power=0.55)
## 
##      Balanced one-way analysis of variance power calculation 
## 
##               k = 3
##               n = 11.35348
##               f = 0.4242641
##       sig.level = 0.05
##           power = 0.55
## 
## NOTE: n is number in each group

The power analysis is useful to the determine the sample size of an experiment. For given significance level and power, which correspond to the probability of rejecting Ho this being false, we are able to ensure that the sample size is large enough for this experiment.

In this study, we considered a power of 55% and a significance level of 95% in an One Factor Experiment with three levels. Thus, the result of the power analysis has shown that we need at least 12 samples per level, to ensure our parameters conditions.

Part b - Proposed Layout

In order to ensure a non-biased experiment, we used a Complete Randomized Design and result is shown in the table below.

trt <- c("Rock","Tennis Ball","Golf Ball")
design.part1 <- design.crd(trt=trt,r=12,seed=213124)
design.part1$book
##    plots  r         trt
## 1    101  1        Rock
## 2    102  2        Rock
## 3    103  1 Tennis Ball
## 4    104  3        Rock
## 5    105  2 Tennis Ball
## 6    106  1   Golf Ball
## 7    107  3 Tennis Ball
## 8    108  2   Golf Ball
## 9    109  4        Rock
## 10   110  5        Rock
## 11   111  3   Golf Ball
## 12   112  4 Tennis Ball
## 13   113  4   Golf Ball
## 14   114  5   Golf Ball
## 15   115  5 Tennis Ball
## 16   116  6        Rock
## 17   117  6   Golf Ball
## 18   118  7   Golf Ball
## 19   119  8   Golf Ball
## 20   120  9   Golf Ball
## 21   121  6 Tennis Ball
## 22   122  7        Rock
## 23   123  7 Tennis Ball
## 24   124  8        Rock
## 25   125  8 Tennis Ball
## 26   126 10   Golf Ball
## 27   127 11   Golf Ball
## 28   128  9        Rock
## 29   129 10        Rock
## 30   130 11        Rock
## 31   131  9 Tennis Ball
## 32   132 10 Tennis Ball
## 33   133 12        Rock
## 34   134 11 Tennis Ball
## 35   135 12 Tennis Ball
## 36   136 12   Golf Ball

Part c - Data Collection

The Statapult was used to throw the three different projectiles according to following setup:

  1. Pin Elevation: 4th setting (highest setting)
  2. Bungee Position: 4th setting (highest setting)
  3. Release Angle: 90 degrees

To collect the data we pulled the Statapult fully back before releasing. Also, we have used a metric tape to measure the distance traveled by the projectiles, which was estimated visually by one of the team members. Lastly, the data was collected in the order of the proposed design in Part b.

The measured distances are shown in the table bellow.

projectiles <- design.part1$book$trt
obs.part1 <- c(79,48,50,60,54,72,49,55,45,52,61,48,49,55,55,44,60,65,60,63,45,57,63,51,59,50,52,58,58,45,65,55,55,52,53,63)

data.part1 <- data.frame(projectiles, obs.part1)

data.part1$projectiles <- as.factor(data.part1$projectiles)

data.part1
##    projectiles obs.part1
## 1         Rock        79
## 2         Rock        48
## 3  Tennis Ball        50
## 4         Rock        60
## 5  Tennis Ball        54
## 6    Golf Ball        72
## 7  Tennis Ball        49
## 8    Golf Ball        55
## 9         Rock        45
## 10        Rock        52
## 11   Golf Ball        61
## 12 Tennis Ball        48
## 13   Golf Ball        49
## 14   Golf Ball        55
## 15 Tennis Ball        55
## 16        Rock        44
## 17   Golf Ball        60
## 18   Golf Ball        65
## 19   Golf Ball        60
## 20   Golf Ball        63
## 21 Tennis Ball        45
## 22        Rock        57
## 23 Tennis Ball        63
## 24        Rock        51
## 25 Tennis Ball        59
## 26   Golf Ball        50
## 27   Golf Ball        52
## 28        Rock        58
## 29        Rock        58
## 30        Rock        45
## 31 Tennis Ball        65
## 32 Tennis Ball        55
## 33        Rock        55
## 34 Tennis Ball        52
## 35 Tennis Ball        53
## 36   Golf Ball        63

For the Projectiles, we have used a Scale to weigh of each Projectile. The results are shown below:

  1. Tennis Ball: 16 g
  2. Rock: 14 g
  3. Golf Ball: 10 g

Part d - Hyphotesis Test

Finally, we were trying to determine the effect of the ball in the travel distance. In conclusion, the formulated hypothesis should compare the mean of the distance traveled, as it follows:

\[ H_o: \mu_1=\mu_2=\mu_3=\mu \\ H_a: At\; least\; one\; \mu_i \neq \mu \]

Since the experiment requires a One Factor with Three Levels, an Analysis of Variance (ANOVA) was appropriated to test the data. The assumptions needed to perform an accurate ANOVA analysis are:

  1. Constant Variance
  2. Data must be normal

Therefore, we need to check the residuals of the analysis to ensure that model is adequate:

model.aov.part1 <- aov(obs.part1~projectiles, data=data.part1)
plot(model.aov.part1,1)

plot(model.aov.part1,2)

From the Scatter Residuals Plot, we can check that the variance is fairly constant across the levels of the experiment. It possible to state this by analyzing the height of each scatter column, as shown in the plot above.

Regarding the Normality Assumption, the second plot suggests that the data is fairly normal, since the data observations seem to follow the line.

We have used the boxplot to have a deeper analysis of the data. It possible, now, to undertand that the Rock actually has a higher variance, which might explained by the irregularity of its shape. Also, we can observe that the mean distance of the golf ball is higher than the others, which is expected as it is the lighter projectile.

tennis <- data.part1 %>% filter(projectiles=="Tennis Ball")
rock <- data.part1 %>% filter(projectiles=="Rock")
golf <- data.part1 %>% filter(projectiles=="Golf Ball")

boxplot(tennis$obs.part1, rock$obs.part1, golf$obs.part1,
        main="Ball Comparisions", 
        names=c("Tennis Ball","Rock", "Golf Ball"),
        ylab=c("Distance [cm]"))

summary(model.aov.part1)
##             Df Sum Sq Mean Sq F value Pr(>F)
## projectiles  2  168.7   84.36   1.458  0.247
## Residuals   33 1908.9   57.85

From the calculated results, the p-value for projectiles shows that we can not reject Ho, because its value is higher than \(\alpha=0.05\).

Part e - Pairwise Comparisions

Since we could not reject Ho, there is no pairwise comparision to be made.

Part f - Conclusions

The results pointed that there is no significant difference between the three levels of the projectile factor. This means that:

\[ P_{value} = 0.247 > \alpha = 0.05 \] And, therefore, Ho is not reject.

Part II

Part a - Model Equation and Hyphotesis

This experiment has a 2 Factor Design. The first factor, the pin elevation, has two levels, First Position and Third Position, while the second factor (Release Angle) has three levels, 90, 110, and 120 degrees. Thus, the Linear Model equation for this experiment is the following:

\[ y_{ijk} = \mu+ \alpha_i+\beta_j+\alpha\beta_{ij}+\epsilon_{ijk} \] Where,

\[ \mu: Overall \; mean \\ \alpha: Pin \; Elevation \; Effect \\ \beta: Release \; Angle \; Effect \\ \alpha\beta: Interaction \; between \; first \; and \; second \; effects \\ \epsilon: Random \; Error \\ i= 1, 2 \; and \; j=1, 2, 3 \]

The hypothesis that we wanted to test with this experiment are 1) Test the interaction between the factors to understand if there is a significant interaction between them; 2) Test each Main Effect of the factors to understand their effect on their traveled distance. Therefore, the hypothesis can be formulated as follows:

  1. Interaction Hypothesis \[ H_o: \alpha\beta_{ij} = 0 \\ H_{ao}: \alpha\beta_{ij} \neq 0 \]
  2. Main Effect Hypothesis \[ H_1: \alpha_{i}=0\\H_{a1}:\alpha_i\neq0 \\ \\ H_2: \beta_j=0\\H_{a2}:\beta_j\neq0 \]

Part b - Proposed Layout

This experiment is classified as Three Times Replicated Two Factor Design, with 2 levels for the first factor and three levels for the second factor. So, to achieve a non-biased design, we used a Completed Randomized Design, as follows.

trt <- c(2,3)
design.part2 <- design.ab(trt,r=3,design="crd",seed=123456789)

design.part2$book
##    plots r A B
## 1    101 1 2 1
## 2    102 1 1 2
## 3    103 1 2 3
## 4    104 1 1 1
## 5    105 1 2 2
## 6    106 2 1 1
## 7    107 2 1 2
## 8    108 2 2 2
## 9    109 2 2 3
## 10   110 1 1 3
## 11   111 3 2 3
## 12   112 2 1 3
## 13   113 2 2 1
## 14   114 3 1 1
## 15   115 3 1 3
## 16   116 3 2 2
## 17   117 3 1 2
## 18   118 3 2 1

Part c - Data Collection

The data was collected obeying the stipulated order from the Completed Randomized Design of Part b. The Pin Elevation factor was considered as Fixed Effect and the Release Angle was considered as Random Effect, which gave us a Mixed Effect Model.

The collected data is presented in the table below.

pinElevation <- design.part2$book$A
relAngle <- design.part2$book$B
obs.part2 <- c(51,29,62,25,56,27,34,57,55,29,65,26,49,29,24,54,29,42)

data.part2 <- data.frame(pinElevation, relAngle, obs.part2)

data.part2$pinElevation <- as.fixed(data.part2$pinElevation)
data.part2$relAngle <- as.random(data.part2$relAngle)

data.part2
##    pinElevation relAngle obs.part2
## 1             2        1        51
## 2             1        2        29
## 3             2        3        62
## 4             1        1        25
## 5             2        2        56
## 6             1        1        27
## 7             1        2        34
## 8             2        2        57
## 9             2        3        55
## 10            1        3        29
## 11            2        3        65
## 12            1        3        26
## 13            2        1        49
## 14            1        1        29
## 15            1        3        24
## 16            2        2        54
## 17            1        2        29
## 18            2        1        42

Part d - Hyphotesis Test and Conclusion

Since the Experiment is a Two Factor Design, an ANOVA is the most appropriate model to test the Hypothesis. Analogally to Part 1, we need to test the same Assumptions to ensure that the ANOVA is suitable for this data. So, let’s check the Residual, the Normality and boxplot graphs for the constant variance and normality of the data.

model.aov.part2 <- aov(obs.part2~pinElevation+relAngle+pinElevation*relAngle,data=data.part2)

From a visual inspection of the plot below, it seems that the constant variance assumption is violated, because the height of the scatters is not constant. Conversely, the data seems to be fairly normal, according to the second plot below. The data follows approximately a normal line.

plot(model.aov.part2,1)

plot(model.aov.part2,2)

From the boxplots below, we can see the effect of the pinelevation factor on the projectile travel distance. the pin elevation seems to have a significant effect on the travel distance. In contrast to the Release Angle, which has not shown such significance from the plots.

Also, regarding the variance of the data, the height of the boxes differ, which reinforces the idea that the Constant Variance Assumption might be violated.

factor11 <- data.part2%>%filter(pinElevation==1)
factor12 <- data.part2%>%filter(pinElevation==2)

boxplot(factor11$obs.part2,factor12$obs.part2,
        main="Pin Elevation", names=c("Pin Elevation 1","Pin Elevation 3"),
        ylab="Distance [cm]")

factor21 <- data.part2%>%filter(relAngle==1)
factor22 <- data.part2%>%filter(relAngle==2)
factor23 <- data.part2%>%filter(relAngle==3)

boxplot(factor21$obs.part2,factor22$obs.part2,factor23$obs.part2,
        main="Release Angle", names=c("90 Degrees","110 Degrees","120 Degrees"),
        ylab="Distance [cm]")

In our hypothesis test, we were to check that the interaction between the two factors is significant, since the calculated p-value is lesser than \(\alpha\). This means that we can reject Ho based on the results of our model. However, the model used does not seem to be adequate for this data, because the strongest assumption is violated.

gad(model.aov.part2)
## Analysis of Variance Table
## 
## Response: obs.part2
##                       Df Sum Sq Mean Sq F value  Pr(>F)  
## pinElevation           1 3173.4  3173.4 41.6334 0.02319 *
## relAngle               2  152.4    76.2  6.5646 0.01186 *
## pinElevation:relAngle  2  152.4    76.2  6.5646 0.01186 *
## Residual              12  139.3    11.6                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Part III

Part a - Proposed Layout

For the \(2^4\) factorial we considered a Completed Randomized Design, in order to ensure that the data collection has no bias. The code below shows the proposed design for one replica of this experiment. The factors are following:

  1. A: Angle Release, where -1 is 90 degrees and 1 is 110 degrees;
  2. B: Pin Elevation, where -1 is Position 1 and 1 is Position 3;
  3. C: Bungees Position, where -1 is Bungee Posiion 1 and 1 in Bungee Position 2;
  4. D: Ball Type, where -1 is the yellow ball and 1 is the red ball.
trt <- c(2,2,2,2)

design <- design.ab(trt, r=1,design="crd",seed=45285)
design$book
##    plots r A B C D
## 1    101 1 2 1 2 2
## 2    102 1 1 2 2 2
## 3    103 1 1 1 1 2
## 4    104 1 1 2 1 2
## 5    105 1 2 2 2 2
## 6    106 1 2 2 1 1
## 7    107 1 2 1 2 1
## 8    108 1 2 2 1 2
## 9    109 1 2 1 1 1
## 10   110 1 2 1 1 2
## 11   111 1 1 2 1 1
## 12   112 1 1 2 2 1
## 13   113 1 1 1 2 1
## 14   114 1 1 1 1 1
## 15   115 1 2 2 2 1
## 16   116 1 1 1 2 2

Part b - Data Collection

After collecting the data according to the design that was proposed, the data collection showed the following results:

A <- c(1,1,-1,-1,1,-1,1,-1,-1,-1,-1,1,1,-1,1,1)
B <- c(1,-1,-1,-1,1,1,1,1,1,1,-1,-1,-1,-1,1,-1)
C <- c(-1,1,-1,1,1,1,-1,1,-1,-1,1,1,-1,-1,1,-1)
D <- c(1,1,1,1,1,-1,-1,1,-1,1,-1,-1,-1,-1,-1,1)
obs.part3 <- c(46,19,15,21,39,37,36,42,44,39,24,23,16,15,29,8)

data.part3 <- data.frame(A,B,C,D,obs.part3)

Part c - Model Equation and Hyphotesis Statement

The full model equation for this experiment is the following:

\[ y_{ijklm}=\mu+\alpha_i+\beta_j+\gamma_k+\eta_l+\alpha\beta_{ij}+\alpha\gamma_{ik}+\alpha\eta_{il}+\beta\gamma_{jk}+\beta\eta_{jl}+\gamma\eta_{kl}+\alpha\beta\gamma_{ijk}+\alpha\beta\eta_{ijl}+\alpha\gamma\eta_{ikl}+\beta\gamma\eta_{jkl}+\alpha\beta\gamma\eta_{ijkl}+\epsilon_{ijklm} \]

The hypothesis considering a four-term interaction for a \(2^4\) is the following:

\[ H_o: \alpha\beta\gamma\eta_{ijkl}=0 \\ H_a: \alpha\beta\gamma\eta_{ijkl}\neq0 \]

Where \(\alpha\) is the main effect of A factor ,\(\beta\) is the main effect of B factor, \(\gamma\) is the mais effect of C factor, \(\eta\) is the main effect of D factor ,and their multiplication represents the interaction between factors.

model.lm.part3 <- lm(obs.part3~A*B*C*D)
halfnormal(model.lm.part3)
## 
## Significant effects (alpha=0.05, Lenth method):
## [1] B   B:C

According to the halfnormal plot, only the factor B, C, and BC are significant for the data. Therefore, the linear model equation and the hypothesis to be tested need to change.

Part d - Hyphotesis Test and Conclusion

The following model equation and hypothesis represents only the significant factors for the data, where the rest of the factors were merged with the random error term, since they are not significant.

\[ y_{ijk}=\mu+\beta_j+\gamma_k+\beta\gamma_{jk}+\epsilon_{jkm} \]

\[ H_o: \beta\gamma_{jk}=0\\H_a: \beta\gamma_{jk}\neq0 \\ H_o: \beta_{j}=0\\H_a: \beta_{j}\neq0 \\ H_o: \gamma_{k}=0\\H_a: \gamma_{k}\neq0 \]

Finally, to test the hypothesis considering only the significant factors, we need to perform an ANOVA test since it was reduced to a \(2^2\) design.

model.aov.part3 <- aov(obs.part3~B+C+B*C,data=data.part3)
summary(model.aov.part3)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## B            1 1827.6  1827.6 103.814 2.92e-07 ***
## C            1   14.1    14.1   0.799   0.3890    
## B:C          1  162.6   162.6   9.234   0.0103 *  
## Residuals   12  211.3    17.6                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Even though the result shows that we can reject \(H_o\), we still need to check the same ANOVA assumptions that we checked in the prvious parts. Thus, to check the Constant Variance, we will the Scatter of the Residuals, and to check the normality, which is worth to remember that it is not a strong assumption, we will check QQ Normal Plot.

plot(model.aov.part3,1)

plot(model.aov.part3,2)

It is possible to confirm that the strongest assumption, the Constant Variance, might have been violated. Therefore, a data transformation is needed before testing the Hypothesis. First, we tried the log transformation to check if the assumption was not vaiolated anymore.

logobs.part3 <- log(obs.part3)

data.part3.transf <- data.frame(A,B,C,D,logobs.part3)

model.aov.part3.transf <- aov(logobs.part3~B+C+B*C,data=data.part3.transf)
plot(model.aov.part3.transf,1)

It is possible to see, from the plot bellow that the constant variance is met now, and, since the normality is not a strong assumption, we can still use the following model to check the hypothesis.

boxcox(obs.part3~B+C+B*C,data=data.part3)

obs.part3.box <- obs.part3^0.8
data.part3.box <- data.frame(A,B,C,D,obs.part3.box)

model.aov.part3.box <- aov(obs.part3.box~B+C+B*C,data=data.part3.box)
plot(model.aov.part3.box,1)

plot(model.aov.part3.box,2)

summary(model.aov.part3.box)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## B            1 313.85  313.85 104.347 2.84e-07 ***
## C            1   4.08    4.08   1.355  0.26704    
## B:C          1  30.27   30.27  10.064  0.00803 ** 
## Residuals   12  36.09    3.01                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The final conclusion for the third part of the project is that the interaction between the Pin Elevation and the Bungee Position are extremely significant for the data, and, therefore, we can reject Ho