library(agricolae)
## Warning: package 'agricolae' was built under R version 4.2.2
library(GAD)
## Loading required package: matrixStats
## Loading required package: R.methodsS3
## R.methodsS3 v1.8.2 (2022-06-13 22:00:14 UTC) successfully loaded. See ?R.methodsS3 for help.
library(pwr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:matrixStats':
##
## count
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(DoE.base)
## Warning: package 'DoE.base' was built under R version 4.2.2
## Loading required package: grid
## Loading required package: conf.design
## Registered S3 method overwritten by 'DoE.base':
## method from
## factorize.factor conf.design
##
## Attaching package: 'DoE.base'
## The following objects are masked from 'package:stats':
##
## aov, lm
## The following object is masked from 'package:graphics':
##
## plot.design
## The following object is masked from 'package:base':
##
## lengths
library(MASS)
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
In the first part of the project, we test the significance of the effect of the projectile type on the travel distance. For this, we performed a Power Analysis to calculate the sample size and an ANOVA to test the statistical hypothesis.
pwr.anova.test(k=3,n=NULL,f=0.45*(sqrt(8)/3), sig.level=0.05, power=0.55)
##
## Balanced one-way analysis of variance power calculation
##
## k = 3
## n = 11.35348
## f = 0.4242641
## sig.level = 0.05
## power = 0.55
##
## NOTE: n is number in each group
The power analysis is useful to determine the sample size of an experiment. For a given significance level and power, the latter being the probability of rejecting Ho when it is false, we can ensure that the sample size is large enough for this experiment.
In this study, we considered a power of 55% and a significance level of 5% in a One Factor Experiment with three levels. The result of the power analysis shows that we need at least 12 samples per level (n = 11.35 rounded up) to meet these conditions.
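As a quick check, the required per-group sample size can be extracted from the pwr result and rounded up to the next whole observation (a minimal sketch using the same parameters as above):
# Extract the required n from the power calculation and round it up to the next integer
n.required <- pwr.anova.test(k=3, f=0.45*(sqrt(8)/3), sig.level=0.05, power=0.55)$n
ceiling(n.required)  # 12 observations per level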
In order to ensure an unbiased experiment, we used a Completely Randomized Design; the resulting run order is shown in the table below.
trt <- c("Rock","Tennis Ball","Golf Ball")
design.part1 <- design.crd(trt=trt,r=12,seed=213124)
design.part1$book
## plots r trt
## 1 101 1 Rock
## 2 102 2 Rock
## 3 103 1 Tennis Ball
## 4 104 3 Rock
## 5 105 2 Tennis Ball
## 6 106 1 Golf Ball
## 7 107 3 Tennis Ball
## 8 108 2 Golf Ball
## 9 109 4 Rock
## 10 110 5 Rock
## 11 111 3 Golf Ball
## 12 112 4 Tennis Ball
## 13 113 4 Golf Ball
## 14 114 5 Golf Ball
## 15 115 5 Tennis Ball
## 16 116 6 Rock
## 17 117 6 Golf Ball
## 18 118 7 Golf Ball
## 19 119 8 Golf Ball
## 20 120 9 Golf Ball
## 21 121 6 Tennis Ball
## 22 122 7 Rock
## 23 123 7 Tennis Ball
## 24 124 8 Rock
## 25 125 8 Tennis Ball
## 26 126 10 Golf Ball
## 27 127 11 Golf Ball
## 28 128 9 Rock
## 29 129 10 Rock
## 30 130 11 Rock
## 31 131 9 Tennis Ball
## 32 132 10 Tennis Ball
## 33 133 12 Rock
## 34 134 11 Tennis Ball
## 35 135 12 Tennis Ball
## 36 136 12 Golf Ball
The Statapult was used to throw the three different projectiles according to the following setup:
To collect the data, we pulled the Statapult fully back before releasing. We used a metric tape measure to record the distance traveled by the projectiles, which was read visually by one of the team members. Lastly, the data was collected in the order of the design proposed in Part b.
The measured distances are shown in the table below.
projectiles <- design.part1$book$trt
obs.part1 <- c(79,48,50,60,54,72,49,55,45,52,61,48,49,55,55,44,60,65,60,63,45,57,63,51,59,50,52,58,58,45,65,55,55,52,53,63)
data.part1 <- data.frame(projectiles, obs.part1)
data.part1$projectiles <- as.factor(data.part1$projectiles)
data.part1
## projectiles obs.part1
## 1 Rock 79
## 2 Rock 48
## 3 Tennis Ball 50
## 4 Rock 60
## 5 Tennis Ball 54
## 6 Golf Ball 72
## 7 Tennis Ball 49
## 8 Golf Ball 55
## 9 Rock 45
## 10 Rock 52
## 11 Golf Ball 61
## 12 Tennis Ball 48
## 13 Golf Ball 49
## 14 Golf Ball 55
## 15 Tennis Ball 55
## 16 Rock 44
## 17 Golf Ball 60
## 18 Golf Ball 65
## 19 Golf Ball 60
## 20 Golf Ball 63
## 21 Tennis Ball 45
## 22 Rock 57
## 23 Tennis Ball 63
## 24 Rock 51
## 25 Tennis Ball 59
## 26 Golf Ball 50
## 27 Golf Ball 52
## 28 Rock 58
## 29 Rock 58
## 30 Rock 45
## 31 Tennis Ball 65
## 32 Tennis Ball 55
## 33 Rock 55
## 34 Tennis Ball 52
## 35 Tennis Ball 53
## 36 Golf Ball 63
For the projectiles, we used a scale to weigh each projectile. The results are shown below:
Finally, we want to determine the effect of the projectile on the travel distance. Therefore, the formulated hypotheses compare the mean distance traveled, as follows:
\[ H_o: \mu_1=\mu_2=\mu_3=\mu \\ H_a: At\; least\; one\; \mu_i \neq \mu \]
Since the experiment is a One Factor Design with three levels, an Analysis of Variance (ANOVA) is appropriate to test the data. The assumptions needed to perform an accurate ANOVA are constant variance of the errors, normality of the errors, and independence of the observations.
Therefore, we need to check the residuals of the analysis to ensure that the model is adequate:
model.aov.part1 <- aov(obs.part1~projectiles, data=data.part1)
plot(model.aov.part1,1)
plot(model.aov.part1,2)
From the Residuals vs Fitted plot, we can check that the variance is fairly constant across the levels of the experiment. It is possible to state this by comparing the spread (height) of each column of points in the plot above.
Regarding the Normality Assumption, the second plot suggests that the data is fairly normal, since the observations follow the reference line reasonably well.
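As a complement to these visual checks, formal tests can be run on the same model (a minimal sketch; Bartlett's test for homogeneity of variance and the Shapiro-Wilk test for normality of the residuals, both from base R's stats package):
# Formal checks of the ANOVA assumptions, complementary to the diagnostic plots
bartlett.test(obs.part1 ~ projectiles, data = data.part1)  # H0: equal variances across projectiles
shapiro.test(residuals(model.aov.part1))                   # H0: residuals are normally distributed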
We used boxplots for a deeper analysis of the data. It is now possible to see that the Rock actually has a higher variance, which might be explained by the irregularity of its shape. We can also observe that the mean distance of the Golf Ball is higher than the others, which is expected since it is the lightest projectile.
tennis <- data.part1 %>% filter(projectiles=="Tennis Ball")
rock <- data.part1 %>% filter(projectiles=="Rock")
golf <- data.part1 %>% filter(projectiles=="Golf Ball")
boxplot(tennis$obs.part1, rock$obs.part1, golf$obs.part1,
main="Ball Comparisions",
names=c("Tennis Ball","Rock", "Golf Ball"),
ylab=c("Distance [cm]"))
summary(model.aov.part1)
## Df Sum Sq Mean Sq F value Pr(>F)
## projectiles 2 168.7 84.36 1.458 0.247
## Residuals 33 1908.9 57.85
From the calculated results, the p-value for projectiles shows that we cannot reject Ho, because its value is higher than \(\alpha=0.05\).
Since we could not reject Ho, there is no pairwise comparison to be made (an illustrative sketch of such a comparison is shown below).
The results indicate that there is no significant difference between the three levels of the projectile factor. This means that:
\[ P_{value} = 0.247 > \alpha = 0.05 \] And, therefore, Ho is not rejected.
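Had the null hypothesis been rejected, a pairwise comparison such as Tukey's HSD could have been applied to the fitted model. The line below is only an illustrative sketch and is not part of the analysis:
# Illustrative only: Tukey's HSD pairwise comparison on the Part 1 model
TukeyHSD(model.aov.part1, conf.level = 0.95)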
This experiment has a Two Factor Design. The first factor, the Pin Elevation, has two levels (First Position and Third Position), while the second factor, the Release Angle, has three levels (90, 110, and 120 degrees). Thus, the linear model equation for this experiment is the following:
\[ y_{ijk} = \mu+ \alpha_i+\beta_j+\alpha\beta_{ij}+\epsilon_{ijk} \] Where,
\[ \mu: Overall \; mean \\ \alpha: Pin \; Elevation \; Effect \\ \beta: Release \; Angle \; Effect \\ \alpha\beta: Interaction \; between \; first \; and \; second \; effects \\ \epsilon: Random \; Error \\ i= 1, 2 \; and \; j=1, 2, 3 \]
The hypotheses that we wanted to test with this experiment are: 1) test the interaction between the factors to understand whether there is a significant interaction between them; 2) test each main effect of the factors to understand their effect on the traveled distance. Therefore, the hypotheses can be formulated as follows:
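The formulation below is a sketch based on these two goals, written in the notation of the model above (for the random terms of the mixed model used in Part d, the corresponding null hypotheses are on their variance components):
\[ H_o: \alpha\beta_{ij}=0 \; \forall \; i,j \\ H_a: At\; least\; one\; \alpha\beta_{ij} \neq 0 \\ H_o: \alpha_{i}=0 \; \forall \; i \\ H_a: At\; least\; one\; \alpha_{i} \neq 0 \\ H_o: \beta_{j}=0 \; \forall \; j \\ H_a: At\; least\; one\; \beta_{j} \neq 0 \]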
This experiment is classified as a Three Times Replicated Two Factor Design, with two levels for the first factor and three levels for the second factor. So, to achieve an unbiased design, we used a Completely Randomized Design, as follows.
trt <- c(2,3)
design.part2 <- design.ab(trt,r=3,design="crd",seed=123456789)
design.part2$book
## plots r A B
## 1 101 1 2 1
## 2 102 1 1 2
## 3 103 1 2 3
## 4 104 1 1 1
## 5 105 1 2 2
## 6 106 2 1 1
## 7 107 2 1 2
## 8 108 2 2 2
## 9 109 2 2 3
## 10 110 1 1 3
## 11 111 3 2 3
## 12 112 2 1 3
## 13 113 2 2 1
## 14 114 3 1 1
## 15 115 3 1 3
## 16 116 3 2 2
## 17 117 3 1 2
## 18 118 3 2 1
The data was collected following the order stipulated by the Completely Randomized Design of Part b. The Pin Elevation factor was considered a Fixed Effect and the Release Angle was considered a Random Effect, which gives us a Mixed Effect Model.
The collected data is presented in the table below.
pinElevation <- design.part2$book$A
relAngle <- design.part2$book$B
obs.part2 <- c(51,29,62,25,56,27,34,57,55,29,65,26,49,29,24,54,29,42)
data.part2 <- data.frame(pinElevation, relAngle, obs.part2)
data.part2$pinElevation <- as.fixed(data.part2$pinElevation)
data.part2$relAngle <- as.random(data.part2$relAngle)
data.part2
## pinElevation relAngle obs.part2
## 1 2 1 51
## 2 1 2 29
## 3 2 3 62
## 4 1 1 25
## 5 2 2 56
## 6 1 1 27
## 7 1 2 34
## 8 2 2 57
## 9 2 3 55
## 10 1 3 29
## 11 2 3 65
## 12 1 3 26
## 13 2 1 49
## 14 1 1 29
## 15 1 3 24
## 16 2 2 54
## 17 1 2 29
## 18 2 1 42
Since the experiment is a Two Factor Design, an ANOVA is the most appropriate model to test the hypotheses. Analogously to Part 1, we need to check the same assumptions to ensure that the ANOVA is suitable for this data. So, let’s check the residual plot, the normal Q-Q plot, and the boxplots for constant variance and normality of the data.
model.aov.part2 <- aov(obs.part2~pinElevation+relAngle+pinElevation*relAngle,data=data.part2)
From a visual inspection of the first plot below, it seems that the constant variance assumption is violated, because the spread of the residuals is not constant across the fitted values. Conversely, the data seems fairly normal according to the second plot, since the points approximately follow the reference line.
plot(model.aov.part2,1)
plot(model.aov.part2,2)
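As in Part 1, a formal test can complement the visual check of the constant variance assumption (a minimal sketch; Bartlett's test applied to the six factor-level combinations):
# Bartlett's test across the six Pin Elevation x Release Angle cells (H0: equal variances)
cell <- with(data.part2, interaction(factor(pinElevation), factor(relAngle)))
bartlett.test(data.part2$obs.part2, cell)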
From the boxplots below, we can see the effect of the Pin Elevation factor on the projectile travel distance: the Pin Elevation seems to have a significant effect on the travel distance, in contrast to the Release Angle, which does not show such a clear effect in the plots.
Also, regarding the variance of the data, the heights of the boxes differ, which reinforces the idea that the Constant Variance Assumption might be violated.
factor11 <- data.part2%>%filter(pinElevation==1)
factor12 <- data.part2%>%filter(pinElevation==2)
boxplot(factor11$obs.part2,factor12$obs.part2,
main="Pin Elevation", names=c("Pin Elevation 1","Pin Elevation 3"),
ylab="Distance [cm]")
factor21 <- data.part2%>%filter(relAngle==1)
factor22 <- data.part2%>%filter(relAngle==2)
factor23 <- data.part2%>%filter(relAngle==3)
boxplot(factor21$obs.part2,factor22$obs.part2,factor23$obs.part2,
main="Release Angle", names=c("90 Degrees","110 Degrees","120 Degrees"),
ylab="Distance [cm]")
In our hypothesis test, we checked that the interaction between the two factors is significant, since the calculated p-value is less than \(\alpha\). This means that we can reject Ho based on the results of our model. However, the model used does not seem to be adequate for this data, because the strongest assumption, constant variance, is violated.
gad(model.aov.part2)
## Analysis of Variance Table
##
## Response: obs.part2
## Df Sum Sq Mean Sq F value Pr(>F)
## pinElevation 1 3173.4 3173.4 41.6334 0.02319 *
## relAngle 2 152.4 76.2 6.5646 0.01186 *
## pinElevation:relAngle 2 152.4 76.2 6.5646 0.01186 *
## Residual 12 139.3 11.6
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
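Because this is a Mixed Effect Model, gad() derives the denominator of each F-ratio from the expected mean squares rather than always using the residual mean square. These denominators can be inspected with GAD's estimates() function (a minimal sketch):
# Inspect the mean squares GAD uses as denominators for each F test in the mixed model
estimates(model.aov.part2)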
For the \(2^4\) factorial, we considered a Completely Randomized Design in order to ensure that the data collection has no bias. The code below shows the proposed design for one replicate of this experiment. The factors are the following:
trt <- c(2,2,2,2)
design <- design.ab(trt, r=1,design="crd",seed=45285)
design$book
## plots r A B C D
## 1 101 1 2 1 2 2
## 2 102 1 1 2 2 2
## 3 103 1 1 1 1 2
## 4 104 1 1 2 1 2
## 5 105 1 2 2 2 2
## 6 106 1 2 2 1 1
## 7 107 1 2 1 2 1
## 8 108 1 2 2 1 2
## 9 109 1 2 1 1 1
## 10 110 1 2 1 1 2
## 11 111 1 1 2 1 1
## 12 112 1 1 2 2 1
## 13 113 1 1 1 2 1
## 14 114 1 1 1 1 1
## 15 115 1 2 2 2 1
## 16 116 1 1 1 2 2
After collecting the data according to the proposed design, we obtained the following results:
A <- c(1,1,-1,-1,1,-1,1,-1,-1,-1,-1,1,1,-1,1,1)
B <- c(1,-1,-1,-1,1,1,1,1,1,1,-1,-1,-1,-1,1,-1)
C <- c(-1,1,-1,1,1,1,-1,1,-1,-1,1,1,-1,-1,1,-1)
D <- c(1,1,1,1,1,-1,-1,1,-1,1,-1,-1,-1,-1,-1,1)
obs.part3 <- c(46,19,15,21,39,37,36,42,44,39,24,23,16,15,29,8)
data.part3 <- data.frame(A,B,C,D,obs.part3)
The full model equation for this experiment is the following:
\[ y_{ijklm}=\mu+\alpha_i+\beta_j+\gamma_k+\eta_l+\alpha\beta_{ij}+\alpha\gamma_{ik}+\alpha\eta_{il}+\beta\gamma_{jk}+\beta\eta_{jl}+\gamma\eta_{kl}+\alpha\beta\gamma_{ijk}+\alpha\beta\eta_{ijl}+\alpha\gamma\eta_{ikl}+\beta\gamma\eta_{jkl}+\alpha\beta\gamma\eta_{ijkl}+\epsilon_{ijklm} \]
The hypothesis considering a four-term interaction for a \(2^4\) is the following:
\[ H_o: \alpha\beta\gamma\eta_{ijkl}=0 \\ H_a: \alpha\beta\gamma\eta_{ijkl}\neq0 \]
Where \(\alpha\) is the main effect of factor A, \(\beta\) is the main effect of factor B, \(\gamma\) is the main effect of factor C, \(\eta\) is the main effect of factor D, and their products represent the interactions between factors.
model.lm.part3 <- lm(obs.part3~A*B*C*D)
halfnormal(model.lm.part3)
##
## Significant effects (alpha=0.05, Lenth method):
## [1] B B:C
According to the half-normal plot, only factor B and the B:C interaction are flagged as significant; factor C is kept in the reduced model to preserve hierarchy. Therefore, the linear model equation and the hypotheses to be tested need to change.
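The Lenth-method conclusion can be cross-checked by looking at the estimated factorial effects directly (a minimal sketch; with the -1/+1 coding used above, each effect is twice the corresponding regression coefficient):
# Estimated factorial effects: twice the regression coefficients in the -1/+1 coding
effects.part3 <- 2*coef(model.lm.part3)[-1]   # drop the intercept
sort(abs(effects.part3), decreasing = TRUE)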
The following model equation and hypotheses represent only the significant factors, where the remaining factors were pooled into the random error term, since they are not significant.
\[ y_{jkm}=\mu+\beta_j+\gamma_k+\beta\gamma_{jk}+\epsilon_{jkm} \]
\[ H_o: \beta\gamma_{jk}=0\\H_a: \beta\gamma_{jk}\neq0 \\ H_o: \beta_{j}=0\\H_a: \beta_{j}\neq0 \\ H_o: \gamma_{k}=0\\H_a: \gamma_{k}\neq0 \]
Finally, to test the hypotheses considering only the significant factors, we perform an ANOVA test, since the experiment was reduced to a \(2^2\) design (with four replicates per factor combination).
model.aov.part3 <- aov(obs.part3~B+C+B*C,data=data.part3)
summary(model.aov.part3)
## Df Sum Sq Mean Sq F value Pr(>F)
## B 1 1827.6 1827.6 103.814 2.92e-07 ***
## C 1 14.1 14.1 0.799 0.3890
## B:C 1 162.6 162.6 9.234 0.0103 *
## Residuals 12 211.3 17.6
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Even though the result shows that we can reject \(H_o\), we still need to check the same ANOVA assumptions that we checked in the previous parts. Thus, to check the constant variance we will use the Residuals vs Fitted plot, and to check the normality, which is worth remembering is not the strongest assumption, we will use the Normal Q-Q plot.
plot(model.aov.part3,1)
plot(model.aov.part3,2)
It is possible to confirm that the strongest assumption, constant variance, might have been violated. Therefore, a data transformation is needed before testing the hypotheses. First, we tried the log transformation to check whether the assumption was still violated.
logobs.part3 <- log(obs.part3)
data.part3.transf <- data.frame(A,B,C,D,logobs.part3)
model.aov.part3.transf <- aov(logobs.part3~B+C+B*C,data=data.part3.transf)
plot(model.aov.part3.transf,1)
It is possible to see from the residual plot of the log-transformed model that the constant variance assumption is now met, and, since normality is not the strongest assumption, we could use that model to check the hypotheses. In addition, we ran a Box-Cox analysis, which suggested a power transformation of approximately 0.8; the model below uses this transformed response.
boxcox(obs.part3~B+C+B*C,data=data.part3)
obs.part3.box <- obs.part3^0.8
data.part3.box <- data.frame(A,B,C,D,obs.part3.box)
model.aov.part3.box <- aov(obs.part3.box~B+C+B*C,data=data.part3.box)
plot(model.aov.part3.box,1)
plot(model.aov.part3.box,2)
summary(model.aov.part3.box)
## Df Sum Sq Mean Sq F value Pr(>F)
## B 1 313.85 313.85 104.347 2.84e-07 ***
## C 1 4.08 4.08 1.355 0.26704
## B:C 1 30.27 30.27 10.064 0.00803 **
## Residuals 12 36.09 3.01
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
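The exponent 0.8 used above was presumably chosen from the Box-Cox plot; the \(\lambda\) that maximizes the log-likelihood profile can also be extracted programmatically (a minimal sketch re-running MASS::boxcox without the plot):
# Re-run the Box-Cox profile without plotting and extract the lambda with maximum log-likelihood
bc <- boxcox(obs.part3~B+C+B*C, data=data.part3, plotit=FALSE)
bc$x[which.max(bc$y)]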
The final conclusion for the third part of the project is that the interaction between the Pin Elevation and the Bungee Position is highly significant for this data, and, therefore, we can reject Ho.
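Since the B:C interaction drives this conclusion, an interaction plot of the transformed response can help visualize it (a minimal sketch; the generic labels B and C follow the coding used above):
# Visualize the significant B:C interaction on the transformed response
with(data.part3.box,
     interaction.plot(x.factor = factor(B), trace.factor = factor(C),
                      response = obs.part3.box, trace.label = "Factor C",
                      xlab = "Factor B level", ylab = "Mean transformed distance"))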