Project Part 1

Q1: Determine how many samples should be collected to detect a mean difference with a medium effect (i.e. 50% of the standard deviation) with a probability of 75%.

library(pwr)

pwr.anova.test(k=3,n=NULL,f=0.5,sig.level=0.05, power=0.75)
## 
##      Balanced one-way analysis of variance power calculation 
## 
##               k = 3
##               n = 12.50714
##               f = 0.5
##       sig.level = 0.05
##           power = 0.75
## 
## NOTE: n is number in each group

Ans: With three levels (k = 3) and alpha = 0.05, the calculation returns n = 12.50714 per group. Rounding up, we need at least 13 samples per group, or 39 runs in total.
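The rounding step can be checked directly. The sketch below uses only base R and the standard noncentral F formulation of one-way ANOVA power (noncentrality k·n·f²), which should agree with the pwr result above. The helper name anova_power is an assumption made for this illustration and is not part of the pwr package.

```r
# Power of a balanced one-way ANOVA via the noncentral F distribution.
# anova_power() is a helper written for this sketch, not a pwr function.
anova_power <- function(n, k, f, sig.level = 0.05) {
  df1 <- k - 1
  df2 <- k * (n - 1)
  ncp <- k * n * f^2                      # noncentrality parameter
  f_crit <- qf(1 - sig.level, df1, df2)   # rejection threshold under H0
  pf(f_crit, df1, df2, ncp = ncp, lower.tail = FALSE)
}

# Smallest whole number of samples per group that reaches 75% power
candidates <- 2:100
powers <- sapply(candidates, anova_power, k = 3, f = 0.5)
n_needed <- candidates[which(powers >= 0.75)[1]]
n_needed
anova_power(n_needed, k = 3, f = 0.5)   # achieved power at the rounded-up n
```

Because n must be a whole number, the achieved power at the rounded-up sample size is slightly above the 75% target.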

————————————————————————————————

Q2: Propose a layout using the number of samples from part (a) with randomized run order.

library(agricolae)

#CRD experimental design (completely randomized design); the order of the replicates is randomized as well.
#Specify the I levels of a factor/treatment in a vector
trt1<-c("1","2","3") 
#Can set different random number seeds 
#Specify the J number of replications (r=13)
design<-design.crd(trt=trt1,r=13,seed=8373)
design$book
##    plots  r trt1
## 1    101  1    1
## 2    102  2    1
## 3    103  1    3
## 4    104  2    3
## 5    105  3    1
## 6    106  3    3
## 7    107  1    2
## 8    108  2    2
## 9    109  4    3
## 10   110  5    3
## 11   111  4    1
## 12   112  3    2
## 13   113  4    2
## 14   114  5    1
## 15   115  5    2
## 16   116  6    3
## 17   117  6    1
## 18   118  6    2
## 19   119  7    1
## 20   120  7    2
## 21   121  7    3
## 22   122  8    2
## 23   123  8    3
## 24   124  8    1
## 25   125  9    3
## 26   126  9    2
## 27   127 10    2
## 28   128  9    1
## 29   129 10    1
## 30   130 11    1
## 31   131 12    1
## 32   132 11    2
## 33   133 12    2
## 34   134 10    3
## 35   135 11    3
## 36   136 13    2
## 37   137 12    3
## 38   138 13    3
## 39   139 13    1

Ans: The layout is shown above. “1” denotes the largest ball, “2” the medium-sized ball, and “3” the smallest ball. The trt1 column gives the ball size to launch on each run, in plot order.
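For reference, a completely randomized run order can also be sketched in base R by shuffling the 39 treatment labels. This is an illustration only: the names sizes, run_order and layout are assumptions, and base R will not reproduce the exact order that agricolae generated, even with the same seed value.

```r
set.seed(8373)  # same seed value as above; the resulting order differs from agricolae's
sizes <- c("1", "2", "3")
r <- 13                                      # replicates per ball size
run_order <- sample(rep(sizes, each = r))    # shuffle all 39 treatment labels
layout <- data.frame(run = seq_along(run_order), size = run_order)
head(layout)
table(layout$size)                           # each ball size should appear r times
```

Shuffling the full vector of labels keeps the design balanced while leaving the run order entirely to chance.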

————————————————————————————————

Q3: Collect data and record observations on layout proposed in part (b).

getwd() # get the working directory.
## [1] "C:/Users/jiachliu/Desktop/IE5344_Martis/Project"
setwd("C:/Users/jiachliu/Desktop/IE5344_Martis/Project")
clm <- read.csv("dat1.csv", header=FALSE)
str(clm)
## 'data.frame':    39 obs. of  2 variables:
##  $ V1: chr  "81" "72" "80" "75" ...
##  $ V2: int  1 1 3 3 1 3 2 2 3 3 ...
# V1 was read as character, so the first cell is re-entered by hand before converting to numeric
clm[1,1] <- 81
clm$V1 <- as.numeric(clm$V1)
clm$V2 <- as.factor(clm$V2)
str(clm)
## 'data.frame':    39 obs. of  2 variables:
##  $ V1: num  81 72 80 75 84 72 75 78 74 73 ...
##  $ V2: Factor w/ 3 levels "1","2","3": 1 1 3 3 1 3 2 2 3 3 ...
colnames(clm)<-c("results","size")
head(clm)
##   results size
## 1      81    1
## 2      72    1
## 3      80    3
## 4      75    3
## 5      84    1
## 6      72    3

Ans: We recorded the experimental data in Excel, imported the CSV file into R, and cleaned the data for later use.
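The import and clean-up steps can be condensed by naming the columns at read time and checking the result before analysis. The sketch below inlines only the first six observations shown by head(clm) so that it runs without dat1.csv; the names csv_text and clm_demo are assumptions made for this illustration.

```r
# First six observations from the output above, inlined so the sketch needs no file
csv_text <- "81,1\n72,1\n80,3\n75,3\n84,1\n72,3"
clm_demo <- read.csv(text = csv_text, header = FALSE,
                     col.names = c("results", "size"))
clm_demo$size <- factor(clm_demo$size, levels = c(1, 2, 3))

# Basic checks before analysis: numeric response, no missing values
stopifnot(is.numeric(clm_demo$results), !anyNA(clm_demo$results))
table(clm_demo$size)   # number of runs recorded per ball size
```

With the full file, the same checks would confirm 13 runs for each ball size.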

————————————————————————————————

Q4: Perform hypothesis test and check residuals. Be sure to comment and take corrective action if necessary.

## clm <- read.csv("C:/Users/ASUS/Desktop/TTU2021_2022_Fall/Project/dat1.xlsx")
## type <- c(rep(1,5),rep(2,5),rep(3,5),rep(4,5),rep(5,5))
## dat1 <- cbind(clm,type)

aov.model<-aov(results~size,data=clm)
summary(aov.model)
##             Df Sum Sq Mean Sq F value Pr(>F)  
## size         2   78.5   39.26   3.235 0.0511 .
## Residuals   36  436.9   12.14                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(aov.model)

null hypothesis: \(H_0: \mu_{1}=\mu_{2}=\mu_{3}\)

alternative hypothesis: \(H_1:\) at least one \(\mu_{i}\) differs

Ans: A normal probability plot of the residuals was constructed. The points pass the fat pencil test, so the residuals roughly follow a normal distribution. Additionally, the residuals have roughly the same spread for each of the three ball types, implying that the variance is nearly constant. Therefore, no variance-stabilizing transformation (VST) was applied to the data.
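Visual residual checks can be complemented with formal tests. The sketch below is illustrative only: the data frame sim is simulated (the mean of 75 and standard deviation of 3.5 are assumptions chosen to resemble the scale of the measurements), so its p-values say nothing about the real experiment. Replacing sim with clm would apply the same checks to the collected data.

```r
set.seed(1)
# Simulated stand-in for clm: 13 runs for each of 3 ball sizes (values are made up)
sim <- data.frame(
  results = rnorm(39, mean = 75, sd = 3.5),
  size = factor(rep(1:3, each = 13))
)
sim.model <- aov(results ~ size, data = sim)
res <- residuals(sim.model)

norm.check <- shapiro.test(res)                          # H0: residuals are normal
var.check <- bartlett.test(results ~ size, data = sim)   # H0: equal variances across sizes
norm.check$p.value
var.check$p.value
```

Large p-values in both tests would be consistent with the normality and constant-variance assumptions; small ones would suggest considering a transformation.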

From the ANOVA, since p = 0.0511 > 0.05, we fail to reject the null hypothesis at the 0.05 level. That is, the data do not show a significant difference among the means. However, the p-value is only marginally above 0.05, so this conclusion is not secure and is investigated further below.
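How narrow the margin is can be seen by recomputing the test from the ANOVA table. The sketch below uses the rounded sums of squares printed above, so its results match the table only to rounding; the variable names are chosen for this illustration.

```r
ss_size <- 78.5;  df_size <- 2     # treatment sum of squares and df from the table
ss_res  <- 436.9; df_res  <- 36    # residual sum of squares and df from the table

f_stat <- (ss_size / df_size) / (ss_res / df_res)           # ratio of mean squares
f_crit <- qf(0.95, df_size, df_res)                         # rejection threshold at alpha = 0.05
p_value <- pf(f_stat, df_size, df_res, lower.tail = FALSE)  # upper-tail probability

c(F = f_stat, F_crit = f_crit, p = p_value)
```

The observed F statistic falls just below the critical value, which is why the p-value lands just above 0.05.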

————————————————————————————————

Q5: If the null hypothesis is rejected, investigate pairwise comparisons.

Ans: Although the ANOVA did not reject the null hypothesis, the p-value is close to the 0.05 significance level, so we are not confident in this non-rejection. We therefore performed pairwise comparisons with Tukey's HSD, as we would if the null hypothesis had been rejected (that is, if at least one mean differed from the others), to make our conclusions more robust.

null hypothesis: \(H_0: \mu_{i}-\mu_{j}=0\)

alternative hypothesis: \(H_1:\mu_{i} -\mu_{j} \ne0\)

TukeyHSD(aov.model,conf.level = 0.95)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = results ~ size, data = clm)
## 
## $size
##          diff       lwr        upr     p adj
## 2-1 -1.461538 -4.801555  1.8784776 0.5387792
## 3-1 -3.461538 -6.801555 -0.1215224 0.0408732
## 3-2 -2.000000 -5.340016  1.3400161 0.3201006
plot(TukeyHSD(aov.model,conf.level = 0.95))

Comments: In the 95% family-wise confidence interval plot, the intervals for pairs 2-1 and 3-2 include 0, so we fail to reject the null hypothesis for them (i.e., the means of those pairs do not differ significantly). In contrast, the interval for the third pair, 3-1, lies entirely below 0 (adjusted p = 0.041), so we reject the null hypothesis (i.e., this pair of means differs significantly).
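The Tukey intervals can be reproduced from the ANOVA table with the studentized range distribution. The sketch below assumes the balanced design of 13 runs per ball size and uses the rounded residual sum of squares, so it matches the output above only to rounding; the variable names are chosen for this illustration.

```r
mse <- 436.9 / 36    # residual mean square from the ANOVA table
n <- 13              # runs per ball size
k <- 3               # number of ball sizes
df_res <- 36         # residual degrees of freedom

se_q <- sqrt(mse / n)                          # scale for the studentized range
half_width <- qtukey(0.95, k, df_res) * se_q   # compare with (upr - lwr) / 2 above

diff_31 <- -3.461538                           # estimated difference for pair 3-1
p_31 <- ptukey(abs(diff_31) / se_q, k, df_res, lower.tail = FALSE)

c(half_width = half_width, p_31 = p_31)
```

In a balanced design every pair shares the same half-width, so an interval excludes 0 only when the absolute difference in means exceeds that half-width.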

————————————————————————————————

Q6: State all findings and make recommendation.

Ans: After performing ANOVA and Tukey HSD on the collected data, we found evidence that ball type affects the distance a ball travels after being launched from a Statapult. The overall ANOVA fell just short of significance at the 0.05 level (p = 0.0511), but Tukey HSD shows that ball types 1 and 3 have different mean distances at the 95% family-wise confidence level, while the other pairs do not differ significantly.

However, the finding that ball types 1 and 3 have different means is not very strong. In the Tukey HSD results, the upper limit of the interval for this pair (-0.12) is close to 0. If the experiment were repeated with more samples, this interval might include 0. This caution is also supported by the ANOVA p-value of 0.0511, which is close to the significance level of 0.05.
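To put a rough number on a follow-up sample size, the observed effect size can be estimated from the ANOVA sums of squares and fed back into the power calculation. This is a sketch under stated assumptions: f is estimated as sqrt(SS_size / SS_residual) from the rounded table values, the helper anova_power is written for this illustration (base R noncentral F, not a pwr function), and an effect size estimated from a single sample is noisy, so the result is a rough guide only.

```r
# Observed effect size from the ANOVA sums of squares: f = sqrt(SS_size / SS_residual)
f_obs <- sqrt(78.5 / 436.9)

# Power of a balanced one-way ANOVA (helper written for this sketch)
anova_power <- function(n, k, f, sig.level = 0.05) {
  df1 <- k - 1
  df2 <- k * (n - 1)
  pf(qf(1 - sig.level, df1, df2), df1, df2,
     ncp = k * n * f^2, lower.tail = FALSE)
}

# Smallest per-group sample size reaching 75% power at the observed effect size
candidates <- 2:200
powers <- sapply(candidates, anova_power, k = 3, f = f_obs)
n_needed <- candidates[which(powers >= 0.75)[1]]
c(f_obs = f_obs, n_needed = n_needed)
```

Because the observed effect is smaller than the f = 0.5 assumed in Q1, the required sample size per group comes out larger than 13.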