To find the number of samples required for each type of ball, we run a power analysis with pwr.anova.test.

From the given experiment we have: number of groups (k) = 3, effect size (f) = 0.5 (a mean difference of 50% of the standard deviation, treated here as a medium effect), power = 0.75 and significance level = 0.05.

library(pwr)
pwr.anova.test(k=3,n=NULL,f=0.5,sig.level=0.05,power=0.75)
## 
##      Balanced one-way analysis of variance power calculation 
## 
##               k = 3
##               n = 12.50714
##               f = 0.5
##       sig.level = 0.05
##           power = 0.75
## 
## NOTE: n is number in each group

The result n = 12.50714 is rounded up, so 13 samples should be collected for each type of ball (39 in total) to detect a mean difference with a medium effect (i.e. 50% of the standard deviation) with a probability of 75%.
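As a cross-check on the rounding, the power at 12 and at 13 samples per group can be computed directly from the noncentral F distribution in base R. This is only a minimal sketch: the helper `anova_power` is a name introduced here for illustration (it is not part of the pwr package), and it assumes the noncentrality parameter k·n·f² of a balanced one-way design.

```r
# Power of a balanced one-way ANOVA from the noncentral F distribution.
# anova_power() is a hypothetical helper written for this illustration.
anova_power <- function(n, k = 3, f = 0.5, sig.level = 0.05) {
  df1 <- k - 1              # numerator degrees of freedom
  df2 <- k * (n - 1)        # denominator degrees of freedom
  ncp <- k * n * f^2        # assumed noncentrality parameter (Cohen's f, balanced design)
  f_crit <- qf(sig.level, df1, df2, lower.tail = FALSE)
  pf(f_crit, df1, df2, ncp = ncp, lower.tail = FALSE)
}

power_12 <- anova_power(12)  # one sample fewer than the rounded-up answer
power_13 <- anova_power(13)  # the rounded-up sample size
print(c(n12 = power_12, n13 = power_13))
```

Comparing the two values against the 0.75 target shows why the fractional n has to be rounded up rather than down.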

Next, we generate a Completely Randomized Design with equal replication (13 throws per ball) to determine the effect of the type of ball on the distance the ball is thrown.

library(agricolae)
designlevel<-c("yellow ball","red ball","black ball")
experimentdesign<-design.crd(trt=designlevel,r= 13,seed=0)

The Completely Randomized Design gives the randomized run order in which the observations are recorded to determine the effect of the type of ball on the distance the ball is thrown.
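The idea behind the randomization can also be sketched in base R, without agricolae. The names `run_order` and `crd_book` are illustrative assumptions, and the run order produced here will not match the one generated by design.crd, since the two use different randomization code.

```r
# Base-R sketch of a completely randomized design: 3 treatments x 13 replicates.
set.seed(0)  # arbitrary seed, only to make the run order reproducible
treatments <- c("yellow ball", "red ball", "black ball")

# Repeat each treatment 13 times, then shuffle all 39 throws.
run_order <- sample(rep(treatments, each = 13))
crd_book <- data.frame(run = seq_along(run_order), treatment = run_order)

print(head(crd_book))
print(table(crd_book$treatment))  # each treatment appears 13 times
```

Each row of `crd_book` is one throw, taken in the listed order, so no ball type is systematically thrown first or last.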

Now, to test the hypothesis, we will first define the hypothesis.

Null Hypothesis : \(H_0 : \mu_1 = \mu_2 = \mu_3 = \mu\)

Alternative Hypothesis : \(H_a\) : At least one of the \(\mu_i\) (\(i = 1, 2, 3\)) differs

We now record the observations for the ANOVA of the Completely Randomized Design.

All distances are in inches.

yellowball<-c(77,81,79,88,89,85,87,75,79,71,68,78,75)
redball<-c(66,67,63,67,68,68,78,70,69,69,77,73,72)
blackball<-c(107,72,69,94,70,73,65,73,85,89,72,79,86)
typeofballs<-c(yellowball,redball,blackball)
a<-c(rep(1,13),rep(2,13),rep(3,13))
a<-as.factor(a)
d<-data.frame(typeofballs,a)
str(d)
## 'data.frame':    39 obs. of  2 variables:
##  $ typeofballs: num  77 81 79 88 89 85 87 75 79 71 ...
##  $ a          : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...

We first draw normal probability plots to check how close each sample is to a normal distribution.

Normal probability plots for yellowball, redball and blackball

# The x-axis of qqnorm() shows theoretical normal quantiles; the y-axis shows the observed readings.
qqnorm(yellowball, main="Normal probability plot of yellowball", col = "darkorange",xlab = 'Theoretical quantiles', ylab = 'yellowball readings (inches)')
qqline(yellowball, datax = FALSE, distribution = qnorm, probs = c(0.25, 0.75), qtype = 7)
abline(v=c(-1.8,1.7), col="darkorange")

qqnorm(redball, main="Normal probability plot of redball", col = "red",xlab = 'Theoretical quantiles', ylab = 'redball readings (inches)')
qqline(redball, datax = FALSE, distribution = qnorm, probs = c(0.25, 0.75), qtype = 7)
abline(v=c(-1.8,1.7), col="red")

qqnorm(blackball, main="Normal probability plot of blackball", col = "gray60",xlab = 'Theoretical quantiles', ylab = 'blackball readings (inches)')
qqline(blackball, datax = FALSE, distribution = qnorm, probs = c(0.25, 0.75), qtype = 7)
abline(v=c(-1.8,1.7), col="gray60")

From the normal probability plot for yellowball, the data are approximately normally distributed, but a few points in the upper region do not lie along the line. These may be potential outliers.

The normal probability plot for redball shows the same pattern: the data are approximately normal, with a few points in the upper region falling off the line, which may be potential outliers.

From the normal probability plot for blackball, the data are also approximately normally distributed, but some points do not lie along the line. These may be potential outliers.

Now we draw side-by-side boxplots to compare the three samples directly.

boxplot(yellowball, redball, blackball, names = c("yellowball", "redball","blackball"), main="Comparing Boxplot for yellowball,redball and blackball", col=c("darkorange","red","gray60"))

From the boxplot, the Blackball readings appear more spread out than the Yellowball and Redball readings.
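The visual impression of spread can be backed with numbers. The sketch below recomputes the standard deviation of each sample from the data listed earlier; the name `spread` is introduced here only for illustration.

```r
# Observed distances (inches), copied from the data above.
yellowball <- c(77,81,79,88,89,85,87,75,79,71,68,78,75)
redball    <- c(66,67,63,67,68,68,78,70,69,69,77,73,72)
blackball  <- c(107,72,69,94,70,73,65,73,85,89,72,79,86)

# Sample standard deviation of each ball type.
spread <- sapply(list(yellowball = yellowball,
                      redball    = redball,
                      blackball  = blackball), sd)
print(round(spread, 2))

# Name of the sample with the largest spread.
print(names(which.max(spread)))
```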

Now, to test whether the mean distance (the quantitative dependent variable) changes across the levels of the categorical factor, the type of ball, we perform a one-way ANOVA.

model<-aov(typeofballs~a,data = d)
summary(model)
##             Df Sum Sq Mean Sq F value  Pr(>F)   
## a            2  814.3   407.2   5.957 0.00582 **
## Residuals   36 2460.6    68.4                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(model)

The ANOVA gives a p-value of 0.00582. Because this is less than the 0.05 significance level, we reject the null hypothesis and conclude that at least one type of ball has a different mean distance.

Also, from the diagnostic plots produced by plot(model), the residual variance appears stable and roughly constant.
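To make the ANOVA table less of a black box, the F value and p-value can be rebuilt by hand from the sums of squares. This is an illustrative sketch in base R; all variable names are chosen here and are not part of the original analysis.

```r
# Observed distances (inches), copied from the data above.
yellowball <- c(77,81,79,88,89,85,87,75,79,71,68,78,75)
redball    <- c(66,67,63,67,68,68,78,70,69,69,77,73,72)
blackball  <- c(107,72,69,94,70,73,65,73,85,89,72,79,86)
groups <- list(yellowball, redball, blackball)

k <- length(groups)               # number of treatments
n_total <- sum(lengths(groups))   # total number of observations
grand_mean <- mean(unlist(groups))

# Between-group (treatment) and within-group (residual) sums of squares.
ss_treatment <- sum(sapply(groups, function(g) length(g) * (mean(g) - grand_mean)^2))
ss_residual  <- sum(sapply(groups, function(g) sum((g - mean(g))^2)))

# Mean squares, F statistic and upper-tail p-value.
ms_treatment <- ss_treatment / (k - 1)
ms_residual  <- ss_residual / (n_total - k)
f_value <- ms_treatment / ms_residual
p_value <- pf(f_value, k - 1, n_total - k, lower.tail = FALSE)

print(round(c(SS_trt = ss_treatment, SS_res = ss_residual), 1))
print(round(c(F = f_value, p = p_value), 5))
```

The printed sums of squares, F value and p-value can be compared line by line with the summary(model) output.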

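Now, to check the assumptions of the model, we plot residuals. The code below simulates normal samples with the observed group means and standard deviations and examines the residuals of those samples.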

set.seed(0)  # arbitrary seed so the simulated samples are reproducible
# Simulate 13 readings per ball from a normal distribution with the observed mean and sd
b1<-rnorm(13,mean=79.38462,sd=6.487661)
b2<-rnorm(13,mean=69.76923,sd=4.265244)
b3<-rnorm(13,mean=79.53846,sd=12.03201)
ball<-c(b1,b2,b3)
x<-c(rep(1,13),rep(2,13),rep(3,13))
boxplot(ball~x,xlab="Type of Ball",ylab="Observations",main="Boxplot of Observations")

# Residual = observation minus its group mean
meanx<-c(rep(mean(b1),13),rep(mean(b2),13),rep(mean(b3),13))
residuals<-ball-meanx
qqnorm(residuals)
qqline(residuals)

plot(meanx,residuals,xlab="Predicted distance (inches)",ylab="residuals",
     main="constant variance check")

From this plot we again conclude that the variance is stable and roughly constant across the predicted values.
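The same checks can also be run on the residuals of a model fitted to the observed data rather than to simulated samples. The sketch below is one way to do that; the data frame, its column names (`distance`, `ball`) and the object names are assumptions made for this example. The plots are only drawn in an interactive session.

```r
# Observed distances (inches), copied from the data above.
yellowball <- c(77,81,79,88,89,85,87,75,79,71,68,78,75)
redball    <- c(66,67,63,67,68,68,78,70,69,69,77,73,72)
blackball  <- c(107,72,69,94,70,73,65,73,85,89,72,79,86)
obs <- data.frame(distance = c(yellowball, redball, blackball),
                  ball = factor(rep(c("yellow", "red", "black"), each = 13)))

fit  <- aov(distance ~ ball, data = obs)
res  <- residuals(fit)   # observation minus its group mean
pred <- fitted(fit)      # group mean for each observation

# Residual standard deviation per group; similar values support constant variance.
res_sd <- tapply(res, obs$ball, sd)
print(round(res_sd, 2))

if (interactive()) {
  qqnorm(res); qqline(res)
  plot(pred, res, xlab = "Predicted distance (inches)", ylab = "Residuals",
       main = "Constant variance check (observed data)")
}
```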

To find which pairs of means differ, we now compute Tukey's honest significant differences with the TukeyHSD test.

# TukeyHSD() is part of base R (stats); the result is stored under a name that does not mask the function
tukey_result<-TukeyHSD(model)
tukey_result
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = typeofballs ~ a, data = d)
## 
## $a
##           diff        lwr       upr     p adj
## 2-1 -9.6153846 -17.541638 -1.689132 0.0144221
## 3-1  0.1538462  -7.772407  8.080099 0.9987599
## 3-2  9.7692308   1.842978 17.695484 0.0127889
plot(tukey_result)

According to the graph of the TukeyHSD test, the confidence intervals for the pairs “Blackball and Redball” (3-2) and “Redball and Yellowball” (2-1) do not contain zero, hence the mean readings differ significantly within each of those pairs. The interval for “Blackball and Yellowball” (3-1) contains zero, so the means of those two balls do not differ significantly.
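The significant pairs can also be picked out programmatically instead of being read off the plot. The sketch below refits the same model and keeps the comparisons whose 95% interval excludes zero; `tukey_tab` and `excludes_zero` are names introduced here for illustration.

```r
# Same data and coding as above: 1 = yellowball, 2 = redball, 3 = blackball.
yellowball <- c(77,81,79,88,89,85,87,75,79,71,68,78,75)
redball    <- c(66,67,63,67,68,68,78,70,69,69,77,73,72)
blackball  <- c(107,72,69,94,70,73,65,73,85,89,72,79,86)
d <- data.frame(typeofballs = c(yellowball, redball, blackball),
                a = factor(rep(1:3, each = 13)))

model <- aov(typeofballs ~ a, data = d)
tukey_tab <- TukeyHSD(model)$a   # matrix with columns diff, lwr, upr, p adj

# A pair differs significantly when its interval lies entirely on one side of zero.
excludes_zero <- tukey_tab[, "lwr"] > 0 | tukey_tab[, "upr"] < 0
significant_pairs <- rownames(tukey_tab)[excludes_zero]
print(significant_pairs)
```

For a 95% family-wise interval this selection is equivalent to filtering on an adjusted p-value below 0.05.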