To find the number of samples required for each type of ball, we use pwr.anova.test from the pwr package.
From the given experiment we have: number of groups (k) = 3, effect size (f) = 0.5, i.e. a mean difference of 50% of the standard deviation (treated here as a medium effect), significance level = 0.05 and power (the probability of detecting the effect) = 75%.
library(pwr)
pwr.anova.test(k=3,n=NULL,f=0.5,sig.level=0.05,power=0.75)
##
## Balanced one-way analysis of variance power calculation
##
## k = 3
## n = 12.50714
## f = 0.5
## sig.level = 0.05
## power = 0.75
##
## NOTE: n is number in each group
The pwr.anova.test result gives n = 12.50714 per group, which we round up to 13. We therefore collect 13 samples for each type of ball to detect a mean difference with a medium effect (i.e. 50% of the standard deviation) with a probability of 75%.
Next, we generate a Completely Randomized Design (CRD) with equal replication (13 throws per type of ball) for a designed experiment to determine the effect of the type of ball on the distance the ball is thrown.
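As a quick cross-check of the rounding, the power for a whole number of throws per group can be computed in base R from the noncentral F distribution. This is only a sketch: it assumes the usual noncentrality parameter lambda = k*n*f^2 for a balanced one-way ANOVA, and the helper name anova_power is ours, not a function from pwr.

```r
# Power of a balanced one-way ANOVA via the noncentral F distribution.
# anova_power is an illustrative helper, not part of the pwr package.
anova_power <- function(n, k = 3, f = 0.5, sig.level = 0.05) {
  df1 <- k - 1              # between-group degrees of freedom
  df2 <- k * (n - 1)        # within-group degrees of freedom
  lambda <- k * n * f^2     # assumed noncentrality parameter
  f_crit <- qf(1 - sig.level, df1, df2)
  pf(f_crit, df1, df2, ncp = lambda, lower.tail = FALSE)
}

power_12 <- anova_power(12)   # one throw fewer than the rounded-up value
power_13 <- anova_power(13)   # the sample size used in the experiment
print(c(n12 = power_12, n13 = power_13))
```

With 12 throws per group the power should fall just short of the 75% target, and with 13 it should exceed it, which is why the fractional n is rounded up rather than to the nearest integer.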
library(agricolae)
designlevel<-c("yellow ball","red ball","black ball")
experimentdesign<-design.crd(trt=designlevel,r= 13,seed=0)
The Completely Randomized Design gives the randomized run order in which we record the observations to determine the effect of the type of ball on the distance the ball is thrown.
Next, we state the hypotheses to be tested.
Null Hypothesis : \(H_0 : \mu_1 = \mu_2 = \mu_3 = \mu\)
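For readers without agricolae, the idea behind a completely randomized design can be sketched in base R: list every treatment r times and shuffle the run order. The names run_order and crd_sketch are ours, and the resulting order will not match the one produced by design.crd, which uses its own randomization.

```r
# Base R sketch of a completely randomized design with equal replication.
designlevel <- c("yellow ball", "red ball", "black ball")
r <- 13                                    # throws per type of ball
set.seed(0)                                # fixed seed so the order is reproducible
run_order <- sample(rep(designlevel, each = r))
crd_sketch <- data.frame(run = seq_along(run_order), ball = run_order)
head(crd_sketch)                           # first few runs of the plan
table(crd_sketch$ball)                     # each type of ball appears 13 times
```

Each row is one throw, taken in the listed order, so that any drift during the session is spread across the three types of ball rather than confounded with one of them.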
Alternative Hypothesis : \(H_a\) : At least one of the \(\mu_i\) differs
Entering the observations for the ANOVA test on the Completely Randomized Design
Distances are in inches.
yellowball<-c(77,81,79,88,89,85,87,75,79,71,68,78,75)
redball<-c(66,67,63,67,68,68,78,70,69,69,77,73,72)
blackball<-c(107,72,69,94,70,73,65,73,85,89,72,79,86)
typeofballs<-c(yellowball,redball,blackball)
a<-c(rep(1,13),rep(2,13),rep(3,13))
a<-as.factor(a)
d<-data.frame(typeofballs,a)
str(d)
## 'data.frame': 39 obs. of 2 variables:
## $ typeofballs: num 77 81 79 88 89 85 87 75 79 71 ...
## $ a : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
We first draw normal probability plots to check whether the readings for each type of ball are approximately normal.
Normal probability plots for yellowball, redball and blackball
qqnorm(yellowball, main="Normal probability plot of yellowball", col = "darkorange",xlab = 'yellowball', ylab = 'readings')
qqline(yellowball, datax = FALSE, distribution = qnorm, probs = c(0.25, 0.75), qtype = 7)
abline(v=c(-1.8,1.7), col="darkorange")
qqnorm(redball, main="Normal probability plot of redball", col = "Red",xlab = 'redball', ylab = 'readings')
qqline(redball, datax = FALSE, distribution = qnorm, probs = c(0.25, 0.75), qtype = 7)
abline(v=c(-1.8,1.7), col="red")
qqnorm(blackball, main="Normal probability plot of blackball", col = "Gray60",xlab = 'blackball', ylab = 'readings')
qqline(blackball, datax = FALSE, distribution = qnorm, probs = c(0.25, 0.75), qtype = 7)
abline(v=c(-1.8,1.7), col="Gray60")
From the normal probability plot for yellowball we conclude that the data are approximately normally distributed, but some points in the upper region do not lie along the line. These may be potential outliers.
From the normal probability plot for redball we conclude that the data are approximately normally distributed, but some points in the upper region do not lie along the line. These may be potential outliers.
From the normal probability plot for blackball we conclude that the data are approximately normally distributed, but some points do not lie along the line. These may be potential outliers.
Next, we draw side-by-side boxplots to compare the three groups.
boxplot(yellowball, redball, blackball, names = c("yellowball", "redball","blackball"), main="Comparing Boxplot for yellowball,redball and blackball", col=c("darkorange","red","gray60"))
From the boxplot, the blackball readings appear more spread out than the yellowball and redball readings.
Next, to test whether the mean distance differs across the types of ball, we perform a one-way ANOVA test.
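The visual impression of spread can be put into numbers by computing the sample standard deviation of each group. This is a small sketch on the readings entered above; the name spread is ours.

```r
# Sample standard deviation of the distances (in inches) for each type of ball.
yellowball <- c(77,81,79,88,89,85,87,75,79,71,68,78,75)
redball    <- c(66,67,63,67,68,68,78,70,69,69,77,73,72)
blackball  <- c(107,72,69,94,70,73,65,73,85,89,72,79,86)

spread <- sapply(list(yellowball = yellowball,
                      redball    = redball,
                      blackball  = blackball), sd)
print(round(spread, 2))
names(which.max(spread))   # group with the largest spread
```

The blackball group has the largest standard deviation, which agrees with the wider box and whiskers in the boxplot.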
model<-aov(typeofballs~a,data = d)
summary(model)
## Df Sum Sq Mean Sq F value Pr(>F)
## a 2 814.3 407.2 5.957 0.00582 **
## Residuals 36 2460.6 68.4
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(model)
The ANOVA test gives a p-value of 0.00582. As the p-value is less than 0.05, we reject the null hypothesis and conclude that at least one type of ball has a different mean distance.
Also, from the diagnostic plots we conclude that the variances are stable and almost constant.
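The F value and p-value in the ANOVA table can be reproduced by hand from the between-group and within-group sums of squares. The sketch below does this in base R on the same readings; the variable names (groups, ss_between, ss_within and so on) are ours.

```r
# One-way ANOVA computed by hand from the three groups of readings.
groups <- list(
  yellowball = c(77,81,79,88,89,85,87,75,79,71,68,78,75),
  redball    = c(66,67,63,67,68,68,78,70,69,69,77,73,72),
  blackball  = c(107,72,69,94,70,73,65,73,85,89,72,79,86)
)
k <- length(groups)               # number of groups
N <- sum(lengths(groups))         # total number of observations
grand_mean <- mean(unlist(groups))

# Between-group (treatment) and within-group (residual) sums of squares
ss_between <- sum(sapply(groups, function(g) length(g) * (mean(g) - grand_mean)^2))
ss_within  <- sum(sapply(groups, function(g) sum((g - mean(g))^2)))

ms_between <- ss_between / (k - 1)    # mean square, 2 degrees of freedom
ms_within  <- ss_within / (N - k)     # mean square, 36 degrees of freedom
f_value <- ms_between / ms_within
p_value <- pf(f_value, k - 1, N - k, lower.tail = FALSE)
print(c(F = f_value, p = p_value))
```

The F value is the ratio of the two mean squares, so a large value means the group means differ by more than the scatter within the groups would explain.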
Now, to check assumptions of the model, we will plot residuals
# Use the observed readings so that the residuals are those of the data being analysed
b1<-yellowball
b2<-redball
b3<-blackball
ball<-c(b1,b2,b3)
x<-c(rep(1,13),rep(2,13),rep(3,13))
boxplot(ball~x,xlab="Type of Ball",ylab="Observations",main="Boxplot of Observations")
meanx<-c(rep(mean(b1),13),rep(mean(b2),13),rep(mean(b3),13))
residuals<-ball-meanx
qqnorm(residuals)
qqline(residuals)
plot(meanx,residuals,xlab="Predicted distance (inches)",ylab="Residuals",
main="Constant variance check")
From the plot of residuals against predicted values we likewise conclude that the variances are stable and almost constant.
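The same checks can be run on the residuals of the fitted model itself, using residuals() and fitted() instead of building them by hand. This is a sketch on the readings above; the column names distance and ball and the object name fit are ours.

```r
# Residual diagnostics taken directly from the fitted one-way ANOVA model.
d <- data.frame(
  distance = c(77,81,79,88,89,85,87,75,79,71,68,78,75,      # yellowball
               66,67,63,67,68,68,78,70,69,69,77,73,72,      # redball
               107,72,69,94,70,73,65,73,85,89,72,79,86),    # blackball
  ball = factor(rep(c("yellowball", "redball", "blackball"), each = 13),
                levels = c("yellowball", "redball", "blackball"))
)
fit  <- aov(distance ~ ball, data = d)
res  <- residuals(fit)   # observation minus its group mean
pred <- fitted(fit)      # the group mean for each observation

qqnorm(res, main = "Normal probability plot of residuals")
qqline(res)
plot(pred, res, xlab = "Predicted distance (inches)", ylab = "Residuals",
     main = "Constant variance check")
```

In a one-way layout the fitted value is simply the group mean, so the second plot shows three vertical strips of points, one per type of ball, and the assumption of constant variance holds when the strips have similar height.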
To determine which pairs of group means differ, we now perform Tukey's HSD (honest significant difference) test.
library(agricolae)
TukeyHSD<-TukeyHSD(model)
TukeyHSD
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = typeofballs ~ a, data = d)
##
## $a
## diff lwr upr p adj
## 2-1 -9.6153846 -17.541638 -1.689132 0.0144221
## 3-1 0.1538462 -7.772407 8.080099 0.9987599
## 3-2 9.7692308 1.842978 17.695484 0.0127889
plot(TukeyHSD)
According to the TukeyHSD graph and table, the confidence intervals for the differences "redball - yellowball" (2-1) and "blackball - redball" (3-2) do not contain zero, hence the mean readings differ significantly within each of those pairs. The interval for "blackball - yellowball" (3-1) contains zero, so those two means do not differ significantly.
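The significant pairs can also be read off programmatically by filtering the Tukey table on the adjusted p-value. This is a sketch using TukeyHSD from base R's stats package; the column names distance and ball and the objects fit, tukey and significant are ours, and the factor is labelled by type of ball rather than 1, 2, 3 so that the row names are easier to read.

```r
# List the pairs of ball types whose means differ at the 5% family-wise level.
d <- data.frame(
  distance = c(77,81,79,88,89,85,87,75,79,71,68,78,75,      # yellowball
               66,67,63,67,68,68,78,70,69,69,77,73,72,      # redball
               107,72,69,94,70,73,65,73,85,89,72,79,86),    # blackball
  ball = factor(rep(c("yellowball", "redball", "blackball"), each = 13),
                levels = c("yellowball", "redball", "blackball"))
)
fit   <- aov(distance ~ ball, data = d)
tukey <- TukeyHSD(fit)

pairs_table <- tukey$ball                       # matrix: diff, lwr, upr, p adj
significant <- rownames(pairs_table)[pairs_table[, "p adj"] < 0.05]
print(round(pairs_table, 4))
print(significant)
```

A pair is flagged exactly when its interval (lwr, upr) excludes zero, so this filter and the reading of the plot give the same answer.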