Question 3.7

Formulation:

\[ H_0: \mu_1=\mu_2=\mu_3=\mu_4=\mu \\ H_1: \text{at least one } \mu_k \neq \mu \]

Libraries and initial data processing

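library(tidyr)      # pivot_longer()
library(agricolae)  # LSD.test()

# Four observations for each of the four mixes
mix1 <- c(3129,3000,2865,2890)
mix2 <- c(3200,3300,2975,3150)
mix3 <- c(2800,2900,2985,3050)
mix4 <- c(2600,2700,2600,2765)

# Reshape to long format: one row per observation, columns `name` (mix) and `value`
dat <- data.frame(mix1,mix2,mix3,mix4)
dat <- pivot_longer(dat, c(mix1,mix2,mix3,mix4))

# One-way ANOVA of the response on the mix
model <- aov(value ~ name, data=dat)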

Item c

print(LSD.test(model, "name",alpha=0.05))

## $statistics
##    MSerror Df     Mean       CV  t.value      LSD
##   12825.69 12 2931.812 3.862817 2.178813 174.4798
## 
## $parameters
##         test p.ajusted name.t ntr alpha
##   Fisher-LSD      none   name   4  0.05
## 
## $means
##        value       std r      LCL      UCL  Min  Max     Q25    Q50     Q75
## mix1 2971.00 120.55704 4 2847.624 3094.376 2865 3129 2883.75 2945.0 3032.25
## mix2 3156.25 135.97641 4 3032.874 3279.626 2975 3300 3106.25 3175.0 3225.00
## mix3 2933.75 108.27242 4 2810.374 3057.126 2800 3050 2875.00 2942.5 3001.25
## mix4 2666.25  80.97067 4 2542.874 2789.626 2600 2765 2600.00 2650.0 2716.25
## 
## $comparison
## NULL
## 
## $groups
##        value groups
## mix2 3156.25      a
## mix1 2971.00      b
## mix3 2933.75      b
## mix4 2666.25      c
## 
## attr(,"class")
## [1] "group"

The LSD.test output groups the mixes by letter: means that share a letter are not significantly different at \(\alpha = 0.05\) (LSD = 174.48). From the $groups table, we can state that:

mix1 and mix3 (both in group b) are not significantly different from each other;
mix2 (group a) is significantly different from mix1, mix3 and mix4;
mix4 (group c) is significantly different from mix1, mix2 and mix3.
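The LSD value reported in the output can be reproduced by hand, which makes the grouping easier to check. The following is a minimal base-R sketch, assuming the usual balanced-design formula \(LSD = t_{\alpha/2,\,df}\sqrt{2\,MS_E/n}\); the object names (`mse`, `lsd`, `diffs`) are illustrative and not part of the original analysis.

```r
# Rebuild the data with base R only, so the snippet runs on its own.
dat <- data.frame(
  name  = rep(c("mix1", "mix2", "mix3", "mix4"), each = 4),
  value = c(3129, 3000, 2865, 2890,
            3200, 3300, 2975, 3150,
            2800, 2900, 2985, 3050,
            2600, 2700, 2600, 2765)
)
model <- aov(value ~ name, data = dat)

# Mean square error = residual sum of squares / residual degrees of freedom
mse <- deviance(model) / df.residual(model)
n   <- 4  # replicates per mix (balanced design)

# Least significant difference at alpha = 0.05
lsd <- qt(1 - 0.05 / 2, df.residual(model)) * sqrt(2 * mse / n)

# Absolute pairwise differences between group means
group_means <- tapply(dat$value, dat$name, mean)
diffs <- abs(outer(group_means, group_means, "-"))

lsd
diffs > lsd  # TRUE marks pairs whose means differ by more than the LSD
```

Any pair marked TRUE differs by more than the LSD, which is the criterion behind the letters in the $groups table.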

Item d

plot(model,2)

In the Normal Q-Q plot (the second diagnostic plot), the residuals follow a roughly straight line, so it is reasonable to assume that they are approximately normal.
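The visual check can be complemented with a formal test. This is a minimal sketch, assuming the Shapiro-Wilk test (`shapiro.test` from base R's stats package) is an acceptable complement here; it was not part of the original analysis, and the data frame is rebuilt with base R so that the snippet runs on its own.

```r
# Rebuild the data and the one-way ANOVA model with base R only.
dat <- data.frame(
  name  = rep(c("mix1", "mix2", "mix3", "mix4"), each = 4),
  value = c(3129, 3000, 2865, 2890,
            3200, 3300, 2975, 3150,
            2800, 2900, 2985, 3050,
            2600, 2700, 2600, 2765)
)
model <- aov(value ~ name, data = dat)

# Shapiro-Wilk test on the model residuals (H0: residuals are normal)
res <- residuals(model)
sw  <- shapiro.test(res)
sw
```

A large p-value would be consistent with the reading of the Q-Q plot. With only 16 residuals the test has limited power, so it should support the plot rather than replace it.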

Item e

plot(model,1)

In the Residuals vs Fitted plot (the first diagnostic plot), each vertical column of points is one mix, placed at its fitted value (the group mean). From left to right these are mix4 (2666.25), mix3 (2933.75), mix1 (2971.00) and mix2 (3156.25). The columns for mix3 and mix1 sit much closer together than those for mix4 and mix2, which matches the LSD grouping. The vertical spread of the columns is also more alike for mix1 and mix3 (standard deviations of about 121 and 108) than for mix4 and mix2 (about 81 and 136).

Item f

plot(model,3)

The plot above tells the same story about the means: the fitted values of mix1 and mix3 are close to each other, while those of mix4 and mix2 are clearly separated from them.

It is important to emphasize that the spread of the residuals is not the same for every mix: it is similar for mix1 and mix3, but smaller for mix4 and larger for mix2. This might indicate that the constant-variance assumption of the ANOVA model is not fully met.
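The impression about the variances can be quantified. The sketch below assumes Bartlett's test (`bartlett.test` from base R's stats package) as one possible check of equal variances; it was not part of the original analysis, and names such as `vars` and `bt` are illustrative.

```r
# Rebuild the data with base R only, so the snippet runs on its own.
dat <- data.frame(
  name  = rep(c("mix1", "mix2", "mix3", "mix4"), each = 4),
  value = c(3129, 3000, 2865, 2890,
            3200, 3300, 2975, 3150,
            2800, 2900, 2985, 3050,
            2600, 2700, 2600, 2765)
)

# Sample variance of each mix
vars <- tapply(dat$value, dat$name, var)
vars

# Ratio of the largest to the smallest group variance
max(vars) / min(vars)

# Bartlett's test (H0: all group variances are equal)
bt <- bartlett.test(value ~ name, data = dat)
bt
```

Bartlett's test is sensitive to departures from normality, and with four observations per group it has little power, so a non-significant result would not prove that the variances are equal.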

Question 3.10

Formulation:

\[ H_0: \mu_1=\mu_2=\mu_3=\mu_4=\mu_5=\mu \\ H_1: \text{at least one } \mu_k \neq \mu \]

Data processing

library(tidyr)
library(agricolae)

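# Five observations for each of the five cotton groups
cotton1 <- c(7,7,15,11,9)
cotton2 <- c(12,17,12,18,18)
cotton3 <- c(14,19,19,18,18)
cotton4 <- c(19,25,22,19,23)
cotton5 <- c(7,10,11,15,11)

# Reshape to long format: one row per observation, columns `name` (group) and `value`
cotton <- data.frame(cotton1,cotton2,cotton3,cotton4,cotton5)
cotton <- pivot_longer(cotton, c(cotton1,cotton2,cotton3,cotton4,cotton5))

# One-way ANOVA of the response on the cotton group
model.cotton <- aov(value ~ name, data=cotton)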

Item b

print(LSD.test(model.cotton, "name", alpha=0.05))

## $statistics
##   MSerror Df  Mean       CV  t.value      LSD
##      8.06 20 15.04 18.87642 2.085963 3.745452
## 
## $parameters
##         test p.ajusted name.t ntr alpha
##   Fisher-LSD      none   name   5  0.05
## 
## $means
##         value      std r       LCL      UCL Min Max Q25 Q50 Q75
## cotton1   9.8 3.346640 5  7.151566 12.44843   7  15   7   9  11
## cotton2  15.4 3.130495 5 12.751566 18.04843  12  18  12  17  18
## cotton3  17.6 2.073644 5 14.951566 20.24843  14  19  18  18  19
## cotton4  21.6 2.607681 5 18.951566 24.24843  19  25  19  22  23
## cotton5  10.8 2.863564 5  8.151566 13.44843   7  15  10  11  11
## 
## $comparison
## NULL
## 
## $groups
##         value groups
## cotton4  21.6      a
## cotton3  17.6      b
## cotton2  15.4      b
## cotton5  10.8      c
## cotton1   9.8      c
## 
## attr(,"class")
## [1] "group"

From the results of the Fisher LSD method, it is possible to conclude that:

Groups cotton3 and cotton2 are not significantly different;
Groups cotton5 and cotton1 are not significantly different;
The group cotton4 is significantly different from the others.
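These conclusions can be checked by hand from the values in the output, assuming the usual balanced-design LSD formula with \(MS_E = 8.06\), 20 error degrees of freedom and \(n = 5\) observations per group:

\[ LSD = t_{0.025,\,20}\sqrt{\frac{2\,MS_E}{n}} = 2.086\sqrt{\frac{2(8.06)}{5}} \approx 3.745 \]

Comparing adjacent means in the ordered $groups table against this value:

\[ |21.6-17.6| = 4.0 > 3.745 \\ |17.6-15.4| = 2.2 < 3.745 \\ |15.4-10.8| = 4.6 > 3.745 \\ |10.8-9.8| = 1.0 < 3.745 \]

Only the differences larger than the LSD separate the groups, which reproduces the letters a, b and c.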

Item c

plot(model.cotton,1)

From the plot above, it is possible to see how the groups are distributed. There are two pairs of groups whose fitted values are close to each other (cotton1 and cotton5, cotton2 and cotton3), and one group, cotton4, that stands clearly apart from the others.

From the residuals plot, we can also observe that the vertical spread is not the same for every group: some columns of residuals are visibly wider than others (the group standard deviations range from about 2.07 for cotton3 to 3.35 for cotton1). This indicates that the ANOVA model might not be the most adequate model for these samples, because it relies on the strong assumption of constant variance.

Regarding the weaker assumption of normality, the plot below suggests that the residuals might not be normal. They give the impression of some skewness, since four points lie considerably away from the line.

plot(model.cotton,2)

Question 3.44

Data treatment

library(pwr)

means <- c(50,50,60,60)

The given means are split evenly between the two extreme values (50 and 60), so they follow the maximum variability pattern.

Power analysis

To perform the power analysis, let’s first compute the standardized range of the means:

\[ d = \frac{\mu_{max}-\mu_{min}}{\sigma}=\frac{60-50}{5}=2 \]

Therefore, the effect size for maximum variability with an even number of groups is given by:

\[ f=\frac{d}{2}=\frac{2}{2}=1 \]
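The value \(f = 1\) can be cross-checked against the general definition of Cohen's effect size for one-way ANOVA, \(f = \sigma_m / \sigma\), where \(\sigma_m\) is the standard deviation of the group means around the grand mean. This is a minimal sketch, assuming \(\sigma = 5\) as in the calculation of d; the variable names are illustrative.

```r
means <- c(50, 50, 60, 60)
sigma <- 5  # assumed error standard deviation, as used for d

# Standard deviation of the group means around the grand mean
# (population form: divide by k, not by k - 1)
sigma_m <- sqrt(mean((means - mean(means))^2))

# Effect size from the definition
f_definition <- sigma_m / sigma

# Effect size from the shortcut for maximum variability with an even number of groups
d <- (max(means) - min(means)) / sigma
f_shortcut <- d / 2

c(f_definition = f_definition, f_shortcut = f_shortcut)
```

Both routes give the same effect size for these means, so the shortcut \(f = d/2\) is consistent with the definition in this case.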

pwr.anova.test(k = 4, n = NULL, f=1, sig.level = 0.05, power = 0.9)

## 
##      Balanced one-way analysis of variance power calculation 
## 
##               k = 4
##               n = 4.658119
##               f = 1
##       sig.level = 0.05
##           power = 0.9
## 
## NOTE: n is number in each group

Rounding n = 4.66 up, 5 samples per group are needed.
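Since n must be an integer, it is worth checking the power actually achieved with 4 and with 5 samples per group. The sketch below assumes the standard noncentral F formulation of one-way ANOVA power, with noncentrality parameter \(\lambda = k\,n\,f^2\); `anova_power` is a hypothetical helper written for this illustration, not a function from pwr.

```r
# Power of a balanced one-way ANOVA with k groups, n observations per group
# and effect size f, computed from the noncentral F distribution.
anova_power <- function(n, k, f, sig.level = 0.05) {
  df1 <- k - 1        # numerator degrees of freedom
  df2 <- k * (n - 1)  # denominator (error) degrees of freedom
  f_crit <- qf(1 - sig.level, df1, df2)        # rejection threshold under H0
  1 - pf(f_crit, df1, df2, ncp = k * n * f^2)  # P(reject H0) under H1
}

power_n4 <- anova_power(4, k = 4, f = 1)
power_n5 <- anova_power(5, k = 4, f = 1)
c(n4 = power_n4, n5 = power_n5)
```

With n = 4 the power should fall short of the 0.9 target and with n = 5 it should meet it, in line with rounding 4.66 up.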

Question 3.45

Item a

\[ \sigma^2 = 36 \]

\[ d= \frac{10}{6}=1.667 \]

Therefore, our effect would be

\[ f=\frac{1.667}{2}=0.833 \]

pwr.anova.test(k = 4, n = NULL, f=0.833, sig.level = 0.05, power = 0.9)

## 
##      Balanced one-way analysis of variance power calculation 
## 
##               k = 4
##               n = 6.184871
##               f = 0.833
##       sig.level = 0.05
##           power = 0.9
## 
## NOTE: n is number in each group

Rounding n = 6.18 up, our power analysis tells us that we need 7 samples per group.

Item b

Following the same steps as in Item a, now with \(d = 1.429\) (that is, \(10/7\), which corresponds to \(\sigma = 7\)):

\[ f=\frac{1.429}{2}=0.7143 \]

pwr.anova.test(k = 4, n = NULL, f=0.7143, sig.level = 0.05, power = 0.9)

## 
##      Balanced one-way analysis of variance power calculation 
## 
##               k = 4
##               n = 7.998476
##               f = 0.7143
##       sig.level = 0.05
##           power = 0.9
## 
## NOTE: n is number in each group

We would need 8 samples per group.

Item c

With the power fixed, the number of samples needed increases as the variance increases. This happens because a larger variance makes the same difference between means harder to detect: to keep the same probability of rejecting \(H_0\) when it is false, more samples are needed to represent the population well.
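The trend can be illustrated by searching for the smallest integer n that reaches the target power for several values of \(\sigma\). This is a sketch under the same assumptions as the calculations in Questions 3.44 and 3.45 (4 groups, a range of 10 between the extreme means, \(f = d/2\), \(\alpha = 0.05\), power 0.9); `anova_power` and `smallest_n` are hypothetical helpers, not functions from pwr.

```r
# Power of a balanced one-way ANOVA from the noncentral F distribution.
anova_power <- function(n, k, f, sig.level = 0.05) {
  df1 <- k - 1
  df2 <- k * (n - 1)
  f_crit <- qf(1 - sig.level, df1, df2)
  1 - pf(f_crit, df1, df2, ncp = k * n * f^2)
}

# Smallest integer n per group that reaches the target power.
smallest_n <- function(sigma, k = 4, mean_range = 10, target = 0.9) {
  f <- (mean_range / sigma) / 2  # maximum-variability shortcut f = d / 2
  n <- 2
  while (anova_power(n, k, f) < target) n <- n + 1
  n
}

sigmas <- c(5, 6, 7)
needed <- sapply(sigmas, smallest_n)
data.frame(sigma = sigmas, n_per_group = needed)
```

The required n does not decrease as \(\sigma\) grows, which is the behavior described in the paragraph above.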

Item d

Before designing the experiment, it is important to determine and fix \(\alpha\) and \(\beta\). Since the power is \(1-\beta\), this fixes the power as well. After that, the ideal n for the experiment can be calculated.

Homework - Week5

2022-10-02
