1 Example for CRD

This one is sourced from the Design and Analysis of Experiments textbookby D.C. Montgomery (8th ed.) Exercise 3.21 (pg.133).

EXPERIMENT PERFORMED and DATA:

An Article in Environmental International (Vol. 18, No.4, 1992) describes an experiment in which the amount of radon released in showers was investigated. Radon-enriched water was used in the experiment, and six different orifice diameters were tested in shower heads. The data from the experiment are shown in the following table:

Data from the experiment (This data is stored as a CSV file in GitHub: https://raw.githubusercontent.com/Jaisai0611/CRD-example/main/EX%203.21.csv)
Orifice Diameter Radon Released (%)
Sample 1
Radon Released (%)
Sample 2
Radon Released (%)
Sample 3
Radon Released (%)
Sample 4
0.37 80 83 83 85
0.51 75 75 79 79
0.71 74 73 76 77
1.02 67 72 74 74
1.40 62 62 67 69
1.99 60 61 64 66

1.1 Experimental Data fits the CRD model or not? if CRD, Factor levels (Orifice Diameter) Fixed or Random Effects?

Observing the given data table, we can clearly understand that the experiment conducted or observations chosen would be in a random manner so that the environment in which the treatments are applied is as uniform as possible. Thus, It can be considered as a CRD Model.

The treatment effects, according to the study conducted author considered particularly 6 variants in orifice diameter (0.37, 0.51, 0.71, 1.02, 1.40, 1.99). Thus we can consider it as fixed effects (Single factor ANOVA). The same experiment would have also been best suitable for Random Effects if orifice diameters were randomly chosen out of a big range of diameters, especially to perform Statistical analysis.

1.2 Read CSV file into R and perform some data wrangling to best fit for ANOVA

The CSV file was read into R from the GitHub link. Using the pivot_longer command, we made sure to fit all the observations in a single column. Then, converted the data of Orifice Diameter as a factor. This was shown in the following Code chunk.

#Read the CSV file into R
ex3.21<- read.csv("https://raw.githubusercontent.com/Jaisai0611/CRD-example/main/EX%203.21.csv")

#Data wrangling to make it fit for ANOVA
library(tidyr)
ex3.21<- pivot_longer(ex3.21,2:5,names_to = "samples", values_to = "Radon Released%")
ex3.21$Orifice.Diameter<- as.factor(ex3.21$Orifice.Diameter)

1.3 Check for Assumptions

As discussed earlier, we must ensure that all the given data is normally distributed and that constant variance among all the factor levels of treatment.

  • Check for Normality of data

    #Assumption1: Check for Normality of data
    qqnorm(ex3.21$`Radon Released%`, ylab = "Radon Released %", col ='BLUE')
    qqline(ex3.21$`Radon Released%`, col = 'RED')

    From the above NORMAL Q-Q Plot, most of the data points lie upon the normality line or closer to it, and no major deviations from the normal line were observed. Thus it can be interpreted that the data fits into a Normal Distribution.

  • Check for Constant Variance

    #Assumption2: Check for Variance equality among different levels of Treatments
    boxplot(ex3.21$`Radon Released%`~ ex3.21$Orifice.Diameter, main = "Boxplot for Orifice Diameters", xlab = "Orifice Diameters", ylab = "% of Radon Released", col = 'lightblue')

    From the plotted Boxplot, based on the size of each box which represents the variance of Radon released percentage for respective orifice diameter, it can be only interpreted as approximately constant variance.
    However, by performing Boxcox, we can check the Model adequacy to perform ANOVA.

1.4 Model Adequacy check using Boxcox

library(MASS)
boxcox(ex3.21$`Radon Released%`~ ex3.21$Orifice.Diameter,lambda = seq(-5, 8))

we can observe that 1 lies within the confidence interval, thus confirming that our data is valid to proceed with ANOVA.

1.5 ANOVA (Analysis of Variance)

  • HYPOTHESIS: (Recap Discussions)

    • Null Hypothesis: All the treatment level means are equal, which means No significant difference between the means of Radon released (%) with respect to Orifice Diameter.

      \[ H_{0}: \mu_{1} =\mu_{2} =\mu_{3} =\mu_{4} =\mu_{5} =\mu_{6} \\ (Or)\\ H_{0}: \tau_{i} = 0\ \ \ \forall i \]

    • Alternate Hypothesis: At least one mean is significantly different from others, which means at least one of the means out of 6 types of orifice diameters significantly differs from the rest.

      \[ H_a: atleast\ one\ mean\ significantly\ differs\\ (Or)\\ H_a: \tau_i\ \neq\ 0\ \ \ \exists\ i \]

  • Applying Hypothesis to our example:

    #Performing single factor ANOVA
    Anova_3.21<- aov(`Radon Released%`~ Orifice.Diameter, data = ex3.21)
    summary(Anova_3.21)
    ##                  Df Sum Sq Mean Sq F value   Pr(>F)    
    ## Orifice.Diameter  5 1133.4  226.68   30.85 3.16e-08 ***
    ## Residuals        18  132.2    7.35                     
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    We can observe that p-value (3.16e-08) is far less than our significant value (say 0.001 or 99% Conf. interval), which leads to reject the null hypothesis. Hence, there is a significant difference between the various types of Orifice diameters w.r.t Radon Released (%).

Although we rejected null hypothesis, we are still unaware about which pair of means exactly differs. lets check the residual plots and later perform TukeysHSD test.

1.6 Residual plots from ANOVA

plot(Anova_3.21,1, col = 'BLUE', col.lab = 'red', col.axis = 'blue')

  • Comment: Analyzing the Residuals vs Fitted values plot, widespread residual points can be observed which tells us that it has higher variability. however across all the treatments variance spread seems to be the same which was indicated by the red line over the center of the plot, even spread can be observed on either side of the line.
plot(Anova_3.21,2, col = 'BLUE', col.lab = 'red', col.axis = 'blue')

  • Comment: Observing the normal probability plot of the residuals, hard to interpret it as a Normality because we can see a few outliers from the line which is not even closely bound to the normality line.

    However, we can’t say that this data doesn’t fit exactly for this type of statistical model, but let’s proceed with the HSD test to identify what pairs of mean are significantly different. This step accomplishes the ultimate goal of performing this statistical test (ANOVA with CRD)

1.7 Tukeys-HSD Test

#Multiple pairwise means comparison using tukeys-HSD test
library(car)
results<-TukeyHSD(Anova_3.21)
results_matrix <- as.matrix (results) 
df_res <- as.data.frame(results_matrix[1]) #this conversion of results to matrix and data fram is for sake of ease in interpretting the color coded plot to identify the pairs of mean which significantly differ.
plot(results, col= ifelse(df_res[,4]<0.05, 'red', 'darkgreen'), las = 1, xlim =c(-26,4), cex.axis = 0.75) # Aguments like las (axis lable directin), xlim (xasis scale limits), cex.axis (font size of axis labels) were used here in order to ehance the looks of graphics and also due to space contraints.

Observing the Tukeys HSD plot, a few pairs of orifice diameters intersect with “0”, meaning those pairs are not significantly different. But, all those pairs on the plot, highlighted with red color are significantly different.

1.8 Conclusion:

On an overall scale, since more no. of pairs of mean (Orifice diameters) are significantly different. Hence, This statistical analysis strongly believes that Orifice diameter is affecting the percentage of Radon Released.

1.9 Complete R Code

#Read the CSV file into R
ex3.21<- read.csv("https://raw.githubusercontent.com/Jaisai0611/CRD-example/main/EX%203.21.csv")

#Data wrangling to make it fit for ANOVA
library(tidyr)
ex3.21<- pivot_longer(ex3.21,2:5,names_to = "samples", values_to = "Radon Released%")
ex3.21$Orifice.Diameter<- as.factor(ex3.21$Orifice.Diameter)

#Assumption1: Check for Normality of data
qqnorm(ex3.21$`Radon Released%`, ylab = "Radon Released %", col ='BLUE')
qqline(ex3.21$`Radon Released%`, col = 'RED')

#Assumption2: Check for Variance equality among different levels of Treatments
boxplot(ex3.21$`Radon Released%`~ ex3.21$Orifice.Diameter, main = "Boxplot for Orifice Diameters", xlab = "Orifice Diameters", ylab = "% of Radon Released", col = 'lightblue')

#Check if transformation of data required or not (MODEL ADEQUECY)
library(MASS)
boxcox(ex3.21$`Radon Released%`~ ex3.21$Orifice.Diameter,lambda = seq(-5, 8, 1/2))

#Performing single factor ANOVA
Anova_3.21<- aov(`Radon Released%`~ Orifice.Diameter, data = ex3.21)
summary(Anova_3.21)
plot(Anova_3.21)

#Multiple pairwise means comparison using tukeys-HSD test
library(car)
comparision<-TukeyHSD(Anova_3.21)
results_matrix <- as.matrix (results) 
df_res <- as.data.frame(results_matrix[1]) #this conversion of results to matrix and data fram is for sake of ease in interpretting the color coded plot to identify the pairs of mean which significantly differ.
plot(results_matrix, col= ifelse(df_res[,4]<0.05, 'red', 'black'), las = 1, xlim =c(-26,4), cex.axis = 0.75) # Aguments like las (axis lable directin), xlim (xasis scale limits), cex.axis (font size of axis labels) were used here in order to ehance the looks of graphics and also due to space contraints.