Here is an elaboration on each part of the process with generic code chunks for a randomized complete block design cookbook:

1 Randomized Complete Block Design

(Armina Mim, Sadman Labib, Abdul Aziz Mohammed)

Randomization is a method employed in the design process to safeguard against the interference of unexpected factors or nuisance factors. In certain scenarios, these interfering factors are identifiable but beyond our control. When the source of interference is both known and controlled, a design technique known as “blocking” can be applied to systematically mitigate its impact on statistical comparisons among various treatments.

The Randomized Complete Block Design (RCBD) stands out as one of the most frequently employed experimental designs. The RCBD is suitable for a wide range of situations. It is often used when test equipment or machinery units have varying operational characteristics, making them a typical candidate for a blocking factor. Additionally, in experiments, batches of raw materials, individuals, and time can also serve as common sources of unwanted variability that can be methodically controlled through the application of blocking techniques.

The traditional statistical model of RCBD is:

\(Y_{ij}\) = \(\mu\) + \(\tau_{i}\) + \(\beta_{j}\) + \(\epsilon_{ij}\) {i = 1,2,3,… ; j= 1,2,3,… }

where, \(\mu\) = Overall mean

\(\tau_{i}\) = Effect of the i-th treatment

\(\beta_{j}\) = Effect of the j-th block

\(\epsilon_{ij}\) = Random error term, N(0, \(\sigma^2\))

Consequently, the treatment and block effects are considered as deviations from the overall mean which implies that

\(\sum_{i=1}^{a}\) \(\tau_i\) = 0 and \(\sum_{j=1}^{b}\) \(\beta_{j}\) = 0

Therefore, we can rewrite the RCBD means model as

\(Y_{ij}\) = \(\mu_{ij}\) + \(\epsilon_{ij}\) {i = 1,2,3,… ; j = 1,2,3,… }

where, \(Y_{ij}\) = \(\mu\) + \(\tau_{i}\) + \(\beta_{j}\).

Hypothesis: The hypothesis of interest are,

\(H_0\): \(\mu_1\) = \(\mu_2\) = … = \(\mu_a\)

\(H_a\): at least one \(\mu_i\) \(\neq\) \(\mu_j\)

Because the \(i\)-th treatment mean \(\mu_i\) = \((1/b)\) \(\sum_{j=1}^{b}\) (\(\mu\) + \(\tau_{i}\) + \(\beta_{j}\)) = \(\mu\) + \(\tau_{i}\) , an equivalent way to write the above hypothesis is in terms of the treatment effects,

\(H_0\): \(\tau_1\) = \(\tau_2\) = … = \(\tau_a\) = 0

\(H_a\): \(\tau_i\) \(\neq\) 0 at least one \(i\)

1.1 Assumptions

  • The experimental units can be grouped into homogeneous blocks or groups.
  • Differences between blocks do not interact with the treatment effects. Each block represents a similar environment or condition.
  • The observations within each block are independent of each other.
  • The data collected from one experimental unit does not influence the data collected from another experimental unit within the same block.
  • The residuals (differences between observed and expected values) are normally distributed within each treatment level.
  • Normality assumption is crucial, especially for conducting parametric statistical tests like analysis of variance (ANOVA).
  • The variance of the errors (residuals) is constant across all treatment groups.
  • This assumption ensures that the spread of the data points around the mean is consistent across all treatment levels.
  • Randomization helps to eliminate or control the effects of uncontrolled variables or nuisance factors.

Adhering to these assumptions is essential to ensure the validity of the statistical analysis performed on the data collected from a Randomized Complete Block Design experiment. Violations of these assumptions can lead to biased or incorrect conclusions.

1.2 Process

1.2.1 Sample Size Determination

  • Specify number of treatments (k) you want to compare
  • Specify number of blocks (b) to use based on blocking factor
  • Total sample size is k * b
# Example

# 4 treatments, 6 blocks
k <- 4 # number of treatments  
b <- 6 # number of blocks
n <- k*b 

1.2.2 Design Generation

# Example

# Specify number of treatments (k) and blocks (b)
k <- 4
b <- 4

## Design Generation
treatments <- rep(1:k, each = b)  # Treatments
blocks <- rep(1:b, times = k)  # Assign blocks to treatments
design <- data.frame(Treatment = factor(treatments), Block = factor(blocks))

# Create data frame with block and randomized treatment  
#design <- data.frame(Block = blocks,Treatment = randomized)

# Randomize treatments within each block
#set.seed(123)
#for (i in 1:b) {
#block_indices <- which(design$Block == i)
#design$Treatment[block_indices] <- sample(design$Treatment[block_indices])}
                     
# Export design
print("Generated Design:")
## [1] "Generated Design:"
print(design)
##    Treatment Block
## 1          1     1
## 2          1     2
## 3          1     3
## 4          1     4
## 5          2     1
## 6          2     2
## 7          2     3
## 8          2     4
## 9          3     1
## 10         3     2
## 11         3     3
## 12         3     4
## 13         4     1
## 14         4     2
## 15         4     3
## 16         4     4
write.csv(design, "design.csv") 
  • The key steps are:
    1. Define number of treatments (k) and blocks (b)

    2. Create block factor with k units per block

    3. Generate full treatment factor

    4. Construct data frame with block and randomized treatment

    5. Export design to CSV

1.2.3 Data Collection

  • Collect response data for each experimental unit
  • Import design from CSV file created previously
  • Construct a data frame with columns for block, treatment, and response
# Example

# Import design
design <- read.csv("design.csv") 

# Collect response data
response <- c(143,141,150,146,
                152,149,137,143,
                134,136,132,127,
                129,127,132,129)  # Example response data

# Create data frame
data = data.frame(block = design$Block,
                  treatment = design$Treatment,
                  response = response)

1.2.4 Exploratory Data Analysis

# Example

# Boxplots
par(mfrow=c(1,2))
boxplot(response ~ block, data, xlab = "Block", ylab = "Response", main="Response by Block")
boxplot(response ~ treatment, data, xlab = "Treatment", ylab = "Response", main="Response by Treatment")

# Check for outliers
  • Generate boxplots to visualize distributions of response by block and treatment
  • Check for outliers that may need to be addressed

1.2.5 Statistical Analysis

# Example

# ANOVA model 
model <- aov(response ~ block + treatment, data)

# Summary
summary(model) 
##             Df Sum Sq Mean Sq F value  Pr(>F)    
## block        1   21.0    21.0   0.818 0.38220    
## treatment    1  726.0   726.0  28.265 0.00014 ***
## Residuals   13  333.9    25.7                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Residual plots
plot(model)

# Shapiro-Wilk test of normality
shapiro.test(resid(model))
## 
##  Shapiro-Wilk normality test
## 
## data:  resid(model)
## W = 0.95168, p-value = 0.5167
  • Fit an ANOVA model with block as a fixed factor
  • Use model summary to view treatment effects
  • Check residuals for normality and constant variance

1.2.6 Inference

  • Interpret ANOVA results to draw inferences about treatment effects
  • Use the F-test and p-values to determine statistical significance

1.2.7 Multiple Comparisons

# Example 
# Tukey's HSD
#TukeyHSD(model)
#plot(TukeyHSD(model))
  • If there are more than two treatments, use Tukey’s HSD to make all pairwise comparisons
  • Only interpret comparisons if overall F-test is significant

1.3 Example

A medical device manufacturer produces vascular grafts (artificial veins). These grafts are produced by extruding billets of polytetrafluoroethylene (PTFE) resin combined with a lubricant into tubes. Frequently, some of the tubes in a production run contain small, hard protrusions on the external surface. These defects are known as “flicks.” The defect is the cause for the rejection of the unit. The product developer responsible for the vascular grafts suspects that the extrusion pressure affects the occurrence of flicks and therefore intends to conduct an experiment to investigate this hypothesis. However, the resin is manufactured by an external supplier and is delivered to the medical device manufacturer in batches. The engineer also suspects that there may be a significant batch-to-batch variation because while the material should be consistent with respect to parameters such as molecular weight, mean particle size, retention, and peak height ratio, it probably isn’t due to manufacturing variation at the resin supplier and natural variation in the material. Therefore, the product developer decided to investigate the effect of four different levels of extrusion pressure on flicks using a randomized complete block design considering batches of resin as blocks. The RCBD is shown in Table. Note that there are four levels of extrusion pressure (treatments) and six batches of resin (blocks). Remember that the order in which the extrusion pressures are tested within each block is random. The response variable is yield, or the percentage of tubes in the production run that did not contain any flicks. The GitHub link for the Randomized Complete Block Design of the Vascular Graft Experiment is given below:

GitHub Link: https://raw.githubusercontent.com/Aziz6094/DOE/f45545a09da54c16d8ccf3331a087f0a0d3122a3/DOE_Mid.csv

Reference: Design and Analysis of Experiments by DOUGLAS C. MONTGOMERY- Eight Edition ,pg 144 -Example 4.1

1.3.1 Complete R Code of the Exapmle

# Load required libraries
library(rcompanion)

## Sample Size Determination
k <- 4  # Number of treatments
b <- 6  # Number of blocks
n <- k * b  # Total sample size

# Design Generation
blocks <- rep(1:b, times = k)  # Assign blocks to treatments
treatments <- rep(1:k, each = b)  # Treatments
design <- data.frame(Block = factor(blocks), Treatment = factor(treatments))
design
##    Block Treatment
## 1      1         1
## 2      2         1
## 3      3         1
## 4      4         1
## 5      5         1
## 6      6         1
## 7      1         2
## 8      2         2
## 9      3         2
## 10     4         2
## 11     5         2
## 12     6         2
## 13     1         3
## 14     2         3
## 15     3         3
## 16     4         3
## 17     5         3
## 18     6         3
## 19     1         4
## 20     2         4
## 21     3         4
## 22     4         4
## 23     5         4
## 24     6         4
# Randomize treatments within each block
#set.seed(123)
#for (i in 1:b) {
 # block_indices <- which(design$Block == i)
  #design$Treatment[block_indices] <- sample(design$Treatment[block_indices])
#}

# Print and export the design
print("Generated Design:")
## [1] "Generated Design:"
print(design)
##    Block Treatment
## 1      1         1
## 2      2         1
## 3      3         1
## 4      4         1
## 5      5         1
## 6      6         1
## 7      1         2
## 8      2         2
## 9      3         2
## 10     4         2
## 11     5         2
## 12     6         2
## 13     1         3
## 14     2         3
## 15     3         3
## 16     4         3
## 17     5         3
## 18     6         3
## 19     1         4
## 20     2         4
## 21     3         4
## 22     4         4
## 23     5         4
## 24     6         4
write.csv(design, "design.csv") 


# Data Collection (simulated response data)
#set.seed(123456)  # for reproducibility
response <-read.csv("https://raw.githubusercontent.com/Aziz6094/DOE/f45545a09da54c16d8ccf3331a087f0a0d3122a3/DOE_Mid.csv") #Example response data
response
##   Extrusion.Pressure..PSI.   X1   X2   X3   X4   X5   X6
## 1                     8500 90.3 89.2 98.2 93.9 87.4 97.9
## 2                     8700 92.5 89.5 90.6 94.7 87.0 95.8
## 3                     8900 85.5 90.8 89.6 86.2 88.0 93.4
## 4                     9100 82.5 89.5 85.6 87.4 78.9 90.7
dat<-data.frame(response)
response<-c(dat[1,2],dat[1,3],dat[1,4],dat[1,5],dat[1,6],dat[1,7],dat[2,2],dat[2,3],dat[2,4],dat[2,5],dat[2,6],dat[2,7],dat[3,2],dat[3,3],dat[3,4],dat[3,5],dat[3,6],dat[3,7],dat[4,2],dat[4,3],dat[4,4],dat[4,5],dat[4,6],dat[4,7])
# Create data frame
data <- data.frame(block = design$Block,treatment = design$Treatment,response = response)
data
##    block treatment response
## 1      1         1     90.3
## 2      2         1     89.2
## 3      3         1     98.2
## 4      4         1     93.9
## 5      5         1     87.4
## 6      6         1     97.9
## 7      1         2     92.5
## 8      2         2     89.5
## 9      3         2     90.6
## 10     4         2     94.7
## 11     5         2     87.0
## 12     6         2     95.8
## 13     1         3     85.5
## 14     2         3     90.8
## 15     3         3     89.6
## 16     4         3     86.2
## 17     5         3     88.0
## 18     6         3     93.4
## 19     1         4     82.5
## 20     2         4     89.5
## 21     3         4     85.6
## 22     4         4     87.4
## 23     5         4     78.9
## 24     6         4     90.7
# Exploratory Data Analysis
# Boxplots
par(mfrow=c(1,2))
boxplot(response ~ block, data, main="Response by Block", xlab = "Block", ylab = "Response")
boxplot(response ~ treatment, data, main="Response by Treatment", xlab = "Treatment", ylab = "Response")

# Check for outliers
outliers <- boxplot.stats(data$response)$out
print("Outliers:")
## [1] "Outliers:"
print(outliers)
## numeric(0)
# Statistical Analysis
# ANOVA model 
?aov
model <- aov(response ~block+treatment, data)

# Summary
print("ANOVA Summary:")
## [1] "ANOVA Summary:"
print(summary(model)) 
##             Df Sum Sq Mean Sq F value  Pr(>F)   
## block        5  192.2   38.45   5.249 0.00553 **
## treatment    3  178.2   59.39   8.107 0.00192 **
## Residuals   15  109.9    7.33                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Since the significance level alpha=0.05, and the P-value is 0.00192 for the treatment. Therefore, we can reject the null hypothesis. 

# Residual plots
par(mfrow=c(2,2))
plot(model)

# Shapiro-Wilk test of normality
print("Shapiro-Wilk Test:")
## [1] "Shapiro-Wilk Test:"
print(shapiro.test(resid(model)))
## 
##  Shapiro-Wilk normality test
## 
## data:  resid(model)
## W = 0.95631, p-value = 0.3689
# Inference
# Interpret ANOVA results
# Use the F-test and p-values to determine statistical significance

# Multiple Comparisons
if (anova(model)$`Pr(>F)`[2] < 0.05) {
  tukey_results <- TukeyHSD(model)
  print("Tukey's HSD for Multiple Comparisons:")
  print(tukey_results)
} else {
  print("No significant differences found.")
}
## [1] "Tukey's HSD for Multiple Comparisons:"
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = response ~ block + treatment, data = data)
## 
## $block
##       diff         lwr        upr     p adj
## 2-1  2.050  -4.1680828  8.2680828 0.8853016
## 3-1  3.300  -2.9180828  9.5180828 0.5376297
## 4-1  2.850  -3.3680828  9.0680828 0.6757699
## 5-1 -2.375  -8.5930828  3.8430828 0.8105903
## 6-1  6.750   0.5319172 12.9680828 0.0297368
## 3-2  1.250  -4.9680828  7.4680828 0.9845521
## 4-2  0.800  -5.4180828  7.0180828 0.9980198
## 5-2 -4.425 -10.6430828  1.7930828 0.2483499
## 6-2  4.700  -1.5180828 10.9180828 0.1986961
## 4-3 -0.450  -6.6680828  5.7680828 0.9998784
## 5-3 -5.675 -11.8930828  0.5430828 0.0837504
## 6-3  3.450  -2.7680828  9.6680828 0.4925715
## 5-4 -5.225 -11.4430828  0.9930828 0.1263042
## 6-4  3.900  -2.3180828 10.1180828 0.3674672
## 6-5  9.125   2.9069172 15.3430828 0.0027838
## 
## $treatment
##          diff        lwr       upr     p adj
## 2-1 -1.133333  -5.637161  3.370495 0.8854831
## 3-1 -3.900000  -8.403828  0.603828 0.1013084
## 4-1 -7.050000 -11.553828 -2.546172 0.0020883
## 3-2 -2.766667  -7.270495  1.737161 0.3245644
## 4-2 -5.916667 -10.420495 -1.412839 0.0086667
## 4-3 -3.150000  -7.653828  1.353828 0.2257674
plot(TukeyHSD(model))