Here is an elaboration on each part of the process with generic code chunks for a randomized complete block design cookbook:
(Armina Mim, Sadman Labib, Abdul Aziz Mohammed)
Randomization is a method employed in the design process to safeguard against the interference of unexpected factors or nuisance factors. In certain scenarios, these interfering factors are identifiable but beyond our control. When the source of interference is both known and controlled, a design technique known as “blocking” can be applied to systematically mitigate its impact on statistical comparisons among various treatments.
The Randomized Complete Block Design (RCBD) stands out as one of the most frequently employed experimental designs. The RCBD is suitable for a wide range of situations. It is often used when test equipment or machinery units have varying operational characteristics, making them a typical candidate for a blocking factor. Additionally, in experiments, batches of raw materials, individuals, and time can also serve as common sources of unwanted variability that can be methodically controlled through the application of blocking techniques.
The traditional statistical model of RCBD is:
\(Y_{ij}\) = \(\mu\) + \(\tau_{i}\) + \(\beta_{j}\) + \(\epsilon_{ij}\) {i = 1,2,3,… ; j= 1,2,3,… }
where, \(\mu\) = Overall mean
\(\tau_{i}\) = Effect of the i-th treatment
\(\beta_{j}\) = Effect of the j-th block
\(\epsilon_{ij}\) = Random error term, N(0, \(\sigma^2\))
Consequently, the treatment and block effects are considered as deviations from the overall mean which implies that
\(\sum_{i=1}^{a}\) \(\tau_i\) = 0 and \(\sum_{j=1}^{b}\) \(\beta_{j}\) = 0
Therefore, we can rewrite the RCBD means model as
\(Y_{ij}\) = \(\mu_{ij}\) + \(\epsilon_{ij}\) {i = 1,2,3,… ; j = 1,2,3,… }
where, \(Y_{ij}\) = \(\mu\) + \(\tau_{i}\) + \(\beta_{j}\).
Hypothesis: The hypothesis of interest are,
\(H_0\): \(\mu_1\) = \(\mu_2\) = … = \(\mu_a\)
\(H_a\): at least one \(\mu_i\) \(\neq\) \(\mu_j\)
Because the \(i\)-th treatment mean \(\mu_i\) = \((1/b)\) \(\sum_{j=1}^{b}\) (\(\mu\) + \(\tau_{i}\) + \(\beta_{j}\)) = \(\mu\) + \(\tau_{i}\) , an equivalent way to write the above hypothesis is in terms of the treatment effects,
\(H_0\): \(\tau_1\) = \(\tau_2\) = … = \(\tau_a\) = 0
\(H_a\): \(\tau_i\) \(\neq\) 0 at least one \(i\)
Adhering to these assumptions is essential to ensure the validity of the statistical analysis performed on the data collected from a Randomized Complete Block Design experiment. Violations of these assumptions can lead to biased or incorrect conclusions.
# Example
# 4 treatments, 6 blocks
k <- 4 # number of treatments
b <- 6 # number of blocks
n <- k*b
# Example
# Specify number of treatments (k) and blocks (b)
k <- 4
b <- 4
## Design Generation
treatments <- rep(1:k, each = b) # Treatments
blocks <- rep(1:b, times = k) # Assign blocks to treatments
design <- data.frame(Treatment = factor(treatments), Block = factor(blocks))
# Create data frame with block and randomized treatment
#design <- data.frame(Block = blocks,Treatment = randomized)
# Randomize treatments within each block
#set.seed(123)
#for (i in 1:b) {
#block_indices <- which(design$Block == i)
#design$Treatment[block_indices] <- sample(design$Treatment[block_indices])}
# Export design
print("Generated Design:")
## [1] "Generated Design:"
print(design)
## Treatment Block
## 1 1 1
## 2 1 2
## 3 1 3
## 4 1 4
## 5 2 1
## 6 2 2
## 7 2 3
## 8 2 4
## 9 3 1
## 10 3 2
## 11 3 3
## 12 3 4
## 13 4 1
## 14 4 2
## 15 4 3
## 16 4 4
write.csv(design, "design.csv")
Define number of treatments (k) and blocks (b)
Create block factor with k units per block
Generate full treatment factor
Construct data frame with block and randomized treatment
Export design to CSV
# Example
# Import design
design <- read.csv("design.csv")
# Collect response data
response <- c(143,141,150,146,
152,149,137,143,
134,136,132,127,
129,127,132,129) # Example response data
# Create data frame
data = data.frame(block = design$Block,
treatment = design$Treatment,
response = response)
# Example
# Boxplots
par(mfrow=c(1,2))
boxplot(response ~ block, data, xlab = "Block", ylab = "Response", main="Response by Block")
boxplot(response ~ treatment, data, xlab = "Treatment", ylab = "Response", main="Response by Treatment")
# Check for outliers
# Example
# ANOVA model
model <- aov(response ~ block + treatment, data)
# Summary
summary(model)
## Df Sum Sq Mean Sq F value Pr(>F)
## block 1 21.0 21.0 0.818 0.38220
## treatment 1 726.0 726.0 28.265 0.00014 ***
## Residuals 13 333.9 25.7
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Residual plots
plot(model)
# Shapiro-Wilk test of normality
shapiro.test(resid(model))
##
## Shapiro-Wilk normality test
##
## data: resid(model)
## W = 0.95168, p-value = 0.5167
# Example
# Tukey's HSD
#TukeyHSD(model)
#plot(TukeyHSD(model))
A medical device manufacturer produces vascular grafts (artificial veins). These grafts are produced by extruding billets of polytetrafluoroethylene (PTFE) resin combined with a lubricant into tubes. Frequently, some of the tubes in a production run contain small, hard protrusions on the external surface. These defects are known as “flicks.” The defect is the cause for the rejection of the unit. The product developer responsible for the vascular grafts suspects that the extrusion pressure affects the occurrence of flicks and therefore intends to conduct an experiment to investigate this hypothesis. However, the resin is manufactured by an external supplier and is delivered to the medical device manufacturer in batches. The engineer also suspects that there may be a significant batch-to-batch variation because while the material should be consistent with respect to parameters such as molecular weight, mean particle size, retention, and peak height ratio, it probably isn’t due to manufacturing variation at the resin supplier and natural variation in the material. Therefore, the product developer decided to investigate the effect of four different levels of extrusion pressure on flicks using a randomized complete block design considering batches of resin as blocks. The RCBD is shown in Table. Note that there are four levels of extrusion pressure (treatments) and six batches of resin (blocks). Remember that the order in which the extrusion pressures are tested within each block is random. The response variable is yield, or the percentage of tubes in the production run that did not contain any flicks. The GitHub link for the Randomized Complete Block Design of the Vascular Graft Experiment is given below:
GitHub Link: https://raw.githubusercontent.com/Aziz6094/DOE/f45545a09da54c16d8ccf3331a087f0a0d3122a3/DOE_Mid.csv
Reference: Design and Analysis of Experiments by DOUGLAS C. MONTGOMERY- Eight Edition ,pg 144 -Example 4.1
# Load required libraries
library(rcompanion)
## Sample Size Determination
k <- 4 # Number of treatments
b <- 6 # Number of blocks
n <- k * b # Total sample size
# Design Generation
blocks <- rep(1:b, times = k) # Assign blocks to treatments
treatments <- rep(1:k, each = b) # Treatments
design <- data.frame(Block = factor(blocks), Treatment = factor(treatments))
design
## Block Treatment
## 1 1 1
## 2 2 1
## 3 3 1
## 4 4 1
## 5 5 1
## 6 6 1
## 7 1 2
## 8 2 2
## 9 3 2
## 10 4 2
## 11 5 2
## 12 6 2
## 13 1 3
## 14 2 3
## 15 3 3
## 16 4 3
## 17 5 3
## 18 6 3
## 19 1 4
## 20 2 4
## 21 3 4
## 22 4 4
## 23 5 4
## 24 6 4
# Randomize treatments within each block
#set.seed(123)
#for (i in 1:b) {
# block_indices <- which(design$Block == i)
#design$Treatment[block_indices] <- sample(design$Treatment[block_indices])
#}
# Print and export the design
print("Generated Design:")
## [1] "Generated Design:"
print(design)
## Block Treatment
## 1 1 1
## 2 2 1
## 3 3 1
## 4 4 1
## 5 5 1
## 6 6 1
## 7 1 2
## 8 2 2
## 9 3 2
## 10 4 2
## 11 5 2
## 12 6 2
## 13 1 3
## 14 2 3
## 15 3 3
## 16 4 3
## 17 5 3
## 18 6 3
## 19 1 4
## 20 2 4
## 21 3 4
## 22 4 4
## 23 5 4
## 24 6 4
write.csv(design, "design.csv")
# Data Collection (simulated response data)
#set.seed(123456) # for reproducibility
response <-read.csv("https://raw.githubusercontent.com/Aziz6094/DOE/f45545a09da54c16d8ccf3331a087f0a0d3122a3/DOE_Mid.csv") #Example response data
response
## Extrusion.Pressure..PSI. X1 X2 X3 X4 X5 X6
## 1 8500 90.3 89.2 98.2 93.9 87.4 97.9
## 2 8700 92.5 89.5 90.6 94.7 87.0 95.8
## 3 8900 85.5 90.8 89.6 86.2 88.0 93.4
## 4 9100 82.5 89.5 85.6 87.4 78.9 90.7
dat<-data.frame(response)
response<-c(dat[1,2],dat[1,3],dat[1,4],dat[1,5],dat[1,6],dat[1,7],dat[2,2],dat[2,3],dat[2,4],dat[2,5],dat[2,6],dat[2,7],dat[3,2],dat[3,3],dat[3,4],dat[3,5],dat[3,6],dat[3,7],dat[4,2],dat[4,3],dat[4,4],dat[4,5],dat[4,6],dat[4,7])
# Create data frame
data <- data.frame(block = design$Block,treatment = design$Treatment,response = response)
data
## block treatment response
## 1 1 1 90.3
## 2 2 1 89.2
## 3 3 1 98.2
## 4 4 1 93.9
## 5 5 1 87.4
## 6 6 1 97.9
## 7 1 2 92.5
## 8 2 2 89.5
## 9 3 2 90.6
## 10 4 2 94.7
## 11 5 2 87.0
## 12 6 2 95.8
## 13 1 3 85.5
## 14 2 3 90.8
## 15 3 3 89.6
## 16 4 3 86.2
## 17 5 3 88.0
## 18 6 3 93.4
## 19 1 4 82.5
## 20 2 4 89.5
## 21 3 4 85.6
## 22 4 4 87.4
## 23 5 4 78.9
## 24 6 4 90.7
# Exploratory Data Analysis
# Boxplots
par(mfrow=c(1,2))
boxplot(response ~ block, data, main="Response by Block", xlab = "Block", ylab = "Response")
boxplot(response ~ treatment, data, main="Response by Treatment", xlab = "Treatment", ylab = "Response")
# Check for outliers
outliers <- boxplot.stats(data$response)$out
print("Outliers:")
## [1] "Outliers:"
print(outliers)
## numeric(0)
# Statistical Analysis
# ANOVA model
?aov
model <- aov(response ~block+treatment, data)
# Summary
print("ANOVA Summary:")
## [1] "ANOVA Summary:"
print(summary(model))
## Df Sum Sq Mean Sq F value Pr(>F)
## block 5 192.2 38.45 5.249 0.00553 **
## treatment 3 178.2 59.39 8.107 0.00192 **
## Residuals 15 109.9 7.33
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Since the significance level alpha=0.05, and the P-value is 0.00192 for the treatment. Therefore, we can reject the null hypothesis.
# Residual plots
par(mfrow=c(2,2))
plot(model)
# Shapiro-Wilk test of normality
print("Shapiro-Wilk Test:")
## [1] "Shapiro-Wilk Test:"
print(shapiro.test(resid(model)))
##
## Shapiro-Wilk normality test
##
## data: resid(model)
## W = 0.95631, p-value = 0.3689
# Inference
# Interpret ANOVA results
# Use the F-test and p-values to determine statistical significance
# Multiple Comparisons
if (anova(model)$`Pr(>F)`[2] < 0.05) {
tukey_results <- TukeyHSD(model)
print("Tukey's HSD for Multiple Comparisons:")
print(tukey_results)
} else {
print("No significant differences found.")
}
## [1] "Tukey's HSD for Multiple Comparisons:"
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = response ~ block + treatment, data = data)
##
## $block
## diff lwr upr p adj
## 2-1 2.050 -4.1680828 8.2680828 0.8853016
## 3-1 3.300 -2.9180828 9.5180828 0.5376297
## 4-1 2.850 -3.3680828 9.0680828 0.6757699
## 5-1 -2.375 -8.5930828 3.8430828 0.8105903
## 6-1 6.750 0.5319172 12.9680828 0.0297368
## 3-2 1.250 -4.9680828 7.4680828 0.9845521
## 4-2 0.800 -5.4180828 7.0180828 0.9980198
## 5-2 -4.425 -10.6430828 1.7930828 0.2483499
## 6-2 4.700 -1.5180828 10.9180828 0.1986961
## 4-3 -0.450 -6.6680828 5.7680828 0.9998784
## 5-3 -5.675 -11.8930828 0.5430828 0.0837504
## 6-3 3.450 -2.7680828 9.6680828 0.4925715
## 5-4 -5.225 -11.4430828 0.9930828 0.1263042
## 6-4 3.900 -2.3180828 10.1180828 0.3674672
## 6-5 9.125 2.9069172 15.3430828 0.0027838
##
## $treatment
## diff lwr upr p adj
## 2-1 -1.133333 -5.637161 3.370495 0.8854831
## 3-1 -3.900000 -8.403828 0.603828 0.1013084
## 4-1 -7.050000 -11.553828 -2.546172 0.0020883
## 3-2 -2.766667 -7.270495 1.737161 0.3245644
## 4-2 -5.916667 -10.420495 -1.412839 0.0086667
## 4-3 -3.150000 -7.653828 1.353828 0.2257674
plot(TukeyHSD(model))