FACTORIAL EXPERIMENTAL DESIGN

Definition

Factorial Completely Randomized Design is an experimental design in which the treatment is formed by a combination of the levels of several factors. Factorial treatment structures are useful for examining the effects of two or more factors on a response y, whether or not interaction exists. The treatment design of the multi-factor experiment was differentiated based on the level of importance and restrictions on randomization of the levels of each factor making up the treatment.

Load the data set

factorial_design <- read.csv("factorial.csv")
attach(factorial_design)
head(factorial_design,10)
   Pestcide_A Variety_B Response
1         a_1       b_1       49
2         a_2       b_1       50
3         a_3       b_1       43
4         a_4       b_1       53
5         a_1       b_1       39
6         a_2       b_1       55
7         a_3       b_1       38
8         a_4       b_1       48
9         a_1       b_2       55
10        a_2       b_2       67

Estimate the Model

model_aov <- aov(Response ~ Pestcide_A*Variety_B, data = factorial_design)
summary(model_aov)
                     Df Sum Sq Mean Sq F value   Pr(>F)    
Pestcide_A            3   2227   742.5  17.556  0.00011 ***
Variety_B             2   3996  1998.0  47.244 2.05e-06 ***
Pestcide_A:Variety_B  6    457    76.2   1.801  0.18168    
Residuals            12    508    42.3                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Decision Criterion

knitr::include_graphics("factorial.png")

Conclusion

The first test of significance must be to test for an interaction between factors A and B, because if the interaction is significant then the main effects may have no interpretation

Interaction AB:The computed value of F=1.801 doesn’t exceed the tabulated value of 3.00 for alpha=0.05, df1=6, and df2=12 in the F tables. Hence, we have insufficient evidence to indicate an interaction between pesticide levels and variety of trees levels. Because the interaction is not significant, we can next test the main effects of the two factors.

Factor A:The computed value of F=17.556 does exceed the tabulated value of 3.49 for alpha=0.05, df1=3, and df2=12 in the F tables. Hence, we have sufficient evidence to indicate a significant difference in the mean yields among the four pesticide levels (Factor A).

Factor B:The computed value of F=47.244 does exceed the tabulated value of 3.89 for alpha=0.05, df1=2, and df2=12 in the F tables. Hence, we have sufficient evidence to indicate a significant difference in the mean yields among the three varieties of citrus trees. (Factor B).

SPLIT PLOT DESIGN

A Factorial Split Plot Design using Oats

knitr::include_graphics("split.png")

SUMMARY

The main purpose of this tutorial is to demonstrate my experimentation and analytically skills using R programming language. This project is about analyzing an oat dataset collected from a Split-plot research system. It is a common but more advanced research method for a factorial agricultural experiment that works on multiple independent variables. The experiment was carried out in a field. The experimentation field was spitted into block, plot, and subplot.

There are 3 independent variables in the dataset with various levels. They are 6 blocks, 3 oat varieties, and 4 nutrient levels. Responding variable was plant yield in kilogram per hectare. Exploratory data analysis (EDA) was carried out to explore general trends of the dataset, then followed by statistical analyses. This project uses a mixed-effect model to analyse significant differences among oat varieties, nutrient levels (manurial nitrogen) and their interactions. Statistical assumptions were tests using Q-Q plot, Shapiro-Wilk test, and Levene’s test. Results from the test shown normality among residuals and variances among treatment groups were the same. ANOVA and Tukey were selected as the omnibus and post-hoc test.

Results reveal that there is no significant yield difference among oat varieties. However, there is significant yield difference between nutrient levels. There is no significant interaction effect between oat varieties and nutrient levels. If a decision has to be made, my top 3 best combinations selected from the dataset will be the three varieties, Marvellous, Golden rain, and Victory, with 50 kilogram per hectare of manurial nitrogen application. It is the second highest nutrient level from the experiment. I am not picking the highest nutrient level that had the best yield because the yield though was higher but is not significantly higher than my top 3 recommendations. These recommendations may also more beneficial for farm economics and surrounding environment.

knitr::include_graphics("highlight.png")

R PACKAGES

R packages loaded in this project include tidyverse packages (ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, and forcats), skimr, lubridate, kableExtra, agridat, MASS, DescTools, DescTools, lme4, qqplotr, lsmeans, and multcomp.

library(systemfonts)
library(tidyverse)
library(skimr)
library(lubridate)
library(kableExtra)
library(agridat)
library(MASS)
library(DescTools)
library(lme4)
library(lmerTest)
library(qqplotr)
library(lsmeans)
library(multcomp)

INTRODUCTION

This personal project uses a public dataset from an oat field experiment, the data was available for download from a R package called MASS. The objective of this project is to demonstrate how I successfully analyzed the data using statistical methods to find out the optimum nutrient levels on different variety of oats, with incorporation of the effects of blocks and plots.

The experiment was carried out in a field with a famous agricultural experimentation system called split plot. This method splits the entire allocated field into different blocks and further subdividing into smaller plots and even smaller subplots. In the dataset, there are 3 oat varieties and 4 levels of manurial treatment (nutrient level).

The experiment was,

A sketch to visualize the arrangement of blocks. The arrangement of each block is often subjected to the direction of gradient of identified heterogeneous environment.

knitr::include_graphics("split2.png")

I am expecting all treatments were randomised within plot. It is a usual way to minimize the impact of heterogeneous environmental conditions for more accurate results. For example, if the left corner has more soil moisture than the right corner, then having a randomised plots would avoid particular treatment be assigned to the heterogeneous corner and generating unfairn data.

knitr::include_graphics("split3.png")

Fundamentals of split-plot and RCBD are similar. RCBD (Randomized Complete Block Design) stops when the split achieves plots, whereas split plot further splits the plots into subplots. They are widely-used experimental designs in agriculture research to minimize heterogeneous field conditions. These conditions can be sporadic soil water availability, inherent nutrient contents, soil chemistry, or other environmental conditions in the research field. These effects on the crops can be ranged from small to large. Usually, when a heterogeneous condition is identified, blocks will be arranged in a gradient against the identified area. It is to ensure the heterogeneous area has all treatments in it. Therefore, it can reduce biases against particular treatments and reduce overall experimental errors because all treatments have now equal chance receiving the heterogeneous soil conditions.

In general, one can assume which variable in a split-plot experiment is the most important variable for the researchers by looking at where the variable is being plotted. Usually, the variable that assigned to the subplot is the variable that interest the researchers the most, which is “nutrient level on oat” in this case. The second interest of this split plot experiment is the variable assigned to the plot, which is the “varieties” of oat in this case.

EXPERIMENTAL DESIGN SUMMARY

  • Crop: Oat
  • Experimental Design: Split-Plot
  • Experimental unit: Subplot
  • Independent variables: Varieties, Nutrient levels
  • Levels of independent variable: 3 Varieties, 4 nutrient levels
  • Number of treatment group: 12 (3 varieties x 4 nutrient levels)
  • Number of replication: 6
  • Dependent variable: Plant yield

To fully examine the yield of oats due to varieties and nutrient levels in a split plots. I will need to statistically analyse and compare the effects of varieties, nutrient levels, their interaction, and the effects of plots and subplots.

DATA PREPARATION

Load the data

Following code import the data and the table indicates successful import of the dataset.

data("oats")
head(oats,10)
   B           V      N   Y
1  I     Victory 0.0cwt 111
2  I     Victory 0.2cwt 130
3  I     Victory 0.4cwt 157
4  I     Victory 0.6cwt 174
5  I Golden.rain 0.0cwt 117
6  I Golden.rain 0.2cwt 114
7  I Golden.rain 0.4cwt 161
8  I Golden.rain 0.6cwt 141
9  I  Marvellous 0.0cwt 105
10 I  Marvellous 0.2cwt 140

Data description

Following table describes what is each column of the dataset comprised of.

Variables <- c("B", "V", "N", "Y")
Description <- c("Blocks, levels I, II, III, IV, V, VI",
                 "Varieties, 3 levels",
                 "Nitrogen (manurial) treatment, levels 0.0cwt, 0.2cwt, 0.4cwt, 0.6cwt, showing the application in unit of cwt/acre",
                 "Yields in 1/4lbs per sub-plot, each of area 1/80 acre.")

data.frame(Variables, Description) %>% 
  kbl() %>% 
  kable_styling(bootstrap_options = c("bordered", "stripped", "hover"))
Variables Description
B Blocks, levels I, II, III, IV, V, VI
V Varieties, 3 levels
N Nitrogen (manurial) treatment, levels 0.0cwt, 0.2cwt, 0.4cwt, 0.6cwt, showing the application in unit of cwt/acre
Y Yields in 1/4lbs per sub-plot, each of area 1/80 acre.

Data exploration

The dataset has 74 rows of observations and 4 columns of variables. There are 3 variables, B, V, and N, categorised as factor, and the Y is categorised as numerical data.

skim_without_charts(oats)
Data summary
Name oats
Number of rows 72
Number of columns 4
_______________________
Column type frequency:
factor 3
numeric 1
________________________
Group variables None

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts
B 0 1 FALSE 6 I: 12, II: 12, III: 12, IV: 12
V 0 1 FALSE 3 Gol: 24, Mar: 24, Vic: 24
N 0 1 FALSE 4 0.0: 18, 0.2: 18, 0.4: 18, 0.6: 18

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100
Y 0 1 103.97 27.06 53 86 102.5 121.25 174

The data set is complete and having no missing data. It can be examined through the results of the columns n_missing and complete_rate. The two columns associate with missing value in each column of the dataset. All variables (B, V, N and Y) have 0 in the n_missing, and 1 in the complete_rate throughout their columns.

Following is the summary of the dataset.

summary(oats)
   B                V           N            Y        
 I  :12   Golden.rain:24   0.0cwt:18   Min.   : 53.0  
 II :12   Marvellous :24   0.2cwt:18   1st Qu.: 86.0  
 III:12   Victory    :24   0.4cwt:18   Median :102.5  
 IV :12                    0.6cwt:18   Mean   :104.0  
 V  :12                                3rd Qu.:121.2  
 VI :12                                Max.   :174.0  
  • There are 6 blocks.
  • There are 3 oat varieties.
  • There are 4 nutrient levels.
  • The range of the overall yield can be ranged from minimum of 53 lbs/acre to maximum of 174 lbs/acre, with mean of 104 lbs/acre and median of 102.5 lbs/acre.

DATA CLEANING AND MANIPULATION

The dataset is obviously has been cleaned by the up-loader of the dataset. The dataset has been transformed to a long format already, all variables have been assigned to their desired types, having no missing values and unnecessary spaces to trim.

Rename variables

However, variable names of B, V, N, and Y can be changed to a more complete, simple, intuitive terms such as block, variety, nutrient, and yield, to make them more sensible and understandable for readers who are not familiar with the dataset. Following codes complete the changes (click the right button), and the table reveals the outcome of the codes.

oat <- oats %>%             # note: I change the name of oats to oat, to preserve original dataset.
  rename("block" = "B",
         "variety" = "V",
         "nutrient" = "N",
         "yield" = "Y")

head(oat,10)
   block     variety nutrient yield
1      I     Victory   0.0cwt   111
2      I     Victory   0.2cwt   130
3      I     Victory   0.4cwt   157
4      I     Victory   0.6cwt   174
5      I Golden.rain   0.0cwt   117
6      I Golden.rain   0.2cwt   114
7      I Golden.rain   0.4cwt   161
8      I Golden.rain   0.6cwt   141
9      I  Marvellous   0.0cwt   105
10     I  Marvellous   0.2cwt   140

The changes have been successfully completed. Now convert the units of nutrient and yield The unit of nutrient is cwt/arce, and the unit of yield is lbs/acre. Personally, I am more comfortable with the unit kg/ha or ton/ha. The conversion is optional but I will carry out the conversion in this section, with:

  • 1 cwt = 50.8023 kg
  • 1 acre = 0.404686 ha
  • 1 lbs = 0.453592 kg Following codes complete the conversion (click right button) and the table reveals the outcome of the codes.
oat2 <- oat %>% 
  mutate(nutrient = str_remove(nutrient, "cwt"),
         nutrient = as.numeric(nutrient),
         nutrient = round(nutrient*50.8023/0.404686),
         nutrient = as.factor(nutrient),
         yield = round(yield * 0.453592/0.404686))

head(oat2,10)
   block     variety nutrient yield
1      I     Victory        0   124
2      I     Victory       25   146
3      I     Victory       50   176
4      I     Victory       75   195
5      I Golden.rain        0   131
6      I Golden.rain       25   128
7      I Golden.rain       50   180
8      I Golden.rain       75   158
9      I  Marvellous        0   118
10     I  Marvellous       25   157
attach(oat2)

Now, the units of nutrient and yield have been converted into kg/ha.

EXPLORATORY DATA ANALYSIS (EDA)

After cleaning, the very first step is always visualizing the data because it is the time one can have a chance to have a general understanding of the data set, what are the obvious trends, how data points are spread out, and how each levels of variables are visually different from each other.

My EDA will explore:

  • Yield vs Variety
  • Yield vs Nutrient Level
  • Yield vs Blocks
  • Yield vs Variety + Nutrient Level
  • Yield vs Variety + Nutrient level + Blocks
  • Yield vs Variety

There is no obvious yield difference among oat varieties. If a comparison is needed to be made, the Marvellous has the highest average yield, followed by Golden rain, then Victory. However, the differences are not much.

df6.1 <- oat2 %>% 
  dplyr::select(variety, yield) %>%                         # MASS package making select of dplyr error, having dplyr:: solve the issue.  
  group_by(variety) %>% 
  mutate(count = n(),
         xlab = as.factor(paste0(variety, "\n", "(n = ", count, ")"))) 
  
  
ggplot(df6.1, aes(x = reorder(xlab, -yield), y = yield, fill = variety, colour = variety)) +
  geom_boxplot(alpha = 0.1, outlier.shape = NA) +
  stat_boxplot(geom = "errorbar") +
  geom_jitter(width = 0.2) +
  stat_summary(fun.y = mean, geom = "point", size = 7, shape = 4, stroke = 2) +
  labs(x = "Oat Variety",
       y = "Yield, kg/ha",
       title = "Figure 1. There is no obvious yield different between Oat Varietes",
       subtitle = " 'x' in the boxplot represents mean.") +
  theme_bw() +
  theme(legend.position = "none",
        axis.title.x = element_text(margin = margin(10, 0, 0, 0)),
        plot.title = element_text(face = "bold", vjust = 1))

Yield vs Nutrient Levels

The yield of oat increased with increasing nutrient content. A positive relationship is observed. However, a subtle insight should be noted which is the rate of yield increment from 50 kg/ha of nitrogen to 75 kg/ha is not as much as from 0 kg/ha to 25 kg/ha and from 25 kg/ha to 50 kg/ha.

df6.2 <- oat2 %>% 
  dplyr::select(nutrient, yield) %>%                         # MASS package making select of dplyr error, having dplyr:: solve the issue.  
  group_by(nutrient) %>% 
  mutate(count = n(),
         xlab = as.factor(paste0(nutrient, "\n", "(n = ", count, ")"))) 
  
  
ggplot(df6.2, aes(x = reorder(xlab, yield), y = yield, fill = nutrient, colour = nutrient)) +
  geom_boxplot(alpha = 0.1, outlier.shape = NA) +
  stat_boxplot(geom = "errorbar") +
  geom_jitter(width = 0.2) +
  stat_summary(fun.y = mean, geom = "point", size = 7, shape = 4, stroke = 2) +
  labs(x = "Nutrient levels, kg/ha",
       y = "Yield, kg/ha",
       title = "Figure 2. Oat yields increased with increasing Nutrient Content",
       subtitle = " 'x' in the boxplot represents mean.") +
  theme_bw() +
  theme(legend.position = "none",
        axis.title.x = element_text(margin = margin(10, 0, 0, 0)),
        plot.title = element_text(face = "bold", vjust = 1))

An important question for farmer is that whether the smaller increment of yield at the 2 highest nutrient levels outweigh the cost of the additional nutrient added.

Yield vs Blocks

This section is important to observe yields among blocks. In a most basic experimental design, blocks are often treated as replicates and each replicates should be free from environmental noises, meaning the results between each block should be similar.

However, perhaps Block 1 has been identified to have a condition that favors plant growth. This would be a reason why split-plot is selected compared to a CRD (completely randomized design) to account for the heterogeneous condition.

df6.3 <- oat2 %>% 
  dplyr::select(block, yield) %>%                         # MASS package making select of dplyr error, having dplyr:: solve the issue.  
  group_by(block) %>% 
  mutate(count = n(),
         xlab = as.factor(paste0(block, "\n", "(n = ", count, ")"))) 
  
  
ggplot(df6.3, aes(x = reorder(xlab, -yield), y = yield, fill = block, colour = block)) +
  geom_boxplot(alpha = 0.1, outlier.shape = NA) +
  stat_boxplot(geom = "errorbar") +
  geom_jitter(width = 0.2) +
  stat_summary(fun.y = mean, geom = "point", size = 7, shape = 4, stroke = 2) +
  labs(x = "Block",
       y = "Yield, kg/ha",
       title = "Figure 3. Yield in Block I has the Highest Yield than Other Blocks",
       subtitle = " 'x' in the boxplot represents mean.") +
  theme_bw() +
  theme(legend.position = "none",
        axis.title.x = element_text(margin = margin(10, 0, 0, 0)),
        plot.title = element_text(face = "bold", vjust = 1))

Again, the type of statistical analysis from the split-block design allows one to account for the errors (noises) between blocks. Additionally, difference among plots will also be accounted in the statistical analysis section, which further increases the accuracy of the result.

Yield vs Variety + Nutrient

Following graph summarizes the effects of oat varieties and nutrient contents in the soil on yield.

df6.4 <- oat2 %>% 
  dplyr::select(variety, nutrient, yield)                 # MASS package making select of dplyr error, having dplyr:: solve the issue.  
  

ggplot(df6.4, aes(x = variety, y = yield, fill = nutrient)) +
  geom_boxplot(alpha = 0.8, outlier.shape = NA) +
  stat_boxplot(geom = "errorbar") +
  geom_point(position = position_jitterdodge(), width = 0.1, size = 2, shape = 21, aes(fill = nutrient)) +
  stat_summary(fun.y = mean, geom = "point", size = 5, shape = 4, position = position_jitterdodge(0), colour = "blue", stroke = 1.5) +
  labs(x = "Block",
       y = "Yield, kg/ha",
       title = "Figure 4. The Effect of Variety + Nutrient on Oat Yield.",
       subtitle = " 'x' in the boxplot represents mean.",
       fill = "Nutrient level, kg/ha: ") +
  theme_bw() +
  theme(legend.position = "bottom",
        axis.title.x = element_text(margin = margin(10, 0, 0, 0)),
        plot.title = element_text(face = "bold", vjust = 1)) + 
  scale_fill_manual(values = c("yellow", "green1", "green2", "green4")) 

  • Victory variety always has the lowest yield across all nutrient levels compared to other varieties.
  • There are consistent outliers observed throughout individual nutrient treatment of Victory.
  • The outliers can be due to the effect of block. 6 data points of each boxplot are planted in 6 different blocks. It is highly likely that there is an effect of block. If it is true, a statistical analysis that taking block into account is important for more accurate results.

Yield vs Variety + Nutrient + Block

Following figure 5 has too many variables and making it less effective in story telling. I hope my addition of light to dark green to highlight different nutrient levels will aid you a little.

Insights:

  • The outliers observed in the victory of afore figure 4 could be planted in block I because yields in block 1 is visually higher than other blocks.
  • 75 kg/ha of nutrient level (N) has always the highest yield but not most of the times, sometime 50 kg/ha of N nutrient is higher.
  • In all blocks, there is not much visually difference in yield between oat varieties.
df6.5 <- oat2 %>% 
  group_by(block, variety) %>% 
  mutate(count = n(),
         xlab = as.factor(paste0(variety, "\n", "(n = ", count, ")"))) 
  
ggplot(df6.5, aes(x = variety, y = yield, fill = nutrient)) + 
  geom_bar(stat = "identity", position = "dodge", colour = "black") +
  facet_wrap(~block) +
  labs(x = "Oat Variety",
       y = "Yield, kg/ha",
       fill = "Nutrient level, kg/ha: ",
       title = "Figure 5. Oat Yield of Different Treatments in Different Blocks") +
  theme_bw() +
  theme(legend.position = "bottom",
        axis.title.x = element_text(margin = margin(10, 0, 0, 0)),
        plot.title = element_text(face = "bold", vjust = 1)) +
   theme(axis.text.x = element_text(angle = 90, vjust = 0.5))+
  scale_fill_manual(values = c("yellow", "green1", "green2", "green4"))

STATISTICAL ANALYSIS

The statistical analysis for split plot design requires a model that takes both fixed and random effects into account. Fixed effect include the variety, nutrient, and the interaction of these two variables. Random effects include the blocks and the plots. Assuming some residuals may be the effect of blocks and plots due to their inherent differences despite the effects might be insignificant and the researchers have done their best in choosing a research field. The model will still incorporate and consider errors due to these random effects.

Prior to all processes, my usual first step is assumptions checking, it includes the testing of the normality of residuals, and data variances among all possible combination of treatments. This steps are important to decide what omnibus and post-hoc tests.

Building the model

model <- lmer(yield ~ variety + nutrient + variety:nutrient + (1|block) + (1|block:variety),
              data = oat2)
summary(model)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: yield ~ variety + nutrient + variety:nutrient + (1 | block) +  
    (1 | block:variety)
   Data: oat2

REML criterion at convergence: 542.8

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.81365 -0.55118  0.02205  0.64379  1.57713 

Random effects:
 Groups        Name        Variance Std.Dev.
 block:variety (Intercept) 133.3    11.54   
 block         (Intercept) 268.9    16.40   
 Residual                  223.1    14.94   
Number of obs: 72, groups:  block:variety, 18; block, 6

Fixed effects:
                             Estimate Std. Error      df t value Pr(>|t|)    
(Intercept)                   89.6667    10.2086 16.1263   8.783 1.52e-07 ***
varietyMarvellous              7.6667    10.8993 30.2658   0.703   0.4872    
varietyVictory                -9.8333    10.8993 30.2658  -0.902   0.3741    
nutrient25                    20.6667     8.6237 45.0000   2.397   0.0208 *  
nutrient50                    38.6667     8.6237 45.0000   4.484 5.02e-05 ***
nutrient75                    50.3333     8.6237 45.0000   5.837 5.45e-07 ***
varietyMarvellous:nutrient25   3.6667    12.1957 45.0000   0.301   0.7651    
varietyVictory:nutrient25      0.1667    12.1957 45.0000   0.014   0.9892    
varietyMarvellous:nutrient50  -4.6667    12.1957 45.0000  -0.383   0.7038    
varietyVictory:nutrient50      5.8333    12.1957 45.0000   0.478   0.6347    
varietyMarvellous:nutrient75  -5.5000    12.1957 45.0000  -0.451   0.6542    
varietyVictory:nutrient75      2.6667    12.1957 45.0000   0.219   0.8279    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) vrtyMr vrtyVc ntrn25 ntrn50 ntrn75 vrM:25 vrV:25 vrM:50
vartyMrvlls -0.534                                                        
varityVctry -0.534  0.500                                                 
nutrient25  -0.422  0.396  0.396                                          
nutrient50  -0.422  0.396  0.396  0.500                                   
nutrient75  -0.422  0.396  0.396  0.500  0.500                            
vrtyMrvl:25  0.299 -0.559 -0.280 -0.707 -0.354 -0.354                     
vrtyVctr:25  0.299 -0.280 -0.559 -0.707 -0.354 -0.354  0.500              
vrtyMrvl:50  0.299 -0.559 -0.280 -0.354 -0.707 -0.354  0.500  0.250       
vrtyVctr:50  0.299 -0.280 -0.559 -0.354 -0.707 -0.354  0.250  0.500  0.500
vrtyMrvl:75  0.299 -0.559 -0.280 -0.354 -0.354 -0.707  0.500  0.250  0.500
vrtyVctr:75  0.299 -0.280 -0.559 -0.354 -0.354 -0.707  0.250  0.500  0.250
            vrV:50 vrM:75
vartyMrvlls              
varityVctry              
nutrient25               
nutrient50               
nutrient75               
vrtyMrvl:25              
vrtyVctr:25              
vrtyMrvl:50              
vrtyVctr:50              
vrtyMrvl:75  0.250       
vrtyVctr:75  0.500  0.500

Assumption tests

Normality of residuals

Residues generated from the mixed effect model of the dataset are normally distributed, supported by the following Q-Q plot that most of the points are lying close to the line and are mostly within the 95% confidence shaded area around the line.

oat2$resid <- resid(model)
ggplot(oat2, aes(sample = resid)) +
  stat_qq_band() +
  stat_qq_line() +
  stat_qq_point() +
  labs(title = "Q-Q plot",
       subtitle = "model = lmer(yield ~ variety + nutrient + variety:nutrient + (1|block) + (1|block:variety),
              data = oat2)") +
  theme(plot.title = element_text(face = "bold"))

The normality is also supported by Shapiro-Wilk test that test on the residuals of the model and having a p-value of 0.2811, which is higher than 0.05, and fail to reject the null hypothesis that the residuals are normally distributed.

shapiro.test(resid(model))

    Shapiro-Wilk normality test

data:  resid(model)
W = 0.97928, p-value = 0.2811

Spread of variances

Applying Levene’s test to test the spread of variances among levels within two of the fixed effect variables - “variety” and “nutrient”. Levene’s tests show that variances among the treatment groups of variety are equal to each other, with P-value higher than 0.05. The variances among the treatment groups of nutrient level also equal to each other (P-value > 0.05).

Variance equality among variety

LeveneTest(oat2$yield ~ oat2$variety)
Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  2  0.4424 0.6443
      69               

Test of variance equality among blocks

LeveneTest(oat2$yield ~ oat2$block)
Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  5  0.3629 0.8721
      66               

Variance equality among nutrients levels

LeveneTest(oat2$yield ~ oat2$nutrient)
Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  3  0.0501 0.9851
      68               

Omnibus test

Based on the results of assumption tests in the previous section, anova is selected as the omnibus test to find out if there are differences between groups.

print(anova(model))
Type III Analysis of Variance Table with Satterthwaite's method
                  Sum Sq Mean Sq NumDF DenDF F value    Pr(>F)    
variety            668.4   334.2     2    10  1.4979    0.2698    
nutrient         25195.2  8398.4     3    45 37.6436 2.502e-12 ***
variety:nutrient   418.0    69.7     6    45  0.3122    0.9273    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Computed ANOVA table shows that the F-value of nutrient treatment is very high which indicating that the variances between groups is significantly higher than variances within group and resulting a highly significant P-value of way lesser than 0.05. There are no significant difference between oat varieties, as well as interaction between oat varieties and nutrient levels.

Post-Hoc analysis

Since there is significant result from the omnibus test, post-hoc test is allowed to be proceeded. The column “.group” shows a summary of the statistical results with letters. Letters that overlapped each other are not significant different or otherwise (P = 0.05).

Post-hoc for Nutrient only

ls_n <- lsmeans(model, ~nutrient)
lscld_n <- cld(ls_n, Letters = LETTERS, reversed = T)
lscld_n
 nutrient lsmean   SE  df lower.CL upper.CL .group
 75        138.3 8.04 6.8    119.2      157  A    
 50        128.0 8.04 6.8    108.9      147  A    
 25        110.9 8.04 6.8     91.8      130   B   
 0          88.9 8.04 6.8     69.8      108    C  

Results are averaged over the levels of: variety 
Degrees-of-freedom method: kenward-roger 
Confidence level used: 0.95 
P value adjustment: tukey method for comparing a family of 4 estimates 
significance level used: alpha = 0.05 
NOTE: If two or more means share the same grouping symbol,
      then we cannot show them to be different.
      But we also did not show them to be the same. 

A 95% confidence interval plot is also a very helpful visualization.

ggplot(lscld_n, aes(x = nutrient, y = lsmean, colour = nutrient)) +
  geom_errorbar(aes(ymin = lower.CL, ymax = upper.CL), size = 2) +
  geom_point(size = 6) +
  geom_text(aes(label = .group), vjust = -6, stroke = 1.5) +
  scale_y_continuous(lim = c(60, 180), breaks = seq(60, 180, 20)) +
  theme_classic() +
  theme(legend.position = "none",
        plot.title = element_text(face = "bold")) +
  labs(title = "CI Plot: Nutrient levels",
       x = "Nutrient level, kg/ha",
       y = "Plant yield, kg/ha")

Insights: * Nutrients levels at 50 kg/ha and 75 kg/ha have significant higher plant yield than 0 kg/ha and 25 kg/ha. * However, two of the 50 kg/ha and 75 kg/ha nutrients level are not significantly different from each other. * This plot does not separate out the data from 3 varieties of oats.

A final overall post-hoc test will be computed to obverse overall differences.

Post-hoc for interaction between Nutrient and variety

Following shows the statistical results between the treatment groups of variety and nutrient. Letters that overlap each other indicating not significant different (P = 0.05).

ls_vn <- lsmeans(model, ~ variety * nutrient)
lscld_vn <- cld(ls_vn, Letters = LETTERS, reversed = T)
lscld_vn
 variety     nutrient lsmean   SE   df lower.CL upper.CL .group 
 Marvellous  75        142.2 10.2 16.1    120.5      164  A     
 Golden.rain 75        140.0 10.2 16.1    118.4      162  A     
 Victory     75        132.8 10.2 16.1    111.2      154  A C   
 Marvellous  50        131.3 10.2 16.1    109.7      153  AB    
 Golden.rain 50        128.3 10.2 16.1    106.7      150  ABCD  
 Victory     50        124.3 10.2 16.1    102.7      146  ABCDE 
 Marvellous  25        121.7 10.2 16.1    100.0      143  ABCDE 
 Golden.rain 25        110.3 10.2 16.1     88.7      132  ABCDEF
 Victory     25        100.7 10.2 16.1     79.0      122   B DEF
 Marvellous  0          97.3 10.2 16.1     75.7      119    CDEF
 Golden.rain 0          89.7 10.2 16.1     68.0      111      EF
 Victory     0          79.8 10.2 16.1     58.2      101       F

Degrees-of-freedom method: kenward-roger 
Confidence level used: 0.95 
P value adjustment: tukey method for comparing a family of 12 estimates 
significance level used: alpha = 0.05 
NOTE: If two or more means share the same grouping symbol,
      then we cannot show them to be different.
      But we also did not show them to be the same. 

Insights:

  • Resulted nutrient levels are ordered in a top-down absolute way that the higher the nutrient level in the soil, the better the plant growth.
  • All 3 oat varieties with 75 kg/ha of nutrient (N) had the highest growth.
  • Marvellous is the oat variety that always had the highest yield in each nutrient level.
  • Victory oat variety always had the lowest yield in each nutrient level.

Visualisualing the above result using the lsmean will be:

ggplot(lscld_vn, aes(x = nutrient, y = lsmean, colour = variety)) +
  geom_path(aes(group = variety), size = 1.5) +
  geom_point(size = 4) +  
  theme_classic() +
  theme(legend.position = "none") +
  labs(title = "Interaction between Nutrient Levels and Variety",
       x = "Nutrient level, kg/ha",
       y = "Plant yield (average), kg/ha") +
  geom_text(data = lscld_vn %>% filter(nutrient == "75"), 
            aes(label = variety),
            hjust = -0.2)

CONCLUSION

Base on the data from this dataset, result shows that there is no significant yield difference among Marvellous, Golden rain and Victory, with P-value of higher than 0.05. However, there is significant yield difference between nutrient levels applied in the experiment (P-value < 0.05). It implies that as long as any higher rate of given nutrient level in the experiment is added, any oat variety that used in the experiment can outcompete another variety used in the experiment, though the result might be insignificant.

It should be noted that although the highest nutrient level applied in the experiment (75 kg/ha) has a higher yield than the second highest nutrient level (50 kg/ha), the difference was not significant. The suitability of adopting the highest nutrient level in this experiment should rely on the farmer’s context, economical-feasibility, and local environmental concerns. On the other aspect, there is no significant interaction effect between oat varieties and nutrient levels applied in the experiment. Base on the variety and nutrient levels applied in the experiment, there was no evidence from the statistical result showing that one oat variety may response to the nutrient significantly than another (P-value > 0.05).

If a decision has to be made to choose the best combination of oat variety and nutrient level base on the result of this project, my own top 3 best combinations are the three varieties, Marvellous, Golden rain, and Victory, with 50 kg per ha of manurial nitrogen application. Marvellous will be my most preferred oat variety, then followed by Golden rain, and lastly the victory oat variety. I am avoiding the maximum yield with the highest rate of nitrogen application as the result of yield gained was proven not significantly (P-value > 0.05) higher from my top 3 recommendations, I am also aiming for a better farm economics and environmental outcome.

LEGALITY

This is a personal project created and designed for skill demonstration and non-commercial use only. All photos in this project, such as those that applied as the thumbnail are just for demonstration only, they are not related to the dataset, the location where the data were collected, and the results of this analysis.

REFERENCE

McDonald G 2021, Raised beds – design, layout, construction and renovation, view 10 July 2021, https://www.agric.wa.gov.au/waterlogging/raised-beds-%E2%80%93-design-layout-construction-and-renovation

Sendelbach 2005, Avena sativa L (oat), viewed 05 July 2021, https://en.wikipedia.org/wiki/Oat#/media/File:Avena_sativa_L.jpg

smartcopying n.d., Quick Guide to Creative Commons, viewed 04 July 2021, https://smartcopying.edu.au/quick-guide-to-creative-commons/

Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth edition. Springer.

Wright 2021, Package ‘agridat’, CRAN, viewed 04 July 2021, http://kwstat.github.io/agridat/

Yates, F. (1935) Complex experiments, Journal of the Royal Statistical Society Suppl. 2, 181–247.

Yates, F. (1970) Experimental design: Selected papers of Frank Yates, C.B.E, F.R.S. London: Griffin.

SPLIT PLOT DESIGN (Class Example)

Import the data

data <- read.csv("split plot design.csv")
head(data,10)
   Replication            Nitrogen Varieties Yield..kgs.
1            1  Nitrogen (0kgs/ha)        s1        4430
2            1  Nitrogen (0kgs/ha)        s2        3944
3            1  Nitrogen (0kgs/ha)        s3        3464
4            1  Nitrogen (0kgs/ha)        s4        4426
5            1 Nitrogen (30kgs/ha)        s1        5418
6            1 Nitrogen (30kgs/ha)        s2        6562
7            1 Nitrogen (30kgs/ha)        s3        4768
8            1 Nitrogen (30kgs/ha)        s4        5192
9            1 Nitrogen (60kgs/ha)        s1        6076
10           1 Nitrogen (60kgs/ha)        s2        6008
data$Nitrogen <- as.factor(data$Nitrogen)
data$Varieties <- as.factor(data$Varieties)
data$Replication <- as.factor(data$Replication)
data$Yield..kgs. <- as.numeric((data$Yield..kgs.))

View the data

head(data,10)
   Replication            Nitrogen Varieties Yield..kgs.
1            1  Nitrogen (0kgs/ha)        s1        4430
2            1  Nitrogen (0kgs/ha)        s2        3944
3            1  Nitrogen (0kgs/ha)        s3        3464
4            1  Nitrogen (0kgs/ha)        s4        4426
5            1 Nitrogen (30kgs/ha)        s1        5418
6            1 Nitrogen (30kgs/ha)        s2        6562
7            1 Nitrogen (30kgs/ha)        s3        4768
8            1 Nitrogen (30kgs/ha)        s4        5192
9            1 Nitrogen (60kgs/ha)        s1        6076
10           1 Nitrogen (60kgs/ha)        s2        6008

Check the structure of the dataset

str(data)
'data.frame':   72 obs. of  4 variables:
 $ Replication: Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
 $ Nitrogen   : Factor w/ 6 levels "Nitrogen (0kgs/ha)",..: 1 1 1 1 5 5 5 5 6 6 ...
 $ Varieties  : Factor w/ 4 levels "s1","s2","s3",..: 1 2 3 4 1 2 3 4 1 2 ...
 $ Yield..kgs.: num  4430 3944 3464 4426 5418 ...
attach(data)

EXPLORATORY DATA ANALYSIS (EDA)

After cleaning, the very first step is always visualizing the data because it is the time one can have a chance to have a general understanding of the data set, what are the obvious trends, how data points are spread out, and how each levels of variables are visually different from each other.

My EDA will explore:

  • Yield vs Variety
  • Yield vs Nitrogen
  • Yield vs Replication
  • Yield vs Variety + Nitrogen
  • Yield vs Variety + Nitrogen + Replication

There is no obvious yield difference among oat varieties. If a comparison is needed to be made, the Marvellous has the highest average yield, followed by Golden rain, then Victory. However, the differences are not much.

Yields vs Variety

mydf <- data %>% 
  dplyr::select(Varieties, Yield..kgs.) %>%                         # MASS package making select of dplyr error, having dplyr:: solve the issue.  
  group_by(Varieties) %>% 
  mutate(count = n(),
         xlab = as.factor(paste0(Varieties, "\n", "(n = ", count, ")"))) 
  
  
ggplot(mydf, aes(x = reorder(xlab, -Yield..kgs.), y = Yield..kgs., fill = Varieties, colour = Varieties)) +
  geom_boxplot(alpha = 0.1, outlier.shape = NA) +
  stat_boxplot(geom = "errorbar") +
  geom_jitter(width = 0.2) +
  stat_summary(fun.y = mean, geom = "point", size = 7, shape = 4, stroke = 2) +
  labs(x = "Variety",
       y = "Yield, kg/ha",
       title = "Figure 1. There is no obvious yield difference between Varieties",
       subtitle = " 'x' in the boxplot represents mean.") +
  theme_bw() +
  theme(legend.position = "none",
        axis.title.x = element_text(margin = margin(10, 0, 0, 0)),
        plot.title = element_text(face = "bold", vjust = 1))

Yields Vs Nitrogen

mydf2 <- data %>% 
  dplyr::select(Nitrogen, Yield..kgs.) %>%                         # MASS package making select of dplyr error, having dplyr:: solve the issue.  
  group_by(Nitrogen) %>% 
  mutate(count = n(),
         xlab = as.factor(paste0(Nitrogen, "\n", "(n = ", count, ")"))) 
  
  
ggplot(mydf2, aes(x = reorder(xlab, -Yield..kgs.), y = Yield..kgs., fill = Nitrogen, colour = Nitrogen)) +
  geom_boxplot(alpha = 0.1, outlier.shape = NA) +
  stat_boxplot(geom = "errorbar") +
  geom_jitter(width = 0.2) +
  stat_summary(fun.y = mean, geom = "point", size = 7, shape = 4, stroke = 2) +
  labs(x = "Nitrogen",
       y = "Yield, kg/ha",
       title = "Figure 1. There is no obvious yield difference between Nitrogen Levels",
       subtitle = " 'x' in the boxplot represents mean.") +
  theme_bw() +
  theme(legend.position = "none",
        axis.title.x = element_text(margin = margin(10, 0, 0, 0)),
        plot.title = element_text(face = "bold", vjust = 1))

Yields vs Replication

mydf <- data %>% 
  dplyr::select(Replication, Yield..kgs.) %>%                         # MASS package making select of dplyr error, having dplyr:: solve the issue.  
  group_by(Replication) %>% 
  mutate(count = n(),
         xlab = as.factor(paste0(Replication, "\n", "(n = ", count, ")"))) 
  
  
ggplot(mydf, aes(x = reorder(xlab, -Yield..kgs.), y = Yield..kgs., fill = Replication, colour = Replication)) +
  geom_boxplot(alpha = 0.1, outlier.shape = NA) +
  stat_boxplot(geom = "errorbar") +
  geom_jitter(width = 0.2) +
  stat_summary(fun.y = mean, geom = "point", size = 7, shape = 4, stroke = 2) +
  labs(x = "Replication",
       y = "Yield, kg/ha",
       title = "Figure 1. There is no obvious yield difference between Replications",
       subtitle = " 'x' in the boxplot represents mean.") +
  theme_bw() +
  theme(legend.position = "none",
        axis.title.x = element_text(margin = margin(10, 0, 0, 0)),
        plot.title = element_text(face = "bold", vjust = 1))

Descriptive Statistics

Yields VS Nitrogen

library(dplyr)
SUM<- group_by(data, Nitrogen) %>%
  summarise(
    count = n(),
    mean = mean(Yield..kgs., na.rm = TRUE),
    sd = sd(Yield..kgs., na.rm = TRUE),
    median = median(Yield..kgs., na.rm = TRUE),
    max = max(Yield..kgs., na.rm = TRUE),
    min = min(Yield..kgs., na.rm = TRUE),
    IQR = IQR(Yield..kgs., na.rm = TRUE)
  )
SUM
# A tibble: 6 × 8
  Nitrogen             count  mean    sd median   max   min   IQR
  <fct>                <int> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>
1 Nitrogen (0kgs/ha)      12 4284.  671.   4428  5318  3142 1053 
2 Nitrogen (120kgs/ha)    12 5915. 1362.   6416  7139  2774  900.
3 Nitrogen (150kgs/ha)    12 6068. 2024.   6628  7848  1414 1356.
4 Nitrogen (180kgs/ha)    12 5845. 2552.   6117  8832  1960 2981.
5 Nitrogen (30kgs/ha)     12 4912. 1011.   5001  6562  3142 1204 
6 Nitrogen (60kgs/ha)     12 5771.  859.   6045  6704  4146  844 

Yields VS Variety

library(dplyr)
SUM2<- group_by(data, Varieties) %>%
  summarise(
    count = n(),
    mean = mean(Yield..kgs., na.rm = TRUE),
    sd = sd(Yield..kgs., na.rm = TRUE),
    median = median(Yield..kgs., na.rm = TRUE),
    max = max(Yield..kgs., na.rm = TRUE),
    min = min(Yield..kgs., na.rm = TRUE),
    IQR = IQR(Yield..kgs., na.rm = TRUE)
  )
SUM2
# A tibble: 4 × 8
  Varieties count  mean    sd median   max   min   IQR
  <fct>     <int> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>
1 s1           18 5655. 1763.   5581  8818  1960 2194 
2 s2           18 6184. 1414.   6491  8832  3660 1462.
3 s3           18 5626. 1287.   5936  7387  3142 1302 
4 s4           18 4398. 1637.   4691  7122  1414 2601.

Yields VS Replication

library(dplyr)
SUM<- group_by(data, Replication) %>%
  summarise(
    count = n(),
    mean = mean(Yield..kgs., na.rm = TRUE),
    sd = sd(Yield..kgs., na.rm = TRUE),
    median = median(Yield..kgs., na.rm = TRUE),
    max = max(Yield..kgs., na.rm = TRUE),
    min = min(Yield..kgs., na.rm = TRUE),
    IQR = IQR(Yield..kgs., na.rm = TRUE)
  )
SUM
# A tibble: 3 × 8
  Replication count  mean    sd median   max   min   IQR
  <fct>       <int> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>
1 1              24 5385. 1751.   5693  8452  1414  2058
2 2              24 5881. 1396.   5869  8832  1960  1729
3 3              24 5132. 1735.   5158  8818  2014  2907

Analysis

(a) Which is the whole plot factor, and which is the split plot factor?

(b) What are the mean yield for each nitrogen-variety combination?

Visualize the data as given in class

x <- aov(Yield..kgs. ~ Varieties:Replication:Nitrogen, data = data)
model.tables(x, "mean")
Tables of means
Grand mean
         
5465.861 

 Varieties:Replication:Nitrogen 
, , Nitrogen = Nitrogen (0kgs/ha)

         Replication
Varieties 1    2    3   
       s1 4430 4478 3850
       s2 3944 5318 3660
       s3 3464 4914 3142
       s4 4426 4944 4836

, , Nitrogen = Nitrogen (120kgs/ha)

         Replication
Varieties 1    2    3   
       s1 6462 5744 6580
       s2 7139 7056 6564
       s3 5792 6982 6370
       s4 2774 5880 3639

, , Nitrogen = Nitrogen (150kgs/ha)

         Replication
Varieties 1    2    3   
       s1 7290 5036 7552
       s2 7682 7848 6576
       s3 7080 6594 6320
       s4 1414 6662 2766

, , Nitrogen = Nitrogen (180kgs/ha)

         Replication
Varieties 1    2    3   
       s1 8452 1960 8818
       s2 6228 8832 6006
       s3 5594 7387 5480
       s4 2248 7122 2014

, , Nitrogen = Nitrogen (30kgs/ha)

         Replication
Varieties 1    2    3   
       s1 5418 4482 3850
       s2 6562 5166 3660
       s3 4768 5858 3142
       s4 5192 6004 4836

, , Nitrogen = Nitrogen (60kgs/ha)

         Replication
Varieties 1    2    3   
       s1 6076 4604 6704
       s2 6008 6420 6642
       s3 6244 6127 6014
       s4 4546 5724 4146

Approach one

library(agricolae)
model <- with(data, sp.plot(Replication, Nitrogen, Varieties, Yield..kgs.))

ANALYSIS SPLIT PLOT:  Yield..kgs. 
Class level information

Nitrogen    :  Nitrogen (0kgs/ha) Nitrogen (30kgs/ha) Nitrogen (60kgs/ha) Nitrogen (120kgs/ha) Nitrogen (150kgs/ha) Nitrogen (180kgs/ha) 
Varieties   :  s1 s2 s3 s4 
Replication     :  1 2 3 

Number of observations:  72 

Analysis of Variance Table

Response: Yield..kgs.
                   Df   Sum Sq  Mean Sq F value   Pr(>F)   
Replication         2  6968351  3484175     NaN      NaN   
Nitrogen            5 30077112  6015422  9.4205 0.001525 **
Ea                 10  6385433   638543     NaN      NaN   
Varieties           3 30893555 10297852  4.4019 0.009748 **
Nitrogen:Varieties 15 32967362  2197824  0.9395 0.532468   
Eb                 36 84218377  2339399     NaN      NaN   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

cv(a) = 14.6 %, cv(b) = 28 %, Mean = 5465.861 
summary(model)
      Length Class  Mode   
ANOVA 5      anova  list   
gl.a  1      -none- numeric
gl.b  1      -none- numeric
Ea    1      -none- numeric
Eb    1      -none- numeric

Approach two (The Most Appropriate Approach)

head(data,4)
  Replication           Nitrogen Varieties Yield..kgs.
1           1 Nitrogen (0kgs/ha)        s1        4430
2           1 Nitrogen (0kgs/ha)        s2        3944
3           1 Nitrogen (0kgs/ha)        s3        3464
4           1 Nitrogen (0kgs/ha)        s4        4426
library(doebioresearch)
model <- splitplot(data[4],Replication, Nitrogen, Varieties,1)
model
$Yield..kgs.
$Yield..kgs.[[1]]
$Yield..kgs.[[1]][[1]]
Analysis of Variance Table

Response: dependent.var
                   Df   Sum Sq  Mean Sq F value   Pr(>F)   
block               2  6968351  3484175  5.4564 0.024999 * 
main.plot           5 30077112  6015422  9.4205 0.001525 **
Ea                 10  6385433   638543                    
sub.plot            3 30893555 10297852  4.4019 0.009748 **
main.plot:sub.plot 15 32967362  2197824  0.9395 0.532468   
Eb                 36 84218377  2339399                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

$Yield..kgs.[[1]][[2]]
[1] "CV(a): 14.62 , CV(b) : 27.983"

$Yield..kgs.[[1]][[3]]
[1] "R Square 0.56"

$Yield..kgs.[[1]][[4]]

    Shapiro-Wilk normality test

data:  model$residuals
W = 0.88825, p-value = 1.068e-05


$Yield..kgs.[[1]][[5]]
[1] "Normality assumption is violated"

$Yield..kgs.[[1]][[6]]
[1] "The means of one or more levels of main plot factor are not same, so go for multiple comparison test"

$Yield..kgs.[[1]][[7]]
$Yield..kgs.[[1]][[7]][[1]]
   MSerror Df     Mean       CV  t.value      LSD
  638543.3 10 5465.861 14.61964 2.228139 726.8785

$Yield..kgs.[[1]][[7]][[2]]
                     dependent.var groups
Nitrogen (150kgs/ha)      6068.333      a
Nitrogen (120kgs/ha)      5915.167      a
Nitrogen (180kgs/ha)      5845.083      a
Nitrogen (60kgs/ha)       5771.250      a
Nitrogen (30kgs/ha)       4911.500      b
Nitrogen (0kgs/ha)        4283.833      b


$Yield..kgs.[[1]][[8]]
[1] "The means of one or more levels of sub plot factor are not same, so go for multiple comparison test"

$Yield..kgs.[[1]][[9]]
$Yield..kgs.[[1]][[9]][[1]]
  MSerror Df     Mean       CV  t.value      LSD
  2339399 36 5465.861 27.98296 2.028094 1033.996

$Yield..kgs.[[1]][[9]][[2]]
   dependent.var groups
s2      6183.944      a
s1      5654.778      a
s3      5626.222      a
s4      4398.500      b


$Yield..kgs.[[1]][[10]]
[1] "All the interaction level means are same so dont go for any multiple comparison test"

$Yield..kgs.[[1]][[11]]
$Yield..kgs.[[1]][[11]][[1]]
  MSerror Df     Mean       CV  t.value      LSD
  2339399 36 5465.861 27.98296 2.028094 2532.763

$Yield..kgs.[[1]][[11]][[2]]
                        dependent.var  groups
Nitrogen (150kgs/ha):s2      7368.667       a
Nitrogen (180kgs/ha):s2      7022.000      ab
Nitrogen (120kgs/ha):s2      6919.667      ab
Nitrogen (150kgs/ha):s3      6664.667     abc
Nitrogen (150kgs/ha):s1      6626.000    abcd
Nitrogen (180kgs/ha):s1      6410.000    abcd
Nitrogen (120kgs/ha):s3      6381.333    abcd
Nitrogen (60kgs/ha):s2       6356.667   abcde
Nitrogen (120kgs/ha):s1      6262.000  abcdef
Nitrogen (180kgs/ha):s3      6153.667  abcdef
Nitrogen (60kgs/ha):s3       6128.333 abcdefg
Nitrogen (60kgs/ha):s1       5794.667 abcdefg
Nitrogen (30kgs/ha):s4       5344.000 abcdefg
Nitrogen (30kgs/ha):s2       5129.333 abcdefg
Nitrogen (60kgs/ha):s4       4805.333  bcdefg
Nitrogen (0kgs/ha):s4        4735.333  bcdefg
Nitrogen (30kgs/ha):s3       4589.333  bcdefg
Nitrogen (30kgs/ha):s1       4583.333  bcdefg
Nitrogen (0kgs/ha):s2        4307.333   cdefg
Nitrogen (0kgs/ha):s1        4252.667   cdefg
Nitrogen (120kgs/ha):s4      4097.667    defg
Nitrogen (0kgs/ha):s3        3840.000     efg
Nitrogen (180kgs/ha):s4      3794.667      fg
Nitrogen (150kgs/ha):s4      3614.000       g

RESPONSE SURFACE METHODOLOGY

Data source:

Design and analysis of experiments / Douglas C. Montgomery. — Eighth edition. ISBN 978-1-118-14692-7

Data file: DoEOpt04.csv

  • Time (x1): Time (min)
  • Temp (x2): Temperature (ºC)
  • Y: Yield (%)

building the data set

  • (run this code to build the DoEOpt04 data set before following the analysis)
DoEOpt04 <- data.frame(x1 = c(-1,-1,1,1,0,0,0,0,0),
                          x2 = c(-1,1,-1,1,0,0,0,0,0),
                          Time = c(30,30,40,40,35,35,35,35,35),
                          Temp = c(150,160,150,160,155,155,155,155,155),
                          Y = c(39.3,40,40.9,41.5,40.3,40.5,40.7,40.2,40.6)
                            )
head(DoEOpt04,5)
  x1 x2 Time Temp    Y
1 -1 -1   30  150 39.3
2 -1  1   30  160 40.0
3  1 -1   40  150 40.9
4  1  1   40  160 41.5
5  0  0   35  155 40.3
View(DoEOpt04)

loading Response Surface Methodology package

library(rsm)

View the structured of the data file

str(DoEOpt04)
'data.frame':   9 obs. of  5 variables:
 $ x1  : num  -1 -1 1 1 0 0 0 0 0
 $ x2  : num  -1 1 -1 1 0 0 0 0 0
 $ Time: num  30 30 40 40 35 35 35 35 35
 $ Temp: num  150 160 150 160 155 155 155 155 155
 $ Y   : num  39.3 40 40.9 41.5 40.3 40.5 40.7 40.2 40.6

Consider the following image

knitr::include_graphics("rsm.png")

setting the relationship between coded and natural variables

DoEOpt04 <- as.coded.data(DoEOpt04, 
              x1 ~ (Time-35)/5,
              x2 ~ (Temp-155)/5)

regression model with coded variables

model <- rsm(Y ~ FO(x1,x2) + TWI(x1,x2), data = DoEOpt04)
summary(model)

Call:
rsm(formula = Y ~ FO(x1, x2) + TWI(x1, x2), data = DoEOpt04)

             Estimate Std. Error  t value  Pr(>|t|)    
(Intercept) 40.444444   0.062311 649.0693 1.648e-13 ***
x1           0.775000   0.093467   8.2917 0.0004166 ***
x2           0.325000   0.093467   3.4772 0.0177127 *  
x1:x2       -0.025000   0.093467  -0.2675 0.7997870    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared:  0.9418,    Adjusted R-squared:  0.9069 
F-statistic: 26.97 on 3 and 5 DF,  p-value: 0.00163

Analysis of Variance Table

Response: Y
            Df  Sum Sq Mean Sq F value    Pr(>F)
FO(x1, x2)   2 2.82500 1.41250 40.4213 0.0008188
TWI(x1, x2)  1 0.00250 0.00250  0.0715 0.7997870
Residuals    5 0.17472 0.03494                  
Lack of fit  1 0.00272 0.00272  0.0633 0.8137408
Pure error   4 0.17200 0.04300                  

Stationary point of response surface:
x1 x2 
13 31 

Stationary point in original units:
Time Temp 
 100  310 

Eigenanalysis:
eigen() decomposition
$values
[1]  0.0125 -0.0125

$vectors
         [,1]       [,2]
x1 -0.7071068 -0.7071068
x2  0.7071068 -0.7071068

Consider the output below

knitr::include_graphics("rsm2.png")

The interaction term in the model above is not significant. In other words, the quadratic function in our model is not necessary. We can therefore estimated the new model without the quadratic function as shown below

knitr::include_graphics("rsm3.png")

Estimate the new model without the quadratic term

model1 <- rsm(Y ~ FO(x1,x2), data = DoEOpt04)
summary(model1)

Call:
rsm(formula = Y ~ FO(x1, x2), data = DoEOpt04)

             Estimate Std. Error  t value  Pr(>|t|)    
(Intercept) 40.444444   0.057288 705.9869 5.451e-16 ***
x1           0.775000   0.085932   9.0188  0.000104 ***
x2           0.325000   0.085932   3.7821  0.009158 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared:  0.941, Adjusted R-squared:  0.9213 
F-statistic: 47.82 on 2 and 6 DF,  p-value: 0.0002057

Analysis of Variance Table

Response: Y
            Df  Sum Sq Mean Sq F value    Pr(>F)
FO(x1, x2)   2 2.82500 1.41250 47.8213 0.0002057
Residuals    6 0.17722 0.02954                  
Lack of fit  2 0.00522 0.00261  0.0607 0.9419341
Pure error   4 0.17200 0.04300                  

Direction of steepest ascent (at radius 1):
       x1        x2 
0.9221944 0.3867267 

Corresponding increment in original units:
    Time     Temp 
4.610972 1.933633 
knitr::include_graphics("rsm4.png")

The coefficient of the new model are significant with a higher R-square and a higher adjusted R-square as compare to the previous model. In other words, the new model is doing better than the previous one. Let’s now build the contour plot of the model above as a function of x1 and x2.

contour plot

contour(model1, ~ x1+x2, 
        image = TRUE, 
        xlabs=c("Time (min)", "Temperature (ºC)"))
points(DoEOpt04$Time,DoEOpt04$Temp)

From the plot above, the yield increases as both temperature and time increases. However the yield seems to be far from the optimal. In other words, we do not see the region of the maximum yield. However we can see the path of the maximum increase in the yield. This is what we the direction of the steepest ascent. Consider the graph below

knitr::include_graphics("ascent.png")

In the section we are going to see how to calculate the direction of the steepest ascent and how to follow the direction of the steepest ascent to the vicinity of the maximum response.

Methods of the Steepest Ascent

In the contour plot, the direction of the steepest ascent is perpendicular to the contour lines as shown below;

knitr::include_graphics("ascent2.png")

This indicates the direction of the maximum increase in the response. The information in the RSM output tells the relative variation between the two factors. Now, in order to follow the path of the steepest ascent in this design, for each 4.6 min of increase in time, we should increase the temperature by 1.933 degrees.

knitr::include_graphics("ascent3.png")

The mathematical formular is as shown below

knitr::include_graphics("ascent4.png")

The graph above shows that if we increase time by 10 min, we will have to increase temperature by 4.1 degrees Celsius.

knitr::include_graphics("max.png")

The plot above shows that the condition around 85 minutes and the temperature of around 175 degrees Celcius is the optimal condition for the maximum increase in yield. In other words, these are the experimental space for the highest yields.Now, let us check these assumption by building an experimental design around this point. We can now use the 85 minutes and the temperature of 175 degrees Celsius as the central point. Remember the range of time and temperature for the experimental design can be chosen freely. Lets go with a range of 10 minutes and 10 degrees Celsius. When the reaction changes from 85 min to 90 minutes, temperature rise from 175 to 180 degrees Celsius. Consider the figure below

knitr::include_graphics("design.png")

From the figure above, the design must run as a single replicate in the factorial point and five replicate at the central point.

Designs with Central Points – Interpreting a Lousy Fitting

Data source:

Design and analysis of experiments / Douglas C. Montgomery. — Eighth edition. ISBN 978-1-118-14692-7

  • Data file: DoEOpt05
  • Time (x1): Time (min)
  • Temp (x2): Temperature (ºC)
  • Y: Yield (%)

Building the data set

(run this code to build the DoEOpt05 data set before following the analysis)

DoEOpt05 <- data.frame(x1 = c(-1,-1,1,1,0,0,0,0,0),
                          x2 = c(-1,1,-1,1,0,0,0,0,0),
                          Time = c(80,80,90,90,85,85,85,85,85),
                          Temp = c(170,180,170,180,175,175,175,175,175),
                          Y = c(76.5,77,78,79.5,79.9,80.3,80,79.7,79.8)
                            )
head(DoEOpt05,5)
  x1 x2 Time Temp    Y
1 -1 -1   80  170 76.5
2 -1  1   80  180 77.0
3  1 -1   90  170 78.0
4  1  1   90  180 79.5
5  0  0   85  175 79.9

View the data

View(DoEOpt05)

loading Response Surface Methodology package

library(rsm)

File Structure

str(DoEOpt05)
'data.frame':   9 obs. of  5 variables:
 $ x1  : num  -1 -1 1 1 0 0 0 0 0
 $ x2  : num  -1 1 -1 1 0 0 0 0 0
 $ Time: num  80 80 90 90 85 85 85 85 85
 $ Temp: num  170 180 170 180 175 175 175 175 175
 $ Y   : num  76.5 77 78 79.5 79.9 80.3 80 79.7 79.8

After checking the structure of the data, the next step is to assign the relationship between the coded and the natural variable using the command below. We will therefore assign the x1 and x2 as the coded data.

DoEOpt05 <- as.coded.data(DoEOpt05, 
              x1 ~ (Time-85)/5,
              x2 ~ (Temp-175)/5)

Now check the structure of data

str(DoEOpt05)
Classes 'coded.data' and 'data.frame':  9 obs. of  5 variables:
 $ x1  : num  -1 -1 1 1 0 0 0 0 0
 $ x2  : num  -1 1 -1 1 0 0 0 0 0
 $ Time: num  80 80 90 90 85 85 85 85 85
 $ Temp: num  170 180 170 180 175 175 175 175 175
 $ Y   : num  76.5 77 78 79.5 79.9 80.3 80 79.7 79.8
 - attr(*, "codings")=List of 2
  ..$ x1:Class 'formula'  language x1 ~ (Time - 85)/5
  .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
  ..$ x2:Class 'formula'  language x2 ~ (Temp - 175)/5
  .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
 - attr(*, "rsdes")=List of 2
  ..$ primary: chr [1:2] "x1" "x2"
  ..$ call   : language as.coded.data(data = DoEOpt05, x1 ~ (Time - 85)/5, x2 ~ (Temp - 175)/5)

View the first few observations

head(DoEOpt05,5)
  Time Temp Time Temp    Y
1   80  170   80  170 76.5
2   80  180   80  180 77.0
3   90  170   90  170 78.0
4   90  180   90  180 79.5
5   85  175   85  175 79.9

Data are stored in coded form using these coding formulas ...
x1 ~ (Time - 85)/5
x2 ~ (Temp - 175)/5

The next step is to run the regression model

Regression model with coded variables

model <- rsm(Y ~ FO(x1,x2) + TWI(x1,x2), data = DoEOpt05)
summary(model)

Call:
rsm(formula = Y ~ FO(x1, x2) + TWI(x1, x2), data = DoEOpt05)

            Estimate Std. Error  t value  Pr(>|t|)    
(Intercept) 78.96667    0.49148 160.6702 1.772e-10 ***
x1           1.00000    0.73722   1.3564    0.2330    
x2           0.50000    0.73722   0.6782    0.5277    
x1:x2        0.25000    0.73722   0.3391    0.7483    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared:  0.3257,    Adjusted R-squared:  -0.07891 
F-statistic: 0.805 on 3 and 5 DF,  p-value: 0.5426

Analysis of Variance Table

Response: Y
            Df Sum Sq Mean Sq F value    Pr(>F)
FO(x1, x2)   2  5.000   2.500   1.150 0.3882680
TWI(x1, x2)  1  0.250   0.250   0.115 0.7483056
Residuals    5 10.870   2.174                  
Lack of fit  1 10.658  10.658 201.094 0.0001436
Pure error   4  0.212   0.053                  

Stationary point of response surface:
x1 x2 
-2 -4 

Stationary point in original units:
Time Temp 
  75  155 

Eigenanalysis:
eigen() decomposition
$values
[1]  0.125 -0.125

$vectors
        [,1]       [,2]
x1 0.7071068 -0.7071068
x2 0.7071068  0.7071068

Plot the contour plot

contour(model, ~ x1+x2, 
        image = TRUE, 
        xlabs=c("Time (min)", "Temperature (ºC)"))
points(DoEOpt05$Time,DoEOpt05$Temp)

From the model above, the results are pretty disappointing. Neither x1, x2 nor their interaction were significant. Besides, R-square and adjusted R-square are extremely low. Besides, the p-value shows that the model is not significant, and lack of fit was significant. Let’s take a keen look at the results and try to understand what is happening.

knitr::include_graphics("data.png")

From the results below, the effect of time from 77.0 to 79.5 is 2.5 while from 76.5 to 78 is 1.5. On the other hand, the effect of temperature from 76.5 to 77.0 is 0.5 while the effect of temperature from 78 to 79.5 is 1.5. The error variation among replicate point is 0.6. In other words, the effect of time and temperature seem to be higher than the intrinsic error.

knitr::include_graphics("data2.png")

The linear model with interaction is not adequate to describe the behavior of the data. We can also visualize this as shown below

plots

plot(DoEOpt05$Time, DoEOpt05$Y, xlab = "Time", ylab = "Yield (%)")

plot(DoEOpt05$Temp, DoEOpt05$Y, xlab = "Temperature", ylab = "Yield (%)")

WE can see the non-linear behavior when we plot the experimental results against the regression variables for both time and temperature. The behavior strictly follows the second order model. The quadratic model with interaction factor effect has six regression parameters, however, our design has five independent runs since the replicates are not independent runs. We need at least one independent run for each regression parameter. The 2^k design with central point can check model linearity but cannot estimates the regression parameters. The following section will see the experimental design that can fit the second order model.

Central Composite Design

One of the Key ideas behind an experimental design is to provide the results distribution of the data in the region of interest. The most used approach in the second order model is the central composite design. Starting by the power of k-factorial design with central point, the new point are obtained by rotating the factorial points by 45 degrees over x1 and x2. The new experimental points are called axial points and the cartesian codes are combination of (0, -α) or (0, +α). Consider the figure below.

knitr::include_graphics("alpha.png")

How to Calculate Alpha

Using two axial points and the central point, we can draw a triangle with hypotenuse = 2 and legs = α. The length of the hypotenuse is the difference between the high and low levels of the codes of the variable in between the power or K-design, which is equal to 2units. Consider the figure below

knitr::include_graphics("alpha2.png")

Using the well known Pythagoras theorem, we can easily determine the alpha value as 1.212 as shown below.

knitr::include_graphics("alpha3.png")

This way, the axial points are the combinations of 1.141 and 0) and -1.414 and 0. Consider the figure below.

knitr::include_graphics("alpha4.png")

This particular design is a spherical design with all experimental points equal distance from the design center. The experimental matrix of the central composite design consist of the following. Consider the figure below.

knitr::include_graphics("alpha5.png")

Let us now go back to our last design and argument it with axial point. We have to test the factorial design with central point and the relationship between the codes and the natural variables.

knitr::include_graphics("alpha6.png")

We can use this relationship to calculate the axial values of time and temperature. The axial points for time and temperatures are as shown below.

knitr::include_graphics("alpha7.png")

Before analyzing the response, let us visualize the treatment distribution with x1=time and x2= temperature. We can have a graphical visualiuzation of the experimental points.

knitr::include_graphics("alpha8.png")

However, this plot is not necessary, but is always a good to check the experimental points distribution. Lets us now analyze the data.

Analyzing the Central Composite Design

Data source:

Design and analysis of experiments / Douglas C. Montgomery. — Eighth edition. ISBN 978-1-118-14692-7

  • Data file: DoEOpt06
  • Time (x1): Time (min)
  • Temp (x2): Temperature (ºC)
  • Y: Yield (%)

building the data set

(run this code to build the DoEOpt05 data set before following the analysis)

DoEOpt06 <- data.frame(x1 = c(-1, -1, 1, 1, 0, 0, 0, 0, 0, 1.414, -1.414, 0, 0),
                          x2 = c(-1, 1, -1, 1, 0, 0, 0, 0, 0, 0, 0, 1.414, -1.414),
                          Time = c(80, 80, 90, 90, 85, 85, 85, 85, 85, 92.07, 77.93, 85, 85),
                          Temp = c(170, 180, 170, 180, 175, 175, 175, 175, 175, 175, 175, 182.07, 167.93),
                          Y = c(76.5, 77, 78, 79.5, 79.9, 80.3, 80, 79.7, 79.8, 78.4, 75.6, 78.5, 77)
                            )
head(DoEOpt06,5)
  x1 x2 Time Temp    Y
1 -1 -1   80  170 76.5
2 -1  1   80  180 77.0
3  1 -1   90  170 78.0
4  1  1   90  180 79.5
5  0  0   85  175 79.9
tail(DoEOpt06)
       x1     x2  Time   Temp    Y
8   0.000  0.000 85.00 175.00 79.7
9   0.000  0.000 85.00 175.00 79.8
10  1.414  0.000 92.07 175.00 78.4
11 -1.414  0.000 77.93 175.00 75.6
12  0.000  1.414 85.00 182.07 78.5
13  0.000 -1.414 85.00 167.93 77.0

loading Response Surface Methodology package

library(rsm)

checking file structure

str(DoEOpt06)
'data.frame':   13 obs. of  5 variables:
 $ x1  : num  -1 -1 1 1 0 ...
 $ x2  : num  -1 1 -1 1 0 0 0 0 0 0 ...
 $ Time: num  80 80 90 90 85 ...
 $ Temp: num  170 180 170 180 175 175 175 175 175 175 ...
 $ Y   : num  76.5 77 78 79.5 79.9 80.3 80 79.7 79.8 78.4 ...

setting the realtionship between the coded and the natural variables

DoEOpt06 <- as.coded.data(DoEOpt06, 
              x1 ~ (Time-85)/5,
              x2 ~ (Temp-175)/5)

Run the second order regression model for the Yield using rsm() function. However, we can two model as shown in the figure below

knitr::include_graphics("model.png")

model_Y <- rsm(Y ~ SO(x1,x2), data = DoEOpt06)
summary(model_Y)

Call:
rsm(formula = Y ~ SO(x1, x2), data = DoEOpt06)

             Estimate Std. Error  t value  Pr(>|t|)    
(Intercept) 79.939955   0.119089 671.2644 < 2.2e-16 ***
x1           0.995050   0.094155  10.5682 1.484e-05 ***
x2           0.515203   0.094155   5.4719  0.000934 ***
x1:x2        0.250000   0.133145   1.8777  0.102519    
x1^2        -1.376449   0.100984 -13.6303 2.693e-06 ***
x2^2        -1.001336   0.100984  -9.9158 2.262e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared:  0.9827,    Adjusted R-squared:  0.9704 
F-statistic: 79.67 on 5 and 7 DF,  p-value: 5.147e-06

Analysis of Variance Table

Response: Y
            Df  Sum Sq Mean Sq  F value    Pr(>F)
FO(x1, x2)   2 10.0430  5.0215  70.8143 2.267e-05
TWI(x1, x2)  1  0.2500  0.2500   3.5256    0.1025
PQ(x1, x2)   2 17.9537  8.9769 126.5944 3.194e-06
Residuals    7  0.4964  0.0709                   
Lack of fit  3  0.2844  0.0948   1.7885    0.2886
Pure error   4  0.2120  0.0530                   

Stationary point of response surface:
       x1        x2 
0.3892304 0.3058466 

Stationary point in original units:
     Time      Temp 
 86.94615 176.52923 

Eigenanalysis:
eigen() decomposition
$values
[1] -0.9634986 -1.4142867

$vectors
         [,1]       [,2]
x1 -0.2897174 -0.9571122
x2 -0.9571122  0.2897174

The results shows that both linear terms are significant. Besides, all the quadratic terms are significant. This second order model has a higher R-sqare and a very low p-value, pretty good. Additionally, lack of fit was not significant.

knitr::include_graphics("model2.png")

Alternatively

model_Y <- rsm(Y ~ FO(x1,x2) + TWI(x1,x2) + PQ(x1,x2), data = DoEOpt06)
summary(model_Y)

Call:
rsm(formula = Y ~ FO(x1, x2) + TWI(x1, x2) + PQ(x1, x2), data = DoEOpt06)

             Estimate Std. Error  t value  Pr(>|t|)    
(Intercept) 79.939955   0.119089 671.2644 < 2.2e-16 ***
x1           0.995050   0.094155  10.5682 1.484e-05 ***
x2           0.515203   0.094155   5.4719  0.000934 ***
x1:x2        0.250000   0.133145   1.8777  0.102519    
x1^2        -1.376449   0.100984 -13.6303 2.693e-06 ***
x2^2        -1.001336   0.100984  -9.9158 2.262e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared:  0.9827,    Adjusted R-squared:  0.9704 
F-statistic: 79.67 on 5 and 7 DF,  p-value: 5.147e-06

Analysis of Variance Table

Response: Y
            Df  Sum Sq Mean Sq  F value    Pr(>F)
FO(x1, x2)   2 10.0430  5.0215  70.8143 2.267e-05
TWI(x1, x2)  1  0.2500  0.2500   3.5256    0.1025
PQ(x1, x2)   2 17.9537  8.9769 126.5944 3.194e-06
Residuals    7  0.4964  0.0709                   
Lack of fit  3  0.2844  0.0948   1.7885    0.2886
Pure error   4  0.2120  0.0530                   

Stationary point of response surface:
       x1        x2 
0.3892304 0.3058466 

Stationary point in original units:
     Time      Temp 
 86.94615 176.52923 

Eigenanalysis:
eigen() decomposition
$values
[1] -0.9634986 -1.4142867

$vectors
         [,1]       [,2]
x1 -0.2897174 -0.9571122
x2 -0.9571122  0.2897174
knitr::include_graphics("model2.png")

Let us polish our model by removing the interaction term because it was significant. We do that by removing the TWI from the model.

model_Y <- rsm(Y ~ FO(x1,x2) + PQ(x1,x2), data = DoEOpt06)
summary(model_Y)

Call:
rsm(formula = Y ~ FO(x1, x2) + PQ(x1, x2), data = DoEOpt06)

            Estimate Std. Error  t value  Pr(>|t|)    
(Intercept) 79.93995    0.13660 585.2155 < 2.2e-16 ***
x1           0.99505    0.10800   9.2135 1.559e-05 ***
x2           0.51520    0.10800   4.7704  0.001408 ** 
x1^2        -1.37645    0.11583 -11.8831 2.310e-06 ***
x2^2        -1.00134    0.11583  -8.6447 2.489e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared:  0.974, Adjusted R-squared:  0.961 
F-statistic: 75.02 on 4 and 8 DF,  p-value: 2.226e-06

Analysis of Variance Table

Response: Y
            Df  Sum Sq Mean Sq F value    Pr(>F)
FO(x1, x2)   2 10.0430  5.0215 53.8227 2.290e-05
PQ(x1, x2)   2 17.9537  8.9769 96.2186 2.538e-06
Residuals    8  0.7464  0.0933                  
Lack of fit  4  0.5344  0.1336  2.5206    0.1962
Pure error   4  0.2120  0.0530                  

Stationary point of response surface:
       x1        x2 
0.3614555 0.2572577 

Stationary point in original units:
     Time      Temp 
 86.80728 176.28629 

Eigenanalysis:
eigen() decomposition
$values
[1] -1.001336 -1.376449

$vectors
   [,1] [,2]
x1    0   -1
x2   -1    0

In the new model, all the regression coefficients are significant. The model still has a higher R-square and a very lower p-value than the one from the previous model. Pretty good. Besides, lack of fit was not significant.

knitr::include_graphics("model3.png")

Visualization of the Results

Lets us now focus on the visualization of the results. For the results visualization, we are going to build both the contour plot and the perspective for the yields and time (x1) and temperature (x2). Consider the contour plot below ### contour and perspective plots

par(mfrow = c(1, 1))
contour(model_Y, x1~x2, image = TRUE, 
        xlabs=c("Time (min)", "Temperature (ºC)"))

persp(model_Y, x1~x2, col = terrain.colors(50), contours = "colors",
      zlab = "Yield (%)", 
      xlabs=c("Time (min)", "Temperature (ºC)"))

The contour and persp curve above are quite different from the ones we have analyzing from the previous linear models and interactions. Quadratic model can have have both maximum and minimum set of points. In this case, the plots shows that the reaction presents a set of condition, a specific combination of time and temperature that results in the maximum yields. The precise determination of the maximum point use the response surface equation. The maximum condition if it exists.

knitr::include_graphics("response2.png")

knitr::include_graphics("response.png")

The maximum condition if it exits, will be the set of x1 and x2 for which the partial derivative of delta y/ delta x1 = delta y/x2 = 0. Consider the equation if the figure below. This is called the stationary point.

knitr::include_graphics("condition.png")

Fortunately, with the new method, we do not need to derive the equation ourselves but R does for us. We can see the stationary point coordinates in both in coded and natural variables in the RSM output. In this case. The maximum yields occurs at 86.6 minutes and 176.6 degrees Celsius. Consider the figure below.

knitr::include_graphics("maxyield.png")

This experimental design has led us to the optimum condition for the maximum yield. The last question we should be asking ourselves is, “how much is the maximum predicted yield?” We can use the predicted function calculated. First, we are going to create a data frame called max with the coded coordinates of the states of stationary point and then use the predict function to calculate the response of the model for the max data frame. Let us run it. Consider the codes below

predictig the Yield at the stationary point

max <- data.frame(x1 = 0.361, x2 = 0.257)
predict(model_Y, max)
       1 
80.18606 

From the results above, the predicted yield is 80.2%. Now, one interesting follow up is to run the experiment using the predicted optimized condition to verify the model’s prediction.