In-class Examples with R Code (Response Surface Methodology (RSM))

Stat 579

Michigan State University

Set up RStudio

Setting up R Markdown enables you to create dynamic, reproducible, and visually appealing reports, presentations, and documents that help you communicate your data analysis and research findings more effectively.

Introduction to R

R is an open-source programming language and software environment for statistical computing and graphics. It is widely used by data analysts and statisticians for data analysis, visualization, and modeling. R provides a comprehensive set of tools for statistical analysis, including linear and nonlinear modeling, classical statistical tests, time-series analysis, and data visualization. RStudio is an integrated development environment (IDE) for R that provides a user-friendly interface for working with R. It includes a code editor, debugging tools, and data visualization tools that make it easier to work with R code. RStudio also supports the creation of interactive web applications using the Shiny package, making it a powerful tool for data visualization and analysis.

One of the key benefits of R and RStudio is their flexibility and extensibility. R has a large community of developers who contribute packages that extend its functionality, making it possible to perform a wide range of analyses with just a few lines of code. RStudio provides an interface for managing these packages and installing new ones. R and RStudio can be used for a wide range of data analysis tasks, including data cleaning and preparation, statistical modeling, and data visualization. They are particularly useful for working with large datasets, as they can handle data from a variety of sources and perform complex analyses with ease.

In summary, R and RStudio are powerful tools for statistical computing and data analysis. Their flexibility, extensibility, and user-friendly interface make them popular among data analysts and statisticians, and their ability to handle large datasets and perform complex analyses make them essential tools for many data-driven industries.

Introduction to Response Surface Methodology

Response Surface Methodology (RSM) is a statistical technique used to model and optimize the relationship between a response variable and one or more independent variables. RSM involves designing experiments, fitting mathematical models to the experimental data, and using the models to predict the optimal conditions for a process. RSM is particularly useful for optimizing processes that involve multiple variables, as it allows researchers to explore the relationship between the variables and determine the optimal settings for each.

RSM has been used in a wide range of fields, including chemistry, engineering, agriculture, and social sciences. In chemistry, RSM has been used to optimize the conditions for chemical reactions, such as temperature, pressure, and concentrations of reactants. In engineering, RSM has been used to optimize the parameters of manufacturing processes, such as cutting speed and feed rate. In agriculture, RSM has been used to optimize the conditions for crop growth, such as fertilizer application and irrigation.

The main goal of RSM is to find the optimal values of the independent variables that result in the maximum or minimum value of the response variable. RSM accomplishes this by constructing a response surface, which is a mathematical model that describes the relationship between the independent variables and the response variable. The response surface can be visualized as a contour plot or a three-dimensional surface, which allows researchers to identify the optimal values of the independent variables that result in the desired response.

Overall, RSM is a powerful statistical technique that is used to optimize processes and improve the efficiency of experiments. Its ability to model the relationship between multiple variables makes it a valuable tool for researchers and practitioners in a wide range of fields.

2.1.1 Read data

Read data, skip extra header lines, recode variables.
fn.data <- "http://statacumen.com/teach/RSM/data/RSM_TAB_02-08.txt"
df.2.8 <- read.table(fn.data, header=TRUE, skip=1)

View the structure of the data set

str(df.2.8)
'data.frame':   12 obs. of  3 variables:
 $ x1: int  -1 1 -1 1 -9 9 0 0 0 0 ...
 $ x2: int  -1 -1 1 1 0 0 -9 9 0 0 ...
 $ y : int  43 78 69 73 48 76 65 74 76 79 ...

View the first few observations

head(df.2.8,5)
  x1 x2  y
1 -1 -1 43
2  1 -1 78
3 -1  1 69
4  1  1 73
5 -9  0 48

Replace the coded values 9 and -9 with sqrt(2) and -sqrt(2), respectively.

df.2.8[,c("x1","x2")] <- replace(df.2.8[,c("x1","x2")], (df.2.8[,c("x1","x2")] == 9)
, sqrt(2))
df.2.8[,c("x1","x2")] <- replace(df.2.8[,c("x1","x2")], (df.2.8[,c("x1","x2")] == -9)
, -sqrt(2))
head(df.2.8,5)
         x1 x2  y
1 -1.000000 -1 43
2  1.000000 -1 78
3 -1.000000  1 69
4  1.000000  1 73
5 -1.414214  0 48

The scatterplot matrix below shows some relationships between y and the other variables.

library(ggplot2)
library(reshape)
suppressMessages(suppressWarnings(library(GGally)))
p <- ggpairs(df.2.8, alpha = 0.1)
# put scatterplots on top so y axis is vertical
#p <- ggpairs(df.2.8, upper = list(continuous = "points")
# , lower = list(continuous = "cor")
# )
print(p)

# detach package after use so reshape2 works (old reshape (v.1) conflicts)
detach("package:GGally", unload=TRUE)
detach("package:reshape", unload=TRUE)

The correlation matrix indicates that some (linear) correlations with y are different from zero, but if curvature exists, this summary is not very meaningful.

# correlation matrix and associated p-values testing "H0: rho == 0"
library(Hmisc)
rcorr(as.matrix(df.2.8))
     x1   x2    y
x1 1.00 0.00 0.66
x2 0.00 1.00 0.28
y  0.66 0.28 1.00

n= 12 


P
   x1     x2     y     
x1        1.0000 0.0193
x2 1.0000        0.3718
y  0.0193 0.3718       

Correlation plot

library(corrplot)
library(dplyr)
library(magrittr)
df.2.8 %>% 
  dplyr::select(x1:y) %>%
  cor() %>%
  round(3) %>%
  corrplot(method = "color", addCoef.col="white", type = "upper",
           title="Correlation between response variable and Predictor variables",
           mar=c(0,0,2,0),
           tl.cex=0.5, number.cex = 0.4)

2.1.2 Fit second-order model

Because this is a special kind of model (a full second-order model), we can get the tests for higher-order terms and lack of fit simply by using rsm().

Fit second-order linear model.

# load the rsm package
library(rsm)
# fit second-order (SO) model
# -- look up ?SO and see other options
rsm.2.8.y.SOx12 <- rsm(y ~ SO(x1, x2), data = df.2.8)
# which variables are available in the rsm object?
names(rsm.2.8.y.SOx12)
 [1] "coefficients"  "residuals"     "effects"       "rank"         
 [5] "fitted.values" "assign"        "qr"            "df.residual"  
 [9] "xlevels"       "call"          "terms"         "model"        
[13] "data"          "b"             "order"         "B"            
[17] "newlabs"      

which variables are available in the summary of the rsm object?

names(summary(rsm.2.8.y.SOx12))
 [1] "call"          "terms"         "residuals"     "coefficients" 
 [5] "aliased"       "sigma"         "df"            "r.squared"    
 [9] "adj.r.squared" "fstatistic"    "cov.unscaled"  "canonical"    
[13] "lof"          

show the summary

summary(rsm.2.8.y.SOx12)

Call:
rsm(formula = y ~ SO(x1, x2), data = df.2.8)

            Estimate Std. Error t value  Pr(>|t|)    
(Intercept) 79.75000    1.21361 65.7133 8.352e-10 ***
x1           9.82475    0.85815 11.4488 2.665e-05 ***
x2           4.21599    0.85815  4.9129 0.0026759 ** 
x1:x2       -7.75000    1.21361 -6.3859 0.0006938 ***
x1^2        -8.87500    0.95944 -9.2502 9.017e-05 ***
x2^2        -5.12500    0.95944 -5.3417 0.0017585 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared:   0.98, Adjusted R-squared:  0.9634 
F-statistic: 58.85 on 5 and 6 DF,  p-value: 5.119e-05

Analysis of Variance Table

Response: y
            Df Sum Sq Mean Sq F value    Pr(>F)
FO(x1, x2)   2 914.40  457.20 77.6054 5.155e-05
TWI(x1, x2)  1 240.25  240.25 40.7801 0.0006938
PQ(x1, x2)   2 578.92  289.46 49.1328 0.0001906
Residuals    6  35.35    5.89                  
Lack of fit  3   8.60    2.87  0.3214 0.8119178
Pure error   3  26.75    8.92                  

Stationary point of response surface:
         x1          x2 
 0.55819271 -0.01073203 

Eigenanalysis:
eigen() decomposition
$values
[1]  -2.695206 -11.304794

$vectors
         [,1]       [,2]
x1  0.5312434 -0.8472193
x2 -0.8472193 -0.5312434
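
The stationary point and eigenanalysis can be reproduced by hand. A minimal sketch, assuming the rsm object stores the first-order coefficients in $b and the symmetric second-order matrix in $B (both appear in the names() output above); the stationary point solves the gradient equation b + 2Bx = 0.

# sketch: recover the stationary point and eigenanalysis from the fit
b.coef <- rsm.2.8.y.SOx12$b # first-order coefficients
B.mat  <- rsm.2.8.y.SOx12$B # symmetric second-order coefficient matrix
# stationary point x_s = -B^{-1} b / 2, matching the output above
-solve(B.mat, b.coef) / 2
# eigenvalues of B are all negative, so the stationary point is a maximum
eigen(B.mat)$values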

include externally Studentized residuals in the rsm object for plotting later

rsm.2.8.y.SOx12$studres <- rstudent(rsm.2.8.y.SOx12)
rsm.2.8.y.SOx12$studres
          1           2           3           4           5           6 
-0.61072616 -0.71491975  0.71491975  0.61072616 -0.06495026  0.06495026 
          7           8           9          10          11          12 
 0.98067965 -0.98067965 -2.37660148 -0.32922247  1.81973167  0.55959278 

2.1.3 Model Fitting

The following illustrates fitting several model types using rsm().

Fit a first-order model.

# fit the first-order model
rsm.2.8.y.FOx12 <- rsm(y ~ FO(x1, x2), data = df.2.8)
# externally Studentized residuals
rsm.2.8.y.FOx12$studres <- rstudent(rsm.2.8.y.FOx12)
summary(rsm.2.8.y.FOx12)

Call:
rsm(formula = y ~ FO(x1, x2), data = df.2.8)

            Estimate Std. Error t value  Pr(>|t|)    
(Intercept)  70.4167     2.8129 25.0338 1.244e-09 ***
x1            9.8247     3.4450  2.8519   0.01903 *  
x2            4.2160     3.4450  1.2238   0.25210    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared:  0.5169,    Adjusted R-squared:  0.4096 
F-statistic: 4.815 on 2 and 9 DF,  p-value: 0.03785

Analysis of Variance Table

Response: y
            Df Sum Sq Mean Sq F value  Pr(>F)
FO(x1, x2)   2 914.40  457.20  4.8154 0.03785
Residuals    9 854.51   94.95                
Lack of fit  6 827.76  137.96 15.4722 0.02333
Pure error   3  26.75    8.92                

Direction of steepest ascent (at radius 1):
       x1        x2 
0.9189626 0.3943447 

Corresponding increment in original units:
       x1        x2 
0.9189626 0.3943447 

Fit a first-order model with two-way interaction.

# fit the first-order with two-way interaction model.
rsm.2.8.y.TWIx12 <- rsm(y ~ FO(x1, x2) + TWI(x1, x2), data = df.2.8)
# externally Studentized residuals
rsm.2.8.y.TWIx12$studres <- rstudent(rsm.2.8.y.TWIx12)
summary(rsm.2.8.y.TWIx12)

Call:
rsm(formula = y ~ FO(x1, x2) + TWI(x1, x2), data = df.2.8)

            Estimate Std. Error t value  Pr(>|t|)    
(Intercept)  70.4167     2.5295 27.8377 2.993e-09 ***
x1            9.8247     3.0980  3.1713   0.01317 *  
x2            4.2160     3.0980  1.3609   0.21066    
x1:x2        -7.7500     4.3813 -1.7689   0.11488    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared:  0.6527,    Adjusted R-squared:  0.5225 
F-statistic: 5.013 on 3 and 8 DF,  p-value: 0.03039

Analysis of Variance Table

Response: y
            Df Sum Sq Mean Sq F value  Pr(>F)
FO(x1, x2)   2 914.40  457.20  5.9544 0.02607
TWI(x1, x2)  1 240.25  240.25  3.1289 0.11488
Residuals    8 614.26   76.78                
Lack of fit  5 587.51  117.50 13.1779 0.02966
Pure error   3  26.75    8.92                

Stationary point of response surface:
       x1        x2 
0.5439987 1.2677094 

Eigenanalysis:
eigen() decomposition
$values
[1]  3.875 -3.875

$vectors
         [,1]       [,2]
x1 -0.7071068 -0.7071068
x2  0.7071068 -0.7071068

Fit a second-order model without interactions.

# Fit the second-order without interactions model
rsm.2.8.y.PQx12 <- rsm(y ~ FO(x1, x2) + PQ(x1, x2), data = df.2.8)
# externally Studentized residuals
rsm.2.8.y.PQx12$studres <- rstudent(rsm.2.8.y.PQx12)
summary(rsm.2.8.y.PQx12)

Call:
rsm(formula = y ~ FO(x1, x2) + PQ(x1, x2), data = df.2.8)

            Estimate Std. Error t value  Pr(>|t|)    
(Intercept)  79.7500     3.1373 25.4198 3.724e-08 ***
x1            9.8247     2.2184  4.4287  0.003049 ** 
x2            4.2160     2.2184  1.9004  0.099141 .  
x1^2         -8.8750     2.4803 -3.5782  0.008997 ** 
x2^2         -5.1250     2.4803 -2.0663  0.077639 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared:  0.8442,    Adjusted R-squared:  0.7552 
F-statistic: 9.482 on 4 and 7 DF,  p-value: 0.005903

Analysis of Variance Table

Response: y
            Df Sum Sq Mean Sq F value   Pr(>F)
FO(x1, x2)   2 914.40  457.20  11.613 0.005978
PQ(x1, x2)   2 578.92  289.46   7.352 0.019052
Residuals    7 275.60   39.37                 
Lack of fit  4 248.85   62.21   6.977 0.071196
Pure error   3  26.75    8.92                 

Stationary point of response surface:
       x1        x2 
0.5535069 0.4113161 

Eigenanalysis:
eigen() decomposition
$values
[1] -5.125 -8.875

$vectors
   [,1] [,2]
x1    0   -1
x2   -1    0

2.1.4 Model comparisons

Model comparison (for multi-parameter tests).

# compare the reduced first-order model to the full second-order model
anova(rsm.2.8.y.FOx12, rsm.2.8.y.SOx12)
Analysis of Variance Table

Model 1: y ~ FO(x1, x2)
Model 2: y ~ FO(x1, x2) + TWI(x1, x2) + PQ(x1, x2)
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
1      9 854.51                                  
2      6  35.35  3    819.17 46.349 0.0001524 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Confidence intervals and prediction intervals.

## conf int for parameters
confint(rsm.2.8.y.SOx12)
                    2.5 %    97.5 %
(Intercept)     76.780415 82.719585
FO(x1, x2)x1     7.724934 11.924561
FO(x1, x2)x2     2.116176  6.315804
TWI(x1, x2)    -10.719585 -4.780415
PQ(x1, x2)x1^2 -11.222663 -6.527337
PQ(x1, x2)x2^2  -7.472663 -2.777337
# conf int for regression line
predict(rsm.2.8.y.SOx12, df.2.8[1:dim(df.2.8)[1],], interval = "confidence")
        fit      lwr      upr
1  43.95926 39.26394 48.65459
2  79.10876 74.41343 83.80408
3  67.89124 63.19592 72.58657
4  72.04074 67.34541 76.73606
5  48.10571 43.41038 52.80104
6  75.89429 71.19896 80.58962
7  63.53769 58.84236 68.23302
8  75.46231 70.76698 80.15764
9  79.75000 76.78041 82.71959
10 79.75000 76.78041 82.71959
11 79.75000 76.78041 82.71959
12 79.75000 76.78041 82.71959
# pred int for new observations
predict(rsm.2.8.y.SOx12, df.2.8[1:dim(df.2.8)[1],], interval = "prediction")
        fit      lwr      upr
1  43.95926 36.38828 51.53025
2  79.10876 71.53777 86.67974
3  67.89124 60.32026 75.46223
4  72.04074 64.46975 79.61172
5  48.10571 40.53472 55.67670
6  75.89429 68.32330 83.46528
7  63.53769 55.96670 71.10868
8  75.46231 67.89132 83.03330
9  79.75000 73.10981 86.39019
10 79.75000 73.10981 86.39019
11 79.75000 73.10981 86.39019
12 79.75000 73.10981 86.39019

2.1.5 Lack-of-fit test

The lack-of-fit (LOF) test is equivalent to a comparison between two models. First, we define the full model by setting up a categorical group variable that takes a unique value for each distinct (x1, x2) pair. Fitting this group variable is equivalent to a one-way ANOVA. The SSE for the full model is 26.75 with 3 df; this is the pure-error SS and df. (Try this for yourself.) Second, we define the reduced model (reduced compared to the one-way ANOVA above) as the regression fit, taking x1 and x2 as continuous variables. The SSE for the reduced model is 35.35 with 6 df; this is the residual SS. The LOF SS is the difference in SSE between the reduced and full models: 35.35 - 26.75 = 8.6 with 6 - 3 = 3 df. The F-test then compares the LOF mean square to the pure-error mean square, F = (8.6/3)/(26.75/3) = 0.32, with 3 numerator df and 3 denominator df.
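
A minimal sketch of this two-model comparison, assuming df.2.8 and rsm.2.8.y.SOx12 from above; the group factor is created here just for illustration.

# sketch: LOF as a comparison of the regression fit (reduced) against
# a one-way ANOVA on the distinct design points (full)
df.lof     <- df.2.8
df.lof$grp <- factor(interaction(df.lof$x1, df.lof$x2)) # unique (x1, x2) pairs
lm.full    <- lm(y ~ grp, data = df.lof) # SSE = 26.75 on 3 df (pure error)
anova(rsm.2.8.y.SOx12, lm.full)          # F = 0.32 on 3 and 3 df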

summary(rsm.2.8.y.SOx12)$lof
Analysis of Variance Table

Response: y
            Df Sum Sq Mean Sq F value    Pr(>F)    
FO(x1, x2)   2 914.40  457.20 77.6054 5.155e-05 ***
TWI(x1, x2)  1 240.25  240.25 40.7801 0.0006938 ***
PQ(x1, x2)   2 578.92  289.46 49.1328 0.0001906 ***
Residuals    6  35.35    5.89                      
Lack of fit  3   8.60    2.87  0.3214 0.8119178    
Pure error   3  26.75    8.92                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

2.1.6 Diagnostics and plots of estimated response surfaces

Plot the residuals, and comment on model adequacy.

# plot diagnostics
par(mfrow=c(2,4))
plot(df.2.8$x1, rsm.2.8.y.SOx12$studres, main="Residuals vs x1")
# horizontal line at zero
abline(h = 0, col = "gray75")
plot(df.2.8$x2, rsm.2.8.y.SOx12$studres, main="Residuals vs x2")
# horizontal line at zero
abline(h = 0, col = "gray75")
# residuals vs order of data
plot(rsm.2.8.y.SOx12$studres, main="Residuals vs Order of data")
# horizontal line at zero
abline(h = 0, col = "gray75")
plot(rsm.2.8.y.SOx12, which = c(1,4,6))
# Normality of Residuals
library(car)
qqPlot(rsm.2.8.y.SOx12$studres, las = 1, id = 3, main="QQ Plot")
[1]  9 11

# first-order model
par(mfrow=c(2,2))
contour(rsm.2.8.y.FOx12, ~ x1 + x2, image = TRUE, main="first-order model")

persp(rsm.2.8.y.FOx12, x2 ~ x1, zlab = "y", main="first-order model")

# second-order model
contour(rsm.2.8.y.SOx12, ~ x1 + x2, image = TRUE, main="second-order model")

persp(rsm.2.8.y.SOx12, x2 ~ x1, zlab = "y", main="second-order model")

Chapter 3

Two-Level Factorial Designs

3.1 Example 3.1, Table 3.5, p. 90

Read data and convert to long format.

#### 3.1
fn.data <- "http://statacumen.com/teach/RSM/data/RSM_EX_03-01.txt"
df.3.1 <- read.table(fn.data, header=TRUE)

Check the structure of the data set

str(df.3.1)
'data.frame':   8 obs. of  5 variables:
 $ a : int  -1 1 -1 1 -1 1 -1 1
 $ b : int  -1 -1 1 1 -1 -1 1 1
 $ c : int  -1 -1 -1 -1 1 1 1 1
 $ y1: int  247 470 429 435 837 551 775 660
 $ y2: int  400 446 405 445 850 670 865 530

View the first few observations

head(df.3.1,5)
   a  b  c  y1  y2
1 -1 -1 -1 247 400
2  1 -1 -1 470 446
3 -1  1 -1 429 405
4  1  1 -1 435 445
5 -1 -1  1 837 850

reshape data into long format

library(reshape2)
df.3.1.long <- melt(df.3.1, id.vars = c("a", "b", "c")
, variable.name = "rep", value.name = "y")
head(df.3.1.long,10)
    a  b  c rep   y
1  -1 -1 -1  y1 247
2   1 -1 -1  y1 470
3  -1  1 -1  y1 429
4   1  1 -1  y1 435
5  -1 -1  1  y1 837
6   1 -1  1  y1 551
7  -1  1  1  y1 775
8   1  1  1  y1 660
9  -1 -1 -1  y2 400
10  1 -1 -1  y2 446

Fit second-order linear model.

library(rsm)
rsm.3.1.y.SOabc <- rsm(y ~ SO(a, b, c), data = df.3.1.long)
rsm.3.1.y.SOabc

Call:
rsm(formula = y ~ SO(a, b, c), data = df.3.1.long)

Coefficients:
    (Intercept)     FO(a, b, c)a     FO(a, b, c)b     FO(a, b, c)c  
        563.438          -37.562            4.563          153.812  
TWI(a, b, c)a:b  TWI(a, b, c)a:c  TWI(a, b, c)b:c   PQ(a, b, c)a^2  
        -12.937          -76.938          -14.312               NA  
 PQ(a, b, c)b^2   PQ(a, b, c)c^2  
             NA               NA  

externally Studentized residuals

rsm.3.1.y.SOabc$studres <- rstudent(rsm.3.1.y.SOabc)
summary(rsm.3.1.y.SOabc)

Call:
rsm(formula = y ~ SO(a, b, c), data = df.3.1.long)

Residuals:
    Min      1Q  Median      3Q     Max 
-91.438 -27.469   5.688  27.719  79.937 

Coefficients: (3 not defined because of singularities)
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)      563.438     15.716  35.852 5.06e-11 ***
FO(a, b, c)a     -37.562     15.716  -2.390 0.040549 *  
FO(a, b, c)b       4.563     15.716   0.290 0.778154    
FO(a, b, c)c     153.812     15.716   9.787 4.28e-06 ***
TWI(a, b, c)a:b  -12.937     15.716  -0.823 0.431654    
TWI(a, b, c)a:c  -76.938     15.716  -4.896 0.000853 ***
TWI(a, b, c)b:c  -14.312     15.716  -0.911 0.386188    
PQ(a, b, c)a^2        NA         NA      NA       NA    
PQ(a, b, c)b^2        NA         NA      NA       NA    
PQ(a, b, c)c^2        NA         NA      NA       NA    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 62.86 on 9 degrees of freedom
Multiple R-squared:  0.9339,    Adjusted R-squared:  0.8898 
F-statistic: 21.18 on 6 and 9 DF,  p-value: 7.875e-05

Note: The quadratic terms can’t be estimated because there are only two levels for each predictor variable.
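
A quick check of why: with only two coded levels (-1, +1), every squared column is constant (a^2 = 1), so it is aliased with the intercept.

unique(df.3.1.long$a^2)
[1] 1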

Fit a first-order model with interactions up to three-way.

lm.3.1.y.3WIabc <- lm(y ~ (a + b + c)^3, data = df.3.1.long)
# externally Studentized residuals
lm.3.1.y.3WIabc$studres <- rstudent(lm.3.1.y.3WIabc)
summary(lm.3.1.y.3WIabc)

Call:
lm(formula = y ~ (a + b + c)^3, data = df.3.1.long)

Residuals:
   Min     1Q Median     3Q    Max 
-76.50 -20.25   0.00  20.25  76.50 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  563.438     15.810  35.637 4.21e-10 ***
a            -37.562     15.810  -2.376  0.04484 *  
b              4.563     15.810   0.289  0.78024    
c            153.812     15.810   9.729 1.04e-05 ***
a:b          -12.937     15.810  -0.818  0.43688    
a:c          -76.938     15.810  -4.866  0.00125 ** 
b:c          -14.312     15.810  -0.905  0.39177    
a:b:c         14.937     15.810   0.945  0.37242    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 63.24 on 8 degrees of freedom
Multiple R-squared:  0.9405,    Adjusted R-squared:  0.8884 
F-statistic: 18.06 on 7 and 8 DF,  p-value: 0.0002605

Here are the residual diagnostic plots, but we’ll skip over them since our interest is in the interaction plots below.

# plot diagnostics
par(mfrow=c(2,4))
plot(df.3.1.long$a, lm.3.1.y.3WIabc$studres, main="Residuals vs a")
# horizontal line at zero
abline(h = 0, col = "gray75")
plot(df.3.1.long$b, lm.3.1.y.3WIabc$studres, main="Residuals vs b")
# horizontal line at zero
abline(h = 0, col = "gray75")
plot(df.3.1.long$c, lm.3.1.y.3WIabc$studres, main="Residuals vs c")
# horizontal line at zero
abline(h = 0, col = "gray75")
# residuals vs order of data
plot(lm.3.1.y.3WIabc$studres, main="Residuals vs Order of data")
# horizontal line at zero
abline(h = 0, col = "gray75")
plot(lm.3.1.y.3WIabc, which = c(4,6))
# Normality of Residuals
library(car)
qqPlot(lm.3.1.y.3WIabc$studres, las = 1, id = 3, main="QQ Plot")
[1] 1 9

Cook's Distance

cooks.distance(lm.3.1.y.3WIabc)
          1           2           3           4           5           6 
0.365817068 0.009001266 0.009001266 0.001562720 0.002640996 0.221296745 
          7           8           9          10          11          12 
0.126580300 0.264099639 0.365817068 0.009001266 0.009001266 0.001562720 
         13          14          15          16 
0.002640996 0.221296745 0.126580300 0.264099639 

Interaction plots, main effects vs main effects.

# interaction plot
par(mfcol=c(2,3))
interaction.plot(df.3.1.long$a, df.3.1.long$b, df.3.1.long$y)
interaction.plot(df.3.1.long$b, df.3.1.long$a, df.3.1.long$y)
interaction.plot(df.3.1.long$a, df.3.1.long$c, df.3.1.long$y)
interaction.plot(df.3.1.long$c, df.3.1.long$a, df.3.1.long$y)
interaction.plot(df.3.1.long$b, df.3.1.long$c, df.3.1.long$y)
interaction.plot(df.3.1.long$c, df.3.1.long$b, df.3.1.long$y)


Interaction plots, main effects vs two-way interactions.

# interaction plot
par(mfcol=c(2,3))
interaction.plot(df.3.1.long$a, df.3.1.long$b * df.3.1.long$c, df.3.1.long$y)
interaction.plot(df.3.1.long$b * df.3.1.long$c, df.3.1.long$a, df.3.1.long$y)
interaction.plot(df.3.1.long$b, df.3.1.long$a * df.3.1.long$c, df.3.1.long$y)
interaction.plot(df.3.1.long$a * df.3.1.long$c, df.3.1.long$b, df.3.1.long$y)
interaction.plot(df.3.1.long$c, df.3.1.long$a * df.3.1.long$b, df.3.1.long$y)
interaction.plot(df.3.1.long$a * df.3.1.long$b, df.3.1.long$c, df.3.1.long$y)


Interaction plots, main effects vs main effects in ggplot().

# Interaction plots, ggplot
library(plyr)
# Calculate the cell means for each (a, b) combination
# create factor version for ggplot categories
df.3.1.factor <- df.3.1.long
df.3.1.factor$a <- factor(df.3.1.factor$a)
df.3.1.factor$b <- factor(df.3.1.factor$b)
df.3.1.factor$c <- factor(df.3.1.factor$c)
#mean(df.3.1.factor[, "y"])
df.3.1.factor.mean <- ddply(df.3.1.factor, .(), summarise, m = mean(y))
#df.3.1.factor.mean
df.3.1.factor.mean.a <- ddply(df.3.1.factor, .(a), summarise, m = mean(y))
#df.3.1.factor.mean.a
df.3.1.factor.mean.b <- ddply(df.3.1.factor, .(b), summarise, m = mean(y))
#df.3.1.factor.mean.b
df.3.1.factor.mean.c <- ddply(df.3.1.factor, .(c), summarise, m = mean(y))
#df.3.1.factor.mean.c
df.3.1.factor.mean.ab <- ddply(df.3.1.factor, .(a,b), summarise, m = mean(y))
#df.3.1.factor.mean.ab
df.3.1.factor.mean.ac <- ddply(df.3.1.factor, .(a,c), summarise, m = mean(y))
#df.3.1.factor.mean.ac
df.3.1.factor.mean.bc <- ddply(df.3.1.factor, .(b,c), summarise, m = mean(y))
#df.3.1.factor.mean.bc
df.3.1.factor.mean.abc <- ddply(df.3.1.factor, .(a,b,c), summarise, m = mean(y))
#df.3.1.factor.mean.abc
library(ggplot2)
## (a, b)
p <- ggplot(df.3.1.factor, aes(x = a, y = y, colour = b, shape = b))
p <- p + geom_hline(aes(yintercept = 0), colour = "black"
, linetype = "solid", size = 0.2, alpha = 0.3)
p <- p + geom_boxplot(alpha = 0.25, outlier.size=0.1)
p <- p + geom_point(alpha = 0.5, position=position_dodge(width=0.75))
p <- p + geom_point(data = df.3.1.factor.mean.ab, aes(y = m), size = 4)
p <- p + geom_line(data = df.3.1.factor.mean.ab, aes(y = m, group = b), size = 1.5)
p <- p + labs(title = "Interaction plot, b by a")
print(p)

## ymax not defined: adjusting position using y instead
p <- ggplot(df.3.1.factor, aes(x = b, y = y, colour = a, shape = a))
p <- p + geom_hline(aes(yintercept = 0), colour = "black"
, linetype = "solid", size = 0.2, alpha = 0.3)
p <- p + geom_boxplot(alpha = 0.25, outlier.size=0.1)
p <- p + geom_point(alpha = 0.5, position=position_dodge(width=0.75))
p <- p + geom_point(data = df.3.1.factor.mean.ab, aes(y = m), size = 4)
p <- p + geom_line(data = df.3.1.factor.mean.ab, aes(y = m, group = a), size = 1.5)
p <- p + labs(title = "Interaction plot, a by b")
print(p)

## ymax not defined: adjusting position using y instead
## (a, c)
p <- ggplot(df.3.1.factor, aes(x = a, y = y, colour = c, shape = c))
p <- p + geom_hline(aes(yintercept = 0), colour = "black"
, linetype = "solid", size = 0.2, alpha = 0.3)
p <- p + geom_boxplot(alpha = 0.25, outlier.size=0.1)
p <- p + geom_point(alpha = 0.5, position=position_dodge(width=0.75))
p <- p + geom_point(data = df.3.1.factor.mean.ac, aes(y = m), size = 4)
p <- p + geom_line(data = df.3.1.factor.mean.ac, aes(y = m, group = c), size = 1.5)
p <- p + labs(title = "Interaction plot, c by a")
print(p)

## ymax not defined: adjusting position using y instead
p <- ggplot(df.3.1.factor, aes(x = c, y = y, colour = a, shape = a))
p <- p + geom_hline(aes(yintercept = 0), colour = "black"
, linetype = "solid", size = 0.2, alpha = 0.3)
p <- p + geom_boxplot(alpha = 0.25, outlier.size=0.1)
p <- p + geom_point(alpha = 0.5, position=position_dodge(width=0.75))
p <- p + geom_point(data = df.3.1.factor.mean.ac, aes(y = m), size = 4)
p <- p + geom_line(data = df.3.1.factor.mean.ac, aes(y = m, group = a), size = 1.5)
p <- p + labs(title = "Interaction plot, a by c")
print(p)

## ymax not defined: adjusting position using y instead
## (b, c)
p <- ggplot(df.3.1.factor, aes(x = b, y = y, colour = c, shape = c))
p <- p + geom_hline(aes(yintercept = 0), colour = "black"
, linetype = "solid", size = 0.2, alpha = 0.3)
p <- p + geom_boxplot(alpha = 0.25, outlier.size=0.1)
p <- p + geom_point(alpha = 0.5, position=position_dodge(width=0.75))
p <- p + geom_point(data = df.3.1.factor.mean.bc, aes(y = m), size = 4)
p <- p + geom_line(data = df.3.1.factor.mean.bc, aes(y = m, group = c), size = 1.5)
p <- p + labs(title = "Interaction plot, c by b")
print(p)

## ymax not defined: adjusting position using y instead
p <- ggplot(df.3.1.factor, aes(x = c, y = y, colour = b, shape = b))
p <- p + geom_hline(aes(yintercept = 0), colour = "black"
, linetype = "solid", size = 0.2, alpha = 0.3)
p <- p + geom_boxplot(alpha = 0.25, outlier.size=0.1)
p <- p + geom_point(alpha = 0.5, position=position_dodge(width=0.75))
p <- p + geom_point(data = df.3.1.factor.mean.bc, aes(y = m), size = 4)
p <- p + geom_line(data = df.3.1.factor.mean.bc, aes(y = m, group = b), size = 1.5)
p <- p + labs(title = "Interaction plot, b by c")
print(p)

3.2 Example 3.2, Table 3.7, p. 97

Here’s a quick way to create a design matrix for a factorial design.
design <- expand.grid("a" = c(-1, 1)
, "b" = c(-1, 1)
, "c" = c(-1, 1)
, "d" = c(-1, 1)
, "y" = NA)
design
    a  b  c  d  y
1  -1 -1 -1 -1 NA
2   1 -1 -1 -1 NA
3  -1  1 -1 -1 NA
4   1  1 -1 -1 NA
5  -1 -1  1 -1 NA
6   1 -1  1 -1 NA
7  -1  1  1 -1 NA
8   1  1  1 -1 NA
9  -1 -1 -1  1 NA
10  1 -1 -1  1 NA
11 -1  1 -1  1 NA
12  1  1 -1  1 NA
13 -1 -1  1  1 NA
14  1 -1  1  1 NA
15 -1  1  1  1 NA
16  1  1  1  1 NA

Read data.

#### 3.2
fn.data <- "http://statacumen.com/teach/RSM/data/RSM_EX_03-02.txt"
df.3.2 <- read.table(fn.data, header=TRUE)

View the structure of your data set

str(df.3.2)
'data.frame':   16 obs. of  5 variables:
 $ a: int  -1 1 -1 1 -1 1 -1 1 -1 1 ...
 $ b: int  -1 -1 1 1 -1 -1 1 1 -1 -1 ...
 $ c: int  -1 -1 -1 -1 1 1 1 1 -1 -1 ...
 $ d: int  -1 -1 -1 -1 -1 -1 -1 -1 1 1 ...
 $ y: int  45 71 48 65 68 60 80 65 43 100 ...

View the first few observations

head(df.3.2,5)
   a  b  c  d  y
1 -1 -1 -1 -1 45
2  1 -1 -1 -1 71
3 -1  1 -1 -1 48
4  1  1 -1 -1 65
5 -1 -1  1 -1 68

Fit a first-order model with interactions up to four-way.

lm.3.2.y.4WIabcd <- lm(y ~ (a + b + c + d)^4, data = df.3.2)
## externally Studentized residuals
lm.3.2.y.4WIabcd$studres <- rstudent(lm.3.2.y.4WIabcd)
summary(lm.3.2.y.4WIabcd)

Call:
lm(formula = y ~ (a + b + c + d)^4, data = df.3.2)

Residuals:
ALL 16 residuals are 0: no residual degrees of freedom!

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  70.0625        NaN     NaN      NaN
a            10.8125        NaN     NaN      NaN
b             1.5625        NaN     NaN      NaN
c             4.9375        NaN     NaN      NaN
d             7.3125        NaN     NaN      NaN
a:b           0.0625        NaN     NaN      NaN
a:c          -9.0625        NaN     NaN      NaN
a:d           8.3125        NaN     NaN      NaN
b:c           1.1875        NaN     NaN      NaN
b:d          -0.1875        NaN     NaN      NaN
c:d          -0.5625        NaN     NaN      NaN
a:b:c         0.9375        NaN     NaN      NaN
a:b:d         2.0625        NaN     NaN      NaN
a:c:d        -0.8125        NaN     NaN      NaN
b:c:d        -1.3125        NaN     NaN      NaN
a:b:c:d       0.6875        NaN     NaN      NaN

Residual standard error: NaN on 0 degrees of freedom
Multiple R-squared:      1, Adjusted R-squared:    NaN 
F-statistic:   NaN on 15 and 0 DF,  p-value: NA

Use the Lenth procedure and normal plots to assess effects in this unreplicated design.

The BsMD package has tests for unreplicated factorial designs (Daniel (aka normal) plots, and the Lenth plot).

library(BsMD)
par(mfrow=c(1,3))
LenthPlot(lm.3.2.y.4WIabcd, alpha = 0.05, main = "Lenth Plot") # , adj = 0.2
    alpha       PSE        ME       SME 
 0.050000  2.625000  6.747777 13.698960 

DanielPlot(lm.3.2.y.4WIabcd, main = "Normal Plot")

DanielPlot(lm.3.2.y.4WIabcd, half = TRUE, main = "Half-Normal Plot")
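
Lenth's PSE can also be computed by hand. A minimal sketch, assuming the +/-1 coding above so that each effect estimate is twice its regression coefficient:

# sketch: Lenth's pseudo standard error (PSE) by hand
eff <- 2 * coef(lm.3.2.y.4WIabcd)[-1] # effects = 2 x coefficients; drop intercept
s0  <- 1.5 * median(abs(eff))         # initial robust scale estimate
pse <- 1.5 * median(abs(eff)[abs(eff) < 2.5 * s0]) # trimmed re-estimate
pse                                   # 2.625, matching the LenthPlot output
# margin of error uses a t quantile with m/3 df, where m = number of effects
qt(0.975, df = length(eff) / 3) * pse # ME = 6.75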

In Example 3.2, none of the terms involving B were significant. So, project the design onto a 2^3 factorial in factors A, C, and D (an exploratory step, not a confirmatory one). We now have two replicates at each (A, C, D) design point, which provides an error term, so we can do ANOVA, etc. We could then run additional confirmatory experiments. Fit a first-order model with interactions up to three-way.

lm.3.2.y.4WIacd <- lm(y ~ (a + c + d)^3, data = df.3.2)

externally Studentized residuals

lm.3.2.y.4WIacd$studres <- rstudent(lm.3.2.y.4WIacd)
summary(lm.3.2.y.4WIacd)

Call:
lm(formula = y ~ (a + c + d)^3, data = df.3.2)

Residuals:
   Min     1Q Median     3Q    Max 
  -6.0   -2.5    0.0    2.5    6.0 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  70.0625     1.1842  59.164 7.40e-12 ***
a            10.8125     1.1842   9.131 1.67e-05 ***
c             4.9375     1.1842   4.169 0.003124 ** 
d             7.3125     1.1842   6.175 0.000267 ***
a:c          -9.0625     1.1842  -7.653 6.00e-05 ***
a:d           8.3125     1.1842   7.019 0.000110 ***
c:d          -0.5625     1.1842  -0.475 0.647483    
a:c:d        -0.8125     1.1842  -0.686 0.512032    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.737 on 8 degrees of freedom
Multiple R-squared:  0.9687,    Adjusted R-squared:  0.9413 
F-statistic: 35.35 on 7 and 8 DF,  p-value: 2.119e-05

3.3 Creating Fractional Factorial designs with blocks and aliasing

The rsm package can generate blocked designs.

library(rsm)
# help for cube, see examples
?cube
# create the first block
block1 <- cube( basis = ~ x1 + x2 + x3 + x4
, n0 = 0
, blockgen = ~ c(x1 * x2 * x3, x1 * x3 * x4)
, randomize = FALSE
, bid = 1)
block1
   run.order std.order x1.as.is x2.as.is x3.as.is x4.as.is
1          1         1       -1       -1       -1       -1
6          2         2        1       -1        1       -1
12         3         3        1        1       -1        1
15         4         4       -1        1        1        1

Data are stored in coded form using these coding formulas ...
x1 ~ x1.as.is
x2 ~ x2.as.is
x3 ~ x3.as.is
x4 ~ x4.as.is

populate a list of all the blocks and print them all

block <- list()
for (i.block in 1:4) {
block[[i.block]] <- cube( basis = ~ x1 + x2 + x3 + x4
, n0 = 0
, blockgen = ~ c(x1 * x2 * x3, x1 * x3 * x4)
, randomize = FALSE
, bid = i.block)
}
block
[[1]]
   run.order std.order x1.as.is x2.as.is x3.as.is x4.as.is
1          1         1       -1       -1       -1       -1
6          2         2        1       -1        1       -1
12         3         3        1        1       -1        1
15         4         4       -1        1        1        1

Data are stored in coded form using these coding formulas ...
x1 ~ x1.as.is
x2 ~ x2.as.is
x3 ~ x3.as.is
x4 ~ x4.as.is

[[2]]
   run.order std.order x1.as.is x2.as.is x3.as.is x4.as.is
3          1         1       -1        1       -1       -1
8          2         2        1        1        1       -1
10         3         3        1       -1       -1        1
13         4         4       -1       -1        1        1

Data are stored in coded form using these coding formulas ...
x1 ~ x1.as.is
x2 ~ x2.as.is
x3 ~ x3.as.is
x4 ~ x4.as.is

[[3]]
   run.order std.order x1.as.is x2.as.is x3.as.is x4.as.is
4          1         1        1        1       -1       -1
7          2         2       -1        1        1       -1
9          3         3       -1       -1       -1        1
14         4         4        1       -1        1        1

Data are stored in coded form using these coding formulas ...
x1 ~ x1.as.is
x2 ~ x2.as.is
x3 ~ x3.as.is
x4 ~ x4.as.is

[[4]]
   run.order std.order x1.as.is x2.as.is x3.as.is x4.as.is
2          1         1        1       -1       -1       -1
5          2         2       -1       -1        1       -1
11         3         3       -1        1       -1        1
16         4         4        1        1        1        1

Data are stored in coded form using these coding formulas ...
x1 ~ x1.as.is
x2 ~ x2.as.is
x3 ~ x3.as.is
x4 ~ x4.as.is

Alternatively, package FrF2 is the most comprehensive R package for creating fractional factorial 2-level designs, as well as Plackett-Burman type screening designs. Regular fractional factorials default to maximum resolution minimum aberration designs and can be customized in various ways. It can also show the alias structure for regular fractional factorials of 2-level factors.

library(FrF2)
k = 4 # number of factors, defaults to names = A, B, ...
FrF2( nruns = 2^k
, nfactors = k
, blocks = 4
, alias.block.2fis=TRUE
, randomize = FALSE
)
  run.no run.no.std.rp Blocks  A  B  C  D
1      1         1.1.1      1 -1 -1 -1 -1
2      2         4.1.2      1 -1 -1  1  1
3      3        14.1.3      1  1  1 -1  1
4      4        15.1.4      1  1  1  1 -1
  run.no run.no.std.rp Blocks  A  B  C  D
5      5         5.2.1      2 -1  1 -1 -1
6      6         8.2.2      2 -1  1  1  1
7      7        10.2.3      2  1 -1 -1  1
8      8        11.2.4      2  1 -1  1 -1
   run.no run.no.std.rp Blocks  A  B  C  D
9       9         6.3.1      3 -1  1 -1  1
10     10         7.3.2      3 -1  1  1 -1
11     11         9.3.3      3  1 -1 -1 -1
12     12        12.3.4      3  1 -1  1  1
   run.no run.no.std.rp Blocks  A  B  C  D
13     13         2.4.1      4 -1 -1 -1  1
14     14         3.4.2      4 -1 -1  1 -1
15     15        13.4.3      4  1  1 -1 -1
16     16        16.4.4      4  1  1  1  1
class=design, type= FrF2.blocked 
NOTE: columns run.no and run.no.std.rp  are annotation, 
 not part of the data frame
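
To see the alias structure of the blocked design above, a sketch assuming the design is saved to an object (element names inside design.info() may vary with the FrF2 version):

des.blocked <- FrF2( nruns = 2^k
, nfactors = k
, blocks = 4
, alias.block.2fis=TRUE
, randomize = FALSE
)
# design.info() holds the design metadata, including which two-factor
# interactions are confounded with block main effects
design.info(des.blocked)$aliased.with.blocks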

Chapter 4

Two-Level Fractional Factorial Designs

4.1 Generate a resolution III design

The original design is a resolution III fractional factorial design. Below, I specify the generators I = ABD = ACE. A, B, and C in the notes are x3, x4, and x5, respectively; E = AC corresponds to x1 = x3*x5 below, and D = AB corresponds to x2 = x1*x4.

#### Generate design
library(rsm)
# help for cube, see examples
?cube
# create the first block
block1 <- cube( basis = ~ x1 + x2 + x3 + x4 + x5
, n0 = 0
, blockgen = ~ c(x1 * x2 * x4, x1 * x3 * x5)
, randomize = FALSE
, bid = 1)
block1
   run.order std.order x1.as.is x2.as.is x3.as.is x4.as.is x5.as.is
1          1         1       -1       -1       -1       -1       -1
8          2         2        1        1        1       -1       -1
11         3         3       -1        1       -1        1       -1
14         4         4        1       -1        1        1       -1
20         5         5        1        1       -1       -1        1
21         6         6       -1       -1        1       -1        1
26         7         7        1       -1       -1        1        1
31         8         8       -1        1        1        1        1

Data are stored in coded form using these coding formulas ...
x1 ~ x1.as.is
x2 ~ x2.as.is
x3 ~ x3.as.is
x4 ~ x4.as.is
x5 ~ x5.as.is

To put this in the same order as in the notes, change the order of the basis.

# create the first block
block1 <- cube( basis = ~ x4 + x5 + x1 + x2 + x3
, n0 = 0
, blockgen = ~ c(x1 * x2 * x4, x1 * x3 * x5)
, randomize = FALSE
, bid = 1)
block1
   run.order std.order x4.as.is x5.as.is x1.as.is x2.as.is x3.as.is
1          1         1       -1       -1       -1       -1       -1
8          2         2        1        1        1       -1       -1
10         3         3        1       -1       -1        1       -1
15         4         4       -1        1        1        1       -1
19         5         5       -1        1       -1       -1        1
22         6         6        1       -1        1       -1        1
28         7         7        1        1       -1        1        1
29         8         8       -1       -1        1        1        1

Data are stored in coded form using these coding formulas ...
x4 ~ x4.as.is
x5 ~ x5.as.is
x1 ~ x1.as.is
x2 ~ x2.as.is
x3 ~ x3.as.is

4.2 Example 4.5, Table 4.15, p. 161

The original design is a resolution III fractional factorial design.

#### 4.5a
fn.data <- "http://statacumen.com/teach/RSM/data/RSM_EX_04-05a.txt"
df.4.5a <- read.table(fn.data, header=TRUE)
str(df.4.5a)
'data.frame':   8 obs. of  4 variables:
 $ a   : int  -1 1 -1 1 -1 1 -1 1
 $ b   : int  -1 -1 1 1 -1 -1 1 1
 $ c   : int  -1 -1 -1 -1 1 1 1 1
 $ time: num  85.5 75.1 93.2 145.4 83.7 ...
head(df.4.5a,5)
   a  b  c  time
1 -1 -1 -1  85.5
2  1 -1 -1  75.1
3 -1  1 -1  93.2
4  1  1 -1 145.4
5 -1 -1  1  83.7

Fit a first-order model with interactions up to three-way.

lm.4.5a.time.3WIabc <- lm(time ~ (a + b + c)^3, data = df.4.5a)
## externally Studentized residuals
#lm.4.5a.time.3WIabc$studres <- rstudent(lm.4.5a.time.3WIabc)
summary(lm.4.5a.time.3WIabc)

Call:
lm.default(formula = time ~ (a + b + c)^3, data = df.4.5a)

Residuals:
ALL 8 residuals are 0: no residual degrees of freedom!

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  99.6625        NaN     NaN      NaN
a            10.3125        NaN     NaN      NaN
b            19.1875        NaN     NaN      NaN
c            -0.1375        NaN     NaN      NaN
a:b          14.4375        NaN     NaN      NaN
a:c          -0.1375        NaN     NaN      NaN
b:c          -0.3125        NaN     NaN      NaN
a:b:c        -1.2125        NaN     NaN      NaN

Residual standard error: NaN on 0 degrees of freedom
Multiple R-squared:      1, Adjusted R-squared:    NaN 
F-statistic:   NaN on 7 and 0 DF,  p-value: NA

The Lenth plot below indicates three large effects worth investigating (a, b, ab).

# BsMD package has unreplicated factorial tests (Daniel plots (aka normal), and Lenth)
library(BsMD)
par(mfrow=c(1,3))
LenthPlot(lm.4.5a.time.3WIabc, alpha = 0.05, main = "Lenth Plot") # , adj = 0.2
   alpha      PSE       ME      SME 
0.050000 0.675000 2.540783 6.080607 
## alpha PSE ME SME
## 0.050 0.675 2.541 6.081
DanielPlot(lm.4.5a.time.3WIabc, main = "Normal Plot")
DanielPlot(lm.4.5a.time.3WIabc, half = TRUE, main = "Half-Normal Plot")

4.3 Example 4.5, Table 4.16, p. 163

Full fold-over design. Note that the columns for a, b, and c are -1 times the same columns in the original design.

#### 4.5b
fn.data <- "http://statacumen.com/teach/RSM/data/RSM_EX_04-05b.txt"
df.4.5b <- read.table(fn.data, header=TRUE)
str(df.4.5b)
'data.frame':   8 obs. of  4 variables:
 $ a   : int  1 -1 1 -1 1 -1 1 -1
 $ b   : int  1 1 -1 -1 1 1 -1 -1
 $ c   : int  1 1 1 1 -1 -1 -1 -1
 $ time: num  91.3 136.7 82.4 73.4 94.1 ...

View the first few observations

head(df.4.5b,5)
   a  b  c  time
1  1  1  1  91.3
2 -1  1  1 136.7
3  1 -1  1  82.4
4 -1 -1  1  73.4
5  1  1 -1  94.1
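
A quick check of the fold-over relationship noted above:

# each factor column in the fold-over is the negative of the original
all(df.4.5b[, c("a","b","c")] == -df.4.5a[, c("a","b","c")])
[1] TRUE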

Fit a first-order model with interactions up to three-way.

lm.4.5b.time.3WIabc <- lm(time ~ (a + b + c)^3, data = df.4.5b)
## externally Studentized residuals
#lm.4.5b.time.3WIabc$studres <- rstudent(lm.4.5b.time.3WIabc)
summary(lm.4.5b.time.3WIabc)

Call:
lm.default(formula = time ~ (a + b + c)^3, data = df.4.5b)

Residuals:
ALL 8 residuals are 0: no residual degrees of freedom!

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  97.6125        NaN     NaN      NaN
a            -8.8375        NaN     NaN      NaN
b            18.8625        NaN     NaN      NaN
c            -1.6625        NaN     NaN      NaN
a:b         -14.9375        NaN     NaN      NaN
a:c          -0.2625        NaN     NaN      NaN
b:c          -0.8125        NaN     NaN      NaN
a:b:c         1.3375        NaN     NaN      NaN

Residual standard error: NaN on 0 degrees of freedom
Multiple R-squared:      1, Adjusted R-squared:    NaN 
F-statistic:   NaN on 7 and 0 DF,  p-value: NA

The Lenth plot below indicates the same three large effects are worth investigating (a, b, ab), but some effects are in different directions.

## BsMD package has unreplicated factorial tests (Daniel plots (aka normal), and Lenth)
library(BsMD)
par(mfrow=c(1,3))
LenthPlot(lm.4.5b.time.3WIabc, alpha = 0.05, main = "Lenth Plot") # , adj = 0.2
   alpha      PSE       ME      SME 
 0.05000  3.22500 12.13930 29.05179 
DanielPlot(lm.4.5b.time.3WIabc, main = "Normal Plot")
DanielPlot(lm.4.5b.time.3WIabc, half = TRUE, main = "Half-Normal Plot")

4.4 Combining the data

Combine the two datasets to see what the full factorial suggests.

# combine two data sets
df.4.5 <- rbind(df.4.5a, df.4.5b)
str(df.4.5)
'data.frame':   16 obs. of  4 variables:
 $ a   : int  -1 1 -1 1 -1 1 -1 1 1 -1 ...
 $ b   : int  -1 -1 1 1 -1 -1 1 1 1 1 ...
 $ c   : int  -1 -1 -1 -1 1 1 1 1 1 1 ...
 $ time: num  85.5 75.1 93.2 145.4 83.7 ...
head(df.4.5,5)
   a  b  c  time
1 -1 -1 -1  85.5
2  1 -1 -1  75.1
3 -1  1 -1  93.2
4  1  1 -1 145.4
5 -1 -1  1  83.7

Fit a first-order model with interactions up to three-way.

lm.4.5.time.3WIabc <- lm(time ~ (a + b + c)^3, data = df.4.5)
## externally Studentized residuals
#lm.4.5.time.3WIabc$studres <- rstudent(lm.4.5.time.3WIabc)
summary(lm.4.5.time.3WIabc)

Call:
lm.default(formula = time ~ (a + b + c)^3, data = df.4.5)

Residuals:
   Min     1Q Median     3Q    Max 
-25.65 -10.31   0.00  10.31  25.65 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  98.6375     6.2325  15.826 2.54e-07 ***
a             0.7375     6.2325   0.118   0.9087    
b            19.0250     6.2325   3.053   0.0158 *  
c            -0.9000     6.2325  -0.144   0.8888    
a:b          -0.2500     6.2325  -0.040   0.9690    
a:c          -0.2000     6.2325  -0.032   0.9752    
b:c          -0.5625     6.2325  -0.090   0.9303    
a:b:c         0.0625     6.2325   0.010   0.9922    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 24.93 on 8 degrees of freedom
Multiple R-squared:  0.5393,    Adjusted R-squared:  0.1361 
F-statistic: 1.338 on 7 and 8 DF,  p-value: 0.344

With more evidence, only effect b is important.

# BsMD package has unreplicated factorial tests (Daniel plots (aka normal), and Lenth)
library(BsMD)
par(mfrow=c(1,3))
LenthPlot(lm.4.5.time.3WIabc, alpha = 0.05, main = "Lenth Plot") # , adj = 0.2
    alpha       PSE        ME       SME 
 0.050000  1.218750  4.587525 10.978874 
DanielPlot(lm.4.5.time.3WIabc, main = "Normal Plot")
DanielPlot(lm.4.5.time.3WIabc, half = TRUE, main = "Half-Normal Plot")

4.5 Plackett-Burman designs

Here are some examples of Plackett-Burman designs.

library(FrF2)
pb(nruns=8, randomize=FALSE)
   A  B  C  D  E  F  G
1  1  1  1 -1  1 -1 -1
2 -1  1  1  1 -1  1 -1
3 -1 -1  1  1  1 -1  1
4  1 -1 -1  1  1  1 -1
5 -1  1 -1 -1  1  1  1
6  1 -1  1 -1 -1  1  1
7  1  1 -1  1 -1 -1  1
8 -1 -1 -1 -1 -1 -1 -1
class=design, type= pb 
pb(nruns=12, randomize=FALSE)
    A  B  C  D  E  F  G  H  J  K  L
1   1  1 -1  1  1  1 -1 -1 -1  1 -1
2  -1  1  1 -1  1  1  1 -1 -1 -1  1
3   1 -1  1  1 -1  1  1  1 -1 -1 -1
4  -1  1 -1  1  1 -1  1  1  1 -1 -1
5  -1 -1  1 -1  1  1 -1  1  1  1 -1
6  -1 -1 -1  1 -1  1  1 -1  1  1  1
7   1 -1 -1 -1  1 -1  1  1 -1  1  1
8   1  1 -1 -1 -1  1 -1  1  1 -1  1
9   1  1  1 -1 -1 -1  1 -1  1  1 -1
10 -1  1  1  1 -1 -1 -1  1 -1  1  1
11  1 -1  1  1  1 -1 -1 -1  1 -1  1
12 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
class=design, type= pb 

Chapter 5

Process Improvement with Steepest Ascent

5.1 Example 5.1, Table 5.1, p. 185

Build a first-order response function. Read data.

#### 5.1
fn.data <- "http://statacumen.com/teach/RSM/data/RSM_EX_05-01.txt"
df.5.1 <- read.table(fn.data, header=TRUE)
str(df.5.1)
'data.frame':   8 obs. of  3 variables:
 $ a: int  -1 1 -1 1 0 0 0 0
 $ b: int  -1 -1 1 1 0 0 0 0
 $ y: int  775 670 890 730 745 760 780 720
head(df.5.1,5)
   a  b   y
1 -1 -1 775
2  1 -1 670
3 -1  1 890
4  1  1 730
5  0  0 745

Fit a first-order model with two-way interaction. This model fit is slightly different from the one in the text: the intercept differs, as does the significance of the factors.

library(rsm)
rsm.5.1.y.TWIab <- rsm(y ~ FO(a, b) + TWI(a, b), data = df.5.1)
# externally Studentized residuals
#rsm.5.1.y.TWIab$studres <- rstudent(rsm.5.1.y.TWIab)
summary(rsm.5.1.y.TWIab)

Call:
rsm(formula = y ~ FO(a, b) + TWI(a, b), data = df.5.1)

            Estimate Std. Error t value  Pr(>|t|)    
(Intercept) 758.7500     8.6037 88.1889 9.911e-08 ***
a           -66.2500    12.1675 -5.4449  0.005525 ** 
b            43.7500    12.1675  3.5957  0.022846 *  
a:b         -13.7500    12.1675 -1.1301  0.321622    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared:  0.9164,    Adjusted R-squared:  0.8537 
F-statistic: 14.62 on 3 and 4 DF,  p-value: 0.01273

Analysis of Variance Table

Response: y
            Df  Sum Sq Mean Sq F value   Pr(>F)
FO(a, b)     2 25212.5 12606.2 21.2876 0.007376
TWI(a, b)    1   756.3   756.3  1.2770 0.321622
Residuals    4  2368.7   592.2                 
Lack of fit  1   450.0   450.0  0.7036 0.463156
Pure error   3  1918.7   639.6                 

Stationary point of response surface:
        a         b 
 3.181818 -4.818182 

Eigenanalysis:
eigen() decomposition
$values
[1]  6.875 -6.875

$vectors
        [,1]       [,2]
a -0.7071068 -0.7071068
b  0.7071068 -0.7071068

Plots indicate the path of steepest ascent is similar to that in Figure 5.3.

par(mfrow=c(1,2))
contour(rsm.5.1.y.TWIab, ~ a + b, image = TRUE)
persp(rsm.5.1.y.TWIab, b ~ a, zlab = "y")

steepest.5.1 <- steepest(rsm.5.1.y.TWIab, dist = seq(0, 7, by = 1))
Path of steepest ascent from ridge analysis:
steepest.5.1
  dist      a     b |     yhat
1    0  0.000 0.000 |  758.750
2    1 -0.805 0.593 |  844.589
3    2 -1.573 1.235 |  943.704
4    3 -2.321 1.901 | 1056.353
5    4 -3.058 2.579 | 1182.614
6    5 -3.787 3.265 | 1322.495
7    6 -4.511 3.956 | 1476.055
8    7 -5.232 4.650 | 1643.329

Redo: in the text, they use a first-order model (equation on p. 185) for their steepest ascent calculation.

library(rsm)
rsm.5.1.y.FOab <- rsm(y ~ FO(a, b), data = df.5.1)
# externally Studentized residuals
#rsm.5.1.y.FOab$studres <- rstudent(rsm.5.1.y.FOab)
summary(rsm.5.1.y.FOab)

Call:
rsm(formula = y ~ FO(a, b), data = df.5.1)

            Estimate Std. Error t value  Pr(>|t|)    
(Intercept) 758.7500     8.8388  85.843 4.066e-09 ***
a           -66.2500    12.5000  -5.300  0.003192 ** 
b            43.7500    12.5000   3.500  0.017284 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared:  0.8897,    Adjusted R-squared:  0.8456 
F-statistic: 20.17 on 2 and 5 DF,  p-value: 0.004039

Analysis of Variance Table

Response: y
            Df  Sum Sq Mean Sq F value   Pr(>F)
FO(a, b)     2 25212.5 12606.2  20.170 0.004039
Residuals    5  3125.0   625.0                 
Lack of fit  2  1206.3   603.1   0.943 0.481119
Pure error   3  1918.7   639.6                 

Direction of steepest ascent (at radius 1):
         a          b 
-0.8344646  0.5510615 

Corresponding increment in original units:
         a          b 
-0.8344646  0.5510615 
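
The direction of steepest ascent is just the first-order coefficient vector scaled to unit length; a minimal sketch using the fit above:

# sketch: steepest ascent direction = gradient of the first-order fit,
# normalized to radius 1
b.vec <- coef(rsm.5.1.y.FOab)[-1] # drop the intercept: (-66.25, 43.75)
b.vec / sqrt(sum(b.vec^2))        # (-0.8345, 0.5511), matching the output above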

Plots indicate the path of steepest ascent is similar to that in Figure 5.3.

par(mfrow=c(1,2))
contour(rsm.5.1.y.FOab, ~ a + b, image = TRUE)
persp(rsm.5.1.y.FOab, b ~ a, zlab = "y")

This result (taking steps so that a changes by one unit each time) matches Table 5.3.

summary(rsm.5.1.y.FOab)$sa
         a          b 
-0.8344646  0.5510615 
steepest.5.1 <- steepest(rsm.5.1.y.FOab, dist = seq(0, 7, by = 1))
Path of steepest ascent from ridge analysis:
steepest.5.1$a[2]
[1] -0.834
steepest.5.1b <- steepest(rsm.5.1.y.FOab, dist = seq(0, 7, by = 1/abs(steepest.5.1$a[2])))
Path of steepest ascent from ridge analysis:
steepest.5.1b
      dist      a     b |     yhat
1 0.000000  0.000 0.000 |  758.750
2 1.199041 -1.001 0.661 |  853.985
3 2.398082 -2.001 1.321 |  949.110
4 3.597122 -3.002 1.982 | 1044.345
5 4.796163 -4.002 2.643 | 1139.514
6 5.995204 -5.003 3.304 | 1234.749

Chapter 6

Analysis of the Second-Order Response Surface

6.1 Example 6.2, Table 6.4, p. 234

Read the data and code the variables. Note that the data are already coded, so I’m including the coding information.

#### 6.2
fn.data <- "http://statacumen.com/teach/RSM/data/RSM_EX_06-02.txt"
df.6.2 <- read.table(fn.data, header=TRUE)
head(df.6.2,5)
  x1 x2 x3  y
1 -1 -1 -1 57
2  1 -1 -1 40
3 -1  1 -1 19
4  1  1 -1 40
5 -1 -1  1 54
# code the variables
library(rsm)
cd.6.2 <- as.coded.data(df.6.2, x1 ~ (SodiumCitrate - 3) / 0.7
, x2 ~ (Glycerol - 8) / 3
, x3 ~ (EquilibrationTime - 16) / 6)
# the data are stored in coded form; str() shows the coded values and codings
str(cd.6.2)
Classes 'coded.data' and 'data.frame':  15 obs. of  4 variables:
 $ x1: int  -1 1 -1 1 -1 1 -1 1 0 -2 ...
 $ x2: int  -1 -1 1 1 -1 -1 1 1 0 0 ...
 $ x3: int  -1 -1 -1 -1 1 1 1 1 0 0 ...
 $ y : int  57 40 19 40 54 41 21 43 63 28 ...
 - attr(*, "codings")=List of 3
  ..$ x1:Class 'formula'  language x1 ~ (SodiumCitrate - 3)/0.7
  .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
  ..$ x2:Class 'formula'  language x2 ~ (Glycerol - 8)/3
  .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
  ..$ x3:Class 'formula'  language x3 ~ (EquilibrationTime - 16)/6
  .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
 - attr(*, "rsdes")=List of 2
  ..$ primary: chr [1:3] "x1" "x2" "x3"
  ..$ call   : language as.coded.data(data = df.6.2, x1 ~ (SodiumCitrate - 3)/0.7, x2 ~ (Glycerol -      8)/3, x3 ~ (EquilibrationTime - 16)/6)
cd.6.2
   SodiumCitrate Glycerol EquilibrationTime  y
1            2.3        5                10 57
2            3.7        5                10 40
3            2.3       11                10 19
4            3.7       11                10 40
5            2.3        5                22 54
6            3.7        5                22 41
7            2.3       11                22 21
8            3.7       11                22 43
9            3.0        8                16 63
10           1.6        8                16 28
11           4.4        8                16 11
12           3.0        2                16  2
13           3.0       14                16 18
14           3.0        8                 4 56
15           3.0        8                28 46

Data are stored in coded form using these coding formulas ...
x1 ~ (SodiumCitrate - 3)/0.7
x2 ~ (Glycerol - 8)/3
x3 ~ (EquilibrationTime - 16)/6
# this prints the coded values
print(cd.6.2, decode = FALSE)
   x1 x2 x3  y
1  -1 -1 -1 57
2   1 -1 -1 40
3  -1  1 -1 19
4   1  1 -1 40
5  -1 -1  1 54
6   1 -1  1 41
7  -1  1  1 21
8   1  1  1 43
9   0  0  0 63
10 -2  0  0 28
11  2  0  0 11
12  0 -2  0  2
13  0  2  0 18
14  0  0 -2 56
15  0  0  2 46

Variable codings ...
x1 ~ (SodiumCitrate - 3)/0.7
x2 ~ (Glycerol - 8)/3
x3 ~ (EquilibrationTime - 16)/6
# note that the coded values (-1, +1) are used in the modelling.
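
The rsm helpers code2val() and val2code() convert between coded and original units; a small sketch using the codings above:

# convert a coded point to original units ...
code2val(data.frame(x1 = 1, x2 = -1, x3 = 0), codings = codings(cd.6.2))
# ... and an original-units point back to coded units
val2code(data.frame(SodiumCitrate = 3.7, Glycerol = 5, EquilibrationTime = 16)
, codings = codings(cd.6.2))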

Fit second-order linear model.

library(rsm)
rsm.6.2.y.SOx1x2x3 <- rsm(y ~ SO(x1, x2, x3), data = cd.6.2)
# externally Studentized residuals
rsm.6.2.y.SOx1x2x3$studres <- rstudent(rsm.6.2.y.SOx1x2x3)
summary(rsm.6.2.y.SOx1x2x3)

Call:
rsm(formula = y ~ SO(x1, x2, x3), data = cd.6.2)

            Estimate Std. Error t value Pr(>|t|)   
(Intercept)  66.1111    11.5215  5.7380 0.002252 **
x1           -1.3125     3.2660 -0.4019 0.704388   
x2           -2.3125     3.2660 -0.7080 0.510551   
x3           -1.0625     3.2660 -0.3253 0.758114   
x1:x2         9.1250     4.6189  1.9756 0.105171   
x1:x3         0.6250     4.6189  0.1353 0.897643   
x2:x3         0.8750     4.6189  0.1894 0.857199   
x1^2        -11.2639     3.9253 -2.8696 0.035013 * 
x2^2        -13.6389     3.9253 -3.4746 0.017761 * 
x3^2         -3.3889     3.9253 -0.8633 0.427411   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared:  0.8216,    Adjusted R-squared:  0.5004 
F-statistic: 2.558 on 9 and 5 DF,  p-value: 0.1567

Analysis of Variance Table

Response: y
                Df  Sum Sq Mean Sq F value  Pr(>F)
FO(x1, x2, x3)   3  131.19   43.73  0.2562 0.85418
TWI(x1, x2, x3)  3  675.38  225.13  1.3190 0.36605
PQ(x1, x2, x3)   3 3123.00 1041.00  6.0994 0.03998
Residuals        5  853.37  170.67                
Lack of fit      5  853.37  170.67     NaN     NaN
Pure error       0    0.00     NaN                

Stationary point of response surface:
        x1         x2         x3 
-0.1157917 -0.1294177 -0.1841474 

Stationary point in original units:
    SodiumCitrate          Glycerol EquilibrationTime 
         2.918946          7.611747         14.895115 

Eigenanalysis:
eigen() decomposition
$values
[1]  -3.327052  -7.796973 -17.167642

$vectors
         [,1]       [,2]        [,3]
x1 0.08493000  0.7870012  0.61107775
x2 0.07971567  0.6059608 -0.79149032
x3 0.99319299 -0.1159337  0.01127208
## summary statistics
summary(rsm.6.2.y.SOx1x2x3)$canonical$xs
        x1         x2         x3 
-0.1157917 -0.1294177 -0.1841474 

The plots below are like those in Figure 6.11. The contour plots indicate that increasing a increases thickness a great deal, c has almost no effect on thickness, and decreasing b increases thickness very little.

par(mfrow = c(2,2))
# this is the stationary point
canonical(rsm.6.2.y.SOx1x2x3)$xs
        x1         x2         x3 
-0.1157917 -0.1294177 -0.1841474 
contour(rsm.6.2.y.SOx1x2x3, ~ x1 + x2, image=TRUE
, at = canonical(rsm.6.2.y.SOx1x2x3)$xs, main = "through stationary point")

contour(rsm.6.2.y.SOx1x2x3, ~ x1 + x2, image=TRUE
, at = data.frame(x1 = 0, x2 = 0, x3 = -1/3), main = "through (0, 0, ET=14)")

contour(rsm.6.2.y.SOx1x2x3, ~ x1 + x2, image=TRUE
, at = data.frame(x1 = 0, x2 = 0, x3 = -2/3), main = "through (0, 0, ET=12)")

contour(rsm.6.2.y.SOx1x2x3, ~ x1 + x2, image=TRUE
, at = data.frame(x1 = 0, x2 = 0, x3 = -1 ), main = "through (0, 0, ET=10)")

### Some additional contour plots
# op <- par(no.readonly = TRUE)
# par(mfrow = c(4,3), oma = c(0,0,0,0), mar = c(4,4,2,0))
# contour(rsm.6.2.y.SOx1x2x3, ~ x1 + x2 + x3, image=TRUE, at = data.frame(x1 = 0, x2 = 0, x3 = ...))
# contour(rsm.6.2.y.SOx1x2x3, ~ x1 + x2 + x3, image=TRUE, at = data.frame(x1 = 0, x2 = 0, x3 = ...))
# contour(rsm.6.2.y.SOx1x2x3, ~ x1 + x2 + x3, image=TRUE, at = data.frame(x1 = 0, x2 = 0, x3 = ...))
# contour(rsm.6.2.y.SOx1x2x3, ~ x1 + x2 + x3, image=TRUE, at = canonical(rsm.6.2.y.SOx1x2x3)$xs)
# par(op)

6.2 Example 6.3, p. 239

Ridge analysis of a saddle point.
Read the data.
#### 6.3
fn.data <- "http://statacumen.com/teach/RSM/data/RSM_EX_06-03.txt"
df.6.3 <- read.table(fn.data, header=TRUE)
str(df.6.3)
'data.frame':   25 obs. of  5 variables:
 $ x1: num  -1 1 -1 1 -1 1 -1 1 -1 1 ...
 $ x2: num  -1 -1 1 1 -1 -1 1 1 -1 -1 ...
 $ x3: num  -1 -1 -1 -1 1 1 1 1 -1 -1 ...
 $ x4: num  -1 -1 -1 -1 -1 -1 -1 -1 1 1 ...
 $ y : num  58.2 23.4 21.9 21.8 14.3 6.3 4.5 21.8 46.7 53.2 ...
head(df.6.3,5)
  x1 x2 x3 x4    y
1 -1 -1 -1 -1 58.2
2  1 -1 -1 -1 23.4
3 -1  1 -1 -1 21.9
4  1  1 -1 -1 21.8
5 -1 -1  1 -1 14.3

Fit second-order linear model.

library(rsm)
rsm.6.3.y.SOx1x2x3x4 <- rsm(y ~ SO(x1, x2, x3, x4), data = df.6.3)
# externally Studentized residuals
rsm.6.3.y.SOx1x2x3x4$studres <- rstudent(rsm.6.3.y.SOx1x2x3x4)
summary(rsm.6.3.y.SOx1x2x3x4)

Call:
rsm(formula = y ~ SO(x1, x2, x3, x4), data = df.6.3)

             Estimate Std. Error t value  Pr(>|t|)    
(Intercept) 40.198215   8.321708  4.8305 0.0006912 ***
x1          -1.511044   3.151968 -0.4794 0.6419677    
x2           1.284137   3.151968  0.4074 0.6922929    
x3          -8.738956   3.151968 -2.7725 0.0197013 *  
x4           4.954819   3.151968  1.5720 0.1470308    
x1:x2        2.193750   3.516952  0.6238 0.5467465    
x1:x3       -0.143750   3.516952 -0.0409 0.9682013    
x1:x4        1.581250   3.516952  0.4496 0.6625813    
x2:x3        8.006250   3.516952  2.2765 0.0460610 *  
x2:x4        2.806250   3.516952  0.7979 0.4434516    
x3:x4        0.293750   3.516952  0.0835 0.9350832    
x1^2        -6.332399   5.035510 -1.2575 0.2371289    
x2^2        -4.291583   5.035510 -0.8523 0.4140130    
x3^2         0.019642   5.035510  0.0039 0.9969645    
x4^2        -2.505869   5.035510 -0.4976 0.6294983    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared:  0.6622,    Adjusted R-squared:  0.1892 
F-statistic:   1.4 on 14 and 10 DF,  p-value: 0.3004

Analysis of Variance Table

Response: y
                    Df Sum Sq Mean Sq F value  Pr(>F)
FO(x1, x2, x3, x4)   4 2088.7  522.16  2.6385 0.09725
TWI(x1, x2, x3, x4)  6 1270.3  211.72  1.0698 0.43976
PQ(x1, x2, x3, x4)   4  519.8  129.95  0.6566 0.63566
Residuals           10 1979.0  197.90                
Lack of fit         10 1979.0  197.90     NaN     NaN
Pure error           0    0.0     NaN                

Stationary point of response surface:
       x1        x2        x3        x4 
0.2646871 1.0336457 0.2905784 1.6679609 

Eigenanalysis:
eigen() decomposition
$values
[1]  2.604001 -2.159312 -6.008325 -7.546573

$vectors
          [,1]       [,2]       [,3]        [,4]
x1 -0.07414003  0.2150982  0.7687739  0.59768112
x2 -0.52824021  0.1373869  0.4567510 -0.70247107
x3 -0.82641775 -0.3071076 -0.2857634  0.37557672
x4 -0.18027545  0.9168111 -0.3445351  0.09085044
# the stationary point:
summary(rsm.6.3.y.SOx1x2x3x4)$canonical$xs
       x1        x2        x3        x4 
0.2646871 1.0336457 0.2905784 1.6679609 
canonical(rsm.6.3.y.SOx1x2x3x4)$xs
       x1        x2        x3        x4 
0.2646871 1.0336457 0.2905784 1.6679609 
summary(rsm.6.3.y.SOx1x2x3x4)$canonical$eigen
eigen() decomposition
$values
[1]  2.604001 -2.159312 -6.008325 -7.546573

$vectors
          [,1]       [,2]       [,3]        [,4]
x1 -0.07414003  0.2150982  0.7687739  0.59768112
x2 -0.52824021  0.1373869  0.4567510 -0.70247107
x3 -0.82641775 -0.3071076 -0.2857634  0.37557672
x4 -0.18027545  0.9168111 -0.3445351  0.09085044
canonical(rsm.6.3.y.SOx1x2x3x4)$eigen
eigen() decomposition
$values
[1]  2.604001 -2.159312 -6.008325 -7.546573

$vectors
          [,1]       [,2]       [,3]        [,4]
x1 -0.07414003  0.2150982  0.7687739  0.59768112
x2 -0.52824021  0.1373869  0.4567510 -0.70247107
x3 -0.82641775 -0.3071076 -0.2857634  0.37557672
x4 -0.18027545  0.9168111 -0.3445351  0.09085044


The stationary point is just outside the region of experimentation for x4. The contour plots below go through the stationary point. The saddle is most apparent in the (x2, x3) plot.

par(mfrow = c(2,3), oma = c(0,0,0,0), mar = c(4,4,2,0))
contour(rsm.6.3.y.SOx1x2x3x4, ~ x1 + x2 + x3 + x4, image=TRUE, at = canonical(rsm.6.3.y.SOx1x2x3x4)$xs, main = "through stationary point")

Starting at the stationary point, calculate the predicted increase along the ridge of steepest ascent. Include the standard error of the prediction, as well.
#### calculate predicted increase along ridge of steepest ascent
steepest.6.3 <- steepest(rsm.6.3.y.SOx1x2x3x4, dist = seq(0, 2, by = 0.1))
Path of steepest ascent from ridge analysis:
## Path of steepest ascent from ridge analysis:
# include SE of response in the table, too
predict.6.3 <- predict(rsm.6.3.y.SOx1x2x3x4
, newdata = steepest.6.3[,c("x1","x2","x3","x4")], se.fit = TRUE)
steepest.6.3$StdError <- predict.6.3$se.fit
# steepest.6.3

par(mfrow = c(1, 2))
# plot expected response vs radius
plot (steepest.6.3$dist, steepest.6.3$yhat, pch = "y"
, main = "Ridge plot: Estimated maximum +- SE vs radius")
points(steepest.6.3$dist, steepest.6.3$yhat, type = "l")
points(steepest.6.3$dist, steepest.6.3$yhat - predict.6.3$se.fit, type = "l", col = "red")
points(steepest.6.3$dist, steepest.6.3$yhat + predict.6.3$se.fit, type = "l", col = "red")
# plot change of factor variables vs radius
plot (steepest.6.3$dist, steepest.6.3$x1, pch = "1", col = "red"
, main = "Ridge plot: Factor values vs radius"
, ylim = c(-2, 0.25))
points(steepest.6.3$dist, steepest.6.3$x1, type = "l", col = "red")
points(steepest.6.3$dist, steepest.6.3$x2, pch = "2", col = "blue")
points(steepest.6.3$dist, steepest.6.3$x2, type = "l", col = "blue")
points(steepest.6.3$dist, steepest.6.3$x3, pch = "3", col = "green")
points(steepest.6.3$dist, steepest.6.3$x3, type = "l", col = "green")
points(steepest.6.3$dist, steepest.6.3$x4, pch = "4", col = "purple")
points(steepest.6.3$dist, steepest.6.3$x4, type = "l", col = "purple")

steepest.6.3
   dist     x1     x2     x3    x4 |   yhat  StdError
1   0.0  0.000  0.000  0.000 0.000 | 40.198  8.321708
2   0.1 -0.013  0.006 -0.087 0.047 | 41.206  8.304672
3   0.2 -0.022  0.001 -0.177 0.090 | 42.193  8.254748
4   0.3 -0.029 -0.014 -0.270 0.127 | 43.177  8.175279
5   0.4 -0.035 -0.038 -0.364 0.158 | 44.162  8.073586
6   0.5 -0.040 -0.069 -0.459 0.181 | 45.156  7.960484
7   0.6 -0.045 -0.104 -0.554 0.199 | 46.171  7.849435
8   0.7 -0.050 -0.144 -0.649 0.212 | 47.218  7.757770
9   0.8 -0.056 -0.187 -0.744 0.220 | 48.302  7.707911
10  0.9 -0.061 -0.232 -0.838 0.225 | 49.420  7.724549
11  1.0 -0.067 -0.279 -0.931 0.226 | 50.572  7.832800
12  1.1 -0.073 -0.327 -1.023 0.225 | 51.764  8.055343
13  1.2 -0.079 -0.377 -1.115 0.222 | 53.012  8.414972
14  1.3 -0.085 -0.426 -1.206 0.217 | 54.292  8.922153
15  1.4 -0.091 -0.477 -1.296 0.211 | 55.621  9.579913
16  1.5 -0.098 -0.528 -1.386 0.203 | 57.002 10.395542
17  1.6 -0.104 -0.579 -1.475 0.194 | 58.421 11.354753
18  1.7 -0.111 -0.630 -1.564 0.185 | 59.895 12.460097
19  1.8 -0.117 -0.682 -1.652 0.174 | 61.411 13.691907
20  1.9 -0.124 -0.734 -1.740 0.163 | 62.983 15.054297
21  2.0 -0.131 -0.786 -1.828 0.151 | 64.608 16.540864
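Because the surface is a saddle, the descending ridge may also be of interest; steepest() supports this through its descent argument (a sketch):

# ridge of steepest descent from the same stationary point
steepest.6.3.down <- steepest(rsm.6.3.y.SOx1x2x3x4
, dist = seq(0, 2, by = 0.1), descent = TRUE)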

6.3 Example from Sec 6.6, Table 6.8, p. 253

Define two functions to obtain maximum or target desirability functions.

## Functions for desirability
## Maximum
f.d.max <- function(y, L, T, r) {
# Computes the desirability function (Derringer and Suich)
# when the objective is to maximize the response
# y = response
# L = lower unacceptability boundary
# T = target acceptability boundary
# r = exponent
# returns d, the desirability
y.L <- min(T, max(L, y)) # clip y to [L, T]
d <- ((y.L - L) / (T - L))^r # desirability function
return(d)
}
## Target
f.d.target <- function(y, L, T, U, r1, r2) {
# Computes the desirability function (Derringer and Suich)
# when the objective is to hit a target value
# y = response
# L = lower unacceptability boundary
# T = target acceptability boundary
# U = upper unacceptability boundary
# r1 = exponent for the L side
# r2 = exponent for the U side
# returns d, the desirability
y.L <- min(T, max(L, y)) # clip y to [L, T]
y.U <- max(T, min(U, y)) # clip y to [T, U]
d <- (((y.L - L) / (T - L))^r1) * (((U - y.U) / (U - T))^r2) # desirability function
return(d)
}
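As a quick spot-check, the functions reproduce two of the desirability values that appear in the tables below:

# y1 = 80.0 with L = 78.5, T = 85: (80 - 78.5)/(85 - 78.5)
f.d.max(80, L = 78.5, T = 85, r = 1) # 0.2307692
# y2 = 66 with L = 62, T = 65, U = 68: (68 - 66)/(68 - 65)
f.d.target(66, L = 62, T = 65, U = 68, r1 = 1, r2 = 1) # 0.6666667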

Read the data.

#### 6.6
fn.data <- "http://statacumen.com/teach/RSM/data/RSM_EX_06-06.txt"
df.6.6 <- read.table(fn.data, header=TRUE)
str(df.6.6)
'data.frame':   13 obs. of  5 variables:
 $ x1: num  -1 1 -1 1 0 ...
 $ x2: num  -1 -1 1 1 0 0 0 0 0 0 ...
 $ y1: num  76.5 77 78 79.5 79.9 80.3 80 79.7 79.8 78.4 ...
 $ y2: int  62 60 66 59 72 69 68 70 71 68 ...
 $ y3: int  2940 3470 3680 3890 3480 3200 3410 3290 3500 3360 ...
head(df.6.6,5)
  x1 x2   y1 y2   y3
1 -1 -1 76.5 62 2940
2  1 -1 77.0 60 3470
3 -1  1 78.0 66 3680
4  1  1 79.5 59 3890
5  0  0 79.9 72 3480


Fit second-order linear models for each response variable. Based on the canonical analysis, y1 has a maximum, y2 has a maximum (one eigenvalue is essentially zero, so the surface is a stationary ridge), and y3 has a saddle point.

# 6.6, y1
library(rsm)
rsm.6.6.y1.SOx1x2 <- rsm(y1 ~ SO(x1, x2), data = df.6.6)
## externally Studentized residuals
#rsm.6.6.y1.SOx1x2$studres <- rstudent(rsm.6.6.y1.SOx1x2)
#summary(rsm.6.6.y1.SOx1x2)
# 6.6, y2
library(rsm)
rsm.6.6.y2.SOx1x2 <- rsm(y2 ~ SO(x1, x2), data = df.6.6)
## externally Studentized residuals
#rsm.6.6.y2.SOx1x2$studres <- rstudent(rsm.6.6.y2.SOx1x2)
#summary(rsm.6.6.y2.SOx1x2)
# 6.6, y3
library(rsm)
rsm.6.6.y3.SOx1x2 <- rsm(y3 ~ SO(x1, x2), data = df.6.6)
## externally Studentized residuals
#rsm.6.6.y3.SOx1x2$studres <- rstudent(rsm.6.6.y3.SOx1x2)
#summary(rsm.6.6.y3.SOx1x2)
canonical(rsm.6.6.y1.SOx1x2)$eigen$values
[1] -0.9634986 -1.4142867
canonical(rsm.6.6.y2.SOx1x2)$eigen$values
[1]  0.000000 -6.753528
canonical(rsm.6.6.y3.SOx1x2)$eigen$values
[1]  72.31441 -55.77167

Optimize the response subject to
\[ y_1 \geq 78.5, \quad 62 \le y_2 \le 68, \quad \text{and} \quad y_3 \le 3400. \]
In particular (from p. 259):
yield: maximize y1: L = 78.5, (T = 85), r = 1
time: target y2: L = 62, T = 65, U = 68, r1 = r2 = 1
temp: minimize y3: (T = 3300), U = 3400, r = 1

6.3.1 Method A

Perform RSA on desirability function D evaluated at the observed responses at the design points.
For the values in our experiment, calculate D.
### Create empty columns to populate with the desirability values
df.6.6$d1 <- NA
df.6.6$d2 <- NA
df.6.6$d3 <- NA
df.6.6$D <- NA
# For each data value, calculate desirability
for (i.x in 1:dim(df.6.6)[1]) {
d1 <- f.d.max (df.6.6$y1[i.x]
, L = 78.5, T = 85, r = 1)
d2 <- f.d.target(df.6.6$y2[i.x]
, L = 62, T = 65, U = 68, r1 = 1, r2 = 1)
d3 <- f.d.max (-df.6.6$y3[i.x]
, L = -3400, T = -3300, r = 1)
# Combined desirability
D <- (d1 * d2 * d3)^(1/3)
df.6.6[i.x, c("d1", "d2", "d3", "D")] <- c(d1, d2, d3, D)
}
df.6.6
       x1     x2   y1 y2   y3        d1        d2  d3 D
1  -1.000 -1.000 76.5 62 2940 0.0000000 0.0000000 1.0 0
2   1.000 -1.000 77.0 60 3470 0.0000000 0.0000000 0.0 0
3  -1.000  1.000 78.0 66 3680 0.0000000 0.6666667 0.0 0
4   1.000  1.000 79.5 59 3890 0.1538462 0.0000000 0.0 0
5   0.000  0.000 79.9 72 3480 0.2153846 0.0000000 0.0 0
6   0.000  0.000 80.3 69 3200 0.2769231 0.0000000 1.0 0
7   0.000  0.000 80.0 68 3410 0.2307692 0.0000000 0.0 0
8   0.000  0.000 79.7 70 3290 0.1846154 0.0000000 1.0 0
9   0.000  0.000 79.8 71 3500 0.2000000 0.0000000 0.0 0
10 -1.414  0.000 78.4 68 3360 0.0000000 0.0000000 0.4 0
11  1.414  0.000 75.6 71 3020 0.0000000 0.0000000 1.0 0
12  0.000 -1.414 78.5 58 3630 0.0000000 0.0000000 0.0 0
13  0.000  1.414 77.0 57 3150 0.0000000 0.0000000 1.0 0
# max among these points
df.6.6[which(df.6.6$D == max(df.6.6$D)),]
       x1     x2   y1 y2   y3        d1        d2  d3 D
1  -1.000 -1.000 76.5 62 2940 0.0000000 0.0000000 1.0 0
2   1.000 -1.000 77.0 60 3470 0.0000000 0.0000000 0.0 0
3  -1.000  1.000 78.0 66 3680 0.0000000 0.6666667 0.0 0
4   1.000  1.000 79.5 59 3890 0.1538462 0.0000000 0.0 0
5   0.000  0.000 79.9 72 3480 0.2153846 0.0000000 0.0 0
6   0.000  0.000 80.3 69 3200 0.2769231 0.0000000 1.0 0
7   0.000  0.000 80.0 68 3410 0.2307692 0.0000000 0.0 0
8   0.000  0.000 79.7 70 3290 0.1846154 0.0000000 1.0 0
9   0.000  0.000 79.8 71 3500 0.2000000 0.0000000 0.0 0
10 -1.414  0.000 78.4 68 3360 0.0000000 0.0000000 0.4 0
11  1.414  0.000 75.6 71 3020 0.0000000 0.0000000 1.0 0
12  0.000 -1.414 78.5 58 3630 0.0000000 0.0000000 0.0 0
13  0.000  1.414 77.0 57 3150 0.0000000 0.0000000 1.0 0
However, there are no overlapping nonzero desirability values among the three responses, so we widen the limits on y1, y2, and y3.
# Create empty columns to populate with the desirability values
df.6.6$d1 <- NA
df.6.6$d2 <- NA
df.6.6$d3 <- NA
df.6.6$D <- NA
# For each data value, calculate desirability
for (i.x in 1:dim(df.6.6)[1]) {
d1 <- f.d.max (df.6.6$y1[i.x]
, L = 70, T = 85, r = 1)
d2 <- f.d.target(df.6.6$y2[i.x]
, L = 58, T = 65, U = 72, r1 = 1, r2 = 1)
d3 <- f.d.max (-df.6.6$y3[i.x]
, L = -3800, T = -3300, r = 1)
# Combined desirability
D <- (d1 * d2 * d3)^(1/3)
df.6.6[i.x, c("d1", "d2", "d3", "D")] <- c(d1, d2, d3, D)
}
df.6.6
       x1     x2   y1 y2   y3        d1        d2   d3         D
1  -1.000 -1.000 76.5 62 2940 0.4333333 0.5714286 1.00 0.6279543
2   1.000 -1.000 77.0 60 3470 0.4666667 0.2857143 0.66 0.4447960
3  -1.000  1.000 78.0 66 3680 0.5333333 0.8571429 0.24 0.4787268
4   1.000  1.000 79.5 59 3890 0.6333333 0.1428571 0.00 0.0000000
5   0.000  0.000 79.9 72 3480 0.6600000 0.0000000 0.64 0.0000000
6   0.000  0.000 80.3 69 3200 0.6866667 0.4285714 1.00 0.6651553
7   0.000  0.000 80.0 68 3410 0.6666667 0.5714286 0.78 0.6673010
8   0.000  0.000 79.7 70 3290 0.6466667 0.2857143 1.00 0.5695574
9   0.000  0.000 79.8 71 3500 0.6533333 0.1428571 0.60 0.3825862
10 -1.414  0.000 78.4 68 3360 0.5600000 0.5714286 0.88 0.6554570
11  1.414  0.000 75.6 71 3020 0.3733333 0.1428571 1.00 0.3764144
12  0.000 -1.414 78.5 58 3630 0.5666667 0.0000000 0.34 0.0000000
13  0.000  1.414 77.0 57 3150 0.4666667 0.0000000 1.00 0.0000000
# max among these points
df.6.6[which(df.6.6$D == max(df.6.6$D)),]
  x1 x2 y1 y2   y3        d1        d2   d3        D
7  0  0 80 68 3410 0.6666667 0.5714286 0.78 0.667301

Now fit a response surface for D to predict the optimal conditions.

# D as response
library(rsm)
rsm.6.6.D.SOx1x2 <- rsm(D ~ SO(x1, x2), data = df.6.6)
# externally Studentized residuals
rsm.6.6.D.SOx1x2$studres <- rstudent(rsm.6.6.D.SOx1x2)
summary(rsm.6.6.D.SOx1x2)

Call:
rsm(formula = D ~ SO(x1, x2), data = df.6.6)

             Estimate Std. Error t value Pr(>|t|)   
(Intercept)  0.456904   0.107092  4.2665 0.003717 **
x1          -0.132076   0.084670 -1.5599 0.162750   
x2          -0.074264   0.084670 -0.8771 0.409523   
x1:x2       -0.073892   0.119733 -0.6171 0.556673   
x1^2         0.062025   0.090811  0.6830 0.516563   
x2^2        -0.196021   0.090811 -2.1586 0.067753 . 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared:  0.5676,    Adjusted R-squared:  0.2587 
F-statistic: 1.838 on 5 and 7 DF,  p-value: 0.2242

Analysis of Variance Table

Response: D
            Df  Sum Sq  Mean Sq F value Pr(>F)
FO(x1, x2)   2 0.18365 0.091823  1.6013 0.2675
TWI(x1, x2)  1 0.02184 0.021840  0.3809 0.5567
PQ(x1, x2)   2 0.32142 0.160710  2.8026 0.1276
Residuals    7 0.40140 0.057344               
Lack of fit  3 0.08679 0.028931  0.3678 0.7812
Pure error   4 0.31461 0.078653               

Stationary point of response surface:
        x1         x2 
 0.8557887 -0.3507283 

Eigenanalysis:
eigen() decomposition
$values
[1]  0.06721042 -0.20120656

$vectors
         [,1]      [,2]
x1 -0.9902933 0.1389935
x2  0.1389935 0.9902933

The lack-of-fit test for this model is not significant (p = 0.7812).

The contour plot for D is below.
par(mfrow = c(1,1))
contour(rsm.6.6.D.SOx1x2, ~ x1 + x2
, bounds = list(x1 = c(-2, 2), x2 = c(-2, 2))
, image=TRUE, main = "D")

Summary: with the original limits, D is zero at every design point. With the widened limits on y1, y2, and y3, the RSA gives a saddle point.

Method A2

A brute-force search for the optimum predicted desirability, D, over the +/- 2-unit cube gives the result below.

### x-values
D.cube <- expand.grid(seq(-2, 2, by=0.1), seq(-2, 2, by=0.1))
colnames(D.cube) <- c("x1", "x2")
# predicted D
D.cube$D <- predict(rsm.6.6.D.SOx1x2, newdata = D.cube)
# predicted optimum
D.opt <- D.cube[which(D.cube$D == max(D.cube$D)),]
D.opt
    x1  x2         D
903 -2 0.2 0.9760195
# predicted response at the optimal D
y1 <- predict(rsm.6.6.y1.SOx1x2, newdata = D.opt)
y2 <- predict(rsm.6.6.y2.SOx1x2, newdata = D.opt)
y3 <- predict(rsm.6.6.y3.SOx1x2, newdata = D.opt)
c(y1, y2, y3)
       903        903        903 
  74.83096   68.71269 3190.54579 

Always check that the predicted responses at the optimum satisfy the specified constraints; note also that this optimum sits on the boundary of the search cube (x1 = -2).
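A programmatic version of that check at the predicted optimum (a sketch, using the original constraints):

# check the original constraints at the predicted optimum
unname(c(y1 >= 78.5, y2 >= 62 & y2 <= 68, y3 <= 3400))
# [1] FALSE FALSE  TRUE -- the widened-limit optimum fails the original y1 and y2 limits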

6.3.2 Method B

Perform RSA on desirability function D evaluated at the predicted responses at the design points. At the center point, these are the desirability values, and the combined desirability, D.

d1 <- f.d.max (predict(rsm.6.6.y1.SOx1x2, newdata = data.frame(x1=0, x2=0))
, L = 78.5, T = 85, r = 1)
d2 <- f.d.target(predict(rsm.6.6.y2.SOx1x2, newdata = data.frame(x1=0, x2=0))
, L = 62, T = 65, U = 68, r1 = 1, r2 = 1)
d3 <- f.d.max (-predict(rsm.6.6.y3.SOx1x2, newdata = data.frame(x1=0, x2=0))
, L = -3400, T = -3300, r = 1)
# Combined desirability
D <- (d1 * d2 * d3)^(1/3)
c(d1, d2, d3, D)
[1] 0.2215315 0.0000000 0.2402477 0.0000000

For the values in our experiment, calculate D.

(This part is to be continued.)
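A minimal sketch of that remaining step is below; it assumes Method B simply mirrors the Method A loop, with predicted responses in place of observed ones:

# sketch: desirability of the predicted responses at each design point
df.6.6$D.B <- NA
for (i.x in 1:dim(df.6.6)[1]) {
newd <- df.6.6[i.x, c("x1", "x2")]
d1 <- f.d.max (predict(rsm.6.6.y1.SOx1x2, newdata = newd), L = 78.5, T = 85, r = 1)
d2 <- f.d.target(predict(rsm.6.6.y2.SOx1x2, newdata = newd), L = 62, T = 65, U = 68, r1 = 1, r2 = 1)
d3 <- f.d.max (-predict(rsm.6.6.y3.SOx1x2, newdata = newd), L = -3400, T = -3300, r = 1)
df.6.6$D.B[i.x] <- (d1 * d2 * d3)^(1/3)
}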

Chapter 7 (Very Important)

Experimental Designs for Fitting Response Surfaces - I

#### 7.6
fn.data <- "http://statacumen.com/teach/RSM/data/RSM_EX_07-06.txt"
df.7.6 <- read.table(fn.data, header=TRUE)
df.7.6$block <- factor(df.7.6$block)
str(df.7.6)
'data.frame':   14 obs. of  4 variables:
 $ x1   : num  -1 -1 1 1 0 0 0 0 0 0 ...
 $ x2   : num  -1 1 -1 1 0 0 0 0 0 0 ...
 $ block: Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 2 2 2 ...
 $ y    : num  80.5 81.5 82 83.5 83.9 84.3 84 79.7 79.8 79.5 ...
head(df.7.6,10)
   x1 x2 block    y
1  -1 -1     1 80.5
2  -1  1     1 81.5
3   1 -1     1 82.0
4   1  1     1 83.5
5   0  0     1 83.9
6   0  0     1 84.3
7   0  0     1 84.0
8   0  0     2 79.7
9   0  0     2 79.8
10  0  0     2 79.5

Fit second-order linear model, without blocks.

library(rsm)
rsm.7.6.y.SOx1x2 <- rsm(y ~ SO(x1, x2), data = df.7.6)
# externally Studentized residuals
rsm.7.6.y.SOx1x2$studres <- rstudent(rsm.7.6.y.SOx1x2)
summary(rsm.7.6.y.SOx1x2)

Call:
rsm(formula = y ~ SO(x1, x2), data = df.7.6)

            Estimate Std. Error t value  Pr(>|t|)    
(Intercept) 81.86621    1.20528 67.9228 2.457e-12 ***
x1           0.93254    1.04388  0.8933    0.3978    
x2           0.57771    1.04388  0.5534    0.5951    
x1:x2        0.12500    1.47616  0.0847    0.9346    
x1^2        -1.30816    1.08667 -1.2038    0.2631    
x2^2        -0.93305    1.08667 -0.8586    0.4155    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared:  0.2826,    Adjusted R-squared:  -0.1658 
F-statistic: 0.6303 on 5 and 8 DF,  p-value: 0.683

Analysis of Variance Table

Response: y
            Df Sum Sq Mean Sq F value Pr(>F)
FO(x1, x2)   2  9.626  4.8128  0.5522 0.5962
TWI(x1, x2)  1  0.063  0.0625  0.0072 0.9346
PQ(x1, x2)   2 17.779  8.8896  1.0199 0.4031
Residuals    8 69.730  8.7162               
Lack of fit  3 40.557 13.5189  2.3170 0.1928
Pure error   5 29.173  5.8347               

Stationary point of response surface:
       x1        x2 
0.3724143 0.3345289 

Eigenanalysis:
eigen() decomposition
$values
[1] -0.922910 -1.318302

$vectors
         [,1]       [,2]
x1 -0.1601375 -0.9870947
x2 -0.9870947  0.1601375

Fit second-order linear model, with blocks.

library(rsm)
rsm.7.6.y.b.SOx1x2 <- rsm(y ~ block + SO(x1, x2), data = df.7.6)
# externally Studentized residuals
rsm.7.6.y.b.SOx1x2$studres <- rstudent(rsm.7.6.y.b.SOx1x2)
summary(rsm.7.6.y.b.SOx1x2)

Call:
rsm(formula = y ~ block + SO(x1, x2), data = df.7.6)

             Estimate Std. Error  t value  Pr(>|t|)    
(Intercept) 84.095427   0.079631 1056.067 < 2.2e-16 ***
block2      -4.457530   0.087226  -51.103 2.877e-10 ***
x1           0.932541   0.057699   16.162 8.444e-07 ***
x2           0.577712   0.057699   10.012 2.122e-05 ***
x1:x2        0.125000   0.081592    1.532    0.1694    
x1^2        -1.308555   0.060064  -21.786 1.083e-07 ***
x2^2        -0.933442   0.060064  -15.541 1.104e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared:  0.9981,    Adjusted R-squared:  0.9964 
F-statistic: 607.2 on 6 and 7 DF,  p-value: 3.811e-09

Analysis of Variance Table

Response: y
            Df Sum Sq Mean Sq   F value    Pr(>F)
block        1 69.531  69.531 2611.0950 2.879e-10
FO(x1, x2)   2  9.626   4.813  180.7341 9.450e-07
TWI(x1, x2)  1  0.063   0.063    2.3470    0.1694
PQ(x1, x2)   2 17.791   8.896  334.0539 1.135e-07
Residuals    7  0.186   0.027                    
Lack of fit  3  0.053   0.018    0.5307    0.6851
Pure error   4  0.133   0.033                    

Stationary point of response surface:
       x1        x2 
0.3722954 0.3343802 

Eigenanalysis:
eigen() decomposition
$values
[1] -0.9233027 -1.3186949

$vectors
         [,1]       [,2]
x1 -0.1601375 -0.9870947
x2 -0.9870947  0.1601375

Fit second-order linear model, with blocks (as continuous and coded).

df.7.6$block.num <- as.numeric(df.7.6$block) - 1.5
head(df.7.6,10)
   x1 x2 block    y block.num
1  -1 -1     1 80.5      -0.5
2  -1  1     1 81.5      -0.5
3   1 -1     1 82.0      -0.5
4   1  1     1 83.5      -0.5
5   0  0     1 83.9      -0.5
6   0  0     1 84.3      -0.5
7   0  0     1 84.0      -0.5
8   0  0     2 79.7       0.5
9   0  0     2 79.8       0.5
10  0  0     2 79.5       0.5
library(rsm)
rsm.7.6.y.bn.SOx1x2 <- rsm(y ~ block.num + SO(x1, x2), data = df.7.6)
# externally Studentized residuals
rsm.7.6.y.bn.SOx1x2$studres <- rstudent(rsm.7.6.y.bn.SOx1x2)
summary(rsm.7.6.y.bn.SOx1x2)

Call:
rsm(formula = y ~ block.num + SO(x1, x2), data = df.7.6)

             Estimate Std. Error  t value  Pr(>|t|)    
(Intercept) 81.866662   0.066620 1228.863 < 2.2e-16 ***
block.num   -4.457530   0.087226  -51.103 2.877e-10 ***
x1           0.932541   0.057699   16.162 8.444e-07 ***
x2           0.577712   0.057699   10.012 2.122e-05 ***
x1:x2        0.125000   0.081592    1.532    0.1694    
x1^2        -1.308555   0.060064  -21.786 1.083e-07 ***
x2^2        -0.933442   0.060064  -15.541 1.104e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared:  0.9981,    Adjusted R-squared:  0.9964 
F-statistic: 607.2 on 6 and 7 DF,  p-value: 3.811e-09

Analysis of Variance Table

Response: y
            Df Sum Sq Mean Sq   F value    Pr(>F)
block.num    1 69.531  69.531 2611.0950 2.879e-10
FO(x1, x2)   2  9.626   4.813  180.7341 9.450e-07
TWI(x1, x2)  1  0.063   0.063    2.3470    0.1694
PQ(x1, x2)   2 17.791   8.896  334.0539 1.135e-07
Residuals    7  0.186   0.027                    
Lack of fit  3  0.053   0.018    0.5307    0.6851
Pure error   4  0.133   0.033                    

Stationary point of response surface:
       x1        x2 
0.3722954 0.3343802 

Eigenanalysis:
eigen() decomposition
$values
[1] -0.9233027 -1.3186949

$vectors
         [,1]       [,2]
x1 -0.1601375 -0.9870947
x2 -0.9870947  0.1601375

Chapter 8

Experimental Designs for Fitting Response Surfaces - II

8.1 Evaluating design efficiency

Here are some ideas for evaluating the efficiency of designs. Let's start with a single replicate of a 2^3 design.

#### 8.1
df.8.1 <- read.table(text="
x1 x2 x3
-1 -1 -1
1 -1 -1
-1 1 -1
1 1 -1
-1 -1 1
1 -1 1
-1 1 1
1 1 1
", header=TRUE)
str(df.8.1)
'data.frame':   8 obs. of  3 variables:
 $ x1: int  -1 1 -1 1 -1 1 -1 1
 $ x2: int  -1 -1 1 1 -1 -1 1 1
 $ x3: int  -1 -1 -1 -1 1 1 1 1

You can use the eval.design() function from the AlgDesign package to evaluate the D- and A-efficiencies of a design. The efficiency depends on the model formula you specify.

library(AlgDesign)
eval.design(~ x1 + x2 + x3, df.8.1)
$determinant
[1] 1

$A
[1] 1

$diagonality
[1] 1

$gmean.variances
[1] 1
# these can be assigned to variables
D.eff.lin <- eval.design(~ x1 + x2 + x3, df.8.1)$determinant
# note that A-efficiency is 1/A given from eval.design()
A.eff.lin <- 1 / eval.design(~ x1 + x2 + x3, df.8.1)$A
c(D.eff.lin, A.eff.lin)
[1] 1 1
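For intuition, the two criteria can be reproduced by hand. Our understanding (an assumption about AlgDesign's normalization, consistent with the numbers above) is that $determinant is |X'X/n|^(1/p) and $A is trace((X'X/n)^-1)/p, with p the number of model terms:

# sketch: reproduce eval.design()'s criteria by hand (assumed normalization)
X <- model.matrix(~ x1 + x2 + x3, df.8.1)
n <- nrow(X); p <- ncol(X)
M <- crossprod(X) / n # moment matrix X'X/n
det(M)^(1/p) # 1, matching $determinant
sum(diag(solve(M))) / p # 1, matching $A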

Now we calculate the same quantities after adding 3 center points.


df.8.1c <- rbind(df.8.1, data.frame(x1 = rep(0,3), x2 = rep(0,3), x3 = rep(0,3)))
df.8.1c
   x1 x2 x3
1  -1 -1 -1
2   1 -1 -1
3  -1  1 -1
4   1  1 -1
5  -1 -1  1
6   1 -1  1
7  -1  1  1
8   1  1  1
9   0  0  0
10  0  0  0
11  0  0  0
library(AlgDesign)
eval.design(~ x1 + x2 + x3, df.8.1c)
$determinant
[1] 0.7875406

$A
[1] 1.28125

$diagonality
[1] 1

$gmean.variances
[1] 1.375
D.eff.lin <- eval.design(~ x1 + x2 + x3, df.8.1c)$determinant
A.eff.lin <- 1 / eval.design(~ x1 + x2 + x3, df.8.1c)$A
c(D.eff.lin, A.eff.lin)
[1] 0.7875406 0.7804878

Augmenting experimental designs

Sometimes it is desirable to augment a design in an optimal way. Suppose it is expected that a linear model will suffice for the experiment, and so the design in df.8.2 is run.

df.8.2 <- read.table(text="
x1 x2
-1 -1
1 -1
-1 1
0 0
0 0
", header=TRUE)
str(df.8.2)
'data.frame':   5 obs. of  2 variables:
 $ x1: int  -1 1 -1 0 0
 $ x2: int  -1 -1 1 0 0

Later, the experimenters realized that a second-order design would be more appropriate. They constructed a sensible list of candidate design points (based on a face-centered CCD). Then they evaluate the axial (star) points at several distances (0.5, 1.0, 1.5, 2.0) and find the D-optimal augmentation at each.

for (i.alpha in c(0.5, 1.0, 1.5, 2.0)) {
# augment design candidate points
D.candidate.points <- i.alpha * data.frame(x1 = c( 1, -1, 0, 0)
, x2 = c( 0, 0, 1, -1))
# combine original points with candidate points
D.all <- rbind(df.8.2, D.candidate.points)
# keep the original points and augment the design with 4 points
library(AlgDesign)
D.aug <- optFederov(~ quad(.), D.all, nTrials = dim(df.8.2)[1] + 4, rows = 1:dim(df.8.2)[1]
, augment = TRUE, criterion = "D", maxIteration = 1e4, nRepeats = 1e2)
## same as:
#D.aug <- optFederov(~ x1 + x2 + x1:x2 + x1^2 + x2^2
# , D.all, nTrials = dim(df.8.2)[1] + 4, rows = 1:dim(df.8.2)[1]
# , augment = TRUE, criterion = "D", maxIteration = 1e4, nRepeats = 1e2)
# Added points
print(D.aug$design[(dim(df.8.2)[1]+1):dim(D.aug$design)[1],])
library(AlgDesign)
# quadratic
#rm(D.eff.lin, A.eff.lin)
D.eff.lin <- eval.design(~ quad(.), D.aug$design)$determinant
A.eff.lin <- 1 / eval.design(~ quad(.), D.aug$design)$A
print(c(i.alpha, D.eff.lin, A.eff.lin))
}
    x1   x2
6  0.5  0.0
7 -0.5  0.0
8  0.0  0.5
9  0.0 -0.5
[1] 0.50000000 0.18812090 0.05286691
  x1 x2
6  1  0
7 -1  0
8  0  1
9  0 -1
[1] 1.0000000 0.3812509 0.2556391
    x1   x2
6  1.5  0.0
7 -1.5  0.0
8  0.0  1.5
9  0.0 -1.5
[1] 1.5000000 0.6583697 0.4208625
  x1 x2
6  2  0
7 -2  0
8  0  2
9  0 -2
[1] 2.0000000 1.1079479 0.6066351

Funding comes through and they don’t have to just add the four axial points, but can afford to add any 12 points they like to the initial run. They look at a range of combinations of points for x1 and x2 and find the D-optimal augmentation.

### augment design candidate points
D.candidate.points <- expand.grid(x1 = seq(-2, 2, by = 0.5)
, x2 = seq(-2, 2, by = 0.5))
str(D.candidate.points)
'data.frame':   81 obs. of  2 variables:
 $ x1: num  -2 -1.5 -1 -0.5 0 0.5 1 1.5 2 -2 ...
 $ x2: num  -2 -2 -2 -2 -2 -2 -2 -2 -2 -1.5 ...
 - attr(*, "out.attrs")=List of 2
  ..$ dim     : Named int [1:2] 9 9
  .. ..- attr(*, "names")= chr [1:2] "x1" "x2"
  ..$ dimnames:List of 2
  .. ..$ x1: chr [1:9] "x1=-2.0" "x1=-1.5" "x1=-1.0" "x1=-0.5" ...
  .. ..$ x2: chr [1:9] "x2=-2.0" "x2=-1.5" "x2=-1.0" "x2=-0.5" ...
head(D.candidate.points)
    x1 x2
1 -2.0 -2
2 -1.5 -2
3 -1.0 -2
4 -0.5 -2
5  0.0 -2
6  0.5 -2
tail(D.candidate.points)
     x1 x2
76 -0.5  2
77  0.0  2
78  0.5  2
79  1.0  2
80  1.5  2
81  2.0  2
# combine original points with candidate points
D.all <- rbind(df.8.2, D.candidate.points)
# keep the original points and augment the design with 12 points
library(AlgDesign)
D.aug <- optFederov(~ quad(.), D.all, nTrials = dim(df.8.2)[1] + 12, rows = 1:dim(df.8.2)[1]
, augment = TRUE, criterion = "D", maxIteration = 1e4, nRepeats = 1e2)
# Added points
print(D.aug$design[(dim(df.8.2)[1]+1):dim(D.aug$design)[1],])
     x1   x2
6  -2.0 -2.0
7  -1.5 -2.0
10  0.0 -2.0
14  2.0 -2.0
23  2.0 -1.5
33 -2.0 -0.5
59  2.0  0.5
69 -2.0  1.5
78 -2.0  2.0
82  0.0  2.0
85  1.5  2.0
86  2.0  2.0
library(AlgDesign)
# quadratic
#rm(D.eff.lin, A.eff.lin)
D.eff.lin <- eval.design(~ quad(.), D.aug$design)$determinant
A.eff.lin <- 1 / eval.design(~ quad(.), D.aug$design)$A
print(c(i.alpha, D.eff.lin, A.eff.lin)) # note: i.alpha (= 2.0) is left over from the earlier loop
[1] 2.000000 2.504373 1.017305
### full design
library(plyr)
D.aug.ordered <- arrange(D.aug$design, x1, x2)
D.aug.ordered
     x1   x2
1  -2.0 -2.0
2  -2.0 -0.5
3  -2.0  1.5
4  -2.0  2.0
5  -1.5 -2.0
6  -1.0 -1.0
7  -1.0  1.0
8   0.0 -2.0
9   0.0  0.0
10  0.0  0.0
11  0.0  2.0
12  1.0 -1.0
13  1.5  2.0
14  2.0 -2.0
15  2.0 -1.5
16  2.0  0.5
17  2.0  2.0

Example 8.10, p. 390

The AlgDesign package can be used to create computer-generated designs, providing a list of permitted/suggested design points.

\[ \text{D-optimal } 3 \times 2 \times 2 \text{ design for } y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_{11} x_1^2 + \eta \]

#### 8.10
# design candidate values 3 * 2 * 2
D.candidate.points <- expand.grid(x1 = seq(-1, 1, by = 1)
, x2 = c(-1, 1)
, x3 = c(-1, 1))
D.candidate.points
   x1 x2 x3
1  -1 -1 -1
2   0 -1 -1
3   1 -1 -1
4  -1  1 -1
5   0  1 -1
6   1  1 -1
7  -1 -1  1
8   0 -1  1
9   1 -1  1
10 -1  1  1
11  0  1  1
12  1  1  1
# choose the 12 points based on D-criterion
library(AlgDesign)
D.gen <- optFederov(~ x1 + x2 + x3 + x1^2, D.candidate.points, nTrials = 12
, criterion = "D", evaluateI = TRUE, maxIteration = 1e4, nRepeats = 1e2)
D.gen
$D
[1] 0.903602

$A
[1] 1.125

$I
[1] 4

$Ge
[1] 0.889

$Dea
[1] 0.882

$design
   x1 x2 x3
1  -1 -1 -1
2   0 -1 -1
3   1 -1 -1
4  -1  1 -1
5   0  1 -1
6   1  1 -1
7  -1 -1  1
8   0 -1  1
9   1 -1  1
10 -1  1  1
11  0  1  1
12  1  1  1

$rows
 [1]  1  2  3  4  5  6  7  8  9 10 11 12
A.eff <- 1/D.gen$A # A efficiency is the reciprocal of $A
# easier to read when ordered
library(plyr)
D.gen.ordered <- arrange(D.gen$design, x1, x2, x3)
D.gen.ordered
   x1 x2 x3
1  -1 -1 -1
2  -1 -1  1
3  -1  1 -1
4  -1  1  1
5   0 -1 -1
6   0 -1  1
7   0  1 -1
8   0  1  1
9   1 -1 -1
10  1 -1  1
11  1  1 -1
12  1  1  1

\[ \text{D-optimal } 5 \times 2 \times 2 \text{ design for } y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_{11} x_1^2 + \eta \]

#### 8.10
# design candidate values 5 * 2 * 2
D.candidate.points <- expand.grid(x1 = seq(-1, 1, by = 0.5)
, x2 = c(-1, 1)
, x3 = c(-1, 1))
D.candidate.points
     x1 x2 x3
1  -1.0 -1 -1
2  -0.5 -1 -1
3   0.0 -1 -1
4   0.5 -1 -1
5   1.0 -1 -1
6  -1.0  1 -1
7  -0.5  1 -1
8   0.0  1 -1
9   0.5  1 -1
10  1.0  1 -1
11 -1.0 -1  1
12 -0.5 -1  1
13  0.0 -1  1
14  0.5 -1  1
15  1.0 -1  1
16 -1.0  1  1
17 -0.5  1  1
18  0.0  1  1
19  0.5  1  1
20  1.0  1  1
# choose the 12 points based on D-criterion
library(AlgDesign)
D.gen <- optFederov(~ x1 + x2 + x3 + x1^2, D.candidate.points, nTrials = 12
, criterion = "D", evaluateI = TRUE, maxIteration = 1e4, nRepeats = 1e2)
D.gen
$D
[1] 0.9306049

$A
[1] 1.083333

$I
[1] 3.666667

$Ge
[1] 0.923

$Dea
[1] 0.92

$design
     x1 x2 x3
1  -1.0 -1 -1
2  -0.5 -1 -1
5   1.0 -1 -1
6  -1.0  1 -1
9   0.5  1 -1
10  1.0  1 -1
11 -1.0 -1  1
14  0.5 -1  1
15  1.0 -1  1
16 -1.0  1  1
17 -0.5  1  1
20  1.0  1  1

$rows
 [1]  1  2  5  6  9 10 11 14 15 16 17 20
A.eff <- 1/D.gen$A # A efficiency is the reciprocal of $A
# easier to read when ordered
library(plyr)
D.gen.ordered <- arrange(D.gen$design, x1, x2, x3)
D.gen.ordered
     x1 x2 x3
1  -1.0 -1 -1
2  -1.0 -1  1
3  -1.0  1 -1
4  -1.0  1  1
5  -0.5 -1 -1
6  -0.5  1  1
7   0.5 -1  1
8   0.5  1 -1
9   1.0 -1 -1
10  1.0 -1  1
11  1.0  1 -1
12  1.0  1  1

\[ \text{I-optimal } 5 \times 2 \times 2 \text{ design for } y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_{11} x_1^2 + \eta \]

#### 8.10
# design candidate values 5 * 2 * 2
D.candidate.points <- expand.grid(x1 = seq(-1, 1, by = 0.5)
, x2 = c(-1, 1)
, x3 = c(-1, 1))
D.candidate.points
     x1 x2 x3
1  -1.0 -1 -1
2  -0.5 -1 -1
3   0.0 -1 -1
4   0.5 -1 -1
5   1.0 -1 -1
6  -1.0  1 -1
7  -0.5  1 -1
8   0.0  1 -1
9   0.5  1 -1
10  1.0  1 -1
11 -1.0 -1  1
12 -0.5 -1  1
13  0.0 -1  1
14  0.5 -1  1
15  1.0 -1  1
16 -1.0  1  1
17 -0.5  1  1
18  0.0  1  1
19  0.5  1  1
20  1.0  1  1
# choose the 12 points based on I-criterion
library(AlgDesign)
D.gen <- optFederov(~ x1 + x2 + x3 + x1^2, D.candidate.points, nTrials = 12
, criterion = "I", evaluateI = TRUE, maxIteration = 1e4, nRepeats = 1e2)
D.gen
$D
[1] 0.9306049

$A
[1] 1.083333

$I
[1] 3.666667

$Ge
[1] 0.923

$Dea
[1] 0.92

$design
     x1 x2 x3
1  -1.0 -1 -1
4   0.5 -1 -1
5   1.0 -1 -1
6  -1.0  1 -1
7  -0.5  1 -1
10  1.0  1 -1
11 -1.0 -1  1
12 -0.5 -1  1
15  1.0 -1  1
16 -1.0  1  1
19  0.5  1  1
20  1.0  1  1

$rows
 [1]  1  4  5  6  7 10 11 12 15 16 19 20
A.eff <- 1/D.gen$A # A efficiency is the reciprocal of $A
# easier to read when ordered
library(plyr)
D.gen.ordered <- arrange(D.gen$design, x1, x2, x3)
D.gen.ordered
     x1 x2 x3
1  -1.0 -1 -1
2  -1.0 -1  1
3  -1.0  1 -1
4  -1.0  1  1
5  -0.5 -1  1
6  -0.5  1 -1
7   0.5 -1 -1
8   0.5  1  1
9   1.0 -1 -1
10  1.0 -1  1
11  1.0  1 -1
12  1.0  1  1

\[ \text{A-optimal } 5 \times 2 \times 2 \text{ design for } y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_{11} x_1^2 + \eta \]

#### 8.10
# design candidate values
D.candidate.points <- expand.grid(x1 = seq(-1, 1, by = 0.5)
, x2 = c(-1, 1)
, x3 = c(-1, 1))
#D.candidate.points
# choose the 12 points based on A-criterion
library(AlgDesign)
D.gen <- optFederov(~ x1 + x2 + x3 + x1^2, D.candidate.points, nTrials = 12
, criterion = "A", evaluateI = TRUE, maxIteration = 1e4, nRepeats = 1e2)
D.gen
$D
[1] 0.9306049

$A
[1] 1.083333

$I
[1] 3.666667

$Ge
[1] 0.923

$Dea
[1] 0.92

$design
     x1 x2 x3
1  -1.0 -1 -1
2  -0.5 -1 -1
5   1.0 -1 -1
6  -1.0  1 -1
9   0.5  1 -1
10  1.0  1 -1
11 -1.0 -1  1
14  0.5 -1  1
15  1.0 -1  1
16 -1.0  1  1
17 -0.5  1  1
20  1.0  1  1

$rows
 [1]  1  2  5  6  9 10 11 14 15 16 17 20
A.eff <- 1/D.gen$A # A efficiency is the reciprocal of $A
# easier to read when ordered
library(plyr)
D.gen.ordered <- arrange(D.gen$design, x1, x2, x3)
D.gen.ordered
     x1 x2 x3
1  -1.0 -1 -1
2  -1.0 -1  1
3  -1.0  1 -1
4  -1.0  1  1
5  -0.5 -1 -1
6  -0.5  1  1
7   0.5 -1  1
8   0.5  1 -1
9   1.0 -1 -1
10  1.0 -1  1
11  1.0  1 -1
12  1.0  1  1

Note: all three criteria choose essentially the same design here; the I-optimal run set differs from the D- and A-optimal sets only by a sign flip in x3 for the x1 = +/- 0.5 rows.

The function varfcn() computes the scaled variance function for a design, based on a specified model, and plots it either as a function of the radius in a specified direction or as a contour plot.

library(rsm)
par(mfrow = c(2,2))
varfcn(D.gen.ordered, ~ x1 + x2 + x3 + x1^2)
# the contour plots will plot the first two variables in the formula
varfcn(D.gen.ordered, ~ x1 + x2 + x3 + x1^2, contour = TRUE)
varfcn(D.gen.ordered, ~ x2 + x3 + x1 + x1^2, contour = TRUE)
varfcn(D.gen.ordered, ~ x1 + x3 + x2 + x1^2, contour = TRUE)

Chapter 9 Advanced Topics in Response Surface Methodology

Not covered.

Chapter 10 Robust Parameter Design and Process Robustness Studies

10.1 Example 10.5, p. 511

10.1.1 Model mean and ln of variance separately

Read data and calculate mean, variance, log-variance, and SNR.
#### 10.5
fn.data <- "http://statacumen.com/teach/RSM/data/RSM_EX_10-05.txt"
df.10.5 <- read.table(fn.data, header=TRUE)
df.10.5
  x1 x2      y1      y2      y3      y4
1 -1 -1 33.5021 41.2268 25.2683 31.9930
2 -1  0 35.8234 38.0689 32.7928 34.0383
3 -1  1 33.0773 31.8435 36.2500 34.0162
4  0 -1 30.4481 41.2870 15.1493 23.9883
5  0  0 34.8679 40.2276 27.7724 31.1321
6  0  1 35.2202 37.1008 33.3280 35.2085
7  1 -1 21.1553 34.1086  0.7917 15.7450
8  1  0 27.6736 38.1477 15.5132 25.9873
9  1  1 32.1245 38.1193 26.1673 32.1622
# calculate some summary statistics, especially SNR
df.10.5a <- df.10.5
df.10.5a$m <- apply(df.10.5a[,c("y1","y2","y3","y4")], 1, mean)
df.10.5a$s <- apply(df.10.5a[,c("y1","y2","y3","y4")], 1, sd)
df.10.5a$s2 <- apply(df.10.5a[,c("y1","y2","y3","y4")], 1, var)
df.10.5a$logs2 <- log(df.10.5a$s2)
df.10.5a$SNR <- apply(df.10.5a[,c("y1","y2","y3","y4")], 1
, function(x) {
-10 * log10( sum(1 / x^2) / length(x) )
})
# average SNR over each condition, independently
library(plyr)
df.10.5a.SNR <- rbind(
ddply(df.10.5a, .(x1), function(.X) {
data.frame(
var = "x1"
, level = .X$x1[1]
, m.SNR = mean(.X$SNR)
, s.SNR = sd(.X$SNR)
, min.SNR = min(.X$SNR)
, max.SNR = max(.X$SNR)
)
})[,-1]
,
ddply(df.10.5a, .(x2), function(.X) {
data.frame(
var = "x2"
, level = .X$x2[1]
, m.SNR = mean(.X$SNR)
, s.SNR = sd(.X$SNR)
, min.SNR = min(.X$SNR)
, max.SNR = max(.X$SNR)
)
})[,-1]
)
df.10.5a.SNR
  var level    m.SNR      s.SNR   min.SNR  max.SNR
1  x1    -1 30.46985  0.4599576 29.975634 30.88540
2  x1     0 29.43202  2.0275222 27.121776 30.91566
3  x1     1 20.35925 14.2560878  3.972453 29.90935
4  x2    -1 20.35662 14.2606748  3.972453 29.97563
5  x2     0 29.44667  1.9741986 27.195963 30.88540
6  x2     1 30.45784  0.5092463 29.909346 30.91566
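The SNR computed above is Taguchi's larger-the-better signal-to-noise ratio, matching the apply() call:

\[ SNR = -10 \log_{10}\left( \frac{1}{n} \sum_{i=1}^{n} \frac{1}{y_i^2} \right) \]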

First, model the mean and the log of the variance separately, as suggested on p. 519 of the text. Fit second-order linear models for \(\bar{y}\) and \(\log(S^2)\).

library(rsm)
rsm.10.5a.m.SOx1x2 <- rsm(m ~ SO(x1, x2), data = df.10.5a)
# externally Studentized residuals
rsm.10.5a.m.SOx1x2$studres <- rstudent(rsm.10.5a.m.SOx1x2)
summary(rsm.10.5a.m.SOx1x2)

Call:
rsm(formula = m ~ SO(x1, x2), data = df.10.5a)

             Estimate Std. Error  t value  Pr(>|t|)    
(Intercept) 33.388881   0.071727  465.498 2.186e-08 ***
x1          -4.175204   0.039287 -106.275 1.837e-06 ***
x2           3.748096   0.039287   95.404 2.539e-06 ***
x1:x2        3.348494   0.048116   69.592 6.538e-06 ***
x1^2        -2.327671   0.068046  -34.207 5.493e-05 ***
x2^2        -1.867046   0.068046  -27.438 0.0001063 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared:  0.9999,    Adjusted R-squared:  0.9997 
F-statistic:  5432 on 5 and 3 DF,  p-value: 3.94e-06

Analysis of Variance Table

Response: m
            Df  Sum Sq Mean Sq  F value    Pr(>F)
FO(x1, x2)   2 188.883  94.442 10198.17 1.783e-06
TWI(x1, x2)  1  44.850  44.850  4843.03 6.538e-06
PQ(x1, x2)   2  17.808   8.904   961.48 6.148e-05
Residuals    3   0.028   0.009                   
Lack of fit  3   0.028   0.009      NaN       NaN
Pure error   0   0.000     NaN                   

Stationary point of response surface:
        x1         x2 
-0.4926412  0.5619813 

Eigenanalysis:
eigen() decomposition
$values
[1] -0.4073446 -3.7873721

$vectors
         [,1]       [,2]
x1 -0.6571611 -0.7537501
x2 -0.7537501  0.6571611
library(rsm)
rsm.10.5a.logs2.SOx1x2 <- rsm(logs2 ~ SO(x1, x2), data = df.10.5a)
# externally Studentized residuals
rsm.10.5a.logs2.SOx1x2$studres <- rstudent(rsm.10.5a.logs2.SOx1x2)
summary(rsm.10.5a.logs2.SOx1x2)

Call:
rsm(formula = logs2 ~ SO(x1, x2), data = df.10.5a)

             Estimate Std. Error t value Pr(>|t|)  
(Intercept)  2.986329   0.541521  5.5147  0.01174 *
x1           1.034956   0.296603  3.4894  0.03979 *
x2          -1.421199   0.296603 -4.7916  0.01729 *
x1:x2        0.109458   0.363263  0.3013  0.78285  
x1^2         0.251571   0.513732  0.4897  0.65792  
x2^2         0.026179   0.513732  0.0510  0.96256  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared:  0.922, Adjusted R-squared:  0.792 
F-statistic: 7.094 on 5 and 3 DF,  p-value: 0.06883

Analysis of Variance Table

Response: logs2
            Df  Sum Sq Mean Sq F value  Pr(>F)
FO(x1, x2)   2 18.5457  9.2728 17.5674 0.02206
TWI(x1, x2)  1  0.0479  0.0479  0.0908 0.78285
PQ(x1, x2)   2  0.1279  0.0640  0.1212 0.88999
Residuals    3  1.5835  0.5278                
Lack of fit  3  1.5835  0.5278     NaN     NaN
Pure error   0  0.0000     NaN                

Stationary point of response surface:
       x1        x2 
-14.60436  57.67617 

Eigenanalysis:
eigen() decomposition
$values
[1] 0.26415693 0.01359235

$vectors
         [,1]       [,2]
x1 -0.9745607  0.2241237
x2 -0.2241237 -0.9745607
par(mfrow = c(1,2))
# this is the stationary point
canonical(rsm.10.5a.m.SOx1x2)$xs
        x1         x2 
-0.4926412  0.5619813 
contour(rsm.10.5a.m.SOx1x2, ~ x1 + x2, image=TRUE
, at = canonical(rsm.10.5a.m.SOx1x2)$xs, main = "mean")

# the stationary point per canonical(); it differs from the summary() value above,
# apparently because canonical() thresholds the near-zero eigenvalue (0.0136)
canonical(rsm.10.5a.logs2.SOx1x2)$xs
        x1         x2 
-1.2730074 -0.2927588 
contour(rsm.10.5a.logs2.SOx1x2, ~ x1 + x2, image=TRUE
, at = canonical(rsm.10.5a.logs2.SOx1x2)$xs, main = "log(s^2)")

10.1.2 Model mean and ln of variance as on p. 512 (eq. 10.20)

Reshape data and create z1 and z2 variables.

#### 10.5b
fn.data <- "http://statacumen.com/teach/RSM/data/RSM_EX_10-05.txt"
df.10.5b <- read.table(fn.data, header=TRUE)
df.10.5b
  x1 x2      y1      y2      y3      y4
1 -1 -1 33.5021 41.2268 25.2683 31.9930
2 -1  0 35.8234 38.0689 32.7928 34.0383
3 -1  1 33.0773 31.8435 36.2500 34.0162
4  0 -1 30.4481 41.2870 15.1493 23.9883
5  0  0 34.8679 40.2276 27.7724 31.1321
6  0  1 35.2202 37.1008 33.3280 35.2085
7  1 -1 21.1553 34.1086  0.7917 15.7450
8  1  0 27.6736 38.1477 15.5132 25.9873
9  1  1 32.1245 38.1193 26.1673 32.1622
library(reshape2)
df.10.5b.long <- melt(df.10.5b, id.vars = c("x1", "x2"), variable.name = "rep", value.name = "y")
# create z1 and z2 variables
df.10.5b.long$z1 <- NA
df.10.5b.long$z2 <- NA
df.10.5b.long[(df.10.5b.long$rep == "y1"), c("z1", "z2")] <- data.frame(z1 = -1, z2 = -1)
df.10.5b.long[(df.10.5b.long$rep == "y2"), c("z1", "z2")] <- data.frame(z1 = -1, z2 = 1)
df.10.5b.long[(df.10.5b.long$rep == "y3"), c("z1", "z2")] <- data.frame(z1 = 1, z2 = -1)
df.10.5b.long[(df.10.5b.long$rep == "y4"), c("z1", "z2")] <- data.frame(z1 = 1, z2 = 1)
df.10.5b.long
   x1 x2 rep       y z1 z2
1  -1 -1  y1 33.5021 -1 -1
2  -1  0  y1 35.8234 -1 -1
3  -1  1  y1 33.0773 -1 -1
4   0 -1  y1 30.4481 -1 -1
5   0  0  y1 34.8679 -1 -1
6   0  1  y1 35.2202 -1 -1
7   1 -1  y1 21.1553 -1 -1
8   1  0  y1 27.6736 -1 -1
9   1  1  y1 32.1245 -1 -1
10 -1 -1  y2 41.2268 -1  1
11 -1  0  y2 38.0689 -1  1
12 -1  1  y2 31.8435 -1  1
13  0 -1  y2 41.2870 -1  1
14  0  0  y2 40.2276 -1  1
15  0  1  y2 37.1008 -1  1
16  1 -1  y2 34.1086 -1  1
17  1  0  y2 38.1477 -1  1
18  1  1  y2 38.1193 -1  1
19 -1 -1  y3 25.2683  1 -1
20 -1  0  y3 32.7928  1 -1
21 -1  1  y3 36.2500  1 -1
22  0 -1  y3 15.1493  1 -1
23  0  0  y3 27.7724  1 -1
24  0  1  y3 33.3280  1 -1
25  1 -1  y3  0.7917  1 -1
26  1  0  y3 15.5132  1 -1
27  1  1  y3 26.1673  1 -1
28 -1 -1  y4 31.9930  1  1
29 -1  0  y4 34.0383  1  1
30 -1  1  y4 34.0162  1  1
31  0 -1  y4 23.9883  1  1
32  0  0  y4 31.1321  1  1
33  0  1  y4 35.2085  1  1
34  1 -1  y4 15.7450  1  1
35  1  0  y4 25.9873  1  1
36  1  1  y4 32.1622  1  1

Model response using Equation 10.20 on p. 512.
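Until that part is written, here is a minimal sketch of such a fit; the model form (second order in the control variables x1, x2; first order in the noise variables z1, z2; plus control-by-noise interactions) is our assumption about Eq. 10.20, and it relies on the response column being named y in the reshaped data:

# sketch: combined control-by-noise model (assumed form of Eq. 10.20)
lm.10.5b <- lm(y ~ x1 + x2 + I(x1^2) + I(x2^2) + x1:x2
+ z1 + z2 + x1:z1 + x1:z2 + x2:z1 + x2:z2
, data = df.10.5b.long)
summary(lm.10.5b)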

(To be continued.)