Response surface project for Coursera Experimentation for Improvement.

PDF

Experiment Generator - see 8/29/14 email for my sign in

Score modifier: \[5.0 \times \dfrac{\text{Your optimum} - \text{Baseline}}{\text{True optimum} - \text{Baseline}} - 0.25N + 3.0\] where \(N\) is your number of experiments.
There is error in the response variable in the order of \(\pm 25\) $/kg; please take this into account.
For the same levels of A, B and C, the simulation will return different results for different groups.
You must wait 3 hours between each experimental condition; please plan your time accordingly to make sure you finish on time.

Factors

A = temperature during the batch reaction, Kelvin; continuous; 390-480K
B = duration of the batch in minutes; continuous; 20-50 minutes
C = choice of solvent: hexane or xylene; discrete; two levels

Experiments Performed

Plot of experiments.

Experiments

Solution.

Solution

library(gdata, quietly=TRUE)
## Warning: package 'gdata' was built under R version 3.0.3
## gdata: read.xls support for 'XLS' (Excel 97-2004) files ENABLED.
## 
## gdata: read.xls support for 'XLSX' (Excel 2007+) files ENABLED.
## 
## Attaching package: 'gdata'
## 
## The following object is masked from 'package:stats':
## 
##     nobs
## 
## The following object is masked from 'package:utils':
## 
##     object.size
data <- read.xls("~/Education/Coursera - Experimentation for Improvement/R/RSM_project/RSM_project.xls")

Experiment spreadsheet built up by working through the steps in this document.

library(knitr)
## Warning: package 'knitr' was built under R version 3.0.3
# http://blogs.uoregon.edu/rclub/2014/05/15/nice-tables-from-r-data-frames/
kable(data)
Run. Exp Rand A B C Acen Bcen Arng Brng xA xB xC Gen ypred y Type Notes
0 1 NA 439 43.0 Xylene 426 39 26 8 1 1.000 1 1.00 NA 515 Factorial Baseline
3 2 3 413 35.0 Xylene 426 39 26 8 -1 -1.000 1 1.00 NA 479 Factorial
1 3 1 439 35.0 Hexane 426 39 26 8 1 -1.000 -1 -1.00 NA 618 Factorial
2 4 2 413 43.0 Hexane 426 39 26 8 -1 1.000 -1 -1.00 NA 25 Factorial
4 5 1 439 27.0 Xylene 452 31 26 8 -1 -1.000 1 1.00 1072 1026 Factorial
5 6 2 465 27.0 Hexane 452 31 26 8 1 -1.000 -1 -1.00 1211 751 Factorial
1 3 NA 439 35.0 Hexane 452 31 26 8 -1 1.000 -1 -1.00 618 618 Factorial
6 7 3 465 35.0 Xylene 452 31 26 8 1 1.000 1 1.00 1108 835 Factorial
4 5 NA 439 27.0 Xylene 452 31 26 8 -1 -1.000 1 1.00 1026 1026 Factorial
8 8 2 465 27.0 Xylene 452 31 26 8 1 -1.000 1 -1.00 1147 967 Factorial
7 9 1 439 35.0 Xylene 452 31 26 8 -1 1.000 1 -1.00 714 771 Factorial
6 7 NA 465 35.0 Xylene 452 31 26 8 1 1.000 1 1.00 835 835 Factorial
9 10 NA 413 20.0 Xylene 452 31 26 8 -3 -2.750 1 8.25 1400 1015 Ascent
NA NA NA 413 20.0 Xylene 439 35 26 8 -2 -3.750 NA NA 1400 NA Prediction mod6
10 11 NA 439 20.0 Xylene 452 31 26 8 -1 -2.750 1 2.75 1098 933 Factorial
11 12 NA 426 23.5 Xylene 452 31 26 8 -2 -1.875 1 3.75 1046 1035 Center
12 13 NA 413 27.0 Xylene 452 31 26 8 -3 -1.000 1 3.00 945 936 Factorial
13 14 NA 413 27.0 Hexane 452 31 26 8 -3 -1.000 -1 3.00 777 628 Factorial
14 15 NA 465 43.0 Hexane 452 31 26 8 1 3.000 -1 3.00 502 552 Factorial

First Factorial Run

Given that we have 3 factors and want to run a maximum of 20 experiments start with a half factorial (we can complete the full factorial if necessary). See the Gen(erator) column in the table above for selection of C.

The aliasing is that each main effect overlaps with the interaction effect for the other two factors and the intercept overlaps with the three factor interaction effect.

The results of this experiment will be used either to:
Take a steepest ascent step in the appropriate direction or
A starting point for additional runs (possibly up to a full factorial)
depending on how much aliasing appears to be present.

Choose ranges based on about 1/4 of maximum range. For A use 26K and for B use 8 min.

Randomize order of first three runs: 3, 1, 2

After performing the experiments the results are entered in the spreadsheet (see table above).

source("paretoPlot.R")
source("contourPlot.R")
source("contourPlotRSM.R") # Not working correctly yet

datamod1 <- data[1:4,]

# Create linear model for first factorial
mod1 <- lm(y ~ xA * xB * xC, data=datamod1)
mod1
## 
## Call:
## lm(formula = y ~ xA * xB * xC, data = datamod1)
## 
## Coefficients:
## (Intercept)           xA           xB           xC        xA:xB  
##       409.2        157.2       -139.2         87.8           NA  
##       xA:xC        xB:xC     xA:xB:xC  
##          NA           NA           NA
paretoPlot(mod1)
## Warning: package 'ggplot2' was built under R version 3.0.3

plot of chunk unnamed-chunk-3

contourPlotRSM(mod1, datamod1$xA, datamod1$xB, datamod1$xC)
## Warning: prediction from a rank-deficient fit may be misleading

plot of chunk unnamed-chunk-3plot of chunk unnamed-chunk-3

# Try just looking at A and B
mod1a <- lm(y ~ xA * xB, data=datamod1)
mod1a
## 
## Call:
## lm(formula = y ~ xA * xB, data = datamod1)
## 
## Coefficients:
## (Intercept)           xA           xB        xA:xB  
##       409.2        157.2       -139.2         87.8
contourPlot(mod1a, datamod1$xA, datamod1$xB)

plot of chunk unnamed-chunk-3

The model with only xA and xB (flattening the half factorial) gives a very different result because of the aliasing of C with AB. Try to evaluate which model is more accurate for the next set of runs.

# Build an rsm model instead
library(rsm)
## Warning: package 'rsm' was built under R version 3.0.3
rsmmod1 <- rsm(y ~ FO(xA, xB, xC), data=datamod1)
summary(rsmmod1)
## Warning: ANOVA F-tests on an essentially perfect fit are unreliable
## 
## Call:
## rsm(formula = y ~ FO(xA, xB, xC), data = datamod1)
## 
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)    409.2         NA      NA       NA
## xA             157.2         NA      NA       NA
## xB            -139.2         NA      NA       NA
## xC              87.8         NA      NA       NA
## 
## Multiple R-squared:     1,   Adjusted R-squared:   NaN 
## F-statistic:  NaN on 3 and 0 DF,  p-value: NA
## 
## Analysis of Variance Table
## 
## Response: y
##                Df Sum Sq Mean Sq F value Pr(>F)
## FO(xA, xB, xC)  3 207273   69091               
## Residuals       0      0                       
## Lack of fit     0      0                       
## Pure error      0      0                       
## 
## Direction of steepest ascent (at radius 1):
##      xA      xB      xC 
##  0.6908 -0.6117  0.3855 
## 
## Corresponding increment in original units:
##      xA      xB      xC 
##  0.6908 -0.6117  0.3855
# Try looking at rsm contour plots
contour(rsmmod1, ~ xA*xB)

plot of chunk unnamed-chunk-4

steepest(rsmmod1)
## Path of steepest ascent from ridge analysis:
##    dist    xA     xB    xC |   yhat
## 1   0.0 0.000  0.000 0.000 |  409.2
## 2   0.5 0.345 -0.306 0.193 |  523.0
## 3   1.0 0.691 -0.612 0.385 |  636.9
## 4   1.5 1.036 -0.918 0.578 |  750.7
## 5   2.0 1.382 -1.223 0.771 |  864.5
## 6   2.5 1.727 -1.529 0.964 |  978.3
## 7   3.0 2.072 -1.835 1.156 | 1092.0
## 8   3.5 2.418 -2.141 1.349 | 1206.0
## 9   4.0 2.763 -2.447 1.542 | 1319.8
## 10  4.5 3.109 -2.753 1.735 | 1433.7
## 11  5.0 3.454 -3.059 1.927 | 1547.5
# Try using cube to create experiment?  See rs-illus.pdf

Based on steepest ascent move to the lower right from experiment number 1. Build another half factorial in that direction. The expectation is that we will then add experiments at the upper right and lower left to form half factorials in the first and third quadrants. Then add a Xylene experiment in the center (at exp 1) to give a full factorial with a center for Xylene and a rotated full factorial with a center for Hexane. This will result in a total of 9 experiments so far.

Note that the initial indication is that xC (solvent) is less important than xA and xB since it has a coefficient of 87.5 (indicating Xylene is better) and only has possible values +/- 1 (for comparison both xA and xB can attain values with abs > 5 on the scale used here, and have coefficients with abs ~140). Check if these initial impressions hold up in later runs.

Second Factorial Run

Add experiments for this factorial to the spreadsheet (some are reused).

Make predictions and add to the spreadsheet.

# First make predictions for each of the runs in this factorial
datamod2 <- data[5:8,]
# Calculate coded values for first model from actual values in spreadsheet
datamod2pred <- datamod2
datamod2pred$xA <- (datamod2pred$A - 426) / (0.5 * 26)
datamod2pred$xB <- (datamod2pred$B - 39) / (0.5 * 8)
#datamod2pred
predict(mod1, newdata=datamod2pred)
## Warning: prediction from a rank-deficient fit may be misleading
##    5    6    7    8 
## 1072 1211  618 1108

Also predict with the flattened model to see which is a better match.

predict(mod1a, newdata=datamod2pred)
##   5   6   7   8 
## 721 509 618 757

Randomize order of second three runs: 1, 2, 3

After performing the experiments and adding the results to the spreadsheet build models for evaluation.

# Create linear model for second factorial
mod2 <- lm(y ~ xA * xB * xC, data=datamod2)
mod2
## 
## Call:
## lm(formula = y ~ xA * xB * xC, data = datamod2)
## 
## Coefficients:
## (Intercept)           xA           xB           xC        xA:xB  
##       807.5        -14.5        -81.0        123.0           NA  
##       xA:xC        xB:xC     xA:xB:xC  
##          NA           NA           NA
paretoPlot(mod2)

plot of chunk unnamed-chunk-7

contourPlotRSM(mod2, datamod2$xA, datamod2$xB, datamod2$xC)
## Warning: prediction from a rank-deficient fit may be misleading

plot of chunk unnamed-chunk-7plot of chunk unnamed-chunk-7

# Try just looking at A and B
mod2a <- lm(y ~ xA * xB, data=datamod2)
mod2a
## 
## Call:
## lm(formula = y ~ xA * xB, data = datamod2)
## 
## Coefficients:
## (Intercept)           xA           xB        xA:xB  
##       807.5        -14.5        -81.0        123.0
contourPlot(mod2a, datamod2$xA, datamod2$xB)

plot of chunk unnamed-chunk-7

rsmmod2 <- rsm(y ~ FO(xA, xB, xC), data=datamod2)
summary(rsmmod2)
## Warning: ANOVA F-tests on an essentially perfect fit are unreliable
## 
## Call:
## rsm(formula = y ~ FO(xA, xB, xC), data = datamod2)
## 
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)    807.5         NA      NA       NA
## xA             -14.5         NA      NA       NA
## xB             -81.0         NA      NA       NA
## xC             123.0         NA      NA       NA
## 
## Multiple R-squared:     1,   Adjusted R-squared:   NaN 
## F-statistic:  NaN on 3 and 0 DF,  p-value: NA
## 
## Analysis of Variance Table
## 
## Response: y
##                Df Sum Sq Mean Sq F value Pr(>F)
## FO(xA, xB, xC)  3  87601   29200               
## Residuals       0      0                       
## Lack of fit     0      0                       
## Pure error      0      0                       
## 
## Direction of steepest ascent (at radius 1):
##       xA       xB       xC 
## -0.09798 -0.54734  0.83115 
## 
## Corresponding increment in original units:
##       xA       xB       xC 
## -0.09798 -0.54734  0.83115
# Try looking at rsm contour plots
contour(rsmmod2, ~ xA*xB)

plot of chunk unnamed-chunk-7

This factorial shows an even larger coefficient for xC (indicating Xylene performs better). Given the other coefficients it looks like the direction to explore is down and a little left.

But first we now have four Xylene runs. Let’s take a look at those by themselves.

Rotated Xylene Factorial

We now have enough data (Baseline, 3, 4, and 6) to make up a rotated full factorial for Xylene only. Build a model for this and evaluate response. Calculate coded values centered at run 1 (439, 35) with the same ranges as above.

datamod4 <- data[c(1,2,5,8),]
# Calculate coded values from actual values in spreadsheet
datamod4pred <- datamod4
datamod4pred$xA <- (datamod4pred$A - 439) / (0.5 * 26)
datamod4pred$xB <- (datamod4pred$B - 35) / (0.5 * 8)
datamod4pred
##   Run. Exp Rand   A  B      C Acen Bcen Arng Brng xA xB xC Gen ypred    y
## 1    0   1   NA 439 43 Xylene  426   39   26    8  0  2  1   1    NA  515
## 2    3   2    3 413 35 Xylene  426   39   26    8 -2  0  1   1    NA  479
## 5    4   5    1 439 27 Xylene  452   31   26    8  0 -2  1   1  1072 1026
## 8    6   7    3 465 35 Xylene  452   31   26    8  2  0  1   1  1108  835
##        Type    Notes
## 1 Factorial Baseline
## 2 Factorial         
## 5 Factorial         
## 8 Factorial
mod4 <- lm(y ~ xA * xB, data=datamod4pred)
mod4
## 
## Call:
## lm(formula = y ~ xA * xB, data = datamod4pred)
## 
## Coefficients:
## (Intercept)           xA           xB        xA:xB  
##         714           89         -128           NA
contourPlot(mod4, datamod4pred$xA, datamod4pred$xB)
## Warning: prediction from a rank-deficient fit may be misleading

plot of chunk unnamed-chunk-8

rsmmod4 <- rsm(y ~ FO(xA, xB), data=datamod4pred)
summary(rsmmod4)
## 
## Call:
## rsm(formula = y ~ FO(xA, xB), data = datamod4pred)
## 
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)    713.8       56.8   12.58    0.051 .
## xA              89.0       40.1    2.22    0.270  
## xB            -127.8       40.1   -3.18    0.194  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Multiple R-squared:  0.938,  Adjusted R-squared:  0.813 
## F-statistic: 7.53 on 2 and 1 DF,  p-value: 0.25
## 
## Analysis of Variance Table
## 
## Response: y
##             Df Sum Sq Mean Sq F value Pr(>F)
## FO(xA, xB)   2 193928   96964    7.53   0.25
## Residuals    1  12882   12882               
## Lack of fit  1  12882   12882               
## Pure error   0      0                       
## 
## Direction of steepest ascent (at radius 1):
##      xA      xB 
##  0.5716 -0.8205 
## 
## Corresponding increment in original units:
##      xA      xB 
##  0.5716 -0.8205
steepest(rsmmod4)
## Path of steepest ascent from ridge analysis:
##    dist    xA     xB |   yhat
## 1   0.0 0.000  0.000 |  713.8
## 2   0.5 0.286 -0.410 |  791.6
## 3   1.0 0.572 -0.821 |  869.5
## 4   1.5 0.857 -1.231 |  947.3
## 5   2.0 1.143 -1.641 | 1025.1
## 6   2.5 1.429 -2.051 | 1102.9
## 7   3.0 1.715 -2.462 | 1180.9
## 8   3.5 2.001 -2.872 | 1258.7
## 9   4.0 2.287 -3.282 | 1336.6
## 10  4.5 2.572 -3.692 | 1414.3
## 11  5.0 2.858 -4.103 | 1492.3
contour(rsmmod4, ~ xA*xB)

plot of chunk unnamed-chunk-8

# Predict the center point (or just use the intercept ;-)
predict(mod4, newdata=data.frame(xA=0, xB=0))
## Warning: prediction from a rank-deficient fit may be misleading
##     1 
## 713.8
# Predict lower right Xylene point
predict(mod4, newdata=data.frame(xA=2, xB=-2))
## Warning: prediction from a rank-deficient fit may be misleading
##    1 
## 1147

Given these results I think the next step is to run Xylene experiments at the last two predicted points which will fill out a full factorial for Xylene to the lower right. The prediction at the run 4 location indicates it should be an improvement.

In addition, both of these points already have Hexane runs so should help confirm if Xylene is the better choice.

Full Xylene Factorial

First also predict the new experiments with the second half factorial model.

predict(mod2, newdata=data.frame(xA=-1, xB=1, xC=1))
## Warning: prediction from a rank-deficient fit may be misleading
##   1 
## 864
predict(mod2, newdata=data.frame(xA=1, xB=-1, xC=1))
## Warning: prediction from a rank-deficient fit may be misleading
##   1 
## 997

This model is less optimistic about the lower right point.

Randomize order of these two runs: 2, 1

Now build the models.

datamod5 <- data[9:12,]

# Create linear model for xylene full factorial
mod5 <- lm(y ~ xA * xB, data=datamod5)
mod5
## 
## Call:
## lm(formula = y ~ xA * xB, data = datamod5)
## 
## Coefficients:
## (Intercept)           xA           xB        xA:xB  
##      899.75         1.25       -96.75        30.75
paretoPlot(mod5)

plot of chunk unnamed-chunk-10

contourPlot(mod5, datamod5$xA, datamod5$xB)

plot of chunk unnamed-chunk-10

rsmmod5 <- rsm(y ~ FO(xA, xB), data=datamod5)
summary(rsmmod5)
## 
## Call:
## rsm(formula = y ~ FO(xA, xB), data = datamod5)
## 
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   899.75      30.75   29.26    0.022 *
## xA              1.25      30.75    0.04    0.974  
## xB            -96.75      30.75   -3.15    0.196  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Multiple R-squared:  0.908,  Adjusted R-squared:  0.725 
## F-statistic: 4.95 on 2 and 1 DF,  p-value: 0.303
## 
## Analysis of Variance Table
## 
## Response: y
##             Df Sum Sq Mean Sq F value Pr(>F)
## FO(xA, xB)   2  37449   18724    4.95    0.3
## Residuals    1   3782    3782               
## Lack of fit  1   3782    3782               
## Pure error   0      0                       
## 
## Direction of steepest ascent (at radius 1):
##       xA       xB 
##  0.01292 -0.99992 
## 
## Corresponding increment in original units:
##       xA       xB 
##  0.01292 -0.99992
steepest(rsmmod5)
## Path of steepest ascent from ridge analysis:
##    dist    xA   xB |   yhat
## 1   0.0 0.000  0.0 |  899.8
## 2   0.5 0.006 -0.5 |  948.1
## 3   1.0 0.013 -1.0 |  996.5
## 4   1.5 0.019 -1.5 | 1044.9
## 5   2.0 0.026 -2.0 | 1093.3
## 6   2.5 0.032 -2.5 | 1141.7
## 7   3.0 0.039 -3.0 | 1190.0
## 8   3.5 0.045 -3.5 | 1238.4
## 9   4.0 0.052 -4.0 | 1286.8
## 10  4.5 0.058 -4.5 | 1335.2
## 11  5.0 0.065 -5.0 | 1383.6
contour(rsmmod5, ~ xA*xB)

plot of chunk unnamed-chunk-10

I’m not sure where to go from here. Try some predictions at possible next experiments.

predict(mod5, newdata=data.frame(xA=-3, xB=-1))
##    1 
## 1085
predict(mod5, newdata=data.frame(xA=-3, xB=-2))
##    1 
## 1274
predict(mod5, newdata=data.frame(xA=-1, xB=-2))
##    1 
## 1154

Try a model with all the Xylene data.

datamod6 <- data[c(1,2,5,8,10,11),]
# Calculate coded values from actual values in spreadsheet
datamod6pred <- datamod6
datamod6pred$xA <- (datamod6pred$A - 439) / (0.5 * 26)
datamod6pred$xB <- (datamod6pred$B - 35) / (0.5 * 8)
datamod6pred
##    Run. Exp Rand   A  B      C Acen Bcen Arng Brng xA xB xC Gen ypred    y
## 1     0   1   NA 439 43 Xylene  426   39   26    8  0  2  1   1    NA  515
## 2     3   2    3 413 35 Xylene  426   39   26    8 -2  0  1   1    NA  479
## 5     4   5    1 439 27 Xylene  452   31   26    8  0 -2  1   1  1072 1026
## 8     6   7    3 465 35 Xylene  452   31   26    8  2  0  1   1  1108  835
## 10    8   8    2 465 27 Xylene  452   31   26    8  2 -2  1  -1  1147  967
## 11    7   9    1 439 35 Xylene  452   31   26    8  0  0  1  -1   714  771
##         Type    Notes
## 1  Factorial Baseline
## 2  Factorial         
## 5  Factorial         
## 8  Factorial         
## 10 Factorial         
## 11 Factorial
mod6 <- lm(y ~ xA * xB, data=datamod6pred)
mod6
## 
## Call:
## lm(formula = y ~ xA * xB, data = datamod6pred)
## 
## Coefficients:
## (Intercept)           xA           xB        xA:xB  
##       725.2         89.0       -127.8         47.9
contourPlot(mod6, datamod6pred$xA, datamod6pred$xB)

plot of chunk unnamed-chunk-12

rsmmod6 <- rsm(y ~ SO(xA, xB), data=datamod6pred)
summary(rsmmod6)
## Warning: ANOVA F-tests on an essentially perfect fit are unreliable
## 
## Call:
## rsm(formula = y ~ SO(xA, xB), data = datamod6pred)
## 
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  771.000         NA      NA       NA
## xA            89.000         NA      NA       NA
## xB          -127.750         NA      NA       NA
## xA:xB         30.750         NA      NA       NA
## xA^2         -28.500         NA      NA       NA
## xB^2          -0.125         NA      NA       NA
## 
## Multiple R-squared:     1,   Adjusted R-squared:   NaN 
## F-statistic:  NaN on 5 and 0 DF,  p-value: NA
## 
## Analysis of Variance Table
## 
## Response: y
##             Df Sum Sq Mean Sq F value Pr(>F)
## FO(xA, xB)   2 225947  112974               
## TWI(xA, xB)  1  16704   16704               
## PQ(xA, xB)   2  15504    7752               
## Residuals    0      0                       
## Lack of fit  0      0                       
## Pure error   0      0                       
## 
## Stationary point of response surface:
##    xA    xB 
## 4.194 4.880 
## 
## Eigenanalysis:
## $values
## [1]   6.608 -35.233
## 
## $vectors
##       [,1]    [,2]
## xA -0.4012 -0.9160
## xB -0.9160  0.4012
steepest(rsmmod6)
## Path of steepest ascent from ridge analysis:
##    dist     xA     xB |   yhat
## 1   0.0  0.000  0.000 |  771.0
## 2   0.5  0.229 -0.444 |  843.5
## 3   1.0  0.331 -0.944 |  908.2
## 4   1.5  0.320 -1.465 |  969.0
## 5   2.0  0.238 -1.986 | 1029.3
## 6   2.5  0.115 -2.497 | 1090.2
## 7   3.0 -0.033 -3.000 | 1153.2
## 8   3.5 -0.194 -3.495 | 1218.5
## 9   4.0 -0.365 -3.983 | 1286.3
## 10  4.5 -0.543 -4.467 | 1357.0
## 11  5.0 -0.725 -4.947 | 1430.7
contour(rsmmod6, ~ xA*xB)

plot of chunk unnamed-chunk-12

Predictions with this model.

mod6pred <- data.frame(xA=c(0, -2, -2, -2, 0), xB=c(-3, -3, -2, -3.8, -3.8))
mod6pred
##   xA   xB
## 1  0 -3.0
## 2 -2 -3.0
## 3 -2 -2.0
## 4 -2 -3.8
## 5  0 -3.8
predict(mod6, newdata=mod6pred)
##      1      2      3      4      5 
## 1108.5 1218.0  994.4 1396.9 1210.7
predict(rsmmod6, newdata=mod6pred)
##    1    2    3    4    5 
## 1153 1046  857 1196 1255
contourPlot(mod6, c(datamod6pred$xA, mod6pred$xA), 
            c(datamod6pred$xB, mod6pred$xB))

plot of chunk unnamed-chunk-13

contour(rsmmod6, ~ xA*xB)
points(c(datamod6pred$xA, mod6pred$xA), c(datamod6pred$xB, mod6pred$xB), pch=19)
text(datamod6pred$xA,datamod6pred$xB,labels=datamod6pred$y)

plot of chunk unnamed-chunk-13

The linear and quadratic models give rather different results. I think running at A=413 and B=23 will give us the most information for a valid model and can be expanded into a factorial if desired.

After thinking about this some more I will try stepping to the lower left to minimum B and A corresponding to the earlier left side runs (A = 413, B = 20). This will be the beginning of a factorial with experiment 4 if that seems appropriate.

An important aspect of this experiment is trying to determine if the linear model is wrong here (i.e. we are in a region of curvature).

Experiment 9 Results

Given the result of experiment 9 (1015) it seems likely we are in a region of curvature and have probably stepped over a maximum between experiments 4 and 9. Complete the factorial around exp 4 and 9, but first add experiment 9 to the model and take a look at predictions.

I’m not sure whether it is better to use all the Xylene data or focus on the more local runs. For now, use it all.

datamod7 <- data[c(1,2,5,8,10,11,13),]
# Calculate coded values from actual values in spreadsheet
datamod7pred <- datamod7
datamod7pred$xA <- (datamod7pred$A - 439) / (0.5 * 26)
datamod7pred$xB <- (datamod7pred$B - 35) / (0.5 * 8)
datamod7pred
##    Run. Exp Rand   A  B      C Acen Bcen Arng Brng xA    xB xC   Gen ypred
## 1     0   1   NA 439 43 Xylene  426   39   26    8  0  2.00  1  1.00    NA
## 2     3   2    3 413 35 Xylene  426   39   26    8 -2  0.00  1  1.00    NA
## 5     4   5    1 439 27 Xylene  452   31   26    8  0 -2.00  1  1.00  1072
## 8     6   7    3 465 35 Xylene  452   31   26    8  2  0.00  1  1.00  1108
## 10    8   8    2 465 27 Xylene  452   31   26    8  2 -2.00  1 -1.00  1147
## 11    7   9    1 439 35 Xylene  452   31   26    8  0  0.00  1 -1.00   714
## 13    9  10   NA 413 20 Xylene  452   31   26    8 -2 -3.75  1  8.25  1400
##       y      Type    Notes
## 1   515 Factorial Baseline
## 2   479 Factorial         
## 5  1026 Factorial         
## 8   835 Factorial         
## 10  967 Factorial         
## 11  771 Factorial         
## 13 1015    Ascent
mod7 <- lm(y ~ xA * xB, data=datamod7pred)
mod7
## 
## Call:
## lm(formula = y ~ xA * xB, data = datamod7pred)
## 
## Coefficients:
## (Intercept)           xA           xB        xA:xB  
##       709.5         83.0       -102.2         15.4
contourPlot(mod7, datamod7pred$xA, datamod7pred$xB)

plot of chunk unnamed-chunk-14

rsmmod7 <- rsm(y ~ SO(xA, xB), data=datamod7pred)
summary(rsmmod7)
## 
## Call:
## rsm(formula = y ~ SO(xA, xB), data = datamod7pred)
## 
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   786.54      15.94   49.34    0.013 *
## xA             88.37       7.85   11.26    0.056 .
## xB           -125.05       7.39  -16.92    0.038 *
## xA:xB          20.87       5.13    4.07    0.153  
## xA^2          -33.42       4.72   -7.09    0.089 .
## xB^2           -5.95       3.54   -1.68    0.341  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Multiple R-squared:  0.998,  Adjusted R-squared:  0.99 
## F-statistic:  126 on 5 and 1 DF,  p-value: 0.0677
## 
## Analysis of Variance Table
## 
## Response: y
##             Df Sum Sq Mean Sq F value Pr(>F)
## FO(xA, xB)   2 278705  139353   281.2  0.042
## TWI(xA, xB)  1   7186    7186    14.5  0.163
## PQ(xA, xB)   2  25126   12563    25.4  0.139
## Residuals    1    496     496               
## Lack of fit  1    496     496               
## Pure error   0      0                       
## 
## Stationary point of response surface:
##      xA      xB 
##  -4.327 -18.093 
## 
## Eigenanalysis:
## $values
## [1]  -2.437 -36.935
## 
## $vectors
##       [,1]    [,2]
## xA -0.3192 -0.9477
## xB -0.9477  0.3192
steepest(rsmmod7)
## Path of steepest ascent from ridge analysis:
##    dist     xA     xB |   yhat
## 1   0.0  0.000  0.000 |  786.5
## 2   0.5  0.241 -0.438 |  857.3
## 3   1.0  0.380 -0.925 |  918.5
## 4   1.5  0.430 -1.437 |  972.9
## 5   2.0  0.415 -1.956 | 1022.3
## 6   2.5  0.359 -2.474 | 1068.4
## 7   3.0  0.274 -2.987 | 1111.6
## 8   3.5  0.172 -3.496 | 1152.6
## 9   4.0  0.057 -4.000 | 1191.7
## 10  4.5 -0.067 -4.500 | 1229.0
## 11  5.0 -0.197 -4.996 | 1264.6
contour(rsmmod7, ~ xA*xB)
points(datamod7pred$xA, datamod7pred$xB, pch=19)
text(datamod7pred$xA,datamod7pred$xB,labels=datamod7pred$y)
title("RSM Model 7")

plot of chunk unnamed-chunk-14

Make predictions for the runs I am considering.

mod7pred <- data.frame(xA=c(0, -2, 2), xB=c(-3.8, -2, -3.8))
mod7pred
##   xA   xB
## 1  0 -3.8
## 2 -2 -2.0
## 3  2 -3.8
predict(mod7, newdata=mod7pred)
##      1      2      3 
## 1097.7  809.5 1146.5
predict(rsmmod7, newdata=mod7pred)
##      1      2      3 
## 1175.8  785.9 1060.2
contourPlot(mod7, c(datamod7pred$xA, mod7pred$xA), 
            c(datamod7pred$xB, mod7pred$xB))

plot of chunk unnamed-chunk-15

contour(rsmmod7, ~ xA*xB)
points(c(datamod7pred$xA, mod7pred$xA), c(datamod7pred$xB, mod7pred$xB), pch=19)
text(datamod7pred$xA,datamod7pred$xB,labels=datamod7pred$y)
title("RSM Model 7 Predictions")

plot of chunk unnamed-chunk-15

From these predictions the center bottom (A=439, B=20) point looks like a good choice and allows the completion of factorials on the lower left or right as desired. At this point I am trying to decide if right or left looks more promising then will complete the factorial and add a center point if it looks promising.

Experiment 10 Results

The result of experiment 10 (933) gives us a better sense of the curvature.

Given how far we are (high leverage) from the baseline I think removing it from the model and recentering on experiment 4 is appropriate (I tried as in mod7 above first though, this version looks more realistic in region near optimum).

# datamod8 <- data[c(1,2,5,8,10,11,13,15),]
datamod8 <- data[c(2,5,8,10,11,13,15),] # Without baseline
# Calculate coded values from actual values in spreadsheet
datamod8pred <- datamod8
datamod8pred$xA <- (datamod8pred$A - 439) / (0.5 * 26)
datamod8pred$xB <- (datamod8pred$B - 35) / (0.5 * 8)
datamod8pred
##    Run. Exp Rand   A  B      C Acen Bcen Arng Brng xA    xB xC   Gen ypred
## 2     3   2    3 413 35 Xylene  426   39   26    8 -2  0.00  1  1.00    NA
## 5     4   5    1 439 27 Xylene  452   31   26    8  0 -2.00  1  1.00  1072
## 8     6   7    3 465 35 Xylene  452   31   26    8  2  0.00  1  1.00  1108
## 10    8   8    2 465 27 Xylene  452   31   26    8  2 -2.00  1 -1.00  1147
## 11    7   9    1 439 35 Xylene  452   31   26    8  0  0.00  1 -1.00   714
## 13    9  10   NA 413 20 Xylene  452   31   26    8 -2 -3.75  1  8.25  1400
## 15   10  11   NA 439 20 Xylene  452   31   26    8  0 -3.75  1  2.75  1098
##       y      Type Notes
## 2   479 Factorial      
## 5  1026 Factorial      
## 8   835 Factorial      
## 10  967 Factorial      
## 11  771 Factorial      
## 13 1015    Ascent      
## 15  933 Factorial
mod8 <- lm(y ~ xA * xB, data=datamod8pred)
mod8
## 
## Call:
## lm(formula = y ~ xA * xB, data = datamod8pred)
## 
## Coefficients:
## (Intercept)           xA           xB        xA:xB  
##       718.7         89.5        -79.4         23.4
contourPlot(mod8, datamod8pred$xA, datamod8pred$xB)

plot of chunk unnamed-chunk-16

rsmmod8 <- rsm(y ~ SO(xA, xB), data=datamod8pred)
summary(rsmmod8)
## 
## Call:
## rsm(formula = y ~ SO(xA, xB), data = datamod8pred)
## 
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   752.10      24.19   31.09    0.020 *
## xA             90.44      10.76    8.41    0.075 .
## xB           -249.53      32.61   -7.65    0.083 .
## xA:xB          44.72       6.36    7.03    0.090 .
## xA^2          -21.41       6.18   -3.47    0.179  
## xB^2          -53.21       9.50   -5.60    0.112  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Multiple R-squared:  0.996,  Adjusted R-squared:  0.975 
## F-statistic: 46.9 on 5 and 1 DF,  p-value: 0.11
## 
## Analysis of Variance Table
## 
## Response: y
##             Df Sum Sq Mean Sq F value Pr(>F)
## FO(xA, xB)   2 165115   82557    87.6  0.075
## TWI(xA, xB)  1  17558   17558    18.6  0.145
## PQ(xA, xB)   2  38446   19223    20.4  0.155
## Residuals    1    942     942               
## Lack of fit  1    942     942               
## Pure error   0      0                       
## 
## Stationary point of response surface:
##      xA      xB 
## -0.5997 -2.5967 
## 
## Eigenanalysis:
## $values
## [1]  -9.877 -64.747
## 
## $vectors
##       [,1]    [,2]
## xA -0.8887 -0.4585
## xB -0.4585  0.8887
steepest(rsmmod8)
## Path of steepest ascent from ridge analysis:
##    dist     xA     xB |   yhat
## 1   0.0  0.000  0.000 |  752.1
## 2   0.5  0.155 -0.475 |  868.8
## 3   1.0  0.258 -0.966 |  954.3
## 4   1.5  0.249 -1.479 | 1009.5
## 5   2.0  0.017 -2.000 | 1038.3
## 6   2.5 -0.432 -2.462 | 1048.4
## 7   3.0 -0.947 -2.847 | 1046.9
## 8   3.5 -1.462 -3.180 | 1037.4
## 9   4.0 -1.965 -3.484 | 1021.3
## 10  4.5 -2.458 -3.769 |  999.3
## 11  5.0 -2.943 -4.042 |  971.7
contour(rsmmod8, ~ xA*xB)
points(datamod8pred$xA, datamod8pred$xB, pch=19)
text(datamod8pred$xA,datamod8pred$xB,labels=datamod8pred$y)
title("RSM Model 8")

plot of chunk unnamed-chunk-16

Note that this RSM model gives the first stationary point that looks even close to valid: -0.5997 -2.5967

Make predictions for the runs I am considering.

mod8pred <- data.frame(xA=c(-2, -1, 1, 2, -0.6), xB=c(-2, -2.9, -2.9, -3.8, -2.6))
mod8pred
##     xA   xB
## 1 -2.0 -2.0
## 2 -1.0 -2.9
## 3  1.0 -2.9
## 4  2.0 -3.8
## 5 -0.6 -2.6
predict(mod8, newdata=mod8pred)
##      1      2      3      4      5 
##  792.1  927.3  970.7 1021.7  908.0
predict(rsmmod8, newdata=mod8pred)
##      1      2      3      4      5 
##  950.7 1046.1  967.6  687.3 1049.0
contourPlot(mod8, c(datamod8pred$xA, mod8pred$xA), 
            c(datamod8pred$xB, mod8pred$xB))

plot of chunk unnamed-chunk-17

contour(rsmmod8, ~ xA*xB)
points(c(datamod8pred$xA, mod8pred$xA), c(datamod8pred$xB, mod8pred$xB), pch=19)
text(datamod8pred$xA,datamod8pred$xB,labels=datamod8pred$y)
title("RSM Model 8 Prediction Points")

plot of chunk unnamed-chunk-17

From these predictions the center point of the (incomplete) lower left factorial (A= 426, B=23.5) looks like a good choice.

Experiment 11 Results

The result of experiment 11 (1035) should improve the model accuracy near the optimum.

# datamod9 <- data[c(1,2,5,8,10,11,13,15),]
datamod9 <- data[c(2,5,8,10,11,13,15,16),] # Without baseline
# Calculate coded values from actual values in spreadsheet
datamod9pred <- datamod9
datamod9pred$xA <- (datamod9pred$A - 439) / (0.5 * 26)
datamod9pred$xB <- (datamod9pred$B - 35) / (0.5 * 8)
datamod9pred
##    Run. Exp Rand   A    B      C Acen Bcen Arng Brng xA     xB xC   Gen
## 2     3   2    3 413 35.0 Xylene  426   39   26    8 -2  0.000  1  1.00
## 5     4   5    1 439 27.0 Xylene  452   31   26    8  0 -2.000  1  1.00
## 8     6   7    3 465 35.0 Xylene  452   31   26    8  2  0.000  1  1.00
## 10    8   8    2 465 27.0 Xylene  452   31   26    8  2 -2.000  1 -1.00
## 11    7   9    1 439 35.0 Xylene  452   31   26    8  0  0.000  1 -1.00
## 13    9  10   NA 413 20.0 Xylene  452   31   26    8 -2 -3.750  1  8.25
## 15   10  11   NA 439 20.0 Xylene  452   31   26    8  0 -3.750  1  2.75
## 16   11  12   NA 426 23.5 Xylene  452   31   26    8 -1 -2.875  1  3.75
##    ypred    y      Type Notes
## 2     NA  479 Factorial      
## 5   1072 1026 Factorial      
## 8   1108  835 Factorial      
## 10  1147  967 Factorial      
## 11   714  771 Factorial      
## 13  1400 1015    Ascent      
## 15  1098  933 Factorial      
## 16  1046 1035    Center
mod9 <- lm(y ~ xA * xB, data=datamod9pred)
mod9
## 
## Call:
## lm(formula = y ~ xA * xB, data = datamod9pred)
## 
## Coefficients:
## (Intercept)           xA           xB        xA:xB  
##       722.8         86.5        -84.2         24.4
contourPlot(mod9, datamod9pred$xA, datamod9pred$xB)

plot of chunk unnamed-chunk-18

rsmmod9 <- rsm(y ~ SO(xA, xB), data=datamod9pred)
summary(rsmmod9)
## 
## Call:
## rsm(formula = y ~ SO(xA, xB), data = datamod9pred)
## 
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   751.32      17.69   42.47  0.00055 ***
## xA             90.68       7.89   11.49  0.00749 ** 
## xB           -245.19      21.28  -11.52  0.00745 ** 
## xA:xB          44.18       4.47    9.87  0.01010 *  
## xA^2          -21.03       4.44   -4.74  0.04176 *  
## xB^2          -51.98       6.24   -8.32  0.01412 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Multiple R-squared:  0.996,  Adjusted R-squared:  0.986 
## F-statistic:   97 on 5 and 2 DF,  p-value: 0.0102
## 
## Analysis of Variance Table
## 
## Response: y
##             Df Sum Sq Mean Sq F value Pr(>F)
## FO(xA, xB)   2 180529   90265   176.9 0.0056
## TWI(xA, xB)  1  19119   19119    37.5 0.0257
## PQ(xA, xB)   2  47927   23964    47.0 0.0209
## Residuals    2   1021     510               
## Lack of fit  2   1021     510               
## Pure error   0      0                       
## 
## Stationary point of response surface:
##      xA      xB 
## -0.5796 -2.6047 
## 
## Eigenanalysis:
## $values
## [1]  -9.537 -63.477
## 
## $vectors
##       [,1]    [,2]
## xA -0.8871 -0.4616
## xB -0.4616  0.8871
residuals(rsmmod9)
##       2       5       8      10      11      13      15      16 
##  -6.829  -7.766 -13.555  12.697  19.683   9.401  -6.772  -6.860
steepest(rsmmod9)
## Path of steepest ascent from ridge analysis:
##    dist     xA     xB |   yhat
## 1   0.0  0.000  0.000 |  751.3
## 2   0.5  0.158 -0.474 |  866.4
## 3   1.0  0.267 -0.964 |  950.7
## 4   1.5  0.264 -1.476 | 1005.2
## 5   2.0  0.039 -2.000 | 1033.8
## 6   2.5 -0.408 -2.467 | 1043.8
## 7   3.0 -0.924 -2.854 | 1042.4
## 8   3.5 -1.440 -3.190 | 1033.2
## 9   4.0 -1.944 -3.496 | 1017.6
## 10  4.5 -2.437 -3.783 |  996.3
## 11  5.0 -2.921 -4.058 |  969.6
contour(rsmmod9, ~ xA*xB)
points(datamod9pred$xA, datamod9pred$xB, pch=19)
text(datamod9pred$xA,datamod9pred$xB,labels=datamod9pred$y)
title("RSM Model 9")

plot of chunk unnamed-chunk-18

Make predictions for the runs I am considering.

mod9pred <- data.frame(xA=c(-2, 1, 2, -0.6), xB=c(-2, -2.9, -3.8, -2.6))
mod9pred
##     xA   xB
## 1 -2.0 -2.0
## 2  1.0 -2.9
## 3  2.0 -3.8
## 4 -0.6 -2.6
predict(mod9, newdata=mod9pred)
##      1      2      3      4 
##  815.5  982.8 1030.6  927.7
predict(rsmmod9, newdata=mod9pred)
##      1      2      3      4 
##  945.0  966.7  693.9 1044.3
contourPlot(mod9, c(datamod9pred$xA, mod9pred$xA), 
            c(datamod9pred$xB, mod9pred$xB))

plot of chunk unnamed-chunk-19

contour(rsmmod9, ~ xA*xB)
points(c(datamod9pred$xA, mod9pred$xA), c(datamod9pred$xB, mod9pred$xB), pch=19)
text(datamod9pred$xA,datamod9pred$xB,labels=datamod9pred$y)
title("RSM Model 9 Prediction Points")

plot of chunk unnamed-chunk-19

I feel confident that run 11 is close to the optimum (but global or local and what about Hexane?). Considering runs at A=413, B=27 to complete the lower left factorial or A=439 + -0.6 * 0.5 * 26=431, B=35 + -2.8 * 0.5 * 8=24 to get closer to the optimum.

On reflection, I don’t think either run is really worth it, but in the interest of being systematic I think completing the lower left factorial with a center point is appropriate.

Experiment 12 Results

The result of experiment 12 (936) completes the Xylene factorial with center at the lower left and should improve the model accuracy near the optimum.

Is it worth creating a model for the LL factorial only?

# datamod10 <- data[c(1,2,5,8,10,11,13,15),]
datamod10 <- data[c(2,5,8,10,11,13,15,16,17),] # Without baseline
# Calculate coded values from actual values in spreadsheet
datamod10pred <- datamod10
datamod10pred$xA <- (datamod10pred$A - 439) / (0.5 * 26)
datamod10pred$xB <- (datamod10pred$B - 35) / (0.5 * 8)
datamod10pred
##    Run. Exp Rand   A    B      C Acen Bcen Arng Brng xA     xB xC   Gen
## 2     3   2    3 413 35.0 Xylene  426   39   26    8 -2  0.000  1  1.00
## 5     4   5    1 439 27.0 Xylene  452   31   26    8  0 -2.000  1  1.00
## 8     6   7    3 465 35.0 Xylene  452   31   26    8  2  0.000  1  1.00
## 10    8   8    2 465 27.0 Xylene  452   31   26    8  2 -2.000  1 -1.00
## 11    7   9    1 439 35.0 Xylene  452   31   26    8  0  0.000  1 -1.00
## 13    9  10   NA 413 20.0 Xylene  452   31   26    8 -2 -3.750  1  8.25
## 15   10  11   NA 439 20.0 Xylene  452   31   26    8  0 -3.750  1  2.75
## 16   11  12   NA 426 23.5 Xylene  452   31   26    8 -1 -2.875  1  3.75
## 17   12  13   NA 413 27.0 Xylene  452   31   26    8 -2 -2.000  1  3.00
##    ypred    y      Type Notes
## 2     NA  479 Factorial      
## 5   1072 1026 Factorial      
## 8   1108  835 Factorial      
## 10  1147  967 Factorial      
## 11   714  771 Factorial      
## 13  1400 1015    Ascent      
## 15  1098  933 Factorial      
## 16  1046 1035    Center      
## 17   945  936 Factorial
mod10 <- lm(y ~ xA * xB, data=datamod10pred)
mod10
## 
## Call:
## lm(formula = y ~ xA * xB, data = datamod10pred)
## 
## Coefficients:
## (Intercept)           xA           xB        xA:xB  
##       736.2         78.3        -81.6         26.0
contourPlot(mod10, datamod10pred$xA, datamod10pred$xB)

plot of chunk unnamed-chunk-20

rsmmod10 <- rsm(y ~ SO(xA, xB), data=datamod10pred)
summary(rsmmod10)
## 
## Call:
## rsm(formula = y ~ SO(xA, xB), data = datamod10pred)
## 
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   752.02      14.52   51.78  1.6e-05 ***
## xA             91.18       6.37   14.32  0.00074 ***
## xB           -242.01      14.55  -16.64  0.00047 ***
## xA:xB          43.89       3.61   12.17  0.00119 ** 
## xA^2          -21.32       3.57   -5.97  0.00940 ** 
## xB^2          -51.01       4.17  -12.23  0.00118 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Multiple R-squared:  0.996,  Adjusted R-squared:  0.989 
## F-statistic:  142 on 5 and 3 DF,  p-value: 0.000922
## 
## Analysis of Variance Table
## 
## Response: y
##             Df Sum Sq Mean Sq F value  Pr(>F)
## FO(xA, xB)   2 169455   84727   240.7 0.00049
## TWI(xA, xB)  1  21974   21974    62.4 0.00423
## PQ(xA, xB)   2  58643   29322    83.3 0.00235
## Residuals    3   1056     352                
## Lack of fit  3   1056     352                
## Pure error   0      0                        
## 
## Stationary point of response surface:
##     xA     xB 
## -0.545 -2.607 
## 
## Eigenanalysis:
## $values
## [1]  -9.67 -62.66
## 
## $vectors
##       [,1]    [,2]
## xA -0.8832 -0.4689
## xB -0.4689  0.8832
residuals(rsmmod10)
##       2       5       8      10      11      13      15      16      17 
##  -5.388  -6.017 -14.089  13.477  18.978  11.167  -9.291  -4.893  -3.944
steepest(rsmmod10)
## Path of steepest ascent from ridge analysis:
##    dist     xA     xB |   yhat
## 1   0.0  0.000  0.000 |  752.0
## 2   0.5  0.161 -0.473 |  865.9
## 3   1.0  0.272 -0.962 |  949.4
## 4   1.5  0.274 -1.475 | 1003.7
## 5   2.0  0.057 -1.999 | 1032.1
## 6   2.5 -0.381 -2.471 | 1042.1
## 7   3.0 -0.891 -2.865 | 1040.6
## 8   3.5 -1.403 -3.206 | 1031.2
## 9   4.0 -1.904 -3.518 | 1015.2
## 10  4.5 -2.395 -3.810 |  993.5
## 11  5.0 -2.877 -4.089 |  966.3
contour(rsmmod10, ~ xA*xB)
points(datamod10pred$xA, datamod10pred$xB, pch=19)
text(datamod10pred$xA,datamod10pred$xB,labels=datamod10pred$y)
title("RSM Model 10")

plot of chunk unnamed-chunk-20

Make predictions for the runs I am considering.

mod10pred <- data.frame(xA=c(1, 2, -0.6), xB=c(-2.9, -3.8, -2.6))
mod10pred
##     xA   xB
## 1  1.0 -2.9
## 2  2.0 -3.8
## 3 -0.6 -2.6
predict(mod10, newdata=mod10pred)
##      1      2      3 
##  975.9 1005.5  942.0
predict(rsmmod10, newdata=mod10pred)
##      1      2      3 
##  967.5  698.6 1042.5
contourPlot(mod10, c(datamod10pred$xA, mod10pred$xA), 
            c(datamod10pred$xB, mod10pred$xB))

plot of chunk unnamed-chunk-21

contour(rsmmod10, ~ xA*xB)
points(c(datamod10pred$xA, mod10pred$xA), c(datamod10pred$xB, mod10pred$xB), pch=19)
text(datamod10pred$xA,datamod10pred$xB,labels=datamod10pred$y)
title("RSM Model 10 Prediction Points")

plot of chunk unnamed-chunk-21

The predicted optimum is at -0.545 -2.607
which corresponds to:
A=439 + -0.545 * 0.5 * 26=431.9, B=35 + -2.607 * 0.5 * 8=24.57

Rounding to the nearest integer this gives us a predicted optimum of
A = 432, B = 25

I think this is the best model to use so look at the results in more detail.

contour(rsmmod10, ~ xA*xB, image=TRUE)

plot of chunk unnamed-chunk-22

image(rsmmod10, ~ xA*xB)

plot of chunk unnamed-chunk-22

persp(rsmmod10, ~ xA*xB, zlab="Profit per kg ($)")

plot of chunk unnamed-chunk-22

Xylene LL Factorial

Now the LL factorial by itself.

# datamod11 <- data[c(1,2,5,8,10,11,13,15),]
datamod11 <- data[c(5,13,15,16,17),]
# Calculate coded values from actual values in spreadsheet
datamod11pred <- datamod11
datamod11pred$xA <- (datamod11pred$A - 426) / (0.5 * 26)
datamod11pred$xB <- (datamod11pred$B - 23.5) / (0.5 * 7)
datamod11pred
##    Run. Exp Rand   A    B      C Acen Bcen Arng Brng xA xB xC  Gen ypred
## 5     4   5    1 439 27.0 Xylene  452   31   26    8  1  1  1 1.00  1072
## 13    9  10   NA 413 20.0 Xylene  452   31   26    8 -1 -1  1 8.25  1400
## 15   10  11   NA 439 20.0 Xylene  452   31   26    8  1 -1  1 2.75  1098
## 16   11  12   NA 426 23.5 Xylene  452   31   26    8  0  0  1 3.75  1046
## 17   12  13   NA 413 27.0 Xylene  452   31   26    8 -1  1  1 3.00   945
##       y      Type Notes
## 5  1026 Factorial      
## 13 1015    Ascent      
## 15  933 Factorial      
## 16 1035    Center      
## 17  936 Factorial
mod11 <- lm(y ~ xA * xB, data=datamod11pred)
mod11
## 
## Call:
## lm(formula = y ~ xA * xB, data = datamod11pred)
## 
## Coefficients:
## (Intercept)           xA           xB        xA:xB  
##       989.0          2.0          3.5         43.0
contourPlot(mod11, datamod11pred$xA, datamod11pred$xB)

plot of chunk unnamed-chunk-23

rsmmod11 <- rsm(y ~ FO(xA, xB), data=datamod11pred)
summary(rsmmod11)
## 
## Call:
## rsm(formula = y ~ FO(xA, xB), data = datamod11pred)
## 
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)    989.0       31.7   31.21    0.001 **
## xA               2.0       35.4    0.06    0.960   
## xB               3.5       35.4    0.10    0.930   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Multiple R-squared:  0.00643,    Adjusted R-squared:  -0.987 
## F-statistic: 0.00647 on 2 and 2 DF,  p-value: 0.994
## 
## Analysis of Variance Table
## 
## Response: y
##             Df Sum Sq Mean Sq F value Pr(>F)
## FO(xA, xB)   2     65      32    0.01   0.99
## Residuals    2  10041    5020               
## Lack of fit  2  10041    5020               
## Pure error   0      0                       
## 
## Direction of steepest ascent (at radius 1):
##     xA     xB 
## 0.4961 0.8682 
## 
## Corresponding increment in original units:
##     xA     xB 
## 0.4961 0.8682
residuals(rsmmod11)
##     5    13    15    16    17 
##  31.5  31.5 -54.5  46.0 -54.5
steepest(rsmmod11)
## Path of steepest ascent from ridge analysis:
##    dist    xA    xB |   yhat
## 1   0.0 0.000 0.000 |  989.0
## 2   0.5 0.248 0.434 |  991.0
## 3   1.0 0.496 0.868 |  993.0
## 4   1.5 0.744 1.302 |  995.0
## 5   2.0 0.992 1.736 |  997.1
## 6   2.5 1.240 2.171 |  999.1
## 7   3.0 1.488 2.605 | 1001.1
## 8   3.5 1.736 3.039 | 1003.1
## 9   4.0 1.985 3.473 | 1005.1
## 10  4.5 2.233 3.907 | 1007.1
## 11  5.0 2.481 4.341 | 1009.2
contour(rsmmod11, ~ xA*xB)
points(datamod11pred$xA, datamod11pred$xB, pch=19)
text(datamod11pred$xA,datamod11pred$xB,labels=datamod11pred$y)
title("RSM Model 11")

plot of chunk unnamed-chunk-23

I think model 10 is better (the first order model can not capture the curvature here). Try double weighting the full factorial points in model 10 because they are closest?

This raises one question I have. Given the error (+/- 25) it seems like the points with more divergent y values (more distant, but less subject to error relatively) might be more useful for the quadratic model. Is this a valid/useful observation? (I expect this is only true if the surface is close to quadratic in the given region though.)

Hexane Full Factorial

In the interests of completeness I think using two runs to complete the Hexane full factorial with a center point is worthwhile. This should help with understanding the process and should also decrease the chances I have missed an even better Hexane optimum.

Made predictions first using modall below.

datamod12 <- data[c(3,4,6,18,19),]
# Calculate coded values from actual values in spreadsheet
datamod12pred <- datamod12
datamod12pred$xA <- (datamod12pred$A - 439) / (0.5 * 26)
datamod12pred$xB <- (datamod12pred$B - 35) / (0.5 * 8)
datamod12pred
##    Run. Exp Rand   A  B      C Acen Bcen Arng Brng xA xB xC Gen ypred   y
## 3     1   3    1 439 35 Hexane  426   39   26    8  0  0 -1  -1    NA 618
## 4     2   4    2 413 43 Hexane  426   39   26    8 -2  2 -1  -1    NA  25
## 6     5   6    2 465 27 Hexane  452   31   26    8  2 -2 -1  -1  1211 751
## 18   13  14   NA 413 27 Hexane  452   31   26    8 -2 -2 -1   3   777 628
## 19   14  15   NA 465 43 Hexane  452   31   26    8  2  2 -1   3   502 552
##         Type Notes
## 3  Factorial      
## 4  Factorial      
## 6  Factorial      
## 18 Factorial      
## 19 Factorial
mod12 <- lm(y ~ xA * xB, data=datamod12pred)
mod12
## 
## Call:
## lm(formula = y ~ xA * xB, data = datamod12pred)
## 
## Coefficients:
## (Intercept)           xA           xB        xA:xB  
##       514.8         81.3       -100.2         25.3
contourPlot(mod12, datamod12pred$xA, datamod12pred$xB)

plot of chunk unnamed-chunk-24

rsmmod12 <- rsm(y ~ FO(xA, xB), data=datamod12pred)
summary(rsmmod12)
## 
## Call:
## rsm(formula = y ~ FO(xA, xB), data = datamod12pred)
## 
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)    514.8       73.6    7.00     0.02 *
## xA              81.3       41.1    1.98     0.19  
## xB            -100.2       41.1   -2.44     0.14  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Multiple R-squared:  0.831,  Adjusted R-squared:  0.662 
## F-statistic: 4.92 on 2 and 2 DF,  p-value: 0.169
## 
## Analysis of Variance Table
## 
## Response: y
##             Df Sum Sq Mean Sq F value Pr(>F)
## FO(xA, xB)   2 266426  133213    4.92   0.17
## Residuals    2  54117   27058               
## Lack of fit  2  54117   27058               
## Pure error   0      0                       
## 
## Direction of steepest ascent (at radius 1):
##      xA      xB 
##  0.6296 -0.7769 
## 
## Corresponding increment in original units:
##      xA      xB 
##  0.6296 -0.7769
steepest(rsmmod12)
## Path of steepest ascent from ridge analysis:
##    dist    xA     xB |   yhat
## 1   0.0 0.000  0.000 |  514.8
## 2   0.5 0.315 -0.388 |  579.3
## 3   1.0 0.630 -0.777 |  643.9
## 4   1.5 0.944 -1.165 |  708.3
## 5   2.0 1.259 -1.554 |  772.9
## 6   2.5 1.574 -1.942 |  837.4
## 7   3.0 1.889 -2.331 |  902.0
## 8   3.5 2.204 -2.719 |  966.5
## 9   4.0 2.519 -3.108 | 1031.0
## 10  4.5 2.833 -3.496 | 1095.5
## 11  5.0 3.148 -3.884 | 1159.9
residuals(rsmmod12)
##      3      4      6     18     19 
##  103.2 -126.8 -126.8   75.2   75.2
contour(rsmmod12, ~ xA*xB)
points(datamod12pred$xA, datamod12pred$xB, pch=19)
text(datamod12pred$xA,datamod12pred$xB,labels=datamod12pred$y)
title("RSM Model 12")

plot of chunk unnamed-chunk-24

These results seem to confirm the choice of the solvent Xylene for the more detailed analysis.

The model has a problem with lack of fit indicating there is curvature here. It looks like the problem is the upper left point (y = 25!) so let’s also look at a model ignoring that point.

datamod12a <- data[c(3,6,18,19),]
# Calculate coded values from actual values in spreadsheet
datamod12apred <- datamod12a
datamod12apred$xA <- (datamod12apred$A - 439) / (0.5 * 26)
datamod12apred$xB <- (datamod12apred$B - 35) / (0.5 * 8)
datamod12apred
##    Run. Exp Rand   A  B      C Acen Bcen Arng Brng xA xB xC Gen ypred   y
## 3     1   3    1 439 35 Hexane  426   39   26    8  0  0 -1  -1    NA 618
## 6     5   6    2 465 27 Hexane  452   31   26    8  2 -2 -1  -1  1211 751
## 18   13  14   NA 413 27 Hexane  452   31   26    8 -2 -2 -1   3   777 628
## 19   14  15   NA 465 43 Hexane  452   31   26    8  2  2 -1   3   502 552
##         Type Notes
## 3  Factorial      
## 6  Factorial      
## 18 Factorial      
## 19 Factorial
mod12a <- lm(y ~ xA * xB, data=datamod12apred)
mod12a
## 
## Call:
## lm(formula = y ~ xA * xB, data = datamod12apred)
## 
## Coefficients:
## (Intercept)           xA           xB        xA:xB  
##       618.0         16.7        -35.7         -7.0
contourPlot(mod12a, datamod12apred$xA, datamod12apred$xB)

plot of chunk unnamed-chunk-25

rsmmod12a <- rsm(y ~ FO(xA, xB), data=datamod12apred)
summary(rsmmod12a)
## 
## Call:
## rsm(formula = y ~ FO(xA, xB), data = datamod12apred)
## 
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   599.33      13.20   45.41    0.014 *
## xA             28.42       7.74    3.67    0.169  
## xB            -47.42       7.74   -6.13    0.103  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Multiple R-squared:  0.975,  Adjusted R-squared:  0.924 
## F-statistic: 19.3 on 2 and 1 DF,  p-value: 0.159
## 
## Analysis of Variance Table
## 
## Response: y
##             Df Sum Sq Mean Sq F value Pr(>F)
## FO(xA, xB)   2  20140   10070    19.3   0.16
## Residuals    1    523     523               
## Lack of fit  1    523     523               
## Pure error   0      0                       
## 
## Direction of steepest ascent (at radius 1):
##      xA      xB 
##  0.5141 -0.8578 
## 
## Corresponding increment in original units:
##      xA      xB 
##  0.5141 -0.8578
steepest(rsmmod12a)
## Path of steepest ascent from ridge analysis:
##    dist    xA     xB |  yhat
## 1   0.0 0.000  0.000 | 599.3
## 2   0.5 0.257 -0.429 | 627.0
## 3   1.0 0.514 -0.858 | 654.6
## 4   1.5 0.771 -1.287 | 682.3
## 5   2.0 1.028 -1.716 | 709.9
## 6   2.5 1.285 -2.144 | 737.5
## 7   3.0 1.542 -2.573 | 765.2
## 8   3.5 1.799 -3.002 | 792.8
## 9   4.0 2.056 -3.431 | 820.4
## 10  4.5 2.313 -3.860 | 848.1
## 11  5.0 2.570 -4.289 | 875.7
residuals(rsmmod12a)
##          3          6         18         19 
##  1.867e+01  8.882e-16 -9.333e+00 -9.333e+00
contour(rsmmod12a, ~ xA*xB)
points(datamod12apred$xA, datamod12apred$xB, pch=19)
text(datamod12apred$xA,datamod12apred$xB,labels=datamod12apred$y)
title("RSM Model 12a")

plot of chunk unnamed-chunk-25

This model is even less optimistic about the value of a Hexane optimum.

RSM Model of All Data

Try combining all the runs in a single model.

dataall <- data[c(1:6,8,10,11,13,15:19),]
dataall
##    Run. Exp Rand   A    B      C Acen Bcen Arng Brng xA     xB xC   Gen
## 1     0   1   NA 439 43.0 Xylene  426   39   26    8  1  1.000  1  1.00
## 2     3   2    3 413 35.0 Xylene  426   39   26    8 -1 -1.000  1  1.00
## 3     1   3    1 439 35.0 Hexane  426   39   26    8  1 -1.000 -1 -1.00
## 4     2   4    2 413 43.0 Hexane  426   39   26    8 -1  1.000 -1 -1.00
## 5     4   5    1 439 27.0 Xylene  452   31   26    8 -1 -1.000  1  1.00
## 6     5   6    2 465 27.0 Hexane  452   31   26    8  1 -1.000 -1 -1.00
## 8     6   7    3 465 35.0 Xylene  452   31   26    8  1  1.000  1  1.00
## 10    8   8    2 465 27.0 Xylene  452   31   26    8  1 -1.000  1 -1.00
## 11    7   9    1 439 35.0 Xylene  452   31   26    8 -1  1.000  1 -1.00
## 13    9  10   NA 413 20.0 Xylene  452   31   26    8 -3 -2.750  1  8.25
## 15   10  11   NA 439 20.0 Xylene  452   31   26    8 -1 -2.750  1  2.75
## 16   11  12   NA 426 23.5 Xylene  452   31   26    8 -2 -1.875  1  3.75
## 17   12  13   NA 413 27.0 Xylene  452   31   26    8 -3 -1.000  1  3.00
## 18   13  14   NA 413 27.0 Hexane  452   31   26    8 -3 -1.000 -1  3.00
## 19   14  15   NA 465 43.0 Hexane  452   31   26    8  1  3.000 -1  3.00
##    ypred    y      Type    Notes
## 1     NA  515 Factorial Baseline
## 2     NA  479 Factorial         
## 3     NA  618 Factorial         
## 4     NA   25 Factorial         
## 5   1072 1026 Factorial         
## 6   1211  751 Factorial         
## 8   1108  835 Factorial         
## 10  1147  967 Factorial         
## 11   714  771 Factorial         
## 13  1400 1015    Ascent         
## 15  1098  933 Factorial         
## 16  1046 1035    Center         
## 17   945  936 Factorial         
## 18   777  628 Factorial         
## 19   502  552 Factorial
# Try using coded data
codeddata <- coded.data(dataall[,c("A", "B", "C", "y")],
                        xA ~ (A - 439) / 26,
                        xB ~ (B - 35) / 8)
codeddata
##      A    B      C    y
## 1  439 43.0 Xylene  515
## 2  413 35.0 Xylene  479
## 3  439 35.0 Hexane  618
## 4  413 43.0 Hexane   25
## 5  439 27.0 Xylene 1026
## 6  465 27.0 Hexane  751
## 8  465 35.0 Xylene  835
## 10 465 27.0 Xylene  967
## 11 439 35.0 Xylene  771
## 13 413 20.0 Xylene 1015
## 15 439 20.0 Xylene  933
## 16 426 23.5 Xylene 1035
## 17 413 27.0 Xylene  936
## 18 413 27.0 Hexane  628
## 19 465 43.0 Hexane  552
## 
## Data are stored in coded form using these coding formulas ...
## xA ~ (A - 439)/26
## xB ~ (B - 35)/8
str(codeddata)
## Classes 'coded.data' and 'data.frame':   15 obs. of  4 variables:
##  $ xA: num  0 -1 0 -1 0 1 1 1 0 -1 ...
##  $ xB: num  1 0 0 1 -1 ...
##  $ C : Factor w/ 2 levels "Hexane","Xylene": 2 2 1 1 2 1 2 2 2 2 ...
##  $ y : int  515 479 618 25 1026 751 835 967 771 1015 ...
##  - attr(*, "codings")=List of 2
##   ..$ xA:Class 'formula' length 3 xA ~ (A - 439)/26
##   .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
##   ..$ xB:Class 'formula' length 3 xB ~ (B - 35)/8
##   .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
##  - attr(*, "rsdes")=List of 2
##   ..$ primary: chr  "xA" "xB"
##   ..$ call   : language coded.data(data = dataall[, c("A", "B", "C", "y")], xA ~ (A - 439)/26,      xB ~ (B - 35)/8)
codeddata$xA
##  [1]  0.0 -1.0  0.0 -1.0  0.0  1.0  1.0  1.0  0.0 -1.0  0.0 -0.5 -1.0 -1.0
## [15]  1.0
codeddata$xB
##  [1]  1.000  0.000  0.000  1.000 -1.000 -1.000  0.000 -1.000  0.000 -1.875
## [11] -1.875 -1.438 -1.000 -1.000  1.000
#rsmmodall <- rsm(y ~ SO(xA, xB, C), data=codeddata)
rsmmodall <- rsm(y ~ FO(xA, xB, C) + TWI(xA, xB, C) + PQ(xA, xB), data=codeddata)
#rsmmodall <- rsm(y ~ FO(xA, xB, C), data=dataall)
#rsmmodall <- rsm(y ~ SO(A, B, C), data=dataall)
#rsmmodall <- rsm(y ~ FO(A, B, C) + TWI(A, B, C) + PQ(A, B), data=dataall)
summary(rsmmodall)
## 
## Call:
## rsm(formula = y ~ FO(xA, xB, C) + TWI(xA, xB, C) + PQ(xA, xB), 
##     data = codeddata)
## 
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)    470.9       90.4    5.21   0.0020 **
## xA             174.6       80.1    2.18   0.0719 . 
## xB            -146.4       80.5   -1.82   0.1188   
## C              162.0       45.5    3.56   0.0119 * 
## xA:xB          117.5       29.1    4.04   0.0068 **
## xA:C           -12.1       51.9   -0.23   0.8227   
## xB:C           -54.1       52.5   -1.03   0.3429   
## xA^2           -74.8       40.8   -1.83   0.1163   
## xB^2           -72.8       27.6   -2.63   0.0389 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Multiple R-squared:  0.972,  Adjusted R-squared:  0.935 
## F-statistic: 26.3 on 8 and 6 DF,  p-value: 0.000399
## 
## Analysis of Variance Table
## 
## Response: y
##                Df Sum Sq Mean Sq F value  Pr(>F)
## FO(xA, xB, C)   3 925678  308559   62.23 6.5e-05
## TWI(xA, xB, C)  3  74097   24699    4.98   0.046
## PQ(xA, xB)      2  44953   22476    4.53   0.063
## Residuals       6  29750    4958                
## Lack of fit     6  29750    4958                
## Pure error      0      0                        
## 
## Stationary point of response surface:
##     xA     xB      C 
##  3.133  2.292 -2.069 
## 
## Stationary point in original units:
##       A       B       C 
## 520.466  53.334  -2.069 
## 
## Eigenanalysis:
## $values
## [1]   18.13  -31.54 -134.18
## 
## $vectors
##       [,1]   [,2]    [,3]
## xA -0.3480 0.6313  0.6931
## xB -0.4668 0.5245 -0.7121
## C   0.8130 0.5713 -0.1121
residuals(rsmmodall)
##       1       2       3       4       5       6       8      10      11 
##  47.456 -90.703 -14.872  20.168  49.284  20.168 -35.406  32.215 -23.887 
##      13      15      16      17      18      19 
##   3.516 -83.449  33.958  67.018 -12.732 -12.732
# modallpred <- data.frame(A=c(413, 465), B=c(27, 43), C=c("Hexane", "Hexane"))
# modallpred
# predict(rsmmodall, newdata=modallpred)

# at is not working properly here?
# contour(rsmmodall, ~ A*B, at=list(C="Xylene"))
# points(datamodallpred$xA, datamodallpred$xB, pch=19)
# text(datamodallpred$xA,datamodallpred$xB,labels=datamodallpred$y)
# title("RSM Model All")

I have been unable to work out how to use discrete factors (here C, solvent) effectively in rsm so focus on solvent specific models as above.

The residuals were much (~4x) larger with the baseline included so omitted here.

dataallx <- data[c(2,5,8,10,11,13,15,16,17),]
dataallx
##    Run. Exp Rand   A    B      C Acen Bcen Arng Brng xA     xB xC   Gen
## 2     3   2    3 413 35.0 Xylene  426   39   26    8 -1 -1.000  1  1.00
## 5     4   5    1 439 27.0 Xylene  452   31   26    8 -1 -1.000  1  1.00
## 8     6   7    3 465 35.0 Xylene  452   31   26    8  1  1.000  1  1.00
## 10    8   8    2 465 27.0 Xylene  452   31   26    8  1 -1.000  1 -1.00
## 11    7   9    1 439 35.0 Xylene  452   31   26    8 -1  1.000  1 -1.00
## 13    9  10   NA 413 20.0 Xylene  452   31   26    8 -3 -2.750  1  8.25
## 15   10  11   NA 439 20.0 Xylene  452   31   26    8 -1 -2.750  1  2.75
## 16   11  12   NA 426 23.5 Xylene  452   31   26    8 -2 -1.875  1  3.75
## 17   12  13   NA 413 27.0 Xylene  452   31   26    8 -3 -1.000  1  3.00
##    ypred    y      Type Notes
## 2     NA  479 Factorial      
## 5   1072 1026 Factorial      
## 8   1108  835 Factorial      
## 10  1147  967 Factorial      
## 11   714  771 Factorial      
## 13  1400 1015    Ascent      
## 15  1098  933 Factorial      
## 16  1046 1035    Center      
## 17   945  936 Factorial
# Try using coded data
codeddata <- coded.data(dataallx[,c("A", "B", "y")],
                        xA ~ (A - 439) / 26,
                        xB ~ (B - 35) / 8)
codeddata
##      A    B    y
## 2  413 35.0  479
## 5  439 27.0 1026
## 8  465 35.0  835
## 10 465 27.0  967
## 11 439 35.0  771
## 13 413 20.0 1015
## 15 439 20.0  933
## 16 426 23.5 1035
## 17 413 27.0  936
## 
## Data are stored in coded form using these coding formulas ...
## xA ~ (A - 439)/26
## xB ~ (B - 35)/8
str(codeddata)
## Classes 'coded.data' and 'data.frame':   9 obs. of  3 variables:
##  $ xA: num  -1 0 1 1 0 -1 0 -0.5 -1
##  $ xB: num  0 -1 0 -1 0 ...
##  $ y : int  479 1026 835 967 771 1015 933 1035 936
##  - attr(*, "codings")=List of 2
##   ..$ xA:Class 'formula' length 3 xA ~ (A - 439)/26
##   .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
##   ..$ xB:Class 'formula' length 3 xB ~ (B - 35)/8
##   .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
##  - attr(*, "rsdes")=List of 2
##   ..$ primary: chr  "xA" "xB"
##   ..$ call   : language coded.data(data = dataallx[, c("A", "B", "y")], xA ~ (A - 439)/26,      xB ~ (B - 35)/8)
codeddata$xA
## [1] -1.0  0.0  1.0  1.0  0.0 -1.0  0.0 -0.5 -1.0
codeddata$xB
## [1]  0.000 -1.000  0.000 -1.000  0.000 -1.875 -1.875 -1.438 -1.000
rsmmodallx <- rsm(y ~ SO(xA, xB), data=codeddata)
summary(rsmmodallx)
## 
## Call:
## rsm(formula = y ~ SO(xA, xB), data = codeddata)
## 
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    752.0       14.5   51.78  1.6e-05 ***
## xA             182.4       12.7   14.32  0.00074 ***
## xB            -484.0       29.1  -16.64  0.00047 ***
## xA:xB          175.6       14.4   12.17  0.00119 ** 
## xA^2           -85.3       14.3   -5.97  0.00940 ** 
## xB^2          -204.0       16.7  -12.23  0.00118 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Multiple R-squared:  0.996,  Adjusted R-squared:  0.989 
## F-statistic:  142 on 5 and 3 DF,  p-value: 0.000922
## 
## Analysis of Variance Table
## 
## Response: y
##             Df Sum Sq Mean Sq F value  Pr(>F)
## FO(xA, xB)   2 169455   84727   240.7 0.00049
## TWI(xA, xB)  1  21974   21974    62.4 0.00423
## PQ(xA, xB)   2  58643   29322    83.3 0.00235
## Residuals    3   1056     352                
## Lack of fit  3   1056     352                
## Pure error   0      0                        
## 
## Stationary point of response surface:
##      xA      xB 
## -0.2725 -1.3034 
## 
## Stationary point in original units:
##      A      B 
## 431.91  24.57 
## 
## Eigenanalysis:
## $values
## [1]  -38.68 -250.62
## 
## $vectors
##       [,1]    [,2]
## xA -0.8832 -0.4689
## xB -0.4689  0.8832
residuals(rsmmodallx)
##       2       5       8      10      11      13      15      16      17 
##  -5.388  -6.017 -14.089  13.477  18.978  11.167  -9.291  -4.893  -3.944
contour(rsmmodallx, ~ xA*xB)
#points(codeddata$xA, codeddata$xB, pch=19)
points(dataallx$A, dataallx$B, pch=19)
text(dataallx$A,dataallx$B,labels=dataallx$y)
title("RSM Model All Xylene")

plot of chunk unnamed-chunk-27

contour(rsmmodallx, ~ xA*xB, at = xs(rsmmodallx),
        xlim=c(412,440), ylim=c(19.8,27.2),
        image=TRUE, levels = pretty(c(900,1050), 30))
## Warning: "levels" is not a graphical parameter
## Warning: "levels" is not a graphical parameter
## Warning: "levels" is not a graphical parameter
## Warning: "levels" is not a graphical parameter
## Warning: "levels" is not a graphical parameter
## Warning: "levels" is not a graphical parameter
points(dataallx$A, dataallx$B, pch=19)
text(dataallx$A,dataallx$B,labels=dataallx$y)
title("RSM Model All Xylene")

plot of chunk unnamed-chunk-27

contour(rsmmodallx, ~ xA*xB, at = xs(rsmmodallx),
        xlim=c(390,480), ylim=c(19,50),
        image=TRUE)
points(dataallx$A, dataallx$B, pch=19)
text(dataallx$A,dataallx$B,labels=dataallx$y)
title("RSM Model All Xylene")

plot of chunk unnamed-chunk-27

Choice of Optimum

Based on the results above I predict an optimum near (based on RSM model 10):
A = 432, B = 25

Given that the predicted value there is only 8 or 17 more than the nearest two runs while the error is +/- 25 I don’t think it is worth making a run to confirm that.

Process Observations

Xylene appears superior to Hexane as a solvent.

Shorter batch durations are generally superior to longer.

Optimal temperature is near the middle of the range with some dependence on batch duration (higher temperature for longer duration, which is nonintuitive to me).

Final Comments

I wish I had gone through the rsm package vignettes before doing this assignment.
There are three: rsm.pdf rs-illus.pdf rsm-plots.pdf
Note that my usage of the rsm package was less sophisticated when I initially did this (e.g. I only used SO models so could not plot contours until I had enough points).

I still don’t understand how to choose specific values for discrete factors.

Evaluation

The score modifier: \[5.0 \times \dfrac{\text{Your optimum} - \text{Baseline}}{\text{True optimum} - \text{Baseline}} - 0.25N + 3.0\] where \(N\) is your number of experiments
looks like a good benchmark. Let’s try estimating the behavior of this over a range of \(N\) and \(optimum\)

Understanding this behavior should be helpful when trying to choose a stopping point.

Remember: “The grading for this question will be marked mostly on the systematic methodology used to approach the optimum.”

# Take a guess at the true optimum (try different values)
#trueOpt <- 1500
trueOpt <- 1200
baseline <- 515

N <- 1:20
opt <- seq(515, trueOpt, 20)
grd <- expand.grid(N, opt)
colnames(grd) <- c("N", "opt")

scoreMod <- function(N, opt, trueOpt=1500) {
  5.0 * (opt - baseline) / (trueOpt - baseline) - 0.25 * N + 3.0
}

grd$score <- with(grd, 5.0 * (opt - baseline) / (trueOpt - baseline) - 0.25 * N + 3.0)

library(lattice)
## Warning: package 'lattice' was built under R version 3.0.3
contourplot(score ~ N + opt, data=grd, #xlim=v, ylim=v,
            cuts=10, region=TRUE,
            main="Estimated Score Modifier by N and Optimum Achieved",
            col.regions = terrain.colors(100))

plot of chunk unnamed-chunk-28

These comments based on trueBaseline = 1500

If all I wanted was to maximize this score metric I would seriously consider stopping after experiment 4 given that the estimated score modifier of 4.6 looks quite good (written before I performed experiment 5).

The prediction for experiment 5 is promising though. Assuming a similar prediction error as for experiment 4 we would have a score modifier of 5. In reality the actual result for exp 5 was much worse than the prediction.