Response surface project for Coursera Experimentation for Improvement.
Experiment Generator - see 8/29/14 email for my sign in
Score modifier: \[5.0 \times \dfrac{\text{Your optimum} - \text{Baseline}}{\text{True optimum} - \text{Baseline}} - 0.25N + 3.0\] where \(N\) is your number of experiments.
There is error in the response variable in the order of \(\pm 25\) $/kg; please take this into account.
For the same levels of A, B and C, the simulation will return different results for different groups.
You must wait 3 hours between each experimental condition; please plan your time accordingly to make sure you finish on time.
A = temperature during the batch reaction, Kelvin; continuous; 390-480K
B = duration of the batch in minutes; continuous; 20-50 minutes
C = choice of solvent: hexane or xylene; discrete; two levels
Plot of experiments.
Solution.
library(gdata, quietly=TRUE)
## Warning: package 'gdata' was built under R version 3.0.3
## gdata: read.xls support for 'XLS' (Excel 97-2004) files ENABLED.
##
## gdata: read.xls support for 'XLSX' (Excel 2007+) files ENABLED.
##
## Attaching package: 'gdata'
##
## The following object is masked from 'package:stats':
##
## nobs
##
## The following object is masked from 'package:utils':
##
## object.size
data <- read.xls("~/Education/Coursera - Experimentation for Improvement/R/RSM_project/RSM_project.xls")
Experiment spreadsheet built up by working through the steps in this document.
library(knitr)
## Warning: package 'knitr' was built under R version 3.0.3
# http://blogs.uoregon.edu/rclub/2014/05/15/nice-tables-from-r-data-frames/
kable(data)
| Run. | Exp | Rand | A | B | C | Acen | Bcen | Arng | Brng | xA | xB | xC | Gen | ypred | y | Type | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | NA | 439 | 43.0 | Xylene | 426 | 39 | 26 | 8 | 1 | 1.000 | 1 | 1.00 | NA | 515 | Factorial | Baseline |
| 3 | 2 | 3 | 413 | 35.0 | Xylene | 426 | 39 | 26 | 8 | -1 | -1.000 | 1 | 1.00 | NA | 479 | Factorial | |
| 1 | 3 | 1 | 439 | 35.0 | Hexane | 426 | 39 | 26 | 8 | 1 | -1.000 | -1 | -1.00 | NA | 618 | Factorial | |
| 2 | 4 | 2 | 413 | 43.0 | Hexane | 426 | 39 | 26 | 8 | -1 | 1.000 | -1 | -1.00 | NA | 25 | Factorial | |
| 4 | 5 | 1 | 439 | 27.0 | Xylene | 452 | 31 | 26 | 8 | -1 | -1.000 | 1 | 1.00 | 1072 | 1026 | Factorial | |
| 5 | 6 | 2 | 465 | 27.0 | Hexane | 452 | 31 | 26 | 8 | 1 | -1.000 | -1 | -1.00 | 1211 | 751 | Factorial | |
| 1 | 3 | NA | 439 | 35.0 | Hexane | 452 | 31 | 26 | 8 | -1 | 1.000 | -1 | -1.00 | 618 | 618 | Factorial | |
| 6 | 7 | 3 | 465 | 35.0 | Xylene | 452 | 31 | 26 | 8 | 1 | 1.000 | 1 | 1.00 | 1108 | 835 | Factorial | |
| 4 | 5 | NA | 439 | 27.0 | Xylene | 452 | 31 | 26 | 8 | -1 | -1.000 | 1 | 1.00 | 1026 | 1026 | Factorial | |
| 8 | 8 | 2 | 465 | 27.0 | Xylene | 452 | 31 | 26 | 8 | 1 | -1.000 | 1 | -1.00 | 1147 | 967 | Factorial | |
| 7 | 9 | 1 | 439 | 35.0 | Xylene | 452 | 31 | 26 | 8 | -1 | 1.000 | 1 | -1.00 | 714 | 771 | Factorial | |
| 6 | 7 | NA | 465 | 35.0 | Xylene | 452 | 31 | 26 | 8 | 1 | 1.000 | 1 | 1.00 | 835 | 835 | Factorial | |
| 9 | 10 | NA | 413 | 20.0 | Xylene | 452 | 31 | 26 | 8 | -3 | -2.750 | 1 | 8.25 | 1400 | 1015 | Ascent | |
| NA | NA | NA | 413 | 20.0 | Xylene | 439 | 35 | 26 | 8 | -2 | -3.750 | NA | NA | 1400 | NA | Prediction | mod6 |
| 10 | 11 | NA | 439 | 20.0 | Xylene | 452 | 31 | 26 | 8 | -1 | -2.750 | 1 | 2.75 | 1098 | 933 | Factorial | |
| 11 | 12 | NA | 426 | 23.5 | Xylene | 452 | 31 | 26 | 8 | -2 | -1.875 | 1 | 3.75 | 1046 | 1035 | Center | |
| 12 | 13 | NA | 413 | 27.0 | Xylene | 452 | 31 | 26 | 8 | -3 | -1.000 | 1 | 3.00 | 945 | 936 | Factorial | |
| 13 | 14 | NA | 413 | 27.0 | Hexane | 452 | 31 | 26 | 8 | -3 | -1.000 | -1 | 3.00 | 777 | 628 | Factorial | |
| 14 | 15 | NA | 465 | 43.0 | Hexane | 452 | 31 | 26 | 8 | 1 | 3.000 | -1 | 3.00 | 502 | 552 | Factorial |
Given that we have 3 factors and want to run a maximum of 20 experiments start with a half factorial (we can complete the full factorial if necessary). See the Gen(erator) column in the table above for selection of C.
The aliasing is that each main effect overlaps with the interaction effect for the other two factors and the intercept overlaps with the three factor interaction effect.
The results of this experiment will be used either to:
Take a steepest ascent step in the appropriate direction or
A starting point for additional runs (possibly up to a full factorial)
depending on how much aliasing appears to be present.
Choose ranges based on about 1/4 of maximum range. For A use 26K and for B use 8 min.
Randomize order of first three runs: 3, 1, 2
After performing the experiments the results are entered in the spreadsheet (see table above).
source("paretoPlot.R")
source("contourPlot.R")
source("contourPlotRSM.R") # Not working correctly yet
datamod1 <- data[1:4,]
# Create linear model for first factorial
mod1 <- lm(y ~ xA * xB * xC, data=datamod1)
mod1
##
## Call:
## lm(formula = y ~ xA * xB * xC, data = datamod1)
##
## Coefficients:
## (Intercept) xA xB xC xA:xB
## 409.2 157.2 -139.2 87.8 NA
## xA:xC xB:xC xA:xB:xC
## NA NA NA
paretoPlot(mod1)
## Warning: package 'ggplot2' was built under R version 3.0.3
contourPlotRSM(mod1, datamod1$xA, datamod1$xB, datamod1$xC)
## Warning: prediction from a rank-deficient fit may be misleading
# Try just looking at A and B
mod1a <- lm(y ~ xA * xB, data=datamod1)
mod1a
##
## Call:
## lm(formula = y ~ xA * xB, data = datamod1)
##
## Coefficients:
## (Intercept) xA xB xA:xB
## 409.2 157.2 -139.2 87.8
contourPlot(mod1a, datamod1$xA, datamod1$xB)
The model with only xA and xB (flattening the half factorial) gives a very different result because of the aliasing of C with AB. Try to evaluate which model is more accurate for the next set of runs.
# Build an rsm model instead
library(rsm)
## Warning: package 'rsm' was built under R version 3.0.3
rsmmod1 <- rsm(y ~ FO(xA, xB, xC), data=datamod1)
summary(rsmmod1)
## Warning: ANOVA F-tests on an essentially perfect fit are unreliable
##
## Call:
## rsm(formula = y ~ FO(xA, xB, xC), data = datamod1)
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 409.2 NA NA NA
## xA 157.2 NA NA NA
## xB -139.2 NA NA NA
## xC 87.8 NA NA NA
##
## Multiple R-squared: 1, Adjusted R-squared: NaN
## F-statistic: NaN on 3 and 0 DF, p-value: NA
##
## Analysis of Variance Table
##
## Response: y
## Df Sum Sq Mean Sq F value Pr(>F)
## FO(xA, xB, xC) 3 207273 69091
## Residuals 0 0
## Lack of fit 0 0
## Pure error 0 0
##
## Direction of steepest ascent (at radius 1):
## xA xB xC
## 0.6908 -0.6117 0.3855
##
## Corresponding increment in original units:
## xA xB xC
## 0.6908 -0.6117 0.3855
# Try looking at rsm contour plots
contour(rsmmod1, ~ xA*xB)
steepest(rsmmod1)
## Path of steepest ascent from ridge analysis:
## dist xA xB xC | yhat
## 1 0.0 0.000 0.000 0.000 | 409.2
## 2 0.5 0.345 -0.306 0.193 | 523.0
## 3 1.0 0.691 -0.612 0.385 | 636.9
## 4 1.5 1.036 -0.918 0.578 | 750.7
## 5 2.0 1.382 -1.223 0.771 | 864.5
## 6 2.5 1.727 -1.529 0.964 | 978.3
## 7 3.0 2.072 -1.835 1.156 | 1092.0
## 8 3.5 2.418 -2.141 1.349 | 1206.0
## 9 4.0 2.763 -2.447 1.542 | 1319.8
## 10 4.5 3.109 -2.753 1.735 | 1433.7
## 11 5.0 3.454 -3.059 1.927 | 1547.5
# Try using cube to create experiment? See rs-illus.pdf
Based on steepest ascent move to the lower right from experiment number 1. Build another half factorial in that direction. The expectation is that we will then add experiments at the upper right and lower left to form half factorials in the first and third quadrants. Then add a Xylene experiment in the center (at exp 1) to give a full factorial with a center for Xylene and a rotated full factorial with a center for Hexane. This will result in a total of 9 experiments so far.
Note that the initial indication is that xC (solvent) is less important than xA and xB since it has a coefficient of 87.5 (indicating Xylene is better) and only has possible values +/- 1 (for comparison both xA and xB can attain values with abs > 5 on the scale used here, and have coefficients with abs ~140). Check if these initial impressions hold up in later runs.
Add experiments for this factorial to the spreadsheet (some are reused).
Make predictions and add to the spreadsheet.
# First make predictions for each of the runs in this factorial
datamod2 <- data[5:8,]
# Calculate coded values for first model from actual values in spreadsheet
datamod2pred <- datamod2
datamod2pred$xA <- (datamod2pred$A - 426) / (0.5 * 26)
datamod2pred$xB <- (datamod2pred$B - 39) / (0.5 * 8)
#datamod2pred
predict(mod1, newdata=datamod2pred)
## Warning: prediction from a rank-deficient fit may be misleading
## 5 6 7 8
## 1072 1211 618 1108
Also predict with the flattened model to see which is a better match.
predict(mod1a, newdata=datamod2pred)
## 5 6 7 8
## 721 509 618 757
Randomize order of second three runs: 1, 2, 3
After performing the experiments and adding the results to the spreadsheet build models for evaluation.
# Create linear model for second factorial
mod2 <- lm(y ~ xA * xB * xC, data=datamod2)
mod2
##
## Call:
## lm(formula = y ~ xA * xB * xC, data = datamod2)
##
## Coefficients:
## (Intercept) xA xB xC xA:xB
## 807.5 -14.5 -81.0 123.0 NA
## xA:xC xB:xC xA:xB:xC
## NA NA NA
paretoPlot(mod2)
contourPlotRSM(mod2, datamod2$xA, datamod2$xB, datamod2$xC)
## Warning: prediction from a rank-deficient fit may be misleading
# Try just looking at A and B
mod2a <- lm(y ~ xA * xB, data=datamod2)
mod2a
##
## Call:
## lm(formula = y ~ xA * xB, data = datamod2)
##
## Coefficients:
## (Intercept) xA xB xA:xB
## 807.5 -14.5 -81.0 123.0
contourPlot(mod2a, datamod2$xA, datamod2$xB)
rsmmod2 <- rsm(y ~ FO(xA, xB, xC), data=datamod2)
summary(rsmmod2)
## Warning: ANOVA F-tests on an essentially perfect fit are unreliable
##
## Call:
## rsm(formula = y ~ FO(xA, xB, xC), data = datamod2)
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 807.5 NA NA NA
## xA -14.5 NA NA NA
## xB -81.0 NA NA NA
## xC 123.0 NA NA NA
##
## Multiple R-squared: 1, Adjusted R-squared: NaN
## F-statistic: NaN on 3 and 0 DF, p-value: NA
##
## Analysis of Variance Table
##
## Response: y
## Df Sum Sq Mean Sq F value Pr(>F)
## FO(xA, xB, xC) 3 87601 29200
## Residuals 0 0
## Lack of fit 0 0
## Pure error 0 0
##
## Direction of steepest ascent (at radius 1):
## xA xB xC
## -0.09798 -0.54734 0.83115
##
## Corresponding increment in original units:
## xA xB xC
## -0.09798 -0.54734 0.83115
# Try looking at rsm contour plots
contour(rsmmod2, ~ xA*xB)
This factorial shows an even larger coefficient for xC (indicating Xylene performs better). Given the other coefficients it looks like the direction to explore is down and a little left.
But first we now have four Xylene runs. Let’s take a look at those by themselves.
We now have enough data (Baseline, 3, 4, and 6) to make up a rotated full factorial for Xylene only. Build a model for this and evaluate response. Calculate coded values centered at run 1 (439, 35) with the same ranges as above.
datamod4 <- data[c(1,2,5,8),]
# Calculate coded values from actual values in spreadsheet
datamod4pred <- datamod4
datamod4pred$xA <- (datamod4pred$A - 439) / (0.5 * 26)
datamod4pred$xB <- (datamod4pred$B - 35) / (0.5 * 8)
datamod4pred
## Run. Exp Rand A B C Acen Bcen Arng Brng xA xB xC Gen ypred y
## 1 0 1 NA 439 43 Xylene 426 39 26 8 0 2 1 1 NA 515
## 2 3 2 3 413 35 Xylene 426 39 26 8 -2 0 1 1 NA 479
## 5 4 5 1 439 27 Xylene 452 31 26 8 0 -2 1 1 1072 1026
## 8 6 7 3 465 35 Xylene 452 31 26 8 2 0 1 1 1108 835
## Type Notes
## 1 Factorial Baseline
## 2 Factorial
## 5 Factorial
## 8 Factorial
mod4 <- lm(y ~ xA * xB, data=datamod4pred)
mod4
##
## Call:
## lm(formula = y ~ xA * xB, data = datamod4pred)
##
## Coefficients:
## (Intercept) xA xB xA:xB
## 714 89 -128 NA
contourPlot(mod4, datamod4pred$xA, datamod4pred$xB)
## Warning: prediction from a rank-deficient fit may be misleading
rsmmod4 <- rsm(y ~ FO(xA, xB), data=datamod4pred)
summary(rsmmod4)
##
## Call:
## rsm(formula = y ~ FO(xA, xB), data = datamod4pred)
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 713.8 56.8 12.58 0.051 .
## xA 89.0 40.1 2.22 0.270
## xB -127.8 40.1 -3.18 0.194
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Multiple R-squared: 0.938, Adjusted R-squared: 0.813
## F-statistic: 7.53 on 2 and 1 DF, p-value: 0.25
##
## Analysis of Variance Table
##
## Response: y
## Df Sum Sq Mean Sq F value Pr(>F)
## FO(xA, xB) 2 193928 96964 7.53 0.25
## Residuals 1 12882 12882
## Lack of fit 1 12882 12882
## Pure error 0 0
##
## Direction of steepest ascent (at radius 1):
## xA xB
## 0.5716 -0.8205
##
## Corresponding increment in original units:
## xA xB
## 0.5716 -0.8205
steepest(rsmmod4)
## Path of steepest ascent from ridge analysis:
## dist xA xB | yhat
## 1 0.0 0.000 0.000 | 713.8
## 2 0.5 0.286 -0.410 | 791.6
## 3 1.0 0.572 -0.821 | 869.5
## 4 1.5 0.857 -1.231 | 947.3
## 5 2.0 1.143 -1.641 | 1025.1
## 6 2.5 1.429 -2.051 | 1102.9
## 7 3.0 1.715 -2.462 | 1180.9
## 8 3.5 2.001 -2.872 | 1258.7
## 9 4.0 2.287 -3.282 | 1336.6
## 10 4.5 2.572 -3.692 | 1414.3
## 11 5.0 2.858 -4.103 | 1492.3
contour(rsmmod4, ~ xA*xB)
# Predict the center point (or just use the intercept ;-)
predict(mod4, newdata=data.frame(xA=0, xB=0))
## Warning: prediction from a rank-deficient fit may be misleading
## 1
## 713.8
# Predict lower right Xylene point
predict(mod4, newdata=data.frame(xA=2, xB=-2))
## Warning: prediction from a rank-deficient fit may be misleading
## 1
## 1147
Given these results I think the next step is to run Xylene experiments at the last two predicted points which will fill out a full factorial for Xylene to the lower right. The prediction at the run 4 location indicates it should be an improvement.
In addition, both of these points already have Hexane runs so should help confirm if Xylene is the better choice.
First also predict the new experiments with the second half factorial model.
predict(mod2, newdata=data.frame(xA=-1, xB=1, xC=1))
## Warning: prediction from a rank-deficient fit may be misleading
## 1
## 864
predict(mod2, newdata=data.frame(xA=1, xB=-1, xC=1))
## Warning: prediction from a rank-deficient fit may be misleading
## 1
## 997
This model is less optimistic about the lower right point.
Randomize order of these two runs: 2, 1
Now build the models.
datamod5 <- data[9:12,]
# Create linear model for xylene full factorial
mod5 <- lm(y ~ xA * xB, data=datamod5)
mod5
##
## Call:
## lm(formula = y ~ xA * xB, data = datamod5)
##
## Coefficients:
## (Intercept) xA xB xA:xB
## 899.75 1.25 -96.75 30.75
paretoPlot(mod5)
contourPlot(mod5, datamod5$xA, datamod5$xB)
rsmmod5 <- rsm(y ~ FO(xA, xB), data=datamod5)
summary(rsmmod5)
##
## Call:
## rsm(formula = y ~ FO(xA, xB), data = datamod5)
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 899.75 30.75 29.26 0.022 *
## xA 1.25 30.75 0.04 0.974
## xB -96.75 30.75 -3.15 0.196
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Multiple R-squared: 0.908, Adjusted R-squared: 0.725
## F-statistic: 4.95 on 2 and 1 DF, p-value: 0.303
##
## Analysis of Variance Table
##
## Response: y
## Df Sum Sq Mean Sq F value Pr(>F)
## FO(xA, xB) 2 37449 18724 4.95 0.3
## Residuals 1 3782 3782
## Lack of fit 1 3782 3782
## Pure error 0 0
##
## Direction of steepest ascent (at radius 1):
## xA xB
## 0.01292 -0.99992
##
## Corresponding increment in original units:
## xA xB
## 0.01292 -0.99992
steepest(rsmmod5)
## Path of steepest ascent from ridge analysis:
## dist xA xB | yhat
## 1 0.0 0.000 0.0 | 899.8
## 2 0.5 0.006 -0.5 | 948.1
## 3 1.0 0.013 -1.0 | 996.5
## 4 1.5 0.019 -1.5 | 1044.9
## 5 2.0 0.026 -2.0 | 1093.3
## 6 2.5 0.032 -2.5 | 1141.7
## 7 3.0 0.039 -3.0 | 1190.0
## 8 3.5 0.045 -3.5 | 1238.4
## 9 4.0 0.052 -4.0 | 1286.8
## 10 4.5 0.058 -4.5 | 1335.2
## 11 5.0 0.065 -5.0 | 1383.6
contour(rsmmod5, ~ xA*xB)
I’m not sure where to go from here. Try some predictions at possible next experiments.
predict(mod5, newdata=data.frame(xA=-3, xB=-1))
## 1
## 1085
predict(mod5, newdata=data.frame(xA=-3, xB=-2))
## 1
## 1274
predict(mod5, newdata=data.frame(xA=-1, xB=-2))
## 1
## 1154
Try a model with all the Xylene data.
datamod6 <- data[c(1,2,5,8,10,11),]
# Calculate coded values from actual values in spreadsheet
datamod6pred <- datamod6
datamod6pred$xA <- (datamod6pred$A - 439) / (0.5 * 26)
datamod6pred$xB <- (datamod6pred$B - 35) / (0.5 * 8)
datamod6pred
## Run. Exp Rand A B C Acen Bcen Arng Brng xA xB xC Gen ypred y
## 1 0 1 NA 439 43 Xylene 426 39 26 8 0 2 1 1 NA 515
## 2 3 2 3 413 35 Xylene 426 39 26 8 -2 0 1 1 NA 479
## 5 4 5 1 439 27 Xylene 452 31 26 8 0 -2 1 1 1072 1026
## 8 6 7 3 465 35 Xylene 452 31 26 8 2 0 1 1 1108 835
## 10 8 8 2 465 27 Xylene 452 31 26 8 2 -2 1 -1 1147 967
## 11 7 9 1 439 35 Xylene 452 31 26 8 0 0 1 -1 714 771
## Type Notes
## 1 Factorial Baseline
## 2 Factorial
## 5 Factorial
## 8 Factorial
## 10 Factorial
## 11 Factorial
mod6 <- lm(y ~ xA * xB, data=datamod6pred)
mod6
##
## Call:
## lm(formula = y ~ xA * xB, data = datamod6pred)
##
## Coefficients:
## (Intercept) xA xB xA:xB
## 725.2 89.0 -127.8 47.9
contourPlot(mod6, datamod6pred$xA, datamod6pred$xB)
rsmmod6 <- rsm(y ~ SO(xA, xB), data=datamod6pred)
summary(rsmmod6)
## Warning: ANOVA F-tests on an essentially perfect fit are unreliable
##
## Call:
## rsm(formula = y ~ SO(xA, xB), data = datamod6pred)
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 771.000 NA NA NA
## xA 89.000 NA NA NA
## xB -127.750 NA NA NA
## xA:xB 30.750 NA NA NA
## xA^2 -28.500 NA NA NA
## xB^2 -0.125 NA NA NA
##
## Multiple R-squared: 1, Adjusted R-squared: NaN
## F-statistic: NaN on 5 and 0 DF, p-value: NA
##
## Analysis of Variance Table
##
## Response: y
## Df Sum Sq Mean Sq F value Pr(>F)
## FO(xA, xB) 2 225947 112974
## TWI(xA, xB) 1 16704 16704
## PQ(xA, xB) 2 15504 7752
## Residuals 0 0
## Lack of fit 0 0
## Pure error 0 0
##
## Stationary point of response surface:
## xA xB
## 4.194 4.880
##
## Eigenanalysis:
## $values
## [1] 6.608 -35.233
##
## $vectors
## [,1] [,2]
## xA -0.4012 -0.9160
## xB -0.9160 0.4012
steepest(rsmmod6)
## Path of steepest ascent from ridge analysis:
## dist xA xB | yhat
## 1 0.0 0.000 0.000 | 771.0
## 2 0.5 0.229 -0.444 | 843.5
## 3 1.0 0.331 -0.944 | 908.2
## 4 1.5 0.320 -1.465 | 969.0
## 5 2.0 0.238 -1.986 | 1029.3
## 6 2.5 0.115 -2.497 | 1090.2
## 7 3.0 -0.033 -3.000 | 1153.2
## 8 3.5 -0.194 -3.495 | 1218.5
## 9 4.0 -0.365 -3.983 | 1286.3
## 10 4.5 -0.543 -4.467 | 1357.0
## 11 5.0 -0.725 -4.947 | 1430.7
contour(rsmmod6, ~ xA*xB)
Predictions with this model.
mod6pred <- data.frame(xA=c(0, -2, -2, -2, 0), xB=c(-3, -3, -2, -3.8, -3.8))
mod6pred
## xA xB
## 1 0 -3.0
## 2 -2 -3.0
## 3 -2 -2.0
## 4 -2 -3.8
## 5 0 -3.8
predict(mod6, newdata=mod6pred)
## 1 2 3 4 5
## 1108.5 1218.0 994.4 1396.9 1210.7
predict(rsmmod6, newdata=mod6pred)
## 1 2 3 4 5
## 1153 1046 857 1196 1255
contourPlot(mod6, c(datamod6pred$xA, mod6pred$xA),
c(datamod6pred$xB, mod6pred$xB))
contour(rsmmod6, ~ xA*xB)
points(c(datamod6pred$xA, mod6pred$xA), c(datamod6pred$xB, mod6pred$xB), pch=19)
text(datamod6pred$xA,datamod6pred$xB,labels=datamod6pred$y)
The linear and quadratic models give rather different results. I think running at A=413 and B=23 will give us the most information for a valid model and can be expanded into a factorial if desired.
After thinking about this some more I will try stepping to the lower left to minimum B and A corresponding to the earlier left side runs (A = 413, B = 20). This will be the beginning of a factorial with experiment 4 if that seems appropriate.
An important aspect of this experiment is trying to determine if the linear model is wrong here (i.e. we are in a region of curvature).
Given the result of experiment 9 (1015) it seems likely we are in a region of curvature and have probably stepped over a maximum between experiments 4 and 9. Complete the factorial around exp 4 and 9, but first add experiment 9 to the model and take a look at predictions.
I’m not sure whether it is better to use all the Xylene data or focus on the more local runs. For now, use it all.
datamod7 <- data[c(1,2,5,8,10,11,13),]
# Calculate coded values from actual values in spreadsheet
datamod7pred <- datamod7
datamod7pred$xA <- (datamod7pred$A - 439) / (0.5 * 26)
datamod7pred$xB <- (datamod7pred$B - 35) / (0.5 * 8)
datamod7pred
## Run. Exp Rand A B C Acen Bcen Arng Brng xA xB xC Gen ypred
## 1 0 1 NA 439 43 Xylene 426 39 26 8 0 2.00 1 1.00 NA
## 2 3 2 3 413 35 Xylene 426 39 26 8 -2 0.00 1 1.00 NA
## 5 4 5 1 439 27 Xylene 452 31 26 8 0 -2.00 1 1.00 1072
## 8 6 7 3 465 35 Xylene 452 31 26 8 2 0.00 1 1.00 1108
## 10 8 8 2 465 27 Xylene 452 31 26 8 2 -2.00 1 -1.00 1147
## 11 7 9 1 439 35 Xylene 452 31 26 8 0 0.00 1 -1.00 714
## 13 9 10 NA 413 20 Xylene 452 31 26 8 -2 -3.75 1 8.25 1400
## y Type Notes
## 1 515 Factorial Baseline
## 2 479 Factorial
## 5 1026 Factorial
## 8 835 Factorial
## 10 967 Factorial
## 11 771 Factorial
## 13 1015 Ascent
mod7 <- lm(y ~ xA * xB, data=datamod7pred)
mod7
##
## Call:
## lm(formula = y ~ xA * xB, data = datamod7pred)
##
## Coefficients:
## (Intercept) xA xB xA:xB
## 709.5 83.0 -102.2 15.4
contourPlot(mod7, datamod7pred$xA, datamod7pred$xB)
rsmmod7 <- rsm(y ~ SO(xA, xB), data=datamod7pred)
summary(rsmmod7)
##
## Call:
## rsm(formula = y ~ SO(xA, xB), data = datamod7pred)
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 786.54 15.94 49.34 0.013 *
## xA 88.37 7.85 11.26 0.056 .
## xB -125.05 7.39 -16.92 0.038 *
## xA:xB 20.87 5.13 4.07 0.153
## xA^2 -33.42 4.72 -7.09 0.089 .
## xB^2 -5.95 3.54 -1.68 0.341
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Multiple R-squared: 0.998, Adjusted R-squared: 0.99
## F-statistic: 126 on 5 and 1 DF, p-value: 0.0677
##
## Analysis of Variance Table
##
## Response: y
## Df Sum Sq Mean Sq F value Pr(>F)
## FO(xA, xB) 2 278705 139353 281.2 0.042
## TWI(xA, xB) 1 7186 7186 14.5 0.163
## PQ(xA, xB) 2 25126 12563 25.4 0.139
## Residuals 1 496 496
## Lack of fit 1 496 496
## Pure error 0 0
##
## Stationary point of response surface:
## xA xB
## -4.327 -18.093
##
## Eigenanalysis:
## $values
## [1] -2.437 -36.935
##
## $vectors
## [,1] [,2]
## xA -0.3192 -0.9477
## xB -0.9477 0.3192
steepest(rsmmod7)
## Path of steepest ascent from ridge analysis:
## dist xA xB | yhat
## 1 0.0 0.000 0.000 | 786.5
## 2 0.5 0.241 -0.438 | 857.3
## 3 1.0 0.380 -0.925 | 918.5
## 4 1.5 0.430 -1.437 | 972.9
## 5 2.0 0.415 -1.956 | 1022.3
## 6 2.5 0.359 -2.474 | 1068.4
## 7 3.0 0.274 -2.987 | 1111.6
## 8 3.5 0.172 -3.496 | 1152.6
## 9 4.0 0.057 -4.000 | 1191.7
## 10 4.5 -0.067 -4.500 | 1229.0
## 11 5.0 -0.197 -4.996 | 1264.6
contour(rsmmod7, ~ xA*xB)
points(datamod7pred$xA, datamod7pred$xB, pch=19)
text(datamod7pred$xA,datamod7pred$xB,labels=datamod7pred$y)
title("RSM Model 7")
Make predictions for the runs I am considering.
mod7pred <- data.frame(xA=c(0, -2, 2), xB=c(-3.8, -2, -3.8))
mod7pred
## xA xB
## 1 0 -3.8
## 2 -2 -2.0
## 3 2 -3.8
predict(mod7, newdata=mod7pred)
## 1 2 3
## 1097.7 809.5 1146.5
predict(rsmmod7, newdata=mod7pred)
## 1 2 3
## 1175.8 785.9 1060.2
contourPlot(mod7, c(datamod7pred$xA, mod7pred$xA),
c(datamod7pred$xB, mod7pred$xB))
contour(rsmmod7, ~ xA*xB)
points(c(datamod7pred$xA, mod7pred$xA), c(datamod7pred$xB, mod7pred$xB), pch=19)
text(datamod7pred$xA,datamod7pred$xB,labels=datamod7pred$y)
title("RSM Model 7 Predictions")
From these predictions the center bottom (A=439, B=20) point looks like a good choice and allows the completion of factorials on the lower left or right as desired. At this point I am trying to decide if right or left looks more promising then will complete the factorial and add a center point if it looks promising.
The result of experiment 10 (933) gives us a better sense of the curvature.
Given how far we are (high leverage) from the baseline I think removing it from the model and recentering on experiment 4 is appropriate (I tried as in mod7 above first though, this version looks more realistic in region near optimum).
# datamod8 <- data[c(1,2,5,8,10,11,13,15),]
datamod8 <- data[c(2,5,8,10,11,13,15),] # Without baseline
# Calculate coded values from actual values in spreadsheet
datamod8pred <- datamod8
datamod8pred$xA <- (datamod8pred$A - 439) / (0.5 * 26)
datamod8pred$xB <- (datamod8pred$B - 35) / (0.5 * 8)
datamod8pred
## Run. Exp Rand A B C Acen Bcen Arng Brng xA xB xC Gen ypred
## 2 3 2 3 413 35 Xylene 426 39 26 8 -2 0.00 1 1.00 NA
## 5 4 5 1 439 27 Xylene 452 31 26 8 0 -2.00 1 1.00 1072
## 8 6 7 3 465 35 Xylene 452 31 26 8 2 0.00 1 1.00 1108
## 10 8 8 2 465 27 Xylene 452 31 26 8 2 -2.00 1 -1.00 1147
## 11 7 9 1 439 35 Xylene 452 31 26 8 0 0.00 1 -1.00 714
## 13 9 10 NA 413 20 Xylene 452 31 26 8 -2 -3.75 1 8.25 1400
## 15 10 11 NA 439 20 Xylene 452 31 26 8 0 -3.75 1 2.75 1098
## y Type Notes
## 2 479 Factorial
## 5 1026 Factorial
## 8 835 Factorial
## 10 967 Factorial
## 11 771 Factorial
## 13 1015 Ascent
## 15 933 Factorial
mod8 <- lm(y ~ xA * xB, data=datamod8pred)
mod8
##
## Call:
## lm(formula = y ~ xA * xB, data = datamod8pred)
##
## Coefficients:
## (Intercept) xA xB xA:xB
## 718.7 89.5 -79.4 23.4
contourPlot(mod8, datamod8pred$xA, datamod8pred$xB)
rsmmod8 <- rsm(y ~ SO(xA, xB), data=datamod8pred)
summary(rsmmod8)
##
## Call:
## rsm(formula = y ~ SO(xA, xB), data = datamod8pred)
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 752.10 24.19 31.09 0.020 *
## xA 90.44 10.76 8.41 0.075 .
## xB -249.53 32.61 -7.65 0.083 .
## xA:xB 44.72 6.36 7.03 0.090 .
## xA^2 -21.41 6.18 -3.47 0.179
## xB^2 -53.21 9.50 -5.60 0.112
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Multiple R-squared: 0.996, Adjusted R-squared: 0.975
## F-statistic: 46.9 on 5 and 1 DF, p-value: 0.11
##
## Analysis of Variance Table
##
## Response: y
## Df Sum Sq Mean Sq F value Pr(>F)
## FO(xA, xB) 2 165115 82557 87.6 0.075
## TWI(xA, xB) 1 17558 17558 18.6 0.145
## PQ(xA, xB) 2 38446 19223 20.4 0.155
## Residuals 1 942 942
## Lack of fit 1 942 942
## Pure error 0 0
##
## Stationary point of response surface:
## xA xB
## -0.5997 -2.5967
##
## Eigenanalysis:
## $values
## [1] -9.877 -64.747
##
## $vectors
## [,1] [,2]
## xA -0.8887 -0.4585
## xB -0.4585 0.8887
steepest(rsmmod8)
## Path of steepest ascent from ridge analysis:
## dist xA xB | yhat
## 1 0.0 0.000 0.000 | 752.1
## 2 0.5 0.155 -0.475 | 868.8
## 3 1.0 0.258 -0.966 | 954.3
## 4 1.5 0.249 -1.479 | 1009.5
## 5 2.0 0.017 -2.000 | 1038.3
## 6 2.5 -0.432 -2.462 | 1048.4
## 7 3.0 -0.947 -2.847 | 1046.9
## 8 3.5 -1.462 -3.180 | 1037.4
## 9 4.0 -1.965 -3.484 | 1021.3
## 10 4.5 -2.458 -3.769 | 999.3
## 11 5.0 -2.943 -4.042 | 971.7
contour(rsmmod8, ~ xA*xB)
points(datamod8pred$xA, datamod8pred$xB, pch=19)
text(datamod8pred$xA,datamod8pred$xB,labels=datamod8pred$y)
title("RSM Model 8")
Note that this RSM model gives the first stationary point that looks even close to valid: -0.5997 -2.5967
Make predictions for the runs I am considering.
mod8pred <- data.frame(xA=c(-2, -1, 1, 2, -0.6), xB=c(-2, -2.9, -2.9, -3.8, -2.6))
mod8pred
## xA xB
## 1 -2.0 -2.0
## 2 -1.0 -2.9
## 3 1.0 -2.9
## 4 2.0 -3.8
## 5 -0.6 -2.6
predict(mod8, newdata=mod8pred)
## 1 2 3 4 5
## 792.1 927.3 970.7 1021.7 908.0
predict(rsmmod8, newdata=mod8pred)
## 1 2 3 4 5
## 950.7 1046.1 967.6 687.3 1049.0
contourPlot(mod8, c(datamod8pred$xA, mod8pred$xA),
c(datamod8pred$xB, mod8pred$xB))
contour(rsmmod8, ~ xA*xB)
points(c(datamod8pred$xA, mod8pred$xA), c(datamod8pred$xB, mod8pred$xB), pch=19)
text(datamod8pred$xA,datamod8pred$xB,labels=datamod8pred$y)
title("RSM Model 8 Prediction Points")
From these predictions the center point of the (incomplete) lower left factorial (A= 426, B=23.5) looks like a good choice.
The result of experiment 11 (1035) should improve the model accuracy near the optimum.
# datamod9 <- data[c(1,2,5,8,10,11,13,15),]
datamod9 <- data[c(2,5,8,10,11,13,15,16),] # Without baseline
# Calculate coded values from actual values in spreadsheet
datamod9pred <- datamod9
datamod9pred$xA <- (datamod9pred$A - 439) / (0.5 * 26)
datamod9pred$xB <- (datamod9pred$B - 35) / (0.5 * 8)
datamod9pred
## Run. Exp Rand A B C Acen Bcen Arng Brng xA xB xC Gen
## 2 3 2 3 413 35.0 Xylene 426 39 26 8 -2 0.000 1 1.00
## 5 4 5 1 439 27.0 Xylene 452 31 26 8 0 -2.000 1 1.00
## 8 6 7 3 465 35.0 Xylene 452 31 26 8 2 0.000 1 1.00
## 10 8 8 2 465 27.0 Xylene 452 31 26 8 2 -2.000 1 -1.00
## 11 7 9 1 439 35.0 Xylene 452 31 26 8 0 0.000 1 -1.00
## 13 9 10 NA 413 20.0 Xylene 452 31 26 8 -2 -3.750 1 8.25
## 15 10 11 NA 439 20.0 Xylene 452 31 26 8 0 -3.750 1 2.75
## 16 11 12 NA 426 23.5 Xylene 452 31 26 8 -1 -2.875 1 3.75
## ypred y Type Notes
## 2 NA 479 Factorial
## 5 1072 1026 Factorial
## 8 1108 835 Factorial
## 10 1147 967 Factorial
## 11 714 771 Factorial
## 13 1400 1015 Ascent
## 15 1098 933 Factorial
## 16 1046 1035 Center
mod9 <- lm(y ~ xA * xB, data=datamod9pred)
mod9
##
## Call:
## lm(formula = y ~ xA * xB, data = datamod9pred)
##
## Coefficients:
## (Intercept) xA xB xA:xB
## 722.8 86.5 -84.2 24.4
contourPlot(mod9, datamod9pred$xA, datamod9pred$xB)
rsmmod9 <- rsm(y ~ SO(xA, xB), data=datamod9pred)
summary(rsmmod9)
##
## Call:
## rsm(formula = y ~ SO(xA, xB), data = datamod9pred)
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 751.32 17.69 42.47 0.00055 ***
## xA 90.68 7.89 11.49 0.00749 **
## xB -245.19 21.28 -11.52 0.00745 **
## xA:xB 44.18 4.47 9.87 0.01010 *
## xA^2 -21.03 4.44 -4.74 0.04176 *
## xB^2 -51.98 6.24 -8.32 0.01412 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Multiple R-squared: 0.996, Adjusted R-squared: 0.986
## F-statistic: 97 on 5 and 2 DF, p-value: 0.0102
##
## Analysis of Variance Table
##
## Response: y
## Df Sum Sq Mean Sq F value Pr(>F)
## FO(xA, xB) 2 180529 90265 176.9 0.0056
## TWI(xA, xB) 1 19119 19119 37.5 0.0257
## PQ(xA, xB) 2 47927 23964 47.0 0.0209
## Residuals 2 1021 510
## Lack of fit 2 1021 510
## Pure error 0 0
##
## Stationary point of response surface:
## xA xB
## -0.5796 -2.6047
##
## Eigenanalysis:
## $values
## [1] -9.537 -63.477
##
## $vectors
## [,1] [,2]
## xA -0.8871 -0.4616
## xB -0.4616 0.8871
residuals(rsmmod9)
## 2 5 8 10 11 13 15 16
## -6.829 -7.766 -13.555 12.697 19.683 9.401 -6.772 -6.860
steepest(rsmmod9)
## Path of steepest ascent from ridge analysis:
## dist xA xB | yhat
## 1 0.0 0.000 0.000 | 751.3
## 2 0.5 0.158 -0.474 | 866.4
## 3 1.0 0.267 -0.964 | 950.7
## 4 1.5 0.264 -1.476 | 1005.2
## 5 2.0 0.039 -2.000 | 1033.8
## 6 2.5 -0.408 -2.467 | 1043.8
## 7 3.0 -0.924 -2.854 | 1042.4
## 8 3.5 -1.440 -3.190 | 1033.2
## 9 4.0 -1.944 -3.496 | 1017.6
## 10 4.5 -2.437 -3.783 | 996.3
## 11 5.0 -2.921 -4.058 | 969.6
contour(rsmmod9, ~ xA*xB)
points(datamod9pred$xA, datamod9pred$xB, pch=19)
text(datamod9pred$xA,datamod9pred$xB,labels=datamod9pred$y)
title("RSM Model 9")
Make predictions for the runs I am considering.
mod9pred <- data.frame(xA=c(-2, 1, 2, -0.6), xB=c(-2, -2.9, -3.8, -2.6))
mod9pred
## xA xB
## 1 -2.0 -2.0
## 2 1.0 -2.9
## 3 2.0 -3.8
## 4 -0.6 -2.6
predict(mod9, newdata=mod9pred)
## 1 2 3 4
## 815.5 982.8 1030.6 927.7
predict(rsmmod9, newdata=mod9pred)
## 1 2 3 4
## 945.0 966.7 693.9 1044.3
contourPlot(mod9, c(datamod9pred$xA, mod9pred$xA),
c(datamod9pred$xB, mod9pred$xB))
contour(rsmmod9, ~ xA*xB)
points(c(datamod9pred$xA, mod9pred$xA), c(datamod9pred$xB, mod9pred$xB), pch=19)
text(datamod9pred$xA,datamod9pred$xB,labels=datamod9pred$y)
title("RSM Model 9 Prediction Points")
I feel confident that run 11 is close to the optimum (but global or local and what about Hexane?). Considering runs at A=413, B=27 to complete the lower left factorial or A=439 + -0.6 * 0.5 * 26=431, B=35 + -2.8 * 0.5 * 8=24 to get closer to the optimum.
On reflection, I don’t think either run is really worth it, but in the interest of being systematic I think completing the lower left factorial with a center point is appropriate.
The result of experiment 12 (936) completes the Xylene factorial with center at the lower left and should improve the model accuracy near the optimum.
Is it worth creating a model for the LL factorial only?
# datamod10 <- data[c(1,2,5,8,10,11,13,15),]
datamod10 <- data[c(2,5,8,10,11,13,15,16,17),] # Without baseline
# Calculate coded values from actual values in spreadsheet
datamod10pred <- datamod10
datamod10pred$xA <- (datamod10pred$A - 439) / (0.5 * 26)
datamod10pred$xB <- (datamod10pred$B - 35) / (0.5 * 8)
datamod10pred
## Run. Exp Rand A B C Acen Bcen Arng Brng xA xB xC Gen
## 2 3 2 3 413 35.0 Xylene 426 39 26 8 -2 0.000 1 1.00
## 5 4 5 1 439 27.0 Xylene 452 31 26 8 0 -2.000 1 1.00
## 8 6 7 3 465 35.0 Xylene 452 31 26 8 2 0.000 1 1.00
## 10 8 8 2 465 27.0 Xylene 452 31 26 8 2 -2.000 1 -1.00
## 11 7 9 1 439 35.0 Xylene 452 31 26 8 0 0.000 1 -1.00
## 13 9 10 NA 413 20.0 Xylene 452 31 26 8 -2 -3.750 1 8.25
## 15 10 11 NA 439 20.0 Xylene 452 31 26 8 0 -3.750 1 2.75
## 16 11 12 NA 426 23.5 Xylene 452 31 26 8 -1 -2.875 1 3.75
## 17 12 13 NA 413 27.0 Xylene 452 31 26 8 -2 -2.000 1 3.00
## ypred y Type Notes
## 2 NA 479 Factorial
## 5 1072 1026 Factorial
## 8 1108 835 Factorial
## 10 1147 967 Factorial
## 11 714 771 Factorial
## 13 1400 1015 Ascent
## 15 1098 933 Factorial
## 16 1046 1035 Center
## 17 945 936 Factorial
mod10 <- lm(y ~ xA * xB, data=datamod10pred)
mod10
##
## Call:
## lm(formula = y ~ xA * xB, data = datamod10pred)
##
## Coefficients:
## (Intercept) xA xB xA:xB
## 736.2 78.3 -81.6 26.0
contourPlot(mod10, datamod10pred$xA, datamod10pred$xB)
rsmmod10 <- rsm(y ~ SO(xA, xB), data=datamod10pred)
summary(rsmmod10)
##
## Call:
## rsm(formula = y ~ SO(xA, xB), data = datamod10pred)
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 752.02 14.52 51.78 1.6e-05 ***
## xA 91.18 6.37 14.32 0.00074 ***
## xB -242.01 14.55 -16.64 0.00047 ***
## xA:xB 43.89 3.61 12.17 0.00119 **
## xA^2 -21.32 3.57 -5.97 0.00940 **
## xB^2 -51.01 4.17 -12.23 0.00118 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Multiple R-squared: 0.996, Adjusted R-squared: 0.989
## F-statistic: 142 on 5 and 3 DF, p-value: 0.000922
##
## Analysis of Variance Table
##
## Response: y
## Df Sum Sq Mean Sq F value Pr(>F)
## FO(xA, xB) 2 169455 84727 240.7 0.00049
## TWI(xA, xB) 1 21974 21974 62.4 0.00423
## PQ(xA, xB) 2 58643 29322 83.3 0.00235
## Residuals 3 1056 352
## Lack of fit 3 1056 352
## Pure error 0 0
##
## Stationary point of response surface:
## xA xB
## -0.545 -2.607
##
## Eigenanalysis:
## $values
## [1] -9.67 -62.66
##
## $vectors
## [,1] [,2]
## xA -0.8832 -0.4689
## xB -0.4689 0.8832
residuals(rsmmod10)
## 2 5 8 10 11 13 15 16 17
## -5.388 -6.017 -14.089 13.477 18.978 11.167 -9.291 -4.893 -3.944
steepest(rsmmod10)
## Path of steepest ascent from ridge analysis:
## dist xA xB | yhat
## 1 0.0 0.000 0.000 | 752.0
## 2 0.5 0.161 -0.473 | 865.9
## 3 1.0 0.272 -0.962 | 949.4
## 4 1.5 0.274 -1.475 | 1003.7
## 5 2.0 0.057 -1.999 | 1032.1
## 6 2.5 -0.381 -2.471 | 1042.1
## 7 3.0 -0.891 -2.865 | 1040.6
## 8 3.5 -1.403 -3.206 | 1031.2
## 9 4.0 -1.904 -3.518 | 1015.2
## 10 4.5 -2.395 -3.810 | 993.5
## 11 5.0 -2.877 -4.089 | 966.3
contour(rsmmod10, ~ xA*xB)
points(datamod10pred$xA, datamod10pred$xB, pch=19)
text(datamod10pred$xA,datamod10pred$xB,labels=datamod10pred$y)
title("RSM Model 10")
Make predictions for the runs I am considering.
mod10pred <- data.frame(xA=c(1, 2, -0.6), xB=c(-2.9, -3.8, -2.6))
mod10pred
## xA xB
## 1 1.0 -2.9
## 2 2.0 -3.8
## 3 -0.6 -2.6
predict(mod10, newdata=mod10pred)
## 1 2 3
## 975.9 1005.5 942.0
predict(rsmmod10, newdata=mod10pred)
## 1 2 3
## 967.5 698.6 1042.5
contourPlot(mod10, c(datamod10pred$xA, mod10pred$xA),
c(datamod10pred$xB, mod10pred$xB))
contour(rsmmod10, ~ xA*xB)
points(c(datamod10pred$xA, mod10pred$xA), c(datamod10pred$xB, mod10pred$xB), pch=19)
text(datamod10pred$xA,datamod10pred$xB,labels=datamod10pred$y)
title("RSM Model 10 Prediction Points")
The predicted optimum is at -0.545 -2.607
which corresponds to:
A=439 + -0.545 * 0.5 * 26=431.9, B=35 + -2.607 * 0.5 * 8=24.57
Rounding to the nearest integer this gives us a predicted optimum of
A = 432, B = 25
I think this is the best model to use so look at the results in more detail.
contour(rsmmod10, ~ xA*xB, image=TRUE)
image(rsmmod10, ~ xA*xB)
persp(rsmmod10, ~ xA*xB, zlab="Profit per kg ($)")
Now the LL factorial by itself.
# datamod11 <- data[c(1,2,5,8,10,11,13,15),]
datamod11 <- data[c(5,13,15,16,17),]
# Calculate coded values from actual values in spreadsheet
datamod11pred <- datamod11
datamod11pred$xA <- (datamod11pred$A - 426) / (0.5 * 26)
datamod11pred$xB <- (datamod11pred$B - 23.5) / (0.5 * 7)
datamod11pred
## Run. Exp Rand A B C Acen Bcen Arng Brng xA xB xC Gen ypred
## 5 4 5 1 439 27.0 Xylene 452 31 26 8 1 1 1 1.00 1072
## 13 9 10 NA 413 20.0 Xylene 452 31 26 8 -1 -1 1 8.25 1400
## 15 10 11 NA 439 20.0 Xylene 452 31 26 8 1 -1 1 2.75 1098
## 16 11 12 NA 426 23.5 Xylene 452 31 26 8 0 0 1 3.75 1046
## 17 12 13 NA 413 27.0 Xylene 452 31 26 8 -1 1 1 3.00 945
## y Type Notes
## 5 1026 Factorial
## 13 1015 Ascent
## 15 933 Factorial
## 16 1035 Center
## 17 936 Factorial
mod11 <- lm(y ~ xA * xB, data=datamod11pred)
mod11
##
## Call:
## lm(formula = y ~ xA * xB, data = datamod11pred)
##
## Coefficients:
## (Intercept) xA xB xA:xB
## 989.0 2.0 3.5 43.0
contourPlot(mod11, datamod11pred$xA, datamod11pred$xB)
rsmmod11 <- rsm(y ~ FO(xA, xB), data=datamod11pred)
summary(rsmmod11)
##
## Call:
## rsm(formula = y ~ FO(xA, xB), data = datamod11pred)
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 989.0 31.7 31.21 0.001 **
## xA 2.0 35.4 0.06 0.960
## xB 3.5 35.4 0.10 0.930
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Multiple R-squared: 0.00643, Adjusted R-squared: -0.987
## F-statistic: 0.00647 on 2 and 2 DF, p-value: 0.994
##
## Analysis of Variance Table
##
## Response: y
## Df Sum Sq Mean Sq F value Pr(>F)
## FO(xA, xB) 2 65 32 0.01 0.99
## Residuals 2 10041 5020
## Lack of fit 2 10041 5020
## Pure error 0 0
##
## Direction of steepest ascent (at radius 1):
## xA xB
## 0.4961 0.8682
##
## Corresponding increment in original units:
## xA xB
## 0.4961 0.8682
residuals(rsmmod11)
## 5 13 15 16 17
## 31.5 31.5 -54.5 46.0 -54.5
steepest(rsmmod11)
## Path of steepest ascent from ridge analysis:
## dist xA xB | yhat
## 1 0.0 0.000 0.000 | 989.0
## 2 0.5 0.248 0.434 | 991.0
## 3 1.0 0.496 0.868 | 993.0
## 4 1.5 0.744 1.302 | 995.0
## 5 2.0 0.992 1.736 | 997.1
## 6 2.5 1.240 2.171 | 999.1
## 7 3.0 1.488 2.605 | 1001.1
## 8 3.5 1.736 3.039 | 1003.1
## 9 4.0 1.985 3.473 | 1005.1
## 10 4.5 2.233 3.907 | 1007.1
## 11 5.0 2.481 4.341 | 1009.2
contour(rsmmod11, ~ xA*xB)
points(datamod11pred$xA, datamod11pred$xB, pch=19)
text(datamod11pred$xA,datamod11pred$xB,labels=datamod11pred$y)
title("RSM Model 11")
I think model 10 is better (the first order model can not capture the curvature here). Try double weighting the full factorial points in model 10 because they are closest?
This raises one question I have. Given the error (+/- 25) it seems like the points with more divergent y values (more distant, but less subject to error relatively) might be more useful for the quadratic model. Is this a valid/useful observation? (I expect this is only true if the surface is close to quadratic in the given region though.)
In the interests of completeness I think using two runs to complete the Hexane full factorial with a center point is worthwhile. This should help with understanding the process and should also decrease the chances I have missed an even better Hexane optimum.
Made predictions first using modall below.
datamod12 <- data[c(3,4,6,18,19),]
# Calculate coded values from actual values in spreadsheet
datamod12pred <- datamod12
datamod12pred$xA <- (datamod12pred$A - 439) / (0.5 * 26)
datamod12pred$xB <- (datamod12pred$B - 35) / (0.5 * 8)
datamod12pred
## Run. Exp Rand A B C Acen Bcen Arng Brng xA xB xC Gen ypred y
## 3 1 3 1 439 35 Hexane 426 39 26 8 0 0 -1 -1 NA 618
## 4 2 4 2 413 43 Hexane 426 39 26 8 -2 2 -1 -1 NA 25
## 6 5 6 2 465 27 Hexane 452 31 26 8 2 -2 -1 -1 1211 751
## 18 13 14 NA 413 27 Hexane 452 31 26 8 -2 -2 -1 3 777 628
## 19 14 15 NA 465 43 Hexane 452 31 26 8 2 2 -1 3 502 552
## Type Notes
## 3 Factorial
## 4 Factorial
## 6 Factorial
## 18 Factorial
## 19 Factorial
mod12 <- lm(y ~ xA * xB, data=datamod12pred)
mod12
##
## Call:
## lm(formula = y ~ xA * xB, data = datamod12pred)
##
## Coefficients:
## (Intercept) xA xB xA:xB
## 514.8 81.3 -100.2 25.3
contourPlot(mod12, datamod12pred$xA, datamod12pred$xB)
rsmmod12 <- rsm(y ~ FO(xA, xB), data=datamod12pred)
summary(rsmmod12)
##
## Call:
## rsm(formula = y ~ FO(xA, xB), data = datamod12pred)
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 514.8 73.6 7.00 0.02 *
## xA 81.3 41.1 1.98 0.19
## xB -100.2 41.1 -2.44 0.14
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Multiple R-squared: 0.831, Adjusted R-squared: 0.662
## F-statistic: 4.92 on 2 and 2 DF, p-value: 0.169
##
## Analysis of Variance Table
##
## Response: y
## Df Sum Sq Mean Sq F value Pr(>F)
## FO(xA, xB) 2 266426 133213 4.92 0.17
## Residuals 2 54117 27058
## Lack of fit 2 54117 27058
## Pure error 0 0
##
## Direction of steepest ascent (at radius 1):
## xA xB
## 0.6296 -0.7769
##
## Corresponding increment in original units:
## xA xB
## 0.6296 -0.7769
steepest(rsmmod12)
## Path of steepest ascent from ridge analysis:
## dist xA xB | yhat
## 1 0.0 0.000 0.000 | 514.8
## 2 0.5 0.315 -0.388 | 579.3
## 3 1.0 0.630 -0.777 | 643.9
## 4 1.5 0.944 -1.165 | 708.3
## 5 2.0 1.259 -1.554 | 772.9
## 6 2.5 1.574 -1.942 | 837.4
## 7 3.0 1.889 -2.331 | 902.0
## 8 3.5 2.204 -2.719 | 966.5
## 9 4.0 2.519 -3.108 | 1031.0
## 10 4.5 2.833 -3.496 | 1095.5
## 11 5.0 3.148 -3.884 | 1159.9
residuals(rsmmod12)
## 3 4 6 18 19
## 103.2 -126.8 -126.8 75.2 75.2
contour(rsmmod12, ~ xA*xB)
points(datamod12pred$xA, datamod12pred$xB, pch=19)
text(datamod12pred$xA,datamod12pred$xB,labels=datamod12pred$y)
title("RSM Model 12")
These results seem to confirm the choice of the solvent Xylene for the more detailed analysis.
The model has a problem with lack of fit indicating there is curvature here. It looks like the problem is the upper left point (y = 25!) so let’s also look at a model ignoring that point.
datamod12a <- data[c(3,6,18,19),]
# Calculate coded values from actual values in spreadsheet
datamod12apred <- datamod12a
datamod12apred$xA <- (datamod12apred$A - 439) / (0.5 * 26)
datamod12apred$xB <- (datamod12apred$B - 35) / (0.5 * 8)
datamod12apred
## Run. Exp Rand A B C Acen Bcen Arng Brng xA xB xC Gen ypred y
## 3 1 3 1 439 35 Hexane 426 39 26 8 0 0 -1 -1 NA 618
## 6 5 6 2 465 27 Hexane 452 31 26 8 2 -2 -1 -1 1211 751
## 18 13 14 NA 413 27 Hexane 452 31 26 8 -2 -2 -1 3 777 628
## 19 14 15 NA 465 43 Hexane 452 31 26 8 2 2 -1 3 502 552
## Type Notes
## 3 Factorial
## 6 Factorial
## 18 Factorial
## 19 Factorial
mod12a <- lm(y ~ xA * xB, data=datamod12apred)
mod12a
##
## Call:
## lm(formula = y ~ xA * xB, data = datamod12apred)
##
## Coefficients:
## (Intercept) xA xB xA:xB
## 618.0 16.7 -35.7 -7.0
contourPlot(mod12a, datamod12apred$xA, datamod12apred$xB)
rsmmod12a <- rsm(y ~ FO(xA, xB), data=datamod12apred)
summary(rsmmod12a)
##
## Call:
## rsm(formula = y ~ FO(xA, xB), data = datamod12apred)
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 599.33 13.20 45.41 0.014 *
## xA 28.42 7.74 3.67 0.169
## xB -47.42 7.74 -6.13 0.103
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Multiple R-squared: 0.975, Adjusted R-squared: 0.924
## F-statistic: 19.3 on 2 and 1 DF, p-value: 0.159
##
## Analysis of Variance Table
##
## Response: y
## Df Sum Sq Mean Sq F value Pr(>F)
## FO(xA, xB) 2 20140 10070 19.3 0.16
## Residuals 1 523 523
## Lack of fit 1 523 523
## Pure error 0 0
##
## Direction of steepest ascent (at radius 1):
## xA xB
## 0.5141 -0.8578
##
## Corresponding increment in original units:
## xA xB
## 0.5141 -0.8578
steepest(rsmmod12a)
## Path of steepest ascent from ridge analysis:
## dist xA xB | yhat
## 1 0.0 0.000 0.000 | 599.3
## 2 0.5 0.257 -0.429 | 627.0
## 3 1.0 0.514 -0.858 | 654.6
## 4 1.5 0.771 -1.287 | 682.3
## 5 2.0 1.028 -1.716 | 709.9
## 6 2.5 1.285 -2.144 | 737.5
## 7 3.0 1.542 -2.573 | 765.2
## 8 3.5 1.799 -3.002 | 792.8
## 9 4.0 2.056 -3.431 | 820.4
## 10 4.5 2.313 -3.860 | 848.1
## 11 5.0 2.570 -4.289 | 875.7
residuals(rsmmod12a)
## 3 6 18 19
## 1.867e+01 8.882e-16 -9.333e+00 -9.333e+00
contour(rsmmod12a, ~ xA*xB)
points(datamod12apred$xA, datamod12apred$xB, pch=19)
text(datamod12apred$xA,datamod12apred$xB,labels=datamod12apred$y)
title("RSM Model 12a")
This model is even less optimistic about the value of a Hexane optimum.
Try combining all the runs in a single model.
dataall <- data[c(1:6,8,10,11,13,15:19),]
dataall
## Run. Exp Rand A B C Acen Bcen Arng Brng xA xB xC Gen
## 1 0 1 NA 439 43.0 Xylene 426 39 26 8 1 1.000 1 1.00
## 2 3 2 3 413 35.0 Xylene 426 39 26 8 -1 -1.000 1 1.00
## 3 1 3 1 439 35.0 Hexane 426 39 26 8 1 -1.000 -1 -1.00
## 4 2 4 2 413 43.0 Hexane 426 39 26 8 -1 1.000 -1 -1.00
## 5 4 5 1 439 27.0 Xylene 452 31 26 8 -1 -1.000 1 1.00
## 6 5 6 2 465 27.0 Hexane 452 31 26 8 1 -1.000 -1 -1.00
## 8 6 7 3 465 35.0 Xylene 452 31 26 8 1 1.000 1 1.00
## 10 8 8 2 465 27.0 Xylene 452 31 26 8 1 -1.000 1 -1.00
## 11 7 9 1 439 35.0 Xylene 452 31 26 8 -1 1.000 1 -1.00
## 13 9 10 NA 413 20.0 Xylene 452 31 26 8 -3 -2.750 1 8.25
## 15 10 11 NA 439 20.0 Xylene 452 31 26 8 -1 -2.750 1 2.75
## 16 11 12 NA 426 23.5 Xylene 452 31 26 8 -2 -1.875 1 3.75
## 17 12 13 NA 413 27.0 Xylene 452 31 26 8 -3 -1.000 1 3.00
## 18 13 14 NA 413 27.0 Hexane 452 31 26 8 -3 -1.000 -1 3.00
## 19 14 15 NA 465 43.0 Hexane 452 31 26 8 1 3.000 -1 3.00
## ypred y Type Notes
## 1 NA 515 Factorial Baseline
## 2 NA 479 Factorial
## 3 NA 618 Factorial
## 4 NA 25 Factorial
## 5 1072 1026 Factorial
## 6 1211 751 Factorial
## 8 1108 835 Factorial
## 10 1147 967 Factorial
## 11 714 771 Factorial
## 13 1400 1015 Ascent
## 15 1098 933 Factorial
## 16 1046 1035 Center
## 17 945 936 Factorial
## 18 777 628 Factorial
## 19 502 552 Factorial
# Try using coded data
codeddata <- coded.data(dataall[,c("A", "B", "C", "y")],
xA ~ (A - 439) / 26,
xB ~ (B - 35) / 8)
codeddata
## A B C y
## 1 439 43.0 Xylene 515
## 2 413 35.0 Xylene 479
## 3 439 35.0 Hexane 618
## 4 413 43.0 Hexane 25
## 5 439 27.0 Xylene 1026
## 6 465 27.0 Hexane 751
## 8 465 35.0 Xylene 835
## 10 465 27.0 Xylene 967
## 11 439 35.0 Xylene 771
## 13 413 20.0 Xylene 1015
## 15 439 20.0 Xylene 933
## 16 426 23.5 Xylene 1035
## 17 413 27.0 Xylene 936
## 18 413 27.0 Hexane 628
## 19 465 43.0 Hexane 552
##
## Data are stored in coded form using these coding formulas ...
## xA ~ (A - 439)/26
## xB ~ (B - 35)/8
str(codeddata)
## Classes 'coded.data' and 'data.frame': 15 obs. of 4 variables:
## $ xA: num 0 -1 0 -1 0 1 1 1 0 -1 ...
## $ xB: num 1 0 0 1 -1 ...
## $ C : Factor w/ 2 levels "Hexane","Xylene": 2 2 1 1 2 1 2 2 2 2 ...
## $ y : int 515 479 618 25 1026 751 835 967 771 1015 ...
## - attr(*, "codings")=List of 2
## ..$ xA:Class 'formula' length 3 xA ~ (A - 439)/26
## .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
## ..$ xB:Class 'formula' length 3 xB ~ (B - 35)/8
## .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
## - attr(*, "rsdes")=List of 2
## ..$ primary: chr "xA" "xB"
## ..$ call : language coded.data(data = dataall[, c("A", "B", "C", "y")], xA ~ (A - 439)/26, xB ~ (B - 35)/8)
codeddata$xA
## [1] 0.0 -1.0 0.0 -1.0 0.0 1.0 1.0 1.0 0.0 -1.0 0.0 -0.5 -1.0 -1.0
## [15] 1.0
codeddata$xB
## [1] 1.000 0.000 0.000 1.000 -1.000 -1.000 0.000 -1.000 0.000 -1.875
## [11] -1.875 -1.438 -1.000 -1.000 1.000
#rsmmodall <- rsm(y ~ SO(xA, xB, C), data=codeddata)
rsmmodall <- rsm(y ~ FO(xA, xB, C) + TWI(xA, xB, C) + PQ(xA, xB), data=codeddata)
#rsmmodall <- rsm(y ~ FO(xA, xB, C), data=dataall)
#rsmmodall <- rsm(y ~ SO(A, B, C), data=dataall)
#rsmmodall <- rsm(y ~ FO(A, B, C) + TWI(A, B, C) + PQ(A, B), data=dataall)
summary(rsmmodall)
##
## Call:
## rsm(formula = y ~ FO(xA, xB, C) + TWI(xA, xB, C) + PQ(xA, xB),
## data = codeddata)
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 470.9 90.4 5.21 0.0020 **
## xA 174.6 80.1 2.18 0.0719 .
## xB -146.4 80.5 -1.82 0.1188
## C 162.0 45.5 3.56 0.0119 *
## xA:xB 117.5 29.1 4.04 0.0068 **
## xA:C -12.1 51.9 -0.23 0.8227
## xB:C -54.1 52.5 -1.03 0.3429
## xA^2 -74.8 40.8 -1.83 0.1163
## xB^2 -72.8 27.6 -2.63 0.0389 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Multiple R-squared: 0.972, Adjusted R-squared: 0.935
## F-statistic: 26.3 on 8 and 6 DF, p-value: 0.000399
##
## Analysis of Variance Table
##
## Response: y
## Df Sum Sq Mean Sq F value Pr(>F)
## FO(xA, xB, C) 3 925678 308559 62.23 6.5e-05
## TWI(xA, xB, C) 3 74097 24699 4.98 0.046
## PQ(xA, xB) 2 44953 22476 4.53 0.063
## Residuals 6 29750 4958
## Lack of fit 6 29750 4958
## Pure error 0 0
##
## Stationary point of response surface:
## xA xB C
## 3.133 2.292 -2.069
##
## Stationary point in original units:
## A B C
## 520.466 53.334 -2.069
##
## Eigenanalysis:
## $values
## [1] 18.13 -31.54 -134.18
##
## $vectors
## [,1] [,2] [,3]
## xA -0.3480 0.6313 0.6931
## xB -0.4668 0.5245 -0.7121
## C 0.8130 0.5713 -0.1121
residuals(rsmmodall)
## 1 2 3 4 5 6 8 10 11
## 47.456 -90.703 -14.872 20.168 49.284 20.168 -35.406 32.215 -23.887
## 13 15 16 17 18 19
## 3.516 -83.449 33.958 67.018 -12.732 -12.732
# modallpred <- data.frame(A=c(413, 465), B=c(27, 43), C=c("Hexane", "Hexane"))
# modallpred
# predict(rsmmodall, newdata=modallpred)
# at is not working properly here?
# contour(rsmmodall, ~ A*B, at=list(C="Xylene"))
# points(datamodallpred$xA, datamodallpred$xB, pch=19)
# text(datamodallpred$xA,datamodallpred$xB,labels=datamodallpred$y)
# title("RSM Model All")
I have been unable to work out how to use discrete factors (here C, solvent) effectively in rsm so focus on solvent specific models as above.
The residuals were much (~4x) larger with the baseline included so omitted here.
dataallx <- data[c(2,5,8,10,11,13,15,16,17),]
dataallx
## Run. Exp Rand A B C Acen Bcen Arng Brng xA xB xC Gen
## 2 3 2 3 413 35.0 Xylene 426 39 26 8 -1 -1.000 1 1.00
## 5 4 5 1 439 27.0 Xylene 452 31 26 8 -1 -1.000 1 1.00
## 8 6 7 3 465 35.0 Xylene 452 31 26 8 1 1.000 1 1.00
## 10 8 8 2 465 27.0 Xylene 452 31 26 8 1 -1.000 1 -1.00
## 11 7 9 1 439 35.0 Xylene 452 31 26 8 -1 1.000 1 -1.00
## 13 9 10 NA 413 20.0 Xylene 452 31 26 8 -3 -2.750 1 8.25
## 15 10 11 NA 439 20.0 Xylene 452 31 26 8 -1 -2.750 1 2.75
## 16 11 12 NA 426 23.5 Xylene 452 31 26 8 -2 -1.875 1 3.75
## 17 12 13 NA 413 27.0 Xylene 452 31 26 8 -3 -1.000 1 3.00
## ypred y Type Notes
## 2 NA 479 Factorial
## 5 1072 1026 Factorial
## 8 1108 835 Factorial
## 10 1147 967 Factorial
## 11 714 771 Factorial
## 13 1400 1015 Ascent
## 15 1098 933 Factorial
## 16 1046 1035 Center
## 17 945 936 Factorial
# Try using coded data
codeddata <- coded.data(dataallx[,c("A", "B", "y")],
xA ~ (A - 439) / 26,
xB ~ (B - 35) / 8)
codeddata
## A B y
## 2 413 35.0 479
## 5 439 27.0 1026
## 8 465 35.0 835
## 10 465 27.0 967
## 11 439 35.0 771
## 13 413 20.0 1015
## 15 439 20.0 933
## 16 426 23.5 1035
## 17 413 27.0 936
##
## Data are stored in coded form using these coding formulas ...
## xA ~ (A - 439)/26
## xB ~ (B - 35)/8
str(codeddata)
## Classes 'coded.data' and 'data.frame': 9 obs. of 3 variables:
## $ xA: num -1 0 1 1 0 -1 0 -0.5 -1
## $ xB: num 0 -1 0 -1 0 ...
## $ y : int 479 1026 835 967 771 1015 933 1035 936
## - attr(*, "codings")=List of 2
## ..$ xA:Class 'formula' length 3 xA ~ (A - 439)/26
## .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
## ..$ xB:Class 'formula' length 3 xB ~ (B - 35)/8
## .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
## - attr(*, "rsdes")=List of 2
## ..$ primary: chr "xA" "xB"
## ..$ call : language coded.data(data = dataallx[, c("A", "B", "y")], xA ~ (A - 439)/26, xB ~ (B - 35)/8)
codeddata$xA
## [1] -1.0 0.0 1.0 1.0 0.0 -1.0 0.0 -0.5 -1.0
codeddata$xB
## [1] 0.000 -1.000 0.000 -1.000 0.000 -1.875 -1.875 -1.438 -1.000
rsmmodallx <- rsm(y ~ SO(xA, xB), data=codeddata)
summary(rsmmodallx)
##
## Call:
## rsm(formula = y ~ SO(xA, xB), data = codeddata)
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 752.0 14.5 51.78 1.6e-05 ***
## xA 182.4 12.7 14.32 0.00074 ***
## xB -484.0 29.1 -16.64 0.00047 ***
## xA:xB 175.6 14.4 12.17 0.00119 **
## xA^2 -85.3 14.3 -5.97 0.00940 **
## xB^2 -204.0 16.7 -12.23 0.00118 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Multiple R-squared: 0.996, Adjusted R-squared: 0.989
## F-statistic: 142 on 5 and 3 DF, p-value: 0.000922
##
## Analysis of Variance Table
##
## Response: y
## Df Sum Sq Mean Sq F value Pr(>F)
## FO(xA, xB) 2 169455 84727 240.7 0.00049
## TWI(xA, xB) 1 21974 21974 62.4 0.00423
## PQ(xA, xB) 2 58643 29322 83.3 0.00235
## Residuals 3 1056 352
## Lack of fit 3 1056 352
## Pure error 0 0
##
## Stationary point of response surface:
## xA xB
## -0.2725 -1.3034
##
## Stationary point in original units:
## A B
## 431.91 24.57
##
## Eigenanalysis:
## $values
## [1] -38.68 -250.62
##
## $vectors
## [,1] [,2]
## xA -0.8832 -0.4689
## xB -0.4689 0.8832
residuals(rsmmodallx)
## 2 5 8 10 11 13 15 16 17
## -5.388 -6.017 -14.089 13.477 18.978 11.167 -9.291 -4.893 -3.944
contour(rsmmodallx, ~ xA*xB)
#points(codeddata$xA, codeddata$xB, pch=19)
points(dataallx$A, dataallx$B, pch=19)
text(dataallx$A,dataallx$B,labels=dataallx$y)
title("RSM Model All Xylene")
contour(rsmmodallx, ~ xA*xB, at = xs(rsmmodallx),
xlim=c(412,440), ylim=c(19.8,27.2),
image=TRUE, levels = pretty(c(900,1050), 30))
## Warning: "levels" is not a graphical parameter
## Warning: "levels" is not a graphical parameter
## Warning: "levels" is not a graphical parameter
## Warning: "levels" is not a graphical parameter
## Warning: "levels" is not a graphical parameter
## Warning: "levels" is not a graphical parameter
points(dataallx$A, dataallx$B, pch=19)
text(dataallx$A,dataallx$B,labels=dataallx$y)
title("RSM Model All Xylene")
contour(rsmmodallx, ~ xA*xB, at = xs(rsmmodallx),
xlim=c(390,480), ylim=c(19,50),
image=TRUE)
points(dataallx$A, dataallx$B, pch=19)
text(dataallx$A,dataallx$B,labels=dataallx$y)
title("RSM Model All Xylene")
Based on the results above I predict an optimum near (based on RSM model 10):
A = 432, B = 25
Given that the predicted value there is only 8 or 17 more than the nearest two runs while the error is +/- 25 I don’t think it is worth making a run to confirm that.
Xylene appears superior to Hexane as a solvent.
Shorter batch durations are generally superior to longer.
Optimal temperature is near the middle of the range with some dependence on batch duration (higher temperature for longer duration, which is nonintuitive to me).
I wish I had gone through the rsm package vignettes before doing this assignment.
There are three: rsm.pdf rs-illus.pdf rsm-plots.pdf
Note that my usage of the rsm package was less sophisticated when I initially did this (e.g. I only used SO models so could not plot contours until I had enough points).
I still don’t understand how to choose specific values for discrete factors.
The score modifier: \[5.0 \times \dfrac{\text{Your optimum} - \text{Baseline}}{\text{True optimum} - \text{Baseline}} - 0.25N + 3.0\] where \(N\) is your number of experiments
looks like a good benchmark. Let’s try estimating the behavior of this over a range of \(N\) and \(optimum\)
Understanding this behavior should be helpful when trying to choose a stopping point.
Remember: “The grading for this question will be marked mostly on the systematic methodology used to approach the optimum.”
# Take a guess at the true optimum (try different values)
#trueOpt <- 1500
trueOpt <- 1200
baseline <- 515
N <- 1:20
opt <- seq(515, trueOpt, 20)
grd <- expand.grid(N, opt)
colnames(grd) <- c("N", "opt")
scoreMod <- function(N, opt, trueOpt=1500) {
5.0 * (opt - baseline) / (trueOpt - baseline) - 0.25 * N + 3.0
}
grd$score <- with(grd, 5.0 * (opt - baseline) / (trueOpt - baseline) - 0.25 * N + 3.0)
library(lattice)
## Warning: package 'lattice' was built under R version 3.0.3
contourplot(score ~ N + opt, data=grd, #xlim=v, ylim=v,
cuts=10, region=TRUE,
main="Estimated Score Modifier by N and Optimum Achieved",
col.regions = terrain.colors(100))
These comments based on trueBaseline = 1500
If all I wanted was to maximize this score metric I would seriously consider stopping after experiment 4 given that the estimated score modifier of 4.6 looks quite good (written before I performed experiment 5).
The prediction for experiment 5 is promising though. Assuming a similar prediction error as for experiment 4 we would have a score modifier of 5. In reality the actual result for exp 5 was much worse than the prediction.