Recipe 9: Response Surface Methods

Recipe for Response Surface Methods

Wei Zou

RPI

Nov 30, 14; Version 1

1. Setting

System under test

The purpose of this project is to use the response surface method to generate sequential groups of experiments, thus helping us to: 1. Exam the second order effects of the selected factors on the response variables. 2. To reach an optimum (stationary point) efficiently.

The dataset used for this experiment is the “Diamond” from the “Ecdat Package” in R, which is used to explore the influencing factors of diamond stone prices in Singapore. The original dataset has 308 observations with 5 variables. In this study, we focus on the effects of the following 4 factors : 1. carat(weight of diamond stones in carat unit) 2. colour (with 6 levels) 3. clarity (with 5 levels) 4. certification (certification body with 3 levels)

The null hypothesis and alternative hyphothesis of the experiment can be stated as:

H0: The variation in diamond stone price in Singapore is due to sample randomization only. (i.e, the selected 4 factors cannot explain the variation in state unemployment rates, including the main effects, 2 factor interaction effects, and pure quadratic effects) HA: The variation in diamond sone price in Singapore is due to something else other than sample randomization (i.e., the selected 4 factors (including their main effects, 2 factor interaction effects, and pure quadratic effects) may affect the state unemployment rates)

To test the hypothesis, we first estimate a second order 4 factor linear regression to exam the variations in diamond stone prices. Then we apply the response surface methods to the same four factors, compare the model estimates (including residues, coefficients and ANOVA results) to the linear regression model results, and determine the stationary point of the second order surface.

#Read in the data 
library("rsm", lib.loc="~/R/win-library/3.1")

## Warning: package 'rsm' was built under R version 3.1.2

 library("Ecdat", lib.loc="~/R/win-library/3.1")

## Loading required package: Ecfun
## 
## Attaching package: 'Ecdat'
## 
## The following object is masked from 'package:datasets':
## 
##     Orange

data1<- Diamond
data1<-na.omit(data1)
summary(data1)

##      carat       colour clarity   certification     price      
##  Min.   :0.180   D:16   IF  :44   GIA:151       Min.   :  638  
##  1st Qu.:0.350   E:44   VS1 :81   HRD: 79       1st Qu.: 1625  
##  Median :0.620   F:82   VS2 :53   IGI: 78       Median : 4215  
##  Mean   :0.631   G:65   VVS1:52                 Mean   : 5019  
##  3rd Qu.:0.850   H:61   VVS2:78                 3rd Qu.: 7446  
##  Max.   :1.100   I:40                           Max.   :16008

attach(data1)

Data Summary

Factor: The original data have 3 factors, we generate the fourth one by re-grouping the “Carat” variable: X1:1 if carat <=0.3, 2 if 0.30.6 X2:1 if colour is “D”, 2 if colour is “E”, 3 if colour is “F”, 4 if colour is “G”, 5 if colour is “H”, 6 if colour is “I” X3: 1 if clarity is “VS1”, 2 if clarity is “VS2”, 3 if clarity is “VVS1”, 4 if clarity is “VVS2” X4: 1 if the certification body is “GIA”, 2 if the certification body is “HRD”, 3 if the certification body is “IGI”

Continuous variable and Response Variable: In this study, we treat “price” (the price of the diamond stone) as the response variable.

Organization: Singfat Chu at the National University of Singapore used data pertaining to 308 diamond stones listed in an advertisement to conduct his study on the pricing of the C’s of Diamond Stones. The dataset has information on the price of the diamond stone and 4Cs (Carat, Clarity, Colour and Cut) that charaterize the quality of a diamond stone.

Randomization: The data were randomly selected among the diamond stones available in the Singapore market, however, there is no randomize assignment and random execution order in the experiment.

#data recoding
data1$carat1<-NA
data1$carat1[carat<=0.3]<-1
data1$carat1[carat>0.3&carat<= 0.6]<-2
data1$carat1[carat>0.6]<-3

data1$colour1<-NA
data1$colour1[colour == "D"]<-1
data1$colour1[colour == "E"]<-2
data1$colour1[colour == "F"]<-3
data1$colour1[colour == "G"]<-4
data1$colour1[colour == "H"]<-5
data1$colour1[colour == "I"]<-6

data1$clarity1<-NA
data1$clarity1[clarity == "IF"]<-1
data1$clarity1[clarity == "VS1"]<-2
data1$clarity1[clarity == "VS2"]<-3
data1$clarity1[clarity == "VVS1"]<-4
data1$clarity1[clarity == "VVS2"]<-5

data1$certification1[certification == "GIA"]<-1
data1$certification1[certification == "HRD"]<-2
data1$certification1[certification == "IGI"]<-3

data1$x1<-as.integer(data1$carat1)
data1$x2<-as.integer(data1$colour1)
data1$x3<-as.integer(data1$clarity1)
data1$x4<-as.integer(data1$certification1)

attach(data1)

## The following objects are masked from data1 (pos = 3):
## 
##     carat, certification, clarity, colour, price

summary(data1)

##      carat       colour clarity   certification     price      
##  Min.   :0.180   D:16   IF  :44   GIA:151       Min.   :  638  
##  1st Qu.:0.350   E:44   VS1 :81   HRD: 79       1st Qu.: 1625  
##  Median :0.620   F:82   VS2 :53   IGI: 78       Median : 4215  
##  Mean   :0.631   G:65   VVS1:52                 Mean   : 5019  
##  3rd Qu.:0.850   H:61   VVS2:78                 3rd Qu.: 7446  
##  Max.   :1.100   I:40                           Max.   :16008  
##      carat1        colour1        clarity1    certification1
##  Min.   :1.00   Min.   :1.00   Min.   :1.00   Min.   :1.00  
##  1st Qu.:2.00   1st Qu.:3.00   1st Qu.:2.00   1st Qu.:1.00  
##  Median :3.00   Median :4.00   Median :3.00   Median :2.00  
##  Mean   :2.35   Mean   :3.75   Mean   :3.13   Mean   :1.76  
##  3rd Qu.:3.00   3rd Qu.:5.00   3rd Qu.:5.00   3rd Qu.:3.00  
##  Max.   :3.00   Max.   :6.00   Max.   :5.00   Max.   :3.00  
##        x1             x2             x3             x4      
##  Min.   :1.00   Min.   :1.00   Min.   :1.00   Min.   :1.00  
##  1st Qu.:2.00   1st Qu.:3.00   1st Qu.:2.00   1st Qu.:1.00  
##  Median :3.00   Median :4.00   Median :3.00   Median :2.00  
##  Mean   :2.35   Mean   :3.75   Mean   :3.13   Mean   :1.76  
##  3rd Qu.:3.00   3rd Qu.:5.00   3rd Qu.:5.00   3rd Qu.:3.00  
##  Max.   :3.00   Max.   :6.00   Max.   :5.00   Max.   :3.00

2. (Experimental) Design

Organization of the Experiment and Rationale for this design

In this sample recipe, the effects of the selected 4 factors on the diamond stone price are studied. In the first,a linear regression analysis is performed on the data to verify if the variation in diamond stone pricing are due to pure sample randomization.

Then we apply the response surface methods to the same data, estimate the residuls, coefficients and ANOVA, the optimum (stationary point) is also estimated.By doing so, we can test the hypothesis, and reach an optimum efficiently.

Randomize, Replicate and Block

As discussed in the previous section, the data are randomly selected, however, they are not randomly assigned and executed.

There are no replicated and/or blocking used in this experiment.

3. (Statistical) Analysis for the Original Data

(Exploratory Data Analysis) Graphics and descriptive summary

The boxplot shows that the diamond stone prices do vary a lot among different levels of the selected factors, indicating that the variation of diamond stone prices may due to the variation in diamond weight/diamond colour/diamond clarity/diamond certification body, or their interaction effects.

#bloxplots 
par(mfrow=c(2,2))
boxplot(price~data1$carat1,xlab="Carat",ylab="Diamond Price (dollars)")
boxplot(price~data1$colour,xlab="Colour",ylab="Diamond Price (dollars)")
boxplot(price~data1$clarity,xlab="Clarity",ylab="Diamond Price (dollars)")
boxplot(price~data1$certification,xlab="Certification",ylab="Diamond Price (dollars)")

plot of chunk unnamed-chunk-3 ### Linear Regression Testing

According to the 2nd order linear regression result, the main effect of “certification X4”" (p-value = 0.0176), the interaction effect of “colour X2” and “clarity X3” (p-value = 0.0346), and the pure quadratic effect of “carat X1” (p-value = 1.28e-7), “certification X4” (p-value = 0.0033) are shown to be statistically siginificant at a 0.05 level. Therefore we reject the null hypothesis that the variation in diamond stone price is due to sample randomization only. The estimated R square is 0.74, indicating that 74% of the variation in diamond stone prices can be explained by the selected four variables (includes their main effects, 2 factor interaction effects, and pure quadratic effects).

model1 = lm(price~(x1+x2+x3+x4)^2+I(x1^2)+I(x2^2)+I(x3^2)+I(x4^2),data=data1)
summary(model1)

## 
## Call:
## lm(formula = price ~ (x1 + x2 + x3 + x4)^2 + I(x1^2) + I(x2^2) + 
##     I(x3^2) + I(x4^2), data = data1)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -3728  -1177    -43   1164   5859 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -370.5     3001.1   -0.12   0.9018    
## x1           -2368.8     1714.5   -1.38   0.1681    
## x2            -282.6      513.4   -0.55   0.5825    
## x3             191.4      647.5    0.30   0.7677    
## x4            3514.2     1471.6    2.39   0.0176 *  
## I(x1^2)       1729.4      319.4    5.41  1.3e-07 ***
## I(x2^2)         93.9       49.2    1.91   0.0575 .  
## I(x3^2)        102.3       76.4    1.34   0.1812    
## I(x4^2)       -880.7      297.3   -2.96   0.0033 ** 
## x1:x2         -251.4      123.5   -2.04   0.0427 *  
## x1:x3         -104.6      119.8   -0.87   0.3834    
## x1:x4          -27.2      258.3   -0.11   0.9161    
## x2:x3         -118.0       55.6   -2.12   0.0346 *  
## x2:x4           42.0      105.1    0.40   0.6896    
## x3:x4          -66.3      106.2   -0.62   0.5327    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1740 on 293 degrees of freedom
## Multiple R-squared:  0.749,  Adjusted R-squared:  0.737 
## F-statistic: 62.5 on 14 and 293 DF,  p-value: <2e-16

aov(model1)

## Call:
##    aov(formula = model1)
## 
## Terms:
##                        x1        x2        x3        x4   I(x1^2)
## Sum of Squares  2.265e+09 1.078e+08 1.972e+06 4.844e+07 1.424e+08
## Deg. of Freedom         1         1         1         1         1
##                   I(x2^2)   I(x3^2)   I(x4^2)     x1:x2     x1:x3
## Sum of Squares  7.022e+06 6.224e+06 2.883e+07 3.760e+07 2.942e+06
## Deg. of Freedom         1         1         1         1         1
##                     x1:x4     x2:x3     x2:x4     x3:x4 Residuals
## Sum of Squares  4.975e+04 1.360e+07 5.035e+05 1.188e+06 8.919e+08
## Deg. of Freedom         1         1         1         1       293
## 
## Residual standard error: 1745
## Estimated effects may be unbalanced

Analysis using the Response Surface Methods

We then apply the reponse surface methods to the same data, estimate the residuals, coefficients and ANOVA. The model estimates are consistent with the 2nd order linear regression model estimates: he main effect of “certification X4”" (p-value = 0.0176), the interaction effect of “carat X1” and “colour X2” (p-value = 0.0427), “carat X1” and “colour X2” (p-value = 0.0427), “colour X2” and “clarity X3” (p-value = 0.0346), and the pure quadratic effect of “carat X1” (p-value = 1.28e-7), “certification X4” (p-value = 0.0033) are shown to be statistically siginificant at a 0.05 level. Therefore we reject the null hypothesis that the variation in diamond stone price is due to sample randomization only. The estimated R square is 0.75, indicating that 75% of the variation in diamond stone prices can be explained by the selected four variables (includes their main effects, 2 factor interaction effects, and pure quadratic effects).

The ANOVA estimates further reject the null hyphothesis by showing that the First order main effects of the selected 4 factors, their two-way interaction effects, and the pure quadractic contribute significantly to the model, so the canonical analysis is relevant. The lack of fit estimate is statistically significant at the 0.05 level, indicating that this model specification can be further improved (for example, involve higher order estimates, exlude the insignificant variables). However, since the purpose of this study is to exam all the second order effects, and the lack of fit becomes insignificant is we reduce the level to 0.01, we retain all the varaibles in the final model (both the statistically significant ones and statistically insignificant ones).

The summary for a second-order model provides results of a canonical analysis of the surface, and the stationary point of the fitted surface is at (x1=1.09, x2 = 4.21, x3 = 2.69, x4 = 1.98). The Eigenvalues are of mixed sign, indicating that it is a saddle point (neither a maximum nor a minimum). We further confirm this conclusion in the following contour plots.

model2=rsm(price~SO(x1,x2,x3,x4), data=data1)
summary(model2)

## 
## Call:
## rsm(formula = price ~ SO(x1, x2, x3, x4), data = data1)
## 
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -370.5     3001.1   -0.12   0.9018    
## x1           -2368.8     1714.5   -1.38   0.1681    
## x2            -282.6      513.4   -0.55   0.5825    
## x3             191.4      647.5    0.30   0.7677    
## x4            3514.2     1471.6    2.39   0.0176 *  
## x1:x2         -251.4      123.5   -2.04   0.0427 *  
## x1:x3         -104.6      119.8   -0.87   0.3834    
## x1:x4          -27.2      258.3   -0.11   0.9161    
## x2:x3         -118.0       55.6   -2.12   0.0346 *  
## x2:x4           42.0      105.1    0.40   0.6896    
## x3:x4          -66.3      106.2   -0.62   0.5327    
## x1^2          1729.4      319.4    5.41  1.3e-07 ***
## x2^2            93.9       49.2    1.91   0.0575 .  
## x3^2           102.3       76.4    1.34   0.1812    
## x4^2          -880.7      297.3   -2.96   0.0033 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Multiple R-squared:  0.749,  Adjusted R-squared:  0.737 
## F-statistic: 62.5 on 14 and 293 DF,  p-value: <2e-16
## 
## Analysis of Variance Table
## 
## Response: price
##                      Df   Sum Sq  Mean Sq F value  Pr(>F)
## FO(x1, x2, x3, x4)    4 2.42e+09 6.06e+08  199.02 < 2e-16
## TWI(x1, x2, x3, x4)   6 9.11e+07 1.52e+07    4.99 7.0e-05
## PQ(x1, x2, x3, x4)    4 1.49e+08 3.73e+07   12.26 3.1e-09
## Residuals           293 8.92e+08 3.04e+06                
## Lack of fit         106 3.85e+08 3.63e+06    1.34   0.042
## Pure error          187 5.07e+08 2.71e+06                
## 
## Stationary point of response surface:
##    x1    x2    x3    x4 
## 1.088 4.213 2.691 1.977 
## 
## Eigenanalysis:
## $values
## [1] 1740.47  157.39   29.29 -882.29
## 
## $vectors
##         [,1]     [,2]     [,3]     [,4]
## x1  0.996737  0.02778 -0.07562 -0.00496
## x2 -0.075129  0.65661 -0.75024  0.01888
## x3 -0.029013 -0.75281 -0.65678 -0.03279
## x4 -0.005415  0.03697  0.00775 -0.99927

aov(model2)

## Call:
##    aov(formula = model2)
## 
## Terms:
##                 FO(x1, x2, x3, x4) TWI(x1, x2, x3, x4) PQ(x1, x2, x3, x4)
## Sum of Squares           2.423e+09           9.106e+07          1.493e+08
## Deg. of Freedom                  4                   6                  4
##                 Residuals
## Sum of Squares  8.919e+08
## Deg. of Freedom       293
## 
## Residual standard error: 1745
## Estimated effects may be unbalanced

There are four response surface predictors, making six pairs of predicators. The first plot (upper left) depits a slice of the contour surface in x1 (carat) and x2 (colour), where x3 (clarity) = 2.69 and x4 (certification) = 1.98, the stationary values. The minimum (= 2000 dolloars) for this slice is at coordinates x1 = 1.09 and x2 = 4.21. The remaining 5 plots can be understood similarly, and it is obvious that no local maximum/minimum is achieved in plot 3, plot 5 and plot 6.

The 3-D interaction plots further confirmed the above findings: the estimated stationary points are saddle points (not local maximum/minimum) of surfaces in plot 3, 5 and 6.

#contour plots 
par(mfrow=c(2,3))
contour(model2, ~x1+x2+x3+x4, image=TRUE, at=summary(model2$canoncial$xs))

plot of chunk unnamed-chunk-6

persp(model2,~x1+x2,image=TRUE,contour="colors",zlab="Price",theta=30)

## Warning: "image" is not a graphical parameter
## Warning: "image" is not a graphical parameter
## Warning: "image" is not a graphical parameter

persp(model2,~x1+x3,image=TRUE,contour="colors",zlab="Price",theta=30)

## Warning: "image" is not a graphical parameter
## Warning: "image" is not a graphical parameter
## Warning: "image" is not a graphical parameter

persp(model2,~x1+x4,image=TRUE,contour="colors",zlab="Price",theta=30)

## Warning: "image" is not a graphical parameter
## Warning: "image" is not a graphical parameter
## Warning: "image" is not a graphical parameter

persp(model2,~x2+x3,image=TRUE,contour="colors",zlab="Price",theta=30)

## Warning: "image" is not a graphical parameter
## Warning: "image" is not a graphical parameter
## Warning: "image" is not a graphical parameter

persp(model2,~x2+x4,image=TRUE,contour="colors",zlab="Price",theta=30)

## Warning: "image" is not a graphical parameter
## Warning: "image" is not a graphical parameter
## Warning: "image" is not a graphical parameter

persp(model2,~x3+x4,image=TRUE,contour="colors",zlab="Price",theta=30)

## Warning: "image" is not a graphical parameter
## Warning: "image" is not a graphical parameter
## Warning: "image" is not a graphical parameter

plot of chunk unnamed-chunk-6

6. References to the literature

http://cran.r-project.org/web/packages/Ecdat/Ecdat.pdf