Setting up RMarkdown when opening it enables you to create dynamic, reproducible, and visually appealing reports, presentations, and documents, that can help you communicate your data analysis and research findings more effectively.
library(FrF2)
library(DoE.base)
library(conf.design)
library(gridBase)
This experiment is designed to enable us to estimate the interaction and quadratic effects, and thus give us an idea of the shape of the response surface we are investigating. For this reason, this experiment is termed response surface method (RSM) designs. RSM designs are used for the following objectives:
“There is no effect of the variability in each factor/independent variable used (Cattle manure, Poultry manure, and Chicken manure) on the overall maize yield (response variable)”
RSM <- read.csv("optimization.csv")
attach(RSM)
head(RSM,5)
S.No Cattle.Manure..kg. Poultry.Manure..kg. Chicken.Manure..kg.
1 1 0.5 1.0 1.5
2 2 1.0 1.0 1.0
3 3 0.5 0.5 1.0
4 4 1.5 0.5 0.5
5 5 1.0 1.5 0.5
Maize.Yield..kg.ha.
1 450
2 480
3 400
4 520
5 530
summary(RSM)
S.No Cattle.Manure..kg. Poultry.Manure..kg. Chicken.Manure..kg.
Min. : 1.00 Min. :0.2500 Min. :0.2500 Min. :0.2500
1st Qu.:10.75 1st Qu.:0.6875 1st Qu.:0.5000 1st Qu.:0.5000
Median :20.50 Median :1.0000 Median :0.7500 Median :1.0000
Mean :20.50 Mean :0.9750 Mean :0.8812 Mean :0.9187
3rd Qu.:30.25 3rd Qu.:1.2500 3rd Qu.:1.2500 3rd Qu.:1.2500
Max. :40.00 Max. :1.7500 Max. :1.7500 Max. :1.7500
Maize.Yield..kg.ha.
Min. :400.0
1st Qu.:450.0
Median :480.0
Mean :484.8
3rd Qu.:522.5
Max. :570.0
str(RSM)
'data.frame': 40 obs. of 5 variables:
$ S.No : int 1 2 3 4 5 6 7 8 9 10 ...
$ Cattle.Manure..kg. : num 0.5 1 0.5 1.5 1 0.5 1.5 1 0.5 1 ...
$ Poultry.Manure..kg.: num 1 1 0.5 0.5 1.5 1.5 0.5 0.5 0.5 1.5 ...
$ Chicken.Manure..kg.: num 1.5 1 1 0.5 0.5 1.5 1.5 0.5 0.5 1.5 ...
$ Maize.Yield..kg.ha.: int 450 480 400 520 530 460 500 510 420 490 ...
# Generate 3 plots.
par(mfrow=c(2,2))
par(bg = "lightblue")
qqnorm(RSM$Maize.Yield..kg.ha.)
qqline(RSM$Maize.Yield..kg.ha., col = 2)
boxplot(RSM$Maize.Yield..kg.ha., horizontal=TRUE, main="Box Plot", xlab="Maize Yield")
hist(RSM$Maize.Yield..kg.ha., main="Histogram", xlab="Maize Yield")
par(mfrow=c(1,1))
par(mfrow=c(2,3))
par(bg = "lightblue")
plot(RSM$Maize.Yield..kg.ha.~RSM$Cattle.Manure..kg.,xlab="Cattle Manure (kgs)",ylab="Maize Yield (kgs)")
plot(RSM$Maize.Yield..kg.ha.~RSM$Poultry.Manure..kg.,xlab="Poultry Manure (kgs)",ylab="Maize Yield (kgs)")
plot(RSM$Maize.Yield..kg.ha.~RSM$Chicken.Manure..kg.,xlab="Chicken Manure (kgs)",ylab="Maize Yield (kgs")
Result:There is no clear a normal distribution in the way the response variable is distributed around the mean in the histogram. However, the boxplot does indicate some outliers. Most part the data is not normally distributed (as is also evident by the Normal Q-Q plot).
par(bg = "lightblue")
boxplot(RSM$Maize.Yield..kg.ha.~Cattle.Manure..kg., data=RSM, main="Maize Yield Production", xlab="Cattle Manure",ylab="Yields")
par(bg = "lightblue")
boxplot(RSM$Maize.Yield..kg.ha.~Poultry.Manure..kg., data=RSM, main="Maize Yield Production", xlab="Poultry Manure",ylab="Yields")
par(bg = "lightblue")
boxplot(RSM$Maize.Yield..kg.ha.~Chicken.Manure..kg., data=RSM, main="Maize Yield Production", xlab="Chicken Manure",ylab="Yields")
par(mfrow=c(2,3))
par(bg = "lightblue")
boxplot(RSM$Maize.Yield..kg.ha.~Cattle.Manure..kg., data=RSM, main="Maize Yield Production", xlab="Cattle Manure",ylab="Yields")
boxplot(RSM$Maize.Yield..kg.ha.~Poultry.Manure..kg., data=RSM, main="Maize Yield Production", xlab="Poultry Manure",ylab="Yields")
boxplot(RSM$Maize.Yield..kg.ha.~Chicken.Manure..kg., data=RSM, main="Maize Yield Production", xlab="Chicken Manure",ylab="Yields")
Result: All the 4 factors show variability at all levels and can be assumed to have an impact on the response variable. However, we cannot derive anything conclusive from this exploratory data analysis. It only provides us with the direction of our next level analysis.
As mentioned before, Response-surface methodology consists of a collection of methods for exploring the optimum operating conditions through experimental methods. Essentially, this involves doing several experiments, and then using the results of one experiment to provide direction for the next steps.
The next action could be to perform the experiment around a different set of conditions, or to collect more data in the current experimental domain in order to fit a higher-order model or simply to confirm to what we have found.
In the next step we fit a response surface model to the given data set. This is an extension of lm, and works almost like it; however, the model formula for rsm must make use of the special functions FO, TWI, PQ, or SO (for first-order,two-way interaction, pure quadratic and second-order respectively), because the presence of these specifies the response-surface portion of the model. (Reference : Response-Surface Methods in R, Using rsm Updated to version 2.00, 5 December 2012)
library(rsm)
par(bg = "lightblue")
model=rsm(Maize.Yield..kg.ha. ~ SO(Cattle.Manure..kg.,Poultry.Manure..kg.,Chicken.Manure..kg.), data=RSM)
summary(model)
Call:
rsm(formula = Maize.Yield..kg.ha. ~ SO(Cattle.Manure..kg., Poultry.Manure..kg.,
Chicken.Manure..kg.), data = RSM)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 363.6796 33.0300 11.0106 4.641e-12
Cattle.Manure..kg. 165.3665 48.9877 3.3757 0.002051
Poultry.Manure..kg. -13.6060 58.8281 -0.2313 0.818665
Chicken.Manure..kg. 6.4031 50.4366 0.1270 0.899824
Cattle.Manure..kg.:Poultry.Manure..kg. 39.0315 30.3162 1.2875 0.207772
Cattle.Manure..kg.:Chicken.Manure..kg. 9.8622 25.3042 0.3897 0.699477
Poultry.Manure..kg.:Chicken.Manure..kg. -18.3766 20.8779 -0.8802 0.385750
Cattle.Manure..kg.^2 -63.2669 21.8825 -2.8912 0.007072
Poultry.Manure..kg.^2 11.1209 28.0637 0.3963 0.694709
Chicken.Manure..kg.^2 1.8019 22.1198 0.0815 0.935616
(Intercept) ***
Cattle.Manure..kg. **
Poultry.Manure..kg.
Chicken.Manure..kg.
Cattle.Manure..kg.:Poultry.Manure..kg.
Cattle.Manure..kg.:Chicken.Manure..kg.
Poultry.Manure..kg.:Chicken.Manure..kg.
Cattle.Manure..kg.^2 **
Poultry.Manure..kg.^2
Chicken.Manure..kg.^2
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Multiple R-squared: 0.8004, Adjusted R-squared: 0.7406
F-statistic: 13.37 on 9 and 30 DF, p-value: 2.8e-08
Analysis of Variance Table
Response: Maize.Yield..kg.ha.
Df Sum Sq
FO(Cattle.Manure..kg., Poultry.Manure..kg., Chicken.Manure..kg.) 3 60383
TWI(Cattle.Manure..kg., Poultry.Manure..kg., Chicken.Manure..kg.) 3 3079
PQ(Cattle.Manure..kg., Poultry.Manure..kg., Chicken.Manure..kg.) 3 5855
Residuals 30 17281
Lack of fit 29 17081
Pure error 1 200
Mean Sq
FO(Cattle.Manure..kg., Poultry.Manure..kg., Chicken.Manure..kg.) 20127.5
TWI(Cattle.Manure..kg., Poultry.Manure..kg., Chicken.Manure..kg.) 1026.5
PQ(Cattle.Manure..kg., Poultry.Manure..kg., Chicken.Manure..kg.) 1951.6
Residuals 576.0
Lack of fit 589.0
Pure error 200.0
F value
FO(Cattle.Manure..kg., Poultry.Manure..kg., Chicken.Manure..kg.) 34.9419
TWI(Cattle.Manure..kg., Poultry.Manure..kg., Chicken.Manure..kg.) 1.7819
PQ(Cattle.Manure..kg., Poultry.Manure..kg., Chicken.Manure..kg.) 3.3880
Residuals
Lack of fit 2.9450
Pure error
Pr(>F)
FO(Cattle.Manure..kg., Poultry.Manure..kg., Chicken.Manure..kg.) 6.464e-10
TWI(Cattle.Manure..kg., Poultry.Manure..kg., Chicken.Manure..kg.) 0.17185
PQ(Cattle.Manure..kg., Poultry.Manure..kg., Chicken.Manure..kg.) 0.03074
Residuals
Lack of fit 0.43541
Pure error
Stationary point of response surface:
Cattle.Manure..kg. Poultry.Manure..kg. Chicken.Manure..kg.
4.098227 5.388745 14.486290
Eigenanalysis:
eigen() decomposition
$values
[1] 19.438973 -1.008559 -68.774591
$vectors
[,1] [,2] [,3]
Cattle.Manure..kg. -0.1858209 -0.1905947 0.96392130
Poultry.Manure..kg. -0.8917497 -0.3792430 -0.24689505
Chicken.Manure..kg. 0.4126173 -0.9054548 -0.09949148
par(mfrow=c(1,1))
par(bg = "lightblue")
contour(model, ~Cattle.Manure..kg.+Poultry.Manure..kg.+Chicken.Manure..kg., image=TRUE, at=summary(model$canonical$xs))
par(mfrow=c(1,1))
par(bg = "lightblue")
persp(model, ~ Cattle.Manure..kg.+Poultry.Manure..kg., image = TRUE,at = c(summary(model)$canonical$xs, Block="B2"),theta=30,zlab="Maize Yield (kgs)",col.lab=33,contour="colors")
par(mfrow=c(1,1))
par(bg = "lightblue")
persp(model, ~ Cattle.Manure..kg.+Chicken.Manure..kg., image = TRUE,at = c(summary(model)$canonical$xs, Block="B2"),theta=30,zlab="Maize Yield (kgs)",col.lab=33,contour="colors")
par(mfrow=c(1,1))
par(bg = "lightblue")
persp(model, ~ Chicken.Manure..kg.+Poultry.Manure..kg., image = TRUE,at = c(summary(model)$canonical$xs, Block="B2"),theta=30,zlab="Maize Yield (kgs)",col.lab=33,contour="colors",cex.lab = 0.8)
par(mfrow=c(1,1))
par(bg = "lightblue")
contour(model, ~Cattle.Manure..kg.+Poultry.Manure..kg., image=TRUE, at=summary(model$canonical$xs))
persp(model, ~ Poultry.Manure..kg.+Cattle.Manure..kg., image = TRUE,at = c(summary(model)$canonical$xs, Block="B2"),theta=30,zlab="Maize Yield (kgs)",col.lab=33,contour="colors")
par(mfrow=c(1,1))
par(bg = "lightblue")
contour(model, ~Cattle.Manure..kg.+Poultry.Manure..kg., image=TRUE, at=summary(model$canonical$xs))
persp(model, ~ Cattle.Manure..kg.+Poultry.Manure..kg., image = TRUE,at = c(summary(model)$canonical$xs, Block="B2"),theta=30,zlab="Maize Yield (kgs)",col.lab=33,contour="colors")
For further investigation we can use steepest descent method in case we encounter a saddle point.In the second-order case, the steepest function still works, but it uses the ridge analysis method which is the analog of steepest ascent/descent in the sense that for a specified distance d, it finds the point at which the predicted response is a maximum/minimum among all predictor combinations at radius d.
Model adequacy checking is done to test the validity of our model given the numerous assumptions it is based on. The shapiro test evaluates the Null hypothesis such that “the samples come from a Normal distribution” against the alternative hypothesis “the samples do not come from a Normal distribution”. The Q-Q Norm plots also test the normality of the sample. The scatter plots of the residuals and the fitted model will check the distribution of the model over the entire dynamic range.
steepest(model)
Path of steepest ascent from ridge analysis:
dist Cattle.Manure..kg. Poultry.Manure..kg. Chicken.Manure..kg. | yhat
1 0.0 0.000 0.000 0.000 | 363.680
2 0.5 0.497 0.026 0.053 | 430.976
3 1.0 0.912 0.392 0.118 | 473.193
4 1.5 1.132 0.985 -0.011 | 510.718
5 2.0 1.275 1.527 -0.203 | 554.738
6 2.5 1.395 2.034 -0.406 | 607.617
7 3.0 1.506 2.521 -0.612 | 669.814
8 3.5 1.611 2.997 -0.818 | 741.471
9 4.0 1.713 3.466 -1.024 | 822.737
10 4.5 1.813 3.930 -1.231 | 913.665
11 5.0 1.912 4.391 -1.438 | 1014.378
canonical.path(model, dist = seq(-5, 5, by = 0.5))
dist Cattle.Manure..kg. Poultry.Manure..kg. Chicken.Manure..kg. | yhat
1 -5.0 1.989 3.802 -2.011 | 941.956
2 -4.5 1.896 3.356 -1.805 | 849.607
3 -4.0 1.803 2.910 -1.598 | 766.942
4 -3.5 1.710 2.464 -1.392 | 694.040
5 -3.0 1.617 2.018 -1.186 | 630.857
6 -2.5 1.524 1.572 -0.979 | 577.383
7 -2.0 1.432 1.127 -0.773 | 533.750
8 -1.5 1.339 0.681 -0.567 | 499.713
9 -1.0 1.246 0.235 -0.361 | 475.395
10 -0.5 1.153 -0.211 -0.154 | 460.818
11 0.0 1.060 -0.657 0.052 | 455.948
12 0.5 0.967 -1.103 0.258 | 460.797
13 1.0 0.874 -1.549 0.465 | 475.411
14 1.5 0.781 -1.995 0.671 | 499.707
15 2.0 0.688 -2.440 0.877 | 533.665
16 2.5 0.595 -2.886 1.084 | 577.452
17 3.0 0.502 -3.332 1.290 | 630.898
18 3.5 0.410 -3.778 1.496 | 694.044
19 4.0 0.317 -4.224 1.703 | 767.018
20 4.5 0.224 -4.670 1.909 | 849.626
21 5.0 0.131 -5.116 2.115 | 941.954
shapiro.test(RSM$Maize.Yield..kg.ha.)
Shapiro-Wilk normality test
data: RSM$Maize.Yield..kg.ha.
W = 0.95898, p-value = 0.1546
par(bg = "lightblue")
qqnorm(residuals(model))
qqline(residuals(model))
par(bg = "lightblue")
plot(fitted(model),residuals(model))
From the results of the ridge analysis for points within the radius of 5 in the canonical path, we see that the best point for next experimentation is at a distance of 1.5. The Shapiro-Wilk normality test does not return a p-value which is very small (less than 0.1) which shows that the data is not perfectly normally distributed.The Normal Q-Q plot conforms normality for the residuals. The scatter plot also indicates no trend i.e. there is uniform scatter over the entire dynamic range.
Surface response methodology usually involves two or more quantitative factors. For example, if you are studying the effect of temperature and pressure on a chemical reaction, you will need to define the ranges of values for both temperature and pressure.
For each quantitative factor, you need to define the minimum and maximum values. The range of values should cover the expected operating range of the factors. For example, if the temperature of the reaction is expected to range from 50 to 150 degrees Celsius, you might choose a minimum value of 50 and a maximum value of 150 for the temperature factor.
Once you have defined the range of values for each factor, you can generate a design matrix that specifies the combinations of factor levels to be tested. There are several types of design matrices that can be used for surface response methodology, including central composite designs and Box-Behnken designs. These designs are often generated using statistical software such as SAS, MINITAB, STATISTICA, or Design Expert.
To complete the dataset, you need to calculate the response variable for each combination of factor levels specified in the design matrix. The response variable is the outcome of interest, such as yield, purity, or efficiency. The response variable can be calculated using experimental data or by using a mathematical model.
Each run in the design matrix specifies a combination of factor levels to be tested. The first four runs test the minimum and maximum levels of each factor, while the remaining runs test the midpoint of each factor and the interaction between the factors. The response variable would need to be calculated for each of these nine runs to complete the dataset.
library(pacman)
p_load(data.table, stringr, magrittr, foreach, ggplot2, zoo, knitr, plyr,
dplyr, ggthemes, readxl, stringi, Hmisc, tidyr, rockchalk, stargazer, rsm)
rsm_data <- read.csv("rsm2.csv")
attach(rsm_data)
head(rsm_data,5)
Run Pressure H2.WF6 Uniformity Stress CodedPress CodedH2.WF6
1 1 80.00 6.00 4.6 8.04 1.00 0.00
2 2 42.00 6.00 6.2 7.78 0.00 0.00
3 3 68.87 3.17 3.4 7.58 0.71 -0.71
4 4 15.13 8.83 6.9 7.27 -0.71 0.71
5 5 4.00 6.00 7.3 6.49 -1.00 0.00
To perform the steps outlined above in R, you can use the rsm package, which is designed to provide R support for standard response-surface methods. Here are the steps to create a sample dataset for surface response methodology using the rsm package:
You can install the rsm package using the following code: install.packages(“rsm”). You can then load the package using the following code: library(rsm).
You can use the ccd.pick() function to generate a central composite design with a specified number of factors and levels. For example, to generate a central composite design with two factors, temperature and pressure, and three levels for each factor, you can use the following
This example uses experimental data published in Czitrom and Spagon, (1997), Statistical Case Studies for Industrial Process Improvement. The material is copyrighted by the American Statistical Association and the Society for Industrial and Applied Mathematics, and is used with their permission. Specifically, Chapter 15, titled “Elimination of TiN Peeling During Exposure to CVD Tungsten Deposition Process Using Designed Experiments”, describes a semiconductor wafer processing experiment (labeled Experiment 2).
The goal of this experiment was to fit response surface models to the two responses, deposition layer Uniformity and deposition layer Stress, as a function of two particular controllable factors of the chemical vapor deposition (CVD) reactor process. These factors were Pressure (measured in torr) and the ratio of the gaseous reactants H2 and WF6 (called H2/WF6). The experiment also included an important third (categorical) response - the presence or absence of titanium nitride (TiN) peeling. The third response has been omitted in this example in order to focus on the response surface aspects of the experiment. To summarize, the goal is to obtain a response surface model for two responses, Uniformity and Stress. The factors are: Pressure and H2/WF6.
The minimum and maximum values chosen for Pressure were 4 torr and 80 torr (0.5333 kPa and 10.6658 kPa). Although the international system of units indicates that the standard unit for pressure is Pascal, or 1 N/m2, we use torr to be consistent with the analysis appearing in the paper by Czitrom and Spagon. The minimum and maximum H2/WF6 ratios were chosen to be 2 and 10. Since response curvature, especially for Uniformity, was a distinct possibility, an experimental design that allowed estimating a second order (quadratic) model was needed. The experimenters decided to use a central composite inscribed (CCI) design. For two factors, this design is typically recommended to have 13 runs with 5 centerpoint runs. However, the experimenters, perhaps to conserve a limited supply of wafer resources, chose to include only 3 centerpoint runs. The design is still rotatable, but the uniform precision property has been sacrificed.
The table below shows the CCI design and experimental responses, in the order in which they were run (presumably randomized). The last two columns show coded values of the factors. (The reader can download the data as a text file.)
Uniformity is calculated from four-point probe sheet resistance measurements made at 49 different locations across a wafer. The value in the table is the standard deviation of the 49 measurements divided by their mean, expressed as a percentage. So a smaller value of Uniformity indicates a more uniform layer - hence, lower values are desirable. The Stress calculation is based on an optical measurement of wafer bow, and again lower values are more desirable.
The steps for fitting a response surface (second-order or quadratic) model are as follows: * Fit the full model to the first response. * Use stepwise regression, forward selection, or backward elimination to identify important variables. * When selecting variables for inclusion in the model, follow the hierarchy principle and keep all main effects that are part of significant higher-order terms or interactions, even if the main effect p-value is larger than you would like (note that not all analysts agree with this principle). * Generate diagnostic residual plots (histograms, box plots, normal plots, etc.) for the model selected. * Examine the fitted model plot, interaction plots, and ANOVA statistics (R2, adjusted R2, lack-of-fit test, etc.). Use all these plots and statistics to determine whether the model fit is satisfactory. * Use contour plots of the response surface to explore the effect of changing factor levels on the response. * Repeat all the above steps for the second response variable. * After satisfactory models have been fit to both responses, you can overlay the surface contours for both responses. * Find optimal factor settings.
data <- read.csv("rsm2.csv")
m <- data
head(m,5)
Run Pressure H2.WF6 Uniformity Stress CodedPress CodedH2.WF6
1 1 80.00 6.00 4.6 8.04 1.00 0.00
2 2 42.00 6.00 6.2 7.78 0.00 0.00
3 3 68.87 3.17 3.4 7.58 0.71 -0.71
4 4 15.13 8.83 6.9 7.27 -0.71 0.71
5 5 4.00 6.00 7.3 6.49 -1.00 0.00
run = m[,1]
pressure = m[,2]
h2 = m[,3]
uniformity = m[,4]
stress = m[,5]
ph2 = pressure*h2
press2 = pressure*pressure
h22 = h2*h2
cpress = m[,6]
ch2 = m[,7]
cph = cpress*ch2
ch22 = ch2*ch2
cpress2 = cpress*cpress
df <- data.frame(run,pressure,h2,uniformity,stress,cpress,ch2,
ph2,press2,h22,cph,ch22,cpress2)
head(df,5)
run pressure h2 uniformity stress cpress ch2 ph2 press2 h22
1 1 80.00 6.00 4.6 8.04 1.00 0.00 480.0000 6400.0000 36.0000
2 2 42.00 6.00 6.2 7.78 0.00 0.00 252.0000 1764.0000 36.0000
3 3 68.87 3.17 3.4 7.58 0.71 -0.71 218.3179 4743.0769 10.0489
4 4 15.13 8.83 6.9 7.27 -0.71 0.71 133.5979 228.9169 77.9689
5 5 4.00 6.00 7.3 6.49 -1.00 0.00 24.0000 16.0000 36.0000
cph ch22 cpress2
1 0.0000 0.0000 1.0000
2 0.0000 0.0000 0.0000
3 -0.5041 0.5041 0.5041
4 -0.5041 0.5041 0.5041
5 0.0000 0.0000 1.0000
z = lm(uniformity ~ pressure+h2+ph2+press2+h22,data=df)
summary(z)
Call:
lm.default(formula = uniformity ~ pressure + h2 + ph2 + press2 +
h22, data = df)
Residuals:
1 2 3 4 5 6 7 8 9 10
0.5125 0.3334 -0.5068 0.6068 -0.6124 0.5334 0.2886 0.1751 -0.1886 -0.2752
11
-0.8666
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.137e+01 1.977e+00 5.753 0.00223 **
pressure -1.252e-01 4.663e-02 -2.684 0.04361 *
h2 -5.507e-01 5.082e-01 -1.084 0.32794
ph2 1.118e-02 4.773e-03 2.342 0.06622 .
press2 9.235e-05 4.231e-04 0.218 0.83584
h22 2.088e-03 3.817e-02 0.055 0.95850
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.7259 on 5 degrees of freedom
Multiple R-squared: 0.8707, Adjusted R-squared: 0.7415
F-statistic: 6.736 on 5 and 5 DF, p-value: 0.02823
zz = step(z,direction="both")
Start: AIC=-3.72
uniformity ~ pressure + h2 + ph2 + press2 + h22
Df Sum of Sq RSS AIC
- h22 1 0.0016 2.6362 -5.7141
- press2 1 0.0251 2.6597 -5.6164
<none> 2.6346 -3.7207
- h2 1 0.6189 3.2535 -3.3997
- ph2 1 2.8900 5.5246 2.4245
- pressure 1 3.7961 6.4307 4.0950
Step: AIC=-5.71
uniformity ~ pressure + h2 + ph2 + press2
Df Sum of Sq RSS AIC
- press2 1 0.0236 2.6598 -7.6162
<none> 2.6362 -5.7141
+ h22 1 0.0016 2.6346 -3.7207
- ph2 1 2.8900 5.5262 0.4277
- h2 1 3.0077 5.6439 0.6595
- pressure 1 3.9601 6.5963 2.3748
Step: AIC=-7.62
uniformity ~ pressure + h2 + ph2
Df Sum of Sq RSS AIC
<none> 2.6598 -7.6162
+ press2 1 0.0236 2.6362 -5.7141
+ h22 1 0.0001 2.6597 -5.6164
- ph2 1 2.8900 5.5498 -1.5255
- h2 1 3.0077 5.6675 -1.2946
- pressure 1 7.9682 10.6280 5.6216
Start: AIC=-3.79 Model: Uniformity ~ Pressure + H2/WF6 + Pressure*H2/WF6 + Pressure^2 + H2/WF6^2
Step 1: Remove H2/WF6^2, AIC=-5.79 Model: Uniformity ~ Pressure + H2/WF6 + Pressure*H2/WF6 + Pressure^2
Step 2: Remove Pressure^2, AIC=-7.69 Model: Uniformity ~ Pressure + H2/WF6 + Pressure*H2/WF6
Step 3: Remove H2/WF6, AIC=-8.88 Model: Uniformity ~ Pressure + Pressure*H2/WF6
The stepwise routine selects a model containing the intercept, Pressure, and the interaction term. However, many statisticians do not think an interaction term should be included in a model unless both main effects are also included. Thus, we will use the model from Step 2 that included Pressure, H2/WF6, and the interaction term. Interaction plots confirm the need for an interaction term in the model.
anova(update(zz,~1),zz)
Analysis of Variance Table
Model 1: uniformity ~ 1
Model 2: uniformity ~ pressure + h2 + ph2
Res.Df RSS Df Sum of Sq F Pr(>F)
1 10 20.3818
2 7 2.6598 3 17.722 15.547 0.001772 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(zz)
Call:
lm.default(formula = uniformity ~ pressure + h2 + ph2, data = df)
Residuals:
Min 1Q Median 3Q Max
-0.9273 -0.3932 0.1479 0.3920 0.6295
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 11.195193 1.186352 9.437 3.13e-05 ***
pressure -0.117395 0.025636 -4.579 0.00255 **
h2 -0.525696 0.186849 -2.813 0.02602 *
ph2 0.011178 0.004053 2.758 0.02818 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.6164 on 7 degrees of freedom
Multiple R-squared: 0.8695, Adjusted R-squared: 0.8136
F-statistic: 15.55 on 3 and 7 DF, p-value: 0.001772
anova(zz)
Analysis of Variance Table
Response: uniformity
Df Sum Sq Mean Sq F value Pr(>F)
pressure 1 14.6296 14.6296 38.5022 0.0004431 ***
h2 1 0.2024 0.2024 0.5326 0.4891950
ph2 1 2.8900 2.8900 7.6059 0.0281821 *
Residuals 7 2.6598 0.3800
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
lof = factor(paste(pressure,h2,ph2))
inner.model = lm(uniformity ~ pressure + h2 + ph2, data = df)
outer.model = lm(uniformity ~ lof)
anova(inner.model, outer.model)
Analysis of Variance Table
Model 1: uniformity ~ pressure + h2 + ph2
Model 2: uniformity ~ lof
Res.Df RSS Df Sum of Sq F Pr(>F)
1 7 2.6598
2 2 1.1467 5 1.5131 0.5278 0.7559
par(mfrow=c(1,1),bg=rgb(1,1,0.8))
plot(predict(zz),df$uniformity,ylab="Observed Uniformity",
xlab="Predicted Uniformity", col=4, pch=19)
### Add regression line and confidence bounds to the plot.
rline = lm(predict(zz)~df$uniformity)
abline(rline)
bnds = data.frame(predict.lm(rline, interval = "confidence"),uniformity)
bnds = bnds[order(bnds$fit),]
lines(bnds$uniformity,bnds[,2],col=2)
lines(bnds$uniformity,bnds[,3],col=2)
par(mfrow=c(1,1))
qef = z$coef
qef = qef[-1]
## Sort effects and save labels.
sef = qef[order(qef)]
qlab = names(sef)
qlab=c("H2/WF6","Press","Press^2","H2/WF6^2","Press*H2/WF6")
## Generate theoretical quantiles.
ip = ppoints(length(sef))
zp = qnorm(ip)
## Generate normal probability plot of all effects (excluding the
## intercept).
par(mfrow=c(1,1),bg=rgb(1,1,0.8))
plot(zp, sef, pch=19,
ylab="Parameter Estimate", xlab="Theoretical Quantiles",
main="Normal Probability Plot of Parameter Estimates")
qqline(sef, col=2)
abline(h=0, col=4)
## Add labels for effects.
small2 = c(1:(length(sef)-3))
text(zp[small2],sef[small2],label=qlab[small2],pos=4,cex=1)
text(zp[-small2],sef[-small2],label=qlab[-small2],pos=2,cex=1)
## Generate interaction plots.
dfp = subset(df,pressure==15.13|pressure==68.87)
par(mfrow=c(1,1), bg=rgb(1,1,0.8), mar=c(5, 8, 2, 4))
interaction.plot(dfp$h2, dfp$pressure, dfp$uniformity, fun=mean,
type="b", pch=c(21,21), col=4,
xlab="H2/W6", ylab="Uniformity", trace.lab="Pressure")
interaction.plot(dfp$pressure, dfp$h2, dfp$uniformity, fun=mean,
type="b", pch=c(21,21), col=4,
trace.lab="H2/W6", ylab="Uniformity", xlab="Pressure")
| Source | DF | Sum of Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| Model | 3 | 17.739 | 5.9130 | 15.66 | 0.0017 |
| Total error | 7 | 2.643 | 0.3776 |
Lack-of-fit 5 1.4963 0.2993 0.52 0.7588 Pure error 2 1.1467 0.5734
Residual standard error: 0.6145 based on 7 degrees of freedom Multiple R-squared: 0.8703 Adjusted R-squared: 0.8148
| Source | Estimate | Std. Error | t value | Pr(>|t|) |
|---|---|---|---|---|
| Intercept | 5.9273 | 0.1853 | 31.993 | 7.54e-09 |
| Pressure | -1.9097 | 0.3066 | -6.228 | 0.000433 |
| H2/WF6 | -0.2241 | 0.3066 | -0.731 | 0.488607 |
| Pressure*H2/WF6 | 1.6862 | 0.6095 | 2.767 | 0.027829 |
xord = seq(4,80,by=4)
yord = seq(2,10,by=.5)
model = function (a, b){
zz$coefficients[1] +
zz$coefficients[2]*a +
zz$coefficients[3]*b +
zz$coefficients[4]*a*b}
pmatu = outer(xord,yord,model)
A contour plot and perspective plot of Uniformity provide a visual display of the response surface.
par(mfrow=c(1,1), bg=rgb(1,1,0.8))
contour(xord, yord, pmatu, nlevels=30, xlab="Pressure", ylab="H2/WF6",
col="blue")
par(mfrow=c(1,1))
par(mfrow=c(1,1), bg=rgb(1,1,0.8))
persp(xord, yord, pmatu, xlab="Pressure", ylab="H2/WF6", zlab="Uniformity",
theta=30, phi=30, ticktype="detailed", col="lightblue")
par(mfrow=c(1,1))
Residual plots We perform a residuals analysis to validate the model assumptions. We generate a normal plot, a box plot, a histogram and a run-order plot of the residuals.
par(mfrow=c(2,2), bg=rgb(1,1,0.8))
qqnorm(zz$residuals)
qqline(zz$residuals, col = 2)
abline(h=0)
boxplot(zz$residuals, horizontal=TRUE, main="Box Plot", xlab="Residual")
hist(zz$residuals, main="Histogram", xlab="Residual")
plot(run, zz$residuals, xlab="Actual Run Order", ylab="Residual",
main="Run Order Plot", col=4, pch=19)
par(mfrow=c(1,1))
The residual plots do not indicate problems with the underlying assumptions.
From the above output, we make the following conclusions. * The R2 is reasonable for fitting Uniformity (well known to be a difficult response to model). * The lack-of-fit test is not significant (very small “Prob > F” would indicate a lack of fit). * The residual plots do not reveal any major violations of the underlying assumptions. * The interaction plot shows why an interaction term is needed (parallel lines would suggest no interaction).
q = lm(stress ~ pressure+h2+ph2+press2+h22,data=df)
summary(q)
Call:
lm.default(formula = stress ~ pressure + h2 + ph2 + press2 +
h22, data = df)
Residuals:
1 2 3 4 5 6 7 8
0.042447 -0.010004 -0.019295 0.009363 -0.032516 -0.100004 0.033712 -0.012256
9 10 11
-0.043644 0.022198 0.109996
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.427e+00 2.081e-01 26.073 1.55e-06 ***
pressure 4.748e-02 4.909e-03 9.671 0.000201 ***
h2 1.951e-01 5.350e-02 3.646 0.014803 *
ph2 4.603e-04 5.025e-04 0.916 0.401714
press2 -3.670e-04 4.454e-05 -8.240 0.000429 ***
h22 -7.498e-03 4.019e-03 -1.866 0.121067
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.07642 on 5 degrees of freedom
Multiple R-squared: 0.9919, Adjusted R-squared: 0.9838
F-statistic: 122.3 on 5 and 5 DF, p-value: 3.192e-05
stepq = step(q,direction="both")
Start: AIC=-53.25
stress ~ pressure + h2 + ph2 + press2 + h22
Df Sum of Sq RSS AIC
- ph2 1 0.00490 0.03410 -53.539
<none> 0.02920 -53.245
- h22 1 0.02033 0.04954 -49.433
- h2 1 0.07766 0.10686 -40.975
- press2 1 0.39658 0.42579 -25.769
- pressure 1 0.54626 0.57547 -22.455
Step: AIC=-53.54
stress ~ pressure + h2 + press2 + h22
Df Sum of Sq RSS AIC
<none> 0.03410 -53.539
+ ph2 1 0.00490 0.02920 -53.245
- h22 1 0.02033 0.05444 -50.395
- h2 1 0.11110 0.14521 -39.602
- press2 1 0.39658 0.43069 -27.643
- pressure 1 0.98214 1.01625 -18.200
qq = lm(stress ~ pressure+h2+press2,data=df)
anova(qq,q)
Analysis of Variance Table
Model 1: stress ~ pressure + h2 + press2
Model 2: stress ~ pressure + h2 + ph2 + press2 + h22
Res.Df RSS Df Sum of Sq F Pr(>F)
1 7 0.054435
2 5 0.029203 2 0.025232 2.1601 0.2108
anova(update(qq,~1),qq)
Analysis of Variance Table
Model 1: stress ~ 1
Model 2: stress ~ pressure + h2 + press2
Res.Df RSS Df Sum of Sq F Pr(>F)
1 10 3.6001
2 7 0.0544 3 3.5456 151.98 9.839e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(qq)
Call:
lm.default(formula = stress ~ pressure + h2 + press2, data = df)
Residuals:
Min 1Q Median 3Q Max
-0.07576 -0.04519 -0.02985 0.04699 0.16647
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.567e+00 1.056e-01 52.706 2.32e-10 ***
pressure 4.819e-02 4.286e-03 11.241 9.84e-06 ***
h2 1.244e-01 1.102e-02 11.293 9.55e-06 ***
press2 -3.426e-04 4.912e-05 -6.974 0.000217 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.08818 on 7 degrees of freedom
Multiple R-squared: 0.9849, Adjusted R-squared: 0.9784
F-statistic: 152 on 3 and 7 DF, p-value: 9.839e-07
anova(qq)
Analysis of Variance Table
Response: stress
Df Sum Sq Mean Sq F value Pr(>F)
pressure 1 2.17573 2.17573 279.783 6.676e-07 ***
h2 1 0.99166 0.99166 127.521 9.550e-06 ***
press2 1 0.37822 0.37822 48.637 0.0002165 ***
Residuals 7 0.05444 0.00778
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The stepwise routine identifies a model containing the intercept, the main effects, and both squared terms. However, the fit of the full quadratic model indicates that neither the H2/WF6 squared term nor the interaction term are significant. A comparison of the full model and the model containing just the main effects and squared pressure terms indicates that there is no significant difference between the two models.
lof = factor(paste(pressure,h2,ph2))
inner.model = lm(stress ~ pressure + h2 + press2, data = df)
outer.model = lm(stress ~ lof)
anova(inner.model, outer.model)
Analysis of Variance Table
Model 1: stress ~ pressure + h2 + press2
Model 2: stress ~ lof
Res.Df RSS Df Sum of Sq F Pr(>F)
1 7 0.054435
2 2 0.022200 5 0.032235 0.5808 0.7301
## Plot actual versus predicted.
par(mfrow=c(1,1), bg=rgb(1,1,0.8))
plot(predict(qq),df$stress,ylab="Observed Stress",
xlab="Predicted Stress", pch=19, col=4)
## Add regression line and confidence bounds to the plot.
rline = lm(predict(qq)~df$stress)
abline(rline)
bnds = data.frame(predict.lm(rline, interval = "confidence"), stress)
bnds = bnds[order(bnds$fit),]
lines(bnds$stress,bnds[,2], col=2)
lines(bnds$stress,bnds[,3], col=2)
par(mfrow=c(1,1))
qef = q$coef
qef = qef[-1]
## Sort effects and save labels.
sef = qef[order(qef)]
qlab = names(sef)
qlab=c("H2/WF6^2","Press^2","Press*H2/WF6","Press","H2/WF6")
ip = ppoints(length(sef))
zp = qnorm(ip)
par(mfrow=c(1,1),bg=rgb(1,1,0.8))
plot(zp, sef, pch=19,
ylab="Parameter Estimate", xlab="Theoretical Quantiles",
main="Normal Probability Plot of Parameter Estimates")
## Add labels for effects.
small2 = c(1:(length(sef)-2))
text(zp[small2],sef[small2],label=qlab[small2],pos=4,cex=1)
text(zp[-small2],sef[-small2],label=qlab[-small2],pos=2,cex=1)
par(mfrow=c(1,1))
dfp = subset(df,pressure==15.13|pressure==68.87)
par(mfrow=c(2,1), bg=rgb(1,1,0.8), mar=c(5, 8, 2, 4))
interaction.plot(dfp$h2, dfp$pressure, dfp$stress, fun=mean,
type="b", pch=c(21,21), col=2,
xlab="H2/W6", ylab="Stress", trace.lab="Pressure")
interaction.plot(dfp$pressure, dfp$h2, dfp$stress, fun=mean,
type="b", pch=c(21,21), col=2,
trace.lab="H2/W6", ylab="Stress", xlab="Pressure")
par(mfrow=c(1,1), mar=c(5, 4, 4, 2)+0.1)
Thus, we will proceed with the model containing main effects and the squared pressure term. The fact that the stepwise procedure selected a model for Stress containing a term that was not significant indicates that all output generated by statistical software should be carefully examined. In this case, the stepwise procedure identified the model with the lowest AIC (Akaike information criterion), but did not take into account contributions by individual terms. Other software using a different criteria may identify a different model, so it is important to understand the algorithms being used.
xord = seq(4,80,by=4)
yord = seq(2,10,by=.5)
model = function (a, b){
qq$coefficients[1] +
qq$coefficients[2]*a +
qq$coefficients[3]*b +
qq$coefficients[4]*a*a}
pmats = outer(xord,yord,model)
par(mfrow=c(1,1), bg=rgb(1,1,0.8))
contour(xord, yord, pmats, nlevels=30, xlab="Pressure", ylab="H2/WF6",
col="red")
par(mfrow=c(1,1))
par(mfrow=c(1,1), bg=rgb(1,1,0.8))
persp(xord, yord, pmats, xlab="Pressure", ylab="H2/WF6", zlab="Stress",
theta=30, phi=30, ticktype="detailed", col="lightblue")
par(mfrow=c(1,1))
We perform a residuals analysis to validate the model by generating a run-order plot, box plot, histogram, and normal probability plot of the residuals.
par(mfrow=c(2,2), bg=rgb(1,1,0.8))
qqnorm(qq$residuals)
qqline(qq$residuals, col = 2)
abline(h=0)
boxplot(qq$residuals, horizontal=TRUE, main="Box Plot", xlab="Residual")
hist(qq$residuals, main="Histogram", xlab="Residual")
plot(run, qq$residuals, xlab="Actual Run Order", ylab="Residual",
main="Run Order Plot", pch=19, col=4)
par(mfrow=c(1,1))
The residual plots do not indicate any major violations of the underlying assumptions.
From the above output, we make the following conclusions. * The R2 is very good for fitting Stress. * The lack-of-fit test is not significant (very small “Prob > F” would indicate a lack of fit). * The residual plots do not reveal any major violations of the underlying assumptions. * The nearly parallel lines in the interaction plots show why an interaction term is not needed.
We overlay the contour plots for the two responses to visually compare the surfaces over the region of interest.
par(mfrow=c(1,1), bg=rgb(1,1,0.8))
contour(xord, yord, pmatu, nlevels=20, xlab="Pressure", ylab="H2/WF6",
col="blue")
contour(xord, yord, pmats, nlevels=15, xlab="Pressure", ylab="H2/WF6",
col="red",add=TRUE)
par(mfrow=c(1,1))
The response surface models fit to (coded) Uniformity and Stress were: Uniformity = 5.93 - 1.91Pressure - 0.22H2/WF6 + 1.70PressureH2/WF6 Stress = 7.73 + 0.74Pressure + 0.50H2/WF6 - 0.49Pressure2*
###Trade-offs are often needed for multiple responses
The models and the corresponding contour plots show that trade-offs have
to be made when trying to achieve low values for both Uniformity and
Stress since a high value of Pressure is good for Uniformity while a low
value of Pressure is good for Stress. While low values of H2/WF6 are
good for both responses, the situation is further complicated by the
fact that the Peeling response (not considered in this analysis) was
unacceptable for values of H2/WF6 below approximately 5.
In this case, the experimenters chose to focus on optimizing Uniformity while keeping H2/WF6 at 5. That meant setting Pressure at 80 torr.
A set of 16 verification runs at the chosen conditions confirmed that all goals, except those for the Stress response, were met by this set of process settings.