Set up Rstudio

Setting up RMarkdown when opening it enables you to create dynamic, reproducible, and visually appealing reports, presentations, and documents, that can help you communicate your data analysis and research findings more effectively.

PART ONE

Application of Response Surface Methodology and Central Composite Design to Optimize Maize Production Using Cattle Manure, Poultry Manure and Chicken Manure

Load the required libraries

library(FrF2)
library(DoE.base)
library(conf.design)
library(gridBase)

Setting

Design

This experiment is designed to enable us to estimate the interaction and quadratic effects, and thus give us an idea of the shape of the response surface we are investigating. For this reason, this experiment is termed response surface method (RSM) designs. RSM designs are used for the following objectives:

1.Find improved or optimal process settings

2.Troubleshoot process problems and weak points

3. Make a product or process more robust (insensitive) against external and non-controllable influences.

The test: Null Hypothesis can be stated as:

“There is no effect of the variability in each factor/independent variable used (Cattle manure, Poultry manure, and Chicken manure) on the overall maize yield (response variable)”

Data Summary and Preliminary analysis

RSM <- read.csv("optimization.csv")
attach(RSM)
head(RSM,5)
  S.No Cattle.Manure..kg. Poultry.Manure..kg. Chicken.Manure..kg.
1    1                0.5                 1.0                 1.5
2    2                1.0                 1.0                 1.0
3    3                0.5                 0.5                 1.0
4    4                1.5                 0.5                 0.5
5    5                1.0                 1.5                 0.5
  Maize.Yield..kg.ha.
1                 450
2                 480
3                 400
4                 520
5                 530

Summary of the original data

summary(RSM)
      S.No       Cattle.Manure..kg. Poultry.Manure..kg. Chicken.Manure..kg.
 Min.   : 1.00   Min.   :0.2500     Min.   :0.2500      Min.   :0.2500     
 1st Qu.:10.75   1st Qu.:0.6875     1st Qu.:0.5000      1st Qu.:0.5000     
 Median :20.50   Median :1.0000     Median :0.7500      Median :1.0000     
 Mean   :20.50   Mean   :0.9750     Mean   :0.8812      Mean   :0.9187     
 3rd Qu.:30.25   3rd Qu.:1.2500     3rd Qu.:1.2500      3rd Qu.:1.2500     
 Max.   :40.00   Max.   :1.7500     Max.   :1.7500      Max.   :1.7500     
 Maize.Yield..kg.ha.
 Min.   :400.0      
 1st Qu.:450.0      
 Median :480.0      
 Mean   :484.8      
 3rd Qu.:522.5      
 Max.   :570.0      

Check the structure of the dataset

str(RSM)
'data.frame':   40 obs. of  5 variables:
 $ S.No               : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Cattle.Manure..kg. : num  0.5 1 0.5 1.5 1 0.5 1.5 1 0.5 1 ...
 $ Poultry.Manure..kg.: num  1 1 0.5 0.5 1.5 1.5 0.5 0.5 0.5 1.5 ...
 $ Chicken.Manure..kg.: num  1.5 1 1 0.5 0.5 1.5 1.5 0.5 0.5 1.5 ...
 $ Maize.Yield..kg.ha.: int  450 480 400 520 530 460 500 510 420 490 ...

Exploratory Data Analysis

# Generate 3 plots.
par(mfrow=c(2,2))
par(bg = "lightblue")
qqnorm(RSM$Maize.Yield..kg.ha.)
qqline(RSM$Maize.Yield..kg.ha., col = 2)
boxplot(RSM$Maize.Yield..kg.ha., horizontal=TRUE, main="Box Plot", xlab="Maize Yield")
hist(RSM$Maize.Yield..kg.ha., main="Histogram", xlab="Maize Yield")
par(mfrow=c(1,1))

par(mfrow=c(2,3))
par(bg = "lightblue")
plot(RSM$Maize.Yield..kg.ha.~RSM$Cattle.Manure..kg.,xlab="Cattle Manure (kgs)",ylab="Maize Yield (kgs)")
plot(RSM$Maize.Yield..kg.ha.~RSM$Poultry.Manure..kg.,xlab="Poultry Manure (kgs)",ylab="Maize Yield (kgs)")
plot(RSM$Maize.Yield..kg.ha.~RSM$Chicken.Manure..kg.,xlab="Chicken Manure (kgs)",ylab="Maize Yield (kgs")

Result:There is no clear a normal distribution in the way the response variable is distributed around the mean in the histogram. However, the boxplot does indicate some outliers. Most part the data is not normally distributed (as is also evident by the Normal Q-Q plot).

Generate boxplots

par(bg = "lightblue")
boxplot(RSM$Maize.Yield..kg.ha.~Cattle.Manure..kg., data=RSM, main="Maize Yield Production", xlab="Cattle Manure",ylab="Yields")

par(bg = "lightblue")
boxplot(RSM$Maize.Yield..kg.ha.~Poultry.Manure..kg., data=RSM, main="Maize Yield Production", xlab="Poultry Manure",ylab="Yields")

par(bg = "lightblue")
boxplot(RSM$Maize.Yield..kg.ha.~Chicken.Manure..kg., data=RSM, main="Maize Yield Production", xlab="Chicken Manure",ylab="Yields")

Combine Boxplots

par(mfrow=c(2,3))
par(bg = "lightblue")
boxplot(RSM$Maize.Yield..kg.ha.~Cattle.Manure..kg., data=RSM, main="Maize Yield Production", xlab="Cattle Manure",ylab="Yields")
boxplot(RSM$Maize.Yield..kg.ha.~Poultry.Manure..kg., data=RSM, main="Maize Yield Production", xlab="Poultry Manure",ylab="Yields")
boxplot(RSM$Maize.Yield..kg.ha.~Chicken.Manure..kg., data=RSM, main="Maize Yield Production", xlab="Chicken Manure",ylab="Yields")

Result: All the 4 factors show variability at all levels and can be assumed to have an impact on the response variable. However, we cannot derive anything conclusive from this exploratory data analysis. It only provides us with the direction of our next level analysis.

Response surface designs

As mentioned before, Response-surface methodology consists of a collection of methods for exploring the optimum operating conditions through experimental methods. Essentially, this involves doing several experiments, and then using the results of one experiment to provide direction for the next steps.

The next action could be to perform the experiment around a different set of conditions, or to collect more data in the current experimental domain in order to fit a higher-order model or simply to confirm to what we have found.

In the next step we fit a response surface model to the given data set. This is an extension of lm, and works almost like it; however, the model formula for rsm must make use of the special functions FO, TWI, PQ, or SO (for first-order,two-way interaction, pure quadratic and second-order respectively), because the presence of these specifies the response-surface portion of the model. (Reference : Response-Surface Methods in R, Using rsm Updated to version 2.00, 5 December 2012)

Estimate the Model

library(rsm)
par(bg = "lightblue")
model=rsm(Maize.Yield..kg.ha. ~ SO(Cattle.Manure..kg.,Poultry.Manure..kg.,Chicken.Manure..kg.), data=RSM) 
summary(model)

Call:
rsm(formula = Maize.Yield..kg.ha. ~ SO(Cattle.Manure..kg., Poultry.Manure..kg., 
    Chicken.Manure..kg.), data = RSM)

                                        Estimate Std. Error t value  Pr(>|t|)
(Intercept)                             363.6796    33.0300 11.0106 4.641e-12
Cattle.Manure..kg.                      165.3665    48.9877  3.3757  0.002051
Poultry.Manure..kg.                     -13.6060    58.8281 -0.2313  0.818665
Chicken.Manure..kg.                       6.4031    50.4366  0.1270  0.899824
Cattle.Manure..kg.:Poultry.Manure..kg.   39.0315    30.3162  1.2875  0.207772
Cattle.Manure..kg.:Chicken.Manure..kg.    9.8622    25.3042  0.3897  0.699477
Poultry.Manure..kg.:Chicken.Manure..kg. -18.3766    20.8779 -0.8802  0.385750
Cattle.Manure..kg.^2                    -63.2669    21.8825 -2.8912  0.007072
Poultry.Manure..kg.^2                    11.1209    28.0637  0.3963  0.694709
Chicken.Manure..kg.^2                     1.8019    22.1198  0.0815  0.935616
                                           
(Intercept)                             ***
Cattle.Manure..kg.                      ** 
Poultry.Manure..kg.                        
Chicken.Manure..kg.                        
Cattle.Manure..kg.:Poultry.Manure..kg.     
Cattle.Manure..kg.:Chicken.Manure..kg.     
Poultry.Manure..kg.:Chicken.Manure..kg.    
Cattle.Manure..kg.^2                    ** 
Poultry.Manure..kg.^2                      
Chicken.Manure..kg.^2                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared:  0.8004,    Adjusted R-squared:  0.7406 
F-statistic: 13.37 on 9 and 30 DF,  p-value: 2.8e-08

Analysis of Variance Table

Response: Maize.Yield..kg.ha.
                                                                  Df Sum Sq
FO(Cattle.Manure..kg., Poultry.Manure..kg., Chicken.Manure..kg.)   3  60383
TWI(Cattle.Manure..kg., Poultry.Manure..kg., Chicken.Manure..kg.)  3   3079
PQ(Cattle.Manure..kg., Poultry.Manure..kg., Chicken.Manure..kg.)   3   5855
Residuals                                                         30  17281
Lack of fit                                                       29  17081
Pure error                                                         1    200
                                                                  Mean Sq
FO(Cattle.Manure..kg., Poultry.Manure..kg., Chicken.Manure..kg.)  20127.5
TWI(Cattle.Manure..kg., Poultry.Manure..kg., Chicken.Manure..kg.)  1026.5
PQ(Cattle.Manure..kg., Poultry.Manure..kg., Chicken.Manure..kg.)   1951.6
Residuals                                                           576.0
Lack of fit                                                         589.0
Pure error                                                          200.0
                                                                  F value
FO(Cattle.Manure..kg., Poultry.Manure..kg., Chicken.Manure..kg.)  34.9419
TWI(Cattle.Manure..kg., Poultry.Manure..kg., Chicken.Manure..kg.)  1.7819
PQ(Cattle.Manure..kg., Poultry.Manure..kg., Chicken.Manure..kg.)   3.3880
Residuals                                                                
Lack of fit                                                        2.9450
Pure error                                                               
                                                                     Pr(>F)
FO(Cattle.Manure..kg., Poultry.Manure..kg., Chicken.Manure..kg.)  6.464e-10
TWI(Cattle.Manure..kg., Poultry.Manure..kg., Chicken.Manure..kg.)   0.17185
PQ(Cattle.Manure..kg., Poultry.Manure..kg., Chicken.Manure..kg.)    0.03074
Residuals                                                                  
Lack of fit                                                         0.43541
Pure error                                                                 

Stationary point of response surface:
 Cattle.Manure..kg. Poultry.Manure..kg. Chicken.Manure..kg. 
           4.098227            5.388745           14.486290 

Eigenanalysis:
eigen() decomposition
$values
[1]  19.438973  -1.008559 -68.774591

$vectors
                          [,1]       [,2]        [,3]
Cattle.Manure..kg.  -0.1858209 -0.1905947  0.96392130
Poultry.Manure..kg. -0.8917497 -0.3792430 -0.24689505
Chicken.Manure..kg.  0.4126173 -0.9054548 -0.09949148

Display the Response surface

par(mfrow=c(1,1))
par(bg = "lightblue")
contour(model, ~Cattle.Manure..kg.+Poultry.Manure..kg.+Chicken.Manure..kg., image=TRUE, at=summary(model$canonical$xs))

Generate plots taking two factors at a time

par(mfrow=c(1,1))
par(bg = "lightblue")
persp(model, ~ Cattle.Manure..kg.+Poultry.Manure..kg., image = TRUE,at = c(summary(model)$canonical$xs, Block="B2"),theta=30,zlab="Maize Yield (kgs)",col.lab=33,contour="colors")

par(mfrow=c(1,1))
par(bg = "lightblue")
persp(model, ~ Cattle.Manure..kg.+Chicken.Manure..kg., image = TRUE,at = c(summary(model)$canonical$xs, Block="B2"),theta=30,zlab="Maize Yield (kgs)",col.lab=33,contour="colors")

par(mfrow=c(1,1))
par(bg = "lightblue")
persp(model, ~ Chicken.Manure..kg.+Poultry.Manure..kg., image = TRUE,at = c(summary(model)$canonical$xs, Block="B2"),theta=30,zlab="Maize Yield (kgs)",col.lab=33,contour="colors",cex.lab = 0.8)

par(mfrow=c(1,1))
par(bg = "lightblue")
contour(model, ~Cattle.Manure..kg.+Poultry.Manure..kg., image=TRUE, at=summary(model$canonical$xs))

persp(model, ~ Poultry.Manure..kg.+Cattle.Manure..kg., image = TRUE,at = c(summary(model)$canonical$xs, Block="B2"),theta=30,zlab="Maize Yield (kgs)",col.lab=33,contour="colors")

par(mfrow=c(1,1))
par(bg = "lightblue")
contour(model, ~Cattle.Manure..kg.+Poultry.Manure..kg., image=TRUE, at=summary(model$canonical$xs))

persp(model, ~ Cattle.Manure..kg.+Poultry.Manure..kg., image = TRUE,at = c(summary(model)$canonical$xs, Block="B2"),theta=30,zlab="Maize Yield (kgs)",col.lab=33,contour="colors")

Diagnostics/Model Adequacy Checking

For further investigation we can use steepest descent method in case we encounter a saddle point.In the second-order case, the steepest function still works, but it uses the ridge analysis method which is the analog of steepest ascent/descent in the sense that for a specified distance d, it finds the point at which the predicted response is a maximum/minimum among all predictor combinations at radius d.

Model adequacy checking is done to test the validity of our model given the numerous assumptions it is based on. The shapiro test evaluates the Null hypothesis such that “the samples come from a Normal distribution” against the alternative hypothesis “the samples do not come from a Normal distribution”. The Q-Q Norm plots also test the normality of the sample. The scatter plots of the residuals and the fitted model will check the distribution of the model over the entire dynamic range.

Finding direction for further analysis

steepest(model)
Path of steepest ascent from ridge analysis:
   dist Cattle.Manure..kg. Poultry.Manure..kg. Chicken.Manure..kg. |     yhat
1   0.0              0.000               0.000               0.000 |  363.680
2   0.5              0.497               0.026               0.053 |  430.976
3   1.0              0.912               0.392               0.118 |  473.193
4   1.5              1.132               0.985              -0.011 |  510.718
5   2.0              1.275               1.527              -0.203 |  554.738
6   2.5              1.395               2.034              -0.406 |  607.617
7   3.0              1.506               2.521              -0.612 |  669.814
8   3.5              1.611               2.997              -0.818 |  741.471
9   4.0              1.713               3.466              -1.024 |  822.737
10  4.5              1.813               3.930              -1.231 |  913.665
11  5.0              1.912               4.391              -1.438 | 1014.378
canonical.path(model, dist = seq(-5, 5, by = 0.5))
   dist Cattle.Manure..kg. Poultry.Manure..kg. Chicken.Manure..kg. |    yhat
1  -5.0              1.989               3.802              -2.011 | 941.956
2  -4.5              1.896               3.356              -1.805 | 849.607
3  -4.0              1.803               2.910              -1.598 | 766.942
4  -3.5              1.710               2.464              -1.392 | 694.040
5  -3.0              1.617               2.018              -1.186 | 630.857
6  -2.5              1.524               1.572              -0.979 | 577.383
7  -2.0              1.432               1.127              -0.773 | 533.750
8  -1.5              1.339               0.681              -0.567 | 499.713
9  -1.0              1.246               0.235              -0.361 | 475.395
10 -0.5              1.153              -0.211              -0.154 | 460.818
11  0.0              1.060              -0.657               0.052 | 455.948
12  0.5              0.967              -1.103               0.258 | 460.797
13  1.0              0.874              -1.549               0.465 | 475.411
14  1.5              0.781              -1.995               0.671 | 499.707
15  2.0              0.688              -2.440               0.877 | 533.665
16  2.5              0.595              -2.886               1.084 | 577.452
17  3.0              0.502              -3.332               1.290 | 630.898
18  3.5              0.410              -3.778               1.496 | 694.044
19  4.0              0.317              -4.224               1.703 | 767.018
20  4.5              0.224              -4.670               1.909 | 849.626
21  5.0              0.131              -5.116               2.115 | 941.954

Shapiro Test

shapiro.test(RSM$Maize.Yield..kg.ha.)

    Shapiro-Wilk normality test

data:  RSM$Maize.Yield..kg.ha.
W = 0.95898, p-value = 0.1546

Plots for checking normality

par(bg = "lightblue")
qqnorm(residuals(model)) 
qqline(residuals(model))

Residuals Plot

par(bg = "lightblue")
plot(fitted(model),residuals(model))

From the results of the ridge analysis for points within the radius of 5 in the canonical path, we see that the best point for next experimentation is at a distance of 1.5. The Shapiro-Wilk normality test does not return a p-value which is very small (less than 0.1) which shows that the data is not perfectly normally distributed.The Normal Q-Q plot conforms normality for the residuals. The scatter plot also indicates no trend i.e. there is uniform scatter over the entire dynamic range.

PART TWO

Optimization of deposition layer Uniformity and deposition layer Stress, as a function of chemical vapor deposition (CVD) reactor process (Pressure (measured in torr) and the ratio of the gaseous reactants H2 and WF6; H2/WF6))Using Response Surface Methodology.

Determine the number of factors

Surface response methodology usually involves two or more quantitative factors. For example, if you are studying the effect of temperature and pressure on a chemical reaction, you will need to define the ranges of values for both temperature and pressure.

Define the range of values for each factor

For each quantitative factor, you need to define the minimum and maximum values. The range of values should cover the expected operating range of the factors. For example, if the temperature of the reaction is expected to range from 50 to 150 degrees Celsius, you might choose a minimum value of 50 and a maximum value of 150 for the temperature factor.

Generate a design matrix

Once you have defined the range of values for each factor, you can generate a design matrix that specifies the combinations of factor levels to be tested. There are several types of design matrices that can be used for surface response methodology, including central composite designs and Box-Behnken designs. These designs are often generated using statistical software such as SAS, MINITAB, STATISTICA, or Design Expert.

Calculate the response variable for each combination of factor levels

To complete the dataset, you need to calculate the response variable for each combination of factor levels specified in the design matrix. The response variable is the outcome of interest, such as yield, purity, or efficiency. The response variable can be calculated using experimental data or by using a mathematical model.

Each run in the design matrix specifies a combination of factor levels to be tested. The first four runs test the minimum and maximum levels of each factor, while the remaining runs test the midpoint of each factor and the interaction between the factors. The response variable would need to be calculated for each of these nine runs to complete the dataset.

Load the following required libraries

library(pacman)
p_load(data.table, stringr, magrittr, foreach, ggplot2, zoo, knitr, plyr,
dplyr, ggthemes, readxl, stringi, Hmisc, tidyr, rockchalk, stargazer, rsm)

Load the dataset

rsm_data <- read.csv("rsm2.csv")
attach(rsm_data)
head(rsm_data,5)
  Run Pressure H2.WF6 Uniformity Stress CodedPress CodedH2.WF6
1   1    80.00   6.00        4.6   8.04       1.00        0.00
2   2    42.00   6.00        6.2   7.78       0.00        0.00
3   3    68.87   3.17        3.4   7.58       0.71       -0.71
4   4    15.13   8.83        6.9   7.27      -0.71        0.71
5   5     4.00   6.00        7.3   6.49      -1.00        0.00

To perform the steps outlined above in R, you can use the rsm package, which is designed to provide R support for standard response-surface methods. Here are the steps to create a sample dataset for surface response methodology using the rsm package:

Install and load the rsm package

You can install the rsm package using the following code: install.packages(“rsm”). You can then load the package using the following code: library(rsm).

Define the range of values for each factor:

You can use the ccd.pick() function to generate a central composite design with a specified number of factors and levels. For example, to generate a central composite design with two factors, temperature and pressure, and three levels for each factor, you can use the following

Data Source

A CCD with two responses

This example uses experimental data published in Czitrom and Spagon, (1997), Statistical Case Studies for Industrial Process Improvement. The material is copyrighted by the American Statistical Association and the Society for Industrial and Applied Mathematics, and is used with their permission. Specifically, Chapter 15, titled “Elimination of TiN Peeling During Exposure to CVD Tungsten Deposition Process Using Designed Experiments”, describes a semiconductor wafer processing experiment (labeled Experiment 2).

Goal, response variables, and factor variables

The goal of this experiment was to fit response surface models to the two responses, deposition layer Uniformity and deposition layer Stress, as a function of two particular controllable factors of the chemical vapor deposition (CVD) reactor process. These factors were Pressure (measured in torr) and the ratio of the gaseous reactants H2 and WF6 (called H2/WF6). The experiment also included an important third (categorical) response - the presence or absence of titanium nitride (TiN) peeling. The third response has been omitted in this example in order to focus on the response surface aspects of the experiment. To summarize, the goal is to obtain a response surface model for two responses, Uniformity and Stress. The factors are: Pressure and H2/WF6.

Experiment Description

The design is a 13-run CCI design with 3 centerpoints

The minimum and maximum values chosen for Pressure were 4 torr and 80 torr (0.5333 kPa and 10.6658 kPa). Although the international system of units indicates that the standard unit for pressure is Pascal, or 1 N/m2, we use torr to be consistent with the analysis appearing in the paper by Czitrom and Spagon. The minimum and maximum H2/WF6 ratios were chosen to be 2 and 10. Since response curvature, especially for Uniformity, was a distinct possibility, an experimental design that allowed estimating a second order (quadratic) model was needed. The experimenters decided to use a central composite inscribed (CCI) design. For two factors, this design is typically recommended to have 13 runs with 5 centerpoint runs. However, the experimenters, perhaps to conserve a limited supply of wafer resources, chose to include only 3 centerpoint runs. The design is still rotatable, but the uniform precision property has been sacrificed.

Table containing the CCI design points and experimental responses

The table below shows the CCI design and experimental responses, in the order in which they were run (presumably randomized). The last two columns show coded values of the factors. (The reader can download the data as a text file.)

Low values of both responses are better than high

Uniformity is calculated from four-point probe sheet resistance measurements made at 49 different locations across a wafer. The value in the table is the standard deviation of the 49 measurements divided by their mean, expressed as a percentage. So a smaller value of Uniformity indicates a more uniform layer - hence, lower values are desirable. The Stress calculation is based on an optical measurement of wafer bow, and again lower values are more desirable.

Analysis of DOE Data

Steps for fitting a response surface model

The steps for fitting a response surface (second-order or quadratic) model are as follows: * Fit the full model to the first response. * Use stepwise regression, forward selection, or backward elimination to identify important variables. * When selecting variables for inclusion in the model, follow the hierarchy principle and keep all main effects that are part of significant higher-order terms or interactions, even if the main effect p-value is larger than you would like (note that not all analysts agree with this principle). * Generate diagnostic residual plots (histograms, box plots, normal plots, etc.) for the model selected. * Examine the fitted model plot, interaction plots, and ANOVA statistics (R2, adjusted R2, lack-of-fit test, etc.). Use all these plots and statistics to determine whether the model fit is satisfactory. * Use contour plots of the response surface to explore the effect of changing factor levels on the response. * Repeat all the above steps for the second response variable. * After satisfactory models have been fit to both responses, you can overlay the surface contours for both responses. * Find optimal factor settings.

Fitting a Model to the Uniformity Response, Simplifying the Model and Checking Residuals

Fit full quadratic model to Uniformity response

data <- read.csv("rsm2.csv")
m <- data
head(m,5)
  Run Pressure H2.WF6 Uniformity Stress CodedPress CodedH2.WF6
1   1    80.00   6.00        4.6   8.04       1.00        0.00
2   2    42.00   6.00        6.2   7.78       0.00        0.00
3   3    68.87   3.17        3.4   7.58       0.71       -0.71
4   4    15.13   8.83        6.9   7.27      -0.71        0.71
5   5     4.00   6.00        7.3   6.49      -1.00        0.00

Assign New Variables Names

run = m[,1]
pressure = m[,2]
h2 = m[,3]
uniformity = m[,4]
stress = m[,5]
ph2 = pressure*h2
press2 = pressure*pressure
h22 = h2*h2

Coded variables.

cpress = m[,6]
ch2 = m[,7]
cph = cpress*ch2
ch22 = ch2*ch2
cpress2 = cpress*cpress

Create data frame.

df <- data.frame(run,pressure,h2,uniformity,stress,cpress,ch2,
                ph2,press2,h22,cph,ch22,cpress2)
head(df,5)
  run pressure   h2 uniformity stress cpress   ch2      ph2    press2     h22
1   1    80.00 6.00        4.6   8.04   1.00  0.00 480.0000 6400.0000 36.0000
2   2    42.00 6.00        6.2   7.78   0.00  0.00 252.0000 1764.0000 36.0000
3   3    68.87 3.17        3.4   7.58   0.71 -0.71 218.3179 4743.0769 10.0489
4   4    15.13 8.83        6.9   7.27  -0.71  0.71 133.5979  228.9169 77.9689
5   5     4.00 6.00        7.3   6.49  -1.00  0.00  24.0000   16.0000 36.0000
      cph   ch22 cpress2
1  0.0000 0.0000  1.0000
2  0.0000 0.0000  0.0000
3 -0.5041 0.5041  0.5041
4 -0.5041 0.5041  0.5041
5  0.0000 0.0000  1.0000

Fit full model.

z = lm(uniformity ~ pressure+h2+ph2+press2+h22,data=df)
summary(z)

Call:
lm.default(formula = uniformity ~ pressure + h2 + ph2 + press2 + 
    h22, data = df)

Residuals:
      1       2       3       4       5       6       7       8       9      10 
 0.5125  0.3334 -0.5068  0.6068 -0.6124  0.5334  0.2886  0.1751 -0.1886 -0.2752 
     11 
-0.8666 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)   
(Intercept)  1.137e+01  1.977e+00   5.753  0.00223 **
pressure    -1.252e-01  4.663e-02  -2.684  0.04361 * 
h2          -5.507e-01  5.082e-01  -1.084  0.32794   
ph2          1.118e-02  4.773e-03   2.342  0.06622 . 
press2       9.235e-05  4.231e-04   0.218  0.83584   
h22          2.088e-03  3.817e-02   0.055  0.95850   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.7259 on 5 degrees of freedom
Multiple R-squared:  0.8707,    Adjusted R-squared:  0.7415 
F-statistic: 6.736 on 5 and 5 DF,  p-value: 0.02823

Perform stepwise regression.

zz = step(z,direction="both")
Start:  AIC=-3.72
uniformity ~ pressure + h2 + ph2 + press2 + h22

           Df Sum of Sq    RSS     AIC
- h22       1    0.0016 2.6362 -5.7141
- press2    1    0.0251 2.6597 -5.6164
<none>                  2.6346 -3.7207
- h2        1    0.6189 3.2535 -3.3997
- ph2       1    2.8900 5.5246  2.4245
- pressure  1    3.7961 6.4307  4.0950

Step:  AIC=-5.71
uniformity ~ pressure + h2 + ph2 + press2

           Df Sum of Sq    RSS     AIC
- press2    1    0.0236 2.6598 -7.6162
<none>                  2.6362 -5.7141
+ h22       1    0.0016 2.6346 -3.7207
- ph2       1    2.8900 5.5262  0.4277
- h2        1    3.0077 5.6439  0.6595
- pressure  1    3.9601 6.5963  2.3748

Step:  AIC=-7.62
uniformity ~ pressure + h2 + ph2

           Df Sum of Sq     RSS     AIC
<none>                   2.6598 -7.6162
+ press2    1    0.0236  2.6362 -5.7141
+ h22       1    0.0001  2.6597 -5.6164
- ph2       1    2.8900  5.5498 -1.5255
- h2        1    3.0077  5.6675 -1.2946
- pressure  1    7.9682 10.6280  5.6216

Stepwise regression for Uniformity

Start: AIC=-3.79 Model: Uniformity ~ Pressure + H2/WF6 + Pressure*H2/WF6 + Pressure^2 + H2/WF6^2

Step 1: Remove H2/WF6^2, AIC=-5.79 Model: Uniformity ~ Pressure + H2/WF6 + Pressure*H2/WF6 + Pressure^2

Step 2: Remove Pressure^2, AIC=-7.69 Model: Uniformity ~ Pressure + H2/WF6 + Pressure*H2/WF6

Step 3: Remove H2/WF6, AIC=-8.88 Model: Uniformity ~ Pressure + Pressure*H2/WF6

The stepwise routine selects a model containing the intercept, Pressure, and the interaction term. However, many statisticians do not think an interaction term should be included in a model unless both main effects are also included. Thus, we will use the model from Step 2 that included Pressure, H2/WF6, and the interaction term. Interaction plots confirm the need for an interaction term in the model.

generate ANOVA table for selected model

anova(update(zz,~1),zz)
Analysis of Variance Table

Model 1: uniformity ~ 1
Model 2: uniformity ~ pressure + h2 + ph2
  Res.Df     RSS Df Sum of Sq      F   Pr(>F)   
1     10 20.3818                                
2      7  2.6598  3    17.722 15.547 0.001772 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Summarize the Model

summary(zz)

Call:
lm.default(formula = uniformity ~ pressure + h2 + ph2, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.9273 -0.3932  0.1479  0.3920  0.6295 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 11.195193   1.186352   9.437 3.13e-05 ***
pressure    -0.117395   0.025636  -4.579  0.00255 ** 
h2          -0.525696   0.186849  -2.813  0.02602 *  
ph2          0.011178   0.004053   2.758  0.02818 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6164 on 7 degrees of freedom
Multiple R-squared:  0.8695,    Adjusted R-squared:  0.8136 
F-statistic: 15.55 on 3 and 7 DF,  p-value: 0.001772

Generate ANOVA Table

anova(zz)
Analysis of Variance Table

Response: uniformity
          Df  Sum Sq Mean Sq F value    Pr(>F)    
pressure   1 14.6296 14.6296 38.5022 0.0004431 ***
h2         1  0.2024  0.2024  0.5326 0.4891950    
ph2        1  2.8900  2.8900  7.6059 0.0281821 *  
Residuals  7  2.6598  0.3800                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Perform lack-of-fit test.

lof = factor(paste(pressure,h2,ph2))
inner.model = lm(uniformity ~ pressure + h2 + ph2, data = df)
outer.model = lm(uniformity ~ lof)
anova(inner.model, outer.model)
Analysis of Variance Table

Model 1: uniformity ~ pressure + h2 + ph2
Model 2: uniformity ~ lof
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1      7 2.6598                           
2      2 1.1467  5    1.5131 0.5278 0.7559

Plot actual versus predicted.

par(mfrow=c(1,1),bg=rgb(1,1,0.8))
plot(predict(zz),df$uniformity,ylab="Observed Uniformity",
       xlab="Predicted Uniformity", col=4, pch=19)


### Add regression line and confidence bounds to the plot.
rline = lm(predict(zz)~df$uniformity)
abline(rline)
bnds = data.frame(predict.lm(rline, interval = "confidence"),uniformity)
bnds = bnds[order(bnds$fit),]
lines(bnds$uniformity,bnds[,2],col=2)
lines(bnds$uniformity,bnds[,3],col=2)

par(mfrow=c(1,1))

Generate normal probability plot of the effects.

Save parameters in a vector, but remove intercept.

qef = z$coef
qef = qef[-1]
  
## Sort effects and save labels.
sef = qef[order(qef)]
qlab = names(sef)
qlab=c("H2/WF6","Press","Press^2","H2/WF6^2","Press*H2/WF6")
## Generate theoretical quantiles.
ip = ppoints(length(sef))
zp = qnorm(ip)
  
  ## Generate normal probability plot of all effects (excluding the
  ## intercept).  
par(mfrow=c(1,1),bg=rgb(1,1,0.8))
plot(zp, sef, pch=19,
       ylab="Parameter Estimate", xlab="Theoretical Quantiles",
       main="Normal Probability Plot of Parameter Estimates")
qqline(sef, col=2)
abline(h=0, col=4)

## Add labels for effects.
  small2 = c(1:(length(sef)-3))
  text(zp[small2],sef[small2],label=qlab[small2],pos=4,cex=1)
  text(zp[-small2],sef[-small2],label=qlab[-small2],pos=2,cex=1)

## Generate interaction plots.
dfp = subset(df,pressure==15.13|pressure==68.87)
  
par(mfrow=c(1,1), bg=rgb(1,1,0.8), mar=c(5, 8, 2, 4))
interaction.plot(dfp$h2, dfp$pressure, dfp$uniformity, fun=mean,
                   type="b", pch=c(21,21), col=4,
                   xlab="H2/W6", ylab="Uniformity", trace.lab="Pressure")

interaction.plot(dfp$pressure, dfp$h2, dfp$uniformity, fun=mean,
                   type="b", pch=c(21,21), col=4,
                   trace.lab="H2/W6", ylab="Uniformity", xlab="Pressure")

Analysis of model selected by stepwise regression for Uniformity

Source DF Sum of Sq Mean Sq F value Pr(>F)
Model 3 17.739 5.9130 15.66 0.0017
Total error 7 2.643 0.3776

Lack-of-fit 5 1.4963 0.2993 0.52 0.7588 Pure error 2 1.1467 0.5734

Residual standard error: 0.6145 based on 7 degrees of freedom Multiple R-squared: 0.8703 Adjusted R-squared: 0.8148

Source Estimate Std. Error t value Pr(>|t|)
Intercept 5.9273 0.1853 31.993 7.54e-09
Pressure -1.9097 0.3066 -6.228 0.000433
H2/WF6 -0.2241 0.3066 -0.731 0.488607
Pressure*H2/WF6 1.6862 0.6095 2.767 0.027829

Generate x and y data for plotting.

xord = seq(4,80,by=4)
yord = seq(2,10,by=.5)

Generate predicted response surface and generate matrix of surface.

  model = function (a, b){
    zz$coefficients[1] +
      zz$coefficients[2]*a +
      zz$coefficients[3]*b +
      zz$coefficients[4]*a*b}
  pmatu = outer(xord,yord,model)

Generate contour plot.

A contour plot and perspective plot of Uniformity provide a visual display of the response surface.

par(mfrow=c(1,1), bg=rgb(1,1,0.8))
  contour(xord, yord, pmatu, nlevels=30, xlab="Pressure", ylab="H2/WF6",
          col="blue")

par(mfrow=c(1,1))

Generate perspective plot.

  par(mfrow=c(1,1), bg=rgb(1,1,0.8))
  persp(xord, yord, pmatu, xlab="Pressure", ylab="H2/WF6", zlab="Uniformity",
        theta=30, phi=30, ticktype="detailed", col="lightblue") 

  par(mfrow=c(1,1))

Generate four plots of residuals.

Residual plots We perform a residuals analysis to validate the model assumptions. We generate a normal plot, a box plot, a histogram and a run-order plot of the residuals.

par(mfrow=c(2,2), bg=rgb(1,1,0.8))
qqnorm(zz$residuals)
qqline(zz$residuals, col = 2)
abline(h=0)
boxplot(zz$residuals, horizontal=TRUE, main="Box Plot", xlab="Residual")
hist(zz$residuals, main="Histogram", xlab="Residual")
plot(run, zz$residuals, xlab="Actual Run Order", ylab="Residual",
       main="Run Order Plot", col=4, pch=19)

par(mfrow=c(1,1))

The residual plots do not indicate problems with the underlying assumptions.

Conclusions from the analysis

From the above output, we make the following conclusions. * The R2 is reasonable for fitting Uniformity (well known to be a difficult response to model). * The lack-of-fit test is not significant (very small “Prob > F” would indicate a lack of fit). * The residual plots do not reveal any major violations of the underlying assumptions. * The interaction plot shows why an interaction term is needed (parallel lines would suggest no interaction).

Fitting a Model to the Stress Response, Simplifying the Model and Checking Residuals

q = lm(stress ~ pressure+h2+ph2+press2+h22,data=df)
  summary(q)

Call:
lm.default(formula = stress ~ pressure + h2 + ph2 + press2 + 
    h22, data = df)

Residuals:
        1         2         3         4         5         6         7         8 
 0.042447 -0.010004 -0.019295  0.009363 -0.032516 -0.100004  0.033712 -0.012256 
        9        10        11 
-0.043644  0.022198  0.109996 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  5.427e+00  2.081e-01  26.073 1.55e-06 ***
pressure     4.748e-02  4.909e-03   9.671 0.000201 ***
h2           1.951e-01  5.350e-02   3.646 0.014803 *  
ph2          4.603e-04  5.025e-04   0.916 0.401714    
press2      -3.670e-04  4.454e-05  -8.240 0.000429 ***
h22         -7.498e-03  4.019e-03  -1.866 0.121067    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.07642 on 5 degrees of freedom
Multiple R-squared:  0.9919,    Adjusted R-squared:  0.9838 
F-statistic: 122.3 on 5 and 5 DF,  p-value: 3.192e-05

Perform stepwise regression.

  stepq = step(q,direction="both")
Start:  AIC=-53.25
stress ~ pressure + h2 + ph2 + press2 + h22

           Df Sum of Sq     RSS     AIC
- ph2       1   0.00490 0.03410 -53.539
<none>                  0.02920 -53.245
- h22       1   0.02033 0.04954 -49.433
- h2        1   0.07766 0.10686 -40.975
- press2    1   0.39658 0.42579 -25.769
- pressure  1   0.54626 0.57547 -22.455

Step:  AIC=-53.54
stress ~ pressure + h2 + press2 + h22

           Df Sum of Sq     RSS     AIC
<none>                  0.03410 -53.539
+ ph2       1   0.00490 0.02920 -53.245
- h22       1   0.02033 0.05444 -50.395
- h2        1   0.11110 0.14521 -39.602
- press2    1   0.39658 0.43069 -27.643
- pressure  1   0.98214 1.01625 -18.200
qq = lm(stress ~ pressure+h2+press2,data=df)
anova(qq,q)
Analysis of Variance Table

Model 1: stress ~ pressure + h2 + press2
Model 2: stress ~ pressure + h2 + ph2 + press2 + h22
  Res.Df      RSS Df Sum of Sq      F Pr(>F)
1      7 0.054435                           
2      5 0.029203  2  0.025232 2.1601 0.2108

ANOVA for reduced model for stress

  anova(update(qq,~1),qq)
Analysis of Variance Table

Model 1: stress ~ 1
Model 2: stress ~ pressure + h2 + press2
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
1     10 3.6001                                  
2      7 0.0544  3    3.5456 151.98 9.839e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Summary of model qq

summary(qq)

Call:
lm.default(formula = stress ~ pressure + h2 + press2, data = df)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.07576 -0.04519 -0.02985  0.04699  0.16647 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  5.567e+00  1.056e-01  52.706 2.32e-10 ***
pressure     4.819e-02  4.286e-03  11.241 9.84e-06 ***
h2           1.244e-01  1.102e-02  11.293 9.55e-06 ***
press2      -3.426e-04  4.912e-05  -6.974 0.000217 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.08818 on 7 degrees of freedom
Multiple R-squared:  0.9849,    Adjusted R-squared:  0.9784 
F-statistic:   152 on 3 and 7 DF,  p-value: 9.839e-07

Generate anova table.

  anova(qq)
Analysis of Variance Table

Response: stress
          Df  Sum Sq Mean Sq F value    Pr(>F)    
pressure   1 2.17573 2.17573 279.783 6.676e-07 ***
h2         1 0.99166 0.99166 127.521 9.550e-06 ***
press2     1 0.37822 0.37822  48.637 0.0002165 ***
Residuals  7 0.05444 0.00778                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The stepwise routine identifies a model containing the intercept, the main effects, and both squared terms. However, the fit of the full quadratic model indicates that neither the H2/WF6 squared term nor the interaction term are significant. A comparison of the full model and the model containing just the main effects and squared pressure terms indicates that there is no significant difference between the two models.

Perform lack-of-fit test.

lof = factor(paste(pressure,h2,ph2))
inner.model = lm(stress ~ pressure + h2 + press2, data = df)
outer.model = lm(stress ~ lof)
anova(inner.model, outer.model)
Analysis of Variance Table

Model 1: stress ~ pressure + h2 + press2
Model 2: stress ~ lof
  Res.Df      RSS Df Sum of Sq      F Pr(>F)
1      7 0.054435                           
2      2 0.022200  5  0.032235 0.5808 0.7301
## Plot actual versus predicted.
par(mfrow=c(1,1), bg=rgb(1,1,0.8))
plot(predict(qq),df$stress,ylab="Observed Stress",
       xlab="Predicted Stress", pch=19, col=4)
  ## Add regression line and confidence bounds to the plot.
rline = lm(predict(qq)~df$stress)
abline(rline)
bnds = data.frame(predict.lm(rline, interval = "confidence"), stress)
bnds = bnds[order(bnds$fit),]
lines(bnds$stress,bnds[,2], col=2)
lines(bnds$stress,bnds[,3], col=2)

par(mfrow=c(1,1))

Generate normal probability plot of the effects.

Save parameters in a vector, but remove intercept.

  qef = q$coef
  qef = qef[-1]
  
## Sort effects and save labels.
  sef = qef[order(qef)]
  qlab = names(sef)
  qlab=c("H2/WF6^2","Press^2","Press*H2/WF6","Press","H2/WF6")

Generate theoretical quantiles.

ip = ppoints(length(sef))
  zp = qnorm(ip)

Generate normal probability plot of all effects (excluding the intercept).

par(mfrow=c(1,1),bg=rgb(1,1,0.8))
  plot(zp, sef, pch=19,
       ylab="Parameter Estimate", xlab="Theoretical Quantiles",
       main="Normal Probability Plot of Parameter Estimates")
  ## Add labels for effects.
  small2 = c(1:(length(sef)-2))
  text(zp[small2],sef[small2],label=qlab[small2],pos=4,cex=1)
  text(zp[-small2],sef[-small2],label=qlab[-small2],pos=2,cex=1)

  par(mfrow=c(1,1))

Generate interaction plots.

dfp = subset(df,pressure==15.13|pressure==68.87)
  
par(mfrow=c(2,1), bg=rgb(1,1,0.8), mar=c(5, 8, 2, 4))
interaction.plot(dfp$h2, dfp$pressure, dfp$stress, fun=mean,
                   type="b", pch=c(21,21), col=2,
                   xlab="H2/W6", ylab="Stress", trace.lab="Pressure")
  
interaction.plot(dfp$pressure, dfp$h2, dfp$stress, fun=mean,
                   type="b", pch=c(21,21), col=2,
                   trace.lab="H2/W6", ylab="Stress", xlab="Pressure")

par(mfrow=c(1,1), mar=c(5, 4, 4, 2)+0.1)

Thus, we will proceed with the model containing main effects and the squared pressure term. The fact that the stepwise procedure selected a model for Stress containing a term that was not significant indicates that all output generated by statistical software should be carefully examined. In this case, the stepwise procedure identified the model with the lowest AIC (Akaike information criterion), but did not take into account contributions by individual terms. Other software using a different criteria may identify a different model, so it is important to understand the algorithms being used.

Generate x and y data for plotting.

xord = seq(4,80,by=4)
yord = seq(2,10,by=.5)

Generate predicted response surface and generate matrix of surface.

model = function (a, b){
    qq$coefficients[1] +
      qq$coefficients[2]*a +
      qq$coefficients[3]*b +
      qq$coefficients[4]*a*a}
pmats = outer(xord,yord,model)

Generate contour plot.

par(mfrow=c(1,1), bg=rgb(1,1,0.8))
  contour(xord, yord, pmats, nlevels=30, xlab="Pressure", ylab="H2/WF6",
          col="red")

  par(mfrow=c(1,1))

Generate perspective plot.

par(mfrow=c(1,1), bg=rgb(1,1,0.8))
persp(xord, yord, pmats, xlab="Pressure", ylab="H2/WF6", zlab="Stress",
        theta=30, phi=30, ticktype="detailed", col="lightblue") 

par(mfrow=c(1,1))

Residual plots

We perform a residuals analysis to validate the model by generating a run-order plot, box plot, histogram, and normal probability plot of the residuals.

Generate four plots of residuals.

par(mfrow=c(2,2), bg=rgb(1,1,0.8))
qqnorm(qq$residuals)
qqline(qq$residuals, col = 2)
abline(h=0)
boxplot(qq$residuals, horizontal=TRUE, main="Box Plot", xlab="Residual")
hist(qq$residuals, main="Histogram", xlab="Residual")
plot(run, qq$residuals, xlab="Actual Run Order", ylab="Residual",
       main="Run Order Plot", pch=19, col=4)

par(mfrow=c(1,1))

The residual plots do not indicate any major violations of the underlying assumptions.

Conclusions

From the above output, we make the following conclusions. * The R2 is very good for fitting Stress. * The lack-of-fit test is not significant (very small “Prob > F” would indicate a lack of fit). * The residual plots do not reveal any major violations of the underlying assumptions. * The nearly parallel lines in the interaction plots show why an interaction term is not needed.

Response Surface Contours for Both Responses

Overlay contour plots

We overlay the contour plots for the two responses to visually compare the surfaces over the region of interest.

Overlay uniformity and stress contour plots.

par(mfrow=c(1,1), bg=rgb(1,1,0.8))
contour(xord, yord, pmatu, nlevels=20, xlab="Pressure", ylab="H2/WF6",
          col="blue")
contour(xord, yord, pmats, nlevels=15, xlab="Pressure", ylab="H2/WF6",
          col="red",add=TRUE)

par(mfrow=c(1,1))

Summary

Final response surface models

The response surface models fit to (coded) Uniformity and Stress were: Uniformity = 5.93 - 1.91Pressure - 0.22H2/WF6 + 1.70PressureH2/WF6 Stress = 7.73 + 0.74Pressure + 0.50H2/WF6 - 0.49Pressure2*

###Trade-offs are often needed for multiple responses
The models and the corresponding contour plots show that trade-offs have to be made when trying to achieve low values for both Uniformity and Stress since a high value of Pressure is good for Uniformity while a low value of Pressure is good for Stress. While low values of H2/WF6 are good for both responses, the situation is further complicated by the fact that the Peeling response (not considered in this analysis) was unacceptable for values of H2/WF6 below approximately 5.

Uniformity was chosen as more important

In this case, the experimenters chose to focus on optimizing Uniformity while keeping H2/WF6 at 5. That meant setting Pressure at 80 torr.

Confirmation runs validated the model projections

A set of 16 verification runs at the chosen conditions confirmed that all goals, except those for the Stress response, were met by this set of process settings.