Setting up RMarkdown enables you to create dynamic, reproducible, and visually appealing reports, presentations, and documents that can help you communicate your data analysis and research findings more effectively.
R is an open-source programming language and software environment for statistical computing and graphics. It is widely used by data analysts and statisticians for data analysis, visualization, and modeling. R provides a comprehensive set of tools for statistical analysis, including linear and nonlinear modeling, classical statistical tests, time-series analysis, and data visualization. RStudio is an integrated development environment (IDE) for R that provides a user-friendly interface for working with R. It includes a code editor, debugging tools, and data visualization tools that make it easier to work with R code. RStudio also supports the creation of interactive web applications using the Shiny package, making it a powerful tool for data visualization and analysis.
One of the key benefits of R and RStudio is their flexibility and extensibility. R has a large community of developers who contribute packages that extend its functionality, making it possible to perform a wide range of analyses with just a few lines of code. RStudio provides an interface for managing these packages and installing new ones. R and RStudio can be used for a wide range of data analysis tasks, including data cleaning and preparation, statistical modeling, and data visualization. They are particularly useful for working with large datasets, as they can handle data from a variety of sources and perform complex analyses with ease.
In summary, R and RStudio are powerful tools for statistical computing and data analysis. Their flexibility, extensibility, and user-friendly interface make them popular among data analysts and statisticians, and their ability to handle large datasets and perform complex analyses makes them essential tools for many data-driven industries.
Response Surface Methodology (RSM) is a statistical technique used to model and optimize the relationship between a response variable and one or more independent variables. RSM involves designing experiments, fitting mathematical models to the experimental data, and using the models to predict the optimal conditions for a process. RSM is particularly useful for optimizing processes that involve multiple variables, as it allows researchers to explore the relationship between the variables and determine the optimal settings for each.
RSM has been used in a wide range of fields, including chemistry, engineering, agriculture, and social sciences. In chemistry, RSM has been used to optimize the conditions for chemical reactions, such as temperature, pressure, and concentrations of reactants. In engineering, RSM has been used to optimize the parameters of manufacturing processes, such as cutting speed and feed rate. In agriculture, RSM has been used to optimize the conditions for crop growth, such as fertilizer application and irrigation.
The main goal of RSM is to find the optimal values of the independent variables that result in the maximum or minimum value of the response variable. RSM accomplishes this by constructing a response surface, which is a mathematical model that describes the relationship between the independent variables and the response variable. The response surface can be visualized as a contour plot or a three-dimensional surface, which allows researchers to identify the optimal values of the independent variables that result in the desired response.
Overall, RSM is a powerful statistical technique that is used to optimize processes and improve the efficiency of experiments. Its ability to model the relationship between multiple variables makes it a valuable tool for researchers and practitioners in a wide range of fields.
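To make the later model fits concrete: for two coded factors x1 and x2, the full second-order model used throughout this section is
y = b0 + b1*x1 + b2*x2 + b12*x1*x2 + b11*x1^2 + b22*x2^2 + e,
where the b's are coefficients estimated by least squares and e is random error. The FO (first-order), TWI (two-way interaction), and PQ (pure quadratic) terms that appear in the rsm() output below correspond to the {b1, b2}, {b12}, and {b11, b22} groups of this model.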
# read the example data, skipping the first line of the file
fn.data <- "http://statacumen.com/teach/RSM/data/RSM_TAB_02-08.txt"
df.2.8 <- read.table(fn.data, header = TRUE, skip = 1)
str(df.2.8)
'data.frame': 12 obs. of 3 variables:
$ x1: int -1 1 -1 1 -9 9 0 0 0 0 ...
$ x2: int -1 -1 1 1 0 0 -9 9 0 0 ...
$ y : int 43 78 69 73 48 76 65 74 76 79 ...
head(df.2.8,5)
x1 x2 y
1 -1 -1 43
2 1 -1 78
3 -1 1 69
4 1 1 73
5 -9 0 48
# in the raw file the axial (star) points are coded as 9 and -9;
# recode them to the actual axial distance +/- sqrt(2)
df.2.8[, c("x1","x2")] <- replace(df.2.8[, c("x1","x2")],
                                  df.2.8[, c("x1","x2")] == 9, sqrt(2))
df.2.8[, c("x1","x2")] <- replace(df.2.8[, c("x1","x2")],
                                  df.2.8[, c("x1","x2")] == -9, -sqrt(2))
head(df.2.8,5)
x1 x2 y
1 -1.000000 -1 43
2 1.000000 -1 78
3 -1.000000 1 69
4 1.000000 1 73
5 -1.414214 0 48
The scatterplot matrix below shows some of the relationships between y and the predictor variables.
library(ggplot2)
library(reshape)
# load GGally quietly; it prints masking messages on startup
suppressMessages(suppressWarnings(library(GGally)))
# scatterplot matrix; wrap() passes alpha through to the point layers
# (newer GGally versions no longer accept alpha as a direct argument)
p <- ggpairs(df.2.8, lower = list(continuous = wrap("points", alpha = 0.1)))
# put scatterplots on top so y axis is vertical
#p <- ggpairs(df.2.8, upper = list(continuous = "points")
# , lower = list(continuous = "cor")
# )
print(p)
# detach package after use so reshape2 works (old reshape (v.1) conflicts)
detach("package:GGally", unload=TRUE)
detach("package:reshape", unload=TRUE)
The correlation matrix indicates that some (linear) correlations with y differ from zero, but if curvature exists, this summary is not very meaningful.
# correlation matrix and associated p-values testing "H0: rho == 0"
library(Hmisc)
rcorr(as.matrix(df.2.8))
     x1   x2    y
x1 1.00 0.00 0.66
x2 0.00 1.00 0.28
y  0.66 0.28 1.00

n= 12

P
   x1     x2     y
x1        1.0000 0.0193
x2 1.0000        0.3718
y  0.0193 0.3718
library(corrplot)
library(dplyr)
library(magrittr)
df.2.8 %>%
  dplyr::select(x1:y) %>%
  cor() %>%
  round(3) %>%
  corrplot(method = "color", addCoef.col = "white", type = "upper",
           title = "Correlation between response variable and predictor variables",
           mar = c(0, 0, 2, 0),
           tl.cex = 0.5, number.cex = 0.4)
Because this is a special kind of model (a full second-order model), we can get the tests for higher-order terms and lack of fit simply by using rsm().
# load the rsm package
library(rsm)
# fit second-order (SO) model
# -- look up ?SO and see other options
rsm.2.8.y.SOx12 <- rsm(y ~ SO(x1, x2), data = df.2.8)
# which variables are available in the rsm object?
names(rsm.2.8.y.SOx12)
[1] "coefficients" "residuals" "effects" "rank"
[5] "fitted.values" "assign" "qr" "df.residual"
[9] "xlevels" "call" "terms" "model"
[13] "data" "b" "order" "B"
[17] "newlabs"
names(summary(rsm.2.8.y.SOx12))
[1] "call" "terms" "residuals" "coefficients"
[5] "aliased" "sigma" "df" "r.squared"
[9] "adj.r.squared" "fstatistic" "cov.unscaled" "canonical"
[13] "lof"
summary(rsm.2.8.y.SOx12)
Call:
rsm(formula = y ~ SO(x1, x2), data = df.2.8)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 79.75000 1.21361 65.7133 8.352e-10 ***
x1 9.82475 0.85815 11.4488 2.665e-05 ***
x2 4.21599 0.85815 4.9129 0.0026759 **
x1:x2 -7.75000 1.21361 -6.3859 0.0006938 ***
x1^2 -8.87500 0.95944 -9.2502 9.017e-05 ***
x2^2 -5.12500 0.95944 -5.3417 0.0017585 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Multiple R-squared: 0.98, Adjusted R-squared: 0.9634
F-statistic: 58.85 on 5 and 6 DF, p-value: 5.119e-05
Analysis of Variance Table
Response: y
Df Sum Sq Mean Sq F value Pr(>F)
FO(x1, x2) 2 914.40 457.20 77.6054 5.155e-05
TWI(x1, x2) 1 240.25 240.25 40.7801 0.0006938
PQ(x1, x2) 2 578.92 289.46 49.1328 0.0001906
Residuals 6 35.35 5.89
Lack of fit 3 8.60 2.87 0.3214 0.8119178
Pure error 3 26.75 8.92
Stationary point of response surface:
x1 x2
0.55819271 -0.01073203
Eigenanalysis:
eigen() decomposition
$values
[1] -2.695206 -11.304794
$vectors
[,1] [,2]
x1 0.5312434 -0.8472193
x2 -0.8472193 -0.5312434
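As a check on the canonical analysis above, the stationary point can be computed by hand from the b vector and B matrix stored in the fitted object. This is a minimal sketch, assuming (as the rsm documentation describes) that the fitted surface is yhat = b0 + x'b + x'Bx with B symmetrized, so the stationary point solves x_s = -(1/2) * B^{-1} b:
# b: first-order coefficients; B: symmetrized second-order coefficient matrix
xs <- -0.5 * solve(rsm.2.8.y.SOx12$B, rsm.2.8.y.SOx12$b)
xs
# both eigenvalues of B are negative, so the stationary point is a maximum
eigen(rsm.2.8.y.SOx12$B)$values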
rsm.2.8.y.SOx12$studres <- rstudent(rsm.2.8.y.SOx12)
rsm.2.8.y.SOx12$studres
1 2 3 4 5 6
-0.61072616 -0.71491975 0.71491975 0.61072616 -0.06495026 0.06495026
7 8 9 10 11 12
0.98067965 -0.98067965 -2.37660148 -0.32922247 1.81973167 0.55959278
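For reference, the externally Studentized residual that rstudent() returns for observation i is t_i = e_i / (s_(i) * sqrt(1 - h_ii)), where e_i is the raw residual, h_ii is the leverage, and s_(i) is the residual standard deviation estimated with observation i removed. Values beyond roughly +/-2, such as observation 9 above (-2.38), deserve attention in the diagnostic plots later in this section.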
The following illustrates fitting several model types using rsm().
# fit the first-order model
rsm.2.8.y.FOx12 <- rsm(y ~ FO(x1, x2), data = df.2.8)
# externally Studentized residuals
rsm.2.8.y.FOx12$studres <- rstudent(rsm.2.8.y.FOx12)
summary(rsm.2.8.y.FOx12)
Call:
rsm(formula = y ~ FO(x1, x2), data = df.2.8)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 70.4167 2.8129 25.0338 1.244e-09 ***
x1 9.8247 3.4450 2.8519 0.01903 *
x2 4.2160 3.4450 1.2238 0.25210
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Multiple R-squared: 0.5169, Adjusted R-squared: 0.4096
F-statistic: 4.815 on 2 and 9 DF, p-value: 0.03785
Analysis of Variance Table
Response: y
Df Sum Sq Mean Sq F value Pr(>F)
FO(x1, x2) 2 914.40 457.20 4.8154 0.03785
Residuals 9 854.51 94.95
Lack of fit 6 827.76 137.96 15.4722 0.02333
Pure error 3 26.75 8.92
Direction of steepest ascent (at radius 1):
x1 x2
0.9189626 0.3943447
Corresponding increment in original units:
x1 x2
0.9189626 0.3943447
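The direction of steepest ascent reported above is simply the first-order coefficient vector scaled to unit length; a quick check, using the coefficient names that rsm() assigns:
# unit vector along the gradient (b1, b2)
b <- coef(rsm.2.8.y.FOx12)[c("FO(x1, x2)x1", "FO(x1, x2)x2")]
b / sqrt(sum(b^2))
# 0.9189626 0.3943447 -- matches the summary() output above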
# fit the first-order model with the two-way interaction term
rsm.2.8.y.TWIx12 <- rsm(y ~ FO(x1, x2) + TWI(x1, x2), data = df.2.8)
# externally Studentized residuals
rsm.2.8.y.TWIx12$studres <- rstudent(rsm.2.8.y.TWIx12)
summary(rsm.2.8.y.TWIx12)
Call:
rsm(formula = y ~ FO(x1, x2) + TWI(x1, x2), data = df.2.8)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 70.4167 2.5295 27.8377 2.993e-09 ***
x1 9.8247 3.0980 3.1713 0.01317 *
x2 4.2160 3.0980 1.3609 0.21066
x1:x2 -7.7500 4.3813 -1.7689 0.11488
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Multiple R-squared: 0.6527, Adjusted R-squared: 0.5225
F-statistic: 5.013 on 3 and 8 DF, p-value: 0.03039
Analysis of Variance Table
Response: y
Df Sum Sq Mean Sq F value Pr(>F)
FO(x1, x2) 2 914.40 457.20 5.9544 0.02607
TWI(x1, x2) 1 240.25 240.25 3.1289 0.11488
Residuals 8 614.26 76.78
Lack of fit 5 587.51 117.50 13.1779 0.02966
Pure error 3 26.75 8.92
Stationary point of response surface:
x1 x2
0.5439987 1.2677094
Eigenanalysis:
eigen() decomposition
$values
[1] 3.875 -3.875
$vectors
[,1] [,2]
x1 -0.7071068 -0.7071068
x2 0.7071068 -0.7071068
# fit the pure-quadratic model (second-order without the interaction term)
rsm.2.8.y.PQx12 <- rsm(y ~ FO(x1, x2) + PQ(x1, x2), data = df.2.8)
# externally Studentized residuals
rsm.2.8.y.PQx12$studres <- rstudent(rsm.2.8.y.PQx12)
summary(rsm.2.8.y.PQx12)
Call:
rsm(formula = y ~ FO(x1, x2) + PQ(x1, x2), data = df.2.8)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 79.7500 3.1373 25.4198 3.724e-08 ***
x1 9.8247 2.2184 4.4287 0.003049 **
x2 4.2160 2.2184 1.9004 0.099141 .
x1^2 -8.8750 2.4803 -3.5782 0.008997 **
x2^2 -5.1250 2.4803 -2.0663 0.077639 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Multiple R-squared: 0.8442, Adjusted R-squared: 0.7552
F-statistic: 9.482 on 4 and 7 DF, p-value: 0.005903
Analysis of Variance Table
Response: y
Df Sum Sq Mean Sq F value Pr(>F)
FO(x1, x2) 2 914.40 457.20 11.613 0.005978
PQ(x1, x2) 2 578.92 289.46 7.352 0.019052
Residuals 7 275.60 39.37
Lack of fit 4 248.85 62.21 6.977 0.071196
Pure error 3 26.75 8.92
Stationary point of response surface:
x1 x2
0.5535069 0.4113161
Eigenanalysis:
eigen() decomposition
$values
[1] -5.125 -8.875
$vectors
[,1] [,2]
x1 0 -1
x2 -1 0
# compare the reduced first-order model to the full second-order model
anova(rsm.2.8.y.FOx12, rsm.2.8.y.SOx12)
Analysis of Variance Table
Model 1: y ~ FO(x1, x2)
Model 2: y ~ FO(x1, x2) + TWI(x1, x2) + PQ(x1, x2)
Res.Df RSS Df Sum of Sq F Pr(>F)
1 9 854.51
2 6 35.35 3 819.17 46.349 0.0001524 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## conf int for parameters
confint(rsm.2.8.y.SOx12)
2.5 % 97.5 %
(Intercept) 76.780415 82.719585
FO(x1, x2)x1 7.724934 11.924561
FO(x1, x2)x2 2.116176 6.315804
TWI(x1, x2) -10.719585 -4.780415
PQ(x1, x2)x1^2 -11.222663 -6.527337
PQ(x1, x2)x2^2 -7.472663 -2.777337
# confidence intervals for the mean response at each design point
predict(rsm.2.8.y.SOx12, df.2.8, interval = "confidence")
fit lwr upr
1 43.95926 39.26394 48.65459
2 79.10876 74.41343 83.80408
3 67.89124 63.19592 72.58657
4 72.04074 67.34541 76.73606
5 48.10571 43.41038 52.80104
6 75.89429 71.19896 80.58962
7 63.53769 58.84236 68.23302
8 75.46231 70.76698 80.15764
9 79.75000 76.78041 82.71959
10 79.75000 76.78041 82.71959
11 79.75000 76.78041 82.71959
12 79.75000 76.78041 82.71959
# prediction intervals for new observations at each design point
predict(rsm.2.8.y.SOx12, df.2.8, interval = "prediction")
fit lwr upr
1 43.95926 36.38828 51.53025
2 79.10876 71.53777 86.67974
3 67.89124 60.32026 75.46223
4 72.04074 64.46975 79.61172
5 48.10571 40.53472 55.67670
6 75.89429 68.32330 83.46528
7 63.53769 55.96670 71.10868
8 75.46231 67.89132 83.03330
9 79.75000 73.10981 86.39019
10 79.75000 73.10981 86.39019
11 79.75000 73.10981 86.39019
12 79.75000 73.10981 86.39019
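The same predict() call works for settings that were not part of the design. For example, using a hypothetical new point newpt near the stationary point:
# prediction interval at a hypothetical new design point
newpt <- data.frame(x1 = 0.56, x2 = -0.01)
predict(rsm.2.8.y.SOx12, newpt, interval = "prediction")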
The lack-of-fit (LOF) test is equivalent to a comparison between two models. First, we define the full model by setting up a categorical group variable that takes a unique value for each distinct pair of (x1, x2) values. This group variable fits a model that is equivalent to a one-way ANOVA. The SSE for the full model is 26.75 with 3 df; this is the pure-error SS and df. (Try this for yourself; a minimal sketch follows.) Second, we define the reduced model (reduced compared to the one-way ANOVA above) as the regression model fit, taking x1 and x2 as continuous variables. The SSE for the reduced model is 35.35 with 6 df; this is the residual SS. The LOF SS is the difference in SSE between the reduced and the full model: 35.35 - 26.75 = 8.60 with 6 - 3 = 3 df. The F-test then compares the LOF mean square to the pure-error mean square: F = (8.60/3)/(26.75/3) = 0.32, with 3 numerator df and 3 denominator df.
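Here is that minimal sketch; the helper objects df.lof, lm.full, and lm.red are hypothetical names, not part of the original data:
# full model: one-way ANOVA on a factor with one level per distinct (x1, x2) pair
df.lof <- transform(df.2.8, grp = factor(paste(x1, x2)))
lm.full <- lm(y ~ grp, data = df.lof)
# reduced model: the second-order regression with x1 and x2 continuous
lm.red <- lm(y ~ x1 + x2 + I(x1 * x2) + I(x1^2) + I(x2^2), data = df.lof)
# the reduced-vs-full F-test reproduces the lack-of-fit line below: F = 0.32 on 3 and 3 df
anova(lm.red, lm.full)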
summary(rsm.2.8.y.SOx12)$lof
Analysis of Variance Table
Response: y
Df Sum Sq Mean Sq F value Pr(>F)
FO(x1, x2) 2 914.40 457.20 77.6054 5.155e-05 ***
TWI(x1, x2) 1 240.25 240.25 40.7801 0.0006938 ***
PQ(x1, x2) 2 578.92 289.46 49.1328 0.0001906 ***
Residuals 6 35.35 5.89
Lack of fit 3 8.60 2.87 0.3214 0.8119178
Pure error 3 26.75 8.92
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# plot diagnostics
par(mfrow=c(2,4))
plot(df.2.8$x1, rsm.2.8.y.SOx12$studres, main="Residuals vs x1")
# horizontal line at zero
abline(h = 0, col = "gray75")
plot(df.2.8$x2, rsm.2.8.y.SOx12$studres, main="Residuals vs x2")
# horizontal line at zero
abline(h = 0, col = "gray75")
# residuals vs order of data
plot(rsm.2.8.y.SOx12$studres, main="Residuals vs Order of data")
# horizontal line at zero
abline(h = 0, col = "gray75")
plot(rsm.2.8.y.SOx12, which = c(1,4,6))
# Normality of Residuals
library(car)
qqPlot(rsm.2.8.y.SOx12$studres, las = 1, id = list(n = 3), main = "QQ Plot")
[1] 9 11
# first-order model
par(mfrow=c(2,2))
contour(rsm.2.8.y.FOx12, ~ x1 + x2, image = TRUE, main="first-order model")
persp(rsm.2.8.y.FOx12, x2 ~ x1, zlab = "y", main="first-order model")
# second-order model
contour(rsm.2.8.y.SOx12, ~ x1 + x2, image = TRUE, main="second-order model")
persp(rsm.2.8.y.SOx12, x2 ~ x1, zlab = "y", main="second-order model")
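Since both eigenvalues of the second-order fit are negative, the stationary point is an estimated maximum. As a closing sketch, the fitted response at that point can be obtained from the canonical component of the summary:
# fitted response at the stationary point (the estimated optimum settings)
xs <- summary(rsm.2.8.y.SOx12)$canonical$xs
predict(rsm.2.8.y.SOx12, newdata = data.frame(x1 = xs[1], x2 = xs[2]))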