1. Using the data provided in data here. Read in the data to R (directly to R from the link provided, not manually) and perform rsm analysis in r in the following order.

# Read the raw text file
raw_data <- readLines("https://statacumen.com/teach/RSM/data/RSM_TAB_02-08.txt")

# Remove the first commented line
cleaned_data <- raw_data[-1]

# Convert cleaned data into a structured dataframe
rsm_data <- read.table(text = cleaned_data, header = TRUE)

head(rsm_data)

##   x1 x2  y
## 1 -1 -1 43
## 2  1 -1 78
## 3 -1  1 69
## 4  1  1 73
## 5 -9  0 48
## 6  9  0 76

(a) Replace the coded values “9” with the square root of 2 (do this in R)

rsm_data[rsm_data == 9  | rsm_data == -9] <- sqrt(2)
print(rsm_data)

##           x1        x2  y
## 1  -1.000000 -1.000000 43
## 2   1.000000 -1.000000 78
## 3  -1.000000  1.000000 69
## 4   1.000000  1.000000 73
## 5   1.414214  0.000000 48
## 6   1.414214  0.000000 76
## 7   0.000000  1.414214 65
## 8   0.000000  1.414214 74
## 9   0.000000  0.000000 76
## 10  0.000000  0.000000 79
## 11  0.000000  0.000000 83
## 12  0.000000  0.000000 81

(b) Fit a ﬁrst order model

library(rsm)

## Warning: package 'rsm' was built under R version 4.3.3

model1 <- rsm( y~FO(x1, x2), data = rsm_data)
summary(model1)

## 
## Call:
## rsm(formula = y ~ FO(x1, x2), data = rsm_data)
## 
##             Estimate Std. Error t value  Pr(>|t|)    
## (Intercept)  69.2287     4.3251 16.0062 6.411e-08 ***
## x1            2.3193     5.0716  0.4573    0.6583    
## x2            2.7209     5.0716  0.5365    0.6046    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Multiple R-squared:  0.04824,    Adjusted R-squared:  -0.1633 
## F-statistic: 0.2281 on 2 and 9 DF,  p-value: 0.8005
## 
## Analysis of Variance Table
## 
## Response: y
##             Df  Sum Sq Mean Sq F value Pr(>F)
## FO(x1, x2)   2   85.32  42.662  0.2281 0.8005
## Residuals    9 1683.59 187.066               
## Lack of fit  4 1224.34 306.086  3.3325 0.1095
## Pure error   5  459.25  91.850               
## 
## Direction of steepest ascent (at radius 1):
##        x1        x2 
## 0.6487014 0.7610430 
## 
## Corresponding increment in original units:
##        x1        x2 
## 0.6487014 0.7610430

The predictors are not significant (p-values > 0.6), showing how they do 
not influence y.

R^2 = 0.048 (very low), meaning the model explains only 4.8% of the variation in 
𝑦
y.

Lack of fit p-value = 0.1095 (not significant), so the model is acceptable but weak
- only the intercept (p < 0.001) shows a statistical significance, unlike all 
the other terms ie x1 with a P-value (0.6583) and that of x2 P-value (0.6046) 

- The summary of this models indicate that there is an insignificant Lack of fit 
(P = 0.1095), suggesting the model fits the data well

- Based on the reading from Multiple R-squared The model explains 4.824% of the
variability in y are explained by the predictor variables

- The F-statisticsvalue together with its P-value (0.2281, p = 0.8005) Suggests
the overall model is not significant      

The Corresponding increment tells us x2 should be increased more than x1 for
better response

(c) Fit a ﬁrst order model with two-way interactions

model2 <- rsm(y ~ FO(x1, x2) + TWI(x1, x2), data = rsm_data)
summary(model2)

## 
## Call:
## rsm(formula = y ~ FO(x1, x2) + TWI(x1, x2), data = rsm_data)
## 
##             Estimate Std. Error t value  Pr(>|t|)    
## (Intercept)  69.2287     4.2476 16.2985 2.021e-07 ***
## x1            2.3193     4.9807  0.4657    0.6539    
## x2            2.7209     4.9807  0.5463    0.5998    
## x1:x2        -7.7500     6.7160 -1.1540    0.2818    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Multiple R-squared:  0.1841, Adjusted R-squared:  -0.1219 
## F-statistic: 0.6015 on 3 and 8 DF,  p-value: 0.632
## 
## Analysis of Variance Table
## 
## Response: y
##             Df  Sum Sq Mean Sq F value Pr(>F)
## FO(x1, x2)   2   85.32   42.66  0.2365 0.7947
## TWI(x1, x2)  1  240.25  240.25  1.3316 0.2818
## Residuals    8 1443.34  180.42               
## Lack of fit  3  984.09  328.03  3.5714 0.1022
## Pure error   5  459.25   91.85               
## 
## Stationary point of response surface:
##        x1        x2 
## 0.3510873 0.2992614 
## 
## Eigenanalysis:
## eigen() decomposition
## $values
## [1]  3.875 -3.875
## 
## $vectors
##          [,1]       [,2]
## x1 -0.7071068 -0.7071068
## x2  0.7071068 -0.7071068

The intercept (69.2287) is the predicted value of y when both x1 and x2 are zero.

both x1 (p = 0.6539) and  x2 (p = 0.5998) remains insignificant.

Interaction term (x1:x2 = -7.75, p = 0.2818) is not statistically significant 
at the 5% level (p > 0.05).

suggesting that the combined effect of x1 and x2 does not significantly impact y.

Multiple R-squared = 0.1841  The model now explains 18.41% variability in y, 
which is better than the first-order model (4.824%).

Adjusted R-squared = -0.1219 - After adjusting for predictors, still an 
improvement.

F-statistic = 0.6015, p = 0.632 still shows that the overrall model is still
insignificant.

Adding interaction improves fit, but not strongly.
FO(x1, x2) p-value = 0.7947 - The main effects (x1 and x2) are insignificant.

TWI(x1, x2) p-value = 0.2818 - The interaction term is not significant.

Lack of fit p-value = 0.1022 - Still insignificant lack of fit.

One positive and one negative eigenvalue indicate a saddle point.
no clear maximum or minimum.

(d) Fit a second order model

model3 <- rsm(y ~ FO(x1, x2) + TWI(x1, x2) + PQ(x1, x2), data = rsm_data)
summary(model3)

## 
## Call:
## rsm(formula = y ~ FO(x1, x2) + TWI(x1, x2) + PQ(x1, x2), data = rsm_data)
## 
##             Estimate Std. Error t value  Pr(>|t|)    
## (Intercept)  79.7500     5.0384 15.8285 4.034e-06 ***
## x1            7.2500     4.5994  1.5763   0.16603    
## x2            2.7500     4.5994  0.5979   0.57177    
## x1:x2        -7.7500     5.0384 -1.5382   0.17492    
## x1^2        -12.2338     4.8239 -2.5361   0.04431 *  
## x2^2         -5.3018     4.8239 -1.0991   0.31387    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Multiple R-squared:  0.6556, Adjusted R-squared:  0.3686 
## F-statistic: 2.284 on 5 and 6 DF,  p-value: 0.1718
## 
## Analysis of Variance Table
## 
## Response: y
##             Df Sum Sq Mean Sq F value  Pr(>F)
## FO(x1, x2)   2  85.32   42.66  0.4201 0.67489
## TWI(x1, x2)  1 240.25  240.25  2.3660 0.17492
## PQ(x1, x2)   2 834.09  417.05  4.1071 0.07521
## Residuals    6 609.25  101.54                
## Lack of fit  1 150.00  150.00  1.6331 0.25739
## Pure error   5 459.25   91.85                
## 
## Stationary point of response surface:
##         x1         x2 
## 0.27868009 0.05566335 
## 
## Eigenanalysis:
## eigen() decomposition
## $values
## [1]  -3.568852 -13.966682
## 
## $vectors
##          [,1]       [,2]
## x1  0.4082427 -0.9128734
## x2 -0.9128734 -0.4082427

Except for the intercept (p < 0.001) and first quadratic term (x1^2 p<0.05) 
which shows a statistical significance, all the other terms remains insignificant

Multiple R-squared = 0.6556 - The model now explains 65.56% of y, which is 
better than the first-order model (4.8%).

Adjusted R-squared = 36.86% - After adjusting for predictors, still an 
improvement.

F-statistic = 2.284, p = 0.1718
The model is still overally insignificance.

FO(x1, x2) p-value = 0.67489 - The main effects (x1 and x2) are insignificant.

TWI(x1, x2) p-value = 0.17492 - The interaction term is not significant.

Lack of fit p-value = 0.25739 - Still insignificant lack of fit model fits 
well the data

both eigen value are negative indicating that the stationary point is a  maximum

(e) Fit a second order model without interactions

model4 <- rsm(y ~ FO(x1, x2) + PQ(x1, x2), data = rsm_data)
summary(model4)

## 
## Call:
## rsm(formula = y ~ FO(x1, x2) + PQ(x1, x2), data = rsm_data)
## 
##             Estimate Std. Error t value  Pr(>|t|)    
## (Intercept)  79.7500     5.5081 14.4786 1.788e-06 ***
## x1            7.2500     5.0282  1.4419   0.19254    
## x2            2.7500     5.0282  0.5469   0.60143    
## x1^2        -12.2338     5.2736 -2.3198   0.05341 .  
## x2^2         -5.3018     5.2736 -1.0053   0.34821    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Multiple R-squared:  0.5198, Adjusted R-squared:  0.2453 
## F-statistic: 1.894 on 4 and 7 DF,  p-value: 0.2164
## 
## Analysis of Variance Table
## 
## Response: y
##             Df Sum Sq Mean Sq F value  Pr(>F)
## FO(x1, x2)   2  85.32   42.66  0.3515 0.71535
## PQ(x1, x2)   2 834.09  417.05  3.4365 0.09125
## Residuals    7 849.50  121.36                
## Lack of fit  2 390.25  195.12  2.1244 0.21489
## Pure error   5 459.25   91.85                
## 
## Stationary point of response surface:
##        x1        x2 
## 0.2963113 0.2593470 
## 
## Eigenanalysis:
## eigen() decomposition
## $values
## [1]  -5.301777 -12.233757
## 
## $vectors
##    [,1] [,2]
## x1    0   -1
## x2   -1    0

only the intercept (p < 0.001) shows a statistical significance, unlike all the
other terms .

Quadratic terms x1^2 is marginally significant (p = 0.0534)

Multiple R-squared = 0.5198  The model explains only 51.98% of the variation in y,
which is slightly better than the first-order model.

Adjusted R-squared = 0.2453  After adjusting for predictors, the model explains
only 24% of the variation, indicating overfitting.

In this model the Lack of fit (p-value = 0.21489) is still insignificant.

both eigen value are negative indicating that the stationary point is a  maximum

(f) Obtain diagnostics and plots of the estimated response surfaces for the best model.

# Residual diagnostics
par(mfrow = c(2,2))
plot(model3$fitted.values, model3$residuals)
abline(h=0, col="gray75")
plot(rsm_data$x1, model3$residuals)
abline(h=0, col="gray75")
plot(rsm_data$x2, model3$residuals)
abline(h=0, col="gray75")

# Response surface visualization
par(mfrow=c(1,1))

contour(model3, ~x1 + x2, image = TRUE, xlabs = c("Factor1", "Factor2"))

persp(model3, x1 ~ x2, col = terrain.colors(50), contours = "colors", 
      zlab = "Response")

# Predict response at stationary point
optimal_point <- data.frame(x1 = 0.361, x2 = 0.257)
predict(model3, optimal_point)

##        1 
## 80.41049

Linear models 2 - activity 2

C029/402442/2023 - Monicah Njeri Mwangi

2025-03-28

1. Using the data provided in data here. Read in the data to R (directly to R from the link provided, not manually) and perform rsm analysis in r in the following order.

(a) Replace the coded values “9” with the square root of 2 (do this in R)

(b) Fit a ﬁrst order model

(c) Fit a ﬁrst order model with two-way interactions

(d) Fit a second order model

(e) Fit a second order model without interactions

(f) Obtain diagnostics and plots of the estimated response surfaces for the best model.