Specifically, the problem studies the effect of four factors (x1 = superficial fluid velocity of the gas (cs/s), x2= kinematic velocity, x3= mesh opening (cm), and x4= dimensionless number relating the superficial velocity of the gas to the fluid) on the pressure drop in a screen plate bubble column (y).
First, we need to load the data-set using the link.
url <- "https://raw.githubusercontent.com/tmatis12/datafiles/refs/heads/main/data-table-B9.csv"
dat <- read.csv(url)
dat
## x1 x2 x3 x4 y
## 1 2.14 10.00 0.34 1.000 28.9
## 2 4.14 10.00 0.34 1.000 31.0
## 3 8.15 10.00 0.34 1.000 26.4
## 4 2.14 10.00 0.34 0.246 27.2
## 5 4.14 10.00 0.34 0.379 26.1
## 6 8.15 10.00 0.34 0.474 23.2
## 7 2.14 10.00 0.34 0.141 19.7
## 8 4.14 10.00 0.34 0.234 22.1
## 9 8.15 10.00 0.34 0.311 22.8
## 10 2.14 10.00 0.34 0.076 29.2
## 11 4.14 10.00 0.34 0.132 23.6
## 12 8.15 10.00 0.34 0.184 23.6
## 13 2.14 2.63 0.34 0.679 24.2
## 14 4.14 2.63 0.34 0.804 22.1
## 15 8.15 2.63 0.34 0.890 20.9
## 16 2.14 2.63 0.34 0.514 17.6
## 17 4.14 2.63 0.34 0.672 15.7
## 18 8.15 2.63 0.34 0.801 15.8
## 19 2.14 2.63 0.34 0.346 14.0
## 20 4.14 2.63 0.34 0.506 17.1
## 21 8.15 2.63 0.34 0.669 18.3
## 22 2.14 2.63 0.34 1.000 33.8
## 23 4.14 2.63 0.34 1.000 31.7
## 24 8.15 2.63 0.34 1.000 28.1
## 25 5.60 1.25 0.34 0.848 18.1
## 26 5.60 1.25 0.34 0.737 16.5
## 27 5.60 1.25 0.34 0.651 15.4
## 28 5.60 1.25 0.34 0.554 15.0
## 29 4.30 2.63 0.34 0.748 19.1
## 30 4.30 2.63 0.34 0.682 16.2
## 31 4.30 2.63 0.34 0.524 16.3
## 32 4.30 2.63 0.34 0.472 15.8
## 33 4.30 2.63 0.34 0.398 15.4
## 34 5.60 10.10 0.25 0.789 19.2
## 35 5.60 10.10 0.25 0.677 8.4
## 36 5.60 10.10 0.25 0.590 15.0
## 37 5.60 10.10 0.25 0.523 12.0
## 38 5.60 10.10 0.34 0.789 21.9
## 39 5.60 10.10 0.34 0.677 21.3
## 40 5.60 10.10 0.34 0.590 21.6
## 41 5.60 10.10 0.34 0.523 19.8
## 42 4.30 10.10 0.34 0.741 21.6
## 43 4.30 10.10 0.34 0.617 17.3
## 44 4.30 10.10 0.34 0.524 20.0
## 45 4.30 10.10 0.34 0.457 18.6
## 46 2.40 10.10 0.34 0.615 22.1
## 47 2.40 10.10 0.34 0.473 14.7
## 48 2.40 10.10 0.34 0.381 15.8
## 49 2.40 10.10 0.34 0.320 13.2
## 50 5.60 10.10 0.55 0.789 30.8
## 51 5.60 10.10 0.55 0.677 27.5
## 52 5.60 10.10 0.55 0.590 25.2
## 53 5.60 10.10 0.55 0.523 22.8
## 54 2.14 112.00 0.34 0.680 41.7
## 55 4.14 112.00 0.34 0.803 33.7
## 56 8.15 112.00 0.34 0.889 29.7
## 57 2.14 112.00 0.34 0.514 41.8
## 58 4.14 112.00 0.34 0.672 37.1
## 59 8.15 112.00 0.34 0.801 40.1
## 60 2.14 112.00 0.34 0.306 42.7
## 61 4.14 112.00 0.34 0.506 48.6
## 62 8.15 112.00 0.34 0.668 42.4
Fit the first-order model with all two-factor interactions.
full_model<-lm(y~(x1+x2+x3+x4)^2, data=dat)
full_model
##
## Call:
## lm(formula = y ~ (x1 + x2 + x3 + x4)^2, data = dat)
##
## Coefficients:
## (Intercept) x1 x2 x3 x4 x1:x2
## 15.88376 0.18695 0.37921 -11.99940 -8.86442 0.01155
## x1:x3 x1:x4 x2:x3 x2:x4 x3:x4
## NA -1.11525 NA -0.38547 72.85976
sum_full<-summary(full_model)
sum_full
##
## Call:
## lm(formula = y ~ (x1 + x2 + x3 + x4)^2, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.4804 -3.0766 -0.6635 2.9625 12.2221
##
## Coefficients: (2 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 15.88376 23.17863 0.685 0.49616
## x1 0.18696 0.78447 0.238 0.81255
## x2 0.37921 0.06332 5.989 1.89e-07 ***
## x3 -11.99940 67.31148 -0.178 0.85919
## x4 -8.86442 35.62553 -0.249 0.80446
## x1:x2 0.01155 0.00869 1.329 0.18955
## x1:x3 NA NA NA NA
## x1:x4 -1.11525 1.14847 -0.971 0.33592
## x2:x3 NA NA NA NA
## x2:x4 -0.38547 0.11962 -3.222 0.00218 **
## x3:x4 72.85976 103.15353 0.706 0.48308
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.683 on 53 degrees of freedom
## Multiple R-squared: 0.7496, Adjusted R-squared: 0.7118
## F-statistic: 19.83 on 8 and 53 DF, p-value: 1.947e-13
Answer: x1:x3 and x2:x3
Answer: 0.7496
Answer: 19.83
Answer: Reject H0 (at least one regression coefficient is significantly different from zero)
We fit the reduced model and compare it to the full interaction model using a partial F-test to see if the interaction terms as a group are significant.
reduced_model<-lm(y~x1+x2+x3+x4, data=dat)
reduced_model
##
## Call:
## lm(formula = y ~ x1 + x2 + x3 + x4, data = dat)
##
## Coefficients:
## (Intercept) x1 x2 x3 x4
## 5.8945 -0.4779 0.1827 35.4028 5.8439
anova_test<-anova(reduced_model,full_model)
anova_test
## Analysis of Variance Table
##
## Model 1: y ~ x1 + x2 + x3 + x4
## Model 2: y ~ (x1 + x2 + x3 + x4)^2
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 57 1432.8
## 2 53 1162.4 4 270.37 3.0819 0.02352 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Answer: 3.08
Answer: 0.0235
Answer: Reject H0 (at least one interaction term is significant and should be retained).
The best-fitting model includes x2, x3, x4, and the interaction x2:x4. Variable x1 is dropped because it is not statistically significant.
best_fitting_model<-step(full_model,direction = "backward")
## Start: AIC=199.73
## y ~ (x1 + x2 + x3 + x4)^2
##
##
## Step: AIC=199.73
## y ~ x1 + x2 + x3 + x4 + x1:x2 + x1:x3 + x1:x4 + x2:x4 + x3:x4
##
##
## Step: AIC=199.73
## y ~ x1 + x2 + x3 + x4 + x1:x2 + x1:x4 + x2:x4 + x3:x4
##
## Df Sum of Sq RSS AIC
## - x3:x4 1 10.942 1173.4 198.31
## - x1:x4 1 20.682 1183.1 198.82
## <none> 1162.4 199.73
## - x1:x2 1 38.737 1201.2 199.76
## - x2:x4 1 227.751 1390.2 208.82
##
## Step: AIC=198.31
## y ~ x1 + x2 + x3 + x4 + x1:x2 + x1:x4 + x2:x4
##
## Df Sum of Sq RSS AIC
## - x1:x4 1 19.837 1193.2 197.35
## <none> 1173.4 198.31
## - x1:x2 1 38.709 1212.1 198.32
## - x2:x4 1 228.394 1401.8 207.34
## - x3 1 249.320 1422.7 208.26
##
## Step: AIC=197.35
## y ~ x1 + x2 + x3 + x4 + x1:x2 + x2:x4
##
## Df Sum of Sq RSS AIC
## - x1:x2 1 32.307 1225.5 197.01
## <none> 1193.2 197.35
## - x2:x4 1 220.026 1413.2 205.84
## - x3 1 252.209 1445.4 207.24
##
## Step: AIC=197.01
## y ~ x1 + x2 + x3 + x4 + x2:x4
##
## Df Sum of Sq RSS AIC
## - x1 1 11.262 1236.8 195.57
## <none> 1225.5 197.01
## - x2:x4 1 207.286 1432.8 204.69
## - x3 1 248.430 1473.9 206.45
##
## Step: AIC=195.57
## y ~ x2 + x3 + x4 + x2:x4
##
## Df Sum of Sq RSS AIC
## <none> 1236.8 195.57
## - x3 1 243.60 1480.4 204.72
## - x2:x4 1 245.68 1482.4 204.81
summary(best_fitting_model)
##
## Call:
## lm(formula = y ~ x2 + x3 + x4 + x2:x4, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.959 -3.358 -1.131 3.040 11.646
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.52261 4.03964 0.377 0.70763
## x2 0.38056 0.06084 6.255 5.47e-08 ***
## x3 34.51062 10.29961 3.351 0.00144 **
## x4 9.52471 2.96093 3.217 0.00214 **
## x2:x4 -0.30472 0.09056 -3.365 0.00137 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.658 on 57 degrees of freedom
## Multiple R-squared: 0.7336, Adjusted R-squared: 0.7149
## F-statistic: 39.24 on 4 and 57 DF, p-value: 9.297e-16
\[ y\sim x2+x3+x4+x2:x4 \]
best_model<-lm(y~x2+x3+x4+x2:x4, data=dat)
best_model
##
## Call:
## lm(formula = y ~ x2 + x3 + x4 + x2:x4, data = dat)
##
## Coefficients:
## (Intercept) x2 x3 x4 x2:x4
## 1.5226 0.3806 34.5106 9.5247 -0.3047
sum_best<-summary(best_model)
sum_best
##
## Call:
## lm(formula = y ~ x2 + x3 + x4 + x2:x4, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.959 -3.358 -1.131 3.040 11.646
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.52261 4.03964 0.377 0.70763
## x2 0.38056 0.06084 6.255 5.47e-08 ***
## x3 34.51062 10.29961 3.351 0.00144 **
## x4 9.52471 2.96093 3.217 0.00214 **
## x2:x4 -0.30472 0.09056 -3.365 0.00137 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.658 on 57 degrees of freedom
## Multiple R-squared: 0.7336, Adjusted R-squared: 0.7149
## F-statistic: 39.24 on 4 and 57 DF, p-value: 9.297e-16
Answer: 0.7149
Answer:-0.3047
Answer: We omit \(x1\) because it is not in the model.
pt1 <- data.frame(x2 = 10, x3 = 0.5, x4 = 0.75)
pred1 <- predict(best_model, pt1, interval="confidence", level=0.95)
pred1
## fit lwr upr
## 1 27.44168 24.00618 30.87717
Answer: 27.44
Answer: 24.01
Answer: 30.88
We predict an individual response at \(x2 = 3\), \(x3 = 0.25\), and \(x4 = 0.85\).
pt2 <- data.frame(x2 = 3, x3 = 0.25, x4 = 0.85)
pred2 <- predict(best_model, pt2, interval="prediction", level=0.95)
pred2
## fit lwr upr
## 1 18.61092 8.860183 28.36165
Answer: 18.61
Answer: 28.36
Answer: True
Answer: x1 was not statistically significant as a predictor of pressure drop and its removal improved the model based on AIC
url <- "https://raw.githubusercontent.com/tmatis12/datafiles/refs/heads/main/data-table-B9.csv"
dat <- read.csv(url)
dat
## x1 x2 x3 x4 y
## 1 2.14 10.00 0.34 1.000 28.9
## 2 4.14 10.00 0.34 1.000 31.0
## 3 8.15 10.00 0.34 1.000 26.4
## 4 2.14 10.00 0.34 0.246 27.2
## 5 4.14 10.00 0.34 0.379 26.1
## 6 8.15 10.00 0.34 0.474 23.2
## 7 2.14 10.00 0.34 0.141 19.7
## 8 4.14 10.00 0.34 0.234 22.1
## 9 8.15 10.00 0.34 0.311 22.8
## 10 2.14 10.00 0.34 0.076 29.2
## 11 4.14 10.00 0.34 0.132 23.6
## 12 8.15 10.00 0.34 0.184 23.6
## 13 2.14 2.63 0.34 0.679 24.2
## 14 4.14 2.63 0.34 0.804 22.1
## 15 8.15 2.63 0.34 0.890 20.9
## 16 2.14 2.63 0.34 0.514 17.6
## 17 4.14 2.63 0.34 0.672 15.7
## 18 8.15 2.63 0.34 0.801 15.8
## 19 2.14 2.63 0.34 0.346 14.0
## 20 4.14 2.63 0.34 0.506 17.1
## 21 8.15 2.63 0.34 0.669 18.3
## 22 2.14 2.63 0.34 1.000 33.8
## 23 4.14 2.63 0.34 1.000 31.7
## 24 8.15 2.63 0.34 1.000 28.1
## 25 5.60 1.25 0.34 0.848 18.1
## 26 5.60 1.25 0.34 0.737 16.5
## 27 5.60 1.25 0.34 0.651 15.4
## 28 5.60 1.25 0.34 0.554 15.0
## 29 4.30 2.63 0.34 0.748 19.1
## 30 4.30 2.63 0.34 0.682 16.2
## 31 4.30 2.63 0.34 0.524 16.3
## 32 4.30 2.63 0.34 0.472 15.8
## 33 4.30 2.63 0.34 0.398 15.4
## 34 5.60 10.10 0.25 0.789 19.2
## 35 5.60 10.10 0.25 0.677 8.4
## 36 5.60 10.10 0.25 0.590 15.0
## 37 5.60 10.10 0.25 0.523 12.0
## 38 5.60 10.10 0.34 0.789 21.9
## 39 5.60 10.10 0.34 0.677 21.3
## 40 5.60 10.10 0.34 0.590 21.6
## 41 5.60 10.10 0.34 0.523 19.8
## 42 4.30 10.10 0.34 0.741 21.6
## 43 4.30 10.10 0.34 0.617 17.3
## 44 4.30 10.10 0.34 0.524 20.0
## 45 4.30 10.10 0.34 0.457 18.6
## 46 2.40 10.10 0.34 0.615 22.1
## 47 2.40 10.10 0.34 0.473 14.7
## 48 2.40 10.10 0.34 0.381 15.8
## 49 2.40 10.10 0.34 0.320 13.2
## 50 5.60 10.10 0.55 0.789 30.8
## 51 5.60 10.10 0.55 0.677 27.5
## 52 5.60 10.10 0.55 0.590 25.2
## 53 5.60 10.10 0.55 0.523 22.8
## 54 2.14 112.00 0.34 0.680 41.7
## 55 4.14 112.00 0.34 0.803 33.7
## 56 8.15 112.00 0.34 0.889 29.7
## 57 2.14 112.00 0.34 0.514 41.8
## 58 4.14 112.00 0.34 0.672 37.1
## 59 8.15 112.00 0.34 0.801 40.1
## 60 2.14 112.00 0.34 0.306 42.7
## 61 4.14 112.00 0.34 0.506 48.6
## 62 8.15 112.00 0.34 0.668 42.4
full_model<-lm(y~(x1+x2+x3+x4)^2, data=dat)
full_model
##
## Call:
## lm(formula = y ~ (x1 + x2 + x3 + x4)^2, data = dat)
##
## Coefficients:
## (Intercept) x1 x2 x3 x4 x1:x2
## 15.88376 0.18695 0.37921 -11.99940 -8.86442 0.01155
## x1:x3 x1:x4 x2:x3 x2:x4 x3:x4
## NA -1.11525 NA -0.38547 72.85976
sum_full<-summary(full_model)
sum_full
##
## Call:
## lm(formula = y ~ (x1 + x2 + x3 + x4)^2, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.4804 -3.0766 -0.6635 2.9625 12.2221
##
## Coefficients: (2 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 15.88376 23.17863 0.685 0.49616
## x1 0.18696 0.78447 0.238 0.81255
## x2 0.37921 0.06332 5.989 1.89e-07 ***
## x3 -11.99940 67.31148 -0.178 0.85919
## x4 -8.86442 35.62553 -0.249 0.80446
## x1:x2 0.01155 0.00869 1.329 0.18955
## x1:x3 NA NA NA NA
## x1:x4 -1.11525 1.14847 -0.971 0.33592
## x2:x3 NA NA NA NA
## x2:x4 -0.38547 0.11962 -3.222 0.00218 **
## x3:x4 72.85976 103.15353 0.706 0.48308
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.683 on 53 degrees of freedom
## Multiple R-squared: 0.7496, Adjusted R-squared: 0.7118
## F-statistic: 19.83 on 8 and 53 DF, p-value: 1.947e-13
reduced_model<-lm(y~x1+x2+x3+x4, data=dat)
reduced_model
##
## Call:
## lm(formula = y ~ x1 + x2 + x3 + x4, data = dat)
##
## Coefficients:
## (Intercept) x1 x2 x3 x4
## 5.8945 -0.4779 0.1827 35.4028 5.8439
anova_test<-anova(reduced_model,full_model)
anova_test
## Analysis of Variance Table
##
## Model 1: y ~ x1 + x2 + x3 + x4
## Model 2: y ~ (x1 + x2 + x3 + x4)^2
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 57 1432.8
## 2 53 1162.4 4 270.37 3.0819 0.02352 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
best_fitting_model<-step(full_model,direction = "backward")
## Start: AIC=199.73
## y ~ (x1 + x2 + x3 + x4)^2
##
##
## Step: AIC=199.73
## y ~ x1 + x2 + x3 + x4 + x1:x2 + x1:x3 + x1:x4 + x2:x4 + x3:x4
##
##
## Step: AIC=199.73
## y ~ x1 + x2 + x3 + x4 + x1:x2 + x1:x4 + x2:x4 + x3:x4
##
## Df Sum of Sq RSS AIC
## - x3:x4 1 10.942 1173.4 198.31
## - x1:x4 1 20.682 1183.1 198.82
## <none> 1162.4 199.73
## - x1:x2 1 38.737 1201.2 199.76
## - x2:x4 1 227.751 1390.2 208.82
##
## Step: AIC=198.31
## y ~ x1 + x2 + x3 + x4 + x1:x2 + x1:x4 + x2:x4
##
## Df Sum of Sq RSS AIC
## - x1:x4 1 19.837 1193.2 197.35
## <none> 1173.4 198.31
## - x1:x2 1 38.709 1212.1 198.32
## - x2:x4 1 228.394 1401.8 207.34
## - x3 1 249.320 1422.7 208.26
##
## Step: AIC=197.35
## y ~ x1 + x2 + x3 + x4 + x1:x2 + x2:x4
##
## Df Sum of Sq RSS AIC
## - x1:x2 1 32.307 1225.5 197.01
## <none> 1193.2 197.35
## - x2:x4 1 220.026 1413.2 205.84
## - x3 1 252.209 1445.4 207.24
##
## Step: AIC=197.01
## y ~ x1 + x2 + x3 + x4 + x2:x4
##
## Df Sum of Sq RSS AIC
## - x1 1 11.262 1236.8 195.57
## <none> 1225.5 197.01
## - x2:x4 1 207.286 1432.8 204.69
## - x3 1 248.430 1473.9 206.45
##
## Step: AIC=195.57
## y ~ x2 + x3 + x4 + x2:x4
##
## Df Sum of Sq RSS AIC
## <none> 1236.8 195.57
## - x3 1 243.60 1480.4 204.72
## - x2:x4 1 245.68 1482.4 204.81
summary(best_fitting_model)
##
## Call:
## lm(formula = y ~ x2 + x3 + x4 + x2:x4, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.959 -3.358 -1.131 3.040 11.646
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.52261 4.03964 0.377 0.70763
## x2 0.38056 0.06084 6.255 5.47e-08 ***
## x3 34.51062 10.29961 3.351 0.00144 **
## x4 9.52471 2.96093 3.217 0.00214 **
## x2:x4 -0.30472 0.09056 -3.365 0.00137 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.658 on 57 degrees of freedom
## Multiple R-squared: 0.7336, Adjusted R-squared: 0.7149
## F-statistic: 39.24 on 4 and 57 DF, p-value: 9.297e-16
best_model<-lm(y~x2+x3+x4+x2:x4, data=dat)
best_model
##
## Call:
## lm(formula = y ~ x2 + x3 + x4 + x2:x4, data = dat)
##
## Coefficients:
## (Intercept) x2 x3 x4 x2:x4
## 1.5226 0.3806 34.5106 9.5247 -0.3047
sum_best<-summary(best_model)
sum_best
##
## Call:
## lm(formula = y ~ x2 + x3 + x4 + x2:x4, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.959 -3.358 -1.131 3.040 11.646
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.52261 4.03964 0.377 0.70763
## x2 0.38056 0.06084 6.255 5.47e-08 ***
## x3 34.51062 10.29961 3.351 0.00144 **
## x4 9.52471 2.96093 3.217 0.00214 **
## x2:x4 -0.30472 0.09056 -3.365 0.00137 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.658 on 57 degrees of freedom
## Multiple R-squared: 0.7336, Adjusted R-squared: 0.7149
## F-statistic: 39.24 on 4 and 57 DF, p-value: 9.297e-16
pt1 <- data.frame(x2 = 10, x3 = 0.5, x4 = 0.75)
pred1 <- predict(best_model, pt1, interval="confidence", level=0.95)
pred1
## fit lwr upr
## 1 27.44168 24.00618 30.87717
pt2 <- data.frame(x2 = 3, x3 = 0.25, x4 = 0.85)
pred2 <- predict(best_model, pt2, interval="prediction", level=0.95)
pred2
## fit lwr upr
## 1 18.61092 8.860183 28.36165