Specifically, the problem studies the effect of four factors (x1 = superficial fluid velocity of the gas (cs/s), x2= kinematic velocity, x3= mesh opening (cm), and x4= dimensionless number relating the superficial velocity of the gas to the fluid) on the pressure drop in a screen plate bubble column (y).Round numeric answers to 2 decimal places unless otherwise stated. Use α = 0.05. The corresponding data file data-table-B9.csv is given and read into the analysis which is named “dat”.
The data contains 62 rows and 5 columns. You can then view the top values of the data:
## x1 x2 x3 x4 y
## 1 2.14 10 0.34 1.000 28.9
## 2 4.14 10 0.34 1.000 31.0
## 3 8.15 10 0.34 1.000 26.4
## 4 2.14 10 0.34 0.246 27.2
## 5 4.14 10 0.34 0.379 26.1
## 6 8.15 10 0.34 0.474 23.2
The data has four predictors (x1,x2,x3,x4) and one response (y).
Fit the first-order model(Q1) in R with all two factor interactions using the form: lm(y~(x1+x2+x3+x4)^2,data=dat))
##
## Call:
## lm(formula = y ~ (x1 + x2 + x3 + x4)^2, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.4804 -3.0766 -0.6635 2.9625 12.2221
##
## Coefficients: (2 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 15.88376 23.17863 0.685 0.49616
## x1 0.18696 0.78447 0.238 0.81255
## x2 0.37921 0.06332 5.989 1.89e-07 ***
## x3 -11.99940 67.31148 -0.178 0.85919
## x4 -8.86442 35.62553 -0.249 0.80446
## x1:x2 0.01155 0.00869 1.329 0.18955
## x1:x3 NA NA NA NA
## x1:x4 -1.11525 1.14847 -0.971 0.33592
## x2:x3 NA NA NA NA
## x2:x4 -0.38547 0.11962 -3.222 0.00218 **
## x3:x4 72.85976 103.15353 0.706 0.48308
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.683 on 53 degrees of freedom
## Multiple R-squared: 0.7496, Adjusted R-squared: 0.7118
## F-statistic: 19.83 on 8 and 53 DF, p-value: 1.947e-13
From the summary table we notice that the interactions x1:x3 and x2:x3 have their coefficients listed as NA therefore they are dropped due to singularities. It also indicated the \(R^{2}\) value as 0.7496 and F-statistic as 19.93.This shows that the model is statistically significant with a \(p_{value}\) that is very small and way less that 0.05. Therefore this leads to us rejecting the null hypothesis H0.
Fit second model into the model(FIT2) which has the main effects (x1+x2+x3+x4) in the form: lm(y~x1+x2+x3+x4,data=dat)
##
## Call:
## lm(formula = y ~ x1 + x2 + x3 + x4, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.9958 -3.3092 -0.2419 3.3924 10.5668
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.89453 4.32508 1.363 0.17828
## x1 -0.47790 0.34002 -1.406 0.16530
## x2 0.18271 0.01718 10.633 3.78e-15 ***
## x3 35.40284 11.09960 3.190 0.00232 **
## x4 5.84391 2.90978 2.008 0.04935 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.014 on 57 degrees of freedom
## Multiple R-squared: 0.6914, Adjusted R-squared: 0.6697
## F-statistic: 31.92 on 4 and 57 DF, p-value: 5.818e-14
The summary table indicates that the model has an \(R^{2}\) value of 0.6914 and the F-statistic of 31.92. With a \(p_{value}\) that is way less than 0.05 the model is therefore significant.
Using the two models we then test for the significance of FIT2 over Q1 using the ANOVA function in R. This will provide a \(p_{value}\) that compares the models. A \(p_{value}\) < 0.05 shows that there is significance and the null hypothesis \(H_{0}\) can be rejected. Whereas if the \(p_{value}\) > 0.05 the null hypothesis \(H_{0}\) is accepted.
## Analysis of Variance Table
##
## Model 1: y ~ x1 + x2 + x3 + x4
## Model 2: y ~ (x1 + x2 + x3 + x4)^2
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 57 1432.8
## 2 53 1162.4 4 270.37 3.0819 0.02352 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The ANOVA provides a \(p_{value}\) of 0.02352 which is less than 0.05. So we reject the null hypothesis and find the best fitting model Using Step-wise regression.
In order to determine the best fitting model, we use step-wise regression also known as backward elimination starting with the full model using AIC.
## Start: AIC=199.73
## y ~ (x1 + x2 + x3 + x4)^2
##
##
## Step: AIC=199.73
## y ~ x1 + x2 + x3 + x4 + x1:x2 + x1:x3 + x1:x4 + x2:x4 + x3:x4
##
##
## Step: AIC=199.73
## y ~ x1 + x2 + x3 + x4 + x1:x2 + x1:x4 + x2:x4 + x3:x4
##
## Df Sum of Sq RSS AIC
## - x3:x4 1 10.942 1173.4 198.31
## - x1:x4 1 20.682 1183.1 198.82
## <none> 1162.4 199.73
## - x1:x2 1 38.737 1201.2 199.76
## - x2:x4 1 227.751 1390.2 208.82
##
## Step: AIC=198.31
## y ~ x1 + x2 + x3 + x4 + x1:x2 + x1:x4 + x2:x4
##
## Df Sum of Sq RSS AIC
## - x1:x4 1 19.837 1193.2 197.35
## <none> 1173.4 198.31
## - x1:x2 1 38.709 1212.1 198.32
## - x2:x4 1 228.394 1401.8 207.34
## - x3 1 249.320 1422.7 208.26
##
## Step: AIC=197.35
## y ~ x1 + x2 + x3 + x4 + x1:x2 + x2:x4
##
## Df Sum of Sq RSS AIC
## - x1:x2 1 32.307 1225.5 197.01
## <none> 1193.2 197.35
## - x2:x4 1 220.026 1413.2 205.84
## - x3 1 252.209 1445.4 207.24
##
## Step: AIC=197.01
## y ~ x1 + x2 + x3 + x4 + x2:x4
##
## Df Sum of Sq RSS AIC
## - x1 1 11.262 1236.8 195.57
## <none> 1225.5 197.01
## - x2:x4 1 207.286 1432.8 204.69
## - x3 1 248.430 1473.9 206.45
##
## Step: AIC=195.57
## y ~ x2 + x3 + x4 + x2:x4
##
## Df Sum of Sq RSS AIC
## <none> 1236.8 195.57
## - x3 1 243.60 1480.4 204.72
## - x2:x4 1 245.68 1482.4 204.81
Then we acquire the best model:
## [1] "(Intercept)" "x2" "x3" "x4" "x2:x4"
For the summary of the best model
##
## Call:
## lm(formula = y ~ x2 + x3 + x4 + x2:x4, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.959 -3.358 -1.131 3.040 11.646
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.52261 4.03964 0.377 0.70763
## x2 0.38056 0.06084 6.255 5.47e-08 ***
## x3 34.51062 10.29961 3.351 0.00144 **
## x4 9.52471 2.96093 3.217 0.00214 **
## x2:x4 -0.30472 0.09056 -3.365 0.00137 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.658 on 57 degrees of freedom
## Multiple R-squared: 0.7336, Adjusted R-squared: 0.7149
## F-statistic: 39.24 on 4 and 57 DF, p-value: 9.297e-16
Using the best model,we then determine the confidence and prediction intervals using the points shown in the table below.
| x2 | x3 | x4 | |
|---|---|---|---|
| Point 1 | 10 | 0.5 | 0.75 |
| Point 2 | 3 | 0.25 | 0.85 |
## 1
## 27.44168
## fit lwr upr
## 1 27.44168 24.00618 30.87717