An Engineer collects data regarding the study of the effects of four variables on a dimensionless factor used to describe pressure drops in a screen-plate bubble column. The results are provided in a .csv data set.
ds <- read.csv("data-table-B9.csv")
print(ds)
## x1 x2 x3 x4 y
## 1 2.14 10.00 0.34 1.000 28.9
## 2 4.14 10.00 0.34 1.000 31.0
## 3 8.15 10.00 0.34 1.000 26.4
## 4 2.14 10.00 0.34 0.246 27.2
## 5 4.14 10.00 0.34 0.379 26.1
## 6 8.15 10.00 0.34 0.474 23.2
## 7 2.14 10.00 0.34 0.141 19.7
## 8 4.14 10.00 0.34 0.234 22.1
## 9 8.15 10.00 0.34 0.311 22.8
## 10 2.14 10.00 0.34 0.076 29.2
## 11 4.14 10.00 0.34 0.132 23.6
## 12 8.15 10.00 0.34 0.184 23.6
## 13 2.14 2.63 0.34 0.679 24.2
## 14 4.14 2.63 0.34 0.804 22.1
## 15 8.15 2.63 0.34 0.890 20.9
## 16 2.14 2.63 0.34 0.514 17.6
## 17 4.14 2.63 0.34 0.672 15.7
## 18 8.15 2.63 0.34 0.801 15.8
## 19 2.14 2.63 0.34 0.346 14.0
## 20 4.14 2.63 0.34 0.506 17.1
## 21 8.15 2.63 0.34 0.669 18.3
## 22 2.14 2.63 0.34 1.000 33.8
## 23 4.14 2.63 0.34 1.000 31.7
## 24 8.15 2.63 0.34 1.000 28.1
## 25 5.60 1.25 0.34 0.848 18.1
## 26 5.60 1.25 0.34 0.737 16.5
## 27 5.60 1.25 0.34 0.651 15.4
## 28 5.60 1.25 0.34 0.554 15.0
## 29 4.30 2.63 0.34 0.748 19.1
## 30 4.30 2.63 0.34 0.682 16.2
## 31 4.30 2.63 0.34 0.524 16.3
## 32 4.30 2.63 0.34 0.472 15.8
## 33 4.30 2.63 0.34 0.398 15.4
## 34 5.60 10.10 0.25 0.789 19.2
## 35 5.60 10.10 0.25 0.677 8.4
## 36 5.60 10.10 0.25 0.590 15.0
## 37 5.60 10.10 0.25 0.523 12.0
## 38 5.60 10.10 0.34 0.789 21.9
## 39 5.60 10.10 0.34 0.677 21.3
## 40 5.60 10.10 0.34 0.590 21.6
## 41 5.60 10.10 0.34 0.523 19.8
## 42 4.30 10.10 0.34 0.741 21.6
## 43 4.30 10.10 0.34 0.617 17.3
## 44 4.30 10.10 0.34 0.524 20.0
## 45 4.30 10.10 0.34 0.457 18.6
## 46 2.40 10.10 0.34 0.615 22.1
## 47 2.40 10.10 0.34 0.473 14.7
## 48 2.40 10.10 0.34 0.381 15.8
## 49 2.40 10.10 0.34 0.320 13.2
## 50 5.60 10.10 0.55 0.789 30.8
## 51 5.60 10.10 0.55 0.677 27.5
## 52 5.60 10.10 0.55 0.590 25.2
## 53 5.60 10.10 0.55 0.523 22.8
## 54 2.14 112.00 0.34 0.680 41.7
## 55 4.14 112.00 0.34 0.803 33.7
## 56 8.15 112.00 0.34 0.889 29.7
## 57 2.14 112.00 0.34 0.514 41.8
## 58 4.14 112.00 0.34 0.672 37.1
## 59 8.15 112.00 0.34 0.801 40.1
## 60 2.14 112.00 0.34 0.306 42.7
## 61 4.14 112.00 0.34 0.506 48.6
## 62 8.15 112.00 0.34 0.668 42.4
x1 = superficial fluid velocity of the gas (cs/s)
x2 = kinematic velocity
x3 = mesh opening (cm)
x4 = dimensionless number relating the superficial velocity of the gas to the fluid
a = 0.05 (Given for statistical data analysis)
Round all answers to two decimal places.
fit1 <- lm(y~(x1+x2+x3+x4)^2, data=ds)
summary(fit1)
##
## Call:
## lm(formula = y ~ (x1 + x2 + x3 + x4)^2, data = ds)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.4804 -3.0766 -0.6635 2.9625 12.2221
##
## Coefficients: (2 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 15.88376 23.17863 0.685 0.49616
## x1 0.18696 0.78447 0.238 0.81255
## x2 0.37921 0.06332 5.989 1.89e-07 ***
## x3 -11.99940 67.31148 -0.178 0.85919
## x4 -8.86442 35.62553 -0.249 0.80446
## x1:x2 0.01155 0.00869 1.329 0.18955
## x1:x3 NA NA NA NA
## x1:x4 -1.11525 1.14847 -0.971 0.33592
## x2:x3 NA NA NA NA
## x2:x4 -0.38547 0.11962 -3.222 0.00218 **
## x3:x4 72.85976 103.15353 0.706 0.48308
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.683 on 53 degrees of freedom
## Multiple R-squared: 0.7496, Adjusted R-squared: 0.7118
## F-statistic: 19.83 on 8 and 53 DF, p-value: 1.947e-13
The model results suggests that x1:x3 and x2:x3 interactions coefficients are NA due to singularities. This results shows collinarity, therefore the regression models of these interactions are approximately the same.
In the summary output above, the the r-squared value is 0.7496.
The F-statistic is 19.83. No rounding is necessary on this output.
Since the P-Value<<0.05, we reject the null hypothesis. As least one of the regression coefficients is significantly different from zero.
fit1_red <- lm(y~(x1+x2+x3+x4),data=ds)
anova(fit1_red,fit1)
## Analysis of Variance Table
##
## Model 1: y ~ (x1 + x2 + x3 + x4)
## Model 2: y ~ (x1 + x2 + x3 + x4)^2
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 57 1432.8
## 2 53 1162.4 4 270.37 3.0819 0.02352 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The partial F-Statistic is 3.08.
The P-Value is 0.0235.
Since the P-Value<0.05, we reject the null hypothesis. At least one interaction term is significant and should be retained.
fit1_step <- step(fit1,dorectopm="backward",trace=0)
summary(fit1_step)
##
## Call:
## lm(formula = y ~ x2 + x3 + x4 + x2:x4, data = ds)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.959 -3.358 -1.131 3.040 11.646
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.52261 4.03964 0.377 0.70763
## x2 0.38056 0.06084 6.255 5.47e-08 ***
## x3 34.51062 10.29961 3.351 0.00144 **
## x4 9.52471 2.96093 3.217 0.00214 **
## x2:x4 -0.30472 0.09056 -3.365 0.00137 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.658 on 57 degrees of freedom
## Multiple R-squared: 0.7336, Adjusted R-squared: 0.7149
## F-statistic: 39.24 on 4 and 57 DF, p-value: 9.297e-16
The following terms are best for the best-fitting model: x2, x3, x4, x2:x4.
fit1_best <- lm(y~(x2+x3+x4+x2:x4),data=ds)
summary(fit1_best)
##
## Call:
## lm(formula = y ~ (x2 + x3 + x4 + x2:x4), data = ds)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.959 -3.358 -1.131 3.040 11.646
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.52261 4.03964 0.377 0.70763
## x2 0.38056 0.06084 6.255 5.47e-08 ***
## x3 34.51062 10.29961 3.351 0.00144 **
## x4 9.52471 2.96093 3.217 0.00214 **
## x2:x4 -0.30472 0.09056 -3.365 0.00137 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.658 on 57 degrees of freedom
## Multiple R-squared: 0.7336, Adjusted R-squared: 0.7149
## F-statistic: 39.24 on 4 and 57 DF, p-value: 9.297e-16
The adjusted square value for the best model is 0.7149.
The coefficient for x2:x4 is -0.3047.
When calculating the confidence and prediction intervals for the model, we omit x1 because it is not included in the model.
point1 <- data.frame(x2=10,x3=0.5,x4=0.75)
predict(fit1_best,newdata=point1)
## 1
## 27.44168
Predicting y at the above points yields y=27.44.
predict(fit1_best,newdata=point1,interval="confidence",level = 0.95)
## fit lwr upr
## 1 27.44168 24.00618 30.87717
The lower bound at the predicted response from point 1 is 24.01.
The upper bound at the predicted response from point 1 is 30.88.
point2 <- data.frame(x2=3,x3=0.25,x4=0.85)
predict(fit1_best,newdata=point2)
## 1
## 18.61092
Predicting y at the above points yields y=18.61.
predict(fit1_best,newdata=point2,interval="prediction",level = 0.95)
## fit lwr upr
## 1 18.61092 8.860183 28.36165
The upper bound at the predicted response from point 2 is 28.36.
The prediction interval is always wider than the confidence interval at the same point because the prediction interval accounts for both the uncertainty in estimating the mean response and the variability of individual observations. True.
x1 was excluded from the best model because it was not statistically significant as a predictor of pressure drop and its removal improved the model based on AIC.