1 Description of the problem

the problem studies the effect of four factors (x1 = superficial fluid velocity of the gas (cs/s), x2= kinematic velocity, x3= mesh opening (cm), and x4= dimensionless number relating the superficial velocity of the gas to the fluid) on the pressure drop in a screen plate bubble column (y).

dat<-read.csv("data-table-B9.CSV",row.names = NULL)
dat
##      x1     x2   x3    x4    y
## 1  2.14  10.00 0.34 1.000 28.9
## 2  4.14  10.00 0.34 1.000 31.0
## 3  8.15  10.00 0.34 1.000 26.4
## 4  2.14  10.00 0.34 0.246 27.2
## 5  4.14  10.00 0.34 0.379 26.1
## 6  8.15  10.00 0.34 0.474 23.2
## 7  2.14  10.00 0.34 0.141 19.7
## 8  4.14  10.00 0.34 0.234 22.1
## 9  8.15  10.00 0.34 0.311 22.8
## 10 2.14  10.00 0.34 0.076 29.2
## 11 4.14  10.00 0.34 0.132 23.6
## 12 8.15  10.00 0.34 0.184 23.6
## 13 2.14   2.63 0.34 0.679 24.2
## 14 4.14   2.63 0.34 0.804 22.1
## 15 8.15   2.63 0.34 0.890 20.9
## 16 2.14   2.63 0.34 0.514 17.6
## 17 4.14   2.63 0.34 0.672 15.7
## 18 8.15   2.63 0.34 0.801 15.8
## 19 2.14   2.63 0.34 0.346 14.0
## 20 4.14   2.63 0.34 0.506 17.1
## 21 8.15   2.63 0.34 0.669 18.3
## 22 2.14   2.63 0.34 1.000 33.8
## 23 4.14   2.63 0.34 1.000 31.7
## 24 8.15   2.63 0.34 1.000 28.1
## 25 5.60   1.25 0.34 0.848 18.1
## 26 5.60   1.25 0.34 0.737 16.5
## 27 5.60   1.25 0.34 0.651 15.4
## 28 5.60   1.25 0.34 0.554 15.0
## 29 4.30   2.63 0.34 0.748 19.1
## 30 4.30   2.63 0.34 0.682 16.2
## 31 4.30   2.63 0.34 0.524 16.3
## 32 4.30   2.63 0.34 0.472 15.8
## 33 4.30   2.63 0.34 0.398 15.4
## 34 5.60  10.10 0.25 0.789 19.2
## 35 5.60  10.10 0.25 0.677  8.4
## 36 5.60  10.10 0.25 0.590 15.0
## 37 5.60  10.10 0.25 0.523 12.0
## 38 5.60  10.10 0.34 0.789 21.9
## 39 5.60  10.10 0.34 0.677 21.3
## 40 5.60  10.10 0.34 0.590 21.6
## 41 5.60  10.10 0.34 0.523 19.8
## 42 4.30  10.10 0.34 0.741 21.6
## 43 4.30  10.10 0.34 0.617 17.3
## 44 4.30  10.10 0.34 0.524 20.0
## 45 4.30  10.10 0.34 0.457 18.6
## 46 2.40  10.10 0.34 0.615 22.1
## 47 2.40  10.10 0.34 0.473 14.7
## 48 2.40  10.10 0.34 0.381 15.8
## 49 2.40  10.10 0.34 0.320 13.2
## 50 5.60  10.10 0.55 0.789 30.8
## 51 5.60  10.10 0.55 0.677 27.5
## 52 5.60  10.10 0.55 0.590 25.2
## 53 5.60  10.10 0.55 0.523 22.8
## 54 2.14 112.00 0.34 0.680 41.7
## 55 4.14 112.00 0.34 0.803 33.7
## 56 8.15 112.00 0.34 0.889 29.7
## 57 2.14 112.00 0.34 0.514 41.8
## 58 4.14 112.00 0.34 0.672 37.1
## 59 8.15 112.00 0.34 0.801 40.1
## 60 2.14 112.00 0.34 0.306 42.7
## 61 4.14 112.00 0.34 0.506 48.6
## 62 8.15 112.00 0.34 0.668 42.4

2 full F-test:

#fitting the model

model <- lm(y~x1+x2+x3+x4,data=dat)
full_model<-lm(y~(x1+x2+x3+x4)^2, data=dat)
full_model
## 
## Call:
## lm(formula = y ~ (x1 + x2 + x3 + x4)^2, data = dat)
## 
## Coefficients:
## (Intercept)           x1           x2           x3           x4        x1:x2  
##    15.88376      0.18695      0.37921    -11.99940     -8.86442      0.01155  
##       x1:x3        x1:x4        x2:x3        x2:x4        x3:x4  
##          NA     -1.11525           NA     -0.38547     72.85976
summary_full<-summary(full_model)
summary_full
## 
## Call:
## lm(formula = y ~ (x1 + x2 + x3 + x4)^2, data = dat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.4804 -3.0766 -0.6635  2.9625 12.2221 
## 
## Coefficients: (2 not defined because of singularities)
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  15.88376   23.17863   0.685  0.49616    
## x1            0.18696    0.78447   0.238  0.81255    
## x2            0.37921    0.06332   5.989 1.89e-07 ***
## x3          -11.99940   67.31148  -0.178  0.85919    
## x4           -8.86442   35.62553  -0.249  0.80446    
## x1:x2         0.01155    0.00869   1.329  0.18955    
## x1:x3              NA         NA      NA       NA    
## x1:x4        -1.11525    1.14847  -0.971  0.33592    
## x2:x3              NA         NA      NA       NA    
## x2:x4        -0.38547    0.11962  -3.222  0.00218 ** 
## x3:x4        72.85976  103.15353   0.706  0.48308    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.683 on 53 degrees of freedom
## Multiple R-squared:  0.7496, Adjusted R-squared:  0.7118 
## F-statistic: 19.83 on 8 and 53 DF,  p-value: 1.947e-13

2.1 Question 01:

the dropped interactions are Answer: x1:x3 and x2:x3

2.2 Question 02:

R^2 value is:0.7496

2.3 Question 03:

F-statistics for the overall regression is:19.83

2.4 Question 04:

conclusion about the full regression model is:Reject H0 (at least one regression coefficient is significantly different from zero)

3 Partial F-test:

#fitting partail f-test

3.1 Question 05:

reduced_model<-lm(y~x1+x2+x3+x4, data=dat)
reduced_model
## 
## Call:
## lm(formula = y ~ x1 + x2 + x3 + x4, data = dat)
## 
## Coefficients:
## (Intercept)           x1           x2           x3           x4  
##      5.8945      -0.4779       0.1827      35.4028       5.8439
anova_test<-anova(reduced_model,full_model)
anova_test
## Analysis of Variance Table
## 
## Model 1: y ~ x1 + x2 + x3 + x4
## Model 2: y ~ (x1 + x2 + x3 + x4)^2
##   Res.Df    RSS Df Sum of Sq      F  Pr(>F)  
## 1     57 1432.8                              
## 2     53 1162.4  4    270.37 3.0819 0.02352 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

the partial F-test is:3.08

3.2 Question 06:

P-value for the partial F-Test fo all two-factor interaction is: 0.0235

3.3 Question 07:

conclusion about the two-factor interactions:Reject H0 (at least one interaction term is significant and should be retained)

4 Stepwise Regression

4.1 Question 08:

best_fitting_model<-step(full_model,direction = "backward")
## Start:  AIC=199.73
## y ~ (x1 + x2 + x3 + x4)^2
## 
## 
## Step:  AIC=199.73
## y ~ x1 + x2 + x3 + x4 + x1:x2 + x1:x3 + x1:x4 + x2:x4 + x3:x4
## 
## 
## Step:  AIC=199.73
## y ~ x1 + x2 + x3 + x4 + x1:x2 + x1:x4 + x2:x4 + x3:x4
## 
##         Df Sum of Sq    RSS    AIC
## - x3:x4  1    10.942 1173.4 198.31
## - x1:x4  1    20.682 1183.1 198.82
## <none>               1162.4 199.73
## - x1:x2  1    38.737 1201.2 199.76
## - x2:x4  1   227.751 1390.2 208.82
## 
## Step:  AIC=198.31
## y ~ x1 + x2 + x3 + x4 + x1:x2 + x1:x4 + x2:x4
## 
##         Df Sum of Sq    RSS    AIC
## - x1:x4  1    19.837 1193.2 197.35
## <none>               1173.4 198.31
## - x1:x2  1    38.709 1212.1 198.32
## - x2:x4  1   228.394 1401.8 207.34
## - x3     1   249.320 1422.7 208.26
## 
## Step:  AIC=197.35
## y ~ x1 + x2 + x3 + x4 + x1:x2 + x2:x4
## 
##         Df Sum of Sq    RSS    AIC
## - x1:x2  1    32.307 1225.5 197.01
## <none>               1193.2 197.35
## - x2:x4  1   220.026 1413.2 205.84
## - x3     1   252.209 1445.4 207.24
## 
## Step:  AIC=197.01
## y ~ x1 + x2 + x3 + x4 + x2:x4
## 
##         Df Sum of Sq    RSS    AIC
## - x1     1    11.262 1236.8 195.57
## <none>               1225.5 197.01
## - x2:x4  1   207.286 1432.8 204.69
## - x3     1   248.430 1473.9 206.45
## 
## Step:  AIC=195.57
## y ~ x2 + x3 + x4 + x2:x4
## 
##         Df Sum of Sq    RSS    AIC
## <none>               1236.8 195.57
## - x3     1    243.60 1480.4 204.72
## - x2:x4  1    245.68 1482.4 204.81
summary(best_fitting_model)
## 
## Call:
## lm(formula = y ~ x2 + x3 + x4 + x2:x4, data = dat)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -9.959 -3.358 -1.131  3.040 11.646 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.52261    4.03964   0.377  0.70763    
## x2           0.38056    0.06084   6.255 5.47e-08 ***
## x3          34.51062   10.29961   3.351  0.00144 ** 
## x4           9.52471    2.96093   3.217  0.00214 ** 
## x2:x4       -0.30472    0.09056  -3.365  0.00137 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.658 on 57 degrees of freedom
## Multiple R-squared:  0.7336, Adjusted R-squared:  0.7149 
## F-statistic: 39.24 on 4 and 57 DF,  p-value: 9.297e-16

The best-fitting model includes x2, x3, x4, and the interaction x2:x4. Variable x1 is dropped because it is not statistically significant. \[ (y\sim x2+x3+x4+x2:x4) \]

4.2 Question 09:

best_model<-lm(y~x2+x3+x4+x2:x4, data=dat)
best_model
## 
## Call:
## lm(formula = y ~ x2 + x3 + x4 + x2:x4, data = dat)
## 
## Coefficients:
## (Intercept)           x2           x3           x4        x2:x4  
##      1.5226       0.3806      34.5106       9.5247      -0.3047
summary(best_model)
## 
## Call:
## lm(formula = y ~ x2 + x3 + x4 + x2:x4, data = dat)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -9.959 -3.358 -1.131  3.040 11.646 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.52261    4.03964   0.377  0.70763    
## x2           0.38056    0.06084   6.255 5.47e-08 ***
## x3          34.51062   10.29961   3.351  0.00144 ** 
## x4           9.52471    2.96093   3.217  0.00214 ** 
## x2:x4       -0.30472    0.09056  -3.365  0.00137 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.658 on 57 degrees of freedom
## Multiple R-squared:  0.7336, Adjusted R-squared:  0.7149 
## F-statistic: 39.24 on 4 and 57 DF,  p-value: 9.297e-16

The Adjusted R square value for the best model is 0.7149

4.3 Question 10:

The estimated coefficient for the x2:x4 interaction term is -0.3047

5 Confidence and Prediction Intervals

5.1 Question 11:

We should omit \(x1\) because it is not in the model

5.2 Question 12:

pt1 <- data.frame(x2 = 10, x3 = 0.5, x4 = 0.75)
pred1 <- predict(best_model, pt1, interval="confidence", level=0.95)
pred1
##        fit      lwr      upr
## 1 27.44168 24.00618 30.87717

the predicted value of y is 27.44

5.3 Question 13:

The lower bound of the 95% confidence interval on the mean response at Point 1 (x2 = 10, x3 = 0.5, x4 = 0.75) is 24.01

5.4 Question 14:

The upper bound of the 95% confidence interval on the mean response at Point 1 is 30.88

5.5 Question 15:

pt2 <- data.frame(x2 = 3, x3 = 0.25, x4 = 0.85)
pred2 <- predict(best_model, pt2, interval="prediction", level=0.95)
pred2
##        fit      lwr      upr
## 1 18.61092 8.860183 28.36165

The predicted (fitted) value of y at Point 2 (x2 = 3, x3 = 0.25, x4 = 0.85)is 18.61

5.6 Question 16:

The upper bound of the 95% prediction interval at Point 2 (x2 = 3, x3 = 0.25, x4 = 0.85) is 28.36

5.7 Question 17:

The prediction interval is always wider than the confidence interval at the same point because the prediction interval accounts for both the uncertainty in estimating the mean response and the variability of individual observations is [True]

5.8 Question 18:

x1 was not statistically significant as a predictor of pressure drop and its removal improved the model based on AIC