In an industrial experiment 30 rubber tires were tested for abrasion loss. The input variables were hardness of the rubber in Shore units and tensile strength in kg/sq m. The output variable loss measured the abrasion loss in kg/hr. Fit a suitable model. Discuss model adequacy using suitable graphical methods as well as diagnostic tests. Predict the absrasion loss for a new tire with hard=70 and tens=170. Give the 95% interval. Repeat the prediction with hard=100 and tens=250. Comment on assumptions and validity of predictions.A snippet of the data is shown below and the full dataset is available in the file abrasionLoss.txt.
Firstly, fit the full model and use the dignostic plot to analyse the adequancy of the model.
##
## Call:
## lm(formula = loss ~ hard + tens, data = abrasion)
##
## Residuals:
## Min 1Q Median 3Q Max
## -79.385 -14.608 3.816 19.755 65.981
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 885.1611 61.7516 14.334 3.84e-14 ***
## hard -6.5708 0.5832 -11.267 1.03e-11 ***
## tens -1.3743 0.1943 -7.073 1.32e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 36.49 on 27 degrees of freedom
## Multiple R-squared: 0.8402, Adjusted R-squared: 0.8284
## F-statistic: 71 on 2 and 27 DF, p-value: 1.767e-11
## Analysis of Variance Table
##
## Response: loss
## Df Sum Sq Mean Sq F value Pr(>F)
## hard 1 122455 122455 91.970 3.458e-10 ***
## tens 1 66607 66607 50.025 1.325e-07 ***
## Residuals 27 35950 1331
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Although there are three outiler points(14,19,28), the model is good due to the scatters in the RF and SL are randomly distributed.
Next use the diagnostic check(both boxcox and jarque-bera)
## Loading required package: MASS
## Loading required package: tseries
##
## Jarque Bera Test
##
## data: resid(ans1)
## X-squared = 1.1266, df = 2, p-value = 0.5693
the output of the test suggest lamda is close to 1 and the residual of the model is normally distributed, so the model is good.
Finally we do the prediction:
## 2.5 % 97.5 %
## (Intercept) 758.457323 1011.8648954
## hard -7.767432 -5.3742274
## tens -1.773001 -0.9756228
## fit lwr upr
## 1 191.57 115.3423 267.7978
## fit lwr upr
## 1 -115.4998 -207.2442 -23.75546
the prediction of hard=70 and tens=170 is good, but the second prediction doesn’t good because the output is negative.This result comes because the prediction is out of range.
The goal of this study reported in an unpublished technical report is to predict the staffing requirement for hospitals in a particular provincial jurisdiction with 17 hospitals. The data, summarized below, are available in the file staffing.txt. The output variable iswork and the rows indicate particular hospitals.
new - mean number of new patients per day xrays - number of xrays per month beds - mean number of beds in use pop - population of jurisdiction in 1000’s length - average duration of stay in hospital work - number of hours of nursing needed per 7 day week
Fit the full model and discuss its adequacy. Find the best subset model using stagewise regression and best subset regression. Discuss the adequacy of the fitted model.
##
## Call:
## lm(formula = work ~ ., data = staffing)
##
## Residuals:
## Min 1Q Median 3Q Max
## -611.93 -431.41 -70.77 332.60 1576.06
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1962.94816 1071.36170 1.832 0.0941 .
## new -15.85167 97.65299 -0.162 0.8740
## xrays 0.05593 0.02126 2.631 0.0234 *
## beds 1.58962 3.09208 0.514 0.6174
## pop -4.21867 7.17656 -0.588 0.5685
## length -394.31412 209.63954 -1.881 0.0867 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 642.1 on 11 degrees of freedom
## Multiple R-squared: 0.9908, Adjusted R-squared: 0.9867
## F-statistic: 237.8 on 5 and 11 DF, p-value: 8.068e-11
## Analysis of Variance Table
##
## Response: work
## Df Sum Sq Mean Sq F value Pr(>F)
## new 1 480612694 480612694 1165.7505 1.629e-12 ***
## xrays 1 7231656 7231656 17.5407 0.001516 **
## beds 1 598469 598469 1.4516 0.253550
## pop 1 276098 276098 0.6697 0.430531
## length 1 1458572 1458572 3.5378 0.086703 .
## Residuals 11 4535052 412277
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
now we use the AIC and BIC to select the best subset regression.
## Start: AIC=224.4
## work ~ new + xrays + beds + pop + length
##
## Df Sum of Sq RSS AIC
## - new 1 10863 4545916 222.44
## - beds 1 108962 4644015 222.80
## - pop 1 142465 4677517 222.93
## <none> 4535052 224.40
## - length 1 1458572 5993624 227.14
## - xrays 1 2853834 7388887 230.70
##
## Step: AIC=222.44
## work ~ xrays + beds + pop + length
##
## Df Sum of Sq RSS AIC
## - pop 1 367483 4913399 221.76
## <none> 4545916 222.44
## - length 1 2008920 6554835 226.66
## - xrays 1 2874412 7420328 228.77
## - beds 1 19070222 23616137 248.45
##
## Step: AIC=221.76
## work ~ xrays + beds + length
##
## Df Sum of Sq RSS AIC
## <none> 4913399 221.76
## - length 1 1658984 6572383 224.71
## - xrays 1 2628688 7542086 227.05
## - beds 1 32726195 37639593 254.38
##
## Call:
## lm(formula = work ~ xrays + beds + length, data = staffing)
##
## Coefficients:
## (Intercept) xrays beds length
## 1523.38924 0.05299 0.97848 -320.95083
##
## Call:
## lm(formula = work ~ xrays + beds + length, data = staffing)
##
## Residuals:
## Min 1Q Median 3Q Max
## -687.40 -380.60 -25.03 281.91 1630.50
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1523.38924 786.89772 1.936 0.0749 .
## xrays 0.05299 0.02009 2.637 0.0205 *
## beds 0.97848 0.10515 9.305 4.12e-07 ***
## length -320.95083 153.19222 -2.095 0.0563 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 614.8 on 13 degrees of freedom
## Multiple R-squared: 0.9901, Adjusted R-squared: 0.9878
## F-statistic: 432 on 3 and 13 DF, p-value: 2.894e-13
## Start: AIC=229.4
## work ~ new + xrays + beds + pop + length
##
## Df Sum of Sq RSS AIC
## - new 1 10863 4545916 226.61
## - beds 1 108962 4644015 226.97
## - pop 1 142465 4677517 227.09
## <none> 4535052 229.40
## - length 1 1458572 5993624 231.31
## - xrays 1 2853834 7388887 234.87
##
## Step: AIC=226.61
## work ~ xrays + beds + pop + length
##
## Df Sum of Sq RSS AIC
## - pop 1 367483 4913399 225.09
## <none> 4545916 226.61
## - length 1 2008920 6554835 230.00
## - xrays 1 2874412 7420328 232.10
## - beds 1 19070222 23616137 251.78
##
## Step: AIC=225.1
## work ~ xrays + beds + length
##
## Df Sum of Sq RSS AIC
## <none> 4913399 225.09
## - length 1 1658984 6572383 227.21
## - xrays 1 2628688 7542086 229.55
## - beds 1 32726195 37639593 256.88
##
## Call:
## lm(formula = work ~ xrays + beds + length, data = staffing)
##
## Coefficients:
## (Intercept) xrays beds length
## 1523.38924 0.05299 0.97848 -320.95083
##
## Call:
## lm(formula = work ~ xrays + beds + length, data = staffing)
##
## Residuals:
## Min 1Q Median 3Q Max
## -687.40 -380.60 -25.03 281.91 1630.50
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1523.38924 786.89772 1.936 0.0749 .
## xrays 0.05299 0.02009 2.637 0.0205 *
## beds 0.97848 0.10515 9.305 4.12e-07 ***
## length -320.95083 153.19222 -2.095 0.0563 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 614.8 on 13 degrees of freedom
## Multiple R-squared: 0.9901, Adjusted R-squared: 0.9878
## F-statistic: 432 on 3 and 13 DF, p-value: 2.894e-13
now, we use the best subset regression.
## Loading required package: bestglm
## Loading required package: leaps
## BIC
## BICq equivalent for q in (0.25804914597404, 0.680450993834172)
## Best Model:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1523.38923568 786.89772473 1.935943 7.492387e-02
## xrays 0.05298733 0.02009194 2.637243 2.050318e-02
## beds 0.97848162 0.10515362 9.305258 4.121293e-07
## length -320.95082518 153.19222065 -2.095086 5.631250e-02
## Analysis of Variance Table
##
## Response: work
## Df Sum Sq Mean Sq F value Pr(>F)
## xrays 1 441952483 441952483 1169.3296 4.041e-14 ***
## beds 1 46187675 46187675 122.2046 5.556e-08 ***
## length 1 1658984 1658984 4.3894 0.05631 .
## Residuals 13 4913399 377954
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
this result also indicates the best subset regression is : work=1523.38923568+0.05298733xrays + 0.97848162lbeds-320.95082518*length and the model is good from the anova above, which suggests the variales is statistically significant.
## fit lwr upr
## 1 11163.85 9719.929 12607.77
so the 95% prediction interval is (9719.929, 12607.77)
Assumption: The model is correct and the residual of the model is normally disributed.And the parameters are known.
Thirteen pieces of iron we varying allows with varying iron (Fe) content were placed in seawater for2 months and the weight loss (loss) in mg per square decimeter per day was measured.
##
## Jarque Bera Test
##
## data: resid(ans3)
## X-squared = 1.1877, df = 2, p-value = 0.5522
using the Jarque-Bera test we know the residual of the polynomial regression model of degree 6 does follow the normal distribution due to the p-value is 0.552, so it is a statistically good model for prediction purposes.
now use the backward regression to choose the best polynomial regression
## polynomial 1 polynomial 2 polynomial 3 polynomial 4 polynomial 5
## AIC 69.78062 71.42649 63.04931 61.75842 52.28024
## BIC 71.47547 73.68629 65.87406 65.14812 56.23488
## polynomial 6
## AIC 51.61314
## BIC 56.13274
so the AIC suggest the model the polynomial regression of degree 6 is the best where the BIC give the more parsimonious model is also polynomial regression of degree 6.