12.59 A government agency pays research contractors a fee to cover overhead costs, over and above the direct costs of a research project. Although overhead costs vary considerably among contracts, they are usually a substantial share of the total contract cost. An agency task force obtained data on overhead costs as a percentage of direct costs, number of employees of the contractor, size of contract as a percentage of the contractor’s yearly income, and personnel costs as a percentage of direct costs. The data are given below for the 86 research contractors as follows: contractor, overhead costs as a percentage of direct costs, number of employees, size of contract, and personnel costs as a percentage of direct costs.

| Cont. | OverCost | NumEmp | Size | PerCosts |
|---|---|---|---|---|
| 1 | 66.4 | 293 | 2.14 | 69 |
| 2 | 73.7 | 117 | 1.15 | 61 |
| 3 | 62.8 | 356 | 0.49 | 59 |
| 4 | 69.7 | 579 | 1.78 | 50 |
| 5 | 69.5 | 400 | 1.00 | 70 |
| 6 | 60.1 | 154 | 0.88 | 63 |
| 7 | 76.4 | 1,234 | 1.24 | 70 |
| 8 | 70.1 | 343 | 2.08 | 55 |
| 9 | 60.0 | 186 | 1.87 | 60 |
| 10 | 65.6 | 65 | 2.29 | 64 |
| 11 | 66.5 | 788 | 3.07 | 70 |
| 12 | 66.5 | 600 | 2.98 | 60 |
| 13 | 71.0 | 871 | 2.32 | 58 |
| 14 | 68.3 | 562 | 3.07 | 62 |
| 15 | 68.7 | 337 | 1.33 | 59 |
| 16 | 66.1 | 296 | 3.70 | 76 |
| 17 | 56.4 | 126 | 1.99 | 54 |
| 18 | 60.9 | 252 | 2.74 | 62 |
| 19 | 66.4 | 439 | 2.08 | 60 |
| 20 | 72.2 | 558 | 3.43 | 65 |
| 21 | 63.3 | 379 | 1.99 | 63 |
| 22 | 72.6 | 453 | 1.24 | 61 |
| 23 | 70.1 | 233 | 2.86 | 37 |
| 24 | 56.2 | 194 | 1.24 | 65 |
| 25 | 74.8 | 435 | 4.00 | 58 |
| 26 | 78.1 | 194 | 0.85 | 64 |
| 27 | 64.0 | 94 | 3.58 | 52 |
| 28 | 76.0 | 609 | 1.96 | 61 |
| 29 | 66.5 | 183 | 2.47 | 42 |
| 30 | 63.3 | 502 | 2.38 | 74 |
| 31 | 72.6 | 1,182 | 2.35 | 66 |
| 32 | 76.4 | 7,216 | 3.97 | 68 |
| 33 | 65.3 | 512 | 2.08 | 59 |
| 34 | 73.1 | 1,236 | 2.59 | 50 |
| 35 | 76.0 | 2,247 | 2.56 | 61 |
| 36 | 57.8 | 65 | 0.91 | 72 |
| 37 | 80.1 | 157 | 1.66 | 53 |
| 38 | 66.6 | 423 | 1.96 | 64 |
| 39 | 59.8 | 429 | 2.11 | 54 |
| 40 | 64.2 | 487 | 1.06 | 58 |
| 41 | 67.7 | 218 | 2.08 | 60 |
| 42 | 74.6 | 190 | 2.62 | 59 |
| 43 | 67.9 | 169 | 1.03 | 50 |
| 44 | 72.7 | 1,422 | 1.87 | 69 |
| 45 | 65.5 | 269 | 1.15 | 73 |
| 46 | 72.4 | 531 | 2.77 | 68 |
| 47 | 78.7 | 421 | 3.76 | 45 |
| 48 | 61.9 | 235 | 2.56 | 80 |
| 49 | 85.6 | 1,866 | 1.90 | 37 |
| 50 | 58.1 | 88 | 1.87 | 59 |
| 51 | 75.6 | 1,833 | 4.45 | 55 |
| 52 | 63.0 | 870 | 1.54 | 66 |
| 53 | 67.9 | 946 | 2.29 | 56 |
| 54 | 65.8 | 422 | 4.72 | 65 |
| 55 | 57.1 | 79 | 2.74 | 64 |
| 56 | 74.7 | 393 | 4.54 | 64 |
| 57 | 66.1 | 229 | 1.66 | 68 |
| 58 | 68.5 | 316 | 3.07 | 57 |
| 59 | 55.2 | 224 | 1.54 | 66 |
| 60 | 60.9 | 573 | 1.09 | 70 |
| 61 | 72.3 | 461 | 2.50 | 66 |
| 62 | 70.2 | 732 | 1.48 | 68 |
| 63 | 62.2 | 189 | 2.02 | 64 |
| 64 | 58.1 | 195 | 2.29 | 65 |
| 65 | 66.2 | 962 | 2.17 | 60 |
| 66 | 84.1 | 964 | 4.90 | 45 |
| 67 | 81.6 | 921 | 3.28 | 54 |
| 68 | 76.7 | 214 | 2.62 | 79 |
| 69 | 60.4 | 127 | 0.76 | 63 |
| 70 | 80.9 | 3,766 | 3.19 | 55 |
| 71 | 74.8 | 1,576 | 3.52 | 54 |
| 72 | 79.2 | 764 | 3.04 | 57 |
| 73 | 68.1 | 408 | 1.36 | 50 |
| 74 | 66.8 | 370 | 1.57 | 70 |
| 75 | 83.6 | 769 | 2.23 | 53 |
| 76 | 61.7 | 1,041 | 3.01 | 63 |
| 77 | 76.2 | 546 | 2.86 | 63 |
| 78 | 64.3 | 147 | 1.27 | 51 |
| 79 | 71.3 | 148 | 1.72 | 55 |
| 80 | 63.8 | 501 | 1.42 | 57 |
| 81 | 80.4 | 1,686 | 2.26 | 57 |
| 82 | 80.1 | 1,264 | 2.68 | 58 |
| 83 | 59.9 | 229 | 0.43 | 67 |
| 84 | 65.5 | 111 | 0.28 | 57 |
| 85 | 73.0 | 2,138 | 3.82 | 63 |
| 86 | 67.0 | 356 | 3.58 | 55 |

a. Obtain correlations of all pairs of variables. Is there a severe collinearity problem with the data?

There are 4 variables.
Dependent variable (Y): overhead cost as a percentage of the direct cost.
Independent variables (X’s): number of employees, size of contract, and personnel costs as a percentage of direct costs.

##                 Cont   Overcost      Numemp       Size     Percost
## Cont      1.00000000  0.1564774  0.11110776  0.1139612 -0.08232542
## Overcost  0.15647736  1.0000000  0.42553526  0.3788295 -0.31971781
## Numemp    0.11110776  0.4255353  1.00000000  0.3415081 -0.02738404
## Size      0.11396118  0.3788295  0.34150815  1.0000000 -0.13990702
## Percost  -0.08232542 -0.3197178 -0.02738404 -0.1399070  1.00000000

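The correlations among the three independent variables are small: 0.34 between Numemp and Size, -0.03 between Numemp and Percost, and -0.14 between Size and Percost. None is close to 1 in absolute value, so there is no severe collinearity problem with the data. Cont is only a contractor identifier, so its correlations carry no meaning.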
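One way to check the collinearity question numerically, offered as a sketch rather than as part of the original solution, is to compute variance inflation factors (VIFs) from the predictor correlations in the output above. The VIF of each predictor is the matching diagonal element of the inverse correlation matrix. The correlations below are copied from that output, rounded to four decimals; the names `R_pred` and `vif` are illustrative. VIFs near 1 indicate little collinearity, while values above roughly 10 are the usual warning sign.

```r
# Correlations among the three predictors, copied from the output above
# (the Cont column is only a contractor ID, so it is left out).
vars <- c("Numemp", "Size", "Percost")
R_pred <- matrix(c( 1.0000,  0.3415, -0.0274,
                    0.3415,  1.0000, -0.1399,
                   -0.0274, -0.1399,  1.0000),
                 nrow = 3, byrow = TRUE, dimnames = list(vars, vars))

# VIF_j is the j-th diagonal element of the inverse correlation matrix.
vif <- diag(solve(R_pred))
print(round(vif, 3))
```

Working from the correlation matrix avoids retyping the 86 rows of data; with the data frame at hand, the same quantities could be obtained from auxiliary regressions of each predictor on the other two.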

b. Plot overhead costs against each of the other variables. Locate a possible high influence outlier.

From the scatterplots, only one point stands out as an outlier: Contract #32 (Overhead Cost = 76.4, Number of Employees = 7,216, Size of Contract = 3.97, Personnel Cost = 68). A plot of Cook’s distance also shows a single observation with a value above 0.5, and it is the same contract identified on the scatterplots.
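The influence check described above can be sketched in base R with `hatvalues()` and `cooks.distance()`. The example below is only an illustration under stated assumptions: it uses ten contractors (28 to 37) copied from the data table to stay short, so its numbers will not match the full 86-contractor analysis, and the names `sub` and `diag_tbl` are made up for the sketch.

```r
# Ten contractors (28-37) copied from the data table; a subset is used
# only to keep the sketch short, so the numbers differ from the full fit.
sub <- data.frame(
  Cont     = 28:37,
  Overcost = c(76.0, 66.5, 63.3, 72.6, 76.4, 65.3, 73.1, 76.0, 57.8, 80.1),
  Numemp   = c(609, 183, 502, 1182, 7216, 512, 1236, 2247, 65, 157),
  Size     = c(1.96, 2.47, 2.38, 2.35, 3.97, 2.08, 2.59, 2.56, 0.91, 1.66),
  Percost  = c(61, 42, 74, 66, 68, 59, 50, 61, 72, 53)
)
fit <- lm(Overcost ~ Numemp + Size + Percost, data = sub)

# Leverage and Cook's distance for each contractor, largest Cook's distance first.
diag_tbl <- data.frame(Cont     = sub$Cont,
                       leverage = hatvalues(fit),
                       cooks_d  = cooks.distance(fit))
print(diag_tbl[order(-diag_tbl$cooks_d), ])
```

Leverage measures how unusual a contractor's predictor values are, while Cook's distance combines leverage with the size of the residual, which is why both are worth printing side by side.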

c. Obtain a regression equation (with overhead costs as the dependent variable) using all the data including any potential outlier.

## 
## Call:
## lm(formula = respro$Overcost ~ respro$Numemp + respro$Size + 
##     respro$Percost)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -12.2009  -3.9458  -0.5484   4.1301  13.5649 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    78.3514919  5.3410931  14.670  < 2e-16 ***
## respro$Numemp   0.0026371  0.0007484   3.524 0.000699 ***
## respro$Size     1.5763366  0.6917787   2.279 0.025287 *  
## respro$Percost -0.2448104  0.0807239  -3.033 0.003245 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.032 on 82 degrees of freedom
## Multiple R-squared:  0.3192, Adjusted R-squared:  0.2943 
## F-statistic: 12.81 on 3 and 82 DF,  p-value: 6.027e-07

The regression equation is: Overhead Cost = 78.3514919 + (0.0026371 * Number of Employees) + (1.5763366 * Size of Contract) + (-0.2448104 * Personnel Cost)

d. Delete the potential outlier, and get a revised regression equation. How much did the slopes change?
## 
## Call:
## lm(formula = Overcost ~ Numemp + Size + Percost)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.9425  -3.6993  -0.6803   4.4170  14.0105 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 75.048492   5.227472  14.357  < 2e-16 ***
## Numemp       0.005160   0.001115   4.629 1.38e-05 ***
## Size         1.363228   0.665299   2.049   0.0437 *  
## Percost     -0.204980   0.078346  -2.616   0.0106 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.766 on 81 degrees of freedom
## Multiple R-squared:  0.3771, Adjusted R-squared:  0.3541 
## F-statistic: 16.35 on 3 and 81 DF,  p-value: 2.138e-08

The new regression equation is: Overhead Cost = 75.048492 + (0.005160 * Number of Employees) + (1.363228 * Size of Contract) + (-0.204980 * Personnel Cost). The slope of the Number of Employees increased by:

## [1] 0.0025229

The slope of the Size of Contract changed by (the negative sign indicates a decrease):

## [1] -0.2131087

The slope of the Personnel Cost increased (became less negative) by:

## [1] 0.0398304
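The three differences above can be reproduced from the printed coefficients. The sketch below also expresses each change relative to the full-data slope, which makes it easier to see that the Number of Employees slope nearly doubles (from about 0.0026 to 0.0052) once the outlier is removed. The vector names are illustrative, and the coefficients are copied from the two summaries rather than refitted.

```r
# Slopes copied from the two regression summaries above.
full    <- c(Numemp = 0.0026371, Size = 1.5763366, Percost = -0.2448104)
deleted <- c(Numemp = 0.005160,  Size = 1.363228,  Percost = -0.204980)

change     <- deleted - full            # absolute change in each slope
pct_change <- 100 * change / abs(full)  # change relative to the full-data slope

print(round(cbind(full, deleted, change, pct_change), 4))
```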

12.60 Consider the outlier-deleted regression model of Exercise 12.59. a. Locate the F statistic. What null hypothesis is being tested? What can we conclude based on the F statistic?

F-statistic: 16.35 on 3 and 81 DF

Ho: B1 = B2 = B3 = 0 Ha: At least one coefficient is NOT equal to zero

With a p-value of 2.138e-08 < .05, we reject the null hypothesis and conclude that there is sufficient evidence that at least one of the independent variables has some predictive power.

b. Locate the t statistic for each independent variable. What conclusions can we reach based on the t tests?

Ho: B0 = 0 Ha: B0 is NOT 0 t value: 14.357 p value: < 2e-16

Ho: B1 = 0 Ha: B1 is NOT 0 t value: 4.629 p value: 1.38e-05

Ho: B2 = 0 Ha: B2 is NOT 0 t value: 2.049 p value: 0.0437

Ho: B3 = 0 Ha: B3 is NOT 0 t value: -2.616 p value: 0.0106

In all of the above, the p values are lower than .05, so we reject the null hypothesis in each case. There is enough evidence to conclude that each independent variable adds predictive value when the other two are already in the model. The test of B0 concerns the intercept rather than an independent variable.
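The p-values quoted for the outlier-deleted model can be recovered from the reported test statistics and their degrees of freedom (3 and 81 for the F test, 81 for each t test). This is a minimal sketch using the rounded statistics from the summary, so the results agree with the printed p-values only up to rounding; the variable names are illustrative.

```r
# Statistics copied from the outlier-deleted summary (3 and 81 df).
f_stat <- 16.35
p_f <- pf(f_stat, df1 = 3, df2 = 81, lower.tail = FALSE)

t_stats <- c(Intercept = 14.357, Numemp = 4.629, Size = 2.049, Percost = -2.616)
p_t <- 2 * pt(abs(t_stats), df = 81, lower.tail = FALSE)  # two-sided p-values

print(signif(p_f, 4))
print(signif(p_t, 3))
```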

12.61 Use the outlier-deleted data of Exercise 12.59 to predict overhead costs of a contract when the contractor has 500 employees, the contract is 2.50% of the contractor’s income, and personnel costs are 55% of direct costs. Obtain a 95% prediction interval. Would overhead costs equal to 88.9% of direct costs be unreasonable in this situation?

##        fit      lwr      upr
## 1 69.76267 58.18292 81.34242

The 95% prediction interval is from 58.18292 to 81.34242. An overhead cost of 88.9% of direct costs would be unreasonable in this situation because it falls well above the upper limit of the prediction interval.
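The point prediction can be checked by substituting the new contractor's values into the outlier-deleted equation. The sketch below does that arithmetic and then tests whether 88.9 lies inside the printed limits. In R the interval itself would typically come from `predict()` with `interval = "prediction"`; here the limits are simply copied from the output above, and the names `b`, `x_new`, and `inside` are illustrative.

```r
# Coefficients of the outlier-deleted model, copied from its summary.
b <- c(Intercept = 75.048492, Numemp = 0.005160, Size = 1.363228, Percost = -0.204980)
x_new <- c(1, 500, 2.50, 55)   # 1 for the intercept, then the three predictor values

fit <- sum(b * x_new)          # predicted overhead cost (% of direct costs)

# Limits copied from the predict() output above.
lwr <- 58.18292
upr <- 81.34242
inside <- 88.9 >= lwr & 88.9 <= upr

print(round(fit, 3))
print(inside)
```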

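13.54 A soil scientist wants to relate the daily evaporation from the soil to air temperature, relative humidity, and wind speed. The scientist collects data at a number of locations in Texas on the variables maximum, minimum, and average soil temperature (x1, x2, x3); maximum, minimum, and average air temperature (x4, x5, x6); maximum, minimum, and average relative humidity (x7, x8, x9); and total wind (x10). The response is the daily amount of evaporation from the soil (y). The data are given below.

| Obs | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | x10 | y |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 84 | 65 | 147 | 85 | 59 | 151 | 95 | 40 | 398 | 273 | 30 |
| 2 | 84 | 65 | 149 | 86 | 61 | 159 | 94 | 28 | 345 | 140 | 34 |
| 3 | 84 | 66 | 142 | 83 | 64 | 152 | 94 | 41 | 388 | 318 | 33 |
| 4 | 79 | 67 | 147 | 83 | 65 | 158 | 94 | 50 | 406 | 282 | 26 |
| 5 | 81 | 68 | 167 | 88 | 69 | 180 | 93 | 46 | 379 | 311 | 41 |
| 6 | 74 | 66 | 131 | 77 | 67 | 147 | 96 | 73 | 478 | 446 | 4 |
| 7 | 73 | 66 | 131 | 78 | 69 | 159 | 96 | 72 | 462 | 294 | 5 |
| 8 | 75 | 67 | 134 | 84 | 68 | 159 | 95 | 70 | 464 | 313 | 20 |
| 9 | 84 | 68 | 161 | 89 | 71 | 195 | 95 | 63 | 430 | 455 | 31 |
| 10 | 86 | 72 | 169 | 91 | 76 | 206 | 93 | 56 | 406 | 604 | 38 |
| 11 | 88 | 73 | 178 | 91 | 76 | 208 | 94 | 55 | 393 | 610 | 43 |
| 12 | 90 | 74 | 187 | 94 | 76 | 211 | 94 | 51 | 385 | 520 | 47 |
| 13 | 88 | 72 | 171 | 94 | 75 | 211 | 96 | 54 | 405 | 663 | 45 |
| 14 | 88 | 72 | 171 | 92 | 70 | 201 | 95 | 51 | 392 | 467 | 45 |
| 15 | 81 | 69 | 154 | 87 | 68 | 167 | 95 | 61 | 448 | 184 | 11 |
| 16 | 79 | 68 | 149 | 83 | 68 | 162 | 95 | 59 | 436 | 177 | 10 |
| 17 | 84 | 69 | 160 | 87 | 66 | 173 | 95 | 42 | 392 | 173 | 30 |
| 18 | 84 | 70 | 160 | 87 | 68 | 177 | 94 | 44 | 392 | 76 | 29 |
| 19 | 84 | 70 | 168 | 88 | 70 | 169 | 95 | 48 | 398 | 72 | 23 |
| 20 | 77 | 67 | 147 | 83 | 66 | 170 | 97 | 60 | 431 | 183 | 16 |
| 21 | 87 | 67 | 166 | 92 | 67 | 196 | 96 | 44 | 379 | 76 | 37 |
| 22 | 89 | 69 | 171 | 92 | 72 | 199 | 94 | 48 | 393 | 230 | 50 |
| 23 | 89 | 72 | 180 | 94 | 72 | 204 | 95 | 48 | 394 | 193 | 36 |
| 24 | 93 | 72 | 186 | 92 | 73 | 201 | 94 | 47 | 386 | 400 | 54 |
| 25 | 93 | 74 | 188 | 93 | 72 | 206 | 95 | 47 | 389 | 339 | 44 |
| 26 | 94 | 75 | 199 | 94 | 72 | 208 | 96 | 45 | 370 | 172 | 41 |
| 27 | 93 | 74 | 193 | 95 | 73 | 214 | 95 | 50 | 396 | 238 | 45 |
| 28 | 93 | 74 | 196 | 95 | 70 | 210 | 96 | 45 | 380 | 118 | 42 |
| 29 | 96 | 75 | 198 | 95 | 71 | 207 | 93 | 40 | 365 | 93 | 50 |
| 30 | 95 | 76 | 202 | 95 | 69 | 202 | 93 | 39 | 357 | 269 | 48 |
| 31 | 84 | 73 | 173 | 96 | 69 | 173 | 94 | 58 | 418 | 128 | 17 |
| 32 | 91 | 71 | 170 | 91 | 69 | 168 | 94 | 44 | 420 | 423 | 20 |
| 33 | 88 | 72 | 179 | 89 | 70 | 189 | 93 | 50 | 399 | 415 | 15 |
| 34 | 89 | 72 | 179 | 95 | 71 | 210 | 98 | 46 | 389 | 300 | 42 |
| 35 | 91 | 72 | 182 | 96 | 73 | 208 | 95 | 43 | 384 | 193 | 44 |
| 36 | 92 | 74 | 196 | 97 | 75 | 215 | 96 | 46 | 389 | 195 | 41 |
| 37 | 94 | 75 | 192 | 96 | 69 | 198 | 95 | 36 | 380 | 215 | 49 |
| 38 | 96 | 75 | 195 | 95 | 67 | 196 | 97 | 24 | 354 | 185 | 53 |
| 39 | 93 | 76 | 198 | 94 | 75 | 211 | 93 | 43 | 364 | 466 | 53 |
| 40 | 88 | 74 | 188 | 92 | 73 | 198 | 95 | 52 | 405 | 399 | 21 |
| 41 | 88 | 74 | 178 | 90 | 74 | 197 | 95 | 61 | 447 | 232 | 1 |
| 42 | 91 | 72 | 175 | 94 | 70 | 205 | 94 | 42 | 380 | 275 | 44 |
| 43 | 92 | 72 | 190 | 95 | 71 | 209 | 96 | 44 | 379 | 166 | 44 |
| 44 | 92 | 73 | 189 | 96 | 72 | 208 | 93 | 42 | 372 | 189 | 46 |
| 45 | 94 | 75 | 194 | 95 | 71 | 208 | 93 | 43 | 373 | 164 | 47 |
| 46 | 96 | 76 | 202 | 96 | 71 | 208 | 94 | 40 | 368 | 139 | 50 |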

a. Fit the following model to the data and display the fitted model: y = b0 + b1 x1 + b2 x2 + b3 x3 + b4 x4 + b5 x5 + b6 x6 + b7 x7 + b8 x8 + b9 x9 + b10 x10 + e
## 
## Call:
## lm(formula = y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + 
##     x10)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -16.756  -2.061   0.221   2.948  12.698 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  33.789953 118.676140   0.285  0.77753   
## x1            1.349042   0.790245   1.707  0.09666 . 
## x2           -0.252103   1.111376  -0.227  0.82187   
## x3           -0.461898   0.299406  -1.543  0.13189   
## x4            0.529305   0.583671   0.907  0.37068   
## x5            0.013451   0.781837   0.017  0.98637   
## x6            0.194243   0.210396   0.923  0.36221   
## x7            0.563309   1.094915   0.514  0.61015   
## x8            0.505484   0.467209   1.082  0.28669   
## x9           -0.469216   0.152507  -3.077  0.00405 **
## x10           0.011747   0.009166   1.282  0.20844   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.68 on 35 degrees of freedom
## Multiple R-squared:  0.838,  Adjusted R-squared:  0.7918 
## F-statistic: 18.11 on 10 and 35 DF,  p-value: 4.976e-11

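The fitted regression equation is: predicted y = 33.789953 + (1.349042 * x1) + (-0.252103 * x2) + (-0.461898 * x3) + (0.529305 * x4) + (0.013451 * x5) + (0.194243 * x6) + (0.563309 * x7) + (0.505484 * x8) + (-0.469216 * x9) + (0.011747 * x10). The error term e does not appear in the fitted equation; its standard deviation is estimated by the residual standard error, 6.68 on 35 degrees of freedom.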

b. Produce a 95% confidence interval on the average evaporation for the following values of the explanatory variables: x1 = 90, x2 = 70, x3 = 150, x4 = 85, x5 = 65, x6 = 180, x7 = 95, x8 = 40, x9 = 375, x10 = 450
##        fit      lwr      upr
## 1 52.16445 32.25766 72.07124

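The interval printed above runs from 32.25766 to 72.07124. Its width matches a 95% prediction interval for a single value of evaporation rather than a confidence interval for the average. The 95% confidence interval for the average evaporation at these values is the narrower interval from 37.59044 to 66.73846, which appears in the output under Exercise 13.56.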
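Whether a printed interval is a confidence interval for the mean or a prediction interval can be checked from the summary statistics alone. The sketch below takes the narrower interval that appears later under Exercise 13.56 (37.59044 to 66.73846, same point estimate), treats it as the confidence interval for the mean, recovers the standard error of the fitted value, and then computes the width a prediction interval would have. The residual standard error (6.68) and degrees of freedom (35) are copied from the summary above; the variable names are illustrative.

```r
s  <- 6.68                 # residual standard error of the full model
df <- 35                   # residual degrees of freedom
t_crit <- qt(0.975, df)

wide   <- c(lwr = 32.25766, upr = 72.07124)  # interval printed above
narrow <- c(lwr = 37.59044, upr = 66.73846)  # interval printed under 13.56

# If `narrow` is the confidence interval for the mean, its half-width is t * se_fit.
se_fit <- unname(diff(narrow)) / 2 / t_crit

# A prediction interval then has half-width t * sqrt(s^2 + se_fit^2).
pred_width <- 2 * t_crit * sqrt(s^2 + se_fit^2)

print(round(c(se_fit = se_fit, pred_width = pred_width,
              wide_width = unname(diff(wide))), 3))
```

If the computed prediction width agrees with the width of the wider interval, the two printed intervals are the confidence and prediction intervals from one and the same fitted model.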

13.55 Refer to Exercise 13.54. a. Is there a strong correlation between any of the pairs of explanatory variables?

##             Obs          x1          x2          x3          x4
## Obs  1.00000000  0.70921893  0.76248586  0.76176841  0.72126089
## x1   0.70921893  1.00000000  0.84165983  0.93769315  0.89559846
## x2   0.76248586  0.84165983  1.00000000  0.93322587  0.83995526
## x3   0.76176841  0.93769315  0.93322587  1.00000000  0.91407951
## x4   0.72126089  0.89559846  0.83995526  0.91407951  1.00000000
## x5   0.41975750  0.44859243  0.68085223  0.59405737  0.57048796
## x6   0.62549423  0.80805131  0.81826848  0.86979753  0.87378089
## x7  -0.01855013 -0.17857770 -0.16568357 -0.16097906 -0.10336549
## x8  -0.36046463 -0.68686311 -0.33981754 -0.53306937 -0.52668394
## x9  -0.36745365 -0.75817271 -0.48163942 -0.67949321 -0.65547006
## x10 -0.29810113 -0.08947968  0.03038564 -0.09348006 -0.08994321
## y    0.31391476  0.76673943  0.53931454  0.68754417  0.72313685
##              x5          x6          x7         x8          x9         x10
## Obs  0.41975750  0.62549423 -0.01855013 -0.3604646 -0.36745365 -0.29810113
## x1   0.44859243  0.80805131 -0.17857770 -0.6868631 -0.75817271 -0.08947968
## x2   0.68085223  0.81826848 -0.16568357 -0.3398175 -0.48163942  0.03038564
## x3   0.59405737  0.86979753 -0.16097906 -0.5330694 -0.67949321 -0.09348006
## x4   0.57048796  0.87378089 -0.10336549 -0.5266839 -0.65547006 -0.08994321
## x5   1.00000000  0.78293539 -0.12150816  0.1908723 -0.06533297  0.41088727
## x6   0.78293539  1.00000000 -0.04177869 -0.3020213 -0.53655853  0.12845485
## x7  -0.12150816 -0.04177869  1.00000000  0.1721152  0.27207261 -0.14687459
## x8   0.19087227 -0.30202133  0.17211519  1.0000000  0.91117134  0.34572396
## x9  -0.06533297 -0.53655853  0.27207261  0.9111713  1.00000000  0.22291955
## x10  0.41088727  0.12845485 -0.14687459  0.3457240  0.22291955  1.00000000
## y    0.33251430  0.71060941 -0.18804136 -0.6720591 -0.82551484  0.04998831
##               y
## Obs  0.31391476
## x1   0.76673943
## x2   0.53931454
## x3   0.68754417
## x4   0.72313685
## x5   0.33251430
## x6   0.71060941
## x7  -0.18804136
## x8  -0.67205914
## x9  -0.82551484
## x10  0.04998831
## y    1.00000000

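Yes. For example, x1 is strongly correlated with x2 (0.84), x3 (0.94), x4 (0.90), x6 (0.81), and x9 (-0.76), and x8 is strongly correlated with x9 (0.91). By contrast, x10 is only weakly correlated with x1 (-0.09). Other strongly correlated pairs can be identified from the correlation table above.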

b. What problems may result if several of the explanatory variables are highly correlated?

##        x1        x2        x3        x4        x5        x6        x7 
## 23.113092 13.533879 36.438101  8.818112  8.286190 19.632290  1.755311 
##        x8        x9       x10 
## 22.101796 20.386838  1.883657

A VIF value greater than 10 indicates that a variable is very nearly a linear combination of the other predictor variables; here x1, x2, x3, x6, x8, and x9 all exceed 10. Collinearity arises when two or more explanatory variables in the multiple regression model are highly correlated, so that one variable can be linearly predicted from the others with a high degree of accuracy. As a result, the standard errors of the coefficients are inflated, and the coefficient estimates can change erratically in response to small changes in the model or the data.
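The VIF for a predictor can be computed by hand as 1 / (1 - R²), where R² comes from regressing that predictor on all the others. The printed values were presumably produced by a packaged function such as `car::vif`; the base-R sketch below is an assumption-laden illustration, not the original code. It uses only the first 12 observations of x1, x2, and x3 from the data table, so its VIFs will not match the full-data values above, and `vif_one` is a hypothetical helper name.

```r
# First 12 observations of the three soil-temperature variables, copied from
# the data table; a subset keeps the sketch short.
soil <- data.frame(
  x1 = c(84, 84, 84, 79, 81, 74, 73, 75, 84, 86, 88, 90),
  x2 = c(65, 65, 66, 67, 68, 66, 66, 67, 68, 72, 73, 74),
  x3 = c(147, 149, 142, 147, 167, 131, 131, 134, 161, 169, 178, 187)
)

# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing x_j on the others.
vif_one <- function(v, data) {
  others <- setdiff(names(data), v)
  r2 <- summary(lm(reformulate(others, response = v), data = data))$r.squared
  1 / (1 - r2)
}
vif <- sapply(names(soil), vif_one, data = soil)

# Cross-check: the same values sit on the diagonal of the inverse correlation matrix.
vif_check <- diag(solve(cor(soil)))
print(round(rbind(vif, vif_check), 3))
```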

c. Evaluate whether the conditions of normality and equal variance hold for your model in Exercise 13.54.

From the Q-Q plot, some of the residuals do not lie on the reference line, so the normality condition does not appear to hold. From the plot of residuals against fitted values, the points stay within roughly the same distance of the zero line as the fitted values increase, so the equal variance condition does appear to hold.
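The two diagnostic plots described above can be produced in base R with `qqnorm()`/`qqline()` and a plot of residuals against fitted values. The sketch below is illustrative only: it fits a reduced model, y ~ x6 + x9, to the first 12 observations copied from the data table, because 12 rows cannot support all ten predictors. It also adds a Shapiro-Wilk test, which is not part of the original solution, as a numerical companion to the Q-Q plot.

```r
# First 12 observations of y, x6 and x9, copied from the data table.
soil <- data.frame(
  y  = c(30, 34, 33, 26, 41, 4, 5, 20, 31, 38, 43, 47),
  x6 = c(151, 159, 152, 158, 180, 147, 159, 159, 195, 206, 208, 211),
  x9 = c(398, 345, 388, 406, 379, 478, 462, 464, 430, 406, 393, 385)
)
fit <- lm(y ~ x6 + x9, data = soil)
res <- residuals(fit)

pdf(NULL)                       # discard plots; remove this line to view them
qqnorm(res); qqline(res)        # normality: points should follow the line
plot(fitted(fit), res,          # equal variance: spread should stay constant
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
invisible(dev.off())

sw <- shapiro.test(res)         # formal normality test to back up the plot
print(sw$p.value)
```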

13.56 Refer to Exercise 13.54. a. Formulate a new model using a variable selection procedure with AIC and then BIC as the criterion to select the independent variables. Was there a large difference in the selected variables for the two methods?

AIC using stepwise selection

## Start:  AIC=247.88
## y ~ 1
## 
##        Df Sum of Sq    RSS    AIC
## + x9    1    6570.9 3071.3 197.25
## + x1    1    5668.5 3973.6 209.10
## + x4    1    5042.1 4600.0 215.84
## + x6    1    4868.9 4773.2 217.54
## + x3    1    4558.0 5084.1 220.44
## + x8    1    4355.0 5287.1 222.24
## + x2    1    2804.5 6837.6 234.07
## + x5    1    1066.1 8576.0 244.49
## <none>              9642.1 247.88
## + x7    1     340.9 9301.2 248.23
## + x10   1      24.1 9618.0 249.77
## 
## Step:  AIC=197.26
## y ~ x9
## 
##        Df Sum of Sq    RSS    AIC
## + x6    1     970.1 2101.1 181.79
## + x5    1     751.5 2319.7 186.35
## + x4    1     560.2 2511.1 189.99
## + x10   1     555.6 2515.6 190.07
## + x1    1     449.9 2621.3 191.97
## + x8    1     364.6 2706.6 193.44
## + x3    1     287.2 2784.1 194.74
## + x2    1     252.1 2819.1 195.31
## <none>              3071.3 197.25
## + x7    1      13.9 3057.3 199.05
## - x9    1    6570.9 9642.1 247.88
## 
## Step:  AIC=181.79
## y ~ x9 + x6
## 
##        Df Sum of Sq    RSS    AIC
## + x10   1    221.16 1879.9 178.68
## + x3    1    214.74 1886.4 178.83
## + x2    1    138.89 1962.2 180.65
## <none>              2101.1 181.79
## + x4    1     10.38 2090.7 183.56
## + x8    1      7.80 2093.3 183.62
## + x1    1      4.81 2096.3 183.69
## + x5    1      0.30 2100.8 183.79
## + x7    1      0.07 2101.0 183.79
## - x6    1    970.14 3071.3 197.25
## - x9    1   2672.06 4773.2 217.54
## 
## Step:  AIC=178.68
## y ~ x9 + x6 + x10
## 
##        Df Sum of Sq    RSS    AIC
## + x3    1    114.33 1765.6 177.79
## + x2    1    104.94 1775.0 178.03
## <none>              1879.9 178.68
## + x5    1     46.65 1833.3 179.52
## + x7    1     15.78 1864.2 180.29
## + x4    1      2.24 1877.7 180.62
## + x8    1      0.64 1879.3 180.66
## + x1    1      0.00 1879.9 180.68
## - x10   1    221.16 2101.1 181.79
## - x6    1    635.68 2515.6 190.07
## - x9    1   2876.51 4756.5 219.38
## 
## Step:  AIC=177.79
## y ~ x9 + x6 + x10 + x3
## 
##        Df Sum of Sq    RSS    AIC
## + x1    1    126.03 1639.6 176.38
## <none>              1765.6 177.79
## + x4    1     58.98 1706.6 178.23
## - x3    1    114.33 1879.9 178.68
## - x10   1    120.76 1886.4 178.83
## + x5    1     19.02 1746.6 179.29
## + x8    1     14.04 1751.6 179.42
## + x2    1      3.52 1762.1 179.70
## + x7    1      2.60 1763.0 179.72
## - x6    1    522.48 2288.1 187.72
## - x9    1   2812.94 4578.6 219.62
## 
## Step:  AIC=176.38
## y ~ x9 + x6 + x10 + x3 + x1
## 
##        Df Sum of Sq    RSS    AIC
## <none>              1639.6 176.38
## - x10   1     93.51 1733.1 176.94
## + x8    1     28.76 1610.8 177.57
## + x4    1     21.53 1618.1 177.78
## - x1    1    126.03 1765.6 177.79
## + x2    1      2.58 1637.0 178.31
## + x7    1      0.76 1638.8 178.36
## + x5    1      0.20 1639.4 178.38
## - x3    1    240.36 1879.9 180.68
## - x6    1    531.03 2170.6 187.29
## - x9    1   1679.84 3319.4 206.83
## 
## Call:
## lm(formula = y ~ x9 + x6 + x10 + x3 + x1)
## 
## Coefficients:
## (Intercept)           x9           x6          x10           x3  
##    85.18258     -0.33144      0.36800      0.01097     -0.41346  
##          x1  
##     0.90948

The AIC selected model is: lm(formula = y ~ x9 + x6 + x10 + x3 + x1)

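BIC using stepwise selection (R labels the criterion "AIC" in the trace below, but the values are computed with the BIC penalty of log(n) per coefficient)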

## Start:  AIC=204.26
## y ~ (x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10)
## 
##        Df Sum of Sq    RSS    AIC
## - x5    1      0.01 1561.6 200.43
## - x2    1      2.30 1563.9 200.50
## - x7    1     11.81 1573.4 200.77
## - x4    1     36.69 1598.3 201.50
## - x6    1     38.03 1599.6 201.53
## - x8    1     52.23 1613.8 201.94
## - x10   1     73.27 1634.8 202.54
## - x3    1    106.19 1667.8 203.45
## - x1    1    130.02 1691.6 204.11
## <none>              1561.6 204.26
## - x9    1    422.34 1983.9 211.44
## 
## Step:  AIC=200.43
## y ~ x1 + x2 + x3 + x4 + x6 + x7 + x8 + x9 + x10
## 
##        Df Sum of Sq    RSS    AIC
## - x2    1      2.30 1563.9 196.67
## - x7    1     13.13 1574.7 196.98
## - x4    1     36.68 1598.3 197.67
## - x8    1     52.30 1613.9 198.11
## - x6    1     59.61 1621.2 198.32
## - x10   1     83.25 1644.8 198.99
## - x3    1    107.58 1669.2 199.66
## <none>              1561.6 200.43
## - x1    1    137.28 1698.9 200.47
## + x5    1      0.01 1561.6 204.26
## - x9    1    443.14 2004.7 208.09
## 
## Step:  AIC=196.67
## y ~ x1 + x3 + x4 + x6 + x7 + x8 + x9 + x10
## 
##        Df Sum of Sq    RSS    AIC
## - x7    1     14.82 1578.7 193.27
## - x4    1     35.73 1599.6 193.88
## - x8    1     53.37 1617.2 194.38
## - x6    1     62.29 1626.2 194.63
## - x10   1     81.90 1645.8 195.19
## <none>              1563.9 196.67
## - x1    1    139.70 1703.6 196.77
## + x2    1      2.30 1561.6 200.43
## + x5    1      0.02 1563.9 200.50
## - x3    1    295.22 1859.1 200.79
## - x9    1    495.35 2059.2 205.50
## 
## Step:  AIC=193.27
## y ~ x1 + x3 + x4 + x6 + x8 + x9 + x10
## 
##        Df Sum of Sq    RSS    AIC
## - x4    1     32.13 1610.8 190.37
## - x8    1     39.37 1618.1 190.58
## - x10   1     70.18 1648.9 191.44
## - x6    1    118.58 1697.3 192.78
## - x1    1    126.08 1704.8 192.98
## <none>              1578.7 193.27
## + x7    1     14.82 1563.9 196.67
## + x2    1      3.99 1574.7 196.98
## + x5    1      2.30 1576.4 197.03
## - x3    1    297.23 1875.9 197.38
## - x9    1    538.99 2117.7 202.96
## 
## Step:  AIC=190.37
## y ~ x1 + x3 + x6 + x8 + x9 + x10
## 
##        Df Sum of Sq    RSS    AIC
## - x8    1     28.76 1639.6 187.36
## - x10   1     56.77 1667.6 188.13
## <none>              1610.8 190.37
## - x1    1    140.76 1751.6 190.40
## - x6    1    239.40 1850.2 192.91
## + x4    1     32.13 1578.7 193.27
## - x3    1    267.97 1878.8 193.62
## + x7    1     11.21 1599.6 193.88
## + x2    1      2.55 1608.3 194.13
## + x5    1      2.06 1608.8 194.14
## - x9    1    511.06 2121.9 199.22
## 
## Step:  AIC=187.36
## y ~ x1 + x3 + x6 + x9 + x10
## 
##        Df Sum of Sq    RSS    AIC
## - x10   1     93.51 1733.1 186.08
## - x1    1    126.03 1765.6 186.93
## <none>              1639.6 187.36
## - x3    1    240.36 1879.9 189.82
## + x8    1     28.76 1610.8 190.37
## + x4    1     21.53 1618.1 190.58
## + x2    1      2.58 1637.0 191.11
## + x7    1      0.76 1638.8 191.16
## + x5    1      0.20 1639.4 191.18
## - x6    1    531.03 2170.6 196.43
## - x9    1   1679.84 3319.4 215.97
## 
## Step:  AIC=186.08
## y ~ x1 + x3 + x6 + x9
## 
##        Df Sum of Sq    RSS    AIC
## <none>              1733.1 186.08
## - x1    1    153.28 1886.4 186.15
## + x10   1     93.51 1639.6 187.36
## + x8    1     65.50 1667.6 188.13
## + x5    1     18.78 1714.3 189.41
## + x7    1      5.81 1727.3 189.75
## + x4    1      5.41 1727.7 189.76
## + x2    1      2.09 1731.0 189.85
## - x3    1    363.20 2096.3 191.00
## - x6    1    873.67 2606.8 201.03
## - x9    1   1588.15 3321.2 212.17
## 
## Call:
## lm(formula = y ~ x1 + x3 + x6 + x9)
## 
## Coefficients:
## (Intercept)           x1           x3           x6           x9  
##     75.0453       0.9967      -0.4871       0.4310      -0.3155

The BIC selected model is: lm(formula = y ~ x1 + x3 + x6 + x9)

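The BIC-selected model has one fewer independent variable than the AIC-selected model: it drops x10 and keeps x1, x3, x6, and x9. The terms are listed in a different order only because the AIC search started from the intercept-only model and the BIC search started from the full model, so the difference between the two selections is not large.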
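The criterion values in both traces can be reproduced from the residual sums of squares. For linear models, `step()` scores a model as n * log(RSS / n) + k * edf, where edf is the number of fitted coefficients, k = 2 gives AIC, and k = log(n) gives BIC. The sketch below applies that formula to the two selected models using RSS values copied from the traces; `crit` is a hypothetical helper name.

```r
# Criterion used by step() for lm fits: n * log(RSS / n) + k * edf.
crit <- function(rss, n, edf, k) n * log(rss / n) + k * edf

n <- 46
# RSS values copied from the step() traces above; edf includes the intercept.
five_var <- list(rss = 1639.6, edf = 6)   # y ~ x9 + x6 + x10 + x3 + x1
four_var <- list(rss = 1733.1, edf = 5)   # y ~ x1 + x3 + x6 + x9

scores <- rbind(
  AIC = c(five_var = crit(five_var$rss, n, five_var$edf, 2),
          four_var = crit(four_var$rss, n, four_var$edf, 2)),
  BIC = c(five_var = crit(five_var$rss, n, five_var$edf, log(n)),
          four_var = crit(four_var$rss, n, four_var$edf, log(n)))
)
print(round(scores, 2))
```

The heavier BIC penalty (log(46) is about 3.83 per coefficient instead of 2) is what tips the balance toward dropping x10: the five-variable model has the lower AIC, while the four-variable model has the lower BIC.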

b. Compare the standard errors of the estimated Bs in the model selected by BIC to those in the full model fit in Exercise 13.54. Was there an increase or a decrease in the standard errors of the estimated Bs?

Full Model
## 
## Call:
## lm(formula = y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + 
##     x10)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -16.756  -2.061   0.221   2.948  12.698 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  33.789953 118.676140   0.285  0.77753   
## x1            1.349042   0.790245   1.707  0.09666 . 
## x2           -0.252103   1.111376  -0.227  0.82187   
## x3           -0.461898   0.299406  -1.543  0.13189   
## x4            0.529305   0.583671   0.907  0.37068   
## x5            0.013451   0.781837   0.017  0.98637   
## x6            0.194243   0.210396   0.923  0.36221   
## x7            0.563309   1.094915   0.514  0.61015   
## x8            0.505484   0.467209   1.082  0.28669   
## x9           -0.469216   0.152507  -3.077  0.00405 **
## x10           0.011747   0.009166   1.282  0.20844   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.68 on 35 degrees of freedom
## Multiple R-squared:  0.838,  Adjusted R-squared:  0.7918 
## F-statistic: 18.11 on 10 and 35 DF,  p-value: 4.976e-11

BIC Model

## 
## Call:
## lm(formula = y ~ x1 + x3 + x6 + x9)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -18.924  -2.191  -0.190   3.696  13.342 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 75.04531   43.31738   1.732   0.0907 .  
## x1           0.99673    0.52343   1.904   0.0639 .  
## x3          -0.48709    0.16617  -2.931   0.0055 ** 
## x6           0.43098    0.09480   4.546 4.76e-05 ***
## x9          -0.31551    0.05147  -6.130 2.83e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.502 on 41 degrees of freedom
## Multiple R-squared:  0.8203, Adjusted R-squared:  0.8027 
## F-statistic: 46.78 on 4 and 41 DF,  p-value: 9.357e-15

| Variable | Full Model Std. Error | BIC Model Std. Error |
|---|---|---|
| x1 | 0.790245 | 0.52343 |
| x3 | 0.299406 | 0.16617 |
| x6 | 0.210396 | 0.09480 |
| x9 | 0.152507 | 0.05147 |

One can see that there was a decrease in the standard errors of the coefficients from the FULL model to the BIC model.

c. Produce a 95% confidence interval on the average evaporation for the values of the explanatory variables given in Exercise 13.54. Was there a large difference in the two point estimators? Compare the widths of the two intervals.
##        fit      lwr      upr
## 1 52.16445 37.59044 66.73846
## [1] 29.14802
## [1] 39.81358
## [1] 10.66556

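Both intervals printed for these values share the point estimate 52.16445, which is the full-model prediction, so they differ in type rather than in model. The 95% confidence interval for the average evaporation runs from 37.59044 to 66.73846, a width of 29.14802. The interval from 32.25766 to 72.07124 reported in Exercise 13.54 has a width of 39.81358, which is 10.66556 wider; an interval for an individual value (a prediction interval) is always wider than the confidence interval for the average at the same settings. For the comparison the exercise asks about, substituting the same values into the BIC-selected equation gives a point estimate of about 50.95, roughly 1.2 below the full-model estimate, so the two point estimators do not differ greatly.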
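As a numerical check on the point estimators, the BIC-selected equation can be evaluated at the values from Exercise 13.54 (only x1 = 90, x3 = 150, x6 = 180, and x9 = 375 enter that model). The sketch uses the rounded coefficients from the BIC model summary, so the result is approximate; the interval for the BIC model is not reproduced here because it requires the fitted model object. The variable names are illustrative.

```r
# Coefficients of the BIC-selected model, copied from its summary above.
b_bic <- c(Intercept = 75.04531, x1 = 0.99673, x3 = -0.48709,
           x6 = 0.43098, x9 = -0.31551)
x_new <- c(1, 90, 150, 180, 375)     # intercept, x1, x3, x6, x9

fit_bic  <- sum(b_bic * x_new)
fit_full <- 52.16445                 # point estimate printed for the full model

print(round(c(fit_bic = fit_bic, fit_full = fit_full,
              difference = fit_full - fit_bic), 3))
```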

13.57 Refer to Exercise 13.54. The agronomist is concerned that there may be a distinct difference between the models for land in West Texas and for land in East Texas. Observations 1-23 are data values from East Texas and 24-46 are from West Texas. a. At the alpha = .05 level, are there differences between the models for the two regions?

East Texas Model

## 
## Call:
## lm(formula = y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + 
##     x10)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.3166 -2.0323  0.8094  2.0474  8.6761 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 218.79505  163.07246   1.342   0.2045  
## x1            0.59270    0.88770   0.668   0.5170  
## x2           -0.90811    1.23914  -0.733   0.4777  
## x3           -0.17982    0.31606  -0.569   0.5799  
## x4            1.37167    0.96221   1.426   0.1795  
## x5           -0.25658    0.97847  -0.262   0.7976  
## x6            0.10949    0.26955   0.406   0.6918  
## x7           -1.89590    1.55738  -1.217   0.2469  
## x8            0.38836    0.83701   0.464   0.6510  
## x9           -0.28217    0.24242  -1.164   0.2671  
## x10           0.01802    0.01007   1.790   0.0987 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.885 on 12 degrees of freedom
## Multiple R-squared:  0.9289, Adjusted R-squared:  0.8696 
## F-statistic: 15.68 on 10 and 12 DF,  p-value: 2.112e-05

West Texas Model

## 
## Call:
## lm(formula = y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + 
##     x10)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.1322 -1.3352  0.3272  0.9545  8.6838 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -439.88251  153.17460  -2.872 0.014044 *  
## x1             3.69682    0.83555   4.424 0.000829 ***
## x2            -0.52349    1.22672  -0.427 0.677126    
## x3            -0.60663    0.33816  -1.794 0.098038 .  
## x4             3.46894    0.77831   4.457 0.000783 ***
## x5            -0.06594    0.94702  -0.070 0.945632    
## x6             0.08710    0.18639   0.467 0.648647    
## x7             0.87672    0.91059   0.963 0.354648    
## x8             0.79402    0.40029   1.984 0.070653 .  
## x9            -0.45562    0.14901  -3.058 0.009941 ** 
## x10            0.04056    0.01423   2.850 0.014617 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.054 on 12 degrees of freedom
## Multiple R-squared:  0.9561, Adjusted R-squared:  0.9196 
## F-statistic: 26.16 on 10 and 12 DF,  p-value: 1.284e-06

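The West Texas model has a higher adjusted R-squared, a lower overall p-value, more variables with small p-values, and a lower residual standard error. The estimated coefficients also differ markedly between the regions (for example, the intercept is 218.80 for East Texas and -439.88 for West Texas, and the x1 slope is 0.59 versus 3.70), which suggests that the two regions follow different models. If the data are split between the two areas of Texas, then the full regression model fits the West Texas data better.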
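One way to formalize the comparison at alpha = .05, which the solution above does not carry out, is a Chow-type F test: compare the residual sum of squares of a single model fitted to all 46 observations with the combined residual sums of squares of the two regional models. The sketch below is built only from numbers printed in the summaries (the pooled RSS of 1561.6 from the stepwise trace, and the regional residual standard errors on 12 degrees of freedom each). It assumes normal errors with the same variance in both regions, and the variable names are illustrative.

```r
# Pooled model: RSS copied from the stepwise trace for the full model.
rss_pooled <- 1561.6

# Regional models: RSS = (residual standard error)^2 * residual df.
rss_east  <- 4.885^2 * 12
rss_west  <- 4.054^2 * 12
rss_split <- rss_east + rss_west   # separate models, 24 residual df in total

df_num <- 11   # extra coefficients when each region gets its own model
df_den <- 24
f_stat <- ((rss_pooled - rss_split) / df_num) / (rss_split / df_den)
p_val  <- pf(f_stat, df_num, df_den, lower.tail = FALSE)

print(round(c(F = f_stat, critical = qf(0.95, df_num, df_den)), 3))
print(signif(p_val, 3))
```

An F statistic above the critical value would indicate that allowing separate coefficients for the two regions reduces the residual variation by more than chance alone would explain.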

b. For each of the two regions, produce a 95% confidence interval on the average evaporation for the values of the explanatory variables given in Exercise 13.54.

East Texas model
##        fit      lwr      upr
## 1 38.94039 22.49648 55.38431
## [1] 32.88783

East Texas model 95% confidence interval for the average values is from 22.49648 to 55.38431.

West Texas model

##       fit      lwr      upr
## 1 33.8854 11.87099 55.89981
## [1] 44.02882

West Texas model 95% confidence interval for the average values is from 11.87099 to 55.89981.

c. Was there a large difference in the point estimators for the two regions? Compare the widths of the intervals for the two regions.

The East Texas interval has a width of 32.88783, and the point estimator is 38.94039. The West Texas interval has a width of 44.02882, and the point estimator is 33.8854. The West Texas interval is therefore about 11.14 wider, and its point estimator is about 5.05 lower, a noticeable difference.