During periods of high electricity demand, especially during the hot summer months, the power output from a gas turbine engine can drop dramatically. One way to counter this drop in power is by cooling the inlet air to the gas turbine. An increasingly popular cooling method uses high pressure inlet fogging. The performance of a sample of 67 gas turbines augmented with high pressure inlet fogging was investigated in the Journal of Engineering for Gas Turbines and Power (January 2005). One measure of performance is heat rate (kilojoules per kilowatt per hour). Heat rates for the 67 gas turbines, saved in the gasturbine file. Read some of the engineering applications from this journal here:
gasturbine<-read.delim("https://raw.githubusercontent.com/kvaranyak4/STAT3220/main/GASTURBINE.txt")
head(gasturbine)
names(gasturbine)
[1] "ENGINE" "SHAFTS" "RPM" "CPRATIO" "INLET.TEMP"
[6] "EXH.TEMP" "AIRFLOW" "POWER" "HEATRATE"
gasturbine$log.CPRATIO<-log(gasturbine$CPRATIO)
gasturbine$log.AIRFLOW<-log(gasturbine$AIRFLOW)
gasturbine$log.POWER<-log(gasturbine$POWER)
hist(gasturbine$HEATRATE, xlab="Heat Rate", main="Histogram of Heat Rate")
The distribution of the response variable, heat rate, is unimodal and skewed right. It is continuous, so it should still be suitable for regression.
Recall (From Unit 2.4)
#updated model: REMOVE LOG.POWER
gasmod3<-lm(HEATRATE~RPM+INLET.TEMP+EXH.TEMP+log.AIRFLOW,data=gasturbine)
gasmod3vif<-round(vif(gasmod3),3)
gasmod3vif
RPM INLET.TEMP EXH.TEMP log.AIRFLOW
10.057 3.575 3.334 10.683
mean(gasmod3vif)
[1] 6.91225
#updated model: REMOVE LOG.AIRFLOW
gasmod4<-lm(HEATRATE~RPM+INLET.TEMP+EXH.TEMP+log.POWER,data=gasturbine)
gasmod4vif<-round(vif(gasmod4),3)
gasmod4vif
RPM INLET.TEMP EXH.TEMP log.POWER
9.777 3.676 3.271 12.128
mean(gasmod4vif)
[1] 7.213
#updated model: REMOVE BOTH
gasmod5<-lm(HEATRATE~RPM+INLET.TEMP+EXH.TEMP,data=gasturbine)
gasmod5vif<-round(vif(gasmod5),3)
gasmod5vif
RPM INLET.TEMP EXH.TEMP
1.727 3.570 2.551
mean(gasmod5vif)
[1] 2.616
It appears the model without power and airflow have resolved the issue of severe multicollinearity. HOWEVER, we may have also decided to remove a combination of RPM and power/airflow, as that also had a strong relationship. As a researcher, or with your client, you may discuss which of those three variables would be most important to analyze moving forward. Here, we will remove the transformations as we want the most least complex model.
Starting with the subset of predictors: RPM, INLET.TEMP,and EXH.TEMP
#updated model
gasmod5<-lm(HEATRATE~RPM+INLET.TEMP+EXH.TEMP,data=gasturbine)
summary(gasmod5)
Call:
lm(formula = HEATRATE ~ RPM + INLET.TEMP + EXH.TEMP, data = gasturbine)
Residuals:
Min 1Q Median 3Q Max
-1025.8 -297.9 -115.3 225.8 1425.1
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.436e+04 7.333e+02 19.582 < 2e-16 ***
RPM 1.051e-01 1.071e-02 9.818 2.55e-14 ***
INLET.TEMP -9.223e+00 7.869e-01 -11.721 < 2e-16 ***
EXH.TEMP 1.243e+01 2.071e+00 6.000 1.06e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 465 on 63 degrees of freedom
Multiple R-squared: 0.9189, Adjusted R-squared: 0.915
F-statistic: 237.9 on 3 and 63 DF, p-value: < 2.2e-16
Try adding qualitative variable Engine
#updated model
gasmod6<-lm(HEATRATE~RPM+INLET.TEMP+EXH.TEMP+ENGINE,data=gasturbine)
summary(gasmod6)
Call:
lm(formula = HEATRATE ~ RPM + INLET.TEMP + EXH.TEMP + ENGINE,
data = gasturbine)
Residuals:
Min 1Q Median 3Q Max
-1030.37 -247.21 -70.62 258.72 1400.82
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.607e+04 1.262e+03 12.732 < 2e-16 ***
RPM 1.077e-01 1.198e-02 8.991 8.85e-13 ***
INLET.TEMP -9.899e+00 9.374e-01 -10.561 2.12e-15 ***
EXH.TEMP 1.116e+01 2.202e+00 5.068 4.00e-06 ***
ENGINEAeroderiv -4.404e+02 2.870e+02 -1.534 0.130
ENGINETraditional -3.667e+02 2.233e+02 -1.642 0.106
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 461.2 on 61 degrees of freedom
Multiple R-squared: 0.9227, Adjusted R-squared: 0.9164
F-statistic: 145.6 on 5 and 61 DF, p-value: < 2.2e-16
Perform nested F test to determine if Engine type is significant.
#anova(reduced,complete)
anova(gasmod5,gasmod6)
For the sake of this example, we will not consider any further testing.
We have covered this individually and will come back to it in a complete example at the end of the unit.
We have covered this individually and will come back to it in a complete example at the end of the unit.
First we will evaluate model assumptions
#There are a few options to view plots for assumptions
#store residuals from the model
gasres<-residuals(gasmod5)
sum(gasres)
[1] 3.552714e-14
mean(gasres)
[1] 5.140995e-16
#Residuals Plots of explanatory variables vs residuals
residualPlots(gasmod5,tests=F)
#Residual vs Fitted and QQ plot
plot(gasmod5, which=c(1,2))
#histogram of residuals
hist(gasres)
Due to rounding errors within R we see the mean and sum of the residuals are not quite zero.
Add your notes for each assumption:
Then we will identify outliers and influential observations.
#Cooks Distance Thresholds
plot(gasmod5,which=4)
#Leverage vs Studentized Residuals
influencePlot(gasmod5,fill=F)
#Deleted Studentized Residuals vs Predicted values
ols_plot_resid_stud_fit(gasmod5)
Label the observations that exceed the threshold for each statistic:
After identifying the observations that exceed one or more of the thresholds above, we investigate further into their DFFits and DFBetas.
#We can use various functions to store and view these statistics for all observations
# We will not print these outputs for every observation into reports
rstudent(gasmod5)
1 2 3 4 5 6
0.893217621 1.317528848 -0.692830495 -1.629043379 -0.319227791 -0.220252658
7 8 9 10 11 12
-0.233471856 0.374990192 -0.341559467 -0.680104090 3.389504856 2.116744668
13 14 15 16 17 18
-0.253968278 0.011011147 0.920546030 -0.303301225 -0.707442804 -0.947506151
19 20 21 22 23 24
-0.649930306 0.488456104 1.541375696 0.493725647 -0.528770732 -0.649760795
25 26 27 28 29 30
-0.295878983 -0.443466676 -0.856333719 -0.004877388 -0.196240609 -0.795751546
31 32 33 34 35 36
-0.837057024 -2.367042399 0.363745397 0.012398052 -1.128484921 2.749086328
37 38 39 40 41 42
0.224875075 0.043825526 -1.568397362 -0.516762527 0.288577835 -0.286838938
43 44 45 46 47 48
-0.454237853 -0.520022587 1.285279709 -0.257821207 2.942194529 1.259842609
49 50 51 52 53 54
-0.344977609 0.691549153 -0.326595642 0.658329323 1.297334067 0.046125589
55 56 57 58 59 60
0.458976459 -0.683477727 -0.209983135 -0.860119314 0.428369068 -1.098019290
61 62 63 64 65 66
0.548011700 -1.431960144 -0.794961623 0.722757693 -0.538743015 0.683098419
67
-0.452855099
hatvalues(gasmod5)
1 2 3 4 5 6 7
0.18485774 0.08148766 0.04852153 0.04180664 0.02953543 0.02090636 0.02675674
8 9 10 11 12 13 14
0.03347980 0.03328687 0.02797014 0.04627741 0.03905134 0.06132261 0.03904983
15 16 17 18 19 20 21
0.09447390 0.03578988 0.04966096 0.05602507 0.03200633 0.04489513 0.03346492
22 23 24 25 26 27 28
0.02275577 0.03188287 0.04122101 0.02458622 0.03941415 0.03917409 0.10819365
29 30 31 32 33 34 35
0.08678043 0.03123228 0.03193504 0.06782971 0.06092269 0.03689212 0.04035201
36 37 38 39 40 41 42
0.14870478 0.05221727 0.06619795 0.04114569 0.03406287 0.04011764 0.04930437
43 44 45 46 47 48 49
0.08149383 0.05111380 0.08925279 0.07719304 0.02446354 0.02446354 0.05725267
50 51 52 53 54 55 56
0.06910879 0.04056331 0.07048539 0.04601525 0.03741620 0.02774057 0.03778892
57 58 59 60 61 62 63
0.04257101 0.04879281 0.06952830 0.03789756 0.20397249 0.18110748 0.04996695
64 65 66 67
0.27295893 0.11306102 0.05760075 0.03264216
cooks.distance(gasmod5)
1 2 3 4 5 6
4.537897e-02 3.805605e-02 6.170622e-03 2.820616e-02 7.865760e-04 2.629333e-04
7 8 9 10 11 12
3.803540e-04 1.234573e-03 1.018545e-03 3.356037e-03 1.194756e-01 4.313782e-02
13 14 15 16 17 18
1.069302e-03 1.251617e-06 2.215619e-02 8.661275e-04 6.590471e-03 1.334232e-02
19 20 21 22 23 24
3.524009e-03 2.838050e-03 2.012551e-02 1.436298e-03 2.328622e-03 4.579823e-03
25 26 27 28 29 30
5.597668e-04 2.043391e-03 7.506244e-03 7.331528e-07 9.290596e-04 5.133514e-03
31 32 33 34 35 36
5.806060e-03 9.498435e-02 2.175884e-03 1.495731e-06 1.332918e-02 2.989222e-01
37 38 39 40 41 42
7.071683e-04 3.458752e-05 2.579141e-02 2.381973e-03 8.829767e-04 1.082513e-03
43 44 45 46 47 48
4.635059e-03 3.684404e-03 4.005790e-02 1.411003e-03 4.838904e-02 9.858692e-03
49 50 51 52 53 54
1.832476e-03 8.950190e-03 1.143616e-03 8.290733e-03 2.007800e-02 2.100772e-05
55 56 57 58 59 60
1.521703e-03 4.625639e-03 4.976870e-04 9.526546e-03 3.472962e-03 1.183408e-02
61 62 63 64 65 66
1.945420e-02 1.115140e-01 8.358358e-03 4.940477e-02 9.354983e-03 7.191049e-03
67
1.752125e-03
dffits(gasmod5)
1 2 3 4 5 6
0.425362695 0.392431237 -0.156456917 -0.340273949 -0.055690680 -0.032184608
7 8 9 10 11 12
-0.038711537 0.069792041 -0.063380257 -0.115367354 0.746637455 0.426713445
13 14 15 16 17 18
-0.064912990 0.002219687 0.297338371 -0.058434377 -0.161718462 -0.230830397
19 20 21 22 23 24
-0.118181220 0.105900892 0.286810219 0.075340741 -0.095958207 -0.134726673
25 26 27 28 29 30
-0.046974865 -0.089829495 -0.172910016 -0.001698842 -0.060493984 -0.142879357
31 32 33 34 35 36
-0.152032537 -0.638511006 0.092648067 0.002426514 -0.231404802 1.148975438
37 38 39 40 41 42
0.052782991 0.011668687 -0.324894209 -0.097041437 0.058995878 -0.065322088
43 44 45 46 47 48
-0.135302132 -0.120693600 0.402355212 -0.074567961 0.465917825 0.199505207
49 50 51 52 53 54
-0.085014131 0.188425741 -0.067153544 0.181286164 0.284925934 0.009093948
55 56 57 58 59 60
0.077527720 -0.135447627 -0.044278010 -0.194804528 0.117097421 -0.217923939
61 62 63 64 65 66
0.277403278 -0.673419464 -0.182313250 0.442855356 -0.192349579 0.168880642
67
-0.083186942
dfbetas(gasmod5)
(Intercept) RPM INLET.TEMP EXH.TEMP
1 -0.2543877689 0.2702354892 -0.0179373762 0.1674385619
2 0.3169951733 0.0722423682 0.0181069528 -0.2216629859
3 0.0350496972 -0.1276702220 -0.0614516574 0.0338096397
4 -0.2663107890 -0.0117487442 0.0237567805 0.1466728252
5 0.0196958361 -0.0239481520 0.0056188036 -0.0177883757
6 -0.0148881596 0.0037126094 -0.0068511914 0.0138466510
7 -0.0191213854 -0.0003158663 -0.0149794844 0.0236436589
8 0.0470740849 -0.0302973389 -0.0012477769 -0.0252372918
9 -0.0283704899 0.0455909083 0.0292700817 -0.0114556375
10 -0.0499094478 0.0732438446 0.0313107132 -0.0032886764
11 0.0146396925 0.3343853805 -0.1771736110 0.1326314944
12 0.1232879232 -0.1253086407 -0.3120523904 0.2018511581
13 0.0372577985 -0.0498266103 -0.0259600958 -0.0006396223
14 0.0017823141 -0.0001860978 0.0001692894 -0.0012330015
15 0.1977245682 -0.2021856920 -0.2216410817 0.0761575306
16 -0.0093013852 0.0388802988 0.0411335435 -0.0331455805
17 -0.0471210539 0.1240764064 0.1202542833 -0.0834658159
18 -0.0494525368 0.1810355233 0.1754822278 -0.1340271108
19 -0.0010742113 0.0801707160 0.0566949872 -0.0570467492
20 0.0008172527 0.0803922302 0.0696819070 -0.0618691555
21 0.1187745030 -0.2061358762 -0.1569317982 0.0785190805
22 0.0428244933 -0.0272943223 -0.0048392793 -0.0189445851
23 -0.0400212720 0.0679344585 0.0488067593 -0.0230029314
24 -0.0217465205 0.0997861868 0.0923538285 -0.0746654188
25 -0.0198922461 0.0257376418 0.0079269807 0.0026016883
26 -0.0537032798 0.0633301465 0.0408688041 -0.0064580429
27 -0.0457276255 0.1323798380 0.1082248136 -0.0759656482
28 -0.0002085654 -0.0003846534 0.0009064725 -0.0006180903
29 -0.0488622090 -0.0085233208 0.0048446371 0.0276434498
30 -0.0313198381 -0.0467032435 0.0332095017 -0.0082556456
31 -0.0309523904 -0.0909612222 -0.0329026308 0.0498176640
32 -0.1931169238 -0.4379105485 -0.4048748978 0.4843855487
33 0.0737598798 0.0073407101 0.0255699763 -0.0682455525
34 0.0017093348 0.0002373894 0.0006738580 -0.0016300107
35 -0.1135493407 -0.1005506429 -0.1155057555 0.1720134599
36 0.4159486523 -0.8478462527 -1.0529590426 0.6883629665
37 0.0268483526 -0.0399948767 -0.0364759545 0.0172814081
38 0.0084380646 -0.0082805629 -0.0065775191 0.0008495385
39 -0.2454907677 -0.0399489002 -0.0450242507 0.1928232646
40 0.0170560223 -0.0339095604 -0.0699575479 0.0463968659
41 -0.0348131223 -0.0146242346 -0.0118366061 0.0358625427
42 0.0407190770 0.0172278269 0.0079654353 -0.0367484298
43 0.0995720234 0.0284920021 0.0215456293 -0.0889537913
44 0.0716272593 0.0386398233 0.0191398725 -0.0696826252
45 -0.1217605776 0.1276654686 0.3324443647 -0.1981042737
46 0.0565912155 0.0012618619 -0.0154921335 -0.0262649022
47 -0.1617510315 0.1304506996 0.2378441134 -0.0829745042
48 -0.0692615120 0.0558587640 0.1018444381 -0.0355295392
49 0.0570234576 0.0198389084 0.0097739121 -0.0496148764
50 -0.1149315241 0.0389499084 0.1092792057 -0.0136862294
51 0.0367650040 0.0105587319 -0.0119780590 -0.0172472463
52 -0.0931367393 0.0387876514 0.1192455431 -0.0365763464
53 0.0118140083 0.0849943750 0.2219197485 -0.1910320807
54 -0.0058482057 -0.0008127183 0.0002289599 0.0040314977
55 -0.0103240501 -0.0063519200 0.0335033940 -0.0176457828
56 0.0660508365 0.0010923255 -0.0527086876 -0.0042019584
57 0.0268198360 0.0039992886 -0.0088200040 -0.0120370028
58 0.0744459021 0.1050025785 0.0891544108 -0.1373239192
59 -0.0692634855 0.0257390261 0.0707605276 -0.0124353092
60 0.1099146775 0.0638837819 0.0032344382 -0.0874225183
61 -0.0201404718 0.1856806933 -0.0271551865 0.0248305889
62 0.2369879424 -0.4289108189 0.0899488620 -0.2068438091
63 0.0208192376 -0.1098873963 0.0117639805 -0.0198152615
64 0.1723990277 0.1463708012 0.3612199728 -0.4209772281
65 -0.1271968460 -0.0118060343 -0.1041312850 0.1680582067
66 -0.0799292230 0.1375459933 0.0898178147 -0.0286093893
67 0.0032007185 -0.0184425594 0.0334306140 -0.0313356072
# Instead we can subset by the observations of interest
dffits(gasmod5)[c(11,36,61,64)]
11 36 61 64
0.7466375 1.1489754 0.2774033 0.4428554
summary(dffits(gasmod5))
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.67342 -0.13501 -0.05569 0.01663 0.11150 1.14898
dfbetas(gasmod5)[c(11,36,61,64),]
(Intercept) RPM INLET.TEMP EXH.TEMP
11 0.01463969 0.3343854 -0.17717361 0.13263149
36 0.41594865 -0.8478463 -1.05295904 0.68836297
61 -0.02014047 0.1856807 -0.02715519 0.02483059
64 0.17239903 0.1463708 0.36121997 -0.42097723
summary(dfbetas(gasmod5))
(Intercept) RPM INLET.TEMP
Min. :-0.2663108 Min. :-0.847846 Min. :-1.052959
1st Qu.:-0.0479916 1st Qu.:-0.025621 1st Qu.:-0.016715
Median :-0.0002086 Median : 0.003999 Median : 0.007927
Mean : 0.0013664 Mean :-0.001314 Mean :-0.001471
3rd Qu.: 0.0417718 3rd Qu.: 0.070088 3rd Qu.: 0.052751
Max. : 0.4159486 Max. : 0.334385 Max. : 0.361220
EXH.TEMP
Min. :-0.4209772
1st Qu.:-0.0533308
Median :-0.0124353
Mean : 0.0005346
3rd Qu.: 0.0242371
Max. : 0.6883630
Variable transformation: We can directly use a function of the response variable within lm() or add a new variable to the table
# directly transform in the lm() function
gasmod5sqrt<-lm(sqrt(HEATRATE)~RPM+INLET.TEMP+EXH.TEMP,data=gasturbine)
summary(gasmod5sqrt)
Call:
lm(formula = sqrt(HEATRATE) ~ RPM + INLET.TEMP + EXH.TEMP, data = gasturbine)
Residuals:
Min 1Q Median 3Q Max
-4.4872 -1.3479 -0.5044 0.9321 6.2759
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.218e+02 3.346e+00 36.392 < 2e-16 ***
RPM 4.640e-04 4.886e-05 9.496 9.05e-14 ***
INLET.TEMP -4.367e-02 3.590e-03 -12.163 < 2e-16 ***
EXH.TEMP 5.706e-02 9.449e-03 6.039 9.12e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.122 on 63 degrees of freedom
Multiple R-squared: 0.9206, Adjusted R-squared: 0.9168
F-statistic: 243.4 on 3 and 63 DF, p-value: < 2.2e-16
# OR Create new variable
gasturbine$sqrty<-sqrt(gasturbine$HEATRATE)
#remember to remove original response heatrate
gasmodsqrt2<-lm(sqrty~RPM+INLET.TEMP+EXH.TEMP,data=gasturbine)
summary(gasmodsqrt2)
Call:
lm(formula = sqrty ~ RPM + INLET.TEMP + EXH.TEMP, data = gasturbine)
Residuals:
Min 1Q Median 3Q Max
-4.4872 -1.3479 -0.5044 0.9321 6.2759
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.218e+02 3.346e+00 36.392 < 2e-16 ***
RPM 4.640e-04 4.886e-05 9.496 9.05e-14 ***
INLET.TEMP -4.367e-02 3.590e-03 -12.163 < 2e-16 ***
EXH.TEMP 5.706e-02 9.449e-03 6.039 9.12e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.122 on 63 degrees of freedom
Multiple R-squared: 0.9206, Adjusted R-squared: 0.9168
F-statistic: 243.4 on 3 and 63 DF, p-value: < 2.2e-16
plot(gasmod5sqrt,which=c(1,2))
Removing observations from analysis by subsetting the data
#We can remove particular observations
subsetgas<-gasturbine[-c(11,36,61,64),]
gasmod5rem<-lm(HEATRATE~RPM+INLET.TEMP+EXH.TEMP,data=subsetgas)
summary(gasmod5rem)
Call:
lm(formula = HEATRATE ~ RPM + INLET.TEMP + EXH.TEMP, data = subsetgas)
Residuals:
Min 1Q Median 3Q Max
-1005.32 -242.35 -89.03 208.32 1273.08
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.404e+04 6.640e+02 21.146 < 2e-16 ***
RPM 1.068e-01 1.058e-02 10.098 1.78e-14 ***
INLET.TEMP -8.334e+00 8.352e-01 -9.979 2.79e-14 ***
EXH.TEMP 1.095e+01 2.190e+00 5.000 5.44e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 406.4 on 59 degrees of freedom
Multiple R-squared: 0.9157, Adjusted R-squared: 0.9114
F-statistic: 213.7 on 3 and 59 DF, p-value: < 2.2e-16
influencePlot(gasmod5rem,fill=F)
We can cycle through a combination of variable transformations and subsetting to find a good model fit.
Penguins Revisited-
Consider our final model from Unit 2
\(E(bodymass)=\beta_0+\beta_1BillDep+\beta_2BillLen+\beta_3FlipperLen+\beta_4SpeciesC+\beta_5SpeciesG+\beta_6SexM+\beta_7BillDep*BillLen\)
First fit the model and store it as “penmodfinal”
Is the mean zero assumption violated? How do you know?
Is the constant variance assumption violated? How do you know?
Is the normality assumptions violated? How do you know?
Is the independence assumption violated? How do you know?
If you noted any violations, attempt to correct them.