During periods of high electricity demand, especially during the hot summer months, the power output from a gas turbine engine can drop dramatically. One way to counter this drop in power is by cooling the inlet air to the gas turbine. An increasingly popular cooling method uses high pressure inlet fogging. The performance of a sample of 67 gas turbines augmented with high pressure inlet fogging was investigated in the Journal of Engineering for Gas Turbines and Power (January 2005). One measure of performance is heat rate (kilojoules per kilowatt per hour). Heat rates for the 67 gas turbines, saved in the gasturbine file.
Check the appropriateness of response variable for regression: View a histogram of response variable. It should be continuous, and approximately unimodal and symmetric, with few outliers.
gasturbine<-read.delim("https://raw.githubusercontent.com/kvaranyak4/STAT3220/main/GASTURBINE.txt")
head(gasturbine)
names(gasturbine)
[1] "ENGINE" "SHAFTS" "RPM" "CPRATIO" "INLET.TEMP"
[6] "EXH.TEMP" "AIRFLOW" "POWER" "HEATRATE"
hist(gasturbine$HEATRATE, xlab="Heat Rate", main="Histogram of Heat Rate")
The distribution of the response variable, heat rate, is unimodal and skewed right. It is continuous, so it should still be suitable for regression.
We will explore the relationship with quantitative variables with scatter plots and correlations and classify each relationship as linear, curvilinear, or none. We explore the box plots and means for each qualitative variable explanatory variable then classify the relationships as existent or not. We will not explore interactions in this example.
#Scatter plots for quantitative variables
for (i in names(gasturbine)[3:8]) {
plot((gasturbine[,i]), gasturbine$HEATRATE,xlab=i,ylab="Heat Rate")
}
#Attempt Transformations
plot(log(gasturbine$RPM), gasturbine$HEATRATE,ylab="Heat Rate")
plot(log(gasturbine$CPRATIO), gasturbine$HEATRATE,ylab="Heat Rate")
plot(log(gasturbine$POWER), gasturbine$HEATRATE,ylab="Heat Rate")
plot(log(gasturbine$AIRFLOW), gasturbine$HEATRATE,ylab="Heat Rate")
#Correlations for quantitative variables
round(cor(gasturbine[3:8],gasturbine$HEATRATE,use="complete.obs"),3)
[,1]
RPM 0.844
CPRATIO -0.735
INLET.TEMP -0.801
EXH.TEMP -0.314
AIRFLOW -0.703
POWER -0.697
round(cor(log(gasturbine[3:8]),gasturbine$HEATRATE,use="complete.obs"),3)
[,1]
RPM 0.810
CPRATIO -0.818
INLET.TEMP -0.813
EXH.TEMP -0.303
AIRFLOW -0.840
POWER -0.871
#Summary Statistics for response variable grouped by each level of the response
tapply(gasturbine$HEATRATE,gasturbine$ENGINE,summary)
$Advanced
Min. 1st Qu. Median Mean 3rd Qu. Max.
9105 9295 9669 9764 9933 11588
$Aeroderiv
Min. 1st Qu. Median Mean 3rd Qu. Max.
8714 10708 12414 12312 13697 16243
$Traditional
Min. 1st Qu. Median Mean 3rd Qu. Max.
10086 10598 11183 11544 11956 14796
tapply(gasturbine$HEATRATE,gasturbine$SHAFTS,summary)
$`1`
Min. 1st Qu. Median Mean 3rd Qu. Max.
9105 9918 10592 10930 11674 14796
$`2`
Min. 1st Qu. Median Mean 3rd Qu. Max.
10951 11223 11654 12536 13232 16243
$`3`
Min. 1st Qu. Median Mean 3rd Qu. Max.
8714 8903 9092 9092 9280 9469
#Box plots for Qualitative
boxplot(HEATRATE~ENGINE,gasturbine, ylab="Heat Rate")
boxplot(HEATRATE~SHAFTS,gasturbine, ylab="Heat Rate")
# Summary counts for qualitative variables
table(gasturbine$ENGINE,gasturbine$SHAFTS)
1 2 3
Advanced 21 0 0
Aeroderiv 1 4 2
Traditional 35 4 0
Do any of the explanatory variables have relationships with each other? We will look at pairwise correlations and VIF to evaluate multicollinearity in the quantitative explanatory variables.
#Store the transformations in the data table
gasturbine$log.CPRATIO<-log(gasturbine$CPRATIO)
gasturbine$log.AIRFLOW<-log(gasturbine$AIRFLOW)
gasturbine$log.POWER<-log(gasturbine$POWER)
#Regular correlation
# Use the correlation function with no second argument
# to find the correlations with the explanatory variables with each other
gasturcor<-round(cor(gasturbine[,c(3,5,6,10:12)]),4)
gasturcor
RPM INLET.TEMP EXH.TEMP log.CPRATIO log.AIRFLOW log.POWER
RPM 1.0000 -0.5536 -0.1715 -0.5819 -0.9177 -0.9083
INLET.TEMP -0.5536 1.0000 0.7283 0.7440 0.6649 0.7221
EXH.TEMP -0.1715 0.7283 1.0000 0.1712 0.4068 0.4515
log.CPRATIO -0.5819 0.7440 0.1712 1.0000 0.5308 0.5757
log.AIRFLOW -0.9177 0.6649 0.4068 0.5308 1.0000 0.9964
log.POWER -0.9083 0.7221 0.4515 0.5757 0.9964 1.0000
#Correlation Visualization
# The argument is the stored correlation values from the cor() function
corrplot(gasturcor)
# Scatter plot matrix
plot(gasturbine[c(3,5,6,10:12)])
#A new correlation function
# this will also output the associated p-values with Ho: rho=0
rcorr(as.matrix(gasturbine[,c(3,5,6,10:12)]))
RPM INLET.TEMP EXH.TEMP log.CPRATIO log.AIRFLOW log.POWER
RPM 1.00 -0.55 -0.17 -0.58 -0.92 -0.91
INLET.TEMP -0.55 1.00 0.73 0.74 0.66 0.72
EXH.TEMP -0.17 0.73 1.00 0.17 0.41 0.45
log.CPRATIO -0.58 0.74 0.17 1.00 0.53 0.58
log.AIRFLOW -0.92 0.66 0.41 0.53 1.00 1.00
log.POWER -0.91 0.72 0.45 0.58 1.00 1.00
n= 67
P
RPM INLET.TEMP EXH.TEMP log.CPRATIO log.AIRFLOW log.POWER
RPM 0.0000 0.1653 0.0000 0.0000 0.0000
INLET.TEMP 0.0000 0.0000 0.0000 0.0000 0.0000
EXH.TEMP 0.1653 0.0000 0.1660 0.0006 0.0001
log.CPRATIO 0.0000 0.0000 0.1660 0.0000 0.0000
log.AIRFLOW 0.0000 0.0000 0.0006 0.0000 0.0000
log.POWER 0.0000 0.0000 0.0001 0.0000 0.0000
There is concern of strong pairwise relationships with several pairs of variables.
#Multicollinearity VIF
# First fit the model with all of the quantitative variables
gasmod1<-lm(HEATRATE~RPM+INLET.TEMP+EXH.TEMP+log.CPRATIO+log.AIRFLOW+log.POWER,data=gasturbine)
summary(gasmod1)
Call:
lm(formula = HEATRATE ~ RPM + INLET.TEMP + EXH.TEMP + log.CPRATIO +
log.AIRFLOW + log.POWER, data = gasturbine)
Residuals:
Min 1Q Median 3Q Max
-337.73 -139.16 -60.72 75.76 735.19
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.312e+04 1.984e+03 21.737 < 2e-16 ***
RPM 6.700e-02 1.202e-02 5.575 6.22e-07 ***
INLET.TEMP 1.036e+00 9.487e-01 1.092 0.279
EXH.TEMP 1.326e+01 1.599e+00 8.291 1.56e-11 ***
log.CPRATIO 1.578e+02 2.534e+02 0.623 0.536
log.AIRFLOW 7.625e+03 5.632e+02 13.539 < 2e-16 ***
log.POWER -7.433e+03 5.247e+02 -14.166 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 213.7 on 60 degrees of freedom
Multiple R-squared: 0.9837, Adjusted R-squared: 0.9821
F-statistic: 603 on 6 and 60 DF, p-value: < 2.2e-16
#store and round the VIF values
gasmod1vif<-round(vif(gasmod1),3)
gasmod1vif
RPM INLET.TEMP EXH.TEMP log.CPRATIO log.AIRFLOW log.POWER
10.300 24.579 7.203 7.086 1085.142 1225.107
#find the average VIF
mean(gasmod1vif)
[1] 393.2362
Yes, there is evidence of severe multicollinearity because all VIFs are MUCH greater than 10 and the average VIF is MUCH greater than 3 at 393.2362.
Because we have quite a few variables and severe multicollinearity, we need to address that. It is not clear from EDA what variables should remain and which variables should be removed.
We will use variable selection procedures to narrow down our quantitative variables to a best set of predictors. We will use the entry and remain significance levels of 0.15
# Note: Details=T gives all of the steps within each process
# In most cases, you can set Details=F
# backwards elimination
#Default: prem = 0.3
ols_step_backward_p(gasmod1,p_rem=0.15,details=T)
Backward Elimination Method
---------------------------
Candidate Terms:
1. RPM
2. INLET.TEMP
3. EXH.TEMP
4. log.CPRATIO
5. log.AIRFLOW
6. log.POWER
Step => 0
Model => HEATRATE ~ RPM + INLET.TEMP + EXH.TEMP + log.CPRATIO + log.AIRFLOW + log.POWER
R2 => 0.984
Initiating stepwise selection...
Step => 1
Removed => log.CPRATIO
Model => HEATRATE ~ RPM + INLET.TEMP + EXH.TEMP + log.AIRFLOW + log.POWER
R2 => 0.98358
No more variables to be removed.
Variables Removed:
=> log.CPRATIO
Stepwise Summary
--------------------------------------------------------------------------
Step Variable AIC SBC SBIC R2 Adj. R2
--------------------------------------------------------------------------
0 Full Model 917.568 935.205 729.036 0.98369 0.98206
1 log.CPRATIO 916.000 931.433 727.152 0.98358 0.98224
--------------------------------------------------------------------------
Final Model Output
------------------
Model Summary
-------------------------------------------------------------------
R 0.992 RMSE 202.837
R-Squared 0.984 MSE 41142.862
Adj. R-Squared 0.982 Coef. Var 1.921
Pred R-Squared 0.975 AIC 916.000
MAE 152.893 SBC 931.433
-------------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
AIC: Akaike Information Criteria
SBC: Schwarz Bayesian Criteria
ANOVA
---------------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
---------------------------------------------------------------------------
Regression 165140636.668 5 33028127.334 730.877 0.0000
Residual 2756571.779 61 45189.701
Total 167897208.448 66
---------------------------------------------------------------------------
Parameter Estimates
--------------------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
--------------------------------------------------------------------------------------------------
(Intercept) 43134.882 1973.376 21.858 0.000 39188.873 47080.890
RPM 0.066 0.012 0.291 5.568 0.000 0.042 0.090
INLET.TEMP 1.337 0.813 0.115 1.644 0.105 -0.289 2.962
EXH.TEMP 12.538 1.098 0.347 11.422 0.000 10.343 14.733
log.AIRFLOW 7529.364 539.316 7.262 13.961 0.000 6450.935 8607.794
log.POWER -7347.619 503.882 -8.082 -14.582 0.000 -8355.194 -6340.044
--------------------------------------------------------------------------------------------------
# forward selection
#default: penter = 0.3
ols_step_forward_p(gasmod1,p_enter=0.15,details=T)
Forward Selection Method
------------------------
Candidate Terms:
1. RPM
2. INLET.TEMP
3. EXH.TEMP
4. log.CPRATIO
5. log.AIRFLOW
6. log.POWER
Step => 0
Model => HEATRATE ~ 1
R2 => 0
Initiating stepwise selection...
Selection Metrics Table
------------------------------------------------------------------
Predictor Pr(>|t|) R-Squared Adj. R-Squared AIC
------------------------------------------------------------------
log.POWER 0.00000 0.758 0.755 1088.159
RPM 0.00000 0.712 0.708 1099.849
log.AIRFLOW 0.00000 0.705 0.701 1101.457
log.CPRATIO 0.00000 0.670 0.665 1109.066
INLET.TEMP 0.00000 0.641 0.635 1114.713
EXH.TEMP 0.00959 0.099 0.085 1176.358
------------------------------------------------------------------
Step => 1
Selected => log.POWER
Model => HEATRATE ~ log.POWER
R2 => 0.758
Selection Metrics Table
------------------------------------------------------------------
Predictor Pr(>|t|) R-Squared Adj. R-Squared AIC
------------------------------------------------------------------
log.CPRATIO 0.00000 0.909 0.906 1024.868
log.AIRFLOW 0.00000 0.865 0.861 1051.171
INLET.TEMP 2e-05 0.820 0.814 1070.432
RPM 0.03660 0.774 0.767 1085.549
EXH.TEMP 0.14832 0.766 0.759 1087.954
------------------------------------------------------------------
Step => 2
Selected => log.CPRATIO
Model => HEATRATE ~ log.POWER + log.CPRATIO
R2 => 0.909
Selection Metrics Table
------------------------------------------------------------------
Predictor Pr(>|t|) R-Squared Adj. R-Squared AIC
------------------------------------------------------------------
log.AIRFLOW 0.00236 0.921 0.918 1016.961
RPM 0.10842 0.913 0.908 1024.106
EXH.TEMP 0.27423 0.911 0.906 1025.587
INLET.TEMP 0.45619 0.910 0.905 1026.273
------------------------------------------------------------------
Step => 3
Selected => log.AIRFLOW
Model => HEATRATE ~ log.POWER + log.CPRATIO + log.AIRFLOW
R2 => 0.921
Selection Metrics Table
----------------------------------------------------------------
Predictor Pr(>|t|) R-Squared Adj. R-Squared AIC
----------------------------------------------------------------
EXH.TEMP 0.00000 0.975 0.974 941.630
RPM 0.00000 0.947 0.943 993.145
INLET.TEMP 0.00000 0.944 0.941 995.852
----------------------------------------------------------------
Step => 4
Selected => EXH.TEMP
Model => HEATRATE ~ log.POWER + log.CPRATIO + log.AIRFLOW + EXH.TEMP
R2 => 0.975
Selection Metrics Table
----------------------------------------------------------------
Predictor Pr(>|t|) R-Squared Adj. R-Squared AIC
----------------------------------------------------------------
RPM 0.00000 0.983 0.982 916.887
INLET.TEMP 0.76662 0.975 0.973 943.533
----------------------------------------------------------------
Step => 5
Selected => RPM
Model => HEATRATE ~ log.POWER + log.CPRATIO + log.AIRFLOW + EXH.TEMP + RPM
R2 => 0.983
Selection Metrics Table
----------------------------------------------------------------
Predictor Pr(>|t|) R-Squared Adj. R-Squared AIC
----------------------------------------------------------------
INLET.TEMP 0.27902 0.984 0.982 917.568
----------------------------------------------------------------
Step => 6
Selected => INLET.TEMP
Model => HEATRATE ~ log.POWER + log.CPRATIO + log.AIRFLOW + EXH.TEMP + RPM + INLET.TEMP
R2 => 0.984
Variables Selected:
=> log.POWER
=> log.CPRATIO
=> log.AIRFLOW
=> EXH.TEMP
=> RPM
=> INLET.TEMP
Stepwise Summary
----------------------------------------------------------------------------
Step Variable AIC SBC SBIC R2 Adj. R2
----------------------------------------------------------------------------
0 Base Model 1181.327 1185.737 987.298 0.00000 0.00000
1 log.POWER 1088.159 1094.773 892.613 0.75839 0.75467
2 log.CPRATIO 1024.868 1033.687 828.649 0.90882 0.90597
3 log.AIRFLOW 1016.961 1027.984 819.495 0.92135 0.91761
4 EXH.TEMP 941.630 954.858 748.697 0.97520 0.97360
5 RPM 916.887 932.320 727.870 0.98336 0.98200
6 INLET.TEMP 917.568 935.205 729.036 0.98369 0.98206
----------------------------------------------------------------------------
Final Model Output
------------------
Model Summary
-------------------------------------------------------------------
R 0.992 RMSE 202.184
R-Squared 0.984 MSE 40878.488
Adj. R-Squared 0.982 Coef. Var 1.931
Pred R-Squared 0.975 AIC 917.568
MAE 154.185 SBC 935.205
-------------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
AIC: Akaike Information Criteria
SBC: Schwarz Bayesian Criteria
ANOVA
---------------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
---------------------------------------------------------------------------
Regression 165158349.755 6 27526391.626 603.019 0.0000
Residual 2738858.692 60 45647.645
Total 167897208.448 66
---------------------------------------------------------------------------
Parameter Estimates
--------------------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
--------------------------------------------------------------------------------------------------
(Intercept) 43116.228 1983.576 21.737 0.000 39148.486 47083.970
log.POWER -7433.155 524.714 -8.176 -14.166 0.000 -8482.740 -6383.570
log.CPRATIO 157.821 253.354 0.027 0.623 0.536 -348.962 664.604
log.AIRFLOW 7624.558 563.172 7.354 13.539 0.000 6498.047 8751.070
EXH.TEMP 13.259 1.599 0.367 8.291 0.000 10.060 16.457
RPM 0.067 0.012 0.295 5.575 0.000 0.043 0.091
INLET.TEMP 1.036 0.949 0.089 1.092 0.279 -0.861 2.934
--------------------------------------------------------------------------------------------------
# stepwise regression
#Default: pent = 0.1, prem = 0.3
ols_step_both_p(gasmod1,p_ent=0.15,p_rem=0.15,details=T)
Stepwise Selection Method
-------------------------
Candidate Terms:
1. RPM
2. INLET.TEMP
3. EXH.TEMP
4. log.CPRATIO
5. log.AIRFLOW
6. log.POWER
Step => 0
Model => HEATRATE ~ 1
R2 => 0
Initiating stepwise selection...
Step => 1
Selected => log.POWER
Model => HEATRATE ~ log.POWER
R2 => 0.758
Step => 2
Selected => log.CPRATIO
Model => HEATRATE ~ log.POWER + log.CPRATIO
R2 => 0.909
Step => 3
Selected => log.AIRFLOW
Model => HEATRATE ~ log.POWER + log.CPRATIO + log.AIRFLOW
R2 => 0.921
Step => 4
Selected => EXH.TEMP
Model => HEATRATE ~ log.POWER + log.CPRATIO + log.AIRFLOW + EXH.TEMP
R2 => 0.975
Step => 5
Removed => log.CPRATIO
Model => HEATRATE ~ log.POWER + log.AIRFLOW + EXH.TEMP
=> 0.97519
Step => 6
Selected => RPM
Model => HEATRATE ~ log.POWER + log.AIRFLOW + EXH.TEMP + RPM
R2 => 0.983
Step => 7
Selected => INLET.TEMP
Model => HEATRATE ~ log.POWER + log.AIRFLOW + EXH.TEMP + RPM + INLET.TEMP
R2 => 0.984
Stepwise Summary
--------------------------------------------------------------------------------
Step Variable AIC SBC SBIC R2 Adj. R2
--------------------------------------------------------------------------------
0 Base Model 1181.327 1185.737 987.298 0.00000 0.00000
1 log.POWER (+) 1088.159 1094.773 892.613 0.75839 0.75467
2 log.CPRATIO (+) 1024.868 1033.687 828.649 0.90882 0.90597
3 log.AIRFLOW (+) 1016.961 1027.984 819.495 0.92135 0.91761
4 EXH.TEMP (+) 941.630 954.858 748.697 0.97520 0.97360
5 log.CPRATIO (-) 939.661 950.684 747.255 0.97519 0.97401
6 RPM (+) 916.905 930.133 727.384 0.98285 0.98175
7 INLET.TEMP (+) 916.000 931.433 727.152 0.98358 0.98224
--------------------------------------------------------------------------------
Final Model Output
------------------
Model Summary
-------------------------------------------------------------------
R 0.992 RMSE 202.837
R-Squared 0.984 MSE 41142.862
Adj. R-Squared 0.982 Coef. Var 1.921
Pred R-Squared 0.975 AIC 916.000
MAE 152.893 SBC 931.433
-------------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
AIC: Akaike Information Criteria
SBC: Schwarz Bayesian Criteria
ANOVA
---------------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
---------------------------------------------------------------------------
Regression 165140636.668 5 33028127.334 730.877 0.0000
Residual 2756571.779 61 45189.701
Total 167897208.448 66
---------------------------------------------------------------------------
Parameter Estimates
--------------------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
--------------------------------------------------------------------------------------------------
(Intercept) 43134.882 1973.376 21.858 0.000 39188.873 47080.890
log.POWER -7347.619 503.882 -8.082 -14.582 0.000 -8355.194 -6340.044
log.AIRFLOW 7529.364 539.316 7.262 13.961 0.000 6450.935 8607.794
EXH.TEMP 12.538 1.098 0.347 11.422 0.000 10.343 14.733
RPM 0.066 0.012 0.291 5.568 0.000 0.042 0.090
INLET.TEMP 1.337 0.813 0.115 1.644 0.105 -0.289 2.962
--------------------------------------------------------------------------------------------------
#updated model
gasmod2<-lm(HEATRATE~RPM+INLET.TEMP+EXH.TEMP+log.AIRFLOW+log.POWER,data=gasturbine)
summary(gasmod2)
Call:
lm(formula = HEATRATE ~ RPM + INLET.TEMP + EXH.TEMP + log.AIRFLOW +
log.POWER, data = gasturbine)
Residuals:
Min 1Q Median 3Q Max
-345.13 -130.99 -49.58 68.14 757.16
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.313e+04 1.973e+03 21.858 < 2e-16 ***
RPM 6.613e-02 1.188e-02 5.568 6.13e-07 ***
INLET.TEMP 1.337e+00 8.130e-01 1.644 0.105
EXH.TEMP 1.254e+01 1.098e+00 11.422 < 2e-16 ***
log.AIRFLOW 7.529e+03 5.393e+02 13.961 < 2e-16 ***
log.POWER -7.348e+03 5.039e+02 -14.582 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 212.6 on 61 degrees of freedom
Multiple R-squared: 0.9836, Adjusted R-squared: 0.9822
F-statistic: 730.9 on 5 and 61 DF, p-value: < 2.2e-16
gasmod2vif<-round(vif(gasmod2),3)
gasmod2vif
RPM INLET.TEMP EXH.TEMP log.AIRFLOW log.POWER
10.163 18.231 3.429 1005.242 1141.209
mean(gasmod2vif)
[1] 435.6548
This did not resolve our problem of sever multicollinearity. We should remove log.power, log.airflow (or both).
#updated model: REMOVE LOG.POWER
gasmod3<-lm(HEATRATE~RPM+INLET.TEMP+EXH.TEMP+log.AIRFLOW,data=gasturbine)
gasmod3vif<-round(vif(gasmod3),3)
gasmod3vif
RPM INLET.TEMP EXH.TEMP log.AIRFLOW
10.057 3.575 3.334 10.683
mean(gasmod3vif)
[1] 6.91225
#updated model: REMOVE LOG.AIRFLOW
gasmod4<-lm(HEATRATE~RPM+INLET.TEMP+EXH.TEMP+log.POWER,data=gasturbine)
gasmod4vif<-round(vif(gasmod4),3)
gasmod4vif
RPM INLET.TEMP EXH.TEMP log.POWER
9.777 3.676 3.271 12.128
mean(gasmod4vif)
[1] 7.213
#updated model: REMOVE BOTH
gasmod5<-lm(HEATRATE~RPM+INLET.TEMP+EXH.TEMP,data=gasturbine)
gasmod5vif<-round(vif(gasmod5),3)
gasmod5vif
RPM INLET.TEMP EXH.TEMP
1.727 3.570 2.551
mean(gasmod5vif)
[1] 2.616
It appears the model without power and airflow have resolved the issue of severe multicollinearity. HOWEVER, we may have also decided to remove a combination of RPM and power/airflow, as that also had a strong relationship. As a researcher, or with your client, you may discuss which of those three variables would be most important to analyze moving forward. Here, we will remove the transformations as we want the most least complex model.
Reminder we looked at multicollinearity and variable screening to end up with a starting subset of predictors of: RPM, INLET.TEMP,and EXH.TEMP
#updated model
gasmod5<-lm(HEATRATE~RPM+INLET.TEMP+EXH.TEMP,data=gasturbine)
summary(gasmod5)
Call:
lm(formula = HEATRATE ~ RPM + INLET.TEMP + EXH.TEMP, data = gasturbine)
Residuals:
Min 1Q Median 3Q Max
-1025.8 -297.9 -115.3 225.8 1425.1
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.436e+04 7.333e+02 19.582 < 2e-16 ***
RPM 1.051e-01 1.071e-02 9.818 2.55e-14 ***
INLET.TEMP -9.223e+00 7.869e-01 -11.721 < 2e-16 ***
EXH.TEMP 1.243e+01 2.071e+00 6.000 1.06e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 465 on 63 degrees of freedom
Multiple R-squared: 0.9189, Adjusted R-squared: 0.915
F-statistic: 237.9 on 3 and 63 DF, p-value: < 2.2e-16
Try adding qualitative variable Engine
#updated model
gasmod6<-lm(HEATRATE~RPM+INLET.TEMP+EXH.TEMP+ENGINE,data=gasturbine)
summary(gasmod6)
Call:
lm(formula = HEATRATE ~ RPM + INLET.TEMP + EXH.TEMP + ENGINE,
data = gasturbine)
Residuals:
Min 1Q Median 3Q Max
-1030.37 -247.21 -70.62 258.72 1400.82
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.607e+04 1.262e+03 12.732 < 2e-16 ***
RPM 1.077e-01 1.198e-02 8.991 8.85e-13 ***
INLET.TEMP -9.899e+00 9.374e-01 -10.561 2.12e-15 ***
EXH.TEMP 1.116e+01 2.202e+00 5.068 4.00e-06 ***
ENGINEAeroderiv -4.404e+02 2.870e+02 -1.534 0.130
ENGINETraditional -3.667e+02 2.233e+02 -1.642 0.106
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 461.2 on 61 degrees of freedom
Multiple R-squared: 0.9227, Adjusted R-squared: 0.9164
F-statistic: 145.6 on 5 and 61 DF, p-value: < 2.2e-16
Perform nested F test to determine if Engine type is significant.
#anova(reduced,complete)
anova(gasmod5,gasmod6)
For the sake of this example, we will not consider any further testing.
We have covered this individually and will come back to it in a complete example at the end of the unit.
We have covered this individually and will come back to it in a complete example at the end of the unit.
First we will evaluate model assumptions
#There are a few options to view plots for assumptions
#Residuals Plots of explanatory variables vs residuals
residualPlots(gasmod5,tests=F)
#Residual vs Fitted and QQ plot
plot(gasmod5, which=c(1,2))
#histogram of residuals
hist(residuals(gasmod5))
Then we will identify outliers and influential observations.
#Cooks Distance Thresholds
plot(gasmod6,which=4)
#Leverage vs Studentized Residuals
influencePlot(gasmod6,fill=F)
#Deleted Studentized Residuals vs PRedicted values
ols_plot_resid_stud_fit(gasmod6)
#We can use various functions to store and view these statistics for all observations
rstudent(gasmod6)
1 2 3 4 5 6
1.243137605 0.953567034 -0.559949611 -1.833746826 -0.112418409 -0.033200806
7 8 9 10 11 12
-0.058156152 0.508333893 -0.155456628 -0.459542582 3.392627996 2.117199804
13 14 15 16 17 18
0.125129449 -0.046196504 0.723384265 -0.095140393 -0.547670707 -0.750470873
19 20 21 22 23 24
-0.304450493 0.673205397 1.751907503 0.654343118 -0.350120635 -0.410544775
25 26 27 28 29 30
-0.065042611 -0.337870874 -0.635228203 -0.346550240 -0.678320209 -0.841277386
31 32 33 34 35 36
-0.838605023 -2.442358463 0.238709494 0.005484449 -1.126363967 2.658061417
37 38 39 40 41 42
0.289557055 -0.017061404 -1.707924830 -0.908013420 0.059113080 -0.432805037
43 44 45 46 47 48
-0.458244154 -0.666508619 1.213017501 -0.185981656 2.607630221 0.919011926
49 50 51 52 53 54
-0.444586661 0.696961236 -0.481522312 0.641272157 0.922182297 -0.158439097
55 56 57 58 59 60
0.134434489 -0.874455736 -0.346208078 -1.172232330 0.425819134 -1.319935880
61 62 63 64 65 66
0.312984792 -1.400726730 -0.672890374 1.236692924 -0.388154139 1.314663248
67
-0.259974071
hatvalues(gasmod6)
1 2 3 4 5 6 7
0.26374942 0.13147635 0.08910503 0.05381744 0.06186189 0.03974379 0.04564992
8 9 10 11 12 13 14
0.04072182 0.04546682 0.04670065 0.06051855 0.04056920 0.15105297 0.04330309
15 16 17 18 19 20 21
0.11159742 0.05097723 0.05849761 0.06956934 0.07525361 0.08443891 0.04384658
22 23 24 25 26 27 28
0.03386829 0.04351253 0.06120637 0.04520265 0.04336079 0.05661082 0.14685603
29 30 31 32 33 34 35
0.15177767 0.04110694 0.04868216 0.09528677 0.07081525 0.04347583 0.05496934
36 37 38 39 40 41 42
0.16171973 0.05366053 0.06842493 0.05028882 0.07660326 0.06014061 0.05777986
43 44 45 46 47 48 49
0.08247021 0.05996719 0.09328251 0.07893593 0.06754455 0.06754455 0.06168252
50 51 52 53 54 55 56
0.07005364 0.04880852 0.07153452 0.09716238 0.05178620 0.06510467 0.04837655
57 58 59 60 61 62 63
0.04881945 0.08198603 0.07056967 0.05277623 0.29345156 0.24559458 0.15567950
64 65 66 67
0.38302799 0.25851075 0.19767167 0.17439179
cooks.distance(gasmod6)
1 2 3 4 5 6
9.145076e-02 2.297542e-02 5.170060e-03 3.068826e-02 1.411780e-04 7.730344e-06
7 8 9 10 11 12
2.741110e-05 1.850724e-03 1.949730e-04 1.746814e-03 1.054108e-01 2.988429e-02
13 14 15 16 17 18
4.719345e-04 1.636721e-05 1.104177e-02 8.237422e-05 3.142081e-03 7.069216e-03
19 20 21 22 23 24
1.276130e-03 7.029261e-03 2.268777e-02 2.525267e-03 9.429995e-04 1.856764e-03
25 26 27 28 29 30
3.393481e-05 8.750893e-04 4.075533e-03 3.495914e-03 1.384450e-02 5.081101e-03
31 32 33 34 35 36
6.027341e-03 9.682884e-02 7.351555e-04 2.316568e-07 1.224537e-02 2.066258e-01
37 38 39 40 41 42
8.044465e-04 3.622858e-06 2.495904e-02 1.143256e-02 3.788560e-05 1.940360e-03
43 44 45 46 47 48
3.186991e-03 4.766569e-03 2.503616e-02 5.019982e-04 7.496492e-02 1.022260e-02
49 50 51 52 53 54
2.194442e-03 6.150573e-03 2.008227e-03 5.332067e-03 1.529104e-02 2.322085e-04
55 56 57 58 59 60
2.131901e-04 6.503885e-03 1.040315e-03 2.032877e-02 2.325779e-03 1.598408e-02
61 62 63 64 65 66
6.882715e-03 1.048030e-01 1.404026e-02 1.568862e-01 8.878107e-03 7.013186e-02
67
2.416298e-03
dffits(gasmod6)
1 2 3 4 5 6
0.744050027 0.371008804 -0.175132164 -0.437334197 -0.028867929 -0.006754446
7 8 9 10 11 12
-0.012719259 0.104734650 -0.033928216 -0.101712085 0.861065944 0.435364744
13 14 15 16 17 18
0.052781760 -0.009828368 0.256384190 -0.022050313 -0.136514215 -0.205211062
19 20 21 22 23 24
-0.086849841 0.204444401 0.375158869 0.122513521 -0.074676723 -0.104827152
25 26 27 28 29 30
-0.014152212 -0.071932520 -0.155608758 -0.143780813 -0.286935214 -0.174185378
31 32 33 34 35 36
-0.189705350 -0.792629512 0.065899491 0.001169254 -0.271653969 1.167486584
37 38 39 40 41 42
0.068950616 -0.004623949 -0.393014628 -0.261530121 0.014953266 -0.107177681
43 44 45 46 47 48
-0.137383755 -0.168341462 0.389073077 -0.054445570 0.701822231 0.247344503
49 50 51 52 53 54
-0.113988902 0.191291220 -0.109076264 0.177998911 0.302524828 -0.037026805
55 56 57 58 59 60
0.035476051 -0.197161779 -0.078433600 -0.350315199 0.117334519 -0.311562901
61 62 63 64 65 66
0.201706769 -0.799209067 -0.288939128 0.974415977 -0.229187589 0.652545303
67
-0.119482982
dfbetas(gasmod6)
(Intercept) RPM INLET.TEMP EXH.TEMP ENGINEAeroderiv
1 -0.4616252956 0.4813707982 0.1923546503 0.2474519866 0.0163454215
2 0.2941293684 0.1012291086 -0.0592393708 -0.2353768288 -0.2225153990
3 0.0816216531 -0.1351407226 -0.1021617893 0.0248528426 0.0070221299
4 -0.2854701803 -0.0814920749 0.0566271282 0.2286492592 0.2063272041
5 0.0187689328 -0.0127622742 -0.0098064543 -0.0083815425 -0.0050162064
6 0.0022577393 -0.0003116136 -0.0034381246 0.0011540048 -0.0018923341
7 0.0032957585 -0.0017728807 -0.0076681469 0.0043273436 -0.0027288509
8 0.0018309521 -0.0310832458 0.0226470204 -0.0227564500 0.0225520605
9 0.0067974867 0.0190325418 0.0027008214 -0.0100662840 -0.0134274831
10 0.0330229277 0.0434078651 -0.0147681851 -0.0197235327 -0.0451325514
11 -0.0083433409 0.4922496924 -0.0444452026 0.0326711505 -0.2839038186
12 0.0699847018 -0.0722039788 -0.2404521325 0.1691626314 -0.0592428387
13 -0.0405153289 0.0322127844 0.0336742129 0.0056781247 0.0115444107
14 -0.0048578184 -0.0006887353 -0.0009359947 0.0057223443 0.0026128095
15 0.1709015862 -0.1526854425 -0.2006583232 0.0340405220 -0.0548898822
16 0.0081106284 0.0111624539 0.0049757207 -0.0132798704 -0.0090726320
17 0.0200255447 0.0925475813 0.0561845396 -0.0782467640 -0.0471752875
18 0.0468293295 0.1410500864 0.0804652675 -0.1302495554 -0.0815875121
19 0.0533427648 0.0355056348 -0.0094012951 -0.0446631804 -0.0496219464
20 -0.0810185081 0.1470938758 0.1542798286 -0.0768958275 -0.0001442681
21 -0.0702709373 -0.2143575310 -0.0614273416 0.1375233203 0.1397583971
22 -0.0219619055 -0.0217687026 0.0331990042 -0.0100275661 0.0322626913
23 0.0161048078 0.0408565364 0.0082328696 -0.0254798946 -0.0289461582
24 0.0402195466 0.0603878096 0.0215813744 -0.0630221750 -0.0486405066
25 0.0052468132 0.0046302026 -0.0035025886 -0.0019150894 -0.0062067909
26 -0.0061063251 0.0441060037 0.0158984986 -0.0112493118 -0.0174067326
27 0.0497007703 0.0937833031 0.0279886031 -0.0799102087 -0.0705097337
28 -0.0554134237 -0.0455244938 0.0732219438 -0.0153486341 0.0733156253
29 -0.2372320929 -0.0646037470 0.0793632837 0.1582879533 0.1783248575
30 -0.0116410460 -0.0833978191 0.0056159734 0.0091828640 0.0542423839
31 0.0140299272 -0.1302444267 -0.0709693715 0.0606572757 0.0470373048
32 -0.0233428958 -0.5979011317 -0.5030653876 0.5382255324 0.2133892106
33 0.0378566710 0.0140643935 0.0128106931 -0.0503268081 -0.0237725619
34 0.0003429671 0.0002987724 0.0004064828 -0.0007466865 -0.0002341270
35 -0.0214090302 -0.1501949499 -0.1532886837 0.1768335498 0.0543505928
36 0.4469699197 -0.8303862943 -1.0322844998 0.6021667003 -0.0368952410
37 0.0127699700 -0.0488569028 -0.0363105704 0.0248452796 0.0112355243
38 -0.0025083727 0.0030406311 0.0025964300 -0.0001893908 0.0002638342
39 -0.2084784081 -0.1091030122 -0.0424833246 0.2509178991 0.1561643991
40 -0.1352378569 -0.0765171117 -0.0223324791 0.1416996592 0.1698422490
41 0.0028832039 -0.0030774923 -0.0065254530 0.0047097066 -0.0056065357
42 0.0027910235 0.0278039651 0.0323168203 -0.0430701360 0.0217495695
43 0.0542061596 0.0321491875 0.0236094477 -0.0858681352 -0.0070144470
44 0.0024929420 0.0535841873 0.0560453095 -0.0711223924 0.0300980029
45 -0.0347395629 0.1393177658 0.2582444670 -0.2020336420 -0.0778248391
46 0.0300937938 0.0012068846 -0.0131160735 -0.0201447002 -0.0064018228
47 0.3616857279 0.1554790890 -0.0680823663 -0.2490642116 -0.4724339056
48 0.1274695679 0.0547957820 -0.0239943939 -0.0877781593 -0.1665007523
49 0.0187966599 0.0276748596 0.0273566802 -0.0541860438 0.0132279631
50 -0.0720814425 0.0447400741 0.0998190172 -0.0160818828 -0.0106577582
51 -0.0052138136 0.0156770115 0.0084393195 -0.0121843843 0.0295070130
52 -0.0506803634 0.0434110963 0.1006005165 -0.0388331050 -0.0167395794
53 0.1779008791 0.0789263611 0.0411173707 -0.2017116947 -0.1896618886
54 -0.0042622045 0.0023795010 0.0090025003 -0.0074908463 0.0142436155
55 0.0201113016 -0.0010400489 -0.0047112872 -0.0127596473 -0.0203274523
56 -0.0258847901 -0.0024670234 -0.0125968442 0.0225424737 0.0718591696
57 0.0026308202 0.0062094306 0.0020134861 -0.0108758514 0.0197246031
58 -0.1177794552 0.1560581070 0.2242789267 -0.1304967655 0.1149860506
59 -0.0419952615 0.0291987947 0.0631717980 -0.0140451074 -0.0080251530
60 -0.0574206914 0.0805555716 0.0909389311 -0.0587965778 0.1000371507
61 0.0559781825 0.0614377409 -0.0708084649 0.0118169199 0.0035799256
62 0.1044317936 -0.1983608786 0.1895215915 -0.2805063640 -0.2615428200
63 0.0079579424 0.0227519931 0.0641241378 -0.0714429716 -0.1678295813
64 0.0453956802 0.0150146037 0.5336556913 -0.5673301450 0.4721213557
65 -0.0625650309 0.0713389317 -0.0266745149 0.0849028217 -0.1187416934
66 -0.3376168664 0.0444442130 0.2013063713 0.1335537388 0.5376932607
67 0.0035229428 0.0396234351 0.0400650227 -0.0441935765 -0.0790326742
ENGINETraditional
1 0.3105796747
2 -0.1896131041
3 -0.0821640856
4 0.1316629086
5 -0.0183141372
6 -0.0044088031
7 -0.0075281478
8 0.0432267216
9 -0.0174172815
10 -0.0643784116
11 0.0337035730
12 0.0040235630
13 0.0365427464
14 0.0005652888
15 -0.0990744581
16 -0.0119613136
17 -0.0497914801
18 -0.0841030522
19 -0.0654343393
20 0.1029660979
21 0.1810262863
22 0.0677272565
23 -0.0383938783
24 -0.0586474048
25 -0.0095514545
26 -0.0213187329
27 -0.0844172781
28 0.0556254784
29 0.1639146941
30 -0.0118814304
31 -0.0424705701
32 -0.1268550560
33 -0.0113296357
34 0.0001290727
35 -0.0583542402
36 -0.2674127716
37 0.0085394697
38 0.0007613867
39 0.0608278075
40 0.1852756424
41 -0.0086227268
42 0.0403537021
43 0.0049805211
44 0.0625304421
45 -0.0364465093
46 -0.0079732603
47 -0.5416713727
48 -0.1909022405
49 0.0292293693
50 0.0071515967
51 0.0448182591
52 -0.0013155861
53 -0.2096487411
54 0.0194482863
55 -0.0266978648
56 0.0911943758
57 0.0280412216
58 0.2184286010
59 0.0032502278
60 0.1647087061
61 -0.0795860796
62 0.0552728528
63 0.0109140430
64 0.1546419984
65 0.0112760035
66 0.2808704953
67 0.0004734618
#We can subset by the observations of interest
dffits(gasmod6)[c(11,36,61,64)]
11 36 61 64
0.8610659 1.1674866 0.2017068 0.9744160
dfbetas(gasmod6)[c(11,36,61,64),]
(Intercept) RPM INLET.TEMP EXH.TEMP ENGINEAeroderiv
11 -0.008343341 0.49224969 -0.04444520 0.03267115 -0.283903819
36 0.446969920 -0.83038629 -1.03228450 0.60216670 -0.036895241
61 0.055978183 0.06143774 -0.07080846 0.01181692 0.003579926
64 0.045395680 0.01501460 0.53365569 -0.56733015 0.472121356
ENGINETraditional
11 0.03370357
36 -0.26741277
61 -0.07958608
64 0.15464200
summary(dfbetas(gasmod6))
(Intercept) RPM INLET.TEMP
Min. :-0.461625 Min. :-0.8303863 Min. :-1.032285
1st Qu.:-0.030312 1st Qu.:-0.0264260 1st Qu.:-0.018550
Median : 0.002791 Median : 0.0150146 Median : 0.005616
Mean :-0.000495 Mean :-0.0000617 Mean : 0.001572
3rd Qu.: 0.035440 3rd Qu.: 0.0491621 3rd Qu.: 0.056115
Max. : 0.446970 Max. : 0.4922497 Max. : 0.533656
EXH.TEMP ENGINEAeroderiv ENGINETraditional
Min. :-0.5673301 Min. :-0.472434 Min. :-0.5416714
1st Qu.:-0.0564913 1st Qu.:-0.046154 1st Qu.:-0.0404322
Median :-0.0112493 Median :-0.002729 Median : 0.0004735
Mean :-0.0009895 Mean : 0.001614 Mean : 0.0012985
3rd Qu.: 0.0236939 3rd Qu.: 0.029803 3rd Qu.: 0.0440225
Max. : 0.6021667 Max. : 0.537693 Max. : 0.3105797
Variable transformation: We can directly use a function of the response variable within lm() or add a new variable to the table
# directly transform in the lm() function
gasmod5sqrt<-lm(sqrt(HEATRATE)~RPM+INLET.TEMP+EXH.TEMP,data=gasturbine)
summary(gasmod5sqrt)
Call:
lm(formula = sqrt(HEATRATE) ~ RPM + INLET.TEMP + EXH.TEMP, data = gasturbine)
Residuals:
Min 1Q Median 3Q Max
-4.4872 -1.3479 -0.5044 0.9321 6.2759
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.218e+02 3.346e+00 36.392 < 2e-16 ***
RPM 4.640e-04 4.886e-05 9.496 9.05e-14 ***
INLET.TEMP -4.367e-02 3.590e-03 -12.163 < 2e-16 ***
EXH.TEMP 5.706e-02 9.449e-03 6.039 9.12e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.122 on 63 degrees of freedom
Multiple R-squared: 0.9206, Adjusted R-squared: 0.9168
F-statistic: 243.4 on 3 and 63 DF, p-value: < 2.2e-16
# OR Create new variable
gasturbine$sqrty<-sqrt(gasturbine$HEATRATE)
#remember to remove original response heatrate
gasmodsqrt2<-lm(sqrty~RPM+INLET.TEMP+EXH.TEMP,data=gasturbine)
summary(gasmodsqrt2)
Call:
lm(formula = sqrty ~ RPM + INLET.TEMP + EXH.TEMP, data = gasturbine)
Residuals:
Min 1Q Median 3Q Max
-4.4872 -1.3479 -0.5044 0.9321 6.2759
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.218e+02 3.346e+00 36.392 < 2e-16 ***
RPM 4.640e-04 4.886e-05 9.496 9.05e-14 ***
INLET.TEMP -4.367e-02 3.590e-03 -12.163 < 2e-16 ***
EXH.TEMP 5.706e-02 9.449e-03 6.039 9.12e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.122 on 63 degrees of freedom
Multiple R-squared: 0.9206, Adjusted R-squared: 0.9168
F-statistic: 243.4 on 3 and 63 DF, p-value: < 2.2e-16
Removing observations from analysis by subsetting the data
#We can remove particular observations
subsetgas<-gasturbine[-c(11,36,61,64),]
gasmod5rem<-lm(HEATRATE~RPM+INLET.TEMP+EXH.TEMP,data=subsetgas)
summary(gasmod5rem)
Call:
lm(formula = HEATRATE ~ RPM + INLET.TEMP + EXH.TEMP, data = subsetgas)
Residuals:
Min 1Q Median 3Q Max
-1005.32 -242.35 -89.03 208.32 1273.08
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.404e+04 6.640e+02 21.146 < 2e-16 ***
RPM 1.068e-01 1.058e-02 10.098 1.78e-14 ***
INLET.TEMP -8.334e+00 8.352e-01 -9.979 2.79e-14 ***
EXH.TEMP 1.095e+01 2.190e+00 5.000 5.44e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 406.4 on 59 degrees of freedom
Multiple R-squared: 0.9157, Adjusted R-squared: 0.9114
F-statistic: 213.7 on 3 and 59 DF, p-value: < 2.2e-16