During periods of high electricity demand, especially during the hot summer months, the power output from a gas turbine engine can drop dramatically. One way to counter this drop in power is by cooling the inlet air to the gas turbine. An increasingly popular cooling method uses high pressure inlet fogging. The performance of a sample of 67 gas turbines augmented with high pressure inlet fogging was investigated in the Journal of Engineering for Gas Turbines and Power (January 2005). One measure of performance is heat rate (kilojoules per kilowatt per hour). Heat rates for the 67 gas turbines, saved in the gasturbine file.
Check the appropriateness of response variable for regression: View a histogram of response variable. It should be continuous, and approximately unimodal and symmetric, with few outliers.
gasturbine<-read.delim("https://raw.githubusercontent.com/kvaranyak4/STAT3220/main/GASTURBINE.txt")
head(gasturbine)
names(gasturbine)
[1] "ENGINE" "SHAFTS" "RPM" "CPRATIO" "INLET.TEMP"
[6] "EXH.TEMP" "AIRFLOW" "POWER" "HEATRATE"
hist(gasturbine$HEATRATE, xlab="Heat Rate", main="Histogram of Heat Rate")
The distribution of the response variable, heat rate, is unimodal and skewed right. It is continuous, so it should still be suitable for regression.
We will explore the relationship with quantitative variables with scatter plots and correlations and classify each relationship as linear, curvilinear, or none. We explore the box plots and means for each qualitative variable explanatory variable then classify the relationships as existent or not. We will not explore interactions in this example.
#Scatter plots for quantitative variables
for (i in names(gasturbine)[3:8]) {
plot(gasturbine[,i], gasturbine$HEATRATE,xlab=i,ylab="Heat Rate")
}
#Correlations for quantitative variables
round(cor(gasturbine[3:8],gasturbine$HEATRATE,use="complete.obs"),3)
[,1]
RPM 0.844
CPRATIO -0.735
INLET.TEMP -0.801
EXH.TEMP -0.314
AIRFLOW -0.703
POWER -0.697
#Summary Statistics for response variable grouped by each level of the response
tapply(gasturbine$HEATRATE,gasturbine$ENGINE,summary)
$Advanced
Min. 1st Qu. Median Mean 3rd Qu. Max.
9105 9295 9669 9764 9933 11588
$Aeroderiv
Min. 1st Qu. Median Mean 3rd Qu. Max.
8714 10708 12414 12312 13697 16243
$Traditional
Min. 1st Qu. Median Mean 3rd Qu. Max.
10086 10598 11183 11544 11956 14796
tapply(gasturbine$HEATRATE,gasturbine$SHAFTS,summary)
$`1`
Min. 1st Qu. Median Mean 3rd Qu. Max.
9105 9918 10592 10930 11674 14796
$`2`
Min. 1st Qu. Median Mean 3rd Qu. Max.
10951 11223 11654 12536 13232 16243
$`3`
Min. 1st Qu. Median Mean 3rd Qu. Max.
8714 8903 9092 9092 9280 9469
#Box plots for Qualitative
boxplot(HEATRATE~ENGINE,gasturbine, ylab="Heat Rate")
boxplot(HEATRATE~SHAFTS,gasturbine, ylab="Heat Rate")
# Summary counts for qualitative variables
table(gasturbine$ENGINE,gasturbine$SHAFTS)
1 2 3
Advanced 21 0 0
Aeroderiv 1 4 2
Traditional 35 4 0
Do any of the explanatory variables have relationships with each other? We will look at pairwise correlations and VIF to evaluate multicollinearity in the quantitative explanatory variables.
#Regular correlation
# Use the correlation function with no second argument
# to find the correlations with the explanaotry varibles with eachother
gasturcor<-round(cor(gasturbine[,3:8]),4)
gasturcor
RPM CPRATIO INLET.TEMP EXH.TEMP AIRFLOW POWER
RPM 1.0000 -0.4903 -0.5536 -0.1715 -0.6876 -0.6169
CPRATIO -0.4903 1.0000 0.6851 0.1139 0.3826 0.4473
INLET.TEMP -0.5536 0.6851 1.0000 0.7283 0.6808 0.7503
EXH.TEMP -0.1715 0.1139 0.7283 1.0000 0.5665 0.6309
AIRFLOW -0.6876 0.3826 0.6808 0.5665 1.0000 0.9776
POWER -0.6169 0.4473 0.7503 0.6309 0.9776 1.0000
#Correlation Visualization
# The argument is the stored correlation values from the cor() function
corrplot(gasturcor)
# Scatter plot matrix
plot(gasturbine[3:8])
#A new correlation function
# this will also output the assocaited p-values with Ho: rho=0
rcorr(as.matrix(gasturbine[,3:8]))
RPM CPRATIO INLET.TEMP EXH.TEMP AIRFLOW POWER
RPM 1.00 -0.49 -0.55 -0.17 -0.69 -0.62
CPRATIO -0.49 1.00 0.69 0.11 0.38 0.45
INLET.TEMP -0.55 0.69 1.00 0.73 0.68 0.75
EXH.TEMP -0.17 0.11 0.73 1.00 0.57 0.63
AIRFLOW -0.69 0.38 0.68 0.57 1.00 0.98
POWER -0.62 0.45 0.75 0.63 0.98 1.00
n= 67
P
RPM CPRATIO INLET.TEMP EXH.TEMP AIRFLOW POWER
RPM 0.0000 0.0000 0.1653 0.0000 0.0000
CPRATIO 0.0000 0.0000 0.3585 0.0014 0.0001
INLET.TEMP 0.0000 0.0000 0.0000 0.0000 0.0000
EXH.TEMP 0.1653 0.3585 0.0000 0.0000 0.0000
AIRFLOW 0.0000 0.0014 0.0000 0.0000 0.0000
POWER 0.0000 0.0001 0.0000 0.0000 0.0000
There is concern of strong pairwise relationships.
#Multicollinearity VIF
# First fit the model with all of the quantitative variables
gasmod1<-lm(HEATRATE~.-ENGINE-SHAFTS,data=gasturbine)
# Syntax Note: We can use the . to indicate all the variables in the data frame
# And use the - to exclude a particular variable from the model
summary(gasmod1)
Call:
lm(formula = HEATRATE ~ . - ENGINE - SHAFTS, data = gasturbine)
Residuals:
Min 1Q Median 3Q Max
-1003.32 -307.35 -91.44 271.18 1405.52
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.431e+04 1.112e+03 12.869 < 2e-16 ***
RPM 8.058e-02 1.611e-02 5.002 5.25e-06 ***
CPRATIO -6.775e+00 3.038e+01 -0.223 0.824301
INLET.TEMP -9.507e+00 1.529e+00 -6.217 5.33e-08 ***
EXH.TEMP 1.415e+01 3.469e+00 4.081 0.000135 ***
AIRFLOW -2.553e+00 1.746e+00 -1.462 0.148892
POWER 4.257e-03 4.217e-03 1.009 0.316804
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 458.8 on 60 degrees of freedom
Multiple R-squared: 0.9248, Adjusted R-squared: 0.9173
F-statistic: 123 on 6 and 60 DF, p-value: < 2.2e-16
#store and round the VIF values
gasmod1vif<-round(vif(gasmod1),3)
gasmod1vif
RPM CPRATIO INLET.TEMP EXH.TEMP AIRFLOW POWER
4.015 5.213 13.852 7.351 49.136 49.765
#find the average VIF
mean(gasmod1vif)
[1] 21.55533
Yes, there is evidence of severe multicollinearity because several VIFs are much greater than 10 (inlet temp, air flow, and power) and the average VIF is greater than 3 at 21.55.
Because we have quite a few variables and severe multicollinearity, we need to address that. It is not clear from EDA what variables should remain and which variables should be removed.
We will use variable selection procedures to narrow down our quantitative variables to a best set of predictors. We will use the entry and remain significance levels of 0.15
# Note: Details=T gives all of the steps within each process
# In most cases, you can set Details=F
# backwards elimination
#Default: prem = 0.3
ols_step_backward_p(gasmod1,p_rem=0.15,details=T)
Backward Elimination Method
---------------------------
Candidate Terms:
1. RPM
2. CPRATIO
3. INLET.TEMP
4. EXH.TEMP
5. AIRFLOW
6. POWER
Step => 0
Model => HEATRATE ~ RPM + CPRATIO + INLET.TEMP + EXH.TEMP + AIRFLOW + POWER
R2 => 0.925
Initiating stepwise selection...
Step => 1
Removed => CPRATIO
Model => HEATRATE ~ RPM + INLET.TEMP + EXH.TEMP + AIRFLOW + POWER
R2 => 0.92473
Step => 2
Removed => POWER
Model => HEATRATE ~ RPM + INLET.TEMP + EXH.TEMP + AIRFLOW
R2 => 0.92351
No more variables to be removed.
Variables Removed:
=> CPRATIO
=> POWER
Stepwise Summary
------------------------------------------------------------------------
Step Variable AIC SBC SBIC R2 Adj. R2
------------------------------------------------------------------------
0 Full Model 1019.966 1037.604 NA 0.92479 0.91727
1 CPRATIO 1018.022 1033.455 NA 0.92473 0.91856
2 POWER 1017.095 1030.323 NA 0.92351 0.91858
------------------------------------------------------------------------
Final Model Output
------------------
Model Summary
--------------------------------------------------------------------
R 0.961 RMSE 437.803
R-Squared 0.924 MSE 207128.472
Adj. R-Squared 0.919 Coef. Var 4.113
Pred R-Squared 0.908 AIC 1017.095
MAE 340.078 SBC 1030.323
--------------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
AIC: Akaike Information Criteria
SBC: Schwarz Bayesian Criteria
ANOVA
---------------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
---------------------------------------------------------------------------
Regression 155055243.172 4 38763810.793 187.149 0.0000
Residual 12841965.276 62 207128.472
Total 167897208.448 66
---------------------------------------------------------------------------
Parameter Estimates
--------------------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
--------------------------------------------------------------------------------------------------
(Intercept) 13617.924 813.306 16.744 0.000 11992.148 15243.700
RPM 0.089 0.013 0.391 6.608 0.000 0.062 0.116
INLET.TEMP -9.186 0.770 -0.791 -11.923 0.000 -10.726 -7.646
EXH.TEMP 14.363 2.260 0.397 6.356 0.000 9.846 18.880
AIRFLOW -0.848 0.437 -0.120 -1.939 0.057 -1.721 0.026
--------------------------------------------------------------------------------------------------
# forward selection
#default: penter = 0.3
ols_step_forward_p(gasmod1,p_enter=0.15,details=T)
Forward Selection Method
------------------------
Candidate Terms:
1. RPM
2. CPRATIO
3. INLET.TEMP
4. EXH.TEMP
5. AIRFLOW
6. POWER
Step => 0
Model => HEATRATE ~ 1
R2 => 0
Initiating stepwise selection...
Selection Metrics Table
-----------------------------------------------------------------
Predictor Pr(>|t|) R-Squared Adj. R-Squared AIC
-----------------------------------------------------------------
RPM 0.00000 0.712 0.708 1099.849
INLET.TEMP 0.00000 0.641 0.635 1114.713
CPRATIO 0.00000 0.540 0.533 1131.347
AIRFLOW 0.00000 0.494 0.487 1137.639
POWER 0.00000 0.486 0.478 1138.758
EXH.TEMP 0.00959 0.099 0.085 1176.358
-----------------------------------------------------------------
Step => 1
Selected => RPM
Model => HEATRATE ~ RPM
R2 => 0.712
Selection Metrics Table
-----------------------------------------------------------------
Predictor Pr(>|t|) R-Squared Adj. R-Squared AIC
-----------------------------------------------------------------
INLET.TEMP 0.00000 0.873 0.869 1047.326
CPRATIO 0.00000 0.848 0.843 1059.194
POWER 0.00048 0.763 0.755 1088.995
EXH.TEMP 0.00860 0.742 0.734 1094.565
AIRFLOW 0.00991 0.741 0.733 1094.833
-----------------------------------------------------------------
Step => 2
Selected => INLET.TEMP
Model => HEATRATE ~ RPM + INLET.TEMP
R2 => 0.873
Selection Metrics Table
----------------------------------------------------------------
Predictor Pr(>|t|) R-Squared Adj. R-Squared AIC
----------------------------------------------------------------
EXH.TEMP 0.00000 0.919 0.915 1019.041
CPRATIO 6e-05 0.902 0.897 1032.019
AIRFLOW 0.44970 0.874 0.868 1048.714
POWER 0.46919 0.874 0.868 1048.764
----------------------------------------------------------------
Step => 3
Selected => EXH.TEMP
Model => HEATRATE ~ RPM + INLET.TEMP + EXH.TEMP
R2 => 0.919
Selection Metrics Table
----------------------------------------------------------------
Predictor Pr(>|t|) R-Squared Adj. R-Squared AIC
----------------------------------------------------------------
AIRFLOW 0.05701 0.924 0.919 1017.095
POWER 0.11395 0.922 0.917 1018.319
CPRATIO 0.88511 0.919 0.914 1021.018
----------------------------------------------------------------
Step => 4
Selected => AIRFLOW
Model => HEATRATE ~ RPM + INLET.TEMP + EXH.TEMP + AIRFLOW
R2 => 0.924
Selection Metrics Table
----------------------------------------------------------------
Predictor Pr(>|t|) R-Squared Adj. R-Squared AIC
----------------------------------------------------------------
POWER 0.32494 0.925 0.919 1018.022
CPRATIO 0.99054 0.924 0.917 1019.095
----------------------------------------------------------------
No more variables to be added.
Variables Selected:
=> RPM
=> INLET.TEMP
=> EXH.TEMP
=> AIRFLOW
Stepwise Summary
------------------------------------------------------------------------
Step Variable AIC SBC SBIC R2 Adj. R2
------------------------------------------------------------------------
0 Base Model 1181.327 1185.737 NA 0.00000 0.00000
1 RPM 1099.849 1106.463 NA 0.71233 0.70791
2 INLET.TEMP 1047.326 1056.145 NA 0.87251 0.86853
3 EXH.TEMP 1019.041 1030.064 NA 0.91887 0.91501
4 AIRFLOW 1017.095 1030.323 NA 0.92351 0.91858
------------------------------------------------------------------------
Final Model Output
------------------
Model Summary
--------------------------------------------------------------------
R 0.961 RMSE 437.803
R-Squared 0.924 MSE 207128.472
Adj. R-Squared 0.919 Coef. Var 4.113
Pred R-Squared 0.908 AIC 1017.095
MAE 340.078 SBC 1030.323
--------------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
AIC: Akaike Information Criteria
SBC: Schwarz Bayesian Criteria
ANOVA
---------------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
---------------------------------------------------------------------------
Regression 155055243.172 4 38763810.793 187.149 0.0000
Residual 12841965.276 62 207128.472
Total 167897208.448 66
---------------------------------------------------------------------------
Parameter Estimates
--------------------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
--------------------------------------------------------------------------------------------------
(Intercept) 13617.924 813.306 16.744 0.000 11992.148 15243.700
RPM 0.089 0.013 0.391 6.608 0.000 0.062 0.116
INLET.TEMP -9.186 0.770 -0.791 -11.923 0.000 -10.726 -7.646
EXH.TEMP 14.363 2.260 0.397 6.356 0.000 9.846 18.880
AIRFLOW -0.848 0.437 -0.120 -1.939 0.057 -1.721 0.026
--------------------------------------------------------------------------------------------------
# stepwise regression
#Default: pent = 0.1, prem = 0.3
ols_step_both_p(gasmod1,p_ent=0.15,p_rem=0.15,details=T)
Stepwise Selection Method
-------------------------
Candidate Terms:
1. RPM
2. CPRATIO
3. INLET.TEMP
4. EXH.TEMP
5. AIRFLOW
6. POWER
Step => 0
Model => HEATRATE ~ 1
R2 => 0
Initiating stepwise selection...
Step => 1
Selected => RPM
Model => HEATRATE ~ RPM
R2 => 0.712
Step => 2
Selected => INLET.TEMP
Model => HEATRATE ~ RPM + INLET.TEMP
R2 => 0.873
Step => 3
Selected => EXH.TEMP
Model => HEATRATE ~ RPM + INLET.TEMP + EXH.TEMP
R2 => 0.919
Step => 4
Selected => AIRFLOW
Model => HEATRATE ~ RPM + INLET.TEMP + EXH.TEMP + AIRFLOW
R2 => 0.924
No more variables to be added or removed.
Stepwise Summary
----------------------------------------------------------------------------
Step Variable AIC SBC SBIC R2 Adj. R2
----------------------------------------------------------------------------
0 Base Model 1181.327 1185.737 NA 0.00000 0.00000
1 RPM (+) 1099.849 1106.463 NA 0.71233 0.70791
2 INLET.TEMP (+) 1047.326 1056.145 NA 0.87251 0.86853
3 EXH.TEMP (+) 1019.041 1030.064 NA 0.91887 0.91501
4 AIRFLOW (+) 1017.095 1030.323 NA 0.92351 0.91858
----------------------------------------------------------------------------
Final Model Output
------------------
Model Summary
--------------------------------------------------------------------
R 0.961 RMSE 437.803
R-Squared 0.924 MSE 207128.472
Adj. R-Squared 0.919 Coef. Var 4.113
Pred R-Squared 0.908 AIC 1017.095
MAE 340.078 SBC 1030.323
--------------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
AIC: Akaike Information Criteria
SBC: Schwarz Bayesian Criteria
ANOVA
---------------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
---------------------------------------------------------------------------
Regression 155055243.172 4 38763810.793 187.149 0.0000
Residual 12841965.276 62 207128.472
Total 167897208.448 66
---------------------------------------------------------------------------
Parameter Estimates
--------------------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
--------------------------------------------------------------------------------------------------
(Intercept) 13617.924 813.306 16.744 0.000 11992.148 15243.700
RPM 0.089 0.013 0.391 6.608 0.000 0.062 0.116
INLET.TEMP -9.186 0.770 -0.791 -11.923 0.000 -10.726 -7.646
EXH.TEMP 14.363 2.260 0.397 6.356 0.000 9.846 18.880
AIRFLOW -0.848 0.437 -0.120 -1.939 0.057 -1.721 0.026
--------------------------------------------------------------------------------------------------
#updated model
gasmod2<-lm(HEATRATE~.-ENGINE-SHAFTS-POWER-CPRATIO,data=gasturbine)
# Syntax Note: We can use the . to indicate all the variables in the data frame
# And use the - to exclude a particular variable from the model
summary(gasmod2)
Call:
lm(formula = HEATRATE ~ . - ENGINE - SHAFTS - POWER - CPRATIO,
data = gasturbine)
Residuals:
Min 1Q Median 3Q Max
-1007.7 -290.5 -106.0 240.1 1414.8
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.362e+04 8.133e+02 16.744 < 2e-16 ***
RPM 8.882e-02 1.344e-02 6.608 1.02e-08 ***
INLET.TEMP -9.186e+00 7.704e-01 -11.923 < 2e-16 ***
EXH.TEMP 1.436e+01 2.260e+00 6.356 2.76e-08 ***
AIRFLOW -8.475e-01 4.370e-01 -1.939 0.057 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 455.1 on 62 degrees of freedom
Multiple R-squared: 0.9235, Adjusted R-squared: 0.9186
F-statistic: 187.1 on 4 and 62 DF, p-value: < 2.2e-16
gasmod2vif<-round(vif(gasmod2),3)
gasmod2vif
RPM INLET.TEMP EXH.TEMP AIRFLOW
2.840 3.572 3.170 3.128
mean(gasmod2vif)
[1] 3.1775
The average VIF is slightly greater than 3, and there are not individual VIFs greater than 10. We conclude this is no longer a severe issue and we can continue with our analysis. Moving forward, we might consider adding the remaining qualitative variables, interactions, and higher order (or variable transformations). We then would go forward and assess the model.
Reminder we looked at multicollinearity and variable screening to end up with a starting subset of predictors of: RPM, INLET.TEMP,EXH.TEMP, and AIRFLOW.
#updated model
gasmod2<-lm(HEATRATE~.-ENGINE-SHAFTS-POWER-CPRATIO,data=gasturbine)
# Syntax Note: We can use the . to indicate all the variables in the data frame
# And use the - to exclude a particular variable from the model
summary(gasmod2)
Call:
lm(formula = HEATRATE ~ . - ENGINE - SHAFTS - POWER - CPRATIO,
data = gasturbine)
Residuals:
Min 1Q Median 3Q Max
-1007.7 -290.5 -106.0 240.1 1414.8
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.362e+04 8.133e+02 16.744 < 2e-16 ***
RPM 8.882e-02 1.344e-02 6.608 1.02e-08 ***
INLET.TEMP -9.186e+00 7.704e-01 -11.923 < 2e-16 ***
EXH.TEMP 1.436e+01 2.260e+00 6.356 2.76e-08 ***
AIRFLOW -8.475e-01 4.370e-01 -1.939 0.057 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 455.1 on 62 degrees of freedom
Multiple R-squared: 0.9235, Adjusted R-squared: 0.9186
F-statistic: 187.1 on 4 and 62 DF, p-value: < 2.2e-16
Try adding qualitative variable Engine
#updated model
gasmod3<-lm(HEATRATE~.-SHAFTS-POWER-CPRATIO,data=gasturbine)
# Syntax Note: We can use the . to indicate all the variables in the data frame
# And use the - to exclude a particular variable from the model
summary(gasmod3)
Call:
lm(formula = HEATRATE ~ . - SHAFTS - POWER - CPRATIO, data = gasturbine)
Residuals:
Min 1Q Median 3Q Max
-1002.38 -257.50 -60.08 252.72 1397.12
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.537e+04 1.284e+03 11.968 < 2e-16 ***
ENGINEAeroderiv -4.245e+02 2.806e+02 -1.513 0.1356
ENGINETraditional -3.772e+02 2.183e+02 -1.728 0.0891 .
RPM 9.065e-02 1.459e-02 6.214 5.38e-08 ***
INLET.TEMP -9.908e+00 9.161e-01 -10.816 1.01e-15 ***
EXH.TEMP 1.314e+01 2.377e+00 5.530 7.36e-07 ***
AIRFLOW -8.534e-01 4.338e-01 -1.967 0.0538 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 450.8 on 60 degrees of freedom
Multiple R-squared: 0.9274, Adjusted R-squared: 0.9201
F-statistic: 127.7 on 6 and 60 DF, p-value: < 2.2e-16
Perform nested F test to determine if Engine type is significant.
#reduced model
redgasmod3<-lm(HEATRATE~.-ENGINE-SHAFTS-POWER-CPRATIO,data=gasturbine)
#anova(reduced,complete)
anova(redgasmod3,gasmod3)
For the sake of this example, we will not consider any further testing.
We have covered this individually and will come back to it in a complete example at the end of the unit.
We have covered this individually and will come back to it in a complete example at the end of the unit.
First we will evaluate model assumptions
#There are a few options to view plots for assumptions
#Residuals Plots of explanatory variables vs residuals
residualPlots(redgasmod3,tests=F)
#Residual vs Fitted and QQ plot
plot(redgasmod3, which=c(1,2))
#histogram of residuals
hist(residuals(redgasmod3))
Then we will identify outliers and influential observations.
#Cooks Distance Thresholds
plot(redgasmod3,which=4)
#Leverage vs Studentized Residuals
influencePlot(redgasmod3,fill=F)
#Deleted Studentized Residuals vs PRedicted values
ols_plot_resid_stud_fit(redgasmod3)
#We can use various functions to store and view these statistics for all observations
rstudent(redgasmod3)
1 2 3 4 5
0.8747974427 1.5560964778 -0.7996856609 -1.7098460010 -0.5940324181
6 7 8 9 10
-0.3412340709 -0.2931152339 0.4972900835 -0.2354706794 -0.3311493834
11 12 13 14 15
3.4522455825 1.9210862780 -0.5410136060 -0.1288260698 0.8352669389
16 17 18 19 20
-0.6395357114 -0.7796092939 -0.8579938744 -0.4185245333 0.3666865968
21 22 23 24 25
1.1206603654 0.3672529549 -0.4375840299 -0.6539765902 -0.1689423085
26 27 28 29 30
-0.1441743577 -0.5847991132 0.2047194030 0.0464239138 -0.9006937286
31 32 33 34 35
-0.9227536301 -2.3783545874 0.3188332240 -0.1362454337 -1.2230894198
36 37 38 39 40
2.5890770233 0.2898046600 0.3710733462 -1.6844836671 -0.8878650471
41 42 43 44 45
-0.1370180599 -0.3583036352 -0.6634539981 -0.2406050715 1.9256194391
46 47 48 49 50
-0.0006913961 2.5280837055 0.8806886673 -0.4450685015 0.8795627566
51 52 53 54 55
0.0180834913 1.2394007710 1.1312075637 -0.4108297552 0.5718074367
56 57 58 59 60
-0.6844366633 -0.2384435371 -0.7815297552 0.5834226891 -0.7829079484
61 62 63 64 65
1.2699005058 -1.1968672345 -0.8411249929 0.7749438168 -0.6789274575
66 67
0.4408135249 -0.6462413281
hatvalues(redgasmod3)
1 2 3 4 5 6 7
0.18516671 0.09055176 0.05058714 0.04222701 0.04740169 0.02442905 0.02753493
8 9 10 11 12 13 14
0.03680293 0.03666050 0.06343149 0.04641455 0.05362948 0.08048917 0.04411916
15 16 17 18 19 20 21
0.09715741 0.06230112 0.05045614 0.05908423 0.04799477 0.04943415 0.09104218
22 23 24 25 26 27 28
0.02774620 0.03464087 0.04124602 0.02928731 0.06436712 0.06151213 0.11866326
29 30 31 32 33 34 35
0.10176951 0.03312873 0.03305905 0.06825286 0.06162853 0.04263042 0.04149892
36 37 38 39 40 41 42
0.15868147 0.05313439 0.09215184 0.04256770 0.06449028 0.08767073 0.05038647
43 44 45 46 47 48 49
0.09087757 0.07303413 0.15265733 0.09438350 0.08759824 0.07018133 0.05939823
50 51 52 53 54 55 56
0.07612455 0.07252967 0.13584379 0.05570074 0.08920490 0.03044303 0.03783830
57 58 59 60 61 62 63
0.04271859 0.05120118 0.07468712 0.06875776 0.28843679 0.19696150 0.05017137
64 65 66 67
0.27321053 0.11685791 0.07469814 0.04105443
cooks.distance(redgasmod3)
1 2 3 4 5 6
3.491295e-02 4.713870e-02 6.854665e-03 2.500359e-02 3.548878e-03 5.915837e-04
7 8 9 10 11 12
4.938184e-04 1.913029e-03 4.285386e-04 1.507041e-03 9.864721e-02 4.008823e-02
13 14 15 16 17 18
5.183343e-03 1.556699e-04 1.508924e-02 5.487211e-03 6.500385e-03 9.284774e-03
19 20 21 22 23 24
1.789958e-03 1.418308e-03 2.505464e-02 7.807054e-04 1.392366e-03 3.714128e-03
25 26 27 28 29 30
1.749663e-04 2.905886e-04 4.531159e-03 1.146267e-03 4.963521e-05 5.576280e-03
31 32 33 34 35 36
5.836246e-03 7.708231e-02 1.354886e-03 1.679746e-04 1.285081e-02 2.315621e-01
37 38 39 40 41 42
9.567366e-04 2.834803e-03 2.450483e-02 1.090574e-02 3.666206e-04 1.381810e-03
43 44 45 46 47 48
8.880251e-03 9.262997e-04 1.280157e-01 1.012739e-08 1.129043e-01 1.175094e-02
49 50 51 52 53 54
2.534576e-03 1.279568e-02 5.198401e-06 4.788085e-02 1.502836e-02 3.351071e-03
55 56 57 58 59 60
2.075794e-03 3.716375e-03 5.152710e-04 6.633792e-03 5.553912e-03 9.108151e-03
61 62 63 64 65 66
1.294602e-01 6.978273e-02 7.509579e-03 4.544294e-02 1.230543e-02 3.178680e-03
67
3.609809e-03
dffits(redgasmod3)
1 2 3 4 5
0.4170177822 0.4910163704 -0.1845915466 -0.3590218226 -0.1325110636
6 7 8 9 10
-0.0539978261 -0.0493223042 0.0972060771 -0.0459352749 -0.0861800315
11 12 13 14 15
0.7616385631 0.4573177718 -0.1600657725 -0.0276767751 0.2740042045
16 17 18 19 20
-0.1648469880 -0.1797117702 -0.2150030447 -0.0939719424 0.0836214074
21 22 23 24 25
0.3546691948 0.0620407828 -0.0828917707 -0.1356437040 -0.0293449126
26 27 28 29 30
-0.0378152804 -0.1497175656 0.0751183771 0.0156263192 -0.1667228743
31 32 33 34 35
-0.1706204055 -0.6437066706 0.0817084707 -0.0287502728 -0.2544956112
36 37 38 39 40
1.1244181724 0.0686513425 0.1182238653 -0.3551835576 -0.2331148430
41 42 43 44 45
-0.0424746021 -0.0825343464 -0.2097625635 -0.0675360767 0.8173343575
46 47 48 49 50
-0.0002232044 0.7833323891 0.2419547113 -0.1118434960 0.2524773824
51 52 53 54 55
0.0050569669 0.4914004305 0.2747374976 -0.1285718521 0.1013227680
56 57 58 59 60
-0.1357297558 -0.0503702616 -0.1815509201 0.1657531823 -0.2127356355
61 62 63 64 65
0.8085153666 -0.5927457707 -0.1933151549 0.4751323690 -0.2469659817
66 67
0.1252472215 -0.1337142764
dfbetas(redgasmod3)
(Intercept) RPM INLET.TEMP EXH.TEMP AIRFLOW
1 -2.279223e-01 1.957424e-01 -1.714323e-02 1.546567e-01 -1.703446e-02
2 4.051086e-01 1.641254e-01 1.763526e-02 -3.046695e-01 1.553495e-01
3 5.328342e-02 -9.167441e-02 -7.190908e-02 1.855818e-02 3.730057e-02
4 -2.298860e-01 1.281016e-02 2.404519e-02 1.222945e-01 3.582115e-02
5 7.090661e-02 1.586245e-02 8.533875e-03 -6.592543e-02 8.135270e-02
6 -1.074914e-02 1.733130e-02 -1.113851e-02 1.021559e-02 2.050504e-02
7 -1.729453e-02 4.882113e-03 -1.901343e-02 2.297319e-02 8.291715e-03
8 6.892439e-02 -1.309476e-02 -2.380980e-03 -4.298349e-02 2.920960e-02
9 -2.384423e-02 1.582593e-02 2.055320e-02 -9.380213e-04 -1.393464e-02
10 -5.215280e-02 -1.201412e-02 1.712361e-02 2.701556e-02 -6.443648e-02
11 -6.310142e-03 2.396598e-01 -1.793847e-01 1.394825e-01 -4.139894e-02
12 -1.263191e-02 -2.386408e-01 -2.793844e-01 2.709710e-01 -2.384336e-01
13 1.075038e-01 -3.471785e-02 -5.779328e-02 -3.575671e-02 7.810913e-02
14 -1.403894e-02 7.576035e-03 -2.217761e-03 8.828123e-03 9.381602e-03
15 1.371472e-01 -1.717714e-01 -2.002165e-01 8.220513e-02 -4.553760e-02
16 3.302202e-02 1.321499e-01 8.525896e-02 -1.111006e-01 1.075346e-01
17 -3.523563e-02 1.207868e-01 1.319771e-01 -9.251472e-02 2.256074e-02
18 -6.259175e-02 9.740242e-02 1.603259e-01 -8.742298e-02 -4.892268e-02
19 -2.612367e-02 6.633212e-03 3.814709e-02 -9.256823e-03 -5.423808e-02
20 -1.137406e-02 3.130572e-02 5.304719e-02 -3.056345e-02 -2.533875e-02
21 -5.406246e-02 -2.970979e-01 -1.106295e-01 1.774639e-01 -2.820510e-01
22 1.580971e-02 -3.234500e-02 -2.955686e-03 -1.044288e-03 -2.631145e-02
23 -4.026990e-02 2.925503e-02 4.101491e-02 -6.763031e-03 -2.338916e-02
24 -2.088700e-02 7.622221e-02 9.300848e-02 -6.593646e-02 -3.339925e-03
25 -1.557716e-02 4.125643e-03 4.827110e-03 6.531960e-03 -1.175689e-02
26 -2.668523e-02 1.525396e-03 1.404215e-02 8.497781e-03 -2.354488e-02
27 -7.031641e-02 1.483705e-02 7.699529e-02 -7.210947e-03 -9.022251e-02
28 1.826501e-02 2.663362e-02 -3.881407e-02 1.354813e-02 2.231275e-02
29 1.310619e-02 5.340019e-03 -1.303871e-03 -8.565375e-03 5.997017e-03
30 -1.255556e-02 -1.628429e-02 3.662579e-02 -2.602052e-02 3.988996e-02
31 -1.533375e-02 -5.853538e-02 -3.706078e-02 3.538696e-02 3.146085e-02
32 -1.951174e-01 -3.749007e-01 -4.055211e-01 4.590853e-01 -5.068469e-02
33 5.296540e-02 -4.558618e-04 2.263107e-02 -4.981505e-02 -8.744398e-03
34 -1.166616e-02 4.563944e-03 -7.686496e-03 1.145432e-02 1.054809e-02
35 -8.877626e-02 -5.853790e-02 -1.262739e-01 1.486379e-01 4.230838e-02
36 2.151594e-01 -8.028265e-01 -9.902407e-01 7.095892e-01 -2.819409e-01
37 3.479171e-02 -3.456257e-02 -4.723973e-02 1.600133e-02 9.019337e-03
38 9.345339e-02 -1.616277e-02 -5.802030e-02 -2.118574e-02 6.274143e-02
39 -2.023247e-01 7.164835e-03 -4.998667e-02 1.572170e-01 6.491780e-02
40 1.015845e-01 5.409207e-02 -1.260661e-01 1.891516e-03 1.601236e-01
41 2.967446e-02 2.513911e-02 4.987646e-03 -2.949294e-02 3.128177e-02
42 5.060180e-02 2.436244e-02 9.652842e-03 -4.654654e-02 1.209514e-02
43 1.607071e-01 7.481794e-02 2.995105e-02 -1.469370e-01 6.740431e-02
44 1.218980e-02 -9.061398e-03 9.873985e-03 -1.290838e-02 -3.699955e-02
45 8.082375e-02 4.844164e-01 5.031571e-01 -5.088269e-01 5.267458e-01
46 9.039503e-05 -5.697722e-05 -3.956386e-05 -2.167776e-05 -9.525738e-05
47 -4.395842e-01 -3.259929e-01 2.277366e-01 2.277862e-01 -6.650165e-01
48 -1.356079e-01 -9.108045e-02 7.774062e-02 6.348874e-02 -1.952836e-01
49 7.499536e-02 3.328899e-02 1.209338e-02 -6.687915e-02 2.125665e-02
50 -9.344513e-02 8.676330e-02 1.375736e-01 -4.954967e-02 7.664740e-02
51 -2.482939e-04 1.638301e-03 5.911439e-04 -6.125042e-04 3.357212e-03
52 -1.845437e-04 2.724614e-01 2.243131e-01 -2.147092e-01 3.408524e-01
53 -4.474193e-02 -1.364584e-02 1.972718e-01 -9.954898e-02 -1.145640e-01
54 9.333002e-02 6.713827e-02 -4.523703e-03 -7.641058e-02 9.796458e-02
55 2.831006e-03 1.272207e-02 4.103671e-02 -3.308989e-02 3.018857e-02
56 5.606741e-02 -2.217269e-03 -5.264625e-02 -1.607383e-03 -4.903711e-03
57 2.827113e-02 5.395006e-03 -1.008651e-02 -1.357048e-02 2.960642e-03
58 4.125379e-02 4.983561e-02 8.206200e-02 -9.466783e-02 -3.937500e-02
59 -6.299498e-02 5.468502e-02 9.553217e-02 -3.448811e-02 4.356264e-02
60 3.272354e-03 -5.313130e-02 5.875525e-03 6.155496e-03 -1.425210e-01
61 1.622022e-01 6.287909e-01 -7.737918e-02 -1.387786e-01 4.375216e-01
62 9.743448e-02 -3.875700e-01 8.006443e-02 -8.228089e-02 -1.681696e-01
63 2.524545e-02 -8.294340e-02 1.213881e-02 -2.426282e-02 1.233960e-02
64 1.699380e-01 1.314209e-01 3.868923e-01 -4.113384e-01 1.441854e-02
65 -1.208286e-01 1.624592e-02 -1.325716e-01 1.707134e-01 4.451658e-02
66 -7.411863e-02 3.233025e-02 5.996087e-02 9.769816e-03 -5.992088e-02
67 3.251465e-02 1.728521e-02 4.640075e-02 -6.703951e-02 6.052772e-02
#We can subset by the observations of interest
dffits(redgasmod3)[c(11,36,61,64)]
11 36 61 64
0.7616386 1.1244182 0.8085154 0.4751324
dfbetas(redgasmod3)[c(11,36,61,64),]
(Intercept) RPM INLET.TEMP EXH.TEMP AIRFLOW
11 -0.006310142 0.2396598 -0.17938469 0.1394825 -0.04139894
36 0.215159421 -0.8028265 -0.99024074 0.7095892 -0.28194088
61 0.162202163 0.6287909 -0.07737918 -0.1387786 0.43752165
64 0.169938038 0.1314209 0.38689228 -0.4113384 0.01441854
summary(dfbetas(redgasmod3))
(Intercept) RPM INLET.TEMP
Min. :-0.4395842 Min. :-0.802826 Min. :-0.9902407
1st Qu.:-0.0377528 1st Qu.:-0.016223 1st Qu.:-0.0379374
Median :-0.0001845 Median : 0.007165 Median : 0.0085339
Mean : 0.0012984 Mean : 0.001273 Mean :-0.0004318
3rd Qu.: 0.0531244 3rd Qu.: 0.041562 3rd Qu.: 0.0497240
Max. : 0.4051086 Max. : 0.628791 Max. : 0.5031571
EXH.TEMP AIRFLOW
Min. :-0.5088269 Min. :-0.665017
1st Qu.:-0.0578702 1st Qu.:-0.038187
Median :-0.0072109 Median : 0.009019
Mean :-0.0006093 Mean : 0.001198
3rd Qu.: 0.0172798 3rd Qu.: 0.041099
Max. : 0.7095892 Max. : 0.526746
Variable transformation: We can directly use a function of the response variable within lm() or add a new variable to the table
# directly transform in the lm() function
gasmodsqrt<-lm(sqrt(HEATRATE)~.-ENGINE-SHAFTS-POWER-CPRATIO,data=gasturbine)
summary(gasmodsqrt)
Call:
lm(formula = sqrt(HEATRATE) ~ . - ENGINE - SHAFTS - POWER - CPRATIO,
data = gasturbine)
Residuals:
Min 1Q Median 3Q Max
-4.3749 -1.3484 -0.3872 1.1073 6.1302
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.172e+02 3.616e+00 32.409 < 2e-16 ***
RPM 3.631e-04 5.976e-05 6.076 8.28e-08 ***
INLET.TEMP -4.344e-02 3.425e-03 -12.683 < 2e-16 ***
EXH.TEMP 6.904e-02 1.005e-02 6.872 3.58e-09 ***
AIRFLOW -5.240e-03 1.943e-03 -2.697 0.009 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.023 on 62 degrees of freedom
Multiple R-squared: 0.9289, Adjusted R-squared: 0.9243
F-statistic: 202.6 on 4 and 62 DF, p-value: < 2.2e-16
# OR Create new variable
gasturbine$sqrty<-sqrt(gasturbine$HEATRATE)
#remember to remove original response heatrate
gasmodsqrt2<-lm(sqrty~.-ENGINE-SHAFTS-POWER-CPRATIO-HEATRATE,data=gasturbine)
summary(gasmodsqrt2)
Call:
lm(formula = sqrty ~ . - ENGINE - SHAFTS - POWER - CPRATIO -
HEATRATE, data = gasturbine)
Residuals:
Min 1Q Median 3Q Max
-4.3749 -1.3484 -0.3872 1.1073 6.1302
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.172e+02 3.616e+00 32.409 < 2e-16 ***
RPM 3.631e-04 5.976e-05 6.076 8.28e-08 ***
INLET.TEMP -4.344e-02 3.425e-03 -12.683 < 2e-16 ***
EXH.TEMP 6.904e-02 1.005e-02 6.872 3.58e-09 ***
AIRFLOW -5.240e-03 1.943e-03 -2.697 0.009 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.023 on 62 degrees of freedom
Multiple R-squared: 0.9289, Adjusted R-squared: 0.9243
F-statistic: 202.6 on 4 and 62 DF, p-value: < 2.2e-16
Removing observations from analysis by subsetting the data
#We can remove particular observations
subsetgas<-gasturbine[-c(11,36,61,64),]
gasmod2obs<-lm(HEATRATE~.-ENGINE-SHAFTS-POWER-CPRATIO-sqrty,data=subsetgas)
# Syntax Note: We can use the . to indicate all the variables in the data frame
# And use the - to exclude a particular variable from the model
summary(gasmod2obs)
Call:
lm(formula = HEATRATE ~ . - ENGINE - SHAFTS - POWER - CPRATIO -
sqrty, data = subsetgas)
Residuals:
Min 1Q Median 3Q Max
-957.18 -214.19 -93.73 233.18 1030.40
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.325e+04 7.218e+02 18.361 < 2e-16 ***
RPM 8.526e-02 1.370e-02 6.226 5.77e-08 ***
INLET.TEMP -8.347e+00 8.048e-01 -10.372 7.88e-15 ***
EXH.TEMP 1.320e+01 2.316e+00 5.698 4.26e-07 ***
AIRFLOW -9.429e-01 4.004e-01 -2.355 0.0219 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 391.6 on 58 degrees of freedom
Multiple R-squared: 0.9231, Adjusted R-squared: 0.9178
F-statistic: 174 on 4 and 58 DF, p-value: < 2.2e-16