During periods of high electricity demand, especially during the hot summer months, the power output from a gas turbine engine can drop dramatically. One way to counter this drop in power is by cooling the inlet air to the gas turbine. An increasingly popular cooling method uses high pressure inlet fogging. The performance of a sample of 67 gas turbines augmented with high pressure inlet fogging was investigated in the Journal of Engineering for Gas Turbines and Power (January 2005). One measure of performance is heat rate (kilojoules per kilowatt per hour). Heat rates for the 67 gas turbines, saved in the gasturbine file.

Unit 3.1: Multicollinearity and Variable Screening

Step 1: Collect the Data

Check the appropriateness of response variable for regression: View a histogram of response variable. It should be continuous, and approximately unimodal and symmetric, with few outliers.

gasturbine<-read.delim("https://raw.githubusercontent.com/kvaranyak4/STAT3220/main/GASTURBINE.txt")
head(gasturbine)
names(gasturbine)
[1] "ENGINE"     "SHAFTS"     "RPM"        "CPRATIO"    "INLET.TEMP"
[6] "EXH.TEMP"   "AIRFLOW"    "POWER"      "HEATRATE"  
hist(gasturbine$HEATRATE, xlab="Heat Rate", main="Histogram of Heat Rate") 

The distribution of the response variable, heat rate, is unimodal and skewed right. It is continuous, so it should still be suitable for regression.

Step 2: Hypothesize Relationship (Exploratory Data Analysis)

We will explore the relationship with quantitative variables with scatter plots and correlations and classify each relationship as linear, curvilinear, or none. We explore the box plots and means for each qualitative variable explanatory variable then classify the relationships as existent or not. We will not explore interactions in this example.

#Scatter plots for quantitative variables
for (i in names(gasturbine)[3:8]) {
  plot((gasturbine[,i]), gasturbine$HEATRATE,xlab=i,ylab="Heat Rate")
}

#Attempt Transformations
plot(log(gasturbine$RPM), gasturbine$HEATRATE,ylab="Heat Rate")

plot(log(gasturbine$CPRATIO), gasturbine$HEATRATE,ylab="Heat Rate")

plot(log(gasturbine$POWER), gasturbine$HEATRATE,ylab="Heat Rate")

plot(log(gasturbine$AIRFLOW), gasturbine$HEATRATE,ylab="Heat Rate")

#Correlations for quantitative variables
round(cor(gasturbine[3:8],gasturbine$HEATRATE,use="complete.obs"),3)
             [,1]
RPM         0.844
CPRATIO    -0.735
INLET.TEMP -0.801
EXH.TEMP   -0.314
AIRFLOW    -0.703
POWER      -0.697
round(cor(log(gasturbine[3:8]),gasturbine$HEATRATE,use="complete.obs"),3)
             [,1]
RPM         0.810
CPRATIO    -0.818
INLET.TEMP -0.813
EXH.TEMP   -0.303
AIRFLOW    -0.840
POWER      -0.871
#Summary Statistics for response variable grouped by each level of the response
tapply(gasturbine$HEATRATE,gasturbine$ENGINE,summary)
$Advanced
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   9105    9295    9669    9764    9933   11588 

$Aeroderiv
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   8714   10708   12414   12312   13697   16243 

$Traditional
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  10086   10598   11183   11544   11956   14796 
tapply(gasturbine$HEATRATE,gasturbine$SHAFTS,summary)
$`1`
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   9105    9918   10592   10930   11674   14796 

$`2`
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  10951   11223   11654   12536   13232   16243 

$`3`
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   8714    8903    9092    9092    9280    9469 
#Box plots for Qualitative
boxplot(HEATRATE~ENGINE,gasturbine, ylab="Heat Rate")

boxplot(HEATRATE~SHAFTS,gasturbine, ylab="Heat Rate")

# Summary counts for qualitative variables
table(gasturbine$ENGINE,gasturbine$SHAFTS)
             
               1  2  3
  Advanced    21  0  0
  Aeroderiv    1  4  2
  Traditional 35  4  0
  • Most of the variables have high correlations. Exhaust temperature has the weakest correlation.
  • CPRATIO, POWER and AIRFLOW should have a log transformation because their correlations with HEATRATE were much higher, indicating a stronger linear form with that transformation.
  • Engine type appears to have a different average heat rate for the advanced group.
  • Note: Shafts=3 only has two observations. Further, we would not be able to fit an interaction between Engine and Shaft because some level combinations do not have any observations.

Step 2 (continued): Multicollinearity

Do any of the explanatory variables have relationships with each other? We will look at pairwise correlations and VIF to evaluate multicollinearity in the quantitative explanatory variables.

#Store the transformations in the data table
gasturbine$log.CPRATIO<-log(gasturbine$CPRATIO)
gasturbine$log.AIRFLOW<-log(gasturbine$AIRFLOW)
gasturbine$log.POWER<-log(gasturbine$POWER)

#Regular correlation
# Use the correlation function with no second argument
# to find the correlations with the explanatory variables with each other

gasturcor<-round(cor(gasturbine[,c(3,5,6,10:12)]),4)
gasturcor
                RPM INLET.TEMP EXH.TEMP log.CPRATIO log.AIRFLOW log.POWER
RPM          1.0000    -0.5536  -0.1715     -0.5819     -0.9177   -0.9083
INLET.TEMP  -0.5536     1.0000   0.7283      0.7440      0.6649    0.7221
EXH.TEMP    -0.1715     0.7283   1.0000      0.1712      0.4068    0.4515
log.CPRATIO -0.5819     0.7440   0.1712      1.0000      0.5308    0.5757
log.AIRFLOW -0.9177     0.6649   0.4068      0.5308      1.0000    0.9964
log.POWER   -0.9083     0.7221   0.4515      0.5757      0.9964    1.0000
#Correlation Visualization
# The argument is the stored correlation values from the cor() function
corrplot(gasturcor)

# Scatter plot matrix
plot(gasturbine[c(3,5,6,10:12)])

#A new correlation function
# this will also output the associated p-values with Ho: rho=0
rcorr(as.matrix(gasturbine[,c(3,5,6,10:12)]))
              RPM INLET.TEMP EXH.TEMP log.CPRATIO log.AIRFLOW log.POWER
RPM          1.00      -0.55    -0.17       -0.58       -0.92     -0.91
INLET.TEMP  -0.55       1.00     0.73        0.74        0.66      0.72
EXH.TEMP    -0.17       0.73     1.00        0.17        0.41      0.45
log.CPRATIO -0.58       0.74     0.17        1.00        0.53      0.58
log.AIRFLOW -0.92       0.66     0.41        0.53        1.00      1.00
log.POWER   -0.91       0.72     0.45        0.58        1.00      1.00

n= 67 


P
            RPM    INLET.TEMP EXH.TEMP log.CPRATIO log.AIRFLOW log.POWER
RPM                0.0000     0.1653   0.0000      0.0000      0.0000   
INLET.TEMP  0.0000            0.0000   0.0000      0.0000      0.0000   
EXH.TEMP    0.1653 0.0000              0.1660      0.0006      0.0001   
log.CPRATIO 0.0000 0.0000     0.1660               0.0000      0.0000   
log.AIRFLOW 0.0000 0.0000     0.0006   0.0000                  0.0000   
log.POWER   0.0000 0.0000     0.0001   0.0000      0.0000               

There is concern of strong pairwise relationships with several pairs of variables.

  • log.Power & log.airflow (r=0.92)
  • log.Power & inlet_temp (r=0.72)
  • log.airflow & RPM (r=0.92)
#Multicollinearity VIF
# First fit the model with all of the quantitative variables
gasmod1<-lm(HEATRATE~RPM+INLET.TEMP+EXH.TEMP+log.CPRATIO+log.AIRFLOW+log.POWER,data=gasturbine)
summary(gasmod1)

Call:
lm(formula = HEATRATE ~ RPM + INLET.TEMP + EXH.TEMP + log.CPRATIO + 
    log.AIRFLOW + log.POWER, data = gasturbine)

Residuals:
    Min      1Q  Median      3Q     Max 
-337.73 -139.16  -60.72   75.76  735.19 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  4.312e+04  1.984e+03  21.737  < 2e-16 ***
RPM          6.700e-02  1.202e-02   5.575 6.22e-07 ***
INLET.TEMP   1.036e+00  9.487e-01   1.092    0.279    
EXH.TEMP     1.326e+01  1.599e+00   8.291 1.56e-11 ***
log.CPRATIO  1.578e+02  2.534e+02   0.623    0.536    
log.AIRFLOW  7.625e+03  5.632e+02  13.539  < 2e-16 ***
log.POWER   -7.433e+03  5.247e+02 -14.166  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 213.7 on 60 degrees of freedom
Multiple R-squared:  0.9837,    Adjusted R-squared:  0.9821 
F-statistic:   603 on 6 and 60 DF,  p-value: < 2.2e-16
#store and round the VIF values
gasmod1vif<-round(vif(gasmod1),3)
gasmod1vif
        RPM  INLET.TEMP    EXH.TEMP log.CPRATIO log.AIRFLOW   log.POWER 
     10.300      24.579       7.203       7.086    1085.142    1225.107 
#find the average VIF
mean(gasmod1vif)
[1] 393.2362

Yes, there is evidence of severe multicollinearity because all VIFs are MUCH greater than 10 and the average VIF is MUCH greater than 3 at 393.2362.

Step 2 (continued): Variable Screening

Because we have quite a few variables and severe multicollinearity, we need to address that. It is not clear from EDA what variables should remain and which variables should be removed.

We will use variable selection procedures to narrow down our quantitative variables to a best set of predictors. We will use the entry and remain significance levels of 0.15

# Note: Details=T gives all of the steps within each process
# In most cases, you can set Details=F

# backwards elimination
#Default: prem = 0.3
ols_step_backward_p(gasmod1,p_rem=0.15,details=T)
Backward Elimination Method 
---------------------------

Candidate Terms: 

1. RPM 
2. INLET.TEMP 
3. EXH.TEMP 
4. log.CPRATIO 
5. log.AIRFLOW 
6. log.POWER 


Step   => 0 
Model  => HEATRATE ~ RPM + INLET.TEMP + EXH.TEMP + log.CPRATIO + log.AIRFLOW + log.POWER 
R2     => 0.984 

Initiating stepwise selection... 

Step     => 1 
Removed  => log.CPRATIO 
Model    => HEATRATE ~ RPM + INLET.TEMP + EXH.TEMP + log.AIRFLOW + log.POWER 
R2       => 0.98358 


No more variables to be removed.

Variables Removed: 

=> log.CPRATIO 

                              Stepwise Summary                              
--------------------------------------------------------------------------
Step    Variable         AIC        SBC       SBIC        R2       Adj. R2 
--------------------------------------------------------------------------
 0      Full Model     917.568    935.205    729.036    0.98369    0.98206 
 1      log.CPRATIO    916.000    931.433    727.152    0.98358    0.98224 
--------------------------------------------------------------------------

Final Model Output 
------------------

                           Model Summary                            
-------------------------------------------------------------------
R                         0.992       RMSE                 202.837 
R-Squared                 0.984       MSE                41142.862 
Adj. R-Squared            0.982       Coef. Var              1.921 
Pred R-Squared            0.975       AIC                  916.000 
MAE                     152.893       SBC                  931.433 
-------------------------------------------------------------------
 RMSE: Root Mean Square Error 
 MSE: Mean Square Error 
 MAE: Mean Absolute Error 
 AIC: Akaike Information Criteria 
 SBC: Schwarz Bayesian Criteria 

                                   ANOVA                                    
---------------------------------------------------------------------------
                     Sum of                                                
                    Squares        DF     Mean Square       F         Sig. 
---------------------------------------------------------------------------
Regression    165140636.668         5    33028127.334    730.877    0.0000 
Residual        2756571.779        61       45189.701                      
Total         167897208.448        66                                      
---------------------------------------------------------------------------

                                       Parameter Estimates                                         
--------------------------------------------------------------------------------------------------
      model         Beta    Std. Error    Std. Beta       t        Sig         lower        upper 
--------------------------------------------------------------------------------------------------
(Intercept)    43134.882      1973.376                  21.858    0.000    39188.873    47080.890 
        RPM        0.066         0.012        0.291      5.568    0.000        0.042        0.090 
 INLET.TEMP        1.337         0.813        0.115      1.644    0.105       -0.289        2.962 
   EXH.TEMP       12.538         1.098        0.347     11.422    0.000       10.343       14.733 
log.AIRFLOW     7529.364       539.316        7.262     13.961    0.000     6450.935     8607.794 
  log.POWER    -7347.619       503.882       -8.082    -14.582    0.000    -8355.194    -6340.044 
--------------------------------------------------------------------------------------------------
# forward selection
#default: penter = 0.3
ols_step_forward_p(gasmod1,p_enter=0.15,details=T)
Forward Selection Method 
------------------------

Candidate Terms: 

1. RPM 
2. INLET.TEMP 
3. EXH.TEMP 
4. log.CPRATIO 
5. log.AIRFLOW 
6. log.POWER 


Step   => 0 
Model  => HEATRATE ~ 1 
R2     => 0 

Initiating stepwise selection... 

                     Selection Metrics Table                       
------------------------------------------------------------------
Predictor      Pr(>|t|)    R-Squared    Adj. R-Squared      AIC    
------------------------------------------------------------------
log.POWER       0.00000        0.758             0.755    1088.159 
RPM             0.00000        0.712             0.708    1099.849 
log.AIRFLOW     0.00000        0.705             0.701    1101.457 
log.CPRATIO     0.00000        0.670             0.665    1109.066 
INLET.TEMP      0.00000        0.641             0.635    1114.713 
EXH.TEMP        0.00959        0.099             0.085    1176.358 
------------------------------------------------------------------

Step      => 1 
Selected  => log.POWER 
Model     => HEATRATE ~ log.POWER 
R2        => 0.758 

                     Selection Metrics Table                       
------------------------------------------------------------------
Predictor      Pr(>|t|)    R-Squared    Adj. R-Squared      AIC    
------------------------------------------------------------------
log.CPRATIO     0.00000        0.909             0.906    1024.868 
log.AIRFLOW     0.00000        0.865             0.861    1051.171 
INLET.TEMP        2e-05        0.820             0.814    1070.432 
RPM             0.03660        0.774             0.767    1085.549 
EXH.TEMP        0.14832        0.766             0.759    1087.954 
------------------------------------------------------------------

Step      => 2 
Selected  => log.CPRATIO 
Model     => HEATRATE ~ log.POWER + log.CPRATIO 
R2        => 0.909 

                     Selection Metrics Table                       
------------------------------------------------------------------
Predictor      Pr(>|t|)    R-Squared    Adj. R-Squared      AIC    
------------------------------------------------------------------
log.AIRFLOW     0.00236        0.921             0.918    1016.961 
RPM             0.10842        0.913             0.908    1024.106 
EXH.TEMP        0.27423        0.911             0.906    1025.587 
INLET.TEMP      0.45619        0.910             0.905    1026.273 
------------------------------------------------------------------

Step      => 3 
Selected  => log.AIRFLOW 
Model     => HEATRATE ~ log.POWER + log.CPRATIO + log.AIRFLOW 
R2        => 0.921 

                    Selection Metrics Table                      
----------------------------------------------------------------
Predictor     Pr(>|t|)    R-Squared    Adj. R-Squared      AIC   
----------------------------------------------------------------
EXH.TEMP       0.00000        0.975             0.974    941.630 
RPM            0.00000        0.947             0.943    993.145 
INLET.TEMP     0.00000        0.944             0.941    995.852 
----------------------------------------------------------------

Step      => 4 
Selected  => EXH.TEMP 
Model     => HEATRATE ~ log.POWER + log.CPRATIO + log.AIRFLOW + EXH.TEMP 
R2        => 0.975 

                    Selection Metrics Table                      
----------------------------------------------------------------
Predictor     Pr(>|t|)    R-Squared    Adj. R-Squared      AIC   
----------------------------------------------------------------
RPM            0.00000        0.983             0.982    916.887 
INLET.TEMP     0.76662        0.975             0.973    943.533 
----------------------------------------------------------------

Step      => 5 
Selected  => RPM 
Model     => HEATRATE ~ log.POWER + log.CPRATIO + log.AIRFLOW + EXH.TEMP + RPM 
R2        => 0.983 

                    Selection Metrics Table                      
----------------------------------------------------------------
Predictor     Pr(>|t|)    R-Squared    Adj. R-Squared      AIC   
----------------------------------------------------------------
INLET.TEMP     0.27902        0.984             0.982    917.568 
----------------------------------------------------------------

Step      => 6 
Selected  => INLET.TEMP 
Model     => HEATRATE ~ log.POWER + log.CPRATIO + log.AIRFLOW + EXH.TEMP + RPM + INLET.TEMP 
R2        => 0.984 


Variables Selected: 

=> log.POWER 
=> log.CPRATIO 
=> log.AIRFLOW 
=> EXH.TEMP 
=> RPM 
=> INLET.TEMP 

                               Stepwise Summary                               
----------------------------------------------------------------------------
Step    Variable         AIC         SBC        SBIC        R2       Adj. R2 
----------------------------------------------------------------------------
 0      Base Model     1181.327    1185.737    987.298    0.00000    0.00000 
 1      log.POWER      1088.159    1094.773    892.613    0.75839    0.75467 
 2      log.CPRATIO    1024.868    1033.687    828.649    0.90882    0.90597 
 3      log.AIRFLOW    1016.961    1027.984    819.495    0.92135    0.91761 
 4      EXH.TEMP        941.630     954.858    748.697    0.97520    0.97360 
 5      RPM             916.887     932.320    727.870    0.98336    0.98200 
 6      INLET.TEMP      917.568     935.205    729.036    0.98369    0.98206 
----------------------------------------------------------------------------

Final Model Output 
------------------

                           Model Summary                            
-------------------------------------------------------------------
R                         0.992       RMSE                 202.184 
R-Squared                 0.984       MSE                40878.488 
Adj. R-Squared            0.982       Coef. Var              1.931 
Pred R-Squared            0.975       AIC                  917.568 
MAE                     154.185       SBC                  935.205 
-------------------------------------------------------------------
 RMSE: Root Mean Square Error 
 MSE: Mean Square Error 
 MAE: Mean Absolute Error 
 AIC: Akaike Information Criteria 
 SBC: Schwarz Bayesian Criteria 

                                   ANOVA                                    
---------------------------------------------------------------------------
                     Sum of                                                
                    Squares        DF     Mean Square       F         Sig. 
---------------------------------------------------------------------------
Regression    165158349.755         6    27526391.626    603.019    0.0000 
Residual        2738858.692        60       45647.645                      
Total         167897208.448        66                                      
---------------------------------------------------------------------------

                                       Parameter Estimates                                         
--------------------------------------------------------------------------------------------------
      model         Beta    Std. Error    Std. Beta       t        Sig         lower        upper 
--------------------------------------------------------------------------------------------------
(Intercept)    43116.228      1983.576                  21.737    0.000    39148.486    47083.970 
  log.POWER    -7433.155       524.714       -8.176    -14.166    0.000    -8482.740    -6383.570 
log.CPRATIO      157.821       253.354        0.027      0.623    0.536     -348.962      664.604 
log.AIRFLOW     7624.558       563.172        7.354     13.539    0.000     6498.047     8751.070 
   EXH.TEMP       13.259         1.599        0.367      8.291    0.000       10.060       16.457 
        RPM        0.067         0.012        0.295      5.575    0.000        0.043        0.091 
 INLET.TEMP        1.036         0.949        0.089      1.092    0.279       -0.861        2.934 
--------------------------------------------------------------------------------------------------
# stepwise regression
#Default: pent = 0.1, prem = 0.3
ols_step_both_p(gasmod1,p_ent=0.15,p_rem=0.15,details=T)
Stepwise Selection Method 
-------------------------

Candidate Terms: 

1. RPM 
2. INLET.TEMP 
3. EXH.TEMP 
4. log.CPRATIO 
5. log.AIRFLOW 
6. log.POWER 


Step   => 0 
Model  => HEATRATE ~ 1 
R2     => 0 

Initiating stepwise selection... 

Step      => 1 
Selected  => log.POWER 
Model     => HEATRATE ~ log.POWER 
R2        => 0.758 

Step      => 2 
Selected  => log.CPRATIO 
Model     => HEATRATE ~ log.POWER + log.CPRATIO 
R2        => 0.909 

Step      => 3 
Selected  => log.AIRFLOW 
Model     => HEATRATE ~ log.POWER + log.CPRATIO + log.AIRFLOW 
R2        => 0.921 

Step      => 4 
Selected  => EXH.TEMP 
Model     => HEATRATE ~ log.POWER + log.CPRATIO + log.AIRFLOW + EXH.TEMP 
R2        => 0.975 

Step     => 5 
Removed  => log.CPRATIO 
Model    => HEATRATE ~ log.POWER + log.AIRFLOW + EXH.TEMP 
  => 0.97519 

Step      => 6 
Selected  => RPM 
Model     => HEATRATE ~ log.POWER + log.AIRFLOW + EXH.TEMP + RPM 
R2        => 0.983 

Step      => 7 
Selected  => INLET.TEMP 
Model     => HEATRATE ~ log.POWER + log.AIRFLOW + EXH.TEMP + RPM + INLET.TEMP 
R2        => 0.984 

                                 Stepwise Summary                                 
--------------------------------------------------------------------------------
Step    Variable             AIC         SBC        SBIC        R2       Adj. R2 
--------------------------------------------------------------------------------
 0      Base Model         1181.327    1185.737    987.298    0.00000    0.00000 
 1      log.POWER (+)      1088.159    1094.773    892.613    0.75839    0.75467 
 2      log.CPRATIO (+)    1024.868    1033.687    828.649    0.90882    0.90597 
 3      log.AIRFLOW (+)    1016.961    1027.984    819.495    0.92135    0.91761 
 4      EXH.TEMP (+)        941.630     954.858    748.697    0.97520    0.97360 
 5      log.CPRATIO (-)     939.661     950.684    747.255    0.97519    0.97401 
 6      RPM (+)             916.905     930.133    727.384    0.98285    0.98175 
 7      INLET.TEMP (+)      916.000     931.433    727.152    0.98358    0.98224 
--------------------------------------------------------------------------------

Final Model Output 
------------------

                           Model Summary                            
-------------------------------------------------------------------
R                         0.992       RMSE                 202.837 
R-Squared                 0.984       MSE                41142.862 
Adj. R-Squared            0.982       Coef. Var              1.921 
Pred R-Squared            0.975       AIC                  916.000 
MAE                     152.893       SBC                  931.433 
-------------------------------------------------------------------
 RMSE: Root Mean Square Error 
 MSE: Mean Square Error 
 MAE: Mean Absolute Error 
 AIC: Akaike Information Criteria 
 SBC: Schwarz Bayesian Criteria 

                                   ANOVA                                    
---------------------------------------------------------------------------
                     Sum of                                                
                    Squares        DF     Mean Square       F         Sig. 
---------------------------------------------------------------------------
Regression    165140636.668         5    33028127.334    730.877    0.0000 
Residual        2756571.779        61       45189.701                      
Total         167897208.448        66                                      
---------------------------------------------------------------------------

                                       Parameter Estimates                                         
--------------------------------------------------------------------------------------------------
      model         Beta    Std. Error    Std. Beta       t        Sig         lower        upper 
--------------------------------------------------------------------------------------------------
(Intercept)    43134.882      1973.376                  21.858    0.000    39188.873    47080.890 
  log.POWER    -7347.619       503.882       -8.082    -14.582    0.000    -8355.194    -6340.044 
log.AIRFLOW     7529.364       539.316        7.262     13.961    0.000     6450.935     8607.794 
   EXH.TEMP       12.538         1.098        0.347     11.422    0.000       10.343       14.733 
        RPM        0.066         0.012        0.291      5.568    0.000        0.042        0.090 
 INLET.TEMP        1.337         0.813        0.115      1.644    0.105       -0.289        2.962 
--------------------------------------------------------------------------------------------------
  • Backwards elimination: rpm, inlet.temp, exh.temp, log.airflow, log.power
  • Forward Selection: log.power, log.cpratio, log.airflow, exh.temp, inlet.temp, rpm
  • Stepwise Regression: log.power, log.airflow, exh.temp, rpm, inlet.temp
  • In this instance the three procedures resulted in slightly different subsets of variables. This may not always happen.
  • Let’s use the stepwise set of predictors to fit a model and determine if our problem of severe multicollinearity has been resolved.
#updated model
gasmod2<-lm(HEATRATE~RPM+INLET.TEMP+EXH.TEMP+log.AIRFLOW+log.POWER,data=gasturbine)

summary(gasmod2)

Call:
lm(formula = HEATRATE ~ RPM + INLET.TEMP + EXH.TEMP + log.AIRFLOW + 
    log.POWER, data = gasturbine)

Residuals:
    Min      1Q  Median      3Q     Max 
-345.13 -130.99  -49.58   68.14  757.16 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  4.313e+04  1.973e+03  21.858  < 2e-16 ***
RPM          6.613e-02  1.188e-02   5.568 6.13e-07 ***
INLET.TEMP   1.337e+00  8.130e-01   1.644    0.105    
EXH.TEMP     1.254e+01  1.098e+00  11.422  < 2e-16 ***
log.AIRFLOW  7.529e+03  5.393e+02  13.961  < 2e-16 ***
log.POWER   -7.348e+03  5.039e+02 -14.582  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 212.6 on 61 degrees of freedom
Multiple R-squared:  0.9836,    Adjusted R-squared:  0.9822 
F-statistic: 730.9 on 5 and 61 DF,  p-value: < 2.2e-16
gasmod2vif<-round(vif(gasmod2),3)
gasmod2vif
        RPM  INLET.TEMP    EXH.TEMP log.AIRFLOW   log.POWER 
     10.163      18.231       3.429    1005.242    1141.209 
mean(gasmod2vif)
[1] 435.6548

This did not resolve our problem of sever multicollinearity. We should remove log.power, log.airflow (or both).

#updated model: REMOVE LOG.POWER
gasmod3<-lm(HEATRATE~RPM+INLET.TEMP+EXH.TEMP+log.AIRFLOW,data=gasturbine)
gasmod3vif<-round(vif(gasmod3),3)
gasmod3vif
        RPM  INLET.TEMP    EXH.TEMP log.AIRFLOW 
     10.057       3.575       3.334      10.683 
mean(gasmod3vif)
[1] 6.91225
#updated model: REMOVE LOG.AIRFLOW
gasmod4<-lm(HEATRATE~RPM+INLET.TEMP+EXH.TEMP+log.POWER,data=gasturbine)
gasmod4vif<-round(vif(gasmod4),3)
gasmod4vif
       RPM INLET.TEMP   EXH.TEMP  log.POWER 
     9.777      3.676      3.271     12.128 
mean(gasmod4vif)
[1] 7.213
#updated model: REMOVE BOTH
gasmod5<-lm(HEATRATE~RPM+INLET.TEMP+EXH.TEMP,data=gasturbine)
gasmod5vif<-round(vif(gasmod5),3)
gasmod5vif
       RPM INLET.TEMP   EXH.TEMP 
     1.727      3.570      2.551 
mean(gasmod5vif)
[1] 2.616

It appears the model without power and airflow have resolved the issue of severe multicollinearity. HOWEVER, we may have also decided to remove a combination of RPM and power/airflow, as that also had a strong relationship. As a researcher, or with your client, you may discuss which of those three variables would be most important to analyze moving forward. Here, we will remove the transformations as we want the most least complex model.

Unit 3.2: Residuals and Assumptions

Step 3: Estimate the Model Parameters

Reminder we looked at multicollinearity and variable screening to end up with a starting subset of predictors of: RPM, INLET.TEMP,and EXH.TEMP

#updated model
gasmod5<-lm(HEATRATE~RPM+INLET.TEMP+EXH.TEMP,data=gasturbine)

summary(gasmod5)

Call:
lm(formula = HEATRATE ~ RPM + INLET.TEMP + EXH.TEMP, data = gasturbine)

Residuals:
    Min      1Q  Median      3Q     Max 
-1025.8  -297.9  -115.3   225.8  1425.1 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.436e+04  7.333e+02  19.582  < 2e-16 ***
RPM          1.051e-01  1.071e-02   9.818 2.55e-14 ***
INLET.TEMP  -9.223e+00  7.869e-01 -11.721  < 2e-16 ***
EXH.TEMP     1.243e+01  2.071e+00   6.000 1.06e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 465 on 63 degrees of freedom
Multiple R-squared:  0.9189,    Adjusted R-squared:  0.915 
F-statistic: 237.9 on 3 and 63 DF,  p-value: < 2.2e-16

Try adding qualitative variable Engine

#updated model
gasmod6<-lm(HEATRATE~RPM+INLET.TEMP+EXH.TEMP+ENGINE,data=gasturbine)

summary(gasmod6)

Call:
lm(formula = HEATRATE ~ RPM + INLET.TEMP + EXH.TEMP + ENGINE, 
    data = gasturbine)

Residuals:
     Min       1Q   Median       3Q      Max 
-1030.37  -247.21   -70.62   258.72  1400.82 

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)        1.607e+04  1.262e+03  12.732  < 2e-16 ***
RPM                1.077e-01  1.198e-02   8.991 8.85e-13 ***
INLET.TEMP        -9.899e+00  9.374e-01 -10.561 2.12e-15 ***
EXH.TEMP           1.116e+01  2.202e+00   5.068 4.00e-06 ***
ENGINEAeroderiv   -4.404e+02  2.870e+02  -1.534    0.130    
ENGINETraditional -3.667e+02  2.233e+02  -1.642    0.106    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 461.2 on 61 degrees of freedom
Multiple R-squared:  0.9227,    Adjusted R-squared:  0.9164 
F-statistic: 145.6 on 5 and 61 DF,  p-value: < 2.2e-16

Perform nested F test to determine if Engine type is significant.

#anova(reduced,complete)
anova(gasmod5,gasmod6)
  • Hypotheses:
    • \(H_0: \beta_4=\beta_5=0\) (the engine type does not contribute to predicting heat rate)
    • \(H_a:\beta_4, \beta_5 \neq 0\) (the engine type contributes to predicting heat rate)
  • Distribution of test statistic: F 2 with 61 DF
  • Test Statistic: F=1.5137
  • Pvalue: 0.2282
  • Decision: 0.2282>0.05 -> FAIL TO REJECT H0
  • Conclusion: The engine type is not significant at predicting heat rate. We will remove both dummy variables in the model and not test them individually.

For the sake of this example, we will not consider any further testing.

Step 4: Specify the distribution of the errors and find the estimate of the variance

We have covered this individually and will come back to it in a complete example at the end of the unit.

Step 5: Evaluate the Utility of the model

We have covered this individually and will come back to it in a complete example at the end of the unit.

Step 6: Check the Model Assumptions

First we will evaluate model assumptions

#There are a few options to view plots for assumptions

#Residuals Plots of explanatory variables vs residuals
residualPlots(gasmod5,tests=F)

#Residual vs Fitted and QQ plot
plot(gasmod5, which=c(1,2))

#histogram of residuals
hist(residuals(gasmod5))

  • Lack of Fit: Residual Plots
  • Constant Variance: Residual Plots and Residual vs Fitted
  • Normality: QQ Plot and Histogram of Residuals
  • Independence: We do not have time series data. For the sake of this calss we will not explore this assumption other than stating we do not believe the observations are dependent on each other.

Unit 3.3 Influential Diagnostics

Then we will identify outliers and influential observations.

#Cooks Distance Thresholds
plot(gasmod6,which=4)

#Leverage vs Studentized Residuals
influencePlot(gasmod6,fill=F)
#Deleted Studentized Residuals vs PRedicted values
ols_plot_resid_stud_fit(gasmod6)

#We can use various functions to store and view these statistics for all observations
rstudent(gasmod6)
           1            2            3            4            5            6 
 1.243137605  0.953567034 -0.559949611 -1.833746826 -0.112418409 -0.033200806 
           7            8            9           10           11           12 
-0.058156152  0.508333893 -0.155456628 -0.459542582  3.392627996  2.117199804 
          13           14           15           16           17           18 
 0.125129449 -0.046196504  0.723384265 -0.095140393 -0.547670707 -0.750470873 
          19           20           21           22           23           24 
-0.304450493  0.673205397  1.751907503  0.654343118 -0.350120635 -0.410544775 
          25           26           27           28           29           30 
-0.065042611 -0.337870874 -0.635228203 -0.346550240 -0.678320209 -0.841277386 
          31           32           33           34           35           36 
-0.838605023 -2.442358463  0.238709494  0.005484449 -1.126363967  2.658061417 
          37           38           39           40           41           42 
 0.289557055 -0.017061404 -1.707924830 -0.908013420  0.059113080 -0.432805037 
          43           44           45           46           47           48 
-0.458244154 -0.666508619  1.213017501 -0.185981656  2.607630221  0.919011926 
          49           50           51           52           53           54 
-0.444586661  0.696961236 -0.481522312  0.641272157  0.922182297 -0.158439097 
          55           56           57           58           59           60 
 0.134434489 -0.874455736 -0.346208078 -1.172232330  0.425819134 -1.319935880 
          61           62           63           64           65           66 
 0.312984792 -1.400726730 -0.672890374  1.236692924 -0.388154139  1.314663248 
          67 
-0.259974071 
hatvalues(gasmod6)
         1          2          3          4          5          6          7 
0.26374942 0.13147635 0.08910503 0.05381744 0.06186189 0.03974379 0.04564992 
         8          9         10         11         12         13         14 
0.04072182 0.04546682 0.04670065 0.06051855 0.04056920 0.15105297 0.04330309 
        15         16         17         18         19         20         21 
0.11159742 0.05097723 0.05849761 0.06956934 0.07525361 0.08443891 0.04384658 
        22         23         24         25         26         27         28 
0.03386829 0.04351253 0.06120637 0.04520265 0.04336079 0.05661082 0.14685603 
        29         30         31         32         33         34         35 
0.15177767 0.04110694 0.04868216 0.09528677 0.07081525 0.04347583 0.05496934 
        36         37         38         39         40         41         42 
0.16171973 0.05366053 0.06842493 0.05028882 0.07660326 0.06014061 0.05777986 
        43         44         45         46         47         48         49 
0.08247021 0.05996719 0.09328251 0.07893593 0.06754455 0.06754455 0.06168252 
        50         51         52         53         54         55         56 
0.07005364 0.04880852 0.07153452 0.09716238 0.05178620 0.06510467 0.04837655 
        57         58         59         60         61         62         63 
0.04881945 0.08198603 0.07056967 0.05277623 0.29345156 0.24559458 0.15567950 
        64         65         66         67 
0.38302799 0.25851075 0.19767167 0.17439179 
cooks.distance(gasmod6)
           1            2            3            4            5            6 
9.145076e-02 2.297542e-02 5.170060e-03 3.068826e-02 1.411780e-04 7.730344e-06 
           7            8            9           10           11           12 
2.741110e-05 1.850724e-03 1.949730e-04 1.746814e-03 1.054108e-01 2.988429e-02 
          13           14           15           16           17           18 
4.719345e-04 1.636721e-05 1.104177e-02 8.237422e-05 3.142081e-03 7.069216e-03 
          19           20           21           22           23           24 
1.276130e-03 7.029261e-03 2.268777e-02 2.525267e-03 9.429995e-04 1.856764e-03 
          25           26           27           28           29           30 
3.393481e-05 8.750893e-04 4.075533e-03 3.495914e-03 1.384450e-02 5.081101e-03 
          31           32           33           34           35           36 
6.027341e-03 9.682884e-02 7.351555e-04 2.316568e-07 1.224537e-02 2.066258e-01 
          37           38           39           40           41           42 
8.044465e-04 3.622858e-06 2.495904e-02 1.143256e-02 3.788560e-05 1.940360e-03 
          43           44           45           46           47           48 
3.186991e-03 4.766569e-03 2.503616e-02 5.019982e-04 7.496492e-02 1.022260e-02 
          49           50           51           52           53           54 
2.194442e-03 6.150573e-03 2.008227e-03 5.332067e-03 1.529104e-02 2.322085e-04 
          55           56           57           58           59           60 
2.131901e-04 6.503885e-03 1.040315e-03 2.032877e-02 2.325779e-03 1.598408e-02 
          61           62           63           64           65           66 
6.882715e-03 1.048030e-01 1.404026e-02 1.568862e-01 8.878107e-03 7.013186e-02 
          67 
2.416298e-03 
dffits(gasmod6)
           1            2            3            4            5            6 
 0.744050027  0.371008804 -0.175132164 -0.437334197 -0.028867929 -0.006754446 
           7            8            9           10           11           12 
-0.012719259  0.104734650 -0.033928216 -0.101712085  0.861065944  0.435364744 
          13           14           15           16           17           18 
 0.052781760 -0.009828368  0.256384190 -0.022050313 -0.136514215 -0.205211062 
          19           20           21           22           23           24 
-0.086849841  0.204444401  0.375158869  0.122513521 -0.074676723 -0.104827152 
          25           26           27           28           29           30 
-0.014152212 -0.071932520 -0.155608758 -0.143780813 -0.286935214 -0.174185378 
          31           32           33           34           35           36 
-0.189705350 -0.792629512  0.065899491  0.001169254 -0.271653969  1.167486584 
          37           38           39           40           41           42 
 0.068950616 -0.004623949 -0.393014628 -0.261530121  0.014953266 -0.107177681 
          43           44           45           46           47           48 
-0.137383755 -0.168341462  0.389073077 -0.054445570  0.701822231  0.247344503 
          49           50           51           52           53           54 
-0.113988902  0.191291220 -0.109076264  0.177998911  0.302524828 -0.037026805 
          55           56           57           58           59           60 
 0.035476051 -0.197161779 -0.078433600 -0.350315199  0.117334519 -0.311562901 
          61           62           63           64           65           66 
 0.201706769 -0.799209067 -0.288939128  0.974415977 -0.229187589  0.652545303 
          67 
-0.119482982 
dfbetas(gasmod6)
     (Intercept)           RPM    INLET.TEMP      EXH.TEMP ENGINEAeroderiv
1  -0.4616252956  0.4813707982  0.1923546503  0.2474519866    0.0163454215
2   0.2941293684  0.1012291086 -0.0592393708 -0.2353768288   -0.2225153990
3   0.0816216531 -0.1351407226 -0.1021617893  0.0248528426    0.0070221299
4  -0.2854701803 -0.0814920749  0.0566271282  0.2286492592    0.2063272041
5   0.0187689328 -0.0127622742 -0.0098064543 -0.0083815425   -0.0050162064
6   0.0022577393 -0.0003116136 -0.0034381246  0.0011540048   -0.0018923341
7   0.0032957585 -0.0017728807 -0.0076681469  0.0043273436   -0.0027288509
8   0.0018309521 -0.0310832458  0.0226470204 -0.0227564500    0.0225520605
9   0.0067974867  0.0190325418  0.0027008214 -0.0100662840   -0.0134274831
10  0.0330229277  0.0434078651 -0.0147681851 -0.0197235327   -0.0451325514
11 -0.0083433409  0.4922496924 -0.0444452026  0.0326711505   -0.2839038186
12  0.0699847018 -0.0722039788 -0.2404521325  0.1691626314   -0.0592428387
13 -0.0405153289  0.0322127844  0.0336742129  0.0056781247    0.0115444107
14 -0.0048578184 -0.0006887353 -0.0009359947  0.0057223443    0.0026128095
15  0.1709015862 -0.1526854425 -0.2006583232  0.0340405220   -0.0548898822
16  0.0081106284  0.0111624539  0.0049757207 -0.0132798704   -0.0090726320
17  0.0200255447  0.0925475813  0.0561845396 -0.0782467640   -0.0471752875
18  0.0468293295  0.1410500864  0.0804652675 -0.1302495554   -0.0815875121
19  0.0533427648  0.0355056348 -0.0094012951 -0.0446631804   -0.0496219464
20 -0.0810185081  0.1470938758  0.1542798286 -0.0768958275   -0.0001442681
21 -0.0702709373 -0.2143575310 -0.0614273416  0.1375233203    0.1397583971
22 -0.0219619055 -0.0217687026  0.0331990042 -0.0100275661    0.0322626913
23  0.0161048078  0.0408565364  0.0082328696 -0.0254798946   -0.0289461582
24  0.0402195466  0.0603878096  0.0215813744 -0.0630221750   -0.0486405066
25  0.0052468132  0.0046302026 -0.0035025886 -0.0019150894   -0.0062067909
26 -0.0061063251  0.0441060037  0.0158984986 -0.0112493118   -0.0174067326
27  0.0497007703  0.0937833031  0.0279886031 -0.0799102087   -0.0705097337
28 -0.0554134237 -0.0455244938  0.0732219438 -0.0153486341    0.0733156253
29 -0.2372320929 -0.0646037470  0.0793632837  0.1582879533    0.1783248575
30 -0.0116410460 -0.0833978191  0.0056159734  0.0091828640    0.0542423839
31  0.0140299272 -0.1302444267 -0.0709693715  0.0606572757    0.0470373048
32 -0.0233428958 -0.5979011317 -0.5030653876  0.5382255324    0.2133892106
33  0.0378566710  0.0140643935  0.0128106931 -0.0503268081   -0.0237725619
34  0.0003429671  0.0002987724  0.0004064828 -0.0007466865   -0.0002341270
35 -0.0214090302 -0.1501949499 -0.1532886837  0.1768335498    0.0543505928
36  0.4469699197 -0.8303862943 -1.0322844998  0.6021667003   -0.0368952410
37  0.0127699700 -0.0488569028 -0.0363105704  0.0248452796    0.0112355243
38 -0.0025083727  0.0030406311  0.0025964300 -0.0001893908    0.0002638342
39 -0.2084784081 -0.1091030122 -0.0424833246  0.2509178991    0.1561643991
40 -0.1352378569 -0.0765171117 -0.0223324791  0.1416996592    0.1698422490
41  0.0028832039 -0.0030774923 -0.0065254530  0.0047097066   -0.0056065357
42  0.0027910235  0.0278039651  0.0323168203 -0.0430701360    0.0217495695
43  0.0542061596  0.0321491875  0.0236094477 -0.0858681352   -0.0070144470
44  0.0024929420  0.0535841873  0.0560453095 -0.0711223924    0.0300980029
45 -0.0347395629  0.1393177658  0.2582444670 -0.2020336420   -0.0778248391
46  0.0300937938  0.0012068846 -0.0131160735 -0.0201447002   -0.0064018228
47  0.3616857279  0.1554790890 -0.0680823663 -0.2490642116   -0.4724339056
48  0.1274695679  0.0547957820 -0.0239943939 -0.0877781593   -0.1665007523
49  0.0187966599  0.0276748596  0.0273566802 -0.0541860438    0.0132279631
50 -0.0720814425  0.0447400741  0.0998190172 -0.0160818828   -0.0106577582
51 -0.0052138136  0.0156770115  0.0084393195 -0.0121843843    0.0295070130
52 -0.0506803634  0.0434110963  0.1006005165 -0.0388331050   -0.0167395794
53  0.1779008791  0.0789263611  0.0411173707 -0.2017116947   -0.1896618886
54 -0.0042622045  0.0023795010  0.0090025003 -0.0074908463    0.0142436155
55  0.0201113016 -0.0010400489 -0.0047112872 -0.0127596473   -0.0203274523
56 -0.0258847901 -0.0024670234 -0.0125968442  0.0225424737    0.0718591696
57  0.0026308202  0.0062094306  0.0020134861 -0.0108758514    0.0197246031
58 -0.1177794552  0.1560581070  0.2242789267 -0.1304967655    0.1149860506
59 -0.0419952615  0.0291987947  0.0631717980 -0.0140451074   -0.0080251530
60 -0.0574206914  0.0805555716  0.0909389311 -0.0587965778    0.1000371507
61  0.0559781825  0.0614377409 -0.0708084649  0.0118169199    0.0035799256
62  0.1044317936 -0.1983608786  0.1895215915 -0.2805063640   -0.2615428200
63  0.0079579424  0.0227519931  0.0641241378 -0.0714429716   -0.1678295813
64  0.0453956802  0.0150146037  0.5336556913 -0.5673301450    0.4721213557
65 -0.0625650309  0.0713389317 -0.0266745149  0.0849028217   -0.1187416934
66 -0.3376168664  0.0444442130  0.2013063713  0.1335537388    0.5376932607
67  0.0035229428  0.0396234351  0.0400650227 -0.0441935765   -0.0790326742
   ENGINETraditional
1       0.3105796747
2      -0.1896131041
3      -0.0821640856
4       0.1316629086
5      -0.0183141372
6      -0.0044088031
7      -0.0075281478
8       0.0432267216
9      -0.0174172815
10     -0.0643784116
11      0.0337035730
12      0.0040235630
13      0.0365427464
14      0.0005652888
15     -0.0990744581
16     -0.0119613136
17     -0.0497914801
18     -0.0841030522
19     -0.0654343393
20      0.1029660979
21      0.1810262863
22      0.0677272565
23     -0.0383938783
24     -0.0586474048
25     -0.0095514545
26     -0.0213187329
27     -0.0844172781
28      0.0556254784
29      0.1639146941
30     -0.0118814304
31     -0.0424705701
32     -0.1268550560
33     -0.0113296357
34      0.0001290727
35     -0.0583542402
36     -0.2674127716
37      0.0085394697
38      0.0007613867
39      0.0608278075
40      0.1852756424
41     -0.0086227268
42      0.0403537021
43      0.0049805211
44      0.0625304421
45     -0.0364465093
46     -0.0079732603
47     -0.5416713727
48     -0.1909022405
49      0.0292293693
50      0.0071515967
51      0.0448182591
52     -0.0013155861
53     -0.2096487411
54      0.0194482863
55     -0.0266978648
56      0.0911943758
57      0.0280412216
58      0.2184286010
59      0.0032502278
60      0.1647087061
61     -0.0795860796
62      0.0552728528
63      0.0109140430
64      0.1546419984
65      0.0112760035
66      0.2808704953
67      0.0004734618
#We can subset by the observations of interest
dffits(gasmod6)[c(11,36,61,64)]
       11        36        61        64 
0.8610659 1.1674866 0.2017068 0.9744160 
dfbetas(gasmod6)[c(11,36,61,64),]
    (Intercept)         RPM  INLET.TEMP    EXH.TEMP ENGINEAeroderiv
11 -0.008343341  0.49224969 -0.04444520  0.03267115    -0.283903819
36  0.446969920 -0.83038629 -1.03228450  0.60216670    -0.036895241
61  0.055978183  0.06143774 -0.07080846  0.01181692     0.003579926
64  0.045395680  0.01501460  0.53365569 -0.56733015     0.472121356
   ENGINETraditional
11        0.03370357
36       -0.26741277
61       -0.07958608
64        0.15464200
summary(dfbetas(gasmod6))
  (Intercept)             RPM               INLET.TEMP       
 Min.   :-0.461625   Min.   :-0.8303863   Min.   :-1.032285  
 1st Qu.:-0.030312   1st Qu.:-0.0264260   1st Qu.:-0.018550  
 Median : 0.002791   Median : 0.0150146   Median : 0.005616  
 Mean   :-0.000495   Mean   :-0.0000617   Mean   : 0.001572  
 3rd Qu.: 0.035440   3rd Qu.: 0.0491621   3rd Qu.: 0.056115  
 Max.   : 0.446970   Max.   : 0.4922497   Max.   : 0.533656  
    EXH.TEMP          ENGINEAeroderiv     ENGINETraditional   
 Min.   :-0.5673301   Min.   :-0.472434   Min.   :-0.5416714  
 1st Qu.:-0.0564913   1st Qu.:-0.046154   1st Qu.:-0.0404322  
 Median :-0.0112493   Median :-0.002729   Median : 0.0004735  
 Mean   :-0.0009895   Mean   : 0.001614   Mean   : 0.0012985  
 3rd Qu.: 0.0236939   3rd Qu.: 0.029803   3rd Qu.: 0.0440225  
 Max.   : 0.6021667   Max.   : 0.537693   Max.   : 0.3105797  
  • Cook’s Distance: Identifies influential observations. Threshold is 4/n
  • Leverage (Hat): Identifies outliers in the x direction (extreme vaues of explanatory variables). Threshold is 2(k+1)/n
  • Studentizied Residuals: Identifies outliers in the y direction. Threshold is +/- 2
  • Circles on “influence plot” are indicative of the cook’s distance value
  • Deleted Studentizied Residuals (R Student): Identifies general outliers and influential observations. Threshold is +/- 2

Next Steps to resolve

Variable transformation: We can directly use a function of the response variable within lm() or add a new variable to the table

# directly transform in the lm() function

gasmod5sqrt<-lm(sqrt(HEATRATE)~RPM+INLET.TEMP+EXH.TEMP,data=gasturbine)
summary(gasmod5sqrt)

Call:
lm(formula = sqrt(HEATRATE) ~ RPM + INLET.TEMP + EXH.TEMP, data = gasturbine)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.4872 -1.3479 -0.5044  0.9321  6.2759 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.218e+02  3.346e+00  36.392  < 2e-16 ***
RPM          4.640e-04  4.886e-05   9.496 9.05e-14 ***
INLET.TEMP  -4.367e-02  3.590e-03 -12.163  < 2e-16 ***
EXH.TEMP     5.706e-02  9.449e-03   6.039 9.12e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.122 on 63 degrees of freedom
Multiple R-squared:  0.9206,    Adjusted R-squared:  0.9168 
F-statistic: 243.4 on 3 and 63 DF,  p-value: < 2.2e-16
# OR Create new variable
gasturbine$sqrty<-sqrt(gasturbine$HEATRATE)

#remember to remove original response heatrate
gasmodsqrt2<-lm(sqrty~RPM+INLET.TEMP+EXH.TEMP,data=gasturbine)
summary(gasmodsqrt2)

Call:
lm(formula = sqrty ~ RPM + INLET.TEMP + EXH.TEMP, data = gasturbine)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.4872 -1.3479 -0.5044  0.9321  6.2759 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.218e+02  3.346e+00  36.392  < 2e-16 ***
RPM          4.640e-04  4.886e-05   9.496 9.05e-14 ***
INLET.TEMP  -4.367e-02  3.590e-03 -12.163  < 2e-16 ***
EXH.TEMP     5.706e-02  9.449e-03   6.039 9.12e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.122 on 63 degrees of freedom
Multiple R-squared:  0.9206,    Adjusted R-squared:  0.9168 
F-statistic: 243.4 on 3 and 63 DF,  p-value: < 2.2e-16

Removing observations from analysis by subsetting the data

#We can remove particular observations
subsetgas<-gasturbine[-c(11,36,61,64),]
gasmod5rem<-lm(HEATRATE~RPM+INLET.TEMP+EXH.TEMP,data=subsetgas)
summary(gasmod5rem)

Call:
lm(formula = HEATRATE ~ RPM + INLET.TEMP + EXH.TEMP, data = subsetgas)

Residuals:
     Min       1Q   Median       3Q      Max 
-1005.32  -242.35   -89.03   208.32  1273.08 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.404e+04  6.640e+02  21.146  < 2e-16 ***
RPM          1.068e-01  1.058e-02  10.098 1.78e-14 ***
INLET.TEMP  -8.334e+00  8.352e-01  -9.979 2.79e-14 ***
EXH.TEMP     1.095e+01  2.190e+00   5.000 5.44e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 406.4 on 59 degrees of freedom
Multiple R-squared:  0.9157,    Adjusted R-squared:  0.9114 
F-statistic: 213.7 on 3 and 59 DF,  p-value: < 2.2e-16