2022-05-03
Task: Create models to predict the electrical energy output of a Combined Cycle Power Plant
Predictive Models:
– (1) Exhaustive Search Linear Regression Model
– (2) Stepwise Linear Regression Model
Predictor Variables:
– Ambient Temperature (AT), range 1.81 to 37.11 °C
– Ambient Pressure (AP), range 992.89 to 1033.30 millibar
– Relative Humidity (RH), range 25.56% to 100.16%
– Exhaust Vacuum (V), range 25.36 to 81.56 cm Hg
Outcome Variable:
– Net hourly electrical energy output (PE), range 420.26 to 495.76 MW
Model performance metrics: R², RMSE, and MAE
R was used to create the models
## MODEL INFO:
## Observations: 7654
## Dependent Variable: PE
## Type: OLS linear regression
##
## MODEL FIT:
## F(4,7649) = 24925.15, p = 0.00
## R² = 0.93
## Adj. R² = 0.93
##
## Standard errors: OLS
## ---------------------------------------------------
##                       Est.    S.E.    t val.      p
## ----------------- -------- ------- --------- ------
## (Intercept)         460.08   10.91     42.15   0.00
## AT                   -1.99    0.02   -115.87   0.00
## V                    -0.23    0.01    -28.13   0.00
## AP                    0.06    0.01      5.35   0.00
## RH                   -0.16    0.00    -33.94   0.00
## ---------------------------------------------------
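To show how the coefficient table turns into a prediction, here is a minimal sketch that plugs one set of operating conditions into the fitted equation. The input values are hypothetical (chosen from within the ranges listed at the top, not taken from the data), and because the coefficients are rounded to two decimals the result is only approximate.

```r
# Rounded coefficients copied from the summary table.
coefs <- c("(Intercept)" = 460.08, AT = -1.99, V = -0.23, AP = 0.06, RH = -0.16)

# Hypothetical operating conditions (illustrative values, not a row from the data).
conditions <- c(AT = 20, V = 50, AP = 1010, RH = 70)

# Linear predictor: intercept plus coefficient * value summed over the predictors.
pe_hat <- unname(coefs["(Intercept)"] + sum(coefs[names(conditions)] * conditions))

print(pe_hat)  # predicted net hourly output in MW
```

With these inputs the equation gives roughly 458 MW. The AP coefficient is small but multiplies a value near 1000, so its rounding alone can shift the prediction by several MW; real predictions should come from predict() on the fitted model.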
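library(caret)  # provides R2(), RMSE(), and MAE()
# Predict with the exhaustive-search model on the training data
predictions <- predict(best.model, newdata = power_training)
# Compare the predictions against the observed PE values
data.frame(
  R2 = R2(predictions, power_training$PE),
  RMSE = RMSE(predictions, power_training$PE),
  MAE = MAE(predictions, power_training$PE)
)
##          R2     RMSE      MAE
## 1 0.9287468 4.555574 3.615614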
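library(MASS)    # provides stepAIC()
library(jtools)  # provides summ()
# Lower bound of the search: intercept-only model
power_intercept_only <- lm(PE ~ 1, data = power_training)
# Upper bound of the search: all predictors
power_all <- lm(PE ~ ., data = power_training)
# Stepwise selection by AIC, adding or dropping one term at each step
power_step <- stepAIC(power_intercept_only,
                      direction = "both",
                      scope = list(lower = power_intercept_only, upper = power_all),
                      trace = 0)
summ(power_step)
## MODEL INFO: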
## Observations: 7654
## Dependent Variable: PE
## Type: OLS linear regression
##
## MODEL FIT:
## F(4,7649) = 24925.15, p = 0.00
## R² = 0.93
## Adj. R² = 0.93
##
## Standard errors: OLS
## ---------------------------------------------------
##                       Est.    S.E.    t val.      p
## ----------------- -------- ------- --------- ------
## (Intercept)         460.08   10.91     42.15   0.00
## AT                   -1.99    0.02   -115.87   0.00
## RH                   -0.16    0.00    -33.94   0.00
## V                    -0.23    0.01    -28.13   0.00
## AP                    0.06    0.01      5.35   0.00
## ---------------------------------------------------
library(caret)  # provides R2(), RMSE(), and MAE()
# Evaluate the stepwise model on the held-out validation data
predictions <- predict(power_step, newdata = power_valid)
data.frame(
  R2 = R2(predictions, power_valid$PE),
  RMSE = RMSE(predictions, power_valid$PE),
  MAE = MAE(predictions, power_valid$PE)
)
##          R2     RMSE     MAE
## 1 0.9284334 4.564303 3.65499
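For readers unfamiliar with the three metrics, the sketch below computes them by hand in base R on a tiny made-up example. The observed and predicted values are hypothetical, and the R² shown is the squared correlation between predictions and observations, which, as far as I know, is what caret's R2() computes by default.

```r
# Hypothetical observed and predicted values, for illustration only.
observed  <- c(450, 460, 470, 480)
predicted <- c(452, 458, 473, 479)

errors <- predicted - observed

r2   <- cor(predicted, observed)^2  # squared Pearson correlation
rmse <- sqrt(mean(errors^2))        # root mean squared error, in the units of PE
mae  <- mean(abs(errors))           # mean absolute error, in the units of PE

data.frame(R2 = r2, RMSE = rmse, MAE = mae)
```

RMSE penalizes large misses more heavily than MAE, so RMSE is never smaller than MAE on the same predictions; both are expressed in MW here.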
Stepwise regression, run here with stepAIC() from the MASS package, uses the Akaike information criterion (AIC) to select variables automatically. AIC is difficult to explain to non-data-scientist members of the product team, and the automated procedure leaves no room for subject matter experts (SMEs) to contribute to the model-building process.
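To make the criterion less opaque, here is a minimal sketch that computes AIC by hand and checks it against R's built-in AIC(). It uses the built-in mtcars data purely for illustration, not the power plant data.

```r
# Built-in dataset used purely for illustration (not the power plant data).
fit <- lm(mpg ~ wt + hp, data = mtcars)

# k counts every estimated parameter: the regression coefficients plus the error variance.
k <- length(coef(fit)) + 1

# AIC = 2k - 2 * log-likelihood; lower values indicate a better fit/complexity trade-off.
aic_by_hand <- 2 * k - 2 * as.numeric(logLik(fit))

c(by_hand = aic_by_hand, built_in = AIC(fit))
```

The 2k term is the penalty for each added predictor, which is what lets a stepwise procedure stop adding variables. Note that stepAIC() relies on extractAIC(), which reports a value that differs from AIC() by an additive constant for linear models, so the ranking of models on the same data is unaffected.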
Exhaustive search regression fits every possible subset of the predictors and selects the best model according to goodness-of-fit criteria that are easy to interpret. Because p candidate predictors produce 2^p subsets, it is best suited to model selection when the number of variables is small (fewer than about 20). It also allows SMEs to contribute to the model-building process.
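The code that produced best.model is not shown here, so the following is only a sketch of the idea in base R, not the method actually used. The helper exhaustive_search() is hypothetical, it ranks by adjusted R² as one example of an interpretable criterion, and it runs on the built-in mtcars data. In practice a dedicated package such as leaps (regsubsets()) would do this more efficiently.

```r
# Hypothetical helper: fit every non-empty subset of predictors and rank by adjusted R².
exhaustive_search <- function(data, response) {
  predictors <- setdiff(names(data), response)
  results <- list()
  for (size in seq_along(predictors)) {
    # combn() with simplify = FALSE returns a list of character vectors
    for (vars in combn(predictors, size, simplify = FALSE)) {
      fit <- lm(reformulate(vars, response = response), data = data)
      results[[length(results) + 1]] <- data.frame(
        predictors   = paste(vars, collapse = " + "),
        n_predictors = size,
        adj_r2       = summary(fit)$adj.r.squared
      )
    }
  }
  ranked <- do.call(rbind, results)
  ranked[order(-ranked$adj_r2), ]  # best adjusted R² first
}

# Three candidate predictors, so 2^3 - 1 = 7 models are fitted.
candidates <- mtcars[, c("mpg", "wt", "hp", "qsec")]
ranking <- exhaustive_search(candidates, "mpg")
head(ranking, 3)
```

Because the full ranking is returned rather than a single winner, an SME can review the top few candidates and prefer a slightly weaker model that makes more physical sense.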
R², RMSE, and MAE are similar for both approaches; in fact, both selected the same four predictors (AT, V, AP, RH) with identical coefficient estimates. I chose exhaustive search regression because of its explainability and parsimonious nature.