Introduction

This module explores ten regression models commonly used to predict continuous outcome variables. Regression analysis seeks to model the relationship between a dependent variable and one or more independent variables. We will discuss the formulation, assumptions, applications, strengths, and weaknesses of each model. This module also covers model comparison criteria, adequacy measures, and interpretation of results.

Regression Models

1. Multiple Linear Regression (MLR)

  • Model Formulation: MLR assumes a linear and additive relationship between the dependent variable (y) and a set of independent variables (x1, x2, ..., xn). It is mathematically represented as: y = β0 + β1x1 + β2x2 + ... + βnxn + ε, where:

    • y: The dependent variable (the one you’re trying to predict)
    • x1, x2, ..., xn: The independent variables (the factors that you believe affect y)
    • β0: The intercept (the value of y when all x variables are zero)
    • β1, β2, ..., βn: The regression coefficients (representing the change in y for a one-unit change in the corresponding x variable)
    • ε: The error term (capturing the unexplained variability)
  • Model Assumptions:

    • Linearity: The relationship between y and each x is linear.
    • Independence: Observations are independent of each other.
    • Homoscedasticity: The variance of the error term is constant across all levels of independent variables.
    • Normality: The error term is normally distributed.
    • No Multicollinearity: Independent variables are not highly correlated with each other.
    • No Influential Outliers: Outliers do not unduly affect the model.
  • Model Applications: MLR is widely used in various fields such as social sciences, economics, engineering, and marketing. Examples include: predicting house prices based on size and features, modeling the impact of advertising on sales, or understanding the relationship between income and education.

  • Strengths: Simple to fit and interpret, widely applicable, provides directly interpretable coefficient estimates, and extends naturally to additional predictors (a brief lm() example follows this list).

  • Weaknesses: Relies on strict linearity and additivity assumptions, is sensitive to outliers, and does not handle high-dimensional data well.
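
To make the formulation concrete, here is a minimal lm() sketch. The data frame dat and the names y, x1, x2, x3 are toy examples invented for illustration, not the paper's data:

# Toy data, for illustration only
dat <- data.frame(y = rnorm(30), x1 = rnorm(30), x2 = rnorm(30), x3 = rnorm(30))

fit_mlr <- lm(y ~ x1 + x2 + x3, data = dat)  # y = b0 + b1*x1 + b2*x2 + b3*x3 + error
summary(fit_mlr)   # coefficients, standard errors, t-tests, R-squared
confint(fit_mlr)   # 95% confidence intervals for the coefficients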

2. Polynomial Regression (PR)

  • Model Formulation: Extends MLR by allowing non-linear relationships between the dependent and independent variables through polynomial terms (e.g., x², x³). A second-order polynomial regression in a single predictor can be represented as: y = β0 + β1x + β2x² + ε (a short R sketch follows this list).

  • Model Assumptions: Similar to MLR, except that strict linearity in the predictors is relaxed (the model remains linear in the coefficients). It still requires independence, homoscedasticity, and normality of errors, as well as no influential outliers. The added polynomial terms may introduce multicollinearity.

  • Model Applications: Used when the relationship between variables is curvilinear, such as modeling growth curves, dose-response relationships, or the effect of temperature on product sales.

  • Strengths: Can model non-linear relationships with higher order polynomials, can be considered a simple extension of the linear model.

  • Weaknesses: More susceptible to overfitting if polynomial degrees are too high, may also lead to multicollinearity.
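
As a sketch of the polynomial formulation, the code below again uses a toy data frame dat (illustrative names only). poly() builds orthogonal polynomial terms, which helps limit the multicollinearity mentioned above, while I() gives the raw-coefficient form:

# Toy data, for illustration only
dat <- data.frame(y = rnorm(30), x1 = rnorm(30), x2 = rnorm(30), x3 = rnorm(30))

fit_poly <- lm(y ~ poly(x1, 2) + poly(x2, 2) + poly(x3, 2), data = dat)  # quadratic in each predictor
fit_raw  <- lm(y ~ x1 + I(x1^2), data = dat)                             # raw-coefficient form, one predictor
summary(fit_poly)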

3. Ridge Regression

  • Model Formulation: A regularization method used when multicollinearity is present. It adds an L2 penalty to the least-squares loss, which shrinks the regression coefficients toward zero (but never exactly to zero). This can be expressed as: minimize (sum of squared errors + lambda * sum of squared coefficients), where lambda is a hyperparameter controlling the strength of the shrinkage (see the cv.glmnet sketch after this list).

  • Model Assumptions: Relaxes the no-multicollinearity assumption (predictors may be correlated), so it can perform better when multicollinearity is present; it still benefits from the remaining MLR assumptions of linearity, independence, homoscedasticity and normality of errors, and no influential outliers.

  • Model Applications: High-dimensional datasets, where many independent variables are highly correlated, such as genetics data, image data, and signal processing data.

  • Strengths: Prevents overfitting, stabilizes coefficient estimates through shrinkage, and is therefore useful when multicollinearity is present.

  • Weaknesses: Does not eliminate predictors (coefficients are shrunk but never set exactly to zero), so it performs no variable selection, and the penalty lambda must be tuned.
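
A minimal ridge sketch using glmnet (the same package used in the code section below). Setting alpha = 0 selects the L2 penalty, and cv.glmnet() chooses lambda by cross-validation; the toy data frame dat is illustrative only:

library(glmnet)

# Toy data, for illustration only
dat <- data.frame(y = rnorm(30), x1 = rnorm(30), x2 = rnorm(30), x3 = rnorm(30))
X <- as.matrix(dat[, c("x1", "x2", "x3")])

cv_ridge <- cv.glmnet(X, dat$y, alpha = 0)   # alpha = 0 -> ridge (L2) penalty
cv_ridge$lambda.min                          # lambda chosen by cross-validation
coef(cv_ridge, s = "lambda.min")             # shrunken, but non-zero, coefficients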

4. LASSO Regression

  • Model Formulation: Similar to Ridge Regression, LASSO also uses a penalty term, but it uses the L1 norm rather than the L2 norm, which can shrink some coefficients to exactly zero and thereby performs variable selection: minimize (sum of squared errors + lambda * sum of absolute values of coefficients), where lambda is a hyperparameter controlling the strength of the penalty (see the glmnet sketch after this list).

  • Model Assumptions: As with ridge regression, the no-multicollinearity assumption is relaxed, so LASSO can perform better when predictors are correlated; it still benefits from the remaining MLR assumptions of linearity, independence, homoscedasticity and normality of errors, and no influential outliers.

  • Model Applications: Used for feature selection, identifying the most influential predictors, as well as high-dimensional data settings, such as genome-wide association studies.

  • Strengths: Performs variable selection by shrinking some coefficients to zero, more interpretable than ridge, reduces overfitting, handles multicollinearity well and can identify most important features.

  • Weaknesses: Sensitive to outliers; when predictors are highly correlated, LASSO tends to arbitrarily keep one variable from each correlated group and drop the others.
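
The corresponding LASSO sketch only changes alpha; with the L1 penalty some coefficients can be shrunk exactly to zero. Again, dat is a toy data frame used purely for illustration:

library(glmnet)

# Toy data, for illustration only
dat <- data.frame(y = rnorm(30), x1 = rnorm(30), x2 = rnorm(30), x3 = rnorm(30))
X <- as.matrix(dat[, c("x1", "x2", "x3")])

cv_lasso <- cv.glmnet(X, dat$y, alpha = 1)   # alpha = 1 -> LASSO (L1) penalty
coef(cv_lasso, s = "lambda.min")             # zero entries correspond to dropped predictors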

5. Robust Regression

  • Model Formulation: Designed to be less sensitive to outliers by reducing their effect on coefficient estimation. Several techniques exist, such as M-estimation, in which outlying observations are down-weighted: minimize (sum of ρ(y - ŷ)), where ρ is a loss function (e.g., the Huber loss) that penalizes large residuals less severely than the squared error (an rlm() sketch follows this list).

  • Model Assumptions: Relaxes the normality and no-influential-outliers assumptions of MLR and is less affected by heteroscedastic errors, but it still benefits from the linearity and independence assumptions.

  • Model Applications: Datasets with outliers or non-normal errors, such as financial data or data from a flawed experiment.

  • Strengths: Robust to outliers, less influenced by non-normal errors, and provides reliable regression estimates even in presence of outliers.

  • Weaknesses: Helps when only a limited number of observations are outlying, but cannot rescue a model when the bulk of the data departs from its assumptions; it is also less efficient than ordinary least squares when the errors are actually well-behaved.
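
A minimal robust regression sketch with MASS::rlm(), which the code section below also uses. By default rlm() performs M-estimation with the Huber loss and down-weights observations with large residuals; dat is again a toy data frame:

library(MASS)

# Toy data, for illustration only
dat <- data.frame(y = rnorm(30), x1 = rnorm(30), x2 = rnorm(30), x3 = rnorm(30))

fit_rob <- rlm(y ~ x1 + x2 + x3, data = dat)  # M-estimation, Huber loss by default
summary(fit_rob)
head(fit_rob$w)                               # final weights; outlying observations receive weights < 1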

6. Quantile Regression

  • Model Formulation: Estimates conditional quantiles of the dependent variable rather than the mean. It allows different parts of the distribution to be modeled separately, so predictors can have different effects at different levels of the dependent variable. For a chosen quantile tau (a value between 0 and 1), the fit minimizes an asymmetrically weighted sum of absolute deviations: minimize (tau * sum of |y - ŷ| when y ≥ ŷ + (1 - tau) * sum of |y - ŷ| when y < ŷ). Setting tau = 0.5 gives median regression (an rq() sketch follows this list).

  • Model Assumptions: Linearity between variables is still maintained, does not require normality or homoscedasticity of errors.

  • Model Applications: Used when the effect of predictors is not consistent across the distribution, such as modeling the effect of education on income across different income levels, where there is heteroscedasticity, and where the median is more appropriate than the mean.

  • Strengths: Can model relationships at different parts of the distribution, can handle non-normal distributions and heteroscedasticity well.

  • Weaknesses: Less intuitive interpretation than linear regression, computationally intensive, and requires a larger sample size.
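
A minimal quantreg::rq() sketch. The tau argument selects the quantile to model (the code section below relies on the default tau = 0.5, i.e. median regression), and several quantiles can be fitted at once; dat is a toy data frame:

library(quantreg)

# Toy data, for illustration only
dat <- data.frame(y = rnorm(30), x1 = rnorm(30), x2 = rnorm(30), x3 = rnorm(30))

fit_med <- rq(y ~ x1 + x2 + x3, tau = 0.5, data = dat)                 # median regression
fit_qs  <- rq(y ~ x1 + x2 + x3, tau = c(0.25, 0.5, 0.75), data = dat)  # several quantiles at once
coef(fit_qs)                                                           # one coefficient column per tau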

7. Elastic Net Regression

  • Model Formulation: Combines the penalty terms of ridge and LASSO regression, so it can perform well when predictors are numerous, correlated, and only some are truly important. It adds both L1 and L2 penalties to the loss function for variable selection and shrinkage: minimize (sum of squared errors + alpha * lambda * sum of absolute values of coefficients + (1 - alpha) * lambda * sum of squared coefficients), where alpha is a hyperparameter between 0 and 1 (alpha = 1 gives LASSO, alpha = 0 gives ridge) and lambda controls the overall penalty strength (see the sketch after this list).

  • Model Assumptions: Like ridge and LASSO, elastic net relaxes the no-multicollinearity assumption, but it benefits from meeting the assumptions of linearity, homoscedasticity, and normality of errors, as well as no influential outliers.

  • Model Applications: Handles high-dimensional data with multicollinearity and identifies the most important predictors; used in a wide variety of predictive modeling tasks.

  • Strengths: Uses both penalties to improve prediction, handles multicollinearity, and performs variable selection.

  • Weaknesses: Sensitive to outliers, and tuning is more complex because both alpha and lambda must be chosen.
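
A minimal elastic net sketch with glmnet. Here alpha = 0.5 mixes the L1 and L2 penalties equally; in practice alpha itself is often tuned over a grid (for example with caret) while cv.glmnet() tunes lambda. The toy data frame dat is illustrative only:

library(glmnet)

# Toy data, for illustration only
dat <- data.frame(y = rnorm(30), x1 = rnorm(30), x2 = rnorm(30), x3 = rnorm(30))
X <- as.matrix(dat[, c("x1", "x2", "x3")])

cv_enet <- cv.glmnet(X, dat$y, alpha = 0.5)  # equal mix of the L1 and L2 penalties
coef(cv_enet, s = "lambda.min")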

8. Support Vector Regression (SVR)

  • Model Formulation: Uses kernel functions to map the predictors into a higher-dimensional feature space and fits a hyperplane there, which corresponds to a non-linear relationship in the original variables: minimize (sum of the loss function + lambda * regularization term), where the regularization term penalizes model complexity and prevents overfitting (a brief e1071 sketch follows this list).

  • Model Assumptions: Does not require a linear relationship between the predictors and the outcome, nor does it assume normality of errors, independence, or homoscedasticity.

  • Model Applications: Suitable for capturing complex non-linear relationships between input and output variables, widely used in pattern recognition and machine learning tasks.

  • Strengths: Effective for non-linear relationships, can handle high dimensional data, robust to outliers and noise.

  • Weaknesses: Less interpretable, computationally intensive, and may require significant parameter tuning; it is often described as a ‘black box’ method.
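
A minimal e1071::svm() sketch for regression; with a numeric response, svm() defaults to eps-regression. The cost, epsilon, and kernel parameters usually need tuning, for example with tune.svm(). The toy data frame dat and the parameter grid are illustrative only:

library(e1071)

# Toy data, for illustration only
dat <- data.frame(y = rnorm(30), x1 = rnorm(30), x2 = rnorm(30), x3 = rnorm(30))

fit_svr  <- svm(y ~ x1 + x2 + x3, data = dat, kernel = "radial", cost = 1, epsilon = 0.1)
pred_svr <- predict(fit_svr, newdata = dat)

# Simple grid search over the main hyperparameters
tuned <- tune.svm(y ~ x1 + x2 + x3, data = dat, cost = 2^(0:3), epsilon = c(0.05, 0.1, 0.2))
tuned$best.parameters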

9. Principal Component Regression (PCR)

  • Model Formulation: Uses principal component analysis (PCA) to reduce dimensionality by transforming the original predictors into principal components, then performs linear regression on these components: y = b0 + b1(PC1) + b2(PC2) + ... + bk(PCk), where PC1, ..., PCk are the selected principal components (see the pls::pcr sketch after this list).

  • Model Assumptions: PCA requires linearity and assumes the data are continuous and approximately normally distributed, although transformations can help. The same assumptions carry over when the dependent variable is regressed on the retained components.

  • Model Applications: Used when there are many highly correlated independent variables, such as in microarray data analysis, spectral analysis, and sensor data.

  • Strengths: Reduces dimensionality, handles multicollinearity, can be interpretable when loadings of the components are examined.

  • Weaknesses: Loses information by combining predictors, and because the components are chosen to explain variance in the predictors rather than to predict y, the retained components may omit information important to the dependent variable.
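
A minimal PCR sketch using pcr() from the pls package (already loaded in the code section below). It performs the PCA step internally, supports cross-validation for choosing the number of components, and projects new data onto the training components automatically at prediction time; dat is a toy data frame:

library(pls)

# Toy data, for illustration only
dat <- data.frame(y = rnorm(30), x1 = rnorm(30), x2 = rnorm(30), x3 = rnorm(30))

fit_pcr <- pcr(y ~ x1 + x2 + x3, data = dat, scale = TRUE, validation = "CV")
summary(fit_pcr)                                        # variance explained and CV error per component
pred_pcr <- predict(fit_pcr, newdata = dat, ncomp = 2)  # new data are projected onto the training components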

10. Partial Least Squares Regression (PLSR)

  • Model Formulation: Similar to PCR, but the latent variables are constructed to maximize their covariance with the dependent variable, which makes them more directly relevant to prediction. Linear regression is then performed on these latent variables: y = b0 + b1(LV1) + b2(LV2) + ... + bk(LVk), where LV1, ..., LVk are the selected latent variables (a plsr() sketch follows this list).

  • Model Assumptions: PLS imposes few distributional assumptions; it benefits from approximate linearity and continuous measurements, though it is often applied even when these conditions are not fully met.

  • Model Applications: High-dimensional and multicollinear data, particularly useful in chemometrics and sensory analysis.

  • Strengths: Handles high-dimensional, multicollinear data, reduces dimension in a supervised manner, better prediction performance than PCR.

  • Weaknesses: Computationally intensive and less interpretable than MLR, and it often requires cross-validation to choose the number of latent variables.
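
A minimal plsr() sketch from the pls package, with cross-validation used to pick the number of latent variables (selectNcomp() is one convenient helper for this); dat is a toy data frame used for illustration only:

library(pls)

# Toy data, for illustration only
dat <- data.frame(y = rnorm(30), x1 = rnorm(30), x2 = rnorm(30), x3 = rnorm(30))

fit_pls  <- plsr(y ~ x1 + x2 + x3, data = dat, scale = TRUE, validation = "CV")
ncomp_cv <- selectNcomp(fit_pls, method = "onesigma")                  # CV-based choice of latent variables
pred_pls <- predict(fit_pls, newdata = dat, ncomp = max(ncomp_cv, 1))  # guard against ncomp_cv = 0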

Model Comparison Criteria

When comparing regression models, we must use performance metrics that reflect our goal. In this module the fitted models are compared on held-out test data using MAPE, RMSPE, RMSLE, RRSE, RMSE, MAE, and MSPE, where lower values indicate better predictive performance.

Model Adequacy Measures

We need to evaluate the performance of our selected models. Here are important measures:

RMSPE = sqrt((1/n) * sum(((y_i - ŷ_i) / y_i)^2)) * 100

The other measures used below (MAPE, RMSLE, and RRSE) are defined analogously as R functions in the code section, while RMSE, MAE, and MSPE are computed directly in the code.

Rules for Training and Testing: Split the data into a training set and a held-out test set (the code below uses a 70/30 split via createDataPartition), fit each model on the training set only, and evaluate the adequacy measures on the test set.

Prediction Accuracy:

This is assessed using the model adequacy measures described above, in order to check how well the model performs on unseen data. A model with good prediction accuracy will have low errors on your chosen metrics (and low AIC/BIC values where those apply), and it should also satisfy the model assumptions and show no influential outliers.

R Code Implementation

Here is the R code implementation for the paper’s project, combined with the general setup for this module:

# Load necessary libraries
library(tidyverse)
library(caret)
library(glmnet)
library(e1071)
library(pls)
library(reshape2)
library(MASS) #For robust regression
library(quantreg) #For quantile regression
library(lmridge) #For ridge regression (ridge is fit with glmnet below)

# Load dataset from the paper
y1<-c(1.13,1.11,0.629,0.616,0.265,0.320,
      0.849,0.378,0.056,0.044,0.241,0.221,
      0.110,0.190,0.151,0.049)
y2<-c(10.511,5.407,6.365,5.322,6.359,
      3.411,4.03,3.271,3.017,2.567,
      2.198,1.714,2.237,1.878,1.594,
      1.465)
x1<-c(6.283,6.283,6.283,6.283,12.566,
      12.566,12.566,12.566,18.85,18.85,
      18.85,18.85,25.133,25.133,25.133,
      25.133)
x2<-c(0.01,0.015,0.025,0.02,0.01,0.02,
      0.015,0.025,0.01,0.015,0.02,0.25,
      0.01,0.015,0.025,0.02)
x3<-c(0.2,0.5,1.5,1.0,0.5,1.5,0.2,
      1.0,1.0,1.5,0.2,0.5,1.5,1.0,
      0.2,0.5)

data1<-data.frame(y1,x1,x2,x3)
data2<-data.frame(y2,x1,x2,x3)

# Set seed for reproducibility
set.seed(123)

# Function to calculate performance metrics:
mape <- function(actual, predicted) {
  mean(abs((actual - predicted) / actual)) * 100
}

rmspe <- function(actual, predicted) {
  sqrt(mean(((actual - predicted) / actual)^2)) * 100
}

rmsle <- function(actual, predicted) {
  sqrt(mean((log(actual + 1) - log(predicted + 1))^2))
}

rrse <- function(actual, predicted) {
  sqrt(sum((actual - predicted)^2) / sum((actual - mean(actual))^2)) * 100
}


#---Data1 Analysis----------
# Split the data for dataset 1
train_index1 <- createDataPartition(data1$y1, p = 0.70, list = FALSE)
train1 <- data1[train_index1, ]
test1 <- data1[-train_index1, ]

# Fit each regression model to data1

mlr1 <- lm(y1 ~ ., data = train1)
poly1 <- lm(y1 ~ poly(x1+x2+x3, 2), data = train1)
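# Note: poly(x1 + x2 + x3, 2) fits a quadratic in the single combined term (x1 + x2 + x3);
# a per-variable polynomial would instead be poly(x1, 2) + poly(x2, 2) + poly(x3, 2).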
ridge1 <- glmnet(as.matrix(train1[, -1]), train1$y1, alpha = 0, lambda = 0.1)
lasso1 <- glmnet(as.matrix(train1[, -1]), train1$y1, alpha = 1, lambda = 0.1)
robust1 <- rlm(y1 ~ ., data = train1)
quantile1 <- rq(y1 ~ ., data = train1)
enet1 <- glmnet(as.matrix(train1[, -1]), train1$y1, alpha = 0.5, lambda = 0.1)
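# Note: the glmnet fits above use a fixed lambda = 0.1 for simplicity;
# in practice lambda is usually tuned with cv.glmnet() (see the sketches earlier in the module).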
svm1 <- svm(y1 ~ ., data = train1)
pca1 <- prcomp(train1[, -1])
pcr1 <- lm(train1$y1 ~ pca1$x[, 1:3])
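# Caution: this formula references pca1$x and train1$y1 directly, so predict(pcr1, newdata = test1)
# below cannot project the test rows onto the training components; the pls::pcr() approach sketched
# in the PCR section handles prediction on new data correctly.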
pls1 <- plsr(y1 ~ ., data = train1, ncomp = 1)

# Make predictions for dataset 1
pred_mlr1 <- predict(mlr1, newdata = test1)
pred_poly1 <- predict(poly1, newdata = test1)
pred_ridge1 <- predict(ridge1, newx = as.matrix(test1[, -1]))
pred_lasso1 <- predict(lasso1, newx = as.matrix(test1[, -1]))
pred_robust1 <- predict(robust1, newdata = test1)
pred_quantile1 <- predict(quantile1, newdata = test1)
pred_enet1 <- predict(enet1, newx = as.matrix(test1[, -1]))
pred_svm1 <- predict(svm1, newdata = test1)
pred_pcr1 <- predict(pcr1, newdata = test1)
pred_pls1 <- predict(pls1, newdata = test1)

# Calculate performance metrics for dataset 1
rmse1 <- c(RMSE(pred_mlr1, test1$y1), RMSE(pred_poly1, test1$y1), RMSE(pred_ridge1, test1$y1),
          RMSE(pred_lasso1, test1$y1), RMSE(pred_robust1, test1$y1), RMSE(pred_quantile1, test1$y1),
          RMSE(pred_enet1, test1$y1), RMSE(pred_svm1, test1$y1), RMSE(pred_pcr1, test1$y1),
          RMSE(pred_pls1, test1$y1))

mae1 <- c(MAE(pred_mlr1, test1$y1), MAE(pred_poly1, test1$y1), MAE(pred_ridge1, test1$y1),
         MAE(pred_lasso1, test1$y1), MAE(pred_robust1, test1$y1), MAE(pred_quantile1, test1$y1),
         MAE(pred_enet1, test1$y1), MAE(pred_svm1, test1$y1), MAE(pred_pcr1, test1$y1),
         MAE(pred_pls1, test1$y1))

mspe1 <- c(mean((pred_mlr1 - test1$y1)^2), mean((pred_poly1 - test1$y1)^2),
          mean((pred_ridge1 - test1$y1)^2), mean((pred_lasso1 - test1$y1)^2),
          mean((pred_robust1 - test1$y1)^2), mean((pred_quantile1 - test1$y1)^2),
          mean((pred_enet1 - test1$y1)^2), mean((pred_svm1 - test1$y1)^2),
          mean((pred_pcr1 - test1$y1)^2), mean((pred_pls1 - test1$y1)^2))

mape_values1 <- sapply(list(pred_mlr1, pred_poly1, pred_ridge1, pred_lasso1, pred_robust1,
                           pred_quantile1, pred_enet1, pred_svm1, pred_pcr1, pred_pls1),
                      mape, actual = test1$y1)

rmspe_values1 <- sapply(list(pred_mlr1, pred_poly1, pred_ridge1, pred_lasso1, pred_robust1,
                            pred_quantile1, pred_enet1, pred_svm1, pred_pcr1, pred_pls1),
                       rmspe, actual = test1$y1)

rmsle_values1 <- sapply(list(pred_mlr1, pred_poly1, pred_ridge1, pred_lasso1, pred_robust1,
                            pred_quantile1, pred_enet1, pred_svm1, pred_pcr1, pred_pls1),
                       rmsle, actual = test1$y1)

rrse_values1 <- sapply(list(pred_mlr1, pred_poly1, pred_ridge1, pred_lasso1, pred_robust1,
                           pred_quantile1, pred_enet1, pred_svm1, pred_pcr1, pred_pls1),
                      rrse, actual = test1$y1)
#---Data2 Analysis--------------------------------------------------
# Split data for dataset 2
train_index2 <- createDataPartition(data2$y2, p = 0.70, list = FALSE)
train2 <- data2[train_index2, ]
test2 <- data2[-train_index2, ]

# Fit each regression model to data2
mlr2 <- lm(y2 ~ ., data = train2)
poly2 <- lm(y2 ~ poly(x1+x2+x3, 2), data = train2)
ridge2 <- glmnet(as.matrix(train2[, -1]), train2$y2, alpha = 0, lambda = 0.1)
lasso2 <- glmnet(as.matrix(train2[, -1]), train2$y2, alpha = 1, lambda = 0.1)
robust2 <- rlm(y2 ~ ., data = train2)
quantile2 <- rq(y2 ~ ., data = train2)
enet2 <- glmnet(as.matrix(train2[, -1]), train2$y2, alpha = 0.5, lambda = 0.1)
svm2 <- svm(y2 ~ ., data = train2)
pca2 <- prcomp(train2[, -1])
pcr2 <- lm(train2$y2 ~ pca2$x[, 1:3])
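# (Same PCR caveat as for dataset 1: predict() below will not project the test data onto these components.)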
pls2 <- plsr(y2 ~ ., data = train2, ncomp = 1)

# Make predictions for dataset 2
pred_mlr2 <- predict(mlr2, newdata = test2)
pred_poly2 <- predict(poly2, newdata = test2)
pred_ridge2 <- predict(ridge2, newx = as.matrix(test2[, -1]))
pred_lasso2 <- predict(lasso2, newx = as.matrix(test2[, -1]))
pred_robust2 <- predict(robust2, newdata = test2)
pred_quantile2 <- predict(quantile2, newdata = test2)
pred_enet2 <- predict(enet2, newx = as.matrix(test2[, -1]))
pred_svm2 <- predict(svm2, newdata = test2)
pred_pcr2 <- predict(pcr2, newdata = test2)
pred_pls2 <- predict(pls2, newdata = test2)

# Calculate performance metrics for dataset 2
rmse2 <- c(RMSE(pred_mlr2, test2$y2), RMSE(pred_poly2, test2$y2), RMSE(pred_ridge2, test2$y2),
          RMSE(pred_lasso2, test2$y2), RMSE(pred_robust2, test2$y2), RMSE(pred_quantile2, test2$y2),
          RMSE(pred_enet2, test2$y2), RMSE(pred_svm2, test2$y2), RMSE(pred_pcr2, test2$y2),
          RMSE(pred_pls2, test2$y2))

mae2 <- c(MAE(pred_mlr2, test2$y2), MAE(pred_poly2, test2$y2), MAE(pred_ridge2, test2$y2),
         MAE(pred_lasso2, test2$y2), MAE(pred_robust2, test2$y2), MAE(pred_quantile2, test2$y2),
         MAE(pred_enet2, test2$y2), MAE(pred_svm2, test2$y2), MAE(pred_pcr2, test2$y2),
         MAE(pred_pls2, test2$y2))

mspe2 <- c(mean((pred_mlr2 - test2$y2)^2), mean((pred_poly2 - test2$y2)^2),
          mean((pred_ridge2 - test2$y2)^2), mean((pred_lasso2 - test2$y2)^2),
          mean((pred_robust2 - test2$y2)^2), mean((pred_quantile2 - test2$y2)^2),
          mean((pred_enet2 - test2$y2)^2), mean((pred_svm2 - test2$y2)^2),
          mean((pred_pcr2 - test2$y2)^2), mean((pred_pls2 - test2$y2)^2))

mape_values2 <- sapply(list(pred_mlr2, pred_poly2, pred_ridge2, pred_lasso2, pred_robust2,
                           pred_quantile2, pred_enet2, pred_svm2, pred_pcr2, pred_pls2),
                      mape, actual = test2$y2)

rmspe_values2 <- sapply(list(pred_mlr2, pred_poly2, pred_ridge2, pred_lasso2, pred_robust2,
                            pred_quantile2, pred_enet2, pred_svm2, pred_pcr2, pred_pls2),
                       rmspe, actual = test2$y2)

rmsle_values2 <- sapply(list(pred_mlr2, pred_poly2, pred_ridge2, pred_lasso2, pred_robust2,
                            pred_quantile2, pred_enet2, pred_svm2, pred_pcr2, pred_pls2),
                       rmsle, actual = test2$y2)

rrse_values2 <- sapply(list(pred_mlr2, pred_poly2, pred_ridge2, pred_lasso2, pred_robust2,
                           pred_quantile2, pred_enet2, pred_svm2, pred_pcr2, pred_pls2),
                      rrse, actual = test2$y2)
#----Results datasets 1 and 2---------
# Print the results

models <- c("Multiple Linear", "Polynomial", "Ridge", "LASSO", "Robust", "Quantile", "Elastic Net", "Support Vector", "Principal Component", "Partial Least Squares")

#Results of dataset 1
results1 <- data.frame(Model = models, MAPE=mape_values1,
                      RMSPE=rmspe_values1,RMSLE=rmsle_values1,
                      RRSE=rrse_values1,RMSE = rmse1, MAE = mae1, MSPE = mspe1)
print("Results of Dataset 1:")
## [1] "Results of Dataset 1:"
print(results1)
##                    Model      MAPE     RMSPE     RMSLE      RRSE      RMSE
## 1        Multiple Linear  92.06693 131.35447 0.1024155  48.50278 0.1444310
## 2             Polynomial  50.03937  55.96331 0.1617911  83.63918 0.2490598
## 3                  Ridge 126.05383 193.86311 0.1245270  58.72979 0.1748849
## 4                  LASSO 192.86619 322.55493 0.1750360  82.90445 0.2468719
## 5                 Robust  92.11187 130.79694 0.1026964  48.30596 0.1438450
## 6               Quantile 165.57744 273.75930 0.1573123  75.33322 0.2243264
## 7            Elastic Net 161.95901 263.41448 0.1509045  71.61626 0.2132581
## 8         Support Vector 107.58466 183.79801 0.1259697  63.55148 0.1892429
## 9    Principal Component 243.90461 430.24947 0.3008068 242.29064 0.4165526
## 10 Partial Least Squares 167.85200 277.39142 0.1597457  76.14422 0.2267414
##          MAE       MSPE
## 1  0.1365192 0.02086032
## 2  0.1699102 0.06203079
## 3  0.1650732 0.03058473
## 4  0.2239165 0.06094575
## 5  0.1359791 0.02069137
## 6  0.2014490 0.05032233
## 7  0.1977790 0.04547902
## 8  0.1443883 0.03581287
## 9  0.3378421 0.17351609
## 10 0.2038365 0.05141165
#Results of dataset 2
results2 <- data.frame(Model = models, MAPE=mape_values2,
                      RMSPE=rmspe_values2,RMSLE=rmsle_values2,
                      RRSE=rrse_values2,RMSE = rmse2, MAE = mae2, MSPE = mspe2)
print("Results of Dataset 2:")
## [1] "Results of Dataset 2:"
print(results2)
##                    Model     MAPE     RMSPE     RMSLE      RRSE      RMSE
## 1        Multiple Linear 27.17829  31.61822 0.2118920  87.08413 1.2203774
## 2             Polynomial 26.02151  28.91882 0.2032419  95.77639 1.3421887
## 3                  Ridge 28.13141  31.74961 0.2109106  83.96896 1.1767220
## 4                  LASSO 27.02412  30.43311 0.2044316  81.58652 1.1433351
## 5                 Robust 24.08007  27.54319 0.1870348  73.78565 1.0340155
## 6               Quantile 26.63775  30.40799 0.1987743  60.68540 0.8504315
## 7            Elastic Net 27.59326  31.03227 0.2074045  82.73323 1.1594048
## 8         Support Vector 13.40172  17.48950 0.1209169  42.74463 0.5990136
## 9    Principal Component 82.31937 105.17055 0.5863573 318.38691 2.5760228
## 10 Partial Least Squares 31.10855  33.54725 0.2246186  83.98865 1.1769979
##          MAE      MSPE
## 1  0.9677504 1.4893210
## 2  1.0040389 1.8014704
## 3  0.9640298 1.3846747
## 4  0.9418115 1.3072152
## 5  0.8446390 1.0691881
## 6  0.7252566 0.7232337
## 7  0.9531776 1.3442195
## 8  0.4523308 0.3588172
## 9  2.2416069 6.6358932
## 10 1.0254447 1.3853241
#---Model Comparison plots-----------
# Create a data frame with the results
results <- data.frame(Model = models, MAE=mae1, MSPE = mspe1, RMSLE = rmsle_values1, RMSE = rmse1)

# Melt the data frame for easier plotting
results_melted <- melt(results, id.vars = "Model", variable.name = "Metric", value.name = "Value")

# Create the bar plot
ggplot(results_melted, aes(x = Model, y = Value, fill = Metric)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(x = "Model", y = "Value") +
  scale_fill_manual(values = c("MAE" = "red", "MSPE" = "blue", "RMSLE" = "green", "RMSE" = "orange")) +
  theme_bw()

# results_melted was created above and is reused for the grouped bar plot below

# Create the grouped bar plot
ggplot(results_melted, aes(x = Model, y = Value, fill = Metric)) +
  geom_bar(stat = "identity", position = "dodge", color = "black", width = 0.7) +
  labs(x = "Model", y = "Value") +
  scale_fill_manual(values = c("MAE" = "red", "MSPE" = "blue", "RMSLE" = "green", "RMSE" = "orange")) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  ggtitle("Comparison of Regression Models") +
  theme(plot.title = element_text(hjust = 0.5))

# Normalize the values between 0 and 1
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

results_normalized <- as.data.frame(lapply(results[, -1], normalize))
results_normalized$Model <- results$Model

# Reshape the data frame to long format
results_long <- tidyr::pivot_longer(results_normalized, cols = -Model, names_to = "Metric", values_to = "Value")

# Plot the parallel coordinate plot
ggplot(results_long, aes(x = Metric, y = Value, group = Model, color = Model)) +
  geom_line() +
  geom_point() +
  scale_color_manual(values = c("#FF0000", "#00FF00", "#0000FF", "#FF00FF", "#00FFFF", "#FFFF00", "#FFA500", "#800080", "#008080", "#FFC0CB")) +
  theme_minimal() +
  labs(title = "Comparison of Regression Models", subtitle = "Metrics: MAE, MSPE, RMSLE, RMSE", x = "Metric", y = "Value", color = "Model") +
  theme(legend.position = "right")

Interpreting Results from the Paper

  1. Model Performance: The paper assessed model performance using MAPE, RMSPE, RMSLE, and RRSE (lower is better for all of these). The results showed that SVR had lower error values than the rest of the models, while the median (quantile) regression model had consistently higher errors. Note that other metrics, such as R², can also be used.

  2. Non-parametric testing: The authors used the Friedman test and Wilcoxon tests, which indicated that SVR was the best model; most pairwise comparisons showed no significant difference, and where significant differences were found they were only between the LR and PCR models and between the PR and median regression models.

  3. Real-World Insights: The paper applied these models to the machining of composite materials, and its data are used in this module. The researchers determined that SVR was best for predicting the responses. By using metrics and statistical tests such as these, the performance of a variety of different models can be evaluated and compared.

This module provides a framework for understanding and implementing a wide variety of regression models using R. Remember to adapt these models and code to your specific needs and data and to examine all the assumptions of the models before drawing conclusions.