R Markdown

The best-performing models can be identified by comparing the RMSE (Root Mean Squared Error) and R-squared values in the res data frame: a lower RMSE and a higher R-squared indicate better performance. The MARS model’s variable importance, computed with varImp(), shows which predictors are most influential. To determine whether MARS selects the informative predictors (X1–X5), compare their importance scores with those of the remaining predictors. High scores for X1–X5 relative to the rest indicate that MARS treats them as informative.
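As a minimal sketch of what these two metrics measure, the base-R snippet below computes RMSE and R-squared for a small set of made-up observations and predictions. The numbers and the helper name `rmse_r2` are illustrative assumptions and are not part of the analysis. R-squared is taken here as the squared correlation between predictions and observations, which is how caret's postResample() reports it for numeric outcomes.

```r
# Hypothetical helper (not part of caret): RMSE and squared-correlation R-squared
rmse_r2 <- function(pred, obs) {
  c(RMSE = sqrt(mean((pred - obs)^2)),
    Rsquared = cor(pred, obs)^2)
}

# Made-up values for illustration only
obs  <- c(10, 12, 15, 18, 20)
pred <- c(11, 12, 14, 19, 21)

metrics <- rmse_r2(pred, obs)
print(metrics)
```

A model whose predictions track the observations closely would show a small RMSE (in the units of the response) and an R-squared near 1.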

library(caret)
## Loading required package: ggplot2
## Loading required package: lattice
library(mlbench)
library(ggplot2)

# Set seed for reproducibility
set.seed(200)

# Generate training and test data
trainingData <- mlbench.friedman1(200, sd = 1)
trainingData$x <- data.frame(trainingData$x)

testData <- mlbench.friedman1(5000, sd = 1)
testData$x <- data.frame(testData$x)

# Lists to store RMSE and R-squared values
rmses <- c()
r2s <- c()
methods <- c()

# K-NN model
set.seed(0)
knnModel <- train(x = trainingData$x, y = trainingData$y, method = "knn",
                  preProc = c("center", "scale"), tuneLength = 10)

knnPred <- predict(knnModel, newdata = testData$x)
knnPR <- postResample(pred = knnPred, obs = testData$y)
rmses <- c(rmses, knnPR[1])
r2s <- c(r2s, knnPR[2])
methods <- c(methods, "KNN")

# Neural Network model
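# Candidate tuning grid. Note: it is never passed to train(), so the neural
# network models below are tuned over caret's default grid (.bag applies only to avNNet).
nnGrid <- expand.grid(.decay = c(0, 0.01, 0.1), .size = 1:10, .bag = FALSE)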
set.seed(0)
nnetModel <- train(x = trainingData$x, y = trainingData$y, method = "nnet",
                   preProc = c("center", "scale"), linout = TRUE, trace = FALSE,
                   MaxNWts = 10 * (ncol(trainingData$x) + 1) + 10 + 1, maxit = 500)

nnetPred <- predict(nnetModel, newdata = testData$x)
nnetPR <- postResample(pred = nnetPred, obs = testData$y)
rmses <- c(rmses, nnetPR[1])
r2s <- c(r2s, nnetPR[2])
methods <- c(methods, "NN")

# Averaged Neural Network model
set.seed(0)
avNNetModel <- train(x = trainingData$x, y = trainingData$y, method = "avNNet",
                     preProc = c("center", "scale"), linout = TRUE, trace = FALSE,
                     MaxNWts = 10 * (ncol(trainingData$x) + 1) + 10 + 1, maxit = 500)
## Warning: executing %dopar% sequentially: no parallel backend registered
avNNetPred <- predict(avNNetModel, newdata = testData$x)
avNNetPR <- postResample(pred = avNNetPred, obs = testData$y)
rmses <- c(rmses, avNNetPR[1])
r2s <- c(r2s, avNNetPR[2])
methods <- c(methods, "AvgNN")

# MARS model
marsGrid <- expand.grid(.degree = 1:2, .nprune = 2:38)
set.seed(0)
marsModel <- train(x = trainingData$x, y = trainingData$y, method = "earth",
                   preProc = c("center", "scale"), tuneGrid = marsGrid)
## Loading required package: earth
## Loading required package: Formula
## Loading required package: plotmo
## Loading required package: plotrix
marsPred <- predict(marsModel, newdata = testData$x)
marsPR <- postResample(pred = marsPred, obs = testData$y)
rmses <- c(rmses, marsPR[1])
r2s <- c(r2s, marsPR[2])
methods <- c(methods, "MARS")

# Calculate feature importance for MARS model
marsImportance <- varImp(marsModel)

# Support Vector Machine (SVM) model
set.seed(0)
svmRModel <- train(x = trainingData$x, y = trainingData$y, method = "svmRadial",
                   preProc = c("center", "scale"), tuneLength = 20)

svmRPred <- predict(svmRModel, newdata = testData$x)
svmPR <- postResample(pred = svmRPred, obs = testData$y)
rmses <- c(rmses, svmPR[1])
r2s <- c(r2s, svmPR[2])
methods <- c(methods, "SVM")

# Create a dataframe to display the results
res <- data.frame(rmse = rmses, r2 = r2s)
rownames(res) <- methods

# Order the dataframe so that the best results are at the bottom
res <- res[order(-res$rmse), ]
print("Final Results:")
## [1] "Final Results:"
print(res)
##           rmse        r2
## KNN   3.204059 0.6819919
## NN    2.649316 0.7177210
## SVM   2.059719 0.8279547
## AvgNN 2.055993 0.8323657
## MARS  1.322734 0.9291489

Which models appear to give the best performance?

The code uses the caret package to fit five models to the Friedman1 data: K-Nearest Neighbors (KNN), a Neural Network (NN), an Averaged Neural Network (AvgNN), Multivariate Adaptive Regression Splines (MARS), and a Support Vector Machine with a radial kernel (SVM). Each model is scored by Root Mean Squared Error (RMSE) and R-squared (R2) on the 5,000-observation test set. MARS performs best on both metrics, with the lowest RMSE (1.32) and the highest R2 (0.93). AvgNN and SVM come next and are nearly tied (RMSE about 2.06, R2 about 0.83), followed by NN (RMSE 2.65, R2 0.72). KNN performs worst, with the highest RMSE (3.20) and the lowest R2 (0.68).
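The ranking can also be read off programmatically. The sketch below rebuilds the res data frame from the values printed in the results table and picks the model with the lowest RMSE and the one with the highest R2. The variable names `best_by_rmse` and `best_by_r2` are illustrative.

```r
# Test-set results copied from the printed results table
res <- data.frame(
  rmse = c(3.204059, 2.649316, 2.059719, 2.055993, 1.322734),
  r2   = c(0.6819919, 0.7177210, 0.8279547, 0.8323657, 0.9291489),
  row.names = c("KNN", "NN", "SVM", "AvgNN", "MARS")
)

best_by_rmse <- rownames(res)[which.min(res$rmse)]  # lowest error
best_by_r2   <- rownames(res)[which.max(res$r2)]    # highest R-squared

print(c(RMSE = best_by_rmse, R2 = best_by_r2))
```

Both criteria select MARS for these results.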

Does MARS select the informative predictors (those named X1–X5)?

To determine whether MARS selects the informative predictors (X1–X5), inspect the variable importance scores for the MARS model. The code computes these with varImp(marsModel) and stores them in marsImportance. In the visualization, X1 is the most important predictor. A full answer also requires comparing the scores of X2–X5 with those of X6–X10, which are noise predictors in the Friedman1 simulation.
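One way to make this comparison explicit is to sort the importance scores and contrast the two groups of predictors. The sketch below is an assumption-laden illustration rather than the original analysis: it refits MARS on the same training data but with a smaller tuning grid to keep it quick, and the names `marsFit`, `informative`, and `noise` are introduced here for illustration. Scores may therefore differ from those behind the original plot.

```r
library(caret)
library(mlbench)

# Recreate the training data as in the analysis
set.seed(200)
trainingData <- mlbench.friedman1(200, sd = 1)
trainingData$x <- data.frame(trainingData$x)  # columns are named X1..X10

# Smaller grid than the original, chosen only to keep the sketch fast
set.seed(0)
marsFit <- train(x = trainingData$x, y = trainingData$y, method = "earth",
                 tuneGrid = expand.grid(degree = 1:2, nprune = 2:15))

# Importance scores as a data frame, sorted from most to least important
imp <- varImp(marsFit)$importance
imp <- imp[order(-imp$Overall), , drop = FALSE]
print(imp)

# Contrast the informative predictors with the noise predictors
informative <- paste0("X", 1:5)
noise <- paste0("X", 6:10)
print(mean(imp[rownames(imp) %in% informative, "Overall"]))
print(mean(imp[rownames(imp) %in% noise, "Overall"]))
```

If MARS is selecting the informative predictors, X1–X5 should appear at the top of the sorted table and the noise predictors should have scores at or near zero.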

Including Plots


## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.
