Explanation:
Line 1: library(tidyverse): Loads the Tidyverse, a collection of R packages for data manipulation and visualization.
Line 2: library(caret): Loads the Caret package for machine learning tasks.
Line 3: library(fastDummies): Loads the fastDummies package to create dummy variables efficiently.
Line 4: wine <- readRDS(...): Reads in the wine.rds dataset from a remote GitHub repository.
Feature Engineering
We begin by engineering a number of features.
Create a total of 10 features (including points).
Remove all rows with a missing value.
Ensure that log(price) and the engineered features are the only columns that remain in the wino dataframe.
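The three steps above can be sketched on a toy data frame. This is only an illustration under stated assumptions: the column names (price, points, year, country, description) and the particular features are hypothetical stand-ins, not the actual columns or features used for wino, and only six features are built here rather than ten.

```r
library(dplyr)

# Toy stand-in for the wine data; every column name here is an assumption.
wine_toy <- data.frame(
  price       = c(20, 35, NA, 60, 15, 80),
  points      = c(87, 90, 88, 93, 85, 95),
  year        = c(2012, 2015, 2010, NA, 2016, 2009),
  country     = c("US", "France", "Italy", "US", "Spain", "France"),
  description = c("crisp cherry", "oak and smoke", "bright fruit",
                  "dark cherry oak", "light citrus", "smoke and tobacco")
)

wino_toy <- wine_toy %>%
  mutate(
    lprice     = log(price),                           # response variable
    age        = max(year, na.rm = TRUE) - year,       # vintage age relative to newest wine
    is_us      = as.integer(country == "US"),          # hand-made dummy variables
    is_france  = as.integer(country == "France"),
    has_cherry = as.integer(grepl("cherry", description)),
    has_oak    = as.integer(grepl("oak", description))
  ) %>%
  # Keep only log(price) and the engineered features (points included).
  select(lprice, points, age, is_us, is_france, has_cherry, has_oak) %>%
  # Remove every row that still contains a missing value.
  na.omit()

wino_toy
```

The indicator columns are written by hand to keep the sketch short; fastDummies::dummy_cols() could likely generate them from a categorical column instead.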
We now use a train/test split to evaluate the features.
Use the Caret library to partition the wino dataframe into an 80/20 split.
Run a linear regression with bootstrap resampling.
Report RMSE on the test partition of the data.
# Partition data into 80% training and 20% testing
set.seed(42)
train_index <- createDataPartition(wino$lprice, p = 0.8, list = FALSE)
train_data <- wino[train_index, ]
test_data <- wino[-train_index, ]

# Train a linear regression model
fit_control <- trainControl(method = "boot", number = 50)
model <- train(lprice ~ ., data = train_data, method = "lm", trControl = fit_control)

# Predict on test data and calculate RMSE
predictions <- predict(model, test_data)
rmse_value <- RMSE(predictions, test_data$lprice)
rmse_value
[1] 0.3892879
We receive an RMSE of 0.3892879 on the test partition.
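To make the metric concrete, here is a small sketch of the arithmetic behind RMSE, which is the quantity caret's RMSE(pred, obs) computes. The numbers in observed and predicted are made up purely for illustration and are not taken from the wine data.

```r
# Hypothetical log-price values, chosen only to make the arithmetic easy to follow.
observed  <- c(1, 4, 8)
predicted <- c(2, 4, 6)

# RMSE: square the errors, average them, then take the square root.
rmse_manual <- sqrt(mean((predicted - observed)^2))
rmse_manual

# The reported test RMSE is on the log(price) scale, so exponentiating it
# gives a rough multiplicative error factor on the price scale.
error_factor <- exp(0.3892879)
error_factor
```

Because the model predicts log(price), the error factor comes out at roughly 1.48, which can be read loosely as predictions that are typically off by a factor of about 1.5 in price. This is an informal interpretation rather than an exact statement about the error distribution.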