I have a dataset called data that looks like this.
data %>% glimpse()
Rows: 4,601
Columns: 7
$ crl.tot 278, 1028, 2259, 191, 191, 54, 112, 49, 1257, 749, 21, 184, 261, 25, 205…
$ dollar  0.000, 0.180, 0.184, 0.000, 0.000, 0.000, 0.054, 0.000, 0.203, 0.081, 0.…
$ bang    0.778, 0.372, 0.276, 0.137, 0.135, 0.000, 0.164, 0.000, 0.181, 0.244, 0.…
$ money   0.00, 0.43, 0.06, 0.00, 0.00, 0.00, 0.00, 0.00, 0.15, 0.00, 0.00, 0.00, …
$ n000    0.00, 0.43, 1.16, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.19, 0.00, 0.00, …
$ make    0.00, 0.21, 0.06, 0.00, 0.00, 0.00, 0.00, 0.00, 0.15, 0.06, 0.00, 0.00, …
$ yesno   "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y…
The goal is to predict yesno, the spam label.
Please write R code to create a predictive model that predicts the probability of yesno.
Previous attempts produced these errors:
“Error in check_outcome(): ! For a classification model, the outcome should be a factor, not a character. Run rlang::last_trace() to see where the error occurred.”
“Error: object ‘new_data’ not found”
# Load necessary libraries
library(tidymodels)
## ── Attaching packages ────────────────────────────────────── tidymodels 1.1.1 ──
## ✔ broom 1.0.5 ✔ rsample 1.2.0
## ✔ dials 1.2.1 ✔ tune 1.1.2
## ✔ infer 1.0.6 ✔ workflows 1.1.4
## ✔ modeldata 1.3.0 ✔ workflowsets 1.0.1
## ✔ parsnip 1.2.0 ✔ yardstick 1.3.0
## ✔ recipes 1.0.10
## Warning: package 'recipes' was built under R version 4.3.3
## ── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
## ✖ scales::discard() masks purrr::discard()
## ✖ dplyr::filter() masks stats::filter()
## ✖ recipes::fixed() masks stringr::fixed()
## ✖ dplyr::lag() masks stats::lag()
## ✖ yardstick::spec() masks readr::spec()
## ✖ recipes::step() masks stats::step()
## • Learn how to get started at https://www.tidymodels.org/start/
# Assuming you have already loaded the data into 'data'
# Step 1: Split the data into training and testing sets
set.seed(123) # For reproducibility
split <- initial_split(data, prop = 0.7)
train_data <- training(split)
test_data <- testing(split)
# Step 2: Preprocess the data (if necessary)
# The predictors need no preprocessing based on the provided glimpse();
# only the character outcome 'yesno' must be converted (Step 3)
# Step 3: Convert 'yesno' to factor
train_data$yesno <- factor(train_data$yesno)
test_data$yesno <- factor(test_data$yesno)
# Step 4: Train a logistic regression model
logistic_model <- logistic_reg() %>%
  set_engine("glm") %>%
  fit(yesno ~ ., data = train_data)
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
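# Step 5: Evaluate the model's performance
# predict() returns class labels by default, in the column .pred_class
predictions <- predict(logistic_model, new_data = test_data)
# Rows are the actual labels, columns are the predicted labels
confusion_matrix <- table(test_data$yesno, predictions$.pred_class)
# Accuracy = correct predictions (the diagonal) / all predictions
accuracy <- sum(diag(confusion_matrix)) / sum(confusion_matrix)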
print(confusion_matrix)
##
##       n   y
##   n 780  50
##   y 201 350
print(paste("Accuracy:", accuracy))
## [1] "Accuracy: 0.818247646632875"
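The request asks for the probability of yesno, while Step 5 only produces class labels. A minimal sketch of one way to get the probabilities follows, assuming the fitted parsnip model and a data frame with the same predictors as in the script. The helper name add_spam_predictions is hypothetical and not part of the original code.

```r
library(tidymodels)

# Hypothetical helper (not in the original script): returns new_data with the
# predicted class and the class probabilities attached as extra columns.
add_spam_predictions <- function(model, new_data) {
  # type = "prob" gives one column per outcome level, named .pred_<level>
  # (here .pred_n and .pred_y, since the levels of yesno are "n" and "y")
  probs <- predict(model, new_data = new_data, type = "prob")
  # The default prediction type for a classification model gives .pred_class
  classes <- predict(model, new_data = new_data)
  bind_cols(new_data, classes, probs)
}
```

With logistic_model and test_data from the script above, `scored <- add_spam_predictions(logistic_model, test_data)` would return the test rows with .pred_class, .pred_n, and .pred_y added. The probabilities could then be scored directly with yardstick, for example `roc_auc(scored, truth = yesno, .pred_y, event_level = "second")`; event_level = "second" is needed because "y" is the second factor level.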