Assignment 09
Open the assign09.qmd file and complete the exercises.
We will be working with the diamonds dataset and tidymodels to predict the carat of a diamond based on other variables.
The Grading Rubric is available at the end of this document.
Exercises
We will start by loading our required packages.
library(tidymodels)
library(glmnet)
Exercise 1
Create a histogram using geom_histogram(binwidth = 0.1), showing the distribution of carat in the diamonds dataset. Set the fill to “blue” and the color to “black”. In the narrative below, describe what the distribution looks like.
# Load Diamond Dataset
data("diamonds")
# Create a histogram of carat
ggplot(diamonds, aes(x = carat)) +
  geom_histogram(binwidth = 0.1, fill = "blue", color = "black") +
  labs(title = "Distribution of Diamond Carat", x = "Carat", y = "Count")
Exercise 2
Repeat the histogram, but this time plot sqrt(carat) instead of carat. Describe if and how the distribution changed.
# Create Histogram of sqrt carat
ggplot(diamonds, aes(x = sqrt(carat))) +
geom_histogram(binwidth = 0.1, fill = "blue", color = "black") +
labs(title = "Distribution of Square Root of Carat", x = "Sqrt(Carat)", y = "Count")
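One way to put a number on the change in shape is sample skewness. The sketch below is only an illustration: skewness_sketch() is a hypothetical hand-written helper (not part of tidymodels), and it assumes the ggplot2 package is installed so that the diamonds data can be loaded. Values near zero suggest a roughly symmetric distribution, and larger positive values suggest a longer right tail.

```r
# Load the diamonds data that ships with ggplot2
data("diamonds", package = "ggplot2")

# Hypothetical helper: moment-based sample skewness
skewness_sketch <- function(x) {
  centered <- x - mean(x)
  mean(centered^3) / mean(centered^2)^1.5
}

# Skewness before and after the square-root transformation
skew_raw  <- skewness_sketch(diamonds$carat)
skew_sqrt <- skewness_sketch(sqrt(diamonds$carat))

c(carat = skew_raw, sqrt_carat = skew_sqrt)
```

Comparing the two values gives a quick numeric check to set beside the visual impression from the histograms.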
Exercise 3
Below set.seed(), split the data into two datasets: train_data will contain 80% of the data using stratified sampling on carat, and test_data will contain the remaining 20% of the data.
# set a seed for reproducibility
set.seed(1234)
# Split data
data_split <- initial_split(diamonds, prop = 0.8, strata = carat)
train_data <- training(data_split)
test_data <- testing(data_split)
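A quick sanity check on a stratified split is to confirm the row proportions and compare the carat summaries of the two pieces. The sketch below is self-contained and uses its own object names (split_check, train_check, test_check), which are illustrative and not part of the assignment.

```r
library(tidymodels)

data("diamonds", package = "ggplot2")

# Same seed and split settings as in the exercise
set.seed(1234)
split_check <- initial_split(diamonds, prop = 0.8, strata = carat)
train_check <- training(split_check)
test_check  <- testing(split_check)

# Share of rows that landed in the training set (should be close to 0.8)
train_share <- nrow(train_check) / nrow(diamonds)
train_share

# Stratifying on carat should give similar carat summaries in both sets
summary(train_check$carat)
summary(test_check$carat)
```

If the two summaries differ noticeably, that would be a hint that the strata argument was not applied as intended.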
Exercise 4
Exercise 4 is already completed for you. It creates a recipe called lm_all_recipe that uses carat as the target variable and all other variables as predictors. It creates dummy variables for all nominal predictors so we can use the recipe for regularized regression.
# recipe using all predictors
lm_all_recipe <- recipe(carat ~ ., data = train_data) |>
  step_dummy(all_nominal_predictors())
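To see what a recipe with step_dummy() actually produces, it can be prepped and baked outside of a workflow. The sketch below is an illustration only: it builds the recipe on the full diamonds data rather than train_data so that it runs on its own, and the names recipe_check and baked_check are made up for this example.

```r
library(tidymodels)

data("diamonds", package = "ggplot2")

# Same recipe structure as in the exercise, built on the full data for illustration
recipe_check <- recipe(carat ~ ., data = diamonds) |>
  step_dummy(all_nominal_predictors())

# prep() estimates the step; bake(new_data = NULL) returns the processed data
baked_check <- recipe_check |>
  prep() |>
  bake(new_data = NULL)

# After step_dummy(), every remaining column should be numeric
all_numeric <- all(vapply(baked_check, is.numeric, logical(1)))
all_numeric

# The original factor columns are replaced by generated dummy columns
names(baked_check)
```

Because cut, color, and clarity are ordered factors in diamonds, the generated columns may reflect the contrast settings R uses for ordered factors rather than simple 0/1 indicators.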
Exercise 5
Below is a model specification for regularized regression called lasso_spec. Add a second specification called lm_spec for plain linear regression using the “lm” engine.
# Define the lasso model specification
lasso_spec <- linear_reg(penalty = 0.01, mixture = 1) |>
  set_engine("glmnet")
# Define the linear regression model specification
lm_spec <- linear_reg() |>
  set_engine("lm")
Exercise 6
Create two workflows. lm_all_workflow should use the lm_spec model specification and lm_all_recipe. lasso_all_workflow should use the lasso_spec model specification and lm_all_recipe.
# Create two workflows
lm_all_workflow <- workflow() |>
  add_model(lm_spec) |>
  add_recipe(lm_all_recipe)
lasso_all_workflow <- workflow() |>
  add_model(lasso_spec) |>
  add_recipe(lm_all_recipe)
Exercise 7
Fit two models. lm_all_fit should use the lm_all_workflow, and lasso_all_fit should use the lasso_all_workflow.
# Fit the two models
lm_all_fit <- lm_all_workflow |>
  fit(data = train_data)
lasso_all_fit <- lasso_all_workflow |>
  fit(data = train_data)
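After fitting, it can help to look at the lasso coefficients, since an L1 penalty can shrink some of them exactly to zero. The sketch below is a self-contained illustration under a few assumptions: it fits on the full diamonds data rather than train_data so it runs on its own, and lasso_check and lasso_coefs are names made up for this example.

```r
library(tidymodels)
library(glmnet)

data("diamonds", package = "ggplot2")

# Same lasso specification and recipe structure as in the exercises
lasso_check <- workflow() |>
  add_model(linear_reg(penalty = 0.01, mixture = 1) |> set_engine("glmnet")) |>
  add_recipe(recipe(carat ~ ., data = diamonds) |>
               step_dummy(all_nominal_predictors())) |>
  fit(data = diamonds)

# Pull the underlying parsnip fit and tidy its coefficients
lasso_coefs <- lasso_check |>
  extract_fit_parsnip() |>
  tidy()

# Terms whose coefficient was shrunk exactly to zero at this penalty
filter(lasso_coefs, estimate == 0)
```

Running the same extract_fit_parsnip() and tidy() calls on lm_all_fit and lasso_all_fit allows a side-by-side look at which predictors each model relies on.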
Exercise 8
Make predictions into two new tibbles: lm_all_predictions and lasso_all_predictions.
# Make predictions on the test data
lm_all_predictions <- predict(lm_all_fit, test_data) |>
  bind_cols(test_data)
lasso_all_predictions <- predict(lasso_all_fit, test_data) |>
  bind_cols(test_data)
Exercise 9
Compute and display the RMSE for each model. In the narrative below, discuss which one performed better and why.
# Compute RMSE for each model
lm_rmse <- rmse(lm_all_predictions, truth = carat, estimate = .pred)
lasso_rmse <- rmse(lasso_all_predictions, truth = carat, estimate = .pred)
# Display RMSEs
lm_rmse
# A tibble: 1 × 3
  .metric .estimator .estimate
  <chr>   <chr>          <dbl>
1 rmse    standard      0.0744
lasso_rmse
# A tibble: 1 × 3
  .metric .estimator .estimate
  <chr>   <chr>          <dbl>
1 rmse    standard      0.0812
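For reference, RMSE is the square root of the mean squared difference between the true and predicted values. The sketch below computes it by hand in base R on a few made-up numbers (truth_toy and pred_toy are hypothetical and not taken from the diamonds data) to show what the metric measures.

```r
# Hypothetical toy values, not taken from the diamonds data
truth_toy <- c(0.30, 0.50, 1.00, 1.50)
pred_toy  <- c(0.25, 0.55, 1.10, 1.40)

# Square the errors, average them, then take the square root
rmse_by_hand <- sqrt(mean((truth_toy - pred_toy)^2))
rmse_by_hand
```

The rmse() function from yardstick applies the same formula to the truth and estimate columns, so a lower value means predictions that are closer to the observed carat on average.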
Submission
To submit your assignment:
- Change the author name to your name in the YAML portion at the top of this document
- Render your document to HTML and publish it to RPubs.
- Submit the link to your RPubs document in the Brightspace comments section for this assignment.
- Click on the “Add a File” button and upload your .qmd file for this assignment to Brightspace.
Grading Rubric
| Item (percent overall) | 100% - flawless | 67% - minor issues | 33% - moderate issues | 0% - major issues or not attempted |
|---|---|---|---|---|
| Document formatting: correctly implemented instructions (9%) | | | | |
| Exercises - 9% each (81%) | | | | |
| Submitted properly to Brightspace (10%) | NA | NA | You must submit according to instructions to receive any credit for this portion. | |