The purpose of this assignment is to practice the methods we learned for examining model results using residuals.
Create a linear model using carat in the diamonds dataframe to predict price.
# Place your code here.
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.3 v purrr 0.3.4
## v tibble 3.0.6 v dplyr 1.0.4
## v tidyr 1.1.2 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
dm <- lm(price ~ carat, data = diamonds)
Create a graph to show how the fitted values in this model are related to the residuals. Do you see any problems? You don’t need to correct them.
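Yes. Predictions for diamonds with high fitted prices tend to be inaccurate: the residuals are much larger in magnitude at the upper end of the fitted values.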
# Place your code here
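# Extract the residuals and fitted values from the model
res <- residuals(dm)
fvalues <- fitted(dm)
# Attach them to the diamonds data as columns `res` and `fvalues`
data <- cbind(diamonds, res, fvalues)
# Plot residuals against fitted values, with a smoother to show the trend
ggplot(data, aes(x = fvalues, y = res)) +
  geom_point() +
  geom_smooth()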
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
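As an alternative to binding the columns by hand, the same plot can be sketched with the modelr helpers `add_predictions()` and `add_residuals()`. This is only a sketch: it assumes the modelr package is installed (it normally comes with the tidyverse but is not attached by `library(tidyverse)`), and the name `diamonds_aug`, the point transparency, and the red reference line are illustrative choices, not part of the assignment.

```r
library(tidyverse)
library(modelr)  # assumption: installed; not attached by library(tidyverse)

# Refit the model so this block runs on its own
dm <- lm(price ~ carat, data = diamonds)

# Add fitted values and residuals as new columns.
# "pred" and "resid" are modelr's default column names.
diamonds_aug <- diamonds %>%
  add_predictions(dm) %>%
  add_residuals(dm)

# Residuals against fitted values, with a reference line at zero
ggplot(diamonds_aug, aes(x = pred, y = resid)) +
  geom_point(alpha = 0.1) +
  geom_hline(yintercept = 0, colour = "red")
```

With a horizontal line at zero, it is easier to see where the residuals drift away from zero or fan out as the fitted values grow.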
Create a graph to show how the residuals in this model are related to the values of cut. Do you see any problems? You don’t need to correct them.
Yes. Fair cut diamonds are a weak spot for the model: their residuals tend to sit furthest from zero compared with the other cuts.
# Place your code here.
ggplot(data, aes(x = cut, y = res)) +
  geom_boxplot()
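To back up what the boxplot suggests, the residuals can also be summarised numerically for each cut. The block below is a minimal sketch; the name `res_by_cut` and the choice of summary statistics are assumptions made for illustration.

```r
library(tidyverse)

# Refit the model so this block runs on its own
dm <- lm(price ~ carat, data = diamonds)

# Summarise the residuals within each level of cut
res_by_cut <- diamonds %>%
  mutate(res = residuals(dm)) %>%
  group_by(cut) %>%
  summarise(
    n          = n(),          # number of diamonds in the group
    mean_res   = mean(res),    # average error (far from 0 suggests bias)
    median_res = median(res),  # centre line of the boxplot
    iqr_res    = IQR(res)      # height of the box
  )

res_by_cut
```

A group whose mean or median residual is far from zero is one the model systematically over- or under-predicts. Replacing `cut` in `group_by()` with `color` or `clarity` gives the same summary for the other variables.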
Create a graph to show how the residuals in this model are related to the values of color. Do you see any problems? You don’t need to correct them.
Yes. The model fits diamonds of colors I and J less well than the other colors.
# Place your code here.
ggplot(data, aes(x = color, y = res)) +
  geom_boxplot()
Create a graph to show how the residuals in this model are related to the values of clarity. Do you see any problems? You don’t need to correct them.
Yes. The model fits diamonds of clarities I1, SI2, and IF less well than the other clarities.
# Place your code here.
ggplot(data, aes(x = clarity, y = res)) +
  geom_boxplot()
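The residual boxplots for cut, color, and clarity can also be drawn in a single figure by stacking the three variables into one column and faceting. This is only a sketch: the names `res_long`, `variable`, and `level` are hypothetical, and converting the factors to character means the levels are shown alphabetically rather than in their quality order.

```r
library(tidyverse)

# Refit the model so this block runs on its own
dm <- lm(price ~ carat, data = diamonds)

res_long <- diamonds %>%
  mutate(res = residuals(dm)) %>%
  # cut, color, and clarity are ordered factors with different levels,
  # so convert them to character before stacking them into one column
  mutate(across(c(cut, color, clarity), as.character)) %>%
  pivot_longer(
    cols      = c(cut, color, clarity),
    names_to  = "variable",
    values_to = "level"
  )

# One panel per variable, each with its own x-axis levels
ggplot(res_long, aes(x = level, y = res)) +
  geom_boxplot() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  facet_wrap(~ variable, scales = "free_x")
```

Sharing one y-axis across the panels makes it easier to compare how far the residuals stray from zero for each variable.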