Using R, build a multiple regression model for data that interests you. Include in this model at least one quadratic term, one dichotomous term, and one dichotomous vs. quantitative interaction term. Interpret all coefficients. Conduct residual analysis. Was the linear model appropriate? Why or why not?
Predicting the age of abalone from physical measurements. The age of abalone is determined by cutting the shell through the cone, staining it, and counting the number of rings through a microscope – a boring and time-consuming task. Other measurements, which are easier to obtain, are used to predict the age. Further information, such as weather patterns and location (hence food availability) may be required to solve the problem.
From the original data examples with missing values were removed (the majority having the predicted value missing), and the ranges of the continuous values have been scaled for use with an ANN (by dividing by 200).
Name | Data Type | Meas. | Description |
---|---|---|---|
Sex | nominal | | M, F, and I (infant) |
Length | continuous | mm | Longest shell measurement |
Diameter | continuous | mm | perpendicular to length |
Height | continuous | mm | with meat in shell |
Whole weight | continuous | grams | whole abalone |
Shucked weight | continuous | grams | weight of meat |
Viscera weight | continuous | grams | gut weight (after bleeding) |
Shell weight | continuous | grams | after being dried |
Rings | integer | | +1.5 gives the age in years |
Since `Rings` + 1.5 gives the age in years, we will use `Rings` as the response variable.
# install each package only if it is missing, then load it
for (pkg in c("RCurl", "plyr", "data.table", "corrplot", "ggplot2")) {
  if (!require(pkg, character.only = TRUE)) {
    install.packages(pkg)
    library(pkg, character.only = TRUE)
  }
}
# readlines of original dataset from the url directly
dt.original <- readLines("http://mlr.cs.umass.edu/ml/machine-learning-databases/abalone/abalone.data")
# keep the original copy and work on a data frame called dt
dt <- as.data.frame(dt.original, stringsAsFactors = FALSE)
# split each line on the separator ","
# (the column is named dt.original; dt$dt only resolves through partial matching)
dt <- strsplit(dt$dt.original, ",")
#combine lists elements into a dataframe and also set the frame as data table
dt <- ldply(dt)
dt <- as.data.table(dt)
n <- readLines("http://mlr.cs.umass.edu/ml/machine-learning-databases/abalone/abalone.names")
n <- n[89:97]
# extract the column names: the second tab-separated field of each line
col.names <- vapply(strsplit(n, "\t"), function(x) x[2], character(1))
# set dt names
colnames(dt) <- col.names
# check the data
head(dt)
## Sex Length Diameter Height Whole weight Shucked weight Viscera weight
## 1: M 0.455 0.365 0.095 0.514 0.2245 0.101
## 2: M 0.35 0.265 0.09 0.2255 0.0995 0.0485
## 3: F 0.53 0.42 0.135 0.677 0.2565 0.1415
## 4: M 0.44 0.365 0.125 0.516 0.2155 0.114
## 5: I 0.33 0.255 0.08 0.205 0.0895 0.0395
## 6: I 0.425 0.3 0.095 0.3515 0.141 0.0775
## Shell weight Rings
## 1: 0.15 15
## 2: 0.07 7
## 3: 0.21 9
## 4: 0.155 10
## 5: 0.055 7
## 6: 0.12 8
dim(dt)
## [1] 4177 9
str(dt)
## Classes 'data.table' and 'data.frame': 4177 obs. of 9 variables:
## $ Sex : chr "M" "M" "F" "M" ...
## $ Length : chr "0.455" "0.35" "0.53" "0.44" ...
## $ Diameter : chr "0.365" "0.265" "0.42" "0.365" ...
## $ Height : chr "0.095" "0.09" "0.135" "0.125" ...
## $ Whole weight : chr "0.514" "0.2255" "0.677" "0.516" ...
## $ Shucked weight: chr "0.2245" "0.0995" "0.2565" "0.2155" ...
## $ Viscera weight: chr "0.101" "0.0485" "0.1415" "0.114" ...
## $ Shell weight : chr "0.15" "0.07" "0.21" "0.155" ...
## $ Rings : chr "15" "7" "9" "10" ...
## - attr(*, ".internal.selfref")=<externalptr>
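The `str()` output above shows every column arriving as character, which is why a conversion step follows. As an alternative sketch (not the approach used in this write-up), `read.csv()` can parse comma-separated lines and assign column types in one call. The six sample rows are copied from the `head(dt)` output above; the names `sample_lines`, `col_names`, and `abalone` are placeholders introduced for illustration.

```r
# Six rows copied from the head(dt) output, in the raw comma-separated form
sample_lines <- c(
  "M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15",
  "M,0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7",
  "F,0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9",
  "M,0.44,0.365,0.125,0.516,0.2155,0.114,0.155,10",
  "I,0.33,0.255,0.08,0.205,0.0895,0.0395,0.055,7",
  "I,0.425,0.3,0.095,0.3515,0.141,0.0775,0.12,8"
)

col_names <- c("Sex", "Length", "Diameter", "Height", "Whole weight",
               "Shucked weight", "Viscera weight", "Shell weight", "Rings")

# colClasses fixes the types up front; check.names = FALSE keeps the spaces
abalone <- read.csv(
  text = sample_lines,
  header = FALSE,
  col.names = col_names,
  colClasses = c("character", rep("numeric", 8)),
  check.names = FALSE
)

str(abalone)
```

To read the full file this way, the `text =` argument would be replaced by the file path or URL as the first argument.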
# change format
dt$Sex <- ifelse(dt$Sex == 'M', 0, 1) # Male == 0; Female and Infant (I) == 1
dt[,2:9] <- lapply(dt[,2:9], function (x) as.numeric(as.character(x)))
# check the data again
str(dt)
## Classes 'data.table' and 'data.frame': 4177 obs. of 9 variables:
## $ Sex : num 0 0 1 0 1 1 1 1 0 1 ...
## $ Length : num 0.455 0.35 0.53 0.44 0.33 0.425 0.53 0.545 0.475 0.55 ...
## $ Diameter : num 0.365 0.265 0.42 0.365 0.255 0.3 0.415 0.425 0.37 0.44 ...
## $ Height : num 0.095 0.09 0.135 0.125 0.08 0.095 0.15 0.125 0.125 0.15 ...
## $ Whole weight : num 0.514 0.226 0.677 0.516 0.205 ...
## $ Shucked weight: num 0.2245 0.0995 0.2565 0.2155 0.0895 ...
## $ Viscera weight: num 0.101 0.0485 0.1415 0.114 0.0395 ...
## $ Shell weight : num 0.15 0.07 0.21 0.155 0.055 0.12 0.33 0.26 0.165 0.32 ...
## $ Rings : num 15 7 9 10 7 8 20 16 9 19 ...
## - attr(*, ".internal.selfref")=<externalptr>
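The original `Sex` attribute has three levels (M, F, and I), so the 0/1 recoding puts females and infants in the same group. The sketch below reproduces that recoding on the first six `Sex` values from `head(dt)` and, as an assumption about what one might do instead, shows how a three-level factor expands into two indicator columns. The names `sex`, `not_male`, `sex_factor`, and `dummies` are introduced only for this illustration.

```r
# First six Sex values from the head(dt) output
sex <- c("M", "M", "F", "M", "I", "I")

# The recoding used in this write-up: male -> 0, everything else (F and I) -> 1
not_male <- ifelse(sex == "M", 0, 1)
print(not_male)

# Alternative (not used in this write-up): keep all three levels as a factor.
# model.matrix() shows the indicator columns lm() would create, with M as baseline.
sex_factor <- factor(sex, levels = c("M", "F", "I"))
dummies <- model.matrix(~ sex_factor)
print(dummies)
```

With the factor version, a model estimates separate shifts for females and infants relative to males, at the cost of no longer having a single dichotomous term.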
table(is.na(dt))
##
## FALSE
## 37593
summary(dt)
## Sex Length Diameter Height
## Min. :0.0000 Min. :0.075 Min. :0.0550 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.450 1st Qu.:0.3500 1st Qu.:0.1150
## Median :1.0000 Median :0.545 Median :0.4250 Median :0.1400
## Mean :0.6342 Mean :0.524 Mean :0.4079 Mean :0.1395
## 3rd Qu.:1.0000 3rd Qu.:0.615 3rd Qu.:0.4800 3rd Qu.:0.1650
## Max. :1.0000 Max. :0.815 Max. :0.6500 Max. :1.1300
## Whole weight Shucked weight Viscera weight Shell weight
## Min. :0.0020 Min. :0.0010 Min. :0.0005 Min. :0.0015
## 1st Qu.:0.4415 1st Qu.:0.1860 1st Qu.:0.0935 1st Qu.:0.1300
## Median :0.7995 Median :0.3360 Median :0.1710 Median :0.2340
## Mean :0.8287 Mean :0.3594 Mean :0.1806 Mean :0.2388
## 3rd Qu.:1.1530 3rd Qu.:0.5020 3rd Qu.:0.2530 3rd Qu.:0.3290
## Max. :2.8255 Max. :1.4880 Max. :0.7600 Max. :1.0050
## Rings
## Min. : 1.000
## 1st Qu.: 8.000
## Median : 9.000
## Mean : 9.934
## 3rd Qu.:11.000
## Max. :29.000
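The summary above reports a minimum `Height` of 0 and a maximum of 1.13, far above the third quartile of 0.165, so a few rows may deserve a closer look before modelling. A minimal sketch of such a check follows. The data frame `toy` holds made-up values for illustration only, and the cutoff of three interquartile ranges above the third quartile is an arbitrary rule of thumb, not something used in this write-up.

```r
# Toy data: values are made up for illustration, not taken from the abalone file
toy <- data.frame(
  Height = c(0.095, 0.000, 0.135, 1.130, 0.080),
  Rings  = c(15, 8, 9, 8, 7)
)

# A height of zero (or less) is physically implausible
zero_rows <- which(toy$Height <= 0)

# Flag values more than 3 IQRs above the third quartile (arbitrary cutoff)
q <- quantile(toy$Height, c(0.25, 0.75))
upper <- unname(q[2] + 3 * (q[2] - q[1]))
high_rows <- which(toy$Height > upper)

print(zero_rows)
print(high_rows)
```

On the real data the same two lines would be applied to `dt$Height`; whether flagged rows are removed or kept is a modelling decision.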
knitr::kable(head(dt))
Sex | Length | Diameter | Height | Whole weight | Shucked weight | Viscera weight | Shell weight | Rings |
---|---|---|---|---|---|---|---|---|
0 | 0.455 | 0.365 | 0.095 | 0.5140 | 0.2245 | 0.1010 | 0.150 | 15 |
0 | 0.350 | 0.265 | 0.090 | 0.2255 | 0.0995 | 0.0485 | 0.070 | 7 |
1 | 0.530 | 0.420 | 0.135 | 0.6770 | 0.2565 | 0.1415 | 0.210 | 9 |
0 | 0.440 | 0.365 | 0.125 | 0.5160 | 0.2155 | 0.1140 | 0.155 | 10 |
1 | 0.330 | 0.255 | 0.080 | 0.2050 | 0.0895 | 0.0395 | 0.055 | 7 |
1 | 0.425 | 0.300 | 0.095 | 0.3515 | 0.1410 | 0.0775 | 0.120 | 8 |
pairs(dt, gap = 1)
We will build a model to predict `Rings` from the explanatory variables.
attach(dt)
lm <- lm(Rings ~ Sex + Length + Diameter + Height + `Whole weight` + `Shucked weight` + `Viscera weight` + `Shell weight`)
summary(lm)
##
## Call:
## lm(formula = Rings ~ Sex + Length + Diameter + Height + `Whole weight` +
## `Shucked weight` + `Viscera weight` + `Shell weight`)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.9064 -1.3447 -0.3859 0.8984 13.9779
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.32279 0.27600 12.039 < 2e-16 ***
## Sex -0.38289 0.07348 -5.211 1.97e-07 ***
## Length -1.33763 1.81962 -0.735 0.462
## Diameter 13.02142 2.23104 5.836 5.74e-09 ***
## Height 11.69866 1.54349 7.579 4.25e-14 ***
## `Whole weight` 9.19536 0.73043 12.589 < 2e-16 ***
## `Shucked weight` -20.31284 0.82096 -24.743 < 2e-16 ***
## `Viscera weight` -9.79915 1.29995 -7.538 5.82e-14 ***
## `Shell weight` 8.62593 1.13323 7.612 3.32e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.211 on 4168 degrees of freedom
## Multiple R-squared: 0.5307, Adjusted R-squared: 0.5298
## F-statistic: 589.1 on 8 and 4168 DF, p-value: < 2.2e-16
We will perform backward elimination, starting with `Length`, which shows the highest p-value (0.462).
lm <- update(lm, .~. -Length)
summary(lm)
##
## Call:
## lm(formula = Rings ~ Sex + Diameter + Height + `Whole weight` +
## `Shucked weight` + `Viscera weight` + `Shell weight`)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.8630 -1.3419 -0.3845 0.8962 14.0207
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.24776 0.25642 12.666 < 2e-16 ***
## Sex -0.38422 0.07345 -5.231 1.77e-07 ***
## Diameter 11.55128 0.98894 11.680 < 2e-16 ***
## Height 11.66746 1.54282 7.562 4.84e-14 ***
## `Whole weight` 9.20263 0.73032 12.601 < 2e-16 ***
## `Shucked weight` -20.36181 0.81821 -24.886 < 2e-16 ***
## `Viscera weight` -9.88548 1.29456 -7.636 2.76e-14 ***
## `Shell weight` 8.65156 1.13263 7.638 2.71e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.211 on 4169 degrees of freedom
## Multiple R-squared: 0.5306, Adjusted R-squared: 0.5298
## F-statistic: 673.3 on 7 and 4169 DF, p-value: < 2.2e-16
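The manual `update()` step above can be wrapped in a loop. The sketch below is one possible way to automate p-value-based backward elimination. It uses the built-in `mtcars` data as a stand-in, the function name `backward_eliminate` is hypothetical, and it assumes a model with an intercept and numeric predictors only, so that coefficient names match term names.

```r
# Repeatedly drop the predictor with the largest p-value
# until every remaining p-value is below alpha.
backward_eliminate <- function(fit, alpha = 0.05) {
  repeat {
    coefs <- summary(fit)$coefficients
    # p-values of all terms except the intercept, keeping their names
    p <- setNames(coefs[-1, 4], rownames(coefs)[-1])
    if (length(p) == 0 || max(p) < alpha) break
    worst <- names(p)[which.max(p)]
    fit <- update(fit, as.formula(paste(". ~ . -", worst)))
  }
  fit
}

# Stand-in example on built-in data (not the abalone data)
full  <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
final <- backward_eliminate(full)
print(formula(final))
```

Automating the loop makes the stopping rule explicit, which helps keep the threshold consistent across models.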
Since every remaining predictor has a p-value below our predetermined threshold of 0.05, we stop the backward elimination process here. We have no reason to exclude any of the remaining predictors, so we include all of them in the final model:
\(\widehat{Rings} = 3.25 - 0.38\,Sex + 11.55\,Diameter + 11.67\,Height + 9.20\,WholeWeight - 20.36\,ShuckedWeight - 9.89\,VisceraWeight + 8.65\,ShellWeight\)
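As a worked example of how the fitted equation is used, the sketch below plugs the first row of `head(dt)` into the coefficients reported in the summary output. The vector names `b` and `x` are introduced here for illustration.

```r
# Coefficients copied from the summary output of the reduced model
b <- c(intercept = 3.24776, sex = -0.38422, diameter = 11.55128,
       height = 11.66746, whole = 9.20263, shucked = -20.36181,
       viscera = -9.88548, shell = 8.65156)

# First row of head(dt): a male abalone (Sex == 0); the leading 1 is for the intercept
x <- c(1, 0, 0.365, 0.095, 0.514, 0.2245, 0.101, 0.15)

pred_rings <- sum(b * x)
round(pred_rings, 2)   # about 9.03 rings; the observed value for this row is 15
pred_rings + 1.5       # implied age in years
```

The gap between the prediction and the observed 15 rings for this row illustrates how large individual errors can be.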
The residual degrees of freedom increase by one (from 4168 to 4169) when `Length` is dropped, as expected, since the data contain no missing values.
The adjusted R-squared did not change (0.5298 in both models). At about 0.53 it is only moderate, so we should not read too much into the model.
Let’s build a model that has at least one quadratic term, one dichotomous term, and one dichotomous vs. quantitative interaction term. `Sex`, as recoded above, is a dichotomous variable. Let’s look at the quantitative variables one by one to find a suitable quadratic term.
lm.diameter <- lm(Rings ~ Diameter)
summary(lm.diameter)
##
## Call:
## lm(formula = Rings ~ Diameter)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.1868 -1.6932 -0.7200 0.9066 15.9999
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.3186 0.1727 13.42 <2e-16 ***
## Diameter 18.6699 0.4115 45.37 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.639 on 4175 degrees of freedom
## Multiple R-squared: 0.3302, Adjusted R-squared: 0.3301
## F-statistic: 2059 on 1 and 4175 DF, p-value: < 2.2e-16
lm.height <- lm(Rings ~ Height)
summary(lm.height)
##
## Call:
## lm(formula = Rings ~ Height)
##
## Residuals:
## Min 1Q Median 3Q Max
## -44.496 -1.657 -0.607 0.839 17.112
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.9385 0.1443 27.30 <2e-16 ***
## Height 42.9714 0.9904 43.39 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.677 on 4175 degrees of freedom
## Multiple R-squared: 0.3108, Adjusted R-squared: 0.3106
## F-statistic: 1882 on 1 and 4175 DF, p-value: < 2.2e-16
lm.whole.weight <- lm(Rings ~ `Whole weight`)
summary(lm.whole.weight)
##
## Call:
## lm(formula = Rings ~ `Whole weight`)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.2693 -1.7518 -0.6874 1.0177 15.7029
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.98924 0.08244 84.78 <2e-16 ***
## `Whole weight` 3.55291 0.08562 41.50 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.713 on 4175 degrees of freedom
## Multiple R-squared: 0.292, Adjusted R-squared: 0.2919
## F-statistic: 1722 on 1 and 4175 DF, p-value: < 2.2e-16
lm.shucked <- lm(Rings ~ `Shucked weight`)
summary(lm.shucked)
##
## Call:
## lm(formula = Rings ~ `Shucked weight`)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.7428 -1.8756 -0.7878 1.0253 17.2795
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.73664 0.08613 89.82 <2e-16 ***
## `Shucked weight` 6.11363 0.20393 29.98 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.925 on 4175 degrees of freedom
## Multiple R-squared: 0.1771, Adjusted R-squared: 0.1769
## F-statistic: 898.8 on 1 and 4175 DF, p-value: < 2.2e-16
lm.viscera <- lm(Rings ~ `Viscera weight`)
summary(lm.viscera)
##
## Call:
## lm(formula = Rings ~ `Viscera weight`)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.5200 -1.7622 -0.7097 1.0310 16.9782
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.25743 0.08307 87.37 <2e-16 ***
## `Viscera weight` 14.81923 0.39322 37.69 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.785 on 4175 degrees of freedom
## Multiple R-squared: 0.2538, Adjusted R-squared: 0.2537
## F-statistic: 1420 on 1 and 4175 DF, p-value: < 2.2e-16
lm.shell <- lm(Rings~`Shell weight`)
summary(lm.shell)
##
## Call:
## lm(formula = Rings ~ `Shell weight`)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.9830 -1.6005 -0.5843 0.9390 15.6334
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.46212 0.07715 83.76 <2e-16 ***
## `Shell weight` 14.53568 0.27908 52.08 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.51 on 4175 degrees of freedom
## Multiple R-squared: 0.3938, Adjusted R-squared: 0.3937
## F-statistic: 2713 on 1 and 4175 DF, p-value: < 2.2e-16
`Shell weight` gives the highest adjusted R-squared (0.3937) of the single-predictor models.
For the quadratic term, we could assume that the heavier an abalone's shell is, the older the abalone is. Let’s choose `Shell weight` and square it to obtain our quadratic term.
For the dichotomous vs. quantitative interaction, I will multiply the one dichotomous variable, `Sex`, by `Shell weight` to see the interaction between them.
# Quadratic term: shell weight squared
dt$q1 <- dt$`Shell weight`^2
# Dichotomous vs. quantitative interaction: Sex times shell weight
dt$q2 <- dt$Sex * dt$`Shell weight`
lm <- lm(Rings ~ Sex + Length + Diameter + Height + `Whole weight` + `Shucked weight` + `Viscera weight` + `Shell weight` + q1 + q2, data = dt)
summary(lm)
##
## Call:
## lm(formula = Rings ~ Sex + Length + Diameter + Height + `Whole weight` +
## `Shucked weight` + `Viscera weight` + `Shell weight` + q1 +
## q2, data = dt)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.0817 -1.3316 -0.3506 0.9214 14.0725
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.1473 0.3425 15.030 < 2e-16 ***
## Sex -0.5857 0.1589 -3.687 0.00023 ***
## Length -4.5676 1.8259 -2.502 0.01240 *
## Diameter 7.2304 2.2763 3.176 0.00150 **
## Height 9.3971 1.5421 6.094 1.2e-09 ***
## `Whole weight` 9.7878 0.7241 13.518 < 2e-16 ***
## `Shucked weight` -19.8969 0.8134 -24.460 < 2e-16 ***
## `Viscera weight` -10.6837 1.2874 -8.299 < 2e-16 ***
## `Shell weight` 22.9485 1.9359 11.854 < 2e-16 ***
## q1 -17.9504 1.8347 -9.784 < 2e-16 ***
## q2 1.1057 0.5421 2.040 0.04145 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.184 on 4166 degrees of freedom
## Multiple R-squared: 0.5421, Adjusted R-squared: 0.541
## F-statistic: 493.3 on 10 and 4166 DF, p-value: < 2.2e-16
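The hand-built columns `q1` and `q2` can also be written directly in the model formula with `I()` and `:`. The sketch below checks on six rows (the `Sex` and `Shell weight` values from `head(dt)`, with the column renamed `shell` for brevity) that both approaches produce the same design-matrix columns. The data frame `d` and matrix `X` are names introduced for this illustration.

```r
# First six rows of head(dt), after the 0/1 recoding of Sex
d <- data.frame(
  Sex   = c(0, 0, 1, 0, 1, 1),
  shell = c(0.15, 0.07, 0.21, 0.155, 0.055, 0.12)
)

# Manual columns, built the same way as q1 and q2
d$q1 <- d$shell^2
d$q2 <- d$Sex * d$shell

# The same terms expressed inside the formula
X <- model.matrix(~ Sex + shell + I(shell^2) + Sex:shell, data = d)

all.equal(unname(X[, "I(shell^2)"]), d$q1)
all.equal(unname(X[, "Sex:shell"]), d$q2)
```

In the abalone model, the equivalent terms would be written as ``I(`Shell weight`^2)`` and ``Sex:`Shell weight` ``, which keeps the formula self-describing in the summary output.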
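We again perform backward elimination on this updated model, removing the term with the highest p-value at each step: first `q2` (p = 0.041), then `Length` (p = 0.013), then `Diameter` (p = 0.034). Each of these was already below 0.05, so this pass is stricter than the 0.05 threshold used for the first model, and it removes the interaction term from the final model.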
lm <- update(lm, .~. -q2)
summary(lm)
##
## Call:
## lm(formula = Rings ~ Sex + Length + Diameter + Height + `Whole weight` +
## `Shucked weight` + `Viscera weight` + `Shell weight` + q1,
## data = dt)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.1381 -1.3308 -0.3428 0.9180 14.1754
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.86518 0.31341 15.524 < 2e-16 ***
## Sex -0.29804 0.07311 -4.076 4.66e-05 ***
## Length -4.52567 1.82647 -2.478 0.01326 *
## Diameter 7.40277 2.27557 3.253 0.00115 **
## Height 9.40414 1.54267 6.096 1.19e-09 ***
## `Whole weight` 9.78598 0.72433 13.510 < 2e-16 ***
## `Shucked weight` -20.00633 0.81197 -24.639 < 2e-16 ***
## `Viscera weight` -10.61075 1.28736 -8.242 2.24e-16 ***
## `Shell weight` 23.82324 1.88849 12.615 < 2e-16 ***
## q1 -18.27535 1.82847 -9.995 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.185 on 4167 degrees of freedom
## Multiple R-squared: 0.5417, Adjusted R-squared: 0.5407
## F-statistic: 547.2 on 9 and 4167 DF, p-value: < 2.2e-16
lm <- update(lm, .~. -Length)
summary(lm)
##
## Call:
## lm(formula = Rings ~ Sex + Diameter + Height + `Whole weight` +
## `Shucked weight` + `Viscera weight` + `Shell weight` + q1,
## data = dt)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.0586 -1.3387 -0.3601 0.9177 14.3073
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.55230 0.28702 15.861 < 2e-16 ***
## Sex -0.30609 0.07309 -4.188 2.87e-05 ***
## Diameter 2.82376 1.32863 2.125 0.0336 *
## Height 9.40112 1.54362 6.090 1.23e-09 ***
## `Whole weight` 9.78426 0.72477 13.500 < 2e-16 ***
## `Shucked weight` -20.18021 0.80943 -24.931 < 2e-16 ***
## `Viscera weight` -10.85881 1.28425 -8.455 < 2e-16 ***
## `Shell weight` 23.24936 1.87539 12.397 < 2e-16 ***
## q1 -17.48415 1.80148 -9.705 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.186 on 4168 degrees of freedom
## Multiple R-squared: 0.541, Adjusted R-squared: 0.5401
## F-statistic: 614.1 on 8 and 4168 DF, p-value: < 2.2e-16
lm <- update(lm, .~. -Diameter)
summary(lm)
##
## Call:
## lm(formula = Rings ~ Sex + Height + `Whole weight` + `Shucked weight` +
## `Viscera weight` + `Shell weight` + q1, data = dt)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.2094 -1.3343 -0.3540 0.9182 14.3959
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.06787 0.15347 33.022 < 2e-16 ***
## Sex -0.29632 0.07297 -4.061 4.98e-05 ***
## Height 9.82084 1.53158 6.412 1.59e-10 ***
## `Whole weight` 9.90165 0.72297 13.696 < 2e-16 ***
## `Shucked weight` -19.99672 0.80515 -24.836 < 2e-16 ***
## `Viscera weight` -10.79746 1.28447 -8.406 < 2e-16 ***
## `Shell weight` 25.74126 1.46429 17.579 < 2e-16 ***
## q1 -20.07549 1.32672 -15.132 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.187 on 4169 degrees of freedom
## Multiple R-squared: 0.5405, Adjusted R-squared: 0.5397
## F-statistic: 700.6 on 7 and 4169 DF, p-value: < 2.2e-16
We stop the backward elimination process here: every remaining term has a p-value far below our predetermined threshold of 0.05.
Our model equation is:
\(\widehat{Rings} = 5.07 - 0.30\,Sex + 9.82\,Height + 9.90\,WholeWeight - 20.00\,ShuckedWeight - 10.80\,VisceraWeight + 25.74\,ShellWeight - 20.08\,ShellWeight^2\)
Based on the adjusted \(R^2\) value, the model explains 53.97% of the variability in `Rings`.
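The adjusted value can be recovered from the multiple R-squared in the summary output with the usual formula, 1 - (1 - R²)(n - 1)/(n - p - 1). The short sketch below does this for the final model; `n`, `p`, and `r2` are read off the output above.

```r
n  <- 4177     # observations
p  <- 7        # predictors in the final model (residual df = n - p - 1 = 4169)
r2 <- 0.5405   # multiple R-squared from the summary output

adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
round(adj_r2, 4)   # 0.5397, matching the reported adjusted R-squared
```

With 4177 observations and only seven predictors, the adjustment is tiny, which is why the two values differ only in the third decimal place.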
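Each coefficient is the predicted change in `Rings` for a one-unit increase in that predictor, holding the other predictors fixed. The measurements are scaled (divided by 200), so one unit is a large change.
`Height`: a one-unit increase in height predicts 9.82 more rings.
`Whole weight`: a one-unit increase in whole weight predicts 9.90 more rings.
`Shell weight`: a one-unit increase in shell weight predicts 25.74 more rings, before accounting for the quadratic term.
`Sex`: abalone coded 1 (female or infant) are predicted to have 0.30 fewer rings than males (`Sex == 0`).
`Shucked weight`: a one-unit increase in shucked weight predicts 20.00 fewer rings.
`Viscera weight`: a one-unit increase in viscera weight predicts 10.80 fewer rings.
`q1` (`Shell weight` squared): the coefficient of -20.08 means the positive effect of shell weight weakens as shell weight grows. This is the quadratic term, not a `Sex` interaction; the interaction `q2` was removed during elimination.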
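Because shell weight enters the final model both linearly and squared, its effect depends on its current value. The sketch below computes the marginal effect, \(b_1 + 2 b_2 s\), at the shell-weight quartiles from `summary(dt)` and the shell weight at which the fitted curve turns over. The function name `marginal_effect` is introduced here for illustration.

```r
b_shell    <- 25.74126    # coefficient of `Shell weight` in the final model
b_shell_sq <- -20.07549   # coefficient of q1 (shell weight squared)

# Change in predicted rings per unit of shell weight, evaluated at s
marginal_effect <- function(s) b_shell + 2 * b_shell_sq * s

# Evaluate at the quartiles of shell weight reported by summary(dt)
marginal_effect(c(0.130, 0.234, 0.329))

# Shell weight at which the marginal effect reaches zero
turning_point <- -b_shell / (2 * b_shell_sq)
turning_point   # about 0.641 on the scaled shell weight
```

The third quartile of shell weight is 0.329, so for most of the data the fitted effect of shell weight stays positive and simply flattens as shells get heavier.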
# correlation matrix of the response and the terms kept in the final model
M <- subset(dt, select = c(Rings, Sex, Height, `Whole weight`, `Shucked weight`, `Viscera weight`, `Shell weight`, `q1`))
corrplot(cor(M), method="number")
While every remaining coefficient is statistically significant, the model explains only about 54% of the variability in `Rings`, so I would say that it is not a great fit and there is room for improvement.
ggplot(dt) +
geom_point(aes(Rings, lm$residuals)) +
geom_hline(yintercept=0, color='blue')
plot(fitted(lm),resid(lm))
qqnorm(resid(lm))
qqline(resid(lm))
hist(lm$residuals)
par(mfrow = c(2,2))
plot(lm)
Based on the Residuals vs. Fitted plot, there are some outliers in the data. In the normal Q-Q plot, the residuals only roughly follow the reference line. We can also see a pattern and some obvious nonlinearities, which makes us more cautious about concluding that the residuals are normally distributed. We should not necessarily reject the model based on this check alone, but the results should serve as a reminder that the model is imperfect.
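The visual checks above can be supplemented with a few numeric summaries. The sketch below is a minimal example on the built-in `cars` data, used only as a stand-in for the abalone model. The cutoff of 3 for standardized residuals and the correlation-based spread check are informal conventions chosen for this illustration, not formal tests of the model in this write-up.

```r
# Stand-in model on built-in data (not the abalone model)
fit <- lm(dist ~ speed, data = cars)

res  <- resid(fit)       # raw residuals
rstd <- rstandard(fit)   # standardized residuals

# Observations whose standardized residual exceeds 3 in absolute value
outliers <- which(abs(rstd) > 3)

# Shapiro-Wilk test of normality (accepts between 3 and 5000 values)
sw <- shapiro.test(res)

# Informal check for non-constant variance:
# do the absolute residuals grow with the fitted values?
spread <- cor(abs(res), fitted(fit))

print(outliers)
print(sw$p.value)
print(spread)
```

For the abalone model, `fit` would be replaced by the final `lm` object; with 4177 observations, even small departures from normality tend to produce small Shapiro-Wilk p-values, so the plots remain the main evidence.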