Discussion week12

Week 12, Regression 2

Fundamentals of Computational Mathematics

CUNY MSDS DATA 605, Fall 2018

Rose Koh

11/12/2018

Using R, build a multiple regression model for data that interests you. Include in this model at least one quadratic term, one dichotomous term, and one dichotomous vs. quantitative interaction term. Interpret all coefficients. Conduct residual analysis. Was the linear model appropriate? Why or why not?

Data Description

Predicting the age of abalone from physical measurements. The age of abalone is determined by cutting the shell through the cone, staining it, and counting the number of rings through a microscope – a boring and time-consuming task. Other measurements, which are easier to obtain, are used to predict the age. Further information, such as weather patterns and location (hence food availability) may be required to solve the problem.

From the original data examples with missing values were removed (the majority having the predicted value missing), and the ranges of the continuous values have been scaled for use with an ANN (by dividing by 200).

Name            Data Type   Meas.   Description
----            ---------   -----   -----------
Sex             nominal             M, F, and I (infant)
Length          continuous  mm      Longest shell measurement
Diameter        continuous  mm      perpendicular to length
Height          continuous  mm      with meat in shell
Whole weight    continuous  grams   whole abalone
Shucked weight  continuous  grams   weight of meat
Viscera weight  continuous  grams   gut weight (after bleeding)
Shell weight    continuous  grams   after being dried
Rings           integer             +1.5 gives the age in years

Since `Rings` + 1.5 gives the age in years, we will use `Rings` as the response variable.
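
As a quick illustration of the two conversions described above (the division by 200 and the +1.5 offset), here is a small sketch using the values of the first abalone in the file (Length 0.455, Rings 15). The variable names are illustrative and not part of the original analysis.

```r
# Assumption: continuous columns were divided by 200, as the data description states.
scale_factor <- 200
length_scaled <- 0.455   # scaled Length of the first abalone
rings <- 15              # Rings of the first abalone

length_mm <- length_scaled * scale_factor  # back to millimetres
age_years <- rings + 1.5                   # age implied by the ring count

print(c(length_mm = length_mm, age_years = age_years))
```

Keeping this scaling in mind helps later on: a regression coefficient on a scaled column describes the change in `Rings` per 200 original units.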

Load Data

if (!require(RCurl)) install.packages("RCurl")
if (!require(plyr)) install.packages("plyr")
if (!require(data.table)) install.packages("data.table")
if (!require(corrplot)) install.packages("corrplot")
if (!require(ggplot2)) install.packages("ggplot2")
# readlines of original dataset from the url directly
dt.original <- readLines("http://mlr.cs.umass.edu/ml/machine-learning-databases/abalone/abalone.data")

# keep the original copy and make a data frame as dt
dt <- as.data.frame(dt.original, stringsAsFactors = FALSE)

# split each line on the separator ","
# (the single column is named dt.original; use the full name rather than partial matching)
dt <- strsplit(dt$dt.original, ",")

#combine lists elements into a dataframe and also set the frame as data table
dt <- ldply(dt)
dt <- as.data.table(dt)

n <- readLines("http://mlr.cs.umass.edu/ml/machine-learning-databases/abalone/abalone.names")
n <- n[89:97]

#create for loop to change the name
l <- list(NULL)
for(line in 1:length(n)){
    lines <- unlist(strsplit(n[line], '\t'))[2]
    l[line] <- lines
    names <- unlist(l)
}

#set dt names
colnames(dt) <- names
# check the data
head(dt)
##    Sex Length Diameter Height Whole weight Shucked weight Viscera weight
## 1:   M  0.455    0.365  0.095        0.514         0.2245          0.101
## 2:   M   0.35    0.265   0.09       0.2255         0.0995         0.0485
## 3:   F   0.53     0.42  0.135        0.677         0.2565         0.1415
## 4:   M   0.44    0.365  0.125        0.516         0.2155          0.114
## 5:   I   0.33    0.255   0.08        0.205         0.0895         0.0395
## 6:   I  0.425      0.3  0.095       0.3515          0.141         0.0775
##    Shell weight Rings
## 1:         0.15    15
## 2:         0.07     7
## 3:         0.21     9
## 4:        0.155    10
## 5:        0.055     7
## 6:         0.12     8
dim(dt)
## [1] 4177    9
str(dt)
## Classes 'data.table' and 'data.frame':   4177 obs. of  9 variables:
##  $ Sex           : chr  "M" "M" "F" "M" ...
##  $ Length        : chr  "0.455" "0.35" "0.53" "0.44" ...
##  $ Diameter      : chr  "0.365" "0.265" "0.42" "0.365" ...
##  $ Height        : chr  "0.095" "0.09" "0.135" "0.125" ...
##  $ Whole weight  : chr  "0.514" "0.2255" "0.677" "0.516" ...
##  $ Shucked weight: chr  "0.2245" "0.0995" "0.2565" "0.2155" ...
##  $ Viscera weight: chr  "0.101" "0.0485" "0.1415" "0.114" ...
##  $ Shell weight  : chr  "0.15" "0.07" "0.21" "0.155" ...
##  $ Rings         : chr  "15" "7" "9" "10" ...
##  - attr(*, ".internal.selfref")=<externalptr>
# change format
dt$Sex <- ifelse(dt$Sex == 'M', 0, 1) # Male == 0; Female or Infant == 1
dt[,2:9] <- lapply(dt[,2:9], function (x) as.numeric(as.character(x)))
# check the data again
str(dt)
## Classes 'data.table' and 'data.frame':   4177 obs. of  9 variables:
##  $ Sex           : num  0 0 1 0 1 1 1 1 0 1 ...
##  $ Length        : num  0.455 0.35 0.53 0.44 0.33 0.425 0.53 0.545 0.475 0.55 ...
##  $ Diameter      : num  0.365 0.265 0.42 0.365 0.255 0.3 0.415 0.425 0.37 0.44 ...
##  $ Height        : num  0.095 0.09 0.135 0.125 0.08 0.095 0.15 0.125 0.125 0.15 ...
##  $ Whole weight  : num  0.514 0.226 0.677 0.516 0.205 ...
##  $ Shucked weight: num  0.2245 0.0995 0.2565 0.2155 0.0895 ...
##  $ Viscera weight: num  0.101 0.0485 0.1415 0.114 0.0395 ...
##  $ Shell weight  : num  0.15 0.07 0.21 0.155 0.055 0.12 0.33 0.26 0.165 0.32 ...
##  $ Rings         : num  15 7 9 10 7 8 20 16 9 19 ...
##  - attr(*, ".internal.selfref")=<externalptr>
table(is.na(dt))
## 
## FALSE 
## 37593
summary(dt)
##       Sex             Length         Diameter          Height      
##  Min.   :0.0000   Min.   :0.075   Min.   :0.0550   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.450   1st Qu.:0.3500   1st Qu.:0.1150  
##  Median :1.0000   Median :0.545   Median :0.4250   Median :0.1400  
##  Mean   :0.6342   Mean   :0.524   Mean   :0.4079   Mean   :0.1395  
##  3rd Qu.:1.0000   3rd Qu.:0.615   3rd Qu.:0.4800   3rd Qu.:0.1650  
##  Max.   :1.0000   Max.   :0.815   Max.   :0.6500   Max.   :1.1300  
##   Whole weight    Shucked weight   Viscera weight    Shell weight   
##  Min.   :0.0020   Min.   :0.0010   Min.   :0.0005   Min.   :0.0015  
##  1st Qu.:0.4415   1st Qu.:0.1860   1st Qu.:0.0935   1st Qu.:0.1300  
##  Median :0.7995   Median :0.3360   Median :0.1710   Median :0.2340  
##  Mean   :0.8287   Mean   :0.3594   Mean   :0.1806   Mean   :0.2388  
##  3rd Qu.:1.1530   3rd Qu.:0.5020   3rd Qu.:0.2530   3rd Qu.:0.3290  
##  Max.   :2.8255   Max.   :1.4880   Max.   :0.7600   Max.   :1.0050  
##      Rings       
##  Min.   : 1.000  
##  1st Qu.: 8.000  
##  Median : 9.000  
##  Mean   : 9.934  
##  3rd Qu.:11.000  
##  Max.   :29.000
knitr::kable(head(dt))
Sex  Length  Diameter  Height  Whole weight  Shucked weight  Viscera weight  Shell weight  Rings
---  ------  --------  ------  ------------  --------------  --------------  ------------  -----
  0   0.455     0.365   0.095        0.5140          0.2245          0.1010         0.150     15
  0   0.350     0.265   0.090        0.2255          0.0995          0.0485         0.070      7
  1   0.530     0.420   0.135        0.6770          0.2565          0.1415         0.210      9
  0   0.440     0.365   0.125        0.5160          0.2155          0.1140         0.155     10
  1   0.330     0.255   0.080        0.2050          0.0895          0.0395         0.055      7
  1   0.425     0.300   0.095        0.3515          0.1410          0.0775         0.120      8
pairs(dt, gap = 1)
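
The loading and recoding steps above can also be sketched more compactly with `read.csv`. This is a minimal sketch, not the code used for the results in this post: it parses only the six rows printed by `head(dt)` so that it runs offline, and `NotMale` is a hypothetical name for the 0/1 indicator. Keeping `Sex` as a three-level factor next to the indicator makes it explicit that the indicator groups females and infants together.

```r
# The six rows shown by head(dt), in the raw comma-separated form of the file.
raw_lines <- c(
  "M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15",
  "M,0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7",
  "F,0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9",
  "M,0.44,0.365,0.125,0.516,0.2155,0.114,0.155,10",
  "I,0.33,0.255,0.08,0.205,0.0895,0.0395,0.055,7",
  "I,0.425,0.3,0.095,0.3515,0.141,0.0775,0.12,8"
)

col_names <- c("Sex", "Length", "Diameter", "Height", "Whole weight",
               "Shucked weight", "Viscera weight", "Shell weight", "Rings")

# Read the text directly, typing the columns up front instead of converting later.
abalone <- read.csv(text = raw_lines, header = FALSE,
                    colClasses = c("character", rep("numeric", 8)))
names(abalone) <- col_names

# Keep all three sex categories, and derive the dichotomous indicator separately.
abalone$Sex <- factor(abalone$Sex, levels = c("M", "F", "I"))
abalone$NotMale <- as.integer(abalone$Sex != "M")  # 0 = male, 1 = female or infant

str(abalone)
```

For the full data set, the same call could be pointed at the file URL via `file =` instead of `text =`.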

Build model

We will build a model to predict rings from the explanatory variables.

attach(dt)
lm <- lm(Rings ~ Sex + Length + Diameter + Height + `Whole weight` + `Shucked weight` + `Viscera weight` + `Shell weight`) 
summary(lm)
## 
## Call:
## lm(formula = Rings ~ Sex + Length + Diameter + Height + `Whole weight` + 
##     `Shucked weight` + `Viscera weight` + `Shell weight`)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.9064  -1.3447  -0.3859   0.8984  13.9779 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        3.32279    0.27600  12.039  < 2e-16 ***
## Sex               -0.38289    0.07348  -5.211 1.97e-07 ***
## Length            -1.33763    1.81962  -0.735    0.462    
## Diameter          13.02142    2.23104   5.836 5.74e-09 ***
## Height            11.69866    1.54349   7.579 4.25e-14 ***
## `Whole weight`     9.19536    0.73043  12.589  < 2e-16 ***
## `Shucked weight` -20.31284    0.82096 -24.743  < 2e-16 ***
## `Viscera weight`  -9.79915    1.29995  -7.538 5.82e-14 ***
## `Shell weight`     8.62593    1.13323   7.612 3.32e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.211 on 4168 degrees of freedom
## Multiple R-squared:  0.5307, Adjusted R-squared:  0.5298 
## F-statistic: 589.1 on 8 and 4168 DF,  p-value: < 2.2e-16

In the full model Length has the highest p-value (0.462), well above 0.05, so we remove it by backward elimination.

lm <- update(lm, .~. -Length)
summary(lm)
## 
## Call:
## lm(formula = Rings ~ Sex + Diameter + Height + `Whole weight` + 
##     `Shucked weight` + `Viscera weight` + `Shell weight`)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.8630  -1.3419  -0.3845   0.8962  14.0207 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        3.24776    0.25642  12.666  < 2e-16 ***
## Sex               -0.38422    0.07345  -5.231 1.77e-07 ***
## Diameter          11.55128    0.98894  11.680  < 2e-16 ***
## Height            11.66746    1.54282   7.562 4.84e-14 ***
## `Whole weight`     9.20263    0.73032  12.601  < 2e-16 ***
## `Shucked weight` -20.36181    0.81821 -24.886  < 2e-16 ***
## `Viscera weight`  -9.88548    1.29456  -7.636 2.76e-14 ***
## `Shell weight`     8.65156    1.13263   7.638 2.71e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.211 on 4169 degrees of freedom
## Multiple R-squared:  0.5306, Adjusted R-squared:  0.5298 
## F-statistic: 673.3 on 7 and 4169 DF,  p-value: < 2.2e-16

After removing Length, every remaining predictor has a p-value below our predetermined threshold of 0.05, so we stop the backward elimination process and keep all remaining predictors in the final model:

\(\widehat{Rings} = 3.25 - 0.38Sex + 11.55Diameter + 11.67Height + 9.20WholeWeight - 20.36ShuckedWeight - 9.89VisceraWeight + 8.65ShellWeight\)

The residual degrees of freedom rise from 4168 to 4169, as expected: removing one predictor frees exactly one degree of freedom, and no observations were dropped because the data have no missing values.

The adjusted R-squared did not change (0.5298 in both models). At about 0.53 it is modest: roughly half of the variation in Rings remains unexplained, so we should not read too much into the model.
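
To see where the degrees of freedom and the adjusted R-squared come from, the sketch below recomputes both from the values printed in the two summaries above (n = 4177 observations; 8 and then 7 predictors). The helper names are illustrative, and because the inputs are the rounded R-squared values from the output, the results are only accurate to about four decimals.

```r
# Residual degrees of freedom: observations minus predictors minus the intercept.
residual_df <- function(n, p) n - p - 1

# Adjusted R-squared penalises R-squared for the number of predictors p.
adjusted_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

n <- 4177
full_model    <- c(df = residual_df(n, 8), adj_r2 = adjusted_r2(0.5307, n, 8))
reduced_model <- c(df = residual_df(n, 7), adj_r2 = adjusted_r2(0.5306, n, 7))

print(round(rbind(full_model, reduced_model), 4))
```

Dropping Length lowers R-squared slightly but also removes one unit of penalty, which is why the adjusted value stays the same to four decimals.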

Let’s build a model that has at least one quadratic term, one dichotomous term, and one dichotomous vs. quantitative interaction term.

Sex, as recoded above (0 for male, 1 otherwise), is our dichotomous variable. To find a suitable quadratic term, let’s fit each quantitative variable on its own.

lm.diameter <- lm(Rings ~ Diameter)
summary(lm.diameter)
## 
## Call:
## lm(formula = Rings ~ Diameter)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.1868 -1.6932 -0.7200  0.9066 15.9999 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.3186     0.1727   13.42   <2e-16 ***
## Diameter     18.6699     0.4115   45.37   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.639 on 4175 degrees of freedom
## Multiple R-squared:  0.3302, Adjusted R-squared:  0.3301 
## F-statistic:  2059 on 1 and 4175 DF,  p-value: < 2.2e-16
lm.height <- lm(Rings ~ Height)
summary(lm.height)
## 
## Call:
## lm(formula = Rings ~ Height)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -44.496  -1.657  -0.607   0.839  17.112 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.9385     0.1443   27.30   <2e-16 ***
## Height       42.9714     0.9904   43.39   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.677 on 4175 degrees of freedom
## Multiple R-squared:  0.3108, Adjusted R-squared:  0.3106 
## F-statistic:  1882 on 1 and 4175 DF,  p-value: < 2.2e-16
lm.whole.weight <- lm(Rings ~ `Whole weight`)
summary(lm.whole.weight)
## 
## Call:
## lm(formula = Rings ~ `Whole weight`)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.2693 -1.7518 -0.6874  1.0177 15.7029 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     6.98924    0.08244   84.78   <2e-16 ***
## `Whole weight`  3.55291    0.08562   41.50   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.713 on 4175 degrees of freedom
## Multiple R-squared:  0.292,  Adjusted R-squared:  0.2919 
## F-statistic:  1722 on 1 and 4175 DF,  p-value: < 2.2e-16
lm.shucked <- lm(Rings ~ `Shucked weight`)
summary(lm.shucked)
## 
## Call:
## lm(formula = Rings ~ `Shucked weight`)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.7428 -1.8756 -0.7878  1.0253 17.2795 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       7.73664    0.08613   89.82   <2e-16 ***
## `Shucked weight`  6.11363    0.20393   29.98   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.925 on 4175 degrees of freedom
## Multiple R-squared:  0.1771, Adjusted R-squared:  0.1769 
## F-statistic: 898.8 on 1 and 4175 DF,  p-value: < 2.2e-16
lm.viscera <- lm(Rings ~ `Viscera weight`)
summary(lm.viscera)
## 
## Call:
## lm(formula = Rings ~ `Viscera weight`)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.5200 -1.7622 -0.7097  1.0310 16.9782 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       7.25743    0.08307   87.37   <2e-16 ***
## `Viscera weight` 14.81923    0.39322   37.69   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.785 on 4175 degrees of freedom
## Multiple R-squared:  0.2538, Adjusted R-squared:  0.2537 
## F-statistic:  1420 on 1 and 4175 DF,  p-value: < 2.2e-16
lm.shell <- lm(Rings~`Shell weight`)
summary(lm.shell)
## 
## Call:
## lm(formula = Rings ~ `Shell weight`)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.9830 -1.6005 -0.5843  0.9390 15.6334 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     6.46212    0.07715   83.76   <2e-16 ***
## `Shell weight` 14.53568    0.27908   52.08   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.51 on 4175 degrees of freedom
## Multiple R-squared:  0.3938, Adjusted R-squared:  0.3937 
## F-statistic:  2713 on 1 and 4175 DF,  p-value: < 2.2e-16

Shell weight gives the highest adjusted R-squared (0.3937) of the single-predictor models.

For the quadratic term, we could assume that the heavier abalones are, the older they are. Let’s choose Shell weight and square it to obtain our quadratic term.

For the dichotomous vs. quantitative interaction, I will multiply the one dichotomous variable, Sex, by Shell weight to see the interaction between them.

# Quadratic variable
dt$q1 <- dt$`Shell weight`^2
# Dichotomous vs. quantitative interaction
dt$q2 <- dt$Sex * dt$`Shell weight`

lm <- lm(Rings ~ Sex + Length + Diameter + Height + `Whole weight` + `Shucked weight` + `Viscera weight` + `Shell weight` + q1 + q2, data = dt) 
summary(lm)
## 
## Call:
## lm(formula = Rings ~ Sex + Length + Diameter + Height + `Whole weight` + 
##     `Shucked weight` + `Viscera weight` + `Shell weight` + q1 + 
##     q2, data = dt)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.0817 -1.3316 -0.3506  0.9214 14.0725 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        5.1473     0.3425  15.030  < 2e-16 ***
## Sex               -0.5857     0.1589  -3.687  0.00023 ***
## Length            -4.5676     1.8259  -2.502  0.01240 *  
## Diameter           7.2304     2.2763   3.176  0.00150 ** 
## Height             9.3971     1.5421   6.094  1.2e-09 ***
## `Whole weight`     9.7878     0.7241  13.518  < 2e-16 ***
## `Shucked weight` -19.8969     0.8134 -24.460  < 2e-16 ***
## `Viscera weight` -10.6837     1.2874  -8.299  < 2e-16 ***
## `Shell weight`    22.9485     1.9359  11.854  < 2e-16 ***
## q1               -17.9504     1.8347  -9.784  < 2e-16 ***
## q2                 1.1057     0.5421   2.040  0.04145 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.184 on 4166 degrees of freedom
## Multiple R-squared:  0.5421, Adjusted R-squared:  0.541 
## F-statistic: 493.3 on 10 and 4166 DF,  p-value: < 2.2e-16
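
The hand-built columns `q1` and `q2` can also be expressed directly in the model formula, using `I()` for the squared term and `:` for the interaction. The sketch below checks on simulated data that both spellings give the same fit. The data frame `sim`, its columns, and the coefficients used to simulate `y` are made up for illustration and have nothing to do with the abalone data.

```r
set.seed(42)
n <- 200

# Simulated data: a 0/1 group indicator and one quantitative predictor w.
sim <- data.frame(group = rbinom(n, 1, 0.5), w = runif(n))
sim$y <- 5 + 0.5 * sim$group + 20 * sim$w - 15 * sim$w^2 +
  1 * sim$group * sim$w + rnorm(n)

# Option 1: build the extra columns by hand, as done above with q1 and q2.
sim$q1 <- sim$w^2
sim$q2 <- sim$group * sim$w
fit_manual <- lm(y ~ group + w + q1 + q2, data = sim)

# Option 2: let the formula create the same terms.
fit_inline <- lm(y ~ group + w + I(w^2) + group:w, data = sim)

# The coefficient names differ, but the estimates agree.
print(cbind(manual = unname(coef(fit_manual)), inline = unname(coef(fit_inline))))
```

The formula version keeps the derived terms tied to their source column, which is convenient when predicting on new data.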

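We apply backward elimination to this updated model as well, each time removing the term with the highest p-value. The terms removed below, q2 (p = 0.041), Length (p = 0.013) and Diameter (p = 0.034), are all already below 0.05, so this pass effectively uses a stricter cutoff of about 0.01. Removing q2 also drops the interaction term from the final model.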

lm <- update(lm, .~. -q2)
summary(lm)
## 
## Call:
## lm(formula = Rings ~ Sex + Length + Diameter + Height + `Whole weight` + 
##     `Shucked weight` + `Viscera weight` + `Shell weight` + q1, 
##     data = dt)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.1381 -1.3308 -0.3428  0.9180 14.1754 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        4.86518    0.31341  15.524  < 2e-16 ***
## Sex               -0.29804    0.07311  -4.076 4.66e-05 ***
## Length            -4.52567    1.82647  -2.478  0.01326 *  
## Diameter           7.40277    2.27557   3.253  0.00115 ** 
## Height             9.40414    1.54267   6.096 1.19e-09 ***
## `Whole weight`     9.78598    0.72433  13.510  < 2e-16 ***
## `Shucked weight` -20.00633    0.81197 -24.639  < 2e-16 ***
## `Viscera weight` -10.61075    1.28736  -8.242 2.24e-16 ***
## `Shell weight`    23.82324    1.88849  12.615  < 2e-16 ***
## q1               -18.27535    1.82847  -9.995  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.185 on 4167 degrees of freedom
## Multiple R-squared:  0.5417, Adjusted R-squared:  0.5407 
## F-statistic: 547.2 on 9 and 4167 DF,  p-value: < 2.2e-16
lm <- update(lm, .~. -Length)
summary(lm)
## 
## Call:
## lm(formula = Rings ~ Sex + Diameter + Height + `Whole weight` + 
##     `Shucked weight` + `Viscera weight` + `Shell weight` + q1, 
##     data = dt)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.0586 -1.3387 -0.3601  0.9177 14.3073 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        4.55230    0.28702  15.861  < 2e-16 ***
## Sex               -0.30609    0.07309  -4.188 2.87e-05 ***
## Diameter           2.82376    1.32863   2.125   0.0336 *  
## Height             9.40112    1.54362   6.090 1.23e-09 ***
## `Whole weight`     9.78426    0.72477  13.500  < 2e-16 ***
## `Shucked weight` -20.18021    0.80943 -24.931  < 2e-16 ***
## `Viscera weight` -10.85881    1.28425  -8.455  < 2e-16 ***
## `Shell weight`    23.24936    1.87539  12.397  < 2e-16 ***
## q1               -17.48415    1.80148  -9.705  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.186 on 4168 degrees of freedom
## Multiple R-squared:  0.541,  Adjusted R-squared:  0.5401 
## F-statistic: 614.1 on 8 and 4168 DF,  p-value: < 2.2e-16
lm <- update(lm, .~. -Diameter)
summary(lm)
## 
## Call:
## lm(formula = Rings ~ Sex + Height + `Whole weight` + `Shucked weight` + 
##     `Viscera weight` + `Shell weight` + q1, data = dt)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.2094 -1.3343 -0.3540  0.9182 14.3959 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        5.06787    0.15347  33.022  < 2e-16 ***
## Sex               -0.29632    0.07297  -4.061 4.98e-05 ***
## Height             9.82084    1.53158   6.412 1.59e-10 ***
## `Whole weight`     9.90165    0.72297  13.696  < 2e-16 ***
## `Shucked weight` -19.99672    0.80515 -24.836  < 2e-16 ***
## `Viscera weight` -10.79746    1.28447  -8.406  < 2e-16 ***
## `Shell weight`    25.74126    1.46429  17.579  < 2e-16 ***
## q1               -20.07549    1.32672 -15.132  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.187 on 4169 degrees of freedom
## Multiple R-squared:  0.5405, Adjusted R-squared:  0.5397 
## F-statistic: 700.6 on 7 and 4169 DF,  p-value: < 2.2e-16

We stop the backward elimination process here: every remaining term has a p-value below 0.001.
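
The manual `update()` steps above follow a simple loop: fit, find the largest p-value, drop that term if it exceeds the cutoff, and repeat. The sketch below automates that loop on simulated data. It is not the procedure used to produce the output above; the function `backward_eliminate`, its arguments, and the data frame `sim` are hypothetical, and it assumes numeric predictors with syntactic column names.

```r
# Refit repeatedly, dropping the predictor with the largest p-value until all
# remaining p-values are below alpha (or only one predictor is left).
backward_eliminate <- function(response, predictors, data, alpha = 0.05) {
  repeat {
    fit <- lm(reformulate(predictors, response = response), data = data)
    p <- summary(fit)$coefficients[, "Pr(>|t|)"]
    p <- p[names(p) != "(Intercept)"]
    if (length(predictors) == 1 || max(p) < alpha) return(fit)
    predictors <- setdiff(predictors, names(p)[which.max(p)])
  }
}

set.seed(605)
n <- 200
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), noise = rnorm(n))
sim$y <- 2 + 3 * sim$x1 - 2 * sim$x2 + rnorm(n)  # 'noise' has no true effect

final_fit <- backward_eliminate("y", c("x1", "x2", "noise"), sim, alpha = 0.05)
print(summary(final_fit)$coefficients)
```

Calling it with `alpha = 0.01` would mimic a stricter cutoff.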

Our model equation is:

\(\widehat{Rings} = 5.07 - 0.30Sex + 9.82Height + 9.90WholeWeight - 20.00ShuckedWeight - 10.80VisceraWeight + 25.74ShellWeight - 20.08ShellWeight^2\)

Based on the adjusted \(R^2\) value, the model explains 53.97% of the variability in the data.

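Each coefficient is the expected change in Rings for a one-unit increase in that predictor, holding the other predictors fixed. The continuous measurements were divided by 200, so one unit is large relative to the observed ranges; a change of 0.01 units moves the prediction by one hundredth of the coefficient.

Intercept: 5.07 is the predicted number of rings for a male abalone (Sex == 0) with every measurement equal to zero. It is an extrapolation with no physical meaning.

Sex: abalones coded 1 (female or infant) are predicted to have 0.30 fewer rings than males (Sex == 0) with the same measurements.

Height: a one-unit increase in height predicts 9.82 more rings.

Whole weight: a one-unit increase in whole weight predicts 9.90 more rings.

Shucked weight: a one-unit increase in shucked weight predicts 20.00 fewer rings.

Viscera weight: a one-unit increase in viscera weight predicts 10.80 fewer rings.

Shell weight and its square (q1): these two terms act together. The slope with respect to shell weight is \(25.74 - 2(20.08)ShellWeight\), which is positive over most of the observed range but shrinks as shell weight grows, so rings increase with shell weight at a diminishing rate.

Sex * Shell weight (q2): the interaction term was dropped during elimination, so the final model has no interaction coefficient. In the model that still contained it, the estimate of 1.11 means the shell weight slope was 1.11 higher for abalones coded Sex == 1 than for males.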
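
Because the final model contains both Shell weight and its square (q1), the effect of shell weight on the prediction is not a single number. As a sketch, the slope can be evaluated at a few shell weights using the two estimates from the last summary output (25.74126 and -20.07549) and the quartiles reported by `summary(dt)`. The names `b_shell`, `b_shell_sq`, and `marginal_effect` are illustrative.

```r
# Estimates from the final model summary.
b_shell    <- 25.74126   # coefficient of `Shell weight`
b_shell_sq <- -20.07549  # coefficient of q1 = `Shell weight`^2

# Derivative of b_shell * w + b_shell_sq * w^2 with respect to w.
marginal_effect <- function(w) b_shell + 2 * b_shell_sq * w

# Quartiles of Shell weight from summary(dt).
shell_quartiles <- c(first = 0.1300, median = 0.2340, third = 0.3290)
print(round(marginal_effect(shell_quartiles), 2))

# Shell weight at which the fitted curve peaks and the slope changes sign.
turning_point <- -b_shell / (2 * b_shell_sq)
print(round(turning_point, 3))
```

The slope stays positive at all three quartiles and only changes sign at a shell weight of roughly 0.64, which is well above the third quartile, so for most abalones in the data a heavier shell still predicts more rings, just at a decreasing rate.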

# correlation among the response and the terms kept in the final model
M <- subset(dt, select = c(Rings, Sex, Height, `Whole weight`, `Shucked weight`, `Viscera weight`, `Shell weight`, q1))
corrplot(cor(M), method="number")

Although every coefficient in the final model is statistically significant, the model explains only about 54% of the variability in rings, so I would say that it is not a great fit and there is room for improvement.

Residual Analysis

ggplot(dt) +
  geom_point(aes(Rings, lm$residuals)) +
  geom_hline(yintercept=0, color='blue')

plot(fitted(lm),resid(lm))

qqnorm(resid(lm))
qqline(resid(lm))

hist(lm$residuals)

par(mfrow = c(2,2))
plot(lm)

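Based on the Residuals vs. Fitted plot, there are some outliers in the data, and we can see a bit of a pattern and some obvious nonlinearity rather than an even scatter around zero. In the normal Q-Q plot the residuals roughly follow the indicated line, but they depart from it enough that we should be cautious about concluding that the residuals are normally distributed. The residual summary points the same way: the median is -0.35 and the range runs from -9.21 to 14.40, which suggests a right-skewed distribution. So was the linear model appropriate? Only partly. We should not necessarily reject the model based on these diagnostics alone, but together with an adjusted \(R^2\) of about 0.54 they serve as a reminder that the model is imperfect.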
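
The visual checks above can be complemented with a few numbers. The sketch below defines a hypothetical helper, `residual_checks`, that reports the skewness of the residuals, the p-value of a Shapiro-Wilk normality test, and the rank correlation between the absolute residuals and the fitted values (a rough indicator of non-constant spread). It is demonstrated on simulated data with deliberately skewed errors, not on the abalone model, so the printed values say nothing about the fit above.

```r
# Numeric companions to the residual plots for any fitted lm object.
# Note: shapiro.test() accepts between 3 and 5000 observations.
residual_checks <- function(fit) {
  r <- resid(fit)
  z <- (r - mean(r)) / sd(r)
  list(
    skewness     = mean(z^3),                      # 0 for symmetric residuals
    shapiro_p    = shapiro.test(r)$p.value,        # small p: evidence against normality
    spread_trend = cor(abs(r), fitted(fit), method = "spearman")
  )
}

set.seed(605)
n <- 500
x <- runif(n)
y <- 2 + 3 * x + (rexp(n) - 1)   # right-skewed errors centred at zero

checks <- residual_checks(lm(y ~ x))
print(checks)
```

Applied to the model in this post, it could be called as `residual_checks(lm)`, since the fitted object is stored under the name `lm`.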