STAT321 Project 3: Multiple Linear Regression Models on Caterpillar Data

Section 1: Introduction and Data

In this project, I use a best-subset-of-predictors approach to fit multiple linear regression models and find which predictors produce the most effective model for Nassim. I build models using Mallows’ Cp, forward selection, backwards elimination, and stepwise selection. Then I repeat these steps with the log-transformed response, LogNassim. The Log columns in this dataset appear to be base-10 logarithms rather than natural logarithms: for example, log10(0.001858999) is about -2.730721, which is the LogNassim value in row 1. The first six rows of the dataset are shown below.

##   Instar ActiveFeeding Fgp Mgp     Mass   LogMass   Intake  LogIntake WetFrass
## 1      1             Y   Y   Y 0.002064 -2.685290 0.165118 -0.7822056 0.000241
## 2      1             Y   N   N 0.005191 -2.284749 0.201008 -0.6967867 0.000063
## 3      2             N   Y   N 0.005603 -2.251579 0.189125 -0.7232511 0.001401
## 4      2             Y   N   N 0.019300 -1.714443 0.283280 -0.5477841 0.002045
## 5      2             N   Y   Y 0.029300 -1.533132 0.259569 -0.5857472 0.005377
## 6      3             Y   Y   N 0.062600 -1.203426 0.327864 -0.4843063 0.029500
##   LogWetFrass DryFrass LogDryFrass     Cassim LogCassim   Nfrass LogNfrass
## 1   -3.617983 0.000208   -3.681937 0.01422378 -1.846985 6.61e-06 -5.179510
## 2   -4.200659 0.000061   -4.214670 0.01739189 -1.759653 1.03e-06 -5.986783
## 3   -2.853562 0.000969   -3.013676 0.01639923 -1.785177 2.78e-05 -4.555794
## 4   -2.689307 0.001834   -2.736601 0.02392468 -1.621154 4.64e-05 -4.333480
## 5   -2.269460 0.003523   -2.453087 0.02122857 -1.673079 9.97e-05 -4.001301
## 6   -1.530178 0.000789   -3.102923 0.02836365 -1.547238 1.84e-05 -4.735567
##        Nassim LogNassim
## 1 0.001858999 -2.730721
## 2 0.002270091 -2.643957
## 3 0.002302210 -2.637855
## 4 0.003041352 -2.516933
## 5 0.002791898 -2.554100
## 6 0.003627464 -2.440397
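
As a quick sanity check on what the Log columns contain, the row 1 values printed above can be compared against a base-10 and a natural logarithm. This sketch is not part of the original analysis; the variable names are illustrative, and the two numbers are copied by hand from the printed output.

```r
# Values copied from row 1 of the printed data (Nassim and LogNassim)
nassim     <- 0.001858999
log_nassim <- -2.730721

base10_log  <- log10(nassim)  # base-10 logarithm
natural_log <- log(nassim)    # natural logarithm

# Check which transformation reproduces the LogNassim column
matches_base10  <- abs(base10_log - log_nassim) < 1e-4
matches_natural <- abs(natural_log - log_nassim) < 1e-4
print(c(base10 = matches_base10, natural = matches_natural))
```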

Section 2: Multiple Linear Regression Model with Nassim as Response

## 
## Call:
## lm(formula = Nassim ~ Instar + ActiveFeeding + Fgp + Mgp + Mass + 
##     Intake + WetFrass + DryFrass + Cassim + Nfrass, data = caterpillars)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0027588 -0.0001865 -0.0000518  0.0000977  0.0045538 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    -3.427e-05  1.885e-04  -0.182 0.855859    
## Instar          2.006e-06  5.948e-05   0.034 0.973123    
## ActiveFeedingY  9.271e-05  1.307e-04   0.709 0.478697    
## FgpY           -7.843e-05  1.483e-04  -0.529 0.597298    
## MgpY            8.781e-05  1.211e-04   0.725 0.469109    
## Mass            5.681e-05  4.617e-05   1.231 0.219635    
## Intake         -6.759e-03  7.115e-04  -9.500  < 2e-16 ***
## WetFrass       -1.554e-03  4.078e-04  -3.809 0.000177 ***
## DryFrass        8.487e-02  5.100e-03  16.641  < 2e-16 ***
## Cassim          2.048e-01  7.483e-03  27.362  < 2e-16 ***
## Nfrass         -9.653e-01  5.586e-02 -17.282  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0007354 on 243 degrees of freedom
##   (13 observations deleted due to missingness)
## Multiple R-squared:  0.9981, Adjusted R-squared:  0.998 
## F-statistic: 1.287e+04 on 10 and 243 DF,  p-value: < 2.2e-16

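This model uses all 10 predictor variables in the data to fit a linear model. The R-squared value is 0.9981, which means the model explains about 99.8% of the variation in Nassim. Only five of the predictors (Intake, WetFrass, DryFrass, Cassim, and Nfrass) are individually significant at the 0.05 level, which suggests that a smaller model may fit nearly as well.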
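
The R-squared and adjusted R-squared in the summary can be reproduced by hand from two sums of squares. The sketch below takes both from the step() tables later in this report: the RSS of the full model and the RSS of the intercept-only model, which equals the total sum of squares. The inputs are rounded as printed, so the results agree only to about four decimal places.

```r
# Sums of squares copied from the step() output in this report
rss_full <- 0.00013140  # residual sum of squares, full 10-predictor model
tss      <- 0.069734    # RSS of the intercept-only model = total sum of squares

p <- 10          # number of predictors in the full model
n <- 243 + p + 1 # residual df plus number of estimated coefficients

# R-squared: share of the total variation explained by the model
r2 <- 1 - rss_full / tss

# Adjusted R-squared: penalizes the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(round(c(r2 = r2, adj_r2 = adj_r2), 4))
```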

Section 3: Model Using Mallows’ Cp with Nassim as Response

## The best subset is:
##     Intake,    DryFrass,    Cassim,    Nfrass
## with Mallow's Cp =14.99251.

The Mallows’ Cp search returns a model with 4 predictor variables: Intake, DryFrass, Cassim, and Nfrass, with a Cp value of 14.99251. The Mallows’ Cp criterion states that a good model should have a Cp value roughly equal to the number of predictors in that model + 1. This subset has 4 predictors, so the benchmark is 5 (not 11, which would apply to the full 10-predictor model). A Cp of 14.99251 is well above 5, which suggests that this subset still leaves out at least one useful predictor. Among the subset sizes I tested, 4 predictors gave the best fitted model, but the Cp value indicates that this result should be treated with caution.
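
The Cp value can be checked against the textbook definition, Cp = RSS_subset / sigma^2 - n + 2(k + 1), where sigma^2 is the residual variance of the full model and k is the number of predictors in the subset. This is a sketch that assumes the Cp function used above follows this standard formula; the helper name mallows_cp is hypothetical, and the RSS values are the rounded figures printed in the step() tables of this report.

```r
n          <- 254               # observations used in the fits
rss_full   <- 0.00013140        # RSS of the full 10-predictor model
sigma2_hat <- rss_full / 243    # full-model residual variance (243 residual df)

# Hypothetical helper: Mallows' Cp for a subset with k predictors
mallows_cp <- function(rss_subset, k) {
  rss_subset / sigma2_hat - n + 2 * (k + 1)
}

cp_four <- mallows_cp(0.00014005, 4)  # Intake, DryFrass, Cassim, Nfrass
cp_six  <- mallows_cp(0.00013190, 6)  # the four above plus WetFrass and Mass

print(round(c(cp_four = cp_four, cp_six = cp_six), 2))
```

With these rounded inputs, the four-predictor subset comes out close to the reported 14.99, to be compared with a benchmark of 4 + 1 = 5. The six-predictor model that appears in the later sections comes out below its benchmark of 7.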

Section 4: Forward Selection with Nassim as Response

Section 4.1: Forward Selection

## Start:  AIC=-2080.9
## Nassim ~ 1
## 
##                 Df Sum of Sq      RSS     AIC
## + Cassim         1  0.068653 0.001081 -3137.3
## + Intake         1  0.066616 0.003118 -2868.2
## + DryFrass       1  0.056004 0.013730 -2491.7
## + Nfrass         1  0.049286 0.020448 -2390.5
## + WetFrass       1  0.046622 0.023112 -2359.4
## + Instar         1  0.036239 0.033496 -2265.2
## + Mass           1  0.026941 0.042793 -2202.9
## + ActiveFeeding  1  0.005302 0.064432 -2099.0
## + Mgp            1  0.000870 0.068864 -2082.1
## <none>                       0.069734 -2080.9
## + Fgp            1  0.000010 0.069724 -2078.9
## 
## Step:  AIC=-3137.3
## Nassim ~ Cassim
## 
##                 Df  Sum of Sq        RSS     AIC
## + Nfrass         1 0.00066146 0.00041941 -3375.8
## + WetFrass       1 0.00065541 0.00042546 -3372.1
## + Mass           1 0.00037668 0.00070420 -3244.1
## + DryFrass       1 0.00031902 0.00076185 -3224.1
## + Intake         1 0.00027061 0.00081027 -3208.5
## + Fgp            1 0.00020422 0.00087665 -3188.5
## + ActiveFeeding  1 0.00008228 0.00099860 -3155.4
## + Mgp            1 0.00002349 0.00105738 -3140.9
## + Instar         1 0.00001551 0.00106536 -3139.0
## <none>                        0.00108087 -3137.3
## 
## Step:  AIC=-3375.75
## Nassim ~ Cassim + Nfrass
## 
##                 Df  Sum of Sq        RSS     AIC
## + DryFrass       1 2.2225e-04 0.00019717 -3565.5
## + Intake         1 8.3816e-05 0.00033560 -3430.4
## + Instar         1 5.5925e-05 0.00036349 -3410.1
## + Mass           1 1.6583e-05 0.00040283 -3384.0
## + WetFrass       1 1.0456e-05 0.00040896 -3380.2
## + Fgp            1 1.0151e-05 0.00040926 -3380.0
## + Mgp            1 3.7440e-06 0.00041567 -3376.0
## <none>                        0.00041941 -3375.8
## + ActiveFeeding  1 4.9600e-07 0.00041892 -3374.1
## 
## Step:  AIC=-3565.47
## Nassim ~ Cassim + Nfrass + DryFrass
## 
##                 Df  Sum of Sq        RSS     AIC
## + Intake         1 5.7116e-05 0.00014005 -3650.4
## + Mass           1 8.8580e-06 0.00018831 -3575.1
## + WetFrass       1 3.9390e-06 0.00019323 -3568.6
## + Instar         1 2.1660e-06 0.00019500 -3566.3
## <none>                        0.00019717 -3565.5
## + ActiveFeeding  1 1.1730e-06 0.00019599 -3565.0
## + Fgp            1 2.6500e-07 0.00019690 -3563.8
## + Mgp            1 2.1800e-07 0.00019695 -3563.8
## 
## Step:  AIC=-3650.35
## Nassim ~ Cassim + Nfrass + DryFrass + Intake
## 
##                 Df  Sum of Sq        RSS     AIC
## + WetFrass       1 7.0914e-06 0.00013296 -3661.6
## <none>                        0.00014005 -3650.4
## + Instar         1 4.5630e-07 0.00013959 -3649.2
## + Mass           1 1.8200e-07 0.00013987 -3648.7
## + Mgp            1 1.6950e-07 0.00013988 -3648.7
## + ActiveFeeding  1 8.2000e-09 0.00014004 -3648.4
## + Fgp            1 0.0000e+00 0.00014005 -3648.4
## 
## Step:  AIC=-3661.55
## Nassim ~ Cassim + Nfrass + DryFrass + Intake + WetFrass
## 
##                 Df  Sum of Sq        RSS     AIC
## + Mass           1 1.0564e-06 0.00013190 -3661.6
## <none>                        0.00013296 -3661.6
## + Mgp            1 1.8389e-07 0.00013277 -3659.9
## + Instar         1 1.7244e-07 0.00013279 -3659.9
## + Fgp            1 6.9200e-08 0.00013289 -3659.7
## + ActiveFeeding  1 9.3100e-09 0.00013295 -3659.6
## 
## Step:  AIC=-3661.58
## Nassim ~ Cassim + Nfrass + DryFrass + Intake + WetFrass + Mass
## 
##                 Df  Sum of Sq        RSS     AIC
## <none>                        0.00013190 -3661.6
## + ActiveFeeding  1 1.9569e-07 0.00013171 -3660.0
## + Mgp            1 1.8323e-07 0.00013172 -3659.9
## + Instar         1 1.7590e-09 0.00013190 -3659.6
## + Fgp            1 1.0440e-09 0.00013190 -3659.6

Section 4.2: Forward Selection Model

## 
## Call:
## lm(formula = Nassim ~ Cassim + Nfrass + DryFrass + Intake + WetFrass + 
##     Mass, data = caterpillars)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0027704 -0.0001662 -0.0000398  0.0001088  0.0045810 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.027e-06  6.633e-05  -0.015 0.987660    
## Cassim       2.040e-01  7.375e-03  27.658  < 2e-16 ***
## Nfrass      -9.622e-01  5.348e-02 -17.993  < 2e-16 ***
## DryFrass     8.374e-02  4.738e-03  17.676  < 2e-16 ***
## Intake      -6.650e-03  6.955e-04  -9.562  < 2e-16 ***
## WetFrass    -1.522e-03  3.941e-04  -3.862 0.000144 ***
## Mass         5.521e-05  3.925e-05   1.406 0.160839    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0007308 on 247 degrees of freedom
##   (13 observations deleted due to missingness)
## Multiple R-squared:  0.9981, Adjusted R-squared:  0.9981 
## F-statistic: 2.172e+04 on 6 and 247 DF,  p-value: < 2.2e-16

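Forward selection chose 6 predictor variables: Cassim, Nfrass, DryFrass, Intake, WetFrass, and Mass. This model has an R-squared of 0.9981, so it explains about 99.8% of the variation in Nassim, essentially the same as the full model but with 4 fewer predictors. Mass was added last with only a marginal AIC improvement (-3661.55 to -3661.58) and is not individually significant (p = 0.16).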
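
The AIC values in the step() tables can be reproduced from the RSS column. For a linear model, step() relies on extractAIC(), which computes n * log(RSS / n) + 2 * (number of coefficients). The sketch below applies that formula to the last two steps of the forward selection; the helper name step_aic is hypothetical, and the RSS inputs are the rounded values printed above.

```r
n <- 254  # observations used in the fits

# Hypothetical helper: AIC as reported by step() for a model with k predictors
step_aic <- function(rss, k) {
  n * log(rss / n) + 2 * (k + 1)  # k predictors plus the intercept
}

aic_five <- step_aic(0.00013296, 5)  # Cassim + Nfrass + DryFrass + Intake + WetFrass
aic_six  <- step_aic(0.00013190, 6)  # the same model after adding Mass

print(round(c(aic_five = aic_five, aic_six = aic_six), 2))
```

The two values differ by only a few hundredths, which is why Mass enters the model even though its t-test is not significant.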

Section 5: Backwards Elimination Model with Nassim as Response

Section 5.1: Backwards Elimination

## Start:  AIC=-3654.54
## Nassim ~ Instar + ActiveFeeding + Fgp + Mgp + Mass + Intake + 
##     WetFrass + DryFrass + Cassim + Nfrass
## 
##                 Df  Sum of Sq        RSS     AIC
## - Instar         1 0.00000000 0.00013140 -3656.5
## - Fgp            1 0.00000015 0.00013155 -3656.3
## - ActiveFeeding  1 0.00000027 0.00013167 -3656.0
## - Mgp            1 0.00000028 0.00013169 -3656.0
## - Mass           1 0.00000082 0.00013222 -3655.0
## <none>                        0.00013140 -3654.5
## - WetFrass       1 0.00000785 0.00013925 -3641.8
## - Intake         1 0.00004880 0.00018020 -3576.3
## - DryFrass       1 0.00014974 0.00028114 -3463.4
## - Nfrass         1 0.00016150 0.00029291 -3452.9
## - Cassim         1 0.00040486 0.00053626 -3299.3
## 
## Step:  AIC=-3656.54
## Nassim ~ ActiveFeeding + Fgp + Mgp + Mass + Intake + WetFrass + 
##     DryFrass + Cassim + Nfrass
## 
##                 Df  Sum of Sq        RSS     AIC
## - Fgp            1 0.00000015 0.00013156 -3658.2
## - ActiveFeeding  1 0.00000027 0.00013167 -3658.0
## - Mgp            1 0.00000028 0.00013169 -3658.0
## - Mass           1 0.00000098 0.00013238 -3656.7
## <none>                        0.00013140 -3656.5
## - WetFrass       1 0.00000823 0.00013963 -3643.1
## - Intake         1 0.00004890 0.00018030 -3578.2
## - DryFrass       1 0.00015435 0.00028575 -3461.2
## - Nfrass         1 0.00016929 0.00030070 -3448.3
## - Cassim         1 0.00040488 0.00053628 -3301.3
## 
## Step:  AIC=-3658.25
## Nassim ~ ActiveFeeding + Mgp + Mass + Intake + WetFrass + DryFrass + 
##     Cassim + Nfrass
## 
##                 Df  Sum of Sq        RSS     AIC
## - Mgp            1 0.00000015 0.00013171 -3660.0
## - ActiveFeeding  1 0.00000016 0.00013172 -3659.9
## <none>                        0.00013156 -3658.2
## - Mass           1 0.00000122 0.00013277 -3657.9
## - WetFrass       1 0.00000811 0.00013966 -3645.1
## - Intake         1 0.00004903 0.00018059 -3579.8
## - DryFrass       1 0.00016505 0.00029661 -3453.8
## - Nfrass         1 0.00017200 0.00030355 -3447.9
## - Cassim         1 0.00040767 0.00053922 -3301.9
## 
## Step:  AIC=-3659.96
## Nassim ~ ActiveFeeding + Mass + Intake + WetFrass + DryFrass + 
##     Cassim + Nfrass
## 
##                 Df  Sum of Sq        RSS     AIC
## - ActiveFeeding  1 0.00000020 0.00013190 -3661.6
## <none>                        0.00013171 -3660.0
## - Mass           1 0.00000124 0.00013295 -3659.6
## - WetFrass       1 0.00000811 0.00013981 -3646.8
## - Intake         1 0.00004900 0.00018071 -3581.6
## - DryFrass       1 0.00016696 0.00029866 -3454.0
## - Nfrass         1 0.00017187 0.00030357 -3449.9
## - Cassim         1 0.00040870 0.00054041 -3303.4
## 
## Step:  AIC=-3661.58
## Nassim ~ Mass + Intake + WetFrass + DryFrass + Cassim + Nfrass
## 
##            Df  Sum of Sq        RSS     AIC
## <none>                   0.00013190 -3661.6
## - Mass      1 0.00000106 0.00013296 -3661.6
## - WetFrass  1 0.00000797 0.00013987 -3648.7
## - Intake    1 0.00004882 0.00018073 -3583.6
## - DryFrass  1 0.00016685 0.00029875 -3455.9
## - Nfrass    1 0.00017289 0.00030479 -3450.8
## - Cassim    1 0.00040850 0.00054041 -3305.4

Section 5.2: Backwards Elimination Model

## 
## Call:
## lm(formula = Nassim ~ Mass + Intake + WetFrass + DryFrass + Cassim + 
##     Nfrass, data = caterpillars)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0027704 -0.0001662 -0.0000398  0.0001088  0.0045810 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.027e-06  6.633e-05  -0.015 0.987660    
## Mass         5.521e-05  3.925e-05   1.406 0.160839    
## Intake      -6.650e-03  6.955e-04  -9.562  < 2e-16 ***
## WetFrass    -1.522e-03  3.941e-04  -3.862 0.000144 ***
## DryFrass     8.374e-02  4.738e-03  17.676  < 2e-16 ***
## Cassim       2.040e-01  7.375e-03  27.658  < 2e-16 ***
## Nfrass      -9.622e-01  5.348e-02 -17.993  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0007308 on 247 degrees of freedom
##   (13 observations deleted due to missingness)
## Multiple R-squared:  0.9981, Adjusted R-squared:  0.9981 
## F-statistic: 2.172e+04 on 6 and 247 DF,  p-value: < 2.2e-16

Backwards elimination chose the same 6 predictors (Mass, Intake, WetFrass, DryFrass, Cassim, and Nfrass). Because the predictors are the same, this is the same fitted model as the forward selection model, with identical coefficients and the same R-squared of 0.9981.

Section 6: Stepwise Selection with Nassim as Response

Section 6.1: Stepwise Selection Model with Null Model

## Start:  AIC=-2080.9
## Nassim ~ 1

## 
## Call:
## lm(formula = Nassim ~ 1, data = caterpillars)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.016029 -0.011200 -0.008595  0.002465  0.050394 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.013768   0.001042   13.22   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0166 on 253 degrees of freedom
##   (13 observations deleted due to missingness)
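
The output above stops at the intercept-only model. One possible explanation, which is an assumption because the report does not show the call, is that step() was run on the null model without a scope argument. In that case the starting model is also the upper limit of the search, so no predictor can be added. The sketch below illustrates the difference on simulated data; toy, x1, x2, x3, and y are hypothetical names, not the caterpillar variables.

```r
set.seed(1)
n <- 100
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
# y depends on x1 and x2 only; x3 is pure noise
toy$y <- 2 * toy$x1 - 1.5 * toy$x2 + rnorm(n, sd = 0.5)

null_fit <- lm(y ~ 1, data = toy)
full_fit <- lm(y ~ x1 + x2 + x3, data = toy)

# No scope: the null model is also the upper model, so nothing is added
no_scope <- step(null_fit, direction = "both", trace = 0)

# With scope: predictors up to the full model may be added or dropped
with_scope <- step(null_fit,
                   scope = list(lower = ~ 1, upper = formula(full_fit)),
                   direction = "both", trace = 0)

print(formula(no_scope))
print(formula(with_scope))
```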

Section 6.2: Stepwise Selection Model with Full Model

## Start:  AIC=-3654.54
## Nassim ~ Instar + ActiveFeeding + Fgp + Mgp + Mass + Intake + 
##     WetFrass + DryFrass + Cassim + Nfrass
## 
##                 Df  Sum of Sq        RSS     AIC
## - Instar         1 0.00000000 0.00013140 -3656.5
## - Fgp            1 0.00000015 0.00013155 -3656.3
## - ActiveFeeding  1 0.00000027 0.00013167 -3656.0
## - Mgp            1 0.00000028 0.00013169 -3656.0
## - Mass           1 0.00000082 0.00013222 -3655.0
## <none>                        0.00013140 -3654.5
## - WetFrass       1 0.00000785 0.00013925 -3641.8
## - Intake         1 0.00004880 0.00018020 -3576.3
## - DryFrass       1 0.00014974 0.00028114 -3463.4
## - Nfrass         1 0.00016150 0.00029291 -3452.9
## - Cassim         1 0.00040486 0.00053626 -3299.3
## 
## Step:  AIC=-3656.54
## Nassim ~ ActiveFeeding + Fgp + Mgp + Mass + Intake + WetFrass + 
##     DryFrass + Cassim + Nfrass
## 
##                 Df  Sum of Sq        RSS     AIC
## - Fgp            1 0.00000015 0.00013156 -3658.2
## - ActiveFeeding  1 0.00000027 0.00013167 -3658.0
## - Mgp            1 0.00000028 0.00013169 -3658.0
## - Mass           1 0.00000098 0.00013238 -3656.7
## <none>                        0.00013140 -3656.5
## + Instar         1 0.00000000 0.00013140 -3654.5
## - WetFrass       1 0.00000823 0.00013963 -3643.1
## - Intake         1 0.00004890 0.00018030 -3578.2
## - DryFrass       1 0.00015435 0.00028575 -3461.2
## - Nfrass         1 0.00016929 0.00030070 -3448.3
## - Cassim         1 0.00040488 0.00053628 -3301.3
## 
## Step:  AIC=-3658.25
## Nassim ~ ActiveFeeding + Mgp + Mass + Intake + WetFrass + DryFrass + 
##     Cassim + Nfrass
## 
##                 Df  Sum of Sq        RSS     AIC
## - Mgp            1 0.00000015 0.00013171 -3660.0
## - ActiveFeeding  1 0.00000016 0.00013172 -3659.9
## <none>                        0.00013156 -3658.2
## - Mass           1 0.00000122 0.00013277 -3657.9
## + Fgp            1 0.00000015 0.00013140 -3656.5
## + Instar         1 0.00000000 0.00013155 -3656.3
## - WetFrass       1 0.00000811 0.00013966 -3645.1
## - Intake         1 0.00004903 0.00018059 -3579.8
## - DryFrass       1 0.00016505 0.00029661 -3453.8
## - Nfrass         1 0.00017200 0.00030355 -3447.9
## - Cassim         1 0.00040767 0.00053922 -3301.9
## 
## Step:  AIC=-3659.96
## Nassim ~ ActiveFeeding + Mass + Intake + WetFrass + DryFrass + 
##     Cassim + Nfrass
## 
##                 Df  Sum of Sq        RSS     AIC
## - ActiveFeeding  1 0.00000020 0.00013190 -3661.6
## <none>                        0.00013171 -3660.0
## - Mass           1 0.00000124 0.00013295 -3659.6
## + Mgp            1 0.00000015 0.00013156 -3658.2
## + Fgp            1 0.00000002 0.00013169 -3658.0
## + Instar         1 0.00000000 0.00013171 -3658.0
## - WetFrass       1 0.00000811 0.00013981 -3646.8
## - Intake         1 0.00004900 0.00018071 -3581.6
## - DryFrass       1 0.00016696 0.00029866 -3454.0
## - Nfrass         1 0.00017187 0.00030357 -3449.9
## - Cassim         1 0.00040870 0.00054041 -3303.4
## 
## Step:  AIC=-3661.58
## Nassim ~ Mass + Intake + WetFrass + DryFrass + Cassim + Nfrass
## 
##                 Df  Sum of Sq        RSS     AIC
## <none>                        0.00013190 -3661.6
## - Mass           1 0.00000106 0.00013296 -3661.6
## + ActiveFeeding  1 0.00000020 0.00013171 -3660.0
## + Mgp            1 0.00000018 0.00013172 -3659.9
## + Instar         1 0.00000000 0.00013190 -3659.6
## + Fgp            1 0.00000000 0.00013190 -3659.6
## - WetFrass       1 0.00000797 0.00013987 -3648.7
## - Intake         1 0.00004882 0.00018073 -3583.6
## - DryFrass       1 0.00016685 0.00029875 -3455.9
## - Nfrass         1 0.00017289 0.00030479 -3450.8
## - Cassim         1 0.00040850 0.00054041 -3305.4

## 
## Call:
## lm(formula = Nassim ~ Mass + Intake + WetFrass + DryFrass + Cassim + 
##     Nfrass, data = caterpillars)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0027704 -0.0001662 -0.0000398  0.0001088  0.0045810 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.027e-06  6.633e-05  -0.015 0.987660    
## Mass         5.521e-05  3.925e-05   1.406 0.160839    
## Intake      -6.650e-03  6.955e-04  -9.562  < 2e-16 ***
## WetFrass    -1.522e-03  3.941e-04  -3.862 0.000144 ***
## DryFrass     8.374e-02  4.738e-03  17.676  < 2e-16 ***
## Cassim       2.040e-01  7.375e-03  27.658  < 2e-16 ***
## Nfrass      -9.622e-01  5.348e-02 -17.993  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0007308 on 247 degrees of freedom
##   (13 observations deleted due to missingness)
## Multiple R-squared:  0.9981, Adjusted R-squared:  0.9981 
## F-statistic: 2.172e+04 on 6 and 247 DF,  p-value: < 2.2e-16

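This stepwise selection model uses the same 6 predictor variables as the forward selection and backwards elimination models, and results in the same R-squared value of 0.9981. The run that started from the null model (Section 6.1) returned the intercept-only model without adding any predictors, so only the run that started from the full model is used here.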

Section 7: Multiple Linear Regression Model with LogNassim as Response

## 
## Call:
## lm(formula = LogNassim ~ Instar + ActiveFeeding + Fgp + Mgp + 
##     Mass + Intake + WetFrass + DryFrass + Cassim + Nfrass, data = caterpillars)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.78898 -0.06022  0.00886  0.06858  0.24441 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -3.007731   0.031993 -94.012  < 2e-16 ***
## Instar           0.159333   0.010096  15.781  < 2e-16 ***
## ActiveFeedingY   0.114118   0.022204   5.140 5.68e-07 ***
## FgpY            -0.030847   0.025140  -1.227 0.221004    
## MgpY             0.040446   0.020550   1.968 0.050186 .  
## Mass             0.029500   0.007872   3.748 0.000223 ***
## Intake           0.075744   0.120911   0.626 0.531612    
## WetFrass        -0.163631   0.072638  -2.253 0.025176 *  
## DryFrass         1.027704   0.865211   1.188 0.236074    
## Cassim           1.675558   1.269629   1.320 0.188175    
## Nfrass         -24.273257   9.856176  -2.463 0.014485 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1247 on 242 degrees of freedom
##   (14 observations deleted due to missingness)
## Multiple R-squared:  0.9414, Adjusted R-squared:  0.939 
## F-statistic:   389 on 10 and 242 DF,  p-value: < 2.2e-16

This model uses LogNassim, the log-transformed version of Nassim, as the response and has an R-squared of 0.9414. This value is lower than the 0.9981 from the untransformed model, but the two are measured on different response scales, so they are not directly comparable. The model still explains about 94% of the variation in LogNassim.
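
R-squared values for a raw response and a log response describe variation on different scales. One way to put them on the same footing is to back-transform the log model's fitted values and compute R-squared against the raw response. The sketch below shows the idea on simulated data with a base-10 log; toy, x, y, log_y, and the helper r2 are hypothetical names, and the simple back-transformation 10^fitted ignores retransformation bias.

```r
set.seed(42)
n <- 200
x <- runif(n, 0, 3)
y <- 10^(-3 + 0.5 * x + rnorm(n, sd = 0.1))  # multiplicative noise
toy <- data.frame(x = x, y = y, log_y = log10(y))

raw_fit <- lm(y ~ x, data = toy)      # model on the original scale
log_fit <- lm(log_y ~ x, data = toy)  # model on the log10 scale

# Hypothetical helper: R-squared from observed and predicted values
r2 <- function(obs, pred) {
  1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)
}

r2_raw       <- r2(toy$y, fitted(raw_fit))       # raw model, raw scale
r2_log_scale <- summary(log_fit)$r.squared       # log model, log scale
r2_log_back  <- r2(toy$y, 10^fitted(log_fit))    # log model, raw scale

print(round(c(raw = r2_raw, log_scale = r2_log_scale, log_back = r2_log_back), 4))
```

Only r2_raw and r2_log_back are measured against the same response, so those are the two values that can be compared directly.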

Section 8: Model Using Mallows’ Cp with LogNassim as Response

## The best subset is:
##     Instar,    ActiveFeedingY,    Mass,    Intake,    Nfrass
## with Mallow's Cp =10.81244.

Using the Mallows’ Cp method and testing different combinations of predictor variables, I found that 5 variables (Instar, ActiveFeedingY, Mass, Intake, and Nfrass) provide the best model. As explained for the first Mallows’ Cp model, an acceptable Cp value is roughly the number of predictors in the subset + 1, which is 6 here. The Cp value for this model is 10.81244, which is above 6 but closer to its benchmark than the Cp for the Nassim model was. The subset therefore fits reasonably well, although it may still omit a useful predictor.

Section 9: Forward Selection Model with LogNassim as Response

Section 9.1: Forward Selection

## Start:  AIC=-344.83
## LogNassim ~ 1
## 
##                 Df Sum of Sq    RSS     AIC
## + Cassim         1    51.793 12.441 -758.13
## + Intake         1    51.343 12.891 -749.15
## + Instar         1    49.423 14.811 -714.03
## + DryFrass       1    45.252 18.982 -651.25
## + Nfrass         1    38.414 25.820 -573.41
## + WetFrass       1    35.542 28.692 -546.72
## + Mass           1    26.816 37.418 -479.54
## + ActiveFeeding  1     3.785 60.449 -358.19
## <none>                       64.234 -344.83
## + Mgp            1     0.457 63.777 -344.63
## + Fgp            1     0.013 64.221 -342.88
## 
## Step:  AIC=-758.13
## LogNassim ~ Cassim
## 
##                 Df Sum of Sq     RSS     AIC
## + Instar         1    7.0508  5.3905 -967.73
## + WetFrass       1    0.5231 11.9183 -767.00
## + Nfrass         1    0.4133 12.0280 -764.68
## <none>                       12.4413 -758.13
## + Fgp            1    0.0474 12.3939 -757.10
## + ActiveFeeding  1    0.0435 12.3978 -757.02
## + Mass           1    0.0434 12.3979 -757.01
## + Intake         1    0.0255 12.4158 -756.65
## + DryFrass       1    0.0005 12.4408 -756.14
## + Mgp            1    0.0001 12.4412 -756.13
## 
## Step:  AIC=-967.73
## LogNassim ~ Cassim + Instar
## 
##                 Df Sum of Sq    RSS      AIC
## + WetFrass       1   0.83640 4.5541 -1008.39
## + Nfrass         1   0.82454 4.5660 -1007.73
## + ActiveFeeding  1   0.76451 4.6260 -1004.43
## + DryFrass       1   0.46343 4.9271  -988.48
## + Mass           1   0.40932 4.9812  -985.71
## + Fgp            1   0.39412 4.9964  -984.94
## + Intake         1   0.29274 5.0978  -979.86
## + Mgp            1   0.21690 5.1736  -976.12
## <none>                       5.3905  -967.73
## 
## Step:  AIC=-1008.39
## LogNassim ~ Cassim + Instar + WetFrass
## 
##                 Df Sum of Sq    RSS     AIC
## + ActiveFeeding  1   0.34919 4.2049 -1026.6
## + Intake         1   0.08084 4.4733 -1010.9
## + Mass           1   0.07747 4.4766 -1010.7
## + DryFrass       1   0.07457 4.4795 -1010.6
## + Mgp            1   0.06495 4.4892 -1010.0
## + Fgp            1   0.06049 4.4936 -1009.8
## <none>                       4.5541 -1008.4
## + Nfrass         1   0.01054 4.5436 -1007.0
## 
## Step:  AIC=-1026.58
## LogNassim ~ Cassim + Instar + WetFrass + ActiveFeeding
## 
##            Df Sum of Sq    RSS     AIC
## + Mass      1  0.220727 3.9842 -1038.2
## + DryFrass  1  0.056339 4.1486 -1028.0
## + Intake    1  0.044955 4.1600 -1027.3
## + Mgp       1  0.044135 4.1608 -1027.2
## <none>                  4.2049 -1026.6
## + Nfrass    1  0.003303 4.2016 -1024.8
## + Fgp       1  0.000029 4.2049 -1024.6
## 
## Step:  AIC=-1038.22
## LogNassim ~ Cassim + Instar + WetFrass + ActiveFeeding + Mass
## 
##            Df Sum of Sq    RSS     AIC
## + Intake    1  0.076640 3.9075 -1041.1
## + DryFrass  1  0.055300 3.9289 -1039.8
## + Mgp       1  0.038576 3.9456 -1038.7
## <none>                  3.9842 -1038.2
## + Nfrass    1  0.011178 3.9730 -1036.9
## + Fgp       1  0.004124 3.9801 -1036.5
## 
## Step:  AIC=-1041.13
## LogNassim ~ Cassim + Instar + WetFrass + ActiveFeeding + Mass + 
##     Intake
## 
##            Df Sum of Sq    RSS     AIC
## + Nfrass    1  0.077230 3.8303 -1044.2
## + Mgp       1  0.040835 3.8667 -1041.8
## <none>                  3.9075 -1041.1
## + DryFrass  1  0.000736 3.9068 -1039.2
## + Fgp       1  0.000022 3.9075 -1039.1
## 
## Step:  AIC=-1044.18
## LogNassim ~ Cassim + Instar + WetFrass + ActiveFeeding + Mass + 
##     Intake + Nfrass
## 
##            Df Sum of Sq    RSS     AIC
## + Mgp       1  0.033554 3.7968 -1044.4
## <none>                  3.8303 -1044.2
## + DryFrass  1  0.007178 3.8231 -1042.7
## + Fgp       1  0.000313 3.8300 -1042.2
## 
## Step:  AIC=-1044.41
## LogNassim ~ Cassim + Instar + WetFrass + ActiveFeeding + Mass + 
##     Intake + Nfrass + Mgp
## 
##            Df Sum of Sq    RSS     AIC
## <none>                  3.7968 -1044.4
## + Fgp       1  0.013163 3.7836 -1043.3
## + DryFrass  1  0.011691 3.7851 -1043.2

Section 9.2: Forward Selection Model

## 
## Call:
## lm(formula = LogNassim ~ Cassim + Instar + WetFrass + ActiveFeeding + 
##     Mass + Intake + Nfrass + Mgp, data = caterpillars)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.77533 -0.05999  0.01044  0.07452  0.24687 
## 
## Coefficients:
##                  Estimate Std. Error  t value Pr(>|t|)    
## (Intercept)     -3.025013   0.029780 -101.580  < 2e-16 ***
## Cassim           0.619114   0.726558    0.852  0.39498    
## Instar           0.161550   0.009955   16.229  < 2e-16 ***
## WetFrass        -0.155714   0.072417   -2.150  0.03252 *  
## ActiveFeedingY   0.104387   0.020658    5.053 8.53e-07 ***
## Mass             0.032654   0.007570    4.314 2.33e-05 ***
## Intake           0.183970   0.061253    3.003  0.00295 ** 
## Nfrass         -19.120975   9.018416   -2.120  0.03500 *  
## MgpY             0.025978   0.017690    1.468  0.14327    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1247 on 244 degrees of freedom
##   (14 observations deleted due to missingness)
## Multiple R-squared:  0.9409, Adjusted R-squared:  0.939 
## F-statistic: 485.5 on 8 and 244 DF,  p-value: < 2.2e-16

Forward selection chose 8 predictor variables (Cassim, Instar, WetFrass, ActiveFeedingY, Mass, Intake, Nfrass, and MgpY). With these predictors, the model has an R-squared of 0.9409. This is only slightly lower than the 0.9414 of the full LogNassim model, so these variables still provide a very good fit.

Section 11: Backwards Elimination with LogNassim as Response

Section 11.1: Backwards Elimination

## Start:  AIC=-1042.76
## LogNassim ~ Instar + ActiveFeeding + Fgp + Mgp + Mass + Intake + 
##     WetFrass + DryFrass + Cassim + Nfrass
## 
##                 Df Sum of Sq    RSS      AIC
## - Intake         1    0.0061 3.7678 -1044.35
## - DryFrass       1    0.0219 3.7836 -1043.29
## - Fgp            1    0.0234 3.7851 -1043.19
## - Cassim         1    0.0271 3.7887 -1042.94
## <none>                       3.7617 -1042.76
## - Mgp            1    0.0602 3.8219 -1040.74
## - WetFrass       1    0.0789 3.8405 -1039.51
## - Nfrass         1    0.0943 3.8559 -1038.49
## - Mass           1    0.2183 3.9800 -1030.48
## - ActiveFeeding  1    0.4106 4.1723 -1018.55
## - Instar         1    3.8712 7.6329  -865.73
## 
## Step:  AIC=-1044.35
## LogNassim ~ Instar + ActiveFeeding + Fgp + Mgp + Mass + WetFrass + 
##     DryFrass + Cassim + Nfrass
## 
##                 Df Sum of Sq    RSS      AIC
## - Fgp            1    0.0270 3.7947 -1044.54
## <none>                       3.7678 -1044.35
## - Mgp            1    0.0683 3.8361 -1041.80
## - WetFrass       1    0.0804 3.8482 -1041.01
## - Nfrass         1    0.1007 3.8685 -1039.67
## - DryFrass       1    0.1690 3.9368 -1035.25
## - Mass           1    0.2167 3.9845 -1032.20
## - ActiveFeeding  1    0.4230 4.1908 -1019.43
## - Cassim         1    2.2823 6.0501  -926.53
## - Instar         1    3.8975 7.6653  -866.66
## 
## Step:  AIC=-1044.54
## LogNassim ~ Instar + ActiveFeeding + Mgp + Mass + WetFrass + 
##     DryFrass + Cassim + Nfrass
## 
##                 Df Sum of Sq    RSS      AIC
## <none>                       3.7947 -1044.54
## - Mgp            1    0.0431 3.8379 -1043.68
## - WetFrass       1    0.0735 3.8683 -1041.69
## - Nfrass         1    0.0874 3.8821 -1040.78
## - DryFrass       1    0.1424 3.9371 -1037.22
## - Mass           1    0.2449 4.0396 -1030.72
## - ActiveFeeding  1    0.4017 4.1964 -1021.09
## - Cassim         1    2.3977 6.1924  -922.65
## - Instar         1    3.9705 7.7652  -865.39

Section 11.2: Backwards Elimination Model

## 
## Call:
## lm(formula = LogNassim ~ Instar + ActiveFeeding + Mgp + Mass + 
##     WetFrass + DryFrass + Cassim + Nfrass, data = caterpillars)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.78035 -0.05863  0.00678  0.07494  0.25192 
## 
## Coefficients:
##                  Estimate Std. Error  t value Pr(>|t|)    
## (Intercept)     -3.021182   0.030145 -100.221  < 2e-16 ***
## Instar           0.160675   0.010056   15.978  < 2e-16 ***
## ActiveFeedingY   0.104865   0.020633    5.082 7.42e-07 ***
## MgpY             0.029488   0.017705    1.666  0.09709 .  
## Mass             0.029448   0.007421    3.968 9.53e-05 ***
## WetFrass        -0.157402   0.072381   -2.175  0.03062 *  
## DryFrass         1.278422   0.422482    3.026  0.00274 ** 
## Cassim           2.497754   0.201163   12.417  < 2e-16 ***
## Nfrass         -22.962052   9.684892   -2.371  0.01852 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1247 on 244 degrees of freedom
##   (14 observations deleted due to missingness)
## Multiple R-squared:  0.9409, Adjusted R-squared:  0.939 
## F-statistic: 485.8 on 8 and 244 DF,  p-value: < 2.2e-16

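This backwards elimination model also keeps 8 predictors, but not the same 8 as the forward selection model: it includes DryFrass where the forward selection model includes Intake. The R-squared is the same to four decimal places (0.9409), so it is an equally good linear model.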
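
To check whether two selected models use the same predictors, the term labels of their formulas can be compared directly. The sketch below copies the two formulas from the lm() calls printed in Sections 9.2 and 11.2; the object names are illustrative.

```r
# Formulas copied from the two Call: lines in this report
forward_formula  <- LogNassim ~ Cassim + Instar + WetFrass + ActiveFeeding +
  Mass + Intake + Nfrass + Mgp
backward_formula <- LogNassim ~ Instar + ActiveFeeding + Mgp + Mass +
  WetFrass + DryFrass + Cassim + Nfrass

# Extract the predictor names from each formula
forward_terms  <- attr(terms(forward_formula), "term.labels")
backward_terms <- attr(terms(backward_formula), "term.labels")

# Predictors that appear in one model but not the other
only_forward  <- setdiff(forward_terms, backward_terms)
only_backward <- setdiff(backward_terms, forward_terms)

print(list(only_forward = only_forward, only_backward = only_backward))
```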

Section 12: Stepwise Selection with LogNassim as Response

Section 12.1: Stepwise Selection Model with Null Model

## Start:  AIC=-344.83
## LogNassim ~ 1

## 
## Call:
## lm(formula = LogNassim ~ 1, data = caterpillars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.9611 -0.4284 -0.1305  0.3669  0.9611 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -2.15381    0.03174  -67.86   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5049 on 252 degrees of freedom
##   (14 observations deleted due to missingness)

Section 12.2: Stepwise Selection Model with Full Model

## Start:  AIC=-1042.76
## LogNassim ~ Instar + ActiveFeeding + Fgp + Mgp + Mass + Intake + 
##     WetFrass + DryFrass + Cassim + Nfrass
## 
##                 Df Sum of Sq    RSS      AIC
## - Intake         1    0.0061 3.7678 -1044.35
## - DryFrass       1    0.0219 3.7836 -1043.29
## - Fgp            1    0.0234 3.7851 -1043.19
## - Cassim         1    0.0271 3.7887 -1042.94
## <none>                       3.7617 -1042.76
## - Mgp            1    0.0602 3.8219 -1040.74
## - WetFrass       1    0.0789 3.8405 -1039.51
## - Nfrass         1    0.0943 3.8559 -1038.49
## - Mass           1    0.2183 3.9800 -1030.48
## - ActiveFeeding  1    0.4106 4.1723 -1018.55
## - Instar         1    3.8712 7.6329  -865.73
## 
## Step:  AIC=-1044.35
## LogNassim ~ Instar + ActiveFeeding + Fgp + Mgp + Mass + WetFrass + 
##     DryFrass + Cassim + Nfrass
## 
##                 Df Sum of Sq    RSS      AIC
## - Fgp            1    0.0270 3.7947 -1044.54
## <none>                       3.7678 -1044.35
## + Intake         1    0.0061 3.7617 -1042.76
## - Mgp            1    0.0683 3.8361 -1041.80
## - WetFrass       1    0.0804 3.8482 -1041.01
## - Nfrass         1    0.1007 3.8685 -1039.67
## - DryFrass       1    0.1690 3.9368 -1035.25
## - Mass           1    0.2167 3.9845 -1032.20
## - ActiveFeeding  1    0.4230 4.1908 -1019.43
## - Cassim         1    2.2823 6.0501  -926.53
## - Instar         1    3.8975 7.6653  -866.66
## 
## Step:  AIC=-1044.54
## LogNassim ~ Instar + ActiveFeeding + Mgp + Mass + WetFrass + 
##     DryFrass + Cassim + Nfrass
## 
##                 Df Sum of Sq    RSS      AIC
## <none>                       3.7947 -1044.54
## + Fgp            1    0.0270 3.7678 -1044.35
## - Mgp            1    0.0431 3.8379 -1043.68
## + Intake         1    0.0097 3.7851 -1043.19
## - WetFrass       1    0.0735 3.8683 -1041.69
## - Nfrass         1    0.0874 3.8821 -1040.78
## - DryFrass       1    0.1424 3.9371 -1037.22
## - Mass           1    0.2449 4.0396 -1030.72
## - ActiveFeeding  1    0.4017 4.1964 -1021.09
## - Cassim         1    2.3977 6.1924  -922.65
## - Instar         1    3.9705 7.7652  -865.39

## 
## Call:
## lm(formula = LogNassim ~ Instar + ActiveFeeding + Mgp + Mass + 
##     WetFrass + DryFrass + Cassim + Nfrass, data = caterpillars)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.78035 -0.05863  0.00678  0.07494  0.25192 
## 
## Coefficients:
##                  Estimate Std. Error  t value Pr(>|t|)    
## (Intercept)     -3.021182   0.030145 -100.221  < 2e-16 ***
## Instar           0.160675   0.010056   15.978  < 2e-16 ***
## ActiveFeedingY   0.104865   0.020633    5.082 7.42e-07 ***
## MgpY             0.029488   0.017705    1.666  0.09709 .  
## Mass             0.029448   0.007421    3.968 9.53e-05 ***
## WetFrass        -0.157402   0.072381   -2.175  0.03062 *  
## DryFrass         1.278422   0.422482    3.026  0.00274 ** 
## Cassim           2.497754   0.201163   12.417  < 2e-16 ***
## Nfrass         -22.962052   9.684892   -2.371  0.01852 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1247 on 244 degrees of freedom
##   (14 observations deleted due to missingness)
## Multiple R-squared:  0.9409, Adjusted R-squared:  0.939 
## F-statistic: 485.8 on 8 and 244 DF,  p-value: < 2.2e-16

The stepwise selection run that started from the full model selects the same predictor variables as the backwards elimination model, which means it includes DryFrass rather than the Intake chosen by forward selection. The R-squared is still 0.9409.