Section 1 Introduction

This report investigates which predictors influence the Nassim variable in the Caterpillars dataset. Using several model selection methods (best subsets, forward selection, backward elimination, and stepwise selection), it aims to identify the most effective model for predicting Nassim. Each method is evaluated with its own criterion (Mallows’ Cp for best subsets, AIC for the others) to balance model simplicity against predictive power.

Section 2 Best Subsets Selection Using Mallows’ Cp

Best subsets selection identifies, for each model size, the combination of predictors that fits best; Mallows’ Cp is then used to balance goodness of fit against model complexity. The predictors chosen for each subset size (one through eight) and the corresponding Cp values are shown below.

##   (Intercept) Instar ActiveFeedingY  FgpY  MgpY  Mass LogMass Intake LogIntake
## 1        TRUE  FALSE          FALSE FALSE FALSE FALSE   FALSE  FALSE     FALSE
## 2        TRUE  FALSE          FALSE FALSE FALSE FALSE   FALSE  FALSE     FALSE
## 3        TRUE  FALSE          FALSE FALSE FALSE FALSE   FALSE  FALSE     FALSE
## 4        TRUE  FALSE          FALSE FALSE FALSE FALSE   FALSE   TRUE     FALSE
## 5        TRUE  FALSE          FALSE FALSE FALSE FALSE   FALSE   TRUE     FALSE
## 6        TRUE  FALSE          FALSE FALSE FALSE FALSE   FALSE   TRUE     FALSE
## 7        TRUE  FALSE          FALSE FALSE FALSE FALSE   FALSE   TRUE     FALSE
## 8        TRUE  FALSE          FALSE FALSE FALSE FALSE   FALSE   TRUE     FALSE
##   WetFrass LogWetFrass DryFrass LogDryFrass Cassim LogCassim Nfrass LogNfrass
## 1    FALSE       FALSE    FALSE       FALSE   TRUE     FALSE  FALSE     FALSE
## 2    FALSE       FALSE    FALSE       FALSE   TRUE     FALSE   TRUE     FALSE
## 3    FALSE       FALSE     TRUE       FALSE   TRUE     FALSE   TRUE     FALSE
## 4    FALSE       FALSE     TRUE       FALSE   TRUE     FALSE   TRUE     FALSE
## 5     TRUE       FALSE     TRUE       FALSE   TRUE     FALSE   TRUE     FALSE
## 6    FALSE       FALSE     TRUE       FALSE   TRUE      TRUE   TRUE     FALSE
## 7    FALSE        TRUE     TRUE       FALSE   TRUE      TRUE   TRUE     FALSE
## 8     TRUE       FALSE     TRUE        TRUE   TRUE      TRUE   TRUE     FALSE
##   LogNassim
## 1     FALSE
## 2     FALSE
## 3     FALSE
## 4     FALSE
## 5     FALSE
## 6      TRUE
## 7      TRUE
## 8      TRUE
## [1] 2383.64220  818.70906  256.10841  113.15240   94.62556   53.12324   30.96409
## [8]   21.88728

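The first three subsets add Cassim, then Nfrass, then DryFrass, and Cp drops steeply over these sizes (from 2383.6 to 256.1). These three predictors therefore form the core of a model that balances simplicity and accuracy. Cp continues to fall as further predictors are added, reaching 21.9 for the eight-predictor subset, so the larger subsets fit better by this criterion.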
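As a rough illustration of how the Cp values above are defined, the sketch below computes Mallows’ Cp by hand for a short sequence of nested models. It uses simulated data, not the Caterpillars data: the data frame `sim`, its columns `x1`, `x2`, `x3`, `y`, and the helper `mallows_cp` are hypothetical names introduced here. The error variance is estimated from the largest model considered, which is the usual convention.

```r
# Mallows' Cp for a candidate model with p coefficients (intercept included):
#   Cp = RSS_p / sigma2_full - n + 2 * p
# where sigma2_full is the residual variance of the largest model.
set.seed(1)
n   <- 100
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))  # hypothetical predictors
sim$y <- 1 + 2 * sim$x1 - sim$x2 + rnorm(n, sd = 0.5)           # x3 is pure noise

full        <- lm(y ~ x1 + x2 + x3, data = sim)
sigma2_full <- sum(resid(full)^2) / df.residual(full)

mallows_cp <- function(fit) {
  p <- length(coef(fit))                       # number of coefficients
  sum(resid(fit)^2) / sigma2_full - n + 2 * p  # RSS scaled by full-model variance
}

candidates <- list(
  lm(y ~ x1, data = sim),
  lm(y ~ x1 + x2, data = sim),
  full
)
cp_values <- sapply(candidates, mallows_cp)
print(round(cp_values, 2))
```

For the largest model, Cp equals its number of coefficients by construction, which is why a Cp close to p is usually read as a sign of little bias in a smaller model.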

Section 3 Forward Selection (AIC Criterion)

Forward selection starts from the intercept-only model and, at each step, adds the predictor that lowers the AIC the most; it stops when no addition lowers the AIC. The selection trace and the final model are shown below:

## Start:  AIC=-2072.64
## Nassim ~ 1
## 
##                 Df Sum of Sq      RSS     AIC
## + Cassim         1  0.068441 0.001035 -3134.9
## + Intake         1  0.066558 0.002919 -2872.6
## + LogCassim      1  0.057992 0.011484 -2526.0
## + LogNassim      1  0.057931 0.011545 -2524.7
## + DryFrass       1  0.056780 0.012697 -2500.7
## + LogIntake      1  0.056396 0.013080 -2493.1
## + Nfrass         1  0.050478 0.018998 -2398.7
## + WetFrass       1  0.047288 0.022188 -2359.4
## + LogNfrass      1  0.042527 0.026950 -2310.2
## + LogWetFrass    1  0.042226 0.027250 -2307.4
## + LogDryFrass    1  0.040837 0.028639 -2294.8
## + Instar         1  0.036922 0.032554 -2262.4
## + LogMass        1  0.036665 0.032811 -2260.4
## + Mass           1  0.028824 0.040652 -2206.2
## + ActiveFeeding  1  0.005126 0.064350 -2090.0
## + Mgp            1  0.000831 0.068645 -2073.7
## <none>                       0.069476 -2072.6
## + Fgp            1  0.000003 0.069473 -2070.7
## 
## Step:  AIC=-3134.89
## Nassim ~ Cassim
## 
##                 Df  Sum of Sq        RSS     AIC
## + Nfrass         1 0.00061605 0.00041899 -3361.7
## + WetFrass       1 0.00061257 0.00042248 -3359.6
## + Mass           1 0.00033289 0.00070216 -3231.1
## + DryFrass       1 0.00027721 0.00075784 -3211.8
## + Intake         1 0.00023430 0.00080075 -3197.8
## + Fgp            1 0.00018866 0.00084638 -3183.8
## + LogNassim      1 0.00017216 0.00086289 -3178.9
## + LogCassim      1 0.00007681 0.00095824 -3152.4
## + ActiveFeeding  1 0.00007205 0.00096300 -3151.1
## + LogIntake      1 0.00004263 0.00099242 -3143.5
## + Instar         1 0.00002479 0.00101026 -3139.0
## + Mgp            1 0.00002096 0.00101409 -3138.1
## + LogDryFrass    1 0.00002060 0.00101445 -3138.0
## <none>                        0.00103505 -3134.9
## + LogNfrass      1 0.00000680 0.00102825 -3134.6
## + LogWetFrass    1 0.00000679 0.00102826 -3134.6
## + LogMass        1 0.00000194 0.00103310 -3133.4
## 
## Step:  AIC=-3361.69
## Nassim ~ Cassim + Nfrass
## 
##                 Df  Sum of Sq        RSS     AIC
## + DryFrass       1 2.2198e-04 0.00019702 -3550.6
## + Intake         1 8.4103e-05 0.00033489 -3416.4
## + LogIntake      1 8.2865e-05 0.00033613 -3415.4
## + LogNassim      1 7.6452e-05 0.00034254 -3410.7
## + LogDryFrass    1 6.9584e-05 0.00034941 -3405.6
## + LogNfrass      1 6.7226e-05 0.00035177 -3403.9
## + LogWetFrass    1 6.6936e-05 0.00035206 -3403.7
## + LogCassim      1 5.7766e-05 0.00036123 -3397.2
## + Instar         1 5.5543e-05 0.00036345 -3395.7
## + LogMass        1 4.2688e-05 0.00037631 -3386.9
## + Mass           1 1.6293e-05 0.00040270 -3369.7
## + WetFrass       1 1.0129e-05 0.00040886 -3365.9
## + Fgp            1 9.9900e-06 0.00040900 -3365.8
## + Mgp            1 3.8450e-06 0.00041515 -3362.0
## <none>                        0.00041899 -3361.7
## + ActiveFeeding  1 5.1400e-07 0.00041848 -3360.0
## 
## Step:  AIC=-3550.6
## Nassim ~ Cassim + Nfrass + DryFrass
## 
##                 Df  Sum of Sq        RSS     AIC
## + Intake         1 5.6991e-05 0.00014003 -3635.0
## + Mass           1 8.7400e-06 0.00018828 -3560.1
## + LogNassim      1 6.4120e-06 0.00019060 -3557.0
## + WetFrass       1 3.8270e-06 0.00019319 -3553.6
## + LogIntake      1 3.6180e-06 0.00019340 -3553.3
## + LogWetFrass    1 3.2610e-06 0.00019375 -3552.8
## + LogNfrass      1 3.1800e-06 0.00019384 -3552.7
## + LogDryFrass    1 3.1310e-06 0.00019389 -3552.7
## + LogMass        1 3.1040e-06 0.00019391 -3552.6
## + LogCassim      1 2.9150e-06 0.00019410 -3552.4
## + Instar         1 2.0630e-06 0.00019495 -3551.3
## <none>                        0.00019702 -3550.6
## + ActiveFeeding  1 1.1900e-06 0.00019583 -3550.1
## + Fgp            1 2.8200e-07 0.00019673 -3549.0
## + Mgp            1 2.3300e-07 0.00019678 -3548.9
## 
## Step:  AIC=-3634.99
## Nassim ~ Cassim + Nfrass + DryFrass + Intake
## 
##                 Df  Sum of Sq        RSS     AIC
## + WetFrass       1 8.0703e-06 0.00013195 -3648.0
## + LogNassim      1 4.5558e-06 0.00013547 -3641.4
## + LogIntake      1 2.0466e-06 0.00013798 -3636.7
## + LogCassim      1 1.8411e-06 0.00013818 -3636.3
## <none>                        0.00014003 -3635.0
## + LogDryFrass    1 9.6280e-07 0.00013906 -3634.7
## + LogMass        1 8.7910e-07 0.00013915 -3634.6
## + LogNfrass      1 8.4520e-07 0.00013918 -3634.5
## + LogWetFrass    1 8.0140e-07 0.00013922 -3634.4
## + Instar         1 4.8590e-07 0.00013954 -3633.9
## + Mass           1 1.8900e-07 0.00013984 -3633.3
## + Mgp            1 1.7640e-07 0.00013985 -3633.3
## + ActiveFeeding  1 9.1000e-09 0.00014002 -3633.0
## + Fgp            1 1.0000e-10 0.00014003 -3633.0
## 
## Step:  AIC=-3648.01
## Nassim ~ Cassim + Nfrass + DryFrass + Intake + WetFrass
## 
##                 Df  Sum of Sq        RSS     AIC
## + LogNassim      1 3.5556e-06 0.00012840 -3652.9
## + LogIntake      1 1.4474e-06 0.00013051 -3648.8
## + Mass           1 1.3392e-06 0.00013061 -3648.6
## + LogCassim      1 1.3039e-06 0.00013065 -3648.5
## <none>                        0.00013195 -3648.0
## + LogMass        1 7.8640e-07 0.00013117 -3647.5
## + LogWetFrass    1 6.8490e-07 0.00013127 -3647.3
## + LogDryFrass    1 5.9430e-07 0.00013136 -3647.2
## + LogNfrass      1 4.8430e-07 0.00013147 -3646.9
## + Instar         1 2.5120e-07 0.00013170 -3646.5
## + Mgp            1 2.3260e-07 0.00013172 -3646.5
## + Fgp            1 6.0500e-08 0.00013189 -3646.1
## + ActiveFeeding  1 1.6800e-08 0.00013194 -3646.0
## 
## Step:  AIC=-3652.92
## Nassim ~ Cassim + Nfrass + DryFrass + Intake + WetFrass + LogNassim
## 
##                 Df  Sum of Sq        RSS     AIC
## + LogCassim      1 1.8967e-05 0.00010943 -3691.4
## + LogIntake      1 8.7034e-06 0.00011969 -3668.7
## + Instar         1 1.7618e-06 0.00012664 -3654.4
## + LogNfrass      1 1.2330e-06 0.00012717 -3653.4
## + LogDryFrass    1 1.0434e-06 0.00012736 -3653.0
## <none>                        0.00012840 -3652.9
## + LogWetFrass    1 9.2960e-07 0.00012747 -3652.8
## + LogMass        1 6.3520e-07 0.00012776 -3652.2
## + Mgp            1 2.2660e-07 0.00012817 -3651.4
## + Mass           1 1.4940e-07 0.00012825 -3651.2
## + ActiveFeeding  1 2.3600e-08 0.00012837 -3651.0
## + Fgp            1 1.7000e-09 0.00012840 -3650.9
## 
## Step:  AIC=-3691.36
## Nassim ~ Cassim + Nfrass + DryFrass + Intake + WetFrass + LogNassim + 
##     LogCassim
## 
##                 Df  Sum of Sq        RSS     AIC
## + LogDryFrass    1 8.4341e-06 0.00010100 -3709.7
## + LogNfrass      1 8.3077e-06 0.00010112 -3709.3
## + LogWetFrass    1 8.1214e-06 0.00010131 -3708.9
## + Instar         1 7.5222e-06 0.00010191 -3707.4
## + LogIntake      1 4.9728e-06 0.00010446 -3701.1
## + LogMass        1 4.0833e-06 0.00010535 -3699.0
## + Mass           1 2.5019e-06 0.00010693 -3695.2
## <none>                        0.00010943 -3691.4
## + Mgp            1 4.4050e-07 0.00010899 -3690.4
## + ActiveFeeding  1 5.6200e-08 0.00010938 -3689.5
## + Fgp            1 1.4100e-08 0.00010942 -3689.4
## 
## Step:  AIC=-3709.65
## Nassim ~ Cassim + Nfrass + DryFrass + Intake + WetFrass + LogNassim + 
##     LogCassim + LogDryFrass
## 
##                 Df  Sum of Sq        RSS     AIC
## + Mass           1 4.7464e-06 9.6251e-05 -3719.8
## + Mgp            1 1.0330e-06 9.9965e-05 -3710.3
## <none>                        1.0100e-04 -3709.7
## + Instar         1 7.3800e-07 1.0026e-04 -3709.5
## + Fgp            1 1.2750e-07 1.0087e-04 -3708.0
## + LogIntake      1 1.2500e-07 1.0087e-04 -3708.0
## + LogMass        1 8.6800e-08 1.0091e-04 -3707.9
## + LogWetFrass    1 8.0900e-08 1.0092e-04 -3707.9
## + ActiveFeeding  1 5.6400e-08 1.0094e-04 -3707.8
## + LogNfrass      1 1.3800e-08 1.0098e-04 -3707.7
## 
## Step:  AIC=-3719.83
## Nassim ~ Cassim + Nfrass + DryFrass + Intake + WetFrass + LogNassim + 
##     LogCassim + LogDryFrass + Mass
## 
##                 Df  Sum of Sq        RSS     AIC
## + Instar         1 1.9779e-06 9.4273e-05 -3723.1
## + LogMass        1 1.8367e-06 9.4415e-05 -3722.7
## + LogIntake      1 1.7298e-06 9.4521e-05 -3722.4
## + Mgp            1 1.2380e-06 9.5013e-05 -3721.1
## + Fgp            1 1.0563e-06 9.5195e-05 -3720.6
## <none>                        9.6251e-05 -3719.8
## + LogWetFrass    1 6.5320e-07 9.5598e-05 -3719.6
## + LogNfrass      1 6.0008e-07 9.5651e-05 -3719.4
## + ActiveFeeding  1 3.0618e-07 9.5945e-05 -3718.6
## 
## Step:  AIC=-3723.08
## Nassim ~ Cassim + Nfrass + DryFrass + Intake + WetFrass + LogNassim + 
##     LogCassim + LogDryFrass + Mass + Instar
## 
##                 Df  Sum of Sq        RSS     AIC
## + LogIntake      1 1.0451e-06 9.3228e-05 -3723.9
## <none>                        9.4273e-05 -3723.1
## + LogNfrass      1 4.5821e-07 9.3815e-05 -3722.3
## + Mgp            1 3.9407e-07 9.3879e-05 -3722.1
## + Fgp            1 2.8018e-07 9.3993e-05 -3721.8
## + LogMass        1 2.7435e-07 9.3999e-05 -3721.8
## + LogWetFrass    1 7.5200e-08 9.4198e-05 -3721.3
## + ActiveFeeding  1 1.6470e-08 9.4257e-05 -3721.1
## 
## Step:  AIC=-3723.9
## Nassim ~ Cassim + Nfrass + DryFrass + Intake + WetFrass + LogNassim + 
##     LogCassim + LogDryFrass + Mass + Instar + LogIntake
## 
##                 Df  Sum of Sq        RSS     AIC
## <none>                        9.3228e-05 -3723.9
## + Mgp            1 3.6233e-07 9.2866e-05 -3722.9
## + LogNfrass      1 3.3530e-07 9.2893e-05 -3722.8
## + Fgp            1 2.5529e-07 9.2973e-05 -3722.6
## + ActiveFeeding  1 1.3964e-07 9.3089e-05 -3722.3
## + LogMass        1 1.2647e-07 9.3102e-05 -3722.2
## + LogWetFrass    1 2.9700e-08 9.3199e-05 -3722.0
## 
## Call:
## lm(formula = Nassim ~ Cassim + Nfrass + DryFrass + Intake + WetFrass + 
##     LogNassim + LogCassim + LogDryFrass + Mass + Instar + LogIntake, 
##     data = caterpillars_clean)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -2.718e-03 -1.576e-04 -1.664e-05  1.462e-04  2.704e-03 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.774e-02  2.286e-03   7.759 2.42e-13 ***
## Cassim       1.906e-01  6.531e-03  29.190  < 2e-16 ***
## Nfrass      -8.224e-01  5.048e-02 -16.292  < 2e-16 ***
## DryFrass     7.973e-02  4.267e-03  18.686  < 2e-16 ***
## Intake      -6.094e-03  6.075e-04 -10.032  < 2e-16 ***
## WetFrass    -1.753e-03  3.648e-04  -4.807 2.70e-06 ***
## LogNassim    1.487e-02  1.545e-03   9.624  < 2e-16 ***
## LogCassim   -1.135e-02  1.524e-03  -7.445 1.71e-12 ***
## LogDryFrass -1.843e-04  1.785e-04  -1.032   0.3030    
## Mass         1.838e-04  4.315e-05   4.260 2.93e-05 ***
## Instar      -2.021e-04  1.105e-04  -1.828   0.0687 .  
## LogIntake   -2.175e-03  1.323e-03  -1.644   0.1016    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.000622 on 241 degrees of freedom
## Multiple R-squared:  0.9987, Adjusted R-squared:  0.9986 
## F-statistic: 1.631e+04 on 11 and 241 DF,  p-value: < 2.2e-16

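Forward selection adds Cassim, Nfrass, and DryFrass first, and these three predictors account for most of the reduction in AIC (from -2072.64 to -3550.6). The final model has eleven predictors and an adjusted R-squared of 0.9986, indicating a very close fit, although LogDryFrass, Instar, and LogIntake are not significant at the 0.05 level.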
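The AIC column in the trace can be checked by hand. For linear models, R's `step()` reports AIC up to an additive constant as n * log(RSS / n) + 2 * k, where k is the number of estimated coefficients (this is what `extractAIC()` computes for `lm` fits). The sketch below plugs in RSS values copied from the output above. The sample size n = 253 is inferred from the 241 residual degrees of freedom plus the 12 coefficients of the final model, and the helper name `step_aic` is introduced here for illustration.

```r
# AIC as reported by step() for lm fits: n * log(RSS / n) + 2 * k
step_aic <- function(rss, k, n = 253) n * log(rss / n) + 2 * k

aic_null   <- step_aic(rss = 0.069476,   k = 1)   # Nassim ~ 1
aic_cassim <- step_aic(rss = 0.00103505, k = 2)   # Nassim ~ Cassim
aic_final  <- step_aic(rss = 9.3228e-05, k = 12)  # final forward model

print(round(c(aic_null, aic_cassim, aic_final), 2))
```

The three results should agree with the printed values (-2072.64, -3134.89, and -3723.9) up to the rounding of the RSS figures in the trace.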

Section 4 Backward Elimination (AIC Criterion)

Backward elimination starts with the full model and, at each step, removes the predictor whose removal lowers the AIC the most; it stops when no removal lowers the AIC. The elimination trace and the resulting model are as follows:

## Start:  AIC=-3714.18
## Nassim ~ Instar + ActiveFeeding + Fgp + Mgp + Mass + LogMass + 
##     Intake + LogIntake + WetFrass + LogWetFrass + DryFrass + 
##     LogDryFrass + Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
## 
##                 Df  Sum of Sq        RSS     AIC
## - Fgp            1 0.00000000 0.00009239 -3716.2
## - LogMass        1 0.00000002 0.00009241 -3716.1
## - LogWetFrass    1 0.00000002 0.00009242 -3716.1
## - ActiveFeeding  1 0.00000005 0.00009245 -3716.0
## - LogDryFrass    1 0.00000008 0.00009247 -3716.0
## - Instar         1 0.00000011 0.00009250 -3715.9
## - Mgp            1 0.00000022 0.00009261 -3715.6
## - LogNfrass      1 0.00000025 0.00009264 -3715.5
## <none>                        0.00009239 -3714.2
## - LogIntake      1 0.00000078 0.00009317 -3714.1
## - Mass           1 0.00000694 0.00009933 -3697.9
## - WetFrass       1 0.00000821 0.00010060 -3694.6
## - LogCassim      1 0.00002034 0.00011273 -3665.8
## - LogNassim      1 0.00003523 0.00012763 -3634.4
## - Intake         1 0.00003883 0.00013122 -3627.4
## - Nfrass         1 0.00009267 0.00018506 -3540.4
## - DryFrass       1 0.00011947 0.00021186 -3506.2
## - Cassim         1 0.00032552 0.00041791 -3334.3
## 
## Step:  AIC=-3716.18
## Nassim ~ Instar + ActiveFeeding + Mgp + Mass + LogMass + Intake + 
##     LogIntake + WetFrass + LogWetFrass + DryFrass + LogDryFrass + 
##     Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
## 
##                 Df  Sum of Sq        RSS     AIC
## - LogWetFrass    1 0.00000003 0.00009242 -3718.1
## - LogMass        1 0.00000003 0.00009243 -3718.1
## - ActiveFeeding  1 0.00000005 0.00009245 -3718.0
## - LogDryFrass    1 0.00000008 0.00009247 -3718.0
## - Instar         1 0.00000013 0.00009253 -3717.8
## - LogNfrass      1 0.00000025 0.00009264 -3717.5
## - Mgp            1 0.00000032 0.00009271 -3717.3
## <none>                        0.00009239 -3716.2
## - LogIntake      1 0.00000080 0.00009319 -3716.0
## - Mass           1 0.00000694 0.00009933 -3699.9
## - WetFrass       1 0.00000833 0.00010072 -3696.3
## - LogCassim      1 0.00002041 0.00011280 -3667.7
## - LogNassim      1 0.00003524 0.00012764 -3636.4
## - Intake         1 0.00003889 0.00013128 -3629.3
## - Nfrass         1 0.00009439 0.00018678 -3540.1
## - DryFrass       1 0.00012175 0.00021415 -3505.5
## - Cassim         1 0.00032651 0.00041891 -3335.7
## 
## Step:  AIC=-3718.1
## Nassim ~ Instar + ActiveFeeding + Mgp + Mass + LogMass + Intake + 
##     LogIntake + WetFrass + DryFrass + LogDryFrass + Cassim + 
##     LogCassim + Nfrass + LogNfrass + LogNassim
## 
##                 Df  Sum of Sq        RSS     AIC
## - LogMass        1 0.00000004 0.00009246 -3720.0
## - ActiveFeeding  1 0.00000005 0.00009247 -3720.0
## - LogDryFrass    1 0.00000005 0.00009248 -3720.0
## - Instar         1 0.00000017 0.00009259 -3719.7
## - LogNfrass      1 0.00000024 0.00009266 -3719.4
## - Mgp            1 0.00000033 0.00009275 -3719.2
## <none>                        0.00009242 -3718.1
## - LogIntake      1 0.00000082 0.00009324 -3717.9
## - Mass           1 0.00000692 0.00009934 -3701.8
## - WetFrass       1 0.00000902 0.00010144 -3696.5
## - LogCassim      1 0.00002048 0.00011290 -3669.5
## - LogNassim      1 0.00003528 0.00012770 -3638.3
## - Intake         1 0.00003887 0.00013129 -3631.3
## - Nfrass         1 0.00009476 0.00018718 -3541.6
## - DryFrass       1 0.00012173 0.00021416 -3507.5
## - Cassim         1 0.00032669 0.00041911 -3337.6
## 
## Step:  AIC=-3719.99
## Nassim ~ Instar + ActiveFeeding + Mgp + Mass + Intake + LogIntake + 
##     WetFrass + DryFrass + LogDryFrass + Cassim + LogCassim + 
##     Nfrass + LogNfrass + LogNassim
## 
##                 Df  Sum of Sq        RSS     AIC
## - LogDryFrass    1 0.00000006 0.00009253 -3721.8
## - ActiveFeeding  1 0.00000012 0.00009258 -3721.7
## - LogNfrass      1 0.00000026 0.00009272 -3721.3
## - Mgp            1 0.00000032 0.00009278 -3721.1
## - Instar         1 0.00000045 0.00009291 -3720.8
## <none>                        0.00009246 -3720.0
## - LogIntake      1 0.00000101 0.00009347 -3719.2
## - Mass           1 0.00000692 0.00009938 -3703.7
## - WetFrass       1 0.00000933 0.00010179 -3697.7
## - LogCassim      1 0.00002159 0.00011405 -3668.9
## - LogNassim      1 0.00003566 0.00012812 -3639.5
## - Intake         1 0.00003933 0.00013180 -3632.3
## - Nfrass         1 0.00009596 0.00018842 -3541.9
## - DryFrass       1 0.00013210 0.00022457 -3497.5
## - Cassim         1 0.00032884 0.00042130 -3338.3
## 
## Step:  AIC=-3721.81
## Nassim ~ Instar + ActiveFeeding + Mgp + Mass + Intake + LogIntake + 
##     WetFrass + DryFrass + Cassim + LogCassim + Nfrass + LogNfrass + 
##     LogNassim
## 
##                 Df  Sum of Sq        RSS     AIC
## - ActiveFeeding  1 0.00000014 0.00009266 -3723.4
## - Mgp            1 0.00000038 0.00009291 -3722.8
## - Instar         1 0.00000040 0.00009293 -3722.7
## <none>                        0.00009253 -3721.8
## - LogNfrass      1 0.00000088 0.00009341 -3721.4
## - LogIntake      1 0.00000101 0.00009354 -3721.1
## - Mass           1 0.00000698 0.00009950 -3705.4
## - WetFrass       1 0.00000929 0.00010181 -3699.6
## - LogCassim      1 0.00002188 0.00011441 -3670.1
## - LogNassim      1 0.00003645 0.00012898 -3639.8
## - Intake         1 0.00003947 0.00013199 -3633.9
## - Nfrass         1 0.00009956 0.00019208 -3539.0
## - DryFrass       1 0.00013353 0.00022606 -3497.8
## - Cassim         1 0.00032878 0.00042130 -3340.3
## 
## Step:  AIC=-3723.44
## Nassim ~ Instar + Mgp + Mass + Intake + LogIntake + WetFrass + 
##     DryFrass + Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
## 
##             Df  Sum of Sq        RSS     AIC
## - Mgp        1 0.00000038 0.00009304 -3724.4
## - Instar     1 0.00000064 0.00009330 -3723.7
## <none>                    0.00009266 -3723.4
## - LogNfrass  1 0.00000086 0.00009352 -3723.1
## - LogIntake  1 0.00000089 0.00009356 -3723.0
## - Mass       1 0.00000722 0.00009989 -3706.4
## - WetFrass   1 0.00000915 0.00010181 -3701.6
## - LogCassim  1 0.00002220 0.00011487 -3671.1
## - LogNassim  1 0.00003632 0.00012898 -3641.8
## - Intake     1 0.00003943 0.00013209 -3635.7
## - Nfrass     1 0.00009980 0.00019247 -3540.5
## - DryFrass   1 0.00013359 0.00022625 -3499.6
## - Cassim     1 0.00032891 0.00042157 -3342.1
## 
## Step:  AIC=-3724.41
## Nassim ~ Instar + Mass + Intake + LogIntake + WetFrass + DryFrass + 
##     Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
## 
##             Df  Sum of Sq        RSS     AIC
## - LogNfrass  1 0.00000060 0.00009364 -3724.8
## <none>                    0.00009304 -3724.4
## - LogIntake  1 0.00000091 0.00009395 -3723.9
## - Instar     1 0.00000115 0.00009420 -3723.3
## - Mass       1 0.00000732 0.00010036 -3707.3
## - WetFrass   1 0.00000909 0.00010214 -3702.8
## - LogCassim  1 0.00002194 0.00011498 -3672.9
## - LogNassim  1 0.00003604 0.00012909 -3643.6
## - Intake     1 0.00003912 0.00013216 -3637.6
## - Nfrass     1 0.00010039 0.00019343 -3541.3
## - DryFrass   1 0.00013495 0.00022799 -3499.7
## - Cassim     1 0.00032968 0.00042272 -3343.5
## 
## Step:  AIC=-3724.79
## Nassim ~ Instar + Mass + Intake + LogIntake + WetFrass + DryFrass + 
##     Cassim + LogCassim + Nfrass + LogNassim
## 
##             Df  Sum of Sq        RSS     AIC
## <none>                    0.00009364 -3724.8
## - LogIntake  1 0.00000200 0.00009564 -3721.4
## - Instar     1 0.00000326 0.00009690 -3718.1
## - Mass       1 0.00000793 0.00010157 -3706.2
## - WetFrass   1 0.00000923 0.00010287 -3703.0
## - LogCassim  1 0.00002229 0.00011593 -3672.8
## - LogNassim  1 0.00003545 0.00012909 -3645.6
## - Intake     1 0.00003853 0.00013217 -3639.6
## - Nfrass     1 0.00010469 0.00019833 -3536.9
## - DryFrass   1 0.00013483 0.00022847 -3501.1
## - Cassim     1 0.00032978 0.00042342 -3345.0
## 
## Call:
## lm(formula = Nassim ~ Instar + Mass + Intake + LogIntake + WetFrass + 
##     DryFrass + Cassim + LogCassim + Nfrass + LogNassim, data = caterpillars_clean)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -2.742e-03 -1.613e-04 -2.116e-05  1.637e-04  2.704e-03 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.844e-02  2.184e-03   8.443 2.87e-15 ***
## Instar      -2.659e-04  9.161e-05  -2.903  0.00404 ** 
## Mass         1.920e-04  4.242e-05   4.526 9.42e-06 ***
## Intake      -6.024e-03  6.037e-04  -9.978  < 2e-16 ***
## LogIntake   -2.740e-03  1.205e-03  -2.274  0.02381 *  
## WetFrass    -1.778e-03  3.640e-04  -4.884 1.89e-06 ***
## DryFrass     7.964e-02  4.267e-03  18.666  < 2e-16 ***
## Cassim       1.901e-01  6.513e-03  29.194  < 2e-16 ***
## LogCassim   -1.078e-02  1.420e-03  -7.589 6.93e-13 ***
## Nfrass      -8.271e-01  5.028e-02 -16.449  < 2e-16 ***
## LogNassim    1.465e-02  1.530e-03   9.572  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.000622 on 242 degrees of freedom
## Multiple R-squared:  0.9987, Adjusted R-squared:  0.9986 
## F-statistic: 1.793e+04 on 10 and 242 DF,  p-value: < 2.2e-16

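Backward elimination retains ten predictors, all of which also appear in the forward-selection model; the only difference is LogDryFrass, which forward selection keeps and backward elimination drops. The backward model has a slightly lower AIC (-3724.79 versus -3723.9) and the same adjusted R-squared (0.9986). This agreement reinforces the importance of Cassim, Nfrass, DryFrass, Intake, and WetFrass in predicting Nassim.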
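The overlap between the forward and backward models can be checked directly from the two formulas printed above. The sketch below copies the predictor names from the two `lm()` calls; the vector names `forward_terms` and `backward_terms` are introduced here for illustration only.

```r
# Predictors in the final forward-selection model (copied from its lm() call)
forward_terms <- c("Cassim", "Nfrass", "DryFrass", "Intake", "WetFrass",
                   "LogNassim", "LogCassim", "LogDryFrass", "Mass",
                   "Instar", "LogIntake")

# Predictors in the final backward-elimination model (copied from its lm() call)
backward_terms <- c("Instar", "Mass", "Intake", "LogIntake", "WetFrass",
                    "DryFrass", "Cassim", "LogCassim", "Nfrass", "LogNassim")

only_forward  <- setdiff(forward_terms, backward_terms)  # in forward model only
only_backward <- setdiff(backward_terms, forward_terms)  # in backward model only
print(only_forward)
print(only_backward)
```

The only term unique to either model is LogDryFrass, which appears in the forward model alone.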

Section 5 Stepwise Selection (AIC Criterion)

Stepwise selection is performed starting from the intercept-only (null) model and starting from the full model. The final stepwise models are shown below.

Stepwise Selection Starting from Null Model

## Start:  AIC=-2072.64
## Nassim ~ 1
## 
## Call:
## lm(formula = Nassim ~ 1, data = caterpillars_clean)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.013064 -0.011215 -0.008635  0.002501  0.050331 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.013831   0.001044   13.25   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0166 on 252 degrees of freedom
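
The trace above ends at the intercept-only model without evaluating any candidate terms. One possible explanation, offered as an assumption since the original call is not shown, is that `step()` was given no `scope`: in that case the starting model is also the upper limit of the search, so nothing can be added. The sketch below contrasts a stepwise search from a null model with and without an explicit upper scope. It uses simulated data; `sim`, `x1`, `x2`, `x3`, and `y` are hypothetical stand-ins for `caterpillars_clean` and its columns.

```r
set.seed(1)
n   <- 100
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))  # hypothetical data
sim$y <- 1 + 2 * sim$x1 - sim$x2 + rnorm(n, sd = 0.5)

null_fit <- lm(y ~ 1, data = sim)

# Without scope, the null model is its own upper limit: nothing is added.
no_scope <- step(null_fit, direction = "both", trace = 0)

# With an upper scope, terms can be added (and later dropped) by AIC.
with_scope <- step(null_fit,
                   scope = list(lower = ~ 1, upper = ~ x1 + x2 + x3),
                   direction = "both", trace = 0)

print(formula(no_scope))
print(formula(with_scope))
```

For the report's data, the upper formula would list the seventeen candidate predictors from the full model shown in the backward-elimination trace.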

Stepwise Selection Starting from Full Model

## Start:  AIC=-3714.18
## Nassim ~ Instar + ActiveFeeding + Fgp + Mgp + Mass + LogMass + 
##     Intake + LogIntake + WetFrass + LogWetFrass + DryFrass + 
##     LogDryFrass + Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
## 
##                 Df  Sum of Sq        RSS     AIC
## - Fgp            1 0.00000000 0.00009239 -3716.2
## - LogMass        1 0.00000002 0.00009241 -3716.1
## - LogWetFrass    1 0.00000002 0.00009242 -3716.1
## - ActiveFeeding  1 0.00000005 0.00009245 -3716.0
## - LogDryFrass    1 0.00000008 0.00009247 -3716.0
## - Instar         1 0.00000011 0.00009250 -3715.9
## - Mgp            1 0.00000022 0.00009261 -3715.6
## - LogNfrass      1 0.00000025 0.00009264 -3715.5
## <none>                        0.00009239 -3714.2
## - LogIntake      1 0.00000078 0.00009317 -3714.1
## - Mass           1 0.00000694 0.00009933 -3697.9
## - WetFrass       1 0.00000821 0.00010060 -3694.6
## - LogCassim      1 0.00002034 0.00011273 -3665.8
## - LogNassim      1 0.00003523 0.00012763 -3634.4
## - Intake         1 0.00003883 0.00013122 -3627.4
## - Nfrass         1 0.00009267 0.00018506 -3540.4
## - DryFrass       1 0.00011947 0.00021186 -3506.2
## - Cassim         1 0.00032552 0.00041791 -3334.3
## 
## Step:  AIC=-3716.18
## Nassim ~ Instar + ActiveFeeding + Mgp + Mass + LogMass + Intake + 
##     LogIntake + WetFrass + LogWetFrass + DryFrass + LogDryFrass + 
##     Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
## 
##                 Df  Sum of Sq        RSS     AIC
## - LogWetFrass    1 0.00000003 0.00009242 -3718.1
## - LogMass        1 0.00000003 0.00009243 -3718.1
## - ActiveFeeding  1 0.00000005 0.00009245 -3718.0
## - LogDryFrass    1 0.00000008 0.00009247 -3718.0
## - Instar         1 0.00000013 0.00009253 -3717.8
## - LogNfrass      1 0.00000025 0.00009264 -3717.5
## - Mgp            1 0.00000032 0.00009271 -3717.3
## <none>                        0.00009239 -3716.2
## - LogIntake      1 0.00000080 0.00009319 -3716.0
## + Fgp            1 0.00000000 0.00009239 -3714.2
## - Mass           1 0.00000694 0.00009933 -3699.9
## - WetFrass       1 0.00000833 0.00010072 -3696.3
## - LogCassim      1 0.00002041 0.00011280 -3667.7
## - LogNassim      1 0.00003524 0.00012764 -3636.4
## - Intake         1 0.00003889 0.00013128 -3629.3
## - Nfrass         1 0.00009439 0.00018678 -3540.1
## - DryFrass       1 0.00012175 0.00021415 -3505.5
## - Cassim         1 0.00032651 0.00041891 -3335.7
## 
## Step:  AIC=-3718.1
## Nassim ~ Instar + ActiveFeeding + Mgp + Mass + LogMass + Intake + 
##     LogIntake + WetFrass + DryFrass + LogDryFrass + Cassim + 
##     LogCassim + Nfrass + LogNfrass + LogNassim
## 
##                 Df  Sum of Sq        RSS     AIC
## - LogMass        1 0.00000004 0.00009246 -3720.0
## - ActiveFeeding  1 0.00000005 0.00009247 -3720.0
## - LogDryFrass    1 0.00000005 0.00009248 -3720.0
## - Instar         1 0.00000017 0.00009259 -3719.7
## - LogNfrass      1 0.00000024 0.00009266 -3719.4
## - Mgp            1 0.00000033 0.00009275 -3719.2
## <none>                        0.00009242 -3718.1
## - LogIntake      1 0.00000082 0.00009324 -3717.9
## + LogWetFrass    1 0.00000003 0.00009239 -3716.2
## + Fgp            1 0.00000000 0.00009242 -3716.1
## - Mass           1 0.00000692 0.00009934 -3701.8
## - WetFrass       1 0.00000902 0.00010144 -3696.5
## - LogCassim      1 0.00002048 0.00011290 -3669.5
## - LogNassim      1 0.00003528 0.00012770 -3638.3
## - Intake         1 0.00003887 0.00013129 -3631.3
## - Nfrass         1 0.00009476 0.00018718 -3541.6
## - DryFrass       1 0.00012173 0.00021416 -3507.5
## - Cassim         1 0.00032669 0.00041911 -3337.6
## 
## Step:  AIC=-3719.99
## Nassim ~ Instar + ActiveFeeding + Mgp + Mass + Intake + LogIntake + 
##     WetFrass + DryFrass + LogDryFrass + Cassim + LogCassim + 
##     Nfrass + LogNfrass + LogNassim
## 
##                 Df  Sum of Sq        RSS     AIC
## - LogDryFrass    1 0.00000006 0.00009253 -3721.8
## - ActiveFeeding  1 0.00000012 0.00009258 -3721.7
## - LogNfrass      1 0.00000026 0.00009272 -3721.3
## - Mgp            1 0.00000032 0.00009278 -3721.1
## - Instar         1 0.00000045 0.00009291 -3720.8
## <none>                        0.00009246 -3720.0
## - LogIntake      1 0.00000101 0.00009347 -3719.2
## + LogMass        1 0.00000004 0.00009242 -3718.1
## + LogWetFrass    1 0.00000004 0.00009243 -3718.1
## + Fgp            1 0.00000001 0.00009246 -3718.0
## - Mass           1 0.00000692 0.00009938 -3703.7
## - WetFrass       1 0.00000933 0.00010179 -3697.7
## - LogCassim      1 0.00002159 0.00011405 -3668.9
## - LogNassim      1 0.00003566 0.00012812 -3639.5
## - Intake         1 0.00003933 0.00013180 -3632.3
## - Nfrass         1 0.00009596 0.00018842 -3541.9
## - DryFrass       1 0.00013210 0.00022457 -3497.5
## - Cassim         1 0.00032884 0.00042130 -3338.3
## 
## Step:  AIC=-3721.81
## Nassim ~ Instar + ActiveFeeding + Mgp + Mass + Intake + LogIntake + 
##     WetFrass + DryFrass + Cassim + LogCassim + Nfrass + LogNfrass + 
##     LogNassim
## 
##                 Df  Sum of Sq        RSS     AIC
## - ActiveFeeding  1 0.00000014 0.00009266 -3723.4
## - Mgp            1 0.00000038 0.00009291 -3722.8
## - Instar         1 0.00000040 0.00009293 -3722.7
## <none>                        0.00009253 -3721.8
## - LogNfrass      1 0.00000088 0.00009341 -3721.4
## - LogIntake      1 0.00000101 0.00009354 -3721.1
## + LogDryFrass    1 0.00000006 0.00009246 -3720.0
## + LogMass        1 0.00000005 0.00009248 -3720.0
## + Fgp            1 0.00000002 0.00009251 -3719.9
## + LogWetFrass    1 0.00000000 0.00009252 -3719.8
## - Mass           1 0.00000698 0.00009950 -3705.4
## - WetFrass       1 0.00000929 0.00010181 -3699.6
## - LogCassim      1 0.00002188 0.00011441 -3670.1
## - LogNassim      1 0.00003645 0.00012898 -3639.8
## - Intake         1 0.00003947 0.00013199 -3633.9
## - Nfrass         1 0.00009956 0.00019208 -3539.0
## - DryFrass       1 0.00013353 0.00022606 -3497.8
## - Cassim         1 0.00032878 0.00042130 -3340.3
## 
## Step:  AIC=-3723.44
## Nassim ~ Instar + Mgp + Mass + Intake + LogIntake + WetFrass + 
##     DryFrass + Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
## 
##                 Df  Sum of Sq        RSS     AIC
## - Mgp            1 0.00000038 0.00009304 -3724.4
## - Instar         1 0.00000064 0.00009330 -3723.7
## <none>                        0.00009266 -3723.4
## - LogNfrass      1 0.00000086 0.00009352 -3723.1
## - LogIntake      1 0.00000089 0.00009356 -3723.0
## + LogMass        1 0.00000014 0.00009253 -3721.8
## + ActiveFeeding  1 0.00000014 0.00009253 -3721.8
## + LogDryFrass    1 0.00000008 0.00009258 -3721.7
## + Fgp            1 0.00000007 0.00009259 -3721.6
## + LogWetFrass    1 0.00000000 0.00009266 -3721.4
## - Mass           1 0.00000722 0.00009989 -3706.4
## - WetFrass       1 0.00000915 0.00010181 -3701.6
## - LogCassim      1 0.00002220 0.00011487 -3671.1
## - LogNassim      1 0.00003632 0.00012898 -3641.8
## - Intake         1 0.00003943 0.00013209 -3635.7
## - Nfrass         1 0.00009980 0.00019247 -3540.5
## - DryFrass       1 0.00013359 0.00022625 -3499.6
## - Cassim         1 0.00032891 0.00042157 -3342.1
## 
## Step:  AIC=-3724.41
## Nassim ~ Instar + Mass + Intake + LogIntake + WetFrass + DryFrass + 
##     Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
## 
##                 Df  Sum of Sq        RSS     AIC
## - LogNfrass      1 0.00000060 0.00009364 -3724.8
## <none>                        0.00009304 -3724.4
## - LogIntake      1 0.00000091 0.00009395 -3723.9
## + Mgp            1 0.00000038 0.00009266 -3723.4
## - Instar         1 0.00000115 0.00009420 -3723.3
## + Fgp            1 0.00000025 0.00009279 -3723.1
## + LogDryFrass    1 0.00000015 0.00009289 -3722.8
## + ActiveFeeding  1 0.00000013 0.00009291 -3722.8
## + LogMass        1 0.00000012 0.00009292 -3722.7
## + LogWetFrass    1 0.00000000 0.00009304 -3722.4
## - Mass           1 0.00000732 0.00010036 -3707.3
## - WetFrass       1 0.00000909 0.00010214 -3702.8
## - LogCassim      1 0.00002194 0.00011498 -3672.9
## - LogNassim      1 0.00003604 0.00012909 -3643.6
## - Intake         1 0.00003912 0.00013216 -3637.6
## - Nfrass         1 0.00010039 0.00019343 -3541.3
## - DryFrass       1 0.00013495 0.00022799 -3499.7
## - Cassim         1 0.00032968 0.00042272 -3343.5
## 
## Step:  AIC=-3724.79
## Nassim ~ Instar + Mass + Intake + LogIntake + WetFrass + DryFrass + 
##     Cassim + LogCassim + Nfrass + LogNassim
## 
##                 Df  Sum of Sq        RSS     AIC
## <none>                        0.00009364 -3724.8
## + LogNfrass      1 0.00000060 0.00009304 -3724.4
## + LogDryFrass    1 0.00000041 0.00009323 -3723.9
## + LogWetFrass    1 0.00000041 0.00009323 -3723.9
## + Mgp            1 0.00000012 0.00009352 -3723.1
## + ActiveFeeding  1 0.00000011 0.00009353 -3723.1
## + LogMass        1 0.00000009 0.00009355 -3723.0
## + Fgp            1 0.00000006 0.00009358 -3722.9
## - LogIntake      1 0.00000200 0.00009564 -3721.4
## - Instar         1 0.00000326 0.00009690 -3718.1
## - Mass           1 0.00000793 0.00010157 -3706.2
## - WetFrass       1 0.00000923 0.00010287 -3703.0
## - LogCassim      1 0.00002229 0.00011593 -3672.8
## - LogNassim      1 0.00003545 0.00012909 -3645.6
## - Intake         1 0.00003853 0.00013217 -3639.6
## - Nfrass         1 0.00010469 0.00019833 -3536.9
## - DryFrass       1 0.00013483 0.00022847 -3501.1
## - Cassim         1 0.00032978 0.00042342 -3345.0
## 
## Call:
## lm(formula = Nassim ~ Instar + Mass + Intake + LogIntake + WetFrass + 
##     DryFrass + Cassim + LogCassim + Nfrass + LogNassim, data = caterpillars_clean)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -2.742e-03 -1.613e-04 -2.116e-05  1.637e-04  2.704e-03 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.844e-02  2.184e-03   8.443 2.87e-15 ***
## Instar      -2.659e-04  9.161e-05  -2.903  0.00404 ** 
## Mass         1.920e-04  4.242e-05   4.526 9.42e-06 ***
## Intake      -6.024e-03  6.037e-04  -9.978  < 2e-16 ***
## LogIntake   -2.740e-03  1.205e-03  -2.274  0.02381 *  
## WetFrass    -1.778e-03  3.640e-04  -4.884 1.89e-06 ***
## DryFrass     7.964e-02  4.267e-03  18.666  < 2e-16 ***
## Cassim       1.901e-01  6.513e-03  29.194  < 2e-16 ***
## LogCassim   -1.078e-02  1.420e-03  -7.589 6.93e-13 ***
## Nfrass      -8.271e-01  5.028e-02 -16.449  < 2e-16 ***
## LogNassim    1.465e-02  1.530e-03   9.572  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.000622 on 242 degrees of freedom
## Multiple R-squared:  0.9987, Adjusted R-squared:  0.9986 
## F-statistic: 1.793e+04 on 10 and 242 DF,  p-value: < 2.2e-16

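The stepwise search stops at AIC = -3724.79 with ten predictors. Cassim, Nfrass, and DryFrass are again retained as key predictors: in the final step, dropping any one of them raises AIC far more than dropping any other term (to -3345.0, -3536.9, and -3501.1, respectively). The final model has an adjusted R-squared of 0.9986. Because the retained predictors include LogNassim, a transformation of the response itself, this fit likely overstates how well Nassim can be predicted from independent measurements.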
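The AIC values printed by step() in the output above can be checked by hand. For lm fits, step() relies on extractAIC(), which computes n * log(RSS / n) + 2 * p, where p is the number of estimated coefficients. The sketch below assumes n = 253 (242 residual degrees of freedom plus 11 coefficients, read from the model summary) and uses the rounded RSS values as printed, so agreement is only expected to about one decimal place.

```r
# Values read from the output above (RSS is rounded as printed)
rss <- 0.00009364   # RSS of the final model (<none> row of the last step)
p   <- 11           # intercept + 10 predictors
n   <- 242 + p      # residual df + number of coefficients = 253

# AIC on the scale used by extractAIC() for lm fits, with penalty k = 2
aic_final <- n * log(rss / n) + 2 * p
round(aic_final, 1)            # -3724.8

# Same calculation for the "- LogIntake" row (one fewer coefficient)
aic_drop_logintake <- n * log(0.00009564 / n) + 2 * (p - 1)
round(aic_drop_logintake, 1)   # -3721.4
```

Both values agree with the last step of the output, where removing LogIntake would raise AIC from -3724.8 to -3721.4, which is why the search stops.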

Section 6: Comparison of Model Selection Methods

All four model selection methods (Best Subsets, Forward Selection, Backward Elimination, and Stepwise Selection) consistently selected Cassim, Nfrass, and DryFrass as essential predictors of Nassim. This agreement indicates that the importance of these variables is stable and does not depend on the search strategy. Each method yielded a model with high predictive accuracy, balancing simplicity against explanatory power. Overall, the results are consistent with Cassim, Nfrass, and DryFrass being biologically relevant to Nassim.
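One way to make such a comparison concrete is to tabulate the number of coefficients, AIC, and adjusted R-squared for each selected model. The sketch below is illustrative only: compare_models is a hypothetical helper, not part of the original analysis, and it is demonstrated on the built-in mtcars data as a stand-in because the Caterpillars data are not bundled with base R.

```r
# Hypothetical helper: summarise a named list of lm fits in one table
compare_models <- function(models) {
  data.frame(
    model  = names(models),
    n_coef = sapply(models, function(m) length(coef(m))),
    aic    = sapply(models, function(m) extractAIC(m)[2]),  # same scale as step()
    adj_r2 = sapply(models, function(m) summary(m)$adj.r.squared),
    row.names = NULL
  )
}

# Stand-in demonstration on mtcars (not the report's data)
null_fit <- lm(mpg ~ 1, data = mtcars)
full_fit <- lm(mpg ~ ., data = mtcars)
search_scope <- list(lower = ~ 1, upper = formula(full_fit))

fits <- list(
  forward  = step(null_fit, direction = "forward", scope = search_scope, trace = 0),
  backward = step(full_fit, direction = "backward", trace = 0),
  stepwise = step(null_fit, direction = "both", scope = search_scope, trace = 0)
)

comparison <- compare_models(fits)
comparison
```

Applied to this report, the list would instead hold forward_model, backward_model, stepwise_model1, and stepwise_model2 from the Appendix.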

Appendix

Code for best subsets solution

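library(leaps)
# Exhaustive search keeping the single best model of each size (nvmax defaults to 8)
best_subset_model <- regsubsets(Nassim ~ ., data = caterpillars_clean, nbest = 1, method = "exhaustive")
best_subset_summary <- summary(best_subset_model)
best_subset_summary$which  # predictors included in each subset
best_subset_summary$cp     # Mallow's Cp for each subset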

Code for forward selection

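# Intercept-only model (search start) and model with every predictor (upper bound)
null_model <- lm(Nassim ~ 1, data = caterpillars_clean)
full_model <- lm(Nassim ~ ., data = caterpillars_clean)
# Add one predictor at a time for as long as AIC decreases
forward_model <- step(null_model, direction = "forward", scope = list(lower = null_model, upper = full_model))
summary(forward_model)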

Code for backward elimination

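# Start from full_model (defined above) and drop one predictor at a time for as long as AIC decreases
backward_model <- step(full_model, direction = "backward")
summary(backward_model)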

Code for stepwise solution

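# Starting from the null model: a scope is required, otherwise step() treats the
# starting model as the upper bound and cannot add any predictors
stepwise_model1 <- step(null_model, direction = "both", scope = list(lower = null_model, upper = full_model))
summary(stepwise_model1)

# Starting from the full model: terms can be dropped and later re-added
stepwise_model2 <- step(full_model, direction = "both")
summary(stepwise_model2)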