This report investigates which predictors best explain the Nassim variable in the Caterpillars dataset. Using several model selection methods (best subsets, forward selection, backward elimination, and stepwise selection), it aims to identify the most effective model for predicting Nassim. Each method is evaluated with its own criterion (Mallows' Cp for best subsets, AIC for the others) to balance model simplicity against predictive power.
Best subsets selection searches over combinations of predictors and keeps the best-fitting model of each size; Mallows' Cp is then used to weigh goodness of fit against model complexity. The predictors chosen at each size (one row per size, from 1 to 8 predictors) and the corresponding Cp values are shown below.
## (Intercept) Instar ActiveFeedingY FgpY MgpY Mass LogMass Intake LogIntake
## 1 TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## 2 TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## 3 TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## 4 TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## 5 TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## 6 TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## 7 TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## 8 TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## WetFrass LogWetFrass DryFrass LogDryFrass Cassim LogCassim Nfrass LogNfrass
## 1 FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
## 2 FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE
## 3 FALSE FALSE TRUE FALSE TRUE FALSE TRUE FALSE
## 4 FALSE FALSE TRUE FALSE TRUE FALSE TRUE FALSE
## 5 TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE
## 6 FALSE FALSE TRUE FALSE TRUE TRUE TRUE FALSE
## 7 FALSE TRUE TRUE FALSE TRUE TRUE TRUE FALSE
## 8 TRUE FALSE TRUE TRUE TRUE TRUE TRUE FALSE
## LogNassim
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 TRUE
## 7 TRUE
## 8 TRUE
## [1] 2383.64220 818.70906 256.10841 113.15240 94.62556 53.12324 30.96409
## [8] 21.88728
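The best three-predictor subset contains Cassim, Nfrass, and DryFrass. Cp falls steeply as these three enter, from 2383.6 with one predictor to 256.1 with three, and then declines more slowly to 21.9 with eight predictors. Because Cp is still decreasing at the largest size searched, the three-predictor model is a choice in favor of simplicity rather than the Cp minimum.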
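For reference, Mallows' Cp for a candidate model with p coefficients (including the intercept) is RSS_p / s^2 - n + 2p, where s^2 is the residual mean square of the full model. The sketch below computes it by hand in base R. It runs on simulated data, not the Caterpillars data, and the names sim and mallows_cp are illustrative assumptions rather than part of the original analysis.

```r
# Simulated stand-in data (an assumption for illustration only)
set.seed(1)
n <- 100
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$y <- 1 + 2 * sim$x1 - sim$x2 + rnorm(n, sd = 0.5)

# Mallows' Cp of a candidate model relative to the full model
mallows_cp <- function(candidate, full) {
  n_obs <- nobs(full)
  s2_full <- deviance(full) / df.residual(full)  # residual mean square of the full model
  p <- length(coef(candidate))                   # coefficients, including the intercept
  deviance(candidate) / s2_full - n_obs + 2 * p  # deviance() is the RSS for lm fits
}

full  <- lm(y ~ x1 + x2 + x3, data = sim)
sub   <- lm(y ~ x1 + x2, data = sim)   # drops the irrelevant predictor
under <- lm(y ~ x3, data = sim)        # drops both relevant predictors

cp_full  <- mallows_cp(full, full)
cp_sub   <- mallows_cp(sub, full)
cp_under <- mallows_cp(under, full)
c(full = cp_full, sub = cp_sub, under = cp_under)
```

A model with little bias has Cp close to p, and the full model has Cp exactly equal to p by construction, which gives a quick check on the calculation.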
Forward selection starts from the intercept-only model and, at each step, adds the predictor that lowers AIC the most, stopping when no addition lowers it further. The selection trace and the final model are shown below:
## Start: AIC=-2072.64
## Nassim ~ 1
##
## Df Sum of Sq RSS AIC
## + Cassim 1 0.068441 0.001035 -3134.9
## + Intake 1 0.066558 0.002919 -2872.6
## + LogCassim 1 0.057992 0.011484 -2526.0
## + LogNassim 1 0.057931 0.011545 -2524.7
## + DryFrass 1 0.056780 0.012697 -2500.7
## + LogIntake 1 0.056396 0.013080 -2493.1
## + Nfrass 1 0.050478 0.018998 -2398.7
## + WetFrass 1 0.047288 0.022188 -2359.4
## + LogNfrass 1 0.042527 0.026950 -2310.2
## + LogWetFrass 1 0.042226 0.027250 -2307.4
## + LogDryFrass 1 0.040837 0.028639 -2294.8
## + Instar 1 0.036922 0.032554 -2262.4
## + LogMass 1 0.036665 0.032811 -2260.4
## + Mass 1 0.028824 0.040652 -2206.2
## + ActiveFeeding 1 0.005126 0.064350 -2090.0
## + Mgp 1 0.000831 0.068645 -2073.7
## <none> 0.069476 -2072.6
## + Fgp 1 0.000003 0.069473 -2070.7
##
## Step: AIC=-3134.89
## Nassim ~ Cassim
##
## Df Sum of Sq RSS AIC
## + Nfrass 1 0.00061605 0.00041899 -3361.7
## + WetFrass 1 0.00061257 0.00042248 -3359.6
## + Mass 1 0.00033289 0.00070216 -3231.1
## + DryFrass 1 0.00027721 0.00075784 -3211.8
## + Intake 1 0.00023430 0.00080075 -3197.8
## + Fgp 1 0.00018866 0.00084638 -3183.8
## + LogNassim 1 0.00017216 0.00086289 -3178.9
## + LogCassim 1 0.00007681 0.00095824 -3152.4
## + ActiveFeeding 1 0.00007205 0.00096300 -3151.1
## + LogIntake 1 0.00004263 0.00099242 -3143.5
## + Instar 1 0.00002479 0.00101026 -3139.0
## + Mgp 1 0.00002096 0.00101409 -3138.1
## + LogDryFrass 1 0.00002060 0.00101445 -3138.0
## <none> 0.00103505 -3134.9
## + LogNfrass 1 0.00000680 0.00102825 -3134.6
## + LogWetFrass 1 0.00000679 0.00102826 -3134.6
## + LogMass 1 0.00000194 0.00103310 -3133.4
##
## Step: AIC=-3361.69
## Nassim ~ Cassim + Nfrass
##
## Df Sum of Sq RSS AIC
## + DryFrass 1 2.2198e-04 0.00019702 -3550.6
## + Intake 1 8.4103e-05 0.00033489 -3416.4
## + LogIntake 1 8.2865e-05 0.00033613 -3415.4
## + LogNassim 1 7.6452e-05 0.00034254 -3410.7
## + LogDryFrass 1 6.9584e-05 0.00034941 -3405.6
## + LogNfrass 1 6.7226e-05 0.00035177 -3403.9
## + LogWetFrass 1 6.6936e-05 0.00035206 -3403.7
## + LogCassim 1 5.7766e-05 0.00036123 -3397.2
## + Instar 1 5.5543e-05 0.00036345 -3395.7
## + LogMass 1 4.2688e-05 0.00037631 -3386.9
## + Mass 1 1.6293e-05 0.00040270 -3369.7
## + WetFrass 1 1.0129e-05 0.00040886 -3365.9
## + Fgp 1 9.9900e-06 0.00040900 -3365.8
## + Mgp 1 3.8450e-06 0.00041515 -3362.0
## <none> 0.00041899 -3361.7
## + ActiveFeeding 1 5.1400e-07 0.00041848 -3360.0
##
## Step: AIC=-3550.6
## Nassim ~ Cassim + Nfrass + DryFrass
##
## Df Sum of Sq RSS AIC
## + Intake 1 5.6991e-05 0.00014003 -3635.0
## + Mass 1 8.7400e-06 0.00018828 -3560.1
## + LogNassim 1 6.4120e-06 0.00019060 -3557.0
## + WetFrass 1 3.8270e-06 0.00019319 -3553.6
## + LogIntake 1 3.6180e-06 0.00019340 -3553.3
## + LogWetFrass 1 3.2610e-06 0.00019375 -3552.8
## + LogNfrass 1 3.1800e-06 0.00019384 -3552.7
## + LogDryFrass 1 3.1310e-06 0.00019389 -3552.7
## + LogMass 1 3.1040e-06 0.00019391 -3552.6
## + LogCassim 1 2.9150e-06 0.00019410 -3552.4
## + Instar 1 2.0630e-06 0.00019495 -3551.3
## <none> 0.00019702 -3550.6
## + ActiveFeeding 1 1.1900e-06 0.00019583 -3550.1
## + Fgp 1 2.8200e-07 0.00019673 -3549.0
## + Mgp 1 2.3300e-07 0.00019678 -3548.9
##
## Step: AIC=-3634.99
## Nassim ~ Cassim + Nfrass + DryFrass + Intake
##
## Df Sum of Sq RSS AIC
## + WetFrass 1 8.0703e-06 0.00013195 -3648.0
## + LogNassim 1 4.5558e-06 0.00013547 -3641.4
## + LogIntake 1 2.0466e-06 0.00013798 -3636.7
## + LogCassim 1 1.8411e-06 0.00013818 -3636.3
## <none> 0.00014003 -3635.0
## + LogDryFrass 1 9.6280e-07 0.00013906 -3634.7
## + LogMass 1 8.7910e-07 0.00013915 -3634.6
## + LogNfrass 1 8.4520e-07 0.00013918 -3634.5
## + LogWetFrass 1 8.0140e-07 0.00013922 -3634.4
## + Instar 1 4.8590e-07 0.00013954 -3633.9
## + Mass 1 1.8900e-07 0.00013984 -3633.3
## + Mgp 1 1.7640e-07 0.00013985 -3633.3
## + ActiveFeeding 1 9.1000e-09 0.00014002 -3633.0
## + Fgp 1 1.0000e-10 0.00014003 -3633.0
##
## Step: AIC=-3648.01
## Nassim ~ Cassim + Nfrass + DryFrass + Intake + WetFrass
##
## Df Sum of Sq RSS AIC
## + LogNassim 1 3.5556e-06 0.00012840 -3652.9
## + LogIntake 1 1.4474e-06 0.00013051 -3648.8
## + Mass 1 1.3392e-06 0.00013061 -3648.6
## + LogCassim 1 1.3039e-06 0.00013065 -3648.5
## <none> 0.00013195 -3648.0
## + LogMass 1 7.8640e-07 0.00013117 -3647.5
## + LogWetFrass 1 6.8490e-07 0.00013127 -3647.3
## + LogDryFrass 1 5.9430e-07 0.00013136 -3647.2
## + LogNfrass 1 4.8430e-07 0.00013147 -3646.9
## + Instar 1 2.5120e-07 0.00013170 -3646.5
## + Mgp 1 2.3260e-07 0.00013172 -3646.5
## + Fgp 1 6.0500e-08 0.00013189 -3646.1
## + ActiveFeeding 1 1.6800e-08 0.00013194 -3646.0
##
## Step: AIC=-3652.92
## Nassim ~ Cassim + Nfrass + DryFrass + Intake + WetFrass + LogNassim
##
## Df Sum of Sq RSS AIC
## + LogCassim 1 1.8967e-05 0.00010943 -3691.4
## + LogIntake 1 8.7034e-06 0.00011969 -3668.7
## + Instar 1 1.7618e-06 0.00012664 -3654.4
## + LogNfrass 1 1.2330e-06 0.00012717 -3653.4
## + LogDryFrass 1 1.0434e-06 0.00012736 -3653.0
## <none> 0.00012840 -3652.9
## + LogWetFrass 1 9.2960e-07 0.00012747 -3652.8
## + LogMass 1 6.3520e-07 0.00012776 -3652.2
## + Mgp 1 2.2660e-07 0.00012817 -3651.4
## + Mass 1 1.4940e-07 0.00012825 -3651.2
## + ActiveFeeding 1 2.3600e-08 0.00012837 -3651.0
## + Fgp 1 1.7000e-09 0.00012840 -3650.9
##
## Step: AIC=-3691.36
## Nassim ~ Cassim + Nfrass + DryFrass + Intake + WetFrass + LogNassim +
## LogCassim
##
## Df Sum of Sq RSS AIC
## + LogDryFrass 1 8.4341e-06 0.00010100 -3709.7
## + LogNfrass 1 8.3077e-06 0.00010112 -3709.3
## + LogWetFrass 1 8.1214e-06 0.00010131 -3708.9
## + Instar 1 7.5222e-06 0.00010191 -3707.4
## + LogIntake 1 4.9728e-06 0.00010446 -3701.1
## + LogMass 1 4.0833e-06 0.00010535 -3699.0
## + Mass 1 2.5019e-06 0.00010693 -3695.2
## <none> 0.00010943 -3691.4
## + Mgp 1 4.4050e-07 0.00010899 -3690.4
## + ActiveFeeding 1 5.6200e-08 0.00010938 -3689.5
## + Fgp 1 1.4100e-08 0.00010942 -3689.4
##
## Step: AIC=-3709.65
## Nassim ~ Cassim + Nfrass + DryFrass + Intake + WetFrass + LogNassim +
## LogCassim + LogDryFrass
##
## Df Sum of Sq RSS AIC
## + Mass 1 4.7464e-06 9.6251e-05 -3719.8
## + Mgp 1 1.0330e-06 9.9965e-05 -3710.3
## <none> 1.0100e-04 -3709.7
## + Instar 1 7.3800e-07 1.0026e-04 -3709.5
## + Fgp 1 1.2750e-07 1.0087e-04 -3708.0
## + LogIntake 1 1.2500e-07 1.0087e-04 -3708.0
## + LogMass 1 8.6800e-08 1.0091e-04 -3707.9
## + LogWetFrass 1 8.0900e-08 1.0092e-04 -3707.9
## + ActiveFeeding 1 5.6400e-08 1.0094e-04 -3707.8
## + LogNfrass 1 1.3800e-08 1.0098e-04 -3707.7
##
## Step: AIC=-3719.83
## Nassim ~ Cassim + Nfrass + DryFrass + Intake + WetFrass + LogNassim +
## LogCassim + LogDryFrass + Mass
##
## Df Sum of Sq RSS AIC
## + Instar 1 1.9779e-06 9.4273e-05 -3723.1
## + LogMass 1 1.8367e-06 9.4415e-05 -3722.7
## + LogIntake 1 1.7298e-06 9.4521e-05 -3722.4
## + Mgp 1 1.2380e-06 9.5013e-05 -3721.1
## + Fgp 1 1.0563e-06 9.5195e-05 -3720.6
## <none> 9.6251e-05 -3719.8
## + LogWetFrass 1 6.5320e-07 9.5598e-05 -3719.6
## + LogNfrass 1 6.0008e-07 9.5651e-05 -3719.4
## + ActiveFeeding 1 3.0618e-07 9.5945e-05 -3718.6
##
## Step: AIC=-3723.08
## Nassim ~ Cassim + Nfrass + DryFrass + Intake + WetFrass + LogNassim +
## LogCassim + LogDryFrass + Mass + Instar
##
## Df Sum of Sq RSS AIC
## + LogIntake 1 1.0451e-06 9.3228e-05 -3723.9
## <none> 9.4273e-05 -3723.1
## + LogNfrass 1 4.5821e-07 9.3815e-05 -3722.3
## + Mgp 1 3.9407e-07 9.3879e-05 -3722.1
## + Fgp 1 2.8018e-07 9.3993e-05 -3721.8
## + LogMass 1 2.7435e-07 9.3999e-05 -3721.8
## + LogWetFrass 1 7.5200e-08 9.4198e-05 -3721.3
## + ActiveFeeding 1 1.6470e-08 9.4257e-05 -3721.1
##
## Step: AIC=-3723.9
## Nassim ~ Cassim + Nfrass + DryFrass + Intake + WetFrass + LogNassim +
## LogCassim + LogDryFrass + Mass + Instar + LogIntake
##
## Df Sum of Sq RSS AIC
## <none> 9.3228e-05 -3723.9
## + Mgp 1 3.6233e-07 9.2866e-05 -3722.9
## + LogNfrass 1 3.3530e-07 9.2893e-05 -3722.8
## + Fgp 1 2.5529e-07 9.2973e-05 -3722.6
## + ActiveFeeding 1 1.3964e-07 9.3089e-05 -3722.3
## + LogMass 1 1.2647e-07 9.3102e-05 -3722.2
## + LogWetFrass 1 2.9700e-08 9.3199e-05 -3722.0
##
## Call:
## lm(formula = Nassim ~ Cassim + Nfrass + DryFrass + Intake + WetFrass +
## LogNassim + LogCassim + LogDryFrass + Mass + Instar + LogIntake,
## data = caterpillars_clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.718e-03 -1.576e-04 -1.664e-05 1.462e-04 2.704e-03
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.774e-02 2.286e-03 7.759 2.42e-13 ***
## Cassim 1.906e-01 6.531e-03 29.190 < 2e-16 ***
## Nfrass -8.224e-01 5.048e-02 -16.292 < 2e-16 ***
## DryFrass 7.973e-02 4.267e-03 18.686 < 2e-16 ***
## Intake -6.094e-03 6.075e-04 -10.032 < 2e-16 ***
## WetFrass -1.753e-03 3.648e-04 -4.807 2.70e-06 ***
## LogNassim 1.487e-02 1.545e-03 9.624 < 2e-16 ***
## LogCassim -1.135e-02 1.524e-03 -7.445 1.71e-12 ***
## LogDryFrass -1.843e-04 1.785e-04 -1.032 0.3030
## Mass 1.838e-04 4.315e-05 4.260 2.93e-05 ***
## Instar -2.021e-04 1.105e-04 -1.828 0.0687 .
## LogIntake -2.175e-03 1.323e-03 -1.644 0.1016
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.000622 on 241 degrees of freedom
## Multiple R-squared: 0.9987, Adjusted R-squared: 0.9986
## F-statistic: 1.631e+04 on 11 and 241 DF, p-value: < 2.2e-16
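Forward selection enters Cassim, Nfrass, and DryFrass first, and these three alone reduce the residual sum of squares from 0.069476 to 0.00019702, about 99.7% of the total. The final model has 11 predictors, an AIC of -3723.9, and an adjusted R-squared of 0.9986, although LogDryFrass, Instar, and LogIntake are not significant at the 0.05 level.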
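The AIC values in the trace above can be reproduced from the RSS column. For a linear model, step() reports n * log(RSS / n) + 2 * k, where k is the number of estimated coefficients; this drops an additive constant, so it differs from the value returned by AIC(). The sketch below checks two lines of the trace, taking n = 253 from the final model summary (241 residual degrees of freedom plus 12 coefficients). The function name step_aic is an illustrative assumption.

```r
# AIC as reported by step() for a linear model, up to an additive constant
step_aic <- function(rss, n, k) n * log(rss / n) + 2 * k

n <- 253  # 241 residual df + 12 estimated coefficients

aic_null   <- step_aic(0.069476,   n, k = 1)  # Nassim ~ 1
aic_cassim <- step_aic(0.00103505, n, k = 2)  # Nassim ~ Cassim

round(c(null = aic_null, cassim = aic_cassim), 2)
```

These can be compared with the reported starting AIC of -2072.64 and the first-step AIC of -3134.89; small differences are expected because the printed RSS values are rounded.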
Backward elimination starts with the full model and removes predictors with the least impact based on AIC. The model resulting from backward elimination is as follows:
## Start: AIC=-3714.18
## Nassim ~ Instar + ActiveFeeding + Fgp + Mgp + Mass + LogMass +
## Intake + LogIntake + WetFrass + LogWetFrass + DryFrass +
## LogDryFrass + Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
##
## Df Sum of Sq RSS AIC
## - Fgp 1 0.00000000 0.00009239 -3716.2
## - LogMass 1 0.00000002 0.00009241 -3716.1
## - LogWetFrass 1 0.00000002 0.00009242 -3716.1
## - ActiveFeeding 1 0.00000005 0.00009245 -3716.0
## - LogDryFrass 1 0.00000008 0.00009247 -3716.0
## - Instar 1 0.00000011 0.00009250 -3715.9
## - Mgp 1 0.00000022 0.00009261 -3715.6
## - LogNfrass 1 0.00000025 0.00009264 -3715.5
## <none> 0.00009239 -3714.2
## - LogIntake 1 0.00000078 0.00009317 -3714.1
## - Mass 1 0.00000694 0.00009933 -3697.9
## - WetFrass 1 0.00000821 0.00010060 -3694.6
## - LogCassim 1 0.00002034 0.00011273 -3665.8
## - LogNassim 1 0.00003523 0.00012763 -3634.4
## - Intake 1 0.00003883 0.00013122 -3627.4
## - Nfrass 1 0.00009267 0.00018506 -3540.4
## - DryFrass 1 0.00011947 0.00021186 -3506.2
## - Cassim 1 0.00032552 0.00041791 -3334.3
##
## Step: AIC=-3716.18
## Nassim ~ Instar + ActiveFeeding + Mgp + Mass + LogMass + Intake +
## LogIntake + WetFrass + LogWetFrass + DryFrass + LogDryFrass +
## Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
##
## Df Sum of Sq RSS AIC
## - LogWetFrass 1 0.00000003 0.00009242 -3718.1
## - LogMass 1 0.00000003 0.00009243 -3718.1
## - ActiveFeeding 1 0.00000005 0.00009245 -3718.0
## - LogDryFrass 1 0.00000008 0.00009247 -3718.0
## - Instar 1 0.00000013 0.00009253 -3717.8
## - LogNfrass 1 0.00000025 0.00009264 -3717.5
## - Mgp 1 0.00000032 0.00009271 -3717.3
## <none> 0.00009239 -3716.2
## - LogIntake 1 0.00000080 0.00009319 -3716.0
## - Mass 1 0.00000694 0.00009933 -3699.9
## - WetFrass 1 0.00000833 0.00010072 -3696.3
## - LogCassim 1 0.00002041 0.00011280 -3667.7
## - LogNassim 1 0.00003524 0.00012764 -3636.4
## - Intake 1 0.00003889 0.00013128 -3629.3
## - Nfrass 1 0.00009439 0.00018678 -3540.1
## - DryFrass 1 0.00012175 0.00021415 -3505.5
## - Cassim 1 0.00032651 0.00041891 -3335.7
##
## Step: AIC=-3718.1
## Nassim ~ Instar + ActiveFeeding + Mgp + Mass + LogMass + Intake +
## LogIntake + WetFrass + DryFrass + LogDryFrass + Cassim +
## LogCassim + Nfrass + LogNfrass + LogNassim
##
## Df Sum of Sq RSS AIC
## - LogMass 1 0.00000004 0.00009246 -3720.0
## - ActiveFeeding 1 0.00000005 0.00009247 -3720.0
## - LogDryFrass 1 0.00000005 0.00009248 -3720.0
## - Instar 1 0.00000017 0.00009259 -3719.7
## - LogNfrass 1 0.00000024 0.00009266 -3719.4
## - Mgp 1 0.00000033 0.00009275 -3719.2
## <none> 0.00009242 -3718.1
## - LogIntake 1 0.00000082 0.00009324 -3717.9
## - Mass 1 0.00000692 0.00009934 -3701.8
## - WetFrass 1 0.00000902 0.00010144 -3696.5
## - LogCassim 1 0.00002048 0.00011290 -3669.5
## - LogNassim 1 0.00003528 0.00012770 -3638.3
## - Intake 1 0.00003887 0.00013129 -3631.3
## - Nfrass 1 0.00009476 0.00018718 -3541.6
## - DryFrass 1 0.00012173 0.00021416 -3507.5
## - Cassim 1 0.00032669 0.00041911 -3337.6
##
## Step: AIC=-3719.99
## Nassim ~ Instar + ActiveFeeding + Mgp + Mass + Intake + LogIntake +
## WetFrass + DryFrass + LogDryFrass + Cassim + LogCassim +
## Nfrass + LogNfrass + LogNassim
##
## Df Sum of Sq RSS AIC
## - LogDryFrass 1 0.00000006 0.00009253 -3721.8
## - ActiveFeeding 1 0.00000012 0.00009258 -3721.7
## - LogNfrass 1 0.00000026 0.00009272 -3721.3
## - Mgp 1 0.00000032 0.00009278 -3721.1
## - Instar 1 0.00000045 0.00009291 -3720.8
## <none> 0.00009246 -3720.0
## - LogIntake 1 0.00000101 0.00009347 -3719.2
## - Mass 1 0.00000692 0.00009938 -3703.7
## - WetFrass 1 0.00000933 0.00010179 -3697.7
## - LogCassim 1 0.00002159 0.00011405 -3668.9
## - LogNassim 1 0.00003566 0.00012812 -3639.5
## - Intake 1 0.00003933 0.00013180 -3632.3
## - Nfrass 1 0.00009596 0.00018842 -3541.9
## - DryFrass 1 0.00013210 0.00022457 -3497.5
## - Cassim 1 0.00032884 0.00042130 -3338.3
##
## Step: AIC=-3721.81
## Nassim ~ Instar + ActiveFeeding + Mgp + Mass + Intake + LogIntake +
## WetFrass + DryFrass + Cassim + LogCassim + Nfrass + LogNfrass +
## LogNassim
##
## Df Sum of Sq RSS AIC
## - ActiveFeeding 1 0.00000014 0.00009266 -3723.4
## - Mgp 1 0.00000038 0.00009291 -3722.8
## - Instar 1 0.00000040 0.00009293 -3722.7
## <none> 0.00009253 -3721.8
## - LogNfrass 1 0.00000088 0.00009341 -3721.4
## - LogIntake 1 0.00000101 0.00009354 -3721.1
## - Mass 1 0.00000698 0.00009950 -3705.4
## - WetFrass 1 0.00000929 0.00010181 -3699.6
## - LogCassim 1 0.00002188 0.00011441 -3670.1
## - LogNassim 1 0.00003645 0.00012898 -3639.8
## - Intake 1 0.00003947 0.00013199 -3633.9
## - Nfrass 1 0.00009956 0.00019208 -3539.0
## - DryFrass 1 0.00013353 0.00022606 -3497.8
## - Cassim 1 0.00032878 0.00042130 -3340.3
##
## Step: AIC=-3723.44
## Nassim ~ Instar + Mgp + Mass + Intake + LogIntake + WetFrass +
## DryFrass + Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
##
## Df Sum of Sq RSS AIC
## - Mgp 1 0.00000038 0.00009304 -3724.4
## - Instar 1 0.00000064 0.00009330 -3723.7
## <none> 0.00009266 -3723.4
## - LogNfrass 1 0.00000086 0.00009352 -3723.1
## - LogIntake 1 0.00000089 0.00009356 -3723.0
## - Mass 1 0.00000722 0.00009989 -3706.4
## - WetFrass 1 0.00000915 0.00010181 -3701.6
## - LogCassim 1 0.00002220 0.00011487 -3671.1
## - LogNassim 1 0.00003632 0.00012898 -3641.8
## - Intake 1 0.00003943 0.00013209 -3635.7
## - Nfrass 1 0.00009980 0.00019247 -3540.5
## - DryFrass 1 0.00013359 0.00022625 -3499.6
## - Cassim 1 0.00032891 0.00042157 -3342.1
##
## Step: AIC=-3724.41
## Nassim ~ Instar + Mass + Intake + LogIntake + WetFrass + DryFrass +
## Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
##
## Df Sum of Sq RSS AIC
## - LogNfrass 1 0.00000060 0.00009364 -3724.8
## <none> 0.00009304 -3724.4
## - LogIntake 1 0.00000091 0.00009395 -3723.9
## - Instar 1 0.00000115 0.00009420 -3723.3
## - Mass 1 0.00000732 0.00010036 -3707.3
## - WetFrass 1 0.00000909 0.00010214 -3702.8
## - LogCassim 1 0.00002194 0.00011498 -3672.9
## - LogNassim 1 0.00003604 0.00012909 -3643.6
## - Intake 1 0.00003912 0.00013216 -3637.6
## - Nfrass 1 0.00010039 0.00019343 -3541.3
## - DryFrass 1 0.00013495 0.00022799 -3499.7
## - Cassim 1 0.00032968 0.00042272 -3343.5
##
## Step: AIC=-3724.79
## Nassim ~ Instar + Mass + Intake + LogIntake + WetFrass + DryFrass +
## Cassim + LogCassim + Nfrass + LogNassim
##
## Df Sum of Sq RSS AIC
## <none> 0.00009364 -3724.8
## - LogIntake 1 0.00000200 0.00009564 -3721.4
## - Instar 1 0.00000326 0.00009690 -3718.1
## - Mass 1 0.00000793 0.00010157 -3706.2
## - WetFrass 1 0.00000923 0.00010287 -3703.0
## - LogCassim 1 0.00002229 0.00011593 -3672.8
## - LogNassim 1 0.00003545 0.00012909 -3645.6
## - Intake 1 0.00003853 0.00013217 -3639.6
## - Nfrass 1 0.00010469 0.00019833 -3536.9
## - DryFrass 1 0.00013483 0.00022847 -3501.1
## - Cassim 1 0.00032978 0.00042342 -3345.0
##
## Call:
## lm(formula = Nassim ~ Instar + Mass + Intake + LogIntake + WetFrass +
## DryFrass + Cassim + LogCassim + Nfrass + LogNassim, data = caterpillars_clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.742e-03 -1.613e-04 -2.116e-05 1.637e-04 2.704e-03
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.844e-02 2.184e-03 8.443 2.87e-15 ***
## Instar -2.659e-04 9.161e-05 -2.903 0.00404 **
## Mass 1.920e-04 4.242e-05 4.526 9.42e-06 ***
## Intake -6.024e-03 6.037e-04 -9.978 < 2e-16 ***
## LogIntake -2.740e-03 1.205e-03 -2.274 0.02381 *
## WetFrass -1.778e-03 3.640e-04 -4.884 1.89e-06 ***
## DryFrass 7.964e-02 4.267e-03 18.666 < 2e-16 ***
## Cassim 1.901e-01 6.513e-03 29.194 < 2e-16 ***
## LogCassim -1.078e-02 1.420e-03 -7.589 6.93e-13 ***
## Nfrass -8.271e-01 5.028e-02 -16.449 < 2e-16 ***
## LogNassim 1.465e-02 1.530e-03 9.572 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.000622 on 242 degrees of freedom
## Multiple R-squared: 0.9987, Adjusted R-squared: 0.9986
## F-statistic: 1.793e+04 on 10 and 242 DF, p-value: < 2.2e-16
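Backward elimination retains 10 predictors: the forward-selection model without LogDryFrass. Its AIC (-3724.79) is slightly lower than that of the forward model (-3723.9), and every retained predictor, including Cassim, Nfrass, DryFrass, Intake, and WetFrass, is significant at the 0.05 level.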
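The relationship between the two final models can be checked directly from their formulas. The sketch below copies the forward and backward formulas from the output above and compares their term labels; the object names are illustrative assumptions.

```r
# Final formulas copied from the forward and backward step() output
forward_formula <- Nassim ~ Cassim + Nfrass + DryFrass + Intake + WetFrass +
  LogNassim + LogCassim + LogDryFrass + Mass + Instar + LogIntake
backward_formula <- Nassim ~ Instar + Mass + Intake + LogIntake + WetFrass +
  DryFrass + Cassim + LogCassim + Nfrass + LogNassim

forward_terms  <- attr(terms(forward_formula), "term.labels")
backward_terms <- attr(terms(backward_formula), "term.labels")

only_forward  <- setdiff(forward_terms, backward_terms)  # kept only by forward selection
only_backward <- setdiff(backward_terms, forward_terms)  # kept only by backward elimination
only_forward
only_backward
```

With the fitted objects available, the same comparison could be made on attr(terms(forward_model), "term.labels") and its backward counterpart instead of on copied formulas.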
Stepwise selection is run twice: once starting from the intercept-only model and once starting from the full model. The final stepwise models are shown below:
Stepwise Selection Starting from Null Model
## Start: AIC=-2072.64
## Nassim ~ 1
##
## Call:
## lm(formula = Nassim ~ 1, data = caterpillars_clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.013064 -0.011215 -0.008635 0.002501 0.050331
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.013831 0.001044 13.25 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0166 on 252 degrees of freedom
Stepwise Selection Starting from Full Model
## Start: AIC=-3714.18
## Nassim ~ Instar + ActiveFeeding + Fgp + Mgp + Mass + LogMass +
## Intake + LogIntake + WetFrass + LogWetFrass + DryFrass +
## LogDryFrass + Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
##
## Df Sum of Sq RSS AIC
## - Fgp 1 0.00000000 0.00009239 -3716.2
## - LogMass 1 0.00000002 0.00009241 -3716.1
## - LogWetFrass 1 0.00000002 0.00009242 -3716.1
## - ActiveFeeding 1 0.00000005 0.00009245 -3716.0
## - LogDryFrass 1 0.00000008 0.00009247 -3716.0
## - Instar 1 0.00000011 0.00009250 -3715.9
## - Mgp 1 0.00000022 0.00009261 -3715.6
## - LogNfrass 1 0.00000025 0.00009264 -3715.5
## <none> 0.00009239 -3714.2
## - LogIntake 1 0.00000078 0.00009317 -3714.1
## - Mass 1 0.00000694 0.00009933 -3697.9
## - WetFrass 1 0.00000821 0.00010060 -3694.6
## - LogCassim 1 0.00002034 0.00011273 -3665.8
## - LogNassim 1 0.00003523 0.00012763 -3634.4
## - Intake 1 0.00003883 0.00013122 -3627.4
## - Nfrass 1 0.00009267 0.00018506 -3540.4
## - DryFrass 1 0.00011947 0.00021186 -3506.2
## - Cassim 1 0.00032552 0.00041791 -3334.3
##
## Step: AIC=-3716.18
## Nassim ~ Instar + ActiveFeeding + Mgp + Mass + LogMass + Intake +
## LogIntake + WetFrass + LogWetFrass + DryFrass + LogDryFrass +
## Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
##
## Df Sum of Sq RSS AIC
## - LogWetFrass 1 0.00000003 0.00009242 -3718.1
## - LogMass 1 0.00000003 0.00009243 -3718.1
## - ActiveFeeding 1 0.00000005 0.00009245 -3718.0
## - LogDryFrass 1 0.00000008 0.00009247 -3718.0
## - Instar 1 0.00000013 0.00009253 -3717.8
## - LogNfrass 1 0.00000025 0.00009264 -3717.5
## - Mgp 1 0.00000032 0.00009271 -3717.3
## <none> 0.00009239 -3716.2
## - LogIntake 1 0.00000080 0.00009319 -3716.0
## + Fgp 1 0.00000000 0.00009239 -3714.2
## - Mass 1 0.00000694 0.00009933 -3699.9
## - WetFrass 1 0.00000833 0.00010072 -3696.3
## - LogCassim 1 0.00002041 0.00011280 -3667.7
## - LogNassim 1 0.00003524 0.00012764 -3636.4
## - Intake 1 0.00003889 0.00013128 -3629.3
## - Nfrass 1 0.00009439 0.00018678 -3540.1
## - DryFrass 1 0.00012175 0.00021415 -3505.5
## - Cassim 1 0.00032651 0.00041891 -3335.7
##
## Step: AIC=-3718.1
## Nassim ~ Instar + ActiveFeeding + Mgp + Mass + LogMass + Intake +
## LogIntake + WetFrass + DryFrass + LogDryFrass + Cassim +
## LogCassim + Nfrass + LogNfrass + LogNassim
##
## Df Sum of Sq RSS AIC
## - LogMass 1 0.00000004 0.00009246 -3720.0
## - ActiveFeeding 1 0.00000005 0.00009247 -3720.0
## - LogDryFrass 1 0.00000005 0.00009248 -3720.0
## - Instar 1 0.00000017 0.00009259 -3719.7
## - LogNfrass 1 0.00000024 0.00009266 -3719.4
## - Mgp 1 0.00000033 0.00009275 -3719.2
## <none> 0.00009242 -3718.1
## - LogIntake 1 0.00000082 0.00009324 -3717.9
## + LogWetFrass 1 0.00000003 0.00009239 -3716.2
## + Fgp 1 0.00000000 0.00009242 -3716.1
## - Mass 1 0.00000692 0.00009934 -3701.8
## - WetFrass 1 0.00000902 0.00010144 -3696.5
## - LogCassim 1 0.00002048 0.00011290 -3669.5
## - LogNassim 1 0.00003528 0.00012770 -3638.3
## - Intake 1 0.00003887 0.00013129 -3631.3
## - Nfrass 1 0.00009476 0.00018718 -3541.6
## - DryFrass 1 0.00012173 0.00021416 -3507.5
## - Cassim 1 0.00032669 0.00041911 -3337.6
##
## Step: AIC=-3719.99
## Nassim ~ Instar + ActiveFeeding + Mgp + Mass + Intake + LogIntake +
## WetFrass + DryFrass + LogDryFrass + Cassim + LogCassim +
## Nfrass + LogNfrass + LogNassim
##
## Df Sum of Sq RSS AIC
## - LogDryFrass 1 0.00000006 0.00009253 -3721.8
## - ActiveFeeding 1 0.00000012 0.00009258 -3721.7
## - LogNfrass 1 0.00000026 0.00009272 -3721.3
## - Mgp 1 0.00000032 0.00009278 -3721.1
## - Instar 1 0.00000045 0.00009291 -3720.8
## <none> 0.00009246 -3720.0
## - LogIntake 1 0.00000101 0.00009347 -3719.2
## + LogMass 1 0.00000004 0.00009242 -3718.1
## + LogWetFrass 1 0.00000004 0.00009243 -3718.1
## + Fgp 1 0.00000001 0.00009246 -3718.0
## - Mass 1 0.00000692 0.00009938 -3703.7
## - WetFrass 1 0.00000933 0.00010179 -3697.7
## - LogCassim 1 0.00002159 0.00011405 -3668.9
## - LogNassim 1 0.00003566 0.00012812 -3639.5
## - Intake 1 0.00003933 0.00013180 -3632.3
## - Nfrass 1 0.00009596 0.00018842 -3541.9
## - DryFrass 1 0.00013210 0.00022457 -3497.5
## - Cassim 1 0.00032884 0.00042130 -3338.3
##
## Step: AIC=-3721.81
## Nassim ~ Instar + ActiveFeeding + Mgp + Mass + Intake + LogIntake +
## WetFrass + DryFrass + Cassim + LogCassim + Nfrass + LogNfrass +
## LogNassim
##
## Df Sum of Sq RSS AIC
## - ActiveFeeding 1 0.00000014 0.00009266 -3723.4
## - Mgp 1 0.00000038 0.00009291 -3722.8
## - Instar 1 0.00000040 0.00009293 -3722.7
## <none> 0.00009253 -3721.8
## - LogNfrass 1 0.00000088 0.00009341 -3721.4
## - LogIntake 1 0.00000101 0.00009354 -3721.1
## + LogDryFrass 1 0.00000006 0.00009246 -3720.0
## + LogMass 1 0.00000005 0.00009248 -3720.0
## + Fgp 1 0.00000002 0.00009251 -3719.9
## + LogWetFrass 1 0.00000000 0.00009252 -3719.8
## - Mass 1 0.00000698 0.00009950 -3705.4
## - WetFrass 1 0.00000929 0.00010181 -3699.6
## - LogCassim 1 0.00002188 0.00011441 -3670.1
## - LogNassim 1 0.00003645 0.00012898 -3639.8
## - Intake 1 0.00003947 0.00013199 -3633.9
## - Nfrass 1 0.00009956 0.00019208 -3539.0
## - DryFrass 1 0.00013353 0.00022606 -3497.8
## - Cassim 1 0.00032878 0.00042130 -3340.3
##
## Step: AIC=-3723.44
## Nassim ~ Instar + Mgp + Mass + Intake + LogIntake + WetFrass +
## DryFrass + Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
##
## Df Sum of Sq RSS AIC
## - Mgp 1 0.00000038 0.00009304 -3724.4
## - Instar 1 0.00000064 0.00009330 -3723.7
## <none> 0.00009266 -3723.4
## - LogNfrass 1 0.00000086 0.00009352 -3723.1
## - LogIntake 1 0.00000089 0.00009356 -3723.0
## + LogMass 1 0.00000014 0.00009253 -3721.8
## + ActiveFeeding 1 0.00000014 0.00009253 -3721.8
## + LogDryFrass 1 0.00000008 0.00009258 -3721.7
## + Fgp 1 0.00000007 0.00009259 -3721.6
## + LogWetFrass 1 0.00000000 0.00009266 -3721.4
## - Mass 1 0.00000722 0.00009989 -3706.4
## - WetFrass 1 0.00000915 0.00010181 -3701.6
## - LogCassim 1 0.00002220 0.00011487 -3671.1
## - LogNassim 1 0.00003632 0.00012898 -3641.8
## - Intake 1 0.00003943 0.00013209 -3635.7
## - Nfrass 1 0.00009980 0.00019247 -3540.5
## - DryFrass 1 0.00013359 0.00022625 -3499.6
## - Cassim 1 0.00032891 0.00042157 -3342.1
##
## Step: AIC=-3724.41
## Nassim ~ Instar + Mass + Intake + LogIntake + WetFrass + DryFrass +
## Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
##
## Df Sum of Sq RSS AIC
## - LogNfrass 1 0.00000060 0.00009364 -3724.8
## <none> 0.00009304 -3724.4
## - LogIntake 1 0.00000091 0.00009395 -3723.9
## + Mgp 1 0.00000038 0.00009266 -3723.4
## - Instar 1 0.00000115 0.00009420 -3723.3
## + Fgp 1 0.00000025 0.00009279 -3723.1
## + LogDryFrass 1 0.00000015 0.00009289 -3722.8
## + ActiveFeeding 1 0.00000013 0.00009291 -3722.8
## + LogMass 1 0.00000012 0.00009292 -3722.7
## + LogWetFrass 1 0.00000000 0.00009304 -3722.4
## - Mass 1 0.00000732 0.00010036 -3707.3
## - WetFrass 1 0.00000909 0.00010214 -3702.8
## - LogCassim 1 0.00002194 0.00011498 -3672.9
## - LogNassim 1 0.00003604 0.00012909 -3643.6
## - Intake 1 0.00003912 0.00013216 -3637.6
## - Nfrass 1 0.00010039 0.00019343 -3541.3
## - DryFrass 1 0.00013495 0.00022799 -3499.7
## - Cassim 1 0.00032968 0.00042272 -3343.5
##
## Step: AIC=-3724.79
## Nassim ~ Instar + Mass + Intake + LogIntake + WetFrass + DryFrass +
## Cassim + LogCassim + Nfrass + LogNassim
##
## Df Sum of Sq RSS AIC
## <none> 0.00009364 -3724.8
## + LogNfrass 1 0.00000060 0.00009304 -3724.4
## + LogDryFrass 1 0.00000041 0.00009323 -3723.9
## + LogWetFrass 1 0.00000041 0.00009323 -3723.9
## + Mgp 1 0.00000012 0.00009352 -3723.1
## + ActiveFeeding 1 0.00000011 0.00009353 -3723.1
## + LogMass 1 0.00000009 0.00009355 -3723.0
## + Fgp 1 0.00000006 0.00009358 -3722.9
## - LogIntake 1 0.00000200 0.00009564 -3721.4
## - Instar 1 0.00000326 0.00009690 -3718.1
## - Mass 1 0.00000793 0.00010157 -3706.2
## - WetFrass 1 0.00000923 0.00010287 -3703.0
## - LogCassim 1 0.00002229 0.00011593 -3672.8
## - LogNassim 1 0.00003545 0.00012909 -3645.6
## - Intake 1 0.00003853 0.00013217 -3639.6
## - Nfrass 1 0.00010469 0.00019833 -3536.9
## - DryFrass 1 0.00013483 0.00022847 -3501.1
## - Cassim 1 0.00032978 0.00042342 -3345.0
##
## Call:
## lm(formula = Nassim ~ Instar + Mass + Intake + LogIntake + WetFrass +
## DryFrass + Cassim + LogCassim + Nfrass + LogNassim, data = caterpillars_clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.742e-03 -1.613e-04 -2.116e-05 1.637e-04 2.704e-03
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.844e-02 2.184e-03 8.443 2.87e-15 ***
## Instar -2.659e-04 9.161e-05 -2.903 0.00404 **
## Mass 1.920e-04 4.242e-05 4.526 9.42e-06 ***
## Intake -6.024e-03 6.037e-04 -9.978 < 2e-16 ***
## LogIntake -2.740e-03 1.205e-03 -2.274 0.02381 *
## WetFrass -1.778e-03 3.640e-04 -4.884 1.89e-06 ***
## DryFrass 7.964e-02 4.267e-03 18.666 < 2e-16 ***
## Cassim 1.901e-01 6.513e-03 29.194 < 2e-16 ***
## LogCassim -1.078e-02 1.420e-03 -7.589 6.93e-13 ***
## Nfrass -8.271e-01 5.028e-02 -16.449 < 2e-16 ***
## LogNassim 1.465e-02 1.530e-03 9.572 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.000622 on 242 degrees of freedom
## Multiple R-squared: 0.9987, Adjusted R-squared: 0.9986
## F-statistic: 1.793e+04 on 10 and 242 DF, p-value: < 2.2e-16
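Starting from the full model, stepwise selection arrives at the same 10-predictor model as backward elimination (AIC = -3724.79, adjusted R-squared 0.9986), with Cassim, Nfrass, and DryFrass again the strongest terms. The run started from the intercept-only model never leaves it (AIC = -2072.64): no upper scope was supplied to step(), so there were no candidate predictors to add, and that run says nothing about which predictors matter.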
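The effect of the scope argument on a null-start search can be seen in a small sketch. When scope is missing, step() uses the starting model as the upper limit, so a search begun at the intercept-only model has nothing to add. The example uses R's built-in mtcars data as a stand-in, not the Caterpillars data, and the object names are illustrative assumptions.

```r
null_fit <- lm(mpg ~ 1, data = mtcars)
full_fit <- lm(mpg ~ ., data = mtcars)

# No scope: the starting model is also the upper limit, so no term can enter
no_scope <- step(null_fit, direction = "both", trace = 0)

# With scope: terms from the full model may enter and may later be dropped again
with_scope <- step(null_fit, direction = "both", trace = 0,
                   scope = list(lower = ~ 1, upper = formula(full_fit)))

formula(no_scope)    # still the intercept-only formula
formula(with_scope)  # a formula with at least one predictor
```

Supplying the same lower and upper limits used for forward selection would let a null-start stepwise search on the Caterpillars data be compared with the other methods.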
The model selection methods (best subsets, forward selection, backward elimination, and stepwise selection from the full model) consistently highlighted Cassim, Nfrass, and DryFrass as essential predictors of Nassim. This agreement across methods underscores the stability of these variables. The forward, backward, and full-start stepwise models all reach an adjusted R-squared of 0.9986, but with 10 or 11 predictors they favor fit over simplicity; the three-predictor model of Cassim, Nfrass, and DryFrass is the simpler alternative. Overall, the results are consistent with Cassim, Nfrass, and DryFrass being closely related to Nassim.
Code for best subsets solution
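library(leaps)
# regsubsets() searches models of up to nvmax = 8 predictors by default
best_subset_model <- regsubsets(Nassim ~ ., data = caterpillars_clean, nbest = 1, method = "exhaustive")
best_subset_summary <- summary(best_subset_model)
best_subset_summary$which  # predictors included at each model size
best_subset_summary$cp     # Mallows' Cp at each model size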
Code for forward selection
null_model <- lm(Nassim ~ 1, data = caterpillars_clean)
full_model <- lm(Nassim ~ ., data = caterpillars_clean)
forward_model <- step(null_model, direction = "forward", scope = list(lower = null_model, upper = full_model))
summary(forward_model)
Code for backward elimination
backward_model <- step(full_model, direction = "backward")
summary(backward_model)
Code for stepwise solution
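# Without a scope, step() has no terms to add to the null model and returns it unchanged
stepwise_model1 <- step(null_model, direction = "both", scope = list(lower = null_model, upper = full_model))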
summary(stepwise_model1)
stepwise_model2 <- step(full_model, direction = "both")
summary(stepwise_model2)