Introduction

This project fits multiple linear regression models to the “Caterpillars” dataset with Nassim as the response variable. Different combinations of explanatory variables are compared using forward, backward, and stepwise selection. The selection is then repeated with the natural logarithm of Nassim as the response, to see how the transformation changes the chosen models.

1. Data Preparation

# Load the necessary libraries
# (step() is in base R's stats package; MASS provides the alternative stepAIC())
library(MASS)
# Load the dataset
caterpillars_data <- read.csv("https://www.stat2.org/datasets/Caterpillars.csv")

# View the first few rows of the data
head(caterpillars_data)
##   Instar ActiveFeeding Fgp Mgp     Mass   LogMass   Intake  LogIntake WetFrass
## 1      1             Y   Y   Y 0.002064 -2.685290 0.165118 -0.7822056 0.000241
## 2      1             Y   N   N 0.005191 -2.284749 0.201008 -0.6967867 0.000063
## 3      2             N   Y   N 0.005603 -2.251579 0.189125 -0.7232511 0.001401
## 4      2             Y   N   N 0.019300 -1.714443 0.283280 -0.5477841 0.002045
## 5      2             N   Y   Y 0.029300 -1.533132 0.259569 -0.5857472 0.005377
## 6      3             Y   Y   N 0.062600 -1.203426 0.327864 -0.4843063 0.029500
##   LogWetFrass DryFrass LogDryFrass     Cassim LogCassim   Nfrass LogNfrass
## 1   -3.617983 0.000208   -3.681937 0.01422378 -1.846985 6.61e-06 -5.179510
## 2   -4.200659 0.000061   -4.214670 0.01739189 -1.759653 1.03e-06 -5.986783
## 3   -2.853562 0.000969   -3.013676 0.01639923 -1.785177 2.78e-05 -4.555794
## 4   -2.689307 0.001834   -2.736601 0.02392468 -1.621154 4.64e-05 -4.333480
## 5   -2.269460 0.003523   -2.453087 0.02122857 -1.673079 9.97e-05 -4.001301
## 6   -1.530178 0.000789   -3.102923 0.02836365 -1.547238 1.84e-05 -4.735567
##        Nassim LogNassim
## 1 0.001858999 -2.730721
## 2 0.002270091 -2.643957
## 3 0.002302210 -2.637855
## 4 0.003041352 -2.516933
## 5 0.002791898 -2.554100
## 6 0.003627464 -2.440397
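The printed rows suggest that each Log column is the base-10 logarithm of its raw counterpart. The sketch below checks this for two pairs, using the six rows shown above retyped by hand; log10_mismatch() is a hypothetical helper, not part of the original analysis. If the relation holds for the whole dataset, LogNassim is a deterministic transform of the response and is not a legitimate predictor of Nassim.

```r
# Six rows retyped from the head() output above (Mass and Nassim pairs only)
rows <- data.frame(
  Mass      = c(0.002064, 0.005191, 0.005603, 0.019300, 0.029300, 0.062600),
  LogMass   = c(-2.685290, -2.284749, -2.251579, -1.714443, -1.533132, -1.203426),
  Nassim    = c(0.001858999, 0.002270091, 0.002302210, 0.003041352,
                0.002791898, 0.003627464),
  LogNassim = c(-2.730721, -2.643957, -2.637855, -2.516933, -2.554100, -2.440397)
)

# Hypothetical helper: largest absolute gap between a Log column and log10(raw column)
log10_mismatch <- function(df, raw, logged) {
  max(abs(log10(df[[raw]]) - df[[logged]]))
}

mass_gap   <- log10_mismatch(rows, "Mass", "LogMass")
nassim_gap <- log10_mismatch(rows, "Nassim", "LogNassim")
# Gaps on the order of the printed rounding (about 1e-6) indicate base-10 logs
print(c(Mass = mass_gap, Nassim = nassim_gap))
```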

Model Selection Methods

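This section outlines the three methods used for model selection: forward, backward, and stepwise selection. All three use R's step() function, which adds or removes one term at a time and keeps whichever change lowers AIC the most. Each method is explained in the subsections that follow.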
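AIC values are only comparable between models fitted to the same rows, and missing values in some predictors can make different candidate models use different observations. One option, sketched below, is to build a single complete-case data frame before running any of the three methods, and to drop columns that are transforms of the response. prepare_selection_data() is a hypothetical helper; it is demonstrated on R's built-in airquality data as a stand-in because that dataset also contains missing values. Which Caterpillars columns count as derived from Nassim is a subject-matter judgment; LogNassim is the one that is clearly log10(Nassim).

```r
# Hypothetical helper: drop excluded columns, then remove every row with a
# missing value so that all candidate models are fitted to the same observations.
prepare_selection_data <- function(df, response, exclude = character(0)) {
  keep <- setdiff(names(df), exclude)
  stopifnot(response %in% keep)
  na.omit(df[, keep, drop = FALSE])
}

# Stand-in demonstration on the built-in airquality data (Ozone has NAs);
# "Day" is excluded purely to show the argument in use.
aq <- prepare_selection_data(airquality, "Ozone", exclude = "Day")
cat("rows before:", nrow(airquality), " rows after:", nrow(aq), "\n")

# For the Caterpillars data (not run here):
# cat_cc <- prepare_selection_data(caterpillars_data, "Nassim", exclude = "LogNassim")
```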

1. Forward Selection

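In this subsection, a linear model containing only the intercept is fitted first and passed to step() with direction = "forward". Forward selection adds, one at a time, the variable whose inclusion lowers AIC the most, and stops when no addition lowers it further. However, step() can only add terms listed in its scope argument, and none is supplied here, so the search stops immediately: the output shows only the starting AIC, and the final model is still Nassim ~ 1. The summary of this model is displayed below.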
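Forward selection only has terms to add when step() is given an upper bound through scope. A minimal sketch of that pattern follows, using the built-in airquality data as a stand-in; for the Caterpillars analysis the same calls would use Nassim as the response and a complete-case data frame without LogNassim. Passing formula(full_model) matters: it returns the expanded formula, so the upper scope lists every candidate predictor explicitly.

```r
# Complete-case stand-in data: airquality without rows containing NA
aq <- na.omit(airquality)

# Smallest and largest models the forward search may move between
null_model <- lm(Ozone ~ 1, data = aq)
full_model <- lm(Ozone ~ ., data = aq)

# Supplying scope gives step() candidate terms to add; trace = 0 hides the log
forward_fit <- step(null_model,
                    scope = list(lower = formula(null_model),
                                 upper = formula(full_model)),
                    direction = "forward", trace = 0)
print(formula(forward_fit))
```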

# Intercept-only starting model with Nassim as the response
# (no scope is given, so forward selection has no candidate terms to add)
lm_model <- lm(Nassim ~ 1, data = caterpillars_data)
forward_method <- step(lm_model, direction = "forward")
## Start:  AIC=-2080.9
## Nassim ~ 1
summary(forward_method)
## 
## Call:
## lm(formula = Nassim ~ 1, data = caterpillars_data)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.016029 -0.011200 -0.008595  0.002465  0.050394 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.013768   0.001042   13.22   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0166 on 253 degrees of freedom
##   (13 observations deleted due to missingness)

2. Backward Selection

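Here, a full linear model is fitted with every other column of the dataset as a predictor (Nassim ~ .). Backward selection then removes one variable at a time: at each step it drops the variable whose removal lowers AIC the most, and it stops when no removal lowers AIC further. The criterion is AIC, not the p-values of individual coefficients. Note that the full model includes LogNassim, which is the base-10 logarithm of the response itself, so the response leaks into the predictor set; the near-perfect adjusted R-squared (0.9986) in the summary below should be read with this in mind. The final model's summary is also presented.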
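The AIC values in the step() trace come from extractAIC(), which for a linear model is n*log(RSS/n) + 2k, where k is the number of estimated coefficients. As a worked check, the final backward model below has 11 coefficients and 242 residual degrees of freedom, so n = 253; plugging in its printed RSS reproduces the last "Step: AIC=-3724.79" up to rounding, and the RSS printed for dropping LogIntake gives the -3721.4 shown on that row. step_aic() is just a local name for the formula.

```r
# AIC on the scale used by step()/extractAIC() for a Gaussian linear model
step_aic <- function(rss, n, k) n * log(rss / n) + 2 * k

n <- 242 + 11                                  # residual df + coefficients
aic_final <- step_aic(0.00009364, n, k = 11)   # final backward model
aic_drop_logintake <- step_aic(0.00009564, n, k = 10)  # same model minus LogIntake

# Dropping LogIntake would raise AIC, so the search stops at the final model
print(round(c(final = aic_final, drop_LogIntake = aic_drop_logintake), 2))
```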

# Full model: all other columns as predictors (this includes LogNassim, i.e. log10(Nassim))
lm_model_full <- lm(Nassim ~ ., data = caterpillars_data)
backward_method <- step(lm_model_full, direction = "backward")
## Start:  AIC=-3714.18
## Nassim ~ Instar + ActiveFeeding + Fgp + Mgp + Mass + LogMass + 
##     Intake + LogIntake + WetFrass + LogWetFrass + DryFrass + 
##     LogDryFrass + Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
## 
##                 Df  Sum of Sq        RSS     AIC
## - Fgp            1 0.00000000 0.00009239 -3716.2
## - LogMass        1 0.00000002 0.00009241 -3716.1
## - LogWetFrass    1 0.00000002 0.00009242 -3716.1
## - ActiveFeeding  1 0.00000005 0.00009245 -3716.0
## - LogDryFrass    1 0.00000008 0.00009247 -3716.0
## - Instar         1 0.00000011 0.00009250 -3715.9
## - Mgp            1 0.00000022 0.00009261 -3715.6
## - LogNfrass      1 0.00000025 0.00009264 -3715.5
## <none>                        0.00009239 -3714.2
## - LogIntake      1 0.00000078 0.00009317 -3714.1
## - Mass           1 0.00000694 0.00009933 -3697.9
## - WetFrass       1 0.00000821 0.00010060 -3694.6
## - LogCassim      1 0.00002034 0.00011273 -3665.8
## - LogNassim      1 0.00003523 0.00012763 -3634.4
## - Intake         1 0.00003883 0.00013122 -3627.4
## - Nfrass         1 0.00009267 0.00018506 -3540.4
## - DryFrass       1 0.00011947 0.00021186 -3506.2
## - Cassim         1 0.00032552 0.00041791 -3334.3
## 
## Step:  AIC=-3716.18
## Nassim ~ Instar + ActiveFeeding + Mgp + Mass + LogMass + Intake + 
##     LogIntake + WetFrass + LogWetFrass + DryFrass + LogDryFrass + 
##     Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
## 
##                 Df  Sum of Sq        RSS     AIC
## - LogWetFrass    1 0.00000003 0.00009242 -3718.1
## - LogMass        1 0.00000003 0.00009243 -3718.1
## - ActiveFeeding  1 0.00000005 0.00009245 -3718.0
## - LogDryFrass    1 0.00000008 0.00009247 -3718.0
## - Instar         1 0.00000013 0.00009253 -3717.8
## - LogNfrass      1 0.00000025 0.00009264 -3717.5
## - Mgp            1 0.00000032 0.00009271 -3717.3
## <none>                        0.00009239 -3716.2
## - LogIntake      1 0.00000080 0.00009319 -3716.0
## - Mass           1 0.00000694 0.00009933 -3699.9
## - WetFrass       1 0.00000833 0.00010072 -3696.3
## - LogCassim      1 0.00002041 0.00011280 -3667.7
## - LogNassim      1 0.00003524 0.00012764 -3636.4
## - Intake         1 0.00003889 0.00013128 -3629.3
## - Nfrass         1 0.00009439 0.00018678 -3540.1
## - DryFrass       1 0.00012175 0.00021415 -3505.5
## - Cassim         1 0.00032651 0.00041891 -3335.7
## 
## Step:  AIC=-3718.1
## Nassim ~ Instar + ActiveFeeding + Mgp + Mass + LogMass + Intake + 
##     LogIntake + WetFrass + DryFrass + LogDryFrass + Cassim + 
##     LogCassim + Nfrass + LogNfrass + LogNassim
## 
##                 Df  Sum of Sq        RSS     AIC
## - LogMass        1 0.00000004 0.00009246 -3720.0
## - ActiveFeeding  1 0.00000005 0.00009247 -3720.0
## - LogDryFrass    1 0.00000005 0.00009248 -3720.0
## - Instar         1 0.00000017 0.00009259 -3719.7
## - LogNfrass      1 0.00000024 0.00009266 -3719.4
## - Mgp            1 0.00000033 0.00009275 -3719.2
## <none>                        0.00009242 -3718.1
## - LogIntake      1 0.00000082 0.00009324 -3717.9
## - Mass           1 0.00000692 0.00009934 -3701.8
## - WetFrass       1 0.00000902 0.00010144 -3696.5
## - LogCassim      1 0.00002048 0.00011290 -3669.5
## - LogNassim      1 0.00003528 0.00012770 -3638.3
## - Intake         1 0.00003887 0.00013129 -3631.3
## - Nfrass         1 0.00009476 0.00018718 -3541.6
## - DryFrass       1 0.00012173 0.00021416 -3507.5
## - Cassim         1 0.00032669 0.00041911 -3337.6
## 
## Step:  AIC=-3719.99
## Nassim ~ Instar + ActiveFeeding + Mgp + Mass + Intake + LogIntake + 
##     WetFrass + DryFrass + LogDryFrass + Cassim + LogCassim + 
##     Nfrass + LogNfrass + LogNassim
## 
##                 Df  Sum of Sq        RSS     AIC
## - LogDryFrass    1 0.00000006 0.00009253 -3721.8
## - ActiveFeeding  1 0.00000012 0.00009258 -3721.7
## - LogNfrass      1 0.00000026 0.00009272 -3721.3
## - Mgp            1 0.00000032 0.00009278 -3721.1
## - Instar         1 0.00000045 0.00009291 -3720.8
## <none>                        0.00009246 -3720.0
## - LogIntake      1 0.00000101 0.00009347 -3719.2
## - Mass           1 0.00000692 0.00009938 -3703.7
## - WetFrass       1 0.00000933 0.00010179 -3697.7
## - LogCassim      1 0.00002159 0.00011405 -3668.9
## - LogNassim      1 0.00003566 0.00012812 -3639.5
## - Intake         1 0.00003933 0.00013180 -3632.3
## - Nfrass         1 0.00009596 0.00018842 -3541.9
## - DryFrass       1 0.00013210 0.00022457 -3497.5
## - Cassim         1 0.00032884 0.00042130 -3338.3
## 
## Step:  AIC=-3721.81
## Nassim ~ Instar + ActiveFeeding + Mgp + Mass + Intake + LogIntake + 
##     WetFrass + DryFrass + Cassim + LogCassim + Nfrass + LogNfrass + 
##     LogNassim
## 
##                 Df  Sum of Sq        RSS     AIC
## - ActiveFeeding  1 0.00000014 0.00009266 -3723.4
## - Mgp            1 0.00000038 0.00009291 -3722.8
## - Instar         1 0.00000040 0.00009293 -3722.7
## <none>                        0.00009253 -3721.8
## - LogNfrass      1 0.00000088 0.00009341 -3721.4
## - LogIntake      1 0.00000101 0.00009354 -3721.1
## - Mass           1 0.00000698 0.00009950 -3705.4
## - WetFrass       1 0.00000929 0.00010181 -3699.6
## - LogCassim      1 0.00002188 0.00011441 -3670.1
## - LogNassim      1 0.00003645 0.00012898 -3639.8
## - Intake         1 0.00003947 0.00013199 -3633.9
## - Nfrass         1 0.00009956 0.00019208 -3539.0
## - DryFrass       1 0.00013353 0.00022606 -3497.8
## - Cassim         1 0.00032878 0.00042130 -3340.3
## 
## Step:  AIC=-3723.44
## Nassim ~ Instar + Mgp + Mass + Intake + LogIntake + WetFrass + 
##     DryFrass + Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
## 
##             Df  Sum of Sq        RSS     AIC
## - Mgp        1 0.00000038 0.00009304 -3724.4
## - Instar     1 0.00000064 0.00009330 -3723.7
## <none>                    0.00009266 -3723.4
## - LogNfrass  1 0.00000086 0.00009352 -3723.1
## - LogIntake  1 0.00000089 0.00009356 -3723.0
## - Mass       1 0.00000722 0.00009989 -3706.4
## - WetFrass   1 0.00000915 0.00010181 -3701.6
## - LogCassim  1 0.00002220 0.00011487 -3671.1
## - LogNassim  1 0.00003632 0.00012898 -3641.8
## - Intake     1 0.00003943 0.00013209 -3635.7
## - Nfrass     1 0.00009980 0.00019247 -3540.5
## - DryFrass   1 0.00013359 0.00022625 -3499.6
## - Cassim     1 0.00032891 0.00042157 -3342.1
## 
## Step:  AIC=-3724.41
## Nassim ~ Instar + Mass + Intake + LogIntake + WetFrass + DryFrass + 
##     Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
## 
##             Df  Sum of Sq        RSS     AIC
## - LogNfrass  1 0.00000060 0.00009364 -3724.8
## <none>                    0.00009304 -3724.4
## - LogIntake  1 0.00000091 0.00009395 -3723.9
## - Instar     1 0.00000115 0.00009420 -3723.3
## - Mass       1 0.00000732 0.00010036 -3707.3
## - WetFrass   1 0.00000909 0.00010214 -3702.8
## - LogCassim  1 0.00002194 0.00011498 -3672.9
## - LogNassim  1 0.00003604 0.00012909 -3643.6
## - Intake     1 0.00003912 0.00013216 -3637.6
## - Nfrass     1 0.00010039 0.00019343 -3541.3
## - DryFrass   1 0.00013495 0.00022799 -3499.7
## - Cassim     1 0.00032968 0.00042272 -3343.5
## 
## Step:  AIC=-3724.79
## Nassim ~ Instar + Mass + Intake + LogIntake + WetFrass + DryFrass + 
##     Cassim + LogCassim + Nfrass + LogNassim
## 
##             Df  Sum of Sq        RSS     AIC
## <none>                    0.00009364 -3724.8
## - LogIntake  1 0.00000200 0.00009564 -3721.4
## - Instar     1 0.00000326 0.00009690 -3718.1
## - Mass       1 0.00000793 0.00010157 -3706.2
## - WetFrass   1 0.00000923 0.00010287 -3703.0
## - LogCassim  1 0.00002229 0.00011593 -3672.8
## - LogNassim  1 0.00003545 0.00012909 -3645.6
## - Intake     1 0.00003853 0.00013217 -3639.6
## - Nfrass     1 0.00010469 0.00019833 -3536.9
## - DryFrass   1 0.00013483 0.00022847 -3501.1
## - Cassim     1 0.00032978 0.00042342 -3345.0
summary(backward_method)
## 
## Call:
## lm(formula = Nassim ~ Instar + Mass + Intake + LogIntake + WetFrass + 
##     DryFrass + Cassim + LogCassim + Nfrass + LogNassim, data = caterpillars_data)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -2.742e-03 -1.613e-04 -2.116e-05  1.637e-04  2.704e-03 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.844e-02  2.184e-03   8.443 2.87e-15 ***
## Instar      -2.659e-04  9.161e-05  -2.903  0.00404 ** 
## Mass         1.920e-04  4.242e-05   4.526 9.42e-06 ***
## Intake      -6.024e-03  6.037e-04  -9.978  < 2e-16 ***
## LogIntake   -2.740e-03  1.205e-03  -2.274  0.02381 *  
## WetFrass    -1.778e-03  3.640e-04  -4.884 1.89e-06 ***
## DryFrass     7.964e-02  4.267e-03  18.666  < 2e-16 ***
## Cassim       1.901e-01  6.513e-03  29.194  < 2e-16 ***
## LogCassim   -1.078e-02  1.420e-03  -7.589 6.93e-13 ***
## Nfrass      -8.271e-01  5.028e-02 -16.449  < 2e-16 ***
## LogNassim    1.465e-02  1.530e-03   9.572  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.000622 on 242 degrees of freedom
##   (14 observations deleted due to missingness)
## Multiple R-squared:  0.9987, Adjusted R-squared:  0.9986 
## F-statistic: 1.793e+04 on 10 and 242 DF,  p-value: < 2.2e-16

3. Stepwise Selection

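This subsection combines forward and backward selection. Starting from the full model, step() with direction = "both" considers at each step both removing a term currently in the model (rows marked -) and re-adding a term dropped earlier (rows marked +), and makes whichever change lowers AIC the most. In this run no re-addition ever lowers AIC, so the search follows the same path as backward selection and ends at the same model. The resulting model's summary is provided.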
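The run in this subsection starts from the full model. Stepwise selection can equally start from the intercept-only model, provided scope gives both a lower and an upper bound, and comparing the predictor sets reached from the two starting points is a cheap check on how stable the selection is. A sketch on the built-in airquality data as a stand-in; selected() is a hypothetical helper, and for the Caterpillars data the same calls would use Nassim and a complete-case data frame.

```r
aq <- na.omit(airquality)
null_model <- lm(Ozone ~ 1, data = aq)
full_model <- lm(Ozone ~ ., data = aq)
bounds <- list(lower = formula(null_model), upper = formula(full_model))

# Stepwise search from the empty model and from the full model
from_null <- step(null_model, scope = bounds, direction = "both", trace = 0)
from_full <- step(full_model, scope = bounds, direction = "both", trace = 0)

# Hypothetical helper: selected predictors, sorted so order does not matter
selected <- function(fit) sort(attr(terms(fit), "term.labels"))
same_model <- identical(selected(from_null), selected(from_full))
print(same_model)
```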

stepwise_method <- step(lm_model_full, direction = "both")
## Start:  AIC=-3714.18
## Nassim ~ Instar + ActiveFeeding + Fgp + Mgp + Mass + LogMass + 
##     Intake + LogIntake + WetFrass + LogWetFrass + DryFrass + 
##     LogDryFrass + Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
## 
##                 Df  Sum of Sq        RSS     AIC
## - Fgp            1 0.00000000 0.00009239 -3716.2
## - LogMass        1 0.00000002 0.00009241 -3716.1
## - LogWetFrass    1 0.00000002 0.00009242 -3716.1
## - ActiveFeeding  1 0.00000005 0.00009245 -3716.0
## - LogDryFrass    1 0.00000008 0.00009247 -3716.0
## - Instar         1 0.00000011 0.00009250 -3715.9
## - Mgp            1 0.00000022 0.00009261 -3715.6
## - LogNfrass      1 0.00000025 0.00009264 -3715.5
## <none>                        0.00009239 -3714.2
## - LogIntake      1 0.00000078 0.00009317 -3714.1
## - Mass           1 0.00000694 0.00009933 -3697.9
## - WetFrass       1 0.00000821 0.00010060 -3694.6
## - LogCassim      1 0.00002034 0.00011273 -3665.8
## - LogNassim      1 0.00003523 0.00012763 -3634.4
## - Intake         1 0.00003883 0.00013122 -3627.4
## - Nfrass         1 0.00009267 0.00018506 -3540.4
## - DryFrass       1 0.00011947 0.00021186 -3506.2
## - Cassim         1 0.00032552 0.00041791 -3334.3
## 
## Step:  AIC=-3716.18
## Nassim ~ Instar + ActiveFeeding + Mgp + Mass + LogMass + Intake + 
##     LogIntake + WetFrass + LogWetFrass + DryFrass + LogDryFrass + 
##     Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
## 
##                 Df  Sum of Sq        RSS     AIC
## - LogWetFrass    1 0.00000003 0.00009242 -3718.1
## - LogMass        1 0.00000003 0.00009243 -3718.1
## - ActiveFeeding  1 0.00000005 0.00009245 -3718.0
## - LogDryFrass    1 0.00000008 0.00009247 -3718.0
## - Instar         1 0.00000013 0.00009253 -3717.8
## - LogNfrass      1 0.00000025 0.00009264 -3717.5
## - Mgp            1 0.00000032 0.00009271 -3717.3
## <none>                        0.00009239 -3716.2
## - LogIntake      1 0.00000080 0.00009319 -3716.0
## + Fgp            1 0.00000000 0.00009239 -3714.2
## - Mass           1 0.00000694 0.00009933 -3699.9
## - WetFrass       1 0.00000833 0.00010072 -3696.3
## - LogCassim      1 0.00002041 0.00011280 -3667.7
## - LogNassim      1 0.00003524 0.00012764 -3636.4
## - Intake         1 0.00003889 0.00013128 -3629.3
## - Nfrass         1 0.00009439 0.00018678 -3540.1
## - DryFrass       1 0.00012175 0.00021415 -3505.5
## - Cassim         1 0.00032651 0.00041891 -3335.7
## 
## Step:  AIC=-3718.1
## Nassim ~ Instar + ActiveFeeding + Mgp + Mass + LogMass + Intake + 
##     LogIntake + WetFrass + DryFrass + LogDryFrass + Cassim + 
##     LogCassim + Nfrass + LogNfrass + LogNassim
## 
##                 Df  Sum of Sq        RSS     AIC
## - LogMass        1 0.00000004 0.00009246 -3720.0
## - ActiveFeeding  1 0.00000005 0.00009247 -3720.0
## - LogDryFrass    1 0.00000005 0.00009248 -3720.0
## - Instar         1 0.00000017 0.00009259 -3719.7
## - LogNfrass      1 0.00000024 0.00009266 -3719.4
## - Mgp            1 0.00000033 0.00009275 -3719.2
## <none>                        0.00009242 -3718.1
## - LogIntake      1 0.00000082 0.00009324 -3717.9
## + LogWetFrass    1 0.00000003 0.00009239 -3716.2
## + Fgp            1 0.00000000 0.00009242 -3716.1
## - Mass           1 0.00000692 0.00009934 -3701.8
## - WetFrass       1 0.00000902 0.00010144 -3696.5
## - LogCassim      1 0.00002048 0.00011290 -3669.5
## - LogNassim      1 0.00003528 0.00012770 -3638.3
## - Intake         1 0.00003887 0.00013129 -3631.3
## - Nfrass         1 0.00009476 0.00018718 -3541.6
## - DryFrass       1 0.00012173 0.00021416 -3507.5
## - Cassim         1 0.00032669 0.00041911 -3337.6
## 
## Step:  AIC=-3719.99
## Nassim ~ Instar + ActiveFeeding + Mgp + Mass + Intake + LogIntake + 
##     WetFrass + DryFrass + LogDryFrass + Cassim + LogCassim + 
##     Nfrass + LogNfrass + LogNassim
## 
##                 Df  Sum of Sq        RSS     AIC
## - LogDryFrass    1 0.00000006 0.00009253 -3721.8
## - ActiveFeeding  1 0.00000012 0.00009258 -3721.7
## - LogNfrass      1 0.00000026 0.00009272 -3721.3
## - Mgp            1 0.00000032 0.00009278 -3721.1
## - Instar         1 0.00000045 0.00009291 -3720.8
## <none>                        0.00009246 -3720.0
## - LogIntake      1 0.00000101 0.00009347 -3719.2
## + LogMass        1 0.00000004 0.00009242 -3718.1
## + LogWetFrass    1 0.00000004 0.00009243 -3718.1
## + Fgp            1 0.00000001 0.00009246 -3718.0
## - Mass           1 0.00000692 0.00009938 -3703.7
## - WetFrass       1 0.00000933 0.00010179 -3697.7
## - LogCassim      1 0.00002159 0.00011405 -3668.9
## - LogNassim      1 0.00003566 0.00012812 -3639.5
## - Intake         1 0.00003933 0.00013180 -3632.3
## - Nfrass         1 0.00009596 0.00018842 -3541.9
## - DryFrass       1 0.00013210 0.00022457 -3497.5
## - Cassim         1 0.00032884 0.00042130 -3338.3
## 
## Step:  AIC=-3721.81
## Nassim ~ Instar + ActiveFeeding + Mgp + Mass + Intake + LogIntake + 
##     WetFrass + DryFrass + Cassim + LogCassim + Nfrass + LogNfrass + 
##     LogNassim
## 
##                 Df  Sum of Sq        RSS     AIC
## - ActiveFeeding  1 0.00000014 0.00009266 -3723.4
## - Mgp            1 0.00000038 0.00009291 -3722.8
## - Instar         1 0.00000040 0.00009293 -3722.7
## <none>                        0.00009253 -3721.8
## - LogNfrass      1 0.00000088 0.00009341 -3721.4
## - LogIntake      1 0.00000101 0.00009354 -3721.1
## + LogDryFrass    1 0.00000006 0.00009246 -3720.0
## + LogMass        1 0.00000005 0.00009248 -3720.0
## + Fgp            1 0.00000002 0.00009251 -3719.9
## + LogWetFrass    1 0.00000000 0.00009252 -3719.8
## - Mass           1 0.00000698 0.00009950 -3705.4
## - WetFrass       1 0.00000929 0.00010181 -3699.6
## - LogCassim      1 0.00002188 0.00011441 -3670.1
## - LogNassim      1 0.00003645 0.00012898 -3639.8
## - Intake         1 0.00003947 0.00013199 -3633.9
## - Nfrass         1 0.00009956 0.00019208 -3539.0
## - DryFrass       1 0.00013353 0.00022606 -3497.8
## - Cassim         1 0.00032878 0.00042130 -3340.3
## 
## Step:  AIC=-3723.44
## Nassim ~ Instar + Mgp + Mass + Intake + LogIntake + WetFrass + 
##     DryFrass + Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
## 
##                 Df  Sum of Sq        RSS     AIC
## - Mgp            1 0.00000038 0.00009304 -3724.4
## - Instar         1 0.00000064 0.00009330 -3723.7
## <none>                        0.00009266 -3723.4
## - LogNfrass      1 0.00000086 0.00009352 -3723.1
## - LogIntake      1 0.00000089 0.00009356 -3723.0
## + LogMass        1 0.00000014 0.00009253 -3721.8
## + ActiveFeeding  1 0.00000014 0.00009253 -3721.8
## + LogDryFrass    1 0.00000008 0.00009258 -3721.7
## + Fgp            1 0.00000007 0.00009259 -3721.6
## + LogWetFrass    1 0.00000000 0.00009266 -3721.4
## - Mass           1 0.00000722 0.00009989 -3706.4
## - WetFrass       1 0.00000915 0.00010181 -3701.6
## - LogCassim      1 0.00002220 0.00011487 -3671.1
## - LogNassim      1 0.00003632 0.00012898 -3641.8
## - Intake         1 0.00003943 0.00013209 -3635.7
## - Nfrass         1 0.00009980 0.00019247 -3540.5
## - DryFrass       1 0.00013359 0.00022625 -3499.6
## - Cassim         1 0.00032891 0.00042157 -3342.1
## 
## Step:  AIC=-3724.41
## Nassim ~ Instar + Mass + Intake + LogIntake + WetFrass + DryFrass + 
##     Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
## 
##                 Df  Sum of Sq        RSS     AIC
## - LogNfrass      1 0.00000060 0.00009364 -3724.8
## <none>                        0.00009304 -3724.4
## - LogIntake      1 0.00000091 0.00009395 -3723.9
## + Mgp            1 0.00000038 0.00009266 -3723.4
## - Instar         1 0.00000115 0.00009420 -3723.3
## + Fgp            1 0.00000025 0.00009279 -3723.1
## + LogDryFrass    1 0.00000015 0.00009289 -3722.8
## + ActiveFeeding  1 0.00000013 0.00009291 -3722.8
## + LogMass        1 0.00000012 0.00009292 -3722.7
## + LogWetFrass    1 0.00000000 0.00009304 -3722.4
## - Mass           1 0.00000732 0.00010036 -3707.3
## - WetFrass       1 0.00000909 0.00010214 -3702.8
## - LogCassim      1 0.00002194 0.00011498 -3672.9
## - LogNassim      1 0.00003604 0.00012909 -3643.6
## - Intake         1 0.00003912 0.00013216 -3637.6
## - Nfrass         1 0.00010039 0.00019343 -3541.3
## - DryFrass       1 0.00013495 0.00022799 -3499.7
## - Cassim         1 0.00032968 0.00042272 -3343.5
## 
## Step:  AIC=-3724.79
## Nassim ~ Instar + Mass + Intake + LogIntake + WetFrass + DryFrass + 
##     Cassim + LogCassim + Nfrass + LogNassim
## 
##                 Df  Sum of Sq        RSS     AIC
## <none>                        0.00009364 -3724.8
## + LogNfrass      1 0.00000060 0.00009304 -3724.4
## + LogDryFrass    1 0.00000041 0.00009323 -3723.9
## + LogWetFrass    1 0.00000041 0.00009323 -3723.9
## + Mgp            1 0.00000012 0.00009352 -3723.1
## + ActiveFeeding  1 0.00000011 0.00009353 -3723.1
## + LogMass        1 0.00000009 0.00009355 -3723.0
## + Fgp            1 0.00000006 0.00009358 -3722.9
## - LogIntake      1 0.00000200 0.00009564 -3721.4
## - Instar         1 0.00000326 0.00009690 -3718.1
## - Mass           1 0.00000793 0.00010157 -3706.2
## - WetFrass       1 0.00000923 0.00010287 -3703.0
## - LogCassim      1 0.00002229 0.00011593 -3672.8
## - LogNassim      1 0.00003545 0.00012909 -3645.6
## - Intake         1 0.00003853 0.00013217 -3639.6
## - Nfrass         1 0.00010469 0.00019833 -3536.9
## - DryFrass       1 0.00013483 0.00022847 -3501.1
## - Cassim         1 0.00032978 0.00042342 -3345.0
summary(stepwise_method)
## 
## Call:
## lm(formula = Nassim ~ Instar + Mass + Intake + LogIntake + WetFrass + 
##     DryFrass + Cassim + LogCassim + Nfrass + LogNassim, data = caterpillars_data)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -2.742e-03 -1.613e-04 -2.116e-05  1.637e-04  2.704e-03 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.844e-02  2.184e-03   8.443 2.87e-15 ***
## Instar      -2.659e-04  9.161e-05  -2.903  0.00404 ** 
## Mass         1.920e-04  4.242e-05   4.526 9.42e-06 ***
## Intake      -6.024e-03  6.037e-04  -9.978  < 2e-16 ***
## LogIntake   -2.740e-03  1.205e-03  -2.274  0.02381 *  
## WetFrass    -1.778e-03  3.640e-04  -4.884 1.89e-06 ***
## DryFrass     7.964e-02  4.267e-03  18.666  < 2e-16 ***
## Cassim       1.901e-01  6.513e-03  29.194  < 2e-16 ***
## LogCassim   -1.078e-02  1.420e-03  -7.589 6.93e-13 ***
## Nfrass      -8.271e-01  5.028e-02 -16.449  < 2e-16 ***
## LogNassim    1.465e-02  1.530e-03   9.572  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.000622 on 242 degrees of freedom
##   (14 observations deleted due to missingness)
## Multiple R-squared:  0.9987, Adjusted R-squared:  0.9986 
## F-statistic: 1.793e+04 on 10 and 242 DF,  p-value: < 2.2e-16

Comparison of Models

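In this section, the models selected by each method are compared using AIC (Akaike Information Criterion) and adjusted R-squared. The comparison has limits. The warning from AIC() indicates that the models were fitted to different numbers of observations (254 for the forward model and 253 for the other two, since the full model drops one extra row with a missing value), so their AIC values are not strictly comparable. The forward model's adjusted R-squared of 0 simply reflects that it contains only the intercept, and the backward and stepwise methods selected the identical model.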
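AIC() and step() report different numbers for the same model (about -3004.8 versus -3724.8 for the backward model) because AIC() keeps the constant terms of the Gaussian log-likelihood and counts the residual variance as a parameter: AIC() = n*log(RSS/n) + n + n*log(2*pi) + 2(k+1). The sketch below applies this to the backward model's printed values (n = 253, k = 11, RSS = 0.00009364) and recovers the -3004.805 in the table, up to rounding in the printed RSS. full_aic() is a local name for the formula. Because the shift depends only on n, it does not change rankings among models fitted to the same rows.

```r
# Gaussian linear model: AIC() = step-scale AIC + n + n*log(2*pi) + 2,
# where the extra 2 counts the residual variance as an estimated parameter
full_aic <- function(rss, n, k) n * log(rss / n) + n + n * log(2 * pi) + 2 * (k + 1)

# Final backward model, values taken from its printed output
aic_backward <- full_aic(rss = 0.00009364, n = 253, k = 11)
print(round(aic_backward, 2))
```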

# Comparing AIC and Adjusted R-squared for each model
AIC(forward_method, backward_method, stepwise_method)
## Warning in AIC.default(forward_method, backward_method, stepwise_method):
## models are not all fitted to the same number of observations
##                 df       AIC
## forward_method   2 -1358.080
## backward_method 12 -3004.805
## stepwise_method 12 -3004.805
summary(forward_method)$adj.r.squared
## [1] 0
summary(backward_method)$adj.r.squared
## [1] 0.9985965
summary(stepwise_method)$adj.r.squared
## [1] 0.9985965

Log Transformation on Nassim

This section repeats the analysis with the natural logarithm of Nassim, log(Nassim), as the response. The same model selection methods are applied to see how the transformation affects the results.
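Since R's log() is the natural logarithm and LogNassim holds base-10 logs, log(Nassim) = log(10) * LogNassim holds exactly, so any model that keeps LogNassim as a predictor can reproduce the transformed response perfectly. A quick numerical check on the six rows printed in the data preparation step (retyped by hand): the fitted slope should be close to log(10), about 2.3026, with residuals limited only by the printed precision.

```r
# Nassim and LogNassim for the six rows printed by head() earlier
nassim     <- c(0.001858999, 0.002270091, 0.002302210,
                0.003041352, 0.002791898, 0.003627464)
log_nassim <- c(-2.730721, -2.643957, -2.637855, -2.516933, -2.554100, -2.440397)

# Natural-log response regressed on the base-10 log column
leak_fit <- lm(log(nassim) ~ log_nassim)
print(coef(leak_fit))              # slope close to log(10), intercept close to 0
print(max(abs(resid(leak_fit))))   # essentially zero
```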

1. Forward Selection with Log-transformed Nassim

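Here, an intercept-only model is fitted with log(Nassim) as the response and passed to forward selection; as before, no scope is supplied, so no variables are added. The "NaNs produced" warning means that at least one Nassim value is negative (the smallest residual of the untransformed intercept-only fit, -0.016029, is larger in magnitude than its intercept, 0.013768). Such rows become NaN under log() and are dropped, which is why 14 observations are deleted here rather than 13. The summary of the resulting model is presented.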
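Rather than letting lm() drop rows silently after log() returns NaN, the non-positive responses can be removed, and counted, before the transformation. A minimal sketch; drop_nonpositive() is a hypothetical helper and the toy values are illustrative only, not Caterpillars data. For the real data the call would be drop_nonpositive(caterpillars_data, "Nassim"), and forward selection would still need a scope to add any variables.

```r
# Hypothetical helper: drop rows whose response cannot be logged, and report how many
drop_nonpositive <- function(df, response) {
  ok <- !is.na(df[[response]]) & df[[response]] > 0
  message(sum(!ok), " row(s) with missing or non-positive ", response, " removed")
  df[ok, , drop = FALSE]
}

# Illustrative toy data with one negative and one missing response
toy <- data.frame(y = c(0.002, -0.001, NA, 0.004), x = 1:4)
toy_pos <- drop_nonpositive(toy, "y")
print(log(toy_pos$y))   # no NaN warning
```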

lm_model_log <- lm(log(Nassim) ~ 1, data = caterpillars_data)
## Warning in log(Nassim): NaNs produced
forward_log_method <- step(lm_model_log, direction = "forward")
## Start:  AIC=77.19
## log(Nassim) ~ 1
summary(forward_log_method)
## 
## Call:
## lm(formula = log(Nassim) ~ 1, data = caterpillars_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.2130 -0.9864 -0.3005  0.8447  2.2130 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -4.95933    0.07309  -67.86   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.163 on 252 degrees of freedom
##   (14 observations deleted due to missingness)

2. Backward Selection with Log-transformed Nassim

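A full linear model is fitted with log(Nassim) as the response and all remaining columns as predictors, followed by backward selection. Because R's log() is the natural logarithm and LogNassim is log10(Nassim), log(Nassim) equals log(10) * LogNassim exactly. The full model therefore fits essentially perfectly (every RSS in the trace is 0.0000 except on the row that drops LogNassim), and step() warns that model selection on such a fit is nonsense. The AIC values in this trace carry no useful information, and LogNassim should be excluded from the predictors before this comparison is attempted. The summary of the final model is displayed.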
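One way to keep a leaked column out of the full model is to subtract it in the formula, for example log(Nassim) ~ . - LogNassim: the dot expands to every column not used in the response, and the minus term then removes one of them. A sketch of the pattern on the built-in airquality data as a stand-in, where Day plays the role of the excluded column purely for illustration; the excluded term also stays out of every model step() visits.

```r
aq <- na.omit(airquality)

# '.' expands to every column except Ozone; '- Day' then removes that term
full_minus   <- lm(log(Ozone) ~ . - Day, data = aq)
backward_fit <- step(full_minus, direction = "backward", trace = 0)

print(attr(terms(full_minus), "term.labels"))   # Day is absent
print(formula(backward_fit))
```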

lm_model_full_log <- lm(log(Nassim) ~ ., data = caterpillars_data)
## Warning in log(Nassim): NaNs produced
backward_log_method <- step(lm_model_full_log, direction = "backward")
## Start:  AIC=-8211.49
## log(Nassim) ~ Instar + ActiveFeeding + Fgp + Mgp + Mass + LogMass + 
##     Intake + LogIntake + WetFrass + LogWetFrass + DryFrass + 
##     LogDryFrass + Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
## Warning: attempting model selection on an essentially perfect fit is nonsense
##                 Df Sum of Sq    RSS     AIC
## - Mass           1    0.0000 0.0000 -8213.5
## - DryFrass       1    0.0000 0.0000 -8213.5
## - WetFrass       1    0.0000 0.0000 -8213.4
## - LogDryFrass    1    0.0000 0.0000 -8213.4
## - Fgp            1    0.0000 0.0000 -8213.3
## - Intake         1    0.0000 0.0000 -8213.2
## - ActiveFeeding  1    0.0000 0.0000 -8213.1
## - Mgp            1    0.0000 0.0000 -8213.1
## - LogMass        1    0.0000 0.0000 -8212.9
## - Cassim         1    0.0000 0.0000 -8212.9
## - LogWetFrass    1    0.0000 0.0000 -8212.7
## - Nfrass         1    0.0000 0.0000 -8212.7
## - LogNfrass      1    0.0000 0.0000 -8212.3
## - Instar         1    0.0000 0.0000 -8212.2
## <none>                       0.0000 -8211.5
## - LogIntake      1    0.0000 0.0000 -8205.0
## - LogCassim      1    0.0000 0.0000 -8200.9
## - LogNassim      1    0.8271 0.8271 -1414.0
## Warning in log(Nassim): NaNs produced
## 
## Step:  AIC=-8213.49
## log(Nassim) ~ Instar + ActiveFeeding + Fgp + Mgp + LogMass + 
##     Intake + LogIntake + WetFrass + LogWetFrass + DryFrass + 
##     LogDryFrass + Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
## Warning: attempting model selection on an essentially perfect fit is nonsense
##                 Df Sum of Sq     RSS     AIC
## - DryFrass       1   0.00000 0.00000 -8215.5
## - WetFrass       1   0.00000 0.00000 -8215.4
## - LogDryFrass    1   0.00000 0.00000 -8215.4
## - Fgp            1   0.00000 0.00000 -8215.3
## - Intake         1   0.00000 0.00000 -8215.1
## - ActiveFeeding  1   0.00000 0.00000 -8215.1
## - Mgp            1   0.00000 0.00000 -8215.1
## - LogMass        1   0.00000 0.00000 -8214.9
## - Cassim         1   0.00000 0.00000 -8214.8
## - LogWetFrass    1   0.00000 0.00000 -8214.7
## - Nfrass         1   0.00000 0.00000 -8214.7
## - LogNfrass      1   0.00000 0.00000 -8214.3
## - Instar         1   0.00000 0.00000 -8214.2
## <none>                       0.00000 -8213.5
## - LogIntake      1   0.00000 0.00000 -8206.3
## - LogCassim      1   0.00000 0.00000 -8202.6
## - LogNassim      1   0.93154 0.93154 -1385.9
## Warning in log(Nassim): NaNs produced
## 
## Step:  AIC=-8215.48
## log(Nassim) ~ Instar + ActiveFeeding + Fgp + Mgp + LogMass + 
##     Intake + LogIntake + WetFrass + LogWetFrass + LogDryFrass + 
##     Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
## Warning: attempting model selection on an essentially perfect fit is nonsense
##                 Df Sum of Sq     RSS     AIC
## - WetFrass       1   0.00000 0.00000 -8217.4
## - LogDryFrass    1   0.00000 0.00000 -8217.4
## - Fgp            1   0.00000 0.00000 -8217.3
## - ActiveFeeding  1   0.00000 0.00000 -8217.0
## - Mgp            1   0.00000 0.00000 -8217.0
## - LogMass        1   0.00000 0.00000 -8216.9
## - LogWetFrass    1   0.00000 0.00000 -8216.7
## - Nfrass         1   0.00000 0.00000 -8216.6
## - LogNfrass      1   0.00000 0.00000 -8216.2
## - Instar         1   0.00000 0.00000 -8216.2
## - Intake         1   0.00000 0.00000 -8216.2
## - Cassim         1   0.00000 0.00000 -8215.8
## <none>                       0.00000 -8215.5
## - LogIntake      1   0.00000 0.00000 -8207.6
## - LogCassim      1   0.00000 0.00000 -8204.2
## - LogNassim      1   0.93208 0.93208 -1387.7
## Warning in log(Nassim): NaNs produced
## 
## Step:  AIC=-8217.4
## log(Nassim) ~ Instar + ActiveFeeding + Fgp + Mgp + LogMass + 
##     Intake + LogIntake + LogWetFrass + LogDryFrass + Cassim + 
##     LogCassim + Nfrass + LogNfrass + LogNassim
## Warning: attempting model selection on an essentially perfect fit is nonsense
##                 Df Sum of Sq     RSS     AIC
## - LogDryFrass    1   0.00000 0.00000 -8219.3
## - Fgp            1   0.00000 0.00000 -8219.2
## - ActiveFeeding  1   0.00000 0.00000 -8219.0
## - Mgp            1   0.00000 0.00000 -8218.9
## - LogMass        1   0.00000 0.00000 -8218.8
## - LogWetFrass    1   0.00000 0.00000 -8218.4
## - LogNfrass      1   0.00000 0.00000 -8218.2
## - Intake         1   0.00000 0.00000 -8218.1
## - Nfrass         1   0.00000 0.00000 -8218.1
## - Instar         1   0.00000 0.00000 -8218.0
## - Cassim         1   0.00000 0.00000 -8217.7
## <none>                       0.00000 -8217.4
## - LogIntake      1   0.00000 0.00000 -8209.5
## - LogCassim      1   0.00000 0.00000 -8205.8
## - LogNassim      1   0.94437 0.94437 -1386.4
## Warning in log(Nassim): NaNs produced
## 
## Step:  AIC=-8219.27
## log(Nassim) ~ Instar + ActiveFeeding + Fgp + Mgp + LogMass + 
##     Intake + LogIntake + LogWetFrass + Cassim + LogCassim + Nfrass + 
##     LogNfrass + LogNassim
## Warning: attempting model selection on an essentially perfect fit is nonsense
##                 Df Sum of Sq     RSS     AIC
## - Fgp            1   0.00000 0.00000 -8221.1
## - ActiveFeeding  1   0.00000 0.00000 -8220.8
## - Mgp            1   0.00000 0.00000 -8220.8
## - LogMass        1   0.00000 0.00000 -8220.7
## - LogWetFrass    1   0.00000 0.00000 -8220.4
## - Nfrass         1   0.00000 0.00000 -8220.1
## - Intake         1   0.00000 0.00000 -8220.0
## - Instar         1   0.00000 0.00000 -8219.8
## - Cassim         1   0.00000 0.00000 -8219.7
## <none>                       0.00000 -8219.3
## - LogNfrass      1   0.00000 0.00000 -8218.6
## - LogIntake      1   0.00000 0.00000 -8211.5
## - LogCassim      1   0.00000 0.00000 -8207.4
## - LogNassim      1   0.97395 0.97395 -1380.6
## Warning in log(Nassim): NaNs produced
## 
## Step:  AIC=-8221.07
## log(Nassim) ~ Instar + ActiveFeeding + Mgp + LogMass + Intake + 
##     LogIntake + LogWetFrass + Cassim + LogCassim + Nfrass + LogNfrass + 
##     LogNassim
## Warning: attempting model selection on an essentially perfect fit is nonsense
##                 Df Sum of Sq     RSS     AIC
## - LogMass        1   0.00000 0.00000 -8222.7
## - ActiveFeeding  1   0.00000 0.00000 -8222.6
## - LogWetFrass    1   0.00000 0.00000 -8222.3
## - Nfrass         1   0.00000 0.00000 -8222.0
## - Intake         1   0.00000 0.00000 -8221.9
## - Cassim         1   0.00000 0.00000 -8221.6
## - Mgp            1   0.00000 0.00000 -8221.6
## <none>                       0.00000 -8221.1
## - Instar         1   0.00000 0.00000 -8220.9
## - LogNfrass      1   0.00000 0.00000 -8220.1
## - LogIntake      1   0.00000 0.00000 -8212.3
## - LogCassim      1   0.00000 0.00000 -8208.7
## - LogNassim      1   0.97399 0.97399 -1382.6
## Warning in log(Nassim): NaNs produced
## 
## Step:  AIC=-8222.72
## log(Nassim) ~ Instar + ActiveFeeding + Mgp + Intake + LogIntake + 
##     LogWetFrass + Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
## Warning: attempting model selection on an essentially perfect fit is nonsense
##                 Df Sum of Sq   RSS     AIC
## - ActiveFeeding  1     0.000 0.000 -8224.5
## - LogWetFrass    1     0.000 0.000 -8224.1
## - Nfrass         1     0.000 0.000 -8223.3
## - Mgp            1     0.000 0.000 -8223.1
## - Intake         1     0.000 0.000 -8223.0
## <none>                       0.000 -8222.7
## - Cassim         1     0.000 0.000 -8222.7
## - LogNfrass      1     0.000 0.000 -8221.5
## - Instar         1     0.000 0.000 -8220.0
## - LogIntake      1     0.000 0.000 -8211.7
## - LogCassim      1     0.000 0.000 -8209.1
## - LogNassim      1     0.976 0.976 -1384.1
## Warning in log(Nassim): NaNs produced
## 
## Step:  AIC=-8224.55
## log(Nassim) ~ Instar + Mgp + Intake + LogIntake + LogWetFrass + 
##     Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
## Warning: attempting model selection on an essentially perfect fit is nonsense
##               Df Sum of Sq     RSS     AIC
## - LogWetFrass  1   0.00000 0.00000 -8226.0
## - Nfrass       1   0.00000 0.00000 -8225.3
## - Intake       1   0.00000 0.00000 -8224.9
## - Mgp          1   0.00000 0.00000 -8224.9
## - Cassim       1   0.00000 0.00000 -8224.6
## <none>                     0.00000 -8224.5
## - LogNfrass    1   0.00000 0.00000 -8223.3
## - Instar       1   0.00000 0.00000 -8221.9
## - LogIntake    1   0.00000 0.00000 -8213.2
## - LogCassim    1   0.00000 0.00000 -8210.6
## - LogNassim    1   0.98129 0.98129 -1384.7
## Warning in log(Nassim): NaNs produced
## 
## Step:  AIC=-8225.97
## log(Nassim) ~ Instar + Mgp + Intake + LogIntake + Cassim + LogCassim + 
##     Nfrass + LogNfrass + LogNassim
## Warning: attempting model selection on an essentially perfect fit is nonsense
##             Df Sum of Sq     RSS     AIC
## - Nfrass     1   0.00000 0.00000 -8226.9
## - Intake     1   0.00000 0.00000 -8226.5
## - Mgp        1   0.00000 0.00000 -8226.4
## - Cassim     1   0.00000 0.00000 -8226.1
## <none>                   0.00000 -8226.0
## - Instar     1   0.00000 0.00000 -8223.9
## - LogIntake  1   0.00000 0.00000 -8215.1
## - LogCassim  1   0.00000 0.00000 -8212.6
## - LogNfrass  1   0.00000 0.00000 -8212.3
## - LogNassim  1   0.99426 0.99426 -1383.4
## Warning in log(Nassim): NaNs produced
## 
## Step:  AIC=-8226.95
## log(Nassim) ~ Instar + Mgp + Intake + LogIntake + Cassim + LogCassim + 
##     LogNfrass + LogNassim
## Warning: attempting model selection on an essentially perfect fit is nonsense
##             Df Sum of Sq   RSS     AIC
## - Intake     1     0.000 0.000 -8228.4
## - Mgp        1     0.000 0.000 -8227.8
## - Cassim     1     0.000 0.000 -8227.8
## <none>                   0.000 -8226.9
## - Instar     1     0.000 0.000 -8225.3
## - LogIntake  1     0.000 0.000 -8215.8
## - LogNfrass  1     0.000 0.000 -8214.2
## - LogCassim  1     0.000 0.000 -8213.7
## - LogNassim  1     1.643 1.643 -1258.3
## Warning in log(Nassim): NaNs produced
## 
## Step:  AIC=-8228.38
## log(Nassim) ~ Instar + Mgp + LogIntake + Cassim + LogCassim + 
##     LogNfrass + LogNassim
## Warning: attempting model selection on an essentially perfect fit is nonsense
##             Df Sum of Sq    RSS     AIC
## - Mgp        1    0.0000 0.0000 -8229.0
## <none>                   0.0000 -8228.4
## - Cassim     1    0.0000 0.0000 -8227.0
## - Instar     1    0.0000 0.0000 -8225.8
## - LogNfrass  1    0.0000 0.0000 -8215.5
## - LogCassim  1    0.0000 0.0000 -8215.2
## - LogIntake  1    0.0000 0.0000 -8215.0
## - LogNassim  1    2.3981 2.3981 -1164.7
## Warning in log(Nassim): NaNs produced
## 
## Step:  AIC=-8229
## log(Nassim) ~ Instar + LogIntake + Cassim + LogCassim + LogNfrass + 
##     LogNassim
## Warning: attempting model selection on an essentially perfect fit is nonsense
##             Df Sum of Sq    RSS     AIC
## <none>                   0.0000 -8229.0
## - Cassim     1    0.0000 0.0000 -8227.6
## - Instar     1    0.0000 0.0000 -8227.5
## - LogNfrass  1    0.0000 0.0000 -8217.5
## - LogIntake  1    0.0000 0.0000 -8216.8
## - LogCassim  1    0.0000 0.0000 -8216.1
## - LogNassim  1    2.4693 2.4693 -1159.3
summary(backward_log_method)
## 
## Call:
## lm(formula = log(Nassim) ~ Instar + LogIntake + Cassim + LogCassim + 
##     LogNfrass + LogNassim, data = caterpillars_data)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -4.363e-07 -2.245e-08 -2.660e-09  2.636e-08  3.020e-07 
## 
## Coefficients:
##               Estimate Std. Error    t value Pr(>|t|)    
## (Intercept)  7.517e-07  2.155e-07  3.488e+00 0.000576 ***
## Instar      -2.714e-08  1.465e-08 -1.852e+00 0.065170 .  
## LogIntake   -5.023e-07  1.332e-07 -3.771e+00 0.000203 ***
## Cassim      -1.852e-07  1.017e-07 -1.820e+00 0.069901 .  
## LogCassim    6.949e-07  1.796e-07  3.870e+00 0.000140 ***
## LogNfrass    8.287e-08  2.259e-08  3.669e+00 0.000298 ***
## LogNassim    2.303e+00  1.251e-07  1.841e+07  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.535e-08 on 246 degrees of freedom
##   (14 observations deleted due to missingness)
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic: 7.792e+15 on 6 and 246 DF,  p-value: < 2.2e-16
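The coefficient on LogNassim (2.303) and the R-squared of 1 have a simple explanation. In the head() output, LogNassim is the base-10 logarithm of Nassim (log10(0.001858999) ≈ -2.7307). Because ln(x) = ln(10) · log10(x), with ln(10) ≈ 2.3026, the response log(Nassim) is an exact linear function of that predictor. The sketch below reproduces the effect on simulated data. The values and the names nassim, log_nassim10 and noise are hypothetical stand-ins, not columns from the Caterpillars file.

```r
# Sketch: why log(Nassim) ~ ... + LogNassim fits perfectly.
# Hypothetical simulated data standing in for the Caterpillars columns.
set.seed(1)
nassim <- runif(50, min = 0.001, max = 0.01)  # positive values, like Nassim
log_nassim10 <- log10(nassim)                 # plays the role of LogNassim
noise <- rnorm(50)                            # an unrelated predictor

# Natural-log response regressed on its own base-10 log plus noise
fit <- lm(log(nassim) ~ noise + log_nassim10)

coef(fit)["log_nassim10"]   # equals log(10), about 2.302585
summary(fit)$r.squared      # 1, up to floating-point error
```

On the toy fit, summary() will itself warn that the fit is essentially perfect. This is the same symptom behind the "attempting model selection on an essentially perfect fit is nonsense" messages in the step() traces. Removing LogNassim (and Nassim) from the candidate predictors before running step() avoids the problem.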

3. Stepwise Selection with Log-transformed Nassim

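We apply stepwise selection (direction = "both") to the full model with the log-transformed response, lm_model_full_log. At each step, step() considers dropping a term currently in the model and adding back a term that was dropped earlier. It keeps whichever change lowers AIC the most. The summary of the selected model follows the trace.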
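Here step() receives only the full model and direction = "both". The full model therefore also serves as the upper scope, so the "+" rows in the trace below can only re-add terms that were dropped in an earlier step. The following sketch uses simulated data; the data frame sim and the predictors x1 to x4 are hypothetical. It contrasts this call with a stepwise search started from the intercept-only model, where step() needs an explicit scope before it can add anything. The argument trace = 0 only suppresses the step-by-step printout.

```r
# Hypothetical simulated data: y depends on x1 and x2 only
set.seed(42)
n <- 100
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
sim$y <- 1 + 2 * sim$x1 - 1.5 * sim$x2 + rnorm(n)

full <- lm(y ~ x1 + x2 + x3 + x4, data = sim)
null <- lm(y ~ 1, data = sim)

# Stepwise from the full model: the full model is the implicit upper scope
both_from_full <- step(full, direction = "both", trace = 0)

# Stepwise from the null model: an explicit scope is required to add terms
both_from_null <- step(null,
                       scope = list(lower = ~ 1, upper = formula(full)),
                       direction = "both", trace = 0)

formula(both_from_full)
formula(both_from_null)
```

Both calls should keep x1 and x2, which actually generate y. Whether the irrelevant x3 and x4 survive depends on the random draw.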

stepwise_log_method <- step(lm_model_full_log, direction = "both")
## Start:  AIC=-8211.49
## log(Nassim) ~ Instar + ActiveFeeding + Fgp + Mgp + Mass + LogMass + 
##     Intake + LogIntake + WetFrass + LogWetFrass + DryFrass + 
##     LogDryFrass + Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
## Warning: attempting model selection on an essentially perfect fit is nonsense
##                 Df Sum of Sq    RSS     AIC
## - Mass           1    0.0000 0.0000 -8213.5
## - DryFrass       1    0.0000 0.0000 -8213.5
## - WetFrass       1    0.0000 0.0000 -8213.4
## - LogDryFrass    1    0.0000 0.0000 -8213.4
## - Fgp            1    0.0000 0.0000 -8213.3
## - Intake         1    0.0000 0.0000 -8213.2
## - ActiveFeeding  1    0.0000 0.0000 -8213.1
## - Mgp            1    0.0000 0.0000 -8213.1
## - LogMass        1    0.0000 0.0000 -8212.9
## - Cassim         1    0.0000 0.0000 -8212.9
## - LogWetFrass    1    0.0000 0.0000 -8212.7
## - Nfrass         1    0.0000 0.0000 -8212.7
## - LogNfrass      1    0.0000 0.0000 -8212.3
## - Instar         1    0.0000 0.0000 -8212.2
## <none>                       0.0000 -8211.5
## - LogIntake      1    0.0000 0.0000 -8205.0
## - LogCassim      1    0.0000 0.0000 -8200.9
## - LogNassim      1    0.8271 0.8271 -1414.0
## Warning in log(Nassim): NaNs produced
## 
## Step:  AIC=-8213.49
## log(Nassim) ~ Instar + ActiveFeeding + Fgp + Mgp + LogMass + 
##     Intake + LogIntake + WetFrass + LogWetFrass + DryFrass + 
##     LogDryFrass + Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Warning in log(Nassim): NaNs produced
##                 Df Sum of Sq     RSS     AIC
## - DryFrass       1   0.00000 0.00000 -8215.5
## - WetFrass       1   0.00000 0.00000 -8215.4
## - LogDryFrass    1   0.00000 0.00000 -8215.4
## - Fgp            1   0.00000 0.00000 -8215.3
## - Intake         1   0.00000 0.00000 -8215.1
## - ActiveFeeding  1   0.00000 0.00000 -8215.1
## - Mgp            1   0.00000 0.00000 -8215.1
## - LogMass        1   0.00000 0.00000 -8214.9
## - Cassim         1   0.00000 0.00000 -8214.8
## - LogWetFrass    1   0.00000 0.00000 -8214.7
## - Nfrass         1   0.00000 0.00000 -8214.7
## - LogNfrass      1   0.00000 0.00000 -8214.3
## - Instar         1   0.00000 0.00000 -8214.2
## <none>                       0.00000 -8213.5
## + Mass           1   0.00000 0.00000 -8211.5
## - LogIntake      1   0.00000 0.00000 -8206.3
## - LogCassim      1   0.00000 0.00000 -8202.6
## - LogNassim      1   0.93154 0.93154 -1385.9
## Warning in log(Nassim): NaNs produced
## 
## Step:  AIC=-8215.48
## log(Nassim) ~ Instar + ActiveFeeding + Fgp + Mgp + LogMass + 
##     Intake + LogIntake + WetFrass + LogWetFrass + LogDryFrass + 
##     Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Warning in log(Nassim): NaNs produced
##                 Df Sum of Sq     RSS     AIC
## - WetFrass       1   0.00000 0.00000 -8217.4
## - LogDryFrass    1   0.00000 0.00000 -8217.4
## - Fgp            1   0.00000 0.00000 -8217.3
## - ActiveFeeding  1   0.00000 0.00000 -8217.0
## - Mgp            1   0.00000 0.00000 -8217.0
## - LogMass        1   0.00000 0.00000 -8216.9
## - LogWetFrass    1   0.00000 0.00000 -8216.7
## - Nfrass         1   0.00000 0.00000 -8216.6
## - LogNfrass      1   0.00000 0.00000 -8216.2
## - Instar         1   0.00000 0.00000 -8216.2
## - Intake         1   0.00000 0.00000 -8216.2
## - Cassim         1   0.00000 0.00000 -8215.8
## <none>                       0.00000 -8215.5
## + DryFrass       1   0.00000 0.00000 -8213.5
## + Mass           1   0.00000 0.00000 -8213.5
## - LogIntake      1   0.00000 0.00000 -8207.6
## - LogCassim      1   0.00000 0.00000 -8204.2
## - LogNassim      1   0.93208 0.93208 -1387.7
## Warning in log(Nassim): NaNs produced
## 
## Step:  AIC=-8217.4
## log(Nassim) ~ Instar + ActiveFeeding + Fgp + Mgp + LogMass + 
##     Intake + LogIntake + LogWetFrass + LogDryFrass + Cassim + 
##     LogCassim + Nfrass + LogNfrass + LogNassim
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Warning in log(Nassim): NaNs produced
##                 Df Sum of Sq     RSS     AIC
## - LogDryFrass    1   0.00000 0.00000 -8219.3
## - Fgp            1   0.00000 0.00000 -8219.2
## - ActiveFeeding  1   0.00000 0.00000 -8219.0
## - Mgp            1   0.00000 0.00000 -8218.9
## - LogMass        1   0.00000 0.00000 -8218.8
## - LogWetFrass    1   0.00000 0.00000 -8218.4
## - LogNfrass      1   0.00000 0.00000 -8218.2
## - Intake         1   0.00000 0.00000 -8218.1
## - Nfrass         1   0.00000 0.00000 -8218.1
## - Instar         1   0.00000 0.00000 -8218.0
## - Cassim         1   0.00000 0.00000 -8217.7
## <none>                       0.00000 -8217.4
## + WetFrass       1   0.00000 0.00000 -8215.5
## + DryFrass       1   0.00000 0.00000 -8215.4
## + Mass           1   0.00000 0.00000 -8215.4
## - LogIntake      1   0.00000 0.00000 -8209.5
## - LogCassim      1   0.00000 0.00000 -8205.8
## - LogNassim      1   0.94437 0.94437 -1386.4
## Warning in log(Nassim): NaNs produced
## 
## Step:  AIC=-8219.27
## log(Nassim) ~ Instar + ActiveFeeding + Fgp + Mgp + LogMass + 
##     Intake + LogIntake + LogWetFrass + Cassim + LogCassim + Nfrass + 
##     LogNfrass + LogNassim
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Warning in log(Nassim): NaNs produced
##                 Df Sum of Sq     RSS     AIC
## - Fgp            1   0.00000 0.00000 -8221.1
## - ActiveFeeding  1   0.00000 0.00000 -8220.8
## - Mgp            1   0.00000 0.00000 -8220.8
## - LogMass        1   0.00000 0.00000 -8220.7
## - LogWetFrass    1   0.00000 0.00000 -8220.4
## - Nfrass         1   0.00000 0.00000 -8220.1
## - Intake         1   0.00000 0.00000 -8220.0
## - Instar         1   0.00000 0.00000 -8219.8
## - Cassim         1   0.00000 0.00000 -8219.7
## <none>                       0.00000 -8219.3
## - LogNfrass      1   0.00000 0.00000 -8218.6
## + LogDryFrass    1   0.00000 0.00000 -8217.4
## + WetFrass       1   0.00000 0.00000 -8217.4
## + Mass           1   0.00000 0.00000 -8217.3
## + DryFrass       1   0.00000 0.00000 -8217.3
## - LogIntake      1   0.00000 0.00000 -8211.5
## - LogCassim      1   0.00000 0.00000 -8207.4
## - LogNassim      1   0.97395 0.97395 -1380.6
## Warning in log(Nassim): NaNs produced
## 
## Step:  AIC=-8221.07
## log(Nassim) ~ Instar + ActiveFeeding + Mgp + LogMass + Intake + 
##     LogIntake + LogWetFrass + Cassim + LogCassim + Nfrass + LogNfrass + 
##     LogNassim
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Warning in log(Nassim): NaNs produced
##                 Df Sum of Sq     RSS     AIC
## - LogMass        1   0.00000 0.00000 -8222.7
## - ActiveFeeding  1   0.00000 0.00000 -8222.6
## - LogWetFrass    1   0.00000 0.00000 -8222.3
## - Nfrass         1   0.00000 0.00000 -8222.0
## - Intake         1   0.00000 0.00000 -8221.9
## - Cassim         1   0.00000 0.00000 -8221.6
## - Mgp            1   0.00000 0.00000 -8221.6
## <none>                       0.00000 -8221.1
## - Instar         1   0.00000 0.00000 -8220.9
## - LogNfrass      1   0.00000 0.00000 -8220.1
## + Fgp            1   0.00000 0.00000 -8219.3
## + LogDryFrass    1   0.00000 0.00000 -8219.2
## + WetFrass       1   0.00000 0.00000 -8219.2
## + Mass           1   0.00000 0.00000 -8219.1
## + DryFrass       1   0.00000 0.00000 -8219.1
## - LogIntake      1   0.00000 0.00000 -8212.3
## - LogCassim      1   0.00000 0.00000 -8208.7
## - LogNassim      1   0.97399 0.97399 -1382.6
## Warning in log(Nassim): NaNs produced
## 
## Step:  AIC=-8222.72
## log(Nassim) ~ Instar + ActiveFeeding + Mgp + Intake + LogIntake + 
##     LogWetFrass + Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Warning in log(Nassim): NaNs produced
##                 Df Sum of Sq   RSS     AIC
## - ActiveFeeding  1     0.000 0.000 -8224.5
## - LogWetFrass    1     0.000 0.000 -8224.1
## - Nfrass         1     0.000 0.000 -8223.3
## - Mgp            1     0.000 0.000 -8223.1
## - Intake         1     0.000 0.000 -8223.0
## <none>                       0.000 -8222.7
## - Cassim         1     0.000 0.000 -8222.7
## - LogNfrass      1     0.000 0.000 -8221.5
## + LogMass        1     0.000 0.000 -8221.1
## + WetFrass       1     0.000 0.000 -8220.8
## + LogDryFrass    1     0.000 0.000 -8220.8
## + DryFrass       1     0.000 0.000 -8220.8
## + Fgp            1     0.000 0.000 -8220.7
## + Mass           1     0.000 0.000 -8220.7
## - Instar         1     0.000 0.000 -8220.0
## - LogIntake      1     0.000 0.000 -8211.7
## - LogCassim      1     0.000 0.000 -8209.1
## - LogNassim      1     0.976 0.976 -1384.1
## Warning in log(Nassim): NaNs produced
## 
## Step:  AIC=-8224.55
## log(Nassim) ~ Instar + Mgp + Intake + LogIntake + LogWetFrass + 
##     Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Warning in log(Nassim): NaNs produced
##                 Df Sum of Sq     RSS     AIC
## - LogWetFrass    1   0.00000 0.00000 -8226.0
## - Nfrass         1   0.00000 0.00000 -8225.3
## - Intake         1   0.00000 0.00000 -8224.9
## - Mgp            1   0.00000 0.00000 -8224.9
## - Cassim         1   0.00000 0.00000 -8224.6
## <none>                       0.00000 -8224.5
## - LogNfrass      1   0.00000 0.00000 -8223.3
## + ActiveFeeding  1   0.00000 0.00000 -8222.7
## + LogMass        1   0.00000 0.00000 -8222.6
## + LogDryFrass    1   0.00000 0.00000 -8222.6
## + DryFrass       1   0.00000 0.00000 -8222.6
## + WetFrass       1   0.00000 0.00000 -8222.6
## + Mass           1   0.00000 0.00000 -8222.6
## + Fgp            1   0.00000 0.00000 -8222.6
## - Instar         1   0.00000 0.00000 -8221.9
## - LogIntake      1   0.00000 0.00000 -8213.2
## - LogCassim      1   0.00000 0.00000 -8210.6
## - LogNassim      1   0.98129 0.98129 -1384.7
## Warning in log(Nassim): NaNs produced
## 
## Step:  AIC=-8225.97
## log(Nassim) ~ Instar + Mgp + Intake + LogIntake + Cassim + LogCassim + 
##     Nfrass + LogNfrass + LogNassim
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Warning in log(Nassim): NaNs produced
##                 Df Sum of Sq     RSS     AIC
## - Nfrass         1   0.00000 0.00000 -8226.9
## - Intake         1   0.00000 0.00000 -8226.5
## - Mgp            1   0.00000 0.00000 -8226.4
## - Cassim         1   0.00000 0.00000 -8226.1
## <none>                       0.00000 -8226.0
## + LogWetFrass    1   0.00000 0.00000 -8224.5
## + WetFrass       1   0.00000 0.00000 -8224.2
## + ActiveFeeding  1   0.00000 0.00000 -8224.1
## + DryFrass       1   0.00000 0.00000 -8224.1
## + Mass           1   0.00000 0.00000 -8224.1
## + LogMass        1   0.00000 0.00000 -8224.0
## + Fgp            1   0.00000 0.00000 -8224.0
## + LogDryFrass    1   0.00000 0.00000 -8224.0
## - Instar         1   0.00000 0.00000 -8223.9
## - LogIntake      1   0.00000 0.00000 -8215.1
## - LogCassim      1   0.00000 0.00000 -8212.6
## - LogNfrass      1   0.00000 0.00000 -8212.3
## - LogNassim      1   0.99426 0.99426 -1383.4
## Warning in log(Nassim): NaNs produced
## 
## Step:  AIC=-8226.95
## log(Nassim) ~ Instar + Mgp + Intake + LogIntake + Cassim + LogCassim + 
##     LogNfrass + LogNassim
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Warning in log(Nassim): NaNs produced
##                 Df Sum of Sq   RSS     AIC
## - Intake         1     0.000 0.000 -8228.4
## - Mgp            1     0.000 0.000 -8227.8
## - Cassim         1     0.000 0.000 -8227.8
## <none>                       0.000 -8226.9
## + Nfrass         1     0.000 0.000 -8226.0
## + LogWetFrass    1     0.000 0.000 -8225.3
## - Instar         1     0.000 0.000 -8225.3
## + LogMass        1     0.000 0.000 -8225.3
## + WetFrass       1     0.000 0.000 -8225.3
## + Fgp            1     0.000 0.000 -8225.2
## + DryFrass       1     0.000 0.000 -8225.2
## + Mass           1     0.000 0.000 -8225.1
## + LogDryFrass    1     0.000 0.000 -8225.0
## + ActiveFeeding  1     0.000 0.000 -8225.0
## - LogIntake      1     0.000 0.000 -8215.8
## - LogNfrass      1     0.000 0.000 -8214.2
## - LogCassim      1     0.000 0.000 -8213.7
## - LogNassim      1     1.643 1.643 -1258.3
## Warning in log(Nassim): NaNs produced
## 
## Step:  AIC=-8228.38
## log(Nassim) ~ Instar + Mgp + LogIntake + Cassim + LogCassim + 
##     LogNfrass + LogNassim
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Warning in log(Nassim): NaNs produced
##                 Df Sum of Sq    RSS     AIC
## - Mgp            1    0.0000 0.0000 -8229.0
## <none>                       0.0000 -8228.4
## - Cassim         1    0.0000 0.0000 -8227.0
## + Intake         1    0.0000 0.0000 -8226.9
## + LogMass        1    0.0000 0.0000 -8226.8
## + LogWetFrass    1    0.0000 0.0000 -8226.7
## + Fgp            1    0.0000 0.0000 -8226.7
## + DryFrass       1    0.0000 0.0000 -8226.6
## + Mass           1    0.0000 0.0000 -8226.5
## + Nfrass         1    0.0000 0.0000 -8226.5
## + LogDryFrass    1    0.0000 0.0000 -8226.5
## + ActiveFeeding  1    0.0000 0.0000 -8226.4
## + WetFrass       1    0.0000 0.0000 -8226.4
## - Instar         1    0.0000 0.0000 -8225.8
## - LogNfrass      1    0.0000 0.0000 -8215.5
## - LogCassim      1    0.0000 0.0000 -8215.2
## - LogIntake      1    0.0000 0.0000 -8215.0
## - LogNassim      1    2.3981 2.3981 -1164.7
## Warning in log(Nassim): NaNs produced
## 
## Step:  AIC=-8229
## log(Nassim) ~ Instar + LogIntake + Cassim + LogCassim + LogNfrass + 
##     LogNassim
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Warning in log(Nassim): NaNs produced
##                 Df Sum of Sq    RSS     AIC
## <none>                       0.0000 -8229.0
## + Mgp            1    0.0000 0.0000 -8228.4
## + Intake         1    0.0000 0.0000 -8227.8
## - Cassim         1    0.0000 0.0000 -8227.6
## - Instar         1    0.0000 0.0000 -8227.5
## + LogMass        1    0.0000 0.0000 -8227.5
## + DryFrass       1    0.0000 0.0000 -8227.5
## + LogWetFrass    1    0.0000 0.0000 -8227.3
## + ActiveFeeding  1    0.0000 0.0000 -8227.1
## + Mass           1    0.0000 0.0000 -8227.0
## + WetFrass       1    0.0000 0.0000 -8227.0
## + Nfrass         1    0.0000 0.0000 -8227.0
## + LogDryFrass    1    0.0000 0.0000 -8227.0
## + Fgp            1    0.0000 0.0000 -8227.0
## - LogNfrass      1    0.0000 0.0000 -8217.5
## - LogIntake      1    0.0000 0.0000 -8216.8
## - LogCassim      1    0.0000 0.0000 -8216.1
## - LogNassim      1    2.4693 2.4693 -1159.3
summary(stepwise_log_method)
## 
## Call:
## lm(formula = log(Nassim) ~ Instar + LogIntake + Cassim + LogCassim + 
##     LogNfrass + LogNassim, data = caterpillars_data)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -4.363e-07 -2.245e-08 -2.660e-09  2.636e-08  3.020e-07 
## 
## Coefficients:
##               Estimate Std. Error    t value Pr(>|t|)    
## (Intercept)  7.517e-07  2.155e-07  3.488e+00 0.000576 ***
## Instar      -2.714e-08  1.465e-08 -1.852e+00 0.065170 .  
## LogIntake   -5.023e-07  1.332e-07 -3.771e+00 0.000203 ***
## Cassim      -1.852e-07  1.017e-07 -1.820e+00 0.069901 .  
## LogCassim    6.949e-07  1.796e-07  3.870e+00 0.000140 ***
## LogNfrass    8.287e-08  2.259e-08  3.669e+00 0.000298 ***
## LogNassim    2.303e+00  1.251e-07  1.841e+07  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.535e-08 on 246 degrees of freedom
##   (14 observations deleted due to missingness)
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic: 7.792e+15 on 6 and 246 DF,  p-value: < 2.2e-16

Comparison of Models with Log-transformed Nassim

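The models obtained with the log-transformed response are compared below using AIC and adjusted R-squared, as in the earlier comparison. Note that the forward-selection model contains only the intercept (df = 2, adjusted R-squared = 0). The most likely reason is that forward selection started from the intercept-only model without an upper scope, which leaves step() with no terms to add. The backward and stepwise procedures selected the same model, so their AIC and adjusted R-squared values are identical.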

# Comparing AIC and Adjusted R-squared for each log-transformed model
AIC(forward_log_method, backward_log_method, stepwise_log_method)
##                     df        AIC
## forward_log_method   2   797.1759
## backward_log_method  8 -7509.0202
## stepwise_log_method  8 -7509.0202
summary(forward_log_method)$adj.r.squared
## [1] 0
summary(backward_log_method)$adj.r.squared
## [1] 1
summary(stepwise_log_method)$adj.r.squared
## [1] 1
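AIC values are only comparable between models fitted to the same observations. Also, adjusted R-squared on the log scale is not directly comparable with adjusted R-squared for the untransformed Nassim models. log(Nassim) is NaN for non-positive Nassim, and rows with missing values are dropped, so one option is to filter the data once and fit every candidate to that common subset. Below is a minimal sketch on simulated data; d, x1, x2 and y are hypothetical.

```r
# Hypothetical simulated data with a positive, log-linear response
set.seed(7)
n <- 60
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- exp(0.5 + 0.3 * d$x1 + rnorm(n, sd = 0.2))
d$y[1:3] <- -0.01   # non-positive responses: log() would give NaN
d$x2[4:6] <- NA     # missing predictor values

# Keep only rows usable by every candidate model
d_ok <- subset(d, y > 0 & complete.cases(d))

m_small <- lm(log(y) ~ x1, data = d_ok)
m_large <- lm(log(y) ~ x1 + x2, data = d_ok)

c(nobs(m_small), nobs(m_large))   # same n, so the AIC values are comparable
AIC(m_small, m_large)
sapply(list(m_small, m_large), function(m) summary(m)$adj.r.squared)
```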

Conclusion

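In this analysis, we used multiple linear regression to examine how the explanatory variables in the Caterpillars dataset relate to the response, Nassim. We applied forward, backward, and stepwise selection to choose among the candidate predictors, and the selected models differed in their adjusted R-squared and AIC values.

For the natural-log response, backward and stepwise selection converged on the same model (Instar, LogIntake, Cassim, LogCassim, LogNfrass, and LogNassim), while forward selection retained only the intercept. The adjusted R-squared of 1 for the backward and stepwise models does not show that the log transformation improved performance. LogNassim is the base-10 logarithm of Nassim, so log(Nassim) equals ln(10) × LogNassim (≈ 2.303 × LogNassim) exactly, which is why R warned that the fit was essentially perfect. In addition, log(Nassim) is undefined for non-positive values of Nassim, which is the source of the NaN warnings, and 14 observations were removed as missing. A fair comparison of the transformed models would exclude LogNassim from the candidate predictors and fit every model to the same observations.

Overall, stepwise selection provided a good balance between complexity and fit, although for the log-transformed response it produced exactly the same model as backward selection. These techniques are useful for exploring relationships in the data, provided the candidate predictors do not include transformations of the response itself.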