This project fits multiple linear regression models to the “Caterpillars” dataset with Nassim as the response variable. Candidate sets of explanatory variables are compared using forward, backward, and stepwise selection. The same selection procedures are then repeated with the natural log of Nassim as the response.
# Load MASS (provides stepAIC(); the step() calls below come from base R's stats package)
library(MASS)
# Load the dataset
caterpillars_data <- read.csv("https://www.stat2.org/datasets/Caterpillars.csv")
# View the first few rows of the data
head(caterpillars_data)
## Instar ActiveFeeding Fgp Mgp Mass LogMass Intake LogIntake WetFrass
## 1 1 Y Y Y 0.002064 -2.685290 0.165118 -0.7822056 0.000241
## 2 1 Y N N 0.005191 -2.284749 0.201008 -0.6967867 0.000063
## 3 2 N Y N 0.005603 -2.251579 0.189125 -0.7232511 0.001401
## 4 2 Y N N 0.019300 -1.714443 0.283280 -0.5477841 0.002045
## 5 2 N Y Y 0.029300 -1.533132 0.259569 -0.5857472 0.005377
## 6 3 Y Y N 0.062600 -1.203426 0.327864 -0.4843063 0.029500
## LogWetFrass DryFrass LogDryFrass Cassim LogCassim Nfrass LogNfrass
## 1 -3.617983 0.000208 -3.681937 0.01422378 -1.846985 6.61e-06 -5.179510
## 2 -4.200659 0.000061 -4.214670 0.01739189 -1.759653 1.03e-06 -5.986783
## 3 -2.853562 0.000969 -3.013676 0.01639923 -1.785177 2.78e-05 -4.555794
## 4 -2.689307 0.001834 -2.736601 0.02392468 -1.621154 4.64e-05 -4.333480
## 5 -2.269460 0.003523 -2.453087 0.02122857 -1.673079 9.97e-05 -4.001301
## 6 -1.530178 0.000789 -3.102923 0.02836365 -1.547238 1.84e-05 -4.735567
## Nassim LogNassim
## 1 0.001858999 -2.730721
## 2 0.002270091 -2.643957
## 3 0.002302210 -2.637855
## 4 0.003041352 -2.516933
## 5 0.002791898 -2.554100
## 6 0.003627464 -2.440397
This section outlines the three methods used for model selection: forward selection, backward selection, and stepwise selection. Each method is explained in the subsections that follow.
In this subsection, an intercept-only linear model is fitted first and passed to step() with direction = "forward". Because the call supplies no scope argument, step() has no candidate variables to add, so the search stops immediately and the summary below describes the intercept-only model rather than a model built up term by term.
# Intercept-only model with Nassim as the response (starting point for forward selection)
lm_model <- lm(Nassim ~ 1, data = caterpillars_data)
forward_method <- step(lm_model, direction = "forward")
## Start: AIC=-2080.9
## Nassim ~ 1
summary(forward_method)
##
## Call:
## lm(formula = Nassim ~ 1, data = caterpillars_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.016029 -0.011200 -0.008595 0.002465 0.050394
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.013768 0.001042 13.22 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0166 on 253 degrees of freedom
## (13 observations deleted due to missingness)
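The forward search above never leaves Nassim ~ 1 because step() was given no upper model to draw terms from. A minimal sketch of forward selection with an explicit scope follows. It uses the built-in mtcars data as a stand-in, which is an assumption made so the example runs on its own; the names null_fit, full_fit, and forward_fit are illustrative and do not appear in the analysis above.

```r
# Stand-in data: built-in mtcars (an assumption; not the Caterpillars data).
null_fit <- lm(mpg ~ 1, data = mtcars)   # intercept-only starting model
full_fit <- lm(mpg ~ ., data = mtcars)   # defines the largest model allowed

# The scope tells step() which terms it may add; trace = 0 silences the log.
forward_fit <- step(
  null_fit,
  scope = list(lower = formula(null_fit), upper = formula(full_fit)),
  direction = "forward",
  trace = 0
)

# Terms that entered the model
added_terms <- attr(terms(forward_fit), "term.labels")
print(added_terms)
```

On data with missing values, such as the Caterpillars file, step() may stop with an error when adding a term changes the number of rows in use, so restricting the data to complete cases first (for example with na.omit()) is likely to be needed.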
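Here, a full linear model is fitted with every other column as a predictor. Backward selection then removes one variable at a time: at each step it drops the term whose removal gives the lowest AIC, and it stops when no removal lowers the AIC further. Because the formula is Nassim ~ ., the predictors include LogNassim, a log transform of the response itself, along with the other log-transformed columns, which helps explain the very high R-squared. The final model’s summary is also presented.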
lm_model_full <- lm(Nassim ~ ., data = caterpillars_data)
backward_method <- step(lm_model_full, direction = "backward")
## Start: AIC=-3714.18
## Nassim ~ Instar + ActiveFeeding + Fgp + Mgp + Mass + LogMass +
## Intake + LogIntake + WetFrass + LogWetFrass + DryFrass +
## LogDryFrass + Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
##
## Df Sum of Sq RSS AIC
## - Fgp 1 0.00000000 0.00009239 -3716.2
## - LogMass 1 0.00000002 0.00009241 -3716.1
## - LogWetFrass 1 0.00000002 0.00009242 -3716.1
## - ActiveFeeding 1 0.00000005 0.00009245 -3716.0
## - LogDryFrass 1 0.00000008 0.00009247 -3716.0
## - Instar 1 0.00000011 0.00009250 -3715.9
## - Mgp 1 0.00000022 0.00009261 -3715.6
## - LogNfrass 1 0.00000025 0.00009264 -3715.5
## <none> 0.00009239 -3714.2
## - LogIntake 1 0.00000078 0.00009317 -3714.1
## - Mass 1 0.00000694 0.00009933 -3697.9
## - WetFrass 1 0.00000821 0.00010060 -3694.6
## - LogCassim 1 0.00002034 0.00011273 -3665.8
## - LogNassim 1 0.00003523 0.00012763 -3634.4
## - Intake 1 0.00003883 0.00013122 -3627.4
## - Nfrass 1 0.00009267 0.00018506 -3540.4
## - DryFrass 1 0.00011947 0.00021186 -3506.2
## - Cassim 1 0.00032552 0.00041791 -3334.3
##
## Step: AIC=-3716.18
## Nassim ~ Instar + ActiveFeeding + Mgp + Mass + LogMass + Intake +
## LogIntake + WetFrass + LogWetFrass + DryFrass + LogDryFrass +
## Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
##
## Df Sum of Sq RSS AIC
## - LogWetFrass 1 0.00000003 0.00009242 -3718.1
## - LogMass 1 0.00000003 0.00009243 -3718.1
## - ActiveFeeding 1 0.00000005 0.00009245 -3718.0
## - LogDryFrass 1 0.00000008 0.00009247 -3718.0
## - Instar 1 0.00000013 0.00009253 -3717.8
## - LogNfrass 1 0.00000025 0.00009264 -3717.5
## - Mgp 1 0.00000032 0.00009271 -3717.3
## <none> 0.00009239 -3716.2
## - LogIntake 1 0.00000080 0.00009319 -3716.0
## - Mass 1 0.00000694 0.00009933 -3699.9
## - WetFrass 1 0.00000833 0.00010072 -3696.3
## - LogCassim 1 0.00002041 0.00011280 -3667.7
## - LogNassim 1 0.00003524 0.00012764 -3636.4
## - Intake 1 0.00003889 0.00013128 -3629.3
## - Nfrass 1 0.00009439 0.00018678 -3540.1
## - DryFrass 1 0.00012175 0.00021415 -3505.5
## - Cassim 1 0.00032651 0.00041891 -3335.7
##
## Step: AIC=-3718.1
## Nassim ~ Instar + ActiveFeeding + Mgp + Mass + LogMass + Intake +
## LogIntake + WetFrass + DryFrass + LogDryFrass + Cassim +
## LogCassim + Nfrass + LogNfrass + LogNassim
##
## Df Sum of Sq RSS AIC
## - LogMass 1 0.00000004 0.00009246 -3720.0
## - ActiveFeeding 1 0.00000005 0.00009247 -3720.0
## - LogDryFrass 1 0.00000005 0.00009248 -3720.0
## - Instar 1 0.00000017 0.00009259 -3719.7
## - LogNfrass 1 0.00000024 0.00009266 -3719.4
## - Mgp 1 0.00000033 0.00009275 -3719.2
## <none> 0.00009242 -3718.1
## - LogIntake 1 0.00000082 0.00009324 -3717.9
## - Mass 1 0.00000692 0.00009934 -3701.8
## - WetFrass 1 0.00000902 0.00010144 -3696.5
## - LogCassim 1 0.00002048 0.00011290 -3669.5
## - LogNassim 1 0.00003528 0.00012770 -3638.3
## - Intake 1 0.00003887 0.00013129 -3631.3
## - Nfrass 1 0.00009476 0.00018718 -3541.6
## - DryFrass 1 0.00012173 0.00021416 -3507.5
## - Cassim 1 0.00032669 0.00041911 -3337.6
##
## Step: AIC=-3719.99
## Nassim ~ Instar + ActiveFeeding + Mgp + Mass + Intake + LogIntake +
## WetFrass + DryFrass + LogDryFrass + Cassim + LogCassim +
## Nfrass + LogNfrass + LogNassim
##
## Df Sum of Sq RSS AIC
## - LogDryFrass 1 0.00000006 0.00009253 -3721.8
## - ActiveFeeding 1 0.00000012 0.00009258 -3721.7
## - LogNfrass 1 0.00000026 0.00009272 -3721.3
## - Mgp 1 0.00000032 0.00009278 -3721.1
## - Instar 1 0.00000045 0.00009291 -3720.8
## <none> 0.00009246 -3720.0
## - LogIntake 1 0.00000101 0.00009347 -3719.2
## - Mass 1 0.00000692 0.00009938 -3703.7
## - WetFrass 1 0.00000933 0.00010179 -3697.7
## - LogCassim 1 0.00002159 0.00011405 -3668.9
## - LogNassim 1 0.00003566 0.00012812 -3639.5
## - Intake 1 0.00003933 0.00013180 -3632.3
## - Nfrass 1 0.00009596 0.00018842 -3541.9
## - DryFrass 1 0.00013210 0.00022457 -3497.5
## - Cassim 1 0.00032884 0.00042130 -3338.3
##
## Step: AIC=-3721.81
## Nassim ~ Instar + ActiveFeeding + Mgp + Mass + Intake + LogIntake +
## WetFrass + DryFrass + Cassim + LogCassim + Nfrass + LogNfrass +
## LogNassim
##
## Df Sum of Sq RSS AIC
## - ActiveFeeding 1 0.00000014 0.00009266 -3723.4
## - Mgp 1 0.00000038 0.00009291 -3722.8
## - Instar 1 0.00000040 0.00009293 -3722.7
## <none> 0.00009253 -3721.8
## - LogNfrass 1 0.00000088 0.00009341 -3721.4
## - LogIntake 1 0.00000101 0.00009354 -3721.1
## - Mass 1 0.00000698 0.00009950 -3705.4
## - WetFrass 1 0.00000929 0.00010181 -3699.6
## - LogCassim 1 0.00002188 0.00011441 -3670.1
## - LogNassim 1 0.00003645 0.00012898 -3639.8
## - Intake 1 0.00003947 0.00013199 -3633.9
## - Nfrass 1 0.00009956 0.00019208 -3539.0
## - DryFrass 1 0.00013353 0.00022606 -3497.8
## - Cassim 1 0.00032878 0.00042130 -3340.3
##
## Step: AIC=-3723.44
## Nassim ~ Instar + Mgp + Mass + Intake + LogIntake + WetFrass +
## DryFrass + Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
##
## Df Sum of Sq RSS AIC
## - Mgp 1 0.00000038 0.00009304 -3724.4
## - Instar 1 0.00000064 0.00009330 -3723.7
## <none> 0.00009266 -3723.4
## - LogNfrass 1 0.00000086 0.00009352 -3723.1
## - LogIntake 1 0.00000089 0.00009356 -3723.0
## - Mass 1 0.00000722 0.00009989 -3706.4
## - WetFrass 1 0.00000915 0.00010181 -3701.6
## - LogCassim 1 0.00002220 0.00011487 -3671.1
## - LogNassim 1 0.00003632 0.00012898 -3641.8
## - Intake 1 0.00003943 0.00013209 -3635.7
## - Nfrass 1 0.00009980 0.00019247 -3540.5
## - DryFrass 1 0.00013359 0.00022625 -3499.6
## - Cassim 1 0.00032891 0.00042157 -3342.1
##
## Step: AIC=-3724.41
## Nassim ~ Instar + Mass + Intake + LogIntake + WetFrass + DryFrass +
## Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
##
## Df Sum of Sq RSS AIC
## - LogNfrass 1 0.00000060 0.00009364 -3724.8
## <none> 0.00009304 -3724.4
## - LogIntake 1 0.00000091 0.00009395 -3723.9
## - Instar 1 0.00000115 0.00009420 -3723.3
## - Mass 1 0.00000732 0.00010036 -3707.3
## - WetFrass 1 0.00000909 0.00010214 -3702.8
## - LogCassim 1 0.00002194 0.00011498 -3672.9
## - LogNassim 1 0.00003604 0.00012909 -3643.6
## - Intake 1 0.00003912 0.00013216 -3637.6
## - Nfrass 1 0.00010039 0.00019343 -3541.3
## - DryFrass 1 0.00013495 0.00022799 -3499.7
## - Cassim 1 0.00032968 0.00042272 -3343.5
##
## Step: AIC=-3724.79
## Nassim ~ Instar + Mass + Intake + LogIntake + WetFrass + DryFrass +
## Cassim + LogCassim + Nfrass + LogNassim
##
## Df Sum of Sq RSS AIC
## <none> 0.00009364 -3724.8
## - LogIntake 1 0.00000200 0.00009564 -3721.4
## - Instar 1 0.00000326 0.00009690 -3718.1
## - Mass 1 0.00000793 0.00010157 -3706.2
## - WetFrass 1 0.00000923 0.00010287 -3703.0
## - LogCassim 1 0.00002229 0.00011593 -3672.8
## - LogNassim 1 0.00003545 0.00012909 -3645.6
## - Intake 1 0.00003853 0.00013217 -3639.6
## - Nfrass 1 0.00010469 0.00019833 -3536.9
## - DryFrass 1 0.00013483 0.00022847 -3501.1
## - Cassim 1 0.00032978 0.00042342 -3345.0
summary(backward_method)
##
## Call:
## lm(formula = Nassim ~ Instar + Mass + Intake + LogIntake + WetFrass +
## DryFrass + Cassim + LogCassim + Nfrass + LogNassim, data = caterpillars_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.742e-03 -1.613e-04 -2.116e-05 1.637e-04 2.704e-03
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.844e-02 2.184e-03 8.443 2.87e-15 ***
## Instar -2.659e-04 9.161e-05 -2.903 0.00404 **
## Mass 1.920e-04 4.242e-05 4.526 9.42e-06 ***
## Intake -6.024e-03 6.037e-04 -9.978 < 2e-16 ***
## LogIntake -2.740e-03 1.205e-03 -2.274 0.02381 *
## WetFrass -1.778e-03 3.640e-04 -4.884 1.89e-06 ***
## DryFrass 7.964e-02 4.267e-03 18.666 < 2e-16 ***
## Cassim 1.901e-01 6.513e-03 29.194 < 2e-16 ***
## LogCassim -1.078e-02 1.420e-03 -7.589 6.93e-13 ***
## Nfrass -8.271e-01 5.028e-02 -16.449 < 2e-16 ***
## LogNassim 1.465e-02 1.530e-03 9.572 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.000622 on 242 degrees of freedom
## (14 observations deleted due to missingness)
## Multiple R-squared: 0.9987, Adjusted R-squared: 0.9986
## F-statistic: 1.793e+04 on 10 and 242 DF, p-value: < 2.2e-16
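Each table in the backward trace corresponds to what drop1() reports for the current model: one row per term, giving the AIC after that term is removed. The sketch below reproduces a single backward step by hand and checks it against step(). It uses the built-in mtcars data as a stand-in (an assumption, not the Caterpillars data), and the object names are illustrative.

```r
# Stand-in data: built-in mtcars (an assumption; not the Caterpillars data).
full_fit <- lm(mpg ~ ., data = mtcars)

# drop1() refits the model once per term and reports the AIC after each removal;
# the "<none>" row is the current model.
candidates <- drop1(full_fit)
print(candidates)

# The term whose removal gives the lowest AIC is the one step() drops first.
first_drop <- rownames(candidates)[which.min(candidates$AIC)]

# Cross-check against a single backward step.
one_step <- step(full_fit, direction = "backward", steps = 1, trace = 0)
removed <- setdiff(attr(terms(full_fit), "term.labels"),
                   attr(terms(one_step), "term.labels"))
print(c(first_drop = first_drop, removed = removed))
```

The selection rule is therefore based on AIC rather than on the p-values shown in summary(), which is why a term with a small p-value can occasionally be dropped, or a weak one kept.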
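This subsection combines forward and backward selection. Starting from the full model, the stepwise approach considers at each step both dropping a variable and re-adding one that was removed earlier, again choosing the move with the lowest AIC. Here it follows the same path as backward selection and arrives at the same model. The resulting model’s summary is provided.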
stepwise_method <- step(lm_model_full, direction = "both")
## Start: AIC=-3714.18
## Nassim ~ Instar + ActiveFeeding + Fgp + Mgp + Mass + LogMass +
## Intake + LogIntake + WetFrass + LogWetFrass + DryFrass +
## LogDryFrass + Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
##
## Df Sum of Sq RSS AIC
## - Fgp 1 0.00000000 0.00009239 -3716.2
## - LogMass 1 0.00000002 0.00009241 -3716.1
## - LogWetFrass 1 0.00000002 0.00009242 -3716.1
## - ActiveFeeding 1 0.00000005 0.00009245 -3716.0
## - LogDryFrass 1 0.00000008 0.00009247 -3716.0
## - Instar 1 0.00000011 0.00009250 -3715.9
## - Mgp 1 0.00000022 0.00009261 -3715.6
## - LogNfrass 1 0.00000025 0.00009264 -3715.5
## <none> 0.00009239 -3714.2
## - LogIntake 1 0.00000078 0.00009317 -3714.1
## - Mass 1 0.00000694 0.00009933 -3697.9
## - WetFrass 1 0.00000821 0.00010060 -3694.6
## - LogCassim 1 0.00002034 0.00011273 -3665.8
## - LogNassim 1 0.00003523 0.00012763 -3634.4
## - Intake 1 0.00003883 0.00013122 -3627.4
## - Nfrass 1 0.00009267 0.00018506 -3540.4
## - DryFrass 1 0.00011947 0.00021186 -3506.2
## - Cassim 1 0.00032552 0.00041791 -3334.3
##
## Step: AIC=-3716.18
## Nassim ~ Instar + ActiveFeeding + Mgp + Mass + LogMass + Intake +
## LogIntake + WetFrass + LogWetFrass + DryFrass + LogDryFrass +
## Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
##
## Df Sum of Sq RSS AIC
## - LogWetFrass 1 0.00000003 0.00009242 -3718.1
## - LogMass 1 0.00000003 0.00009243 -3718.1
## - ActiveFeeding 1 0.00000005 0.00009245 -3718.0
## - LogDryFrass 1 0.00000008 0.00009247 -3718.0
## - Instar 1 0.00000013 0.00009253 -3717.8
## - LogNfrass 1 0.00000025 0.00009264 -3717.5
## - Mgp 1 0.00000032 0.00009271 -3717.3
## <none> 0.00009239 -3716.2
## - LogIntake 1 0.00000080 0.00009319 -3716.0
## + Fgp 1 0.00000000 0.00009239 -3714.2
## - Mass 1 0.00000694 0.00009933 -3699.9
## - WetFrass 1 0.00000833 0.00010072 -3696.3
## - LogCassim 1 0.00002041 0.00011280 -3667.7
## - LogNassim 1 0.00003524 0.00012764 -3636.4
## - Intake 1 0.00003889 0.00013128 -3629.3
## - Nfrass 1 0.00009439 0.00018678 -3540.1
## - DryFrass 1 0.00012175 0.00021415 -3505.5
## - Cassim 1 0.00032651 0.00041891 -3335.7
##
## Step: AIC=-3718.1
## Nassim ~ Instar + ActiveFeeding + Mgp + Mass + LogMass + Intake +
## LogIntake + WetFrass + DryFrass + LogDryFrass + Cassim +
## LogCassim + Nfrass + LogNfrass + LogNassim
##
## Df Sum of Sq RSS AIC
## - LogMass 1 0.00000004 0.00009246 -3720.0
## - ActiveFeeding 1 0.00000005 0.00009247 -3720.0
## - LogDryFrass 1 0.00000005 0.00009248 -3720.0
## - Instar 1 0.00000017 0.00009259 -3719.7
## - LogNfrass 1 0.00000024 0.00009266 -3719.4
## - Mgp 1 0.00000033 0.00009275 -3719.2
## <none> 0.00009242 -3718.1
## - LogIntake 1 0.00000082 0.00009324 -3717.9
## + LogWetFrass 1 0.00000003 0.00009239 -3716.2
## + Fgp 1 0.00000000 0.00009242 -3716.1
## - Mass 1 0.00000692 0.00009934 -3701.8
## - WetFrass 1 0.00000902 0.00010144 -3696.5
## - LogCassim 1 0.00002048 0.00011290 -3669.5
## - LogNassim 1 0.00003528 0.00012770 -3638.3
## - Intake 1 0.00003887 0.00013129 -3631.3
## - Nfrass 1 0.00009476 0.00018718 -3541.6
## - DryFrass 1 0.00012173 0.00021416 -3507.5
## - Cassim 1 0.00032669 0.00041911 -3337.6
##
## Step: AIC=-3719.99
## Nassim ~ Instar + ActiveFeeding + Mgp + Mass + Intake + LogIntake +
## WetFrass + DryFrass + LogDryFrass + Cassim + LogCassim +
## Nfrass + LogNfrass + LogNassim
##
## Df Sum of Sq RSS AIC
## - LogDryFrass 1 0.00000006 0.00009253 -3721.8
## - ActiveFeeding 1 0.00000012 0.00009258 -3721.7
## - LogNfrass 1 0.00000026 0.00009272 -3721.3
## - Mgp 1 0.00000032 0.00009278 -3721.1
## - Instar 1 0.00000045 0.00009291 -3720.8
## <none> 0.00009246 -3720.0
## - LogIntake 1 0.00000101 0.00009347 -3719.2
## + LogMass 1 0.00000004 0.00009242 -3718.1
## + LogWetFrass 1 0.00000004 0.00009243 -3718.1
## + Fgp 1 0.00000001 0.00009246 -3718.0
## - Mass 1 0.00000692 0.00009938 -3703.7
## - WetFrass 1 0.00000933 0.00010179 -3697.7
## - LogCassim 1 0.00002159 0.00011405 -3668.9
## - LogNassim 1 0.00003566 0.00012812 -3639.5
## - Intake 1 0.00003933 0.00013180 -3632.3
## - Nfrass 1 0.00009596 0.00018842 -3541.9
## - DryFrass 1 0.00013210 0.00022457 -3497.5
## - Cassim 1 0.00032884 0.00042130 -3338.3
##
## Step: AIC=-3721.81
## Nassim ~ Instar + ActiveFeeding + Mgp + Mass + Intake + LogIntake +
## WetFrass + DryFrass + Cassim + LogCassim + Nfrass + LogNfrass +
## LogNassim
##
## Df Sum of Sq RSS AIC
## - ActiveFeeding 1 0.00000014 0.00009266 -3723.4
## - Mgp 1 0.00000038 0.00009291 -3722.8
## - Instar 1 0.00000040 0.00009293 -3722.7
## <none> 0.00009253 -3721.8
## - LogNfrass 1 0.00000088 0.00009341 -3721.4
## - LogIntake 1 0.00000101 0.00009354 -3721.1
## + LogDryFrass 1 0.00000006 0.00009246 -3720.0
## + LogMass 1 0.00000005 0.00009248 -3720.0
## + Fgp 1 0.00000002 0.00009251 -3719.9
## + LogWetFrass 1 0.00000000 0.00009252 -3719.8
## - Mass 1 0.00000698 0.00009950 -3705.4
## - WetFrass 1 0.00000929 0.00010181 -3699.6
## - LogCassim 1 0.00002188 0.00011441 -3670.1
## - LogNassim 1 0.00003645 0.00012898 -3639.8
## - Intake 1 0.00003947 0.00013199 -3633.9
## - Nfrass 1 0.00009956 0.00019208 -3539.0
## - DryFrass 1 0.00013353 0.00022606 -3497.8
## - Cassim 1 0.00032878 0.00042130 -3340.3
##
## Step: AIC=-3723.44
## Nassim ~ Instar + Mgp + Mass + Intake + LogIntake + WetFrass +
## DryFrass + Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
##
## Df Sum of Sq RSS AIC
## - Mgp 1 0.00000038 0.00009304 -3724.4
## - Instar 1 0.00000064 0.00009330 -3723.7
## <none> 0.00009266 -3723.4
## - LogNfrass 1 0.00000086 0.00009352 -3723.1
## - LogIntake 1 0.00000089 0.00009356 -3723.0
## + LogMass 1 0.00000014 0.00009253 -3721.8
## + ActiveFeeding 1 0.00000014 0.00009253 -3721.8
## + LogDryFrass 1 0.00000008 0.00009258 -3721.7
## + Fgp 1 0.00000007 0.00009259 -3721.6
## + LogWetFrass 1 0.00000000 0.00009266 -3721.4
## - Mass 1 0.00000722 0.00009989 -3706.4
## - WetFrass 1 0.00000915 0.00010181 -3701.6
## - LogCassim 1 0.00002220 0.00011487 -3671.1
## - LogNassim 1 0.00003632 0.00012898 -3641.8
## - Intake 1 0.00003943 0.00013209 -3635.7
## - Nfrass 1 0.00009980 0.00019247 -3540.5
## - DryFrass 1 0.00013359 0.00022625 -3499.6
## - Cassim 1 0.00032891 0.00042157 -3342.1
##
## Step: AIC=-3724.41
## Nassim ~ Instar + Mass + Intake + LogIntake + WetFrass + DryFrass +
## Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
##
## Df Sum of Sq RSS AIC
## - LogNfrass 1 0.00000060 0.00009364 -3724.8
## <none> 0.00009304 -3724.4
## - LogIntake 1 0.00000091 0.00009395 -3723.9
## + Mgp 1 0.00000038 0.00009266 -3723.4
## - Instar 1 0.00000115 0.00009420 -3723.3
## + Fgp 1 0.00000025 0.00009279 -3723.1
## + LogDryFrass 1 0.00000015 0.00009289 -3722.8
## + ActiveFeeding 1 0.00000013 0.00009291 -3722.8
## + LogMass 1 0.00000012 0.00009292 -3722.7
## + LogWetFrass 1 0.00000000 0.00009304 -3722.4
## - Mass 1 0.00000732 0.00010036 -3707.3
## - WetFrass 1 0.00000909 0.00010214 -3702.8
## - LogCassim 1 0.00002194 0.00011498 -3672.9
## - LogNassim 1 0.00003604 0.00012909 -3643.6
## - Intake 1 0.00003912 0.00013216 -3637.6
## - Nfrass 1 0.00010039 0.00019343 -3541.3
## - DryFrass 1 0.00013495 0.00022799 -3499.7
## - Cassim 1 0.00032968 0.00042272 -3343.5
##
## Step: AIC=-3724.79
## Nassim ~ Instar + Mass + Intake + LogIntake + WetFrass + DryFrass +
## Cassim + LogCassim + Nfrass + LogNassim
##
## Df Sum of Sq RSS AIC
## <none> 0.00009364 -3724.8
## + LogNfrass 1 0.00000060 0.00009304 -3724.4
## + LogDryFrass 1 0.00000041 0.00009323 -3723.9
## + LogWetFrass 1 0.00000041 0.00009323 -3723.9
## + Mgp 1 0.00000012 0.00009352 -3723.1
## + ActiveFeeding 1 0.00000011 0.00009353 -3723.1
## + LogMass 1 0.00000009 0.00009355 -3723.0
## + Fgp 1 0.00000006 0.00009358 -3722.9
## - LogIntake 1 0.00000200 0.00009564 -3721.4
## - Instar 1 0.00000326 0.00009690 -3718.1
## - Mass 1 0.00000793 0.00010157 -3706.2
## - WetFrass 1 0.00000923 0.00010287 -3703.0
## - LogCassim 1 0.00002229 0.00011593 -3672.8
## - LogNassim 1 0.00003545 0.00012909 -3645.6
## - Intake 1 0.00003853 0.00013217 -3639.6
## - Nfrass 1 0.00010469 0.00019833 -3536.9
## - DryFrass 1 0.00013483 0.00022847 -3501.1
## - Cassim 1 0.00032978 0.00042342 -3345.0
summary(stepwise_method)
##
## Call:
## lm(formula = Nassim ~ Instar + Mass + Intake + LogIntake + WetFrass +
## DryFrass + Cassim + LogCassim + Nfrass + LogNassim, data = caterpillars_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.742e-03 -1.613e-04 -2.116e-05 1.637e-04 2.704e-03
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.844e-02 2.184e-03 8.443 2.87e-15 ***
## Instar -2.659e-04 9.161e-05 -2.903 0.00404 **
## Mass 1.920e-04 4.242e-05 4.526 9.42e-06 ***
## Intake -6.024e-03 6.037e-04 -9.978 < 2e-16 ***
## LogIntake -2.740e-03 1.205e-03 -2.274 0.02381 *
## WetFrass -1.778e-03 3.640e-04 -4.884 1.89e-06 ***
## DryFrass 7.964e-02 4.267e-03 18.666 < 2e-16 ***
## Cassim 1.901e-01 6.513e-03 29.194 < 2e-16 ***
## LogCassim -1.078e-02 1.420e-03 -7.589 6.93e-13 ***
## Nfrass -8.271e-01 5.028e-02 -16.449 < 2e-16 ***
## LogNassim 1.465e-02 1.530e-03 9.572 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.000622 on 242 degrees of freedom
## (14 observations deleted due to missingness)
## Multiple R-squared: 0.9987, Adjusted R-squared: 0.9986
## F-statistic: 1.793e+04 on 10 and 242 DF, p-value: < 2.2e-16
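In this section, the models selected by each method are compared using the AIC (Akaike Information Criterion) and adjusted R-squared. Lower AIC and higher adjusted R-squared indicate a better fit. Two caveats apply when reading the output. First, the values reported by AIC() differ from those printed by step() by an offset that depends only on the sample size, because step() omits constant terms of the log-likelihood. Second, as the warning notes, the intercept-only model is fitted to 254 observations while the other two use 253, so its AIC is not strictly comparable.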
# Comparing AIC and Adjusted R-squared for each model
AIC(forward_method, backward_method, stepwise_method)
## Warning in AIC.default(forward_method, backward_method, stepwise_method):
## models are not all fitted to the same number of observations
## df AIC
## forward_method 2 -1358.080
## backward_method 12 -3004.805
## stepwise_method 12 -3004.805
summary(forward_method)$adj.r.squared
## [1] 0
summary(backward_method)$adj.r.squared
## [1] 0.9985965
summary(stepwise_method)$adj.r.squared
## [1] 0.9985965
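The warning from AIC() arises because the models were fitted to different numbers of rows. One way to avoid it, sketched here under stated assumptions, is to restrict the data to complete cases before fitting anything, so that every model sees the same observations. The example uses the built-in mtcars data with two values deliberately set to NA as a stand-in; the data and object names are hypothetical and are not part of the analysis above.

```r
# Stand-in data: built-in mtcars with two values blanked out (hypothetical),
# to mimic a dataset with missing entries.
dat <- mtcars
dat$hp[c(3, 7)] <- NA

# Restrict to complete cases once, so every model sees the same rows.
dat_cc <- na.omit(dat)

null_fit <- lm(mpg ~ 1, data = dat_cc)
full_fit <- lm(mpg ~ ., data = dat_cc)
back_fit <- step(full_fit, direction = "backward", trace = 0)

# Same number of observations, so AIC() compares like with like.
print(c(null = nobs(null_fit), backward = nobs(back_fit)))
print(AIC(null_fit, back_fit))
```

Dropping incomplete rows up front discards some data for the smaller models, so this is a trade-off between comparability and sample size rather than a required step.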
This section addresses the analysis using a natural-log transformation of the response variable, Nassim. The same model selection methods are applied again to see how the log transformation affects the results.
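Here, an intercept-only model is fitted to the log-transformed response and passed to forward selection. As in the untransformed case, no scope is supplied, so the result remains the intercept-only model. The warning “NaNs produced” indicates that log() was applied to a negative Nassim value; that row is dropped as missing, which is consistent with the observation count falling from 254 to 253.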
lm_model_log <- lm(log(Nassim) ~ 1, data = caterpillars_data)
## Warning in log(Nassim): NaNs produced
forward_log_method <- step(lm_model_log, direction = "forward")
## Start: AIC=77.19
## log(Nassim) ~ 1
summary(forward_log_method)
##
## Call:
## lm(formula = log(Nassim) ~ 1, data = caterpillars_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.2130 -0.9864 -0.3005 0.8447 2.2130
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.95933 0.07309 -67.86 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.163 on 252 degrees of freedom
## (14 observations deleted due to missingness)
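The “NaNs produced” warning comes from taking the log of a value that is not positive. A small sketch of how such rows could be counted and excluded explicitly before fitting is given below. The data frame toy and its values are invented for illustration only and have no connection to the Caterpillars measurements.

```r
# Hypothetical toy data: one negative and one missing response value.
toy <- data.frame(
  y = c(0.002, 0.015, -0.001, NA, 0.030, 0.008),
  x = c(1, 2, 3, 4, 5, 6)
)

# Count responses for which the log is undefined or infinite (zero or negative).
n_bad <- sum(!is.na(toy$y) & toy$y <= 0)
print(n_bad)

# Keep only rows with a strictly positive response, then fit on the log scale.
toy_pos <- subset(toy, !is.na(y) & y > 0)
fit <- lm(log(y) ~ x, data = toy_pos)
print(nobs(fit))
```

Filtering explicitly makes the number of excluded rows visible in the analysis instead of leaving it implied by a warning and a change in the degrees of freedom.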
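A full linear model is fitted to the log-transformed response, followed by backward selection. Because the predictors include LogNassim, which the fitted coefficient of about 2.303 (the natural log of 10) shows to be the base-10 log of Nassim, the response is an exact multiple of one predictor. This is why step() repeatedly warns about an essentially perfect fit, and why the AIC values and the final summary say little about the other variables. The summary of the final model is displayed.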
lm_model_full_log <- lm(log(Nassim) ~ ., data = caterpillars_data)
## Warning in log(Nassim): NaNs produced
backward_log_method <- step(lm_model_full_log, direction = "backward")
## Start: AIC=-8211.49
## log(Nassim) ~ Instar + ActiveFeeding + Fgp + Mgp + Mass + LogMass +
## Intake + LogIntake + WetFrass + LogWetFrass + DryFrass +
## LogDryFrass + Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Df Sum of Sq RSS AIC
## - Mass 1 0.0000 0.0000 -8213.5
## - DryFrass 1 0.0000 0.0000 -8213.5
## - WetFrass 1 0.0000 0.0000 -8213.4
## - LogDryFrass 1 0.0000 0.0000 -8213.4
## - Fgp 1 0.0000 0.0000 -8213.3
## - Intake 1 0.0000 0.0000 -8213.2
## - ActiveFeeding 1 0.0000 0.0000 -8213.1
## - Mgp 1 0.0000 0.0000 -8213.1
## - LogMass 1 0.0000 0.0000 -8212.9
## - Cassim 1 0.0000 0.0000 -8212.9
## - LogWetFrass 1 0.0000 0.0000 -8212.7
## - Nfrass 1 0.0000 0.0000 -8212.7
## - LogNfrass 1 0.0000 0.0000 -8212.3
## - Instar 1 0.0000 0.0000 -8212.2
## <none> 0.0000 -8211.5
## - LogIntake 1 0.0000 0.0000 -8205.0
## - LogCassim 1 0.0000 0.0000 -8200.9
## - LogNassim 1 0.8271 0.8271 -1414.0
## Warning in log(Nassim): NaNs produced
##
## Step: AIC=-8213.49
## log(Nassim) ~ Instar + ActiveFeeding + Fgp + Mgp + LogMass +
## Intake + LogIntake + WetFrass + LogWetFrass + DryFrass +
## LogDryFrass + Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Df Sum of Sq RSS AIC
## - DryFrass 1 0.00000 0.00000 -8215.5
## - WetFrass 1 0.00000 0.00000 -8215.4
## - LogDryFrass 1 0.00000 0.00000 -8215.4
## - Fgp 1 0.00000 0.00000 -8215.3
## - Intake 1 0.00000 0.00000 -8215.1
## - ActiveFeeding 1 0.00000 0.00000 -8215.1
## - Mgp 1 0.00000 0.00000 -8215.1
## - LogMass 1 0.00000 0.00000 -8214.9
## - Cassim 1 0.00000 0.00000 -8214.8
## - LogWetFrass 1 0.00000 0.00000 -8214.7
## - Nfrass 1 0.00000 0.00000 -8214.7
## - LogNfrass 1 0.00000 0.00000 -8214.3
## - Instar 1 0.00000 0.00000 -8214.2
## <none> 0.00000 -8213.5
## - LogIntake 1 0.00000 0.00000 -8206.3
## - LogCassim 1 0.00000 0.00000 -8202.6
## - LogNassim 1 0.93154 0.93154 -1385.9
## Warning in log(Nassim): NaNs produced
##
## Step: AIC=-8215.48
## log(Nassim) ~ Instar + ActiveFeeding + Fgp + Mgp + LogMass +
## Intake + LogIntake + WetFrass + LogWetFrass + LogDryFrass +
## Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Df Sum of Sq RSS AIC
## - WetFrass 1 0.00000 0.00000 -8217.4
## - LogDryFrass 1 0.00000 0.00000 -8217.4
## - Fgp 1 0.00000 0.00000 -8217.3
## - ActiveFeeding 1 0.00000 0.00000 -8217.0
## - Mgp 1 0.00000 0.00000 -8217.0
## - LogMass 1 0.00000 0.00000 -8216.9
## - LogWetFrass 1 0.00000 0.00000 -8216.7
## - Nfrass 1 0.00000 0.00000 -8216.6
## - LogNfrass 1 0.00000 0.00000 -8216.2
## - Instar 1 0.00000 0.00000 -8216.2
## - Intake 1 0.00000 0.00000 -8216.2
## - Cassim 1 0.00000 0.00000 -8215.8
## <none> 0.00000 -8215.5
## - LogIntake 1 0.00000 0.00000 -8207.6
## - LogCassim 1 0.00000 0.00000 -8204.2
## - LogNassim 1 0.93208 0.93208 -1387.7
## Warning in log(Nassim): NaNs produced
##
## Step: AIC=-8217.4
## log(Nassim) ~ Instar + ActiveFeeding + Fgp + Mgp + LogMass +
## Intake + LogIntake + LogWetFrass + LogDryFrass + Cassim +
## LogCassim + Nfrass + LogNfrass + LogNassim
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Df Sum of Sq RSS AIC
## - LogDryFrass 1 0.00000 0.00000 -8219.3
## - Fgp 1 0.00000 0.00000 -8219.2
## - ActiveFeeding 1 0.00000 0.00000 -8219.0
## - Mgp 1 0.00000 0.00000 -8218.9
## - LogMass 1 0.00000 0.00000 -8218.8
## - LogWetFrass 1 0.00000 0.00000 -8218.4
## - LogNfrass 1 0.00000 0.00000 -8218.2
## - Intake 1 0.00000 0.00000 -8218.1
## - Nfrass 1 0.00000 0.00000 -8218.1
## - Instar 1 0.00000 0.00000 -8218.0
## - Cassim 1 0.00000 0.00000 -8217.7
## <none> 0.00000 -8217.4
## - LogIntake 1 0.00000 0.00000 -8209.5
## - LogCassim 1 0.00000 0.00000 -8205.8
## - LogNassim 1 0.94437 0.94437 -1386.4
## Warning in log(Nassim): NaNs produced
##
## Step: AIC=-8219.27
## log(Nassim) ~ Instar + ActiveFeeding + Fgp + Mgp + LogMass +
## Intake + LogIntake + LogWetFrass + Cassim + LogCassim + Nfrass +
## LogNfrass + LogNassim
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Df Sum of Sq RSS AIC
## - Fgp 1 0.00000 0.00000 -8221.1
## - ActiveFeeding 1 0.00000 0.00000 -8220.8
## - Mgp 1 0.00000 0.00000 -8220.8
## - LogMass 1 0.00000 0.00000 -8220.7
## - LogWetFrass 1 0.00000 0.00000 -8220.4
## - Nfrass 1 0.00000 0.00000 -8220.1
## - Intake 1 0.00000 0.00000 -8220.0
## - Instar 1 0.00000 0.00000 -8219.8
## - Cassim 1 0.00000 0.00000 -8219.7
## <none> 0.00000 -8219.3
## - LogNfrass 1 0.00000 0.00000 -8218.6
## - LogIntake 1 0.00000 0.00000 -8211.5
## - LogCassim 1 0.00000 0.00000 -8207.4
## - LogNassim 1 0.97395 0.97395 -1380.6
## Warning in log(Nassim): NaNs produced
##
## Step: AIC=-8221.07
## log(Nassim) ~ Instar + ActiveFeeding + Mgp + LogMass + Intake +
## LogIntake + LogWetFrass + Cassim + LogCassim + Nfrass + LogNfrass +
## LogNassim
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Df Sum of Sq RSS AIC
## - LogMass 1 0.00000 0.00000 -8222.7
## - ActiveFeeding 1 0.00000 0.00000 -8222.6
## - LogWetFrass 1 0.00000 0.00000 -8222.3
## - Nfrass 1 0.00000 0.00000 -8222.0
## - Intake 1 0.00000 0.00000 -8221.9
## - Cassim 1 0.00000 0.00000 -8221.6
## - Mgp 1 0.00000 0.00000 -8221.6
## <none> 0.00000 -8221.1
## - Instar 1 0.00000 0.00000 -8220.9
## - LogNfrass 1 0.00000 0.00000 -8220.1
## - LogIntake 1 0.00000 0.00000 -8212.3
## - LogCassim 1 0.00000 0.00000 -8208.7
## - LogNassim 1 0.97399 0.97399 -1382.6
## Warning in log(Nassim): NaNs produced
##
## Step: AIC=-8222.72
## log(Nassim) ~ Instar + ActiveFeeding + Mgp + Intake + LogIntake +
## LogWetFrass + Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Df Sum of Sq RSS AIC
## - ActiveFeeding 1 0.000 0.000 -8224.5
## - LogWetFrass 1 0.000 0.000 -8224.1
## - Nfrass 1 0.000 0.000 -8223.3
## - Mgp 1 0.000 0.000 -8223.1
## - Intake 1 0.000 0.000 -8223.0
## <none> 0.000 -8222.7
## - Cassim 1 0.000 0.000 -8222.7
## - LogNfrass 1 0.000 0.000 -8221.5
## - Instar 1 0.000 0.000 -8220.0
## - LogIntake 1 0.000 0.000 -8211.7
## - LogCassim 1 0.000 0.000 -8209.1
## - LogNassim 1 0.976 0.976 -1384.1
## Warning in log(Nassim): NaNs produced
##
## Step: AIC=-8224.55
## log(Nassim) ~ Instar + Mgp + Intake + LogIntake + LogWetFrass +
## Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Df Sum of Sq RSS AIC
## - LogWetFrass 1 0.00000 0.00000 -8226.0
## - Nfrass 1 0.00000 0.00000 -8225.3
## - Intake 1 0.00000 0.00000 -8224.9
## - Mgp 1 0.00000 0.00000 -8224.9
## - Cassim 1 0.00000 0.00000 -8224.6
## <none> 0.00000 -8224.5
## - LogNfrass 1 0.00000 0.00000 -8223.3
## - Instar 1 0.00000 0.00000 -8221.9
## - LogIntake 1 0.00000 0.00000 -8213.2
## - LogCassim 1 0.00000 0.00000 -8210.6
## - LogNassim 1 0.98129 0.98129 -1384.7
## Warning in log(Nassim): NaNs produced
##
## Step: AIC=-8225.97
## log(Nassim) ~ Instar + Mgp + Intake + LogIntake + Cassim + LogCassim +
## Nfrass + LogNfrass + LogNassim
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Df Sum of Sq RSS AIC
## - Nfrass 1 0.00000 0.00000 -8226.9
## - Intake 1 0.00000 0.00000 -8226.5
## - Mgp 1 0.00000 0.00000 -8226.4
## - Cassim 1 0.00000 0.00000 -8226.1
## <none> 0.00000 -8226.0
## - Instar 1 0.00000 0.00000 -8223.9
## - LogIntake 1 0.00000 0.00000 -8215.1
## - LogCassim 1 0.00000 0.00000 -8212.6
## - LogNfrass 1 0.00000 0.00000 -8212.3
## - LogNassim 1 0.99426 0.99426 -1383.4
## Warning in log(Nassim): NaNs produced
##
## Step: AIC=-8226.95
## log(Nassim) ~ Instar + Mgp + Intake + LogIntake + Cassim + LogCassim +
## LogNfrass + LogNassim
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Df Sum of Sq RSS AIC
## - Intake 1 0.000 0.000 -8228.4
## - Mgp 1 0.000 0.000 -8227.8
## - Cassim 1 0.000 0.000 -8227.8
## <none> 0.000 -8226.9
## - Instar 1 0.000 0.000 -8225.3
## - LogIntake 1 0.000 0.000 -8215.8
## - LogNfrass 1 0.000 0.000 -8214.2
## - LogCassim 1 0.000 0.000 -8213.7
## - LogNassim 1 1.643 1.643 -1258.3
## Warning in log(Nassim): NaNs produced
##
## Step: AIC=-8228.38
## log(Nassim) ~ Instar + Mgp + LogIntake + Cassim + LogCassim +
## LogNfrass + LogNassim
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Df Sum of Sq RSS AIC
## - Mgp 1 0.0000 0.0000 -8229.0
## <none> 0.0000 -8228.4
## - Cassim 1 0.0000 0.0000 -8227.0
## - Instar 1 0.0000 0.0000 -8225.8
## - LogNfrass 1 0.0000 0.0000 -8215.5
## - LogCassim 1 0.0000 0.0000 -8215.2
## - LogIntake 1 0.0000 0.0000 -8215.0
## - LogNassim 1 2.3981 2.3981 -1164.7
## Warning in log(Nassim): NaNs produced
##
## Step: AIC=-8229
## log(Nassim) ~ Instar + LogIntake + Cassim + LogCassim + LogNfrass +
## LogNassim
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Df Sum of Sq RSS AIC
## <none> 0.0000 -8229.0
## - Cassim 1 0.0000 0.0000 -8227.6
## - Instar 1 0.0000 0.0000 -8227.5
## - LogNfrass 1 0.0000 0.0000 -8217.5
## - LogIntake 1 0.0000 0.0000 -8216.8
## - LogCassim 1 0.0000 0.0000 -8216.1
## - LogNassim 1 2.4693 2.4693 -1159.3
summary(backward_log_method)
##
## Call:
## lm(formula = log(Nassim) ~ Instar + LogIntake + Cassim + LogCassim +
## LogNfrass + LogNassim, data = caterpillars_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.363e-07 -2.245e-08 -2.660e-09 2.636e-08 3.020e-07
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.517e-07 2.155e-07 3.488e+00 0.000576 ***
## Instar -2.714e-08 1.465e-08 -1.852e+00 0.065170 .
## LogIntake -5.023e-07 1.332e-07 -3.771e+00 0.000203 ***
## Cassim -1.852e-07 1.017e-07 -1.820e+00 0.069901 .
## LogCassim 6.949e-07 1.796e-07 3.870e+00 0.000140 ***
## LogNfrass 8.287e-08 2.259e-08 3.669e+00 0.000298 ***
## LogNassim 2.303e+00 1.251e-07 1.841e+07 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.535e-08 on 246 degrees of freedom
## (14 observations deleted due to missingness)
## Multiple R-squared: 1, Adjusted R-squared: 1
## F-statistic: 7.792e+15 on 6 and 246 DF, p-value: < 2.2e-16
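The coefficient of about 2.303 on LogNassim, together with residuals on the order of 1e-07, suggests that LogNassim is the base-10 logarithm of Nassim: ln(x) = ln(10) * log10(x), and ln(10) is approximately 2.303. The sketch below reproduces this effect on simulated data and shows one way such a predictor could be dropped. The data frame `toy` and its columns are hypothetical and are not part of the Caterpillars data.

```r
# Simulated data (hypothetical): y depends on x with some noise.
set.seed(1)
toy <- data.frame(x = runif(50, 1, 10))
toy$y <- exp(0.3 * toy$x + rnorm(50, sd = 0.2))
toy$log10_y <- log10(toy$y)  # a transform of the response itself

# Including log10_y as a predictor of log(y) gives an essentially perfect fit.
leaky <- lm(log(y) ~ x + log10_y, data = toy)
coef(leaky)["log10_y"]       # approximately log(10)

# Dropping the leaked predictor restores an informative model.
clean <- update(leaky, . ~ . - log10_y)
summary(clean)$adj.r.squared
```

Applied to this analysis, the analogous step would be to remove LogNassim from the full model before running the selection procedures.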
Stepwise selection (direction = "both") is then applied to the full model with the log-transformed response variable. The summary of the resulting model is also shown.
stepwise_log_method <- step(lm_model_full_log, direction = "both")
## Start: AIC=-8211.49
## log(Nassim) ~ Instar + ActiveFeeding + Fgp + Mgp + Mass + LogMass +
## Intake + LogIntake + WetFrass + LogWetFrass + DryFrass +
## LogDryFrass + Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Df Sum of Sq RSS AIC
## - Mass 1 0.0000 0.0000 -8213.5
## - DryFrass 1 0.0000 0.0000 -8213.5
## - WetFrass 1 0.0000 0.0000 -8213.4
## - LogDryFrass 1 0.0000 0.0000 -8213.4
## - Fgp 1 0.0000 0.0000 -8213.3
## - Intake 1 0.0000 0.0000 -8213.2
## - ActiveFeeding 1 0.0000 0.0000 -8213.1
## - Mgp 1 0.0000 0.0000 -8213.1
## - LogMass 1 0.0000 0.0000 -8212.9
## - Cassim 1 0.0000 0.0000 -8212.9
## - LogWetFrass 1 0.0000 0.0000 -8212.7
## - Nfrass 1 0.0000 0.0000 -8212.7
## - LogNfrass 1 0.0000 0.0000 -8212.3
## - Instar 1 0.0000 0.0000 -8212.2
## <none> 0.0000 -8211.5
## - LogIntake 1 0.0000 0.0000 -8205.0
## - LogCassim 1 0.0000 0.0000 -8200.9
## - LogNassim 1 0.8271 0.8271 -1414.0
## Warning in log(Nassim): NaNs produced
##
## Step: AIC=-8213.49
## log(Nassim) ~ Instar + ActiveFeeding + Fgp + Mgp + LogMass +
## Intake + LogIntake + WetFrass + LogWetFrass + DryFrass +
## LogDryFrass + Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Warning in log(Nassim): NaNs produced
## Df Sum of Sq RSS AIC
## - DryFrass 1 0.00000 0.00000 -8215.5
## - WetFrass 1 0.00000 0.00000 -8215.4
## - LogDryFrass 1 0.00000 0.00000 -8215.4
## - Fgp 1 0.00000 0.00000 -8215.3
## - Intake 1 0.00000 0.00000 -8215.1
## - ActiveFeeding 1 0.00000 0.00000 -8215.1
## - Mgp 1 0.00000 0.00000 -8215.1
## - LogMass 1 0.00000 0.00000 -8214.9
## - Cassim 1 0.00000 0.00000 -8214.8
## - LogWetFrass 1 0.00000 0.00000 -8214.7
## - Nfrass 1 0.00000 0.00000 -8214.7
## - LogNfrass 1 0.00000 0.00000 -8214.3
## - Instar 1 0.00000 0.00000 -8214.2
## <none> 0.00000 -8213.5
## + Mass 1 0.00000 0.00000 -8211.5
## - LogIntake 1 0.00000 0.00000 -8206.3
## - LogCassim 1 0.00000 0.00000 -8202.6
## - LogNassim 1 0.93154 0.93154 -1385.9
## Warning in log(Nassim): NaNs produced
##
## Step: AIC=-8215.48
## log(Nassim) ~ Instar + ActiveFeeding + Fgp + Mgp + LogMass +
## Intake + LogIntake + WetFrass + LogWetFrass + LogDryFrass +
## Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Warning in log(Nassim): NaNs produced
## Df Sum of Sq RSS AIC
## - WetFrass 1 0.00000 0.00000 -8217.4
## - LogDryFrass 1 0.00000 0.00000 -8217.4
## - Fgp 1 0.00000 0.00000 -8217.3
## - ActiveFeeding 1 0.00000 0.00000 -8217.0
## - Mgp 1 0.00000 0.00000 -8217.0
## - LogMass 1 0.00000 0.00000 -8216.9
## - LogWetFrass 1 0.00000 0.00000 -8216.7
## - Nfrass 1 0.00000 0.00000 -8216.6
## - LogNfrass 1 0.00000 0.00000 -8216.2
## - Instar 1 0.00000 0.00000 -8216.2
## - Intake 1 0.00000 0.00000 -8216.2
## - Cassim 1 0.00000 0.00000 -8215.8
## <none> 0.00000 -8215.5
## + DryFrass 1 0.00000 0.00000 -8213.5
## + Mass 1 0.00000 0.00000 -8213.5
## - LogIntake 1 0.00000 0.00000 -8207.6
## - LogCassim 1 0.00000 0.00000 -8204.2
## - LogNassim 1 0.93208 0.93208 -1387.7
## Warning in log(Nassim): NaNs produced
##
## Step: AIC=-8217.4
## log(Nassim) ~ Instar + ActiveFeeding + Fgp + Mgp + LogMass +
## Intake + LogIntake + LogWetFrass + LogDryFrass + Cassim +
## LogCassim + Nfrass + LogNfrass + LogNassim
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Warning in log(Nassim): NaNs produced
## Df Sum of Sq RSS AIC
## - LogDryFrass 1 0.00000 0.00000 -8219.3
## - Fgp 1 0.00000 0.00000 -8219.2
## - ActiveFeeding 1 0.00000 0.00000 -8219.0
## - Mgp 1 0.00000 0.00000 -8218.9
## - LogMass 1 0.00000 0.00000 -8218.8
## - LogWetFrass 1 0.00000 0.00000 -8218.4
## - LogNfrass 1 0.00000 0.00000 -8218.2
## - Intake 1 0.00000 0.00000 -8218.1
## - Nfrass 1 0.00000 0.00000 -8218.1
## - Instar 1 0.00000 0.00000 -8218.0
## - Cassim 1 0.00000 0.00000 -8217.7
## <none> 0.00000 -8217.4
## + WetFrass 1 0.00000 0.00000 -8215.5
## + DryFrass 1 0.00000 0.00000 -8215.4
## + Mass 1 0.00000 0.00000 -8215.4
## - LogIntake 1 0.00000 0.00000 -8209.5
## - LogCassim 1 0.00000 0.00000 -8205.8
## - LogNassim 1 0.94437 0.94437 -1386.4
## Warning in log(Nassim): NaNs produced
##
## Step: AIC=-8219.27
## log(Nassim) ~ Instar + ActiveFeeding + Fgp + Mgp + LogMass +
## Intake + LogIntake + LogWetFrass + Cassim + LogCassim + Nfrass +
## LogNfrass + LogNassim
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Warning in log(Nassim): NaNs produced
## Df Sum of Sq RSS AIC
## - Fgp 1 0.00000 0.00000 -8221.1
## - ActiveFeeding 1 0.00000 0.00000 -8220.8
## - Mgp 1 0.00000 0.00000 -8220.8
## - LogMass 1 0.00000 0.00000 -8220.7
## - LogWetFrass 1 0.00000 0.00000 -8220.4
## - Nfrass 1 0.00000 0.00000 -8220.1
## - Intake 1 0.00000 0.00000 -8220.0
## - Instar 1 0.00000 0.00000 -8219.8
## - Cassim 1 0.00000 0.00000 -8219.7
## <none> 0.00000 -8219.3
## - LogNfrass 1 0.00000 0.00000 -8218.6
## + LogDryFrass 1 0.00000 0.00000 -8217.4
## + WetFrass 1 0.00000 0.00000 -8217.4
## + Mass 1 0.00000 0.00000 -8217.3
## + DryFrass 1 0.00000 0.00000 -8217.3
## - LogIntake 1 0.00000 0.00000 -8211.5
## - LogCassim 1 0.00000 0.00000 -8207.4
## - LogNassim 1 0.97395 0.97395 -1380.6
## Warning in log(Nassim): NaNs produced
##
## Step: AIC=-8221.07
## log(Nassim) ~ Instar + ActiveFeeding + Mgp + LogMass + Intake +
## LogIntake + LogWetFrass + Cassim + LogCassim + Nfrass + LogNfrass +
## LogNassim
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Warning in log(Nassim): NaNs produced
## Df Sum of Sq RSS AIC
## - LogMass 1 0.00000 0.00000 -8222.7
## - ActiveFeeding 1 0.00000 0.00000 -8222.6
## - LogWetFrass 1 0.00000 0.00000 -8222.3
## - Nfrass 1 0.00000 0.00000 -8222.0
## - Intake 1 0.00000 0.00000 -8221.9
## - Cassim 1 0.00000 0.00000 -8221.6
## - Mgp 1 0.00000 0.00000 -8221.6
## <none> 0.00000 -8221.1
## - Instar 1 0.00000 0.00000 -8220.9
## - LogNfrass 1 0.00000 0.00000 -8220.1
## + Fgp 1 0.00000 0.00000 -8219.3
## + LogDryFrass 1 0.00000 0.00000 -8219.2
## + WetFrass 1 0.00000 0.00000 -8219.2
## + Mass 1 0.00000 0.00000 -8219.1
## + DryFrass 1 0.00000 0.00000 -8219.1
## - LogIntake 1 0.00000 0.00000 -8212.3
## - LogCassim 1 0.00000 0.00000 -8208.7
## - LogNassim 1 0.97399 0.97399 -1382.6
## Warning in log(Nassim): NaNs produced
##
## Step: AIC=-8222.72
## log(Nassim) ~ Instar + ActiveFeeding + Mgp + Intake + LogIntake +
## LogWetFrass + Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Warning in log(Nassim): NaNs produced
## Df Sum of Sq RSS AIC
## - ActiveFeeding 1 0.000 0.000 -8224.5
## - LogWetFrass 1 0.000 0.000 -8224.1
## - Nfrass 1 0.000 0.000 -8223.3
## - Mgp 1 0.000 0.000 -8223.1
## - Intake 1 0.000 0.000 -8223.0
## <none> 0.000 -8222.7
## - Cassim 1 0.000 0.000 -8222.7
## - LogNfrass 1 0.000 0.000 -8221.5
## + LogMass 1 0.000 0.000 -8221.1
## + WetFrass 1 0.000 0.000 -8220.8
## + LogDryFrass 1 0.000 0.000 -8220.8
## + DryFrass 1 0.000 0.000 -8220.8
## + Fgp 1 0.000 0.000 -8220.7
## + Mass 1 0.000 0.000 -8220.7
## - Instar 1 0.000 0.000 -8220.0
## - LogIntake 1 0.000 0.000 -8211.7
## - LogCassim 1 0.000 0.000 -8209.1
## - LogNassim 1 0.976 0.976 -1384.1
## Warning in log(Nassim): NaNs produced
##
## Step: AIC=-8224.55
## log(Nassim) ~ Instar + Mgp + Intake + LogIntake + LogWetFrass +
## Cassim + LogCassim + Nfrass + LogNfrass + LogNassim
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Warning in log(Nassim): NaNs produced
## Df Sum of Sq RSS AIC
## - LogWetFrass 1 0.00000 0.00000 -8226.0
## - Nfrass 1 0.00000 0.00000 -8225.3
## - Intake 1 0.00000 0.00000 -8224.9
## - Mgp 1 0.00000 0.00000 -8224.9
## - Cassim 1 0.00000 0.00000 -8224.6
## <none> 0.00000 -8224.5
## - LogNfrass 1 0.00000 0.00000 -8223.3
## + ActiveFeeding 1 0.00000 0.00000 -8222.7
## + LogMass 1 0.00000 0.00000 -8222.6
## + LogDryFrass 1 0.00000 0.00000 -8222.6
## + DryFrass 1 0.00000 0.00000 -8222.6
## + WetFrass 1 0.00000 0.00000 -8222.6
## + Mass 1 0.00000 0.00000 -8222.6
## + Fgp 1 0.00000 0.00000 -8222.6
## - Instar 1 0.00000 0.00000 -8221.9
## - LogIntake 1 0.00000 0.00000 -8213.2
## - LogCassim 1 0.00000 0.00000 -8210.6
## - LogNassim 1 0.98129 0.98129 -1384.7
## Warning in log(Nassim): NaNs produced
##
## Step: AIC=-8225.97
## log(Nassim) ~ Instar + Mgp + Intake + LogIntake + Cassim + LogCassim +
## Nfrass + LogNfrass + LogNassim
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Warning in log(Nassim): NaNs produced
## Df Sum of Sq RSS AIC
## - Nfrass 1 0.00000 0.00000 -8226.9
## - Intake 1 0.00000 0.00000 -8226.5
## - Mgp 1 0.00000 0.00000 -8226.4
## - Cassim 1 0.00000 0.00000 -8226.1
## <none> 0.00000 -8226.0
## + LogWetFrass 1 0.00000 0.00000 -8224.5
## + WetFrass 1 0.00000 0.00000 -8224.2
## + ActiveFeeding 1 0.00000 0.00000 -8224.1
## + DryFrass 1 0.00000 0.00000 -8224.1
## + Mass 1 0.00000 0.00000 -8224.1
## + LogMass 1 0.00000 0.00000 -8224.0
## + Fgp 1 0.00000 0.00000 -8224.0
## + LogDryFrass 1 0.00000 0.00000 -8224.0
## - Instar 1 0.00000 0.00000 -8223.9
## - LogIntake 1 0.00000 0.00000 -8215.1
## - LogCassim 1 0.00000 0.00000 -8212.6
## - LogNfrass 1 0.00000 0.00000 -8212.3
## - LogNassim 1 0.99426 0.99426 -1383.4
## Warning in log(Nassim): NaNs produced
##
## Step: AIC=-8226.95
## log(Nassim) ~ Instar + Mgp + Intake + LogIntake + Cassim + LogCassim +
## LogNfrass + LogNassim
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Warning in log(Nassim): NaNs produced
## Df Sum of Sq RSS AIC
## - Intake 1 0.000 0.000 -8228.4
## - Mgp 1 0.000 0.000 -8227.8
## - Cassim 1 0.000 0.000 -8227.8
## <none> 0.000 -8226.9
## + Nfrass 1 0.000 0.000 -8226.0
## + LogWetFrass 1 0.000 0.000 -8225.3
## - Instar 1 0.000 0.000 -8225.3
## + LogMass 1 0.000 0.000 -8225.3
## + WetFrass 1 0.000 0.000 -8225.3
## + Fgp 1 0.000 0.000 -8225.2
## + DryFrass 1 0.000 0.000 -8225.2
## + Mass 1 0.000 0.000 -8225.1
## + LogDryFrass 1 0.000 0.000 -8225.0
## + ActiveFeeding 1 0.000 0.000 -8225.0
## - LogIntake 1 0.000 0.000 -8215.8
## - LogNfrass 1 0.000 0.000 -8214.2
## - LogCassim 1 0.000 0.000 -8213.7
## - LogNassim 1 1.643 1.643 -1258.3
## Warning in log(Nassim): NaNs produced
##
## Step: AIC=-8228.38
## log(Nassim) ~ Instar + Mgp + LogIntake + Cassim + LogCassim +
## LogNfrass + LogNassim
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Warning in log(Nassim): NaNs produced
## Df Sum of Sq RSS AIC
## - Mgp 1 0.0000 0.0000 -8229.0
## <none> 0.0000 -8228.4
## - Cassim 1 0.0000 0.0000 -8227.0
## + Intake 1 0.0000 0.0000 -8226.9
## + LogMass 1 0.0000 0.0000 -8226.8
## + LogWetFrass 1 0.0000 0.0000 -8226.7
## + Fgp 1 0.0000 0.0000 -8226.7
## + DryFrass 1 0.0000 0.0000 -8226.6
## + Mass 1 0.0000 0.0000 -8226.5
## + Nfrass 1 0.0000 0.0000 -8226.5
## + LogDryFrass 1 0.0000 0.0000 -8226.5
## + ActiveFeeding 1 0.0000 0.0000 -8226.4
## + WetFrass 1 0.0000 0.0000 -8226.4
## - Instar 1 0.0000 0.0000 -8225.8
## - LogNfrass 1 0.0000 0.0000 -8215.5
## - LogCassim 1 0.0000 0.0000 -8215.2
## - LogIntake 1 0.0000 0.0000 -8215.0
## - LogNassim 1 2.3981 2.3981 -1164.7
## Warning in log(Nassim): NaNs produced
##
## Step: AIC=-8229
## log(Nassim) ~ Instar + LogIntake + Cassim + LogCassim + LogNfrass +
## LogNassim
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Warning: attempting model selection on an essentially perfect fit is nonsense
## Warning in log(Nassim): NaNs produced
## Df Sum of Sq RSS AIC
## <none> 0.0000 -8229.0
## + Mgp 1 0.0000 0.0000 -8228.4
## + Intake 1 0.0000 0.0000 -8227.8
## - Cassim 1 0.0000 0.0000 -8227.6
## - Instar 1 0.0000 0.0000 -8227.5
## + LogMass 1 0.0000 0.0000 -8227.5
## + DryFrass 1 0.0000 0.0000 -8227.5
## + LogWetFrass 1 0.0000 0.0000 -8227.3
## + ActiveFeeding 1 0.0000 0.0000 -8227.1
## + Mass 1 0.0000 0.0000 -8227.0
## + WetFrass 1 0.0000 0.0000 -8227.0
## + Nfrass 1 0.0000 0.0000 -8227.0
## + LogDryFrass 1 0.0000 0.0000 -8227.0
## + Fgp 1 0.0000 0.0000 -8227.0
## - LogNfrass 1 0.0000 0.0000 -8217.5
## - LogIntake 1 0.0000 0.0000 -8216.8
## - LogCassim 1 0.0000 0.0000 -8216.1
## - LogNassim 1 2.4693 2.4693 -1159.3
summary(stepwise_log_method)
##
## Call:
## lm(formula = log(Nassim) ~ Instar + LogIntake + Cassim + LogCassim +
## LogNfrass + LogNassim, data = caterpillars_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.363e-07 -2.245e-08 -2.660e-09 2.636e-08 3.020e-07
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.517e-07 2.155e-07 3.488e+00 0.000576 ***
## Instar -2.714e-08 1.465e-08 -1.852e+00 0.065170 .
## LogIntake -5.023e-07 1.332e-07 -3.771e+00 0.000203 ***
## Cassim -1.852e-07 1.017e-07 -1.820e+00 0.069901 .
## LogCassim 6.949e-07 1.796e-07 3.870e+00 0.000140 ***
## LogNfrass 8.287e-08 2.259e-08 3.669e+00 0.000298 ***
## LogNassim 2.303e+00 1.251e-07 1.841e+07 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.535e-08 on 246 degrees of freedom
## (14 observations deleted due to missingness)
## Multiple R-squared: 1, Adjusted R-squared: 1
## F-statistic: 7.792e+15 on 6 and 246 DF, p-value: < 2.2e-16
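The repeated "NaNs produced" warning arises because log() returns NaN for negative input. lm() then drops those rows, which is presumably the source of at least some of the 14 deleted observations (rows with missing predictor values would be dropped the same way). The sketch below, on hypothetical data, shows how such rows might be counted and removed explicitly before fitting.

```r
# Hypothetical data with two negative response values.
toy <- data.frame(resp = c(0.5, 1.2, -0.1, 2.0, -0.3, 0.8),
                  x    = c(1, 2, 3, 4, 5, 6))

# Count rows for which log(resp) is undefined.
n_bad <- sum(toy$resp <= 0, na.rm = TRUE)

# Fitting directly triggers the NaN warning and silently drops those rows.
fit <- suppressWarnings(lm(log(resp) ~ x, data = toy))
nobs(fit)

# Subsetting first makes the exclusion explicit and avoids the warning.
toy_pos <- subset(toy, resp > 0)
fit_pos <- lm(log(resp) ~ x, data = toy_pos)
```

Both fits use the same rows, so the coefficients agree; the second version simply documents which observations were excluded.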
In this section, the models obtained from the log-transformed response variable are compared using AIC and adjusted R-squared values, similar to the previous comparison.
# Comparing AIC and Adjusted R-squared for each log-transformed model
AIC(forward_log_method, backward_log_method, stepwise_log_method)
## df AIC
## forward_log_method 2 797.1759
## backward_log_method 8 -7509.0202
## stepwise_log_method 8 -7509.0202
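The AIC reported here for the backward and stepwise models (-7509.02) differs from the value in the final step() trace (-8229.0), even though it is the same model. For lm fits, step() relies on extractAIC(), which omits the constant n(1 + log(2*pi)) and does not count the error variance as a parameter, whereas AIC() uses the full log-likelihood. With the 253 observations used in the fit, the offset is about 719.98, which accounts for the gap. A minimal check of this relationship, using the built-in mtcars data purely for illustration:

```r
# Illustration on the built-in mtcars data (not the Caterpillars data).
fit <- lm(mpg ~ wt + hp, data = mtcars)
n <- nobs(fit)

step_scale <- extractAIC(fit)[2]  # scale used in the step() trace
full_scale <- AIC(fit)            # scale used by AIC()

# Constant dropped by extractAIC(), plus 2 for the error-variance parameter.
offset <- n * (1 + log(2 * pi)) + 2
all.equal(step_scale + offset, full_scale)
```

Because the offset depends only on n, the two scales rank models identically as long as all models are fitted to the same observations.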
summary(forward_log_method)$adj.r.squared
## [1] 0
summary(backward_log_method)$adj.r.squared
## [1] 1
summary(stepwise_log_method)$adj.r.squared
## [1] 1
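An adjusted R-squared of 0 together with df = 2 indicates that forward_log_method is still the intercept-only model. One common cause, assumed here because the forward call is not shown in this part of the output, is calling step() on the null model without a scope argument, in which case there are no candidate terms to add. A sketch on the built-in mtcars data, with hypothetical object names:

```r
# Illustration on the built-in mtcars data (not the Caterpillars data).
null_fit <- lm(mpg ~ 1, data = mtcars)
full_fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Without scope, forward selection has no candidate terms to add.
no_scope <- step(null_fit, direction = "forward", trace = 0)

# With scope, terms from the full model's formula can be added one at a time.
with_scope <- step(null_fit, direction = "forward",
                   scope = list(lower = formula(null_fit),
                                upper = formula(full_fit)),
                   trace = 0)

length(coef(no_scope))    # intercept only
length(coef(with_scope))  # intercept plus the selected predictors
```

Setting trace = 0 also suppresses the step-by-step tables, which may be preferable when only the final model is of interest.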
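In this analysis, we used multiple linear regression to explore how different factors relate to the response variable, Nassim, in the Caterpillars dataset, applying forward, backward, and stepwise selection and comparing the resulting models by AIC and adjusted R-squared. For the natural-log-transformed response, backward and stepwise selection arrived at the same six-predictor model (Instar, LogIntake, Cassim, LogCassim, LogNfrass, and LogNassim), with an AIC of -7509.02 and an adjusted R-squared of 1, while the forward model remained intercept-only (df = 2, AIC of 797.18, adjusted R-squared of 0). The perfect fit should not be read as a genuine improvement: LogNassim is a log transform of the response itself (its coefficient, 2.303, is approximately ln 10), so log(Nassim) is reproduced almost exactly from that single predictor, and R warns that model selection on an essentially perfect fit is not meaningful. A fair comparison of the selection methods on the log scale would require excluding LogNassim from the candidate predictors.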