With our newfound understanding of the relationship between honey produced per colony and the number of colonies, we are no closer to answering our ultimate question: do we really rely on the bees?
There are three ways we can perform the regression analysis: [1] the full model, [2] the backward elimination model, and [3] the forward selection model.
We need to load the data set that we will use for this analysis first (of course), which came from our main analysis file Honey, I Shrunk the Bees and exported to GitHub:
HoneyC_FullModel <- read.csv("https://raw.githubusercontent.com/chrisgmartin/HoneyIShrunkTheBees/master/Appendix/HoneyC_FullModel.csv", header = TRUE)
Another data set we will need comes from the City University of New York’s Masters of Data Analytics Program (CUNY MSDA Program) course IS 606 - Statistics and Probability for Data Analytics. These custom files assist in the linear regression as it includes a very hand function:
load(url("https://github.com/chrisgmartin/HoneyIShrunkTheBees/raw/master/Additional/nc.RData"))
Simply enough, each .CSV was imported into a schema (titled HISP) using the MySQL data import wizard.
The full model uses every available explanatory variable to attempt to predict the variable at hand. As our state dataset includes an intimidating 108 different variables, using the full model is an incredibly messy task. For ease, we’ll switch to the county dataset which includes “only” 21 DataItem’s, remove all incomplete cases, and look at total production measured in lb’s (rather than production per colony. Note that this case requires us to force null values to 0 despite my previous warnings not to do so, after removing cases with incomplete results (N/A’s or Null Values) for Honey Produced because there are simply not enough data points without null values in one of the many columns (there are zero, in fact):
#return only complete cases
HoneyC_FullModel <- HoneyC_FullModel[complete.cases(HoneyC_FullModel[,13]),]
HoneyC_FullModel[is.na(HoneyC_FullModel)] <- 0
m_full <- lm(HoneyProd ~ ChemExp + ChemOps + CropOps + CropSales + CropOrgOps + CropOrgSales + FertExp + FertOps + HoneyOpsProd + HoneyOpsSales + HoneySales + HoneyColOps + HoneyColSales + HortExcTVSTOps + HortExcTVSTSales + HortExcTAcres + HortExcTIrgAcres + HortExcTIrgOps + HortUndProt, data = HoneyC_FullModel)
summary(m_full)
##
## Call:
## lm(formula = HoneyProd ~ ChemExp + ChemOps + CropOps + CropSales +
## CropOrgOps + CropOrgSales + FertExp + FertOps + HoneyOpsProd +
## HoneyOpsSales + HoneySales + HoneyColOps + HoneyColSales +
## HortExcTVSTOps + HortExcTVSTSales + HortExcTAcres + HortExcTIrgAcres +
## HortExcTIrgOps + HortUndProt, data = HoneyC_FullModel)
##
## Residuals:
## Min 1Q Median 3Q Max
## -718898 -49324 -27017 2728 2106250
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.921e+04 2.069e+04 -0.928 0.35335
## ChemExp 2.938e-03 3.895e-04 7.542 5.22e-14 ***
## ChemOps 2.137e+02 2.406e+01 8.881 < 2e-16 ***
## CropOps 4.327e+01 1.724e+01 2.510 0.01209 *
## CropSales 3.304e+03 1.364e+03 2.422 0.01545 *
## CropOrgOps -1.049e+02 2.520e+02 -0.417 0.67705
## CropOrgSales 1.311e-02 1.852e-03 7.082 1.56e-12 ***
## FertExp -1.826e-03 3.947e-04 -4.627 3.78e-06 ***
## FertOps -2.023e+02 1.984e+01 -10.193 < 2e-16 ***
## HoneyOpsProd 3.263e+03 3.789e+02 8.610 < 2e-16 ***
## HoneyOpsSales -7.302e+03 5.190e+02 -14.070 < 2e-16 ***
## HoneySales 4.914e-01 8.573e-03 57.325 < 2e-16 ***
## HoneyColOps -7.638e+02 2.500e+03 -0.306 0.75999
## HoneyColSales 2.603e+02 1.877e+01 13.869 < 2e-16 ***
## HortExcTVSTOps -2.331e+02 7.446e+01 -3.130 0.00175 **
## HortExcTVSTSales 2.162e-04 1.111e-04 1.945 0.05181 .
## HortExcTAcres -4.309e+02 2.002e+02 -2.153 0.03135 *
## HortExcTIrgAcres -1.714e+01 6.103e+00 -2.809 0.00498 **
## HortExcTIrgOps 3.256e+01 8.003e+00 4.068 4.79e-05 ***
## HortUndProt 1.635e+03 3.627e+02 4.507 6.69e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 146100 on 7078 degrees of freedom
## Multiple R-squared: 0.4337, Adjusted R-squared: 0.4322
## F-statistic: 285.3 on 19 and 7078 DF, p-value: < 2.2e-16
This full model can give us a model that is simply too cumbersome for casual output, but the main takeaway can be our Adjusted R-squared of 0.4322. This value we can use to compare how effective the model is likely to be against our other potential models: backward elimination and forward selection.
With backward elimination, we start with the full model and move backward by eliminating those variables with the lowest Adjusted R-squared until we get to our maximized Adjusted R-squared model. From the full model, we saw that explantory variable with a high Pr(>|t|) is CropOrgOps: Crop Organizations with Operations. We’ll start by eliminating that from our model first:
m_backward <- lm(HoneyProd ~ ChemExp + ChemOps + CropOps + CropSales + CropOrgSales + FertExp + FertOps + HoneyOpsProd + HoneyOpsSales + HoneySales + HoneyColOps + HoneyColSales + HortExcTVSTOps + HortExcTVSTSales + HortExcTAcres + HortExcTIrgAcres + HortExcTIrgOps + HortUndProt, data = HoneyC_FullModel)
summary(m_backward)
##
## Call:
## lm(formula = HoneyProd ~ ChemExp + ChemOps + CropOps + CropSales +
## CropOrgSales + FertExp + FertOps + HoneyOpsProd + HoneyOpsSales +
## HoneySales + HoneyColOps + HoneyColSales + HortExcTVSTOps +
## HortExcTVSTSales + HortExcTAcres + HortExcTIrgAcres + HortExcTIrgOps +
## HortUndProt, data = HoneyC_FullModel)
##
## Residuals:
## Min 1Q Median 3Q Max
## -713343 -49350 -26918 2774 2105027
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.903e+04 2.069e+04 -0.920 0.357707
## ChemExp 2.939e-03 3.895e-04 7.546 5.06e-14 ***
## ChemOps 2.149e+02 2.390e+01 8.991 < 2e-16 ***
## CropOps 4.199e+01 1.696e+01 2.476 0.013324 *
## CropSales 3.299e+03 1.364e+03 2.418 0.015610 *
## CropOrgSales 1.270e-02 1.572e-03 8.083 7.40e-16 ***
## FertExp -1.820e-03 3.944e-04 -4.614 4.01e-06 ***
## FertOps -2.026e+02 1.983e+01 -10.216 < 2e-16 ***
## HoneyOpsProd 3.241e+03 3.754e+02 8.634 < 2e-16 ***
## HoneyOpsSales -7.247e+03 5.015e+02 -14.450 < 2e-16 ***
## HoneySales 4.914e-01 8.571e-03 57.328 < 2e-16 ***
## HoneyColOps -8.194e+02 2.496e+03 -0.328 0.742731
## HoneyColSales 2.601e+02 1.876e+01 13.864 < 2e-16 ***
## HortExcTVSTOps -2.411e+02 7.195e+01 -3.350 0.000812 ***
## HortExcTVSTSales 2.185e-04 1.110e-04 1.969 0.048987 *
## HortExcTAcres -4.150e+02 1.964e+02 -2.112 0.034683 *
## HortExcTIrgAcres -1.719e+01 6.101e+00 -2.817 0.004858 **
## HortExcTIrgOps 3.260e+01 8.002e+00 4.074 4.67e-05 ***
## HortUndProt 1.625e+03 3.620e+02 4.490 7.24e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 146100 on 7079 degrees of freedom
## Multiple R-squared: 0.4337, Adjusted R-squared: 0.4322
## F-statistic: 301.2 on 18 and 7079 DF, p-value: < 2.2e-16
There was no change to the Adjusted R-squared so we can move foward without just as simply as we can with it. Let’s remove another one, HoneyColOps: the number of Honey Colonies with Operations.
m_backward <- lm(HoneyProd ~ ChemExp + ChemOps + CropOps + CropSales + CropOrgSales + FertExp + FertOps + HoneyOpsProd + HoneyOpsSales + HoneySales + HoneyColSales + HortExcTVSTOps + HortExcTVSTSales + HortExcTAcres + HortExcTIrgAcres + HortExcTIrgOps + HortUndProt, data = HoneyC_FullModel)
summary(m_backward)
##
## Call:
## lm(formula = HoneyProd ~ ChemExp + ChemOps + CropOps + CropSales +
## CropOrgSales + FertExp + FertOps + HoneyOpsProd + HoneyOpsSales +
## HoneySales + HoneyColSales + HortExcTVSTOps + HortExcTVSTSales +
## HortExcTAcres + HortExcTIrgAcres + HortExcTIrgOps + HortUndProt,
## data = HoneyC_FullModel)
##
## Residuals:
## Min 1Q Median 3Q Max
## -712524 -49322 -26912 2842 2104209
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.893e+04 2.068e+04 -0.915 0.360125
## ChemExp 2.940e-03 3.894e-04 7.550 4.92e-14 ***
## ChemOps 2.154e+02 2.385e+01 9.030 < 2e-16 ***
## CropOps 4.217e+01 1.695e+01 2.488 0.012886 *
## CropSales 3.293e+03 1.364e+03 2.415 0.015766 *
## CropOrgSales 1.271e-02 1.572e-03 8.087 7.12e-16 ***
## FertExp -1.821e-03 3.943e-04 -4.619 3.92e-06 ***
## FertOps -2.032e+02 1.973e+01 -10.296 < 2e-16 ***
## HoneyOpsProd 3.206e+03 3.597e+02 8.914 < 2e-16 ***
## HoneyOpsSales -7.200e+03 4.809e+02 -14.971 < 2e-16 ***
## HoneySales 4.913e-01 8.570e-03 57.334 < 2e-16 ***
## HoneyColSales 2.580e+02 1.761e+01 14.654 < 2e-16 ***
## HortExcTVSTOps -2.420e+02 7.189e+01 -3.367 0.000764 ***
## HortExcTVSTSales 2.195e-04 1.109e-04 1.978 0.047959 *
## HortExcTAcres -4.177e+02 1.963e+02 -2.128 0.033337 *
## HortExcTIrgAcres -1.723e+01 6.100e+00 -2.824 0.004759 **
## HortExcTIrgOps 3.262e+01 8.001e+00 4.076 4.62e-05 ***
## HortUndProt 1.632e+03 3.614e+02 4.516 6.40e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 146100 on 7080 degrees of freedom
## Multiple R-squared: 0.4337, Adjusted R-squared: 0.4323
## F-statistic: 318.9 on 17 and 7080 DF, p-value: < 2.2e-16
Our Adjusted R-squared increased ever so slightly, but increased none-the-less so we’ll keep it out. Let’s remove another one, HortExcTVSTSales: Horticulture Sales in $’s with the exception of Trees, Vegetable Seeds, and Transplants:
m_backward <- lm(HoneyProd ~ ChemExp + ChemOps + CropOps + CropSales + CropOrgSales + FertExp + FertOps + HoneyOpsProd + HoneyOpsSales + HoneySales + HoneyColSales + HortExcTVSTOps + HortExcTAcres + HortExcTIrgAcres + HortExcTIrgOps + HortUndProt, data = HoneyC_FullModel)
summary(m_backward)
##
## Call:
## lm(formula = HoneyProd ~ ChemExp + ChemOps + CropOps + CropSales +
## CropOrgSales + FertExp + FertOps + HoneyOpsProd + HoneyOpsSales +
## HoneySales + HoneyColSales + HortExcTVSTOps + HortExcTAcres +
## HortExcTIrgAcres + HortExcTIrgOps + HortUndProt, data = HoneyC_FullModel)
##
## Residuals:
## Min 1Q Median 3Q Max
## -705872 -49287 -26985 2922 2110231
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.037e+04 2.068e+04 -0.985 0.32460
## ChemExp 3.016e-03 3.876e-04 7.780 8.29e-15 ***
## ChemOps 2.132e+02 2.383e+01 8.945 < 2e-16 ***
## CropOps 4.225e+01 1.696e+01 2.492 0.01273 *
## CropSales 3.396e+03 1.363e+03 2.492 0.01274 *
## CropOrgSales 1.351e-02 1.519e-03 8.895 < 2e-16 ***
## FertExp -1.825e-03 3.944e-04 -4.627 3.77e-06 ***
## FertOps -2.021e+02 1.973e+01 -10.241 < 2e-16 ***
## HoneyOpsProd 3.109e+03 3.564e+02 8.723 < 2e-16 ***
## HoneyOpsSales -7.170e+03 4.808e+02 -14.914 < 2e-16 ***
## HoneySales 4.910e-01 8.570e-03 57.295 < 2e-16 ***
## HoneyColSales 2.593e+02 1.760e+01 14.737 < 2e-16 ***
## HortExcTVSTOps -1.516e+02 5.550e+01 -2.732 0.00631 **
## HortExcTAcres -4.934e+02 1.925e+02 -2.563 0.01041 *
## HortExcTIrgAcres -1.761e+01 6.098e+00 -2.888 0.00389 **
## HortExcTIrgOps 3.369e+01 7.984e+00 4.220 2.47e-05 ***
## HortUndProt 1.769e+03 3.548e+02 4.986 6.30e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 146200 on 7081 degrees of freedom
## Multiple R-squared: 0.4333, Adjusted R-squared: 0.4321
## F-statistic: 338.4 on 16 and 7081 DF, p-value: < 2.2e-16
A slight decrease, so we’ll keep that in our model and try another: HortExcTAcres.
m_backward <- lm(HoneyProd ~ ChemExp + ChemOps + CropOps + CropSales + CropOrgSales + FertExp + FertOps + HoneyOpsProd + HoneyOpsSales + HoneySales + HoneyColSales + HortExcTVSTOps + HortExcTVSTSales + HortExcTIrgAcres + HortExcTIrgOps + HortUndProt, data = HoneyC_FullModel)
summary(m_backward)
##
## Call:
## lm(formula = HoneyProd ~ ChemExp + ChemOps + CropOps + CropSales +
## CropOrgSales + FertExp + FertOps + HoneyOpsProd + HoneyOpsSales +
## HoneySales + HoneyColSales + HortExcTVSTOps + HortExcTVSTSales +
## HortExcTIrgAcres + HortExcTIrgOps + HortUndProt, data = HoneyC_FullModel)
##
## Residuals:
## Min 1Q Median 3Q Max
## -722316 -49272 -26908 2873 2104701
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.840e+04 2.069e+04 -0.889 0.3739
## ChemExp 2.905e-03 3.892e-04 7.464 9.37e-14 ***
## ChemOps 2.164e+02 2.385e+01 9.073 < 2e-16 ***
## CropOps 4.172e+01 1.696e+01 2.461 0.0139 *
## CropSales 3.228e+03 1.364e+03 2.367 0.0180 *
## CropOrgSales 1.279e-02 1.572e-03 8.135 4.81e-16 ***
## FertExp -1.763e-03 3.935e-04 -4.480 7.56e-06 ***
## FertOps -2.049e+02 1.972e+01 -10.390 < 2e-16 ***
## HoneyOpsProd 3.185e+03 3.596e+02 8.856 < 2e-16 ***
## HoneyOpsSales -7.059e+03 4.765e+02 -14.816 < 2e-16 ***
## HoneySales 4.913e-01 8.572e-03 57.323 < 2e-16 ***
## HoneyColSales 2.599e+02 1.759e+01 14.778 < 2e-16 ***
## HortExcTVSTOps -2.768e+02 7.003e+01 -3.952 7.82e-05 ***
## HortExcTVSTSales 2.655e-04 1.088e-04 2.439 0.0147 *
## HortExcTIrgAcres -2.235e+01 5.605e+00 -3.988 6.73e-05 ***
## HortExcTIrgOps 3.907e+01 7.406e+00 5.275 1.37e-07 ***
## HortUndProt 9.545e+02 1.710e+02 5.581 2.49e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 146200 on 7081 degrees of freedom
## Multiple R-squared: 0.4333, Adjusted R-squared: 0.432
## F-statistic: 338.4 on 16 and 7081 DF, p-value: < 2.2e-16
Another slight drop. Another try: CropSales
m_backward <- lm(HoneyProd ~ ChemExp + ChemOps + CropOps + CropOrgSales + FertExp + FertOps + HoneyOpsProd + HoneyOpsSales + HoneySales + HoneyColSales + HortExcTVSTOps + HortExcTVSTSales + HortExcTAcres + HortExcTIrgAcres + HortExcTIrgOps + HortUndProt, data = HoneyC_FullModel)
summary(m_backward)
##
## Call:
## lm(formula = HoneyProd ~ ChemExp + ChemOps + CropOps + CropOrgSales +
## FertExp + FertOps + HoneyOpsProd + HoneyOpsSales + HoneySales +
## HoneyColSales + HortExcTVSTOps + HortExcTVSTSales + HortExcTAcres +
## HortExcTIrgAcres + HortExcTIrgOps + HortUndProt, data = HoneyC_FullModel)
##
## Residuals:
## Min 1Q Median 3Q Max
## -721823 -48564 -27341 2500 2089082
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.050e+04 2.992e+03 10.193 < 2e-16 ***
## ChemExp 2.730e-03 3.798e-04 7.190 7.16e-13 ***
## ChemOps 2.203e+02 2.377e+01 9.268 < 2e-16 ***
## CropOps 5.054e+01 1.660e+01 3.044 0.00234 **
## CropOrgSales 1.253e-02 1.571e-03 7.981 1.68e-15 ***
## FertExp -1.429e-03 3.595e-04 -3.977 7.06e-05 ***
## FertOps -2.107e+02 1.950e+01 -10.806 < 2e-16 ***
## HoneyOpsProd 3.187e+03 3.597e+02 8.860 < 2e-16 ***
## HoneyOpsSales -7.213e+03 4.811e+02 -14.993 < 2e-16 ***
## HoneySales 4.913e-01 8.573e-03 57.316 < 2e-16 ***
## HoneyColSales 2.591e+02 1.761e+01 14.720 < 2e-16 ***
## HortExcTVSTOps -2.365e+02 7.188e+01 -3.291 0.00100 **
## HortExcTVSTSales 2.297e-04 1.109e-04 2.071 0.03839 *
## HortExcTAcres -4.070e+02 1.963e+02 -2.074 0.03815 *
## HortExcTIrgAcres -1.684e+01 6.100e+00 -2.761 0.00578 **
## HortExcTIrgOps 3.296e+01 8.002e+00 4.118 3.86e-05 ***
## HortUndProt 1.604e+03 3.613e+02 4.438 9.20e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 146200 on 7081 degrees of freedom
## Multiple R-squared: 0.4332, Adjusted R-squared: 0.4319
## F-statistic: 338.2 on 16 and 7081 DF, p-value: < 2.2e-16
Another drop. In fact, after removing each of these variables once, there was no increase in the Adjusted R-squared so the backward model looks strikingly similar to the full model, except that we’ve ruled out any valuable impact to our model by eliminating the number of Crop Organizations with Operations and the number of Honey Colonies with Operations:
m_backward <- lm(HoneyProd ~ ChemExp + ChemOps + CropOps + CropSales + CropOrgSales + FertExp + FertOps + HoneyOpsProd + HoneyOpsSales + HoneySales + HoneyColSales + HortExcTVSTOps + HortExcTVSTSales + HortExcTAcres + HortExcTIrgAcres + HortExcTIrgOps + HortUndProt, data = HoneyC_FullModel)
summary(m_backward)
##
## Call:
## lm(formula = HoneyProd ~ ChemExp + ChemOps + CropOps + CropSales +
## CropOrgSales + FertExp + FertOps + HoneyOpsProd + HoneyOpsSales +
## HoneySales + HoneyColSales + HortExcTVSTOps + HortExcTVSTSales +
## HortExcTAcres + HortExcTIrgAcres + HortExcTIrgOps + HortUndProt,
## data = HoneyC_FullModel)
##
## Residuals:
## Min 1Q Median 3Q Max
## -712524 -49322 -26912 2842 2104209
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.893e+04 2.068e+04 -0.915 0.360125
## ChemExp 2.940e-03 3.894e-04 7.550 4.92e-14 ***
## ChemOps 2.154e+02 2.385e+01 9.030 < 2e-16 ***
## CropOps 4.217e+01 1.695e+01 2.488 0.012886 *
## CropSales 3.293e+03 1.364e+03 2.415 0.015766 *
## CropOrgSales 1.271e-02 1.572e-03 8.087 7.12e-16 ***
## FertExp -1.821e-03 3.943e-04 -4.619 3.92e-06 ***
## FertOps -2.032e+02 1.973e+01 -10.296 < 2e-16 ***
## HoneyOpsProd 3.206e+03 3.597e+02 8.914 < 2e-16 ***
## HoneyOpsSales -7.200e+03 4.809e+02 -14.971 < 2e-16 ***
## HoneySales 4.913e-01 8.570e-03 57.334 < 2e-16 ***
## HoneyColSales 2.580e+02 1.761e+01 14.654 < 2e-16 ***
## HortExcTVSTOps -2.420e+02 7.189e+01 -3.367 0.000764 ***
## HortExcTVSTSales 2.195e-04 1.109e-04 1.978 0.047959 *
## HortExcTAcres -4.177e+02 1.963e+02 -2.128 0.033337 *
## HortExcTIrgAcres -1.723e+01 6.100e+00 -2.824 0.004759 **
## HortExcTIrgOps 3.262e+01 8.001e+00 4.076 4.62e-05 ***
## HortUndProt 1.632e+03 3.614e+02 4.516 6.40e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 146100 on 7080 degrees of freedom
## Multiple R-squared: 0.4337, Adjusted R-squared: 0.4323
## F-statistic: 318.9 on 17 and 7080 DF, p-value: < 2.2e-16
So far our new best model came from the Backward Elimination method and gave us an Adjusted R-squared of 0.4323.
With forward selection we start from the start by comparing each variable in a linear model and adding new variables if they improve our Adjusted R-squared. It is the opposite of backward elimination: we add items until our Adjusted R-squared is higher than it was in the previous model.
We start with the start, and end with the end. Here are a few examples:
m_forward <- lm(HoneyProd ~ ChemExp, data = HoneyC_FullModel)
summary(m_forward)
##
## Call:
## lm(formula = HoneyProd ~ ChemExp, data = HoneyC_FullModel)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1392592 -56941 -46062 -12873 2730983
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.712e+04 2.347e+03 20.07 <2e-16 ***
## ChemExp 6.593e-03 2.225e-04 29.63 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 183000 on 7096 degrees of freedom
## Multiple R-squared: 0.1101, Adjusted R-squared: 0.11
## F-statistic: 878 on 1 and 7096 DF, p-value: < 2.2e-16
With ChemExp included, our Adjusted R-squared is 0.11
m_forward <- lm(HoneyProd ~ ChemOps, data = HoneyC_FullModel)
summary(m_forward)
##
## Call:
## lm(formula = HoneyProd ~ ChemOps, data = HoneyC_FullModel)
##
## Residuals:
## Min 1Q Median 3Q Max
## -535655 -71954 -43096 -10021 2871231
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 22939.376 3343.792 6.86 7.45e-12 ***
## ChemOps 145.128 7.127 20.36 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 188500 on 7096 degrees of freedom
## Multiple R-squared: 0.05521, Adjusted R-squared: 0.05508
## F-statistic: 414.7 on 1 and 7096 DF, p-value: < 2.2e-16
With ChemOps included, our Adjusted R-squared is 0.05508. ChemExp is a better starting point.
m_forward <- lm(HoneyProd ~ CropOps, data = HoneyC_FullModel)
summary(m_forward)
##
## Call:
## lm(formula = HoneyProd ~ CropOps, data = HoneyC_FullModel)
##
## Residuals:
## Min 1Q Median 3Q Max
## -525947 -72728 -44649 -6749 2877363
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 24780.520 3354.171 7.388 1.66e-13 ***
## CropOps 134.118 6.861 19.547 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 188900 on 7096 degrees of freedom
## Multiple R-squared: 0.05109, Adjusted R-squared: 0.05096
## F-statistic: 382.1 on 1 and 7096 DF, p-value: < 2.2e-16
With CropOps included, our Adjusted R-squared is 0.05096 ChemExp is again a better starting point.
m_forward <- lm(HoneyProd ~ HortUndProt, data = HoneyC_FullModel)
summary(m_forward)
##
## Call:
## lm(formula = HoneyProd ~ HortUndProt, data = HoneyC_FullModel)
##
## Residuals:
## Min 1Q Median 3Q Max
## -681024 -67292 -62064 -20189 2919735
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 68913.7 2318.2 29.73 <2e-16 ***
## HortUndProt 1558.6 138.2 11.28 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 192200 on 7096 degrees of freedom
## Multiple R-squared: 0.01761, Adjusted R-squared: 0.01747
## F-statistic: 127.2 on 1 and 7096 DF, p-value: < 2.2e-16
With HoneySales included, our Adjusted R-squared is 0.3084. HoneyProd is best starting point. We can add from here with a couple more examples:
m_forward <- lm(HoneyProd ~ HortUndProt + ChemExp, data = HoneyC_FullModel)
summary(m_forward)
##
## Call:
## lm(formula = HoneyProd ~ HortUndProt + ChemExp, data = HoneyC_FullModel)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1339641 -57923 -43898 -11766 2739401
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.406e+04 2.354e+03 18.714 <2e-16 ***
## HortUndProt 1.260e+03 1.311e+02 9.611 <2e-16 ***
## ChemExp 6.426e-03 2.218e-04 28.973 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 181800 on 7095 degrees of freedom
## Multiple R-squared: 0.1215, Adjusted R-squared: 0.1213
## F-statistic: 490.8 on 2 and 7095 DF, p-value: < 2.2e-16
Adding ChemExp to our model, we get a decreased Adjusted R-squared of 0.1213. This item should not be added to our model, yet.
m_forward <- lm(HoneyProd ~ HortUndProt + HoneySales, data = HoneyC_FullModel)
summary(m_forward)
##
## Call:
## lm(formula = HoneyProd ~ HortUndProt + HoneySales, data = HoneyC_FullModel)
##
## Residuals:
## Min 1Q Median 3Q Max
## -775407 -47306 -43136 -13742 2831442
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.682e+04 1.950e+03 24.01 <2e-16 ***
## HortUndProt 1.792e+03 1.141e+02 15.71 <2e-16 ***
## HoneySales 5.010e-01 8.675e-03 57.75 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 158600 on 7095 degrees of freedom
## Multiple R-squared: 0.3318, Adjusted R-squared: 0.3316
## F-statistic: 1761 on 2 and 7095 DF, p-value: < 2.2e-16
Adding HoneySales to our model, we get an increased Adjusted R-squared of 0.3316. This item should be added. Let’s see if we can add a third explanatory variable:
m_forward <- lm(HoneyProd ~ HortUndProt + HoneySales + HoneyColSales, data = HoneyC_FullModel)
summary(m_forward)
##
## Call:
## lm(formula = HoneyProd ~ HortUndProt + HoneySales + HoneyColSales,
## data = HoneyC_FullModel)
##
## Residuals:
## Min 1Q Median 3Q Max
## -884552 -42236 -38248 -10244 2652548
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.059e+04 1.919e+03 21.15 <2e-16 ***
## HortUndProt 1.423e+03 1.123e+02 12.67 <2e-16 ***
## HoneySales 5.065e-01 8.433e-03 60.06 <2e-16 ***
## HoneyColSales 3.680e+02 1.791e+01 20.55 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 154000 on 7094 degrees of freedom
## Multiple R-squared: 0.3693, Adjusted R-squared: 0.369
## F-statistic: 1385 on 3 and 7094 DF, p-value: < 2.2e-16
Adding HoneyColSales to our model our Adjusted R-squared goes up to 0.369. It can be added and we can search for a fourth variable.
m_forward <- lm(HoneyProd ~ HortUndProt + HoneySales + HoneyColSales + ChemExp, data = HoneyC_FullModel)
summary(m_forward)
##
## Call:
## lm(formula = HoneyProd ~ HortUndProt + HoneySales + HoneyColSales +
## ChemExp, data = HoneyC_FullModel)
##
## Residuals:
## Min 1Q Median 3Q Max
## -859041 -43448 -31624 -12572 2379893
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.023e+04 1.977e+03 15.29 <2e-16 ***
## HortUndProt 1.284e+03 1.104e+02 11.64 <2e-16 ***
## HoneySales 4.662e-01 8.600e-03 54.21 <2e-16 ***
## HoneyColSales 3.358e+02 1.766e+01 19.01 <2e-16 ***
## ChemExp 3.271e-03 1.925e-04 16.99 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 151000 on 7093 degrees of freedom
## Multiple R-squared: 0.394, Adjusted R-squared: 0.3936
## F-statistic: 1153 on 4 and 7093 DF, p-value: < 2.2e-16
Adding ChemExp to our model our Adjusted R-squared goes up to 0.3936. It can be added and we can search for a fifth variable.
m_forward <- lm(HoneyProd ~ HortUndProt + HoneySales + HoneyColSales + ChemExp + HoneyOpsSales, data = HoneyC_FullModel)
summary(m_forward)
##
## Call:
## lm(formula = HoneyProd ~ HortUndProt + HoneySales + HoneyColSales +
## ChemExp + HoneyOpsSales, data = HoneyC_FullModel)
##
## Residuals:
## Min 1Q Median 3Q Max
## -834541 -46540 -34709 -2190 2358684
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.801e+04 2.086e+03 18.22 <2e-16 ***
## HortUndProt 1.200e+03 1.097e+02 10.94 <2e-16 ***
## HoneySales 4.814e-01 8.642e-03 55.70 <2e-16 ***
## HoneyColSales 3.210e+02 1.757e+01 18.27 <2e-16 ***
## ChemExp 3.499e-03 1.921e-04 18.22 <2e-16 ***
## HoneyOpsSales -3.611e+03 3.305e+02 -10.93 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 149800 on 7092 degrees of freedom
## Multiple R-squared: 0.404, Adjusted R-squared: 0.4036
## F-statistic: 961.4 on 5 and 7092 DF, p-value: < 2.2e-16
Adding HoneyOpsSales to our model our Adjusted R-squared goes up to 0.4036. It can be added and we can search for a sixth variable.
m_forward <- lm(HoneyProd ~ HortUndProt + HoneySales + HoneyColSales + ChemExp + HoneyOpsSales + CropOrgSales, data = HoneyC_FullModel)
summary(m_forward)
##
## Call:
## lm(formula = HoneyProd ~ HortUndProt + HoneySales + HoneyColSales +
## ChemExp + HoneyOpsSales + CropOrgSales, data = HoneyC_FullModel)
##
## Residuals:
## Min 1Q Median 3Q Max
## -740485 -45476 -34045 -2282 2279390
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.714e+04 2.074e+03 17.903 <2e-16 ***
## HortUndProt 1.281e+03 1.093e+02 11.719 <2e-16 ***
## HoneySales 4.887e-01 8.619e-03 56.707 <2e-16 ***
## HoneyColSales 3.011e+02 1.757e+01 17.136 <2e-16 ***
## ChemExp 2.942e-03 1.992e-04 14.768 <2e-16 ***
## HoneyOpsSales -3.358e+03 3.294e+02 -10.195 <2e-16 ***
## CropOrgSales 1.452e-02 1.494e-03 9.721 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 148800 on 7091 degrees of freedom
## Multiple R-squared: 0.4118, Adjusted R-squared: 0.4113
## F-statistic: 827.5 on 6 and 7091 DF, p-value: < 2.2e-16
Adding CropOrgSales to our model our Adjusted R-squared goes up to 0.4113. It can be added and we can search for a seventh variable.
m_forward <- lm(HoneyProd ~ HortUndProt + HoneySales + HoneyColSales + ChemExp + HoneyOpsSales + CropOrgSales + HoneyOpsProd, data = HoneyC_FullModel)
summary(m_forward)
##
## Call:
## lm(formula = HoneyProd ~ HortUndProt + HoneySales + HoneyColSales +
## ChemExp + HoneyOpsSales + CropOrgSales + HoneyOpsProd, data = HoneyC_FullModel)
##
## Residuals:
## Min 1Q Median 3Q Max
## -701111 -47757 -30479 601 2259122
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.640e+04 2.474e+03 10.671 < 2e-16 ***
## HortUndProt 1.080e+03 1.118e+02 9.659 < 2e-16 ***
## HoneySales 4.947e-01 8.615e-03 57.422 < 2e-16 ***
## HoneyColSales 2.781e+02 1.774e+01 15.675 < 2e-16 ***
## ChemExp 2.940e-03 1.984e-04 14.821 < 2e-16 ***
## HoneyOpsSales -5.818e+03 4.526e+02 -12.855 < 2e-16 ***
## CropOrgSales 1.257e-02 1.508e-03 8.337 < 2e-16 ***
## HoneyOpsProd 2.447e+03 3.102e+02 7.888 3.55e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 148200 on 7090 degrees of freedom
## Multiple R-squared: 0.4169, Adjusted R-squared: 0.4164
## F-statistic: 724.3 on 7 and 7090 DF, p-value: < 2.2e-16
Adding HoneyOpsProd to our model our Adjusted R-squared goes up to 0.4164. It can be added and we can search for an eigth variable.
m_forward <- lm(HoneyProd ~ HortUndProt + HoneySales + HoneyColSales + ChemExp + HoneyOpsSales + CropOrgSales + HoneyOpsProd + HortExcTAcres, data = HoneyC_FullModel)
summary(m_forward)
##
## Call:
## lm(formula = HoneyProd ~ HortUndProt + HoneySales + HoneyColSales +
## ChemExp + HoneyOpsSales + CropOrgSales + HoneyOpsProd + HortExcTAcres,
## data = HoneyC_FullModel)
##
## Residuals:
## Min 1Q Median 3Q Max
## -692870 -47596 -30050 477 2261648
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.794e+04 2.491e+03 11.216 < 2e-16 ***
## HortUndProt 2.483e+03 3.134e+02 7.922 2.69e-15 ***
## HoneySales 4.948e-01 8.601e-03 57.527 < 2e-16 ***
## HoneyColSales 2.739e+02 1.773e+01 15.445 < 2e-16 ***
## ChemExp 2.885e-03 1.984e-04 14.538 < 2e-16 ***
## HoneyOpsSales -6.194e+03 4.587e+02 -13.504 < 2e-16 ***
## CropOrgSales 1.229e-02 1.507e-03 8.158 4.00e-16 ***
## HoneyOpsProd 2.681e+03 3.136e+02 8.550 < 2e-16 ***
## HortExcTAcres -8.241e+02 1.720e+02 -4.791 1.69e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 147900 on 7089 degrees of freedom
## Multiple R-squared: 0.4188, Adjusted R-squared: 0.4182
## F-statistic: 638.6 on 8 and 7089 DF, p-value: < 2.2e-16
Adding HortExcTAcres to our model our Adjusted R-squared goes up to 0.4182. It can be added and we can search for a ninth variable.
m_forward <- lm(HoneyProd ~ HortUndProt + HoneySales + HoneyColSales + ChemExp + HoneyOpsSales + CropOrgSales + HoneyOpsProd + HortExcTAcres + ChemOps, data = HoneyC_FullModel)
summary(m_forward)
##
## Call:
## lm(formula = HoneyProd ~ HortUndProt + HoneySales + HoneyColSales +
## ChemExp + HoneyOpsSales + CropOrgSales + HoneyOpsProd + HortExcTAcres +
## ChemOps, data = HoneyC_FullModel)
##
## Residuals:
## Min 1Q Median 3Q Max
## -691954 -48883 -30069 1394 2202903
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.206e+04 2.827e+03 7.806 6.78e-15 ***
## HortUndProt 2.522e+03 3.132e+02 8.053 9.40e-16 ***
## HoneySales 4.950e-01 8.590e-03 57.622 < 2e-16 ***
## HoneyColSales 2.700e+02 1.773e+01 15.223 < 2e-16 ***
## ChemExp 2.424e-03 2.244e-04 10.798 < 2e-16 ***
## HoneyOpsSales -6.069e+03 4.590e+02 -13.223 < 2e-16 ***
## CropOrgSales 1.191e-02 1.507e-03 7.900 3.22e-15 ***
## HoneyOpsProd 2.260e+03 3.276e+02 6.899 5.68e-12 ***
## HortExcTAcres -8.935e+02 1.725e+02 -5.179 2.29e-07 ***
## ChemOps 3.160e+01 7.222e+00 4.375 1.23e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 147700 on 7088 degrees of freedom
## Multiple R-squared: 0.4204, Adjusted R-squared: 0.4197
## F-statistic: 571.2 on 9 and 7088 DF, p-value: < 2.2e-16
Adding ChemOps to our model our Adjusted R-squared goes up to 0.4197. It can be added and we can search for a tenth variable.
m_forward <- lm(HoneyProd ~ HortUndProt + HoneySales + HoneyColSales + ChemExp + HoneyOpsSales + CropOrgSales + HoneyOpsProd + HortExcTAcres + ChemOps + FertOps, data = HoneyC_FullModel)
summary(m_forward)
##
## Call:
## lm(formula = HoneyProd ~ HortUndProt + HoneySales + HoneyColSales +
## ChemExp + HoneyOpsSales + CropOrgSales + HoneyOpsProd + HortExcTAcres +
## ChemOps + FertOps, data = HoneyC_FullModel)
##
## Residuals:
## Min 1Q Median 3Q Max
## -676974 -50085 -27170 3748 2149204
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.926e+04 2.890e+03 10.126 < 2e-16 ***
## HortUndProt 2.394e+03 3.111e+02 7.695 1.61e-14 ***
## HoneySales 4.869e-01 8.562e-03 56.869 < 2e-16 ***
## HoneyColSales 2.652e+02 1.761e+01 15.058 < 2e-16 ***
## ChemExp 1.774e-03 2.314e-04 7.668 1.98e-14 ***
## HoneyOpsSales -7.514e+03 4.763e+02 -15.776 < 2e-16 ***
## CropOrgSales 1.288e-02 1.499e-03 8.595 < 2e-16 ***
## HoneyOpsProd 3.218e+03 3.380e+02 9.523 < 2e-16 ***
## HortExcTAcres -7.989e+02 1.715e+02 -4.659 3.24e-06 ***
## ChemOps 2.410e+02 2.138e+01 11.274 < 2e-16 ***
## FertOps -1.982e+02 1.907e+01 -10.398 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 146600 on 7087 degrees of freedom
## Multiple R-squared: 0.4291, Adjusted R-squared: 0.4283
## F-statistic: 532.7 on 10 and 7087 DF, p-value: < 2.2e-16
Adding FertOps to our model our Adjusted R-squared goes up to 0.4283. It can be added and we can search for an eleventh variable.
m_forward <- lm(HoneyProd ~ HortUndProt + HoneySales + HoneyColSales + ChemExp + HoneyOpsSales + CropOrgSales + HoneyOpsProd + HortExcTAcres + ChemOps + FertOps + FertExp, data = HoneyC_FullModel)
summary(m_forward)
##
## Call:
## lm(formula = HoneyProd ~ HortUndProt + HoneySales + HoneyColSales +
## ChemExp + HoneyOpsSales + CropOrgSales + HoneyOpsProd + HortExcTAcres +
## ChemOps + FertOps + FertExp, data = HoneyC_FullModel)
##
## Residuals:
## Min 1Q Median 3Q Max
## -745040 -49074 -27773 2052 2111088
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.171e+04 2.965e+03 10.695 < 2e-16 ***
## HortUndProt 2.426e+03 3.109e+02 7.801 7.03e-15 ***
## HoneySales 4.890e-01 8.575e-03 57.032 < 2e-16 ***
## HoneyColSales 2.621e+02 1.761e+01 14.880 < 2e-16 ***
## ChemExp 2.852e-03 3.763e-04 7.579 3.93e-14 ***
## HoneyOpsSales -7.286e+03 4.800e+02 -15.180 < 2e-16 ***
## CropOrgSales 1.291e-02 1.498e-03 8.620 < 2e-16 ***
## HoneyOpsProd 3.016e+03 3.422e+02 8.814 < 2e-16 ***
## HortExcTAcres -8.431e+02 1.718e+02 -4.909 9.37e-07 ***
## ChemOps 2.505e+02 2.152e+01 11.641 < 2e-16 ***
## FertOps -1.994e+02 1.905e+01 -10.467 < 2e-16 ***
## FertExp -1.284e-03 3.537e-04 -3.629 0.000286 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 146500 on 7086 degrees of freedom
## Multiple R-squared: 0.4302, Adjusted R-squared: 0.4293
## F-statistic: 486.3 on 11 and 7086 DF, p-value: < 2.2e-16
Adding FertExp to our model our Adjusted R-squared goes up to 0.4283. It can be added and we can search for a twelth variable.
m_forward <- lm(HoneyProd ~ HortUndProt + HoneySales + HoneyColSales + ChemExp + HoneyOpsSales + CropOrgSales + HoneyOpsProd + HortExcTAcres + ChemOps + FertOps + FertExp + HortExcTIrgOps, data = HoneyC_FullModel)
summary(m_forward)
##
## Call:
## lm(formula = HoneyProd ~ HortUndProt + HoneySales + HoneyColSales +
## ChemExp + HoneyOpsSales + CropOrgSales + HoneyOpsProd + HortExcTAcres +
## ChemOps + FertOps + FertExp + HortExcTIrgOps, data = HoneyC_FullModel)
##
## Residuals:
## Min 1Q Median 3Q Max
## -722946 -48756 -27444 2041 2118695
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.110e+04 2.969e+03 10.476 < 2e-16 ***
## HortUndProt 1.958e+03 3.397e+02 5.763 8.62e-09 ***
## HoneySales 4.900e-01 8.573e-03 57.154 < 2e-16 ***
## HoneyColSales 2.610e+02 1.760e+01 14.827 < 2e-16 ***
## ChemExp 2.726e-03 3.778e-04 7.216 5.92e-13 ***
## HoneyOpsSales -7.230e+03 4.799e+02 -15.066 < 2e-16 ***
## CropOrgSales 1.316e-02 1.498e-03 8.784 < 2e-16 ***
## HoneyOpsProd 3.020e+03 3.420e+02 8.830 < 2e-16 ***
## HortExcTAcres -7.665e+02 1.731e+02 -4.428 9.66e-06 ***
## ChemOps 2.499e+02 2.150e+01 11.620 < 2e-16 ***
## FertOps -1.994e+02 1.904e+01 -10.475 < 2e-16 ***
## FertExp -1.210e-03 3.541e-04 -3.418 0.000634 ***
## HortExcTIrgOps 1.493e+01 4.383e+00 3.406 0.000663 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 146400 on 7085 degrees of freedom
## Multiple R-squared: 0.4311, Adjusted R-squared: 0.4301
## F-statistic: 447.4 on 12 and 7085 DF, p-value: < 2.2e-16
Adding HortExcTIrgOps to our model our Adjusted R-squared goes up to 0.4301. It can be added and we can search for a thirteenth variable.
m_forward <- lm(HoneyProd ~ HortUndProt + HoneySales + HoneyColSales + ChemExp + HoneyOpsSales + CropOrgSales + HoneyOpsProd + HortExcTAcres + ChemOps + FertOps + FertExp + HortExcTIrgOps + CropSales, data = HoneyC_FullModel)
summary(m_forward)
##
## Call:
## lm(formula = HoneyProd ~ HortUndProt + HoneySales + HoneyColSales +
## ChemExp + HoneyOpsSales + CropOrgSales + HoneyOpsProd + HortExcTAcres +
## ChemOps + FertOps + FertExp + HortExcTIrgOps + CropSales,
## data = HoneyC_FullModel)
##
## Residuals:
## Min 1Q Median 3Q Max
## -689935 -49653 -26886 2755 2134137
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.325e+04 2.010e+04 -1.157 0.24745
## HortUndProt 1.985e+03 3.397e+02 5.844 5.32e-09 ***
## HoneySales 4.899e-01 8.569e-03 57.169 < 2e-16 ***
## HoneyColSales 2.593e+02 1.761e+01 14.731 < 2e-16 ***
## ChemExp 2.960e-03 3.872e-04 7.645 2.37e-14 ***
## HoneyOpsSales -7.240e+03 4.797e+02 -15.092 < 2e-16 ***
## CropOrgSales 1.318e-02 1.498e-03 8.800 < 2e-16 ***
## HoneyOpsProd 3.003e+03 3.419e+02 8.783 < 2e-16 ***
## HortExcTAcres -7.908e+02 1.732e+02 -4.565 5.08e-06 ***
## ChemOps 2.389e+02 2.187e+01 10.926 < 2e-16 ***
## FertOps -1.939e+02 1.914e+01 -10.134 < 2e-16 ***
## FertExp -1.669e-03 3.917e-04 -4.261 2.06e-05 ***
## HortExcTIrgOps 1.429e+01 4.387e+00 3.258 0.00113 **
## CropSales 3.627e+03 1.327e+03 2.734 0.00627 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 146300 on 7084 degrees of freedom
## Multiple R-squared: 0.4317, Adjusted R-squared: 0.4307
## F-statistic: 413.9 on 13 and 7084 DF, p-value: < 2.2e-16
Adding CropSales to our model our Adjusted R-squared goes up to 0.4307. It can be added and we can search for a fourthteenth variable.
m_forward <- lm(HoneyProd ~ HortUndProt + HoneySales + HoneyColSales + ChemExp + HoneyOpsSales + CropOrgSales + HoneyOpsProd + HortExcTAcres + ChemOps + FertOps + FertExp + HortExcTIrgOps + CropSales + HortExcTVSTOps, data = HoneyC_FullModel)
summary(m_forward)
##
## Call:
## lm(formula = HoneyProd ~ HortUndProt + HoneySales + HoneyColSales +
## ChemExp + HoneyOpsSales + CropOrgSales + HoneyOpsProd + HortExcTAcres +
## ChemOps + FertOps + FertExp + HortExcTIrgOps + CropSales +
## HortExcTVSTOps, data = HoneyC_FullModel)
##
## Residuals:
## Min 1Q Median 3Q Max
## -693663 -49551 -26670 2654 2121307
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.978e+04 2.027e+04 -1.469 0.14180
## HortUndProt 2.047e+03 3.405e+02 6.011 1.94e-09 ***
## HoneySales 4.909e-01 8.577e-03 57.238 < 2e-16 ***
## HoneyColSales 2.590e+02 1.760e+01 14.717 < 2e-16 ***
## ChemExp 3.024e-03 3.880e-04 7.794 7.41e-15 ***
## HoneyOpsSales -7.174e+03 4.803e+02 -14.938 < 2e-16 ***
## CropOrgSales 1.375e-02 1.516e-03 9.074 < 2e-16 ***
## HoneyOpsProd 3.237e+03 3.551e+02 9.116 < 2e-16 ***
## HortExcTAcres -7.172e+02 1.758e+02 -4.079 4.57e-05 ***
## ChemOps 2.377e+02 2.186e+01 10.874 < 2e-16 ***
## FertOps -1.902e+02 1.919e+01 -9.910 < 2e-16 ***
## FertExp -1.773e-03 3.939e-04 -4.500 6.89e-06 ***
## HortExcTIrgOps 1.360e+01 4.394e+00 3.094 0.00198 **
## CropSales 4.001e+03 1.335e+03 2.997 0.00273 **
## HortExcTVSTOps -1.327e+02 5.457e+01 -2.432 0.01504 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 146300 on 7083 degrees of freedom
## Multiple R-squared: 0.4322, Adjusted R-squared: 0.431
## F-statistic: 385.1 on 14 and 7083 DF, p-value: < 2.2e-16
Adding HortExcTVSTOps to our model our Adjusted R-squared goes up to 0.431. It can be added and we can search for a fifthteenth variable.
m_forward <- lm(HoneyProd ~ HortUndProt + HoneySales + HoneyColSales + ChemExp + HoneyOpsSales + CropOrgSales + HoneyOpsProd + HortExcTAcres + ChemOps + FertOps + FertExp + HortExcTIrgOps + CropSales + HortExcTVSTOps + HortExcTIrgAcres, data = HoneyC_FullModel)
summary(m_forward)
##
## Call:
## lm(formula = HoneyProd ~ HortUndProt + HoneySales + HoneyColSales +
## ChemExp + HoneyOpsSales + CropOrgSales + HoneyOpsProd + HortExcTAcres +
## ChemOps + FertOps + FertExp + HortExcTIrgOps + CropSales +
## HortExcTVSTOps + HortExcTIrgAcres, data = HoneyC_FullModel)
##
## Residuals:
## Min 1Q Median 3Q Max
## -688591 -49407 -26797 2227 2125262
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.072e+04 2.026e+04 -1.516 0.12949
## HortUndProt 1.754e+03 3.548e+02 4.943 7.86e-07 ***
## HoneySales 4.908e-01 8.573e-03 57.255 < 2e-16 ***
## HoneyColSales 2.605e+02 1.760e+01 14.802 < 2e-16 ***
## ChemExp 3.010e-03 3.878e-04 7.762 9.57e-15 ***
## HoneyOpsSales -7.115e+03 4.805e+02 -14.809 < 2e-16 ***
## CropOrgSales 1.381e-02 1.515e-03 9.115 < 2e-16 ***
## HoneyOpsProd 3.175e+03 3.555e+02 8.930 < 2e-16 ***
## HortExcTAcres -4.875e+02 1.926e+02 -2.531 0.01139 *
## ChemOps 2.369e+02 2.185e+01 10.840 < 2e-16 ***
## FertOps -1.905e+02 1.918e+01 -9.930 < 2e-16 ***
## FertExp -1.761e-03 3.937e-04 -4.474 7.80e-06 ***
## HortExcTIrgOps 3.303e+01 7.982e+00 4.137 3.56e-05 ***
## CropSales 4.091e+03 1.335e+03 3.065 0.00218 **
## HortExcTVSTOps -1.264e+02 5.459e+01 -2.315 0.02063 *
## HortExcTIrgAcres -1.778e+01 6.100e+00 -2.915 0.00357 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 146200 on 7082 degrees of freedom
## Multiple R-squared: 0.4328, Adjusted R-squared: 0.4316
## F-statistic: 360.3 on 15 and 7082 DF, p-value: < 2.2e-16
Adding HortExcTIrgAcres to our model our Adjusted R-squared goes up to 0.4316. It can be added and we can search for a sixteenth variable.
m_forward <- lm(HoneyProd ~ HortUndProt + HoneySales + HoneyColSales + ChemExp + HoneyOpsSales + CropOrgSales + HoneyOpsProd + HortExcTAcres + ChemOps + FertOps + FertExp + HortExcTIrgOps + CropSales + HortExcTVSTOps + HortExcTIrgAcres + CropOps, data = HoneyC_FullModel)
summary(m_forward)
##
## Call:
## lm(formula = HoneyProd ~ HortUndProt + HoneySales + HoneyColSales +
## ChemExp + HoneyOpsSales + CropOrgSales + HoneyOpsProd + HortExcTAcres +
## ChemOps + FertOps + FertExp + HortExcTIrgOps + CropSales +
## HortExcTVSTOps + HortExcTIrgAcres + CropOps, data = HoneyC_FullModel)
##
## Residuals:
## Min 1Q Median 3Q Max
## -705872 -49287 -26985 2922 2110231
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.037e+04 2.068e+04 -0.985 0.32460
## HortUndProt 1.769e+03 3.548e+02 4.986 6.30e-07 ***
## HoneySales 4.910e-01 8.570e-03 57.295 < 2e-16 ***
## HoneyColSales 2.593e+02 1.760e+01 14.737 < 2e-16 ***
## ChemExp 3.016e-03 3.876e-04 7.780 8.29e-15 ***
## HoneyOpsSales -7.170e+03 4.808e+02 -14.914 < 2e-16 ***
## CropOrgSales 1.351e-02 1.519e-03 8.895 < 2e-16 ***
## HoneyOpsProd 3.109e+03 3.564e+02 8.723 < 2e-16 ***
## HortExcTAcres -4.934e+02 1.925e+02 -2.563 0.01041 *
## ChemOps 2.132e+02 2.383e+01 8.945 < 2e-16 ***
## FertOps -2.021e+02 1.973e+01 -10.241 < 2e-16 ***
## FertExp -1.825e-03 3.944e-04 -4.627 3.77e-06 ***
## HortExcTIrgOps 3.369e+01 7.984e+00 4.220 2.47e-05 ***
## CropSales 3.396e+03 1.363e+03 2.492 0.01274 *
## HortExcTVSTOps -1.516e+02 5.550e+01 -2.732 0.00631 **
## HortExcTIrgAcres -1.761e+01 6.098e+00 -2.888 0.00389 **
## CropOps 4.225e+01 1.696e+01 2.492 0.01273 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 146200 on 7081 degrees of freedom
## Multiple R-squared: 0.4333, Adjusted R-squared: 0.4321
## F-statistic: 338.4 on 16 and 7081 DF, p-value: < 2.2e-16
Adding CropOps to our model our Adjusted R-squared goes up to 0.4321. It can be added and we can search for a seventeenth variable.
m_forward <- lm(HoneyProd ~ HortUndProt + HoneySales + HoneyColSales + ChemExp + HoneyOpsSales + CropOrgSales + HoneyOpsProd + HortExcTAcres + ChemOps + FertOps + FertExp + HortExcTIrgOps + CropSales + HortExcTVSTOps + HortExcTIrgAcres + CropOps + HortExcTVSTSales, data = HoneyC_FullModel)
summary(m_forward)
##
## Call:
## lm(formula = HoneyProd ~ HortUndProt + HoneySales + HoneyColSales +
## ChemExp + HoneyOpsSales + CropOrgSales + HoneyOpsProd + HortExcTAcres +
## ChemOps + FertOps + FertExp + HortExcTIrgOps + CropSales +
## HortExcTVSTOps + HortExcTIrgAcres + CropOps + HortExcTVSTSales,
## data = HoneyC_FullModel)
##
## Residuals:
## Min 1Q Median 3Q Max
## -712524 -49322 -26912 2842 2104209
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.893e+04 2.068e+04 -0.915 0.360125
## HortUndProt 1.632e+03 3.614e+02 4.516 6.40e-06 ***
## HoneySales 4.913e-01 8.570e-03 57.334 < 2e-16 ***
## HoneyColSales 2.580e+02 1.761e+01 14.654 < 2e-16 ***
## ChemExp 2.940e-03 3.894e-04 7.550 4.92e-14 ***
## HoneyOpsSales -7.200e+03 4.809e+02 -14.971 < 2e-16 ***
## CropOrgSales 1.271e-02 1.572e-03 8.087 7.12e-16 ***
## HoneyOpsProd 3.206e+03 3.597e+02 8.914 < 2e-16 ***
## HortExcTAcres -4.177e+02 1.963e+02 -2.128 0.033337 *
## ChemOps 2.154e+02 2.385e+01 9.030 < 2e-16 ***
## FertOps -2.032e+02 1.973e+01 -10.296 < 2e-16 ***
## FertExp -1.821e-03 3.943e-04 -4.619 3.92e-06 ***
## HortExcTIrgOps 3.262e+01 8.001e+00 4.076 4.62e-05 ***
## CropSales 3.293e+03 1.364e+03 2.415 0.015766 *
## HortExcTVSTOps -2.420e+02 7.189e+01 -3.367 0.000764 ***
## HortExcTIrgAcres -1.723e+01 6.100e+00 -2.824 0.004759 **
## CropOps 4.217e+01 1.695e+01 2.488 0.012886 *
## HortExcTVSTSales 2.195e-04 1.109e-04 1.978 0.047959 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 146100 on 7080 degrees of freedom
## Multiple R-squared: 0.4337, Adjusted R-squared: 0.4323
## F-statistic: 318.9 on 17 and 7080 DF, p-value: < 2.2e-16
Adding HortExcTVSTSales to our model our Adjusted R-squared goes up to 0.4323. It can be added and we finish our final forward selection model with an Adjusted R-Squared of 0.4323.
It turns out, our final model is the same for backward elimination and forward selection. This isn’t exactly a surprise, but it gives us some understanding of what factors relate to one another.
m_final <- lm(HoneyProd ~ HortUndProt + HoneySales + HoneyColSales + ChemExp + HoneyOpsSales + CropOrgSales + HoneyOpsProd + HortExcTAcres + ChemOps + FertOps + FertExp + HortExcTIrgOps + CropSales + HortExcTVSTOps + HortExcTIrgAcres + CropOps + HortExcTVSTSales, data = HoneyC_FullModel)
summary(m_final)
##
## Call:
## lm(formula = HoneyProd ~ HortUndProt + HoneySales + HoneyColSales +
## ChemExp + HoneyOpsSales + CropOrgSales + HoneyOpsProd + HortExcTAcres +
## ChemOps + FertOps + FertExp + HortExcTIrgOps + CropSales +
## HortExcTVSTOps + HortExcTIrgAcres + CropOps + HortExcTVSTSales,
## data = HoneyC_FullModel)
##
## Residuals:
## Min 1Q Median 3Q Max
## -712524 -49322 -26912 2842 2104209
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.893e+04 2.068e+04 -0.915 0.360125
## HortUndProt 1.632e+03 3.614e+02 4.516 6.40e-06 ***
## HoneySales 4.913e-01 8.570e-03 57.334 < 2e-16 ***
## HoneyColSales 2.580e+02 1.761e+01 14.654 < 2e-16 ***
## ChemExp 2.940e-03 3.894e-04 7.550 4.92e-14 ***
## HoneyOpsSales -7.200e+03 4.809e+02 -14.971 < 2e-16 ***
## CropOrgSales 1.271e-02 1.572e-03 8.087 7.12e-16 ***
## HoneyOpsProd 3.206e+03 3.597e+02 8.914 < 2e-16 ***
## HortExcTAcres -4.177e+02 1.963e+02 -2.128 0.033337 *
## ChemOps 2.154e+02 2.385e+01 9.030 < 2e-16 ***
## FertOps -2.032e+02 1.973e+01 -10.296 < 2e-16 ***
## FertExp -1.821e-03 3.943e-04 -4.619 3.92e-06 ***
## HortExcTIrgOps 3.262e+01 8.001e+00 4.076 4.62e-05 ***
## CropSales 3.293e+03 1.364e+03 2.415 0.015766 *
## HortExcTVSTOps -2.420e+02 7.189e+01 -3.367 0.000764 ***
## HortExcTIrgAcres -1.723e+01 6.100e+00 -2.824 0.004759 **
## CropOps 4.217e+01 1.695e+01 2.488 0.012886 *
## HortExcTVSTSales 2.195e-04 1.109e-04 1.978 0.047959 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 146100 on 7080 degrees of freedom
## Multiple R-squared: 0.4337, Adjusted R-squared: 0.4323
## F-statistic: 318.9 on 17 and 7080 DF, p-value: < 2.2e-16