Appendix 2: Multiple Regression Analysis:

With our newfound understanding of the relationship between honey produced per colony and the number of colonies, we are no closer to answering our ultimate question: do we really rely on the bees?

There are three ways we can perform the regression analysis: [1] the full model, [2] the backward elimination model, and [3] the forward selection model.

Loading packages and essential data:

We need to load the data set that we will use for this analysis first (of course), which came from our main analysis file Honey, I Shrunk the Bees and exported to GitHub:

HoneyC_FullModel <- read.csv("https://raw.githubusercontent.com/chrisgmartin/HoneyIShrunkTheBees/master/Appendix/HoneyC_FullModel.csv", header = TRUE)

Another data set we will need comes from the City University of New York’s Masters of Data Analytics Program (CUNY MSDA Program) course IS 606 - Statistics and Probability for Data Analytics. These custom files assist in the linear regression as it includes a very hand function:

load(url("https://github.com/chrisgmartin/HoneyIShrunkTheBees/raw/master/Additional/nc.RData"))

Data Import:

Simply enough, each .CSV was imported into a schema (titled HISP) using the MySQL data import wizard.

Full Model:

The full model uses every available explanatory variable to attempt to predict the variable at hand. As our state dataset includes an intimidating 108 different variables, using the full model is an incredibly messy task. For ease, we’ll switch to the county dataset which includes “only” 21 DataItem’s, remove all incomplete cases, and look at total production measured in lb’s (rather than production per colony. Note that this case requires us to force null values to 0 despite my previous warnings not to do so, after removing cases with incomplete results (N/A’s or Null Values) for Honey Produced because there are simply not enough data points without null values in one of the many columns (there are zero, in fact):

#return only complete cases
HoneyC_FullModel <- HoneyC_FullModel[complete.cases(HoneyC_FullModel[,13]),]
HoneyC_FullModel[is.na(HoneyC_FullModel)] <- 0

m_full <- lm(HoneyProd ~ ChemExp + ChemOps + CropOps + CropSales + CropOrgOps + CropOrgSales + FertExp + FertOps + HoneyOpsProd + HoneyOpsSales + HoneySales + HoneyColOps + HoneyColSales + HortExcTVSTOps + HortExcTVSTSales + HortExcTAcres + HortExcTIrgAcres + HortExcTIrgOps + HortUndProt, data = HoneyC_FullModel)
summary(m_full)
## 
## Call:
## lm(formula = HoneyProd ~ ChemExp + ChemOps + CropOps + CropSales + 
##     CropOrgOps + CropOrgSales + FertExp + FertOps + HoneyOpsProd + 
##     HoneyOpsSales + HoneySales + HoneyColOps + HoneyColSales + 
##     HortExcTVSTOps + HortExcTVSTSales + HortExcTAcres + HortExcTIrgAcres + 
##     HortExcTIrgOps + HortUndProt, data = HoneyC_FullModel)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -718898  -49324  -27017    2728 2106250 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      -1.921e+04  2.069e+04  -0.928  0.35335    
## ChemExp           2.938e-03  3.895e-04   7.542 5.22e-14 ***
## ChemOps           2.137e+02  2.406e+01   8.881  < 2e-16 ***
## CropOps           4.327e+01  1.724e+01   2.510  0.01209 *  
## CropSales         3.304e+03  1.364e+03   2.422  0.01545 *  
## CropOrgOps       -1.049e+02  2.520e+02  -0.417  0.67705    
## CropOrgSales      1.311e-02  1.852e-03   7.082 1.56e-12 ***
## FertExp          -1.826e-03  3.947e-04  -4.627 3.78e-06 ***
## FertOps          -2.023e+02  1.984e+01 -10.193  < 2e-16 ***
## HoneyOpsProd      3.263e+03  3.789e+02   8.610  < 2e-16 ***
## HoneyOpsSales    -7.302e+03  5.190e+02 -14.070  < 2e-16 ***
## HoneySales        4.914e-01  8.573e-03  57.325  < 2e-16 ***
## HoneyColOps      -7.638e+02  2.500e+03  -0.306  0.75999    
## HoneyColSales     2.603e+02  1.877e+01  13.869  < 2e-16 ***
## HortExcTVSTOps   -2.331e+02  7.446e+01  -3.130  0.00175 ** 
## HortExcTVSTSales  2.162e-04  1.111e-04   1.945  0.05181 .  
## HortExcTAcres    -4.309e+02  2.002e+02  -2.153  0.03135 *  
## HortExcTIrgAcres -1.714e+01  6.103e+00  -2.809  0.00498 ** 
## HortExcTIrgOps    3.256e+01  8.003e+00   4.068 4.79e-05 ***
## HortUndProt       1.635e+03  3.627e+02   4.507 6.69e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 146100 on 7078 degrees of freedom
## Multiple R-squared:  0.4337, Adjusted R-squared:  0.4322 
## F-statistic: 285.3 on 19 and 7078 DF,  p-value: < 2.2e-16

This full model can give us a model that is simply too cumbersome for casual output, but the main takeaway can be our Adjusted R-squared of 0.4322. This value we can use to compare how effective the model is likely to be against our other potential models: backward elimination and forward selection.

Backward Elimination:

With backward elimination, we start with the full model and move backward by eliminating those variables with the lowest Adjusted R-squared until we get to our maximized Adjusted R-squared model. From the full model, we saw that explantory variable with a high Pr(>|t|) is CropOrgOps: Crop Organizations with Operations. We’ll start by eliminating that from our model first:

m_backward <- lm(HoneyProd ~ ChemExp + ChemOps + CropOps + CropSales + CropOrgSales + FertExp + FertOps + HoneyOpsProd + HoneyOpsSales + HoneySales + HoneyColOps + HoneyColSales + HortExcTVSTOps + HortExcTVSTSales + HortExcTAcres + HortExcTIrgAcres + HortExcTIrgOps + HortUndProt, data = HoneyC_FullModel)
summary(m_backward)
## 
## Call:
## lm(formula = HoneyProd ~ ChemExp + ChemOps + CropOps + CropSales + 
##     CropOrgSales + FertExp + FertOps + HoneyOpsProd + HoneyOpsSales + 
##     HoneySales + HoneyColOps + HoneyColSales + HortExcTVSTOps + 
##     HortExcTVSTSales + HortExcTAcres + HortExcTIrgAcres + HortExcTIrgOps + 
##     HortUndProt, data = HoneyC_FullModel)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -713343  -49350  -26918    2774 2105027 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      -1.903e+04  2.069e+04  -0.920 0.357707    
## ChemExp           2.939e-03  3.895e-04   7.546 5.06e-14 ***
## ChemOps           2.149e+02  2.390e+01   8.991  < 2e-16 ***
## CropOps           4.199e+01  1.696e+01   2.476 0.013324 *  
## CropSales         3.299e+03  1.364e+03   2.418 0.015610 *  
## CropOrgSales      1.270e-02  1.572e-03   8.083 7.40e-16 ***
## FertExp          -1.820e-03  3.944e-04  -4.614 4.01e-06 ***
## FertOps          -2.026e+02  1.983e+01 -10.216  < 2e-16 ***
## HoneyOpsProd      3.241e+03  3.754e+02   8.634  < 2e-16 ***
## HoneyOpsSales    -7.247e+03  5.015e+02 -14.450  < 2e-16 ***
## HoneySales        4.914e-01  8.571e-03  57.328  < 2e-16 ***
## HoneyColOps      -8.194e+02  2.496e+03  -0.328 0.742731    
## HoneyColSales     2.601e+02  1.876e+01  13.864  < 2e-16 ***
## HortExcTVSTOps   -2.411e+02  7.195e+01  -3.350 0.000812 ***
## HortExcTVSTSales  2.185e-04  1.110e-04   1.969 0.048987 *  
## HortExcTAcres    -4.150e+02  1.964e+02  -2.112 0.034683 *  
## HortExcTIrgAcres -1.719e+01  6.101e+00  -2.817 0.004858 ** 
## HortExcTIrgOps    3.260e+01  8.002e+00   4.074 4.67e-05 ***
## HortUndProt       1.625e+03  3.620e+02   4.490 7.24e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 146100 on 7079 degrees of freedom
## Multiple R-squared:  0.4337, Adjusted R-squared:  0.4322 
## F-statistic: 301.2 on 18 and 7079 DF,  p-value: < 2.2e-16

There was no change to the Adjusted R-squared so we can move foward without just as simply as we can with it. Let’s remove another one, HoneyColOps: the number of Honey Colonies with Operations.

m_backward <- lm(HoneyProd ~ ChemExp + ChemOps + CropOps + CropSales + CropOrgSales + FertExp + FertOps + HoneyOpsProd + HoneyOpsSales + HoneySales + HoneyColSales + HortExcTVSTOps + HortExcTVSTSales + HortExcTAcres + HortExcTIrgAcres + HortExcTIrgOps + HortUndProt, data = HoneyC_FullModel)
summary(m_backward)
## 
## Call:
## lm(formula = HoneyProd ~ ChemExp + ChemOps + CropOps + CropSales + 
##     CropOrgSales + FertExp + FertOps + HoneyOpsProd + HoneyOpsSales + 
##     HoneySales + HoneyColSales + HortExcTVSTOps + HortExcTVSTSales + 
##     HortExcTAcres + HortExcTIrgAcres + HortExcTIrgOps + HortUndProt, 
##     data = HoneyC_FullModel)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -712524  -49322  -26912    2842 2104209 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      -1.893e+04  2.068e+04  -0.915 0.360125    
## ChemExp           2.940e-03  3.894e-04   7.550 4.92e-14 ***
## ChemOps           2.154e+02  2.385e+01   9.030  < 2e-16 ***
## CropOps           4.217e+01  1.695e+01   2.488 0.012886 *  
## CropSales         3.293e+03  1.364e+03   2.415 0.015766 *  
## CropOrgSales      1.271e-02  1.572e-03   8.087 7.12e-16 ***
## FertExp          -1.821e-03  3.943e-04  -4.619 3.92e-06 ***
## FertOps          -2.032e+02  1.973e+01 -10.296  < 2e-16 ***
## HoneyOpsProd      3.206e+03  3.597e+02   8.914  < 2e-16 ***
## HoneyOpsSales    -7.200e+03  4.809e+02 -14.971  < 2e-16 ***
## HoneySales        4.913e-01  8.570e-03  57.334  < 2e-16 ***
## HoneyColSales     2.580e+02  1.761e+01  14.654  < 2e-16 ***
## HortExcTVSTOps   -2.420e+02  7.189e+01  -3.367 0.000764 ***
## HortExcTVSTSales  2.195e-04  1.109e-04   1.978 0.047959 *  
## HortExcTAcres    -4.177e+02  1.963e+02  -2.128 0.033337 *  
## HortExcTIrgAcres -1.723e+01  6.100e+00  -2.824 0.004759 ** 
## HortExcTIrgOps    3.262e+01  8.001e+00   4.076 4.62e-05 ***
## HortUndProt       1.632e+03  3.614e+02   4.516 6.40e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 146100 on 7080 degrees of freedom
## Multiple R-squared:  0.4337, Adjusted R-squared:  0.4323 
## F-statistic: 318.9 on 17 and 7080 DF,  p-value: < 2.2e-16

Our Adjusted R-squared increased ever so slightly, but increased none-the-less so we’ll keep it out. Let’s remove another one, HortExcTVSTSales: Horticulture Sales in $’s with the exception of Trees, Vegetable Seeds, and Transplants:

m_backward <- lm(HoneyProd ~ ChemExp + ChemOps + CropOps + CropSales + CropOrgSales + FertExp + FertOps + HoneyOpsProd + HoneyOpsSales + HoneySales + HoneyColSales + HortExcTVSTOps + HortExcTAcres + HortExcTIrgAcres + HortExcTIrgOps + HortUndProt, data = HoneyC_FullModel)
summary(m_backward)
## 
## Call:
## lm(formula = HoneyProd ~ ChemExp + ChemOps + CropOps + CropSales + 
##     CropOrgSales + FertExp + FertOps + HoneyOpsProd + HoneyOpsSales + 
##     HoneySales + HoneyColSales + HortExcTVSTOps + HortExcTAcres + 
##     HortExcTIrgAcres + HortExcTIrgOps + HortUndProt, data = HoneyC_FullModel)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -705872  -49287  -26985    2922 2110231 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      -2.037e+04  2.068e+04  -0.985  0.32460    
## ChemExp           3.016e-03  3.876e-04   7.780 8.29e-15 ***
## ChemOps           2.132e+02  2.383e+01   8.945  < 2e-16 ***
## CropOps           4.225e+01  1.696e+01   2.492  0.01273 *  
## CropSales         3.396e+03  1.363e+03   2.492  0.01274 *  
## CropOrgSales      1.351e-02  1.519e-03   8.895  < 2e-16 ***
## FertExp          -1.825e-03  3.944e-04  -4.627 3.77e-06 ***
## FertOps          -2.021e+02  1.973e+01 -10.241  < 2e-16 ***
## HoneyOpsProd      3.109e+03  3.564e+02   8.723  < 2e-16 ***
## HoneyOpsSales    -7.170e+03  4.808e+02 -14.914  < 2e-16 ***
## HoneySales        4.910e-01  8.570e-03  57.295  < 2e-16 ***
## HoneyColSales     2.593e+02  1.760e+01  14.737  < 2e-16 ***
## HortExcTVSTOps   -1.516e+02  5.550e+01  -2.732  0.00631 ** 
## HortExcTAcres    -4.934e+02  1.925e+02  -2.563  0.01041 *  
## HortExcTIrgAcres -1.761e+01  6.098e+00  -2.888  0.00389 ** 
## HortExcTIrgOps    3.369e+01  7.984e+00   4.220 2.47e-05 ***
## HortUndProt       1.769e+03  3.548e+02   4.986 6.30e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 146200 on 7081 degrees of freedom
## Multiple R-squared:  0.4333, Adjusted R-squared:  0.4321 
## F-statistic: 338.4 on 16 and 7081 DF,  p-value: < 2.2e-16

A slight decrease, so we’ll keep that in our model and try another: HortExcTAcres.

m_backward <- lm(HoneyProd ~ ChemExp + ChemOps + CropOps + CropSales + CropOrgSales + FertExp + FertOps + HoneyOpsProd + HoneyOpsSales + HoneySales + HoneyColSales + HortExcTVSTOps + HortExcTVSTSales + HortExcTIrgAcres + HortExcTIrgOps + HortUndProt, data = HoneyC_FullModel)
summary(m_backward)
## 
## Call:
## lm(formula = HoneyProd ~ ChemExp + ChemOps + CropOps + CropSales + 
##     CropOrgSales + FertExp + FertOps + HoneyOpsProd + HoneyOpsSales + 
##     HoneySales + HoneyColSales + HortExcTVSTOps + HortExcTVSTSales + 
##     HortExcTIrgAcres + HortExcTIrgOps + HortUndProt, data = HoneyC_FullModel)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -722316  -49272  -26908    2873 2104701 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      -1.840e+04  2.069e+04  -0.889   0.3739    
## ChemExp           2.905e-03  3.892e-04   7.464 9.37e-14 ***
## ChemOps           2.164e+02  2.385e+01   9.073  < 2e-16 ***
## CropOps           4.172e+01  1.696e+01   2.461   0.0139 *  
## CropSales         3.228e+03  1.364e+03   2.367   0.0180 *  
## CropOrgSales      1.279e-02  1.572e-03   8.135 4.81e-16 ***
## FertExp          -1.763e-03  3.935e-04  -4.480 7.56e-06 ***
## FertOps          -2.049e+02  1.972e+01 -10.390  < 2e-16 ***
## HoneyOpsProd      3.185e+03  3.596e+02   8.856  < 2e-16 ***
## HoneyOpsSales    -7.059e+03  4.765e+02 -14.816  < 2e-16 ***
## HoneySales        4.913e-01  8.572e-03  57.323  < 2e-16 ***
## HoneyColSales     2.599e+02  1.759e+01  14.778  < 2e-16 ***
## HortExcTVSTOps   -2.768e+02  7.003e+01  -3.952 7.82e-05 ***
## HortExcTVSTSales  2.655e-04  1.088e-04   2.439   0.0147 *  
## HortExcTIrgAcres -2.235e+01  5.605e+00  -3.988 6.73e-05 ***
## HortExcTIrgOps    3.907e+01  7.406e+00   5.275 1.37e-07 ***
## HortUndProt       9.545e+02  1.710e+02   5.581 2.49e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 146200 on 7081 degrees of freedom
## Multiple R-squared:  0.4333, Adjusted R-squared:  0.432 
## F-statistic: 338.4 on 16 and 7081 DF,  p-value: < 2.2e-16

Another slight drop. Another try: CropSales

m_backward <- lm(HoneyProd ~ ChemExp + ChemOps + CropOps + CropOrgSales + FertExp + FertOps + HoneyOpsProd + HoneyOpsSales + HoneySales + HoneyColSales + HortExcTVSTOps + HortExcTVSTSales + HortExcTAcres + HortExcTIrgAcres + HortExcTIrgOps + HortUndProt, data = HoneyC_FullModel)
summary(m_backward)
## 
## Call:
## lm(formula = HoneyProd ~ ChemExp + ChemOps + CropOps + CropOrgSales + 
##     FertExp + FertOps + HoneyOpsProd + HoneyOpsSales + HoneySales + 
##     HoneyColSales + HortExcTVSTOps + HortExcTVSTSales + HortExcTAcres + 
##     HortExcTIrgAcres + HortExcTIrgOps + HortUndProt, data = HoneyC_FullModel)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -721823  -48564  -27341    2500 2089082 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       3.050e+04  2.992e+03  10.193  < 2e-16 ***
## ChemExp           2.730e-03  3.798e-04   7.190 7.16e-13 ***
## ChemOps           2.203e+02  2.377e+01   9.268  < 2e-16 ***
## CropOps           5.054e+01  1.660e+01   3.044  0.00234 ** 
## CropOrgSales      1.253e-02  1.571e-03   7.981 1.68e-15 ***
## FertExp          -1.429e-03  3.595e-04  -3.977 7.06e-05 ***
## FertOps          -2.107e+02  1.950e+01 -10.806  < 2e-16 ***
## HoneyOpsProd      3.187e+03  3.597e+02   8.860  < 2e-16 ***
## HoneyOpsSales    -7.213e+03  4.811e+02 -14.993  < 2e-16 ***
## HoneySales        4.913e-01  8.573e-03  57.316  < 2e-16 ***
## HoneyColSales     2.591e+02  1.761e+01  14.720  < 2e-16 ***
## HortExcTVSTOps   -2.365e+02  7.188e+01  -3.291  0.00100 ** 
## HortExcTVSTSales  2.297e-04  1.109e-04   2.071  0.03839 *  
## HortExcTAcres    -4.070e+02  1.963e+02  -2.074  0.03815 *  
## HortExcTIrgAcres -1.684e+01  6.100e+00  -2.761  0.00578 ** 
## HortExcTIrgOps    3.296e+01  8.002e+00   4.118 3.86e-05 ***
## HortUndProt       1.604e+03  3.613e+02   4.438 9.20e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 146200 on 7081 degrees of freedom
## Multiple R-squared:  0.4332, Adjusted R-squared:  0.4319 
## F-statistic: 338.2 on 16 and 7081 DF,  p-value: < 2.2e-16

Another drop. In fact, after removing each of these variables once, there was no increase in the Adjusted R-squared so the backward model looks strikingly similar to the full model, except that we’ve ruled out any valuable impact to our model by eliminating the number of Crop Organizations with Operations and the number of Honey Colonies with Operations:

m_backward <- lm(HoneyProd ~ ChemExp + ChemOps + CropOps + CropSales + CropOrgSales + FertExp + FertOps + HoneyOpsProd + HoneyOpsSales + HoneySales + HoneyColSales + HortExcTVSTOps + HortExcTVSTSales + HortExcTAcres + HortExcTIrgAcres + HortExcTIrgOps + HortUndProt, data = HoneyC_FullModel)
summary(m_backward)
## 
## Call:
## lm(formula = HoneyProd ~ ChemExp + ChemOps + CropOps + CropSales + 
##     CropOrgSales + FertExp + FertOps + HoneyOpsProd + HoneyOpsSales + 
##     HoneySales + HoneyColSales + HortExcTVSTOps + HortExcTVSTSales + 
##     HortExcTAcres + HortExcTIrgAcres + HortExcTIrgOps + HortUndProt, 
##     data = HoneyC_FullModel)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -712524  -49322  -26912    2842 2104209 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      -1.893e+04  2.068e+04  -0.915 0.360125    
## ChemExp           2.940e-03  3.894e-04   7.550 4.92e-14 ***
## ChemOps           2.154e+02  2.385e+01   9.030  < 2e-16 ***
## CropOps           4.217e+01  1.695e+01   2.488 0.012886 *  
## CropSales         3.293e+03  1.364e+03   2.415 0.015766 *  
## CropOrgSales      1.271e-02  1.572e-03   8.087 7.12e-16 ***
## FertExp          -1.821e-03  3.943e-04  -4.619 3.92e-06 ***
## FertOps          -2.032e+02  1.973e+01 -10.296  < 2e-16 ***
## HoneyOpsProd      3.206e+03  3.597e+02   8.914  < 2e-16 ***
## HoneyOpsSales    -7.200e+03  4.809e+02 -14.971  < 2e-16 ***
## HoneySales        4.913e-01  8.570e-03  57.334  < 2e-16 ***
## HoneyColSales     2.580e+02  1.761e+01  14.654  < 2e-16 ***
## HortExcTVSTOps   -2.420e+02  7.189e+01  -3.367 0.000764 ***
## HortExcTVSTSales  2.195e-04  1.109e-04   1.978 0.047959 *  
## HortExcTAcres    -4.177e+02  1.963e+02  -2.128 0.033337 *  
## HortExcTIrgAcres -1.723e+01  6.100e+00  -2.824 0.004759 ** 
## HortExcTIrgOps    3.262e+01  8.001e+00   4.076 4.62e-05 ***
## HortUndProt       1.632e+03  3.614e+02   4.516 6.40e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 146100 on 7080 degrees of freedom
## Multiple R-squared:  0.4337, Adjusted R-squared:  0.4323 
## F-statistic: 318.9 on 17 and 7080 DF,  p-value: < 2.2e-16

So far our new best model came from the Backward Elimination method and gave us an Adjusted R-squared of 0.4323.

Forward Selection:

With forward selection we start from the start by comparing each variable in a linear model and adding new variables if they improve our Adjusted R-squared. It is the opposite of backward elimination: we add items until our Adjusted R-squared is higher than it was in the previous model.

We start with the start, and end with the end. Here are a few examples:

m_forward <- lm(HoneyProd ~ ChemExp, data = HoneyC_FullModel)
summary(m_forward)
## 
## Call:
## lm(formula = HoneyProd ~ ChemExp, data = HoneyC_FullModel)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1392592   -56941   -46062   -12873  2730983 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 4.712e+04  2.347e+03   20.07   <2e-16 ***
## ChemExp     6.593e-03  2.225e-04   29.63   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 183000 on 7096 degrees of freedom
## Multiple R-squared:  0.1101, Adjusted R-squared:   0.11 
## F-statistic:   878 on 1 and 7096 DF,  p-value: < 2.2e-16

With ChemExp included, our Adjusted R-squared is 0.11

m_forward <- lm(HoneyProd ~ ChemOps, data = HoneyC_FullModel)
summary(m_forward)
## 
## Call:
## lm(formula = HoneyProd ~ ChemOps, data = HoneyC_FullModel)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -535655  -71954  -43096  -10021 2871231 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 22939.376   3343.792    6.86 7.45e-12 ***
## ChemOps       145.128      7.127   20.36  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 188500 on 7096 degrees of freedom
## Multiple R-squared:  0.05521,    Adjusted R-squared:  0.05508 
## F-statistic: 414.7 on 1 and 7096 DF,  p-value: < 2.2e-16

With ChemOps included, our Adjusted R-squared is 0.05508. ChemExp is a better starting point.

m_forward <- lm(HoneyProd ~ CropOps, data = HoneyC_FullModel)
summary(m_forward)
## 
## Call:
## lm(formula = HoneyProd ~ CropOps, data = HoneyC_FullModel)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -525947  -72728  -44649   -6749 2877363 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 24780.520   3354.171   7.388 1.66e-13 ***
## CropOps       134.118      6.861  19.547  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 188900 on 7096 degrees of freedom
## Multiple R-squared:  0.05109,    Adjusted R-squared:  0.05096 
## F-statistic: 382.1 on 1 and 7096 DF,  p-value: < 2.2e-16

With CropOps included, our Adjusted R-squared is 0.05096 ChemExp is again a better starting point.

m_forward <- lm(HoneyProd ~ HortUndProt, data = HoneyC_FullModel)
summary(m_forward)
## 
## Call:
## lm(formula = HoneyProd ~ HortUndProt, data = HoneyC_FullModel)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -681024  -67292  -62064  -20189 2919735 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  68913.7     2318.2   29.73   <2e-16 ***
## HortUndProt   1558.6      138.2   11.28   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 192200 on 7096 degrees of freedom
## Multiple R-squared:  0.01761,    Adjusted R-squared:  0.01747 
## F-statistic: 127.2 on 1 and 7096 DF,  p-value: < 2.2e-16

With HoneySales included, our Adjusted R-squared is 0.3084. HoneyProd is best starting point. We can add from here with a couple more examples:

m_forward <- lm(HoneyProd ~ HortUndProt + ChemExp, data = HoneyC_FullModel)
summary(m_forward)
## 
## Call:
## lm(formula = HoneyProd ~ HortUndProt + ChemExp, data = HoneyC_FullModel)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1339641   -57923   -43898   -11766  2739401 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 4.406e+04  2.354e+03  18.714   <2e-16 ***
## HortUndProt 1.260e+03  1.311e+02   9.611   <2e-16 ***
## ChemExp     6.426e-03  2.218e-04  28.973   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 181800 on 7095 degrees of freedom
## Multiple R-squared:  0.1215, Adjusted R-squared:  0.1213 
## F-statistic: 490.8 on 2 and 7095 DF,  p-value: < 2.2e-16

Adding ChemExp to our model, we get a decreased Adjusted R-squared of 0.1213. This item should not be added to our model, yet.

m_forward <- lm(HoneyProd ~ HortUndProt + HoneySales, data = HoneyC_FullModel)
summary(m_forward)
## 
## Call:
## lm(formula = HoneyProd ~ HortUndProt + HoneySales, data = HoneyC_FullModel)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -775407  -47306  -43136  -13742 2831442 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 4.682e+04  1.950e+03   24.01   <2e-16 ***
## HortUndProt 1.792e+03  1.141e+02   15.71   <2e-16 ***
## HoneySales  5.010e-01  8.675e-03   57.75   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 158600 on 7095 degrees of freedom
## Multiple R-squared:  0.3318, Adjusted R-squared:  0.3316 
## F-statistic:  1761 on 2 and 7095 DF,  p-value: < 2.2e-16

Adding HoneySales to our model, we get an increased Adjusted R-squared of 0.3316. This item should be added. Let’s see if we can add a third explanatory variable:

m_forward <- lm(HoneyProd ~ HortUndProt + HoneySales + HoneyColSales, data = HoneyC_FullModel)
summary(m_forward)
## 
## Call:
## lm(formula = HoneyProd ~ HortUndProt + HoneySales + HoneyColSales, 
##     data = HoneyC_FullModel)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -884552  -42236  -38248  -10244 2652548 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   4.059e+04  1.919e+03   21.15   <2e-16 ***
## HortUndProt   1.423e+03  1.123e+02   12.67   <2e-16 ***
## HoneySales    5.065e-01  8.433e-03   60.06   <2e-16 ***
## HoneyColSales 3.680e+02  1.791e+01   20.55   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 154000 on 7094 degrees of freedom
## Multiple R-squared:  0.3693, Adjusted R-squared:  0.369 
## F-statistic:  1385 on 3 and 7094 DF,  p-value: < 2.2e-16

Adding HoneyColSales to our model our Adjusted R-squared goes up to 0.369. It can be added and we can search for a fourth variable.

m_forward <- lm(HoneyProd ~ HortUndProt + HoneySales + HoneyColSales + ChemExp, data = HoneyC_FullModel)
summary(m_forward)
## 
## Call:
## lm(formula = HoneyProd ~ HortUndProt + HoneySales + HoneyColSales + 
##     ChemExp, data = HoneyC_FullModel)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -859041  -43448  -31624  -12572 2379893 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.023e+04  1.977e+03   15.29   <2e-16 ***
## HortUndProt   1.284e+03  1.104e+02   11.64   <2e-16 ***
## HoneySales    4.662e-01  8.600e-03   54.21   <2e-16 ***
## HoneyColSales 3.358e+02  1.766e+01   19.01   <2e-16 ***
## ChemExp       3.271e-03  1.925e-04   16.99   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 151000 on 7093 degrees of freedom
## Multiple R-squared:  0.394,  Adjusted R-squared:  0.3936 
## F-statistic:  1153 on 4 and 7093 DF,  p-value: < 2.2e-16

Adding ChemExp to our model our Adjusted R-squared goes up to 0.3936. It can be added and we can search for a fifth variable.

m_forward <- lm(HoneyProd ~ HortUndProt + HoneySales + HoneyColSales + ChemExp + HoneyOpsSales, data = HoneyC_FullModel)
summary(m_forward)
## 
## Call:
## lm(formula = HoneyProd ~ HortUndProt + HoneySales + HoneyColSales + 
##     ChemExp + HoneyOpsSales, data = HoneyC_FullModel)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -834541  -46540  -34709   -2190 2358684 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    3.801e+04  2.086e+03   18.22   <2e-16 ***
## HortUndProt    1.200e+03  1.097e+02   10.94   <2e-16 ***
## HoneySales     4.814e-01  8.642e-03   55.70   <2e-16 ***
## HoneyColSales  3.210e+02  1.757e+01   18.27   <2e-16 ***
## ChemExp        3.499e-03  1.921e-04   18.22   <2e-16 ***
## HoneyOpsSales -3.611e+03  3.305e+02  -10.93   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 149800 on 7092 degrees of freedom
## Multiple R-squared:  0.404,  Adjusted R-squared:  0.4036 
## F-statistic: 961.4 on 5 and 7092 DF,  p-value: < 2.2e-16

Adding HoneyOpsSales to our model our Adjusted R-squared goes up to 0.4036. It can be added and we can search for a sixth variable.

m_forward <- lm(HoneyProd ~ HortUndProt + HoneySales + HoneyColSales + ChemExp + HoneyOpsSales + CropOrgSales, data = HoneyC_FullModel)
summary(m_forward)
## 
## Call:
## lm(formula = HoneyProd ~ HortUndProt + HoneySales + HoneyColSales + 
##     ChemExp + HoneyOpsSales + CropOrgSales, data = HoneyC_FullModel)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -740485  -45476  -34045   -2282 2279390 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    3.714e+04  2.074e+03  17.903   <2e-16 ***
## HortUndProt    1.281e+03  1.093e+02  11.719   <2e-16 ***
## HoneySales     4.887e-01  8.619e-03  56.707   <2e-16 ***
## HoneyColSales  3.011e+02  1.757e+01  17.136   <2e-16 ***
## ChemExp        2.942e-03  1.992e-04  14.768   <2e-16 ***
## HoneyOpsSales -3.358e+03  3.294e+02 -10.195   <2e-16 ***
## CropOrgSales   1.452e-02  1.494e-03   9.721   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 148800 on 7091 degrees of freedom
## Multiple R-squared:  0.4118, Adjusted R-squared:  0.4113 
## F-statistic: 827.5 on 6 and 7091 DF,  p-value: < 2.2e-16

Adding CropOrgSales to our model our Adjusted R-squared goes up to 0.4113. It can be added and we can search for a seventh variable.

m_forward <- lm(HoneyProd ~ HortUndProt + HoneySales + HoneyColSales + ChemExp + HoneyOpsSales + CropOrgSales + HoneyOpsProd, data = HoneyC_FullModel)
summary(m_forward)
## 
## Call:
## lm(formula = HoneyProd ~ HortUndProt + HoneySales + HoneyColSales + 
##     ChemExp + HoneyOpsSales + CropOrgSales + HoneyOpsProd, data = HoneyC_FullModel)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -701111  -47757  -30479     601 2259122 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    2.640e+04  2.474e+03  10.671  < 2e-16 ***
## HortUndProt    1.080e+03  1.118e+02   9.659  < 2e-16 ***
## HoneySales     4.947e-01  8.615e-03  57.422  < 2e-16 ***
## HoneyColSales  2.781e+02  1.774e+01  15.675  < 2e-16 ***
## ChemExp        2.940e-03  1.984e-04  14.821  < 2e-16 ***
## HoneyOpsSales -5.818e+03  4.526e+02 -12.855  < 2e-16 ***
## CropOrgSales   1.257e-02  1.508e-03   8.337  < 2e-16 ***
## HoneyOpsProd   2.447e+03  3.102e+02   7.888 3.55e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 148200 on 7090 degrees of freedom
## Multiple R-squared:  0.4169, Adjusted R-squared:  0.4164 
## F-statistic: 724.3 on 7 and 7090 DF,  p-value: < 2.2e-16

Adding HoneyOpsProd to our model our Adjusted R-squared goes up to 0.4164. It can be added and we can search for an eigth variable.

m_forward <- lm(HoneyProd ~ HortUndProt + HoneySales + HoneyColSales + ChemExp + HoneyOpsSales + CropOrgSales + HoneyOpsProd + HortExcTAcres, data = HoneyC_FullModel)
summary(m_forward)
## 
## Call:
## lm(formula = HoneyProd ~ HortUndProt + HoneySales + HoneyColSales + 
##     ChemExp + HoneyOpsSales + CropOrgSales + HoneyOpsProd + HortExcTAcres, 
##     data = HoneyC_FullModel)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -692870  -47596  -30050     477 2261648 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    2.794e+04  2.491e+03  11.216  < 2e-16 ***
## HortUndProt    2.483e+03  3.134e+02   7.922 2.69e-15 ***
## HoneySales     4.948e-01  8.601e-03  57.527  < 2e-16 ***
## HoneyColSales  2.739e+02  1.773e+01  15.445  < 2e-16 ***
## ChemExp        2.885e-03  1.984e-04  14.538  < 2e-16 ***
## HoneyOpsSales -6.194e+03  4.587e+02 -13.504  < 2e-16 ***
## CropOrgSales   1.229e-02  1.507e-03   8.158 4.00e-16 ***
## HoneyOpsProd   2.681e+03  3.136e+02   8.550  < 2e-16 ***
## HortExcTAcres -8.241e+02  1.720e+02  -4.791 1.69e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 147900 on 7089 degrees of freedom
## Multiple R-squared:  0.4188, Adjusted R-squared:  0.4182 
## F-statistic: 638.6 on 8 and 7089 DF,  p-value: < 2.2e-16

Adding HortExcTAcres to our model our Adjusted R-squared goes up to 0.4182. It can be added and we can search for a ninth variable.

m_forward <- lm(HoneyProd ~ HortUndProt + HoneySales + HoneyColSales + ChemExp + HoneyOpsSales + CropOrgSales + HoneyOpsProd + HortExcTAcres + ChemOps, data = HoneyC_FullModel)
summary(m_forward)
## 
## Call:
## lm(formula = HoneyProd ~ HortUndProt + HoneySales + HoneyColSales + 
##     ChemExp + HoneyOpsSales + CropOrgSales + HoneyOpsProd + HortExcTAcres + 
##     ChemOps, data = HoneyC_FullModel)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -691954  -48883  -30069    1394 2202903 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    2.206e+04  2.827e+03   7.806 6.78e-15 ***
## HortUndProt    2.522e+03  3.132e+02   8.053 9.40e-16 ***
## HoneySales     4.950e-01  8.590e-03  57.622  < 2e-16 ***
## HoneyColSales  2.700e+02  1.773e+01  15.223  < 2e-16 ***
## ChemExp        2.424e-03  2.244e-04  10.798  < 2e-16 ***
## HoneyOpsSales -6.069e+03  4.590e+02 -13.223  < 2e-16 ***
## CropOrgSales   1.191e-02  1.507e-03   7.900 3.22e-15 ***
## HoneyOpsProd   2.260e+03  3.276e+02   6.899 5.68e-12 ***
## HortExcTAcres -8.935e+02  1.725e+02  -5.179 2.29e-07 ***
## ChemOps        3.160e+01  7.222e+00   4.375 1.23e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 147700 on 7088 degrees of freedom
## Multiple R-squared:  0.4204, Adjusted R-squared:  0.4197 
## F-statistic: 571.2 on 9 and 7088 DF,  p-value: < 2.2e-16

Adding ChemOps to our model our Adjusted R-squared goes up to 0.4197. It can be added and we can search for a tenth variable.

m_forward <- lm(HoneyProd ~ HortUndProt + HoneySales + HoneyColSales + ChemExp + HoneyOpsSales + CropOrgSales + HoneyOpsProd + HortExcTAcres + ChemOps + FertOps, data = HoneyC_FullModel)
summary(m_forward)
## 
## Call:
## lm(formula = HoneyProd ~ HortUndProt + HoneySales + HoneyColSales + 
##     ChemExp + HoneyOpsSales + CropOrgSales + HoneyOpsProd + HortExcTAcres + 
##     ChemOps + FertOps, data = HoneyC_FullModel)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -676974  -50085  -27170    3748 2149204 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    2.926e+04  2.890e+03  10.126  < 2e-16 ***
## HortUndProt    2.394e+03  3.111e+02   7.695 1.61e-14 ***
## HoneySales     4.869e-01  8.562e-03  56.869  < 2e-16 ***
## HoneyColSales  2.652e+02  1.761e+01  15.058  < 2e-16 ***
## ChemExp        1.774e-03  2.314e-04   7.668 1.98e-14 ***
## HoneyOpsSales -7.514e+03  4.763e+02 -15.776  < 2e-16 ***
## CropOrgSales   1.288e-02  1.499e-03   8.595  < 2e-16 ***
## HoneyOpsProd   3.218e+03  3.380e+02   9.523  < 2e-16 ***
## HortExcTAcres -7.989e+02  1.715e+02  -4.659 3.24e-06 ***
## ChemOps        2.410e+02  2.138e+01  11.274  < 2e-16 ***
## FertOps       -1.982e+02  1.907e+01 -10.398  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 146600 on 7087 degrees of freedom
## Multiple R-squared:  0.4291, Adjusted R-squared:  0.4283 
## F-statistic: 532.7 on 10 and 7087 DF,  p-value: < 2.2e-16

Adding FertOps to our model our Adjusted R-squared goes up to 0.4283. It can be added and we can search for an eleventh variable.

m_forward <- lm(HoneyProd ~ HortUndProt + HoneySales + HoneyColSales + ChemExp + HoneyOpsSales + CropOrgSales + HoneyOpsProd + HortExcTAcres + ChemOps + FertOps + FertExp, data = HoneyC_FullModel)
summary(m_forward)
## 
## Call:
## lm(formula = HoneyProd ~ HortUndProt + HoneySales + HoneyColSales + 
##     ChemExp + HoneyOpsSales + CropOrgSales + HoneyOpsProd + HortExcTAcres + 
##     ChemOps + FertOps + FertExp, data = HoneyC_FullModel)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -745040  -49074  -27773    2052 2111088 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    3.171e+04  2.965e+03  10.695  < 2e-16 ***
## HortUndProt    2.426e+03  3.109e+02   7.801 7.03e-15 ***
## HoneySales     4.890e-01  8.575e-03  57.032  < 2e-16 ***
## HoneyColSales  2.621e+02  1.761e+01  14.880  < 2e-16 ***
## ChemExp        2.852e-03  3.763e-04   7.579 3.93e-14 ***
## HoneyOpsSales -7.286e+03  4.800e+02 -15.180  < 2e-16 ***
## CropOrgSales   1.291e-02  1.498e-03   8.620  < 2e-16 ***
## HoneyOpsProd   3.016e+03  3.422e+02   8.814  < 2e-16 ***
## HortExcTAcres -8.431e+02  1.718e+02  -4.909 9.37e-07 ***
## ChemOps        2.505e+02  2.152e+01  11.641  < 2e-16 ***
## FertOps       -1.994e+02  1.905e+01 -10.467  < 2e-16 ***
## FertExp       -1.284e-03  3.537e-04  -3.629 0.000286 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 146500 on 7086 degrees of freedom
## Multiple R-squared:  0.4302, Adjusted R-squared:  0.4293 
## F-statistic: 486.3 on 11 and 7086 DF,  p-value: < 2.2e-16

Adding FertExp to our model our Adjusted R-squared goes up to 0.4283. It can be added and we can search for a twelth variable.

m_forward <- lm(HoneyProd ~ HortUndProt + HoneySales + HoneyColSales + ChemExp + HoneyOpsSales + CropOrgSales + HoneyOpsProd + HortExcTAcres + ChemOps + FertOps + FertExp + HortExcTIrgOps, data = HoneyC_FullModel)
summary(m_forward)
## 
## Call:
## lm(formula = HoneyProd ~ HortUndProt + HoneySales + HoneyColSales + 
##     ChemExp + HoneyOpsSales + CropOrgSales + HoneyOpsProd + HortExcTAcres + 
##     ChemOps + FertOps + FertExp + HortExcTIrgOps, data = HoneyC_FullModel)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -722946  -48756  -27444    2041 2118695 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     3.110e+04  2.969e+03  10.476  < 2e-16 ***
## HortUndProt     1.958e+03  3.397e+02   5.763 8.62e-09 ***
## HoneySales      4.900e-01  8.573e-03  57.154  < 2e-16 ***
## HoneyColSales   2.610e+02  1.760e+01  14.827  < 2e-16 ***
## ChemExp         2.726e-03  3.778e-04   7.216 5.92e-13 ***
## HoneyOpsSales  -7.230e+03  4.799e+02 -15.066  < 2e-16 ***
## CropOrgSales    1.316e-02  1.498e-03   8.784  < 2e-16 ***
## HoneyOpsProd    3.020e+03  3.420e+02   8.830  < 2e-16 ***
## HortExcTAcres  -7.665e+02  1.731e+02  -4.428 9.66e-06 ***
## ChemOps         2.499e+02  2.150e+01  11.620  < 2e-16 ***
## FertOps        -1.994e+02  1.904e+01 -10.475  < 2e-16 ***
## FertExp        -1.210e-03  3.541e-04  -3.418 0.000634 ***
## HortExcTIrgOps  1.493e+01  4.383e+00   3.406 0.000663 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 146400 on 7085 degrees of freedom
## Multiple R-squared:  0.4311, Adjusted R-squared:  0.4301 
## F-statistic: 447.4 on 12 and 7085 DF,  p-value: < 2.2e-16

Adding HortExcTIrgOps to our model our Adjusted R-squared goes up to 0.4301. It can be added and we can search for a thirteenth variable.

m_forward <- lm(HoneyProd ~ HortUndProt + HoneySales + HoneyColSales + ChemExp + HoneyOpsSales + CropOrgSales + HoneyOpsProd + HortExcTAcres + ChemOps + FertOps + FertExp + HortExcTIrgOps + CropSales, data = HoneyC_FullModel)
summary(m_forward)
## 
## Call:
## lm(formula = HoneyProd ~ HortUndProt + HoneySales + HoneyColSales + 
##     ChemExp + HoneyOpsSales + CropOrgSales + HoneyOpsProd + HortExcTAcres + 
##     ChemOps + FertOps + FertExp + HortExcTIrgOps + CropSales, 
##     data = HoneyC_FullModel)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -689935  -49653  -26886    2755 2134137 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    -2.325e+04  2.010e+04  -1.157  0.24745    
## HortUndProt     1.985e+03  3.397e+02   5.844 5.32e-09 ***
## HoneySales      4.899e-01  8.569e-03  57.169  < 2e-16 ***
## HoneyColSales   2.593e+02  1.761e+01  14.731  < 2e-16 ***
## ChemExp         2.960e-03  3.872e-04   7.645 2.37e-14 ***
## HoneyOpsSales  -7.240e+03  4.797e+02 -15.092  < 2e-16 ***
## CropOrgSales    1.318e-02  1.498e-03   8.800  < 2e-16 ***
## HoneyOpsProd    3.003e+03  3.419e+02   8.783  < 2e-16 ***
## HortExcTAcres  -7.908e+02  1.732e+02  -4.565 5.08e-06 ***
## ChemOps         2.389e+02  2.187e+01  10.926  < 2e-16 ***
## FertOps        -1.939e+02  1.914e+01 -10.134  < 2e-16 ***
## FertExp        -1.669e-03  3.917e-04  -4.261 2.06e-05 ***
## HortExcTIrgOps  1.429e+01  4.387e+00   3.258  0.00113 ** 
## CropSales       3.627e+03  1.327e+03   2.734  0.00627 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 146300 on 7084 degrees of freedom
## Multiple R-squared:  0.4317, Adjusted R-squared:  0.4307 
## F-statistic: 413.9 on 13 and 7084 DF,  p-value: < 2.2e-16

Adding CropSales to our model our Adjusted R-squared goes up to 0.4307. It can be added and we can search for a fourthteenth variable.

m_forward <- lm(HoneyProd ~ HortUndProt + HoneySales + HoneyColSales + ChemExp + HoneyOpsSales + CropOrgSales + HoneyOpsProd + HortExcTAcres + ChemOps + FertOps + FertExp + HortExcTIrgOps + CropSales + HortExcTVSTOps, data = HoneyC_FullModel)
summary(m_forward)
## 
## Call:
## lm(formula = HoneyProd ~ HortUndProt + HoneySales + HoneyColSales + 
##     ChemExp + HoneyOpsSales + CropOrgSales + HoneyOpsProd + HortExcTAcres + 
##     ChemOps + FertOps + FertExp + HortExcTIrgOps + CropSales + 
##     HortExcTVSTOps, data = HoneyC_FullModel)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -693663  -49551  -26670    2654 2121307 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    -2.978e+04  2.027e+04  -1.469  0.14180    
## HortUndProt     2.047e+03  3.405e+02   6.011 1.94e-09 ***
## HoneySales      4.909e-01  8.577e-03  57.238  < 2e-16 ***
## HoneyColSales   2.590e+02  1.760e+01  14.717  < 2e-16 ***
## ChemExp         3.024e-03  3.880e-04   7.794 7.41e-15 ***
## HoneyOpsSales  -7.174e+03  4.803e+02 -14.938  < 2e-16 ***
## CropOrgSales    1.375e-02  1.516e-03   9.074  < 2e-16 ***
## HoneyOpsProd    3.237e+03  3.551e+02   9.116  < 2e-16 ***
## HortExcTAcres  -7.172e+02  1.758e+02  -4.079 4.57e-05 ***
## ChemOps         2.377e+02  2.186e+01  10.874  < 2e-16 ***
## FertOps        -1.902e+02  1.919e+01  -9.910  < 2e-16 ***
## FertExp        -1.773e-03  3.939e-04  -4.500 6.89e-06 ***
## HortExcTIrgOps  1.360e+01  4.394e+00   3.094  0.00198 ** 
## CropSales       4.001e+03  1.335e+03   2.997  0.00273 ** 
## HortExcTVSTOps -1.327e+02  5.457e+01  -2.432  0.01504 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 146300 on 7083 degrees of freedom
## Multiple R-squared:  0.4322, Adjusted R-squared:  0.431 
## F-statistic: 385.1 on 14 and 7083 DF,  p-value: < 2.2e-16

Adding HortExcTVSTOps to our model our Adjusted R-squared goes up to 0.431. It can be added and we can search for a fifthteenth variable.

m_forward <- lm(HoneyProd ~ HortUndProt + HoneySales + HoneyColSales + ChemExp + HoneyOpsSales + CropOrgSales + HoneyOpsProd + HortExcTAcres + ChemOps + FertOps + FertExp + HortExcTIrgOps + CropSales + HortExcTVSTOps + HortExcTIrgAcres, data = HoneyC_FullModel)
summary(m_forward)
## 
## Call:
## lm(formula = HoneyProd ~ HortUndProt + HoneySales + HoneyColSales + 
##     ChemExp + HoneyOpsSales + CropOrgSales + HoneyOpsProd + HortExcTAcres + 
##     ChemOps + FertOps + FertExp + HortExcTIrgOps + CropSales + 
##     HortExcTVSTOps + HortExcTIrgAcres, data = HoneyC_FullModel)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -688591  -49407  -26797    2227 2125262 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      -3.072e+04  2.026e+04  -1.516  0.12949    
## HortUndProt       1.754e+03  3.548e+02   4.943 7.86e-07 ***
## HoneySales        4.908e-01  8.573e-03  57.255  < 2e-16 ***
## HoneyColSales     2.605e+02  1.760e+01  14.802  < 2e-16 ***
## ChemExp           3.010e-03  3.878e-04   7.762 9.57e-15 ***
## HoneyOpsSales    -7.115e+03  4.805e+02 -14.809  < 2e-16 ***
## CropOrgSales      1.381e-02  1.515e-03   9.115  < 2e-16 ***
## HoneyOpsProd      3.175e+03  3.555e+02   8.930  < 2e-16 ***
## HortExcTAcres    -4.875e+02  1.926e+02  -2.531  0.01139 *  
## ChemOps           2.369e+02  2.185e+01  10.840  < 2e-16 ***
## FertOps          -1.905e+02  1.918e+01  -9.930  < 2e-16 ***
## FertExp          -1.761e-03  3.937e-04  -4.474 7.80e-06 ***
## HortExcTIrgOps    3.303e+01  7.982e+00   4.137 3.56e-05 ***
## CropSales         4.091e+03  1.335e+03   3.065  0.00218 ** 
## HortExcTVSTOps   -1.264e+02  5.459e+01  -2.315  0.02063 *  
## HortExcTIrgAcres -1.778e+01  6.100e+00  -2.915  0.00357 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 146200 on 7082 degrees of freedom
## Multiple R-squared:  0.4328, Adjusted R-squared:  0.4316 
## F-statistic: 360.3 on 15 and 7082 DF,  p-value: < 2.2e-16

Adding HortExcTIrgAcres to our model our Adjusted R-squared goes up to 0.4316. It can be added and we can search for a sixteenth variable.

m_forward <- lm(HoneyProd ~ HortUndProt + HoneySales + HoneyColSales + ChemExp + HoneyOpsSales + CropOrgSales + HoneyOpsProd + HortExcTAcres + ChemOps + FertOps + FertExp + HortExcTIrgOps + CropSales + HortExcTVSTOps + HortExcTIrgAcres + CropOps, data = HoneyC_FullModel)
summary(m_forward)
## 
## Call:
## lm(formula = HoneyProd ~ HortUndProt + HoneySales + HoneyColSales + 
##     ChemExp + HoneyOpsSales + CropOrgSales + HoneyOpsProd + HortExcTAcres + 
##     ChemOps + FertOps + FertExp + HortExcTIrgOps + CropSales + 
##     HortExcTVSTOps + HortExcTIrgAcres + CropOps, data = HoneyC_FullModel)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -705872  -49287  -26985    2922 2110231 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      -2.037e+04  2.068e+04  -0.985  0.32460    
## HortUndProt       1.769e+03  3.548e+02   4.986 6.30e-07 ***
## HoneySales        4.910e-01  8.570e-03  57.295  < 2e-16 ***
## HoneyColSales     2.593e+02  1.760e+01  14.737  < 2e-16 ***
## ChemExp           3.016e-03  3.876e-04   7.780 8.29e-15 ***
## HoneyOpsSales    -7.170e+03  4.808e+02 -14.914  < 2e-16 ***
## CropOrgSales      1.351e-02  1.519e-03   8.895  < 2e-16 ***
## HoneyOpsProd      3.109e+03  3.564e+02   8.723  < 2e-16 ***
## HortExcTAcres    -4.934e+02  1.925e+02  -2.563  0.01041 *  
## ChemOps           2.132e+02  2.383e+01   8.945  < 2e-16 ***
## FertOps          -2.021e+02  1.973e+01 -10.241  < 2e-16 ***
## FertExp          -1.825e-03  3.944e-04  -4.627 3.77e-06 ***
## HortExcTIrgOps    3.369e+01  7.984e+00   4.220 2.47e-05 ***
## CropSales         3.396e+03  1.363e+03   2.492  0.01274 *  
## HortExcTVSTOps   -1.516e+02  5.550e+01  -2.732  0.00631 ** 
## HortExcTIrgAcres -1.761e+01  6.098e+00  -2.888  0.00389 ** 
## CropOps           4.225e+01  1.696e+01   2.492  0.01273 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 146200 on 7081 degrees of freedom
## Multiple R-squared:  0.4333, Adjusted R-squared:  0.4321 
## F-statistic: 338.4 on 16 and 7081 DF,  p-value: < 2.2e-16

Adding CropOps to our model our Adjusted R-squared goes up to 0.4321. It can be added and we can search for a seventeenth variable.

m_forward <- lm(HoneyProd ~ HortUndProt + HoneySales + HoneyColSales + ChemExp + HoneyOpsSales + CropOrgSales + HoneyOpsProd + HortExcTAcres + ChemOps + FertOps + FertExp + HortExcTIrgOps + CropSales + HortExcTVSTOps + HortExcTIrgAcres + CropOps + HortExcTVSTSales, data = HoneyC_FullModel)
summary(m_forward)
## 
## Call:
## lm(formula = HoneyProd ~ HortUndProt + HoneySales + HoneyColSales + 
##     ChemExp + HoneyOpsSales + CropOrgSales + HoneyOpsProd + HortExcTAcres + 
##     ChemOps + FertOps + FertExp + HortExcTIrgOps + CropSales + 
##     HortExcTVSTOps + HortExcTIrgAcres + CropOps + HortExcTVSTSales, 
##     data = HoneyC_FullModel)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -712524  -49322  -26912    2842 2104209 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      -1.893e+04  2.068e+04  -0.915 0.360125    
## HortUndProt       1.632e+03  3.614e+02   4.516 6.40e-06 ***
## HoneySales        4.913e-01  8.570e-03  57.334  < 2e-16 ***
## HoneyColSales     2.580e+02  1.761e+01  14.654  < 2e-16 ***
## ChemExp           2.940e-03  3.894e-04   7.550 4.92e-14 ***
## HoneyOpsSales    -7.200e+03  4.809e+02 -14.971  < 2e-16 ***
## CropOrgSales      1.271e-02  1.572e-03   8.087 7.12e-16 ***
## HoneyOpsProd      3.206e+03  3.597e+02   8.914  < 2e-16 ***
## HortExcTAcres    -4.177e+02  1.963e+02  -2.128 0.033337 *  
## ChemOps           2.154e+02  2.385e+01   9.030  < 2e-16 ***
## FertOps          -2.032e+02  1.973e+01 -10.296  < 2e-16 ***
## FertExp          -1.821e-03  3.943e-04  -4.619 3.92e-06 ***
## HortExcTIrgOps    3.262e+01  8.001e+00   4.076 4.62e-05 ***
## CropSales         3.293e+03  1.364e+03   2.415 0.015766 *  
## HortExcTVSTOps   -2.420e+02  7.189e+01  -3.367 0.000764 ***
## HortExcTIrgAcres -1.723e+01  6.100e+00  -2.824 0.004759 ** 
## CropOps           4.217e+01  1.695e+01   2.488 0.012886 *  
## HortExcTVSTSales  2.195e-04  1.109e-04   1.978 0.047959 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 146100 on 7080 degrees of freedom
## Multiple R-squared:  0.4337, Adjusted R-squared:  0.4323 
## F-statistic: 318.9 on 17 and 7080 DF,  p-value: < 2.2e-16

Adding HortExcTVSTSales to our model our Adjusted R-squared goes up to 0.4323. It can be added and we finish our final forward selection model with an Adjusted R-Squared of 0.4323.

Final Model Selection

It turns out, our final model is the same for backward elimination and forward selection. This isn’t exactly a surprise, but it gives us some understanding of what factors relate to one another.

m_final <- lm(HoneyProd ~ HortUndProt + HoneySales + HoneyColSales + ChemExp + HoneyOpsSales + CropOrgSales + HoneyOpsProd + HortExcTAcres + ChemOps + FertOps + FertExp + HortExcTIrgOps + CropSales + HortExcTVSTOps + HortExcTIrgAcres + CropOps + HortExcTVSTSales, data = HoneyC_FullModel)
summary(m_final)
## 
## Call:
## lm(formula = HoneyProd ~ HortUndProt + HoneySales + HoneyColSales + 
##     ChemExp + HoneyOpsSales + CropOrgSales + HoneyOpsProd + HortExcTAcres + 
##     ChemOps + FertOps + FertExp + HortExcTIrgOps + CropSales + 
##     HortExcTVSTOps + HortExcTIrgAcres + CropOps + HortExcTVSTSales, 
##     data = HoneyC_FullModel)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -712524  -49322  -26912    2842 2104209 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      -1.893e+04  2.068e+04  -0.915 0.360125    
## HortUndProt       1.632e+03  3.614e+02   4.516 6.40e-06 ***
## HoneySales        4.913e-01  8.570e-03  57.334  < 2e-16 ***
## HoneyColSales     2.580e+02  1.761e+01  14.654  < 2e-16 ***
## ChemExp           2.940e-03  3.894e-04   7.550 4.92e-14 ***
## HoneyOpsSales    -7.200e+03  4.809e+02 -14.971  < 2e-16 ***
## CropOrgSales      1.271e-02  1.572e-03   8.087 7.12e-16 ***
## HoneyOpsProd      3.206e+03  3.597e+02   8.914  < 2e-16 ***
## HortExcTAcres    -4.177e+02  1.963e+02  -2.128 0.033337 *  
## ChemOps           2.154e+02  2.385e+01   9.030  < 2e-16 ***
## FertOps          -2.032e+02  1.973e+01 -10.296  < 2e-16 ***
## FertExp          -1.821e-03  3.943e-04  -4.619 3.92e-06 ***
## HortExcTIrgOps    3.262e+01  8.001e+00   4.076 4.62e-05 ***
## CropSales         3.293e+03  1.364e+03   2.415 0.015766 *  
## HortExcTVSTOps   -2.420e+02  7.189e+01  -3.367 0.000764 ***
## HortExcTIrgAcres -1.723e+01  6.100e+00  -2.824 0.004759 ** 
## CropOps           4.217e+01  1.695e+01   2.488 0.012886 *  
## HortExcTVSTSales  2.195e-04  1.109e-04   1.978 0.047959 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 146100 on 7080 degrees of freedom
## Multiple R-squared:  0.4337, Adjusted R-squared:  0.4323 
## F-statistic: 318.9 on 17 and 7080 DF,  p-value: < 2.2e-16