Authors: Ajay Arora, Romerl Elizes

Date: 10/31/2021

I. Purpose

In this initial experiment, we compare the Recursive function against the Step function for 5 Generalized Linear Model families: Gaussian, Poisson, Gamma, Inverse Gaussian, and Binomial. We conduct only one iteration of each exercise for this Midterm. The purpose of this experiment is to demonstrate that the exercise can be executed on this initial data set. The data set is a well-known baseball data set called moneyball. Each record describes a team and its baseball statistics for a particular year. After preparation, the data set contains 2,276 records and 14 columns (13 independent variables plus the target). The target variable is TARGET_WINS.

Summary of Initial Experiments

A table of the results may be found in Section V, Tabular Results, at the bottom of this document.

  • The data set has only 13 independent variables to work with. The overall goal of the Final Project is to work with data sets that have at least 50 independent variables.
  • The Recursive function reduced the number of viable variables to 7 in the Gaussian, Gamma, and Inverse Gaussian models.
  • The Recursive implementation of Binomial was able to arrive at 10 optimal variables with 2 Recursive calls.
  • The Recursive implementation of Poisson arrived at 9 optimal variables with 2 Recursive calls.
  • The Recursive function needed at most 3 calls on any model to arrive at its optimal number of variables. Even though the Step function was competitive or better in terms of step calls for most models, it retained at least 10 variables in every model, whereas the Recursive function reduced three models to 7. The Step function reached 11 viable variables with the Gaussian and Poisson models.
  • One Step model, Inverse Gaussian, made no Step calls at all and kept all 13 variables.
  • The Binomial Step model ran only one Step iteration and kept 12 variables; the Gamma Step model ran 3 iterations and kept 10.
  • The calculated McFadden R2 is slightly better for the Step function than for the Recursive function in all 5 models.
  • The AIC is slightly better (lower) for the Step function in 4 of the 5 models; for Inverse Gaussian, the Recursive function has the lower AIC (20356.617 vs. 20362.598).
  • The BIC is slightly better (lower) for the Recursive function than for the Step function in all 5 models.
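
The opposite rankings under AIC and BIC follow from how each criterion penalizes model size. The sketch below, written in Python purely for illustration, assumes the standard definitions AIC = 2k - 2 ln(L) and BIC = k ln(n) - 2 ln(L), which give BIC = AIC + k (ln(n) - 2). The function name `bic_from_aic` and the parameter counts (predictors, plus the intercept, plus one residual-variance parameter for the Gaussian model) are assumptions made for this example and are not taken from the authors' code. The AIC values are the Gaussian entries of the results table in Section V.

```python
import math

def bic_from_aic(aic: float, k: int, n: int) -> float:
    """Convert AIC to BIC for a model with k estimated parameters and n records."""
    return aic + k * (math.log(n) - 2.0)

N_RECORDS = 2276  # rows in the prepared data set

# Gaussian models: (AIC from the results table, assumed parameter count k).
# Assumption: k = predictors + intercept + residual variance.
step_aic, step_k = 18030.032, 11 + 2
recr_aic, recr_k = 18037.613, 7 + 2

step_bic = bic_from_aic(step_aic, step_k, N_RECORDS)
recr_bic = bic_from_aic(recr_aic, recr_k, N_RECORDS)

print(f"Step      AIC={step_aic:.3f}  BIC={step_bic:.3f}")
print(f"Recursive AIC={recr_aic:.3f}  BIC={recr_bic:.3f}")
```

With 2,276 records, each additional parameter costs about 7.73 under BIC but only 2 under AIC, so the smaller Recursive model can lose on AIC and still win on BIC.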

Based on these findings, we will proceed with additional experiments on 4 other data sets, increasing to 50 variables for the final data set. The results of these initial experiments indicate that our Recursive function is competitive with the Step function for generalized linear models; the remaining experiments are intended to validate these findings.

II. Data Preparation

This section covers the data preparation activities needed for this experiment.

B. Remove Unneeded Variables

We drop the variables that are not needed for this exercise.

C. Impute missing values.

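We impute any missing values in the data set. The log below shows 5 iterations of 5 imputations each for the five variables with missing values, followed by a summary of the imputed data and its dimensions.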

## 
##  iter imp variable
##   1   1  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   1   2  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   1   3  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   1   4  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   1   5  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   2   1  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   2   2  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   2   3  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   2   4  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   2   5  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   3   1  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   3   2  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   3   3  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   3   4  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   3   5  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   4   1  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   4   2  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   4   3  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   4   4  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   4   5  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   5   1  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   5   2  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   5   3  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   5   4  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   5   5  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   TARGET_WINS     TEAM_BATTING_H TEAM_BATTING_2B TEAM_BATTING_3B 
##  Min.   :  0.00   Min.   : 891   Min.   : 69.0   Min.   :  0.00  
##  1st Qu.: 71.00   1st Qu.:1383   1st Qu.:208.0   1st Qu.: 34.00  
##  Median : 82.00   Median :1454   Median :238.0   Median : 47.00  
##  Mean   : 80.79   Mean   :1469   Mean   :241.2   Mean   : 55.25  
##  3rd Qu.: 92.00   3rd Qu.:1537   3rd Qu.:273.0   3rd Qu.: 72.00  
##  Max.   :146.00   Max.   :2554   Max.   :458.0   Max.   :223.00  
##  TEAM_BATTING_HR  TEAM_BATTING_BB TEAM_BATTING_SO  TEAM_BASERUN_SB
##  Min.   :  0.00   Min.   :  0.0   Min.   :   0.0   Min.   :  0.0  
##  1st Qu.: 42.00   1st Qu.:451.0   1st Qu.: 546.0   1st Qu.: 67.0  
##  Median :102.00   Median :512.0   Median : 735.0   Median :106.0  
##  Mean   : 99.61   Mean   :501.6   Mean   : 728.7   Mean   :135.4  
##  3rd Qu.:147.00   3rd Qu.:580.0   3rd Qu.: 925.0   3rd Qu.:170.0  
##  Max.   :264.00   Max.   :878.0   Max.   :1399.0   Max.   :697.0  
##  TEAM_BASERUN_CS  TEAM_PITCHING_H TEAM_PITCHING_BB TEAM_PITCHING_SO 
##  Min.   :  0.00   Min.   : 1137   Min.   :   0.0   Min.   :    0.0  
##  1st Qu.: 42.00   1st Qu.: 1419   1st Qu.: 476.0   1st Qu.:  611.0  
##  Median : 56.00   Median : 1518   Median : 536.5   Median :  805.0  
##  Mean   : 74.06   Mean   : 1779   Mean   : 553.0   Mean   :  811.3  
##  3rd Qu.: 85.25   3rd Qu.: 1682   3rd Qu.: 611.0   3rd Qu.:  958.0  
##  Max.   :201.00   Max.   :30132   Max.   :3645.0   Max.   :19278.0  
##  TEAM_FIELDING_E  TEAM_FIELDING_DP
##  Min.   :  65.0   Min.   : 52.0   
##  1st Qu.: 127.0   1st Qu.:125.0   
##  Median : 159.0   Median :146.0   
##  Mean   : 246.5   Mean   :141.6   
##  3rd Qu.: 249.2   3rd Qu.:162.0   
##  Max.   :1898.0   Max.   :228.0
## [1] 2276   14

III. Defining Recursive Functions

IV. Run Experiments

A. Run glm Gaussian Experiment

## [1] "Step Model"
## 
## Call:
## lm(formula = TARGET_WINS ~ TEAM_BATTING_H + TEAM_BATTING_2B + 
##     TEAM_BATTING_3B + TEAM_BATTING_HR + TEAM_BATTING_BB + TEAM_BATTING_SO + 
##     TEAM_BASERUN_SB + TEAM_PITCHING_H + TEAM_PITCHING_SO + TEAM_FIELDING_E + 
##     TEAM_FIELDING_DP, data = bb_train_imputed)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -49.971  -8.518   0.188   8.272  47.657 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      34.8023987  5.0821245   6.848 9.61e-12 ***
## TEAM_BATTING_H    0.0429297  0.0035710  12.022  < 2e-16 ***
## TEAM_BATTING_2B  -0.0189732  0.0088853  -2.135 0.032841 *  
## TEAM_BATTING_3B   0.0254420  0.0162434   1.566 0.117418    
## TEAM_BATTING_HR   0.0820832  0.0093910   8.741  < 2e-16 ***
## TEAM_BATTING_BB   0.0068444  0.0030684   2.231 0.025804 *  
## TEAM_BATTING_SO  -0.0158206  0.0024253  -6.523 8.46e-11 ***
## TEAM_BASERUN_SB   0.0544231  0.0043586  12.486  < 2e-16 ***
## TEAM_PITCHING_H   0.0011557  0.0003371   3.428 0.000619 ***
## TEAM_PITCHING_SO  0.0012394  0.0006649   1.864 0.062442 .  
## TEAM_FIELDING_E  -0.0414281  0.0026877 -15.414  < 2e-16 ***
## TEAM_FIELDING_DP -0.1004216  0.0126872  -7.915 3.83e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 12.67 on 2264 degrees of freedom
## Multiple R-squared:  0.3566, Adjusted R-squared:  0.3535 
## F-statistic: 114.1 on 11 and 2264 DF,  p-value: < 2.2e-16
## [1] "Recursive Model"
## 
## Call:
## lm(formula = eval(parse(text = model1)), data = datainput)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -53.327  -8.561   0.387   8.416  49.196 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      42.1426399  4.5049030   9.355  < 2e-16 ***
## TEAM_BATTING_H    0.0390702  0.0024869  15.710  < 2e-16 ***
## TEAM_BATTING_HR   0.0814626  0.0086417   9.427  < 2e-16 ***
## TEAM_BATTING_SO  -0.0170569  0.0021410  -7.967 2.55e-15 ***
## TEAM_BASERUN_SB   0.0600712  0.0040554  14.813  < 2e-16 ***
## TEAM_PITCHING_H   0.0013156  0.0002949   4.462 8.53e-06 ***
## TEAM_FIELDING_E  -0.0437100  0.0024305 -17.984  < 2e-16 ***
## TEAM_FIELDING_DP -0.0999059  0.0126179  -7.918 3.75e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 12.7 on 2268 degrees of freedom
## Multiple R-squared:  0.3522, Adjusted R-squared:  0.3502 
## F-statistic: 176.1 on 7 and 2268 DF,  p-value: < 2.2e-16
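
Because the two Gaussian fits keep different numbers of predictors (11 vs. 7), the adjusted R-squared and F-statistic lines in the summaries above are a fairer basis for comparison than the raw R-squared. The Python sketch below applies the textbook formulas to the R-squared values reported for these two models; it is an illustration only, and the function names are assumptions rather than part of the authors' code.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R-squared for n records and p predictors (intercept excluded)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def f_statistic(r2: float, n: int, p: int) -> float:
    """Overall F-statistic of a linear model that includes an intercept."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))

N_RECORDS = 2276  # rows in the prepared data set

# (R-squared, number of predictors) for the two Gaussian fits.
models = {"Step": (0.3566129, 11), "Recursive": (0.3521932, 7)}

results = {
    name: (adjusted_r2(r2, N_RECORDS, p), f_statistic(r2, N_RECORDS, p))
    for name, (r2, p) in models.items()
}
for name, (adj, f) in results.items():
    print(f"{name:9s} adjusted R2={adj:.4f}  F={f:.1f}")
```

The gap between the two models shrinks from about 0.0044 in R-squared to about 0.0033 in adjusted R-squared, since the Recursive model uses four fewer predictors.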

B. Run glm Poisson Experiment

## [1] "Step Model"
## 
## Call:
## glm(formula = TARGET_WINS ~ TEAM_BATTING_H + TEAM_BATTING_2B + 
##     TEAM_BATTING_3B + TEAM_BATTING_HR + TEAM_BATTING_BB + TEAM_BATTING_SO + 
##     TEAM_BASERUN_SB + TEAM_PITCHING_H + TEAM_PITCHING_SO + TEAM_FIELDING_E + 
##     TEAM_FIELDING_DP, family = poisson, data = bb_train_imputed)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -6.8744  -0.9603   0.0172   0.9167   5.0635  
## 
## Coefficients:
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)       3.783e+00  4.594e-02  82.356   <2e-16 ***
## TEAM_BATTING_H    5.766e-04  3.252e-05  17.729   <2e-16 ***
## TEAM_BATTING_2B  -2.782e-04  7.857e-05  -3.540   0.0004 ***
## TEAM_BATTING_3B   3.326e-04  1.432e-04   2.323   0.0202 *  
## TEAM_BATTING_HR   9.565e-04  8.244e-05  11.602   <2e-16 ***
## TEAM_BATTING_BB   6.842e-05  2.689e-05   2.544   0.0110 *  
## TEAM_BATTING_SO  -1.933e-04  2.174e-05  -8.892   <2e-16 ***
## TEAM_BASERUN_SB   6.746e-04  3.804e-05  17.736   <2e-16 ***
## TEAM_PITCHING_H   6.204e-06  3.611e-06   1.718   0.0858 .  
## TEAM_PITCHING_SO  2.609e-05  6.709e-06   3.890   0.0001 ***
## TEAM_FIELDING_E  -5.549e-04  2.554e-05 -21.728   <2e-16 ***
## TEAM_FIELDING_DP -1.218e-03  1.120e-04 -10.875   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 7442.7  on 2275  degrees of freedom
## Residual deviance: 4874.1  on 2264  degrees of freedom
## AIC: 19027
## 
## Number of Fisher Scoring iterations: 4
## [1] "Recursive Model"
## 
## Call:
## glm(formula = eval(parse(text = model1)), family = poisson, data = datainput)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -6.7319  -0.9660   0.0175   0.9417   4.9917  
## 
## Coefficients:
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)       3.804e+00  4.172e-02  91.167  < 2e-16 ***
## TEAM_BATTING_H    5.841e-04  3.125e-05  18.690  < 2e-16 ***
## TEAM_BATTING_2B  -2.702e-04  7.837e-05  -3.448 0.000565 ***
## TEAM_BATTING_3B   3.495e-04  1.414e-04   2.472 0.013452 *  
## TEAM_BATTING_HR   1.012e-03  7.904e-05  12.800  < 2e-16 ***
## TEAM_BATTING_SO  -2.051e-04  2.139e-05  -9.587  < 2e-16 ***
## TEAM_BASERUN_SB   6.771e-04  3.615e-05  18.731  < 2e-16 ***
## TEAM_PITCHING_SO  3.174e-05  5.654e-06   5.613 1.99e-08 ***
## TEAM_FIELDING_E  -5.528e-04  1.834e-05 -30.148  < 2e-16 ***
## TEAM_FIELDING_DP -1.158e-03  1.092e-04 -10.600  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 7442.7  on 2275  degrees of freedom
## Residual deviance: 4883.1  on 2266  degrees of freedom
## AIC: 19032
## 
## Number of Fisher Scoring iterations: 4

C. Run glm Gamma Experiment

## [1] "Step Model"
## 
## Call:
## glm(formula = TARGET_WINS ~ TEAM_BATTING_H + TEAM_BATTING_2B + 
##     TEAM_BATTING_3B + TEAM_BATTING_HR + TEAM_BATTING_SO + TEAM_BASERUN_SB + 
##     TEAM_PITCHING_BB + TEAM_PITCHING_SO + TEAM_FIELDING_E + TEAM_FIELDING_DP, 
##     family = Gamma, data = bb_train_imputedG)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -2.26225  -0.11048   0.00294   0.10193   0.54108  
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       1.927e-02  7.658e-04  25.168  < 2e-16 ***
## TEAM_BATTING_H   -6.903e-06  5.603e-07 -12.319  < 2e-16 ***
## TEAM_BATTING_2B   3.333e-06  1.436e-06   2.321   0.0204 *  
## TEAM_BATTING_3B  -4.162e-06  2.576e-06  -1.615   0.1064    
## TEAM_BATTING_HR  -1.115e-05  1.470e-06  -7.587 4.75e-14 ***
## TEAM_BATTING_SO   2.217e-06  4.126e-07   5.374 8.49e-08 ***
## TEAM_BASERUN_SB  -7.883e-06  6.494e-07 -12.138  < 2e-16 ***
## TEAM_PITCHING_BB -5.325e-07  3.313e-07  -1.607   0.1081    
## TEAM_PITCHING_SO -2.820e-07  1.343e-07  -2.100   0.0359 *  
## TEAM_FIELDING_E   7.104e-06  3.584e-07  19.822  < 2e-16 ***
## TEAM_FIELDING_DP  1.432e-05  2.017e-06   7.096 1.71e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Gamma family taken to be 0.02803698)
## 
##     Null deviance: 103.169  on 2275  degrees of freedom
## Residual deviance:  71.785  on 2265  degrees of freedom
## AIC: 18573
## 
## Number of Fisher Scoring iterations: 5
## [1] 0.3041924
## [1] "Recursive Model"
## 
## Call:
## glm(formula = eval(parse(text = model1)), family = Gamma, data = datainput)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -2.27407  -0.11196   0.00168   0.10282   0.82901  
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       1.909e-02  7.605e-04  25.094  < 2e-16 ***
## TEAM_BATTING_H   -7.016e-06  5.311e-07 -13.211  < 2e-16 ***
## TEAM_BATTING_2B   2.881e-06  1.427e-06   2.019   0.0436 *  
## TEAM_BATTING_HR  -1.070e-05  1.392e-06  -7.688 2.22e-14 ***
## TEAM_BATTING_SO   2.068e-06  3.677e-07   5.624 2.10e-08 ***
## TEAM_BASERUN_SB  -8.466e-06  6.154e-07 -13.758  < 2e-16 ***
## TEAM_FIELDING_E   6.931e-06  3.422e-07  20.256  < 2e-16 ***
## TEAM_FIELDING_DP  1.356e-05  1.985e-06   6.832 1.08e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Gamma family taken to be 0.0282547)
## 
##     Null deviance: 103.17  on 2275  degrees of freedom
## Residual deviance:  72.22  on 2268  degrees of freedom
## AIC: 18580
## 
## Number of Fisher Scoring iterations: 5

D. Run glm Inverse Gaussian Experiment

## [1] "Step Model"
## 
## Call:
## glm(formula = TARGET_WINS ~ ., family = inverse.gaussian, data = bb_train_imputedI)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -0.97418  -0.01251   0.00066   0.01155   0.06592  
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       3.024e-04  2.091e-05  14.465  < 2e-16 ***
## TEAM_BATTING_H   -1.581e-07  1.470e-08 -10.753  < 2e-16 ***
## TEAM_BATTING_2B   6.425e-08  3.658e-08   1.756   0.0792 .  
## TEAM_BATTING_3B  -5.368e-08  6.331e-08  -0.848   0.3966    
## TEAM_BATTING_HR  -2.700e-07  3.843e-08  -7.027 2.78e-12 ***
## TEAM_BATTING_BB   3.579e-08  2.399e-08   1.492   0.1358    
## TEAM_BATTING_SO   5.328e-08  1.085e-08   4.912 9.67e-07 ***
## TEAM_BASERUN_SB  -1.568e-07  1.798e-08  -8.717  < 2e-16 ***
## TEAM_BASERUN_CS  -1.711e-08  4.259e-08  -0.402   0.6880    
## TEAM_PITCHING_H   6.518e-09  2.691e-09   2.423   0.0155 *  
## TEAM_PITCHING_BB -3.735e-08  1.712e-08  -2.182   0.0292 *  
## TEAM_PITCHING_SO -6.924e-09  4.835e-09  -1.432   0.1523    
## TEAM_FIELDING_E   1.543e-07  1.219e-08  12.654  < 2e-16 ***
## TEAM_FIELDING_DP  2.968e-07  5.333e-08   5.565 2.94e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for inverse.gaussian family taken to be 0.0003897628)
## 
##     Null deviance: 2.3540  on 2275  degrees of freedom
## Residual deviance: 1.9764  on 2262  degrees of freedom
## AIC: 20363
## 
## Number of Fisher Scoring iterations: 8
## [1] "Recursive Model"
## 
## Call:
## glm(formula = eval(parse(text = model1)), family = inverse.gaussian, 
##     data = datainput)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -0.97705  -0.01238   0.00059   0.01148   0.06833  
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       2.837e-04  1.770e-05  16.030  < 2e-16 ***
## TEAM_BATTING_H   -1.339e-07  8.988e-09 -14.902  < 2e-16 ***
## TEAM_BATTING_HR  -2.459e-07  3.550e-08  -6.927 5.57e-12 ***
## TEAM_BATTING_SO   5.316e-08  8.958e-09   5.935 3.39e-09 ***
## TEAM_BASERUN_SB  -1.765e-07  1.401e-08 -12.602  < 2e-16 ***
## TEAM_PITCHING_BB -1.750e-08  7.469e-09  -2.342   0.0192 *  
## TEAM_FIELDING_E   1.629e-07  8.333e-09  19.554  < 2e-16 ***
## TEAM_FIELDING_DP  3.358e-07  5.071e-08   6.622 4.42e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for inverse.gaussian family taken to be 0.0003861428)
## 
##     Null deviance: 2.3540  on 2275  degrees of freedom
## Residual deviance: 1.9816  on 2268  degrees of freedom
## AIC: 20357
## 
## Number of Fisher Scoring iterations: 5

E. Run glm Binomial Experiment

## [1] "Step Model"
## 
## Call:
## glm(formula = BI_TARGET_WINS ~ TEAM_BATTING_H + TEAM_BATTING_2B + 
##     TEAM_BATTING_3B + TEAM_BATTING_HR + TEAM_BATTING_BB + TEAM_BATTING_SO + 
##     TEAM_BASERUN_SB + TEAM_PITCHING_H + TEAM_PITCHING_BB + TEAM_PITCHING_SO + 
##     TEAM_FIELDING_E + TEAM_FIELDING_DP, family = binomial, data = bb_train_imputedB)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.0646  -0.9933   0.3852   0.9480   3.0153  
## 
## Coefficients:
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      -5.222e+00  1.022e+00  -5.107 3.28e-07 ***
## TEAM_BATTING_H    4.295e-03  7.484e-04   5.739 9.55e-09 ***
## TEAM_BATTING_2B  -2.502e-03  1.637e-03  -1.528 0.126490    
## TEAM_BATTING_3B   1.080e-02  3.127e-03   3.455 0.000551 ***
## TEAM_BATTING_HR   1.462e-02  1.778e-03   8.222  < 2e-16 ***
## TEAM_BATTING_BB   4.905e-03  1.148e-03   4.274 1.92e-05 ***
## TEAM_BATTING_SO  -3.221e-03  4.849e-04  -6.642 3.09e-11 ***
## TEAM_BASERUN_SB   8.325e-03  9.360e-04   8.894  < 2e-16 ***
## TEAM_PITCHING_H   4.361e-04  8.472e-05   5.148 2.63e-07 ***
## TEAM_PITCHING_BB -2.608e-03  8.872e-04  -2.939 0.003288 ** 
## TEAM_PITCHING_SO  4.860e-04  1.950e-04   2.492 0.012692 *  
## TEAM_FIELDING_E  -5.527e-03  6.206e-04  -8.905  < 2e-16 ***
## TEAM_FIELDING_DP -1.413e-02  2.318e-03  -6.094 1.10e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 3147.8  on 2275  degrees of freedom
## Residual deviance: 2601.0  on 2263  degrees of freedom
## AIC: 2627
## 
## Number of Fisher Scoring iterations: 5
## [1] "Recursive Model"
## 
## Call:
## glm(formula = eval(parse(text = model1)), family = binomial, 
##     data = datainput)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.0662  -0.9898   0.3880   0.9533   2.9819  
## 
## Coefficients:
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      -4.620e+00  9.336e-01  -4.949 7.47e-07 ***
## TEAM_BATTING_H    3.537e-03  5.487e-04   6.447 1.14e-10 ***
## TEAM_BATTING_3B   1.120e-02  3.113e-03   3.598 0.000321 ***
## TEAM_BATTING_HR   1.478e-02  1.772e-03   8.343  < 2e-16 ***
## TEAM_BATTING_BB   4.846e-03  1.142e-03   4.245 2.18e-05 ***
## TEAM_BATTING_SO  -3.374e-03  4.723e-04  -7.144 9.05e-13 ***
## TEAM_BASERUN_SB   8.331e-03  9.340e-04   8.920  < 2e-16 ***
## TEAM_PITCHING_H   4.374e-04  8.379e-05   5.220 1.79e-07 ***
## TEAM_PITCHING_BB -2.603e-03  8.818e-04  -2.952 0.003160 ** 
## TEAM_PITCHING_SO  4.551e-04  1.918e-04   2.373 0.017651 *  
## TEAM_FIELDING_E  -5.381e-03  6.104e-04  -8.816  < 2e-16 ***
## TEAM_FIELDING_DP -1.417e-02  2.317e-03  -6.116 9.58e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 3147.8  on 2275  degrees of freedom
## Residual deviance: 2603.3  on 2264  degrees of freedom
## AIC: 2627.3
## 
## Number of Fisher Scoring iterations: 5

V. Tabular Results

| Model | Variables | STEPR2 | STEPFit | STEPSkew | STEPAIC | STEPBIC | STEPCalls | STEPVariables | RECRR2 | RECRFit | RECRSkew | RECRAIC | RECRBIC | RECRCalls | RECRVariables |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Gaussian | 13 | 0.3566129 | 114.0797198 | -0.0311825 | 18030.032 | 18104.524 | 2 | 11 | 0.3521932 | 176.1491024 | -0.0206568 | 18037.613 | 18089.185 | 3 | 7 |
| Poisson | 13 | 0.3451182 | 0.0000000 | -0.0923911 | 19027.121 | 19095.883 | 2 | 11 | 0.3439106 | 0.0000000 | -0.1112038 | 19032.109 | 19089.410 | 2 | 9 |
| Gamma | 13 | 0.3041924 | 1.0000000 | 1.1704365 | 18572.586 | 18641.349 | 3 | 10 | 0.2999763 | 1.0000000 | 0.8365347 | 18580.408 | 18631.980 | 2 | 7 |
| Inverse Gaussian | 13 | 0.1604279 | 1.0000000 | 4.0345558 | 20362.598 | 20448.550 | 0 | 13 | 0.1582047 | 1.0000000 | 2.6080747 | 20356.617 | 20408.188 | 3 | 7 |
| Binomial | 13 | 0.1737008 | 0.0000008 | -2.9224244 | 2627.005 | 2701.498 | 1 | 12 | 0.1729562 | 0.0000007 | -4.9740247 | 2627.349 | 2696.111 | 2 | 10 |
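
The STEPR2 values for the non-Gaussian models can be reproduced, up to rounding, from the deviances printed in Section IV as 1 - residual deviance / null deviance. The Python sketch below illustrates that calculation; the function name `deviance_pseudo_r2` is an assumption made for this example, not the authors' code. Note that this deviance-based form coincides with McFadden's log-likelihood definition only when the deviance equals -2 ln(L), as it does for a binary Binomial response; for the other families it is better read as a deviance-based pseudo R2.

```python
def deviance_pseudo_r2(null_deviance: float, residual_deviance: float) -> float:
    """Share of the null deviance that the fitted model explains."""
    return 1.0 - residual_deviance / null_deviance

# (null deviance, residual deviance) as printed in the Step model summaries.
step_deviances = {
    "Poisson": (7442.7, 4874.1),
    "Gamma": (103.169, 71.785),
    "Inverse Gaussian": (2.3540, 1.9764),
    "Binomial": (3147.8, 2601.0),
}

pseudo_r2 = {
    family: deviance_pseudo_r2(null_dev, resid_dev)
    for family, (null_dev, resid_dev) in step_deviances.items()
}
for family, value in pseudo_r2.items():
    print(f"{family:17s} pseudo R2 = {value:.4f}")
```

Small differences from the table in the fifth decimal place are expected, because the printed deviances are themselves rounded.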