In this initial experiment, we compare the Recursive function against the Step function for five generalized linear model (GLM) families: Gaussian, Poisson, Gamma, Inverse Gaussian, and Binomial. We conduct only one iteration of each exercise for this midterm. The purpose of the experiment is to demonstrate that the exercise can be executed easily on this initial data set. The data set used is a well-known baseball data set called moneyball. Each record describes a team and its baseball statistics for a particular year. The data set contains over 2,000 records (2,276 rows and 14 columns after preparation). The target variable is TARGET_WINS.
A table of the results may be found in the Conclusion section at the bottom of this document.
These findings indicate that we should proceed with additional experiments on 4 other data sets, increasing to 50 variables for the final data set. The results of this initial experiment indicate that our Recursive function is competitive with the Step function for generalized linear models, and that the remaining experiments are needed to validate these findings.
This section covers the data preparation activities needed for this experiment.
In this subsection, we read the data from the CSV file and define the global variables needed for this exercise.
bb_train <- read.table(file="moneyball-training-data.csv", header = TRUE, sep = ",")
target_var = "TARGET_WINS"
cand_variables = c()
totalruns = 1
recursivecalls = 0
linearModel = ""
numBegVariables = 0
model1R2 = 0
model1Fstat = 0
model1skew = 0
model1AIC = 0
model1BIC = 0
model1Variables = 0
model2R2 = 0
model2Fstat = 0
model2skew = 0
model2AIC = 0
model2BIC = 0
model2Variables = 0
steptime = 0
recursivetime = 0
We get rid of some variables as they will not be needed for this exercise.
We impute any missing values in the data set.
##
## iter imp variable
## 1 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_PITCHING_SO TEAM_FIELDING_DP
## 1 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_PITCHING_SO TEAM_FIELDING_DP
## 1 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_PITCHING_SO TEAM_FIELDING_DP
## 1 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_PITCHING_SO TEAM_FIELDING_DP
## 1 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_PITCHING_SO TEAM_FIELDING_DP
## 2 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_PITCHING_SO TEAM_FIELDING_DP
## 2 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_PITCHING_SO TEAM_FIELDING_DP
## 2 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_PITCHING_SO TEAM_FIELDING_DP
## 2 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_PITCHING_SO TEAM_FIELDING_DP
## 2 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_PITCHING_SO TEAM_FIELDING_DP
## 3 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_PITCHING_SO TEAM_FIELDING_DP
## 3 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_PITCHING_SO TEAM_FIELDING_DP
## 3 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_PITCHING_SO TEAM_FIELDING_DP
## 3 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_PITCHING_SO TEAM_FIELDING_DP
## 3 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_PITCHING_SO TEAM_FIELDING_DP
## 4 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_PITCHING_SO TEAM_FIELDING_DP
## 4 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_PITCHING_SO TEAM_FIELDING_DP
## 4 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_PITCHING_SO TEAM_FIELDING_DP
## 4 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_PITCHING_SO TEAM_FIELDING_DP
## 4 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_PITCHING_SO TEAM_FIELDING_DP
## 5 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_PITCHING_SO TEAM_FIELDING_DP
## 5 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_PITCHING_SO TEAM_FIELDING_DP
## 5 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_PITCHING_SO TEAM_FIELDING_DP
## 5 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_PITCHING_SO TEAM_FIELDING_DP
## 5 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_PITCHING_SO TEAM_FIELDING_DP
## TARGET_WINS TEAM_BATTING_H TEAM_BATTING_2B TEAM_BATTING_3B
## Min. : 0.00 Min. : 891 Min. : 69.0 Min. : 0.00
## 1st Qu.: 71.00 1st Qu.:1383 1st Qu.:208.0 1st Qu.: 34.00
## Median : 82.00 Median :1454 Median :238.0 Median : 47.00
## Mean : 80.79 Mean :1469 Mean :241.2 Mean : 55.25
## 3rd Qu.: 92.00 3rd Qu.:1537 3rd Qu.:273.0 3rd Qu.: 72.00
## Max. :146.00 Max. :2554 Max. :458.0 Max. :223.00
## TEAM_BATTING_HR TEAM_BATTING_BB TEAM_BATTING_SO TEAM_BASERUN_SB
## Min. : 0.00 Min. : 0.0 Min. : 0.0 Min. : 0.0
## 1st Qu.: 42.00 1st Qu.:451.0 1st Qu.: 546.0 1st Qu.: 67.0
## Median :102.00 Median :512.0 Median : 735.0 Median :106.0
## Mean : 99.61 Mean :501.6 Mean : 728.7 Mean :135.4
## 3rd Qu.:147.00 3rd Qu.:580.0 3rd Qu.: 925.0 3rd Qu.:170.0
## Max. :264.00 Max. :878.0 Max. :1399.0 Max. :697.0
## TEAM_BASERUN_CS TEAM_PITCHING_H TEAM_PITCHING_BB TEAM_PITCHING_SO
## Min. : 0.00 Min. : 1137 Min. : 0.0 Min. : 0.0
## 1st Qu.: 42.00 1st Qu.: 1419 1st Qu.: 476.0 1st Qu.: 611.0
## Median : 56.00 Median : 1518 Median : 536.5 Median : 805.0
## Mean : 74.06 Mean : 1779 Mean : 553.0 Mean : 811.3
## 3rd Qu.: 85.25 3rd Qu.: 1682 3rd Qu.: 611.0 3rd Qu.: 958.0
## Max. :201.00 Max. :30132 Max. :3645.0 Max. :19278.0
## TEAM_FIELDING_E TEAM_FIELDING_DP
## Min. : 65.0 Min. : 52.0
## 1st Qu.: 127.0 1st Qu.:125.0
## Median : 159.0 Median :146.0
## Mean : 246.5 Mean :141.6
## 3rd Qu.: 249.2 3rd Qu.:162.0
## Max. :1898.0 Max. :228.0
## [1] 2276 14
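The code for the two preparation steps above (dropping unused variables, then imputing missing values) is not shown in this document. The following is a minimal sketch of the idea on a toy data frame, not the procedure actually used: the iteration log above suggests a multiple-imputation routine, whereas this sketch swaps in a much simpler median imputation. The `toy` data frame, the `INDEX` column, and the `impute_median` helper are hypothetical names introduced only for illustration.

```r
# Toy stand-in for the training data. INDEX is a hypothetical identifier
# column; the other names are borrowed from the summary above.
toy <- data.frame(
  INDEX           = 1:6,
  TARGET_WINS     = c(70, 82, 91, 65, 88, 79),
  TEAM_BATTING_H  = c(1400, 1450, 1520, 1380, 1500, 1440),
  TEAM_BASERUN_SB = c(100, NA, 150, 90, NA, 120)
)

# Step 1: drop columns that are not candidate predictors.
drop_cols <- c("INDEX")
toy <- toy[, !(names(toy) %in% drop_cols), drop = FALSE]

# Step 2: replace each missing value with its column median.
# This is a deliberately simple substitute for multiple imputation.
impute_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}
toy_imputed <- as.data.frame(lapply(toy, impute_median))

# Confirm that nothing is missing, then inspect the resulting shape.
stopifnot(!anyNA(toy_imputed))
dim(toy_imputed)
```

Median imputation ignores relationships between variables, so it is only a placeholder; a chained-equations approach would normally be preferred for data like this.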
We define the recursive function for the GLM Gaussian models.
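recursiveGLMa <- function(targetVariable, lmresult, datainput)
{
  returnVal <- TRUE
  lmresult.summary <- summary(lmresult)
  # Column 4 of the coefficient table holds the p-values.
  coefnames <- names(lmresult.summary$coefficients[,4])
  lencoefnames <- length(coefnames)
  # Skip row 1 (the intercept) and keep the predictors' p-values.
  canresult1 <- lmresult.summary$coefficients[2:lencoefnames,4]
  lenres1 <- length(canresult1)
  # Predictors that are significant at the 0.05 level.
  canresult2 <- canresult1[canresult1 < 0.05]
  lenres2 <- length(canresult2)
  # Global counter of how many times the function has been called.
  recursivecalls <<- recursivecalls + 1
  # Base case: every remaining predictor is significant; store the model globally.
  if (lenres2 == lenres1) {
    linearModel <<- lmresult
    return(FALSE)
  }
  # Guard: no predictor is significant, so there is nothing to refit with.
  if (lenres2 < 1) {
    print("RECR function NOT OPTIMAL WITH THIS DATASET")
    linearModel <<- lmresult
    return(FALSE)
  }
  # Otherwise refit using only the significant predictors and recurse.
  coefnames2 <- names(canresult2)
  ExVar <- toString(paste(coefnames2, "+ ", collapse = ''))
  ExVar <- substr(ExVar, 1, nchar(ExVar)-3)
  model1 <- paste(targetVariable," ~ ",ExVar)
  fit1 <- lm(eval(parse(text = model1)), data = datainput)
  returnVal <- recursiveGLMa(targetVariable, fit1, datainput)
  return(returnVal)
}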
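The elimination rule in the function above (refit with only the predictors whose p-value is below 0.05, and stop once all remaining predictors are significant) can also be written as a loop that returns its result instead of writing to global variables. The sketch below is one possible restatement, not the implementation used for the reported results. `prune_by_pvalue` and its arguments are hypothetical names, `mtcars` stands in for the training data, and the predictors are assumed to be numeric so that coefficient names match column names. A `family` argument lets the same definition cover the other GLM families considered in this experiment.

```r
# Backward elimination by p-value, written as a loop with a `family` argument.
prune_by_pvalue <- function(target, predictors, data,
                            family = gaussian(), alpha = 0.05) {
  calls <- 0
  repeat {
    calls <- calls + 1
    # reformulate() builds `target ~ x1 + x2 + ...` without eval(parse()).
    fit <- glm(reformulate(predictors, response = target),
               data = data, family = family)
    # Column 4 of the coefficient table holds the p-values.
    pvals <- summary(fit)$coefficients[, 4]
    pvals <- pvals[names(pvals) != "(Intercept)"]
    keep <- names(pvals)[pvals < alpha]
    # Stop when every remaining predictor is significant, or when none is.
    if (length(keep) == length(pvals) || length(keep) == 0) {
      return(list(model = fit, calls = calls, kept = names(pvals)))
    }
    # Otherwise refit on the significant predictors only.
    predictors <- keep
  }
}

# Usage on a built-in data set.
result <- prune_by_pvalue("mpg", c("wt", "hp", "qsec", "drat"), mtcars)
result$kept
result$calls
```

Returning a list keeps the fitted model, the number of fits, and the surviving predictors together, which avoids relying on the `<<-` assignments to `linearModel` and `recursivecalls`.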
We define the recursive function for the GLM Poisson models.
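recursiveGLMb <- function(targetVariable, lmresult, datainput)
{
  returnVal <- TRUE
  lmresult.summary <- summary(lmresult)
  # p-values of all coefficients, then of the predictors only (intercept skipped).
  coefnames <- names(lmresult.summary$coefficients[,4])
  lencoefnames <- length(coefnames)
  canresult1 <- lmresult.summary$coefficients[2:lencoefnames,4]
  lenres1 <- length(canresult1)
  # Predictors that are significant at the 0.05 level.
  canresult2 <- canresult1[canresult1 < 0.05]
  lenres2 <- length(canresult2)
  recursivecalls <<- recursivecalls + 1
  # Base case: every remaining predictor is significant.
  if (lenres2 == lenres1) {
    linearModel <<- lmresult
    return(FALSE)
  }
  # Guard: no predictor is significant, so there is nothing to refit with.
  if (lenres2 < 1) {
    print("RECR function NOT OPTIMAL WITH THIS DATASET")
    linearModel <<- lmresult
    return(FALSE)
  }
  # Refit a Poisson GLM on the significant predictors and recurse.
  coefnames2 <- names(canresult2)
  ExVar <- toString(paste(coefnames2, "+ ", collapse = ''))
  ExVar <- substr(ExVar, 1, nchar(ExVar)-3)
  model1 <- paste(targetVariable," ~ ",ExVar)
  fit1 <- glm(eval(parse(text = model1)), data = datainput, family = poisson)
  returnVal <- recursiveGLMb(targetVariable, fit1, datainput)
  return(returnVal)
}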
We define the recursive function for the GLM Gamma models.
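recursiveGLMc <- function(targetVariable, lmresult, datainput)
{
  returnVal <- TRUE
  lmresult.summary <- summary(lmresult)
  # p-values of all coefficients, then of the predictors only (intercept skipped).
  coefnames <- names(lmresult.summary$coefficients[,4])
  lencoefnames <- length(coefnames)
  canresult1 <- lmresult.summary$coefficients[2:lencoefnames,4]
  lenres1 <- length(canresult1)
  # Predictors that are significant at the 0.05 level.
  canresult2 <- canresult1[canresult1 < 0.05]
  lenres2 <- length(canresult2)
  recursivecalls <<- recursivecalls + 1
  # Base case: every remaining predictor is significant.
  if (lenres2 == lenres1) {
    linearModel <<- lmresult
    return(FALSE)
  }
  # Guard: no predictor is significant, so there is nothing to refit with.
  if (lenres2 < 1) {
    print("RECR function NOT OPTIMAL WITH THIS DATASET")
    linearModel <<- lmresult
    return(FALSE)
  }
  # Refit a Gamma GLM on the significant predictors and recurse.
  coefnames2 <- names(canresult2)
  ExVar <- toString(paste(coefnames2, "+ ", collapse = ''))
  ExVar <- substr(ExVar, 1, nchar(ExVar)-3)
  model1 <- paste(targetVariable," ~ ",ExVar)
  fit1 <- glm(eval(parse(text = model1)), data = datainput, family = Gamma)
  returnVal <- recursiveGLMc(targetVariable, fit1, datainput)
  return(returnVal)
}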
We define the recursive function for the GLM Inverse Gaussian models.
recursiveGLMd <- function(targetVariable, lmresult, datainput)
{
  returnVal <- TRUE
  lmresult.summary <- summary(lmresult)
  # p-values of all coefficients, then of the predictors only (intercept skipped).
  coefnames <- names(lmresult.summary$coefficients[,4])
  lencoefnames <- length(coefnames)
  canresult1 <- lmresult.summary$coefficients[2:lencoefnames,4]
  lenres1 <- length(canresult1)
  # Predictors that are significant at the 0.05 level.
  canresult2 <- canresult1[canresult1 < 0.05]
  lenres2 <- length(canresult2)
  recursivecalls <<- recursivecalls + 1
  # Base case: every remaining predictor is significant.
  if (lenres2 == lenres1) {
    linearModel <<- lmresult
    return(FALSE)
  }
  # Guard: no predictors at all, or none significant, so nothing to refit with.
  if (lenres1 < 1 || lenres2 < 1) {
    print("-------------------------------------------")
    print("RECR function NOT OPTIMAL WITH THIS DATASET")
    print("-------------------------------------------")
    linearModel <<- lmresult
    return(FALSE)
  }
  # Refit an Inverse Gaussian GLM on the significant predictors and recurse.
  coefnames2 <- names(canresult2)
  ExVar <- toString(paste(coefnames2, "+ ", collapse = ''))
  ExVar <- substr(ExVar, 1, nchar(ExVar)-3)
  model1 <- paste(targetVariable," ~ ",ExVar)
  fit1 <- glm(eval(parse(text = model1)), data = datainput, family = inverse.gaussian)
  returnVal <- recursiveGLMd(targetVariable, fit1, datainput)
  return(returnVal)
}
We define the recursive function for the GLM Binomial models.
recursiveGLMe <- function(targetVariable, lmresult, datainput)
{
  returnVal <- TRUE
  lmresult.summary <- summary(lmresult)
  # p-values of all coefficients, then of the predictors only (intercept skipped).
  coefnames <- names(lmresult.summary$coefficients[,4])
  lencoefnames <- length(coefnames)
  canresult1 <- lmresult.summary$coefficients[2:lencoefnames,4]
  lenres1 <- length(canresult1)
  # Predictors that are significant at the 0.05 level.
  canresult2 <- canresult1[canresult1 < 0.05]
  lenres2 <- length(canresult2)
  recursivecalls <<- recursivecalls + 1
  # Base case: every remaining predictor is significant.
  if (lenres2 == lenres1) {
    linearModel <<- lmresult
    return(FALSE)
  }
  # Guard: no predictor is significant, so there is nothing to refit with.
  if (lenres2 < 1) {
    print("-------------------------------------------")
    print("RECR function NOT OPTIMAL WITH THIS DATASET")
    print("-------------------------------------------")
    linearModel <<- lmresult
    return(FALSE)
  }
  # Refit a Binomial GLM on the significant predictors and recurse.
  coefnames2 <- names(canresult2)
  ExVar <- toString(paste(coefnames2, "+ ", collapse = ''))
  ExVar <- substr(ExVar, 1, nchar(ExVar)-3)
  model1 <- paste(targetVariable," ~ ",ExVar)
  fit1 <- glm(eval(parse(text = model1)), data = datainput, family = binomial)
  returnVal <- recursiveGLMe(targetVariable, fit1, datainput)
  return(returnVal)
}
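The code that produces the Step models in the output below is not reproduced in this document. As a rough sketch of how such a baseline could be obtained, the example here fits a full model and passes it to `step()` from the stats package, timing the call with `system.time()`. Everything in it is an assumption for illustration: `mtcars` stands in for the training data, the default search settings of `step()` are used, and the variable names are hypothetical.

```r
# Full model on a built-in data set (mtcars stands in for the training data).
full_fit <- glm(mpg ~ wt + hp + qsec + drat, data = mtcars,
                family = gaussian())

# step() searches for a lower-AIC model; trace = 0 silences the per-step log.
elapsed <- system.time(
  step_fit <- step(full_fit, trace = 0)
)

print("Step Model")
summary(step_fit)

# Quantities of the kind collected for the comparison.
c(AIC       = AIC(step_fit),
  BIC       = BIC(step_fit),
  variables = length(coef(step_fit)) - 1,   # predictors, intercept excluded
  seconds   = unname(elapsed["elapsed"]))
```

Because `step()` selects on AIC while the Recursive function selects on p-values, the two procedures can retain different sets of predictors, which is what the paired outputs below illustrate.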
## [1] "Step Model"
##
## Call:
## lm(formula = TARGET_WINS ~ TEAM_BATTING_H + TEAM_BATTING_2B +
## TEAM_BATTING_3B + TEAM_BATTING_HR + TEAM_BATTING_BB + TEAM_BATTING_SO +
## TEAM_BASERUN_SB + TEAM_PITCHING_H + TEAM_PITCHING_SO + TEAM_FIELDING_E +
## TEAM_FIELDING_DP, data = bb_train_imputed)
##
## Residuals:
## Min 1Q Median 3Q Max
## -49.971 -8.518 0.188 8.272 47.657
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 34.8023987 5.0821245 6.848 9.61e-12 ***
## TEAM_BATTING_H 0.0429297 0.0035710 12.022 < 2e-16 ***
## TEAM_BATTING_2B -0.0189732 0.0088853 -2.135 0.032841 *
## TEAM_BATTING_3B 0.0254420 0.0162434 1.566 0.117418
## TEAM_BATTING_HR 0.0820832 0.0093910 8.741 < 2e-16 ***
## TEAM_BATTING_BB 0.0068444 0.0030684 2.231 0.025804 *
## TEAM_BATTING_SO -0.0158206 0.0024253 -6.523 8.46e-11 ***
## TEAM_BASERUN_SB 0.0544231 0.0043586 12.486 < 2e-16 ***
## TEAM_PITCHING_H 0.0011557 0.0003371 3.428 0.000619 ***
## TEAM_PITCHING_SO 0.0012394 0.0006649 1.864 0.062442 .
## TEAM_FIELDING_E -0.0414281 0.0026877 -15.414 < 2e-16 ***
## TEAM_FIELDING_DP -0.1004216 0.0126872 -7.915 3.83e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.67 on 2264 degrees of freedom
## Multiple R-squared: 0.3566, Adjusted R-squared: 0.3535
## F-statistic: 114.1 on 11 and 2264 DF, p-value: < 2.2e-16
## [1] "Recursive Model"
##
## Call:
## lm(formula = eval(parse(text = model1)), data = datainput)
##
## Residuals:
## Min 1Q Median 3Q Max
## -53.327 -8.561 0.387 8.416 49.196
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 42.1426399 4.5049030 9.355 < 2e-16 ***
## TEAM_BATTING_H 0.0390702 0.0024869 15.710 < 2e-16 ***
## TEAM_BATTING_HR 0.0814626 0.0086417 9.427 < 2e-16 ***
## TEAM_BATTING_SO -0.0170569 0.0021410 -7.967 2.55e-15 ***
## TEAM_BASERUN_SB 0.0600712 0.0040554 14.813 < 2e-16 ***
## TEAM_PITCHING_H 0.0013156 0.0002949 4.462 8.53e-06 ***
## TEAM_FIELDING_E -0.0437100 0.0024305 -17.984 < 2e-16 ***
## TEAM_FIELDING_DP -0.0999059 0.0126179 -7.918 3.75e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.7 on 2268 degrees of freedom
## Multiple R-squared: 0.3522, Adjusted R-squared: 0.3502
## F-statistic: 176.1 on 7 and 2268 DF, p-value: < 2.2e-16
## [1] "Step Model"
##
## Call:
## glm(formula = TARGET_WINS ~ TEAM_BATTING_H + TEAM_BATTING_2B +
## TEAM_BATTING_3B + TEAM_BATTING_HR + TEAM_BATTING_BB + TEAM_BATTING_SO +
## TEAM_BASERUN_SB + TEAM_PITCHING_H + TEAM_PITCHING_SO + TEAM_FIELDING_E +
## TEAM_FIELDING_DP, family = poisson, data = bb_train_imputed)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -6.8744 -0.9603 0.0172 0.9167 5.0635
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.783e+00 4.594e-02 82.356 <2e-16 ***
## TEAM_BATTING_H 5.766e-04 3.252e-05 17.729 <2e-16 ***
## TEAM_BATTING_2B -2.782e-04 7.857e-05 -3.540 0.0004 ***
## TEAM_BATTING_3B 3.326e-04 1.432e-04 2.323 0.0202 *
## TEAM_BATTING_HR 9.565e-04 8.244e-05 11.602 <2e-16 ***
## TEAM_BATTING_BB 6.842e-05 2.689e-05 2.544 0.0110 *
## TEAM_BATTING_SO -1.933e-04 2.174e-05 -8.892 <2e-16 ***
## TEAM_BASERUN_SB 6.746e-04 3.804e-05 17.736 <2e-16 ***
## TEAM_PITCHING_H 6.204e-06 3.611e-06 1.718 0.0858 .
## TEAM_PITCHING_SO 2.609e-05 6.709e-06 3.890 0.0001 ***
## TEAM_FIELDING_E -5.549e-04 2.554e-05 -21.728 <2e-16 ***
## TEAM_FIELDING_DP -1.218e-03 1.120e-04 -10.875 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 7442.7 on 2275 degrees of freedom
## Residual deviance: 4874.1 on 2264 degrees of freedom
## AIC: 19027
##
## Number of Fisher Scoring iterations: 4
## [1] "Recursive Model"
##
## Call:
## glm(formula = eval(parse(text = model1)), family = poisson, data = datainput)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -6.7319 -0.9660 0.0175 0.9417 4.9917
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.804e+00 4.172e-02 91.167 < 2e-16 ***
## TEAM_BATTING_H 5.841e-04 3.125e-05 18.690 < 2e-16 ***
## TEAM_BATTING_2B -2.702e-04 7.837e-05 -3.448 0.000565 ***
## TEAM_BATTING_3B 3.495e-04 1.414e-04 2.472 0.013452 *
## TEAM_BATTING_HR 1.012e-03 7.904e-05 12.800 < 2e-16 ***
## TEAM_BATTING_SO -2.051e-04 2.139e-05 -9.587 < 2e-16 ***
## TEAM_BASERUN_SB 6.771e-04 3.615e-05 18.731 < 2e-16 ***
## TEAM_PITCHING_SO 3.174e-05 5.654e-06 5.613 1.99e-08 ***
## TEAM_FIELDING_E -5.528e-04 1.834e-05 -30.148 < 2e-16 ***
## TEAM_FIELDING_DP -1.158e-03 1.092e-04 -10.600 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 7442.7 on 2275 degrees of freedom
## Residual deviance: 4883.1 on 2266 degrees of freedom
## AIC: 19032
##
## Number of Fisher Scoring iterations: 4
## [1] "Step Model"
##
## Call:
## glm(formula = TARGET_WINS ~ TEAM_BATTING_H + TEAM_BATTING_2B +
## TEAM_BATTING_3B + TEAM_BATTING_HR + TEAM_BATTING_SO + TEAM_BASERUN_SB +
## TEAM_PITCHING_BB + TEAM_PITCHING_SO + TEAM_FIELDING_E + TEAM_FIELDING_DP,
## family = Gamma, data = bb_train_imputedG)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.26225 -0.11048 0.00294 0.10193 0.54108
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.927e-02 7.658e-04 25.168 < 2e-16 ***
## TEAM_BATTING_H -6.903e-06 5.603e-07 -12.319 < 2e-16 ***
## TEAM_BATTING_2B 3.333e-06 1.436e-06 2.321 0.0204 *
## TEAM_BATTING_3B -4.162e-06 2.576e-06 -1.615 0.1064
## TEAM_BATTING_HR -1.115e-05 1.470e-06 -7.587 4.75e-14 ***
## TEAM_BATTING_SO 2.217e-06 4.126e-07 5.374 8.49e-08 ***
## TEAM_BASERUN_SB -7.883e-06 6.494e-07 -12.138 < 2e-16 ***
## TEAM_PITCHING_BB -5.325e-07 3.313e-07 -1.607 0.1081
## TEAM_PITCHING_SO -2.820e-07 1.343e-07 -2.100 0.0359 *
## TEAM_FIELDING_E 7.104e-06 3.584e-07 19.822 < 2e-16 ***
## TEAM_FIELDING_DP 1.432e-05 2.017e-06 7.096 1.71e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Gamma family taken to be 0.02803698)
##
## Null deviance: 103.169 on 2275 degrees of freedom
## Residual deviance: 71.785 on 2265 degrees of freedom
## AIC: 18573
##
## Number of Fisher Scoring iterations: 5
## [1] 0.3041924
## [1] "Recursive Model"
##
## Call:
## glm(formula = eval(parse(text = model1)), family = Gamma, data = datainput)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.27407 -0.11196 0.00168 0.10282 0.82901
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.909e-02 7.605e-04 25.094 < 2e-16 ***
## TEAM_BATTING_H -7.016e-06 5.311e-07 -13.211 < 2e-16 ***
## TEAM_BATTING_2B 2.881e-06 1.427e-06 2.019 0.0436 *
## TEAM_BATTING_HR -1.070e-05 1.392e-06 -7.688 2.22e-14 ***
## TEAM_BATTING_SO 2.068e-06 3.677e-07 5.624 2.10e-08 ***
## TEAM_BASERUN_SB -8.466e-06 6.154e-07 -13.758 < 2e-16 ***
## TEAM_FIELDING_E 6.931e-06 3.422e-07 20.256 < 2e-16 ***
## TEAM_FIELDING_DP 1.356e-05 1.985e-06 6.832 1.08e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Gamma family taken to be 0.0282547)
##
## Null deviance: 103.17 on 2275 degrees of freedom
## Residual deviance: 72.22 on 2268 degrees of freedom
## AIC: 18580
##
## Number of Fisher Scoring iterations: 5
## [1] "Step Model"
##
## Call:
## glm(formula = TARGET_WINS ~ ., family = inverse.gaussian, data = bb_train_imputedI)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.97418 -0.01251 0.00066 0.01155 0.06592
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.024e-04 2.091e-05 14.465 < 2e-16 ***
## TEAM_BATTING_H -1.581e-07 1.470e-08 -10.753 < 2e-16 ***
## TEAM_BATTING_2B 6.425e-08 3.658e-08 1.756 0.0792 .
## TEAM_BATTING_3B -5.368e-08 6.331e-08 -0.848 0.3966
## TEAM_BATTING_HR -2.700e-07 3.843e-08 -7.027 2.78e-12 ***
## TEAM_BATTING_BB 3.579e-08 2.399e-08 1.492 0.1358
## TEAM_BATTING_SO 5.328e-08 1.085e-08 4.912 9.67e-07 ***
## TEAM_BASERUN_SB -1.568e-07 1.798e-08 -8.717 < 2e-16 ***
## TEAM_BASERUN_CS -1.711e-08 4.259e-08 -0.402 0.6880
## TEAM_PITCHING_H 6.518e-09 2.691e-09 2.423 0.0155 *
## TEAM_PITCHING_BB -3.735e-08 1.712e-08 -2.182 0.0292 *
## TEAM_PITCHING_SO -6.924e-09 4.835e-09 -1.432 0.1523
## TEAM_FIELDING_E 1.543e-07 1.219e-08 12.654 < 2e-16 ***
## TEAM_FIELDING_DP 2.968e-07 5.333e-08 5.565 2.94e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for inverse.gaussian family taken to be 0.0003897628)
##
## Null deviance: 2.3540 on 2275 degrees of freedom
## Residual deviance: 1.9764 on 2262 degrees of freedom
## AIC: 20363
##
## Number of Fisher Scoring iterations: 8
## [1] "Recursive Model"
##
## Call:
## glm(formula = eval(parse(text = model1)), family = inverse.gaussian,
## data = datainput)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.97705 -0.01238 0.00059 0.01148 0.06833
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.837e-04 1.770e-05 16.030 < 2e-16 ***
## TEAM_BATTING_H -1.339e-07 8.988e-09 -14.902 < 2e-16 ***
## TEAM_BATTING_HR -2.459e-07 3.550e-08 -6.927 5.57e-12 ***
## TEAM_BATTING_SO 5.316e-08 8.958e-09 5.935 3.39e-09 ***
## TEAM_BASERUN_SB -1.765e-07 1.401e-08 -12.602 < 2e-16 ***
## TEAM_PITCHING_BB -1.750e-08 7.469e-09 -2.342 0.0192 *
## TEAM_FIELDING_E 1.629e-07 8.333e-09 19.554 < 2e-16 ***
## TEAM_FIELDING_DP 3.358e-07 5.071e-08 6.622 4.42e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for inverse.gaussian family taken to be 0.0003861428)
##
## Null deviance: 2.3540 on 2275 degrees of freedom
## Residual deviance: 1.9816 on 2268 degrees of freedom
## AIC: 20357
##
## Number of Fisher Scoring iterations: 5
## [1] "Step Model"
##
## Call:
## glm(formula = BI_TARGET_WINS ~ TEAM_BATTING_H + TEAM_BATTING_2B +
## TEAM_BATTING_3B + TEAM_BATTING_HR + TEAM_BATTING_BB + TEAM_BATTING_SO +
## TEAM_BASERUN_SB + TEAM_PITCHING_H + TEAM_PITCHING_BB + TEAM_PITCHING_SO +
## TEAM_FIELDING_E + TEAM_FIELDING_DP, family = binomial, data = bb_train_imputedB)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.0646 -0.9933 0.3852 0.9480 3.0153
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -5.222e+00 1.022e+00 -5.107 3.28e-07 ***
## TEAM_BATTING_H 4.295e-03 7.484e-04 5.739 9.55e-09 ***
## TEAM_BATTING_2B -2.502e-03 1.637e-03 -1.528 0.126490
## TEAM_BATTING_3B 1.080e-02 3.127e-03 3.455 0.000551 ***
## TEAM_BATTING_HR 1.462e-02 1.778e-03 8.222 < 2e-16 ***
## TEAM_BATTING_BB 4.905e-03 1.148e-03 4.274 1.92e-05 ***
## TEAM_BATTING_SO -3.221e-03 4.849e-04 -6.642 3.09e-11 ***
## TEAM_BASERUN_SB 8.325e-03 9.360e-04 8.894 < 2e-16 ***
## TEAM_PITCHING_H 4.361e-04 8.472e-05 5.148 2.63e-07 ***
## TEAM_PITCHING_BB -2.608e-03 8.872e-04 -2.939 0.003288 **
## TEAM_PITCHING_SO 4.860e-04 1.950e-04 2.492 0.012692 *
## TEAM_FIELDING_E -5.527e-03 6.206e-04 -8.905 < 2e-16 ***
## TEAM_FIELDING_DP -1.413e-02 2.318e-03 -6.094 1.10e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 3147.8 on 2275 degrees of freedom
## Residual deviance: 2601.0 on 2263 degrees of freedom
## AIC: 2627
##
## Number of Fisher Scoring iterations: 5
## [1] "Recursive Model"
##
## Call:
## glm(formula = eval(parse(text = model1)), family = binomial,
## data = datainput)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.0662 -0.9898 0.3880 0.9533 2.9819
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.620e+00 9.336e-01 -4.949 7.47e-07 ***
## TEAM_BATTING_H 3.537e-03 5.487e-04 6.447 1.14e-10 ***
## TEAM_BATTING_3B 1.120e-02 3.113e-03 3.598 0.000321 ***
## TEAM_BATTING_HR 1.478e-02 1.772e-03 8.343 < 2e-16 ***
## TEAM_BATTING_BB 4.846e-03 1.142e-03 4.245 2.18e-05 ***
## TEAM_BATTING_SO -3.374e-03 4.723e-04 -7.144 9.05e-13 ***
## TEAM_BASERUN_SB 8.331e-03 9.340e-04 8.920 < 2e-16 ***
## TEAM_PITCHING_H 4.374e-04 8.379e-05 5.220 1.79e-07 ***
## TEAM_PITCHING_BB -2.603e-03 8.818e-04 -2.952 0.003160 **
## TEAM_PITCHING_SO 4.551e-04 1.918e-04 2.373 0.017651 *
## TEAM_FIELDING_E -5.381e-03 6.104e-04 -8.816 < 2e-16 ***
## TEAM_FIELDING_DP -1.417e-02 2.317e-03 -6.116 9.58e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 3147.8 on 2275 degrees of freedom
## Residual deviance: 2603.3 on 2264 degrees of freedom
## AIC: 2627.3
##
## Number of Fisher Scoring iterations: 5
Model | Variables | STEPR2 | STEPFit | STEPSkew | STEPAIC | STEPBIC | STEPCalls | STEPVariables | RECRR2 | RECRFit | RECRSkew | RECRAIC | RECRBIC | RECRCalls | RECRVariables |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Gaussian | 13 | 0.3566129 | 114.0797198 | -0.0311825 | 18030.032 | 18104.524 | 2 | 11 | 0.3521932 | 176.1491024 | -0.0206568 | 18037.613 | 18089.185 | 3 | 7 |
Poisson | 13 | 0.3451182 | 0.0000000 | -0.0923911 | 19027.121 | 19095.883 | 2 | 11 | 0.3439106 | 0.0000000 | -0.1112038 | 19032.109 | 19089.410 | 2 | 9 |
Gamma | 13 | 0.3041924 | 1.0000000 | 1.1704365 | 18572.586 | 18641.349 | 3 | 10 | 0.2999763 | 1.0000000 | 0.8365347 | 18580.408 | 18631.980 | 2 | 7 |
Inverse Gaussian | 13 | 0.1604279 | 1.0000000 | 4.0345558 | 20362.598 | 20448.550 | 0 | 13 | 0.1582047 | 1.0000000 | 2.6080747 | 20356.617 | 20408.188 | 3 | 7 |
Binomial | 13 | 0.1737008 | 0.0000008 | -2.9224244 | 2627.005 | 2701.498 | 1 | 12 | 0.1729562 | 0.0000007 | -4.9740247 | 2627.349 | 2696.111 | 2 | 10 |
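The R2 values reported for the non-Gaussian rows appear consistent with a deviance-based pseudo R-squared, 1 minus the ratio of residual deviance to null deviance. For the Gamma Step model, for example, 1 - 71.785 / 103.169 is approximately 0.3042, which matches the table. The sketch below shows how this quantity, along with AIC and BIC, could be computed from a fitted `glm` object. It is an illustration on the built-in `mtcars` data, not the code used to build the table, and the variable names are hypothetical.

```r
# A small Gamma GLM on a built-in data set, for illustration only.
fit <- glm(mpg ~ wt + hp, data = mtcars, family = Gamma)

# Deviance-based pseudo R-squared: share of the null deviance explained.
pseudo_r2 <- 1 - fit$deviance / fit$null.deviance

# BIC from AIC: replace the penalty of 2 per parameter with log(n).
k <- attr(logLik(fit), "df")   # estimated parameters, including dispersion
n <- nobs(fit)
bic_by_hand <- AIC(fit) - 2 * k + k * log(n)

c(pseudo_r2 = pseudo_r2,
  AIC       = AIC(fit),
  BIC       = BIC(fit),
  by_hand   = bic_by_hand)

# Check against the Gamma Step model summary printed earlier.
table_check <- 1 - 71.785 / 103.169
table_check
```

BIC penalizes each parameter more heavily than AIC once the sample is larger than about seven observations, which helps explain why the Recursive models, with fewer variables, show lower BIC than the Step models in every row even where their AIC is higher.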