Model 4 - Base Runs (BsR)


In this model we incorporate our calculated metric Base Runs (BsR), a sabermetric stat created by David Smyth that estimates the number of runs a team would be expected to score based on the types of hits and number of walks. We start from a model containing only BsR and offer every other available variable as a candidate. We then use forward stepwise regression: at each step, step() adds the variable that lowers the model's AIC the most, and it stops once no remaining variable lowers the AIC further.

BsR is calculated as follows:


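\[ \text{BsR} = \frac{A \times B}{B + C} + D \]

\[ \begin{aligned} A &= \text{H} + \text{BB} - \text{HR} \\ B &= 1.4 \times \text{TB} - 0.6 \times \text{H} - 3 \times \text{HR} + 0.1 \times \text{BB} \\ C &= \text{AB} - \text{H} \\ D &= \text{HR} \end{aligned} \]

where H = hits, BB = walks, HR = home runs, TB = total bases and AB = at bats.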


Because the formula requires at bats (AB), we approximated the average number of at bats for a team over a 162-game season using information from Baseball Reference: https://www.baseball-reference.com/leagues/MLB/bat.shtml
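The formula translates directly into a vectorized R function. This is a minimal sketch that follows the equations above term by term; the function name `base_runs` and its argument names are hypothetical, and the `ab` argument would take the approximated at-bat total described above. The example inputs are made-up round numbers, not values from the dataset.

```r
# Minimal sketch of Base Runs as written in the formula above.
# All arguments are team season totals and may be vectors (one entry per team).
base_runs <- function(h, bb, hr, tb, ab) {
  runners <- h + bb - hr                                 # A: baserunners
  advancement <- 1.4 * tb - 0.6 * h - 3 * hr + 0.1 * bb  # B: advancement
  outs <- ab - h                                         # C: outs
  runners * advancement / (advancement + outs) + hr      # + D: home runs
}

# Made-up season totals, not taken from the dataset
base_runs(h = 1450, bb = 500, hr = 150, tb = 2200, ab = 5500)
# about 705.97 (A = 1800, B = 1810, C = 4050, D = 150)
```

Because every operation is element-wise, the same call works on whole data frame columns, returning one BsR value per team.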

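library(broom)  # tidy(), glance()
library(knitr)  # kable()
library(car)    # residualPlots(), qqPlot(), influenceIndexPlot(), influencePlot()

data <- read.csv("https://raw.githubusercontent.com/vbriot28/Data621_group2/master/data_group2_nbc.csv")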

# Forward selection by AIC, starting from BsR alone; scope lists every candidate term
model4 <- step(lm(TARGET_WINS ~ BsR, data = data),
               direction = "forward",
               scope = ~ BsR + BATTING_2B + BATTING_3B + BATTING_HR + BATTING_BB + BATTING_SO +
                 BASERUN_SB + PITCHING_H + PITCHING_HR + PITCHING_BB + PITCHING_SO +
                 FIELDING_E + FIELDING_DP + BATTING_1B + BATTING_TB + WHGP + PITCHING_SO_BB + BATTING_BB_SO
)
## Start:  AIC=10015.71
## TARGET_WINS ~ BsR
## 
##                  Df Sum of Sq    RSS     AIC
## + BASERUN_SB      1   12414.5 299138  9937.2
## + BATTING_2B      1    8185.5 303367  9965.0
## + FIELDING_DP     1    7589.4 303963  9968.9
## + BATTING_3B      1    4885.8 306667  9986.4
## + PITCHING_SO_BB  1    3544.7 308008  9995.1
## + BATTING_SO      1    2562.2 308990 10001.4
## + BATTING_TB      1    2370.5 309182 10002.6
## + WHGP            1    2288.9 309264 10003.1
## + PITCHING_BB     1    1991.7 309561 10005.0
## + PITCHING_SO     1    1883.9 309669 10005.7
## + BATTING_BB_SO   1    1876.8 309676 10005.8
## + BATTING_HR      1    1271.5 310281 10009.6
## + BATTING_1B      1    1259.1 310293 10009.7
## + PITCHING_H      1    1250.0 310303 10009.8
## + BATTING_BB      1    1234.7 310318 10009.9
## + PITCHING_HR     1    1094.1 310458 10010.8
## <none>                        311553 10015.7
## + FIELDING_E      1      80.2 311472 10017.2
## 
## Step:  AIC=9937.24
## TARGET_WINS ~ BsR + BASERUN_SB
## 
##                  Df Sum of Sq    RSS    AIC
## + FIELDING_E      1   18642.1 280496 9811.9
## + BATTING_2B      1    5620.9 293517 9901.7
## + BATTING_BB      1    1958.0 297180 9926.2
## + PITCHING_SO_BB  1    1711.5 297427 9927.9
## + PITCHING_SO     1     986.3 298152 9932.7
## + PITCHING_H      1     755.7 298382 9934.2
## + BATTING_SO      1     715.8 298422 9934.5
## + BATTING_TB      1     677.0 298461 9934.8
## + PITCHING_BB     1     625.4 298513 9935.1
## + BATTING_BB_SO   1     431.4 298707 9936.4
## <none>                        299138 9937.2
## + FIELDING_DP     1     226.8 298911 9937.7
## + BATTING_3B      1     216.3 298922 9937.8
## + BATTING_HR      1     167.1 298971 9938.1
## + PITCHING_HR     1     143.2 298995 9938.3
## + WHGP            1     115.8 299022 9938.5
## + BATTING_1B      1      26.2 299112 9939.1
## 
## Step:  AIC=9811.9
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E
## 
##                  Df Sum of Sq    RSS    AIC
## + BATTING_SO      1   13802.2 266694 9714.0
## + BATTING_3B      1   11967.0 268529 9727.6
## + PITCHING_SO_BB  1   11559.1 268937 9730.6
## + PITCHING_SO     1   10657.4 269839 9737.2
## + BATTING_2B      1   10522.1 269974 9738.2
## + BATTING_BB_SO   1    8413.3 272083 9753.6
## + FIELDING_DP     1    7119.8 273376 9763.0
## + BATTING_1B      1    4090.0 276406 9784.8
## + WHGP            1    3861.5 276634 9786.5
## + BATTING_HR      1    3495.6 277000 9789.1
## + BATTING_TB      1    3246.8 277249 9790.9
## + PITCHING_H      1    2947.7 277548 9793.0
## + PITCHING_HR     1    2934.8 277561 9793.1
## + PITCHING_BB     1    1279.0 279217 9804.9
## + BATTING_BB      1     670.8 279825 9809.2
## <none>                        280496 9811.9
## 
## Step:  AIC=9714.04
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E + BATTING_SO
## 
##                  Df Sum of Sq    RSS    AIC
## + FIELDING_DP     1   12174.0 254520 9623.6
## + BATTING_2B      1    9278.1 257416 9646.0
## + BATTING_3B      1    2474.3 264219 9697.6
## + BATTING_HR      1    2445.5 264248 9697.8
## + PITCHING_HR     1    2202.5 264491 9699.6
## + PITCHING_BB     1    1277.6 265416 9706.5
## + BATTING_BB      1    1271.6 265422 9706.6
## + BATTING_1B      1    1141.1 265553 9707.6
## + PITCHING_SO     1     743.8 265950 9710.5
## + WHGP            1     567.2 266127 9711.8
## <none>                        266694 9714.0
## + PITCHING_SO_BB  1     194.8 266499 9714.6
## + BATTING_BB_SO   1      98.1 266596 9715.3
## + BATTING_TB      1      71.3 266622 9715.5
## + PITCHING_H      1      19.9 266674 9715.9
## 
## Step:  AIC=9623.58
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E + BATTING_SO + FIELDING_DP
## 
##                  Df Sum of Sq    RSS    AIC
## + BATTING_2B      1   11152.9 243367 9536.9
## + BATTING_HR      1    2508.1 252012 9606.0
## + PITCHING_HR     1    2395.5 252124 9606.9
## + BATTING_BB      1    1964.7 252555 9610.2
## + PITCHING_BB     1    1876.2 252644 9610.9
## + BATTING_3B      1    1846.7 252673 9611.2
## + BATTING_1B      1     859.6 253660 9618.9
## + WHGP            1     713.0 253807 9620.0
## + PITCHING_SO     1     704.9 253815 9620.1
## + PITCHING_SO_BB  1     419.2 254101 9622.3
## + BATTING_TB      1     302.3 254217 9623.2
## <none>                        254520 9623.6
## + BATTING_BB_SO   1      10.3 254509 9625.5
## + PITCHING_H      1       6.9 254513 9625.5
## 
## Step:  AIC=9536.91
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E + BATTING_SO + FIELDING_DP + 
##     BATTING_2B
## 
##                  Df Sum of Sq    RSS    AIC
## + BATTING_1B      1   1903.03 241464 9523.4
## + BATTING_3B      1   1662.37 241704 9525.3
## + PITCHING_SO     1   1082.59 242284 9530.1
## + BATTING_BB_SO   1    565.72 242801 9534.3
## + WHGP            1    559.25 242808 9534.4
## + PITCHING_H      1    351.72 243015 9536.0
## + PITCHING_BB     1    300.06 243067 9536.5
## <none>                        243367 9536.9
## + BATTING_BB      1    179.12 243188 9537.4
## + PITCHING_HR     1    167.79 243199 9537.5
## + BATTING_HR      1    126.33 243241 9537.9
## + PITCHING_SO_BB  1     87.70 243279 9538.2
## + BATTING_TB      1     73.21 243294 9538.3
## 
## Step:  AIC=9523.37
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E + BATTING_SO + FIELDING_DP + 
##     BATTING_2B + BATTING_1B
## 
##                  Df Sum of Sq    RSS    AIC
## + BATTING_3B      1   1873.52 239590 9510.0
## + BATTING_BB_SO   1   1290.44 240173 9514.8
## + PITCHING_H      1   1233.38 240230 9515.2
## + PITCHING_SO     1    948.09 240516 9517.6
## + PITCHING_SO_BB  1    804.59 240659 9518.8
## + WHGP            1    619.25 240845 9520.3
## + BATTING_HR      1    486.58 240977 9521.4
## + PITCHING_HR     1    313.97 241150 9522.8
## <none>                        241464 9523.4
## + BATTING_TB      1     57.42 241406 9524.9
## + BATTING_BB      1     38.64 241425 9525.1
## + PITCHING_BB     1      0.00 241464 9525.4
## 
## Step:  AIC=9509.95
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E + BATTING_SO + FIELDING_DP + 
##     BATTING_2B + BATTING_1B + BATTING_3B
## 
##                  Df Sum of Sq    RSS    AIC
## + PITCHING_H      1   1170.32 238420 9502.3
## + BATTING_BB_SO   1   1129.41 238461 9502.6
## + PITCHING_SO     1   1013.36 238577 9503.6
## + WHGP            1    784.80 238806 9505.5
## + PITCHING_SO_BB  1    409.36 239181 9508.6
## <none>                        239590 9510.0
## + PITCHING_BB     1     54.68 239536 9511.5
## + PITCHING_HR     1     15.11 239575 9511.8
## + BATTING_BB      1      3.29 239587 9511.9
## + BATTING_HR      1      2.86 239587 9511.9
## + BATTING_TB      1      2.86 239587 9511.9
## 
## Step:  AIC=9502.26
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E + BATTING_SO + FIELDING_DP + 
##     BATTING_2B + BATTING_1B + BATTING_3B + PITCHING_H
## 
##                  Df Sum of Sq    RSS    AIC
## + BATTING_BB_SO   1    855.89 237564 9497.1
## + PITCHING_HR     1    372.76 238047 9501.2
## <none>                        238420 9502.3
## + PITCHING_SO_BB  1    217.87 238202 9502.5
## + BATTING_BB      1     54.13 238366 9503.8
## + PITCHING_BB     1     21.27 238399 9504.1
## + WHGP            1     21.27 238399 9504.1
## + BATTING_HR      1     14.13 238406 9504.1
## + BATTING_TB      1     14.13 238406 9504.1
## + PITCHING_SO     1      1.14 238419 9504.3
## 
## Step:  AIC=9497.15
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E + BATTING_SO + FIELDING_DP + 
##     BATTING_2B + BATTING_1B + BATTING_3B + PITCHING_H + BATTING_BB_SO
## 
##                  Df Sum of Sq    RSS    AIC
## + PITCHING_HR     1    4997.6 232566 9457.1
## + BATTING_BB      1    2557.2 235007 9477.7
## + BATTING_HR      1    2281.9 235282 9480.0
## + BATTING_TB      1    2281.9 235282 9480.0
## + WHGP            1     983.1 236581 9490.9
## + PITCHING_BB     1     983.1 236581 9490.9
## <none>                        237564 9497.1
## + PITCHING_SO_BB  1      43.2 237521 9498.8
## + PITCHING_SO     1      13.8 237550 9499.0
## 
## Step:  AIC=9457.07
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E + BATTING_SO + FIELDING_DP + 
##     BATTING_2B + BATTING_1B + BATTING_3B + PITCHING_H + BATTING_BB_SO + 
##     PITCHING_HR
## 
##                  Df Sum of Sq    RSS    AIC
## + PITCHING_SO_BB  1   2272.75 230294 9439.6
## + PITCHING_BB     1   1233.31 231333 9448.5
## + WHGP            1   1233.31 231333 9448.5
## + BATTING_HR      1    795.87 231771 9452.3
## + BATTING_TB      1    795.87 231771 9452.3
## <none>                        232566 9457.1
## + BATTING_BB      1    191.55 232375 9457.4
## + PITCHING_SO     1     90.80 232476 9458.3
## 
## Step:  AIC=9439.64
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E + BATTING_SO + FIELDING_DP + 
##     BATTING_2B + BATTING_1B + BATTING_3B + PITCHING_H + BATTING_BB_SO + 
##     PITCHING_HR + PITCHING_SO_BB
## 
##               Df Sum of Sq    RSS    AIC
## + BATTING_BB   1    467.42 229826 9437.6
## <none>                     230294 9439.6
## + PITCHING_BB  1     86.75 230207 9440.9
## + WHGP         1     86.75 230207 9440.9
## + BATTING_HR   1     74.52 230219 9441.0
## + BATTING_TB   1     74.52 230219 9441.0
## + PITCHING_SO  1      4.29 230289 9441.6
## 
## Step:  AIC=9437.62
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E + BATTING_SO + FIELDING_DP + 
##     BATTING_2B + BATTING_1B + BATTING_3B + PITCHING_H + BATTING_BB_SO + 
##     PITCHING_HR + PITCHING_SO_BB + BATTING_BB
## 
##               Df Sum of Sq    RSS    AIC
## + PITCHING_BB  1    4833.2 224993 9397.6
## + WHGP         1    4833.2 224993 9397.6
## + BATTING_TB   1    1624.2 228202 9425.6
## + BATTING_HR   1    1624.2 228202 9425.6
## <none>                     229826 9437.6
## + PITCHING_SO  1     216.8 229610 9437.7
## 
## Step:  AIC=9397.55
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E + BATTING_SO + FIELDING_DP + 
##     BATTING_2B + BATTING_1B + BATTING_3B + PITCHING_H + BATTING_BB_SO + 
##     PITCHING_HR + PITCHING_SO_BB + BATTING_BB + PITCHING_BB
## 
##               Df Sum of Sq    RSS    AIC
## + BATTING_HR   1    579.71 224413 9394.4
## + BATTING_TB   1    579.71 224413 9394.4
## <none>                     224993 9397.6
## + PITCHING_SO  1      6.18 224987 9399.5
## 
## Step:  AIC=9394.45
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E + BATTING_SO + FIELDING_DP + 
##     BATTING_2B + BATTING_1B + BATTING_3B + PITCHING_H + BATTING_BB_SO + 
##     PITCHING_HR + PITCHING_SO_BB + BATTING_BB + PITCHING_BB + 
##     BATTING_HR
## 
##               Df Sum of Sq    RSS    AIC
## <none>                     224413 9394.4
## + PITCHING_SO  1    135.49 224278 9395.3
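Each table above lists, for the current model, the drop in RSS and the resulting AIC from adding each remaining candidate; step() adds the term at the top (lowest AIC) and stops once `<none>` is on top. The sketch below re-implements that loop with add1() to make the mechanics explicit. It is an illustrative reimplementation, not the code used for Model 4, and it runs on R's built-in mtcars data so it needs no download; the helper name `forward_by_aic` is hypothetical.

```r
# Illustrative forward selection by AIC, mirroring step(direction = "forward").
# Runs on the built-in mtcars data; forward_by_aic is a hypothetical helper.
forward_by_aic <- function(fit, scope) {
  repeat {
    # One row per candidate term plus "<none>", with the same
    # Df / Sum of Sq / RSS / AIC columns that step() prints
    cand <- add1(fit, scope = scope)
    best <- which.min(cand$AIC)
    if (best == 1) break  # row 1 is "<none>": no addition lowers AIC
    term <- rownames(cand)[best]
    fit <- update(fit, as.formula(paste(". ~ . +", term)))
  }
  fit
}

start <- lm(mpg ~ wt, data = mtcars)
scope <- ~ wt + hp + qsec + disp + drat

manual <- forward_by_aic(start, scope)
auto <- step(start, direction = "forward", scope = scope, trace = 0)

# Both should end with the same set of terms
attr(terms(manual), "term.labels")
attr(terms(auto), "term.labels")
```

Because forward selection only ever adds terms, a variable that entered early (such as BsR here) stays in the model even if later additions make it redundant.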
tbl4 <- tidy(model4)
kable(tbl4)
| term | estimate | std.error | statistic | p.value |
|:---|---:|---:|---:|---:|
| (Intercept) | 34.1562498 | 36.4275075 | 0.9376499 | 0.3485397 |
| BsR | -0.0016708 | 0.0618906 | -0.0269968 | 0.9784651 |
| BASERUN_SB | 0.0687383 | 0.0047696 | 14.4116460 | 0.0000000 |
| FIELDING_E | -0.1019165 | 0.0046408 | -21.9609355 | 0.0000000 |
| BATTING_SO | -0.0697976 | 0.0074867 | -9.3228278 | 0.0000000 |
| FIELDING_DP | -0.1303577 | 0.0124548 | -10.4665028 | 0.0000000 |
| BATTING_2B | -0.0801313 | 0.0560207 | -1.4303887 | 0.1527647 |
| BATTING_1B | -0.0381942 | 0.0375852 | -1.0162012 | 0.3096588 |
| BATTING_3B | 0.1160190 | 0.0767284 | 1.5120744 | 0.1306759 |
| PITCHING_H | 0.0713896 | 0.0086834 | 8.2214163 | 0.0000000 |
| BATTING_BB_SO | -18.2072164 | 3.6358182 | -5.0077357 | 0.0000006 |
| PITCHING_HR | -0.1915482 | 0.0651850 | -2.9385303 | 0.0033362 |
| PITCHING_SO_BB | 17.4975209 | 3.1161729 | 5.6150675 | 0.0000000 |
| BATTING_BB | 0.2655928 | 0.0373253 | 7.1156189 | 0.0000000 |
| PITCHING_BB | -0.1495730 | 0.0259817 | -5.7568533 | 0.0000000 |
| BATTING_HR | 0.2523246 | 0.1120516 | 2.2518602 | 0.0244412 |
kable(glance(model4))
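| r.squared | adj.r.squared | sigma | statistic | p.value | df | logLik | AIC | BIC | deviance | df.residual |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 0.4122061 | 0.4077145 | 10.69213 | 91.77372 | 0 | 16 | -7489.303 | 15012.61 | 15107.64 | 224413.4 | 1963 |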
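The AIC in this table (15012.61) is on a different scale from the AIC=9394.45 that step() printed for the same final model. As a hedged illustration, the sketch below reproduces both numbers from the residual sum of squares, assuming n = 1979 teams (df.residual of 1963 plus 16 coefficients). step() uses extractAIC(), which drops constant terms, while glance() reports AIC(), which is based on the full Gaussian log-likelihood and counts the residual standard deviation as an extra parameter. The helper names are hypothetical.

```r
# Two AIC scales for the same Gaussian linear model, computed from its RSS.
# n = observations used in the fit, k = estimated regression coefficients.
aic_step <- function(rss, n, k) {
  # extractAIC() convention used by step(): additive constants dropped
  n * log(rss / n) + 2 * k
}

loglik_gaussian <- function(rss, n) {
  # maximized log-likelihood of a linear model with normal errors
  -n / 2 * (log(2 * pi) + log(rss / n) + 1)
}

aic_full <- function(rss, n, k) {
  # AIC() convention used by glance(): sigma counts as one extra parameter
  -2 * loglik_gaussian(rss, n) + 2 * (k + 1)
}

n <- 1963 + 16                    # df.residual + coefficients for model4
aic_step(224413.4, n, k = 16)     # close to the final step() AIC, 9394.45
aic_full(224413.4, n, k = 16)     # close to the glance() AIC, 15012.61
```

The two scales differ by a constant for a fixed n, so only AIC differences between models fit to the same data are meaningful, and either scale ranks the models the same way.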
residualPlots(model4)

##                Test stat Pr(>|t|)
## BsR               -1.751    0.080
## BASERUN_SB         0.911    0.362
## FIELDING_E         2.864    0.004
## BATTING_SO        -0.322    0.748
## FIELDING_DP        3.974    0.000
## BATTING_2B         0.134    0.894
## BATTING_1B         0.517    0.605
## BATTING_3B        -0.249    0.803
## PITCHING_H         0.597    0.550
## BATTING_BB_SO     -0.706    0.480
## PITCHING_HR       -0.188    0.851
## PITCHING_SO_BB     0.432    0.666
## BATTING_BB        -0.395    0.693
## PITCHING_BB        0.546    0.585
## BATTING_HR        -0.440    0.660
## Tukey test        -1.594    0.111
qqPlot(model4, id.n=3, main="Q-Q Plot")

## 1762  346 1577 
##    1 1978 1979
influenceIndexPlot(model4, id.n=3)

influencePlot(model4, id.n=3)

##         StudRes        Hat       CookD
## 52    1.1182709 0.10089087 0.008769162
## 346   3.9693061 0.01856337 0.018486362
## 634   0.9292832 0.15625789 0.009996284
## 1379  1.1058032 0.11130519 0.009570821
## 1577  4.5047387 0.01932866 0.024754267
## 1762 -3.8076945 0.01029513 0.009361691
## 1938 -2.7890812 0.03341702 0.016750728
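The three columns in this table are linked: StudRes is the externally studentized residual (rstudent()), Hat is the leverage (hatvalues()), and CookD is Cook's distance (cooks.distance()), which combines the two. As a hedged check, the sketch below converts StudRes and Hat back into Cook's distance, assuming n = 1979 observations and p = 16 coefficients for Model 4, and recovers the CookD values printed above. The helper name `cooks_from_studres` is hypothetical.

```r
# Cook's distance recovered from the externally studentized residual (StudRes)
# and leverage (Hat); n = observations, p = regression coefficients.
cooks_from_studres <- function(t, h, n, p) {
  # undo the leave-one-out scaling to get the internally standardized residual
  r2 <- t^2 * (n - p) / (n - p - 1 + t^2)
  r2 * h / (p * (1 - h))
}

# Observations 1577, 346 and 1762 from the model4 table above
cooks_from_studres(t = c(4.5047387, 3.9693061, -3.8076945),
                   h = c(0.01932866, 0.01856337, 0.01029513),
                   n = 1979, p = 16)
# about 0.02475, 0.01849, 0.00936, matching the CookD column
```

This shows why observations 1577 and 346 top the Cook's distance ranking despite modest leverage: their large residuals dominate, whereas observation 634 has the highest leverage (0.156) but a small residual.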
hist(model4$residuals, main="Histogram of Residuals")

Model 4B

As variables were added to the model, the statistical significance of some variables that entered earlier was reduced. Most notably, our main statistic of interest, BsR, is no longer statistically significant (p = 0.98), and neither are BATTING_2B, BATTING_1B or BATTING_3B (all p > 0.13). Because forward selection never removes a variable once it has entered, we also run a bidirectional search (direction = "both"), which at each step considers dropping previously added variables as well as adding new ones.
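For a single coefficient, the partial F-test for dropping that term from the model is equivalent to its t-test, with F = t². As a quick check using the BsR t statistic (-0.0269968) and residual degrees of freedom (1963) from the Model 4 tables above, the sketch below shows that dropping BsR gives F of about 0.0007 and the same p-value of about 0.978.

```r
# Partial F-test for dropping a single term from model4, using its t statistic.
t_bsr  <- -0.0269968   # t statistic for BsR in model4
df_res <- 1963         # residual degrees of freedom of model4

f_bsr <- t_bsr^2       # with one numerator df, F = t^2
p_f <- pf(f_bsr, df1 = 1, df2 = df_res, lower.tail = FALSE)
p_t <- 2 * pt(-abs(t_bsr), df = df_res)

c(F = f_bsr, p_F = p_f, p_t = p_t)   # both p-values are about 0.978
```

Once a model without BsR has been fit, calling anova() on the two nested fits performs the same comparison directly from their residual sums of squares.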

# Same starting model and scope, but terms can also be dropped at each step
model4b <- step(lm(TARGET_WINS ~ BsR, data = data),
                direction = "both",
                scope = ~ BsR + BATTING_2B + BATTING_3B + BATTING_HR + BATTING_BB + BATTING_SO +
                  BASERUN_SB + PITCHING_H + PITCHING_HR + PITCHING_BB + PITCHING_SO +
                  FIELDING_E + FIELDING_DP + BATTING_1B + BATTING_TB + WHGP + PITCHING_SO_BB + BATTING_BB_SO
)
## Start:  AIC=10015.71
## TARGET_WINS ~ BsR
## 
##                  Df Sum of Sq    RSS     AIC
## + BASERUN_SB      1     12414 299138  9937.2
## + BATTING_2B      1      8186 303367  9965.0
## + FIELDING_DP     1      7589 303963  9968.9
## + BATTING_3B      1      4886 306667  9986.4
## + PITCHING_SO_BB  1      3545 308008  9995.1
## + BATTING_SO      1      2562 308990 10001.4
## + BATTING_TB      1      2371 309182 10002.6
## + WHGP            1      2289 309264 10003.1
## + PITCHING_BB     1      1992 309561 10005.0
## + PITCHING_SO     1      1884 309669 10005.7
## + BATTING_BB_SO   1      1877 309676 10005.8
## + BATTING_HR      1      1271 310281 10009.6
## + BATTING_1B      1      1259 310293 10009.7
## + PITCHING_H      1      1250 310303 10009.8
## + BATTING_BB      1      1235 310318 10009.9
## + PITCHING_HR     1      1094 310458 10010.8
## <none>                        311553 10015.7
## + FIELDING_E      1        80 311472 10017.2
## - BsR             1     70237 381789 10416.0
## 
## Step:  AIC=9937.24
## TARGET_WINS ~ BsR + BASERUN_SB
## 
##                  Df Sum of Sq    RSS     AIC
## + FIELDING_E      1     18642 280496  9811.9
## + BATTING_2B      1      5621 293517  9901.7
## + BATTING_BB      1      1958 297180  9926.2
## + PITCHING_SO_BB  1      1711 297427  9927.9
## + PITCHING_SO     1       986 298152  9932.7
## + PITCHING_H      1       756 298382  9934.2
## + BATTING_SO      1       716 298422  9934.5
## + BATTING_TB      1       677 298461  9934.8
## + PITCHING_BB     1       625 298513  9935.1
## + BATTING_BB_SO   1       431 298707  9936.4
## <none>                        299138  9937.2
## + FIELDING_DP     1       227 298911  9937.7
## + BATTING_3B      1       216 298922  9937.8
## + BATTING_HR      1       167 298971  9938.1
## + PITCHING_HR     1       143 298995  9938.3
## + WHGP            1       116 299022  9938.5
## + BATTING_1B      1        26 299112  9939.1
## - BASERUN_SB      1     12414 311553 10015.7
## - BsR             1     77986 377124 10393.7
## 
## Step:  AIC=9811.9
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E
## 
##                  Df Sum of Sq    RSS     AIC
## + BATTING_SO      1     13802 266694  9714.0
## + BATTING_3B      1     11967 268529  9727.6
## + PITCHING_SO_BB  1     11559 268937  9730.6
## + PITCHING_SO     1     10657 269839  9737.2
## + BATTING_2B      1     10522 269974  9738.2
## + BATTING_BB_SO   1      8413 272083  9753.6
## + FIELDING_DP     1      7120 273376  9763.0
## + BATTING_1B      1      4090 276406  9784.8
## + WHGP            1      3861 276634  9786.5
## + BATTING_HR      1      3496 277000  9789.1
## + BATTING_TB      1      3247 277249  9790.9
## + PITCHING_H      1      2948 277548  9793.0
## + PITCHING_HR     1      2935 277561  9793.1
## + PITCHING_BB     1      1279 279217  9804.9
## + BATTING_BB      1       671 279825  9809.2
## <none>                        280496  9811.9
## - FIELDING_E      1     18642 299138  9937.2
## - BASERUN_SB      1     30976 311472 10017.2
## - BsR             1     64054 344550 10216.9
## 
## Step:  AIC=9714.04
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E + BATTING_SO
## 
##                  Df Sum of Sq    RSS     AIC
## + FIELDING_DP     1     12174 254520  9623.6
## + BATTING_2B      1      9278 257416  9646.0
## + BATTING_3B      1      2474 264219  9697.6
## + BATTING_HR      1      2445 264248  9697.8
## + PITCHING_HR     1      2202 264491  9699.6
## + PITCHING_BB     1      1278 265416  9706.5
## + BATTING_BB      1      1272 265422  9706.6
## + BATTING_1B      1      1141 265553  9707.6
## + PITCHING_SO     1       744 265950  9710.5
## + WHGP            1       567 266127  9711.8
## <none>                        266694  9714.0
## + PITCHING_SO_BB  1       195 266499  9714.6
## + BATTING_BB_SO   1        98 266596  9715.3
## + BATTING_TB      1        71 266622  9715.5
## + PITCHING_H      1        20 266674  9715.9
## - BATTING_SO      1     13802 280496  9811.9
## - FIELDING_E      1     31729 298422  9934.5
## - BASERUN_SB      1     40651 307345  9992.8
## - BsR             1     60578 327272 10117.1
## 
## Step:  AIC=9623.58
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E + BATTING_SO + FIELDING_DP
## 
##                  Df Sum of Sq    RSS     AIC
## + BATTING_2B      1     11153 243367  9536.9
## + BATTING_HR      1      2508 252012  9606.0
## + PITCHING_HR     1      2395 252124  9606.9
## + BATTING_BB      1      1965 252555  9610.2
## + PITCHING_BB     1      1876 252644  9610.9
## + BATTING_3B      1      1847 252673  9611.2
## + BATTING_1B      1       860 253660  9618.9
## + WHGP            1       713 253807  9620.0
## + PITCHING_SO     1       705 253815  9620.1
## + PITCHING_SO_BB  1       419 254101  9622.3
## + BATTING_TB      1       302 254217  9623.2
## <none>                        254520  9623.6
## + BATTING_BB_SO   1        10 254509  9625.5
## + PITCHING_H      1         7 254513  9625.5
## - FIELDING_DP     1     12174 266694  9714.0
## - BATTING_SO      1     18856 273376  9763.0
## - BASERUN_SB      1     27521 282041  9824.8
## - FIELDING_E      1     43737 298256  9935.4
## - BsR             1     68870 323390 10095.5
## 
## Step:  AIC=9536.91
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E + BATTING_SO + FIELDING_DP + 
##     BATTING_2B
## 
##                  Df Sum of Sq    RSS     AIC
## + BATTING_1B      1      1903 241464  9523.4
## + BATTING_3B      1      1662 241704  9525.3
## + PITCHING_SO     1      1083 242284  9530.1
## + BATTING_BB_SO   1       566 242801  9534.3
## + WHGP            1       559 242808  9534.4
## + PITCHING_H      1       352 243015  9536.0
## + PITCHING_BB     1       300 243067  9536.5
## <none>                        243367  9536.9
## + BATTING_BB      1       179 243188  9537.4
## + PITCHING_HR     1       168 243199  9537.5
## + BATTING_HR      1       126 243241  9537.9
## + PITCHING_SO_BB  1        88 243279  9538.2
## + BATTING_TB      1        73 243294  9538.3
## - BATTING_2B      1     11153 254520  9623.6
## - FIELDING_DP     1     14049 257416  9646.0
## - BATTING_SO      1     17768 261135  9674.4
## - BASERUN_SB      1     27620 270987  9747.6
## - FIELDING_E      1     49713 293080  9902.8
## - BsR             1     66757 310124 10014.6
## 
## Step:  AIC=9523.37
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E + BATTING_SO + FIELDING_DP + 
##     BATTING_2B + BATTING_1B
## 
##                  Df Sum of Sq    RSS    AIC
## + BATTING_3B      1      1874 239590 9510.0
## + BATTING_BB_SO   1      1290 240173 9514.8
## + PITCHING_H      1      1233 240230 9515.2
## + PITCHING_SO     1       948 240516 9517.6
## + PITCHING_SO_BB  1       805 240659 9518.8
## + WHGP            1       619 240845 9520.3
## + BATTING_HR      1       487 240977 9521.4
## + PITCHING_HR     1       314 241150 9522.8
## <none>                        241464 9523.4
## + BATTING_TB      1        57 241406 9524.9
## + BATTING_BB      1        39 241425 9525.1
## + PITCHING_BB     1         0 241464 9525.4
## - BATTING_1B      1      1903 243367 9536.9
## - BATTING_2B      1     12196 253660 9618.9
## - FIELDING_DP     1     13726 255190 9630.8
## - BATTING_SO      1     15113 256577 9641.5
## - BASERUN_SB      1     29365 270829 9748.5
## - FIELDING_E      1     48637 290101 9884.5
## - BsR             1     61870 303334 9972.8
## 
## Step:  AIC=9509.95
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E + BATTING_SO + FIELDING_DP + 
##     BATTING_2B + BATTING_1B + BATTING_3B
## 
##                  Df Sum of Sq    RSS    AIC
## + PITCHING_H      1      1170 238420 9502.3
## + BATTING_BB_SO   1      1129 238461 9502.6
## + PITCHING_SO     1      1013 238577 9503.6
## + WHGP            1       785 238806 9505.5
## + PITCHING_SO_BB  1       409 239181 9508.6
## <none>                        239590 9510.0
## + PITCHING_BB     1        55 239536 9511.5
## + PITCHING_HR     1        15 239575 9511.8
## + BATTING_BB      1         3 239587 9511.9
## + BATTING_HR      1         3 239587 9511.9
## + BATTING_TB      1         3 239587 9511.9
## - BATTING_3B      1      1874 241464 9523.4
## - BATTING_1B      1      2114 241704 9525.3
## - BATTING_SO      1      8812 248402 9579.4
## - BATTING_2B      1     12067 251657 9605.2
## - FIELDING_DP     1     13015 252605 9612.6
## - BASERUN_SB      1     26576 266166 9716.1
## - FIELDING_E      1     50423 290013 9885.9
## - BsR             1     58432 298022 9939.8
## 
## Step:  AIC=9502.26
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E + BATTING_SO + FIELDING_DP + 
##     BATTING_2B + BATTING_1B + BATTING_3B + PITCHING_H
## 
##                  Df Sum of Sq    RSS    AIC
## + BATTING_BB_SO   1       856 237564 9497.1
## + PITCHING_HR     1       373 238047 9501.2
## <none>                        238420 9502.3
## + PITCHING_SO_BB  1       218 238202 9502.5
## + BATTING_BB      1        54 238366 9503.8
## + PITCHING_BB     1        21 238399 9504.1
## + WHGP            1        21 238399 9504.1
## + BATTING_HR      1        14 238406 9504.1
## + BATTING_TB      1        14 238406 9504.1
## + PITCHING_SO     1         1 238419 9504.3
## - PITCHING_H      1      1170 239590 9510.0
## - BATTING_3B      1      1810 240230 9515.2
## - BATTING_1B      1      2989 241409 9524.9
## - BATTING_SO      1      8301 246721 9568.0
## - FIELDING_DP     1     12904 251324 9604.6
## - BATTING_2B      1     13109 251529 9606.2
## - BASERUN_SB      1     25481 263901 9701.2
## - BsR             1     49967 288387 9876.8
## - FIELDING_E      1     50504 288924 9880.5
## 
## Step:  AIC=9497.15
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E + BATTING_SO + FIELDING_DP + 
##     BATTING_2B + BATTING_1B + BATTING_3B + PITCHING_H + BATTING_BB_SO
## 
##                  Df Sum of Sq    RSS    AIC
## + PITCHING_HR     1      4998 232566 9457.1
## + BATTING_BB      1      2557 235007 9477.7
## + BATTING_HR      1      2282 235282 9480.0
## + BATTING_TB      1      2282 235282 9480.0
## + WHGP            1       983 236581 9490.9
## + PITCHING_BB     1       983 236581 9490.9
## <none>                        237564 9497.1
## + PITCHING_SO_BB  1        43 237521 9498.8
## + PITCHING_SO     1        14 237550 9499.0
## - BATTING_BB_SO   1       856 238420 9502.3
## - PITCHING_H      1       897 238461 9502.6
## - BATTING_3B      1      1678 239242 9509.1
## - BATTING_1B      1      3522 241086 9524.3
## - BATTING_SO      1      6703 244267 9550.2
## - FIELDING_DP     1     12540 250105 9596.9
## - BATTING_2B      1     13932 251496 9607.9
## - BASERUN_SB      1     26254 263818 9702.6
## - BsR             1     40519 278083 9806.8
## - FIELDING_E      1     49930 287494 9872.7
## 
## Step:  AIC=9457.07
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E + BATTING_SO + FIELDING_DP + 
##     BATTING_2B + BATTING_1B + BATTING_3B + PITCHING_H + BATTING_BB_SO + 
##     PITCHING_HR
## 
##                  Df Sum of Sq    RSS    AIC
## + PITCHING_SO_BB  1      2273 230294 9439.6
## + PITCHING_BB     1      1233 231333 9448.5
## + WHGP            1      1233 231333 9448.5
## + BATTING_HR      1       796 231771 9452.3
## + BATTING_TB      1       796 231771 9452.3
## <none>                        232566 9457.1
## + BATTING_BB      1       192 232375 9457.4
## + PITCHING_SO     1        91 232476 9458.3
## - BATTING_3B      1       832 233399 9462.1
## - PITCHING_HR     1      4998 237564 9497.1
## - PITCHING_H      1      5015 237582 9497.3
## - BATTING_BB_SO   1      5481 238047 9501.2
## - BATTING_1B      1      7862 240429 9520.9
## - BATTING_SO      1     11524 244091 9550.8
## - FIELDING_DP     1     12035 244601 9554.9
## - BATTING_2B      1     14256 246823 9572.8
## - BsR             1     15405 247972 9582.0
## - BASERUN_SB      1     23533 256100 9645.8
## - FIELDING_E      1     54200 286767 9869.7
## 
## Step:  AIC=9439.64
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E + BATTING_SO + FIELDING_DP + 
##     BATTING_2B + BATTING_1B + BATTING_3B + PITCHING_H + BATTING_BB_SO + 
##     PITCHING_HR + PITCHING_SO_BB
## 
##                  Df Sum of Sq    RSS    AIC
## + BATTING_BB      1       467 229826 9437.6
## <none>                        230294 9439.6
## + PITCHING_BB     1        87 230207 9440.9
## + WHGP            1        87 230207 9440.9
## + BATTING_HR      1        75 230219 9441.0
## + BATTING_TB      1        75 230219 9441.0
## + PITCHING_SO     1         4 230289 9441.6
## - PITCHING_SO_BB  1      2273 232566 9457.1
## - BATTING_3B      1      2728 233022 9460.9
## - BATTING_BB_SO   1      5015 235309 9480.3
## - PITCHING_H      1      7101 237394 9497.7
## - PITCHING_HR     1      7227 237521 9498.8
## - BATTING_1B      1      9649 239943 9518.9
## - BATTING_SO      1     11321 241615 9532.6
## - FIELDING_DP     1     11662 241956 9535.4
## - BATTING_2B      1     14791 245085 9560.8
## - BsR             1     15345 245639 9565.3
## - BASERUN_SB      1     25011 255305 9641.7
## - FIELDING_E      1     56403 286697 9871.2
## 
## Step:  AIC=9437.62
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E + BATTING_SO + FIELDING_DP + 
##     BATTING_2B + BATTING_1B + BATTING_3B + PITCHING_H + BATTING_BB_SO + 
##     PITCHING_HR + PITCHING_SO_BB + BATTING_BB
## 
##                  Df Sum of Sq    RSS    AIC
## + PITCHING_BB     1      4833 224993 9397.6
## + WHGP            1      4833 224993 9397.6
## + BATTING_TB      1      1624 228202 9425.6
## + BATTING_HR      1      1624 228202 9425.6
## <none>                        229826 9437.6
## + PITCHING_SO     1       217 229610 9437.7
## - BATTING_BB      1       467 230294 9439.6
## - BATTING_3B      1       577 230404 9440.6
## - PITCHING_HR     1      2095 231921 9453.6
## - PITCHING_SO_BB  1      2549 232375 9457.4
## - BATTING_1B      1      2698 232525 9458.7
## - PITCHING_H      1      3055 232881 9461.7
## - BATTING_2B      1      4851 234677 9477.0
## - BsR             1      5057 234883 9478.7
## - BATTING_BB_SO   1      5470 235297 9482.2
## - BATTING_SO      1      9826 239652 9518.5
## - FIELDING_DP     1     12059 241885 9536.8
## - BASERUN_SB      1     25047 254873 9640.3
## - FIELDING_E      1     53516 283342 9849.9
## 
## Step:  AIC=9397.55
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E + BATTING_SO + FIELDING_DP + 
##     BATTING_2B + BATTING_1B + BATTING_3B + PITCHING_H + BATTING_BB_SO + 
##     PITCHING_HR + PITCHING_SO_BB + BATTING_BB + PITCHING_BB
## 
##                  Df Sum of Sq    RSS    AIC
## + BATTING_HR      1       580 224413 9394.4
## + BATTING_TB      1       580 224413 9394.4
## - BATTING_3B      1        21 225014 9395.7
## <none>                        224993 9397.6
## + PITCHING_SO     1         6 224987 9399.5
## - PITCHING_HR     1       485 225478 9399.8
## - BsR             1       990 225983 9404.2
## - BATTING_1B      1      2158 227151 9414.4
## - BATTING_2B      1      2988 227982 9421.7
## - PITCHING_SO_BB  1      3194 228187 9423.4
## - BATTING_BB_SO   1      3288 228281 9424.3
## - PITCHING_BB     1      4833 229826 9437.6
## - BATTING_BB      1      5214 230207 9440.9
## - PITCHING_H      1      7679 232672 9462.0
## - BATTING_SO      1      9558 234551 9477.9
## - FIELDING_DP     1     12509 237502 9502.6
## - BASERUN_SB      1     24412 249405 9599.4
## - FIELDING_E      1     54589 279582 9825.4
## 
## Step:  AIC=9394.45
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E + BATTING_SO + FIELDING_DP + 
##     BATTING_2B + BATTING_1B + BATTING_3B + PITCHING_H + BATTING_BB_SO + 
##     PITCHING_HR + PITCHING_SO_BB + BATTING_BB + PITCHING_BB + 
##     BATTING_HR
## 
##                  Df Sum of Sq    RSS    AIC
## - BsR             1         0 224413 9392.4
## - BATTING_1B      1       118 224531 9393.5
## <none>                        224413 9394.4
## - BATTING_2B      1       234 224647 9394.5
## - BATTING_3B      1       261 224675 9394.8
## + PITCHING_SO     1       135 224278 9395.3
## - BATTING_HR      1       580 224993 9397.6
## - PITCHING_HR     1       987 225401 9401.1
## - BATTING_BB_SO   1      2867 227280 9417.6
## - PITCHING_SO_BB  1      3604 228018 9424.0
## - PITCHING_BB     1      3789 228202 9425.6
## - BATTING_BB      1      5788 230202 9442.8
## - PITCHING_H      1      7727 232141 9459.4
## - BATTING_SO      1      9936 234350 9478.2
## - FIELDING_DP     1     12524 236937 9499.9
## - BASERUN_SB      1     23744 248157 9591.5
## - FIELDING_E      1     55135 279549 9827.2
## 
## Step:  AIC=9392.45
## TARGET_WINS ~ BASERUN_SB + FIELDING_E + BATTING_SO + FIELDING_DP + 
##     BATTING_2B + BATTING_1B + BATTING_3B + PITCHING_H + BATTING_BB_SO + 
##     PITCHING_HR + PITCHING_SO_BB + BATTING_BB + PITCHING_BB + 
##     BATTING_HR
## 
##                  Df Sum of Sq    RSS    AIC
## <none>                        224413 9392.4
## + PITCHING_SO     1       135 224278 9393.3
## + BsR             1         0 224413 9394.4
## - PITCHING_HR     1       994 225408 9399.2
## - BATTING_HR      1      1570 225983 9404.2
## - BATTING_1B      1      1664 226078 9405.1
## - BATTING_BB_SO   1      3126 227540 9417.8
## - PITCHING_SO_BB  1      3788 228201 9423.6
## - PITCHING_BB     1      3806 228219 9423.7
## - BATTING_3B      1      4132 228546 9426.6
## - BATTING_2B      1      5545 229958 9438.7
## - PITCHING_H      1      7736 232150 9457.5
## - BATTING_BB      1      9396 233810 9471.6
## - BATTING_SO      1      9952 234366 9476.3
## - FIELDING_DP     1     12574 236987 9498.3
## - BASERUN_SB      1     23890 248304 9590.6
## - FIELDING_E      1     55267 279681 9826.1
tbl4b <- tidy(model4b)
kable(tbl4b)
| term | estimate | std.error | statistic | p.value |
|:---|---:|---:|---:|---:|
| (Intercept) | 35.1206025 | 7.1375305 | 4.920554 | 0.0000009 |
| BASERUN_SB | 0.0687481 | 0.0047545 | 14.459620 | 0.0000000 |
| FIELDING_E | -0.1019225 | 0.0046344 | -21.992821 | 0.0000000 |
| BATTING_SO | -0.0697889 | 0.0074779 | -9.332629 | 0.0000000 |
| FIELDING_DP | -0.1303356 | 0.0124246 | -10.490110 | 0.0000000 |
| BATTING_2B | -0.0816102 | 0.0117156 | -6.965929 | 0.0000000 |
| BATTING_1B | -0.0391703 | 0.0102630 | -3.816632 | 0.0001395 |
| BATTING_3B | 0.1140118 | 0.0189591 | 6.013576 | 0.0000000 |
| PITCHING_H | 0.0713807 | 0.0086749 | 8.228374 | 0.0000000 |
| BATTING_BB_SO | -18.2350168 | 3.4860528 | -5.230849 | 0.0000002 |
| PITCHING_HR | -0.1916832 | 0.0649763 | -2.950049 | 0.0032148 |
| PITCHING_SO_BB | 17.4786242 | 3.0357689 | 5.757561 | 0.0000000 |
| BATTING_BB | 0.2649660 | 0.0292194 | 9.068154 | 0.0000000 |
| PITCHING_BB | -0.1495224 | 0.0259075 | -5.771391 | 0.0000000 |
| BATTING_HR | 0.2499088 | 0.0674217 | 3.706655 | 0.0002158 |
kable(glance(model4b))
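| r.squared | adj.r.squared | sigma | statistic | p.value | df | logLik | AIC | BIC | deviance | df.residual |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 0.4122059 | 0.4080159 | 10.68941 | 98.37899 | 0 | 15 | -7489.303 | 15010.61 | 15100.05 | 224413.5 | 1964 |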
residualPlots(model4b)

##                Test stat Pr(>|t|)
## BASERUN_SB         0.907    0.364
## FIELDING_E         2.864    0.004
## BATTING_SO        -0.317    0.752
## FIELDING_DP        3.975    0.000
## BATTING_2B         0.120    0.904
## BATTING_1B         0.437    0.662
## BATTING_3B        -0.251    0.802
## PITCHING_H         0.532    0.595
## BATTING_BB_SO     -0.702    0.483
## PITCHING_HR       -0.184    0.854
## PITCHING_SO_BB     0.430    0.667
## BATTING_BB        -0.396    0.692
## PITCHING_BB        0.544    0.587
## BATTING_HR        -0.434    0.664
## Tukey test        -1.496    0.135
qqPlot(model4b, id.n=3, main="Q-Q Plot")

## 1762  346 1577 
##    1 1978 1979
influenceIndexPlot(model4b, id.n=3)

influencePlot(model4b, id.n=3)

##         StudRes         Hat       CookD
## 217   2.0984504 0.055521218 0.017227449
## 339  -1.1923252 0.093476597 0.009770771
## 346   3.9683849 0.017950392 0.019047053
## 634   0.9298904 0.155841394 0.010642924
## 1379  1.1064085 0.110909714 0.010179207
## 1577  4.5059661 0.019280666 0.026352034
## 1762 -3.8068314 0.009695845 0.009394634
hist(model4b$residuals, main="Histogram of Residuals")

Conclusion

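Using the bidirectional approach, BsR was removed from the model in the final step. Once BsR was removed, BATTING_2B, BATTING_1B and BATTING_3B became statistically significant (all p < 0.001), and their standard errors shrank by a factor of roughly 3.7 to 4.8 (for example, from 0.056 to 0.012 for BATTING_2B). This is likely caused by collinearity, since BsR is a derived stat built largely from hits and total bases, which the individual hit-type variables already capture. Because BsR adds no predictive ability to the model, Model 4B is the superior model: with one fewer term it has essentially the same R squared (0.4122), a higher F-statistic (98.38 vs. 91.77), a slightly higher adjusted R squared (0.4080 vs. 0.4077) and lower AIC (15010.61 vs. 15012.61) and BIC (15100.05 vs. 15107.64).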

kable(glance(model4b))
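| r.squared | adj.r.squared | sigma | statistic | p.value | df | logLik | AIC | BIC | deviance | df.residual |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 0.4122059 | 0.4080159 | 10.68941 | 98.37899 | 0 | 15 | -7489.303 | 15010.61 | 15100.05 | 224413.5 | 1964 |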
kable(glance(model4))
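| r.squared | adj.r.squared | sigma | statistic | p.value | df | logLik | AIC | BIC | deviance | df.residual |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 0.4122061 | 0.4077145 | 10.69213 | 91.77372 | 0 | 16 | -7489.303 | 15012.61 | 15107.64 | 224413.4 | 1963 |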
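The collinearity explanation in the conclusion can be checked with the variance inflation factor, VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing predictor j on the other predictors. The sketch below computes it by hand on simulated data (not the team data) in which one column is built almost entirely from two others, mimicking a derived stat such as BsR; the helper name `vif_manual` and the simulated column names are hypothetical. On the actual fit, `car::vif(model4)` would report the corresponding value for each term in Model 4.

```r
# Hand-computed variance inflation factors: VIF_j = 1 / (1 - R^2_j), where
# R^2_j comes from regressing column j on all other columns.
vif_manual <- function(X) {
  X <- as.data.frame(X)
  sapply(names(X), function(v) {
    f <- reformulate(setdiff(names(X), v), response = v)
    1 / (1 - summary(lm(f, data = X))$r.squared)
  })
}

# Simulated predictors (hypothetical, not the team data)
set.seed(621)
n <- 500
singles <- rnorm(n)
doubles <- rnorm(n)
walks   <- rnorm(n)                                  # unrelated to the others
derived <- singles + doubles + rnorm(n, sd = 0.05)   # built from two columns

vifs <- vif_manual(data.frame(singles, doubles, walks, derived))
round(vifs, 1)  # large for singles, doubles and derived; close to 1 for walks
```

Note that the inflation is shared: the component columns are flagged as strongly as the derived one, which matches how the hit-type variables in Model 4 only regained significance once BsR was dropped.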