In this model we incorporate Base Runs (BsR), a sabermetric statistic created by David Smyth that estimates the number of runs a team would be expected to score based on its types of hits and number of walks. We start from a model containing only BsR and use forward stepwise regression over all available variables, adding at each step the variable that most lowers the AIC.
BsR is calculated as follows:
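\[ BsR = \frac{A \times B}{B + C} + D \]
\[ \begin{aligned} A &= H + BB - HR \\ B &= 1.4 \times TB - 0.6 \times H - 3 \times HR + 0.1 \times BB \\ C &= AB - H \\ D &= HR \end{aligned} \]
where H is hits, BB is walks, HR is home runs, TB is total bases, and AB is at bats.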
We approximated the average number of at bats for a team in a 162-game season using information from Baseball Reference: https://www.baseball-reference.com/leagues/MLB/bat.shtml
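As a minimal sketch of the formula above, the calculation can be wrapped in a small R function. The function name `base_runs`, its argument names, and the season totals in the example are hypothetical and chosen only for illustration; they are not columns or values from our data set.

```r
# Hypothetical helper implementing the BsR formula; argument names are
# illustrative and do not correspond to columns in the data set.
base_runs <- function(h, bb, hr, tb, ab) {
  a <- h + bb - hr                              # A: baserunners
  b <- 1.4 * tb - 0.6 * h - 3 * hr + 0.1 * bb   # B: advancement factor
  c_outs <- ab - h                              # C: outs
  d <- hr                                       # D: home runs
  a * b / (b + c_outs) + d
}

# Made-up season totals for a single team, for illustration only
example_bsr <- base_runs(h = 1400, bb = 500, hr = 150, tb = 2200, ab = 5500)
round(example_bsr, 2)
```

Because the function uses only arithmetic operators, it also works on whole columns at once, so it could be applied to vectors of team totals in a single call.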
data <- read.csv("https://raw.githubusercontent.com/vbriot28/Data621_group2/master/data_group2_nbc.csv")
model4 <- step(lm(TARGET_WINS ~ BsR, data = data),
direction = "forward",
scope = ~ BsR + BATTING_2B + BATTING_3B + BATTING_HR + BATTING_BB + BATTING_SO +
BASERUN_SB + PITCHING_H + PITCHING_HR + PITCHING_BB + PITCHING_SO +
FIELDING_E + FIELDING_DP + BATTING_1B + BATTING_TB + WHGP + PITCHING_SO_BB + BATTING_BB_SO
)
## Start: AIC=10015.71
## TARGET_WINS ~ BsR
##
## Df Sum of Sq RSS AIC
## + BASERUN_SB 1 12414.5 299138 9937.2
## + BATTING_2B 1 8185.5 303367 9965.0
## + FIELDING_DP 1 7589.4 303963 9968.9
## + BATTING_3B 1 4885.8 306667 9986.4
## + PITCHING_SO_BB 1 3544.7 308008 9995.1
## + BATTING_SO 1 2562.2 308990 10001.4
## + BATTING_TB 1 2370.5 309182 10002.6
## + WHGP 1 2288.9 309264 10003.1
## + PITCHING_BB 1 1991.7 309561 10005.0
## + PITCHING_SO 1 1883.9 309669 10005.7
## + BATTING_BB_SO 1 1876.8 309676 10005.8
## + BATTING_HR 1 1271.5 310281 10009.6
## + BATTING_1B 1 1259.1 310293 10009.7
## + PITCHING_H 1 1250.0 310303 10009.8
## + BATTING_BB 1 1234.7 310318 10009.9
## + PITCHING_HR 1 1094.1 310458 10010.8
## <none> 311553 10015.7
## + FIELDING_E 1 80.2 311472 10017.2
##
## Step: AIC=9937.24
## TARGET_WINS ~ BsR + BASERUN_SB
##
## Df Sum of Sq RSS AIC
## + FIELDING_E 1 18642.1 280496 9811.9
## + BATTING_2B 1 5620.9 293517 9901.7
## + BATTING_BB 1 1958.0 297180 9926.2
## + PITCHING_SO_BB 1 1711.5 297427 9927.9
## + PITCHING_SO 1 986.3 298152 9932.7
## + PITCHING_H 1 755.7 298382 9934.2
## + BATTING_SO 1 715.8 298422 9934.5
## + BATTING_TB 1 677.0 298461 9934.8
## + PITCHING_BB 1 625.4 298513 9935.1
## + BATTING_BB_SO 1 431.4 298707 9936.4
## <none> 299138 9937.2
## + FIELDING_DP 1 226.8 298911 9937.7
## + BATTING_3B 1 216.3 298922 9937.8
## + BATTING_HR 1 167.1 298971 9938.1
## + PITCHING_HR 1 143.2 298995 9938.3
## + WHGP 1 115.8 299022 9938.5
## + BATTING_1B 1 26.2 299112 9939.1
##
## Step: AIC=9811.9
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E
##
## Df Sum of Sq RSS AIC
## + BATTING_SO 1 13802.2 266694 9714.0
## + BATTING_3B 1 11967.0 268529 9727.6
## + PITCHING_SO_BB 1 11559.1 268937 9730.6
## + PITCHING_SO 1 10657.4 269839 9737.2
## + BATTING_2B 1 10522.1 269974 9738.2
## + BATTING_BB_SO 1 8413.3 272083 9753.6
## + FIELDING_DP 1 7119.8 273376 9763.0
## + BATTING_1B 1 4090.0 276406 9784.8
## + WHGP 1 3861.5 276634 9786.5
## + BATTING_HR 1 3495.6 277000 9789.1
## + BATTING_TB 1 3246.8 277249 9790.9
## + PITCHING_H 1 2947.7 277548 9793.0
## + PITCHING_HR 1 2934.8 277561 9793.1
## + PITCHING_BB 1 1279.0 279217 9804.9
## + BATTING_BB 1 670.8 279825 9809.2
## <none> 280496 9811.9
##
## Step: AIC=9714.04
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E + BATTING_SO
##
## Df Sum of Sq RSS AIC
## + FIELDING_DP 1 12174.0 254520 9623.6
## + BATTING_2B 1 9278.1 257416 9646.0
## + BATTING_3B 1 2474.3 264219 9697.6
## + BATTING_HR 1 2445.5 264248 9697.8
## + PITCHING_HR 1 2202.5 264491 9699.6
## + PITCHING_BB 1 1277.6 265416 9706.5
## + BATTING_BB 1 1271.6 265422 9706.6
## + BATTING_1B 1 1141.1 265553 9707.6
## + PITCHING_SO 1 743.8 265950 9710.5
## + WHGP 1 567.2 266127 9711.8
## <none> 266694 9714.0
## + PITCHING_SO_BB 1 194.8 266499 9714.6
## + BATTING_BB_SO 1 98.1 266596 9715.3
## + BATTING_TB 1 71.3 266622 9715.5
## + PITCHING_H 1 19.9 266674 9715.9
##
## Step: AIC=9623.58
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E + BATTING_SO + FIELDING_DP
##
## Df Sum of Sq RSS AIC
## + BATTING_2B 1 11152.9 243367 9536.9
## + BATTING_HR 1 2508.1 252012 9606.0
## + PITCHING_HR 1 2395.5 252124 9606.9
## + BATTING_BB 1 1964.7 252555 9610.2
## + PITCHING_BB 1 1876.2 252644 9610.9
## + BATTING_3B 1 1846.7 252673 9611.2
## + BATTING_1B 1 859.6 253660 9618.9
## + WHGP 1 713.0 253807 9620.0
## + PITCHING_SO 1 704.9 253815 9620.1
## + PITCHING_SO_BB 1 419.2 254101 9622.3
## + BATTING_TB 1 302.3 254217 9623.2
## <none> 254520 9623.6
## + BATTING_BB_SO 1 10.3 254509 9625.5
## + PITCHING_H 1 6.9 254513 9625.5
##
## Step: AIC=9536.91
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E + BATTING_SO + FIELDING_DP +
## BATTING_2B
##
## Df Sum of Sq RSS AIC
## + BATTING_1B 1 1903.03 241464 9523.4
## + BATTING_3B 1 1662.37 241704 9525.3
## + PITCHING_SO 1 1082.59 242284 9530.1
## + BATTING_BB_SO 1 565.72 242801 9534.3
## + WHGP 1 559.25 242808 9534.4
## + PITCHING_H 1 351.72 243015 9536.0
## + PITCHING_BB 1 300.06 243067 9536.5
## <none> 243367 9536.9
## + BATTING_BB 1 179.12 243188 9537.4
## + PITCHING_HR 1 167.79 243199 9537.5
## + BATTING_HR 1 126.33 243241 9537.9
## + PITCHING_SO_BB 1 87.70 243279 9538.2
## + BATTING_TB 1 73.21 243294 9538.3
##
## Step: AIC=9523.37
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E + BATTING_SO + FIELDING_DP +
## BATTING_2B + BATTING_1B
##
## Df Sum of Sq RSS AIC
## + BATTING_3B 1 1873.52 239590 9510.0
## + BATTING_BB_SO 1 1290.44 240173 9514.8
## + PITCHING_H 1 1233.38 240230 9515.2
## + PITCHING_SO 1 948.09 240516 9517.6
## + PITCHING_SO_BB 1 804.59 240659 9518.8
## + WHGP 1 619.25 240845 9520.3
## + BATTING_HR 1 486.58 240977 9521.4
## + PITCHING_HR 1 313.97 241150 9522.8
## <none> 241464 9523.4
## + BATTING_TB 1 57.42 241406 9524.9
## + BATTING_BB 1 38.64 241425 9525.1
## + PITCHING_BB 1 0.00 241464 9525.4
##
## Step: AIC=9509.95
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E + BATTING_SO + FIELDING_DP +
## BATTING_2B + BATTING_1B + BATTING_3B
##
## Df Sum of Sq RSS AIC
## + PITCHING_H 1 1170.32 238420 9502.3
## + BATTING_BB_SO 1 1129.41 238461 9502.6
## + PITCHING_SO 1 1013.36 238577 9503.6
## + WHGP 1 784.80 238806 9505.5
## + PITCHING_SO_BB 1 409.36 239181 9508.6
## <none> 239590 9510.0
## + PITCHING_BB 1 54.68 239536 9511.5
## + PITCHING_HR 1 15.11 239575 9511.8
## + BATTING_BB 1 3.29 239587 9511.9
## + BATTING_HR 1 2.86 239587 9511.9
## + BATTING_TB 1 2.86 239587 9511.9
##
## Step: AIC=9502.26
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E + BATTING_SO + FIELDING_DP +
## BATTING_2B + BATTING_1B + BATTING_3B + PITCHING_H
##
## Df Sum of Sq RSS AIC
## + BATTING_BB_SO 1 855.89 237564 9497.1
## + PITCHING_HR 1 372.76 238047 9501.2
## <none> 238420 9502.3
## + PITCHING_SO_BB 1 217.87 238202 9502.5
## + BATTING_BB 1 54.13 238366 9503.8
## + PITCHING_BB 1 21.27 238399 9504.1
## + WHGP 1 21.27 238399 9504.1
## + BATTING_HR 1 14.13 238406 9504.1
## + BATTING_TB 1 14.13 238406 9504.1
## + PITCHING_SO 1 1.14 238419 9504.3
##
## Step: AIC=9497.15
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E + BATTING_SO + FIELDING_DP +
## BATTING_2B + BATTING_1B + BATTING_3B + PITCHING_H + BATTING_BB_SO
##
## Df Sum of Sq RSS AIC
## + PITCHING_HR 1 4997.6 232566 9457.1
## + BATTING_BB 1 2557.2 235007 9477.7
## + BATTING_HR 1 2281.9 235282 9480.0
## + BATTING_TB 1 2281.9 235282 9480.0
## + WHGP 1 983.1 236581 9490.9
## + PITCHING_BB 1 983.1 236581 9490.9
## <none> 237564 9497.1
## + PITCHING_SO_BB 1 43.2 237521 9498.8
## + PITCHING_SO 1 13.8 237550 9499.0
##
## Step: AIC=9457.07
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E + BATTING_SO + FIELDING_DP +
## BATTING_2B + BATTING_1B + BATTING_3B + PITCHING_H + BATTING_BB_SO +
## PITCHING_HR
##
## Df Sum of Sq RSS AIC
## + PITCHING_SO_BB 1 2272.75 230294 9439.6
## + PITCHING_BB 1 1233.31 231333 9448.5
## + WHGP 1 1233.31 231333 9448.5
## + BATTING_HR 1 795.87 231771 9452.3
## + BATTING_TB 1 795.87 231771 9452.3
## <none> 232566 9457.1
## + BATTING_BB 1 191.55 232375 9457.4
## + PITCHING_SO 1 90.80 232476 9458.3
##
## Step: AIC=9439.64
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E + BATTING_SO + FIELDING_DP +
## BATTING_2B + BATTING_1B + BATTING_3B + PITCHING_H + BATTING_BB_SO +
## PITCHING_HR + PITCHING_SO_BB
##
## Df Sum of Sq RSS AIC
## + BATTING_BB 1 467.42 229826 9437.6
## <none> 230294 9439.6
## + PITCHING_BB 1 86.75 230207 9440.9
## + WHGP 1 86.75 230207 9440.9
## + BATTING_HR 1 74.52 230219 9441.0
## + BATTING_TB 1 74.52 230219 9441.0
## + PITCHING_SO 1 4.29 230289 9441.6
##
## Step: AIC=9437.62
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E + BATTING_SO + FIELDING_DP +
## BATTING_2B + BATTING_1B + BATTING_3B + PITCHING_H + BATTING_BB_SO +
## PITCHING_HR + PITCHING_SO_BB + BATTING_BB
##
## Df Sum of Sq RSS AIC
## + PITCHING_BB 1 4833.2 224993 9397.6
## + WHGP 1 4833.2 224993 9397.6
## + BATTING_TB 1 1624.2 228202 9425.6
## + BATTING_HR 1 1624.2 228202 9425.6
## <none> 229826 9437.6
## + PITCHING_SO 1 216.8 229610 9437.7
##
## Step: AIC=9397.55
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E + BATTING_SO + FIELDING_DP +
## BATTING_2B + BATTING_1B + BATTING_3B + PITCHING_H + BATTING_BB_SO +
## PITCHING_HR + PITCHING_SO_BB + BATTING_BB + PITCHING_BB
##
## Df Sum of Sq RSS AIC
## + BATTING_HR 1 579.71 224413 9394.4
## + BATTING_TB 1 579.71 224413 9394.4
## <none> 224993 9397.6
## + PITCHING_SO 1 6.18 224987 9399.5
##
## Step: AIC=9394.45
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E + BATTING_SO + FIELDING_DP +
## BATTING_2B + BATTING_1B + BATTING_3B + PITCHING_H + BATTING_BB_SO +
## PITCHING_HR + PITCHING_SO_BB + BATTING_BB + PITCHING_BB +
## BATTING_HR
##
## Df Sum of Sq RSS AIC
## <none> 224413 9394.4
## + PITCHING_SO 1 135.49 224278 9395.3
tbl4 <- tidy(model4)
kable(tbl4)
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 34.1562498 | 36.4275075 | 0.9376499 | 0.3485397 |
| BsR | -0.0016708 | 0.0618906 | -0.0269968 | 0.9784651 |
| BASERUN_SB | 0.0687383 | 0.0047696 | 14.4116460 | 0.0000000 |
| FIELDING_E | -0.1019165 | 0.0046408 | -21.9609355 | 0.0000000 |
| BATTING_SO | -0.0697976 | 0.0074867 | -9.3228278 | 0.0000000 |
| FIELDING_DP | -0.1303577 | 0.0124548 | -10.4665028 | 0.0000000 |
| BATTING_2B | -0.0801313 | 0.0560207 | -1.4303887 | 0.1527647 |
| BATTING_1B | -0.0381942 | 0.0375852 | -1.0162012 | 0.3096588 |
| BATTING_3B | 0.1160190 | 0.0767284 | 1.5120744 | 0.1306759 |
| PITCHING_H | 0.0713896 | 0.0086834 | 8.2214163 | 0.0000000 |
| BATTING_BB_SO | -18.2072164 | 3.6358182 | -5.0077357 | 0.0000006 |
| PITCHING_HR | -0.1915482 | 0.0651850 | -2.9385303 | 0.0033362 |
| PITCHING_SO_BB | 17.4975209 | 3.1161729 | 5.6150675 | 0.0000000 |
| BATTING_BB | 0.2655928 | 0.0373253 | 7.1156189 | 0.0000000 |
| PITCHING_BB | -0.1495730 | 0.0259817 | -5.7568533 | 0.0000000 |
| BATTING_HR | 0.2523246 | 0.1120516 | 2.2518602 | 0.0244412 |
kable(glance(model4))
| r.squared | adj.r.squared | sigma | statistic | p.value | df | logLik | AIC | BIC | deviance | df.residual |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.4122061 | 0.4077145 | 10.69213 | 91.77372 | 0 | 16 | -7489.303 | 15012.61 | 15107.64 | 224413.4 | 1963 |
residualPlots(model4)
## Test stat Pr(>|t|)
## BsR -1.751 0.080
## BASERUN_SB 0.911 0.362
## FIELDING_E 2.864 0.004
## BATTING_SO -0.322 0.748
## FIELDING_DP 3.974 0.000
## BATTING_2B 0.134 0.894
## BATTING_1B 0.517 0.605
## BATTING_3B -0.249 0.803
## PITCHING_H 0.597 0.550
## BATTING_BB_SO -0.706 0.480
## PITCHING_HR -0.188 0.851
## PITCHING_SO_BB 0.432 0.666
## BATTING_BB -0.395 0.693
## PITCHING_BB 0.546 0.585
## BATTING_HR -0.440 0.660
## Tukey test -1.594 0.111
qqPlot(model4, id.n=3, main="Q-Q Plot")
## 1762 346 1577
## 1 1978 1979
influenceIndexPlot(model4, id.n=3)
influencePlot(model4, id.n=3)
## StudRes Hat CookD
## 52 1.1182709 0.10089087 0.008769162
## 346 3.9693061 0.01856337 0.018486362
## 634 0.9292832 0.15625789 0.009996284
## 1379 1.1058032 0.11130519 0.009570821
## 1577 4.5047387 0.01932866 0.024754267
## 1762 -3.8076945 0.01029513 0.009361691
## 1938 -2.7890812 0.03341702 0.016750728
hist(model4$residuals, main="Histogram of Residuals")
As variables were added to the model, the statistical significance of some of the earlier variables decreased. In fact, our main statistic of interest, BsR, is no longer statistically significant (p = 0.98). To arrive at a better selection of variables, we also fit a bidirectional stepwise model, which can drop a variable added earlier in the search when removing it lowers the AIC.
model4b <- step(lm(TARGET_WINS ~ BsR, data = data),
direction = "both",
scope = ~ BsR + BATTING_2B + BATTING_3B + BATTING_HR + BATTING_BB + BATTING_SO +
BASERUN_SB + PITCHING_H + PITCHING_HR + PITCHING_BB + PITCHING_SO +
FIELDING_E + FIELDING_DP + BATTING_1B + BATTING_TB + WHGP + PITCHING_SO_BB + BATTING_BB_SO
)
## Start: AIC=10015.71
## TARGET_WINS ~ BsR
##
## Df Sum of Sq RSS AIC
## + BASERUN_SB 1 12414 299138 9937.2
## + BATTING_2B 1 8186 303367 9965.0
## + FIELDING_DP 1 7589 303963 9968.9
## + BATTING_3B 1 4886 306667 9986.4
## + PITCHING_SO_BB 1 3545 308008 9995.1
## + BATTING_SO 1 2562 308990 10001.4
## + BATTING_TB 1 2371 309182 10002.6
## + WHGP 1 2289 309264 10003.1
## + PITCHING_BB 1 1992 309561 10005.0
## + PITCHING_SO 1 1884 309669 10005.7
## + BATTING_BB_SO 1 1877 309676 10005.8
## + BATTING_HR 1 1271 310281 10009.6
## + BATTING_1B 1 1259 310293 10009.7
## + PITCHING_H 1 1250 310303 10009.8
## + BATTING_BB 1 1235 310318 10009.9
## + PITCHING_HR 1 1094 310458 10010.8
## <none> 311553 10015.7
## + FIELDING_E 1 80 311472 10017.2
## - BsR 1 70237 381789 10416.0
##
## Step: AIC=9937.24
## TARGET_WINS ~ BsR + BASERUN_SB
##
## Df Sum of Sq RSS AIC
## + FIELDING_E 1 18642 280496 9811.9
## + BATTING_2B 1 5621 293517 9901.7
## + BATTING_BB 1 1958 297180 9926.2
## + PITCHING_SO_BB 1 1711 297427 9927.9
## + PITCHING_SO 1 986 298152 9932.7
## + PITCHING_H 1 756 298382 9934.2
## + BATTING_SO 1 716 298422 9934.5
## + BATTING_TB 1 677 298461 9934.8
## + PITCHING_BB 1 625 298513 9935.1
## + BATTING_BB_SO 1 431 298707 9936.4
## <none> 299138 9937.2
## + FIELDING_DP 1 227 298911 9937.7
## + BATTING_3B 1 216 298922 9937.8
## + BATTING_HR 1 167 298971 9938.1
## + PITCHING_HR 1 143 298995 9938.3
## + WHGP 1 116 299022 9938.5
## + BATTING_1B 1 26 299112 9939.1
## - BASERUN_SB 1 12414 311553 10015.7
## - BsR 1 77986 377124 10393.7
##
## Step: AIC=9811.9
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E
##
## Df Sum of Sq RSS AIC
## + BATTING_SO 1 13802 266694 9714.0
## + BATTING_3B 1 11967 268529 9727.6
## + PITCHING_SO_BB 1 11559 268937 9730.6
## + PITCHING_SO 1 10657 269839 9737.2
## + BATTING_2B 1 10522 269974 9738.2
## + BATTING_BB_SO 1 8413 272083 9753.6
## + FIELDING_DP 1 7120 273376 9763.0
## + BATTING_1B 1 4090 276406 9784.8
## + WHGP 1 3861 276634 9786.5
## + BATTING_HR 1 3496 277000 9789.1
## + BATTING_TB 1 3247 277249 9790.9
## + PITCHING_H 1 2948 277548 9793.0
## + PITCHING_HR 1 2935 277561 9793.1
## + PITCHING_BB 1 1279 279217 9804.9
## + BATTING_BB 1 671 279825 9809.2
## <none> 280496 9811.9
## - FIELDING_E 1 18642 299138 9937.2
## - BASERUN_SB 1 30976 311472 10017.2
## - BsR 1 64054 344550 10216.9
##
## Step: AIC=9714.04
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E + BATTING_SO
##
## Df Sum of Sq RSS AIC
## + FIELDING_DP 1 12174 254520 9623.6
## + BATTING_2B 1 9278 257416 9646.0
## + BATTING_3B 1 2474 264219 9697.6
## + BATTING_HR 1 2445 264248 9697.8
## + PITCHING_HR 1 2202 264491 9699.6
## + PITCHING_BB 1 1278 265416 9706.5
## + BATTING_BB 1 1272 265422 9706.6
## + BATTING_1B 1 1141 265553 9707.6
## + PITCHING_SO 1 744 265950 9710.5
## + WHGP 1 567 266127 9711.8
## <none> 266694 9714.0
## + PITCHING_SO_BB 1 195 266499 9714.6
## + BATTING_BB_SO 1 98 266596 9715.3
## + BATTING_TB 1 71 266622 9715.5
## + PITCHING_H 1 20 266674 9715.9
## - BATTING_SO 1 13802 280496 9811.9
## - FIELDING_E 1 31729 298422 9934.5
## - BASERUN_SB 1 40651 307345 9992.8
## - BsR 1 60578 327272 10117.1
##
## Step: AIC=9623.58
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E + BATTING_SO + FIELDING_DP
##
## Df Sum of Sq RSS AIC
## + BATTING_2B 1 11153 243367 9536.9
## + BATTING_HR 1 2508 252012 9606.0
## + PITCHING_HR 1 2395 252124 9606.9
## + BATTING_BB 1 1965 252555 9610.2
## + PITCHING_BB 1 1876 252644 9610.9
## + BATTING_3B 1 1847 252673 9611.2
## + BATTING_1B 1 860 253660 9618.9
## + WHGP 1 713 253807 9620.0
## + PITCHING_SO 1 705 253815 9620.1
## + PITCHING_SO_BB 1 419 254101 9622.3
## + BATTING_TB 1 302 254217 9623.2
## <none> 254520 9623.6
## + BATTING_BB_SO 1 10 254509 9625.5
## + PITCHING_H 1 7 254513 9625.5
## - FIELDING_DP 1 12174 266694 9714.0
## - BATTING_SO 1 18856 273376 9763.0
## - BASERUN_SB 1 27521 282041 9824.8
## - FIELDING_E 1 43737 298256 9935.4
## - BsR 1 68870 323390 10095.5
##
## Step: AIC=9536.91
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E + BATTING_SO + FIELDING_DP +
## BATTING_2B
##
## Df Sum of Sq RSS AIC
## + BATTING_1B 1 1903 241464 9523.4
## + BATTING_3B 1 1662 241704 9525.3
## + PITCHING_SO 1 1083 242284 9530.1
## + BATTING_BB_SO 1 566 242801 9534.3
## + WHGP 1 559 242808 9534.4
## + PITCHING_H 1 352 243015 9536.0
## + PITCHING_BB 1 300 243067 9536.5
## <none> 243367 9536.9
## + BATTING_BB 1 179 243188 9537.4
## + PITCHING_HR 1 168 243199 9537.5
## + BATTING_HR 1 126 243241 9537.9
## + PITCHING_SO_BB 1 88 243279 9538.2
## + BATTING_TB 1 73 243294 9538.3
## - BATTING_2B 1 11153 254520 9623.6
## - FIELDING_DP 1 14049 257416 9646.0
## - BATTING_SO 1 17768 261135 9674.4
## - BASERUN_SB 1 27620 270987 9747.6
## - FIELDING_E 1 49713 293080 9902.8
## - BsR 1 66757 310124 10014.6
##
## Step: AIC=9523.37
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E + BATTING_SO + FIELDING_DP +
## BATTING_2B + BATTING_1B
##
## Df Sum of Sq RSS AIC
## + BATTING_3B 1 1874 239590 9510.0
## + BATTING_BB_SO 1 1290 240173 9514.8
## + PITCHING_H 1 1233 240230 9515.2
## + PITCHING_SO 1 948 240516 9517.6
## + PITCHING_SO_BB 1 805 240659 9518.8
## + WHGP 1 619 240845 9520.3
## + BATTING_HR 1 487 240977 9521.4
## + PITCHING_HR 1 314 241150 9522.8
## <none> 241464 9523.4
## + BATTING_TB 1 57 241406 9524.9
## + BATTING_BB 1 39 241425 9525.1
## + PITCHING_BB 1 0 241464 9525.4
## - BATTING_1B 1 1903 243367 9536.9
## - BATTING_2B 1 12196 253660 9618.9
## - FIELDING_DP 1 13726 255190 9630.8
## - BATTING_SO 1 15113 256577 9641.5
## - BASERUN_SB 1 29365 270829 9748.5
## - FIELDING_E 1 48637 290101 9884.5
## - BsR 1 61870 303334 9972.8
##
## Step: AIC=9509.95
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E + BATTING_SO + FIELDING_DP +
## BATTING_2B + BATTING_1B + BATTING_3B
##
## Df Sum of Sq RSS AIC
## + PITCHING_H 1 1170 238420 9502.3
## + BATTING_BB_SO 1 1129 238461 9502.6
## + PITCHING_SO 1 1013 238577 9503.6
## + WHGP 1 785 238806 9505.5
## + PITCHING_SO_BB 1 409 239181 9508.6
## <none> 239590 9510.0
## + PITCHING_BB 1 55 239536 9511.5
## + PITCHING_HR 1 15 239575 9511.8
## + BATTING_BB 1 3 239587 9511.9
## + BATTING_HR 1 3 239587 9511.9
## + BATTING_TB 1 3 239587 9511.9
## - BATTING_3B 1 1874 241464 9523.4
## - BATTING_1B 1 2114 241704 9525.3
## - BATTING_SO 1 8812 248402 9579.4
## - BATTING_2B 1 12067 251657 9605.2
## - FIELDING_DP 1 13015 252605 9612.6
## - BASERUN_SB 1 26576 266166 9716.1
## - FIELDING_E 1 50423 290013 9885.9
## - BsR 1 58432 298022 9939.8
##
## Step: AIC=9502.26
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E + BATTING_SO + FIELDING_DP +
## BATTING_2B + BATTING_1B + BATTING_3B + PITCHING_H
##
## Df Sum of Sq RSS AIC
## + BATTING_BB_SO 1 856 237564 9497.1
## + PITCHING_HR 1 373 238047 9501.2
## <none> 238420 9502.3
## + PITCHING_SO_BB 1 218 238202 9502.5
## + BATTING_BB 1 54 238366 9503.8
## + PITCHING_BB 1 21 238399 9504.1
## + WHGP 1 21 238399 9504.1
## + BATTING_HR 1 14 238406 9504.1
## + BATTING_TB 1 14 238406 9504.1
## + PITCHING_SO 1 1 238419 9504.3
## - PITCHING_H 1 1170 239590 9510.0
## - BATTING_3B 1 1810 240230 9515.2
## - BATTING_1B 1 2989 241409 9524.9
## - BATTING_SO 1 8301 246721 9568.0
## - FIELDING_DP 1 12904 251324 9604.6
## - BATTING_2B 1 13109 251529 9606.2
## - BASERUN_SB 1 25481 263901 9701.2
## - BsR 1 49967 288387 9876.8
## - FIELDING_E 1 50504 288924 9880.5
##
## Step: AIC=9497.15
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E + BATTING_SO + FIELDING_DP +
## BATTING_2B + BATTING_1B + BATTING_3B + PITCHING_H + BATTING_BB_SO
##
## Df Sum of Sq RSS AIC
## + PITCHING_HR 1 4998 232566 9457.1
## + BATTING_BB 1 2557 235007 9477.7
## + BATTING_HR 1 2282 235282 9480.0
## + BATTING_TB 1 2282 235282 9480.0
## + WHGP 1 983 236581 9490.9
## + PITCHING_BB 1 983 236581 9490.9
## <none> 237564 9497.1
## + PITCHING_SO_BB 1 43 237521 9498.8
## + PITCHING_SO 1 14 237550 9499.0
## - BATTING_BB_SO 1 856 238420 9502.3
## - PITCHING_H 1 897 238461 9502.6
## - BATTING_3B 1 1678 239242 9509.1
## - BATTING_1B 1 3522 241086 9524.3
## - BATTING_SO 1 6703 244267 9550.2
## - FIELDING_DP 1 12540 250105 9596.9
## - BATTING_2B 1 13932 251496 9607.9
## - BASERUN_SB 1 26254 263818 9702.6
## - BsR 1 40519 278083 9806.8
## - FIELDING_E 1 49930 287494 9872.7
##
## Step: AIC=9457.07
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E + BATTING_SO + FIELDING_DP +
## BATTING_2B + BATTING_1B + BATTING_3B + PITCHING_H + BATTING_BB_SO +
## PITCHING_HR
##
## Df Sum of Sq RSS AIC
## + PITCHING_SO_BB 1 2273 230294 9439.6
## + PITCHING_BB 1 1233 231333 9448.5
## + WHGP 1 1233 231333 9448.5
## + BATTING_HR 1 796 231771 9452.3
## + BATTING_TB 1 796 231771 9452.3
## <none> 232566 9457.1
## + BATTING_BB 1 192 232375 9457.4
## + PITCHING_SO 1 91 232476 9458.3
## - BATTING_3B 1 832 233399 9462.1
## - PITCHING_HR 1 4998 237564 9497.1
## - PITCHING_H 1 5015 237582 9497.3
## - BATTING_BB_SO 1 5481 238047 9501.2
## - BATTING_1B 1 7862 240429 9520.9
## - BATTING_SO 1 11524 244091 9550.8
## - FIELDING_DP 1 12035 244601 9554.9
## - BATTING_2B 1 14256 246823 9572.8
## - BsR 1 15405 247972 9582.0
## - BASERUN_SB 1 23533 256100 9645.8
## - FIELDING_E 1 54200 286767 9869.7
##
## Step: AIC=9439.64
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E + BATTING_SO + FIELDING_DP +
## BATTING_2B + BATTING_1B + BATTING_3B + PITCHING_H + BATTING_BB_SO +
## PITCHING_HR + PITCHING_SO_BB
##
## Df Sum of Sq RSS AIC
## + BATTING_BB 1 467 229826 9437.6
## <none> 230294 9439.6
## + PITCHING_BB 1 87 230207 9440.9
## + WHGP 1 87 230207 9440.9
## + BATTING_HR 1 75 230219 9441.0
## + BATTING_TB 1 75 230219 9441.0
## + PITCHING_SO 1 4 230289 9441.6
## - PITCHING_SO_BB 1 2273 232566 9457.1
## - BATTING_3B 1 2728 233022 9460.9
## - BATTING_BB_SO 1 5015 235309 9480.3
## - PITCHING_H 1 7101 237394 9497.7
## - PITCHING_HR 1 7227 237521 9498.8
## - BATTING_1B 1 9649 239943 9518.9
## - BATTING_SO 1 11321 241615 9532.6
## - FIELDING_DP 1 11662 241956 9535.4
## - BATTING_2B 1 14791 245085 9560.8
## - BsR 1 15345 245639 9565.3
## - BASERUN_SB 1 25011 255305 9641.7
## - FIELDING_E 1 56403 286697 9871.2
##
## Step: AIC=9437.62
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E + BATTING_SO + FIELDING_DP +
## BATTING_2B + BATTING_1B + BATTING_3B + PITCHING_H + BATTING_BB_SO +
## PITCHING_HR + PITCHING_SO_BB + BATTING_BB
##
## Df Sum of Sq RSS AIC
## + PITCHING_BB 1 4833 224993 9397.6
## + WHGP 1 4833 224993 9397.6
## + BATTING_TB 1 1624 228202 9425.6
## + BATTING_HR 1 1624 228202 9425.6
## <none> 229826 9437.6
## + PITCHING_SO 1 217 229610 9437.7
## - BATTING_BB 1 467 230294 9439.6
## - BATTING_3B 1 577 230404 9440.6
## - PITCHING_HR 1 2095 231921 9453.6
## - PITCHING_SO_BB 1 2549 232375 9457.4
## - BATTING_1B 1 2698 232525 9458.7
## - PITCHING_H 1 3055 232881 9461.7
## - BATTING_2B 1 4851 234677 9477.0
## - BsR 1 5057 234883 9478.7
## - BATTING_BB_SO 1 5470 235297 9482.2
## - BATTING_SO 1 9826 239652 9518.5
## - FIELDING_DP 1 12059 241885 9536.8
## - BASERUN_SB 1 25047 254873 9640.3
## - FIELDING_E 1 53516 283342 9849.9
##
## Step: AIC=9397.55
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E + BATTING_SO + FIELDING_DP +
## BATTING_2B + BATTING_1B + BATTING_3B + PITCHING_H + BATTING_BB_SO +
## PITCHING_HR + PITCHING_SO_BB + BATTING_BB + PITCHING_BB
##
## Df Sum of Sq RSS AIC
## + BATTING_HR 1 580 224413 9394.4
## + BATTING_TB 1 580 224413 9394.4
## - BATTING_3B 1 21 225014 9395.7
## <none> 224993 9397.6
## + PITCHING_SO 1 6 224987 9399.5
## - PITCHING_HR 1 485 225478 9399.8
## - BsR 1 990 225983 9404.2
## - BATTING_1B 1 2158 227151 9414.4
## - BATTING_2B 1 2988 227982 9421.7
## - PITCHING_SO_BB 1 3194 228187 9423.4
## - BATTING_BB_SO 1 3288 228281 9424.3
## - PITCHING_BB 1 4833 229826 9437.6
## - BATTING_BB 1 5214 230207 9440.9
## - PITCHING_H 1 7679 232672 9462.0
## - BATTING_SO 1 9558 234551 9477.9
## - FIELDING_DP 1 12509 237502 9502.6
## - BASERUN_SB 1 24412 249405 9599.4
## - FIELDING_E 1 54589 279582 9825.4
##
## Step: AIC=9394.45
## TARGET_WINS ~ BsR + BASERUN_SB + FIELDING_E + BATTING_SO + FIELDING_DP +
## BATTING_2B + BATTING_1B + BATTING_3B + PITCHING_H + BATTING_BB_SO +
## PITCHING_HR + PITCHING_SO_BB + BATTING_BB + PITCHING_BB +
## BATTING_HR
##
## Df Sum of Sq RSS AIC
## - BsR 1 0 224413 9392.4
## - BATTING_1B 1 118 224531 9393.5
## <none> 224413 9394.4
## - BATTING_2B 1 234 224647 9394.5
## - BATTING_3B 1 261 224675 9394.8
## + PITCHING_SO 1 135 224278 9395.3
## - BATTING_HR 1 580 224993 9397.6
## - PITCHING_HR 1 987 225401 9401.1
## - BATTING_BB_SO 1 2867 227280 9417.6
## - PITCHING_SO_BB 1 3604 228018 9424.0
## - PITCHING_BB 1 3789 228202 9425.6
## - BATTING_BB 1 5788 230202 9442.8
## - PITCHING_H 1 7727 232141 9459.4
## - BATTING_SO 1 9936 234350 9478.2
## - FIELDING_DP 1 12524 236937 9499.9
## - BASERUN_SB 1 23744 248157 9591.5
## - FIELDING_E 1 55135 279549 9827.2
##
## Step: AIC=9392.45
## TARGET_WINS ~ BASERUN_SB + FIELDING_E + BATTING_SO + FIELDING_DP +
## BATTING_2B + BATTING_1B + BATTING_3B + PITCHING_H + BATTING_BB_SO +
## PITCHING_HR + PITCHING_SO_BB + BATTING_BB + PITCHING_BB +
## BATTING_HR
##
## Df Sum of Sq RSS AIC
## <none> 224413 9392.4
## + PITCHING_SO 1 135 224278 9393.3
## + BsR 1 0 224413 9394.4
## - PITCHING_HR 1 994 225408 9399.2
## - BATTING_HR 1 1570 225983 9404.2
## - BATTING_1B 1 1664 226078 9405.1
## - BATTING_BB_SO 1 3126 227540 9417.8
## - PITCHING_SO_BB 1 3788 228201 9423.6
## - PITCHING_BB 1 3806 228219 9423.7
## - BATTING_3B 1 4132 228546 9426.6
## - BATTING_2B 1 5545 229958 9438.7
## - PITCHING_H 1 7736 232150 9457.5
## - BATTING_BB 1 9396 233810 9471.6
## - BATTING_SO 1 9952 234366 9476.3
## - FIELDING_DP 1 12574 236987 9498.3
## - BASERUN_SB 1 23890 248304 9590.6
## - FIELDING_E 1 55267 279681 9826.1
tbl4b <- tidy(model4b)
kable(tbl4b)
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 35.1206025 | 7.1375305 | 4.920554 | 0.0000009 |
| BASERUN_SB | 0.0687481 | 0.0047545 | 14.459620 | 0.0000000 |
| FIELDING_E | -0.1019225 | 0.0046344 | -21.992821 | 0.0000000 |
| BATTING_SO | -0.0697889 | 0.0074779 | -9.332629 | 0.0000000 |
| FIELDING_DP | -0.1303356 | 0.0124246 | -10.490110 | 0.0000000 |
| BATTING_2B | -0.0816102 | 0.0117156 | -6.965929 | 0.0000000 |
| BATTING_1B | -0.0391703 | 0.0102630 | -3.816632 | 0.0001395 |
| BATTING_3B | 0.1140118 | 0.0189591 | 6.013576 | 0.0000000 |
| PITCHING_H | 0.0713807 | 0.0086749 | 8.228374 | 0.0000000 |
| BATTING_BB_SO | -18.2350168 | 3.4860528 | -5.230849 | 0.0000002 |
| PITCHING_HR | -0.1916832 | 0.0649763 | -2.950049 | 0.0032148 |
| PITCHING_SO_BB | 17.4786242 | 3.0357689 | 5.757561 | 0.0000000 |
| BATTING_BB | 0.2649660 | 0.0292194 | 9.068154 | 0.0000000 |
| PITCHING_BB | -0.1495224 | 0.0259075 | -5.771391 | 0.0000000 |
| BATTING_HR | 0.2499088 | 0.0674217 | 3.706655 | 0.0002158 |
kable(glance(model4b))
| r.squared | adj.r.squared | sigma | statistic | p.value | df | logLik | AIC | BIC | deviance | df.residual |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.4122059 | 0.4080159 | 10.68941 | 98.37899 | 0 | 15 | -7489.303 | 15010.61 | 15100.05 | 224413.5 | 1964 |
residualPlots(model4b)
## Test stat Pr(>|t|)
## BASERUN_SB 0.907 0.364
## FIELDING_E 2.864 0.004
## BATTING_SO -0.317 0.752
## FIELDING_DP 3.975 0.000
## BATTING_2B 0.120 0.904
## BATTING_1B 0.437 0.662
## BATTING_3B -0.251 0.802
## PITCHING_H 0.532 0.595
## BATTING_BB_SO -0.702 0.483
## PITCHING_HR -0.184 0.854
## PITCHING_SO_BB 0.430 0.667
## BATTING_BB -0.396 0.692
## PITCHING_BB 0.544 0.587
## BATTING_HR -0.434 0.664
## Tukey test -1.496 0.135
qqPlot(model4b, id.n=3, main="Q-Q Plot")
## 1762 346 1577
## 1 1978 1979
influenceIndexPlot(model4b, id.n=3)
influencePlot(model4b, id.n=3)
## StudRes Hat CookD
## 217 2.0984504 0.055521218 0.017227449
## 339 -1.1923252 0.093476597 0.009770771
## 346 3.9683849 0.017950392 0.019047053
## 634 0.9298904 0.155841394 0.010642924
## 1379 1.1064085 0.110909714 0.010179207
## 1577 4.5059661 0.019280666 0.026352034
## 1762 -3.8068314 0.009695845 0.009394634
hist(model4b$residuals, main="Histogram of Residuals")
Using the bidirectional approach, BsR was removed from the model. Once BsR was removed, BATTING_2B, BATTING_1B, and BATTING_3B regained their significance. This is likely caused by collinearity among the variables, since BsR is a derived statistic based in large part on hits. Because BsR did not add predictive ability to our model, Model 4B is the superior model, with a higher F-statistic (98.38 versus 91.77) and slightly better adjusted R-squared (0.4080 versus 0.4077) and AIC (15010.61 versus 15012.61) values.
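The collinearity effect described here can be illustrated with a small simulation. This is only a sketch on made-up data, not our baseball data: `singles`, `doubles`, `derived`, and `wins` are simulated variables, and `derived` is built mostly from the other two predictors in the way a derived statistic such as BsR is built from hits.

```r
# Simulated data for illustration only; none of these values come from the data set.
set.seed(1)
n <- 500
singles <- rnorm(n, mean = 1000, sd = 80)
doubles <- rnorm(n, mean = 250, sd = 40)

# A derived statistic that is almost a linear combination of the components
derived <- 0.5 * singles + 0.8 * doubles + rnorm(n, sd = 5)
wins    <- 0.03 * singles + 0.05 * doubles + rnorm(n, sd = 10)

fit_with    <- lm(wins ~ derived + singles + doubles)
fit_without <- lm(wins ~ singles + doubles)

# Standard errors of the component coefficients in each model
se_with    <- summary(fit_with)$coefficients[c("singles", "doubles"), "Std. Error"]
se_without <- summary(fit_without)$coefficients[c("singles", "doubles"), "Std. Error"]

# Ratios above 1 show how much the derived variable inflates the standard errors
round(se_with / se_without, 1)
```

In this sketch the standard errors of the component coefficients are noticeably larger when the derived variable is in the model, which is the same pattern seen for the hit variables in Model 4 compared with Model 4B.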
kable(glance(model4b))
| r.squared | adj.r.squared | sigma | statistic | p.value | df | logLik | AIC | BIC | deviance | df.residual |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.4122059 | 0.4080159 | 10.68941 | 98.37899 | 0 | 15 | -7489.303 | 15010.61 | 15100.05 | 224413.5 | 1964 |
kable(glance(model4))
| r.squared | adj.r.squared | sigma | statistic | p.value | df | logLik | AIC | BIC | deviance | df.residual |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.4122061 | 0.4077145 | 10.69213 | 91.77372 | 0 | 16 | -7489.303 | 15012.61 | 15107.64 | 224413.4 | 1963 |
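Note that the AIC values in the `glance()` tables (15010.61 and 15012.61) are on a different scale from those printed in the `step()` traces (9392.45 and 9394.45). As far as we understand, `step()` relies on `extractAIC()`, which for linear models omits the constant terms of the Gaussian log-likelihood, while `glance()` reports the value returned by `AIC()`. For models fit to the same n observations the two differ by a constant, n(1 + log 2π) + 2, so the ranking of models is unchanged. The sketch below checks this on the built-in `mtcars` data, which stands in for our data set purely for illustration.

```r
# Illustration on built-in data; mtcars is a stand-in, not the baseball data.
fit <- lm(mpg ~ wt + hp, data = mtcars)
n   <- nrow(mtcars)

aic_step <- extractAIC(fit)[2]   # scale printed in the step() trace
aic_full <- AIC(fit)             # scale reported by glance()

# Constant offset between the two scales for a linear model
offset <- n * (1 + log(2 * pi)) + 2

all.equal(aic_full - aic_step, offset)
```

With n = 1979 observations (1964 residual degrees of freedom plus 15 coefficients in Model 4B), the offset is about 5618.16, which matches the gap between 15010.61 and 9392.45.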