0 which are impossible
| Variable | na_count |
|---|---|
| INDEX | 0 |
| TEAM_BATTING_H | 0 |
| TEAM_BATTING_2B | 0 |
| TEAM_BATTING_3B | 0 |
| TEAM_BATTING_HR | 0 |
| TEAM_BATTING_BB | 0 |
| TEAM_BATTING_SO | 18 |
| TEAM_BASERUN_SB | 13 |
| TEAM_BASERUN_CS | 87 |
| TEAM_BATTING_HBP | 240 |
| TEAM_PITCHING_H | 0 |
| TEAM_PITCHING_HR | 0 |
| TEAM_PITCHING_BB | 0 |
| TEAM_PITCHING_SO | 18 |
| TEAM_FIELDING_E | 0 |
| TEAM_FIELDING_DP | 31 |
| Variable | vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| INDEX | 1 | 2276 | 1268.46353 | 736.34904 | 1270.5 | 1268.56970 | 952.5705 | 1 | 2535 | 2534 | 0.0042149 | -1.2167564 | 15.4346788 |
| TARGET_WINS | 2 | 2276 | 80.79086 | 15.75215 | 82.0 | 81.31229 | 14.8260 | 0 | 146 | 146 | -0.3987232 | 1.0274757 | 0.3301823 |
| TEAM_BATTING_H | 3 | 2276 | 1469.26977 | 144.59120 | 1454.0 | 1459.04116 | 114.1602 | 891 | 2554 | 1663 | 1.5713335 | 7.2785261 | 3.0307891 |
| TEAM_BATTING_2B | 4 | 2276 | 241.24692 | 46.80141 | 238.0 | 240.39627 | 47.4432 | 69 | 458 | 389 | 0.2151018 | 0.0061609 | 0.9810087 |
| TEAM_BATTING_3B | 5 | 2276 | 55.25000 | 27.93856 | 47.0 | 52.17563 | 23.7216 | 0 | 223 | 223 | 1.1094652 | 1.5032418 | 0.5856226 |
| TEAM_BATTING_HR | 6 | 2276 | 99.61204 | 60.54687 | 102.0 | 97.38529 | 78.5778 | 0 | 264 | 264 | 0.1860421 | -0.9631189 | 1.2691285 |
| TEAM_BATTING_BB | 7 | 2276 | 501.55888 | 122.67086 | 512.0 | 512.18331 | 94.8864 | 0 | 878 | 878 | -1.0257599 | 2.1828544 | 2.5713150 |
| TEAM_BATTING_SO | 8 | 2174 | 735.60534 | 248.52642 | 750.0 | 742.31322 | 284.6592 | 0 | 1399 | 1399 | -0.2978001 | -0.3207992 | 5.3301912 |
| TEAM_BASERUN_SB | 9 | 2145 | 124.76177 | 87.79117 | 101.0 | 110.81188 | 60.7866 | 0 | 697 | 697 | 1.9724140 | 5.4896754 | 1.8955584 |
| TEAM_BASERUN_CS | 10 | 1504 | 52.80386 | 22.95634 | 49.0 | 50.35963 | 17.7912 | 0 | 201 | 201 | 1.9762180 | 7.6203818 | 0.5919414 |
| TEAM_BATTING_HBP | 11 | 191 | 59.35602 | 12.96712 | 58.0 | 58.86275 | 11.8608 | 29 | 95 | 66 | 0.3185754 | -0.1119828 | 0.9382681 |
| TEAM_PITCHING_H | 12 | 2276 | 1779.21046 | 1406.84293 | 1518.0 | 1555.89517 | 174.9468 | 1137 | 30132 | 28995 | 10.3295111 | 141.8396985 | 29.4889618 |
| TEAM_PITCHING_HR | 13 | 2276 | 105.69859 | 61.29875 | 107.0 | 103.15697 | 74.1300 | 0 | 343 | 343 | 0.2877877 | -0.6046311 | 1.2848886 |
| TEAM_PITCHING_BB | 14 | 2276 | 553.00791 | 166.35736 | 536.5 | 542.62459 | 98.5929 | 0 | 3645 | 3645 | 6.7438995 | 96.9676398 | 3.4870317 |
| TEAM_PITCHING_SO | 15 | 2174 | 817.73045 | 553.08503 | 813.5 | 796.93391 | 257.2311 | 0 | 19278 | 19278 | 22.1745535 | 671.1891292 | 11.8621151 |
| TEAM_FIELDING_E | 16 | 2276 | 246.48067 | 227.77097 | 159.0 | 193.43798 | 62.2692 | 65 | 1898 | 1833 | 2.9904656 | 10.9702717 | 4.7743279 |
| TEAM_FIELDING_DP | 17 | 1990 | 146.38794 | 26.22639 | 149.0 | 147.57789 | 23.7216 | 52 | 228 | 176 | -0.3889390 | 0.1817397 | 0.5879114 |
| Variable | vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| X1 | 1 | 2276 | 80.79086 | 15.75215 | 82 | 81.31229 | 14.826 | 0 | 146 | 146 | -0.3987232 | 1.027476 | 0.3301823 |
The comparison is not exact, because not every season since the 1800s has had 162 games, but my interval test gives a left-tail interval of 80.35 to 81, and our mean of 80.79 falls comfortably within it.
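The quoted bounds match 81 minus roughly 1.96 standard errors, using the standard error of TARGET_WINS from the summary table. The sketch below reproduces that arithmetic. It assumes this is how the interval was built, since the source does not show the computation, and the names `mu0`, `xbar`, `se`, and `lower` are illustrative.

```r
# Assumption: the interval is the left half of a 95% normal interval
# centred on the break-even win total for a 162-game season.
mu0  <- 81          # half of 162 games
xbar <- 80.79086    # mean of TARGET_WINS, copied from the summary table
se   <- 0.3301823   # standard error of that mean, copied from the summary table

lower <- mu0 - qnorm(0.975) * se
interval <- c(lower = lower, upper = mu0)
print(round(interval, 2))

# TRUE when the observed mean lies between the lower bound and 81
inside <- xbar >= lower && xbar <= mu0
print(inside)
```

A standard one-sample test (`t.test(x, mu = 81)` on the raw wins column) would be the more conventional check; the version above only mirrors the numbers quoted in the text.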
| Variable | vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| X1 | 1 | 2255 | 80.58537 | 15.05826 | 82 | 81.18726 | 14.826 | 21 | 116 | 95 | -0.4395385 | 0.3152447 | 0.3171039 |
| Variable | vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TEAM_BATTING_H | 1 | 2255 | 1466.81685 | 135.92839 | 1454 | 1458.24377 | 114.1602 | 992 | 2496 | 1504 | 1.2098420 | 4.8627441 | 2.8624434 |
| TEAM_BATTING_2B | 2 | 2255 | 241.09490 | 46.28571 | 238 | 240.29695 | 47.4432 | 69 | 458 | 389 | 0.2012387 | -0.0427041 | 0.9747061 |
| TEAM_BATTING_3B | 3 | 2255 | 54.93348 | 27.63678 | 47 | 51.89695 | 23.7216 | 0 | 223 | 223 | 1.1205339 | 1.5849716 | 0.5819881 |
| TEAM_BATTING_HR | 4 | 2255 | 100.22395 | 60.39496 | 103 | 98.07867 | 77.0952 | 0 | 264 | 264 | 0.1771368 | -0.9584953 | 1.2718252 |
| TEAM_BATTING_BB | 5 | 2255 | 503.80089 | 119.41435 | 513 | 513.29640 | 93.4038 | 29 | 878 | 849 | -0.9614819 | 2.1131941 | 2.5146829 |
| TEAM_BATTING_SO | 6 | 2155 | 739.51323 | 244.66509 | 753 | 744.96464 | 284.6592 | 0 | 1399 | 1399 | -0.2526910 | -0.4140827 | 5.2704582 |
| TEAM_BATTING_HBP | 7 | 191 | 59.35602 | 12.96712 | 58 | 58.86275 | 11.8608 | 29 | 95 | 66 | 0.3185754 | -0.1119828 | 0.9382681 |
| Singles | 8 | 2255 | 1070.56452 | 121.79442 | 1049 | 1058.05319 | 97.8516 | 709 | 2112 | 1403 | 1.7574832 | 6.7989398 | 2.5648036 |
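The Singles mean in the table above equals the TEAM_BATTING_H mean minus the means for doubles, triples, and home runs. That is consistent with Singles being derived as hits minus extra-base hits. The sketch below checks the arithmetic and applies the same derivation to a toy data frame. The derivation is inferred from the table rather than shown in the source, and `toy` and its values are made up for illustration.

```r
# Means copied from the summary table (n = 2255 rows)
mean_h  <- 1466.81685
mean_2b <- 241.09490
mean_3b <- 54.93348
mean_hr <- 100.22395

# Mean of (H - 2B - 3B - HR) equals the difference of the means
implied_singles_mean <- mean_h - mean_2b - mean_3b - mean_hr
matches_table <- isTRUE(all.equal(implied_singles_mean, 1070.56452, tolerance = 1e-7))
print(matches_table)

# Same derivation on a hypothetical two-row data frame
toy <- data.frame(
  TEAM_BATTING_H  = c(1454, 1500),
  TEAM_BATTING_2B = c(238, 250),
  TEAM_BATTING_3B = c(47, 40),
  TEAM_BATTING_HR = c(103, 120)
)
toy$Singles <- with(toy, TEAM_BATTING_H - TEAM_BATTING_2B - TEAM_BATTING_3B - TEAM_BATTING_HR)
print(toy)
```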
[Figure: histograms of TEAM_BATTING_H, TEAM_BATTING_2B, TEAM_BATTING_3B, TEAM_BATTING_HR, TEAM_BATTING_BB, TEAM_BATTING_SO, TEAM_BATTING_HBP, and Singles]
[Figure: boxplots of the batting variables (n = 2255; TEAM_BATTING_SO n = 2155, TEAM_BATTING_HBP n = 191). Outliers flagged: TEAM_BATTING_H 61, TEAM_BATTING_2B 11, TEAM_BATTING_3B 26, TEAM_BATTING_HR 0, TEAM_BATTING_BB 120, TEAM_BATTING_SO 0, TEAM_BATTING_HBP 1 (value 95), Singles 70]
0's per category. Given that the other categories in these observations seem in line, some sort of data entry mistake was likely made again. In this case, let's replace the 0's with NA.

| Variable | vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TEAM_BATTING_H | 1 | 2255 | 1466.81685 | 135.92839 | 1454.0 | 1458.24377 | 114.1602 | 992 | 2496 | 1504 | 1.2098420 | 4.8627441 | 2.8624434 |
| TEAM_BATTING_2B | 2 | 2255 | 241.09490 | 46.28571 | 238.0 | 240.29695 | 47.4432 | 69 | 458 | 389 | 0.2012387 | -0.0427041 | 0.9747061 |
| TEAM_BATTING_3B | 3 | 2254 | 54.95785 | 27.61866 | 47.0 | 51.91075 | 23.7216 | 8 | 223 | 215 | 1.1240833 | 1.5881365 | 0.5817357 |
| TEAM_BATTING_HR | 4 | 2244 | 100.71524 | 60.13266 | 104.0 | 98.55679 | 75.6126 | 3 | 264 | 261 | 0.1785408 | -0.9565209 | 1.2694014 |
| TEAM_BATTING_BB | 5 | 2255 | 503.80089 | 119.41435 | 513.0 | 513.29640 | 93.4038 | 29 | 878 | 849 | -0.9614819 | 2.1131941 | 2.5146829 |
| TEAM_BATTING_SO | 6 | 2140 | 744.69673 | 237.52652 | 756.5 | 747.73598 | 283.9179 | 67 | 1399 | 1332 | -0.1320162 | -0.7184334 | 5.1345834 |
| TEAM_BATTING_HBP | 7 | 191 | 59.35602 | 12.96712 | 58.0 | 58.86275 | 11.8608 | 29 | 95 | 66 | 0.3185754 | -0.1119828 | 0.9382681 |
| Singles | 8 | 2255 | 1070.56452 | 121.79442 | 1049.0 | 1058.05319 | 97.8516 | 709 | 2112 | 1403 | 1.7574832 | 6.7989398 | 2.5648036 |
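In the table above, the minimums of TEAM_BATTING_3B, TEAM_BATTING_HR, and TEAM_BATTING_SO are no longer 0 and their counts have dropped, which is what replacing zeros with NA would produce. The sketch below shows one way to do that replacement on a toy data frame. The column choice is inferred from the table, and `toy` and `zero_cols` are illustrative names, not the author's code.

```r
# Hypothetical data with one implausible zero in each column
toy <- data.frame(
  TEAM_BATTING_3B = c(0, 47, 55),
  TEAM_BATTING_HR = c(102, 0, 99),
  TEAM_BATTING_SO = c(750, 735, 0)
)

zero_cols <- c("TEAM_BATTING_3B", "TEAM_BATTING_HR", "TEAM_BATTING_SO")

# which() ignores values that are already NA, so only true zeros are replaced
toy[zero_cols] <- lapply(toy[zero_cols], function(x) replace(x, which(x == 0), NA))

# One NA per column after the replacement
na_per_column <- colSums(is.na(toy))
print(na_per_column)
```

Marking the zeros as NA keeps the rows in the data while excluding the suspect values from summaries that use `na.rm = TRUE`.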
| Variable | na_count |
|---|---|
| INDEX | 0 |
| TARGET_WINS | 0 |
| TEAM_BATTING_H | 0 |
| TEAM_BATTING_2B | 0 |
| TEAM_BATTING_3B | 1 |
| TEAM_BATTING_HR | 11 |
| TEAM_BATTING_BB | 0 |
| TEAM_BATTING_SO | 115 |
| TEAM_BASERUN_SB | 121 |
| TEAM_BASERUN_CS | 757 |
| TEAM_BATTING_HBP | 2064 |
| TEAM_PITCHING_H | 0 |
| TEAM_PITCHING_HR | 0 |
| TEAM_PITCHING_BB | 0 |
| TEAM_PITCHING_SO | 100 |
| TEAM_FIELDING_E | 0 |
| TEAM_FIELDING_DP | 271 |
| Singles | 0 |
## Change in batting dataset
# Assumes dplyr (for %>% and filter) and knitr (for kable) are already attached
# Keep only rows whose batting statistics fall inside the chosen bounds
batting <- batting %>%
  filter(TEAM_BATTING_3B < 154 &
         TEAM_BATTING_3B > 10 &
         TEAM_BATTING_2B < 377 &
         TEAM_BATTING_2B > 109 &
         TEAM_BATTING_H < 1784 &
         TEAM_BATTING_BB > 281 &
         TEAM_BATTING_BB < 836 &
         Singles < 1339 &
         Singles > 810)
# Keep a copy that still has the strikeout NAs, then fill them with the median value
batting_w_na <- batting
batting$TEAM_BATTING_SO[is.na(batting$TEAM_BATTING_SO)] <- median(batting$TEAM_BATTING_SO, na.rm = TRUE)
#lmodr <- lm(logit(TEAM_BATTING_SO/100)~TEAM_BATTING_3B+TEAM_BATTING_2B+TEAM_BATTING_H+TEAM_BATTING_BB,batting)
#ilogit(predict(lmodr,batting[is.na(batting$TEAM_BATTING_SO),]))*100
#summary(lmodr)
# Count the remaining NAs in each column
na_count <- sapply(batting, function(y) sum(is.na(y)))
na_count <- data.frame(na_count)
kable(na_count, caption = "NA counts")

| Variable | na_count |
|---|---|
| TEAM_BATTING_H | 0 |
| TEAM_BATTING_2B | 0 |
| TEAM_BATTING_3B | 0 |
| TEAM_BATTING_HR | 0 |
| TEAM_BATTING_BB | 0 |
| TEAM_BATTING_SO | 0 |
| TEAM_BATTING_HBP | 1901 |
| Singles | 0 |
[Figure: boxplots of the batting variables after filtering (n = 2092; TEAM_BATTING_HBP n = 191). Outliers flagged: TEAM_BATTING_H 15, TEAM_BATTING_2B 6, TEAM_BATTING_3B 29, TEAM_BATTING_HR 0, TEAM_BATTING_BB 12, TEAM_BATTING_SO 0, TEAM_BATTING_HBP 1 (value 95), Singles 15]
| Variable | na_count |
|---|---|
| INDEX | 0 |
| TARGET_WINS | 0 |
| TEAM_BATTING_H | 0 |
| TEAM_BATTING_2B | 0 |
| TEAM_BATTING_3B | 0 |
| TEAM_BATTING_HR | 0 |
| TEAM_BATTING_BB | 0 |
| TEAM_BATTING_SO | 0 |
| TEAM_BASERUN_SB | 0 |
| TEAM_BASERUN_CS | 0 |
| TEAM_BATTING_HBP | 1853 |
| TEAM_PITCHING_H | 0 |
| TEAM_PITCHING_HR | 0 |
| TEAM_PITCHING_BB | 0 |
| TEAM_PITCHING_SO | 100 |
| TEAM_FIELDING_E | 0 |
| TEAM_FIELDING_DP | 0 |
| Singles | 0 |
| Variable | vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TARGET_WINS | 1 | 2044 | 80.52789 | 13.89309 | 81.0 | 80.96638 | 14.8260 | 21 | 116 | 95 | -0.3090862 | -0.0162643 | 0.3072970 |
| TEAM_PITCHING_H | 2 | 2044 | 1539.55479 | 220.16288 | 1498.0 | 1513.97127 | 146.7774 | 1137 | 6723 | 5586 | 7.2180667 | 150.1985045 | 4.8697161 |
| TEAM_PITCHING_HR | 3 | 2044 | 111.71771 | 59.92485 | 114.0 | 110.10391 | 68.1996 | 3 | 343 | 340 | 0.1784958 | -0.5918350 | 1.3254597 |
| TEAM_PITCHING_BB | 4 | 2044 | 553.66830 | 109.40697 | 541.0 | 546.10452 | 91.9212 | 312 | 2169 | 1857 | 2.2520497 | 23.8773902 | 2.4199397 |
| TEAM_PITCHING_SO | 5 | 1944 | 807.61060 | 228.05424 | 818.0 | 803.06427 | 250.5594 | 301 | 2309 | 2008 | 0.3567090 | 0.6392627 | 5.1723752 |
| TEAM_FIELDING_E | 6 | 2044 | 188.89922 | 109.36885 | 150.5 | 166.38570 | 49.6671 | 65 | 765 | 700 | 2.3312285 | 5.9463732 | 2.4190964 |
| TEAM_FIELDING_DP | 7 | 2044 | 147.84883 | 24.20592 | 149.0 | 148.80746 | 20.7564 | 68 | 228 | 160 | -0.3664606 | 0.4738264 | 0.5354035 |
[Figure: histograms of TARGET_WINS, TEAM_PITCHING_H, TEAM_PITCHING_HR, TEAM_PITCHING_BB, TEAM_PITCHING_SO, TEAM_FIELDING_E, and TEAM_FIELDING_DP]
[Figure: boxplots of TARGET_WINS and the pitching and fielding variables (n = 2044; TEAM_PITCHING_SO n = 1944). Outliers flagged: TARGET_WINS 7, TEAM_PITCHING_H 99, TEAM_PITCHING_HR 3, TEAM_PITCHING_BB 49, TEAM_PITCHING_SO 9, TEAM_FIELDING_E 178, TEAM_FIELDING_DP 56]
## total_equal_categories
## 1 785
| Variable | na_count |
|---|---|
| INDEX | 0 |
| TARGET_WINS | 0 |
| TEAM_BATTING_H | 0 |
| TEAM_BATTING_2B | 0 |
| TEAM_BATTING_3B | 0 |
| TEAM_BATTING_HR | 0 |
| TEAM_BATTING_BB | 0 |
| TEAM_BATTING_SO | 0 |
| TEAM_BASERUN_SB | 0 |
| TEAM_BASERUN_CS | 0 |
| TEAM_BATTING_HBP | 1844 |
| TEAM_PITCHING_H | 0 |
| TEAM_PITCHING_HR | 0 |
| TEAM_PITCHING_BB | 0 |
| TEAM_PITCHING_SO | 100 |
| TEAM_FIELDING_E | 0 |
| TEAM_FIELDING_DP | 0 |
| Singles | 0 |
| slugging | 0 |
| OBP | 0 |
corrplot(cor(money_ball_train_2[, c(2, 20, 19)])[, 1, drop = FALSE], cl.pos = 'n', method = "number", number.cex = .7)
##
## Call:
## lm(formula = y ~ TEAM_BATTING_3B + TEAM_BATTING_HR + TEAM_BATTING_BB +
## TEAM_BATTING_SO + TEAM_BASERUN_SB + TEAM_FIELDING_E + TEAM_FIELDING_DP +
## Singles, data = fit_1_db)
##
## Residuals:
## Min 1Q Median 3Q Max
## -37.987 -7.593 0.217 7.144 37.095
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 46.431876 5.799453 8.006 1.97e-15 ***
## TEAM_BATTING_3B 0.189918 0.017127 11.089 < 2e-16 ***
## TEAM_BATTING_HR 0.116007 0.007590 15.285 < 2e-16 ***
## TEAM_BATTING_BB 0.031182 0.003161 9.863 < 2e-16 ***
## TEAM_BATTING_SO -0.015229 0.002245 -6.784 1.53e-11 ***
## TEAM_BASERUN_SB 0.079259 0.005247 15.106 < 2e-16 ***
## TEAM_FIELDING_E -0.077432 0.003876 -19.979 < 2e-16 ***
## TEAM_FIELDING_DP -0.112248 0.011893 -9.438 < 2e-16 ***
## Singles 0.027791 0.004155 6.689 2.90e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.04 on 2025 degrees of freedom
## Multiple R-squared: 0.3701, Adjusted R-squared: 0.3677
## F-statistic: 148.8 on 8 and 2025 DF, p-value: < 2.2e-16
## Analysis of Variance Table
##
## Response: y
## Df Sum Sq Mean Sq F value Pr(>F)
## TEAM_BATTING_3B 1 2208 2208 18.115 2.175e-05 ***
## TEAM_BATTING_HR 1 49985 49985 410.082 < 2.2e-16 ***
## TEAM_BATTING_BB 1 14374 14374 117.927 < 2.2e-16 ***
## TEAM_BATTING_SO 1 1906 1906 15.634 7.952e-05 ***
## TEAM_BASERUN_SB 1 8190 8190 67.193 4.316e-16 ***
## TEAM_FIELDING_E 1 53143 53143 435.990 < 2.2e-16 ***
## TEAM_FIELDING_DP 1 9795 9795 80.362 < 2.2e-16 ***
## Singles 1 5453 5453 44.738 2.905e-11 ***
## Residuals 2025 246829 122
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.9845162 0.2427866 16.4116 < 2.2e-16
## fitted(fit_1) -0.0155285 0.0030003 -5.1756 2.496e-07
##
## n = 2034, p = 2, Residual SE = 1.14270, R-Squared = 0.01
##
## Anderson-Darling normality test
##
## data: fit_1$residuals
## A = 0.28768, p-value = 0.6194
##
## Shapiro-Wilk normality test
##
## data: residuals(fit_1)
## W = 0.99913, p-value = 0.4494
##
## Durbin-Watson test
##
## data: y ~ TEAM_BASERUN_CS + TEAM_BASERUN_SB + TEAM_BATTING_2B + TEAM_BATTING_3B + TEAM_BATTING_BB + TEAM_BATTING_HR + TEAM_FIELDING_DP + TEAM_FIELDING_E
## DW = 1.0338, p-value < 2.2e-16
## alternative hypothesis: true autocorrelation is greater than 0
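The diagnostics above test the residuals for normality and for first-order autocorrelation. The sketch below reproduces the same kind of checks with base R only, on simulated data: `shapiro.test` for normality and the Durbin-Watson statistic computed directly from its definition. The data, the model, and all object names are illustrative assumptions, and the packaged Anderson-Darling and Durbin-Watson tests in the report are not used here.

```r
set.seed(42)

# Simulated data with independent, normally distributed errors
n <- 100
x <- rnorm(n)
y <- 2 + 3 * x + rnorm(n)
fit <- lm(y ~ x)
e <- residuals(fit)

# Shapiro-Wilk test: a large p-value gives no evidence against normality
sw <- shapiro.test(e)
print(sw$p.value)

# Durbin-Watson statistic: sum of squared successive differences of the
# residuals divided by the residual sum of squares; it lies between 0 and 4
dw <- sum(diff(e)^2) / sum(e^2)
print(dw)
```

Values of the statistic near 2 suggest no first-order autocorrelation, and values well below 2, such as the 1.0338 reported above, point to positive autocorrelation. The statistic depends on row order, so it is most meaningful when the rows have a natural sequence.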
## [1] -3.295338
## 761 1433 1691 1808 1810
## -3.403779 3.320870 3.381547 -3.433216 -3.460660
##
## Call:
## lm(formula = y ~ TEAM_BATTING_3B + TEAM_BATTING_HR + TEAM_BATTING_BB +
## TEAM_BATTING_SO + TEAM_BASERUN_SB + TEAM_BASERUN_CS + TEAM_FIELDING_E +
## TEAM_FIELDING_DP + Singles, data = fit_1b_db)
##
## Residuals:
## Min 1Q Median 3Q Max
## -34.779 -7.474 0.135 7.246 35.294
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 51.515840 5.771956 8.925 < 2e-16 ***
## TEAM_BATTING_3B 0.186005 0.016825 11.055 < 2e-16 ***
## TEAM_BATTING_HR 0.110316 0.007542 14.627 < 2e-16 ***
## TEAM_BATTING_BB 0.029713 0.003116 9.535 < 2e-16 ***
## TEAM_BATTING_SO -0.016821 0.002210 -7.613 4.10e-14 ***
## TEAM_BASERUN_SB 0.090115 0.005512 16.348 < 2e-16 ***
## TEAM_BASERUN_CS -0.081294 0.014844 -5.477 4.88e-08 ***
## TEAM_FIELDING_E -0.084660 0.004052 -20.892 < 2e-16 ***
## TEAM_FIELDING_DP -0.109285 0.011692 -9.347 < 2e-16 ***
## Singles 0.029337 0.004080 7.190 9.07e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.82 on 2019 degrees of freedom
## Multiple R-squared: 0.3852, Adjusted R-squared: 0.3825
## F-statistic: 140.6 on 9 and 2019 DF, p-value: < 2.2e-16
## Analysis of Variance Table
##
## Response: y
## Df Sum Sq Mean Sq F value Pr(>F)
## TEAM_BATTING_3B 1 2361 2361 20.1681 7.493e-06 ***
## TEAM_BATTING_HR 1 48976 48976 418.4169 < 2.2e-16 ***
## TEAM_BATTING_BB 1 14081 14081 120.2952 < 2.2e-16 ***
## TEAM_BATTING_SO 1 2401 2401 20.5123 6.271e-06 ***
## TEAM_BASERUN_SB 1 8617 8617 73.6138 < 2.2e-16 ***
## TEAM_BASERUN_CS 1 261 261 2.2316 0.1354
## TEAM_FIELDING_E 1 56180 56180 479.9588 < 2.2e-16 ***
## TEAM_FIELDING_DP 1 9170 9170 78.3399 < 2.2e-16 ***
## Singles 1 6052 6052 51.7002 9.065e-13 ***
## Residuals 2019 236326 117
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Call:
## lm(formula = y ~ TEAM_BATTING_3B + TEAM_BATTING_HR + TEAM_BATTING_BB +
## TEAM_BATTING_SO + TEAM_BASERUN_SB + TEAM_FIELDING_E + TEAM_FIELDING_DP +
## Singles, data = fit_1_db.c)
##
## Residuals:
## Min 1Q Median 3Q Max
## -37.987 -7.593 0.217 7.144 37.095
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 80.4784 0.2448 328.753 < 2e-16 ***
## TEAM_BATTING_3B 4.6345 0.4179 11.089 < 2e-16 ***
## TEAM_BATTING_HR 6.7045 0.4386 15.285 < 2e-16 ***
## TEAM_BATTING_BB 2.7601 0.2798 9.863 < 2e-16 ***
## TEAM_BATTING_SO -3.3121 0.4882 -6.784 1.53e-11 ***
## TEAM_BASERUN_SB 5.6539 0.3743 15.106 < 2e-16 ***
## TEAM_FIELDING_E -8.4613 0.4235 -19.979 < 2e-16 ***
## TEAM_FIELDING_DP -2.7208 0.2883 -9.438 < 2e-16 ***
## Singles 2.5687 0.3840 6.689 2.90e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.04 on 2025 degrees of freedom
## Multiple R-squared: 0.3701, Adjusted R-squared: 0.3677
## F-statistic: 148.8 on 8 and 2025 DF, p-value: < 2.2e-16
## vars n mean sd median trimmed mad min max
## TEAM_BASERUN_SB 1 2034 116.60 71.33 100 106.99 59.30 18 414
## TEAM_BASERUN_CS 2 2034 51.94 18.39 50 50.12 10.38 11 186
## TEAM_FIELDING_E 3 2034 189.06 109.27 151 166.62 50.41 65 765
## TEAM_FIELDING_DP 4 2034 147.84 24.24 149 148.80 21.50 68 228
## slugging 5 2034 2118.54 238.31 2128 2120.70 240.92 1453 2832
## OBP 6 2034 1976.50 152.78 1975 1975.51 145.29 1438 2507
## range skew kurtosis se
## TEAM_BASERUN_SB 396 1.35 2.00 1.58
## TEAM_BASERUN_CS 175 2.03 8.49 0.41
## TEAM_FIELDING_E 700 2.33 5.96 2.42
## TEAM_FIELDING_DP 160 -0.37 0.47 0.54
## slugging 1379 -0.08 -0.24 5.28
## OBP 1069 0.05 0.25 3.39
##
## Call:
## lm(formula = y ~ ., data = fit_2_db)
##
## Residuals:
## Min 1Q Median 3Q Max
## -37.560 -8.230 0.086 8.084 34.943
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 19.283586 3.617509 5.331 1.09e-07 ***
## TEAM_BASERUN_SB 0.071440 0.005369 13.307 < 2e-16 ***
## TEAM_BASERUN_CS -0.054363 0.015069 -3.608 0.000317 ***
## TEAM_FIELDING_E -0.054537 0.003476 -15.691 < 2e-16 ***
## TEAM_FIELDING_DP -0.106807 0.012085 -8.838 < 2e-16 ***
## slugging 0.003939 0.001917 2.055 0.040047 *
## OBP 0.037159 0.002788 13.329 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.55 on 2027 degrees of freedom
## Multiple R-squared: 0.3096, Adjusted R-squared: 0.3075
## F-statistic: 151.5 on 6 and 2027 DF, p-value: < 2.2e-16
Outliers-
## Start: AIC=9751.67
## y ~ TEAM_BATTING_2B + TEAM_BATTING_3B + TEAM_BATTING_HR + TEAM_BATTING_BB +
## TEAM_BATTING_SO + TEAM_BASERUN_SB + TEAM_BASERUN_CS + TEAM_FIELDING_E +
## TEAM_FIELDING_DP + Singles + slugging + OBP
##
##
## Step: AIC=9751.67
## y ~ TEAM_BATTING_2B + TEAM_BATTING_3B + TEAM_BATTING_HR + TEAM_BATTING_BB +
## TEAM_BATTING_SO + TEAM_BASERUN_SB + TEAM_BASERUN_CS + TEAM_FIELDING_E +
## TEAM_FIELDING_DP + Singles + slugging
##
##
## Step: AIC=9751.67
## y ~ TEAM_BATTING_2B + TEAM_BATTING_3B + TEAM_BATTING_HR + TEAM_BATTING_BB +
## TEAM_BATTING_SO + TEAM_BASERUN_SB + TEAM_BASERUN_CS + TEAM_FIELDING_E +
## TEAM_FIELDING_DP + Singles
##
## Df Sum of Sq RSS AIC
## <none> 243111 9751.7
## - TEAM_BATTING_2B 1 258 243370 9751.8
## - TEAM_BASERUN_CS 1 3374 246485 9777.7
## - Singles 1 5897 249008 9798.4
## - TEAM_BATTING_SO 1 6229 249340 9801.1
## - TEAM_FIELDING_DP 1 9905 253016 9830.9
## - TEAM_BATTING_BB 1 11029 254141 9839.9
## - TEAM_BATTING_3B 1 15474 258585 9875.2
## - TEAM_BATTING_HR 1 23336 266447 9936.1
## - TEAM_BASERUN_SB 1 31319 274430 9996.1
## - TEAM_FIELDING_E 1 51195 294307 10138.4
##
## Call:
## lm(formula = y ~ TEAM_BATTING_2B + TEAM_BATTING_3B + TEAM_BATTING_HR +
## TEAM_BATTING_BB + TEAM_BATTING_SO + TEAM_BASERUN_SB + TEAM_BASERUN_CS +
## TEAM_FIELDING_E + TEAM_FIELDING_DP + Singles, data = fit_1_db)
##
## Residuals:
## Min 1Q Median 3Q Max
## -38.859 -7.587 0.197 7.363 37.316
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 51.478451 5.841835 8.812 < 2e-16 ***
## TEAM_BATTING_2B -0.010004 0.006823 -1.466 0.143
## TEAM_BATTING_3B 0.195439 0.017223 11.347 < 2e-16 ***
## TEAM_BATTING_HR 0.113537 0.008148 13.935 < 2e-16 ***
## TEAM_BATTING_BB 0.030298 0.003163 9.580 < 2e-16 ***
## TEAM_BATTING_SO -0.016108 0.002237 -7.200 8.47e-13 ***
## TEAM_BASERUN_SB 0.090137 0.005583 16.144 < 2e-16 ***
## TEAM_BASERUN_CS -0.079757 0.015053 -5.299 1.29e-07 ***
## TEAM_FIELDING_E -0.086042 0.004169 -20.640 < 2e-16 ***
## TEAM_FIELDING_DP -0.107498 0.011841 -9.079 < 2e-16 ***
## Singles 0.029968 0.004278 7.005 3.35e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.96 on 2023 degrees of freedom
## Multiple R-squared: 0.3796, Adjusted R-squared: 0.3766
## F-statistic: 123.8 on 10 and 2023 DF, p-value: < 2.2e-16
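# Profile the Box-Cox log-likelihood over candidate powers from 0.5 to 2.5
lambda <- MASS::boxcox(lm(y~.,fit_1_db),lambda=seq(0.5,2.5,by=0.1))
my_x <- lambda$x
my_y <- lambda$y
boxpower <- cbind(my_y,my_x)
#boxpower[order(-my_y),]
my_power <- 1.34
# Refit with the response raised to the chosen power, then drop slugging, OBP, and TEAM_BATTING_2B
fit_4 <- lm(y^(1.34/1)~.,fit_1_db)
fit_4 <- update(fit_4, . ~ . -slugging)
fit_4 <- update(fit_4, . ~ . -OBP)
fit_4 <- update(fit_4, . ~ . -TEAM_BATTING_2B)
# Show the four diagnostic plots in a 2 x 2 grid
layout(matrix(c(1,2,3,4),2,2))
plot(fit_4)
summary(fit_4)
##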
## Call:
## lm(formula = y^(1.34/1) ~ TEAM_BATTING_3B + TEAM_BATTING_HR +
## TEAM_BATTING_BB + TEAM_BATTING_SO + TEAM_BASERUN_SB + TEAM_BASERUN_CS +
## TEAM_FIELDING_E + TEAM_FIELDING_DP + Singles, data = fit_1_db)
##
## Residuals:
## Min 1Q Median 3Q Max
## -217.917 -44.612 -0.329 42.735 220.299
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 191.90759 34.49281 5.564 2.99e-08 ***
## TEAM_BATTING_3B 1.12718 0.10045 11.222 < 2e-16 ***
## TEAM_BATTING_HR 0.64807 0.04510 14.369 < 2e-16 ***
## TEAM_BATTING_BB 0.17726 0.01860 9.533 < 2e-16 ***
## TEAM_BATTING_SO -0.09707 0.01321 -7.350 2.86e-13 ***
## TEAM_BASERUN_SB 0.52484 0.03297 15.918 < 2e-16 ***
## TEAM_BASERUN_CS -0.47928 0.08882 -5.396 7.61e-08 ***
## TEAM_FIELDING_E -0.48928 0.02415 -20.258 < 2e-16 ***
## TEAM_FIELDING_DP -0.64199 0.06990 -9.184 < 2e-16 ***
## Singles 0.16534 0.02437 6.784 1.53e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 64.74 on 2024 degrees of freedom
## Multiple R-squared: 0.3752, Adjusted R-squared: 0.3724
## F-statistic: 135 on 9 and 2024 DF, p-value: < 2.2e-16
anova(fit_4)
## Analysis of Variance Table
##
## Response: y^(1.34/1)
## Df Sum Sq Mean Sq F value Pr(>F)
## TEAM_BATTING_3B 1 92055 92055 21.9623 2.966e-06 ***
## TEAM_BATTING_HR 1 1720434 1720434 410.4554 < 2.2e-16 ***
## TEAM_BATTING_BB 1 505824 505824 120.6777 < 2.2e-16 ***
## TEAM_BATTING_SO 1 71413 71413 17.0375 3.814e-05 ***
## TEAM_BASERUN_SB 1 289337 289337 69.0292 < 2.2e-16 ***
## TEAM_BASERUN_CS 1 8182 8182 1.9521 0.1625
## TEAM_FIELDING_E 1 1895413 1895413 452.2014 < 2.2e-16 ***
## TEAM_FIELDING_DP 1 318290 318290 75.9366 < 2.2e-16 ***
## Singles 1 192910 192910 46.0238 1.529e-11 ***
## Residuals 2024 8483646 4192
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
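# Note: 5/7 is about 0.714, whereas the exact inverse of the 1.34 power is 1/1.34, about 0.746
resids <- (abs(fit_4$residuals)^(5/7))
my_rse <- sqrt(sum(resids^2)/2024)  # 2024 = residual degrees of freedom of fit_4
my_rse
## [1] 18.12186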
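The figure above is computed by raising the residuals of the power-transformed fit to a fractional exponent. Another way to express error on the original wins scale is to back-transform the fitted values and compare them with the observed response. The sketch below does this on simulated data. It is not the author's computation, and `p`, `fit_pow`, `pred_orig`, and `rmse_orig` are hypothetical names.

```r
set.seed(1)

# Simulated positive response, so that fractional powers are well defined
n <- 200
x <- runif(n, 0, 10)
y <- 60 + 3 * x + rnorm(n, sd = 5)

p <- 1.34  # same power as the transformed model in the report

# Fit on the transformed scale
fit_pow <- lm(I(y^p) ~ x)

# Back-transform the fitted values to the original scale
pred_orig <- fitted(fit_pow)^(1 / p)

# Root mean squared error in the original units of y
rmse_orig <- sqrt(mean((y - pred_orig)^2))
print(rmse_orig)
```

An error computed this way is in the same units as the response, so it can be compared directly with the RMSE of an untransformed model.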
##
## Family: gaussian
## Link function: identity
##
## Formula:
## y ~ s(TEAM_BASERUN_CS) + s(TEAM_BASERUN_SB) + s(TEAM_BATTING_2B) +
## s(TEAM_BATTING_3B) + s(TEAM_BATTING_BB) + s(TEAM_BATTING_HR) +
## s(TEAM_FIELDING_DP) + s(TEAM_FIELDING_E) + s(slugging) +
## s(OBP)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 80.4784 0.2363 340.6 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(TEAM_BASERUN_CS) 3.5470 4.4214 3.472 0.006372 **
## s(TEAM_BASERUN_SB) 3.7776 4.7323 36.190 < 2e-16 ***
## s(TEAM_BATTING_2B) 5.5072 6.6699 13.945 < 2e-16 ***
## s(TEAM_BATTING_3B) 5.7467 6.8703 5.092 1.73e-05 ***
## s(TEAM_BATTING_BB) 3.0675 3.9436 3.334 0.010085 *
## s(TEAM_BATTING_HR) 0.7544 0.7544 11.195 0.003699 **
## s(TEAM_FIELDING_DP) 2.5068 3.2147 27.101 < 2e-16 ***
## s(TEAM_FIELDING_E) 6.3478 7.4980 58.282 < 2e-16 ***
## s(slugging) 6.2073 7.3527 3.737 0.000418 ***
## s(OBP) 5.6877 6.8884 2.911 0.005187 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Rank: 90/91
## R-sq.(adj) = 0.411 Deviance explained = 42.3%
## GCV = 116.07 Scale est. = 113.55 n = 2034
##
## Family: gaussian
## Link function: identity
##
## Formula:
## y ~ s(TEAM_BASERUN_CS) + s(TEAM_BASERUN_SB) + s(TEAM_BATTING_2B) +
## s(TEAM_BATTING_3B) + s(TEAM_BATTING_BB) + s(TEAM_BATTING_HR) +
## s(TEAM_FIELDING_DP) + s(TEAM_FIELDING_E) + s(slugging) +
## s(OBP)
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(TEAM_BASERUN_CS) 3.5470 4.4214 3.472 0.006372
## s(TEAM_BASERUN_SB) 3.7776 4.7323 36.190 < 2e-16
## s(TEAM_BATTING_2B) 5.5072 6.6699 13.945 < 2e-16
## s(TEAM_BATTING_3B) 5.7467 6.8703 5.092 1.73e-05
## s(TEAM_BATTING_BB) 3.0675 3.9436 3.334 0.010085
## s(TEAM_BATTING_HR) 0.7544 0.7544 11.195 0.003699
## s(TEAM_FIELDING_DP) 2.5068 3.2147 27.101 < 2e-16
## s(TEAM_FIELDING_E) 6.3478 7.4980 58.282 < 2e-16
## s(slugging) 6.2073 7.3527 3.737 0.000418
## s(OBP) 5.6877 6.8884 2.911 0.005187
| x |
|---|
| 10.5656 |
##
## Call:
## lm(formula = y_na ~ ., data = fit_6_db)
##
## Residuals:
## Min 1Q Median 3Q Max
## -30.9356 -6.5508 -0.0419 6.6782 30.0121
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 61.320410 6.649090 9.222 < 2e-16 ***
## TEAM_BATTING_2B -0.036499 0.006967 -5.239 1.85e-07 ***
## TEAM_BATTING_3B 0.205953 0.021152 9.737 < 2e-16 ***
## TEAM_BATTING_HR 0.129672 0.008349 15.531 < 2e-16 ***
## TEAM_BATTING_BB 0.037615 0.003402 11.055 < 2e-16 ***
## TEAM_BATTING_SO -0.021573 0.002431 -8.873 < 2e-16 ***
## TEAM_BASERUN_SB 0.050535 0.006336 7.976 3.04e-15 ***
## TEAM_FIELDING_E -0.152176 0.009680 -15.721 < 2e-16 ***
## TEAM_FIELDING_DP -0.116077 0.013213 -8.785 < 2e-16 ***
## Singles 0.034687 0.004714 7.358 3.09e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.549 on 1458 degrees of freedom
## Multiple R-squared: 0.4317, Adjusted R-squared: 0.4282
## F-statistic: 123.1 on 9 and 1458 DF, p-value: < 2.2e-16
Each cell gives B (CI), p.

| Predictor | Model 1A | Model 1B | Model 2 (moneyball stats) | Model 3 (stepAIC) | Model 6 (dropped NA observations) |
|---|---|---|---|---|---|
| (Intercept) | 46.43 (35.06 – 57.81), <.001 | 51.52 (40.20 – 62.84), <.001 | 19.28 (12.19 – 26.38), <.001 | 51.48 (40.02 – 62.94), <.001 | 61.32 (48.28 – 74.36), <.001 |
| TEAM_BATTING_3B | 0.19 (0.16 – 0.22), <.001 | 0.19 (0.15 – 0.22), <.001 | | 0.20 (0.16 – 0.23), <.001 | 0.21 (0.16 – 0.25), <.001 |
| TEAM_BATTING_HR | 0.12 (0.10 – 0.13), <.001 | 0.11 (0.10 – 0.13), <.001 | | 0.11 (0.10 – 0.13), <.001 | 0.13 (0.11 – 0.15), <.001 |
| TEAM_BATTING_BB | 0.03 (0.02 – 0.04), <.001 | 0.03 (0.02 – 0.04), <.001 | | 0.03 (0.02 – 0.04), <.001 | 0.04 (0.03 – 0.04), <.001 |
| TEAM_BATTING_SO | -0.02 (-0.02 – -0.01), <.001 | -0.02 (-0.02 – -0.01), <.001 | | -0.02 (-0.02 – -0.01), <.001 | -0.02 (-0.03 – -0.02), <.001 |
| TEAM_BASERUN_SB | 0.08 (0.07 – 0.09), <.001 | 0.09 (0.08 – 0.10), <.001 | 0.07 (0.06 – 0.08), <.001 | 0.09 (0.08 – 0.10), <.001 | 0.05 (0.04 – 0.06), <.001 |
| TEAM_FIELDING_E | -0.08 (-0.09 – -0.07), <.001 | -0.08 (-0.09 – -0.08), <.001 | -0.05 (-0.06 – -0.05), <.001 | -0.09 (-0.09 – -0.08), <.001 | -0.15 (-0.17 – -0.13), <.001 |
| TEAM_FIELDING_DP | -0.11 (-0.14 – -0.09), <.001 | -0.11 (-0.13 – -0.09), <.001 | -0.11 (-0.13 – -0.08), <.001 | -0.11 (-0.13 – -0.08), <.001 | -0.12 (-0.14 – -0.09), <.001 |
| Singles | 0.03 (0.02 – 0.04), <.001 | 0.03 (0.02 – 0.04), <.001 | | 0.03 (0.02 – 0.04), <.001 | 0.03 (0.03 – 0.04), <.001 |
| TEAM_BASERUN_CS | | -0.08 (-0.11 – -0.05), <.001 | -0.05 (-0.08 – -0.02), <.001 | -0.08 (-0.11 – -0.05), <.001 | |
| slugging | | | 0.00 (0.00 – 0.01), .040 | | |
| OBP | | | 0.04 (0.03 – 0.04), <.001 | | |
| TEAM_BATTING_2B | | | | -0.01 (-0.02 – 0.00), .143 | -0.04 (-0.05 – -0.02), <.001 |
| Observations | 2034 | 2029 | 2034 | 2034 | 1468 |
| R2 / adj. R2 | .370 / .368 | .385 / .383 | .310 / .308 | .380 / .377 | .432 / .428 |
## Warning in predict.lm(fit_1, test_df): prediction from a rank-deficient fit
## may be misleading
## Warning in predict.lm(fit_4, test_df): prediction from a rank-deficient fit
## may be misleading
## [1] "TEAM_BATTING_2B" "TEAM_BATTING_3B" "TEAM_BATTING_HR"
## [4] "TEAM_BATTING_BB" "TEAM_BATTING_SO" "TEAM_BASERUN_SB"
## [7] "TEAM_BASERUN_CS" "TEAM_FIELDING_E" "TEAM_FIELDING_DP"
## [10] "Singles" "slugging" "OBP"
## Warning in predict.lm(fit_6, test_df): prediction from a rank-deficient fit
## may be misleading
| Metric | V1 |
|---|---|
| model_1_rmse | my RMSE for model 1A is 10.5681339038901 |
| model_2_rmse | my RMSE for model 2 is 11.2939527952715 |
| model_4_rmse | my RMSE for model 4 is 10.7346911574377 |
| model_5_rmse | my RMSE for model 5 is 10.8498621720078 |
| model_6_rmse | my RMSE for model 6 is 9.30217646362863 |
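The table above compares models by root mean squared error. The sketch below defines a small RMSE helper and ranks the reported values. The `rmse` function and the `reported` vector are illustrative and not taken from the source code; only the numbers are copied from the table.

```r
# Root mean squared error between observed and predicted values
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2, na.rm = TRUE))
}

# Tiny worked example: errors of 1, -1, and 0 give sqrt(2/3)
example_rmse <- rmse(c(80, 90, 70), c(79, 91, 70))
print(example_rmse)

# RMSE values copied from the table above
reported <- c(
  model_1 = 10.5681339038901,
  model_2 = 11.2939527952715,
  model_4 = 10.7346911574377,
  model_5 = 10.8498621720078,
  model_6 = 9.30217646362863
)

# Models ordered from lowest to highest reported RMSE
ranked <- sort(reported)
best_model <- names(ranked)[1]
print(ranked)
print(best_model)
```

Model 6 has the lowest reported value, but it was fitted on 1468 observations while the other linear models used about 2034, so the figures may not be strictly comparable if each was computed on its own rows.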
## [1] "TEAM_BATTING_2B" "TEAM_BATTING_3B" "TEAM_BATTING_HR"
## [4] "TEAM_BATTING_BB" "TEAM_BATTING_SO" "TEAM_BASERUN_SB"
## [7] "TEAM_FIELDING_E" "TEAM_FIELDING_DP" "Singles"
## Start: AIC=6618.11
## y_na ~ TEAM_BATTING_2B + TEAM_BATTING_3B + TEAM_BATTING_HR +
## TEAM_BATTING_BB + TEAM_BATTING_SO + TEAM_BASERUN_SB + TEAM_FIELDING_E +
## TEAM_FIELDING_DP + Singles + I(TEAM_FIELDING_DP^2) + I(TEAM_FIELDING_E^2) +
## I(TEAM_BATTING_2B^2) + I(TEAM_BATTING_3B^2) + I(TEAM_BATTING_HR^2) +
## I(TEAM_BASERUN_SB^2) + I(TEAM_BATTING_SO^2) + I(TEAM_BATTING_BB^2) +
## I(TEAM_FIELDING_DP^3) + I(TEAM_BATTING_2B^3) + I(TEAM_BATTING_3B^3) +
## I(TEAM_BATTING_SO^3) + I(TEAM_BATTING_BB^3)
##
## Df Sum of Sq RSS AIC
## - I(TEAM_FIELDING_DP^3) 1 0.1 129130 6616.1
## - I(TEAM_FIELDING_DP^2) 1 5.5 129135 6616.2
## - I(TEAM_FIELDING_E^2) 1 12.4 129142 6616.3
## - I(TEAM_BASERUN_SB^2) 1 18.8 129148 6616.3
## - TEAM_FIELDING_DP 1 30.8 129160 6616.5
## - TEAM_BATTING_3B 1 42.6 129172 6616.6
## - I(TEAM_BATTING_BB^3) 1 75.7 129205 6617.0
## - TEAM_BATTING_BB 1 77.9 129207 6617.0
## - I(TEAM_BATTING_3B^3) 1 78.5 129208 6617.0
## - I(TEAM_BATTING_BB^2) 1 93.6 129223 6617.2
## - I(TEAM_BATTING_SO^2) 1 98.3 129228 6617.2
## - TEAM_BATTING_SO 1 130.2 129260 6617.6
## - I(TEAM_BATTING_HR^2) 1 133.4 129263 6617.6
## - I(TEAM_BATTING_SO^3) 1 152.6 129282 6617.8
## <none> 129129 6618.1
## - I(TEAM_BATTING_3B^2) 1 178.5 129308 6618.1
## - TEAM_BASERUN_SB 1 270.9 129400 6619.2
## - TEAM_BATTING_2B 1 527.9 129657 6622.1
## - I(TEAM_BATTING_2B^2) 1 654.3 129784 6623.5
## - I(TEAM_BATTING_2B^3) 1 746.9 129876 6624.6
## - TEAM_FIELDING_E 1 1770.2 130900 6636.1
## - TEAM_BATTING_HR 1 2546.7 131676 6644.8
## - Singles 1 4427.8 133557 6665.6
##
## Step: AIC=6616.11
## y_na ~ TEAM_BATTING_2B + TEAM_BATTING_3B + TEAM_BATTING_HR +
## TEAM_BATTING_BB + TEAM_BATTING_SO + TEAM_BASERUN_SB + TEAM_FIELDING_E +
## TEAM_FIELDING_DP + Singles + I(TEAM_FIELDING_DP^2) + I(TEAM_FIELDING_E^2) +
## I(TEAM_BATTING_2B^2) + I(TEAM_BATTING_3B^2) + I(TEAM_BATTING_HR^2) +
## I(TEAM_BASERUN_SB^2) + I(TEAM_BATTING_SO^2) + I(TEAM_BATTING_BB^2) +
## I(TEAM_BATTING_2B^3) + I(TEAM_BATTING_3B^3) + I(TEAM_BATTING_SO^3) +
## I(TEAM_BATTING_BB^3)
##
## Df Sum of Sq RSS AIC
## - I(TEAM_FIELDING_E^2) 1 12.6 129142 6614.3
## - I(TEAM_BASERUN_SB^2) 1 18.8 129148 6614.3
## - TEAM_BATTING_3B 1 42.9 129172 6614.6
## - I(TEAM_BATTING_BB^3) 1 76.0 129206 6615.0
## - TEAM_BATTING_BB 1 78.0 129208 6615.0
## - I(TEAM_BATTING_3B^3) 1 78.7 129208 6615.0
## - I(TEAM_BATTING_BB^2) 1 93.9 129223 6615.2
## - I(TEAM_BATTING_SO^2) 1 98.4 129228 6615.2
## - TEAM_BATTING_SO 1 130.6 129260 6615.6
## - I(TEAM_BATTING_HR^2) 1 134.0 129264 6615.6
## - I(TEAM_BATTING_SO^3) 1 152.7 129282 6615.8
## <none> 129130 6616.1
## - I(TEAM_BATTING_3B^2) 1 179.1 129309 6616.1
## - TEAM_BASERUN_SB 1 271.1 129401 6617.2
## - TEAM_BATTING_2B 1 527.8 129657 6620.1
## - I(TEAM_BATTING_2B^2) 1 654.3 129784 6621.5
## - I(TEAM_BATTING_2B^3) 1 746.9 129876 6622.6
## - I(TEAM_FIELDING_DP^2) 1 747.1 129877 6622.6
## - TEAM_FIELDING_DP 1 1225.2 130355 6628.0
## - TEAM_FIELDING_E 1 1774.6 130904 6634.1
## - TEAM_BATTING_HR 1 2570.7 131700 6643.0
## - Singles 1 4430.0 133560 6663.6
##
## Step: AIC=6614.25
## y_na ~ TEAM_BATTING_2B + TEAM_BATTING_3B + TEAM_BATTING_HR +
## TEAM_BATTING_BB + TEAM_BATTING_SO + TEAM_BASERUN_SB + TEAM_FIELDING_E +
## TEAM_FIELDING_DP + Singles + I(TEAM_FIELDING_DP^2) + I(TEAM_BATTING_2B^2) +
## I(TEAM_BATTING_3B^2) + I(TEAM_BATTING_HR^2) + I(TEAM_BASERUN_SB^2) +
## I(TEAM_BATTING_SO^2) + I(TEAM_BATTING_BB^2) + I(TEAM_BATTING_2B^3) +
## I(TEAM_BATTING_3B^3) + I(TEAM_BATTING_SO^3) + I(TEAM_BATTING_BB^3)
##
## Df Sum of Sq RSS AIC
## - I(TEAM_BASERUN_SB^2) 1 18.9 129161 6612.5
## - TEAM_BATTING_3B 1 48.0 129190 6612.8
## - I(TEAM_BATTING_BB^3) 1 80.2 129222 6613.2
## - TEAM_BATTING_BB 1 82.7 129225 6613.2
## - I(TEAM_BATTING_3B^3) 1 84.3 129226 6613.2
## - I(TEAM_BATTING_SO^2) 1 95.8 129238 6613.3
## - I(TEAM_BATTING_BB^2) 1 98.8 129241 6613.4
## - I(TEAM_BATTING_HR^2) 1 122.1 129264 6613.6
## - TEAM_BATTING_SO 1 126.2 129268 6613.7
## - I(TEAM_BATTING_SO^3) 1 150.7 129293 6614.0
## <none> 129142 6614.3
## - I(TEAM_BATTING_3B^2) 1 189.4 129331 6614.4
## - TEAM_BASERUN_SB 1 272.3 129414 6615.3
## - TEAM_BATTING_2B 1 517.7 129660 6618.1
## - I(TEAM_BATTING_2B^2) 1 643.3 129785 6619.5
## - I(TEAM_BATTING_2B^3) 1 735.6 129878 6620.6
## - I(TEAM_FIELDING_DP^2) 1 757.1 129899 6620.8
## - TEAM_FIELDING_DP 1 1240.9 130383 6626.3
## - TEAM_BATTING_HR 1 2623.0 131765 6641.8
## - Singles 1 4435.9 133578 6661.8
## - TEAM_FIELDING_E 1 20166.9 149309 6825.3
##
## Step: AIC=6612.47
## y_na ~ TEAM_BATTING_2B + TEAM_BATTING_3B + TEAM_BATTING_HR +
## TEAM_BATTING_BB + TEAM_BATTING_SO + TEAM_BASERUN_SB + TEAM_FIELDING_E +
## TEAM_FIELDING_DP + Singles + I(TEAM_FIELDING_DP^2) + I(TEAM_BATTING_2B^2) +
## I(TEAM_BATTING_3B^2) + I(TEAM_BATTING_HR^2) + I(TEAM_BATTING_SO^2) +
## I(TEAM_BATTING_BB^2) + I(TEAM_BATTING_2B^3) + I(TEAM_BATTING_3B^3) +
## I(TEAM_BATTING_SO^3) + I(TEAM_BATTING_BB^3)
##
## Df Sum of Sq RSS AIC
## - TEAM_BATTING_3B 1 45.3 129206 6611.0
## - I(TEAM_BATTING_BB^3) 1 78.4 129239 6611.4
## - TEAM_BATTING_BB 1 80.8 129242 6611.4
## - I(TEAM_BATTING_3B^3) 1 83.4 129244 6611.4
## - I(TEAM_BATTING_SO^2) 1 88.7 129250 6611.5
## - I(TEAM_BATTING_BB^2) 1 96.8 129258 6611.6
## - TEAM_BATTING_SO 1 118.2 129279 6611.8
## - I(TEAM_BATTING_HR^2) 1 120.2 129281 6611.8
## - I(TEAM_BATTING_SO^3) 1 142.4 129303 6612.1
## <none> 129161 6612.5
## - I(TEAM_BATTING_3B^2) 1 186.6 129348 6612.6
## - TEAM_BATTING_2B 1 521.5 129683 6616.4
## - I(TEAM_BATTING_2B^2) 1 650.1 129811 6617.8
## - I(TEAM_BATTING_2B^3) 1 745.2 129906 6618.9
## - I(TEAM_FIELDING_DP^2) 1 764.3 129925 6619.1
## - TEAM_FIELDING_DP 1 1248.6 130410 6624.6
## - TEAM_BATTING_HR 1 2616.3 131777 6639.9
## - Singles 1 4420.7 133582 6659.9
## - TEAM_BASERUN_SB 1 5191.0 134352 6668.3
## - TEAM_FIELDING_E 1 20163.8 149325 6823.4
##
## Step: AIC=6610.98
## y_na ~ TEAM_BATTING_2B + TEAM_BATTING_HR + TEAM_BATTING_BB +
## TEAM_BATTING_SO + TEAM_BASERUN_SB + TEAM_FIELDING_E + TEAM_FIELDING_DP +
## Singles + I(TEAM_FIELDING_DP^2) + I(TEAM_BATTING_2B^2) +
## I(TEAM_BATTING_3B^2) + I(TEAM_BATTING_HR^2) + I(TEAM_BATTING_SO^2) +
## I(TEAM_BATTING_BB^2) + I(TEAM_BATTING_2B^3) + I(TEAM_BATTING_3B^3) +
## I(TEAM_BATTING_SO^3) + I(TEAM_BATTING_BB^3)
##
## Df Sum of Sq RSS AIC
## - I(TEAM_BATTING_3B^3) 1 68.7 129275 6609.8
## - I(TEAM_BATTING_BB^3) 1 80.3 129287 6609.9
## - TEAM_BATTING_BB 1 82.0 129288 6609.9
## - I(TEAM_BATTING_SO^2) 1 86.2 129293 6610.0
## - I(TEAM_BATTING_BB^2) 1 98.6 129305 6610.1
## - I(TEAM_BATTING_HR^2) 1 107.0 129313 6610.2
## - TEAM_BATTING_SO 1 117.3 129324 6610.3
## - I(TEAM_BATTING_SO^3) 1 137.6 129344 6610.5
## <none> 129206 6611.0
## - TEAM_BATTING_2B 1 495.9 129702 6614.6
## - I(TEAM_BATTING_2B^2) 1 622.0 129828 6616.0
## - I(TEAM_BATTING_2B^3) 1 715.9 129922 6617.1
## - I(TEAM_FIELDING_DP^2) 1 772.1 129978 6617.7
## - TEAM_FIELDING_DP 1 1260.1 130466 6623.2
## - I(TEAM_BATTING_3B^2) 1 1274.2 130481 6623.4
## - TEAM_BATTING_HR 1 2576.1 131783 6638.0
## - Singles 1 4429.0 133635 6658.5
## - TEAM_BASERUN_SB 1 5169.8 134376 6666.6
## - TEAM_FIELDING_E 1 20266.4 149473 6822.9
##
## Step: AIC=6609.77
## y_na ~ TEAM_BATTING_2B + TEAM_BATTING_HR + TEAM_BATTING_BB +
## TEAM_BATTING_SO + TEAM_BASERUN_SB + TEAM_FIELDING_E + TEAM_FIELDING_DP +
## Singles + I(TEAM_FIELDING_DP^2) + I(TEAM_BATTING_2B^2) +
## I(TEAM_BATTING_3B^2) + I(TEAM_BATTING_HR^2) + I(TEAM_BATTING_SO^2) +
## I(TEAM_BATTING_BB^2) + I(TEAM_BATTING_2B^3) + I(TEAM_BATTING_SO^3) +
## I(TEAM_BATTING_BB^3)
##
## Df Sum of Sq RSS AIC
## - I(TEAM_BATTING_SO^2) 1 60.4 129335 6608.5
## - I(TEAM_BATTING_BB^3) 1 71.9 129347 6608.6
## - TEAM_BATTING_BB 1 73.1 129348 6608.6
## - TEAM_BATTING_SO 1 86.0 129361 6608.7
## - I(TEAM_BATTING_BB^2) 1 89.2 129364 6608.8
## - I(TEAM_BATTING_SO^3) 1 107.8 129383 6609.0
## - I(TEAM_BATTING_HR^2) 1 109.1 129384 6609.0
## <none> 129275 6609.8
## - TEAM_BATTING_2B 1 487.0 129762 6613.3
## - I(TEAM_BATTING_2B^2) 1 609.8 129885 6614.7
## - I(TEAM_BATTING_2B^3) 1 701.4 129976 6615.7
## - I(TEAM_FIELDING_DP^2) 1 803.7 130079 6616.9
## - TEAM_FIELDING_DP 1 1298.0 130573 6622.4
## - TEAM_BATTING_HR 1 2551.5 131827 6636.5
## - Singles 1 4466.1 133741 6657.6
## - TEAM_BASERUN_SB 1 5109.7 134385 6664.7
## - I(TEAM_BATTING_3B^2) 1 9093.5 138369 6707.6
## - TEAM_FIELDING_E 1 20311.1 149586 6822.0
##
## Step: AIC=6608.45
## y_na ~ TEAM_BATTING_2B + TEAM_BATTING_HR + TEAM_BATTING_BB +
## TEAM_BATTING_SO + TEAM_BASERUN_SB + TEAM_FIELDING_E + TEAM_FIELDING_DP +
## Singles + I(TEAM_FIELDING_DP^2) + I(TEAM_BATTING_2B^2) +
## I(TEAM_BATTING_3B^2) + I(TEAM_BATTING_HR^2) + I(TEAM_BATTING_BB^2) +
## I(TEAM_BATTING_2B^3) + I(TEAM_BATTING_SO^3) + I(TEAM_BATTING_BB^3)
##
## Df Sum of Sq RSS AIC
## - I(TEAM_BATTING_BB^3) 1 72.9 129408 6607.3
## - TEAM_BATTING_BB 1 74.5 129410 6607.3
## - I(TEAM_BATTING_BB^2) 1 90.4 129426 6607.5
## - I(TEAM_BATTING_HR^2) 1 92.8 129428 6607.5
## - TEAM_BATTING_SO 1 98.3 129434 6607.6
## <none> 129335 6608.5
## - I(TEAM_BATTING_SO^3) 1 436.6 129772 6611.4
## - TEAM_BATTING_2B 1 476.9 129812 6611.9
## - I(TEAM_BATTING_2B^2) 1 600.3 129936 6613.2
## - I(TEAM_BATTING_2B^3) 1 693.6 130029 6614.3
## - I(TEAM_FIELDING_DP^2) 1 780.4 130116 6615.3
## - TEAM_FIELDING_DP 1 1272.2 130608 6620.8
## - TEAM_BATTING_HR 1 2494.5 131830 6634.5
## - Singles 1 4640.8 133976 6658.2
## - TEAM_BASERUN_SB 1 5153.1 134489 6663.8
## - I(TEAM_BATTING_3B^2) 1 9556.1 138892 6711.1
## - TEAM_FIELDING_E 1 20728.2 150064 6824.7
##
## Step: AIC=6607.28
## y_na ~ TEAM_BATTING_2B + TEAM_BATTING_HR + TEAM_BATTING_BB +
## TEAM_BATTING_SO + TEAM_BASERUN_SB + TEAM_FIELDING_E + TEAM_FIELDING_DP +
## Singles + I(TEAM_FIELDING_DP^2) + I(TEAM_BATTING_2B^2) +
## I(TEAM_BATTING_3B^2) + I(TEAM_BATTING_HR^2) + I(TEAM_BATTING_BB^2) +
## I(TEAM_BATTING_2B^3) + I(TEAM_BATTING_SO^3)
##
## Df Sum of Sq RSS AIC
## - TEAM_BATTING_BB 1 1.6 129410 6605.3
## - I(TEAM_BATTING_HR^2) 1 78.1 129486 6606.2
## - TEAM_BATTING_SO 1 91.3 129500 6606.3
## - I(TEAM_BATTING_BB^2) 1 158.3 129567 6607.1
## <none> 129408 6607.3
## - I(TEAM_BATTING_SO^3) 1 445.8 129854 6610.3
## - TEAM_BATTING_2B 1 482.2 129891 6610.7
## - I(TEAM_BATTING_2B^2) 1 607.5 130016 6612.2
## - I(TEAM_BATTING_2B^3) 1 702.5 130111 6613.2
## - I(TEAM_FIELDING_DP^2) 1 752.7 130161 6613.8
## - TEAM_FIELDING_DP 1 1239.5 130648 6619.3
## - TEAM_BATTING_HR 1 2441.5 131850 6632.7
## - Singles 1 4717.7 134126 6657.8
## - TEAM_BASERUN_SB 1 5153.9 134562 6662.6
## - I(TEAM_BATTING_3B^2) 1 9645.0 139053 6710.8
## - TEAM_FIELDING_E 1 20658.3 150067 6822.7
##
## Step: AIC=6605.3
## y_na ~ TEAM_BATTING_2B + TEAM_BATTING_HR + TEAM_BATTING_SO +
## TEAM_BASERUN_SB + TEAM_FIELDING_E + TEAM_FIELDING_DP + Singles +
## I(TEAM_FIELDING_DP^2) + I(TEAM_BATTING_2B^2) + I(TEAM_BATTING_3B^2) +
## I(TEAM_BATTING_HR^2) + I(TEAM_BATTING_BB^2) + I(TEAM_BATTING_2B^3) +
## I(TEAM_BATTING_SO^3)
##
## Df Sum of Sq RSS AIC
## - I(TEAM_BATTING_HR^2) 1 77.5 129488 6604.2
## - TEAM_BATTING_SO 1 91.7 129502 6604.3
## <none> 129410 6605.3
## - I(TEAM_BATTING_SO^3) 1 444.8 129855 6608.3
## - TEAM_BATTING_2B 1 480.6 129891 6608.7
## - I(TEAM_BATTING_2B^2) 1 605.9 130016 6610.2
## - I(TEAM_BATTING_2B^3) 1 701.0 130111 6611.2
## - I(TEAM_FIELDING_DP^2) 1 763.2 130173 6611.9
## - TEAM_FIELDING_DP 1 1254.1 130664 6617.5
## - TEAM_BATTING_HR 1 2439.8 131850 6630.7
## - Singles 1 4735.5 134145 6656.1
## - TEAM_BASERUN_SB 1 5175.7 134586 6660.9
## - I(TEAM_BATTING_3B^2) 1 9702.9 139113 6709.4
## - I(TEAM_BATTING_BB^2) 1 11033.1 140443 6723.4
## - TEAM_FIELDING_E 1 20667.1 150077 6820.8
##
## Step: AIC=6604.18
## y_na ~ TEAM_BATTING_2B + TEAM_BATTING_HR + TEAM_BATTING_SO +
## TEAM_BASERUN_SB + TEAM_FIELDING_E + TEAM_FIELDING_DP + Singles +
## I(TEAM_FIELDING_DP^2) + I(TEAM_BATTING_2B^2) + I(TEAM_BATTING_3B^2) +
## I(TEAM_BATTING_BB^2) + I(TEAM_BATTING_2B^3) + I(TEAM_BATTING_SO^3)
##
## Df Sum of Sq RSS AIC
## - TEAM_BATTING_SO 1 48.8 129536 6602.7
## <none> 129488 6604.2
## - TEAM_BATTING_2B 1 499.7 129987 6607.8
## - I(TEAM_BATTING_2B^2) 1 621.6 130109 6609.2
## - I(TEAM_BATTING_SO^3) 1 647.1 130135 6609.5
## - I(TEAM_BATTING_2B^3) 1 711.6 130199 6610.2
## - I(TEAM_FIELDING_DP^2) 1 740.3 130228 6610.5
## - TEAM_FIELDING_DP 1 1223.1 130711 6616.0
## - Singles 1 4844.5 134332 6656.1
## - TEAM_BASERUN_SB 1 5115.0 134603 6659.0
## - I(TEAM_BATTING_3B^2) 1 9697.3 139185 6708.2
## - I(TEAM_BATTING_BB^2) 1 11102.5 140590 6722.9
## - TEAM_BATTING_HR 1 19521.0 149009 6808.3
## - TEAM_FIELDING_E 1 21631.4 151119 6829.0
##
## Step: AIC=6602.73
## y_na ~ TEAM_BATTING_2B + TEAM_BATTING_HR + TEAM_BASERUN_SB +
## TEAM_FIELDING_E + TEAM_FIELDING_DP + Singles + I(TEAM_FIELDING_DP^2) +
## I(TEAM_BATTING_2B^2) + I(TEAM_BATTING_3B^2) + I(TEAM_BATTING_BB^2) +
## I(TEAM_BATTING_2B^3) + I(TEAM_BATTING_SO^3)
##
## Df Sum of Sq RSS AIC
## <none> 129536 6602.7
## - TEAM_BATTING_2B 1 494.8 130031 6606.3
## - I(TEAM_BATTING_2B^2) 1 613.5 130150 6607.7
## - I(TEAM_BATTING_2B^3) 1 701.9 130238 6608.7
## - I(TEAM_FIELDING_DP^2) 1 740.6 130277 6609.1
## - TEAM_FIELDING_DP 1 1221.8 130758 6614.5
## - TEAM_BASERUN_SB 1 5230.3 134767 6658.8
## - Singles 1 5385.6 134922 6660.5
## - I(TEAM_BATTING_SO^3) 1 7680.9 137217 6685.3
## - I(TEAM_BATTING_BB^2) 1 11652.2 141189 6727.2
## - I(TEAM_BATTING_3B^2) 1 12051.5 141588 6731.3
## - TEAM_BATTING_HR 1 20707.0 150243 6818.4
## - TEAM_FIELDING_E 1 22289.2 151826 6833.8
##
## Call:
## lm(formula = y_na ~ TEAM_BATTING_2B + TEAM_BATTING_HR + TEAM_BASERUN_SB +
## TEAM_FIELDING_E + TEAM_FIELDING_DP + Singles + I(TEAM_FIELDING_DP^2) +
## I(TEAM_BATTING_2B^2) + I(TEAM_BATTING_3B^2) + I(TEAM_BATTING_BB^2) +
## I(TEAM_BATTING_2B^3) + I(TEAM_BATTING_SO^3), data = fit_6_db)
##
## Residuals:
## Min 1Q Median 3Q Max
## -31.5933 -6.6402 0.0684 6.3734 27.2096
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.226e+01 3.658e+01 0.335 0.73768
## TEAM_BATTING_2B 9.898e-01 4.198e-01 2.358 0.01853 *
## TEAM_BATTING_HR 1.221e-01 8.003e-03 15.251 < 2e-16 ***
## TEAM_BASERUN_SB 4.770e-02 6.224e-03 7.665 3.26e-14 ***
## TEAM_FIELDING_E -1.510e-01 9.542e-03 -15.823 < 2e-16 ***
## TEAM_FIELDING_DP -4.888e-01 1.319e-01 -3.705 0.00022 ***
## Singles 3.541e-02 4.552e-03 7.778 1.39e-14 ***
## I(TEAM_FIELDING_DP^2) 1.217e-03 4.221e-04 2.884 0.00398 **
## I(TEAM_BATTING_2B^2) -4.315e-03 1.644e-03 -2.625 0.00875 **
## I(TEAM_BATTING_3B^2) 2.067e-03 1.776e-04 11.635 < 2e-16 ***
## I(TEAM_BATTING_BB^2) 3.424e-05 2.993e-06 11.440 < 2e-16 ***
## I(TEAM_BATTING_2B^3) 5.928e-06 2.111e-06 2.808 0.00506 **
## I(TEAM_BATTING_SO^3) -8.823e-09 9.499e-10 -9.288 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.435 on 1455 degrees of freedom
## Multiple R-squared: 0.4463, Adjusted R-squared: 0.4417
## F-statistic: 97.74 on 12 and 1455 DF, p-value: < 2.2e-16
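
The AIC column in the stepwise trace can be reproduced from the RSS column. For a linear model, `step()` relies on `extractAIC()`, which reports `n * log(RSS / n) + 2 * edf`, where `edf` counts the estimated coefficients including the intercept. The sketch below checks this with n = 1468 (1455 residual degrees of freedom plus 13 coefficients in the final model) and the rounded RSS values printed in the trace. The helper name `aic_from_rss` is an assumption for illustration and is not part of the original code.

```r
# Hypothetical helper mirroring the formula used by extractAIC() for lm fits.
aic_from_rss <- function(rss, n, edf, k = 2) {
  n * log(rss / n) + k * edf
}

n <- 1468

# Starting model: 22 terms plus the intercept, RSS = 129129.
aic_start <- aic_from_rss(129129, n, edf = 23)

# Final model: 12 terms plus the intercept, RSS = 129536.
aic_final <- aic_from_rss(129536, n, edf = 13)

print(c(start = aic_start, final = aic_final))
```

The results agree with the printed `Start: AIC=6618.11` and final `Step: AIC=6602.73` up to the rounding of RSS. The trade-off is visible in the numbers: the final model has a slightly higher RSS, but dropping ten terms saves 20 penalty points, so its AIC is lower.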
y_na <- money_ball_train_2_withna$TARGET_WINS
fit_6_db <- money_ball_train_2_withna[, c(4, 5, 6, 7, 8, 9, 16, 17, 18)]
fit_6 <- lm(y_na ~ ., data = fit_6_db)
layout(matrix(c(1, 2, 3, 4), 2, 2))
plot(fit_6)
summary(fit_6)
##
## Call:
## lm(formula = y_na ~ ., data = fit_6_db)
##
## Residuals:
## Min 1Q Median 3Q Max
## -30.9356 -6.5508 -0.0419 6.6782 30.0121
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 61.320410 6.649090 9.222 < 2e-16 ***
## TEAM_BATTING_2B -0.036499 0.006967 -5.239 1.85e-07 ***
## TEAM_BATTING_3B 0.205953 0.021152 9.737 < 2e-16 ***
## TEAM_BATTING_HR 0.129672 0.008349 15.531 < 2e-16 ***
## TEAM_BATTING_BB 0.037615 0.003402 11.055 < 2e-16 ***
## TEAM_BATTING_SO -0.021573 0.002431 -8.873 < 2e-16 ***
## TEAM_BASERUN_SB 0.050535 0.006336 7.976 3.04e-15 ***
## TEAM_FIELDING_E -0.152176 0.009680 -15.721 < 2e-16 ***
## TEAM_FIELDING_DP -0.116077 0.013213 -8.785 < 2e-16 ***
## Singles 0.034687 0.004714 7.358 3.09e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.549 on 1458 degrees of freedom
## Multiple R-squared: 0.4317, Adjusted R-squared: 0.4282
## F-statistic: 123.1 on 9 and 1458 DF, p-value: < 2.2e-16
my_names <- colnames(fit_6_db)
lm_target <- y_na
lm_inputs <- fit_6_db[, my_names]
train_df <- fit_6_db[, my_names]
test_df <- money_eval_na[, my_names]
My_prediction <- lm(y_na ~ ., data = train_df)
fit_6_predict <- predict(My_prediction, test_df)
kable(fit_6_predict)

| x |
|---|
| 61.92662 |
| 68.76625 |
| 73.50705 |
| 82.07337 |
| NA |
| 56.05863 |
| 68.46613 |
| 69.58884 |
| 61.49032 |
| 83.05783 |
| 86.48455 |
| 83.95449 |
| 89.68919 |
| 77.04353 |
| 70.81106 |
| 76.53099 |
| NA |
| 81.54091 |
| 83.35541 |
| 82.27576 |
| 82.48616 |
| 70.57153 |
| 78.97134 |
| 85.16802 |
| 62.85810 |
| 77.37633 |
| 60.94855 |
| 89.83601 |
| 87.97425 |
| 85.99453 |
| 84.43743 |
| 80.46297 |
| 85.23486 |
| 76.45860 |
| 91.37220 |
| 82.66758 |
| 89.15680 |
| 80.11327 |
| 94.71830 |
| NA |
| NA |
| NA |
| 63.54624 |
| 57.46252 |
| 72.97738 |
| 71.23652 |
| 81.35044 |
| 73.02683 |
| 76.18881 |
| 67.52168 |
| 79.88642 |
| 75.41477 |
| 68.30099 |
| NA |
| NA |
| 86.36666 |
| 84.77333 |
| 87.73882 |
| 84.60453 |
| 84.63909 |
| NA |
| 45.69831 |
| 54.88904 |
| NA |
| 83.36602 |
| 88.77719 |
| 73.07122 |
| 81.97568 |
| 95.37573 |
| 70.48764 |
| 78.28982 |
| 89.16533 |
| 81.63855 |
| NA |
| NA |
| 77.04870 |
| 80.69682 |
| 69.98533 |
| 88.53044 |
| 80.19605 |
| 87.02073 |
| 87.22192 |
| 96.65479 |
| 88.90896 |
| NA |
| 52.29533 |
| NA |
| NA |
| NA |
| 88.55088 |
| 85.40561 |
| 86.06973 |
| 75.51747 |
| 70.47183 |
| 83.36415 |
| 89.45490 |
| 73.82390 |
| NA |
| 71.37230 |
| 88.87607 |
| NA |
| 87.76417 |
| 85.14594 |
| 94.35076 |
| 89.81292 |
| 78.36933 |
| 74.65148 |
| 84.12522 |
| 79.97949 |
| 67.60061 |
| NA |
| NA |
| NA |
| NA |
| NA |
| 63.91358 |
| 80.43352 |
| 83.83552 |
| 70.44251 |
| 88.06252 |
| 86.38444 |
| 79.09888 |
| 80.51968 |
| 71.21323 |
| 82.19965 |
| 93.76708 |
| 76.91460 |
| 75.89617 |
| 91.83577 |
| 80.48406 |
| 42.12529 |
| NA |
| 89.52964 |
| 64.65567 |
| 76.42108 |
| 74.66181 |
| 72.95899 |
| 80.68064 |
| 79.38578 |
| 83.17903 |
| 82.88350 |
| 79.46469 |
| 61.97677 |
| 80.18481 |
| 68.20260 |
| 89.11799 |
| NA |
| 63.29854 |
| NA |
| 103.43310 |
| 112.91436 |
| 98.60398 |
| 106.41409 |
| 101.46655 |
| 100.19181 |
| 88.90789 |
| 86.64024 |
| 71.29591 |
| 79.02157 |
| NA |
| 87.52662 |
| 79.49987 |
| 90.60211 |
| 73.87771 |
| 79.03048 |
| 84.91793 |
| 68.05868 |
| 75.84968 |
| 81.83688 |
| 93.67837 |
| 87.65334 |
| 88.02007 |
| 87.27270 |
| NA |
| NA |
| NA |
| NA |
| NA |
| 61.64871 |
| 68.20794 |
| 63.71984 |
| 57.09436 |
| 68.74322 |
| 95.96053 |
| 84.86653 |
| 86.29287 |
| 73.09883 |
| 81.95712 |
| 77.52093 |
| 91.91348 |
| 81.09405 |
| 87.96287 |
| 75.82657 |
| 76.27937 |
| NA |
| NA |
| NA |
| 82.10810 |
| 62.01272 |
| 68.55481 |
| 83.92376 |
| 79.48529 |
| 91.75034 |
| 75.61021 |
| 81.23918 |
| 71.99662 |
| 66.53253 |
| 75.63967 |
| 70.15693 |
| 80.57726 |
| 78.58580 |
| 76.14602 |
| 84.43725 |
| NA |
| NA |
| 93.54387 |
| 86.51571 |
| 89.00744 |
| 78.07078 |
| 75.66783 |
| 75.92058 |
| 81.84754 |
| NA |
| 63.15399 |
| 84.66570 |
| 91.44331 |
| 83.23063 |
| 78.66853 |
| 58.38997 |
| 83.12404 |
| 77.47833 |
| 82.07125 |
| 74.67415 |
| 83.06915 |
| 78.26561 |
| NA |
| 71.58667 |
| 80.62028 |
| 80.80553 |
| 81.20826 |
| 67.56517 |
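
The NA entries in the prediction table are consistent with how `predict()` treats incomplete rows: with its default `na.action = na.pass`, it returns NA for any new row that has a missing predictor. The sketch below uses invented toy data (the objects `train`, `new_rows`, and `fit` are assumptions, not the report's data) to show how `complete.cases()` identifies those rows before predicting:

```r
# Toy training data for illustration only.
set.seed(1)
train <- data.frame(x1 = rnorm(20), x2 = rnorm(20))
train$y <- 1 + 2 * train$x1 - train$x2 + rnorm(20, sd = 0.1)
fit <- lm(y ~ x1 + x2, data = train)

# New rows, two of which have a missing predictor.
new_rows <- data.frame(x1 = c(0.5, NA, 1.0), x2 = c(1.0, 0.2, NA))

pred <- predict(fit, new_rows)            # NA wherever a predictor is missing
incomplete <- !complete.cases(new_rows)   # TRUE for rows with any NA

print(data.frame(new_rows, pred = pred, incomplete = incomplete))
```

Applied to the evaluation set, `sum(!complete.cases(test_df))` would count the rows that cannot be scored. Those rows need imputed predictors, or a model that omits the missing columns, before a full set of predictions can be produced.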
```