# HW #1 Assignment - Moneyball Model
Overview:
In this homework assignment, you will explore, analyze and model a data set containing approximately 2200 records. Each record represents a professional baseball team from the years 1871 to 2006 inclusive. Each record has the performance of the team for the given year, with all of the statistics adjusted to match the performance of a 162-game season.
Your objective is to build a multiple linear regression model on the training data to predict the number of wins for the team. You can only use the variables given to you (or variables that you derive from the variables provided). Below is a short description of the variables of interest in the data set:
#install.packages('caret')
#install.packages('e1071', dependencies=TRUE)
library(knitr)
library(stringr)
library(tidyr)
library(dplyr)      # masks stats::filter, stats::lag and base intersect/setdiff/setequal/union
library(ggplot2)
library(psych)      # masks ggplot2's %+% and alpha
library(reshape)    # provides melt(); masks dplyr::rename and tidyr::expand
library(corrgram)
library(mice)       # masks stats::filter and base cbind/rbind
library(caret)      # also loads lattice, which masks corrgram's panel.fill
library(e1071)
#install.packages('Rcpp')
library(Rcpp)
# DATA EXPLORATION
Load the data and explore it using summary statistics and plots.
mtd <- read.csv("https://raw.githubusercontent.com/yathdeep/msds-data621/main/moneyball-training-data.csv")
count(mtd)
## n
## 1 2276
names(mtd)
## [1] "INDEX" "TARGET_WINS" "TEAM_BATTING_H" "TEAM_BATTING_2B"
## [5] "TEAM_BATTING_3B" "TEAM_BATTING_HR" "TEAM_BATTING_BB" "TEAM_BATTING_SO"
## [9] "TEAM_BASERUN_SB" "TEAM_BASERUN_CS" "TEAM_BATTING_HBP" "TEAM_PITCHING_H"
## [13] "TEAM_PITCHING_HR" "TEAM_PITCHING_BB" "TEAM_PITCHING_SO" "TEAM_FIELDING_E"
## [17] "TEAM_FIELDING_DP"
summary(mtd)
## INDEX TARGET_WINS TEAM_BATTING_H TEAM_BATTING_2B
## Min. : 1.0 Min. : 0.00 Min. : 891 Min. : 69.0
## 1st Qu.: 630.8 1st Qu.: 71.00 1st Qu.:1383 1st Qu.:208.0
## Median :1270.5 Median : 82.00 Median :1454 Median :238.0
## Mean :1268.5 Mean : 80.79 Mean :1469 Mean :241.2
## 3rd Qu.:1915.5 3rd Qu.: 92.00 3rd Qu.:1537 3rd Qu.:273.0
## Max. :2535.0 Max. :146.00 Max. :2554 Max. :458.0
##
## TEAM_BATTING_3B TEAM_BATTING_HR TEAM_BATTING_BB TEAM_BATTING_SO
## Min. : 0.00 Min. : 0.00 Min. : 0.0 Min. : 0.0
## 1st Qu.: 34.00 1st Qu.: 42.00 1st Qu.:451.0 1st Qu.: 548.0
## Median : 47.00 Median :102.00 Median :512.0 Median : 750.0
## Mean : 55.25 Mean : 99.61 Mean :501.6 Mean : 735.6
## 3rd Qu.: 72.00 3rd Qu.:147.00 3rd Qu.:580.0 3rd Qu.: 930.0
## Max. :223.00 Max. :264.00 Max. :878.0 Max. :1399.0
## NA's :102
## TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_BATTING_HBP TEAM_PITCHING_H
## Min. : 0.0 Min. : 0.0 Min. :29.00 Min. : 1137
## 1st Qu.: 66.0 1st Qu.: 38.0 1st Qu.:50.50 1st Qu.: 1419
## Median :101.0 Median : 49.0 Median :58.00 Median : 1518
## Mean :124.8 Mean : 52.8 Mean :59.36 Mean : 1779
## 3rd Qu.:156.0 3rd Qu.: 62.0 3rd Qu.:67.00 3rd Qu.: 1682
## Max. :697.0 Max. :201.0 Max. :95.00 Max. :30132
## NA's :131 NA's :772 NA's :2085
## TEAM_PITCHING_HR TEAM_PITCHING_BB TEAM_PITCHING_SO TEAM_FIELDING_E
## Min. : 0.0 Min. : 0.0 Min. : 0.0 Min. : 65.0
## 1st Qu.: 50.0 1st Qu.: 476.0 1st Qu.: 615.0 1st Qu.: 127.0
## Median :107.0 Median : 536.5 Median : 813.5 Median : 159.0
## Mean :105.7 Mean : 553.0 Mean : 817.7 Mean : 246.5
## 3rd Qu.:150.0 3rd Qu.: 611.0 3rd Qu.: 968.0 3rd Qu.: 249.2
## Max. :343.0 Max. :3645.0 Max. :19278.0 Max. :1898.0
## NA's :102
## TEAM_FIELDING_DP
## Min. : 52.0
## 1st Qu.:131.0
## Median :149.0
## Mean :146.4
## 3rd Qu.:164.0
## Max. :228.0
## NA's :286
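The dataset has 17 columns and 2,276 rows: the INDEX identifier, the response TARGET_WINS and 15 predictors. Six variables contain missing (NA) values:

- TEAM_BATTING_SO (102)
- TEAM_BASERUN_SB (131)
- TEAM_BASERUN_CS (772)
- TEAM_BATTING_HBP (2,085, about 92% of the rows)
- TEAM_PITCHING_SO (102)
- TEAM_FIELDING_DP (286)

The summary also shows values that are implausible for a 162-game season, such as TEAM_PITCHING_H = 30,132, TEAM_PITCHING_SO = 19,278 and TARGET_WINS = 0.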
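The NA counts above are read off summary(). A compact per-column table of missing counts and percentages makes the same information easier to scan. Below is a minimal base-R sketch. The helper name na_summary is hypothetical, and the toy data frame stands in for mtd.

```r
# Count and percentage of missing values per column, most missing first.
na_summary <- function(df) {
  n_na <- colSums(is.na(df))
  out <- data.frame(
    variable = names(n_na),
    n_na     = unname(n_na),
    pct_na   = round(100 * unname(n_na) / nrow(df), 1),
    stringsAsFactors = FALSE
  )
  out <- out[out$n_na > 0, ]              # keep only columns that have NAs
  out[order(-out$n_na), , drop = FALSE]
}

# Hypothetical toy data standing in for mtd
toy <- data.frame(
  a = c(1, NA, 3, 4),
  b = c(NA, NA, NA, 4),
  c = c(1, 2, 3, 4)
)
na_summary(toy)
```

On the training data the call would be na_summary(mtd).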
Checking for outliers:
ggplot(stack(mtd), aes(x = ind, y = values)) +
geom_boxplot() +
coord_cartesian(ylim = c(0, 1000)) +
theme(legend.position="none") +
theme(axis.text.x=element_text(angle=45, hjust=1)) +
theme(panel.background = element_rect(fill = 'grey'))
## Warning: Removed 3478 rows containing non-finite values (stat_boxplot).
Checking the distributions for skewness:
mtd1 = melt(mtd)
## Using as id variables
ggplot(mtd1, aes(x= value)) +
geom_density(fill='red') + facet_wrap(~variable, scales = 'free')
## Warning: Removed 3478 rows containing non-finite values (stat_density).
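The plots and the summary statistics show that several variables are strongly right-skewed and contain extreme outliers. This is most visible for TEAM_PITCHING_H, TEAM_PITCHING_BB, TEAM_PITCHING_SO, TEAM_FIELDING_E and TEAM_BASERUN_SB, whose maxima lie far above their third quartiles.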
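The density plots show skewness visually, but a numeric skewness per column makes the comparison explicit. Below is a minimal base-R sketch of the moment-based sample skewness g1. It should correspond to e1071::skewness(x, type = 1); e1071's default type = 3 differs slightly. The function names are hypothetical, and the toy data stands in for mtd.

```r
# Moment-based sample skewness g1: third central moment over (second central moment)^1.5.
skewness_g1 <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Skewness of every numeric column, most right-skewed first.
skew_table <- function(df) {
  num <- df[vapply(df, is.numeric, logical(1))]
  sort(vapply(num, skewness_g1, numeric(1)), decreasing = TRUE)
}

# Toy example: one column with a single extreme value, one symmetric column
toy <- data.frame(skewed = c(1, 2, 2, 3, 3, 3, 50), symmetric = 1:7)
sk <- skew_table(toy)
sk
```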
Finding correlations:
mtd2 <- mtd[,-1 ]
names(mtd2)
## [1] "TARGET_WINS" "TEAM_BATTING_H" "TEAM_BATTING_2B" "TEAM_BATTING_3B"
## [5] "TEAM_BATTING_HR" "TEAM_BATTING_BB" "TEAM_BATTING_SO" "TEAM_BASERUN_SB"
## [9] "TEAM_BASERUN_CS" "TEAM_BATTING_HBP" "TEAM_PITCHING_H" "TEAM_PITCHING_HR"
## [13] "TEAM_PITCHING_BB" "TEAM_PITCHING_SO" "TEAM_FIELDING_E" "TEAM_FIELDING_DP"
cor(drop_na(mtd2))
## TARGET_WINS TEAM_BATTING_H TEAM_BATTING_2B TEAM_BATTING_3B
## TARGET_WINS 1.00000000 0.46994665 0.31298400 -0.12434586
## TEAM_BATTING_H 0.46994665 1.00000000 0.56177286 0.21391883
## TEAM_BATTING_2B 0.31298400 0.56177286 1.00000000 0.04203441
## TEAM_BATTING_3B -0.12434586 0.21391883 0.04203441 1.00000000
## TEAM_BATTING_HR 0.42241683 0.39627593 0.25099045 -0.21879927
## TEAM_BATTING_BB 0.46868793 0.19735234 0.19749256 -0.20584392
## TEAM_BATTING_SO -0.22889273 -0.34174328 -0.06415123 -0.19291841
## TEAM_BASERUN_SB 0.01483639 0.07167495 -0.18768279 0.16946086
## TEAM_BASERUN_CS -0.17875598 -0.09377545 -0.20413884 0.23213978
## TEAM_BATTING_HBP 0.07350424 -0.02911218 0.04608475 -0.17424715
## TEAM_PITCHING_H 0.47123431 0.99919269 0.56045355 0.21250322
## TEAM_PITCHING_HR 0.42246683 0.39495630 0.24999875 -0.21973263
## TEAM_PITCHING_BB 0.46839882 0.19529071 0.19592157 -0.20675383
## TEAM_PITCHING_SO -0.22936481 -0.34445001 -0.06616615 -0.19386654
## TEAM_FIELDING_E -0.38668800 -0.25381638 -0.19427027 -0.06513145
## TEAM_FIELDING_DP -0.19586601 0.01776946 -0.02488808 0.13314758
## TEAM_BATTING_HR TEAM_BATTING_BB TEAM_BATTING_SO
## TARGET_WINS 0.42241683 0.46868793 -0.22889273
## TEAM_BATTING_H 0.39627593 0.19735234 -0.34174328
## TEAM_BATTING_2B 0.25099045 0.19749256 -0.06415123
## TEAM_BATTING_3B -0.21879927 -0.20584392 -0.19291841
## TEAM_BATTING_HR 1.00000000 0.45638161 0.21045444
## TEAM_BATTING_BB 0.45638161 1.00000000 0.21833871
## TEAM_BATTING_SO 0.21045444 0.21833871 1.00000000
## TEAM_BASERUN_SB -0.19021893 -0.08806123 -0.07475974
## TEAM_BASERUN_CS -0.27579838 -0.20878051 -0.05613035
## TEAM_BATTING_HBP 0.10618116 0.04746007 0.22094219
## TEAM_PITCHING_H 0.39549390 0.19848687 -0.34145321
## TEAM_PITCHING_HR 0.99993259 0.45659283 0.21111617
## TEAM_PITCHING_BB 0.45542468 0.99988140 0.21895783
## TEAM_PITCHING_SO 0.20829574 0.21793253 0.99976835
## TEAM_FIELDING_E 0.01567397 -0.07847126 0.30814540
## TEAM_FIELDING_DP -0.06182222 -0.07929078 -0.12319072
## TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_BATTING_HBP
## TARGET_WINS 0.01483639 -0.178755979 0.07350424
## TEAM_BATTING_H 0.07167495 -0.093775445 -0.02911218
## TEAM_BATTING_2B -0.18768279 -0.204138837 0.04608475
## TEAM_BATTING_3B 0.16946086 0.232139777 -0.17424715
## TEAM_BATTING_HR -0.19021893 -0.275798375 0.10618116
## TEAM_BATTING_BB -0.08806123 -0.208780510 0.04746007
## TEAM_BATTING_SO -0.07475974 -0.056130355 0.22094219
## TEAM_BASERUN_SB 1.00000000 0.624737808 -0.06400498
## TEAM_BASERUN_CS 0.62473781 1.000000000 -0.07051390
## TEAM_BATTING_HBP -0.06400498 -0.070513896 1.00000000
## TEAM_PITCHING_H 0.07395373 -0.092977893 -0.02769699
## TEAM_PITCHING_HR -0.18948057 -0.275471495 0.10675878
## TEAM_PITCHING_BB -0.08741902 -0.208470154 0.04785137
## TEAM_PITCHING_SO -0.07351325 -0.055308336 0.22157375
## TEAM_FIELDING_E 0.04292341 0.207701189 0.04178971
## TEAM_FIELDING_DP -0.13023054 -0.006764233 -0.07120824
## TEAM_PITCHING_H TEAM_PITCHING_HR TEAM_PITCHING_BB
## TARGET_WINS 0.47123431 0.42246683 0.46839882
## TEAM_BATTING_H 0.99919269 0.39495630 0.19529071
## TEAM_BATTING_2B 0.56045355 0.24999875 0.19592157
## TEAM_BATTING_3B 0.21250322 -0.21973263 -0.20675383
## TEAM_BATTING_HR 0.39549390 0.99993259 0.45542468
## TEAM_BATTING_BB 0.19848687 0.45659283 0.99988140
## TEAM_BATTING_SO -0.34145321 0.21111617 0.21895783
## TEAM_BASERUN_SB 0.07395373 -0.18948057 -0.08741902
## TEAM_BASERUN_CS -0.09297789 -0.27547150 -0.20847015
## TEAM_BATTING_HBP -0.02769699 0.10675878 0.04785137
## TEAM_PITCHING_H 1.00000000 0.39463199 0.19703302
## TEAM_PITCHING_HR 0.39463199 1.00000000 0.45580983
## TEAM_PITCHING_BB 0.19703302 0.45580983 1.00000000
## TEAM_PITCHING_SO -0.34330646 0.20920115 0.21887700
## TEAM_FIELDING_E -0.25073028 0.01689330 -0.07692315
## TEAM_FIELDING_DP 0.01416807 -0.06292475 -0.08040645
## TEAM_PITCHING_SO TEAM_FIELDING_E TEAM_FIELDING_DP
## TARGET_WINS -0.22936481 -0.38668800 -0.195866006
## TEAM_BATTING_H -0.34445001 -0.25381638 0.017769456
## TEAM_BATTING_2B -0.06616615 -0.19427027 -0.024888081
## TEAM_BATTING_3B -0.19386654 -0.06513145 0.133147578
## TEAM_BATTING_HR 0.20829574 0.01567397 -0.061822219
## TEAM_BATTING_BB 0.21793253 -0.07847126 -0.079290775
## TEAM_BATTING_SO 0.99976835 0.30814540 -0.123190715
## TEAM_BASERUN_SB -0.07351325 0.04292341 -0.130230537
## TEAM_BASERUN_CS -0.05530834 0.20770119 -0.006764233
## TEAM_BATTING_HBP 0.22157375 0.04178971 -0.071208241
## TEAM_PITCHING_H -0.34330646 -0.25073028 0.014168073
## TEAM_PITCHING_HR 0.20920115 0.01689330 -0.062924751
## TEAM_PITCHING_BB 0.21887700 -0.07692315 -0.080406452
## TEAM_PITCHING_SO 1.00000000 0.31008407 -0.124923213
## TEAM_FIELDING_E 0.31008407 1.00000000 0.040205814
## TEAM_FIELDING_DP -0.12492321 0.04020581 1.000000000
pairs.panels(mtd2[1:8])
pairs.panels(mtd2[9:16])
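TARGET_WINS is positively correlated with TEAM_BATTING_H, TEAM_BATTING_HR, TEAM_BATTING_BB and TEAM_PITCHING_H (r of about 0.42 to 0.47). It is negatively correlated with TEAM_FIELDING_E (r = -0.39) and TEAM_BATTING_SO (r = -0.23).

Several batting/pitching pairs are almost perfectly correlated (r > 0.999):

- TEAM_BATTING_H and TEAM_PITCHING_H
- TEAM_BATTING_HR and TEAM_PITCHING_HR
- TEAM_BATTING_BB and TEAM_PITCHING_BB
- TEAM_BATTING_SO and TEAM_PITCHING_SO

Note that cor(drop_na(mtd2)) keeps only rows with no missing value in any column. Because TEAM_BATTING_HBP is missing in 2,085 rows, at most 191 of the 2,276 rows enter this correlation matrix.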
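Because drop_na() discards every row with any NA, one mostly-missing column can shrink the sample behind all the correlations. Base R's cor() can instead use pairwise-complete observations, so each pair uses every row where both values are present. The sketch below compares the two on toy data; the toy column z plays the role of TEAM_BATTING_HBP. Pairwise matrices are not guaranteed to be positive semi-definite, so they are best used for inspection rather than as model input.

```r
# Toy data standing in for mtd2: column z is mostly missing
toy <- data.frame(
  x = c(1, 2, 3, 4, 5, 6, 7, 8),
  y = c(2, 1, 4, 3, 6, 5, 8, 7),
  z = c(NA, NA, NA, NA, NA, 1, 2, 4)
)

# Complete-case correlation (same rows as cor(drop_na(toy))): only rows 6-8 are used
cc <- cor(toy, use = "complete.obs")

# Pairwise correlation: each pair uses all rows where both values are present
pw <- cor(toy, use = "pairwise.complete.obs")

# Number of rows behind each pairwise estimate
n_pairs <- crossprod(1 * !is.na(toy))

cc["x", "y"]
pw["x", "y"]
n_pairs
```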
# DATA PREPARATION
Removing the INDEX column, which is only a row identifier:
mtd_f <- mtd[,-1 ]
names(mtd_f)
## [1] "TARGET_WINS" "TEAM_BATTING_H" "TEAM_BATTING_2B" "TEAM_BATTING_3B"
## [5] "TEAM_BATTING_HR" "TEAM_BATTING_BB" "TEAM_BATTING_SO" "TEAM_BASERUN_SB"
## [9] "TEAM_BASERUN_CS" "TEAM_BATTING_HBP" "TEAM_PITCHING_H" "TEAM_PITCHING_HR"
## [13] "TEAM_PITCHING_BB" "TEAM_PITCHING_SO" "TEAM_FIELDING_E" "TEAM_FIELDING_DP"
TEAM_BATTING_HBP is missing in 2,085 of the 2,276 rows (about 92%), so it is removed entirely.
mtd_f <- mtd_f[,-10 ]
names(mtd_f )
## [1] "TARGET_WINS" "TEAM_BATTING_H" "TEAM_BATTING_2B" "TEAM_BATTING_3B"
## [5] "TEAM_BATTING_HR" "TEAM_BATTING_BB" "TEAM_BATTING_SO" "TEAM_BASERUN_SB"
## [9] "TEAM_BASERUN_CS" "TEAM_PITCHING_H" "TEAM_PITCHING_HR" "TEAM_PITCHING_BB"
## [13] "TEAM_PITCHING_SO" "TEAM_FIELDING_E" "TEAM_FIELDING_DP"
TEAM_PITCHING_HR and TEAM_BATTING_HR are almost perfectly correlated (r = 0.9999 on the complete cases), so TEAM_PITCHING_HR is removed.
mtd_f <- mtd_f[,-11 ]
names(mtd_f)
## [1] "TARGET_WINS" "TEAM_BATTING_H" "TEAM_BATTING_2B" "TEAM_BATTING_3B"
## [5] "TEAM_BATTING_HR" "TEAM_BATTING_BB" "TEAM_BATTING_SO" "TEAM_BASERUN_SB"
## [9] "TEAM_BASERUN_CS" "TEAM_PITCHING_H" "TEAM_PITCHING_BB" "TEAM_PITCHING_SO"
## [13] "TEAM_FIELDING_E" "TEAM_FIELDING_DP"
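Imputing the NAs using mice with predictive mean matching (pmm). mice() generates five imputed datasets (m = 5), but complete() with its default action = 1 keeps only the first of them. No seed is set, so the imputed values (and the summary below) will differ slightly between runs. mice() accepts a seed argument for reproducibility.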
imputed_mtd_Data <- mice(mtd_f, m=5, maxit = 5, method = 'pmm')
##
## iter imp variable
## 1 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_PITCHING_SO TEAM_FIELDING_DP
## 1 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_PITCHING_SO TEAM_FIELDING_DP
## 1 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_PITCHING_SO TEAM_FIELDING_DP
## 1 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_PITCHING_SO TEAM_FIELDING_DP
## 1 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_PITCHING_SO TEAM_FIELDING_DP
## 2 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_PITCHING_SO TEAM_FIELDING_DP
## 2 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_PITCHING_SO TEAM_FIELDING_DP
## 2 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_PITCHING_SO TEAM_FIELDING_DP
## 2 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_PITCHING_SO TEAM_FIELDING_DP
## 2 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_PITCHING_SO TEAM_FIELDING_DP
## 3 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_PITCHING_SO TEAM_FIELDING_DP
## 3 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_PITCHING_SO TEAM_FIELDING_DP
## 3 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_PITCHING_SO TEAM_FIELDING_DP
## 3 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_PITCHING_SO TEAM_FIELDING_DP
## 3 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_PITCHING_SO TEAM_FIELDING_DP
## 4 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_PITCHING_SO TEAM_FIELDING_DP
## 4 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_PITCHING_SO TEAM_FIELDING_DP
## 4 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_PITCHING_SO TEAM_FIELDING_DP
## 4 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_PITCHING_SO TEAM_FIELDING_DP
## 4 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_PITCHING_SO TEAM_FIELDING_DP
## 5 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_PITCHING_SO TEAM_FIELDING_DP
## 5 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_PITCHING_SO TEAM_FIELDING_DP
## 5 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_PITCHING_SO TEAM_FIELDING_DP
## 5 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_PITCHING_SO TEAM_FIELDING_DP
## 5 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_PITCHING_SO TEAM_FIELDING_DP
imputed_mtd_Data <- complete(imputed_mtd_Data)
summary(imputed_mtd_Data)
## TARGET_WINS TEAM_BATTING_H TEAM_BATTING_2B TEAM_BATTING_3B
## Min. : 0.00 Min. : 891 Min. : 69.0 Min. : 0.00
## 1st Qu.: 71.00 1st Qu.:1383 1st Qu.:208.0 1st Qu.: 34.00
## Median : 82.00 Median :1454 Median :238.0 Median : 47.00
## Mean : 80.79 Mean :1469 Mean :241.2 Mean : 55.25
## 3rd Qu.: 92.00 3rd Qu.:1537 3rd Qu.:273.0 3rd Qu.: 72.00
## Max. :146.00 Max. :2554 Max. :458.0 Max. :223.00
## TEAM_BATTING_HR TEAM_BATTING_BB TEAM_BATTING_SO TEAM_BASERUN_SB
## Min. : 0.00 Min. : 0.0 Min. : 0.0 Min. : 0.0
## 1st Qu.: 42.00 1st Qu.:451.0 1st Qu.: 544.0 1st Qu.: 67.0
## Median :102.00 Median :512.0 Median : 735.0 Median :106.0
## Mean : 99.61 Mean :501.6 Mean : 728.6 Mean :135.1
## 3rd Qu.:147.00 3rd Qu.:580.0 3rd Qu.: 925.0 3rd Qu.:170.0
## Max. :264.00 Max. :878.0 Max. :1399.0 Max. :697.0
## TEAM_BASERUN_CS TEAM_PITCHING_H TEAM_PITCHING_BB TEAM_PITCHING_SO
## Min. : 0.00 Min. : 1137 Min. : 0.0 Min. : 0.0
## 1st Qu.: 42.75 1st Qu.: 1419 1st Qu.: 476.0 1st Qu.: 613.8
## Median : 57.00 Median : 1518 Median : 536.5 Median : 805.0
## Mean : 74.76 Mean : 1779 Mean : 553.0 Mean : 812.5
## 3rd Qu.: 89.00 3rd Qu.: 1682 3rd Qu.: 611.0 3rd Qu.: 958.0
## Max. :201.00 Max. :30132 Max. :3645.0 Max. :19278.0
## TEAM_FIELDING_E TEAM_FIELDING_DP
## Min. : 65.0 Min. : 52.0
## 1st Qu.: 127.0 1st Qu.:125.0
## Median : 159.0 Median :146.0
## Mean : 246.5 Mean :142.4
## 3rd Qu.: 249.2 3rd Qu.:162.0
## Max. :1898.0 Max. :228.0
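The pmm method fills each missing value with an observed value borrowed from a "donor" row whose predicted value is close. The sketch below shows that idea for a single variable in base R.

It is a simplified illustration, not mice's implementation. mice additionally draws the regression coefficients from their posterior distribution and cycles through all incomplete variables over several iterations. The function name pmm_impute and its arguments are hypothetical.

```r
# Simplified predictive mean matching for one numeric variable y.
# X: fully observed numeric predictors; k: number of candidate donors.
pmm_impute <- function(y, X, k = 5) {
  X <- as.matrix(X)
  obs <- !is.na(y)
  design <- cbind(1, X)                                    # add intercept column
  beta <- lm.fit(design[obs, , drop = FALSE], y[obs])$coefficients
  yhat <- drop(design %*% beta)                            # predicted means for all rows
  for (i in which(!obs)) {
    dist <- abs(yhat[obs] - yhat[i])                       # closeness in predicted space
    donors <- y[obs][order(dist)[seq_len(min(k, sum(obs)))]]
    y[i] <- donors[sample.int(length(donors), 1)]          # copy one observed value
  }
  y
}

set.seed(1)
x <- 1:50
y <- 2 * x + rnorm(50)
y[c(5, 20, 35)] <- NA
y_imp <- pmm_impute(y, x)
y_imp[c(5, 20, 35)]
```

Imputed values are always values that were actually observed. This is why pmm cannot produce impossible numbers such as negative counts.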
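Next, caret's preProcess() applies a Box-Cox transformation followed by centering and scaling to every column. This includes the response TARGET_WINS, so the model coefficients and predictions below are in standardized units rather than wins.

Box-Cox is only defined for strictly positive values. Columns that contain zeros are therefore only centered and scaled: TARGET_WINS, TEAM_BATTING_3B, TEAM_BATTING_HR, TEAM_BATTING_BB, TEAM_BATTING_SO, TEAM_BASERUN_SB, TEAM_BASERUN_CS, TEAM_PITCHING_BB and TEAM_PITCHING_SO. This is why t.TEAM_PITCHING_BB and t.TEAM_PITCHING_SO still have maxima of about 18.6 and 34.1.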
t = preProcess(imputed_mtd_Data,
c("BoxCox", "center", "scale"))
mtd_final = data.frame(
t = predict(t, imputed_mtd_Data))
summary(mtd_final)
## t.TARGET_WINS t.TEAM_BATTING_H t.TEAM_BATTING_2B t.TEAM_BATTING_3B
## Min. :-5.12888 Min. :-7.537074 Min. :-4.48108 Min. :-1.9776
## 1st Qu.:-0.62156 1st Qu.:-0.573089 1st Qu.:-0.68949 1st Qu.:-0.7606
## Median : 0.07676 Median :-0.003988 Median :-0.03019 Median :-0.2953
## Mean : 0.00000 Mean : 0.000000 Mean : 0.00000 Mean : 0.0000
## 3rd Qu.: 0.71159 3rd Qu.: 0.586908 3rd Qu.: 0.69827 3rd Qu.: 0.5995
## Max. : 4.13970 Max. : 4.390097 Max. : 4.05391 Max. : 6.0042
## t.TEAM_BATTING_HR t.TEAM_BATTING_BB t.TEAM_BATTING_SO t.TEAM_BASERUN_SB
## Min. :-1.64521 Min. :-4.08866 Min. :-2.96021 Min. :-1.3692
## 1st Qu.:-0.95153 1st Qu.:-0.41215 1st Qu.:-0.75001 1st Qu.:-0.6904
## Median : 0.03944 Median : 0.08511 Median : 0.02599 Median :-0.2952
## Mean : 0.00000 Mean : 0.00000 Mean : 0.00000 Mean : 0.0000
## 3rd Qu.: 0.78267 3rd Qu.: 0.63944 3rd Qu.: 0.79794 3rd Qu.: 0.3532
## Max. : 2.71505 Max. : 3.06871 Max. : 2.72373 Max. : 5.6926
## t.TEAM_BASERUN_CS t.TEAM_PITCHING_H t.TEAM_PITCHING_BB t.TEAM_PITCHING_SO
## Min. :-1.5183 Min. :-2.8556 Min. :-3.32422 Min. :-1.49945
## 1st Qu.:-0.6501 1st Qu.:-0.6710 1st Qu.:-0.46291 1st Qu.:-0.36674
## Median :-0.3607 Median :-0.1765 Median :-0.09923 Median :-0.01378
## Mean : 0.0000 Mean : 0.0000 Mean : 0.00000 Mean : 0.00000
## 3rd Qu.: 0.2891 3rd Qu.: 0.4602 3rd Qu.: 0.34860 3rd Qu.: 0.26859
## Max. : 2.5636 Max. : 3.2387 Max. :18.58645 Max. :34.07917
## t.TEAM_FIELDING_E t.TEAM_FIELDING_DP
## Min. :-3.3092 Min. :-2.64252
## 1st Qu.:-0.7163 1st Qu.:-0.64171
## Median :-0.1424 Median : 0.07556
## Mean : 0.0000 Mean : 0.00000
## 3rd Qu.: 0.7096 3rd Qu.: 0.65825
## Max. : 2.1432 Max. : 3.36000
mtd_final1 = melt(mtd_final)
## Using as id variables
ggplot(mtd_final1, aes(x= value)) +
geom_density(fill='red') + facet_wrap(~variable, scales = 'free')
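The Box-Cox step picks a power lambda for each strictly positive variable by maximum likelihood. Below is a minimal base-R sketch of that estimate. The function names are hypothetical, and caret's internal procedure may differ in details such as the search range for lambda.

The sketch also makes the positivity requirement explicit. For variables with zeros, caret's preProcess() offers method = "YeoJohnson" as an alternative.

```r
# Box-Cox transform of a strictly positive vector y for a given lambda.
boxcox_transform <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Profile log-likelihood of lambda (up to an additive constant), assuming the
# transformed values are normal with a common mean and variance.
boxcox_loglik <- function(lambda, y) {
  z <- boxcox_transform(y, lambda)
  n <- length(y)
  -n / 2 * log(mean((z - mean(z))^2)) + (lambda - 1) * sum(log(y))
}

# Maximum-likelihood lambda over a search interval.
estimate_lambda <- function(y, interval = c(-2, 2)) {
  if (any(y <= 0)) stop("Box-Cox requires strictly positive values")
  optimize(boxcox_loglik, interval = interval, y = y, maximum = TRUE)$maximum
}

set.seed(1)
y <- exp(rnorm(2000))    # log-normal data, so the ideal lambda is close to 0 (a log transform)
lambda_hat <- estimate_lambda(y)
lambda_hat
```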
# BUILD MODELS
Model1:
With all variables:
model1 <- lm(t.TARGET_WINS ~., mtd_final)
summary(model1)
##
## Call:
## lm(formula = t.TARGET_WINS ~ ., data = mtd_final)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.8120 -0.5138 0.0011 0.5169 3.6552
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.338e-11 1.702e-02 0.000 1.000
## t.TEAM_BATTING_H 4.062e-01 3.647e-02 11.137 < 2e-16 ***
## t.TEAM_BATTING_2B -4.092e-02 2.758e-02 -1.483 0.138
## t.TEAM_BATTING_3B 1.852e-01 3.006e-02 6.160 8.56e-10 ***
## t.TEAM_BATTING_HR 2.225e-01 3.807e-02 5.844 5.83e-09 ***
## t.TEAM_BATTING_BB 1.641e-01 3.454e-02 4.752 2.14e-06 ***
## t.TEAM_BATTING_SO -3.583e-01 4.069e-02 -8.807 < 2e-16 ***
## t.TEAM_BASERUN_SB 2.712e-01 3.210e-02 8.449 < 2e-16 ***
## t.TEAM_BASERUN_CS -2.814e-02 3.371e-02 -0.835 0.404
## t.TEAM_PITCHING_H -1.612e-01 3.806e-02 -4.235 2.38e-05 ***
## t.TEAM_PITCHING_BB -3.539e-02 3.283e-02 -1.078 0.281
## t.TEAM_PITCHING_SO 1.552e-01 2.919e-02 5.315 1.17e-07 ***
## t.TEAM_FIELDING_E -4.791e-01 3.829e-02 -12.511 < 2e-16 ***
## t.TEAM_FIELDING_DP -2.147e-01 2.285e-02 -9.398 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.812 on 2262 degrees of freedom
## Multiple R-squared: 0.3445, Adjusted R-squared: 0.3407
## F-statistic: 91.44 on 13 and 2262 DF, p-value: < 2.2e-16
Model2:
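With only the predictors that were significant in model1, dropping TEAM_BATTING_2B, TEAM_BASERUN_CS and TEAM_PITCHING_BB. The formula lists t.TEAM_PITCHING_SO twice; lm() keeps a single copy, so the model has 10 predictors: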
model2 <- lm(t.TARGET_WINS ~ t.TEAM_BATTING_H + t.TEAM_BATTING_3B + t.TEAM_BATTING_HR + t.TEAM_BATTING_BB + t.TEAM_BATTING_SO + t.TEAM_BASERUN_SB + t.TEAM_PITCHING_SO + t.TEAM_PITCHING_H + t.TEAM_PITCHING_SO + t.TEAM_FIELDING_E + t.TEAM_FIELDING_DP, mtd_final)
summary(model2)
##
## Call:
## lm(formula = t.TARGET_WINS ~ t.TEAM_BATTING_H + t.TEAM_BATTING_3B +
## t.TEAM_BATTING_HR + t.TEAM_BATTING_BB + t.TEAM_BATTING_SO +
## t.TEAM_BASERUN_SB + t.TEAM_PITCHING_SO + t.TEAM_PITCHING_H +
## t.TEAM_PITCHING_SO + t.TEAM_FIELDING_E + t.TEAM_FIELDING_DP,
## data = mtd_final)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.7683 -0.5118 0.0035 0.5185 3.6256
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.470e-11 1.702e-02 0.000 1
## t.TEAM_BATTING_H 3.838e-01 3.045e-02 12.604 < 2e-16 ***
## t.TEAM_BATTING_3B 1.818e-01 2.949e-02 6.165 8.31e-10 ***
## t.TEAM_BATTING_HR 2.309e-01 3.776e-02 6.116 1.13e-09 ***
## t.TEAM_BATTING_BB 1.327e-01 2.233e-02 5.941 3.27e-09 ***
## t.TEAM_BATTING_SO -3.600e-01 3.904e-02 -9.222 < 2e-16 ***
## t.TEAM_BASERUN_SB 2.575e-01 2.586e-02 9.957 < 2e-16 ***
## t.TEAM_PITCHING_SO 1.300e-01 2.197e-02 5.919 3.72e-09 ***
## t.TEAM_PITCHING_H -1.815e-01 3.504e-02 -5.179 2.43e-07 ***
## t.TEAM_FIELDING_E -4.716e-01 3.750e-02 -12.576 < 2e-16 ***
## t.TEAM_FIELDING_DP -2.163e-01 2.250e-02 -9.616 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8121 on 2265 degrees of freedom
## Multiple R-squared: 0.3434, Adjusted R-squared: 0.3405
## F-statistic: 118.5 on 10 and 2265 DF, p-value: < 2.2e-16
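Rather than retyping the formula by hand, a reduced model can be derived from the full one with update(). The sketch below uses the built-in mtcars data as a stand-in. For this analysis, the equivalent call would presumably be update(model1, . ~ . - t.TEAM_BATTING_2B - t.TEAM_BASERUN_CS - t.TEAM_PITCHING_BB).

```r
# Full model with every other column of mtcars as a predictor
full <- lm(mpg ~ ., data = mtcars)

# Drop two terms without retyping the formula; '.' keeps everything else
reduced <- update(full, . ~ . - cyl - disp)

formula(reduced)
length(coef(full)) - length(coef(reduced))   # two fewer coefficients
```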
Model3:
Further reducing the variables: TEAM_PITCHING_H and TEAM_PITCHING_SO are dropped because they are almost perfectly correlated with TEAM_BATTING_H and TEAM_BATTING_SO, respectively (r > 0.999 on the complete cases):
model3 <- lm(t.TARGET_WINS ~ t.TEAM_BATTING_H + t.TEAM_BATTING_3B + t.TEAM_BATTING_HR + t.TEAM_BATTING_BB + t.TEAM_BATTING_SO + t.TEAM_BASERUN_SB + t.TEAM_FIELDING_E + t.TEAM_FIELDING_DP, mtd_final)
summary(model3)
##
## Call:
## lm(formula = t.TARGET_WINS ~ t.TEAM_BATTING_H + t.TEAM_BATTING_3B +
## t.TEAM_BATTING_HR + t.TEAM_BATTING_BB + t.TEAM_BATTING_SO +
## t.TEAM_BASERUN_SB + t.TEAM_FIELDING_E + t.TEAM_FIELDING_DP,
## data = mtd_final)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.7107 -0.5214 -0.0031 0.5269 4.2580
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.417e-12 1.718e-02 0.000 1
## t.TEAM_BATTING_H 2.839e-01 2.445e-02 11.611 < 2e-16 ***
## t.TEAM_BATTING_3B 1.859e-01 2.965e-02 6.271 4.29e-10 ***
## t.TEAM_BATTING_HR 1.911e-01 3.747e-02 5.099 3.69e-07 ***
## t.TEAM_BATTING_BB 1.616e-01 2.102e-02 7.689 2.19e-14 ***
## t.TEAM_BATTING_SO -2.408e-01 3.488e-02 -6.902 6.62e-12 ***
## t.TEAM_BASERUN_SB 2.304e-01 2.505e-02 9.198 < 2e-16 ***
## t.TEAM_FIELDING_E -4.926e-01 3.641e-02 -13.528 < 2e-16 ***
## t.TEAM_FIELDING_DP -2.069e-01 2.234e-02 -9.262 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8195 on 2267 degrees of freedom
## Multiple R-squared: 0.3307, Adjusted R-squared: 0.3284
## F-statistic: 140 on 8 and 2267 DF, p-value: < 2.2e-16
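A variance inflation factor (VIF) puts a number on the collinearity argument behind model3. For each predictor j, VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing that predictor on all the others. Values above roughly 5 to 10 are commonly read as a warning sign.

The car package provides vif() for fitted models. The base-R sketch below (hypothetical name vif_manual) computes the same quantity directly from a data frame of numeric predictors. It is shown on toy data where x2 is nearly a copy of x1 and x3 is unrelated to both.

```r
# VIF for each column of a data frame of numeric predictors:
# regress each predictor on all the others and compute 1 / (1 - R^2).
vif_manual <- function(predictors) {
  vapply(names(predictors), function(v) {
    others <- setdiff(names(predictors), v)
    r2 <- summary(lm(reformulate(others, response = v), data = predictors))$r.squared
    1 / (1 - r2)
  }, numeric(1))
}

toy <- data.frame(
  x1 = 1:20,
  x2 = 1:20 + rep(c(0.1, -0.1), 10),   # near-duplicate of x1
  x3 = rep(c(1, 2, 2, 1), 5)           # uncorrelated with x1 and x2
)
vifs <- vif_manual(toy)
round(vifs, 1)
```

Applied to the predictor columns of mtd_final, this would show how much dropping the pitching variables reduces the inflation.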
# SELECT MODELS AND PREDICTION
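Of the three models, I chose model3 for prediction because it is the most parsimonious: 8 predictors, versus 13 in model1 and 10 in model2. Removing the collinear pitching variables costs little in fit. Adjusted R² falls from 0.3407 (model1) and 0.3405 (model2) to 0.3284 (model3), and the residual standard error rises from 0.812 to 0.8195 on the standardized scale. All coefficients in model3 are significant at the 0.1% level.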
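The three models are nested: model3 uses a subset of model2's predictors, and model2 a subset of model1's, all on the same data. That allows a more formal comparison than R² alone:

- a partial F-test with anova()
- information criteria via AIC() and BIC()
- the training RMSE, computed from the residuals

A sketch on the built-in mtcars data, as a stand-in for mtd_final:

```r
# Two nested linear models on the same data
big   <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
small <- lm(mpg ~ wt + hp, data = mtcars)

# Partial F-test: does adding qsec and drat significantly improve the fit?
ftest <- anova(small, big)
ftest

# Information criteria (lower is better) and in-sample RMSE
rmse <- function(model) sqrt(mean(residuals(model)^2))
comparison <- data.frame(
  model = c("small", "big"),
  AIC   = c(AIC(small), AIC(big)),
  BIC   = c(BIC(small), BIC(big)),
  RMSE  = c(rmse(small), rmse(big))
)
comparison
```

For this analysis, the corresponding call would be anova(model3, model2, model1).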
# PREDICTION
The same preprocessing steps are applied to the evaluation dataset before predicting with model3.
med <- read.csv("https://raw.githubusercontent.com/yathdeep/msds-data621/main/moneyball-evaluation-data.csv")
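Removing the same variables as for the training data. The evaluation file has no TARGET_WINS column, so every column after INDEX sits one position earlier than in the training data. As a result, the reused positional indices below do not drop the intended columns (compare the names() output after each step):

- med_f[,-10] removes TEAM_PITCHING_H instead of TEAM_BATTING_HBP.
- med_f[,-11] then removes TEAM_PITCHING_BB instead of TEAM_PITCHING_HR.

All eight predictors used by model3 are still present, so the predictions run. However, the imputation below is based on a different set of variables than on the training data.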
med_f <- med[,-1 ]
names(med_f)
## [1] "TEAM_BATTING_H" "TEAM_BATTING_2B" "TEAM_BATTING_3B" "TEAM_BATTING_HR"
## [5] "TEAM_BATTING_BB" "TEAM_BATTING_SO" "TEAM_BASERUN_SB" "TEAM_BASERUN_CS"
## [9] "TEAM_BATTING_HBP" "TEAM_PITCHING_H" "TEAM_PITCHING_HR" "TEAM_PITCHING_BB"
## [13] "TEAM_PITCHING_SO" "TEAM_FIELDING_E" "TEAM_FIELDING_DP"
med_f <- med_f[,-10 ]
names(med_f )
## [1] "TEAM_BATTING_H" "TEAM_BATTING_2B" "TEAM_BATTING_3B" "TEAM_BATTING_HR"
## [5] "TEAM_BATTING_BB" "TEAM_BATTING_SO" "TEAM_BASERUN_SB" "TEAM_BASERUN_CS"
## [9] "TEAM_BATTING_HBP" "TEAM_PITCHING_HR" "TEAM_PITCHING_BB" "TEAM_PITCHING_SO"
## [13] "TEAM_FIELDING_E" "TEAM_FIELDING_DP"
med_f <- med_f[,-11 ]
names(med_f)
## [1] "TEAM_BATTING_H" "TEAM_BATTING_2B" "TEAM_BATTING_3B" "TEAM_BATTING_HR"
## [5] "TEAM_BATTING_BB" "TEAM_BATTING_SO" "TEAM_BASERUN_SB" "TEAM_BASERUN_CS"
## [9] "TEAM_BATTING_HBP" "TEAM_PITCHING_HR" "TEAM_PITCHING_SO" "TEAM_FIELDING_E"
## [13] "TEAM_FIELDING_DP"
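Selecting columns by name instead of by position avoids this off-by-one error, because the same code works whether or not TARGET_WINS is present. Below is a minimal sketch. The helper name drop_columns is hypothetical, and the toy data frames stand in for the training and evaluation files.

```r
# Columns removed in the training-data preparation
cols_to_drop <- c("INDEX", "TEAM_BATTING_HBP", "TEAM_PITCHING_HR")

# Drop columns by name; names that are not present are simply ignored
drop_columns <- function(df, cols) {
  df[, setdiff(names(df), cols), drop = FALSE]
}

# Toy stand-ins: the evaluation data lacks TARGET_WINS
train_toy <- data.frame(INDEX = 1:2, TARGET_WINS = c(80, 90),
                        TEAM_BATTING_H = c(1400, 1500), TEAM_BATTING_HBP = c(NA, 60),
                        TEAM_PITCHING_H = c(1450, 1520), TEAM_PITCHING_HR = c(100, 120))
eval_toy <- train_toy[, names(train_toy) != "TARGET_WINS"]

names(drop_columns(train_toy, cols_to_drop))
names(drop_columns(eval_toy, cols_to_drop))
```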
Imputing the NAs using mice with predictive mean matching (pmm):
imputed_med_Data <- mice(med_f, m=5, maxit = 5, method = 'pmm')
##
## iter imp variable
## 1 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_BATTING_HBP TEAM_PITCHING_SO TEAM_FIELDING_DP
## 1 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_BATTING_HBP TEAM_PITCHING_SO TEAM_FIELDING_DP
## 1 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_BATTING_HBP TEAM_PITCHING_SO TEAM_FIELDING_DP
## 1 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_BATTING_HBP TEAM_PITCHING_SO TEAM_FIELDING_DP
## 1 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_BATTING_HBP TEAM_PITCHING_SO TEAM_FIELDING_DP
## 2 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_BATTING_HBP TEAM_PITCHING_SO TEAM_FIELDING_DP
## 2 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_BATTING_HBP TEAM_PITCHING_SO TEAM_FIELDING_DP
## 2 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_BATTING_HBP TEAM_PITCHING_SO TEAM_FIELDING_DP
## 2 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_BATTING_HBP TEAM_PITCHING_SO TEAM_FIELDING_DP
## 2 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_BATTING_HBP TEAM_PITCHING_SO TEAM_FIELDING_DP
## 3 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_BATTING_HBP TEAM_PITCHING_SO TEAM_FIELDING_DP
## 3 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_BATTING_HBP TEAM_PITCHING_SO TEAM_FIELDING_DP
## 3 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_BATTING_HBP TEAM_PITCHING_SO TEAM_FIELDING_DP
## 3 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_BATTING_HBP TEAM_PITCHING_SO TEAM_FIELDING_DP
## 3 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_BATTING_HBP TEAM_PITCHING_SO TEAM_FIELDING_DP
## 4 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_BATTING_HBP TEAM_PITCHING_SO TEAM_FIELDING_DP
## 4 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_BATTING_HBP TEAM_PITCHING_SO TEAM_FIELDING_DP
## 4 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_BATTING_HBP TEAM_PITCHING_SO TEAM_FIELDING_DP
## 4 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_BATTING_HBP TEAM_PITCHING_SO TEAM_FIELDING_DP
## 4 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_BATTING_HBP TEAM_PITCHING_SO TEAM_FIELDING_DP
## 5 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_BATTING_HBP TEAM_PITCHING_SO TEAM_FIELDING_DP
## 5 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_BATTING_HBP TEAM_PITCHING_SO TEAM_FIELDING_DP
## 5 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_BATTING_HBP TEAM_PITCHING_SO TEAM_FIELDING_DP
## 5 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_BATTING_HBP TEAM_PITCHING_SO TEAM_FIELDING_DP
## 5 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_BATTING_HBP TEAM_PITCHING_SO TEAM_FIELDING_DP
## Warning: Number of logged events: 25
imputed_med_Data <- complete(imputed_med_Data)
summary(imputed_med_Data)
## TEAM_BATTING_H TEAM_BATTING_2B TEAM_BATTING_3B TEAM_BATTING_HR
## Min. : 819 Min. : 44.0 Min. : 14.00 Min. : 0.00
## 1st Qu.:1387 1st Qu.:210.0 1st Qu.: 35.00 1st Qu.: 44.50
## Median :1455 Median :239.0 Median : 52.00 Median :101.00
## Mean :1469 Mean :241.3 Mean : 55.91 Mean : 95.63
## 3rd Qu.:1548 3rd Qu.:278.5 3rd Qu.: 72.00 3rd Qu.:135.50
## Max. :2170 Max. :376.0 Max. :155.00 Max. :242.00
## TEAM_BATTING_BB TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_BASERUN_CS
## Min. : 15.0 Min. : 0.0 Min. : 0.0 Min. : 0.00
## 1st Qu.:436.5 1st Qu.: 547.5 1st Qu.: 59.0 1st Qu.: 41.00
## Median :509.0 Median : 680.0 Median : 92.0 Median : 56.00
## Mean :499.0 Mean : 703.2 Mean :125.7 Mean : 65.38
## 3rd Qu.:565.5 3rd Qu.: 904.5 3rd Qu.:155.5 3rd Qu.: 76.00
## Max. :792.0 Max. :1268.0 Max. :580.0 Max. :154.00
## TEAM_BATTING_HBP TEAM_PITCHING_HR TEAM_PITCHING_SO TEAM_FIELDING_E
## Min. :42.00 Min. : 0.0 Min. : 0.0 Min. : 73.0
## 1st Qu.:49.00 1st Qu.: 52.0 1st Qu.: 616.0 1st Qu.: 131.0
## Median :62.00 Median :104.0 Median : 758.0 Median : 163.0
## Mean :60.88 Mean :102.1 Mean : 803.4 Mean : 249.7
## 3rd Qu.:66.00 3rd Qu.:142.5 3rd Qu.: 945.5 3rd Qu.: 252.0
## Max. :96.00 Max. :336.0 Max. :9963.0 Max. :1568.0
## TEAM_FIELDING_DP
## Min. : 69.0
## 1st Qu.:123.0
## Median :146.0
## Mean :139.6
## 3rd Qu.:160.5
## Max. :204.0
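A Box-Cox transformation, centering and scaling are applied to the evaluation data with caret. Note that preProcess() is re-estimated here on the evaluation data itself. The Box-Cox lambdas, means and standard deviations therefore differ from those estimated on the training data and used to fit model3. Fitting preProcess() on the training predictors and applying that object to both datasets with predict() would keep them on the same scale.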
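# Estimate Box-Cox, centering and scaling parameters (object renamed so it does not shadow base R's t())
pp_med <- preProcess(imputed_med_Data,
                     method = c("BoxCox", "center", "scale"))
# data.frame(t = ...) prefixes every column with "t." (e.g. t.TEAM_BATTING_H);
# these names must match the predictor names model3 was fitted on
med_final <- data.frame(t = predict(pp_med, imputed_med_Data))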
summary(med_final)
## t.TEAM_BATTING_H t.TEAM_BATTING_2B t.TEAM_BATTING_3B t.TEAM_BATTING_HR
## Min. :-5.07603 Min. :-3.26217 Min. :-2.64215 Min. :-1.69766
## 1st Qu.:-0.52836 1st Qu.:-0.67016 1st Qu.:-0.73771 1st Qu.:-0.90771
## Median :-0.06571 Median :-0.09057 Median : 0.08513 Median : 0.09527
## Mean : 0.00000 Mean : 0.00000 Mean : 0.00000 Mean : 0.00000
## 3rd Qu.: 0.54648 3rd Qu.: 0.74495 3rd Qu.: 0.76149 3rd Qu.: 0.70771
## Max. : 4.16429 Max. : 3.00896 Max. : 2.35514 Max. : 2.59828
## t.TEAM_BATTING_BB t.TEAM_BATTING_SO t.TEAM_BASERUN_SB t.TEAM_BASERUN_CS
## Min. :-2.859388 Min. :-2.96170 Min. :-1.3250 Min. :-1.8578
## 1st Qu.:-0.619108 1st Qu.:-0.65577 1st Qu.:-0.7029 1st Qu.:-0.6927
## Median : 0.008141 Median :-0.09772 Median :-0.3550 Median :-0.2665
## Mean : 0.000000 Mean : 0.00000 Mean : 0.0000 Mean : 0.0000
## 3rd Qu.: 0.536019 3rd Qu.: 0.84782 3rd Qu.: 0.3145 3rd Qu.: 0.3018
## Max. : 2.968415 Max. : 2.37879 Max. : 4.7902 Max. : 2.5182
## t.TEAM_BATTING_HBP t.TEAM_PITCHING_HR t.TEAM_PITCHING_SO t.TEAM_FIELDING_E
## Min. :-1.5450 Min. :-1.77169 Min. :-1.30653 Min. :-3.1354
## 1st Qu.:-0.8043 1st Qu.:-0.86977 1st Qu.:-0.30471 1st Qu.:-0.7317
## Median : 0.2418 Median : 0.03214 Median :-0.07377 Median :-0.1378
## Mean : 0.0000 Mean : 0.00000 Mean : 0.00000 Mean : 0.0000
## 3rd Qu.: 0.5036 3rd Qu.: 0.69991 3rd Qu.: 0.23117 3rd Qu.: 0.7208
## Max. : 1.9426 Max. : 4.05609 Max. :14.89661 Max. : 2.0409
## t.TEAM_FIELDING_DP
## Min. :-2.0185
## 1st Qu.:-0.6173
## Median : 0.1403
## Mean : 0.0000
## 3rd Qu.: 0.6638
## Max. : 2.4358
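The "BoxCox" step estimates a power parameter lambda for each predictor and replaces a value y with (y^lambda - 1) / lambda, or log(y) when lambda is 0. According to the caret documentation, BoxCoxTrans() applies no transformation to a variable that contains zero or negative values. This is consistent with the summary above: TEAM_PITCHING_SO has a raw minimum of 0 and a raw maximum of 9963, and its maximum is still about 14.9 standard deviations above the mean after preprocessing. The base-R sketch below estimates lambda for one strictly positive vector by maximizing the Box-Cox profile log-likelihood over a grid, then centers and scales the result. It illustrates the idea rather than reproducing caret's implementation. The grid seq(-2, 2, by = 0.1), the helper names and the toy data are all assumptions.

```r
# Box-Cox transform of a strictly positive vector y for a given lambda
box_cox <- function(y, lambda) {
  stopifnot(all(y > 0))  # Box-Cox is undefined for zero or negative values
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Profile log-likelihood of lambda for an intercept-only normal model,
# including the Jacobian term (lambda - 1) * sum(log(y))
box_cox_loglik <- function(y, lambda) {
  z <- box_cox(y, lambda)
  n <- length(y)
  sigma2 <- mean((z - mean(z))^2)  # maximum-likelihood estimate of the variance
  -n / 2 * log(sigma2) + (lambda - 1) * sum(log(y))
}

# Choose lambda by grid search (the grid itself is an assumption)
estimate_lambda <- function(y, grid = seq(-2, 2, by = 0.1)) {
  ll <- sapply(grid, function(l) box_cox_loglik(y, l))
  grid[which.max(ll)]
}

# Toy right-skewed data: exact log-normal quantiles, so log(y) is symmetric
y <- exp(qnorm(ppoints(200)))
lambda_hat <- estimate_lambda(y)  # a log transform (lambda = 0) is expected here

# After the power transform, center and scale as preProcess() does
y_std <- as.numeric(scale(box_cox(y, lambda_hat)))
```

For predictors that contain zeros, caret's "YeoJohnson" method is an alternative, since it accepts zero and negative values.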
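Predictions for the evaluation data, with prediction intervals at predict()'s default 95% level, were then generated from model3.
eval_data <- predict(model3, newdata = med_final, interval = "prediction")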
eval_data
## fit lwr upr
## 1 -1.177140396 -2.787716925 0.4334361
## 2 -0.963353683 -2.572587925 0.6458806
## 3 -0.586983418 -2.195572375 1.0216055
## 4 0.309017739 -1.299971016 1.9180065
## 5 -1.259411226 -2.871228540 0.3524061
## 6 -1.117093899 -2.729238742 0.4950509
## 7 0.282700634 -1.329558889 1.8949602
## 8 -0.686586839 -2.295842179 0.9226685
## 9 -0.665843181 -2.275451035 0.9437647
## 10 -0.552761146 -2.161130934 1.0556086
## 11 -0.890441599 -2.500564128 0.7196809
## 12 0.001791503 -1.608829560 1.6124126
## 13 0.104691626 -1.507298163 1.7166814
## 14 0.021702030 -1.589101919 1.6325060
## 15 0.329250565 -1.282804258 1.9413054
## 16 -0.462461014 -2.072085673 1.1471636
## 17 -0.718774044 -2.328397415 0.8908493
## 18 -0.099792899 -1.708515611 1.5089298
## 19 -0.814953225 -2.424650005 0.7947436
## 20 0.381683848 -1.228213690 1.9915814
## 21 0.309425488 -1.300235069 1.9190860
## 22 0.188867743 -1.420686780 1.7984223
## 23 0.077345751 -1.532040187 1.6867317
## 24 -0.696221789 -2.305580052 0.9131365
## 25 0.171145958 -1.438806619 1.7810985
## 26 0.492086357 -1.118955041 2.1031278
## 27 -0.514213731 -2.135292267 1.1068648
## 28 -0.517045324 -2.125588023 1.0914974
## 29 0.274717914 -1.335718613 1.8851544
## 30 -0.548281111 -2.158751660 1.0621894
## 31 0.671105399 -0.939271800 2.2814826
## 32 0.348285599 -1.260462101 1.9570333
## 33 0.316629498 -1.293237077 1.9264961
## 34 0.178055604 -1.434105137 1.7902163
## 35 0.047858018 -1.561123732 1.6568398
## 36 0.103460331 -1.508246305 1.7151670
## 37 -0.247061457 -1.854931792 1.3608089
## 38 0.484354453 -1.127039327 2.0957482
## 39 0.060001226 -1.549254672 1.6692571
## 40 0.460657733 -1.149498491 2.0708140
## 41 0.155280654 -1.454950617 1.7655119
## 42 1.297882313 -0.314513187 2.9102778
## 43 -0.979385020 -2.609705671 0.6509356
## 44 1.646054181 0.024765327 3.2673430
## 45 0.710634766 -0.901974261 2.3232438
## 46 0.955297308 -0.655535442 2.5661301
## 47 1.045508871 -0.565093509 2.6561113
## 48 -0.444398976 -2.053152599 1.1643546
## 49 -0.808937177 -2.417919870 0.8000455
## 50 -0.133546557 -1.742015949 1.4749228
## 51 -0.322333339 -1.931113108 1.2864464
## 52 0.212220766 -1.397257209 1.8216987
## 53 -0.426115526 -2.035925300 1.1836942
## 54 -0.253441757 -1.863290955 1.3564074
## 55 -0.604624161 -2.213213067 1.0039647
## 56 0.090457054 -1.519033219 1.6999473
## 57 0.656311747 -0.953987098 2.2666106
## 58 -0.403350482 -2.012623542 1.2059226
## 59 -1.122246309 -2.732811211 0.4883186
## 60 -0.231997314 -1.840549069 1.3765544
## 61 0.418913237 -1.189960480 2.0277870
## 62 0.031357325 -1.582333393 1.6450480
## 63 0.405435435 -1.203298433 2.0141693
## 64 0.327576579 -1.284215041 1.9393682
## 65 0.415589893 -1.196422337 2.0276021
## 66 1.320973440 -0.293240888 2.9351878
## 67 -0.614880420 -2.224209419 0.9944486
## 68 -0.335744973 -1.945441765 1.2739518
## 69 -0.170782297 -1.780348817 1.4387842
## 70 0.399962655 -1.210836051 2.0107614
## 71 0.272586847 -1.338738303 1.8839120
## 72 -0.358323178 -1.971318412 1.2546721
## 73 -0.182076917 -1.793264555 1.4291107
## 74 0.538356980 -1.074334350 2.1510483
## 75 -0.252774515 -1.863513514 1.3579645
## 76 -0.229290804 -1.840035800 1.3814542
## 77 0.418574549 -1.190410699 2.0275598
## 78 0.073923190 -1.534940978 1.6827874
## 79 -0.817062575 -2.427026265 0.7929011
## 80 -0.428209813 -2.037129312 1.1807097
## 81 0.193417431 -1.415603144 1.8024380
## 82 0.326650060 -1.282298910 1.9355990
## 83 0.793602131 -0.816363779 2.4035680
## 84 -0.465613163 -2.076297598 1.1450713
## 85 0.223127619 -1.386448562 1.8327038
## 86 -0.179389140 -1.790297038 1.4315188
## 87 0.165804568 -1.444378938 1.7759881
## 88 0.314418334 -1.293623223 1.9224599
## 89 0.773212880 -0.837675425 2.3841012
## 90 0.758173840 -0.851216979 2.3675647
## 91 0.152002102 -1.457640270 1.7616445
## 92 1.149628077 -0.464768753 2.7640249
## 93 -0.499711615 -2.108532842 1.1091096
## 94 -0.142278516 -1.752598122 1.4680411
## 95 -0.034841334 -1.644534479 1.5748518
## 96 -0.151275035 -1.761333196 1.4587831
## 97 0.562853158 -1.049327170 2.1750335
## 98 1.085943426 -0.525837917 2.6977248
## 99 0.411579392 -1.198806999 2.0219658
## 100 0.370550647 -1.240431273 1.9815326
## 101 -0.092845963 -1.702231637 1.5165397
## 102 -0.489913039 -2.099000942 1.1191749
## 103 0.252846609 -1.355290011 1.8609832
## 104 0.246546680 -1.362853099 1.8559465
## 105 -0.470712420 -2.082426513 1.1410017
## 106 -0.965755837 -2.577279369 0.6457677
## 107 -1.753166833 -3.370600280 -0.1357334
## 108 -0.035792189 -1.645911629 1.5743273
## 109 0.751445148 -0.857943245 2.3608335
## 110 -1.214810906 -2.827387859 0.3977660
## 111 0.360137376 -1.248459472 1.9687342
## 112 0.436644888 -1.172563603 2.0458534
## 113 0.773614685 -0.835023374 2.3822527
## 114 0.726968512 -0.882252819 2.3361898
## 115 0.077263349 -1.531845339 1.6863720
## 116 0.058665324 -1.550168457 1.6674991
## 117 0.298023643 -1.312195989 1.9082433
## 118 0.117168945 -1.490973026 1.7253109
## 119 -0.410112167 -2.019467574 1.1992432
## 120 0.052403104 -1.558562827 1.6633690
## 121 0.916586147 -0.693887001 2.5270593
## 122 -0.891471378 -2.501113101 0.7181703
## 123 -0.809738792 -2.419370549 0.7998930
## 124 -1.115469594 -2.728085434 0.4971462
## 125 -0.819611796 -2.429489413 0.7902658
## 126 0.200686864 -1.408772386 1.8101461
## 127 0.366884489 -1.242903832 1.9766728
## 128 -0.357195337 -1.965895937 1.2515053
## 129 0.616651821 -0.992595090 2.2258987
## 130 0.442171535 -1.167510507 2.0518536
## 131 0.189508287 -1.419410786 1.7984274
## 132 0.122622941 -1.487278496 1.7325244
## 133 -0.670464445 -2.284402381 0.9434735
## 134 -0.051829470 -1.661700319 1.5580414
## 135 1.251555321 -0.363439307 2.8665499
## 136 -0.537871078 -2.148432996 1.0726908
## 137 -0.234652313 -1.843808956 1.3745043
## 138 -0.210162930 -1.818550982 1.3982251
## 139 1.118369196 -0.499243466 2.7359819
## 140 -0.056571120 -1.665454853 1.5523126
## 141 -1.218533408 -2.829497732 0.3924309
## 142 -0.602011440 -2.211512766 1.0074899
## 143 0.591482493 -1.018361358 2.2013263
## 144 -0.574574759 -2.184072338 1.0349228
## 145 -0.200774051 -1.810539558 1.4089915
## 146 -0.399346035 -2.007723791 1.2090317
## 147 -0.410197417 -2.019338170 1.1989433
## 148 0.033717914 -1.574938863 1.6423747
## 149 -0.104321800 -1.714360929 1.5057173
## 150 0.347238341 -1.261348405 1.9558251
## 151 0.141183061 -1.468574670 1.7509408
## 152 0.474720816 -1.137287737 2.0867294
## 153 -1.087715368 -2.709986382 0.5345556
## 154 -1.034980723 -2.644905111 0.5749437
## 155 -0.022944457 -1.632415080 1.5865262
## 156 -0.944985941 -2.555317075 0.6653452
## 157 0.843116966 -0.767338146 2.4535721
## 158 -0.473950080 -2.083689049 1.1357889
## 159 0.527659340 -1.081784159 2.1371028
## 160 -0.591476916 -2.200201209 1.0172474
## 161 1.176545789 -0.436669128 2.7897607
## 162 1.606952817 -0.006816591 3.2207222
## 163 0.941173121 -0.669113092 2.5514593
## 164 1.345714266 -0.268084353 2.9595129
## 165 1.056081246 -0.557588121 2.6697506
## 166 0.926255084 -0.685752036 2.5382622
## 167 0.166446150 -1.443539381 1.7764317
## 168 0.188943775 -1.421737875 1.7996254
## 169 -0.680445232 -2.290604989 0.9297145
## 170 -0.007138755 -1.616920496 1.6026430
## 171 0.646374585 -0.963398360 2.2561475
## 172 0.457583919 -1.151353667 2.0665215
## 173 0.101531190 -1.507167874 1.7102303
## 174 0.763812274 -0.845841506 2.3734661
## 175 0.005635363 -1.602910211 1.6141809
## 176 -0.165334709 -1.775004818 1.4443354
## 177 0.082323297 -1.528344565 1.6929912
## 178 -0.843763638 -2.453928950 0.7664017
## 179 -0.319640124 -1.927673727 1.2883935
## 180 -0.170837370 -1.779465478 1.4377907
## 181 0.466575623 -1.146664722 2.0798160
## 182 0.314312757 -1.296046600 1.9246721
## 183 0.461008375 -1.148646933 2.0706637
## 184 0.542140972 -1.067143273 2.1514252
## 185 0.941899765 -0.671227394 2.5550269
## 186 1.186247881 -0.431009363 2.8035051
## 187 0.607245923 -1.005363216 2.2198551
## 188 -0.541637014 -2.151839207 1.0685652
## 189 -1.066351102 -2.676862461 0.5441603
## 190 1.766531022 0.151120581 3.3819415
## 191 -0.726415398 -2.335335212 0.8825044
## 192 -0.018756713 -1.627175394 1.5896620
## 193 -0.552751766 -2.161448295 1.0559448
## 194 -0.442551402 -2.051443948 1.1663411
## 195 -0.395237506 -2.005455223 1.2149802
## 196 -1.076883271 -2.687389505 0.5336230
## 197 -0.429738568 -2.038193902 1.1787168
## 198 0.751315888 -0.860309544 2.3629413
## 199 0.056918275 -1.551949331 1.6657859
## 200 0.298105838 -1.310937951 1.9071496
## 201 -0.547638779 -2.158629651 1.0633521
## 202 0.158855539 -1.450640253 1.7683513
## 203 -0.068795233 -1.680271933 1.5426815
## 204 0.640613298 -0.968406945 2.2496335
## 205 0.080973447 -1.528209043 1.6901559
## 206 0.204124088 -1.404776241 1.8130244
## 207 0.120828050 -1.489004642 1.7306607
## 208 0.173157322 -1.436519027 1.7828337
## 209 -0.033113340 -1.642307951 1.5760813
## 210 -0.263859065 -1.873198644 1.3454805
## 211 1.377226052 -0.234240950 2.9886931
## 212 0.444869915 -1.164482055 2.0542219
## 213 0.030093799 -1.579463786 1.6396514
## 214 -1.181721499 -2.791421511 0.4279785
## 215 -0.769065054 -2.379487518 0.8413574
## 216 0.163278371 -1.445555659 1.7721124
## 217 -0.178726302 -1.790439904 1.4329873
## 218 0.688285343 -0.921359173 2.2979299
## 219 -0.168094132 -1.776655134 1.4404669
## 220 0.092013977 -1.516588405 1.7006164
## 221 -0.341643676 -1.950926930 1.2676396
## 222 -0.546695700 -2.157056114 1.0636647
## 223 -0.061640886 -1.670593958 1.5473122
## 224 -0.271478973 -1.883158035 1.3402001
## 225 0.476400400 -1.146127510 2.0989283
## 226 -0.188076019 -1.796572657 1.4204206
## 227 -0.132747149 -1.741483532 1.4759892
## 228 -0.170493490 -1.780523884 1.4395369
## 229 0.429262037 -1.179540032 2.0380641
## 230 -0.299101333 -1.909788411 1.3115857
## 231 -0.044086522 -1.654602784 1.5664297
## 232 0.591433703 -1.017821396 2.2006888
## 233 0.022628133 -1.587752166 1.6330084
## 234 0.268837432 -1.341432066 1.8791069
## 235 -0.214618185 -1.823079644 1.3938433
## 236 -0.358581829 -1.966819095 1.2496554
## 237 -0.302127785 -1.912958051 1.3087025
## 238 0.116807332 -1.493358378 1.7269730
## 239 0.616272080 -0.993995710 2.2265399
## 240 -0.671653290 -2.280558726 0.9372521
## 241 0.319586749 -1.289026358 1.9281999
## 242 0.744886801 -0.865396164 2.3551698
## 243 0.273774135 -1.335446588 1.8829949
## 244 0.198673386 -1.410936435 1.8082832
## 245 -1.496888621 -3.109978576 0.1162013
## 246 0.129267911 -1.480883051 1.7394189
## 247 -0.170027223 -1.778589057 1.4385346
## 248 0.206228386 -1.402847651 1.8153044
## 249 -0.342303670 -1.951267738 1.2666604
## 250 0.434352108 -1.177795470 2.0464997
## 251 0.201227008 -1.408309325 1.8107633
## 252 -1.222878849 -2.836247196 0.3904895
## 253 1.038441951 -0.573422571 2.6503065
## 254 -3.113028304 -4.742077638 -1.4839790
## 255 -0.795865995 -2.404927559 0.8131956
## 256 -0.359272597 -1.970575489 1.2520303
## 257 0.208802185 -1.400816972 1.8184213
## 258 0.081447262 -1.527468419 1.6903629
## 259 -0.298186692 -1.907859199 1.3114858
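The fit, lwr and upr values above range from roughly -4.7 to 3.4, so they are not win totals; they appear to be on the centered and scaled scale of the response. If TARGET_WINS was centered and scaled (but not Box-Cox transformed) before model3 was fitted, each column can be mapped back to wins with the training mean and standard deviation of TARGET_WINS. The sketch below assumes exactly that; train_wins and the prediction values are hypothetical placeholders, and an additional inverse Box-Cox step would be needed if the response was also Box-Cox transformed.

```r
# Hypothetical training response (wins); substitute the real training TARGET_WINS
train_wins <- c(70, 78, 81, 85, 91)
mu_y <- mean(train_wins)
sd_y <- sd(train_wins)

# Hypothetical standardized predictions shaped like predict(..., interval = "prediction")
pred_std <- data.frame(fit = c(-1, 0, 1),
                       lwr = c(-2.6, -1.6, -0.6),
                       upr = c(0.6, 1.6, 2.6))

# Undo centering and scaling: wins = standardized value * sd + mean.
# The map is linear with a positive slope, so interval endpoints stay ordered.
to_wins <- function(std_pred, mu, sigma) std_pred * sigma + mu
pred_wins <- to_wins(pred_std, mu_y, sd_y)
pred_wins
```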
summary(eval_data)
## fit lwr upr
## Min. :-3.11303 Min. :-4.7421 Min. :-1.484
## 1st Qu.:-0.41015 1st Qu.:-2.0194 1st Qu.: 1.199
## Median : 0.05692 Median :-1.5519 Median : 1.666
## Mean : 0.00000 Mean :-1.6105 Mean : 1.611
## 3rd Qu.: 0.39082 3rd Qu.:-1.2195 3rd Qu.: 2.001
## Max. : 1.76653 Max. : 0.1511 Max. : 3.382
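Because preProcess() was estimated on the evaluation data itself, every transformed evaluation predictor has a mean of exactly 0. If model3 is linear in those columns, the mean of fit then reduces to the model's intercept, which is consistent with the Mean of 0.00000 in the summary. A common alternative is to estimate the transformation on the training predictors and reuse it, e.g. pp <- preProcess(train_x, method = c("BoxCox", "center", "scale")) followed by predict(pp, eval_x), where train_x and eval_x are hypothetical names. The base-R sketch below shows the same train-then-apply idea for centering and scaling only, using toy data.

```r
# Toy stand-ins for the training and evaluation predictors (hypothetical values)
train_x <- data.frame(hits = c(10, 12, 14, 16, 18))
eval_x  <- data.frame(hits = c(11, 20))

# Estimate centering and scaling parameters on the training data only
mu    <- vapply(train_x, mean, numeric(1))
sigma <- vapply(train_x, sd, numeric(1))

# Apply the training parameters to any data set with the same columns
standardize <- function(df, mu, sigma) {
  as.data.frame(scale(df, center = mu, scale = sigma))
}
train_std <- standardize(train_x, mu, sigma)
eval_std  <- standardize(eval_x, mu, sigma)

colMeans(train_std)  # 0 by construction
colMeans(eval_std)   # generally not 0: evaluation rows keep their offset from the training mean
```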