#HW #1 Assignment - Moneyball Model

Overview: In this homework assignment, you will explore, analyze, and model a data set containing approximately 2200 records. Each record represents a professional baseball team from the years 1871 to 2006 inclusive. Each record has the performance of the team for the given year, with all of the statistics adjusted to match the performance of a 162-game season.

Your objective is to build a multiple linear regression model on the training data to predict the number of wins for the team. You can only use the variables given to you (or variables that you derive from the variables provided). Below is a short description of the variables of interest in the data set:

#install.packages('caret')
#install.packages('e1071', dependencies=TRUE)
library(knitr)
library(stringr)
library(tidyr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(psych)
## 
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha
library(reshape)
## Warning: package 'reshape' was built under R version 4.0.5
## 
## Attaching package: 'reshape'
## The following object is masked from 'package:dplyr':
## 
##     rename
## The following objects are masked from 'package:tidyr':
## 
##     expand, smiths
library(corrgram)
## Warning: package 'corrgram' was built under R version 4.0.5
library(mice)
## Warning: package 'mice' was built under R version 4.0.5
## 
## Attaching package: 'mice'
## The following object is masked from 'package:stats':
## 
##     filter
## The following objects are masked from 'package:base':
## 
##     cbind, rbind
library(caret)
## Loading required package: lattice
## 
## Attaching package: 'lattice'
## The following object is masked from 'package:corrgram':
## 
##     panel.fill
library(e1071)
## Warning: package 'e1071' was built under R version 4.0.4
#install.packages('Rcpp')
library(Rcpp)
## Warning: package 'Rcpp' was built under R version 4.0.5

#DATA EXPLORATION:

Load the data and explore it with summary statistics and plots.

mtd <- read.csv("https://raw.githubusercontent.com/yathdeep/msds-data621/main/moneyball-training-data.csv")
count(mtd)
##      n
## 1 2276
names(mtd)
##  [1] "INDEX"            "TARGET_WINS"      "TEAM_BATTING_H"   "TEAM_BATTING_2B" 
##  [5] "TEAM_BATTING_3B"  "TEAM_BATTING_HR"  "TEAM_BATTING_BB"  "TEAM_BATTING_SO" 
##  [9] "TEAM_BASERUN_SB"  "TEAM_BASERUN_CS"  "TEAM_BATTING_HBP" "TEAM_PITCHING_H" 
## [13] "TEAM_PITCHING_HR" "TEAM_PITCHING_BB" "TEAM_PITCHING_SO" "TEAM_FIELDING_E" 
## [17] "TEAM_FIELDING_DP"
summary(mtd)
##      INDEX         TARGET_WINS     TEAM_BATTING_H TEAM_BATTING_2B
##  Min.   :   1.0   Min.   :  0.00   Min.   : 891   Min.   : 69.0  
##  1st Qu.: 630.8   1st Qu.: 71.00   1st Qu.:1383   1st Qu.:208.0  
##  Median :1270.5   Median : 82.00   Median :1454   Median :238.0  
##  Mean   :1268.5   Mean   : 80.79   Mean   :1469   Mean   :241.2  
##  3rd Qu.:1915.5   3rd Qu.: 92.00   3rd Qu.:1537   3rd Qu.:273.0  
##  Max.   :2535.0   Max.   :146.00   Max.   :2554   Max.   :458.0  
##                                                                  
##  TEAM_BATTING_3B  TEAM_BATTING_HR  TEAM_BATTING_BB TEAM_BATTING_SO 
##  Min.   :  0.00   Min.   :  0.00   Min.   :  0.0   Min.   :   0.0  
##  1st Qu.: 34.00   1st Qu.: 42.00   1st Qu.:451.0   1st Qu.: 548.0  
##  Median : 47.00   Median :102.00   Median :512.0   Median : 750.0  
##  Mean   : 55.25   Mean   : 99.61   Mean   :501.6   Mean   : 735.6  
##  3rd Qu.: 72.00   3rd Qu.:147.00   3rd Qu.:580.0   3rd Qu.: 930.0  
##  Max.   :223.00   Max.   :264.00   Max.   :878.0   Max.   :1399.0  
##                                                    NA's   :102     
##  TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_BATTING_HBP TEAM_PITCHING_H
##  Min.   :  0.0   Min.   :  0.0   Min.   :29.00    Min.   : 1137  
##  1st Qu.: 66.0   1st Qu.: 38.0   1st Qu.:50.50    1st Qu.: 1419  
##  Median :101.0   Median : 49.0   Median :58.00    Median : 1518  
##  Mean   :124.8   Mean   : 52.8   Mean   :59.36    Mean   : 1779  
##  3rd Qu.:156.0   3rd Qu.: 62.0   3rd Qu.:67.00    3rd Qu.: 1682  
##  Max.   :697.0   Max.   :201.0   Max.   :95.00    Max.   :30132  
##  NA's   :131     NA's   :772     NA's   :2085                    
##  TEAM_PITCHING_HR TEAM_PITCHING_BB TEAM_PITCHING_SO  TEAM_FIELDING_E 
##  Min.   :  0.0    Min.   :   0.0   Min.   :    0.0   Min.   :  65.0  
##  1st Qu.: 50.0    1st Qu.: 476.0   1st Qu.:  615.0   1st Qu.: 127.0  
##  Median :107.0    Median : 536.5   Median :  813.5   Median : 159.0  
##  Mean   :105.7    Mean   : 553.0   Mean   :  817.7   Mean   : 246.5  
##  3rd Qu.:150.0    3rd Qu.: 611.0   3rd Qu.:  968.0   3rd Qu.: 249.2  
##  Max.   :343.0    Max.   :3645.0   Max.   :19278.0   Max.   :1898.0  
##                                    NA's   :102                       
##  TEAM_FIELDING_DP
##  Min.   : 52.0   
##  1st Qu.:131.0   
##  Median :149.0   
##  Mean   :146.4   
##  3rd Qu.:164.0   
##  Max.   :228.0   
##  NA's   :286

The dataset consists of 17 variables and 2276 records. Six variables have missing (NA) values. TEAM_BATTING_HBP has the most, with 2085 of 2276 values missing, followed by TEAM_BASERUN_CS (772) and TEAM_FIELDING_DP (286).
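
The missing-value counts read off the summary() output can also be tabulated directly. The following is a minimal sketch on a small made-up data frame, `toy`, which stands in for `mtd`; the column names and values are hypothetical.

```r
# Hypothetical toy data standing in for mtd; values are made up.
toy <- data.frame(
  wins = c(80, 75, 90, 60),
  so   = c(700, NA, 850, NA),
  hbp  = c(NA, NA, NA, 55)
)

# Count and share of missing values per column, largest first.
na_count <- colSums(is.na(toy))
na_share <- round(na_count / nrow(toy), 2)
na_table <- data.frame(na_count, na_share)[order(-na_count), ]
print(na_table)
```

Applied to the report's data, the same idea would be `colSums(is.na(mtd))`.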

Checking for outliers:

ggplot(stack(mtd), aes(x = ind, y = values)) + 
  geom_boxplot() +
  coord_cartesian(ylim = c(0, 1000)) +
  theme(legend.position="none") +
  theme(axis.text.x=element_text(angle=45, hjust=1)) + 
  theme(panel.background = element_rect(fill = 'grey'))
## Warning: Removed 3478 rows containing non-finite values (stat_boxplot).

Checking for skewness in the data

mtd1 = melt(mtd)
## Using  as id variables
ggplot(mtd1, aes(x= value)) + 
    geom_density(fill='red') + facet_wrap(~variable, scales = 'free') 
## Warning: Removed 3478 rows containing non-finite values (stat_density).

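The density plots show that several variables are skewed, and the box plots show outliers. For example, TEAM_PITCHING_H has a median of 1518 but a maximum of 30132, and TEAM_PITCHING_SO has a median of 813.5 but a maximum of 19278.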
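
Skewness and outliers can also be quantified rather than read off the plots. The sketch below is an illustration only: `skew()` and `n_outliers()` are hypothetical helper names, and the vector `x` is made up. It uses the moment-based skewness and the 1.5 * IQR whisker rule that box plots are based on.

```r
# Moment-based skewness: third central moment divided by the
# second central moment raised to the power 1.5.
# Positive values indicate a long right tail.
skew <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / (mean((x - m)^2))^1.5
}

# Number of points outside the box plot whiskers (1.5 * IQR rule).
n_outliers <- function(x) length(boxplot.stats(x)$out)

# Hypothetical right-skewed example: one very large value.
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 40)
skew(x)
n_outliers(x)
```

For the report's data, `sapply(mtd, skew)` and `sapply(mtd, n_outliers)` would give one number per variable.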

Finding correlations:

mtd2 <- mtd[,-1 ]
names(mtd2)
##  [1] "TARGET_WINS"      "TEAM_BATTING_H"   "TEAM_BATTING_2B"  "TEAM_BATTING_3B" 
##  [5] "TEAM_BATTING_HR"  "TEAM_BATTING_BB"  "TEAM_BATTING_SO"  "TEAM_BASERUN_SB" 
##  [9] "TEAM_BASERUN_CS"  "TEAM_BATTING_HBP" "TEAM_PITCHING_H"  "TEAM_PITCHING_HR"
## [13] "TEAM_PITCHING_BB" "TEAM_PITCHING_SO" "TEAM_FIELDING_E"  "TEAM_FIELDING_DP"
cor(drop_na(mtd2))
##                  TARGET_WINS TEAM_BATTING_H TEAM_BATTING_2B TEAM_BATTING_3B
## TARGET_WINS       1.00000000     0.46994665      0.31298400     -0.12434586
## TEAM_BATTING_H    0.46994665     1.00000000      0.56177286      0.21391883
## TEAM_BATTING_2B   0.31298400     0.56177286      1.00000000      0.04203441
## TEAM_BATTING_3B  -0.12434586     0.21391883      0.04203441      1.00000000
## TEAM_BATTING_HR   0.42241683     0.39627593      0.25099045     -0.21879927
## TEAM_BATTING_BB   0.46868793     0.19735234      0.19749256     -0.20584392
## TEAM_BATTING_SO  -0.22889273    -0.34174328     -0.06415123     -0.19291841
## TEAM_BASERUN_SB   0.01483639     0.07167495     -0.18768279      0.16946086
## TEAM_BASERUN_CS  -0.17875598    -0.09377545     -0.20413884      0.23213978
## TEAM_BATTING_HBP  0.07350424    -0.02911218      0.04608475     -0.17424715
## TEAM_PITCHING_H   0.47123431     0.99919269      0.56045355      0.21250322
## TEAM_PITCHING_HR  0.42246683     0.39495630      0.24999875     -0.21973263
## TEAM_PITCHING_BB  0.46839882     0.19529071      0.19592157     -0.20675383
## TEAM_PITCHING_SO -0.22936481    -0.34445001     -0.06616615     -0.19386654
## TEAM_FIELDING_E  -0.38668800    -0.25381638     -0.19427027     -0.06513145
## TEAM_FIELDING_DP -0.19586601     0.01776946     -0.02488808      0.13314758
##                  TEAM_BATTING_HR TEAM_BATTING_BB TEAM_BATTING_SO
## TARGET_WINS           0.42241683      0.46868793     -0.22889273
## TEAM_BATTING_H        0.39627593      0.19735234     -0.34174328
## TEAM_BATTING_2B       0.25099045      0.19749256     -0.06415123
## TEAM_BATTING_3B      -0.21879927     -0.20584392     -0.19291841
## TEAM_BATTING_HR       1.00000000      0.45638161      0.21045444
## TEAM_BATTING_BB       0.45638161      1.00000000      0.21833871
## TEAM_BATTING_SO       0.21045444      0.21833871      1.00000000
## TEAM_BASERUN_SB      -0.19021893     -0.08806123     -0.07475974
## TEAM_BASERUN_CS      -0.27579838     -0.20878051     -0.05613035
## TEAM_BATTING_HBP      0.10618116      0.04746007      0.22094219
## TEAM_PITCHING_H       0.39549390      0.19848687     -0.34145321
## TEAM_PITCHING_HR      0.99993259      0.45659283      0.21111617
## TEAM_PITCHING_BB      0.45542468      0.99988140      0.21895783
## TEAM_PITCHING_SO      0.20829574      0.21793253      0.99976835
## TEAM_FIELDING_E       0.01567397     -0.07847126      0.30814540
## TEAM_FIELDING_DP     -0.06182222     -0.07929078     -0.12319072
##                  TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_BATTING_HBP
## TARGET_WINS           0.01483639    -0.178755979       0.07350424
## TEAM_BATTING_H        0.07167495    -0.093775445      -0.02911218
## TEAM_BATTING_2B      -0.18768279    -0.204138837       0.04608475
## TEAM_BATTING_3B       0.16946086     0.232139777      -0.17424715
## TEAM_BATTING_HR      -0.19021893    -0.275798375       0.10618116
## TEAM_BATTING_BB      -0.08806123    -0.208780510       0.04746007
## TEAM_BATTING_SO      -0.07475974    -0.056130355       0.22094219
## TEAM_BASERUN_SB       1.00000000     0.624737808      -0.06400498
## TEAM_BASERUN_CS       0.62473781     1.000000000      -0.07051390
## TEAM_BATTING_HBP     -0.06400498    -0.070513896       1.00000000
## TEAM_PITCHING_H       0.07395373    -0.092977893      -0.02769699
## TEAM_PITCHING_HR     -0.18948057    -0.275471495       0.10675878
## TEAM_PITCHING_BB     -0.08741902    -0.208470154       0.04785137
## TEAM_PITCHING_SO     -0.07351325    -0.055308336       0.22157375
## TEAM_FIELDING_E       0.04292341     0.207701189       0.04178971
## TEAM_FIELDING_DP     -0.13023054    -0.006764233      -0.07120824
##                  TEAM_PITCHING_H TEAM_PITCHING_HR TEAM_PITCHING_BB
## TARGET_WINS           0.47123431       0.42246683       0.46839882
## TEAM_BATTING_H        0.99919269       0.39495630       0.19529071
## TEAM_BATTING_2B       0.56045355       0.24999875       0.19592157
## TEAM_BATTING_3B       0.21250322      -0.21973263      -0.20675383
## TEAM_BATTING_HR       0.39549390       0.99993259       0.45542468
## TEAM_BATTING_BB       0.19848687       0.45659283       0.99988140
## TEAM_BATTING_SO      -0.34145321       0.21111617       0.21895783
## TEAM_BASERUN_SB       0.07395373      -0.18948057      -0.08741902
## TEAM_BASERUN_CS      -0.09297789      -0.27547150      -0.20847015
## TEAM_BATTING_HBP     -0.02769699       0.10675878       0.04785137
## TEAM_PITCHING_H       1.00000000       0.39463199       0.19703302
## TEAM_PITCHING_HR      0.39463199       1.00000000       0.45580983
## TEAM_PITCHING_BB      0.19703302       0.45580983       1.00000000
## TEAM_PITCHING_SO     -0.34330646       0.20920115       0.21887700
## TEAM_FIELDING_E      -0.25073028       0.01689330      -0.07692315
## TEAM_FIELDING_DP      0.01416807      -0.06292475      -0.08040645
##                  TEAM_PITCHING_SO TEAM_FIELDING_E TEAM_FIELDING_DP
## TARGET_WINS           -0.22936481     -0.38668800     -0.195866006
## TEAM_BATTING_H        -0.34445001     -0.25381638      0.017769456
## TEAM_BATTING_2B       -0.06616615     -0.19427027     -0.024888081
## TEAM_BATTING_3B       -0.19386654     -0.06513145      0.133147578
## TEAM_BATTING_HR        0.20829574      0.01567397     -0.061822219
## TEAM_BATTING_BB        0.21793253     -0.07847126     -0.079290775
## TEAM_BATTING_SO        0.99976835      0.30814540     -0.123190715
## TEAM_BASERUN_SB       -0.07351325      0.04292341     -0.130230537
## TEAM_BASERUN_CS       -0.05530834      0.20770119     -0.006764233
## TEAM_BATTING_HBP       0.22157375      0.04178971     -0.071208241
## TEAM_PITCHING_H       -0.34330646     -0.25073028      0.014168073
## TEAM_PITCHING_HR       0.20920115      0.01689330     -0.062924751
## TEAM_PITCHING_BB       0.21887700     -0.07692315     -0.080406452
## TEAM_PITCHING_SO       1.00000000      0.31008407     -0.124923213
## TEAM_FIELDING_E        0.31008407      1.00000000      0.040205814
## TEAM_FIELDING_DP      -0.12492321      0.04020581      1.000000000
pairs.panels(mtd2[1:8]) 

pairs.panels(mtd2[9:16]) 

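Several variables are positively correlated with TARGET_WINS (for example TEAM_BATTING_H and TEAM_BATTING_BB, both at about 0.47), and several are negatively correlated (TEAM_FIELDING_E at -0.39). The batting and pitching counterparts are almost perfectly correlated with each other in this subset, for example TEAM_BATTING_HR and TEAM_PITCHING_HR at 0.9999. Note that drop_na() keeps only rows with no missing values. Because TEAM_BATTING_HBP is missing in 2085 of 2276 rows, this matrix is computed on at most 191 records and may not reflect the full data set.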
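
The effect of dropping every row with a missing value can be seen on a small example. This is a sketch under stated assumptions: the data frame `toy` is made up, with column `c` mostly missing in the way TEAM_BATTING_HBP is. It compares listwise deletion with the pairwise option of cor().

```r
# Hypothetical toy data: column c is mostly missing.
toy <- data.frame(
  a = c(1, 2, 3, 4, 5, 6),
  b = c(2, 4, 5, 4, 5, 7),
  c = c(NA, NA, NA, NA, 1, 3)
)

# Listwise deletion keeps only rows with no NA in any column.
n_complete <- sum(complete.cases(toy))

# Correlation of a and b on the complete rows only.
r_listwise <- cor(toy[complete.cases(toy), ])["a", "b"]

# Pairwise deletion uses every row available for each pair of columns.
r_pairwise <- cor(toy, use = "pairwise.complete.obs")["a", "b"]

c(n_complete = n_complete, listwise = r_listwise, pairwise = r_pairwise)
```

Only two rows survive listwise deletion here, so the listwise estimate for `a` and `b` rests on far less data than the pairwise one.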

#DATA PREPARATION

Removing variables. The INDEX column is only a record identifier, so it is dropped first:

mtd_f <- mtd[,-1 ]
names(mtd_f)
##  [1] "TARGET_WINS"      "TEAM_BATTING_H"   "TEAM_BATTING_2B"  "TEAM_BATTING_3B" 
##  [5] "TEAM_BATTING_HR"  "TEAM_BATTING_BB"  "TEAM_BATTING_SO"  "TEAM_BASERUN_SB" 
##  [9] "TEAM_BASERUN_CS"  "TEAM_BATTING_HBP" "TEAM_PITCHING_H"  "TEAM_PITCHING_HR"
## [13] "TEAM_PITCHING_BB" "TEAM_PITCHING_SO" "TEAM_FIELDING_E"  "TEAM_FIELDING_DP"

TEAM_BATTING_HBP is missing in 2085 of 2276 records, so the variable is removed completely.

mtd_f <- mtd_f[,-10 ]
names(mtd_f )
##  [1] "TARGET_WINS"      "TEAM_BATTING_H"   "TEAM_BATTING_2B"  "TEAM_BATTING_3B" 
##  [5] "TEAM_BATTING_HR"  "TEAM_BATTING_BB"  "TEAM_BATTING_SO"  "TEAM_BASERUN_SB" 
##  [9] "TEAM_BASERUN_CS"  "TEAM_PITCHING_H"  "TEAM_PITCHING_HR" "TEAM_PITCHING_BB"
## [13] "TEAM_PITCHING_SO" "TEAM_FIELDING_E"  "TEAM_FIELDING_DP"

TEAM_PITCHING_HR and TEAM_BATTING_HR are highly correlated (0.9999 in the correlation matrix above), so one of them is redundant. TEAM_PITCHING_HR is removed.

mtd_f <- mtd_f[,-11 ]
names(mtd_f)
##  [1] "TARGET_WINS"      "TEAM_BATTING_H"   "TEAM_BATTING_2B"  "TEAM_BATTING_3B" 
##  [5] "TEAM_BATTING_HR"  "TEAM_BATTING_BB"  "TEAM_BATTING_SO"  "TEAM_BASERUN_SB" 
##  [9] "TEAM_BASERUN_CS"  "TEAM_PITCHING_H"  "TEAM_PITCHING_BB" "TEAM_PITCHING_SO"
## [13] "TEAM_FIELDING_E"  "TEAM_FIELDING_DP"
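
Highly correlated pairs can be listed programmatically instead of being picked out of the printed matrix. The sketch below uses made-up data in which `y` is almost a copy of `x`; the 0.95 threshold is an arbitrary choice for illustration.

```r
# Hypothetical toy data: y is almost a copy of x, z is unrelated noise.
set.seed(1)
x <- rnorm(50)
toy <- data.frame(x = x, y = x + rnorm(50, sd = 0.01), z = rnorm(50))

# Absolute correlations; the upper triangle lists each pair once.
r <- abs(cor(toy))
idx <- which(r > 0.95 & upper.tri(r), arr.ind = TRUE)

# One row per highly correlated pair.
pairs_high <- data.frame(
  var1  = rownames(r)[idx[, "row"]],
  var2  = colnames(r)[idx[, "col"]],
  abs_r = r[idx]
)
print(pairs_high)
```

One variable from each listed pair is then a candidate for removal.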

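Imputing the NAs with mice, using predictive mean matching (pmm). mice() generates m = 5 imputed data sets, and complete() with its default arguments returns the first of them. No seed is set, so the imputed values differ between runs.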

imputed_mtd_Data <- mice(mtd_f, m=5, maxit = 5, method = 'pmm')
## 
##  iter imp variable
##   1   1  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   1   2  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   1   3  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   1   4  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   1   5  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   2   1  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   2   2  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   2   3  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   2   4  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   2   5  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   3   1  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   3   2  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   3   3  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   3   4  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   3   5  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   4   1  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   4   2  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   4   3  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   4   4  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   4   5  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   5   1  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   5   2  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   5   3  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   5   4  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   5   5  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_PITCHING_SO  TEAM_FIELDING_DP
imputed_mtd_Data <- complete(imputed_mtd_Data)
summary(imputed_mtd_Data)
##   TARGET_WINS     TEAM_BATTING_H TEAM_BATTING_2B TEAM_BATTING_3B 
##  Min.   :  0.00   Min.   : 891   Min.   : 69.0   Min.   :  0.00  
##  1st Qu.: 71.00   1st Qu.:1383   1st Qu.:208.0   1st Qu.: 34.00  
##  Median : 82.00   Median :1454   Median :238.0   Median : 47.00  
##  Mean   : 80.79   Mean   :1469   Mean   :241.2   Mean   : 55.25  
##  3rd Qu.: 92.00   3rd Qu.:1537   3rd Qu.:273.0   3rd Qu.: 72.00  
##  Max.   :146.00   Max.   :2554   Max.   :458.0   Max.   :223.00  
##  TEAM_BATTING_HR  TEAM_BATTING_BB TEAM_BATTING_SO  TEAM_BASERUN_SB
##  Min.   :  0.00   Min.   :  0.0   Min.   :   0.0   Min.   :  0.0  
##  1st Qu.: 42.00   1st Qu.:451.0   1st Qu.: 544.0   1st Qu.: 67.0  
##  Median :102.00   Median :512.0   Median : 735.0   Median :106.0  
##  Mean   : 99.61   Mean   :501.6   Mean   : 728.6   Mean   :135.1  
##  3rd Qu.:147.00   3rd Qu.:580.0   3rd Qu.: 925.0   3rd Qu.:170.0  
##  Max.   :264.00   Max.   :878.0   Max.   :1399.0   Max.   :697.0  
##  TEAM_BASERUN_CS  TEAM_PITCHING_H TEAM_PITCHING_BB TEAM_PITCHING_SO 
##  Min.   :  0.00   Min.   : 1137   Min.   :   0.0   Min.   :    0.0  
##  1st Qu.: 42.75   1st Qu.: 1419   1st Qu.: 476.0   1st Qu.:  613.8  
##  Median : 57.00   Median : 1518   Median : 536.5   Median :  805.0  
##  Mean   : 74.76   Mean   : 1779   Mean   : 553.0   Mean   :  812.5  
##  3rd Qu.: 89.00   3rd Qu.: 1682   3rd Qu.: 611.0   3rd Qu.:  958.0  
##  Max.   :201.00   Max.   :30132   Max.   :3645.0   Max.   :19278.0  
##  TEAM_FIELDING_E  TEAM_FIELDING_DP
##  Min.   :  65.0   Min.   : 52.0   
##  1st Qu.: 127.0   1st Qu.:125.0   
##  Median : 159.0   Median :146.0   
##  Mean   : 246.5   Mean   :142.4   
##  3rd Qu.: 249.2   3rd Qu.:162.0   
##  Max.   :1898.0   Max.   :228.0
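
To make the idea behind predictive mean matching concrete, here is a simplified base R sketch for a single variable. It is not the mice algorithm: mice also draws the regression parameters at random before matching, which this version skips. The function name `pmm_impute`, the donor count `k`, and the data are all hypothetical.

```r
# Simplified predictive mean matching for one variable y with one predictor x.
pmm_impute <- function(y, x, k = 5) {
  obs <- !is.na(y)
  fit <- lm(y ~ x, subset = obs)
  # Predicted mean for every row, observed or missing.
  pred <- predict(fit, newdata = data.frame(x = x))
  for (i in which(!obs)) {
    # Donors: the k observed rows whose predicted mean is closest to row i's.
    donors <- order(abs(pred[obs] - pred[i]))[seq_len(min(k, sum(obs)))]
    # Copy the observed value of one randomly chosen donor.
    y[i] <- y[obs][donors[sample.int(length(donors), 1)]]
  }
  y
}

# Hypothetical example with two missing values.
set.seed(42)
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(10, 12, NA, 18, 20, NA, 26, 28)
y_imp <- pmm_impute(y, x, k = 3)
y_imp
```

Every imputed value is copied from an observed record, which is why the minimum and maximum of each variable in the summary above are unchanged by the imputation.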

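A Box-Cox transformation, centering, and scaling were applied to the variables using preProcess() from the caret library. Because TARGET_WINS is part of the data frame, the response is transformed as well, so the coefficients and residual errors of the models below are on the standardized scale rather than in wins.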

t = preProcess(imputed_mtd_Data, 
                   c("BoxCox", "center", "scale"))
mtd_final = data.frame(
      t = predict(t, imputed_mtd_Data))
summary(mtd_final)
##  t.TARGET_WINS      t.TEAM_BATTING_H    t.TEAM_BATTING_2B  t.TEAM_BATTING_3B
##  Min.   :-5.12888   Min.   :-7.537074   Min.   :-4.48108   Min.   :-1.9776  
##  1st Qu.:-0.62156   1st Qu.:-0.573089   1st Qu.:-0.68949   1st Qu.:-0.7606  
##  Median : 0.07676   Median :-0.003988   Median :-0.03019   Median :-0.2953  
##  Mean   : 0.00000   Mean   : 0.000000   Mean   : 0.00000   Mean   : 0.0000  
##  3rd Qu.: 0.71159   3rd Qu.: 0.586908   3rd Qu.: 0.69827   3rd Qu.: 0.5995  
##  Max.   : 4.13970   Max.   : 4.390097   Max.   : 4.05391   Max.   : 6.0042  
##  t.TEAM_BATTING_HR  t.TEAM_BATTING_BB  t.TEAM_BATTING_SO  t.TEAM_BASERUN_SB
##  Min.   :-1.64521   Min.   :-4.08866   Min.   :-2.96021   Min.   :-1.3692  
##  1st Qu.:-0.95153   1st Qu.:-0.41215   1st Qu.:-0.75001   1st Qu.:-0.6904  
##  Median : 0.03944   Median : 0.08511   Median : 0.02599   Median :-0.2952  
##  Mean   : 0.00000   Mean   : 0.00000   Mean   : 0.00000   Mean   : 0.0000  
##  3rd Qu.: 0.78267   3rd Qu.: 0.63944   3rd Qu.: 0.79794   3rd Qu.: 0.3532  
##  Max.   : 2.71505   Max.   : 3.06871   Max.   : 2.72373   Max.   : 5.6926  
##  t.TEAM_BASERUN_CS t.TEAM_PITCHING_H t.TEAM_PITCHING_BB t.TEAM_PITCHING_SO
##  Min.   :-1.5183   Min.   :-2.8556   Min.   :-3.32422   Min.   :-1.49945  
##  1st Qu.:-0.6501   1st Qu.:-0.6710   1st Qu.:-0.46291   1st Qu.:-0.36674  
##  Median :-0.3607   Median :-0.1765   Median :-0.09923   Median :-0.01378  
##  Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.00000   Mean   : 0.00000  
##  3rd Qu.: 0.2891   3rd Qu.: 0.4602   3rd Qu.: 0.34860   3rd Qu.: 0.26859  
##  Max.   : 2.5636   Max.   : 3.2387   Max.   :18.58645   Max.   :34.07917  
##  t.TEAM_FIELDING_E t.TEAM_FIELDING_DP
##  Min.   :-3.3092   Min.   :-2.64252  
##  1st Qu.:-0.7163   1st Qu.:-0.64171  
##  Median :-0.1424   Median : 0.07556  
##  Mean   : 0.0000   Mean   : 0.00000  
##  3rd Qu.: 0.7096   3rd Qu.: 0.65825  
##  Max.   : 2.1432   Max.   : 3.36000
mtd_final1 = melt(mtd_final)
## Using  as id variables
ggplot(mtd_final1, aes(x= value)) + 
    geom_density(fill='red') + facet_wrap(~variable, scales = 'free') 
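
Because the response is standardized too, predictions from the models come out in standardized units and need to be converted back to wins. The sketch below shows centering and scaling only (it omits the Box-Cox step) on made-up data; `train`, `new`, and all values are hypothetical.

```r
# Hypothetical toy training and new data; values are made up.
train <- data.frame(wins = c(72, 78, 93, 97), hits = c(1300, 1400, 1500, 1600))
new   <- data.frame(hits = c(1450, 1550))

# Estimate centering and scaling parameters on the training data only.
mu  <- sapply(train, mean)
sdv <- sapply(train, sd)

# Standardize the training data, then reuse the same parameters for new data.
train_z <- as.data.frame(scale(train, center = mu, scale = sdv))
new_z   <- data.frame(hits = (new$hits - mu["hits"]) / sdv["hits"])

# A model fit on standardized data predicts standardized wins.
fit    <- lm(wins ~ hits, data = train_z)
pred_z <- predict(fit, newdata = new_z)

# Multiply by the training sd and add the training mean to get wins back.
pred_wins <- pred_z * sdv["wins"] + mu["wins"]
pred_wins
```

The key design point is that the evaluation data is transformed with the parameters learned from the training data, not with its own mean and standard deviation.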

#BUILD MODELS:

Model1:

With all variables:

model1 <- lm(t.TARGET_WINS ~., mtd_final)
summary(model1)
## 
## Call:
## lm(formula = t.TARGET_WINS ~ ., data = mtd_final)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.8120 -0.5138  0.0011  0.5169  3.6552 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         1.338e-11  1.702e-02   0.000    1.000    
## t.TEAM_BATTING_H    4.062e-01  3.647e-02  11.137  < 2e-16 ***
## t.TEAM_BATTING_2B  -4.092e-02  2.758e-02  -1.483    0.138    
## t.TEAM_BATTING_3B   1.852e-01  3.006e-02   6.160 8.56e-10 ***
## t.TEAM_BATTING_HR   2.225e-01  3.807e-02   5.844 5.83e-09 ***
## t.TEAM_BATTING_BB   1.641e-01  3.454e-02   4.752 2.14e-06 ***
## t.TEAM_BATTING_SO  -3.583e-01  4.069e-02  -8.807  < 2e-16 ***
## t.TEAM_BASERUN_SB   2.712e-01  3.210e-02   8.449  < 2e-16 ***
## t.TEAM_BASERUN_CS  -2.814e-02  3.371e-02  -0.835    0.404    
## t.TEAM_PITCHING_H  -1.612e-01  3.806e-02  -4.235 2.38e-05 ***
## t.TEAM_PITCHING_BB -3.539e-02  3.283e-02  -1.078    0.281    
## t.TEAM_PITCHING_SO  1.552e-01  2.919e-02   5.315 1.17e-07 ***
## t.TEAM_FIELDING_E  -4.791e-01  3.829e-02 -12.511  < 2e-16 ***
## t.TEAM_FIELDING_DP -2.147e-01  2.285e-02  -9.398  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.812 on 2262 degrees of freedom
## Multiple R-squared:  0.3445, Adjusted R-squared:  0.3407 
## F-statistic: 91.44 on 13 and 2262 DF,  p-value: < 2.2e-16

Model2:

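With only the variables that were significant in Model1 (TEAM_BATTING_2B, TEAM_BASERUN_CS, and TEAM_PITCHING_BB are dropped). t.TEAM_PITCHING_SO is listed twice in the formula below; R keeps a single copy of a repeated term, so the fit is unaffected: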

model2 <- lm(t.TARGET_WINS ~ t.TEAM_BATTING_H  + t.TEAM_BATTING_3B  + t.TEAM_BATTING_HR  + t.TEAM_BATTING_BB + t.TEAM_BATTING_SO + t.TEAM_BASERUN_SB + t.TEAM_PITCHING_SO + t.TEAM_PITCHING_H + t.TEAM_PITCHING_SO + t.TEAM_FIELDING_E + t.TEAM_FIELDING_DP, mtd_final)
summary(model2)
## 
## Call:
## lm(formula = t.TARGET_WINS ~ t.TEAM_BATTING_H + t.TEAM_BATTING_3B + 
##     t.TEAM_BATTING_HR + t.TEAM_BATTING_BB + t.TEAM_BATTING_SO + 
##     t.TEAM_BASERUN_SB + t.TEAM_PITCHING_SO + t.TEAM_PITCHING_H + 
##     t.TEAM_PITCHING_SO + t.TEAM_FIELDING_E + t.TEAM_FIELDING_DP, 
##     data = mtd_final)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.7683 -0.5118  0.0035  0.5185  3.6256 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         1.470e-11  1.702e-02   0.000        1    
## t.TEAM_BATTING_H    3.838e-01  3.045e-02  12.604  < 2e-16 ***
## t.TEAM_BATTING_3B   1.818e-01  2.949e-02   6.165 8.31e-10 ***
## t.TEAM_BATTING_HR   2.309e-01  3.776e-02   6.116 1.13e-09 ***
## t.TEAM_BATTING_BB   1.327e-01  2.233e-02   5.941 3.27e-09 ***
## t.TEAM_BATTING_SO  -3.600e-01  3.904e-02  -9.222  < 2e-16 ***
## t.TEAM_BASERUN_SB   2.575e-01  2.586e-02   9.957  < 2e-16 ***
## t.TEAM_PITCHING_SO  1.300e-01  2.197e-02   5.919 3.72e-09 ***
## t.TEAM_PITCHING_H  -1.815e-01  3.504e-02  -5.179 2.43e-07 ***
## t.TEAM_FIELDING_E  -4.716e-01  3.750e-02 -12.576  < 2e-16 ***
## t.TEAM_FIELDING_DP -2.163e-01  2.250e-02  -9.616  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8121 on 2265 degrees of freedom
## Multiple R-squared:  0.3434, Adjusted R-squared:  0.3405 
## F-statistic: 118.5 on 10 and 2265 DF,  p-value: < 2.2e-16

Model3:

Further reducing the variables. TEAM_BATTING_SO and TEAM_PITCHING_SO are highly correlated, as are TEAM_BATTING_H and TEAM_PITCHING_H, so the two pitching variables are dropped:

model3 <- lm(t.TARGET_WINS ~ t.TEAM_BATTING_H  + t.TEAM_BATTING_3B  + t.TEAM_BATTING_HR  + t.TEAM_BATTING_BB + t.TEAM_BATTING_SO + t.TEAM_BASERUN_SB  + t.TEAM_FIELDING_E + t.TEAM_FIELDING_DP, mtd_final)
summary(model3)
## 
## Call:
## lm(formula = t.TARGET_WINS ~ t.TEAM_BATTING_H + t.TEAM_BATTING_3B + 
##     t.TEAM_BATTING_HR + t.TEAM_BATTING_BB + t.TEAM_BATTING_SO + 
##     t.TEAM_BASERUN_SB + t.TEAM_FIELDING_E + t.TEAM_FIELDING_DP, 
##     data = mtd_final)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.7107 -0.5214 -0.0031  0.5269  4.2580 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         1.417e-12  1.718e-02   0.000        1    
## t.TEAM_BATTING_H    2.839e-01  2.445e-02  11.611  < 2e-16 ***
## t.TEAM_BATTING_3B   1.859e-01  2.965e-02   6.271 4.29e-10 ***
## t.TEAM_BATTING_HR   1.911e-01  3.747e-02   5.099 3.69e-07 ***
## t.TEAM_BATTING_BB   1.616e-01  2.102e-02   7.689 2.19e-14 ***
## t.TEAM_BATTING_SO  -2.408e-01  3.488e-02  -6.902 6.62e-12 ***
## t.TEAM_BASERUN_SB   2.304e-01  2.505e-02   9.198  < 2e-16 ***
## t.TEAM_FIELDING_E  -4.926e-01  3.641e-02 -13.528  < 2e-16 ***
## t.TEAM_FIELDING_DP -2.069e-01  2.234e-02  -9.262  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8195 on 2267 degrees of freedom
## Multiple R-squared:  0.3307, Adjusted R-squared:  0.3284 
## F-statistic:   140 on 8 and 2267 DF,  p-value: < 2.2e-16
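
Multicollinearity among predictors can also be measured with variance inflation factors (VIF) rather than judged from pairwise correlations alone. The following is a sketch in base R on made-up data; `vif_manual` is a hypothetical helper, not a function from any package used in this report.

```r
# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
# predictor j on all the other predictors.
vif_manual <- function(X) {
  sapply(names(X), function(v) {
    others <- setdiff(names(X), v)
    r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
    1 / (1 - r2)
  })
}

# Hypothetical toy predictors: b is nearly a copy of a, c is independent.
set.seed(2)
a <- rnorm(100)
X <- data.frame(a = a, b = a + rnorm(100, sd = 0.1), c = rnorm(100))
round(vif_manual(X), 1)
```

A common rule of thumb treats values above roughly 5 to 10 as a sign that a predictor is largely explained by the others.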

#SELECT MODELS AND PREDICTION:

summary(model1)
## 
## Call:
## lm(formula = t.TARGET_WINS ~ ., data = mtd_final)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.8120 -0.5138  0.0011  0.5169  3.6552 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         1.338e-11  1.702e-02   0.000    1.000    
## t.TEAM_BATTING_H    4.062e-01  3.647e-02  11.137  < 2e-16 ***
## t.TEAM_BATTING_2B  -4.092e-02  2.758e-02  -1.483    0.138    
## t.TEAM_BATTING_3B   1.852e-01  3.006e-02   6.160 8.56e-10 ***
## t.TEAM_BATTING_HR   2.225e-01  3.807e-02   5.844 5.83e-09 ***
## t.TEAM_BATTING_BB   1.641e-01  3.454e-02   4.752 2.14e-06 ***
## t.TEAM_BATTING_SO  -3.583e-01  4.069e-02  -8.807  < 2e-16 ***
## t.TEAM_BASERUN_SB   2.712e-01  3.210e-02   8.449  < 2e-16 ***
## t.TEAM_BASERUN_CS  -2.814e-02  3.371e-02  -0.835    0.404    
## t.TEAM_PITCHING_H  -1.612e-01  3.806e-02  -4.235 2.38e-05 ***
## t.TEAM_PITCHING_BB -3.539e-02  3.283e-02  -1.078    0.281    
## t.TEAM_PITCHING_SO  1.552e-01  2.919e-02   5.315 1.17e-07 ***
## t.TEAM_FIELDING_E  -4.791e-01  3.829e-02 -12.511  < 2e-16 ***
## t.TEAM_FIELDING_DP -2.147e-01  2.285e-02  -9.398  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.812 on 2262 degrees of freedom
## Multiple R-squared:  0.3445, Adjusted R-squared:  0.3407 
## F-statistic: 91.44 on 13 and 2262 DF,  p-value: < 2.2e-16
summary(model2)
## 
## Call:
## lm(formula = t.TARGET_WINS ~ t.TEAM_BATTING_H + t.TEAM_BATTING_3B + 
##     t.TEAM_BATTING_HR + t.TEAM_BATTING_BB + t.TEAM_BATTING_SO + 
##     t.TEAM_BASERUN_SB + t.TEAM_PITCHING_SO + t.TEAM_PITCHING_H + 
##     t.TEAM_PITCHING_SO + t.TEAM_FIELDING_E + t.TEAM_FIELDING_DP, 
##     data = mtd_final)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.7683 -0.5118  0.0035  0.5185  3.6256 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         1.470e-11  1.702e-02   0.000        1    
## t.TEAM_BATTING_H    3.838e-01  3.045e-02  12.604  < 2e-16 ***
## t.TEAM_BATTING_3B   1.818e-01  2.949e-02   6.165 8.31e-10 ***
## t.TEAM_BATTING_HR   2.309e-01  3.776e-02   6.116 1.13e-09 ***
## t.TEAM_BATTING_BB   1.327e-01  2.233e-02   5.941 3.27e-09 ***
## t.TEAM_BATTING_SO  -3.600e-01  3.904e-02  -9.222  < 2e-16 ***
## t.TEAM_BASERUN_SB   2.575e-01  2.586e-02   9.957  < 2e-16 ***
## t.TEAM_PITCHING_SO  1.300e-01  2.197e-02   5.919 3.72e-09 ***
## t.TEAM_PITCHING_H  -1.815e-01  3.504e-02  -5.179 2.43e-07 ***
## t.TEAM_FIELDING_E  -4.716e-01  3.750e-02 -12.576  < 2e-16 ***
## t.TEAM_FIELDING_DP -2.163e-01  2.250e-02  -9.616  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8121 on 2265 degrees of freedom
## Multiple R-squared:  0.3434, Adjusted R-squared:  0.3405 
## F-statistic: 118.5 on 10 and 2265 DF,  p-value: < 2.2e-16
summary(model3)
## 
## Call:
## lm(formula = t.TARGET_WINS ~ t.TEAM_BATTING_H + t.TEAM_BATTING_3B + 
##     t.TEAM_BATTING_HR + t.TEAM_BATTING_BB + t.TEAM_BATTING_SO + 
##     t.TEAM_BASERUN_SB + t.TEAM_FIELDING_E + t.TEAM_FIELDING_DP, 
##     data = mtd_final)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.7107 -0.5214 -0.0031  0.5269  4.2580 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         1.417e-12  1.718e-02   0.000        1    
## t.TEAM_BATTING_H    2.839e-01  2.445e-02  11.611  < 2e-16 ***
## t.TEAM_BATTING_3B   1.859e-01  2.965e-02   6.271 4.29e-10 ***
## t.TEAM_BATTING_HR   1.911e-01  3.747e-02   5.099 3.69e-07 ***
## t.TEAM_BATTING_BB   1.616e-01  2.102e-02   7.689 2.19e-14 ***
## t.TEAM_BATTING_SO  -2.408e-01  3.488e-02  -6.902 6.62e-12 ***
## t.TEAM_BASERUN_SB   2.304e-01  2.505e-02   9.198  < 2e-16 ***
## t.TEAM_FIELDING_E  -4.926e-01  3.641e-02 -13.528  < 2e-16 ***
## t.TEAM_FIELDING_DP -2.069e-01  2.234e-02  -9.262  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8195 on 2267 degrees of freedom
## Multiple R-squared:  0.3307, Adjusted R-squared:  0.3284 
## F-statistic:   140 on 8 and 2267 DF,  p-value: < 2.2e-16

Of the three models, I chose model3 for the predictions because it is the most parsimonious. Treating the multicollinearity costs little in fit: the adjusted R-squared is 0.3284 for model3 versus 0.3407 for model1 and 0.3405 for model2, and the residual standard error is 0.8195 versus 0.812 and 0.8121.
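
A comparison of nested models like this one can be backed by a partial F-test and a small table of fit statistics. The sketch below runs on simulated data, not on the Moneyball models; the variable names, coefficients, and the `rmse` helper are assumptions made for illustration.

```r
# Hypothetical toy data: y depends on x1 and x2; x3 is pure noise.
set.seed(3)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- 1 + 2 * d$x1 - d$x2 + rnorm(n)

full    <- lm(y ~ x1 + x2 + x3, data = d)
reduced <- lm(y ~ x1 + x2, data = d)

# Partial F-test: does the full model fit significantly better?
f_test <- anova(reduced, full)

# Side-by-side fit statistics for the two models.
rmse <- function(m) sqrt(mean(residuals(m)^2))
comparison <- data.frame(
  model  = c("full", "reduced"),
  adj_r2 = c(summary(full)$adj.r.squared, summary(reduced)$adj.r.squared),
  rmse   = c(rmse(full), rmse(reduced)),
  aic    = c(AIC(full), AIC(reduced))
)
print(f_test)
print(comparison)
```

In the report, the analogous calls would be `anova(model3, model1)` together with AIC() on each model, since model3's predictors are a subset of model1's.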

#PREDICTION:

The evaluation data set goes through the same preprocessing steps as the training data.

med <- read.csv("https://raw.githubusercontent.com/yathdeep/msds-data621/main/moneyball-evaluation-data.csv")

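Removing the variables. The evaluation file has no TARGET_WINS column, so after INDEX is dropped every remaining column sits one position earlier than in the training data. The positional indices -10 and -11 reused from the training step therefore remove TEAM_PITCHING_H and TEAM_PITCHING_BB here, not TEAM_BATTING_HBP and TEAM_PITCHING_HR, as the names() output shows. None of these four columns enters model3, but the imputation below runs on a different set of variables than it did for the training data: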

med_f <- med[,-1 ]
names(med_f)
##  [1] "TEAM_BATTING_H"   "TEAM_BATTING_2B"  "TEAM_BATTING_3B"  "TEAM_BATTING_HR" 
##  [5] "TEAM_BATTING_BB"  "TEAM_BATTING_SO"  "TEAM_BASERUN_SB"  "TEAM_BASERUN_CS" 
##  [9] "TEAM_BATTING_HBP" "TEAM_PITCHING_H"  "TEAM_PITCHING_HR" "TEAM_PITCHING_BB"
## [13] "TEAM_PITCHING_SO" "TEAM_FIELDING_E"  "TEAM_FIELDING_DP"
med_f <- med_f[,-10 ]
names(med_f )
##  [1] "TEAM_BATTING_H"   "TEAM_BATTING_2B"  "TEAM_BATTING_3B"  "TEAM_BATTING_HR" 
##  [5] "TEAM_BATTING_BB"  "TEAM_BATTING_SO"  "TEAM_BASERUN_SB"  "TEAM_BASERUN_CS" 
##  [9] "TEAM_BATTING_HBP" "TEAM_PITCHING_HR" "TEAM_PITCHING_BB" "TEAM_PITCHING_SO"
## [13] "TEAM_FIELDING_E"  "TEAM_FIELDING_DP"
med_f <- med_f[,-11 ]
names(med_f)
##  [1] "TEAM_BATTING_H"   "TEAM_BATTING_2B"  "TEAM_BATTING_3B"  "TEAM_BATTING_HR" 
##  [5] "TEAM_BATTING_BB"  "TEAM_BATTING_SO"  "TEAM_BASERUN_SB"  "TEAM_BASERUN_CS" 
##  [9] "TEAM_BATTING_HBP" "TEAM_PITCHING_HR" "TEAM_PITCHING_SO" "TEAM_FIELDING_E" 
## [13] "TEAM_FIELDING_DP"
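
Positional indices such as -10 refer to different variables in the training and evaluation frames, because only the training frame contains TARGET_WINS. Dropping columns by name avoids that mismatch. A minimal sketch, assuming made-up frames `train` and `evalu` with hypothetical columns A, B, and HBP:

```r
# Hypothetical stand-ins: the training frame has a response column,
# the evaluation frame does not.
train <- data.frame(INDEX = 1:2, TARGET_WINS = c(80, 90),
                    A = 1:2, HBP = c(NA, 50), B = 3:4)
evalu <- data.frame(INDEX = 1:2, A = 5:6, HBP = c(40, NA), B = 7:8)

# Dropping by name removes the same variables regardless of column position.
drop_cols <- c("INDEX", "HBP")
train_f <- train[, !(names(train) %in% drop_cols), drop = FALSE]
eval_f  <- evalu[, !(names(evalu) %in% drop_cols), drop = FALSE]

# The same position holds different variables in the two frames.
names(train)[4]  # "HBP"
names(evalu)[4]  # "B"
```

For this report, the names to drop would be INDEX, TEAM_BATTING_HBP, and TEAM_PITCHING_HR in both data sets.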

Imputing the NAs using mice (pmm, predictive mean matching):

imputed_med_Data <- mice(med_f, m=5, maxit = 5, method = 'pmm')
## 
##  iter imp variable
##   1   1  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_BATTING_HBP  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   1   2  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_BATTING_HBP  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   1   3  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_BATTING_HBP  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   1   4  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_BATTING_HBP  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   1   5  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_BATTING_HBP  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   2   1  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_BATTING_HBP  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   2   2  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_BATTING_HBP  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   2   3  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_BATTING_HBP  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   2   4  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_BATTING_HBP  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   2   5  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_BATTING_HBP  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   3   1  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_BATTING_HBP  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   3   2  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_BATTING_HBP  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   3   3  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_BATTING_HBP  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   3   4  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_BATTING_HBP  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   3   5  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_BATTING_HBP  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   4   1  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_BATTING_HBP  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   4   2  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_BATTING_HBP  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   4   3  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_BATTING_HBP  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   4   4  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_BATTING_HBP  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   4   5  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_BATTING_HBP  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   5   1  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_BATTING_HBP  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   5   2  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_BATTING_HBP  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   5   3  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_BATTING_HBP  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   5   4  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_BATTING_HBP  TEAM_PITCHING_SO  TEAM_FIELDING_DP
##   5   5  TEAM_BATTING_SO  TEAM_BASERUN_SB  TEAM_BASERUN_CS  TEAM_BATTING_HBP  TEAM_PITCHING_SO  TEAM_FIELDING_DP
## Warning: Number of logged events: 25
imputed_med_Data <- complete(imputed_med_Data)
summary(imputed_med_Data)
##  TEAM_BATTING_H TEAM_BATTING_2B TEAM_BATTING_3B  TEAM_BATTING_HR 
##  Min.   : 819   Min.   : 44.0   Min.   : 14.00   Min.   :  0.00  
##  1st Qu.:1387   1st Qu.:210.0   1st Qu.: 35.00   1st Qu.: 44.50  
##  Median :1455   Median :239.0   Median : 52.00   Median :101.00  
##  Mean   :1469   Mean   :241.3   Mean   : 55.91   Mean   : 95.63  
##  3rd Qu.:1548   3rd Qu.:278.5   3rd Qu.: 72.00   3rd Qu.:135.50  
##  Max.   :2170   Max.   :376.0   Max.   :155.00   Max.   :242.00  
##  TEAM_BATTING_BB TEAM_BATTING_SO  TEAM_BASERUN_SB TEAM_BASERUN_CS 
##  Min.   : 15.0   Min.   :   0.0   Min.   :  0.0   Min.   :  0.00  
##  1st Qu.:436.5   1st Qu.: 547.5   1st Qu.: 59.0   1st Qu.: 41.00  
##  Median :509.0   Median : 680.0   Median : 92.0   Median : 56.00  
##  Mean   :499.0   Mean   : 703.2   Mean   :125.7   Mean   : 65.38  
##  3rd Qu.:565.5   3rd Qu.: 904.5   3rd Qu.:155.5   3rd Qu.: 76.00  
##  Max.   :792.0   Max.   :1268.0   Max.   :580.0   Max.   :154.00  
##  TEAM_BATTING_HBP TEAM_PITCHING_HR TEAM_PITCHING_SO TEAM_FIELDING_E 
##  Min.   :42.00    Min.   :  0.0    Min.   :   0.0   Min.   :  73.0  
##  1st Qu.:49.00    1st Qu.: 52.0    1st Qu.: 616.0   1st Qu.: 131.0  
##  Median :62.00    Median :104.0    Median : 758.0   Median : 163.0  
##  Mean   :60.88    Mean   :102.1    Mean   : 803.4   Mean   : 249.7  
##  3rd Qu.:66.00    3rd Qu.:142.5    3rd Qu.: 945.5   3rd Qu.: 252.0  
##  Max.   :96.00    Max.   :336.0    Max.   :9963.0   Max.   :1568.0  
##  TEAM_FIELDING_DP
##  Min.   : 69.0   
##  1st Qu.:123.0   
##  Median :146.0   
##  Mean   :139.6   
##  3rd Qu.:160.5   
##  Max.   :204.0
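
To make the idea behind predictive mean matching concrete, here is a simplified single-variable sketch in base R. It is not what `mice` does internally in full (`mice` also draws the regression parameters at random and cycles over several variables); `pmm_impute` and the toy data are hypothetical names and values introduced for illustration. Because donors are sampled at random, results change between runs unless a seed is fixed; `mice()` takes a `seed` argument for the same purpose.

```r
set.seed(42)  # donors are sampled at random, so fix the seed for repeatability

# Toy data: y depends on x, and ten y values are set to missing.
n <- 50
x <- runif(n, 0, 10)
y <- 3 + 2 * x + rnorm(n)
y_obs <- y
y_obs[sample(n, 10)] <- NA

# Simplified predictive mean matching for one incomplete variable.
pmm_impute <- function(y, x, k = 5) {
  d       <- data.frame(y = y, x = x)
  miss    <- is.na(d$y)
  obs_idx <- which(!miss)
  fit     <- lm(y ~ x, data = d[obs_idx, ])   # model fitted on complete cases
  pred    <- predict(fit, newdata = d)        # predicted mean for every case
  out     <- d$y
  for (i in which(miss)) {
    # the k observed cases whose predicted value is closest to case i
    nearest <- obs_idx[order(abs(pred[obs_idx] - pred[i]))[seq_len(k)]]
    # borrow the actual observed value of one randomly chosen donor
    out[i] <- d$y[nearest[sample.int(k, 1)]]
  }
  out
}

imputed <- pmm_impute(y_obs, x)
summary(imputed)
```

Every imputed value is a value that was actually observed elsewhere in the column, which is why the imputed summary stays within the observed range.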

A Box-Cox transformation, centering, and scaling were applied to the individual predictors using the caret library.

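# Estimate the Box-Cox, centering and scaling parameters from the imputed evaluation data
t = preProcess(imputed_med_Data, 
                   c("BoxCox", "center", "scale"))
# Apply the transformations; data.frame(t = ...) prefixes every column name with "t."
med_final = data.frame(
      t = predict(t, imputed_med_Data))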
summary(med_final)
##  t.TEAM_BATTING_H   t.TEAM_BATTING_2B  t.TEAM_BATTING_3B  t.TEAM_BATTING_HR 
##  Min.   :-5.07603   Min.   :-3.26217   Min.   :-2.64215   Min.   :-1.69766  
##  1st Qu.:-0.52836   1st Qu.:-0.67016   1st Qu.:-0.73771   1st Qu.:-0.90771  
##  Median :-0.06571   Median :-0.09057   Median : 0.08513   Median : 0.09527  
##  Mean   : 0.00000   Mean   : 0.00000   Mean   : 0.00000   Mean   : 0.00000  
##  3rd Qu.: 0.54648   3rd Qu.: 0.74495   3rd Qu.: 0.76149   3rd Qu.: 0.70771  
##  Max.   : 4.16429   Max.   : 3.00896   Max.   : 2.35514   Max.   : 2.59828  
##  t.TEAM_BATTING_BB   t.TEAM_BATTING_SO  t.TEAM_BASERUN_SB t.TEAM_BASERUN_CS
##  Min.   :-2.859388   Min.   :-2.96170   Min.   :-1.3250   Min.   :-1.8578  
##  1st Qu.:-0.619108   1st Qu.:-0.65577   1st Qu.:-0.7029   1st Qu.:-0.6927  
##  Median : 0.008141   Median :-0.09772   Median :-0.3550   Median :-0.2665  
##  Mean   : 0.000000   Mean   : 0.00000   Mean   : 0.0000   Mean   : 0.0000  
##  3rd Qu.: 0.536019   3rd Qu.: 0.84782   3rd Qu.: 0.3145   3rd Qu.: 0.3018  
##  Max.   : 2.968415   Max.   : 2.37879   Max.   : 4.7902   Max.   : 2.5182  
##  t.TEAM_BATTING_HBP t.TEAM_PITCHING_HR t.TEAM_PITCHING_SO t.TEAM_FIELDING_E
##  Min.   :-1.5450    Min.   :-1.77169   Min.   :-1.30653   Min.   :-3.1354  
##  1st Qu.:-0.8043    1st Qu.:-0.86977   1st Qu.:-0.30471   1st Qu.:-0.7317  
##  Median : 0.2418    Median : 0.03214   Median :-0.07377   Median :-0.1378  
##  Mean   : 0.0000    Mean   : 0.00000   Mean   : 0.00000   Mean   : 0.0000  
##  3rd Qu.: 0.5036    3rd Qu.: 0.69991   3rd Qu.: 0.23117   3rd Qu.: 0.7208  
##  Max.   : 1.9426    Max.   : 4.05609   Max.   :14.89661   Max.   : 2.0409  
##  t.TEAM_FIELDING_DP
##  Min.   :-2.0185   
##  1st Qu.:-0.6173   
##  Median : 0.1403   
##  Mean   : 0.0000   
##  3rd Qu.: 0.6638   
##  Max.   : 2.4358
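
The block above estimates the transformation parameters from the evaluation data itself. A common alternative is to estimate them on the training data and reuse them for new data; with caret that means keeping the training-set `preProcess` object and calling `predict()` on the evaluation set. The base-R sketch below shows the centering and scaling part of that idea only (no Box-Cox), with made-up values and hypothetical object names.

```r
# Hypothetical training and evaluation values for a single predictor.
train <- data.frame(TEAM_BATTING_H = c(1300, 1400, 1450, 1500, 1600))
evalu <- data.frame(TEAM_BATTING_H = c(1350, 1600))

# Estimate centering and scaling parameters on the training data only.
mu    <- colMeans(train)
sigma <- apply(train, 2, sd)

# Apply the same parameters to both sets.
train_z <- as.data.frame(scale(train, center = mu, scale = sigma))
eval_z  <- as.data.frame(scale(evalu, center = mu, scale = sigma))

# A standardized value maps back to the original scale as z * sigma + mu.
back <- eval_z$TEAM_BATTING_H * sigma + mu
```

With shared parameters, the evaluation columns are not forced to mean 0 and standard deviation 1, and a given raw value maps to the same transformed value in both sets.
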
eval_data <- predict(model3, newdata = med_final, interval="prediction")
eval_data
##              fit          lwr        upr
## 1   -1.177140396 -2.787716925  0.4334361
## 2   -0.963353683 -2.572587925  0.6458806
## 3   -0.586983418 -2.195572375  1.0216055
## 4    0.309017739 -1.299971016  1.9180065
## 5   -1.259411226 -2.871228540  0.3524061
## 6   -1.117093899 -2.729238742  0.4950509
## 7    0.282700634 -1.329558889  1.8949602
## 8   -0.686586839 -2.295842179  0.9226685
## 9   -0.665843181 -2.275451035  0.9437647
## 10  -0.552761146 -2.161130934  1.0556086
## 11  -0.890441599 -2.500564128  0.7196809
## 12   0.001791503 -1.608829560  1.6124126
## 13   0.104691626 -1.507298163  1.7166814
## 14   0.021702030 -1.589101919  1.6325060
## 15   0.329250565 -1.282804258  1.9413054
## 16  -0.462461014 -2.072085673  1.1471636
## 17  -0.718774044 -2.328397415  0.8908493
## 18  -0.099792899 -1.708515611  1.5089298
## 19  -0.814953225 -2.424650005  0.7947436
## 20   0.381683848 -1.228213690  1.9915814
## 21   0.309425488 -1.300235069  1.9190860
## 22   0.188867743 -1.420686780  1.7984223
## 23   0.077345751 -1.532040187  1.6867317
## 24  -0.696221789 -2.305580052  0.9131365
## 25   0.171145958 -1.438806619  1.7810985
## 26   0.492086357 -1.118955041  2.1031278
## 27  -0.514213731 -2.135292267  1.1068648
## 28  -0.517045324 -2.125588023  1.0914974
## 29   0.274717914 -1.335718613  1.8851544
## 30  -0.548281111 -2.158751660  1.0621894
## 31   0.671105399 -0.939271800  2.2814826
## 32   0.348285599 -1.260462101  1.9570333
## 33   0.316629498 -1.293237077  1.9264961
## 34   0.178055604 -1.434105137  1.7902163
## 35   0.047858018 -1.561123732  1.6568398
## 36   0.103460331 -1.508246305  1.7151670
## 37  -0.247061457 -1.854931792  1.3608089
## 38   0.484354453 -1.127039327  2.0957482
## 39   0.060001226 -1.549254672  1.6692571
## 40   0.460657733 -1.149498491  2.0708140
## 41   0.155280654 -1.454950617  1.7655119
## 42   1.297882313 -0.314513187  2.9102778
## 43  -0.979385020 -2.609705671  0.6509356
## 44   1.646054181  0.024765327  3.2673430
## 45   0.710634766 -0.901974261  2.3232438
## 46   0.955297308 -0.655535442  2.5661301
## 47   1.045508871 -0.565093509  2.6561113
## 48  -0.444398976 -2.053152599  1.1643546
## 49  -0.808937177 -2.417919870  0.8000455
## 50  -0.133546557 -1.742015949  1.4749228
## 51  -0.322333339 -1.931113108  1.2864464
## 52   0.212220766 -1.397257209  1.8216987
## 53  -0.426115526 -2.035925300  1.1836942
## 54  -0.253441757 -1.863290955  1.3564074
## 55  -0.604624161 -2.213213067  1.0039647
## 56   0.090457054 -1.519033219  1.6999473
## 57   0.656311747 -0.953987098  2.2666106
## 58  -0.403350482 -2.012623542  1.2059226
## 59  -1.122246309 -2.732811211  0.4883186
## 60  -0.231997314 -1.840549069  1.3765544
## 61   0.418913237 -1.189960480  2.0277870
## 62   0.031357325 -1.582333393  1.6450480
## 63   0.405435435 -1.203298433  2.0141693
## 64   0.327576579 -1.284215041  1.9393682
## 65   0.415589893 -1.196422337  2.0276021
## 66   1.320973440 -0.293240888  2.9351878
## 67  -0.614880420 -2.224209419  0.9944486
## 68  -0.335744973 -1.945441765  1.2739518
## 69  -0.170782297 -1.780348817  1.4387842
## 70   0.399962655 -1.210836051  2.0107614
## 71   0.272586847 -1.338738303  1.8839120
## 72  -0.358323178 -1.971318412  1.2546721
## 73  -0.182076917 -1.793264555  1.4291107
## 74   0.538356980 -1.074334350  2.1510483
## 75  -0.252774515 -1.863513514  1.3579645
## 76  -0.229290804 -1.840035800  1.3814542
## 77   0.418574549 -1.190410699  2.0275598
## 78   0.073923190 -1.534940978  1.6827874
## 79  -0.817062575 -2.427026265  0.7929011
## 80  -0.428209813 -2.037129312  1.1807097
## 81   0.193417431 -1.415603144  1.8024380
## 82   0.326650060 -1.282298910  1.9355990
## 83   0.793602131 -0.816363779  2.4035680
## 84  -0.465613163 -2.076297598  1.1450713
## 85   0.223127619 -1.386448562  1.8327038
## 86  -0.179389140 -1.790297038  1.4315188
## 87   0.165804568 -1.444378938  1.7759881
## 88   0.314418334 -1.293623223  1.9224599
## 89   0.773212880 -0.837675425  2.3841012
## 90   0.758173840 -0.851216979  2.3675647
## 91   0.152002102 -1.457640270  1.7616445
## 92   1.149628077 -0.464768753  2.7640249
## 93  -0.499711615 -2.108532842  1.1091096
## 94  -0.142278516 -1.752598122  1.4680411
## 95  -0.034841334 -1.644534479  1.5748518
## 96  -0.151275035 -1.761333196  1.4587831
## 97   0.562853158 -1.049327170  2.1750335
## 98   1.085943426 -0.525837917  2.6977248
## 99   0.411579392 -1.198806999  2.0219658
## 100  0.370550647 -1.240431273  1.9815326
## 101 -0.092845963 -1.702231637  1.5165397
## 102 -0.489913039 -2.099000942  1.1191749
## 103  0.252846609 -1.355290011  1.8609832
## 104  0.246546680 -1.362853099  1.8559465
## 105 -0.470712420 -2.082426513  1.1410017
## 106 -0.965755837 -2.577279369  0.6457677
## 107 -1.753166833 -3.370600280 -0.1357334
## 108 -0.035792189 -1.645911629  1.5743273
## 109  0.751445148 -0.857943245  2.3608335
## 110 -1.214810906 -2.827387859  0.3977660
## 111  0.360137376 -1.248459472  1.9687342
## 112  0.436644888 -1.172563603  2.0458534
## 113  0.773614685 -0.835023374  2.3822527
## 114  0.726968512 -0.882252819  2.3361898
## 115  0.077263349 -1.531845339  1.6863720
## 116  0.058665324 -1.550168457  1.6674991
## 117  0.298023643 -1.312195989  1.9082433
## 118  0.117168945 -1.490973026  1.7253109
## 119 -0.410112167 -2.019467574  1.1992432
## 120  0.052403104 -1.558562827  1.6633690
## 121  0.916586147 -0.693887001  2.5270593
## 122 -0.891471378 -2.501113101  0.7181703
## 123 -0.809738792 -2.419370549  0.7998930
## 124 -1.115469594 -2.728085434  0.4971462
## 125 -0.819611796 -2.429489413  0.7902658
## 126  0.200686864 -1.408772386  1.8101461
## 127  0.366884489 -1.242903832  1.9766728
## 128 -0.357195337 -1.965895937  1.2515053
## 129  0.616651821 -0.992595090  2.2258987
## 130  0.442171535 -1.167510507  2.0518536
## 131  0.189508287 -1.419410786  1.7984274
## 132  0.122622941 -1.487278496  1.7325244
## 133 -0.670464445 -2.284402381  0.9434735
## 134 -0.051829470 -1.661700319  1.5580414
## 135  1.251555321 -0.363439307  2.8665499
## 136 -0.537871078 -2.148432996  1.0726908
## 137 -0.234652313 -1.843808956  1.3745043
## 138 -0.210162930 -1.818550982  1.3982251
## 139  1.118369196 -0.499243466  2.7359819
## 140 -0.056571120 -1.665454853  1.5523126
## 141 -1.218533408 -2.829497732  0.3924309
## 142 -0.602011440 -2.211512766  1.0074899
## 143  0.591482493 -1.018361358  2.2013263
## 144 -0.574574759 -2.184072338  1.0349228
## 145 -0.200774051 -1.810539558  1.4089915
## 146 -0.399346035 -2.007723791  1.2090317
## 147 -0.410197417 -2.019338170  1.1989433
## 148  0.033717914 -1.574938863  1.6423747
## 149 -0.104321800 -1.714360929  1.5057173
## 150  0.347238341 -1.261348405  1.9558251
## 151  0.141183061 -1.468574670  1.7509408
## 152  0.474720816 -1.137287737  2.0867294
## 153 -1.087715368 -2.709986382  0.5345556
## 154 -1.034980723 -2.644905111  0.5749437
## 155 -0.022944457 -1.632415080  1.5865262
## 156 -0.944985941 -2.555317075  0.6653452
## 157  0.843116966 -0.767338146  2.4535721
## 158 -0.473950080 -2.083689049  1.1357889
## 159  0.527659340 -1.081784159  2.1371028
## 160 -0.591476916 -2.200201209  1.0172474
## 161  1.176545789 -0.436669128  2.7897607
## 162  1.606952817 -0.006816591  3.2207222
## 163  0.941173121 -0.669113092  2.5514593
## 164  1.345714266 -0.268084353  2.9595129
## 165  1.056081246 -0.557588121  2.6697506
## 166  0.926255084 -0.685752036  2.5382622
## 167  0.166446150 -1.443539381  1.7764317
## 168  0.188943775 -1.421737875  1.7996254
## 169 -0.680445232 -2.290604989  0.9297145
## 170 -0.007138755 -1.616920496  1.6026430
## 171  0.646374585 -0.963398360  2.2561475
## 172  0.457583919 -1.151353667  2.0665215
## 173  0.101531190 -1.507167874  1.7102303
## 174  0.763812274 -0.845841506  2.3734661
## 175  0.005635363 -1.602910211  1.6141809
## 176 -0.165334709 -1.775004818  1.4443354
## 177  0.082323297 -1.528344565  1.6929912
## 178 -0.843763638 -2.453928950  0.7664017
## 179 -0.319640124 -1.927673727  1.2883935
## 180 -0.170837370 -1.779465478  1.4377907
## 181  0.466575623 -1.146664722  2.0798160
## 182  0.314312757 -1.296046600  1.9246721
## 183  0.461008375 -1.148646933  2.0706637
## 184  0.542140972 -1.067143273  2.1514252
## 185  0.941899765 -0.671227394  2.5550269
## 186  1.186247881 -0.431009363  2.8035051
## 187  0.607245923 -1.005363216  2.2198551
## 188 -0.541637014 -2.151839207  1.0685652
## 189 -1.066351102 -2.676862461  0.5441603
## 190  1.766531022  0.151120581  3.3819415
## 191 -0.726415398 -2.335335212  0.8825044
## 192 -0.018756713 -1.627175394  1.5896620
## 193 -0.552751766 -2.161448295  1.0559448
## 194 -0.442551402 -2.051443948  1.1663411
## 195 -0.395237506 -2.005455223  1.2149802
## 196 -1.076883271 -2.687389505  0.5336230
## 197 -0.429738568 -2.038193902  1.1787168
## 198  0.751315888 -0.860309544  2.3629413
## 199  0.056918275 -1.551949331  1.6657859
## 200  0.298105838 -1.310937951  1.9071496
## 201 -0.547638779 -2.158629651  1.0633521
## 202  0.158855539 -1.450640253  1.7683513
## 203 -0.068795233 -1.680271933  1.5426815
## 204  0.640613298 -0.968406945  2.2496335
## 205  0.080973447 -1.528209043  1.6901559
## 206  0.204124088 -1.404776241  1.8130244
## 207  0.120828050 -1.489004642  1.7306607
## 208  0.173157322 -1.436519027  1.7828337
## 209 -0.033113340 -1.642307951  1.5760813
## 210 -0.263859065 -1.873198644  1.3454805
## 211  1.377226052 -0.234240950  2.9886931
## 212  0.444869915 -1.164482055  2.0542219
## 213  0.030093799 -1.579463786  1.6396514
## 214 -1.181721499 -2.791421511  0.4279785
## 215 -0.769065054 -2.379487518  0.8413574
## 216  0.163278371 -1.445555659  1.7721124
## 217 -0.178726302 -1.790439904  1.4329873
## 218  0.688285343 -0.921359173  2.2979299
## 219 -0.168094132 -1.776655134  1.4404669
## 220  0.092013977 -1.516588405  1.7006164
## 221 -0.341643676 -1.950926930  1.2676396
## 222 -0.546695700 -2.157056114  1.0636647
## 223 -0.061640886 -1.670593958  1.5473122
## 224 -0.271478973 -1.883158035  1.3402001
## 225  0.476400400 -1.146127510  2.0989283
## 226 -0.188076019 -1.796572657  1.4204206
## 227 -0.132747149 -1.741483532  1.4759892
## 228 -0.170493490 -1.780523884  1.4395369
## 229  0.429262037 -1.179540032  2.0380641
## 230 -0.299101333 -1.909788411  1.3115857
## 231 -0.044086522 -1.654602784  1.5664297
## 232  0.591433703 -1.017821396  2.2006888
## 233  0.022628133 -1.587752166  1.6330084
## 234  0.268837432 -1.341432066  1.8791069
## 235 -0.214618185 -1.823079644  1.3938433
## 236 -0.358581829 -1.966819095  1.2496554
## 237 -0.302127785 -1.912958051  1.3087025
## 238  0.116807332 -1.493358378  1.7269730
## 239  0.616272080 -0.993995710  2.2265399
## 240 -0.671653290 -2.280558726  0.9372521
## 241  0.319586749 -1.289026358  1.9281999
## 242  0.744886801 -0.865396164  2.3551698
## 243  0.273774135 -1.335446588  1.8829949
## 244  0.198673386 -1.410936435  1.8082832
## 245 -1.496888621 -3.109978576  0.1162013
## 246  0.129267911 -1.480883051  1.7394189
## 247 -0.170027223 -1.778589057  1.4385346
## 248  0.206228386 -1.402847651  1.8153044
## 249 -0.342303670 -1.951267738  1.2666604
## 250  0.434352108 -1.177795470  2.0464997
## 251  0.201227008 -1.408309325  1.8107633
## 252 -1.222878849 -2.836247196  0.3904895
## 253  1.038441951 -0.573422571  2.6503065
## 254 -3.113028304 -4.742077638 -1.4839790
## 255 -0.795865995 -2.404927559  0.8131956
## 256 -0.359272597 -1.970575489  1.2520303
## 257  0.208802185 -1.400816972  1.8184213
## 258  0.081447262 -1.527468419  1.6903629
## 259 -0.298186692 -1.907859199  1.3114858
summary(eval_data)
##       fit                lwr               upr        
##  Min.   :-3.11303   Min.   :-4.7421   Min.   :-1.484  
##  1st Qu.:-0.41015   1st Qu.:-2.0194   1st Qu.: 1.199  
##  Median : 0.05692   Median :-1.5519   Median : 1.666  
##  Mean   : 0.00000   Mean   :-1.6105   Mean   : 1.611  
##  3rd Qu.: 0.39082   3rd Qu.:-1.2195   3rd Qu.: 2.001  
##  Max.   : 1.76653   Max.   : 0.1511   Max.   : 3.382
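
The `fit`, `lwr`, and `upr` columns above come from `interval="prediction"`, which by default gives a 95% interval for a single new observation; in the summary, the bounds sit roughly 1.61 either side of the fit on average. The sketch below uses the built-in `cars` data, not the Moneyball data, to show how a prediction interval compares with a confidence interval for the mean response.

```r
# Illustrative model on the built-in cars data (not the Moneyball data).
fit <- lm(dist ~ speed, data = cars)
new_obs <- data.frame(speed = c(10, 15, 20))

# Interval for a single new observation vs. interval for the mean response.
pred_int <- predict(fit, newdata = new_obs, interval = "prediction")
conf_int <- predict(fit, newdata = new_obs, interval = "confidence")

# Compare the interval widths side by side.
widths <- cbind(
  speed      = new_obs$speed,
  pred_width = pred_int[, "upr"] - pred_int[, "lwr"],
  conf_width = conf_int[, "upr"] - conf_int[, "lwr"]
)
print(widths)
```

The point estimates are identical in both cases; the prediction interval is wider because it also accounts for the residual variation of an individual observation.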