Introducing the dataset.

options(warn = -1)  # suppress warnings for the rest of the session

library(knitr)
suppressMessages(library(RCurl))
suppressMessages(library(plyr))
suppressMessages(library(ggplot2))
suppressMessages(library(plotly))
suppressMessages(library(scatterplot3d))



training <- read.csv("https://raw.githubusercontent.com/mascotinme/MSDA-IS621/master/moneyball-training-data.csv", header = TRUE, sep = ",")

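# NOTE: this URL is the training file, so `evaluation` is currently a copy of `training`;
# point it at the evaluation CSV if a separate hold-out file is intended.
evaluation <- read.csv("https://raw.githubusercontent.com/mascotinme/MSDA-IS621/master/moneyball-training-data.csv", header = TRUE, sep = ",")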

str(training)
## 'data.frame':    2276 obs. of  17 variables:
##  $ INDEX           : int  1 2 3 4 5 6 7 8 11 12 ...
##  $ TARGET_WINS     : int  39 70 86 70 82 75 80 85 86 76 ...
##  $ TEAM_BATTING_H  : int  1445 1339 1377 1387 1297 1279 1244 1273 1391 1271 ...
##  $ TEAM_BATTING_2B : int  194 219 232 209 186 200 179 171 197 213 ...
##  $ TEAM_BATTING_3B : int  39 22 35 38 27 36 54 37 40 18 ...
##  $ TEAM_BATTING_HR : int  13 190 137 96 102 92 122 115 114 96 ...
##  $ TEAM_BATTING_BB : int  143 685 602 451 472 443 525 456 447 441 ...
##  $ TEAM_BATTING_SO : int  842 1075 917 922 920 973 1062 1027 922 827 ...
##  $ TEAM_BASERUN_SB : int  NA 37 46 43 49 107 80 40 69 72 ...
##  $ TEAM_BASERUN_CS : int  NA 28 27 30 39 59 54 36 27 34 ...
##  $ TEAM_BATTING_HBP: int  NA NA NA NA NA NA NA NA NA NA ...
##  $ TEAM_PITCHING_H : int  9364 1347 1377 1396 1297 1279 1244 1281 1391 1271 ...
##  $ TEAM_PITCHING_HR: int  84 191 137 97 102 92 122 116 114 96 ...
##  $ TEAM_PITCHING_BB: int  927 689 602 454 472 443 525 459 447 441 ...
##  $ TEAM_PITCHING_SO: int  5456 1082 917 928 920 973 1062 1033 922 827 ...
##  $ TEAM_FIELDING_E : int  1011 193 175 164 138 123 136 112 127 131 ...
##  $ TEAM_FIELDING_DP: int  NA 155 153 156 168 149 186 136 169 159 ...
dim(training)
## [1] 2276   17
kable(summary(training))
| INDEX | TARGET_WINS | TEAM_BATTING_H | TEAM_BATTING_2B | TEAM_BATTING_3B | TEAM_BATTING_HR | TEAM_BATTING_BB | TEAM_BATTING_SO | TEAM_BASERUN_SB | TEAM_BASERUN_CS | TEAM_BATTING_HBP | TEAM_PITCHING_H | TEAM_PITCHING_HR | TEAM_PITCHING_BB | TEAM_PITCHING_SO | TEAM_FIELDING_E | TEAM_FIELDING_DP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Min. : 1.0 | Min. : 0.00 | Min. : 891 | Min. : 69.0 | Min. : 0.00 | Min. : 0.00 | Min. : 0.0 | Min. : 0.0 | Min. : 0.0 | Min. : 0.0 | Min. :29.00 | Min. : 1137 | Min. : 0.0 | Min. : 0.0 | Min. : 0.0 | Min. : 65.0 | Min. : 52.0 |
| 1st Qu.: 630.8 | 1st Qu.: 71.00 | 1st Qu.:1383 | 1st Qu.:208.0 | 1st Qu.: 34.00 | 1st Qu.: 42.00 | 1st Qu.:451.0 | 1st Qu.: 548.0 | 1st Qu.: 66.0 | 1st Qu.: 38.0 | 1st Qu.:50.50 | 1st Qu.: 1419 | 1st Qu.: 50.0 | 1st Qu.: 476.0 | 1st Qu.: 615.0 | 1st Qu.: 127.0 | 1st Qu.:131.0 |
| Median :1270.5 | Median : 82.00 | Median :1454 | Median :238.0 | Median : 47.00 | Median :102.00 | Median :512.0 | Median : 750.0 | Median :101.0 | Median : 49.0 | Median :58.00 | Median : 1518 | Median :107.0 | Median : 536.5 | Median : 813.5 | Median : 159.0 | Median :149.0 |
| Mean :1268.5 | Mean : 80.79 | Mean :1469 | Mean :241.2 | Mean : 55.25 | Mean : 99.61 | Mean :501.6 | Mean : 735.6 | Mean :124.8 | Mean : 52.8 | Mean :59.36 | Mean : 1779 | Mean :105.7 | Mean : 553.0 | Mean : 817.7 | Mean : 246.5 | Mean :146.4 |
| 3rd Qu.:1915.5 | 3rd Qu.: 92.00 | 3rd Qu.:1537 | 3rd Qu.:273.0 | 3rd Qu.: 72.00 | 3rd Qu.:147.00 | 3rd Qu.:580.0 | 3rd Qu.: 930.0 | 3rd Qu.:156.0 | 3rd Qu.: 62.0 | 3rd Qu.:67.00 | 3rd Qu.: 1682 | 3rd Qu.:150.0 | 3rd Qu.: 611.0 | 3rd Qu.: 968.0 | 3rd Qu.: 249.2 | 3rd Qu.:164.0 |
| Max. :2535.0 | Max. :146.00 | Max. :2554 | Max. :458.0 | Max. :223.00 | Max. :264.00 | Max. :878.0 | Max. :1399.0 | Max. :697.0 | Max. :201.0 | Max. :95.00 | Max. :30132 | Max. :343.0 | Max. :3645.0 | Max. :19278.0 | Max. :1898.0 | Max. :228.0 |
| NA | NA | NA | NA | NA | NA | NA | NA's :102 | NA's :131 | NA's :772 | NA's :2085 | NA | NA | NA | NA's :102 | NA | NA's :286 |
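The last row of the summary shows that missing values are concentrated in a handful of columns. A minimal sketch of how one might rank columns by their share of missing values follows; the helper `na_share` and the `toy` data frame are illustrative names and made-up values, not part of the original analysis.

```r
# Hypothetical helper (not in the original): share of missing values per column,
# sorted so the worst-affected column comes first.
na_share <- function(df) {
  sort(colMeans(is.na(df)), decreasing = TRUE)
}

# Toy data frame standing in for `training` (values are made up)
toy <- data.frame(
  a = c(1, NA, 3, NA),
  b = c(1, 2, 3, 4),
  c = c(NA, NA, NA, 4)
)

na_share(toy)
```

Applied to the real data, `na_share(training)` would be expected to list TEAM_BATTING_HBP first, since the summary reports 2085 missing values in 2276 rows.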

The multiple linear regression equation for the analysis is:

\(Y = B_0 + B_1 x_1 + B_2 x_2 + \dots + B_n x_n + e\)

where

\(Y\) = response (dependent) variable,

\(x_1, \dots, x_n\) = explanatory (independent) variables,

\(B_0\) = intercept,

\(B_1, \dots, B_n\) = slopes of the independent variables (the model parameters),

\(e\) = error term (for a fitted model, the residual: the difference between an actual and a predicted value of \(Y\)).

In terms of the training dataset, with \(Y\) = TARGET_WINS as the response, this can be rewritten as:

\(Y = B_0 + B_{\mathrm{TEAM\_BATTING\_H}} X_{\mathrm{TEAM\_BATTING\_H}} + B_{\mathrm{TEAM\_BATTING\_2B}} X_{\mathrm{TEAM\_BATTING\_2B}} + \dots + B_{\mathrm{TEAM\_FIELDING\_DP}} X_{\mathrm{TEAM\_FIELDING\_DP}} + e\)
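To make the roles of the terms concrete, here is a small simulated sketch. The data, the seed, and the names `x1`, `x2` and `toy_fit` are invented for illustration. When the response is generated from known values of the intercept and slopes, `lm()` should return estimates close to them, and the residuals are the actual values minus the fitted ones.

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)

# True model: B0 = 2, B1 = 3, B2 = -1, plus a random error term e
y <- 2 + 3 * x1 - 1 * x2 + rnorm(n, sd = 0.5)

toy_fit <- lm(y ~ x1 + x2)

# Estimated intercept and slopes; these should land near 2, 3 and -1
coef(toy_fit)

# The residual is the actual value minus the fitted (predicted) value
all.equal(unname(residuals(toy_fit)), unname(y - fitted(toy_fit)))
```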

A first look at the multiple linear regression analysis:

fit1 <- lm(TARGET_WINS ~ . - INDEX, data = training) # INDEX is intentionally omitted: it is a row identifier, not a predictor

summary(fit1)
## 
## Call:
## lm(formula = TARGET_WINS ~ . - INDEX, data = training)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -19.8708  -5.6564  -0.0599   5.2545  22.9274 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      60.28826   19.67842   3.064  0.00253 ** 
## TEAM_BATTING_H    1.91348    2.76139   0.693  0.48927    
## TEAM_BATTING_2B   0.02639    0.03029   0.871  0.38484    
## TEAM_BATTING_3B  -0.10118    0.07751  -1.305  0.19348    
## TEAM_BATTING_HR  -4.84371   10.50851  -0.461  0.64542    
## TEAM_BATTING_BB  -4.45969    3.63624  -1.226  0.22167    
## TEAM_BATTING_SO   0.34196    2.59876   0.132  0.89546    
## TEAM_BASERUN_SB   0.03304    0.02867   1.152  0.25071    
## TEAM_BASERUN_CS  -0.01104    0.07143  -0.155  0.87730    
## TEAM_BATTING_HBP  0.08247    0.04960   1.663  0.09815 .  
## TEAM_PITCHING_H  -1.89096    2.76095  -0.685  0.49432    
## TEAM_PITCHING_HR  4.93043   10.50664   0.469  0.63946    
## TEAM_PITCHING_BB  4.51089    3.63372   1.241  0.21612    
## TEAM_PITCHING_SO -0.37364    2.59705  -0.144  0.88577    
## TEAM_FIELDING_E  -0.17204    0.04140  -4.155 5.08e-05 ***
## TEAM_FIELDING_DP -0.10819    0.03654  -2.961  0.00349 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.467 on 175 degrees of freedom
##   (2085 observations deleted due to missingness)
## Multiple R-squared:  0.5501, Adjusted R-squared:  0.5116 
## F-statistic: 14.27 on 15 and 175 DF,  p-value: < 2.2e-16
par(mfrow=c(2,2))
plot(fit1)
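The summary reports that 2085 observations were deleted due to missingness, so the model is fitted on 191 rows (175 residual degrees of freedom plus 16 estimated coefficients). A toy sketch of why this happens: by default `lm()` drops every row with a missing value in any model variable. The `toy_na` data frame below is made up for illustration.

```r
# Made-up data: x1 and x2 each have missing values in different rows
toy_na <- data.frame(
  y  = c(3, 5, 4, 8, 9, 7, 12, 11),
  x1 = c(1, 2, NA, 4, 5, 3, 6, 6),
  x2 = c(NA, 1, 2, 2, NA, 3, 4, 1)
)

fit_na <- lm(y ~ x1 + x2, data = toy_na)

nobs(fit_na)                   # rows actually used in the fit
sum(complete.cases(toy_na))    # rows with no missing value in any column
nrow(toy_na) - nobs(fit_na)    # rows silently dropped
```

On the real data, `sum(complete.cases(training))` is a quick way to see how many rows a full model can use before fitting it.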

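# plotly 4.x and later expects columns to be passed as formulas (~column)
plot_ly(data = training, x = ~TEAM_BATTING_2B, y = ~TARGET_WINS,
        type = "scatter", mode = "markers", marker = list(color = "blue"))
plot(TARGET_WINS ~ TEAM_BATTING_2B, data = training)

# Simple regression of wins on doubles, drawn over the scatterplot
fitline <- lm(TARGET_WINS ~ TEAM_BATTING_2B, data = training)
abline(fitline)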

cor(training$TARGET_WINS, training$TEAM_BATTING_2B)
## [1] 0.2891036
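In a regression with a single predictor, R-squared equals the squared correlation, so a correlation of 0.2891 implies that TEAM_BATTING_2B on its own accounts for roughly 8.4% of the variation in TARGET_WINS. The sketch below checks the identity on simulated data (the variables `x`, `y` and `simple_fit` are illustrative) and then squares the reported correlation.

```r
set.seed(42)
x <- rnorm(100)
y <- 1 + 0.5 * x + rnorm(100)

r <- cor(x, y)
simple_fit <- lm(y ~ x)

# With one predictor, R-squared is the squared Pearson correlation
all.equal(r^2, summary(simple_fit)$r.squared)

# Squared correlation reported above for TARGET_WINS and TEAM_BATTING_2B
0.2891036^2
```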

A 3D scatterplot of TARGET_WINS, TEAM_BATTING_2B and TEAM_BATTING_BB:

# Run this to display the data in 3D; with() is used instead of attach(), which can mask other variables
with(training, scatterplot3d(TARGET_WINS, TEAM_BATTING_2B, TEAM_BATTING_BB, pch = 20, highlight.3d = TRUE, type = "h", main = "3D Scatterplot"))

hist(training$TARGET_WINS, col="green")

hist(training$TEAM_FIELDING_DP, col="blue")

stepwise <- step(fit1, direction = "both") # Model Selection using both FORWARD AND BACKWARD selection.
## Start:  AIC=831.31
## TARGET_WINS ~ (INDEX + TEAM_BATTING_H + TEAM_BATTING_2B + TEAM_BATTING_3B + 
##     TEAM_BATTING_HR + TEAM_BATTING_BB + TEAM_BATTING_SO + TEAM_BASERUN_SB + 
##     TEAM_BASERUN_CS + TEAM_BATTING_HBP + TEAM_PITCHING_H + TEAM_PITCHING_HR + 
##     TEAM_PITCHING_BB + TEAM_PITCHING_SO + TEAM_FIELDING_E + TEAM_FIELDING_DP) - 
##     INDEX
## 
##                    Df Sum of Sq   RSS    AIC
## - TEAM_BATTING_SO   1      1.24 12547 829.33
## - TEAM_PITCHING_SO  1      1.48 12547 829.33
## - TEAM_BASERUN_CS   1      1.71 12548 829.34
## - TEAM_BATTING_HR   1     15.23 12561 829.54
## - TEAM_PITCHING_HR  1     15.79 12562 829.55
## - TEAM_PITCHING_H   1     33.63 12580 829.82
## - TEAM_BATTING_H    1     34.42 12580 829.83
## - TEAM_BATTING_2B   1     54.41 12600 830.14
## - TEAM_BASERUN_SB   1     95.22 12641 830.76
## - TEAM_BATTING_BB   1    107.84 12654 830.95
## - TEAM_PITCHING_BB  1    110.48 12656 830.99
## - TEAM_BATTING_3B   1    122.16 12668 831.16
## <none>                          12546 831.31
## - TEAM_BATTING_HBP  1    198.21 12744 832.31
## - TEAM_FIELDING_DP  1    628.49 13174 838.65
## - TEAM_FIELDING_E   1   1237.79 13784 847.28
## 
## Step:  AIC=829.33
## TARGET_WINS ~ TEAM_BATTING_H + TEAM_BATTING_2B + TEAM_BATTING_3B + 
##     TEAM_BATTING_HR + TEAM_BATTING_BB + TEAM_BASERUN_SB + TEAM_BASERUN_CS + 
##     TEAM_BATTING_HBP + TEAM_PITCHING_H + TEAM_PITCHING_HR + TEAM_PITCHING_BB + 
##     TEAM_PITCHING_SO + TEAM_FIELDING_E + TEAM_FIELDING_DP
## 
##                    Df Sum of Sq   RSS    AIC
## - TEAM_BASERUN_CS   1      1.59 12549 827.35
## - TEAM_BATTING_HR   1     15.82 12563 827.57
## - TEAM_PITCHING_HR  1     16.39 12564 827.58
## - TEAM_BATTING_2B   1     53.47 12601 828.14
## - TEAM_PITCHING_H   1     88.45 12636 828.67
## - TEAM_BATTING_H    1     90.30 12637 828.70
## - TEAM_BASERUN_SB   1     94.19 12641 828.76
## - TEAM_BATTING_BB   1    107.95 12655 828.97
## - TEAM_PITCHING_BB  1    110.60 12658 829.01
## - TEAM_BATTING_3B   1    122.20 12669 829.18
## <none>                          12547 829.33
## - TEAM_BATTING_HBP  1    197.11 12744 830.31
## + TEAM_BATTING_SO   1      1.24 12546 831.31
## - TEAM_FIELDING_DP  1    630.68 13178 836.70
## - TEAM_FIELDING_E   1   1240.80 13788 845.34
## - TEAM_PITCHING_SO  1   1312.89 13860 846.34
## 
## Step:  AIC=827.35
## TARGET_WINS ~ TEAM_BATTING_H + TEAM_BATTING_2B + TEAM_BATTING_3B + 
##     TEAM_BATTING_HR + TEAM_BATTING_BB + TEAM_BASERUN_SB + TEAM_BATTING_HBP + 
##     TEAM_PITCHING_H + TEAM_PITCHING_HR + TEAM_PITCHING_BB + TEAM_PITCHING_SO + 
##     TEAM_FIELDING_E + TEAM_FIELDING_DP
## 
##                    Df Sum of Sq   RSS    AIC
## - TEAM_BATTING_HR   1     16.06 12565 825.60
## - TEAM_PITCHING_HR  1     16.64 12565 825.61
## - TEAM_BATTING_2B   1     53.05 12602 826.16
## - TEAM_PITCHING_H   1     90.24 12639 826.72
## - TEAM_BATTING_H    1     92.13 12641 826.75
## - TEAM_BATTING_BB   1    110.31 12659 827.03
## - TEAM_PITCHING_BB  1    113.00 12662 827.07
## - TEAM_BASERUN_SB   1    123.42 12672 827.22
## - TEAM_BATTING_3B   1    129.33 12678 827.31
## <none>                          12549 827.35
## - TEAM_BATTING_HBP  1    197.23 12746 828.33
## + TEAM_BASERUN_CS   1      1.59 12547 829.33
## + TEAM_BATTING_SO   1      1.12 12548 829.34
## - TEAM_FIELDING_DP  1    635.62 13184 834.79
## - TEAM_PITCHING_SO  1   1311.88 13861 844.35
## - TEAM_FIELDING_E   1   1322.05 13871 844.49
## 
## Step:  AIC=825.6
## TARGET_WINS ~ TEAM_BATTING_H + TEAM_BATTING_2B + TEAM_BATTING_3B + 
##     TEAM_BATTING_BB + TEAM_BASERUN_SB + TEAM_BATTING_HBP + TEAM_PITCHING_H + 
##     TEAM_PITCHING_HR + TEAM_PITCHING_BB + TEAM_PITCHING_SO + 
##     TEAM_FIELDING_E + TEAM_FIELDING_DP
## 
##                    Df Sum of Sq   RSS    AIC
## - TEAM_BATTING_2B   1     55.48 12620 824.44
## - TEAM_PITCHING_H   1     89.26 12654 824.95
## - TEAM_BATTING_H    1     91.97 12657 824.99
## - TEAM_BATTING_BB   1    104.58 12669 825.18
## - TEAM_PITCHING_BB  1    107.19 12672 825.22
## <none>                          12565 825.60
## - TEAM_BATTING_3B   1    137.48 12702 825.68
## - TEAM_BASERUN_SB   1    146.90 12712 825.82
## - TEAM_BATTING_HBP  1    200.36 12765 826.62
## + TEAM_BATTING_HR   1     16.06 12549 827.35
## + TEAM_BASERUN_CS   1      1.83 12563 827.57
## + TEAM_BATTING_SO   1      1.67 12563 827.57
## - TEAM_FIELDING_DP  1    628.95 13194 832.93
## - TEAM_PITCHING_HR  1    853.54 13418 836.15
## - TEAM_PITCHING_SO  1   1316.68 13882 842.63
## - TEAM_FIELDING_E   1   1333.15 13898 842.86
## 
## Step:  AIC=824.44
## TARGET_WINS ~ TEAM_BATTING_H + TEAM_BATTING_3B + TEAM_BATTING_BB + 
##     TEAM_BASERUN_SB + TEAM_BATTING_HBP + TEAM_PITCHING_H + TEAM_PITCHING_HR + 
##     TEAM_PITCHING_BB + TEAM_PITCHING_SO + TEAM_FIELDING_E + TEAM_FIELDING_DP
## 
##                    Df Sum of Sq   RSS    AIC
## - TEAM_PITCHING_H   1     84.47 12705 823.71
## - TEAM_BATTING_H    1     87.79 12708 823.76
## - TEAM_BATTING_BB   1     98.92 12719 823.93
## - TEAM_PITCHING_BB  1    101.48 12722 823.97
## - TEAM_BASERUN_SB   1    109.27 12730 824.09
## <none>                          12620 824.44
## - TEAM_BATTING_3B   1    147.01 12767 824.65
## - TEAM_BATTING_HBP  1    204.39 12825 825.51
## + TEAM_BATTING_2B   1     55.48 12565 825.60
## + TEAM_BATTING_HR   1     18.48 12602 826.16
## + TEAM_BASERUN_CS   1      1.38 12619 826.42
## + TEAM_BATTING_SO   1      0.55 12620 826.43
## - TEAM_FIELDING_DP  1    649.12 13269 832.02
## - TEAM_PITCHING_HR  1    812.92 13433 834.36
## - TEAM_PITCHING_SO  1   1262.90 13883 840.66
## - TEAM_FIELDING_E   1   1379.34 14000 842.25
## 
## Step:  AIC=823.71
## TARGET_WINS ~ TEAM_BATTING_H + TEAM_BATTING_3B + TEAM_BATTING_BB + 
##     TEAM_BASERUN_SB + TEAM_BATTING_HBP + TEAM_PITCHING_HR + TEAM_PITCHING_BB + 
##     TEAM_PITCHING_SO + TEAM_FIELDING_E + TEAM_FIELDING_DP
## 
##                    Df Sum of Sq   RSS    AIC
## - TEAM_BATTING_BB   1     32.85 12738 822.21
## - TEAM_PITCHING_BB  1     43.42 12748 822.37
## - TEAM_BASERUN_SB   1    105.16 12810 823.29
## <none>                          12705 823.71
## - TEAM_BATTING_3B   1    153.13 12858 824.00
## + TEAM_PITCHING_H   1     84.47 12620 824.44
## - TEAM_BATTING_HBP  1    183.82 12888 824.46
## + TEAM_BATTING_SO   1     62.04 12643 824.78
## + TEAM_BATTING_2B   1     50.69 12654 824.95
## + TEAM_BATTING_HR   1     12.25 12692 825.53
## + TEAM_BASERUN_CS   1      3.11 12702 825.67
## - TEAM_BATTING_H    1    504.11 13209 829.15
## - TEAM_FIELDING_DP  1    602.80 13308 830.57
## - TEAM_PITCHING_HR  1    850.25 13555 834.09
## - TEAM_PITCHING_SO  1   1259.72 13964 839.77
## - TEAM_FIELDING_E   1   1419.39 14124 841.94
## 
## Step:  AIC=822.21
## TARGET_WINS ~ TEAM_BATTING_H + TEAM_BATTING_3B + TEAM_BASERUN_SB + 
##     TEAM_BATTING_HBP + TEAM_PITCHING_HR + TEAM_PITCHING_BB + 
##     TEAM_PITCHING_SO + TEAM_FIELDING_E + TEAM_FIELDING_DP
## 
##                    Df Sum of Sq   RSS    AIC
## - TEAM_BASERUN_SB   1    109.99 12848 821.85
## <none>                          12738 822.21
## - TEAM_BATTING_3B   1    156.45 12894 822.54
## - TEAM_BATTING_HBP  1    186.58 12924 822.98
## + TEAM_BATTING_2B   1     48.63 12689 823.48
## + TEAM_BATTING_BB   1     32.85 12705 823.71
## + TEAM_BATTING_HR   1     22.99 12715 823.86
## + TEAM_PITCHING_H   1     18.40 12719 823.93
## + TEAM_BATTING_SO   1     17.51 12720 823.94
## + TEAM_BASERUN_CS   1      3.86 12734 824.15
## - TEAM_BATTING_H    1    485.67 13223 827.35
## - TEAM_FIELDING_DP  1    623.19 13361 829.33
## - TEAM_PITCHING_HR  1    843.83 13581 832.46
## - TEAM_PITCHING_SO  1   1267.25 14005 838.32
## - TEAM_FIELDING_E   1   1395.02 14133 840.06
## - TEAM_PITCHING_BB  1   2364.81 15102 852.73
## 
## Step:  AIC=821.85
## TARGET_WINS ~ TEAM_BATTING_H + TEAM_BATTING_3B + TEAM_BATTING_HBP + 
##     TEAM_PITCHING_HR + TEAM_PITCHING_BB + TEAM_PITCHING_SO + 
##     TEAM_FIELDING_E + TEAM_FIELDING_DP
## 
##                    Df Sum of Sq   RSS    AIC
## - TEAM_BATTING_3B   1    133.47 12981 821.82
## <none>                          12848 821.85
## + TEAM_BASERUN_SB   1    109.99 12738 822.21
## - TEAM_BATTING_HBP  1    177.11 13025 822.46
## + TEAM_BATTING_BB   1     37.69 12810 823.29
## + TEAM_BATTING_HR   1     30.72 12817 823.39
## + TEAM_BASERUN_CS   1     23.16 12824 823.51
## + TEAM_PITCHING_H   1     22.34 12825 823.52
## + TEAM_BATTING_SO   1     21.53 12826 823.53
## + TEAM_BATTING_2B   1     14.11 12834 823.64
## - TEAM_BATTING_H    1    566.11 13414 828.09
## - TEAM_FIELDING_DP  1    737.46 13585 830.51
## - TEAM_PITCHING_HR  1    756.49 13604 830.78
## - TEAM_PITCHING_SO  1   1257.91 14106 837.69
## - TEAM_FIELDING_E   1   1330.40 14178 838.67
## - TEAM_PITCHING_BB  1   2371.12 15219 852.20
## 
## Step:  AIC=821.82
## TARGET_WINS ~ TEAM_BATTING_H + TEAM_BATTING_HBP + TEAM_PITCHING_HR + 
##     TEAM_PITCHING_BB + TEAM_PITCHING_SO + TEAM_FIELDING_E + TEAM_FIELDING_DP
## 
##                    Df Sum of Sq   RSS    AIC
## <none>                          12981 821.82
## + TEAM_BATTING_3B   1    133.47 12848 821.85
## + TEAM_BASERUN_SB   1     87.02 12894 822.54
## - TEAM_BATTING_HBP  1    228.70 13210 823.16
## + TEAM_BATTING_BB   1     40.42 12941 823.23
## + TEAM_BATTING_HR   1     33.83 12947 823.33
## + TEAM_PITCHING_H   1     23.95 12957 823.47
## + TEAM_BATTING_SO   1     23.13 12958 823.48
## + TEAM_BATTING_2B   1     21.28 12960 823.51
## + TEAM_BASERUN_CS   1      7.07 12974 823.72
## - TEAM_BATTING_H    1    449.87 13431 826.33
## - TEAM_FIELDING_DP  1    813.17 13794 831.43
## - TEAM_PITCHING_HR  1    990.20 13971 833.86
## - TEAM_PITCHING_SO  1   1316.56 14298 838.27
## - TEAM_FIELDING_E   1   1334.60 14316 838.52
## - TEAM_PITCHING_BB  1   2583.00 15564 854.49
summary(stepwise)
## 
## Call:
## lm(formula = TARGET_WINS ~ TEAM_BATTING_H + TEAM_BATTING_HBP + 
##     TEAM_PITCHING_HR + TEAM_PITCHING_BB + TEAM_PITCHING_SO + 
##     TEAM_FIELDING_E + TEAM_FIELDING_DP, data = training)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -20.2248  -5.6294  -0.0212   5.0439  21.3065 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      60.95454   19.10292   3.191 0.001670 ** 
## TEAM_BATTING_H    0.02541    0.01009   2.518 0.012648 *  
## TEAM_BATTING_HBP  0.08712    0.04852   1.796 0.074211 .  
## TEAM_PITCHING_HR  0.08945    0.02394   3.736 0.000249 ***
## TEAM_PITCHING_BB  0.05672    0.00940   6.034 8.66e-09 ***
## TEAM_PITCHING_SO -0.03136    0.00728  -4.308 2.68e-05 ***
## TEAM_FIELDING_E  -0.17218    0.03970  -4.338 2.38e-05 ***
## TEAM_FIELDING_DP -0.11904    0.03516  -3.386 0.000869 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.422 on 183 degrees of freedom
##   (2085 observations deleted due to missingness)
## Multiple R-squared:  0.5345, Adjusted R-squared:  0.5167 
## F-statistic: 30.02 on 7 and 183 DF,  p-value: < 2.2e-16

The stepwise search stops at the model with the lowest AIC (821.82), which retains seven predictors; by this criterion it is the preferred model for the analysis.
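The AIC values printed by `step()` can be reproduced, up to rounding of the printed RSS, with the formula that `extractAIC()` applies to linear models: n * log(RSS / n) + 2 * p, where p counts the estimated coefficients including the intercept. The sketch below plugs in the RSS values from the first and last steps of the trace. The helper name `step_aic` and the toy cross-check are illustrative additions.

```r
# Hypothetical helper: AIC as used by step() for lm fits
step_aic <- function(rss, n, p) n * log(rss / n) + 2 * p

n_used <- 2276 - 2085   # rows left after dropping incomplete observations

aic_full  <- step_aic(12546, n_used, 16)  # full model: 15 predictors + intercept
aic_final <- step_aic(12981, n_used, 8)   # final model: 7 predictors + intercept
round(c(full = aic_full, final = aic_final), 2)

# Cross-check the formula against extractAIC() on a made-up fit
set.seed(1)
d <- data.frame(x = rnorm(50))
d$y <- 1 + 2 * d$x + rnorm(50)
toy_fit <- lm(y ~ x, data = d)
all.equal(extractAIC(toy_fit)[2],
          step_aic(sum(residuals(toy_fit)^2), nobs(toy_fit), 2))
```

The penalty term 2 * p is why dropping a predictor that barely reduces the RSS lowers the AIC: the model gives up little fit and saves two units of penalty.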

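# fit2 is a data frame (not a fitted model): the variables kept by stepwise selection plus the response
fit2 <- training[, c("TEAM_BATTING_H", "TEAM_PITCHING_HR" , "TEAM_PITCHING_BB", "TEAM_PITCHING_SO", "TEAM_FIELDING_E", "TEAM_FIELDING_DP", "TARGET_WINS", "TEAM_BATTING_HBP")]

# plot() on a data frame draws a scatterplot matrix, which sets its own panel layout
plot(fit2)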

fit3 <- lm(TARGET_WINS ~. -TEAM_BATTING_HBP, data = fit2)


summary(fit3)
## 
## Call:
## lm(formula = TARGET_WINS ~ . - TEAM_BATTING_HBP, data = fit2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -20.8415  -6.0133  -0.0886   5.3245  22.1650 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      63.466852  19.166412   3.311 0.001117 ** 
## TEAM_BATTING_H    0.025806   0.010150   2.543 0.011829 *  
## TEAM_PITCHING_HR  0.091740   0.024051   3.814 0.000186 ***
## TEAM_PITCHING_BB  0.056080   0.009450   5.935 1.44e-08 ***
## TEAM_PITCHING_SO -0.028885   0.007191  -4.017 8.59e-05 ***
## TEAM_FIELDING_E  -0.173892   0.039923  -4.356 2.20e-05 ***
## TEAM_FIELDING_DP -0.121696   0.035340  -3.444 0.000711 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.473 on 184 degrees of freedom
##   (2085 observations deleted due to missingness)
## Multiple R-squared:  0.5263, Adjusted R-squared:  0.5109 
## F-statistic: 34.07 on 6 and 184 DF,  p-value: < 2.2e-16
par(mfrow=c(2,2))
plot(fit3)
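The summary of `fit3` still reports 2085 observations deleted due to missingness, even though TEAM_BATTING_HBP is no longer a predictor. This is consistent with the formula `. - TEAM_BATTING_HBP` leaving the column in the model frame, where its missing values still cause rows to be dropped. The toy sketch below (made-up data; `fit_minus` and `fit_drop` are illustrative names) contrasts subtracting a variable in the formula with removing the column from the data before fitting.

```r
# Made-up data: z is mostly missing, x and y are complete
toy_z <- data.frame(
  y = c(3, 5, 4, 8, 9, 7, 12, 11),
  x = c(1, 2, 3, 4, 5, 3, 6, 6),
  z = c(NA, 1, NA, 2, NA, 3, NA, 1)
)

# Subtracting z in the formula: z is not a predictor, but rows where z is NA are still dropped
fit_minus <- lm(y ~ . - z, data = toy_z)

# Removing the column first: all complete rows of y and x are used
fit_drop <- lm(y ~ ., data = toy_z[, setdiff(names(toy_z), "z")])

c(minus_in_formula = nobs(fit_minus), column_removed = nobs(fit_drop))
```

For the real data, the analogous step would be to drop TEAM_BATTING_HBP from `fit2` before fitting. Whether the coefficients change on the larger sample is not something the analysis above checks.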

\(\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \dots + \hat{\beta}_n x_n\)

where \(\hat{y}\) is the predicted value of \(y\), and \(\hat{\beta}_0, \hat{\beta}_1, \dots, \hat{\beta}_n\) are the estimated coefficients. The fitted equation carries no error term; the residual is \(\hat{e} = y - \hat{y}\).

INTERPRETATIONS:

The R Squared:

The adjusted R-squared of the full model, before selection, was 0.5116; after stepwise selection it was 0.5167. TEAM_BATTING_HBP was not significant at the 5% level (p = 0.074) and was removed, giving a final adjusted R-squared of 0.5109. The small change indicates that removing TEAM_BATTING_HBP has no meaningful effect on the fit, and the remaining coefficients barely move. Note that all three models are fitted on only 191 of the 2276 rows, because 2085 observations were dropped for missing values.
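The adjusted R-squared values can be recovered from the multiple R-squared, the number of rows used, and the number of predictors. The sketch below applies the standard formula to the three summaries above; `adj_r2` is an illustrative helper name, and small differences in the last digit come from the rounded R-squared inputs.

```r
# Adjusted R-squared from R-squared, sample size n, and number of predictors p
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

n_used <- 191  # rows left after 2085 of 2276 were dropped for missing values

round(c(
  full     = adj_r2(0.5501, n_used, 15),  # fit1
  stepwise = adj_r2(0.5345, n_used, 7),   # stepwise
  final    = adj_r2(0.5263, n_used, 6)    # fit3
), 3)
```

Because the formula penalises each extra predictor, the 15-predictor model has the highest raw R-squared but not the highest adjusted value.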

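The P-Value:

The overall F-test for the final model has a p-value below 2.2e-16, so the model as a whole is significant, and each of the six retained predictors is significant at the 5% level.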

The least-squares prediction equation is:

\(\hat{y} = 63.4669 + 0.0258\,\mathrm{TEAM\_BATTING\_H} + 0.0917\,\mathrm{TEAM\_PITCHING\_HR} + 0.0561\,\mathrm{TEAM\_PITCHING\_BB} - 0.0289\,\mathrm{TEAM\_PITCHING\_SO} - 0.1739\,\mathrm{TEAM\_FIELDING\_E} - 0.1217\,\mathrm{TEAM\_FIELDING\_DP}\)

Coefficient interpretations: first-order quantitative variables

If TEAM_BATTING_H increases by one unit (one hit) while the other variables are held constant, the predicted mean of TARGET_WINS increases by about 0.0258 wins. The other coefficients are read the same way, with a negative sign indicating a decrease.
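The one-unit interpretation can be checked by evaluating the fitted equation twice. The sketch below wraps the rounded coefficients from the summary of `fit3` in a hypothetical helper, `predict_wins`, and uses the column medians from the summary table purely as illustrative inputs.

```r
# Hypothetical helper: the fitted equation with the rounded fit3 coefficients
predict_wins <- function(bat_h, pit_hr, pit_bb, pit_so, fld_e, fld_dp) {
  63.4669 + 0.0258 * bat_h + 0.0917 * pit_hr + 0.0561 * pit_bb -
    0.0289 * pit_so - 0.1739 * fld_e - 0.1217 * fld_dp
}

# Illustrative inputs: column medians from the summary table
base           <- predict_wins(1454, 107, 536.5, 813.5, 159, 149)
plus_one_hit   <- predict_wins(1455, 107, 536.5, 813.5, 159, 149)
plus_one_error <- predict_wins(1454, 107, 536.5, 813.5, 160, 149)

plus_one_hit - base     # change in predicted wins for one extra hit
plus_one_error - base   # change in predicted wins for one extra fielding error
```

The two differences equal the TEAM_BATTING_H and TEAM_FIELDING_E coefficients, whatever values are chosen for the other inputs, because the model is linear with no interaction terms.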

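An analysis of variance (ANOVA) table is used here to show how much of the variation in TARGET_WINS each predictor accounts for, with terms added sequentially in the order listed. The model contains no interaction terms, so the table does not test interactions.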

anova(fit3, test= "F")
## Analysis of Variance Table
## 
## Response: TARGET_WINS
##                   Df  Sum Sq Mean Sq F value    Pr(>F)    
## TEAM_BATTING_H     1  6158.8  6158.8  85.787 < 2.2e-16 ***
## TEAM_PITCHING_HR   1  1853.7  1853.7  25.820 9.161e-07 ***
## TEAM_PITCHING_BB   1  2573.7  2573.7  35.849 1.095e-08 ***
## TEAM_PITCHING_SO   1  1698.6  1698.6  23.660 2.459e-06 ***
## TEAM_FIELDING_E    1  1541.1  1541.1  21.466 6.800e-06 ***
## TEAM_FIELDING_DP   1   851.3   851.3  11.858 0.0007108 ***
## Residuals        184 13209.8    71.8                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
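The ANOVA table ties back to the model summary: the predictor sums of squares divided by the total give the multiple R-squared, the square root of the residual mean square gives the residual standard error, and each F value is that term's mean square over the residual mean square. A short arithmetic sketch using the printed values (the variable names are illustrative):

```r
# Sequential sums of squares copied from the ANOVA table
ss_terms <- c(
  TEAM_BATTING_H   = 6158.8,
  TEAM_PITCHING_HR = 1853.7,
  TEAM_PITCHING_BB = 2573.7,
  TEAM_PITCHING_SO = 1698.6,
  TEAM_FIELDING_E  = 1541.1,
  TEAM_FIELDING_DP = 851.3
)
ss_resid <- 13209.8
df_resid <- 184

ms_resid <- ss_resid / df_resid                        # residual mean square
r2       <- sum(ss_terms) / (sum(ss_terms) + ss_resid) # multiple R-squared
rse      <- sqrt(ms_resid)                             # residual standard error
f_values <- ss_terms / ms_resid                        # each term has 1 Df

round(c(r2 = r2, rse = rse), 4)
round(f_values, 2)
```

Because the sums of squares are sequential, the rows for individual predictors depend on the order of the terms in the formula; only the total explained sum of squares is order-independent.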

The 95% confidence intervals (2.5% and 97.5% limits) for the coefficients, used to check whether any interval contains zero:

confint(fit3)
##                         2.5 %       97.5 %
## (Intercept)      25.652660685 101.28104402
## TEAM_BATTING_H    0.005781302   0.04583131
## TEAM_PITCHING_HR  0.044287901   0.13919156
## TEAM_PITCHING_BB  0.037436027   0.07472318
## TEAM_PITCHING_SO -0.043071941  -0.01469730
## TEAM_FIELDING_E  -0.252657614  -0.09512704
## TEAM_FIELDING_DP -0.191420320  -0.05197243
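None of the intervals contains zero, which agrees with the p-values in the summary of `fit3`. For a linear model each interval is the estimate plus or minus a t quantile times the standard error. The sketch below reproduces the TEAM_BATTING_H interval from the printed estimate and standard error; the variable names are illustrative, and the result matches only up to rounding of the inputs.

```r
est <- 0.025806   # TEAM_BATTING_H estimate from summary(fit3)
se  <- 0.010150   # its standard error
df  <- 184        # residual degrees of freedom

# 95% interval: estimate +/- t quantile * standard error
ci <- est + c(-1, 1) * qt(0.975, df) * se
names(ci) <- c("2.5 %", "97.5 %")
ci

# An interval that excludes zero corresponds to p < 0.05 for that coefficient
ci[["2.5 %"]] > 0 || ci[["97.5 %"]] < 0
```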