Introduction:

The purpose of this analysis was to identify whether there is any relationship between the number of games an MLB hitter plays and their on-base percentage, runs batted in, and league (NL or AL). The data come from Sean Lahman's baseball database, which compiles batting statistics for every MLB hitter since 1871 and is updated after each season.

Packages Used

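library(tidyverse)  # read_csv(), dplyr verbs, ggplot2
library(Zelig)      # loaded but not used below
library(pander)     # loaded but not used below
library(texreg)     # htmlreg() regression table
library(visreg)     # visreg() model plots
library(lmtest)     # loaded but not used below
library(sjmisc)     # loaded but not used below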

Data Used

# Batting.csv from the Baseball Databank 2017.1 release; adjust the path to your local copy
Baseball_Batting_Stats <- read_csv("/Users/andyr1017/Downloads/baseballdatabank-2017.1 4/core/Batting.csv")
head(Baseball_Batting_Stats)

Piping Code

Below is the code needed to reduce the dataset to the columns used in this analysis and to restrict it to the 2016 season. I created the variable “On_Base_Percentage” from the variables “Hits,” “Base_On_Balls,” and “Stirke_Out” to measure how often a hitter reached base safely via a walk or hit, relative to those outcomes plus strikeouts. Note that this is a simplified proxy rather than the official on-base percentage, (H + BB + HBP) / (AB + BB + HBP + SF), because outs on balls in play are left out of its denominator. The variable “Games_Played” serves as the dependent variable (a count treated as continuous), analyzed against On_Base_Percentage, Runs_Batted_In, and League.
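Baseball_Batting_Stats1 <- Baseball_Batting_Stats %>%
  # Give the Lahman column codes descriptive names
  # (Y2016 and Team_2016 still hold every season until the filter below)
  rename(Y2016 = yearID,
         Team_2016 = teamID,
         Games_Played = G,
         At_Bat = AB,
         Hits = H,
         Home_Run = HR,
         Runs_Batted_In = RBI,
         Base_On_Balls = BB,
         Intentional_Base_On_Balls = IBB,
         Stirke_Out = SO,
         League = lgID) %>%
  # Keep only the columns used below
  select(Y2016,
         Team_2016,
         Games_Played,
         At_Bat,
         Hits,
         Home_Run,
         Runs_Batted_In,
         Base_On_Balls,
         Intentional_Base_On_Balls,
         Stirke_Out,
         League) %>%
  # Simplified on-base proxy: (H + BB) / (H + BB + SO)
  # The Home_Run ratios are Inf or NaN for hitters with no home runs
  mutate(On_Base_Percentage = (Hits + Base_On_Balls) / (Hits + Base_On_Balls + Stirke_Out),
         Home_Run_Stirke_Out = Stirke_Out / Home_Run,
         Home_Run_Base_On_Balls = Base_On_Balls / Home_Run,
         Home_Run_Hits = Hits / Home_Run) %>%
  # Keep 2016 only; On_Base_Percentage >= 0 also drops rows where H + BB + SO = 0
  # (0/0 gives NaN). Each row is a player-team stint, so traded players can appear more than once.
  filter(On_Base_Percentage >= 0, Y2016 == 2016)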
  
head(Baseball_Batting_Stats1)
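To make the gap between the proxy and the official statistic concrete, the sketch below computes both for a single stat line. The numbers are hypothetical (including hit-by-pitch and sacrifice flies, which this analysis does not use), and the helper names obp_proxy() and obp_official() are illustrative, not from any package.

```r
# Simplified proxy used in this analysis: (H + BB) / (H + BB + SO)
obp_proxy <- function(h, bb, so) (h + bb) / (h + bb + so)

# Official on-base percentage: (H + BB + HBP) / (AB + BB + HBP + SF)
obp_official <- function(h, bb, hbp, ab, sf) (h + bb + hbp) / (ab + bb + hbp + sf)

# Hypothetical season: 550 AB, 150 H, 60 BB, 120 SO, 5 HBP, 5 SF
proxy    <- obp_proxy(h = 150, bb = 60, so = 120)
official <- obp_official(h = 150, bb = 60, hbp = 5, ab = 550, sf = 5)

round(c(proxy = proxy, official = official), 3)
```

For this made-up hitter the proxy is 210/330 (about 0.636) while the official figure is 215/620 (about 0.347), so the proxy runs much higher because outs on balls in play never enter its denominator.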

Linear Regression Models


Model1

In this model, I used Games_Played as the dependent variable and On_Base_Percentage as the only predictor. On_Base_Percentage is highly significant (p < 2e-16): each 0.1 increase in the proxy is associated with about 9.2 additional games played. However, it explains only about 21% of the variation in games played (R-squared = 0.2134), so most of the variation comes from other factors.
Model1 <- lm(Games_Played ~ On_Base_Percentage, Baseball_Batting_Stats1)
summary(Model1)

Call:
lm(formula = Games_Played ~ On_Base_Percentage, data = Baseball_Batting_Stats1)

Residuals:
     Min       1Q   Median       3Q      Max 
-107.689  -32.665   -8.657   31.966   98.238 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)          16.657      2.912   5.721 1.41e-08 ***
On_Base_Percentage   92.032      5.619  16.378  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 44.1 on 989 degrees of freedom
Multiple R-squared:  0.2134,    Adjusted R-squared:  0.2126 
F-statistic: 268.2 on 1 and 989 DF,  p-value: < 2.2e-16
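The slope is easier to read as a change in games. The snippet below plugs two illustrative proxy values into the Model 1 equation using the rounded coefficients printed above; it is a back-of-the-envelope reading of the fitted line, not a new fit, and fitted_games() is a hypothetical helper.

```r
# Intercept and slope copied from summary(Model1) above
b0 <- 16.657
b1 <- 92.032

# Fitted number of games for a given value of the On_Base_Percentage proxy
fitted_games <- function(obp_proxy) b0 + b1 * obp_proxy

fitted_games(c(0.3, 0.6))  # about 44.3 and 71.9 games
b1 * 0.1                   # change in fitted games per 0.1 increase: about 9.2
```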

Model2

In this model, I again used Games_Played as the dependent variable, this time with both On_Base_Percentage and Runs_Batted_In as predictors. On_Base_Percentage remains significant (p = 3.12e-05), but its coefficient falls from about 92.0 to 12.8 once Runs_Batted_In is included, while R-squared rises from 0.21 to 0.82. Because RBI is a counting statistic that accumulates with playing time, much of this association probably reflects games played driving RBI rather than the reverse.
Model2 <- lm(Games_Played ~ On_Base_Percentage + Runs_Batted_In, Baseball_Batting_Stats1)
summary(Model2)

Call:
lm(formula = Games_Played ~ On_Base_Percentage + Runs_Batted_In, 
    data = Baseball_Batting_Stats1)

Residuals:
    Min      1Q  Median      3Q     Max 
-71.452 -15.012  -4.133  10.651  84.765 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)        21.01246    1.41320  14.869  < 2e-16 ***
On_Base_Percentage 12.80363    3.06002   4.184 3.12e-05 ***
Runs_Batted_In      1.51120    0.02662  56.770  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 21.37 on 988 degrees of freedom
Multiple R-squared:  0.8154,    Adjusted R-squared:  0.8151 
F-statistic:  2182 on 2 and 988 DF,  p-value: < 2.2e-16
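The drop in the On_Base_Percentage coefficient is what one would expect if Runs_Batted_In largely tracks playing time. The toy simulation below is entirely synthetic (not Lahman data): it assumes, purely for illustration, that games played drives both RBI and the on-base proxy, and then shows that adding RBI to the model shrinks the proxy's slope and raises R-squared.

```r
set.seed(1)
n <- 1000

# Hypothetical data-generating process: games drive both statistics
games     <- runif(n, 1, 162)
rbi       <- 0.6 * games + rnorm(n, sd = 5)             # counting stat grows with games
obp_proxy <- 0.3 + 0.001 * games + rnorm(n, sd = 0.05)  # rate stat with a noisier link to games

m1 <- lm(games ~ obp_proxy)
m2 <- lm(games ~ obp_proxy + rbi)

# The obp_proxy slope shrinks sharply once rbi is in the model,
# and R-squared jumps, echoing the Model 1 -> Model 2 pattern
c(m1 = coef(m1)[["obp_proxy"]], m2 = coef(m2)[["obp_proxy"]])
c(m1 = summary(m1)$r.squared, m2 = summary(m2)$r.squared)
```

Under this assumed setup neither slope says that getting on base or driving in runs causes more games; both predictors are downstream of playing time.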

Model3

In this model, I again used Games_Played as the dependent variable, with On_Base_Percentage and Runs_Batted_In as predictors, and introduced League to test whether playing in the AL or NL matters. The term Runs_Batted_In * League adds both a League main effect (LeagueNL) and a Runs_Batted_In:LeagueNL interaction, which allows the RBI slope to differ between leagues. Neither term is significant (p = 0.740 and p = 0.107), and adjusted R-squared barely moves (0.8151 to 0.8153), so League adds little beyond On_Base_Percentage and Runs_Batted_In in explaining games played.
Model3 <- lm(Games_Played ~ On_Base_Percentage + Runs_Batted_In * League, Baseball_Batting_Stats1)
summary(Model3)

Call:
lm(formula = Games_Played ~ On_Base_Percentage + Runs_Batted_In * 
    League, data = Baseball_Batting_Stats1)

Residuals:
    Min      1Q  Median      3Q     Max 
-76.277 -15.099  -4.185  10.953  84.595 

Coefficients:
                        Estimate Std. Error t value Pr(>|t|)    
(Intercept)             21.39029    1.82515  11.720  < 2e-16 ***
On_Base_Percentage      12.55268    3.08042   4.075 4.97e-05 ***
Runs_Batted_In           1.47596    0.03495  42.228  < 2e-16 ***
LeagueNL                -0.57201    1.72331  -0.332    0.740    
Runs_Batted_In:LeagueNL  0.07719    0.04782   1.614    0.107    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 21.36 on 986 degrees of freedom
Multiple R-squared:  0.816, Adjusted R-squared:  0.8153 
F-statistic:  1093 on 4 and 986 DF,  p-value: < 2.2e-16
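The two League terms can also be tested jointly rather than one coefficient at a time. In R that is anova(Model2, Model3), a partial F-test comparing the nested models; its result is not reported in this analysis, so the sketch below only shows the mechanics on the built-in mtcars data as a stand-in. The helper partial_f() is hypothetical and simply recomputes by hand what anova() reports.

```r
# Partial F-test for nested linear models, computed by hand
partial_f <- function(reduced, full) {
  rss_r <- sum(residuals(reduced)^2)
  rss_f <- sum(residuals(full)^2)
  df_r  <- df.residual(reduced)
  df_f  <- df.residual(full)
  f     <- ((rss_r - rss_f) / (df_r - df_f)) / (rss_f / df_f)
  c(F = f, df1 = df_r - df_f, df2 = df_f,
    p = pf(f, df_r - df_f, df_f, lower.tail = FALSE))
}

# Stand-in example: add a factor plus its interaction with a slope,
# mirroring how League and Runs_Batted_In:League enter Model3
reduced <- lm(mpg ~ hp + wt, data = mtcars)
full    <- lm(mpg ~ hp + wt * factor(am), data = mtcars)

partial_f(reduced, full)
anova(reduced, full)  # same F statistic and p-value
```

A joint test is useful here because two individually non-significant terms can occasionally be significant together; the p-values of 0.740 and 0.107 alone do not rule that out.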

Tables and Graphics

library(texreg)
htmlreg(list(Model1, Model2, Model3))
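Statistical models
========================================================
                         Model 1    Model 2    Model 3
--------------------------------------------------------
(Intercept)              16.66 ***  21.01 ***  21.39 ***
                         (2.91)     (1.41)     (1.83)
On_Base_Percentage       92.03 ***  12.80 ***  12.55 ***
                         (5.62)     (3.06)     (3.08)
Runs_Batted_In                      1.51 ***   1.48 ***
                                    (0.03)     (0.03)
LeagueNL                                       -0.57
                                               (1.72)
Runs_Batted_In:LeagueNL                        0.08
                                               (0.05)
--------------------------------------------------------
R^2                      0.21       0.82       0.82
Adj. R^2                 0.21       0.82       0.82
Num. obs.                991        991        991
RMSE                     44.10      21.37      21.36
========================================================
*** p < 0.001, ** p < 0.01, * p < 0.05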
# xvar must be a predictor, not the response; plot the RBI slope separately for each league
visreg(Model3, "Runs_Batted_In", by = "League", overlay = TRUE)

ggplot(data = Baseball_Batting_Stats1) +
  geom_point(mapping = aes(x = Runs_Batted_In, y = Games_Played))

Conclusion

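Based on this analysis, On_Base_Percentage and Runs_Batted_In are both strongly associated with how many games a hitter played in 2016; together they explain about 82% of the variation. Because RBI accumulates with playing time, these results describe association rather than cause. A hitter's league, by contrast, shows no significant relationship with the number of games played.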
