Data Overview

Baseball Data

          player homeruns years atbat hits runs  rbi walks league86 team86 league87 team87
1 Reggie Jackson      548    20  9528 2510 1509 1659  1342        A    Cal        A    Oak
2   Dave Kingman      442    16  6677 1575  901 1210   608        A    Oak        A    Oak
3  Graig Nettles      384    20  8716 2172 1172 1267  1057        N     SD        N    Atl
4     Tony Perez      379    23  9778 2732 1272 1652   925        N    Cin        N    Cin
5       Jim Rice      351    13  7127 2163 1104 1289   564        A    Bos        A    Bos
6  George Foster      348    18  7023 1925  986 1239   666        N     NY        N     NY

The data set originally had 322 observations and 25 variables; after pre-processing, 12 variables remain. “player” gives the player’s name. “homeruns” is the player’s number of career home runs. “years” is the player’s total number of years in the major leagues. “atbat” is the player’s count of career at-bats, and “hits” is the number of career hits. Similarly, “runs”, “rbi”, and “walks” are the player’s career runs scored, runs batted in, and walks, respectively. Lastly, “league86” and “team86” give the league and team for which the player played during the 1986 season, while “league87” and “team87” give the league and team for the 1987 season.

Ratio of American League in 1986

Approximately 54% of the 322 players in the data set (175) were members of the American League in 1986.

# A tibble: 1 x 1
      n
  <int>
1   175

Ratio of National League in 1986

Approximately 46% of the 322 players in the data set (147) were members of the National League in 1986.

# A tibble: 1 x 1
      n
  <int>
1   147

Ratio of American League in 1987

Approximately 55% of the 322 players in the data set (176) were members of the American League in 1987.

# A tibble: 1 x 1
      n
  <int>
1   176

Ratio of National League in 1987

Approximately 45% of the 322 players in the data set (146) were members of the National League in 1987.

# A tibble: 1 x 1
      n
  <int>
1   146
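
The four percentages above can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the counts printed in the tibbles; the vector names are illustrative labels of my own, not objects from the original analysis.

```r
# Counts copied from the tibbles above; the names are illustrative labels.
league_counts <- c(AL_1986 = 175, NL_1986 = 147, AL_1987 = 176, NL_1987 = 146)
n_players <- 322

# Each season's two counts should account for every player in the data set.
stopifnot(175 + 147 == n_players, 176 + 146 == n_players)

# Percentage share of each league, rounded to the nearest whole percent.
league_share <- round(100 * league_counts / n_players)
print(league_share)
```

The rounded shares come out to 54, 46, 55, and 45, matching the figures quoted in the text.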

Team Consideration

[Figure panels: 1986 and 1987]

Switching Teams between ’86 and ’87 Seasons

Of the 322 players, 275 stayed with the same team for both the ’86 and ’87 seasons, while 47 did not. Approximately 15% of players therefore switched teams between the two seasons. This is in line with typical MLB patterns, since players can be added to or removed from the 25-man roster depending on injury, performance level, contracts, etc. The data give no reason to treat these two seasons as abnormal with regard to roster changes.
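
One way to flag a switch is to compare the two team columns row by row. The sketch below does this for only the six players printed in the Baseball Data preview, so the small data frame is a stand-in for the full data set, and the column name “switched” is my own addition.

```r
# Six rows copied from the data preview; a stand-in for the full data set.
players <- data.frame(
  player = c("Reggie Jackson", "Dave Kingman", "Graig Nettles",
             "Tony Perez", "Jim Rice", "George Foster"),
  team86 = c("Cal", "Oak", "SD", "Cin", "Bos", "NY"),
  team87 = c("Oak", "Oak", "Atl", "Cin", "Bos", "NY"),
  stringsAsFactors = FALSE
)

# A player switched if the 1986 and 1987 team codes differ.
players$switched <- players$team86 != players$team87
n_switched_sample <- sum(players$switched)
print(players[players$switched, c("player", "team86", "team87")])

# Overall switch rate, using the counts reported in the text (47 of 322).
switch_rate <- 47 / 322
print(round(100 * switch_rate, 1))
```

In this six-player sample, Reggie Jackson and Graig Nettles are the two who changed teams. The overall rate of 47 out of 322 works out to about 14.6%, which rounds to the 15% quoted above.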

Relationship between Team and Runs

This plot compares the teams in the MLB on the basis of their players’ career run totals. Each dot represents a player. During the 1986 season, the two players with the highest career run totals played for the Cincinnati Reds and the California Angels. During the 1987 season, however, the California Angels no longer had one of the two highest career run scorers; the Oakland Athletics did instead, while the Cincinnati Reds kept theirs.
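
A plot of this kind can be sketched in base R with stripchart(), which draws one dot per player within each team. The example below uses only the six players shown in the data preview, so it illustrates the idea rather than reproducing the dashboard’s plot; the original may well have been drawn with a different plotting library.

```r
# Six rows copied from the data preview; a stand-in for the full data set.
players <- data.frame(
  player = c("Reggie Jackson", "Dave Kingman", "Graig Nettles",
             "Tony Perez", "Jim Rice", "George Foster"),
  runs   = c(1509, 901, 1172, 1272, 1104, 986),
  team86 = c("Cal", "Oak", "SD", "Cin", "Bos", "NY"),
  team87 = c("Oak", "Oak", "Atl", "Cin", "Bos", "NY"),
  stringsAsFactors = FALSE
)

# One dot per player, grouped by 1986 team.
stripchart(runs ~ team86, data = players, vertical = TRUE, pch = 16,
           xlab = "Team (1986)", ylab = "Career runs")

# The two players with the most career runs in this sample, with both teams.
top2 <- head(players[order(-players$runs), ], 2)
print(top2[, c("player", "runs", "team86", "team87")])
```

Within this sample, the two leaders are Reggie Jackson (Cal in 1986, Oak in 1987) and Tony Perez (Cin in both seasons). Swapping team86 for team87 in the formula gives the 1987 version of the plot.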

Data Table

Model

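To better predict a player’s career home run total, I built a linear regression model with atbat, rbi, runs, hits, and walks as predictors. All five predictors were found to be significant (the intercept was not), and the model has an adjusted R-squared of 0.97. Because lm() fits an ordinary least squares model, the family = binomial argument shown in the call has no effect on the fit.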
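
To make the fitted equation concrete, the coefficient estimates reported in the model summary can be applied by hand to one player. The sketch below copies those estimates and Reggie Jackson’s career totals from the data preview, then re-derives the adjusted R-squared from the reported multiple R-squared. All object names are my own; with the fitted model object available, predict() would be the more usual route.

```r
# Coefficient estimates copied from the model summary.
coefs <- c(intercept = 0.738351, atbat = 0.015221, rbi = 0.431491,
           runs = 0.284017, hits = -0.282260, walks = -0.051369)

# Reggie Jackson's career totals from the data preview (intercept term = 1).
jackson <- c(intercept = 1, atbat = 9528, rbi = 1659,
             runs = 1509, hits = 2510, walks = 1342)

# Predicted career home runs: the sum of coefficient * value over all terms.
predicted_hr <- sum(coefs * jackson[names(coefs)])
residual_hr <- 548 - predicted_hr  # 548 is his observed career total
print(round(c(predicted = predicted_hr, residual = residual_hr), 1))

# Adjusted R-squared from the reported R-squared, n = 322 rows, p = 5 predictors.
r2 <- 0.9744
n <- 322
p <- 5
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(round(adj_r2, 3))
```

The prediction comes out near 513 home runs against an observed 548, a residual of roughly 35 that falls inside the residual range reported in the summary. The adjusted value of about 0.974 agrees with the summary, and n - p - 1 = 316 matches the residual degrees of freedom.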

Call:
lm(formula = homeruns ~ atbat + rbi + runs + hits + walks, data = edit_baseball, 
    family = binomial)

Residuals:
    Min      1Q  Median      3Q     Max 
-62.524  -5.802   0.966   6.802  45.229 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.738351   1.241151   0.595    0.552    
atbat        0.015221   0.003832   3.972 8.83e-05 ***
rbi          0.431491   0.007574  56.973  < 2e-16 ***
runs         0.284017   0.017403  16.320  < 2e-16 ***
hits        -0.282260   0.014799 -19.073  < 2e-16 ***
walks       -0.051369   0.009158  -5.609 4.44e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 13.9 on 316 degrees of freedom
Multiple R-squared:  0.9744,    Adjusted R-squared:  0.974 
F-statistic:  2409 on 5 and 316 DF,  p-value: < 2.2e-16

The following plots demonstrate the individual relationships between career homeruns and each predictor.