History of Baseball Data Set

Original Source: https://www.kaggle.com/kaggle/the-history-of-baseball/downloads/baseball_2016-03-08-22-23-12.zip

My Source for the player.csv: https://github.com/kylegilde/cuny-r-programming/blob/master/player.csv
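
The console output that follows (a `head()` preview, then a `str()` summary carrying an `na.action` attribute of class `omit`) suggests the file was read with strings as factors and then passed through `na.omit()`. A minimal sketch of that kind of load is given here. The helper name `load_players`, the raw-file URL, and the use of base `read.csv()` rather than RCurl are assumptions and are not taken from the original analysis.

```r
# Hypothetical helper (not from the original analysis): read a player CSV
# and drop every row that has at least one missing value.
load_players <- function(...) {
  # Strings become factors, matching the Factor columns in the str() output.
  players <- read.csv(..., stringsAsFactors = TRUE)
  # na.omit() records the dropped rows in an "na.action" attribute.
  na.omit(players)
}

# Assumed raw-file form of the GitHub link above.
player_url <- "https://raw.githubusercontent.com/kylegilde/cuny-r-programming/master/player.csv"

# Uncomment to fetch the real file (needs network access):
# players <- load_players(player_url)
# head(players)
# str(players)

# Offline check with a made-up three-row stand-in; row p2 lacks a weight.
demo <- load_players(text = "player_id,weight,height\np1,190,75\np2,,72\np3,170,71")
print(demo)
```

Passing the arguments straight through to `read.csv()` lets the same helper accept a URL, a local path, or inline `text =`, which keeps the offline check independent of the download.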

## Loading required package: RCurl
## Loading required package: bitops
##    player_id birth_year birth_month birth_day birth_country birth_state
## 3  aaronto01       1939           8         5           USA          AL
## 7  abadijo01       1854          11         4           USA          PA
## 8  abbated01       1877           4        15           USA          PA
## 9  abbeybe01       1869          11        11           USA          VT
## 10 abbeych01       1866          10        14           USA          NE
## 11 abbotda01       1862           3        16           USA          OH
##      birth_city death_year death_month death_day death_country death_state
## 3        Mobile       1984           8        16           USA          GA
## 7  Philadelphia       1905           5        17           USA          NJ
## 8       Latrobe       1957           1         6           USA          FL
## 9         Essex       1962           6        11           USA          VT
## 10   Falls City       1926           4        27           USA          CA
## 11      Portage       1930           2        13           USA          MI
##         death_city name_first   name_last       name_given weight height
## 3          Atlanta     Tommie       Aaron       Tommie Lee    190     75
## 7        Pemberton       John      Abadie          John W.    192     72
## 8  Fort Lauderdale         Ed Abbaticchio     Edward James    170     71
## 9       Colchester       Bert       Abbey        Bert Wood    175     71
## 10   San Francisco    Charlie       Abbey       Charles S.    169     68
## 11     Ottawa Lake        Dan      Abbott Leander Franklin    190     71
##    bats throws      debut final_game retro_id  bbref_id
## 3     R      R 1962-04-10 1971-09-26 aarot101 aaronto01
## 7     R      R 1875-04-26 1875-06-10 abadj101 abadijo01
## 8     R      R 1897-09-04 1910-09-15 abbae101 abbated01
## 9     R      R 1892-06-14 1896-09-23 abbeb101 abbeybe01
## 10    L      L 1893-08-16 1897-08-19 abbec101 abbeych01
## 11    R      R 1890-04-19 1890-05-23 abbod101 abbotda01
## 'data.frame':    8392 obs. of  24 variables:
##  $ player_id    : Factor w/ 18846 levels "aardsda01","aaronha01",..: 3 7 8 9 10 11 12 18 20 23 ...
##  $ birth_year   : num  1939 1854 1877 1869 1866 ...
##  $ birth_month  : num  8 11 4 11 10 3 10 9 7 1 ...
##  $ birth_day    : num  5 4 15 11 14 16 22 5 31 30 ...
##  $ birth_country: Factor w/ 53 levels "","Afghanistan",..: 50 50 50 50 50 50 50 50 50 50 ...
##  $ birth_state  : Factor w/ 246 levels "","AB","Aberdeen",..: 6 173 173 229 148 165 165 173 165 30 ...
##  $ birth_city   : Factor w/ 4714 levels "","Aberdeen",..: 2718 3279 2291 1337 1382 3380 4347 2896 838 4223 ...
##  $ death_year   : num  1984 1905 1957 1962 1926 ...
##  $ death_month  : num  8 5 1 6 4 2 6 4 5 2 ...
##  $ death_day    : num  16 17 6 11 27 13 11 13 20 19 ...
##  $ death_country: Factor w/ 24 levels "","American Samoa",..: 22 22 22 22 22 22 22 22 22 22 ...
##  $ death_state  : Factor w/ 93 levels "","AB","AK","AL",..: 26 57 25 88 12 44 12 21 63 12 ...
##  $ death_city   : Factor w/ 2554 levels "","Aberdeen",..: 91 1735 758 462 1984 1672 1259 2380 813 2548 ...
##  $ name_first   : Factor w/ 2313 levels "","A. J.","Aaron",..: 2087 1148 649 158 335 482 806 1575 26 168 ...
##  $ name_last    : Factor w/ 9713 levels "Aardsma","Aaron",..: 2 5 6 7 7 8 8 8 9 11 ...
##  $ name_given   : Factor w/ 12437 levels "","A. Harry",..: 11411 6605 3201 954 1714 7564 4885 8907 177 12058 ...
##  $ weight       : num  190 192 170 175 169 190 180 180 195 190 ...
##  $ height       : num  75 72 71 71 68 71 70 74 74 70 ...
##  $ bats         : Factor w/ 4 levels "","B","L","R": 4 4 4 4 3 4 4 4 3 4 ...
##  $ throws       : Factor w/ 3 levels "","L","R": 3 3 3 3 2 3 3 3 2 3 ...
##  $ debut        : Factor w/ 10037 levels "","1871-05-04",..: 5145 106 1132 875 934 722 1433 1947 4531 4647 ...
##  $ final_game   : Factor w/ 9029 levels "","1871-05-05",..: 5560 103 1851 1078 1110 684 1523 1868 4762 4497 ...
##  $ retro_id     : Factor w/ 18793 levels "","aardd001",..: 4 8 9 10 11 12 13 19 21 23 ...
##  $ bbref_id     : Factor w/ 18846 levels "","aardsda01",..: 4 8 9 10 11 12 13 19 21 24 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:10454] 1 2 4 5 6 13 14 15 16 17 ...
##   .. ..- attr(*, "names")= chr [1:10454] "1" "2" "4" "5" ...

Base R Plots
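
A minimal sketch of base R plots for this kind of data, assuming a data frame with the `height`, `weight`, and `bats` columns listed in the `str()` output. The six-row `players` sample is copied from the `head()` preview so the example runs without the full file. The choice of plots is an assumption and not necessarily what the original analysis drew.

```r
# Six-row sample copied from the head() preview above.
players <- data.frame(
  height = c(75, 72, 71, 71, 68, 71),
  weight = c(190, 192, 170, 175, 169, 190),
  bats   = factor(c("R", "R", "R", "R", "L", "R"))
)

# Distribution of a single variable.
hist(players$height, main = "Player height", xlab = "Height")

# Relationship between two variables, with a least-squares line on top.
plot(weight ~ height, data = players,
     main = "Weight vs. height", xlab = "Height", ylab = "Weight")
fit <- lm(weight ~ height, data = players)
abline(fit, col = "red")

# Spread of weight within each batting hand.
boxplot(weight ~ bats, data = players, xlab = "Bats", ylab = "Weight")
```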

ggplot2

## Loading required package: ggplot2
## Loading required package: ggthemes
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
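
The `stat_bin()` message above is what ggplot2 prints when a histogram is drawn without an explicit bin setting. A minimal sketch of comparable ggplot2 plots follows, using the six rows from the `head()` preview. The plot choices, the `binwidth` value, and the object names `p_hist` and `p_scatter` are assumptions for illustration.

```r
library(ggplot2)

# Six-row sample copied from the head() preview above.
players <- data.frame(
  height = c(75, 72, 71, 71, 68, 71),
  weight = c(190, 192, 170, 175, 169, 190),
  bats   = factor(c("R", "R", "R", "R", "L", "R"))
)

# Histogram with an explicit binwidth, which avoids the
# "`stat_bin()` using `bins = 30`" message.
p_hist <- ggplot(players, aes(x = height)) +
  geom_histogram(binwidth = 1)

# Scatter plot of weight against height, one panel per batting hand.
p_scatter <- ggplot(players, aes(x = height, y = weight)) +
  geom_point() +
  facet_wrap(~ bats)

print(p_hist)
print(p_scatter)
```

Faceting by `bats` gives each batting hand its own panel, so the height and weight relationship can be compared across groups at a glance.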

Initial Insights

There appears to be a mild positive correlation between a player’s height and weight. The relationship persists whether the player bats right-handed, left-handed, or from both sides.
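
One way to put a number on this observation is to compute the Pearson correlation overall and within each batting group. This is a minimal sketch that assumes the `height`, `weight`, and `bats` columns from the `str()` output and reuses the six rows from the `head()` preview. Six rows are far too few to support any conclusion, so on the full data the sample would be replaced by the loaded data frame. The names `overall` and `by_hand` are hypothetical.

```r
# Six-row sample copied from the head() preview above.
players <- data.frame(
  height = c(75, 72, 71, 71, 68, 71),
  weight = c(190, 192, 170, 175, 169, 190),
  bats   = factor(c("R", "R", "R", "R", "L", "R"))
)

# Pearson correlation across all players.
overall <- cor(players$height, players$weight)

# Same correlation within each batting hand; groups with fewer than
# three players return NA because the estimate would be meaningless.
by_hand <- sapply(split(players, players$bats), function(d) {
  if (nrow(d) > 2) cor(d$height, d$weight) else NA_real_
})

print(overall)
print(by_hand)
```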