The data presented in this document was scraped from the site https://mykbostats.com/players/ and saved to a CSV file beforehand. The CSV file, named “KBO_Batters.csv”, can be found in the following GitHub repo: https://github.com/Ryungje/DATA607/tree/main/Assignment%201
I initially collected this data for a personal project in which I wanted to build a predictive model of KBO (Korea Baseball Organization) player performance. Though the project did not get very far, the data is still very useful.
This is a large data set: each row is one player-season, described by 28 columns.
batters <- read.csv("KBO_Batters.csv")
head(batters)
## Name Year Team BA OBP SLG OPS G PA AB R H X2B
## 1 Ko Young-min 2016 Doosan Bears 0.250 0.400 0.500 0.900 8 5 4 1 1 1
## 2 Ko Young-min 2015 Doosan Bears 0.328 0.403 0.478 0.881 41 77 67 13 22 1
## 3 Ko Young-min 2014 Doosan Bears 0.287 0.355 0.340 0.695 52 108 94 18 27 2
## 4 Ko Young-min 2013 Doosan Bears 0.286 0.412 0.571 0.983 10 17 14 3 4 1
## 5 Ko Young-min 2012 Doosan Bears 0.265 0.335 0.404 0.739 58 173 151 33 40 10
## 6 Ko Young-min 2011 Doosan Bears 0.210 0.305 0.301 0.606 93 208 176 31 37 5
## X3B HR RBI SB CS BB SO TB GDP HBP SH SF IBB RISP PHBA
## 1 0 0 1 0 0 1 1 2 0 0 0 0 0 0.50 0.333
## 2 0 3 11 4 2 6 22 32 4 3 0 1 0 0.24 0.444
## 3 0 1 7 1 1 11 18 32 3 0 1 2 0 0.20 0.294
## 4 0 1 1 1 0 3 5 8 1 0 0 1 0 0.25 0.333
## 5 1 3 26 7 1 13 28 61 5 3 NA NA NA NA NA
## 6 1 3 16 6 6 18 50 53 4 7 NA NA NA NA NA
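The size of a data frame can be checked directly with `dim()`, which returns the row count followed by the column count. The following is a minimal sketch only: `demo` is a hypothetical two-row stand-in built from values in the output above, so that it runs without the CSV file. On the real data the call would be `dim(batters)`.

```r
# Hypothetical stand-in for the full data: two rows and six of the
# columns, copied from the head(batters) output above.
demo <- data.frame(
  Name = c("Ko Young-min", "Ko Young-min"),
  Year = c(2016, 2015),
  Team = c("Doosan Bears", "Doosan Bears"),
  OBP  = c(0.400, 0.403),
  SLG  = c(0.500, 0.478),
  OPS  = c(0.900, 0.881)
)

# dim() returns c(number of rows, number of columns)
demo_dim <- dim(demo)
print(demo_dim)
```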
For our current purposes, we only care about each player's Team, Year, On-Base Percentage (OBP), Slugging Average (SLG), and On-base Plus Slugging (OPS).
batters <- batters[c('Name', 'Team', 'Year', 'OBP', 'SLG', 'OPS')]
head(batters)
## Name Team Year OBP SLG OPS
## 1 Ko Young-min Doosan Bears 2016 0.400 0.500 0.900
## 2 Ko Young-min Doosan Bears 2015 0.403 0.478 0.881
## 3 Ko Young-min Doosan Bears 2014 0.355 0.340 0.695
## 4 Ko Young-min Doosan Bears 2013 0.412 0.571 0.983
## 5 Ko Young-min Doosan Bears 2012 0.335 0.404 0.739
## 6 Ko Young-min Doosan Bears 2011 0.305 0.301 0.606
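Since OPS is defined as the sum of OBP and SLG, the three retained columns can be checked against each other. The sketch below is an illustration under assumptions: `sample_rows` is a hypothetical stand-in holding the six rows printed above, and the tolerance is my own choice to absorb rounding in three-decimal published figures.

```r
# Stand-in data: the six rows shown in the output above.
sample_rows <- data.frame(
  Year = c(2016, 2015, 2014, 2013, 2012, 2011),
  OBP  = c(0.400, 0.403, 0.355, 0.412, 0.335, 0.305),
  SLG  = c(0.500, 0.478, 0.340, 0.571, 0.404, 0.301),
  OPS  = c(0.900, 0.881, 0.695, 0.983, 0.739, 0.606)
)

# Recompute OPS from its two components.
recomputed <- sample_rows$OBP + sample_rows$SLG

# Assumed tolerance: rounded OBP and SLG may not sum exactly to the
# rounded OPS, so allow a small difference.
ops_consistent <- abs(recomputed - sample_rows$OPS) < 0.0015
print(all(ops_consistent))
```

On the full data, the same comparison would use `batters$OBP + batters$SLG` against `batters$OPS`, with `NA` values handled separately.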
As the simplest approach, I find the single player-season (disregarding year and team) with the highest OPS.
batters[which.max(batters$OPS),]
## Name Team Year OBP SLG OPS
## 2984 Heo Joon Hyundai Unicorns 2006 1 4 5
There are many, often quite complicated, ways to measure a baseball player’s performance. The method presented here is trivial and does not take into account nearly all the data that is available per player. With that said, by highest single-season OPS, the best player in the KBO is Heo Joon.
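One caveat about this result: an OBP of 1 together with an SLG of 4 is what a very small sample (for example, a single home run in one at-bat) would produce, so the top OPS may not reflect sustained performance. A common remedy is to require a minimum number of plate appearances (PA) before ranking. The sketch below is an illustration, not part of the original analysis: `seasons` is stand-in data copied from the six Ko Young-min rows in the first output, and the 100-PA cutoff (`min_pa`) is an arbitrary assumption.

```r
# Stand-in data: PA and OPS values copied from the first head(batters) output.
seasons <- data.frame(
  Name = rep("Ko Young-min", 6),
  Year = c(2016, 2015, 2014, 2013, 2012, 2011),
  PA   = c(5, 77, 108, 17, 173, 208),
  OPS  = c(0.900, 0.881, 0.695, 0.983, 0.739, 0.606)
)

# Without a filter, the highest OPS comes from a 17-PA season.
unfiltered_best <- seasons[which.max(seasons$OPS), ]

# Assumed cutoff; any threshold here is a judgment call.
min_pa <- 100
qualified <- seasons[seasons$PA >= min_pa, ]

# Highest OPS among seasons that meet the cutoff.
filtered_best <- qualified[which.max(qualified$OPS), ]

print(unfiltered_best)
print(filtered_best)
```

Applying the same idea to the full data would require keeping the `PA` column when selecting columns, since the subset used above drops it.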