1. Introduction

I have loved baseball since I was very young, and I used to play with the dream of becoming a baseball player. These days I enjoy watching Korean players such as Ryu, Hyun Jin (LAD) and Oh, Seung Hwan (STL), who have played well in Major League Baseball (MLB) in the USA.

At the same time, I have started learning data science with the goal of becoming a data scientist, and I have been studying statistics and R programming online through DataCamp (e-learning).

My first project in R analyzes one Korean player who plays an important role as the closer for the St. Louis Cardinals (STL).

I collected pitch data from http://www.brooksbaseball.net/pfxVB/pfx.php, saved it in Excel, and analyzed it from several angles while following “Exploring Pitch Data with R”, a course taught by Brian Mills (ref. https://campus.datacamp.com/courses/exploring-pitch-data-with-r).

I am honored to share this first project, and I hope the analysis can help my hero, Oh, SeungHwan, with his pitching throughout the whole year.

2. Dataset of SeungHwan, Oh (2017, April to May)

library(XML)
library(RCurl)

## Loading required package: bitops

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(tidyr)

## 
## Attaching package: 'tidyr'

## The following object is masked from 'package:RCurl':
## 
##     complete

library(xlsx)

## Loading required package: rJava

## 
## Attaching package: 'rJava'

## The following object is masked from 'package:RCurl':
## 
##     clone

## Loading required package: xlsxjars

library(ggplot2)
library(RColorBrewer)
library(gridExtra)

## 
## Attaching package: 'gridExtra'

## The following object is masked from 'package:dplyr':
## 
##     combine

seunghwan_oh <- read.xlsx(file = "seunghwan_oh.xlsx", sheetIndex = 1)
seunghwan_oh$dateStamp <- as.Date(seunghwan_oh$dateStamp)
seunghwan_oh <- separate(data = seunghwan_oh, col = dateStamp, into = c("year", "month", "day"), sep = "-", remove = FALSE)
seunghwan_oh$month <- as.numeric(seunghwan_oh$month)
seunghwan_oh$april <- ifelse(seunghwan_oh$month == 4, "april", "other")
seunghwan_oh$mlbam_pitch_name <- as.character(seunghwan_oh$mlbam_pitch_name)

The dataset, called “seunghwan_oh”, records 50 variables for each pitch, so it can be analyzed in many ways. This article has four chapters: Exploring pitch velocities, Exploring pitch types, Exploring pitch locations, and Batted ball outcomes. I hope you enjoy reading my first project.

Chapter-1. Exploring pitch velocities

Point 1. The different velocities between April and Other Months

Comparing April with the other months, Oh’s average pitch velocity in April (89.48 mph) was slightly lower than in the other months (90.54 mph). Slower pitches likely made it easier for batters to create chances in April. According to the stats on MLB.com, the batting average (AVG) Oh allowed was .280 in April and .229 in May, so April was about .05 higher. Velocity clearly matters for a pitcher.

seunghwan_oh_april <- subset(seunghwan_oh, seunghwan_oh$april == "april")
hist(seunghwan_oh_april$start_speed, xlab = "Velocity (mph)", main = "Oh all pitch Velocity for (April)") 
abline(v = mean(seunghwan_oh_april$start_speed), col = "#00009950", lwd = 2)

seunghwan_oh_other <- subset(seunghwan_oh, seunghwan_oh$april == "other")
hist(seunghwan_oh_other$start_speed, xlab = "Velocity (mph)", main = "Oh all pitch Velocity for (Other)")
abline(v = mean(seunghwan_oh_other$start_speed), col = "#00009950", lwd = 2)
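
The two group means quoted above can also be computed in one call rather than from two separate subsets. The following is a minimal sketch on a small made-up data frame: the speeds in `toy` are invented for illustration, and only the column names `start_speed` and `april` are taken from the dataset.

```r
# Toy data: invented speeds, NOT Oh's real pitches
toy <- data.frame(
  start_speed = c(88.5, 90.1, 89.3, 91.0, 90.2, 90.7),
  april       = c("april", "april", "april", "other", "other", "other")
)

# Mean velocity per group, rounded to two decimals
velo_by_month <- round(tapply(toy$start_speed, toy$april, mean), 2)
print(velo_by_month)

# Difference in mean velocity (other - april), in mph
velo_gap <- unname(velo_by_month["other"] - velo_by_month["april"])
print(velo_gap)
```

Run against the real `seunghwan_oh` data frame, the same `tapply()` call should give the two averages reported in this section.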

Point 2. The different velocities of Four-seam Fastball between April and Other Months

Looking only at the Four-Seam Fastball, the average velocity in April (93.00 mph) was again slightly lower than in the other months (93.67 mph). As with all pitches combined, the slower fastball likely made it easier for batters to create chances: according to the stats on MLB.com, the AVG Oh allowed was .280 in April and .229 in May, about .05 higher in April.

# Create april_ff and other_ff ("FF" = Four-Seam Fastball)
april_ff <- subset(seunghwan_oh_april, seunghwan_oh_april$mlbam_pitch_name == "FF")
other_ff <- subset(seunghwan_oh_other, seunghwan_oh_other$mlbam_pitch_name == "FF")

# Overlay fastball speed histograms for April and the other months
hist(april_ff$start_speed, col = "#00009950", freq = FALSE, ylim = c(0, .6), xlab = "Velocity (mph)", main = "Oh 4-Seam Fastball Velocity")
hist(other_ff$start_speed, add = TRUE, col = "#99000050", freq = FALSE)
abline(v = mean(april_ff$start_speed), col = "#00009950", lwd = 2)
abline(v = mean(other_ff$start_speed), col = "#99000050", lwd = 2)
legend("topleft", c("april", "other"), lty=c(1, 1), lwd = c(2,2), col=c("#00009950","#99000050"))

Point 3. Daily records of Four-Seam Fastball

Overall, Oh, Seunghwan’s velocity has been improving, which has led to strong results for him and for his team. He has recorded 15 saves (in 26 games) so far. Compared with last season, when he recorded 34 saves (in 102 games), his current pace is remarkable for the team’s closer. However, his pitching is effective not only because of velocity but also because of pitch types and location. The following chapters look at these.

# Summarize four-seam velocity in April and other months
# tapply(seunghwan_oh$start_speed, seunghwan_oh$april, mean)
seunghwan_oh_ff <- subset(seunghwan_oh, seunghwan_oh$mlbam_pitch_name == "FF")
oh_ff_velo_month <- tapply(seunghwan_oh_ff$start_speed, seunghwan_oh_ff$april, mean) # add mean
ff_data <- data.frame(tapply(seunghwan_oh_ff$start_speed, seunghwan_oh_ff$dateStamp, mean))  # add mean
ff_data$game_date <- as.Date(row.names(ff_data), "%Y-%m-%d")

# Rename the first column
colnames(ff_data)[1] <- "start_speed"
row.names(ff_data) <- NULL

plot(ff_data$start_speed ~ ff_data$game_date, lwd = 4, type = "l", ylim = c(88, 96), main = "Oh 4-Seam Fastball Velocity", xlab = "Date", ylab = "Velocity (mph)")
points(seunghwan_oh_ff$start_speed ~ jitter(as.numeric(seunghwan_oh_ff$dateStamp)), pch = 16, col = "#99004450")

Chapter-2. Exploring pitch types

Pitchers throw various types of pitches with different velocities and trajectories in order to make it more difficult for the batter to hit the ball. This chapter looks at which pitch types Oh, Seunghwan threw over the last two months. In summary, he threw four pitch types: Changeup (CH, 52 pitches), Curve (CU, 7 pitches), Four-Seam Fastball (FF, 282 pitches), and Slider (SL, 137 pitches). Let’s see how he set up his strategy against opposing batters.

His record in April was worse than in the other months, both in wins and losses and in other categories such as home runs (HR) and hits. The first chapter showed that after April he raised his average velocity by about 1 mph, which went along with better results. Now we will see how Oh changed his pitch mix from April to the other months.

table(seunghwan_oh$mlbam_pitch_name) # CH: 52, CU: 7, FF: 282, SL: 137

## 
##  CH  CU  FF  SL 
##  52   7 282 137

seunghwan_oh_pitch_type <- seunghwan_oh$mlbam_pitch_name
oh_type_tab <- table(seunghwan_oh$mlbam_pitch_name, seunghwan_oh$april)
oh_type_prop <- as.data.frame.matrix(round(prop.table(oh_type_tab, margin = 2), digit = 3))
oh_ff_prop <- oh_type_prop[3,] # extract Fourseam Fastball 
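# Difference in usage share (other - april), scaled by 5 via (x / 2) * 10 for display; positive = thrown more after April
oh_type_prop$difference <- round(((oh_type_prop$other - oh_type_prop$april) / 2) * 10 , digit = 3)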
row.names(oh_type_prop) <- NULL
oh_type_prop <- cbind(pitch = c("CH", "CU", "FF", "SL"), oh_type_prop)

Point 1. Pitch Usage in April vs. Other months

This bar plot shows, for each of the four pitch types (CH, CU, FF, SL), how its share of all pitches differs between April and the other months. A bar in the positive area means that Oh threw that pitch type more often in the other months than in April. Let’s look at the data below.

oh_type_tab

##     
##      april other
##   CH    15    37
##   CU     6     1
##   FF   114   168
##   SL    76    61

# Plot a barplot
barplot(oh_type_prop$difference, names.arg = oh_type_prop$pitch, 
        main = "Pitch Usage in April vs. Other Months", 
        ylab = "Scaled Difference in Usage Share (Other - April)", 
        ylim = c(-0.9, 0.9))

I will not try to explain the changes in Changeup (CH) and Curve (CU) usage. Neither is one of his main weapons, and the counts are too small to support a conclusion (see the table above). The Fastball and Slider, however, show a clear shift. Comparing April with the other months, the share of FF rose from about 54% to about 63%, while the share of SL fell from about 36% to about 23%. Set against his results, this suggests that throwing more sliders was not what produced good results, even though the slider is one of his main pitches.
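
The shift in pitch mix can be checked directly from the counts printed in `oh_type_tab`. The sketch below re-enters those counts by hand and expresses the change in percentage points; the names `counts`, `usage`, and `usage_change_pp` are introduced here for illustration only.

```r
# Pitch counts copied from the oh_type_tab output shown earlier
counts <- matrix(
  c(15, 6, 114, 76,    # april: CH, CU, FF, SL
    37, 1, 168, 61),   # other: CH, CU, FF, SL
  nrow = 4,
  dimnames = list(c("CH", "CU", "FF", "SL"), c("april", "other"))
)

# Share of each pitch type within each period (each column sums to 1)
usage <- prop.table(counts, margin = 2)

# Change in usage share, in percentage points (other - april)
usage_change_pp <- round((usage[, "other"] - usage[, "april"]) * 100, 1)
print(usage_change_pp)
```

A positive value means the pitch took up a larger share of his pitches after April. On these counts the fastball share rises and the slider share falls.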

Point 2. Ball - Strike Count Frequency with Pitch Type

oh_bs_table <- table(seunghwan_oh$balls, seunghwan_oh$strikes)
oh_bs_prop_table <- round((prop.table(oh_bs_table)), digit = 3)
seunghwan_oh$bs_count <- paste(seunghwan_oh$balls, seunghwan_oh$strikes, sep = "-")
oh_bs_count_tab <- table(seunghwan_oh$bs_count, seunghwan_oh$april)
oh_bs_month <- round(prop.table(oh_bs_count_tab, margin = 2), digit = 3)
oh_diff_bs <- round((oh_bs_month[, 1] - oh_bs_month[, 2]) / oh_bs_month[, 2], digit = 3)
oh_type_bs <- table(seunghwan_oh$mlbam_pitch_name, seunghwan_oh$bs_count)
print(oh_type_bs)

##     
##      0-0 0-1 0-2 1-0 1-1 1-2 2-0 2-1 2-2 3-0 3-1 3-2
##   CH   1  12   4   5   8  13   1   3   5   0   0   0
##   CU   4   2   0   0   0   0   0   0   1   0   0   0
##   FF  84  34  21  28  29  24   7  10  21   2   6  16
##   SL  28  21  17   9  13  21   2   9   8   0   0   9

oh_type_bs_prop <- round(prop.table(oh_type_bs, margin = 2), digit = 3)
print(oh_type_bs_prop)

##     
##        0-0   0-1   0-2   1-0   1-1   1-2   2-0   2-1   2-2   3-0   3-1
##   CH 0.009 0.174 0.095 0.119 0.160 0.224 0.100 0.136 0.143 0.000 0.000
##   CU 0.034 0.029 0.000 0.000 0.000 0.000 0.000 0.000 0.029 0.000 0.000
##   FF 0.718 0.493 0.500 0.667 0.580 0.414 0.700 0.455 0.600 1.000 1.000
##   SL 0.239 0.304 0.405 0.214 0.260 0.362 0.200 0.409 0.229 0.000 0.000
##     
##        3-2
##   CH 0.000
##   CU 0.000
##   FF 0.640
##   SL 0.360

(1) 3-0, 3-1, 3-2 VS (2) 0-2, 1-2, 2-2

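This comparison is worth a closer look because it shows how a batter could predict which pitch Oh Seunghwan will throw in a given ball-strike count. In the first case (three-ball counts), Oh relied almost entirely on the Fastball. At 3-0 and 3-1 he threw the Fastball 100% of the time, although at 3-2 he also used the Slider (36%) as a finishing pitch. In the second case (two-strike counts of 0-2, 1-2, and 2-2), he mixed his pitches more: the Fastball share was between 41% and 60%, with the Slider and Changeup making up most of the rest.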
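
When reading these percentages it helps to keep the sample sizes in view. The sketch below re-enters the three-ball columns of the `oh_type_bs` output by hand (the object names are illustrative) and prints the proportions next to the number of pitches behind each one.

```r
# Counts for the three-ball counts, copied from the oh_type_bs output above
three_ball <- matrix(
  c(0, 0, 2, 0,     # 3-0: CH, CU, FF, SL
    0, 0, 6, 0,     # 3-1: CH, CU, FF, SL
    0, 0, 16, 9),   # 3-2: CH, CU, FF, SL
  nrow = 4,
  dimnames = list(c("CH", "CU", "FF", "SL"), c("3-0", "3-1", "3-2"))
)

# Pitch-type share within each count (columns sum to 1)
three_ball_prop <- round(prop.table(three_ball, margin = 2), 3)
print(three_ball_prop)

# Number of pitches behind each column
pitches_per_count <- colSums(three_ball)
print(pitches_per_count)
```

The 100% fastball share at 3-0 rests on only two pitches, so it is better read as a tendency than as a rule.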

Point 3. Pitch type by Inning

This section describes how his pitch selection differs by inning. To protect a reliever’s arm, it is normal to limit his workload, usually to one inning. However, depending on the game situation and the other pitchers available on the team, a pitcher is sometimes allowed to pitch more than one inning. Thus, pitch selection and results can differ across innings.

# Create type_late
seunghwan_oh$late_in_game <- ifelse(seunghwan_oh$inning == 9, 1, 0)
seunghwan_oh$late_in_game <- factor(seunghwan_oh$late_in_game)
oh_type_late <- table(seunghwan_oh$mlbam_pitch_name, seunghwan_oh$late_in_game)

# Create type_late_prop
oh_type_late_prop <- round(prop.table(oh_type_late, margin = 2), digit = 3)
oh_type_late_prop <- t(oh_type_late_prop)

# Change row names
rownames(oh_type_late_prop) <- c("Over-1 inning", "1 inning")

# Print
oh_type_late_prop

##                
##                    CH    CU    FF    SL
##   Over-1 inning 0.149 0.019 0.565 0.267
##   1 inning      0.088 0.013 0.603 0.297

“Over-1 inning”, “1 inning”

The bar plot below shows two interesting points, using the proportions printed above. (1) In the “1 inning” case (pitches in the 9th inning), Oh leaned slightly more on the Fastball (60.3%) and Slider (29.7%). (2) In the “Over-1 inning” case (pitches in any other inning), the Changeup share was higher (14.9% vs. 8.8%). In short, according to the table, Oh concentrated on the Fastball and Slider in the 9th inning and mixed in more Changeups in his other innings.

# Make barplot using oh_type_late_prop
barplot(oh_type_late_prop, beside = TRUE, col = c("red", "blue"), 
        main = "Early vs. Late In Game Pitch Selection", 
        ylab = "Pitch Selection Proportion") 
legend("topleft", rownames(oh_type_late_prop), xpd = TRUE, horiz = TRUE, 
       inset = c(0.3, 1.1), bty = "n", pch = c(22, 15), col = c("red", "blue"))

Chapter-3. Exploring pitch locations

As with velocity and pitch type, pitch location can play a key role in pitching success. Let’s see the graph below.

Point 1. April vs Others in view of Control

seunghwan_oh_lhb <- subset(seunghwan_oh, stand == "L") # lhb
seunghwan_oh_rhb <- subset(seunghwan_oh, stand == "R") # rhb

# Plot location of all pitches (colors are set explicitly so that they match the legend)
plot(seunghwan_oh$pz ~ seunghwan_oh$px, col = ifelse(seunghwan_oh$april == "april", "red", "black"), xlim = c(-3, 3), 
     xlab = "Strike_Zone PX", ylab = "Strike_Zone PZ")
legend("bottomleft", legend = c("April", "Others"), col = c("red", "black"), pch = 1)

# Show the two plots side by side
par(mfrow = c(1, 2))

# Plot the pitch locations for April
plot(pz ~ px, data = seunghwan_oh_april,
     col = "red", pch = 16,
     xlim = c(-3, 3), ylim = c(-1, 6),
     main = "April")

# Plot the pitch locations for other months
plot(pz ~ px, data = seunghwan_oh_other,
     col = "black", pch = 16,
     xlim = c(-3, 3), ylim = c(-1, 6),
     main = "Other months")

In baseball, ‘control’ is a pitcher’s ability to locate his pitches. Looking at the plots, his pitch locations in April were less stable than in the other months: the points are more scattered. This may partly explain why his results were not as good in April.
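
One way to put a number on “scattered” is the standard deviation of the horizontal (`px`) and vertical (`pz`) locations in each period. The sketch below uses simulated locations, not the real data: the spreads of 1.0 and 0.7 ft are assumptions chosen only to illustrate the calculation.

```r
set.seed(1)  # reproducible simulated data, NOT real pitch locations

# Simulated pitch locations: April is given a wider spread on purpose
sim <- data.frame(
  april = rep(c("april", "other"), each = 200),
  px = c(rnorm(200, mean = 0, sd = 1.0), rnorm(200, mean = 0, sd = 0.7)),
  pz = c(rnorm(200, mean = 2.5, sd = 1.0), rnorm(200, mean = 2.5, sd = 0.7))
)

# Standard deviation of px and pz within each period
spread <- aggregate(cbind(px, pz) ~ april, data = sim, FUN = sd)
print(spread)
```

Applied to `seunghwan_oh`, a larger standard deviation in April would support the visual impression from the scatter plots.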

Point 2. Pitch Location Zone Difference between April and Other

# Create oh_sub
seunghwan_oh_sub <- subset(seunghwan_oh, seunghwan_oh$px > -2 & seunghwan_oh$px < 2 & seunghwan_oh$pz > 0 & seunghwan_oh$pz < 5)

# Add zone_px & zone_pz (zone center coordinates; my own mapping of zones 1-20 onto a 4 x 5 grid)
seunghwan_oh_sub <- mutate(seunghwan_oh_sub, zone_px = ifelse(zone_location %in% c(1,5,9,13,17), -1.5, 
                                                              ifelse(zone_location %in% c(2,6,10,14,18), -0.5, 
                                                                     ifelse(zone_location %in% c(3,7,11,15,19), 0.5, 
                                                                            ifelse(zone_location %in% c(4,8,12,16,20), 1.5, NA)))), 
                           zone_pz = ifelse(zone_location %in% 1:4, 4.5, 
                                            ifelse(zone_location %in% 5:8, 3.5, 
                                                   ifelse(zone_location %in% 9:12, 2.5, 
                                                          ifelse(zone_location %in% 13:16, 1.5, 
                                                                 ifelse(zone_location %in% 17:20, 0.5, NA)))))
)
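
The nested `ifelse()` calls above place zones 1-20 on a grid with 4 columns and 5 rows, numbered left to right and top to bottom. Under that assumption the same centers can be computed with integer arithmetic. `zone_center()` is a hypothetical helper written for this sketch, not part of the original analysis.

```r
# Hypothetical helper: map zone numbers 1-20 onto a 4-column x 5-row grid
zone_center <- function(zone) {
  in_grid <- zone %in% 1:20
  col_idx <- (zone - 1) %% 4    # 0 = leftmost column ... 3 = rightmost column
  row_idx <- (zone - 1) %/% 4   # 0 = top row ... 4 = bottom row
  data.frame(
    zone    = zone,
    zone_px = ifelse(in_grid, col_idx - 1.5, NA),  # -1.5, -0.5, 0.5, 1.5
    zone_pz = ifelse(in_grid, 4.5 - row_idx, NA)   # 4.5 down to 0.5
  )
}

# Zones outside 1-20 (for example 23) get NA, as in the nested ifelse() version
centers <- zone_center(c(1, 4, 10, 20, 23))
print(centers)
```

The arithmetic version is shorter, but the nested version makes the zone layout easier to read at a glance.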

# Create oh_table
seunghwan_oh_table <- table(seunghwan_oh_sub$zone_location)

# Create zone_prop
oh_zone_prop <- round(prop.table(seunghwan_oh_table), digit = 3)

# Function
plot_grid <- function() {
  # Plot pitch location window
  plot(x = c(-2, 2), y = c(0, 5), type = "n",
       xlab = "Horizontal Location (ft.; Catcher's View)",
       ylab = "Vertical Location (ft.)")
  
  # Add the grid lines
  grid(lty = "solid", col = "black")
}

# Total Proportions
plot(x = c(-2, 2), y = c(0, 5), type = "n",
     main = "Oh Locational Zone (Total(%))",
     xlab = "Horizontal Location (ft.; Catcher's View)",
     ylab = "Vertical Location (ft.)")
grid(lty = "solid", col = "black")

# Add each zone's share (%) as text, looked up by zone name rather than by position
for(i in 1:20) {
  text(mean(seunghwan_oh_sub$zone_px[seunghwan_oh_sub$zone_location == i]),
       mean(seunghwan_oh_sub$zone_pz[seunghwan_oh_sub$zone_location == i]),
       oh_zone_prop[as.character(i)] * 100, cex = 1.5)
}

# Split the subset by period
oh_sub_april <- subset(seunghwan_oh_sub, april == "april")
oh_sub_other <- subset(seunghwan_oh_sub, april == "other")

# Create zone_prop_april and zone_prop_other for zones 1-20
# (factor levels keep every zone, so a zone with no pitches counts as 0)
oh_zone_prop_april <- round(
  table(factor(oh_sub_april$zone_location, levels = 1:20)) / nrow(oh_sub_april), 3)
oh_zone_prop_other <- round(
  table(factor(oh_sub_other$zone_location, levels = 1:20)) / nrow(oh_sub_other), 3)

# Create zone_prop_diff (April - other) as a plain numeric vector indexed by zone
oh_zone_prop_diff <- oh_zone_prop_april - oh_zone_prop_other
oh_zone_prop_diff_df <- as.numeric(oh_zone_prop_diff)

## Graph 2
plot(x = c(-2, 2), y = c(0, 5), type = "n",
     main = "Oh Locational Zone(%) (April - Other)",
     xlab = "Horizontal Location (ft.; Catcher's View)",
     ylab = "Vertical Location (ft.)")
grid(lty = "solid", col = "black")

# Create for loop
for(i in 1:20) {
  text(mean(seunghwan_oh_sub$zone_px[seunghwan_oh_sub$zone_location == i]),
       mean(seunghwan_oh_sub$zone_pz[seunghwan_oh_sub$zone_location == i]),
       oh_zone_prop_diff_df[i] * 100, cex = 1.5)
}

(1) Locational zone: overall, his pitches were mainly located slightly low inside the strike zone, which has helped him pile up saves so far this year. In April, however, his performance was not good, as we know. Comparing April with the other months suggests that location by zone and results were not strongly related. (2) In the second graph, a negative number means that the share of pitches in that zone was higher in the other months than in April, that is, Oh threw there more often after April. As the graph shows, Oh pitched quite evenly throughout the strike zone; if anything, he pitched somewhat dangerously near the center of the strike zone. Even so, April was the month in which his performance was not so good.

Point 3. Ball-Count Pitch Location Graph

seunghwan_oh_sub_df <- seunghwan_oh_sub %>% 
  filter(zone_location %in% c(1:20))

oh_zone_tab <- table(seunghwan_oh_sub_df$zone_location, seunghwan_oh_sub_df$bs_count)
oh_zone_count_prop <- round(prop.table(oh_zone_tab, margin = 2), digit = 3)
oh_zone_count_prop_df <- as.data.frame.matrix(oh_zone_count_prop) 

# Create zone_count_diff
oh_zone_count_diff <- oh_zone_count_prop_df$`0-2` - oh_zone_count_prop_df$`3-0`

plot(x = c(-2, 2), y = c(0, 5), type = "n",
     main = "Oh Locational Zone(%) (0-2 vs. 3-0 Counts) ",
     xlab = "Horizontal Location (ft.; Catcher's View)",
     ylab = "Vertical Location (ft.)")
grid(lty = "solid", col = "black")

# Add text to the figure for location differences
for(i in 1:20) {
  text(mean(seunghwan_oh_sub_df$zone_px[seunghwan_oh_sub_df$zone_location == i]),
       mean(seunghwan_oh_sub_df$zone_pz[seunghwan_oh_sub_df$zone_location == i]),
       oh_zone_count_diff[i] * 100, cex = 1.5)
}

This section is a side note on Oh’s pitch locations in two extreme ball-strike counts (0-2 vs. 3-0). However, at this point (June 1) it is too early to analyze this graph, because the number of pitches in each count is small (10 pitches at 0-2 vs. 2 pitches at 3-0). I hope to update this graph at the end of the season.

Chapter-4. Batted ball Outcomes

The key point is to minimize damage on each pitch. By looking closely at the outcomes of the pitches Oh threw in different months, we can see why analyzing a pitcher’s pitches is important for helping him pitch better.

Before looking at contact rate by location, the meaning of ‘contact’ in this chapter should be made clear. Every pitch result falls into one of two groups: swing or no swing. Four results count as no swing: “Ball”, “Called Strike”, “Ball in Dirt”, and “Hit By Pitch”. Among swings, two results count as no contact: “Swinging Strike” and “Missed Bunt”. Thus ‘contact’ covers every pitch result except the six cases listed above.

# Create batter_swing
oh_no_swing <- c("Ball", "Called Strike", "Ball in Dirt", "Hit By Pitch")
seunghwan_oh_ff$batter_swing <- ifelse(seunghwan_oh_ff$pdes %in% oh_no_swing, 0, 1)

# Create swing_ff
oh_swing_ff <- subset(seunghwan_oh_ff, batter_swing == 1)

# Create the contact variable
no_contact <- c("Swinging Strike", "Missed Bunt")
oh_swing_ff$contact <- ifelse(oh_swing_ff$pdes %in% no_contact, 0, 1)

# Create velo_bin with fixed cut-offs: Slow (< 90.5), Medium (90.5 to 92.5), Fast (>= 92.5)
oh_swing_ff$velo_bin <- ifelse(oh_swing_ff$start_speed < 90.5, "Slow", NA)
oh_swing_ff$velo_bin <- ifelse(oh_swing_ff$start_speed >= 90.5 & oh_swing_ff$start_speed < 92.5, 
                               "Medium", oh_swing_ff$velo_bin)
oh_swing_ff$velo_bin <- ifelse(oh_swing_ff$start_speed >= 92.5, 
                               "Fast", oh_swing_ff$velo_bin)

# Create the swings dataset, which includes only pitches at which a batter has swung
oh_no_swing <- c("Ball", "Called Strike", "Ball in Dirt", "Hit By Pitch")
seunghwan_oh$batter_swing <- ifelse(seunghwan_oh$pdes %in% oh_no_swing, 0, 1)
oh_swings <- subset(seunghwan_oh, seunghwan_oh$batter_swing == 1)

# Create a contact variable
oh_no_contact <- c("Swinging Strike", "Missed Bunt")
oh_swings$contact <- ifelse(oh_swings$pdes %in% oh_no_contact, 0, 1)
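
The swing and contact definitions can be tried out on a handful of pitch descriptions. In the sketch below the sequence in `pdes` is invented, and the labels "Foul" and "In play, out(s)" are assumed examples of contact outcomes; only the six no-swing and no-contact labels come from the code above.

```r
# Toy pitch descriptions (invented sequence, NOT real data)
pdes <- c("Ball", "Called Strike", "Swinging Strike", "Foul",
          "In play, out(s)", "Hit By Pitch", "Missed Bunt", "Ball in Dirt")

no_swing   <- c("Ball", "Called Strike", "Ball in Dirt", "Hit By Pitch")
no_contact <- c("Swinging Strike", "Missed Bunt")

# A swing is any result that is not in the no-swing list
swing <- !(pdes %in% no_swing)

# Contact is a swing that is not a swinging strike or a missed bunt
contact <- swing & !(pdes %in% no_contact)

swing_rate   <- mean(swing)                # swings per pitch
contact_rate <- sum(contact) / sum(swing)  # contacts per swing
print(c(swing_rate = swing_rate, contact_rate = contact_rate))
```

Note that the contact rate is computed per swing, not per pitch, which is why the analysis first subsets to swings.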

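# Create a new function called bin_pitch_speed() for use in calculating velo_bin.
# It splits speeds into terciles (1 = slowest third, 3 = fastest third);
# include.lowest = TRUE keeps the slowest pitch in bin 1 instead of returning NA.
bin_pitch_speed <- function(x) {
  cut(x, breaks = quantile(x, probs = c(0,1/3,2/3,1)), labels = FALSE, include.lowest = TRUE)
}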
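
One detail is worth checking when bin edges come from `quantile()`: by default `cut()` uses intervals that are open on the left, so the single lowest value falls outside the first bin unless `include.lowest = TRUE` is set. The sketch below shows the difference on nine invented speeds.

```r
# Invented speeds for illustration (mph), sorted from slowest to fastest
speeds <- c(90.1, 91.4, 92.0, 92.8, 93.3, 93.9, 94.5, 95.2, 95.8)

# Tercile break points: minimum, 1/3 quantile, 2/3 quantile, maximum
breaks <- quantile(speeds, probs = c(0, 1/3, 2/3, 1))

# Default: the minimum sits exactly on the lowest break and is left out
bins_default <- cut(speeds, breaks = breaks, labels = FALSE)

# With include.lowest = TRUE the minimum is placed in bin 1
bins_inclusive <- cut(speeds, breaks = breaks, labels = FALSE,
                      include.lowest = TRUE)

print(rbind(speeds, bins_default, bins_inclusive))
```

With only a few distinct speeds, as may happen for a rarely thrown pitch, the quantile breaks can coincide and `cut()` will stop with an error, so small subsets deserve a separate check.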

# Create the subsets for each pitch type
oh_swing_ff <- subset(oh_swings, mlbam_pitch_name == "FF")
oh_swing_ch <- subset(oh_swings, mlbam_pitch_name == "CH")
oh_swing_cu <- subset(oh_swings, mlbam_pitch_name == "CU")
oh_swing_sl <- subset(oh_swings, mlbam_pitch_name == "SL")

# Make velo_bin_pitch variable for each subset
oh_swing_ff$velo_bin <- bin_pitch_speed(oh_swing_ff$start_speed)
oh_swing_ch$velo_bin <- bin_pitch_speed(oh_swing_ch$start_speed)
oh_swing_cu$velo_bin <- bin_pitch_speed(oh_swing_cu$start_speed)
oh_swing_sl$velo_bin <- bin_pitch_speed(oh_swing_sl$start_speed)

oh_swings_str2 <- subset(oh_swings, oh_swings$strikes == 2)

#### zone_px & zone_pz
oh_swings <- mutate(oh_swings, zone_px = ifelse(zone_location %in% c(1,5,9,13,17), -1.5, 
                                                ifelse(zone_location %in% c(2,6,10,14,18), -0.5, 
                                                       ifelse(zone_location %in% c(3,7,11,15,19), 0.5, 
                                                              ifelse(zone_location %in% c(4,8,12,16,20), 1.5, NA)))), 
                    zone_pz = ifelse(zone_location %in% 1:4, 4.5, 
                                     ifelse(zone_location %in% 5:8, 3.5, 
                                            ifelse(zone_location %in% 9:12, 2.5, 
                                                   ifelse(zone_location %in% 13:16, 1.5, 
                                                          ifelse(zone_location %in% 17:20, 0.5, NA)))))
)

oh_swings_rhb <- subset(oh_swings, stand == "R")
oh_swings_rhb_rate <- oh_swings_rhb %>% select(contact, zone_location)

# Create subset of swings: swings_lhb
oh_swings_lhb <- subset(oh_swings, stand == "L")

# Create zone_contact_r
oh_zone_contact_r <- round(tapply(oh_swings_rhb$contact, oh_swings_rhb$zone_location, mean), digit = 3)
oh_zone_contact_r_rate <- as.data.frame(oh_zone_contact_r)
oh_zone_contact_r_rate <- mutate(oh_zone_contact_r_rate, zone = as.numeric(rownames(oh_zone_contact_r)))  # zone numbers taken from the tapply() result

# Create zone_contact_l
oh_zone_contact_l <- round(tapply(oh_swings_lhb$contact, oh_swings_lhb$zone_location, mean), digit = 3)
oh_zone_contact_l_rate <- as.data.frame(oh_zone_contact_l)
oh_zone_contact_l_rate <- mutate(oh_zone_contact_l_rate, zone = as.numeric(rownames(oh_zone_contact_l)))  # zone numbers taken from the tapply() result

Point 1. Overall Contact Rate by Location (RHB VS LHB)

# Plot figure grid for RHB
par(mfrow = c(1, 2))
plot(x = c(-1, 1), y = c(1, 4), type = "n", 
     main = "Contact Rate by Location (RHB)", 
     xlab = "Horizontal Location (ft.; Catcher's View)", 
     ylab = "Vertical Location (ft.)")
abline(v = 0)
abline(h = 2)
abline(h = 3)

# Add text for RHB contact rate
for(i in unique(c(6, 7, 10, 11, 14, 15))) {
  text(mean(oh_swings_rhb$zone_px[oh_swings_rhb$zone_location == i]), 
       mean(oh_swings_rhb$zone_pz[oh_swings_rhb$zone_location == i]), 
       oh_zone_contact_r[rownames(oh_zone_contact_r) == i], cex = 1.5)
}

# Add LHB plot
plot(x = c(-1, 1), y = c(1, 4), type = "n", 
     main = "Contact Rate by Location (LHB)", 
     xlab = "Horizontal Location (ft.; Catcher's View)", 
     ylab = "Vertical Location (ft.)")
abline(v = 0)
abline(h = 2)
abline(h = 3)

# Add text for LHB contact rate
for(i in unique(c(6, 7, 10, 11, 14, 15))) {
  text(mean(oh_swings_lhb$zone_px[oh_swings_lhb$zone_location == i]), 
       mean(oh_swings_lhb$zone_pz[oh_swings_lhb$zone_location == i]), 
       oh_zone_contact_l[rownames(oh_zone_contact_l) == i], cex = 1.5)
}

Point 2. Detailed Contact Rate by Location (RHB VS LHB)

#### ggplot2 graph ####
# Create data
px <- rep(seq(-1.5, 1.5, by = 1), times = 5)
pz <- rep(seq(4.5, 0.5, by = -1), each = 4)
zone <- seq(1:20)
oh_locgrid <- data.frame(zone, px, pz)

# Merge locgrid with zone_contact_r
oh_locgrid <- merge(oh_locgrid, oh_zone_contact_r_rate, by = 'zone', all.x = TRUE)

# Merge locgrid with zone_contact_l
oh_locgrid <- merge(oh_locgrid, oh_zone_contact_l_rate, by = 'zone', all.x = TRUE)
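
The `all.x = TRUE` argument in these merges matters: it keeps all 20 grid zones even when no batter swung at a pitch in a zone, and fills the missing rate with `NA`. The sketch below shows this on invented swings; `grid_df`, `toy_swings`, and `rate_df` are illustrative names.

```r
# Full 4 x 5 grid of zone centers, built the same way as oh_locgrid
grid_df <- data.frame(
  zone = 1:20,
  px   = rep(seq(-1.5, 1.5, by = 1), times = 5),
  pz   = rep(seq(4.5, 0.5, by = -1), each = 4)
)

# Invented swings: zone and whether contact was made (NOT real data)
toy_swings <- data.frame(
  zone_location = c(6, 6, 7, 10, 10, 10, 22),
  contact       = c(1, 0, 1, 1, 1, 0, 1)
)

# Contact rate per zone, with zone numbers taken from the tapply() names
rate <- tapply(toy_swings$contact, toy_swings$zone_location, mean)
rate_df <- data.frame(zone = as.numeric(names(rate)),
                      contact_rate = as.vector(rate))

# Left join: keep every grid zone, drop zones outside the grid (here zone 22)
merged <- merge(grid_df, rate_df, by = "zone", all.x = TRUE)
print(merged[merged$zone %in% c(5, 6, 7, 10), ])
```

Zones without swings appear as `NA` rather than 0, which is the likely source of the "missing values" warnings that ggplot2 prints when the text labels are drawn.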

# Make base grid with ggplot()
oh_plot_base_grid <- ggplot(oh_locgrid, aes(px, pz))

# Make RHB plot
oh_plot_titles_rhb <- oh_plot_base_grid + 
  ggtitle("RHB Contact Rates") + 
  labs(x = "Horizontal Location(ft.; Catcher's View)", 
       y = "Vertical Location (ft.)") + 
  theme(plot.title = element_text(size = 15))

# Make LHB plot
oh_plot_titles_lhb <- oh_plot_base_grid + 
  ggtitle("LHB Contact Rates") + 
  labs(x = "Horizontal Location(ft.; Catcher's View)", 
       y = "Vertical Location (ft.)") + 
  theme(plot.title = element_text(size = 15))

# Make RHB plot
oh_plot_colors_rhb <- oh_plot_titles_rhb + 
  geom_tile(aes(fill = oh_zone_contact_r)) + 
  scale_fill_gradientn(name = "Contact Rate", 
                       limits = c(0.5, 1), 
                       breaks = seq(from = 0.5, to = 1, by = 0.1), 
                       colors = c(brewer.pal(n = 7, name = "Reds")))

# Make LHB plot
oh_plot_colors_lhb <- oh_plot_titles_lhb + 
  geom_tile(aes(fill = oh_zone_contact_l)) + 
  scale_fill_gradientn(name = "Contact Rate", 
                       limits = c(0.5, 1), 
                       breaks = seq(from = 0.5, to = 1, by = 0.1), 
                       colors = c(brewer.pal(n = 7, name = "Reds")))

# Make RHB plot
oh_plot_contact_rhb <- oh_plot_colors_rhb + 
  annotate("text", x = oh_locgrid$px, y = oh_locgrid$pz, 
           label = oh_locgrid$oh_zone_contact_r, size = 3)

# Make LHB plot
oh_plot_contact_lhb <- oh_plot_colors_lhb + 
  annotate("text", x = oh_locgrid$px, y = oh_locgrid$pz, 
           label = oh_locgrid$oh_zone_contact_l, size = 3)

# Plot them side-by-side
grid.arrange(oh_plot_contact_rhb, oh_plot_contact_lhb, ncol = 2)

## Warning: Removed 1 rows containing missing values (geom_text).

## Warning: Removed 2 rows containing missing values (geom_text).

For RHB, many batters swung and missed when the ball was located high in the strike zone. Interestingly, for LHB, many swung and missed when the ball was located low, at the right corner. Oh was strategically stronger against LHB than against RHB.

Chapter-5. Summary

In April, his performance was not good enough for a closer. There could be several reasons: (1) his slider was not effective, (2) his location was not stable, and (3) his overall pitch velocity was slower than in the other months.

From May, by comparison, his slider, pitch velocity, and location improved, which resulted in better performance and more saves.

To keep pitching successfully throughout this year, three things matter: slider, velocity, and location.

Evan’s 1st Project, Pitch Analysis - Oh, SeungHwan (2017)

Evan_Jung

June 20, 2017