R Markdown

Introduction

I have been interested in baseball statistics almost my whole life. A few years ago, I started developing a statistic that ranks players by their position against others in the league. I rank players not just position-by-position, but also by bench role and bullpen role. The system is called imSS, which stands for It Me Scoring System. (‘it me’ is a meme and a phrase seen in certain Twitter circles and other areas of the internet).

Starting Pitchers and Starting Position players are scaled and scored on a 1-100 basis, and relievers and bench players can be anywhere from 1-20 to 1-90, depending on their role’s importance. (Super-Utility players have a max score of 90, Closers and Super-Relievers have a max score of 90 and 85, while regular utility infielders have a max score of 50 and shuttle Relievers (guys who pitched fewer than 15 innings) have a max score of 40.)

imSS is meant to be a way to rank position players against pitchers, and see total team production, using some of the same metrics that go into the more commonly used Wins Above Replacement, but many of the other metrics that I use are stats I created myself, and players are scaled to the rest of the league, instead of a hypothetical AAA player. For this project, I wanted to use R Studio with some of my imSS data.

In this Data Analysis, I wanted to keep it somewhat simple. So I am only going to be looking Starting Pitchers. I started doing imSS three seasons ago, so I will be looking at the 91 Starting Pitchers that appeared in each of the first three years of imSS rankings, so Pitchers such as Patrick Corbin or Luis Severino weren’t in the dataset, because they had seasons where they are ranked in a reliever or swingman role. In this project I will take everything I learned in Data Analysis in R, try to calculate an aging curve, and try to predict future performance.

library(readxl)
SPimSS <- read_excel("SPScoreYearSheet.xlsx")

I imported the sheet, and there are four variables: Pitcher Name, Season, Age, and their score. The total imSS Score comes from the sum of their Workload, Run Prevention, Peripherals, and Control component scores. While there are a dozen statistics that carry some weight when I do these rankings, for this project, I am really only going to be looking at the relationship between player age and their score, as I am still very new to R Studio, and only wanted to look at a couple variables at a time.

Here are the Top 10 seasons by a Starting Pitcher, who were a Starter in all three seasons, in the first three years of imSS:

sort1.SPimSS <- SPimSS[order(-SPimSS$Score),]
sort1.SPimSS[1:4]
## # A tibble: 273 x 4
##    Name               Year   Age Score
##    <chr>             <dbl> <dbl> <dbl>
##  1 Rich Hill          2015    35    97
##  2 Clayton Kershaw    2016    28    92
##  3 Corey Kluber       2017    31    91
##  4 Jake Arrieta       2015    29    87
##  5 Clayton Kershaw    2015    27    86
##  6 Clayton Kershaw    2017    29    86
##  7 Max Scherzer       2017    32    85
##  8 Chris Sale         2017    28    84
##  9 Stephen Strasburg  2017    28    84
## 10 Zack Greinke       2015    31    83
## # ... with 263 more rows

Bi-Variate analysis

Next, I plotted a basic scatterplot of the player age with the score. When I was doing imSS in 2015, I was transitioning the project from when it was something I did with a baseball card game, which does reflect real MLB stats. However in 2015, there was no workload component, which explains the outlier- Rich Hill’s 2015 season, Age 35, Score 97, where he only made four starts. From 2018 on, I will be doing seperate rankings for non-Qualified Starters.

r = getOption("repos")
r["CRAN"] = "http://cran.us.r-project.org"
options(repos = r)
install.packages("weatherData")
## Installing package into 'C:/Users/Corey/Documents/R/win-library/3.5'
## (as 'lib' is unspecified)
## Warning: package 'weatherData' is not available (for R version 3.5.1)
install.packages('ggplot2')
## Installing package into 'C:/Users/Corey/Documents/R/win-library/3.5'
## (as 'lib' is unspecified)
## package 'ggplot2' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\Corey\AppData\Local\Temp\RtmpoRUet4\downloaded_packages
library('ggplot2')
qplot(x = Age, y = Score, data = SPimSS)

Here’s the same Scatterplot but with the three seperate seasons colored.

ggplot(aes(x = Age, y = Score, colour = Year), data = SPimSS) +
  geom_point()

Here’s a line graph, however, since all of the individual best and worst pitchers are represented, there is no clear aging curve, yet.

ggplot(SPimSS, aes(x = Age, y = Score)) +
  geom_line()

Here is a scatterplot and line graph showing the age and Scores with the average score by age.

ggplot(aes(x = Age, y = Score), data = SPimSS) +
  coord_cartesian(xlim = c(18, 50), ylim = c(0, 100)) +
  geom_point() +
  geom_line(stat = 'summary', fun.y = mean)

Next, I took a few measures to compact the data. First, I grouped every age of every pitcher by their mean and median score:

install.packages("dplyr")
## Installing package into 'C:/Users/Corey/Documents/R/win-library/3.5'
## (as 'lib' is unspecified)
## package 'dplyr' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\Corey\AppData\Local\Temp\RtmpoRUet4\downloaded_packages
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following object is masked from 'package:ggplot2':
## 
##     vars
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
SPage_groups <- group_by(SPimSS, Age)
SP.by.age <- summarise(SPage_groups,
          Score_mean = mean(Score),
          Score_median = median(Score),
          n = n())
SP.by.age <- arrange(SP.by.age)
head(SP.by.age, 23)
## # A tibble: 23 x 4
##      Age Score_mean Score_median     n
##    <dbl>      <dbl>        <dbl> <int>
##  1    21       66           66       1
##  2    22       49.9         50.5     8
##  3    23       52.6         53      12
##  4    24       55.7         55      22
##  5    25       60.1         64      21
##  6    26       55.3         58      29
##  7    27       59.1         57      29
##  8    28       57.5         57      30
##  9    29       56.2         54.5    26
## 10    30       55.3         51      19
## # ... with 13 more rows
sort1.SP.by.age <- SP.by.age[order(-SP.by.age$Score_mean),]
sort1.SP.by.age[1:4]
## # A tibble: 23 x 4
##      Age Score_mean Score_median     n
##    <dbl>      <dbl>        <dbl> <int>
##  1    43       69           69       1
##  2    21       66           66       1
##  3    37       65           65       2
##  4    36       61.5         63       4
##  5    33       61.1         59.5    14
##  6    35       60.6         58       5
##  7    25       60.1         64      21
##  8    27       59.1         57      29
##  9    38       58           58       1
## 10    31       57.5         58      19
## # ... with 13 more rows

With a minumum of 10 ranked starters, Age 33 has the highest mean score. With a minumum of 20 ranked starters, Age 25 has the highest mean score. Already I am learning things about my own data because it is the general thought that an athlete’s peak is the ages of 27-32, but in the three seasons of imSS, several starting pitchers have not quite tailed off yet.

After grouping the ages by their mean score, now I have a geom_line with a straighter aging curve. However, due to many age groups only having one or a few pitchers, we don’t have a straight curve. There are no pitchers with an age-39 season scored which explains the dip there.

ggplot(aes(x = Age, y = Score_mean),
             data = subset(SP.by.age)) +
  geom_line()

This shows the aging curve between just the ages from 23 to 33, as the 11 ages in that range have 10 or more pitchers in the dataset, to try to straighten out the line. Adding a smoother to the geom makes it look like more of an even curve.

ggplot(aes(x = Age, y = Score_mean),
             data = subset(SP.by.age, Age > 22 & Age < 34)) +
  geom_line()

ggplot(aes(x = Age, y = Score_mean),
             data = subset(SP.by.age, Age > 22 & Age < 34)) +
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Pearson’s product-moment correlation for Age and Score is .032. Indicating a weak positive relationship. However, we know that there are many peaks and valleys to the graphs, and the relationship isn’t linear.

with(SPimSS, cor.test(Age, Score, method = 'pearson'))
## 
##  Pearson's product-moment correlation
## 
## data:  Age and Score
## t = 0.53132, df = 271, p-value = 0.5956
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.08679077  0.15039987
## sample estimates:
##        cor 
## 0.03225873

Here, I subsetted the data by pitcher age AND the season, with the Mean imSS Score. You can see that 33-year old Pitchers had high mean Scores in all three seasons of imSS history.

install.packages("reshape2")
## Installing package into 'C:/Users/Corey/Documents/R/win-library/3.5'
## (as 'lib' is unspecified)
## package 'reshape2' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\Corey\AppData\Local\Temp\RtmpoRUet4\downloaded_packages
library(reshape2)
SP_by_age_year <- SPimSS %>%
  group_by(Age, Year) %>%
  summarise(Score_mean = mean(Score),
            n =n()) %>%
  ungroup() %>%
  arrange(Age)

SP_by_age_year_wide <- dcast(SP_by_age_year,
                        Year ~ Age,
                        value.var = 'Score_mean')

SP_by_age_year_wide[1:24]
##   Year 21       22       23       24       25       26       27       28
## 1 2015 66 48.71429 50.75000 55.27273 55.83333 51.00000 55.72727 49.28571
## 2 2016 NA 58.00000 52.28571 54.75000 58.18182 53.83333 61.83333 53.54545
## 3 2017 NA       NA 62.00000 56.85714 71.75000 60.72727 59.66667 65.83333
##         29       30       31       32       33 34 35 36 37 38 40 41 42 43
## 1 59.50000 54.50000 52.28571 50.60000 65.00000 47 97 61 NA NA 43 NA 40 NA
## 2 49.00000 55.25000 62.75000 58.57143 60.60000 40 58 77 65 NA NA 52 NA 69
## 3 58.36364 55.71429 59.37500 63.25000 60.28571 61 45 54 65 58 NA NA 61 NA
##   44
## 1 NA
## 2 NA
## 3 43

Multi-Variate Analysis

Next, I took the 91 pitchers that were scored all three years, took their total score of the three years, with their ages in the three seasons. Here are the Top 10 Starting Pitchers of the first three years of imSS:

SPName_Groups <- group_by(SPimSS, Name)
SP.by.Pitcher <- summarise(SPName_Groups,
          Age15 = min(Age),
          Age16 = median(Age),
          Age17 = max(Age),
          TotalScore = sum(Score),
          n = n())
SP.by.Pitcher <- arrange(SP.by.Pitcher)

sort1.SP.by.Pitcher <- SP.by.Pitcher[order(-SP.by.Pitcher$TotalScore),]
sort1.SP.by.Pitcher[1:5]
## # A tibble: 91 x 5
##    Name              Age15 Age16 Age17 TotalScore
##    <chr>             <dbl> <dbl> <dbl>      <dbl>
##  1 Clayton Kershaw      27    28    29        264
##  2 Rich Hill            35    36    37        239
##  3 Max Scherzer         30    31    32        234
##  4 Corey Kluber         29    30    31        233
##  5 Chris Sale           26    27    28        229
##  6 Jacob DeGrom         27    28    29        225
##  7 Jake Arrieta         29    30    31        222
##  8 Madison Bumgarner    25    26    27        221
##  9 Justin Verlander     32    33    34        220
## 10 Zack Greinke         31    32    33        220
## # ... with 81 more rows

I printed a scatterplot of the 91 pitchers with their Ages in 2016, the middle year of the three seasons, with their total score. You can see fewer outliers, but you can still see a wide variance of age and score, and not much of a linear relationship.

qplot(x = Age16, y = TotalScore, data = SP.by.Pitcher)

Next, I took a sample of 12 individual pitchers and plotted their three seasons of imSS, with their Age and Score. You can see Mat Latos, having a low score and trending downward, and wound up pitching in the Independent League in 2018.

set.seed(1000)
sample_sps <- sample(SPimSS$Name, 12)

ggplot(aes(x = Age, y = Score),
    data = subset(SPimSS, Name %in% sample_sps)) +
    facet_wrap( ~ Name) +
    geom_line() +
    geom_point(aes(size = Score))

Here’s a random sample of 6 Pitchers, with their age by score, and color-coded.

sample_sps2 <- sample(SPimSS$Name, 6)

ggplot(aes(x = Age, y = Score),
  data = subset(SPimSS, Name %in% sample_sps2)) +
  geom_point(aes(color = Name)) +
  scale_fill_brewer(type = 'qual') +
  coord_cartesian(xlim = c(20,50))

Here’s a sample of 10 Pitchers with their total score with their age in the 2016 season. Here you can almost make out a natural aging curve.:

set.seed(1000)
sample_sps3 <- sample(SP.by.Pitcher$Name, 10)

ggplot(aes(x = Age16, y = TotalScore),
  data = subset(SP.by.Pitcher, Name %in% sample_sps3)) +
  geom_point(aes(color = Name)) +
  scale_fill_brewer(type = 'qual')

Lastly, I built a linear model using the four variables.

install.packages("memisc")
## package 'memisc' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\Corey\AppData\Local\Temp\RtmpoRUet4\downloaded_packages
library(memisc)
m1 <- lm(I(log(Score)) ~ I(Age^(1/3)), data = SPimSS)
m2 <- update(m1, ~ . + Year)
m3 <- update(m2, ~ . + Name)
mtable(m1, m2, m3, sdigits = 3)
## 
## Calls:
## m1: lm(formula = I(log(Score)) ~ I(Age^(1/3)), data = SPimSS)
## m2: lm(formula = I(log(Score)) ~ I(Age^(1/3)) + Year, data = SPimSS)
## m3: lm(formula = I(log(Score)) ~ I(Age^(1/3)) + Year + Name, data = SPimSS)
## 
## ===========================================================================
##                                             m1          m2         m3      
## ---------------------------------------------------------------------------
##   (Intercept)                             3.843***  -114.769**   280.619   
##                                          (0.379)     (43.190)   (359.648)  
##   I(Age^(1/3))                            0.052       -0.021       5.634   
##                                          (0.124)      (0.125)     (5.123)  
##   Year                                                 0.059**    -0.145   
##                                                       (0.021)     (0.186)  
##   Name: Adam Wainwright/Aaron Nola                                -2.215   
##                                                                   (2.036)  
##   Name: Andrew Cashner/Aaron Nola                                 -1.511   
##                                                                   (1.184)  
##   Name: Bartolo Colon/Aaron Nola                                  -3.856   
##                                                                   (3.385)  
##   Name: Brandon Finnegan/Aaron Nola                               -0.238   
##                                                                   (0.177)  
##   Name: Carlos Carrasco/Aaron Nola                                -1.033   
##                                                                   (1.184)  
##   Name: Carlos Martinez/Aaron Nola                                -0.025   
##                                                                   (0.273)  
##   Name: Carlos Rodon/Aaron Nola                                   -0.117   
##                                                                   (0.177)  
##   Name: CC Sabathia/Aaron Nola                                    -2.511   
##                                                                   (2.197)  
##   Name: Chad Bettis/Aaron Nola                                    -1.173   
##                                                                   (0.820)  
##   Name: Chase Anderson/Aaron Nola                                 -1.120   
##                                                                   (1.004)  
##   Name: Chris Archer/Aaron Nola                                   -0.735   
##                                                                   (0.820)  
##   Name: Chris Sale/Aaron Nola                                     -0.581   
##                                                                   (0.820)  
##   Name: Chris Tillman/Aaron Nola                                  -1.522   
##                                                                   (1.004)  
##   Name: Clayton Kershaw/Aaron Nola                                -0.643   
##                                                                   (1.004)  
##   Name: Cole Hamels/Aaron Nola                                    -1.762   
##                                                                   (1.705)  
##   Name: Collin McHugh/Aaron Nola                                  -1.352   
##                                                                   (1.184)  
##   Name: Corey Kluber/Aaron Nola                                   -1.174   
##                                                                   (1.361)  
##   Name: Dallas Keuchel/Aaron Nola                                 -0.940   
##                                                                   (1.004)  
##   Name: Daniel Norris/Aaron Nola                                  -0.137   
##                                                                   (0.177)  
##   Name: Danny Duffy/Aaron Nola                                    -0.742   
##                                                                   (0.820)  
##   Name: Danny Salazar/Aaron Nola                                  -0.643   
##                                                                   (0.633)  
##   Name: Derek Holland/Aaron Nola                                  -1.724   
##                                                                   (1.184)  
##   Name: Edinson Volquez/Aaron Nola                                -2.098   
##                                                                   (1.705)  
##   Name: Eduardo Rodriguez/Aaron Nola                              -0.041   
##                                                                   (0.177)  
##   Name: Ervin Santana/Aaron Nola                                  -2.009   
##                                                                   (1.872)  
##   Name: Felix Hernandez/Aaron Nola                                -1.555   
##                                                                   (1.361)  
##   Name: Garrett Richards/Aaron Nola                               -0.984   
##                                                                   (1.004)  
##   Name: Gerrit Cole/Aaron Nola                                    -0.305   
##                                                                   (0.447)  
##   Name: Gio Gonzalez/Aaron Nola                                   -1.449   
##                                                                   (1.361)  
##   Name: Hector Santiago/Aaron Nola                                -1.318   
##                                                                   (1.004)  
##   Name: Hisashi Iwakuma/Aaron Nola                                -2.475   
##                                                                   (2.197)  
##   Name: Ian Kennedy/Aaron Nola                                    -1.765   
##                                                                   (1.535)  
##   Name: J.A. Happ/Aaron Nola                                      -1.963   
##                                                                   (1.872)  
##   Name: Jacob DeGrom/Aaron Nola                                   -0.803   
##                                                                   (1.004)  
##   Name: Jaime Garcia/Aaron Nola                                   -1.223   
##                                                                   (1.184)  
##   Name: Jake Arrieta/Aaron Nola                                   -1.222   
##                                                                   (1.361)  
##   Name: Jake Odorizzi/Aaron Nola                                  -0.738   
##                                                                   (0.633)  
##   Name: James Paxton/Aaron Nola                                   -0.913   
##                                                                   (0.820)  
##   Name: James Shields/Aaron Nola                                  -2.572   
##                                                                   (2.036)  
##   Name: Jarred Cosart/Aaron Nola                                  -1.352*  
##                                                                   (0.633)  
##   Name: Jason Hammel/Aaron Nola                                   -2.058   
##                                                                   (1.872)  
##   Name: Jeff Samardzija/Aaron Nola                                -1.677   
##                                                                   (1.535)  
##   Name: Jerad Eickhoff/Aaron Nola                                 -0.343   
##                                                                   (0.447)  
##   Name: Jered Weaver/Aaron Nola                                   -2.560   
##                                                                   (1.872)  
##   Name: Jeremy Hellickson/Aaron Nola                              -1.536   
##                                                                   (1.184)  
##   Name: Jimmy Nelson/Aaron Nola                                   -0.932   
##                                                                   (0.820)  
##   Name: Joe Ross/Aaron Nola                                       -0.020   
##                                                                   (0.177)  
##   Name: John Lackey/Aaron Nola                                    -2.671   
##                                                                   (2.509)  
##   Name: Johnny Cueto/Aaron Nola                                   -1.339   
##                                                                   (1.361)  
##   Name: Jon Gray/Aaron Nola                                       -0.406   
##                                                                   (0.273)  
##   Name: Jon Lester/Aaron Nola                                     -1.642   
##                                                                   (1.705)  
##   Name: Jordan Zimmermann/Aaron Nola                              -1.717   
##                                                                   (1.361)  
##   Name: Jose Quintana/Aaron Nola                                  -0.742   
##                                                                   (0.820)  
##   Name: Julio Teheran/Aaron Nola                                  -0.423   
##                                                                   (0.447)  
##   Name: Justin Verlander/Aaron Nola                               -1.792   
##                                                                   (1.872)  
##   Name: Kendall Graveman/Aaron Nola                               -0.660   
##                                                                   (0.447)  
##   Name: Kevin Gausman/Aaron Nola                                  -0.449   
##                                                                   (0.447)  
##   Name: Kyle Gibson/Aaron Nola                                    -1.348   
##                                                                   (1.004)  
##   Name: Kyle Hendricks/Aaron Nola                                 -0.471   
##                                                                   (0.633)  
##   Name: Lance McCullers Jr./Aaron Nola                             0.328   
##                                                                   (0.278)  
##   Name: Madison Bumgarner/Aaron Nola                              -0.404   
##                                                                   (0.633)  
##   Name: Marco Estrada/Aaron Nola                                  -1.833   
##                                                                   (1.705)  
##   Name: Marcus Stroman/Aaron Nola                                 -0.235   
##                                                                   (0.447)  
##   Name: Martin Perez/Aaron Nola                                   -0.572   
##                                                                   (0.447)  
##   Name: Masahiro Tanaka/Aaron Nola                                -0.744   
##                                                                   (0.820)  
##   Name: Mat Latos/Aaron Nola                                      -1.695   
##                                                                   (1.004)  
##   Name: Matt Garza/Aaron Nola                                     -2.268   
##                                                                   (1.705)  
##   Name: Matt Moore/Aaron Nola                                     -1.263   
##                                                                   (0.820)  
##   Name: Max Scherzer/Aaron Nola                                   -1.356   
##                                                                   (1.535)  
##   Name: Michael Pineda/Aaron Nola                                 -1.011   
##                                                                   (0.820)  
##   Name: Michael Wacha/Aaron Nola                                  -0.274   
##                                                                   (0.273)  
##   Name: Miguel Gonzalez/Aaron Nola                                -2.060   
##                                                                   (1.705)  
##   Name: Mike Fiers/Aaron Nola                                     -1.770   
##                                                                   (1.535)  
##   Name: Mike Leake/Aaron Nola                                     -1.102   
##                                                                   (1.004)  
##   Name: Nate Karns/Aaron Nola                                     -1.208   
##                                                                   (1.004)  
##   Name: Noah Syndergaard/Aaron Nola                               -0.230   
##                                                                   (0.447)  
##   Name: R.A. Dickey/Aaron Nola                                    -3.498   
##                                                                   (3.102)  
##   Name: Rich Hill/Aaron Nola                                      -2.251   
##                                                                   (2.354)  
##   Name: Rick Porcello/Aaron Nola                                  -0.848   
##                                                                   (0.820)  
##   Name: Robbie Ray/Aaron Nola                                     -0.164   
##                                                                   (0.273)  
##   Name: Shelby Miller/Aaron Nola                                  -0.653   
##                                                                   (0.447)  
##   Name: Sonny Gray/Aaron Nola                                     -0.764   
##                                                                   (0.633)  
##   Name: Stephen Strasburg/Aaron Nola                              -0.644   
##                                                                   (0.820)  
##   Name: Steven Matz/Aaron Nola                                    -0.542   
##                                                                   (0.447)  
##   Name: Taijuan Walker/Aaron Nola                                 -0.032   
##                                                                   (0.177)  
##   Name: Trevor Bauer/Aaron Nola                                   -0.552   
##                                                                   (0.447)  
##   Name: Wade Miley/Aaron Nola                                     -1.568   
##                                                                   (1.184)  
##   Name: Wei-Yin Chen/Aaron Nola                                   -1.617   
##                                                                   (1.361)  
##   Name: Yovani Gallardo/Aaron Nola                                -1.848   
##                                                                   (1.361)  
##   Name: Zack Greinke/Aaron Nola                                   -1.621   
##                                                                   (1.705)  
## ---------------------------------------------------------------------------
##   R-squared                               0.001        0.028       0.620   
##   adj. R-squared                         -0.003        0.021       0.426   
##   sigma                                   0.286        0.283       0.217   
##   F                                       0.177        3.862       3.191   
##   p                                       0.675        0.022       0.000   
##   Log-likelihood                        -45.042      -41.281      86.919   
##   Deviance                               22.233       21.629       8.456   
##   AIC                                    96.084       90.562      14.161   
##   BIC                                   106.913      105.000     353.452   
##   N                                     273          273         273       
## ===========================================================================

Now, I attempted to my first projection analysis with imSS. I took the Top 25 Pitchers of the first three years of imSS and tried to predict their 2018 imSS Scores.:

thisPitcher1 = data.frame(Age = 30, Year = 2018, Name = "Clayton Kershaw")
modelEstimate = predict(m3, newdata = thisPitcher1,
                        interval = "prediction", level = .95)
exp(modelEstimate)
##        fit      lwr      upr
## 1 98.11711 59.62718 161.4527
thisPitcher2 = data.frame(Age = 38, Year = 2018, Name = "Rich Hill")
modelEstimate = predict(m3, newdata = thisPitcher2,
                        interval = "prediction", level = .95)
exp(modelEstimate)
##        fit      lwr      upr
## 1 82.54869 49.45472 137.7884
thisPitcher3 = data.frame(Age = 33, Year = 2018, Name = "Max Scherzer")
modelEstimate = predict(m3, newdata = thisPitcher3,
                        interval = "prediction", level = .95)
exp(modelEstimate)
##        fit      lwr      upr
## 1 84.61914 51.25655 139.6972
thisPitcher4 = data.frame(Age = 32, Year = 2018, Name = "Corey Kluber")
modelEstimate = predict(m3, newdata = thisPitcher4,
                        interval = "prediction", level = .95)
exp(modelEstimate)
##        fit      lwr      upr
## 1 84.45229 51.23131 139.2155
thisPitcher5 = data.frame(Age = 29, Year = 2018, Name = "Chris Sale")
modelEstimate = predict(m3, newdata = thisPitcher5,
                        interval = "prediction", level = .95)
exp(modelEstimate)
##        fit      lwr      upr
## 1 85.72368 52.10293 141.0391
thisPitcher6 = data.frame(Age = 30, Year = 2018, Name = "Jacob DeGrom")
modelEstimate = predict(m3, newdata = thisPitcher6,
                        interval = "prediction", level = .95)
exp(modelEstimate)
##        fit      lwr      upr
## 1 83.56883 50.78598 137.5133
thisPitcher7 = data.frame(Age = 32, Year = 2018, Name = "Jake Arrieta")
modelEstimate = predict(m3, newdata = thisPitcher7,
                        interval = "prediction", level = .95)
exp(modelEstimate)
##        fit      lwr      upr
## 1 80.48764 48.82623 132.6799
thisPitcher8 = data.frame(Age = 28, Year = 2018, Name = "Madison Bumgarner")
modelEstimate = predict(m3, newdata = thisPitcher8,
                        interval = "prediction", level = .95)
exp(modelEstimate)
##        fit      lwr     upr
## 1 83.65609 50.82127 137.705
thisPitcher9 = data.frame(Age = 35, Year = 2018, Name = "Justin Verlander")
modelEstimate = predict(m3, newdata = thisPitcher9,
                        interval = "prediction", level = .95)
exp(modelEstimate)
##        fit      lwr      upr
## 1 78.29741 47.24639 129.7556
thisPitcher10 = data.frame(Age = 34, Year = 2018, Name = "Zack Greinke")
modelEstimate = predict(m3, newdata = thisPitcher10,
                        interval = "prediction", level = .95)
exp(modelEstimate)
##        fit      lwr      upr
## 1 77.77149 47.02469 128.6219
thisPitcher11 = data.frame(Age = 31, Year = 2018, Name = "Carlos Carrasco")
modelEstimate = predict(m3, newdata = thisPitcher11,
                        interval = "prediction", level = .95)
exp(modelEstimate)
##        fit      lwr      upr
## 1 80.52371 48.90255 132.5916
thisPitcher12 = data.frame(Age = 29, Year = 2018, Name = "Stephen Strasburg")
modelEstimate = predict(m3, newdata = thisPitcher12,
                        interval = "prediction", level = .95)
exp(modelEstimate)
##        fit      lwr      upr
## 1 80.52746 48.94466 132.4899
thisPitcher13 = data.frame(Age = 27, Year = 2018, Name = "Marcus Stroman")
modelEstimate = predict(m3, newdata = thisPitcher13,
                        interval = "prediction", level = .95)
exp(modelEstimate)
##        fit      lwr      upr
## 1 80.60829 48.90868 132.8536
thisPitcher14 = data.frame(Age = 34, Year = 2018, Name = "Jon Lester")
modelEstimate = predict(m3, newdata = thisPitcher14,
                        interval = "prediction", level = .95)
exp(modelEstimate)
##        fit      lwr      upr
## 1 76.15353 46.04638 125.9461
thisPitcher15 = data.frame(Age = 27, Year = 2018, Name = "Noah Syndergaard")
modelEstimate = predict(m3, newdata = thisPitcher15,
                        interval = "prediction", level = .95)
exp(modelEstimate)
##       fit     lwr      upr
## 1 80.9996 49.1461 133.4986
thisPitcher16 = data.frame(Age = 28, Year = 2018, Name = "Kyle Hendricks")
modelEstimate = predict(m3, newdata = thisPitcher16,
                        interval = "prediction", level = .95)
exp(modelEstimate)
##        fit      lwr      upr
## 1 78.26667 47.54719 128.8335
thisPitcher17 = data.frame(Age = 26, Year = 2018, Name = "Carlos Martinez")
modelEstimate = predict(m3, newdata = thisPitcher17,
                        interval = "prediction", level = .95)
exp(modelEstimate)
##        fit      lwr      upr
## 1 80.54018 48.76249 133.0268
thisPitcher18 = data.frame(Age = 30, Year = 2018, Name = "Dallas Keuchel")
modelEstimate = predict(m3, newdata = thisPitcher18,
                        interval = "prediction", level = .95)
exp(modelEstimate)
##        fit      lwr      upr
## 1 72.93248 44.32212 120.0111
thisPitcher19 = data.frame(Age = 27, Year = 2018, Name = "Gerrit Cole")
modelEstimate = predict(m3, newdata = thisPitcher19,
                        interval = "prediction", level = .95)
exp(modelEstimate)
##        fit      lwr      upr
## 1 75.20842 45.63234 123.9539
thisPitcher20 = data.frame(Age = 32, Year = 2018, Name = "Johnny Cueto")
modelEstimate = predict(m3, newdata = thisPitcher20,
                        interval = "prediction", level = .95)
exp(modelEstimate)
##        fit      lwr      upr
## 1 71.60404 43.43717 118.0357
thisPitcher21 = data.frame(Age = 29, Year = 2018, Name = "Chris Archer")
modelEstimate = predict(m3, newdata = thisPitcher21,
                        interval = "prediction", level = .95)
exp(modelEstimate)
##        fit      lwr      upr
## 1 73.48447 44.66392 120.9022
thisPitcher22 = data.frame(Age = 29, Year = 2018, Name = "Danny Duffy")
modelEstimate = predict(m3, newdata = thisPitcher22,
                        interval = "prediction", level = .95)
exp(modelEstimate)
##        fit      lwr      upr
## 1 72.98214 44.35861 120.0757
thisPitcher23 = data.frame(Age = 29, Year = 2018, Name = "Jose Quintana")
modelEstimate = predict(m3, newdata = thisPitcher23,
                        interval = "prediction", level = .95)
exp(modelEstimate)
##        fit     lwr      upr
## 1 72.97177 44.3523 120.0587
thisPitcher24 = data.frame(Age = 29, Year = 2018, Name = "Masahiro Tanaka")
modelEstimate = predict(m3, newdata = thisPitcher24,
                        interval = "prediction", level = .95)
exp(modelEstimate)
##        fit      lwr      upr
## 1 72.82626 44.26387 119.8193
thisPitcher25 = data.frame(Age = 27, Year = 2018, Name = "Jerad Eickhoff")
modelEstimate = predict(m3, newdata = thisPitcher25,
                        interval = "prediction", level = .95)
exp(modelEstimate)
##       fit      lwr      upr
## 1 72.3559 43.90158 119.2526

This does not include Pitchers who made their imSS debut in 2016, 2017, or 2018. However, among pitchers ranked in all three years of imSS, these projections have the Top 10 for 2018 as: 1. Clayton Kershaw- 98

  1. Chris Sale-85

  2. Max Scherzer-85

  3. Corey Kluber-85

  4. Madison Bumgarner-84

  5. Jacob DeGrom-84

  6. Rich Hill-83

  7. Noah Syndergaard-81

  8. Carlos Martinez-81

  9. Marcus Stroman-81

Reflection

In this project, I imported my own data, created plots, created new subsets, created new variables, built a model, and made predictions off the. Since I am so new to RStudio and building models in general, I wasn’t even sure if I built the right kind of model for this exercise. However, I wanted to use what I did learn within this course in this project to see where it took my analysis. I can tell that the predictions were reading Age because Rich Hill, who was 38 in 2018, had a lower score than pitchers in the range below him, while pitchers entering what’s supposed to bee their ‘prime’ or ‘peak’ years, Nosh Syndergaard and Carlos Martinez, who were 27 and 26 in 2018, had higher projected scores than pitchers who scored higher than them from 2015-2017, but are older than 33 now.

I have tons of more work to do to make better projections, including looking at every stat that goes into the final score, and looking at other peripheral numbers such as Fastball Velocity and Opponent’s Exit Velocity, that would be better predictor variables than those dozen or so stats that do carry weight. Other factors such as Ballpark the pitcher makes half their starts in, and league also need to be taken into affect. I also look forward to adding the imSS data of future seasons to make better projections.