Summary

1. Mets 2018

Of the international classes signed since 2018, this is one of only two to produce two major leaguers. It is also the only one of those two that includes a player with an internal (FanGraphs) future value (FV) of 60 or higher. That player is Francisco Alvarez, who has become a very good MLB catcher. The class also includes Endy Rodríguez, who has since been traded to the Pirates and is expected to become an above-average big leaguer. Rodríguez signed for just $10k.

2. Giants 2018

This is the other class to produce two big leaguers. Luis Matos and Marco Luciano became the Giants' No. 1 and No. 2 prospects, respectively. Matos signed for $750k, which suggests the Giants saw value that other teams did not. The class also includes two other prospects who currently rank within the organization's top 30.

3. Rockies 2019

The headline of this class is Adael Amador. I include this class to show that one great prospect can make a class great. Amador certainly looks like a great prospect, and he signed for only $1.5MM. Our (FanGraphs') scouts describe him as "an All-Star second baseman." The class also includes another top-3 player in the Rockies' system, Yanquiel Fernandez, who our scouts note has the chance to be an impact player.

# Load packages
library(httr)
library(jsonlite)
library(rvest)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter()         masks stats::filter()
## ✖ purrr::flatten()        masks jsonlite::flatten()
## ✖ readr::guess_encoding() masks rvest::guess_encoding()
## ✖ dplyr::lag()            masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Web scrape: FanGraphs' top prospect lists for each organization, which I am using as our organization's “internal evaluations.”

# The board has one page per organization, so the scrape is looped over teams

teams <- c('ari','atl','bal','bos','chc','chw', 'cin', 'cle','col',
           'det','hou','kcr','laa','lad','mia','mil','min','nym','nyy',
           'oak','phi','pit','sdp','sea','sfg','stl','tbr','tex','tor',
           'wsn')

prospects_table <- data.frame()

for (team in teams) {

  # Build the unique URL for each team's board page
  url <- paste0('https://www.fangraphs.com/prospects/the-board?%2F&org=', team)

  # Read the page and extract the prospect table (the 10th table on the page)
  team_page <- read_html(url)
  team_table <- team_page %>%
    html_table() %>%
    .[[10]]

  # Pause between requests to avoid being blocked as a bot
  Sys.sleep(5)

  # Append this team's table to the running table
  prospects_table <- rbind(prospects_table, team_table)

}
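Indexing the page's tables with `.[[10]]` assumes FanGraphs always renders the board as the tenth table. If the page layout changes, that index can silently pick up the wrong table. Below is a minimal sketch of a more defensive alternative. It assumes the board is the table containing a column named `NameName`, the concatenated header visible in the `head()` output below. The hypothetical helper `pick_board_table()` searches the list returned by `html_table()` for that column instead of relying on position. It runs here on toy data frames, so no network connection is needed.

```r
# Hypothetical helper: return the first table that contains `key_col`,
# rather than assuming the board is always the 10th table on the page
pick_board_table <- function(tables, key_col = "NameName") {
  has_key <- vapply(tables, function(tbl) key_col %in% names(tbl), logical(1))
  if (!any(has_key)) {
    stop("No table with column '", key_col, "' found on the page")
  }
  tables[[which(has_key)[1]]]
}

# Toy stand-in for the list that html_table() returns (made-up contents)
toy_tables <- list(
  data.frame(Nav = c("Home", "Prospects")),
  data.frame(NameName = c("Prospect A", "Prospect B"), Org = c("ORG1", "ORG1"))
)

board <- pick_board_table(toy_tables)
print(board)
```

Inside the loop, the extraction step would then become `team_table <- pick_board_table(html_table(team_page))`. If no matching table exists, the helper fails loudly rather than returning an unrelated table.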
# Add recent graduates, which are listed on a separate page for each year.
# These rows are appended to the same running table as current prospects.

years <- c(2021, 2022, 2023)

for (year in years) {

  url <- paste0('https://www.fangraphs.com/prospects/the-board/', year, '-graduates')

  year_page <- read_html(url)

  year_table <- year_page %>%
    html_table() %>%
    .[[10]]

  # Pause between requests to avoid being blocked as a bot
  Sys.sleep(5)

  prospects_table <- rbind(prospects_table, year_table)

}
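Both loops grow `prospects_table` with `rbind()`, which builds a new, larger data frame on every iteration. With 33 pages (30 teams plus three graduate years), that cost is negligible. A common alternative is to collect each page's table in a list and bind everything once at the end. Below is a base-R sketch of that pattern. The hypothetical `fetch_table()` stands in for the `read_html()`/`html_table()` step so the example runs offline.

```r
# Hypothetical stand-in for reading one page and extracting its board table
fetch_table <- function(page_id) {
  data.frame(Page = page_id, Player = paste0(page_id, "-", 1:2))
}

pages <- c("ari", "atl", "2021-graduates")

# Collect one table per page in a list...
tables <- lapply(pages, function(p) {
  tbl <- fetch_table(p)
  # Sys.sleep(5)  # keep the pause when making real requests
  tbl
})

# ...then bind them all in a single call
combined <- do.call(rbind, tables)
print(combined)
```

In the tidyverse, `dplyr::bind_rows(tables)` does the same. It also fills any column missing from some pages with `NA`, whereas `rbind()` stops with an error when column names differ.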

head(prospects_table)
## # A tibble: 6 × 22
##   Top 100Rank inall base…¹ Org RkRank withintea…² NameName `OrgMLB Organization`
##                      <int>                  <int> <chr>    <chr>                
## 1                        5                      1 Jordan … ARI                  
## 2                       40                      2 Druw Jo… ARI                  
## 3                       59                      3 Tommy T… ARI                  
## 4                       NA                      4 Jansel … ARI                  
## 5                       NA                      5 Cristof… ARI                  
## 6                       NA                      6 Slade C… ARI                  
## # ℹ abbreviated names: ¹​`Top 100Rank inall baseball`,
## #   ²​`Org RkRank withinteam's farm system`
## # ℹ 18 more variables:
## #   `PosProjected defensive position or pitching role(Starter, Multi-, or Single-inning Relief)` <chr>,
## #   `Current LevelMost recent the level played ator had a transaction to` <chr>,
## #   `ETAProjected year of debut/loss of prospect eligibility` <int>,
## #   `FVFuture Value(Explanation)` <chr>, …

Clean Table

prospects_df <- prospects_table %>%
  # Keep the relevant columns and replace FanGraphs' concatenated
  # header + tooltip names with short ones (select() can rename as it selects)
  select(Top100      = "Top 100Rank inall baseball",
         OrgRank     = "Org RkRank withinteam's farm system",
         Name        = "NameName",
         CurrentOrg  = "OrgMLB Organization",
         Pos         = "PosProjected defensive position or pitching role(Starter, Multi-, or Single-inning Relief)",
         Level       = "Current LevelMost recent the level played ator had a transaction to",
         ETA         = "ETAProjected year of debut/loss of prospect eligibility",
         FutureValue = "FVFuture Value(Explanation)",
         SignYr      = "Sign YrSign Yr",
         SignMkt     = "Sign MktSign Mkt",
         SignOrg     = "Sign OrgSign Org",
         Bonus       = "BonusBonus") %>%
  # Strip non-digit characters (e.g. the "+" in "45+") and store FV as an integer
  mutate(FutureValue = as.integer(str_remove_all(FutureValue, pattern = '[^0-9]')))
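The `mutate()` step strips every non-digit character from the FV string. That handles grades such as "45+" correctly. If a cell ever held two numbers, though (a hypothetical "50/55", for example), the digits would be concatenated into 5055. Below is a sketch of a slightly stricter alternative that keeps only the leading number. It uses base R on made-up values.

```r
# Made-up FV strings, including a hypothetical two-number value
fv_raw <- c("60", "45+", "40", "50/55", NA)

# Approach used above: drop all non-digits (concatenates multiple numbers)
strip_all <- as.integer(gsub("[^0-9]", "", fv_raw))

# Stricter alternative: keep only the first run of digits
leading <- as.integer(sub("^[^0-9]*([0-9]+).*$", "\\1", fv_raw))

print(strip_all)
print(leading)
```

In the pipeline above, `str_extract(FutureValue, "[0-9]+")` from stringr would similarly return only the first run of digits.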

# Keep international signings (the 'J2' and 'Intl15' signing markets) from 2018 onward
intl_df <- prospects_df %>%
  filter(SignMkt %in% c('J2', 'Intl15') & SignYr >= 2018)
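`Bonus` is still a character column after cleaning, so bonuses can't yet be compared or summed against future value. Below is a sketch of a parser. It assumes the values look like the amounts quoted in the summary ("$10k", "$750k", "$1.5MM"). The actual format on the FanGraphs board may differ, so checking `unique(intl_df$Bonus)` first is worthwhile. `parse_bonus()` is a hypothetical helper.

```r
# Hypothetical helper: convert strings like "$10k" or "$1.5MM" to dollars.
# Assumes a "$", a number, and an optional "k" (thousands) or "MM" (millions);
# any other suffix (e.g. a single "M") yields NA with a coercion warning.
parse_bonus <- function(x) {
  x <- toupper(gsub("[$, ]", "", x))           # drop "$", commas, spaces
  amount <- as.numeric(sub("(K|MM)$", "", x))  # numeric part
  multiplier <- ifelse(grepl("MM$", x), 1e6,
                       ifelse(grepl("K$", x), 1e3, 1))
  amount * multiplier
}

print(parse_bonus(c("$10k", "$750k", "$1.5MM", "$2,000,000")))
```

With the helper defined, `intl_df %>% mutate(BonusUSD = parse_bonus(Bonus))` would add a numeric column for value-per-dollar comparisons such as the Rodríguez and Matos examples in the summary.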

Classes with two or more MLB players

mlb_players <- intl_df %>%
  filter(Level == "MLB") %>%
  group_by(SignOrg, SignYr) %>%
  summarise(mlb_players = n()) %>%
  filter(mlb_players >=2)
## `summarise()` has grouped output by 'SignOrg'. You can override using the
## `.groups` argument.
mlb_players
## # A tibble: 2 × 3
## # Groups:   SignOrg [2]
##   SignOrg SignYr mlb_players
##   <chr>    <int>       <int>
## 1 NYM       2018           2
## 2 SFG       2018           2
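Each criterion in this section uses the same pattern. It filters players, counts them per signing class (`SignOrg`, `SignYr`), and keeps classes at or above a threshold. Below is a base-R sketch of that pattern as one reusable function. It runs on toy rows with hypothetical org codes, not real data. `classes_meeting()` is a hypothetical helper, and it treats `NA` conditions as not met, as `dplyr::filter()` does.

```r
# Hypothetical helper: count players per signing class (SignOrg + SignYr)
# for whom `keep` is TRUE, and return classes with at least `min_count`.
# NA values in `keep` are treated as FALSE, as dplyr::filter() does.
classes_meeting <- function(df, keep, min_count = 2) {
  hits <- df[which(keep), c("SignOrg", "SignYr"), drop = FALSE]
  if (nrow(hits) == 0) {
    return(data.frame(SignOrg = character(), SignYr = integer(), n = integer()))
  }
  counts <- aggregate(data.frame(n = rep(1L, nrow(hits))),
                      by = list(SignOrg = hits$SignOrg, SignYr = hits$SignYr),
                      FUN = sum)
  counts <- counts[counts$n >= min_count, , drop = FALSE]
  counts <- counts[order(counts$SignOrg, counts$SignYr), , drop = FALSE]
  rownames(counts) <- NULL
  counts
}

# Toy rows with hypothetical org codes (not real data)
toy <- data.frame(
  SignOrg = c("ORG1", "ORG1", "ORG2", "ORG2", "ORG3"),
  SignYr  = c(2018L, 2018L, 2018L, 2018L, 2019L),
  Level   = c("MLB", "MLB", "MLB", "AA", NA)
)

mlb_classes <- classes_meeting(toy, toy$Level == "MLB", min_count = 2)
print(mlb_classes)
```

Applied to the real data, `classes_meeting(intl_df, intl_df$FutureValue >= 50)` would express the FV 50+ criterion, for example.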

Classes with two or more top 100 prospects (including graduates)

top100 <- intl_df %>%
  filter(!is.na(Top100)) %>%
  group_by(SignOrg, SignYr) %>%
  summarise(top_100 = n()) %>%
  filter(top_100 >= 2)
## `summarise()` has grouped output by 'SignOrg'. You can override using the
## `.groups` argument.
top100
## # A tibble: 2 × 3
## # Groups:   SignOrg [2]
##   SignOrg SignYr top_100
##   <chr>    <int>   <int>
## 1 NYM       2018       2
## 2 SFG       2018       2

Classes with two or more org top 5 prospects (including graduates)

org_top_5 <- intl_df %>%
  filter(OrgRank <= 5) %>%
  group_by(SignOrg, SignYr) %>%
  summarise(top_5 = n()) %>%
  filter(top_5 >= 2)
## `summarise()` has grouped output by 'SignOrg'. You can override using the
## `.groups` argument.
org_top_5
## # A tibble: 5 × 3
## # Groups:   SignOrg [5]
##   SignOrg SignYr top_5
##   <chr>    <int> <int>
## 1 ARI       2022     2
## 2 COL       2019     2
## 3 LAA       2021     2
## 4 NYM       2018     2
## 5 SFG       2018     2
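The classes here are grouped by `SignOrg`, but each current prospect was scraped from his current team's page. `OrgRank` therefore appears to be his rank within the *current* organization's farm system. For a traded player, the rank reflects the new team's system. Examples in the tables below are Endy Rodríguez (now `PIT`) and Edgar Quero (now `CHW`, counted in LAA's 2021 class). Below is a base-R sketch for flagging such players, on toy rows with made-up names and org codes.

```r
# Toy rows with made-up names and org codes (not real data)
toy <- data.frame(
  Name       = c("Player A", "Player B", "Player C"),
  SignOrg    = c("ORG1", "ORG1", "ORG2"),
  CurrentOrg = c("ORG1", "ORG3", "ORG2"),
  OrgRank    = c(1L, 2L, 4L)
)

# Flag players whose rank comes from a different organization's list
toy$traded <- toy$CurrentOrg != toy$SignOrg
traded_top5 <- toy[toy$traded & toy$OrgRank <= 5,
                   c("Name", "SignOrg", "CurrentOrg", "OrgRank")]
print(traded_top5)
```

On the real data, the dplyr equivalent is `intl_df %>% filter(CurrentOrg != SignOrg, OrgRank <= 5)`.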

Classes with two or more players with future values of 50 or higher

over_50 <- intl_df %>%
  filter(FutureValue >= 50) %>%
  group_by(SignOrg, SignYr) %>%
  summarise(over_50 = n()) %>%
  filter(over_50 >= 2)
## `summarise()` has grouped output by 'SignOrg'. You can override using the
## `.groups` argument.
over_50
## # A tibble: 2 × 3
## # Groups:   SignOrg [2]
##   SignOrg SignYr over_50
##   <chr>    <int>   <int>
## 1 NYM       2018       2
## 2 SFG       2018       2

Classes with at least one player graded 60 FV or higher

max_60 <- intl_df %>%
  filter(FutureValue >= 60) %>%
  group_by(SignOrg, SignYr) %>%
  summarise(max_60 = n())
## `summarise()` has grouped output by 'SignOrg'. You can override using the
## `.groups` argument.
max_60
## # A tibble: 5 × 3
## # Groups:   SignOrg [5]
##   SignOrg SignYr max_60
##   <chr>    <int>  <int>
## 1 CIN       2018      1
## 2 COL       2019      1
## 3 MIA       2019      1
## 4 MIL       2021      1
## 5 NYM       2018      1
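The five tables above each answer a single question. Below is a sketch that puts every criterion side by side per class, so classes that clear several bars at once stand out. It uses base R on toy rows with hypothetical org codes. `class_scorecard()` is a hypothetical helper, and its thresholds mirror the ones used above.

```r
# Hypothetical helper: per-class counts for each criterion used above
class_scorecard <- function(df) {
  # Per-player indicators; missing values count as "criterion not met"
  ind <- data.frame(
    mlb    = df$Level %in% "MLB",
    top100 = !is.na(df$Top100),
    top5   = !is.na(df$OrgRank) & df$OrgRank <= 5,
    fv50   = !is.na(df$FutureValue) & df$FutureValue >= 50,
    fv60   = !is.na(df$FutureValue) & df$FutureValue >= 60
  )
  # Count how many players in each signing class meet each criterion
  out <- aggregate(ind, by = list(SignOrg = df$SignOrg, SignYr = df$SignYr), FUN = sum)
  # Classes with the most FV 50+ players first
  out <- out[order(-out$fv50, out$SignOrg), ]
  rownames(out) <- NULL
  out
}

# Toy rows with hypothetical org codes (not real data)
toy <- data.frame(
  SignOrg     = c("ORG1", "ORG1", "ORG1", "ORG2", "ORG2"),
  SignYr      = c(2018L, 2018L, 2018L, 2019L, 2019L),
  Level       = c("MLB", "MLB", "AA", "AA", NA),
  Top100      = c(15L, 40L, NA, 5L, NA),
  OrgRank     = c(1L, 3L, 9L, 2L, 20L),
  FutureValue = c(60L, 50L, 40L, 55L, 45L)
)

card <- class_scorecard(toy)
print(card)
```

Because every criterion lands in one row per class, a class like NYM 2018 would show up as strong on all of them at a glance.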

Explore the highlighted classes

NYM_18 <- intl_df %>%
  filter(SignOrg == 'NYM' & SignYr == 2018)
NYM_18
## # A tibble: 3 × 12
##   Top100 OrgRank Name    CurrentOrg Pos   Level   ETA FutureValue SignYr SignMkt
##    <int>   <int> <chr>   <chr>      <chr> <chr> <int>       <int>  <int> <chr>  
## 1     NA       6 Kenedy… HOU        CF    AA     2024          40   2018 J2     
## 2     10       1 Franci… NYM        C     MLB    2023          60   2018 J2     
## 3     11       2 Endy R… PIT        C     MLB    2023          55   2018 J2     
## # ℹ 2 more variables: SignOrg <chr>, Bonus <chr>
SFG_18 <- intl_df %>%
  filter(SignOrg == 'SFG' & SignYr == 2018)
SFG_18
## # A tibble: 4 × 12
##   Top100 OrgRank Name    CurrentOrg Pos   Level   ETA FutureValue SignYr SignMkt
##    <int>   <int> <chr>   <chr>      <chr> <chr> <int>       <int>  <int> <chr>  
## 1     65       2 Marco … SFG        RF    MLB    2024          50   2018 J2     
## 2     NA      21 Victor… SFG        1B    AA     2025          40   2018 J2     
## 3     NA      27 Jairo … SFG        LF    A+     2023          40   2018 J2     
## 4     12       1 Luis M… SFG        CF    MLB    2023          55   2018 J2     
## # ℹ 2 more variables: SignOrg <chr>, Bonus <chr>
ARI_22 <- intl_df %>%
  filter(SignOrg == 'ARI' & SignYr == 2022)
ARI_22
## # A tibble: 4 × 12
##   Top100 OrgRank Name    CurrentOrg Pos   Level   ETA FutureValue SignYr SignMkt
##    <int>   <int> <chr>   <chr>      <chr> <chr> <int>       <int>  <int> <chr>  
## 1     NA       4 Jansel… ARI        SS    A      2027          45   2022 Intl15 
## 2     NA       5 Cristo… ARI        2B    A      2027          45   2022 Intl15 
## 3     NA      11 Ruben … ARI        3B    CPX    2028          40   2022 Intl15 
## 4     NA      30 Yerald… ARI        SS    CPX    2027          40   2022 Intl15 
## # ℹ 2 more variables: SignOrg <chr>, Bonus <chr>
COL_19 <- intl_df %>%
  filter(SignOrg == 'COL' & SignYr == 2019)
COL_19
## # A tibble: 3 × 12
##   Top100 OrgRank Name    CurrentOrg Pos   Level   ETA FutureValue SignYr SignMkt
##    <int>   <int> <chr>   <chr>      <chr> <chr> <int>       <int>  <int> <chr>  
## 1      8       1 Adael … COL        2B    AA     2025          60   2019 J2     
## 2     NA       3 Yanqui… COL        LF    AA     2025          45   2019 J2     
## 3     NA      22 Adrian… TOR        CF    A      2025          40   2019 J2     
## # ℹ 2 more variables: SignOrg <chr>, Bonus <chr>
LAA_21 <- intl_df %>%
  filter(SignOrg == 'LAA' & SignYr == 2021)
LAA_21
## # A tibble: 5 × 12
##   Top100 OrgRank Name    CurrentOrg Pos   Level   ETA FutureValue SignYr SignMkt
##    <int>   <int> <chr>   <chr>      <chr> <chr> <int>       <int>  <int> <chr>  
## 1     48       3 Edgar … CHW        C     AA     2025          50   2021 J2     
## 2     NA       5 Walber… LAA        SIRP  A      2026          40   2021 Intl15 
## 3     NA       9 Jorge … LAA        LF    A      2026          40   2021 Intl15 
## 4     NA      15 Denzer… LAA        SS    A      2026          40   2021 Intl15 
## 5     NA      18 Keythe… LAA        SP    A+     2027          35   2021 Intl15 
## # ℹ 2 more variables: SignOrg <chr>, Bonus <chr>
CIN_18 <- intl_df %>%
  filter(SignOrg == 'CIN' & SignYr == 2018)
CIN_18
## # A tibble: 2 × 12
##   Top100 OrgRank Name    CurrentOrg Pos   Level   ETA FutureValue SignYr SignMkt
##    <int>   <int> <chr>   <chr>      <chr> <chr> <int>       <int>  <int> <chr>  
## 1     NA      40 Luis M… CIN        SIRP  A      2023          35   2018 J2     
## 2      1       1 Elly D… CIN        SS    MLB    2024          60   2018 J2     
## # ℹ 2 more variables: SignOrg <chr>, Bonus <chr>
MIA_19 <- intl_df %>%
  filter(SignOrg == 'MIA' & SignYr == 2019)
MIA_19
## # A tibble: 4 × 12
##   Top100 OrgRank Name    CurrentOrg Pos   Level   ETA FutureValue SignYr SignMkt
##    <int>   <int> <chr>   <chr>      <chr> <chr> <int>       <int>  <int> <chr>  
## 1     NA      12 Javier… MIA        CF    A+     2025          40   2019 J2     
## 2     NA      25 Ian Le… MIA        2B    A      2025          35   2019 J2     
## 3     NA       9 Jose S… MIN        2B    A+     2025          45   2019 J2     
## 4      3       1 Eury P… MIA        SP    MLB    2023          60   2019 J2     
## # ℹ 2 more variables: SignOrg <chr>, Bonus <chr>
MIL_21 <- intl_df %>%
  filter(SignOrg == 'MIL' & SignYr == 2021)
MIL_21
## # A tibble: 4 × 12
##   Top100 OrgRank Name    CurrentOrg Pos   Level   ETA FutureValue SignYr SignMkt
##    <int>   <int> <chr>   <chr>      <chr> <chr> <int>       <int>  <int> <chr>  
## 1      3       1 Jackso… MIL        CF    AAA    2024          60   2021 J2     
## 2     NA      13 Hendry… MIL        RF    A+     2025          40   2021 Intl15 
## 3     NA      16 Jadher… MIL        SS    A      2026          40   2021 Intl15 
## 4     NA      29 Luis C… MIL        LF    A      2026          35   2021 Intl15 
## # ℹ 2 more variables: SignOrg <chr>, Bonus <chr>
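The eight blocks above repeat the same `filter()` call with different org/year pairs. Below is a sketch that lists the highlighted classes once and pulls them all with a hypothetical `get_class()` helper, using base R on toy rows. With the real data, `toy` would be `intl_df`. The helper uses `which()` so that rows with a missing `SignOrg` or `SignYr` are dropped, as `filter()` does.

```r
# Hypothetical helper: return one signing class (org + year) from a prospects table
get_class <- function(df, org, yr) {
  df[which(df$SignOrg == org & df$SignYr == yr), , drop = FALSE]
}

# Toy rows with made-up names and org codes (not real data)
toy <- data.frame(
  Name    = c("Player A", "Player B", "Player C", "Player D"),
  SignOrg = c("ORG1", "ORG1", "ORG2", NA),
  SignYr  = c(2018L, 2018L, 2019L, 2019L)
)

# The classes to explore, listed once
highlighted <- data.frame(org = c("ORG1", "ORG2"), yr = c(2018L, 2019L))

classes <- Map(function(o, y) get_class(toy, o, y), highlighted$org, highlighted$yr)
names(classes) <- paste(highlighted$org, highlighted$yr, sep = "_")
print(classes)
```

Adding another class to explore then means adding one row to `highlighted` rather than another copy of the filter block.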