Since 2018, this is one of only two classes to have produced two major leaguers, and it also includes a player with an internal (Fangraphs) future value of 60 or higher. That player is Francisco Alvarez, who has become a very good MLB catcher. The class also includes Endy Rodríguez, who, although since traded to the Pirates, is expected to become an above-average big leaguer. Rodríguez signed for just $10k.
This is the other class to have produced two big leaguers. Luis Matos and Marco Luciano became the Giants’ No. 1 and No. 2 prospects, respectively. Matos signed for $750k, which suggests that the Giants saw value that other teams did not. The class also includes two other prospects who currently rank within the organization’s top 30.
The headliner of this class is Adael Amador. I include this class to suggest that one great prospect can make a class great. Amador certainly seems to be a great prospect, and he signed for only $1.5MM. Our (Fangraphs’) scouts describe him as “an All-Star second baseman.” The class also includes another top-3 player in the Rockies’ system, Yanquiel Fernandez, who our scouts note has the chance to be an impact player.
# Load packages
library(httr)
library(jsonlite)
## Warning: package 'jsonlite' was built under R version 4.2.3
library(rvest)
## Warning: package 'rvest' was built under R version 4.2.3
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.2.3
## Warning: package 'ggplot2' was built under R version 4.2.3
## Warning: package 'tibble' was built under R version 4.2.3
## Warning: package 'tidyr' was built under R version 4.2.3
## Warning: package 'readr' was built under R version 4.2.3
## Warning: package 'dplyr' was built under R version 4.2.3
## Warning: package 'stringr' was built under R version 4.2.3
## Warning: package 'forcats' was built under R version 4.2.3
## Warning: package 'lubridate' was built under R version 4.2.3
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ purrr::flatten() masks jsonlite::flatten()
## ✖ readr::guess_encoding() masks rvest::guess_encoding()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Web-scrape: Fangraphs top prospects for each organization, which I am using as our organization’s “internal evaluations”
# The scrape must be looped for each team
teams <- c('ari','atl','bal','bos','chc','chw','cin','cle','col',
           'det','hou','kcr','laa','lad','mia','mil','min','nym','nyy',
           'oak','phi','pit','sdp','sea','sfg','stl','tbr','tex','tor',
           'wsn')
prospects_table <- data.frame()
for (team in teams) {
  # Create the unique url for each iteration
  url <- paste0('https://www.fangraphs.com/prospects/the-board?%2F&org=', team)
  # Extract table
  team_page <- read_html(url)
  team_table <- team_page %>%
    html_table() %>%
    .[[10]]
  # Pause scrape to avoid robot blocking
  Sys.sleep(5)
  # Add team's table to running table
  prospects_table <- rbind(prospects_table, team_table)
}
# Add recent graduates, which are located at a different url
years <- c(2021, 2022, 2023)
for (year in years) {
  url <- paste0('https://www.fangraphs.com/prospects/the-board/', year, '-graduates')
  year_page <- read_html(url)
  year_table <- year_page %>%
    html_table() %>%
    .[[10]]
  # Pause scrape to avoid robot blocking
  Sys.sleep(5)
  # Append the graduates directly to the running table of prospects
  prospects_table <- rbind(prospects_table, year_table)
}
head(prospects_table)
## # A tibble: 6 × 22
## Top 100Rank inall base…¹ Org RkRank withintea…² NameName `OrgMLB Organization`
## <int> <int> <chr> <chr>
## 1 5 1 Jordan … ARI
## 2 40 2 Druw Jo… ARI
## 3 59 3 Tommy T… ARI
## 4 NA 4 Jansel … ARI
## 5 NA 5 Cristof… ARI
## 6 NA 6 Slade C… ARI
## # ℹ abbreviated names: ¹`Top 100Rank inall baseball`,
## # ²`Org RkRank withinteam's farm system`
## # ℹ 18 more variables:
## # `PosProjected defensive position or pitching role(Starter, Multi-, or Single-inning Relief)` <chr>,
## # `Current LevelMost recent the level played ator had a transaction to` <chr>,
## # `ETAProjected year of debut/loss of prospect eligibility` <int>,
## # `FVFuture Value(Explanation)` <chr>, …
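The scrape above selects the board by position (`.[[10]]`), which would quietly return a different table if the page layout changed. As a hedged sketch, the hypothetical helper `pick_table()` below (my own name, not part of the original analysis) instead returns the first table that has a column whose name starts with a given prefix. The toy page and its rows are made up for illustration; `minimal_html()` and `html_table()` are the only rvest functions used.

```r
library(rvest)

# Hypothetical helper: return the first table on a page that has a column
# whose name starts with `header_prefix`, or NULL if no table qualifies.
pick_table <- function(page, header_prefix = "Name") {
  tables <- html_table(page)
  has_col <- vapply(
    tables,
    function(tbl) any(startsWith(names(tbl), header_prefix)),
    logical(1)
  )
  if (!any(has_col)) {
    return(NULL)
  }
  tables[[which(has_col)[1]]]
}

# Toy page standing in for a scraped board page (invented rows)
toy_page <- minimal_html('
  <table><tr><th>Nav</th></tr><tr><td>Home</td></tr></table>
  <table>
    <tr><th>NameName</th><th>BonusBonus</th></tr>
    <tr><td>Player A</td><td>$10,000</td></tr>
  </table>')

board <- pick_table(toy_page)
board
```

Returning `NULL` when nothing matches lets the calling loop skip or report a team rather than binding the wrong table into `prospects_table`.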
Clean Table
prospects_df <- prospects_table %>%
  # select() can rename while it selects, so each scraped header is typed once
  select(
    Top100 = "Top 100Rank inall baseball",
    OrgRank = "Org RkRank withinteam's farm system",
    Name = "NameName",
    CurrentOrg = "OrgMLB Organization",
    Pos = "PosProjected defensive position or pitching role(Starter, Multi-, or Single-inning Relief)",
    Level = "Current LevelMost recent the level played ator had a transaction to",
    ETA = "ETAProjected year of debut/loss of prospect eligibility",
    FutureValue = "FVFuture Value(Explanation)",
    SignYr = "Sign YrSign Yr",
    SignMkt = "Sign MktSign Mkt",
    SignOrg = "Sign OrgSign Org",
    Bonus = "BonusBonus"
  ) %>%
  # Strip every non-digit character so the grade can be stored as an integer
  mutate(FutureValue = as.integer(str_remove_all(FutureValue, pattern = '[^0-9]')))
# Select rows relevant to the task
intl_df <- prospects_df %>%
filter((SignMkt == 'J2' | SignMkt == 'Intl15') & SignYr >= 2018)
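The `Bonus` column stays as text after cleaning (it prints as `<chr>` in the class tables), so bonuses cannot be sorted or compared numerically as they stand. The sketch below shows one way to convert such strings with `readr::parse_number()`. The example values and their `$750,000`-style format are assumptions for illustration, not rows taken from the scraped table.

```r
library(readr)

# Invented bonus strings in the format the column is assumed to use
bonus_chr <- c("$750,000", "$10,000", "$1,500,000", NA)

# parse_number() drops the currency symbol and the grouping commas;
# missing values stay missing
bonus_num <- parse_number(bonus_chr)
bonus_num
```

If the real column uses a different format, the parsed values should be checked against a few known signings before being used in any comparison.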
Classes with two or more MLB players
mlb_players <- intl_df %>%
filter(Level == "MLB") %>%
group_by(SignOrg, SignYr) %>%
summarise(mlb_players = n()) %>%
filter(mlb_players >=2)
## `summarise()` has grouped output by 'SignOrg'. You can override using the
## `.groups` argument.
mlb_players
## # A tibble: 2 × 3
## # Groups: SignOrg [2]
## SignOrg SignYr mlb_players
## <chr> <int> <int>
## 1 NYM 2018 2
## 2 SFG 2018 2
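The class counts in this analysis all follow the same pattern: filter on a condition, group by signing organization and year, count, then apply a threshold. As a sketch, the hypothetical helper `count_classes()` (my own name, not from the original analysis) writes that pattern once. It uses `count()`, which returns an ungrouped result, so the `summarise()` grouping message does not appear. The toy rows are invented.

```r
library(dplyr)

# Hypothetical helper: classes (SignOrg x SignYr) with at least `min_n`
# players satisfying `condition`
count_classes <- function(df, condition, min_n = 2) {
  df %>%
    filter({{ condition }}) %>%
    count(SignOrg, SignYr, name = "n_players") %>%
    filter(n_players >= min_n)
}

# Invented rows for illustration only
toy <- tibble(
  SignOrg = c("AAA", "AAA", "BBB", "BBB"),
  SignYr  = c(2018L, 2018L, 2018L, 2019L),
  Level   = c("MLB", "MLB", "MLB", "AA")
)

res <- count_classes(toy, Level == "MLB")
res
```

With the cleaned data, a call such as `count_classes(intl_df, Level == "MLB")` would apply the same logic as the block above, with the count column named `n_players`.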
Classes with two or more top 100 prospects (including graduates)
top100 <- intl_df %>%
filter(!is.na(Top100)) %>%
group_by(SignOrg, SignYr) %>%
summarise(top_100 = n()) %>%
filter(top_100 >= 2)
## `summarise()` has grouped output by 'SignOrg'. You can override using the
## `.groups` argument.
top100
## # A tibble: 2 × 3
## # Groups: SignOrg [2]
## SignOrg SignYr top_100
## <chr> <int> <int>
## 1 NYM 2018 2
## 2 SFG 2018 2
Classes with two or more org top 5 prospects (including graduates)
org_top_5 <- intl_df %>%
filter(OrgRank <= 5) %>%
group_by(SignOrg, SignYr) %>%
summarise(top_5 = n()) %>%
filter(top_5 >= 2)
## `summarise()` has grouped output by 'SignOrg'. You can override using the
## `.groups` argument.
org_top_5
## # A tibble: 5 × 3
## # Groups: SignOrg [5]
## SignOrg SignYr top_5
## <chr> <int> <int>
## 1 ARI 2022 2
## 2 COL 2019 2
## 3 LAA 2021 2
## 4 NYM 2018 2
## 5 SFG 2018 2
Classes with two or more players with a future value of 50 or higher
over_50 <- intl_df %>%
filter(FutureValue >= 50) %>%
group_by(SignOrg, SignYr) %>%
summarise(over_50 = n()) %>%
filter(over_50 >= 2)
## `summarise()` has grouped output by 'SignOrg'. You can override using the
## `.groups` argument.
over_50
## # A tibble: 2 × 3
## # Groups: SignOrg [2]
## SignOrg SignYr over_50
## <chr> <int> <int>
## 1 NYM 2018 2
## 2 SFG 2018 2
Classes with a player graded 60 or higher
max_60 <- intl_df %>%
filter(FutureValue >= 60) %>%
group_by(SignOrg, SignYr) %>%
summarise(max_60 = n())
## `summarise()` has grouped output by 'SignOrg'. You can override using the
## `.groups` argument.
max_60
## # A tibble: 5 × 3
## # Groups: SignOrg [5]
## SignOrg SignYr max_60
## <chr> <int> <int>
## 1 CIN 2018 1
## 2 COL 2019 1
## 3 MIA 2019 1
## 4 MIL 2021 1
## 5 NYM 2018 1
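The five criteria above can also be computed in a single pass, giving one row per class that is easier to compare side by side. This is a minimal sketch on invented rows; the column names follow the cleaned table, and the summary names reuse those from the individual blocks.

```r
library(dplyr)

# Invented rows; column names follow the cleaned table
toy <- tibble(
  SignOrg     = c("AAA", "AAA", "BBB"),
  SignYr      = c(2018L, 2018L, 2019L),
  Level       = c("MLB", "MLB", "AA"),
  Top100      = c(10L, 11L, NA),
  OrgRank     = c(1L, 2L, 22L),
  FutureValue = c(60L, 55L, 40L)
)

# Summing a logical vector counts the rows that meet each condition
scorecard <- toy %>%
  group_by(SignOrg, SignYr) %>%
  summarise(
    mlb_players = sum(Level == "MLB", na.rm = TRUE),
    top_100     = sum(!is.na(Top100)),
    top_5       = sum(OrgRank <= 5, na.rm = TRUE),
    over_50     = sum(FutureValue >= 50, na.rm = TRUE),
    max_60      = sum(FutureValue >= 60, na.rm = TRUE),
    .groups = "drop"
  )
scorecard
```

Unlike the filtered tables, this keeps classes that score zero on every criterion, so a threshold filter would still be needed to reproduce the shortlists.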
Explore the highlighted classes
NYM_18 <- intl_df %>%
filter(SignOrg == 'NYM' & SignYr == 2018)
NYM_18
## # A tibble: 3 × 12
## Top100 OrgRank Name CurrentOrg Pos Level ETA FutureValue SignYr SignMkt
## <int> <int> <chr> <chr> <chr> <chr> <int> <int> <int> <chr>
## 1 NA 6 Kenedy… HOU CF AA 2024 40 2018 J2
## 2 10 1 Franci… NYM C MLB 2023 60 2018 J2
## 3 11 2 Endy R… PIT C MLB 2023 55 2018 J2
## # ℹ 2 more variables: SignOrg <chr>, Bonus <chr>
SFG_18 <- intl_df %>%
filter(SignOrg == 'SFG' & SignYr == 2018)
SFG_18
## # A tibble: 4 × 12
## Top100 OrgRank Name CurrentOrg Pos Level ETA FutureValue SignYr SignMkt
## <int> <int> <chr> <chr> <chr> <chr> <int> <int> <int> <chr>
## 1 65 2 Marco … SFG RF MLB 2024 50 2018 J2
## 2 NA 21 Victor… SFG 1B AA 2025 40 2018 J2
## 3 NA 27 Jairo … SFG LF A+ 2023 40 2018 J2
## 4 12 1 Luis M… SFG CF MLB 2023 55 2018 J2
## # ℹ 2 more variables: SignOrg <chr>, Bonus <chr>
ARI_22 <- intl_df %>%
filter(SignOrg == 'ARI' & SignYr == 2022)
ARI_22
## # A tibble: 4 × 12
## Top100 OrgRank Name CurrentOrg Pos Level ETA FutureValue SignYr SignMkt
## <int> <int> <chr> <chr> <chr> <chr> <int> <int> <int> <chr>
## 1 NA 4 Jansel… ARI SS A 2027 45 2022 Intl15
## 2 NA 5 Cristo… ARI 2B A 2027 45 2022 Intl15
## 3 NA 11 Ruben … ARI 3B CPX 2028 40 2022 Intl15
## 4 NA 30 Yerald… ARI SS CPX 2027 40 2022 Intl15
## # ℹ 2 more variables: SignOrg <chr>, Bonus <chr>
COL_19 <- intl_df %>%
filter(SignOrg == 'COL' & SignYr == 2019)
COL_19
## # A tibble: 3 × 12
## Top100 OrgRank Name CurrentOrg Pos Level ETA FutureValue SignYr SignMkt
## <int> <int> <chr> <chr> <chr> <chr> <int> <int> <int> <chr>
## 1 8 1 Adael … COL 2B AA 2025 60 2019 J2
## 2 NA 3 Yanqui… COL LF AA 2025 45 2019 J2
## 3 NA 22 Adrian… TOR CF A 2025 40 2019 J2
## # ℹ 2 more variables: SignOrg <chr>, Bonus <chr>
LAA_21 <- intl_df %>%
filter(SignOrg == 'LAA' & SignYr == 2021)
LAA_21
## # A tibble: 5 × 12
## Top100 OrgRank Name CurrentOrg Pos Level ETA FutureValue SignYr SignMkt
## <int> <int> <chr> <chr> <chr> <chr> <int> <int> <int> <chr>
## 1 48 3 Edgar … CHW C AA 2025 50 2021 J2
## 2 NA 5 Walber… LAA SIRP A 2026 40 2021 Intl15
## 3 NA 9 Jorge … LAA LF A 2026 40 2021 Intl15
## 4 NA 15 Denzer… LAA SS A 2026 40 2021 Intl15
## 5 NA 18 Keythe… LAA SP A+ 2027 35 2021 Intl15
## # ℹ 2 more variables: SignOrg <chr>, Bonus <chr>
CIN_18 <- intl_df %>%
filter(SignOrg == 'CIN' & SignYr == 2018)
CIN_18
## # A tibble: 2 × 12
## Top100 OrgRank Name CurrentOrg Pos Level ETA FutureValue SignYr SignMkt
## <int> <int> <chr> <chr> <chr> <chr> <int> <int> <int> <chr>
## 1 NA 40 Luis M… CIN SIRP A 2023 35 2018 J2
## 2 1 1 Elly D… CIN SS MLB 2024 60 2018 J2
## # ℹ 2 more variables: SignOrg <chr>, Bonus <chr>
MIA_19 <- intl_df %>%
filter(SignOrg == 'MIA' & SignYr == 2019)
MIA_19
## # A tibble: 4 × 12
## Top100 OrgRank Name CurrentOrg Pos Level ETA FutureValue SignYr SignMkt
## <int> <int> <chr> <chr> <chr> <chr> <int> <int> <int> <chr>
## 1 NA 12 Javier… MIA CF A+ 2025 40 2019 J2
## 2 NA 25 Ian Le… MIA 2B A 2025 35 2019 J2
## 3 NA 9 Jose S… MIN 2B A+ 2025 45 2019 J2
## 4 3 1 Eury P… MIA SP MLB 2023 60 2019 J2
## # ℹ 2 more variables: SignOrg <chr>, Bonus <chr>
MIL_21 <- intl_df %>%
filter(SignOrg == 'MIL' & SignYr == 2021)
MIL_21
## # A tibble: 4 × 12
## Top100 OrgRank Name CurrentOrg Pos Level ETA FutureValue SignYr SignMkt
## <int> <int> <chr> <chr> <chr> <chr> <int> <int> <int> <chr>
## 1 3 1 Jackso… MIL CF AAA 2024 60 2021 J2
## 2 NA 13 Hendry… MIL RF A+ 2025 40 2021 Intl15
## 3 NA 16 Jadher… MIL SS A 2026 40 2021 Intl15
## 4 NA 29 Luis C… MIL LF A 2026 35 2021 Intl15
## # ℹ 2 more variables: SignOrg <chr>, Bonus <chr>
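Each class table above comes from the same filter with a different organization and year. As a sketch, a hypothetical `get_class()` helper (my own name) plus `purrr::map2()` would build all of them from a single list of highlighted classes. The rows and the `highlighted` table are invented for illustration.

```r
library(dplyr)
library(purrr)

# Hypothetical helper: all players one organization signed in one year
get_class <- function(df, org, yr) {
  filter(df, SignOrg == org, SignYr == yr)
}

# Invented rows for illustration only
toy <- tibble(
  Name    = c("Player A", "Player B", "Player C"),
  SignOrg = c("AAA", "AAA", "BBB"),
  SignYr  = c(2018L, 2018L, 2019L)
)

# One row per class to inspect
highlighted <- tibble(org = c("AAA", "BBB"), yr = c(2018L, 2019L))

# Build a named list holding one table per class
class_tables <- map2(highlighted$org, highlighted$yr, ~ get_class(toy, .x, .y))
names(class_tables) <- paste(highlighted$org, highlighted$yr, sep = "_")
class_tables
```

Keeping the classes in a named list means adding another class to the review is a one-row change to `highlighted`.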