library(nbastatR, warn.conflicts = FALSE)
library(tidyverse)
library(lubridate, warn.conflicts = FALSE)
library(gt)
library(skimr)

The All Stars

First, let’s take a look at our starters:

asg_starters %>%
  select(player, team_slug) %>%
  gt() %>%
  tab_header(
    title = md("**NBA All-Star Game Starters**"),
  ) %>%
  cols_label(player = md("**Player Name**"),
             team_slug = md("**Team**")) %>%
  tab_footnote(
    footnote = "Team captain",
    locations = cells_data(
      columns = vars(player),
      rows = c(1, 8))
  ) %>%
  tab_options(footnote.glyph = c("*, †, ‡"))

NBA All-Star Game Starters
Player Name Team
Giannis Antetokounmpo* MIL
Stephen Curry GSW
Kevin Durant GSW
Joel Embiid PHI
Paul George OKC
James Harden HOU
Kyrie Irving BOS
LeBron James* LAL
Kawhi Leonard TOR
Kemba Walker CHA
* Team captain

Now the All-Star reserves:

asg_reserves %>%
  select(player, team_slug) %>%
  gt() %>%
  tab_header(
    title = md("**NBA All-Star Game Reserves**"),
  ) %>%
  cols_label(player = md("**Player Name**"),
             team_slug = md("**Team**"))

NBA All-Star Game Reserves
Player Name Team
LaMarcus Aldridge SAS
Bradley Beal WAS
Anthony Davis NOP
Blake Griffin DET
Nikola Jokic DEN
Damian Lillard POR
Kyle Lowry TOR
Khris Middleton MIL
Victor Oladipo IND
Ben Simmons PHI
Klay Thompson GSW
Karl-Anthony Towns MIN
Nikola Vucevic ORL
Russell Westbrook OKC

We’ve also got a category I’m calling “specials.” Brooklyn Nets guard D’Angelo Russell was named by NBA Commissioner Adam Silver as the replacement for Victor Oladipo, who ruptured a quad tendon in a game against the Raptors on January 23.[1]

Silver also added Dirk Nowitzki and Dwyane Wade as “special team roster additions” because, well, they’re Dirk Nowitzki and D. Wade.[2]

asg_special %>%
  select(player, team_slug) %>%
  gt() %>%
  tab_header(
    title = md("**NBA All-Star Game Special Selections**"),
  ) %>%
  cols_label(player = md("**Player Name**"),
             team_slug = md("**Team**"))

NBA All-Star Game Special Selections
Player Name Team
D'Angelo Russell BKN
Dirk Nowitzki DAL
Dwyane Wade MIA

The Votes

We’ve got voting data, which I scraped and wrangled in an earlier missive.[3]

Western Conference Frontcourt
NBA All-Star Voting 2019 Results
Rank Player Name Team Fan Rank Player Rank Media Rank Weighted Score
1 LeBron James*, † LAL 1 1 1 1.00
2 Kevin Durant* GSW 3 2 2 2.50
3 Paul George* OKC 4 4 4 4.00
4 Anthony Davis NOP 5 3 3 4.00
5 Luka Doncic DAL 2 8 6 4.50
6 Nikola Jokic DEN 7 5 5 6.00
7 Steven Adams OKC 6 7 8 6.75
8 Draymond Green GSW 9 10 8 9.00
9 Karl-Anthony Towns MIN 11 10 8 10.00
10 LaMarcus Aldridge SAS 13 6 8 10.00
source: nba.com
* Voted to start
† Team captain
Tiebreaker for starting spot is fan rank

Western Conference Backcourt
NBA All-Star Voting 2019 Results
Rank Player Name Team Fan Rank Player Rank Media Rank Weighted Score
1 Stephen Curry* GSW 1 1 2 1.25
2 James Harden* HOU 3 2 1 2.25
3 Derrick Rose MIN 2 4 6 3.50
4 Russell Westbrook OKC 4 3 3 3.50
5 Damian Lillard POR 6 5 4 5.25
6 Klay Thompson GSW 5 11 4 6.25
7 DeMar DeRozan SAS 7 8 6 7.00
8 Devin Booker PHX 10 6 6 8.00
9 Lonzo Ball LAL 8 14 6 9.00
10 Chris Paul HOU 9 14 6 9.50
source: nba.com
* Voted to start

Eastern Conference Frontcourt
NBA All-Star Voting 2019 Results
Rank Player Name Team Fan Rank Player Rank Media Rank Weighted Score
1 Giannis Antetokounmpo*, † MIL 1 1 1 1.00
2 Kawhi Leonard* TOR 2 2 1 1.75
3 Joel Embiid* PHI 3 3 1 2.50
4 Jayson Tatum BOS 4 7 4 4.75
5 Jimmy Butler PHI 5 4 7 5.25
6 Blake Griffin DET 6 5 7 6.00
7 Pascal Siakam TOR 8 10 4 7.50
8 Vince Carter ATL 7 12 7 8.25
9 Andre Drummond DET 12 6 7 9.25
10 Nikola Vucevic ORL 13 9 4 9.75
source: nba.com
* Voted to start
† Team captain

Eastern Conference Backcourt
NBA All-Star Voting 2019 Results
Rank Player Name Team Fan Rank Player Rank Media Rank Weighted Score
1 Kyrie Irving* BOS 1 1 1 1.0
2 Kemba Walker* CHA 3 2 2 2.5
3 Dwyane Wade MIA 2 6 6 4.0
4 Ben Simmons PHI 4 5 3 4.0
5 Victor Oladipo IND 5 4 4 4.5
6 Kyle Lowry TOR 6 7 7 6.5
7 Bradley Beal WAS 10 3 5 7.0
8 Zach LaVine CHI 7 8 8 7.5
9 D'Angelo Russell BKN 11 10 8 10.0
10 Eric Bledsoe MIL 16 8 8 12.0
source: nba.com
* Voted to start.
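
The Weighted Score column can be reproduced from the three rank columns. Here is a minimal sketch, assuming the NBA's 50/25/25 weighting (fans count for half, players and media for a quarter each). Those weights are not stated anywhere in this post, and weighted_score() is a hypothetical helper name, but the formula does match the scores in the tables above.

```r
# Assumed weights: fans 50%, players 25%, media 25%
# (not stated in this post; they do reproduce the scores in the tables above)
weighted_score <- function(fan_rank, player_rank, media_rank) {
  0.5 * fan_rank + 0.25 * player_rank + 0.25 * media_rank
}

# Ranks copied from the Western Conference Frontcourt table
durant <- weighted_score(fan_rank = 3, player_rank = 2, media_rank = 2)
george <- weighted_score(fan_rank = 4, player_rank = 4, media_rank = 4)
davis  <- weighted_score(fan_rank = 5, player_rank = 3, media_rank = 3)

c(durant = durant, george = george, davis = davis)
```

Paul George and Anthony Davis both come out at 4.00, which is where the fan-rank tiebreaker kicks in: George's fan rank of 4 beats Davis's 5, so George gets the starting spot.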

Munging data in

Let’s get some more data on our actual selected All-Star team. First I’ll combine the starters, reserves, and “specials” into a single data frame.

asg_players <- bind_rows(asg_starters, asg_reserves, asg_special)

Next we’ll bring in the All-Star voting data. Note that I’m using two columns for the join not because I necessarily need both, but because joining on player alone would leave me with two suffixed variables (by default, team_slug.x and team_slug.y) in the resulting data frame.

asg_data <- asg_players %>%
  left_join(asg_votes, by = c("player", "team_slug"))
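
To see the difference the second join column makes, here is a small sketch on toy data. The players and votes tibbles are hypothetical stand-ins for asg_players and asg_votes, with values copied from the tables above.

```r
library(dplyr)

# Hypothetical stand-ins for asg_players and asg_votes
players <- tibble(
  player    = c("Stephen Curry", "Klay Thompson"),
  team_slug = c("GSW", "GSW")
)
votes <- tibble(
  player    = c("Stephen Curry", "Klay Thompson"),
  team_slug = c("GSW", "GSW"),
  fan_rank  = c(1, 5)
)

# Joining on player alone keeps both team_slug columns, suffixed .x and .y
one_key <- left_join(players, votes, by = "player")
names(one_key)

# Joining on both columns keeps a single team_slug
two_keys <- left_join(players, votes, by = c("player", "team_slug"))
names(two_keys)
```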

I’m also gonna ditch those last four columns, since they were a mess of my own making from the initial data scraping, and I’m pretty sure they have variable names that will bork something up somewhere.

asg_data <- asg_data[,1:13]

Selections vs. Voted Rankings

Next, let’s take a look at the players who were “least deserving” of their selection according to player, fan, and media rankings. We’ll group by conf_court, since the voting is done that way, and thus each combination of Eastern/Western conference and front/back court has its own ranking.

rank_vars <- c("player", "team_slug", "player_rank", "fan_rank", "media_rank", "weighted_score", "conf_court")

asg_data %>%
  group_by(conf_court) %>%
  filter(player_rank == max(player_rank)) %>%
  select(one_of(rank_vars)) %>%
  gt() %>%
  tab_header(
    title = md("**Least Deserving: Player Rank**"),
  ) %>%
  cols_label(player = md("**Player Name**"),
             team_slug = md("**Team**"),
             fan_rank = md("**Fan Rank**"),
             player_rank = md("**Player Rank**"),
             media_rank = md("**Media Rank**"),
             weighted_score = md("**Weighted Score**")) %>%
  tab_style(
    style = cells_styles(
      bkgd_color = "lightpink"),
    locations = cells_data(
      columns = vars(player_rank)
      )
    ) %>%
  tab_source_note(md("source: [nba.com](http://www.nba.com/article/2019/01/24/2019-nba-all-star-starters-revealed-official-release)")) %>%
  tab_options(stub_group.font.weight = "600",
              sourcenote.padding = px(10))

Least Deserving: Player Rank
Player Name Team Player Rank Fan Rank Media Rank Weighted Score
western backcourt
Klay Thompson GSW 11 5 4 6.25
eastern frontcourt
Nikola Vucevic ORL 9 13 4 9.75
eastern backcourt
D'Angelo Russell BKN 10 11 8 10.00
western frontcourt
Dirk Nowitzki DAL 30 15 8 17.00
source: nba.com

Least Deserving: Fan Rank
Player Name Team Player Rank Fan Rank Media Rank Weighted Score
western backcourt
Damian Lillard POR 5 6 4 5.25
eastern frontcourt
Khris Middleton MIL 8 14 7 10.75
eastern backcourt
D'Angelo Russell BKN 10 11 8 10.00
western frontcourt
Dirk Nowitzki DAL 30 15 8 17.00
source: nba.com

Least Deserving: Media Rank
Player Name Team Player Rank Fan Rank Media Rank Weighted Score
western frontcourt
LaMarcus Aldridge SAS 6 13 8 10.00
Karl-Anthony Towns MIN 10 11 8 10.00
Dirk Nowitzki DAL 30 15 8 17.00
eastern frontcourt
Blake Griffin DET 5 6 7 6.00
Khris Middleton MIL 8 14 7 10.75
western backcourt
Damian Lillard POR 5 6 4 5.25
Klay Thompson GSW 11 5 4 6.25
eastern backcourt
D'Angelo Russell BKN 10 11 8 10.00
source: nba.com

Why so many in this last round? Well, the media happened to have quite a few ties. Don’t worry, they haven’t totally lost their minds. They’ve got Damian Lillard ranked number 4 for Western Conference Guards, tied with Klay Thompson. Klay’s an interesting case, because though he’s definitely got the fans and media behind him, he’s only ranked 11th by his peers.
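
The extra rows fall straight out of the filter: comparing against max() within each group keeps every row that ties for the worst rank. A minimal sketch on toy data (the media tibble is a hypothetical stand-in for asg_data, with ranks copied from the Western Conference Backcourt table):

```r
library(dplyr)

# Hypothetical stand-in for asg_data: three selected Western backcourt players
media <- tibble(
  conf_court = rep("western backcourt", 3),
  player     = c("Russell Westbrook", "Damian Lillard", "Klay Thompson"),
  media_rank = c(3, 4, 4)
)

# Same pattern as the tables above: keep the worst-ranked row(s) per group
worst_media <- media %>%
  group_by(conf_court) %>%
  filter(media_rank == max(media_rank)) %>%
  ungroup()

worst_media$player
```

Both Lillard and Thompson survive the filter because they share media rank 4.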

It’s not surprising that we see Nowitzki in all three. Dude’s a legend, playing at 40, and this is a “sentimental” pick on a lot of levels. DLo (D’Angelo Russell) is another story, though. I’m not saying he doesn’t deserve it (he’s been great this season), but if you take a look at the Eastern Conference Backcourt table above, by votes alone, the players, fans, and media all had several players lined up ahead of him.

Player faves not selected

Based on the votes of their peers, who was low-key robbed?

We’ll grab the top seven players from each segment according to player rankings. Then we can see who among them was not selected to play in the All-Star Game.[4]

I like this little “not in” operator, %ni%, from a post on Stack Overflow by Spencer Castro.[5] So, I’ll use that to get my player list.

'%ni%' <- Negate('%in%')
player_top7 <- asg_votes %>%
  filter(player_rank <= 7) 

top7_nin_asg <- player_top7 %>%
  filter(player %ni% asg_players$player)

Top players by player votes not on All-Star Roster
NBA All-Star Voting 2019 Results
Player Name Team Fan Rank Player Rank Media Rank Weighted Score
Jimmy Butler PHI 5 4 7 5.25
Derrick Rose MIN 2 4 6 3.50
Andre Drummond DET 12 6 7 9.25
Devin Booker PHX 10 6 6 8.00
Jayson Tatum BOS 4 7 4 4.75
Steven Adams OKC 6 7 8 6.75
Jamal Murray DEN 13 7 6 9.75
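
For anyone curious how %ni% behaves, here is a short sketch on toy data showing that it gives the same rows as negating %in% directly, or as a dplyr anti_join(). The top7 and roster tibbles are hypothetical, trimmed-down stand-ins for player_top7 and asg_players.

```r
library(dplyr)

# "Not in" operator, built by negating %in%
`%ni%` <- Negate(`%in%`)

# Hypothetical, trimmed-down stand-ins for player_top7 and asg_players
top7   <- tibble(player = c("Jimmy Butler", "Blake Griffin", "Derrick Rose"))
roster <- tibble(player = c("Blake Griffin", "Kyle Lowry"))

# Three equivalent ways to keep players who are not on the roster
via_ni     <- filter(top7, player %ni% roster$player)
via_not_in <- filter(top7, !(player %in% roster$player))
via_anti   <- anti_join(top7, roster, by = "player")

via_ni$player
```

The anti_join() version has the advantage of working on several key columns at once, should names alone ever stop being unique.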


Looks like Jimmy Butler and Derrick Rose shoulda made the cut as far as their fellow players were concerned.

Lest you think everyone was in agreement about that, the media were a pretty consistent voting bloc, and, the way rankings were done, this means players with zero media votes all tied for… Well, let’s see:

asg_votes %>%
  group_by(conf_court) %>%
  filter(media_rank == max(media_rank)) %>%
  select(media_total_votes, media_rank, conf_court) %>%
  distinct() %>%
  gt() %>%
  tab_header(
    title = md("**Media All-Star No Votes and Rank**"),
    subtitle = glue::glue("NBA All-Star Voting 2019 Results")
    ) %>%
  cols_label(
    media_total_votes = md("**Media Votes**"),
    media_rank = md("**Media Rank**")
    ) %>%
    tab_source_note(md("source: [nba.com](http://www.nba.com/article/2019/01/24/2019-nba-all-star-starters-revealed-official-release)")) %>%
  tab_options(stub_group.font.weight = "600",
              sourcenote.padding = px(10))

Media All-Star No Votes and Rank
NBA All-Star Voting 2019 Results
Media Votes Media Rank
eastern frontcourt
0 7
eastern backcourt
0 8
western frontcourt
0 8
western backcourt
0 8
source: nba.com

So, yeah, Jimmy Butler was 7th in media rank…along with every other frontcourt player from the Eastern Conference who got zero media votes.
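
The pattern in the media ranks (1, 1, 1, 4, 4, 4, then a pile-up at 7) is what "minimum" ranking produces: tied players share the best rank available and the next rank skips ahead. A sketch with dplyr::min_rank(), where the vote counts are made up for illustration (only the zeros are taken from the discussion above), and where using min_rank() at all is an assumption about how the ranks were originally computed:

```r
library(dplyr)

# Vote counts are hypothetical, except that Butler and Griffin sit at zero
media <- tibble(
  player = c("Giannis Antetokounmpo", "Kawhi Leonard", "Joel Embiid",
             "Jayson Tatum", "Pascal Siakam", "Nikola Vucevic",
             "Jimmy Butler", "Blake Griffin"),
  media_total_votes = c(100, 100, 100, 1, 1, 1, 0, 0)
)

# Ties share the lowest rank; the next distinct value skips ahead
ranked <- media %>%
  mutate(media_rank = min_rank(desc(media_total_votes)))

ranked$media_rank
```

With six players ahead of them, everyone on zero votes lands on rank 7, however many of them there are.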

Team LeBron vs. Team Giannis

The conference captains, LeBron James and Giannis Antetokounmpo, drafted their teams on February 8th. Here’s how that shook out:

allstar_teams <- read_csv(here::here("data", "ASG", "allstar_teams.csv"))
## Parsed with column specification:
## cols(
##   draft_pick = col_double(),
##   player = col_character(),
##   allstar_team = col_character()
## )
allstar_teams %>%
  gt() %>%
  tab_header(
    title = md("**Draft Results: Team LeBron vs. Team Giannis**"),
    subtitle = glue::glue("2019 NBA All-Star Rosters")
  ) %>%
  cols_label(
    draft_pick = md("**Pick**"),
    player = md("**Player**"),
    allstar_team = md("**Team**")
  ) %>%
    tab_source_note(md("source: [nba.com](http://www.nba.com/allstar/2019/draft)")) %>%
    tab_footnote(
    footnote = "Traded after draft",
    locations = cells_data(
      columns = vars(player),
      rows = c(13, 16))
  ) %>%
  tab_options(footnote.glyph = c("*, †, ‡"))

Draft Results: Team LeBron vs. Team Giannis
2019 NBA All-Star Rosters
Pick Player Team
1 Kevin Durant Team LeBron
2 Stephen Curry Team Giannis
3 Kyrie Irving Team LeBron
4 Joel Embiid Team Giannis
5 Kawhi Leonard Team LeBron
6 Paul George Team Giannis
7 James Harden Team LeBron
8 Kemba Walker Team Giannis
9 Khris Middleton Team Giannis
10 Anthony Davis Team LeBron
11 Nikola Jokic Team Giannis
12 Klay Thompson Team LeBron
13 Ben Simmons* Team Giannis
14 Damian Lillard Team LeBron
15 Blake Griffin Team Giannis
16 Russell Westbrook* Team LeBron
17 D’Angelo Russell Team Giannis
18 LaMarcus Aldridge Team LeBron
19 Nikola Vucevic Team Giannis
20 Karl-Anthony Towns Team LeBron
21 Kyle Lowry Team Giannis
22 Bradley Beal Team LeBron
23 Dwyane Wade Team LeBron
24 Dirk Nowitzki Team Giannis
source: nba.com
* Traded after draft

Since there are no rules prohibiting trades (this is only the second year the NBA has drafted all-star teams rather than splitting them by conference), Giannis and LeBron went ahead and swapped Ben Simmons and Russell Westbrook after the draft.[6]

allstar_results <- read_csv(here::here("data", "ASG", "allstar_results.csv"))
## Parsed with column specification:
## cols(
##   player = col_character(),
##   allstar_team = col_character(),
##   starter = col_logical()
## )

allstar_results %>%
  mutate(status = case_when(
    starter == TRUE ~ "Starters",
    TRUE ~ "Reserves"
  )) %>%
  dplyr::select(player, allstar_team, status) %>%
  group_by(allstar_team, status) %>%
  gt() %>%
  tab_header(
    title = md("**2019 NBA All-Star Teams**")
  ) %>%
    tab_source_note(md("source: [nba.com](http://www.nba.com/allstar/2019/draft)")) %>%
  tab_options(
    stub_group.font.weight = "600"
  )

2019 NBA All-Star Teams
player
Team LeBron - Starters
LeBron James
Kevin Durant
Kyrie Irving
Kawhi Leonard
James Harden
Team Giannis - Starters
Giannis Antetokounmpo
Stephen Curry
Joel Embiid
Paul George
Kemba Walker
Team Giannis - Reserves
Khris Middleton
Nikola Jokic
Blake Griffin
Russell Westbrook
D'Angelo Russell
Nikola Vucevic
Kyle Lowry
Dirk Nowitzki
Team LeBron - Reserves
Anthony Davis
Klay Thompson
Ben Simmons
Damian Lillard
LaMarcus Aldridge
Karl-Anthony Towns
Bradley Beal
Dwyane Wade
source: nba.com

Since it’s pretty obvious that we’re looking at player names here, I wanted to get rid of player in the header. To do this, we can pass player to the rowname_col argument for gt(). However, in order to do so, you need to have at least one other variable in the table (otherwise it’s just rownames, and your table gets confused). So, I’ll bring the players’ NBA team slugs back into the mix:

asg_player_teamslug <- asg_data %>%
  dplyr::select(player, team_slug)

allstar_results %>%
  left_join(asg_player_teamslug, by = "player") %>%
  mutate(status = case_when(
    starter == TRUE ~ "Starters",
    TRUE ~ "Reserves"
  )) %>%
  dplyr::select(player, allstar_team, status, team_slug) %>%
  group_by(allstar_team, status) %>%
  gt(rowname_col = "player") %>%
  tab_header(
    title = md("**2019 NBA All-Star Teams**")
  ) %>%
  cols_label(
    team_slug = md("**NBA Team**")
  ) %>%
  tab_source_note(md("source: [nba.com](http://www.nba.com/allstar/2019/draft)")) %>%
  tab_options(
    stub_group.font.weight = "600"
  )

2019 NBA All-Star Teams
NBA Team
Team LeBron - Starters
LeBron James LAL
Kevin Durant GSW
Kyrie Irving BOS
Kawhi Leonard TOR
James Harden HOU
Team Giannis - Starters
Giannis Antetokounmpo MIL
Stephen Curry GSW
Joel Embiid PHI
Paul George OKC
Kemba Walker CHA
Team Giannis - Reserves
Khris Middleton MIL
Nikola Jokic DEN
Blake Griffin DET
Russell Westbrook OKC
D'Angelo Russell BKN
Nikola Vucevic ORL
Kyle Lowry TOR
Dirk Nowitzki DAL
Team LeBron - Reserves
Anthony Davis NOP
Klay Thompson GSW
Ben Simmons PHI
Damian Lillard POR
LaMarcus Aldridge SAS
Karl-Anthony Towns MIN
Bradley Beal WAS
Dwyane Wade MIA
source: nba.com

We can also un-tidy our data to take a look at the teams side-by-side. I’ll also add a little flair to this table by giving it a background colour in tab_options(). The font colour switches automatically, which is pretty cool.

allstar_spread <- allstar_results %>%
  tibble::rowid_to_column() %>%
  spread(allstar_team, player) %>%
  mutate(status = case_when(
    starter == TRUE ~ "Starters",
    TRUE ~ "Reserves"
  )) %>%
  dplyr::select(status, `Team Giannis`, `Team LeBron`)

team_giannis <- allstar_spread %>%
  filter(!is.na(`Team Giannis`)) %>%
  dplyr::select(status, `Team Giannis`)

team_lebron <- allstar_spread %>%
  filter(!is.na(`Team LeBron`)) %>%
  dplyr::select(status, `Team LeBron`)

spread_teams <- team_giannis %>%
  bind_cols(team_lebron) %>%
  select(-status1)

spread_teams %>%
  group_by(status) %>%
  gt() %>%
  tab_options(
    table.background.color = "midnightblue",
    column_labels.font.weight = "800",
    stub_group.font.weight = "600"
  )

Team Giannis Team LeBron
Starters
Giannis Antetokounmpo LeBron James
Stephen Curry Kevin Durant
Joel Embiid Kyrie Irving
Paul George Kawhi Leonard
Kemba Walker James Harden
Reserves
Khris Middleton Anthony Davis
Nikola Jokic Klay Thompson
Blake Griffin Ben Simmons
Russell Westbrook Damian Lillard
D'Angelo Russell LaMarcus Aldridge
Nikola Vucevic Karl-Anthony Towns
Kyle Lowry Bradley Beal
Dirk Nowitzki Dwyane Wade
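
The same side-by-side layout can also be reached in one pipeline. This is an alternative sketch, not the approach used above: it numbers the players within each team and status, then widens with tidyr::pivot_wider() (which assumes tidyr 1.0.0 or later). The results tibble is a hypothetical six-row stand-in for allstar_results, and slot is a helper column introduced just for this example.

```r
library(dplyr)
library(tidyr)

# Hypothetical six-row stand-in for allstar_results
results <- tibble(
  player = c("LeBron James", "Kevin Durant",
             "Giannis Antetokounmpo", "Stephen Curry",
             "Anthony Davis", "Khris Middleton"),
  allstar_team = c("Team LeBron", "Team LeBron",
                   "Team Giannis", "Team Giannis",
                   "Team LeBron", "Team Giannis"),
  starter = c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE)
)

side_by_side <- results %>%
  mutate(status = if_else(starter, "Starters", "Reserves")) %>%
  # slot numbers players within each team/status pair so rows can line up
  group_by(allstar_team, status) %>%
  mutate(slot = row_number()) %>%
  ungroup() %>%
  select(-starter) %>%
  pivot_wider(names_from = allstar_team, values_from = player) %>%
  select(status, `Team Giannis`, `Team LeBron`)

side_by_side
```

Because slot lines the rows up before widening, there are no NA cells to filter out and no bind_cols() step, though it does rely on both teams having the same number of players in each status.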

Actual sportsball data

Now we’ll grab the most recent advanced metrics from Basketball Reference using the {nbastatR} package by Alex Bresler. Note that running bref_players_stats() with assign_to_environment = TRUE assigns the output data frames, dataBREFPlayerTotals and dataBREFPlayerAdvanced, to the environment, so we don’t need to do anything else.

bref_players_stats(seasons = 2019, tables = c("advanced", "totals"), widen = TRUE, assign_to_environment = TRUE)

Let’s take a quick gander at what these data look like for everyone in the league using the {skimr} package.[7]

skim-ing BREF Totals

skim(dataBREFPlayerTotals) %>% skimr::kable()

Skim summary statistics
n obs: 489
n variables: 39

Variable type: character

variable missing complete n min max empty n_unique
idPlayer 0 489 489 6 9 0 489
idPlayerSeason 0 489 489 11 14 0 489
idPosition 0 489 489 1 5 0 14
namePlayer 0 489 489 4 24 0 489
namePlayerBREF 0 489 489 7 24 0 489
slugSeason 0 489 489 7 7 0 1
slugTeamBREF 0 489 489 3 3 0 31
urlPlayerHeadshot 15 474 489 86 89 0 474
urlPlayerThumbnail 0 489 489 52 89 0 489

Variable type: integer

variable missing complete n mean sd p0 p25 p50 p75 p100 hist
idPlayerNBA 0 489 489 876605.81 724735.46 1114 2e+05 2e+05 1628381 1629312 ▇▁▁▁▁▁▁▇
yearSeasonFirst 15 474 489 2013.45 4.36 1989 2011 2015 2017 2018 ▁▁▁▁▁▂▃▇

Variable type: logical

variable missing complete n mean count
isHOFPlayer 0 489 489 0 FAL: 489, NA: 0

Variable type: numeric

variable missing complete n mean sd p0 p25 p50 p75 p100 hist
agePlayer 0 489 489 26.06 4.18 19 23 25 29 42 ▃▇▇▆▃▁▁▁
astTotals 0 489 489 85.21 93.93 0 16 54 120 537 ▇▃▂▁▁▁▁▁
blkTotals 0 489 489 17.24 21.41 0 3 10 23 145 ▇▂▁▁▁▁▁▁
countGames 0 489 489 36.88 18.26 1 22 43 53 59 ▃▂▂▂▂▃▅▇
countGamesStarted 0 489 489 17.45 20.91 0 0 6 35 59 ▇▂▁▁▁▁▁▂
drbTotals 0 489 489 119.99 110.03 0 33 97 176 594 ▇▅▃▂▁▁▁▁
fg2aTotals 0 489 489 198.82 194.41 0 42 142 289 925 ▇▅▂▂▁▁▁▁
fg2mTotals 0 489 489 103.17 103.71 0 20 71 148 502 ▇▅▂▁▁▁▁▁
fg3aTotals 0 489 489 110.09 114.85 0 14 72 174 733 ▇▃▂▁▁▁▁▁
fg3mTotals 0 489 489 39.1 43.43 0 3 24 64 274 ▇▃▂▁▁▁▁▁
fgaTotals 0 489 489 308.9 274.48 0 79 242 469 1316 ▇▅▃▂▁▁▁▁
fgmTotals 0 489 489 142.27 129.93 0 34 112 210 581 ▇▅▃▂▂▁▁▁
ftaTotals 0 489 489 80.46 91.2 0 15 50 115 620 ▇▂▁▁▁▁▁▁
ftmTotals 0 489 489 61.73 73 0 11 36 92 540 ▇▂▁▁▁▁▁▁
minutesTotals 0 489 489 840.16 609.83 1 256 824 1379 2158 ▇▅▃▅▃▅▃▁
orbTotals 0 489 489 35.55 41.97 0 7 22 46 271 ▇▃▁▁▁▁▁▁
pctEFG 7 482 489 0.5 0.11 0 0.47 0.51 0.56 0.93 ▁▁▁▂▇▂▁▁
pctFG 7 482 489 0.44 0.11 0 0.4 0.44 0.49 0.74 ▁▁▁▂▇▅▁▁
pctFG2 12 477 489 0.49 0.12 0 0.46 0.51 0.55 0.74 ▁▁▁▁▃▇▃▁
pctFG3 39 450 489 0.31 0.12 0 0.28 0.34 0.37 0.6 ▂▁▁▂▇▃▁▁
pctFT 34 455 489 0.71 0.18 0 0.66 0.75 0.82 0.94 ▁▁▁▁▂▃▇▅
pfTotals 0 489 489 73.25 50.86 0 27 72 113 220 ▇▅▅▅▅▂▁▁
ptsTotals 0 489 489 385.38 358.04 0 91 299 583 1976 ▇▅▂▂▁▁▁▁
stlTotals 0 489 489 26.97 23.96 0 7 22 42 127 ▇▅▃▂▁▁▁▁
tovTotals 0 489 489 47.91 45.8 0 13 35 70 290 ▇▃▂▁▁▁▁▁
trbTotals 0 489 489 155.54 146.09 0 43 122 229 795 ▇▅▃▁▁▁▁▁
yearSeason 0 489 489 2018 0 2018 2018 2018 2018 2018 ▁▁▁▇▁▁▁▁

We can also play around with how we present this data, and which data to present using other skimr functions, such as skim_to_list().

skimmed_list <- skim_to_list(dataBREFPlayerTotals)
numeric_skim <- skimmed_list$numeric
numeric_skim %>%
  select(one_of(c("variable", "n", "mean", "sd", "p0", "p25", "p50", "p75", "p100", "hist"))) %>%
  mutate(mean = as.numeric(mean),
         sd = as.numeric(sd)) %>%
  mutate_at(vars(starts_with("p")), as.numeric) %>%
  select(-p0)
## # A tibble: 27 x 9
##    variable          n      mean     sd   p25   p50   p75  p100 hist    
##    <chr>             <chr> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <chr>   
##  1 agePlayer         489    26.1   4.18    23    25    29    42 ▃▇▇▆▃▁▁▁
##  2 astTotals         489    85.2  93.9     16    54   120   537 ▇▃▂▁▁▁▁▁
##  3 blkTotals         489    17.2  21.4      3    10    23   145 ▇▂▁▁▁▁▁▁
##  4 countGames        489    36.9  18.3     22    43    53    59 ▃▂▂▂▂▃▅▇
##  5 countGamesStarted 489    17.4  20.9      0     6    35    59 ▇▂▁▁▁▁▁▂
##  6 drbTotals         489   120.  110.      33    97   176   594 ▇▅▃▂▁▁▁▁
##  7 fg2aTotals        489   199.  194.      42   142   289   925 ▇▅▂▂▁▁▁▁
##  8 fg2mTotals        489   103.  104.      20    71   148   502 ▇▅▂▁▁▁▁▁
##  9 fg3aTotals        489   110.  115.      14    72   174   733 ▇▃▂▁▁▁▁▁
## 10 fg3mTotals        489    39.1  43.4      3    24    64   274 ▇▃▂▁▁▁▁▁
## # … with 17 more rows

skim-ing BREF Advanced Stats

skim(dataBREFPlayerAdvanced) %>% skimr::kable()

Skim summary statistics
n obs: 489
n variables: 36

Variable type: character

variable missing complete n min max empty n_unique
idPlayer 0 489 489 6 9 0 489
idPlayerSeason 0 489 489 11 14 0 489
idPosition 0 489 489 1 5 0 14
namePlayer 0 489 489 4 24 0 489
namePlayerBREF 0 489 489 7 24 0 489
slugSeason 0 489 489 7 7 0 1
slugTeamBREF 0 489 489 3 3 0 31
urlPlayerHeadshot 15 474 489 86 89 0 474
urlPlayerThumbnail 0 489 489 52 89 0 489

Variable type: integer

variable missing complete n mean sd p0 p25 p50 p75 p100 hist
idPlayerNBA 0 489 489 876605.81 724735.46 1114 2e+05 2e+05 1628381 1629312 ▇▁▁▁▁▁▁▇
yearSeasonFirst 15 474 489 2013.45 4.36 1989 2011 2015 2017 2018 ▁▁▁▁▁▂▃▇

Variable type: logical

variable missing complete n mean count
isHOFPlayer 0 489 489 0 FAL: 489, NA: 0

Variable type: numeric

variable missing complete n mean sd p0 p25 p50 p75 p100 hist
agePlayer 0 489 489 26.06 4.18 19 23 25 29 42 ▃▇▇▆▃▁▁▁
countGames 0 489 489 36.88 18.26 1 22 43 53 59 ▃▂▂▂▂▃▅▇
minutes 0 489 489 840.16 609.83 1 256 824 1379 2158 ▇▅▃▅▃▅▃▁
pct3PRate 7 482 489 0.36 0.21 0 0.22 0.38 0.51 0.91 ▆▃▇▇▆▅▁▁
pctAST 0 489 489 0.13 0.096 0 0.068 0.11 0.18 0.59 ▆▇▃▂▁▁▁▁
pctBLK 0 489 489 0.17 0.27 0 0.013 0.023 0.15 0.9 ▇▁▁▁▁▁▁▁
pctDRB 0 489 489 0.15 0.084 0 0.1 0.14 0.19 0.71 ▂▇▃▁▁▁▁▁
pctFTRate 7 482 489 0.24 0.14 0 0.16 0.22 0.32 0.83 ▃▇▇▃▂▁▁▁
pctORB 0 489 489 0.075 0.14 0 0.019 0.035 0.075 1 ▇▁▁▁▁▁▁▁
pctSTL 0 489 489 0.13 0.27 0 0.012 0.017 0.025 0.9 ▇▁▁▁▁▁▁▁
pctTOV 6 483 489 0.12 0.054 0 0.096 0.12 0.15 0.5 ▁▇▆▁▁▁▁▁
pctTRB 0 489 489 0.1 0.059 0 0.061 0.088 0.13 0.52 ▅▇▃▁▁▁▁▁
pctTrueShooting 6 483 489 0.53 0.11 0 0.5 0.55 0.59 0.93 ▁▁▁▁▇▃▁▁
pctUSG 0 489 489 0.19 0.058 0 0.15 0.18 0.22 0.43 ▁▁▆▇▅▂▁▁
ratioBPM 0 489 489 -1.83 5.04 -35.6 -3.7 -1.4 0.7 27.6 ▁▁▁▂▇▁▁▁
ratioDBPM 0 489 489 -0.51 2.62 -20.5 -1.7 -0.5 0.8 8.8 ▁▁▁▁▂▇▂▁
ratioDWS 0 489 489 0.85 0.82 -0.3 0.2 0.6 1.2 4.3 ▇▇▅▂▂▁▁▁
ratioOBPM 0 489 489 -1.32 3.97 -22.5 -2.7 -1.1 0.4 22.4 ▁▁▁▇▅▁▁▁
ratioOWS 0 489 489 0.92 1.41 -2.3 0 0.4 1.4 8.3 ▁▇▅▂▁▁▁▁
ratioPER 0 489 489 13.19 7.21 -19 9.7 12.8 16.6 81.1 ▁▁▇▂▁▁▁▁
ratioVORP 0 489 489 0.43 0.99 -2 -0.1 0.1 0.7 7 ▁▇▃▁▁▁▁▁
ratioWS 0 489 489 1.77 2.05 -1.5 0.2 1.1 2.7 10.6 ▃▇▅▂▁▁▁▁
ratioWSPer48 0 489 489 0.08 0.11 -0.47 0.041 0.082 0.13 1.26 ▁▁▇▁▁▁▁▁
yearSeason 0 489 489 2018 0 2018 2018 2018 2018 2018 ▁▁▁▇▁▁▁▁

skimmed_list <- skim_to_list(dataBREFPlayerAdvanced)
numeric_skim <- skimmed_list$numeric
numeric_skim %>%
  select(one_of(c("variable", "n", "mean", "sd", "p0", "p25", "p50", "p75", "p100", "hist"))) %>%
  mutate(mean = as.numeric(mean),
         sd = as.numeric(sd)) %>%
  mutate_at(vars(starts_with("p")), as.numeric) %>%
  select(-p0)
## # A tibble: 24 x 9
##    variable   n        mean      sd     p25     p50     p75    p100 hist   
##    <chr>      <chr>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <chr>  
##  1 agePlayer  489    26.1     4.18   23      25     2.90e+1   42    ▃▇▇▆▃▁…
##  2 countGames 489    36.9    18.3    22      43     5.30e+1   59    ▃▂▂▂▂▃…
##  3 minutes    489   840.    610.    256     824     1.38e+3 2158    ▇▅▃▅▃▅…
##  4 pct3PRate  489     0.36    0.21    0.22    0.38  5.10e-1    0.91 ▆▃▇▇▆▅…
##  5 pctAST     489     0.13    0.096   0.068   0.11  1.80e-1    0.59 ▆▇▃▂▁▁…
##  6 pctBLK     489     0.17    0.27    0.013   0.023 1.50e-1    0.9  ▇▁▁▁▁▁…
##  7 pctDRB     489     0.15    0.084   0.1     0.14  1.90e-1    0.71 ▂▇▃▁▁▁…
##  8 pctFTRate  489     0.24    0.14    0.16    0.22  3.20e-1    0.83 ▃▇▇▃▂▁…
##  9 pctORB     489     0.075   0.14    0.019   0.035 7.50e-2    1    ▇▁▁▁▁▁…
## 10 pctSTL     489     0.13    0.27    0.012   0.017 2.50e-2    0.9  ▇▁▁▁▁▁…
## # … with 14 more rows

skim-ing the All Stars

Now let’s bring it all together by adding these statistics to our data frame with our All Stars and their voting records.

allstar_data <- asg_data %>%
  left_join(dataBREFPlayerAdvanced, by = c("player" = "namePlayer"))
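
Joining on names is fragile, so it can be worth checking for players who found no match before trusting the result. A minimal sketch using anti_join() with the same by mapping: the toy tibbles and the accented spelling are hypothetical, made up to show what a mismatch would look like, not something observed in the real data.

```r
library(dplyr)

# Hypothetical stand-ins for asg_data and dataBREFPlayerAdvanced
asg_toy  <- tibble(player = c("Nikola Jokic", "Kyle Lowry"))
bref_toy <- tibble(
  namePlayer = c("Nikola Joki\u0107", "Kyle Lowry"),  # made-up accented spelling
  in_bref    = TRUE
)

# Rows of asg_toy with no partner in bref_toy
unmatched <- anti_join(asg_toy, bref_toy, by = c("player" = "namePlayer"))
unmatched$player

# A left join never drops rows from the left table; unmatched rows get NA
joined <- left_join(asg_toy, bref_toy, by = c("player" = "namePlayer"))
nrow(joined) == nrow(asg_toy)
```

In the real join the skim output further down shows 27 rows with nothing missing, so every All-Star found a match.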

I’m not usually big on camel-case, but Alex Bresler, developer of the nbastatR package, uses some really handy naming conventions, and I don’t want to mess with a good thing.

skim(allstar_data) %>% skimr::kable()

Skim summary statistics
n obs: 27
n variables: 48

Variable type: character

variable missing complete n min max empty n_unique
conf_court 0 27 27 17 18 0 4
conference 0 27 27 7 7 0 2
court 0 27 27 5 10 0 2
idPlayer 0 27 27 8 9 0 27
idPlayerSeason 0 27 27 13 14 0 27
idPosition 0 27 27 1 2 0 5
namePlayerBREF 0 27 27 10 21 0 27
player 0 27 27 10 21 0 27
slugSeason 0 27 27 7 7 0 1
slugTeamBREF 0 27 27 3 3 0 21
team_slug 0 27 27 3 3 0 21
urlPlayerHeadshot 0 27 27 86 89 0 27
urlPlayerThumbnail 0 27 27 52 55 0 27

Variable type: integer

variable missing complete n mean sd p0 p25 p50 p75 p100 hist
idPlayerNBA 0 27 27 338517.81 468453.01 1717 2e+05 2e+05 2e+05 1627732 ▇▁▁▁▁▁▁▁
yearSeasonFirst 0 27 27 2010.07 4.19 1998 2008.5 2011 2012.5 2016 ▁▁▂▃▅▇▇▇

Variable type: logical

variable missing complete n mean count
isHOFPlayer 0 27 27 0 FAL: 27, NA: 0
starter 0 27 27 0.37 FAL: 17, TRU: 10, NA: 0

Variable type: numeric

variable missing complete n mean sd p0 p25 p50 p75 p100 hist
agePlayer 0 27 27 28.07 4.38 22 25 28 30 40 ▇▅▇▆▂▁▁▁
countGames 0 27 27 50.81 7.95 26 46.5 54 56.5 58 ▁▁▁▁▂▂▃▇
fan_rank 0 27 27 5.89 4.35 1 3 5 8.5 15 ▇▇▇▁▁▃▂▂
fan_total_votes 0 27 27 1900597.52 1408581.99 222499 672593 1445989 3e+06 4620809 ▇▆▃▁▅▂▃▂
media_rank 0 27 27 4.07 2.53 1 2 4 6.5 8 ▇▃▃▇▂▁▃▅
media_total_votes 0 27 27 36.85 41.32 0 1 11 77.5 100 ▇▁▁▁▁▁▁▃
minutes 0 27 27 1693 377.3 286 1589 1767 1931 2158 ▁▁▁▂▁▅▇▇
pct3PRate 0 27 27 0.33 0.16 0.004 0.23 0.34 0.42 0.65 ▃▃▆▇▇▇▂▅
pctAST 0 27 27 0.26 0.1 0.043 0.18 0.26 0.36 0.47 ▁▂▇▅▇▅▃▁
pctBLK 0 27 27 0.12 0.26 0.01 0.013 0.017 0.041 0.9 ▇▁▁▁▁▁▁▁
pctDRB 0 27 27 0.19 0.073 0.11 0.12 0.19 0.25 0.34 ▇▁▆▃▁▂▂▂
pctFTRate 0 27 27 0.32 0.12 0.097 0.24 0.33 0.4 0.55 ▃▂▇▃▇▃▃▂
pctORB 0 27 27 0.046 0.032 0 0.021 0.038 0.075 0.11 ▁▇▅▂▁▂▂▃
pctSTL 0 27 27 0.1 0.24 0.01 0.016 0.018 0.026 0.8 ▇▁▁▁▁▁▁▁
pctTOV 0 27 27 0.12 0.033 0.058 0.1 0.12 0.14 0.2 ▂▃▃▇▇▃▂▁
pctTRB 0 27 27 0.12 0.05 0.063 0.08 0.11 0.16 0.21 ▇▃▂▂▂▁▂▂
pctTrueShooting 0 27 27 0.58 0.044 0.48 0.56 0.59 0.6 0.66 ▂▂▁▃▆▇▃▂
pctUSG 0 27 27 0.29 0.041 0.19 0.27 0.29 0.31 0.4 ▂▂▂▇▇▁▁▁
player_rank 0 27 27 5.52 5.73 1 2 4 6.5 30 ▇▅▂▁▁▁▁▁
player_total_votes 0 27 27 88.3 72.81 4 34.5 58 157 269 ▇▇▃▁▃▃▁▁
ratioBPM 0 27 27 4.59 3.85 -6 2.4 5.3 6.8 11.8 ▁▁▂▃▅▇▂▂
ratioDBPM 0 27 27 0.86 2.02 -2 -0.7 0.6 2.15 5.3 ▇▂▇▅▃▃▁▂
ratioDWS 0 27 27 2.26 0.86 0.3 1.65 2.2 2.7 4.3 ▁▂▇▇▇▂▂▁
ratioOBPM 0 27 27 3.72 3.08 -4 1.3 4.5 5.5 11.2 ▁▁▇▃▇▅▁▁
ratioOWS 0 27 27 3.96 2.23 -0.2 2.55 4.1 5.85 8.3 ▅▂▁▇▇▇▃▁
ratioPER 0 27 27 22.52 5.29 7.9 19.4 23.9 25.65 30.8 ▁▁▂▃▃▂▇▂
ratioVORP 0 27 27 2.97 1.65 -0.3 1.95 3.2 3.75 7 ▃▂▆▆▇▂▂▁
ratioWS 0 27 27 6.21 2.64 0.1 4.5 6.4 8.1 10.6 ▂▃▅▅▇▇▇▅
ratioWSPer48 0 27 27 0.17 0.066 0.015 0.12 0.19 0.22 0.29 ▁▁▅▃▂▇▃▂
weighted_score 0 27 27 5.34 3.86 1 2.5 4 6.75 17 ▇▅▅▁▅▁▁▁
yearSeason 0 27 27 2018 0 2018 2018 2018 2018 2018 ▁▁▁▇▁▁▁▁

skimmed_list <- skim_to_list(allstar_data)
numeric_skim <- skimmed_list$numeric
numeric_skim %>%
  select(one_of(c("variable", "mean", "sd", "p0", "p25", "p50", "p75", "p100", "hist"))) %>%
  mutate(mean = as.numeric(mean),
         sd = as.numeric(sd)) %>%
  mutate_at(vars(starts_with("p")), as.numeric) %>%
  select(-p0)
## # A tibble: 31 x 8
##    variable        mean       sd      p25       p50       p75    p100 hist 
##    <chr>          <dbl>    <dbl>    <dbl>     <dbl>     <dbl>   <dbl> <chr>
##  1 agePlayer    2.81e+1  4.38e+0  2.50e+1   2.80e+1   3.00e+1 4.00e+1 ▇▅▇▆…
##  2 countGames   5.08e+1  7.95e+0  4.65e+1   5.40e+1   5.65e+1 5.80e+1 ▁▁▁▁…
##  3 fan_rank     5.89e+0  4.35e+0  3.00e+0   5.00e+0   8.50e+0 1.50e+1 ▇▇▇▁…
##  4 fan_total_…  1.90e+6  1.41e+6  6.73e+5   1.45e+6   3.00e+6 4.62e+6 ▇▆▃▁…
##  5 media_rank   4.07e+0  2.53e+0  2.00e+0   4.00e+0   6.50e+0 8.00e+0 ▇▃▃▇…
##  6 media_tota…  3.68e+1  4.13e+1  1.00e+0   1.10e+1   7.75e+1 1.00e+2 ▇▁▁▁…
##  7 minutes      1.69e+3  3.77e+2  1.59e+3   1.77e+3   1.93e+3 2.16e+3 ▁▁▁▂…
##  8 pct3PRate    3.30e-1  1.60e-1  2.30e-1   3.40e-1   4.20e-1 6.50e-1 ▃▃▆▇…
##  9 pctAST       2.60e-1  1.00e-1  1.80e-1   2.60e-1   3.60e-1 4.70e-1 ▁▂▇▅…
## 10 pctBLK       1.20e-1  2.60e-1  1.30e-2   1.70e-2   4.10e-2 9.00e-1 ▇▁▁▁…
## # … with 21 more rows

  1. “Nets’ Russell replaces injured Oladipo in All-Star Game” http://www.nba.com/article/2019/02/01/dangelo-russell-replaces-victor-oladipo-all-star-roster

  2. “Wade, Nowitzki named special roster additions for All-Star Game” http://www.nba.com/article/2019/02/01/dirk-nowitzki-dwyane-wade-added-all-star-game

  3. “NBA All-Star Games: Votes” http://rpubs.com/maraaverick/nba-asg-votes-2019

  4. I’m using top 7 here, since there are seven reserve players selected in the second round.

  5. “Opposite of %in%” https://stackoverflow.com/questions/5831794/opposite-of-in

  6. You can watch the video here, it’s pretty adorable.

  7. I highly recommend checking out Elin Waring’s vignette, “Using skimr” if you’d like to learn more.