library(tidyverse)
library(plotly)
library(leaflet)
library(leaflet.extras)
library(tidygeocoder)
nba_all <- read.csv("C:/Users/ajsth/Documents/5. Education/1. RData/NBA project/nba_all.csv")
# obtaining number of players from each state for each "end year"
nba_pc_year <- nba_all |>
  arrange(End_Year) |>
  group_by(End_Year) |>
  count(State) |>
  rename(Count = n)
# dplyr replace NA
nba_all <- nba_all |>
  mutate(across(6:29, ~replace_na(., 0)))
# set up current player roster
nba_current <- nba_all |>
  filter(End_Year == 2025)
# one row per birth city, with all of its players collapsed into one HTML string
nba_cgrouped <- nba_current |> # assistance with CGPT
  group_by(City_State, lat, long) |>
  summarize(
    players = paste0(Name, ", Games played: ", Games, ", PPG:", PTS.Game, collapse = "<br/>"),
    .groups = "drop"
  )
current_map <- leaflet(
  nba_cgrouped,
  options = leafletOptions(minZoom = 3, maxZoom = 10)
) |>
  addTiles() |>
  setView(
    lng = -95.750, # center map
    lat = 36.800,
    zoom = 4
  ) |>
  # invisible markers that exist only so the search box has something to find
  addCircleMarkers(
    lng = ~long,
    lat = ~lat,
    label = ~players,
    group = "searchmarkers",
    options = markerOptions(opacity = 0),
    popup = ~players
  ) |>
  # visible, clustered markers
  addCircleMarkers(
    lng = ~long,
    lat = ~lat,
    label = ~City_State,
    popup = ~players,
    color = "#366488",
    stroke = FALSE, fillOpacity = 0.5,
    clusterOptions = markerClusterOptions(showCoverageOnHover = FALSE)
  ) |>
  addResetMapButton() |>
  addSearchFeatures(
    targetGroups = "searchmarkers", # assistance with CGPT
    searchFeaturesOptions(
      zoom = 9,
      openPopup = TRUE
    )
  ) |>
  addControl(
    html = "Source: Basketball Reference",
    position = "bottomleft"
  )
current_map
This map shows the birth city and state of active, American-born NBA players as of the 2025 season. Its spatial pattern raises the question I have about a lot of map-based data sets: is this just a population density map? Schwabish also notes that he generally does not prefer maps, as there is often a better way to visualize the information.
I initially had the default markers, but I’m glad I changed to circular markers, which let me change the opacity and cluster them. This removed a lot of clutter from the map, and the clustering looks similar to maps one might find in a news article. I also wanted a search function for players, so with a little bit (a lot) of help from ChatGPT, I was able to add invisible markers that are searchable.
I give myself a 6/10 for this map, giving myself a little credit in that it’s the first one I’ve really made using my own decision making rather than following the class videos. I think a choropleth with hover info of the players would better communicate this information.
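A minimal sketch of what that choropleth might look like in plotly, not a finished replacement for the map. The toy data and its values are hypothetical, and it assumes `State` holds full state names (plotly's `"USA-states"` location mode expects two-letter codes, so base R's `state.name` and `state.abb` are used to convert).

```r
library(dplyr)
library(plotly)

# Hypothetical toy data; column names mirror nba_current, and State is
# assumed to hold full state names such as "Ohio"
toy_current <- tibble(
  Name  = c("Player A", "Player B", "Player C"),
  State = c("Ohio", "Ohio", "Texas")
)

# One row per state: a player count plus an HTML string of names for the hover
by_state <- toy_current |>
  group_by(State) |>
  summarize(
    players = paste(Name, collapse = "<br>"),
    Count = n(),
    .groups = "drop"
  ) |>
  mutate(code = state.abb[match(State, state.name)]) # full name -> two-letter code

fig <- plot_ly(
  by_state,
  type = "choropleth",
  locationmode = "USA-states",
  locations = ~code,
  z = ~Count,      # fill color is driven by the player count
  text = ~players, # shown on hover
  colorscale = "Blues"
) |>
  layout(geo = list(scope = "usa"))
fig
```

A raw count per state would still largely track population, so dividing `Count` by state population before mapping may be the more informative variant.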
I also wish I had taken the time to obtain international player information, as some of my favorite players are foreign (Kristaps Porzingis!), but I was already spending a significant amount of time scraping data and geographic information for US-born players.
# expand each player to one row per active year
nba_long <- nba_all |>
  mutate(Year = map2(Start_Year, End_Year, seq)) |>
  unnest(Year) |>
  relocate(Year, .after = Name) |>
  rename(Years_played = Years)
all_states <- unique(nba_long$State)
nba_long |>
  count(Year, State, name = "n") |> # assistance CGPT for count properly
  mutate(State = factor(State, levels = all_states)) |>
  plot_ly(x = ~State, y = ~n, frame = ~Year, showlegend = FALSE) |>
  add_bars() |>
  layout(
    title = list(
      text = "Birth state of players by active years",
      y = .95,
      font = list(family = "Times New Roman", size = 25, color = "#366488")
    ),
    xaxis = list(title = "", categoryorder = "category ascending"),
    yaxis = list(title = ""),
    annotations = list(
      text = "Source: Basketball Reference",
      showarrow = FALSE,
      x = .01,
      y = -.8,
      xref = "paper",
      yref = "paper",
      font = list(size = 10)
    )
  ) |>
  animation_opts(frame = 700)
This interactive chart shows the count of active NBA players by birth state for each year. The most illuminating takeaway is that the total number of players in the NBA has increased dramatically over the years, which makes sense, as there are roughly 4x as many teams now as there were prior to 1960.
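The reshaping behind the chart can be seen on a small scale. This is a sketch on hypothetical toy data with the same `Start_Year` and `End_Year` columns: each player's career is expanded to one row per active year and then counted by year and state. The `complete()` step is an addition that is not in the code above; it fills in zero counts so that every state appears in every animation frame.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical toy data; the column names mirror those used in nba_all
toy <- tibble(
  Name       = c("Player A", "Player B"),
  State      = c("Ohio", "Texas"),
  Start_Year = c(2001, 2003),
  End_Year   = c(2003, 2003)
)

toy_long <- toy |>
  mutate(Year = map2(Start_Year, End_Year, seq)) |> # list-column: one year sequence per player
  unnest(Year)                                      # one row per player-year

toy_counts <- toy_long |>
  count(Year, State, name = "n") |>
  complete(Year, State, fill = list(n = 0L))        # add explicit zeros for missing year/state pairs
```

Here `toy_long` has four rows (three for Player A, one for Player B), and `toy_counts` has six (three years by two states), with Texas at zero for 2001 and 2002.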
I chose a bar chart because I thought it would be an interesting way to show changes over time. I wanted to add an interactive element showing the player names and teams on hover, but I had difficulty implementing it, and after some frustration I decided that a hover listing 40 or more players might not be the best way to visualize the data anyway.
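One possible way around the long hover lists is to cap the number of names shown per bar. This is a sketch on hypothetical toy data; `max_names` is an arbitrary cap chosen for illustration, and the column names are assumed to match `nba_long`.

```r
library(dplyr)

max_names <- 3 # hypothetical cap on names shown per hover

# Hypothetical toy data in the same shape as nba_long (one row per player-year)
toy_long <- tibble(
  Year  = c(2003, 2003, 2003, 2003, 2003),
  State = c("Ohio", "Ohio", "Ohio", "Ohio", "Texas"),
  Name  = c("Player A", "Player B", "Player C", "Player D", "Player E")
)

hover_tbl <- toy_long |>
  group_by(Year, State) |>
  summarize(
    # first max_names names, plus a note on how many were left out
    hover = paste0(
      paste(head(Name, max_names), collapse = "<br>"),
      if (n() > max_names) paste0("<br>... and ", n() - max_names, " more") else ""
    ),
    n = n(),
    .groups = "drop"
  )
```

The resulting `hover` column could then be passed to `plot_ly()` as `text = ~hover` with `hoverinfo = "text"`, in place of the bare count.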
I give this chart a 3/10. I’m pretty disappointed in it, and had I been a little more thoughtful at the outset, I would have scraped population data over time, as well as team information over time. That represents a larger challenge that I might decide to complete if I revisit this.
The two charts were supposed to tell a story of where NBA players tend to develop. It would have been interesting to see whether this bucked general population trends (I did notice that Steph Curry and LeBron James are both from Akron, though).
On the technical side, plotly in particular seems pretty finicky, and I couldn’t implement exactly what I wanted to show, even with the help of AI. I also ran into several problems acquiring the data. I chose the “hard” route of scraping the data myself instead of finding a pre-existing dataset. Along the way, I ran into rate-limiting due to my inexperience with scraping, and generating the geocode data took a significant amount of time.
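For the rate-limiting problem, one common approach is to pause between requests and back off after a failure. The helper below is a hypothetical sketch, not the code used for this project: `fetch_politely()` and its parameters are made up for illustration, the default delay is an arbitrary assumption (the site's own terms should set the real value), and `fetch_fn` stands in for whatever function actually downloads a page.

```r
# Hypothetical helper: fetch each URL with a pause between requests.
# fetch_fn is a placeholder for the real download function; it is passed in
# so the pacing and retry logic can be tried offline.
fetch_politely <- function(urls, fetch_fn, delay_sec = 3, max_tries = 3) {
  results <- vector("list", length(urls))
  for (i in seq_along(urls)) {
    for (attempt in seq_len(max_tries)) {
      res <- tryCatch(fetch_fn(urls[[i]]), error = function(e) NULL)
      if (!is.null(res)) break
      Sys.sleep(delay_sec * attempt) # wait a little longer after each failure
    }
    results[i] <- list(res)          # list() keeps a NULL for URLs that never succeeded
    Sys.sleep(delay_sec)             # pause before moving to the next URL
  }
  names(results) <- urls
  results
}

# Offline usage example with a fake fetcher that fails on one URL
fake_fetch <- function(u) if (grepl("bad", u)) stop("request failed") else toupper(u)
pages <- fetch_politely(c("a", "bad"), fake_fetch, delay_sec = 0)
```

Saving each successful result to disk as it arrives would also avoid re-requesting pages after an interruption.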
That said, this was a good, if frustrating, experience, which gave me practice, as I intend to use plotly and leaflet again soon for another project.
All data was scraped from Basketball Reference.
https://www.basketball-reference.com/
Plotly Reference for methods
https://plotly.com/r/reference/
Leaflet Reference for methods
https://rstudio.github.io/leaflet/reference/
ChatGPT for significant troubleshooting and assistance with syntax
https://chatgpt.com/share/6817e7ac-1de0-800d-9500-d84604f477bc