This tutorial demonstrates how to create an animated visualization of the NBA’s play-by-play data. It heavily builds upon the tutorials written by Rajiv Shah available here:. I will use several functions available at Rajiv’s GitHub page to get the data and process it. I will then get the data into the format required to animate using gganimate.
library(RCurl)
library(jsonlite)
library(dplyr)
library(sp)
library(ggplot2)
library(data.table)
library(gganimate)
First source several functions that we need to extract and plot data. These written by Rajiv and adapted by me. The R file is available here.
Although the NBA’s play by play data is no longer made publicly available, a selection of these data have been stored. We will use the data in this repository. I downloaded this game-log if you want to follow along. I stored it as “0021500431.json” following Rajiv’s example. It is a game between the San Antonio Spurs and Minnesota Timberwolves from December 2015.
Use the ‘sportvu_convert_json’ function to convert the raw json data to a R dataframe.
all.movements <- sportvu_convert_json("0021500431.json")
The loaded data should look like this:
str(all.movements)
## 'data.frame': 2646562 obs. of 13 variables:
## $ player_id : chr "2225" "2225" "-1" "-1" ...
## $ lastname : chr "Parker" "Parker" "ball" "ball" ...
## $ firstname : chr "Tony" "Tony" NA NA ...
## $ jersey : chr "9" "9" NA NA ...
## $ position : chr "G" "G" NA NA ...
## $ team_id : num 1.61e+09 1.61e+09 NA NA 1.61e+09 ...
## $ x_loc : num 51.7 51.7 52.9 52.9 60.4 ...
## $ y_loc : num 40.3 40.3 39.9 39.9 31.8 ...
## $ radius : num 0 0 2.5 2.5 0 ...
## $ game_clock: num 716 716 716 716 716 ...
## $ shot_clock: num 13.3 13.3 13.3 13.3 13.3 ...
## $ quarter : num 1 1 1 1 1 1 1 1 1 1 ...
## $ event.id : num 2 1 1 2 2 1 2 1 2 1 ...
head(all.movements)
## player_id lastname firstname jersey position team_id x_loc
## 1 2225 Parker Tony 9 G 1610612759 51.73003
## 2 2225 Parker Tony 9 G 1610612759 51.73003
## 3 -1 ball <NA> <NA> <NA> NA 52.88766
## 4 -1 ball <NA> <NA> <NA> NA 52.88766
## 5 201937 Rubio Ricky 9 G 1610612750 60.40236
## 6 201937 Rubio Ricky 9 G 1610612750 60.40236
## y_loc radius game_clock shot_clock quarter event.id
## 1 40.31428 0.00000 715.78 13.27 1 2
## 2 40.31428 0.00000 715.78 13.27 1 1
## 3 39.89349 2.49642 715.78 13.27 1 1
## 4 39.89349 2.49642 715.78 13.27 1 2
## 5 31.81802 0.00000 715.78 13.27 1 2
## 6 31.81802 0.00000 715.78 13.27 1 1
You can see that the raw data contains infomation about players and the ball. Every row represents the location of a ball or player at a particular time. Each player is given an ID, a jersey number and position and team-id. They also have an x and y coordinate for each time. There is additionally radius information related to the ball’s movement.
Next, we need to get some metadata for this particular game. The get_pbp
function grabs this information from the stats.nba.com website.
gameid = "0021500431"
pbp <- get_pbp(gameid)
str(pbp)
## 'data.frame': 414 obs. of 33 variables:
## $ GAME_ID : Factor w/ 1 level "0021500431": 1 1 1 1 1 1 1 1 1 1 ...
## $ EVENTNUM : Factor w/ 414 levels "0","1","10","100",..: 1 2 99 192 288 368 378 387 404 394 ...
## $ EVENTMSGTYPE : Factor w/ 12 levels "1","10","12",..: 3 2 8 5 7 8 5 7 10 1 ...
## $ EVENTMSGACTIONTYPE : Factor w/ 36 levels "0","1","10","108",..: 1 1 15 18 1 15 2 1 7 28 ...
## $ PERIOD : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1 1 1 ...
## $ WCTIMESTRING : Factor w/ 94 levels "10:01 PM","10:02 PM",..: 14 14 14 15 15 15 15 15 16 16 ...
## $ PCTIMESTRING : Factor w/ 220 levels "0:00","0:01",..: 69 69 66 63 62 61 55 54 47 47 ...
## $ HOMEDESCRIPTION : Factor w/ 182 levels "Bjelica BLOCK (1 BLK)",..: NA 39 NA 91 NA NA 78 NA 171 NA ...
## $ NEUTRALDESCRIPTION : Factor w/ 0 levels: NA NA NA NA NA NA NA NA NA NA ...
## $ VISITORDESCRIPTION : Factor w/ 209 levels "Aldridge 1' Alley Oop Layup (4 PTS) (Duncan 4 AST)",..: NA NA 162 90 94 92 NA 164 NA 160 ...
## $ SCORE : Factor w/ 104 levels "100 - 77","101 - 77",..: NA NA NA NA NA NA NA NA NA 16 ...
## $ SCOREMARGIN : Factor w/ 24 levels "-10","-11","-12",..: NA NA NA NA NA NA NA NA NA 11 ...
## $ PERSON1TYPE : Factor w/ 6 levels "0","1","2","3",..: 1 5 6 5 6 6 5 6 5 6 ...
## $ PLAYER1_ID : Factor w/ 26 levels "0","1495","1610612750",..: 1 6 22 21 14 14 23 22 6 22 ...
## $ PLAYER1_NAME : Factor w/ 23 levels "Adreian Payne",..: NA 9 21 2 10 10 19 21 9 21 ...
## $ PLAYER1_TEAM_ID : Factor w/ 2 levels "1610612750","1610612759": NA 1 2 1 2 2 1 2 1 2 ...
## $ PLAYER1_TEAM_CITY : Factor w/ 2 levels "Minnesota","San Antonio": NA 1 2 1 2 2 1 2 1 2 ...
## $ PLAYER1_TEAM_NICKNAME : Factor w/ 2 levels "Spurs","Timberwolves": NA 2 1 2 1 1 2 1 2 1 ...
## $ PLAYER1_TEAM_ABBREVIATION: Factor w/ 2 levels "MIN","SAS": NA 1 2 1 2 2 1 2 1 2 ...
## $ PERSON2TYPE : Factor w/ 3 levels "0","4","5": 1 3 1 1 1 1 1 1 1 1 ...
## $ PLAYER2_ID : Factor w/ 25 levels "0","1495","1626145",..: 1 2 1 1 1 1 1 1 1 1 ...
## $ PLAYER2_NAME : Factor w/ 24 levels "Adreian Payne",..: NA 21 NA NA NA NA NA NA NA NA ...
## $ PLAYER2_TEAM_ID : Factor w/ 2 levels "1610612750","1610612759": NA 2 NA NA NA NA NA NA NA NA ...
## $ PLAYER2_TEAM_CITY : Factor w/ 2 levels "Minnesota","San Antonio": NA 2 NA NA NA NA NA NA NA NA ...
## $ PLAYER2_TEAM_NICKNAME : Factor w/ 2 levels "Spurs","Timberwolves": NA 1 NA NA NA NA NA NA NA NA ...
## $ PLAYER2_TEAM_ABBREVIATION: Factor w/ 2 levels "MIN","SAS": NA 2 NA NA NA NA NA NA NA NA ...
## $ PERSON3TYPE : Factor w/ 4 levels "0","1","4","5": 1 4 1 4 1 1 1 1 2 1 ...
## $ PLAYER3_ID : Factor w/ 9 levels "0","1495","200746",..: 1 4 1 6 1 1 1 1 1 1 ...
## $ PLAYER3_NAME : Factor w/ 8 levels "Danny Green",..: NA 1 NA 4 NA NA NA NA NA NA ...
## $ PLAYER3_TEAM_ID : Factor w/ 2 levels "1610612750","1610612759": NA 2 NA 2 NA NA NA NA NA NA ...
## $ PLAYER3_TEAM_CITY : Factor w/ 2 levels "Minnesota","San Antonio": NA 2 NA 2 NA NA NA NA NA NA ...
## $ PLAYER3_TEAM_NICKNAME : Factor w/ 2 levels "Spurs","Timberwolves": NA 1 NA 1 NA NA NA NA NA NA ...
## $ PLAYER3_TEAM_ABBREVIATION: Factor w/ 2 levels "MIN","SAS": NA 2 NA 2 NA NA NA NA NA NA ...
As you can see, this contains loads of stuff that we don’t need. Let’s just keep what we want:
pbp <- pbp[-1,] #first row is NAs
colnames(pbp)[2] <- c('event.id') #will use this to merge on all.movements df
pbp0 <- pbp %>% select (event.id,EVENTMSGTYPE,EVENTMSGACTIONTYPE,SCORE)
pbp0$event.id <- as.numeric(levels(pbp0$event.id))[pbp0$event.id]
head(pbp0)
## event.id EVENTMSGTYPE EVENTMSGACTIONTYPE SCORE
## 2 1 10 0 <NA>
## 3 2 5 45 <NA>
## 4 3 2 5 <NA>
## 5 4 4 0 <NA>
## 6 5 5 45 <NA>
## 7 6 2 1 <NA>
tail(pbp0)
## event.id EVENTMSGTYPE EVENTMSGACTIONTYPE SCORE
## 409 477 6 1 <NA>
## 410 478 3 11 107 - 83
## 411 480 3 12 108 - 83
## 412 482 2 1 <NA>
## 413 483 4 0 <NA>
## 414 484 13 0 108 - 83
Each event.id refers to a different play in the game. Now we can merge the two files:
all.movements <- merge(x = all.movements, y = pbp, by = "event.id", all.x = TRUE)
dim(all.movements)
## [1] 2646562 45
This is a lot of information! Over 2.6 millions rows. Let’s pick one event id to visualize. Here I will use the same one as Rajiv’s example - eventid=303.
id303 <- all.movements[which(all.movements$event.id == 303),]
dim(id303)
## [1] 10890 45
head(id303)
## event.id player_id lastname firstname jersey position team_id
## 1644741 303 -1 ball <NA> <NA> <NA> NA
## 1644742 303 203937 Anderson Kyle 1 F 1610612759
## 1644743 303 201937 Rubio Ricky 9 G 1610612750
## 1644744 303 201988 Mills Patty 8 G 1610612759
## 1644745 303 203952 Wiggins Andrew 22 G-F 1610612750
## 1644746 303 203937 Anderson Kyle 1 F 1610612759
## x_loc y_loc radius game_clock shot_clock quarter
## 1644741 5.43835 24.73073 10.63683 359.75 5.49 3
## 1644742 65.31054 22.12468 0.00000 346.42 19.03 3
## 1644743 46.60167 20.00475 0.00000 376.60 22.70 3
## 1644744 38.77574 21.41917 0.00000 359.40 23.71 3
## 1644745 11.18441 34.04307 0.00000 359.40 23.69 3
## 1644746 8.62043 2.05544 0.00000 364.39 7.11 3
## GAME_ID EVENTMSGTYPE EVENTMSGACTIONTYPE PERIOD WCTIMESTRING
## 1644741 0021500431 1 98 3 9:32 PM
## 1644742 0021500431 1 98 3 9:32 PM
## 1644743 0021500431 1 98 3 9:32 PM
## 1644744 0021500431 1 98 3 9:32 PM
## 1644745 0021500431 1 98 3 9:32 PM
## 1644746 0021500431 1 98 3 9:32 PM
## PCTIMESTRING HOMEDESCRIPTION NEUTRALDESCRIPTION
## 1644741 5:59 <NA> <NA>
## 1644742 5:59 <NA> <NA>
## 1644743 5:59 <NA> <NA>
## 1644744 5:59 <NA> <NA>
## 1644745 5:59 <NA> <NA>
## 1644746 5:59 <NA> <NA>
## VISITORDESCRIPTION SCORE
## 1644741 West 2' Cutting Layup Shot (10 PTS) (Ginobili 4 AST) 67 - 48
## 1644742 West 2' Cutting Layup Shot (10 PTS) (Ginobili 4 AST) 67 - 48
## 1644743 West 2' Cutting Layup Shot (10 PTS) (Ginobili 4 AST) 67 - 48
## 1644744 West 2' Cutting Layup Shot (10 PTS) (Ginobili 4 AST) 67 - 48
## 1644745 West 2' Cutting Layup Shot (10 PTS) (Ginobili 4 AST) 67 - 48
## 1644746 West 2' Cutting Layup Shot (10 PTS) (Ginobili 4 AST) 67 - 48
## SCOREMARGIN PERSON1TYPE PLAYER1_ID PLAYER1_NAME PLAYER1_TEAM_ID
## 1644741 -19 5 2561 David West 1610612759
## 1644742 -19 5 2561 David West 1610612759
## 1644743 -19 5 2561 David West 1610612759
## 1644744 -19 5 2561 David West 1610612759
## 1644745 -19 5 2561 David West 1610612759
## 1644746 -19 5 2561 David West 1610612759
## PLAYER1_TEAM_CITY PLAYER1_TEAM_NICKNAME PLAYER1_TEAM_ABBREVIATION
## 1644741 San Antonio Spurs SAS
## 1644742 San Antonio Spurs SAS
## 1644743 San Antonio Spurs SAS
## 1644744 San Antonio Spurs SAS
## 1644745 San Antonio Spurs SAS
## 1644746 San Antonio Spurs SAS
## PERSON2TYPE PLAYER2_ID PLAYER2_NAME PLAYER2_TEAM_ID
## 1644741 5 1938 Manu Ginobili 1610612759
## 1644742 5 1938 Manu Ginobili 1610612759
## 1644743 5 1938 Manu Ginobili 1610612759
## 1644744 5 1938 Manu Ginobili 1610612759
## 1644745 5 1938 Manu Ginobili 1610612759
## 1644746 5 1938 Manu Ginobili 1610612759
## PLAYER2_TEAM_CITY PLAYER2_TEAM_NICKNAME PLAYER2_TEAM_ABBREVIATION
## 1644741 San Antonio Spurs SAS
## 1644742 San Antonio Spurs SAS
## 1644743 San Antonio Spurs SAS
## 1644744 San Antonio Spurs SAS
## 1644745 San Antonio Spurs SAS
## 1644746 San Antonio Spurs SAS
## PERSON3TYPE PLAYER3_ID PLAYER3_NAME PLAYER3_TEAM_ID
## 1644741 0 0 <NA> <NA>
## 1644742 0 0 <NA> <NA>
## 1644743 0 0 <NA> <NA>
## 1644744 0 0 <NA> <NA>
## 1644745 0 0 <NA> <NA>
## 1644746 0 0 <NA> <NA>
## PLAYER3_TEAM_CITY PLAYER3_TEAM_NICKNAME PLAYER3_TEAM_ABBREVIATION
## 1644741 <NA> <NA> <NA>
## 1644742 <NA> <NA> <NA>
## 1644743 <NA> <NA> <NA>
## 1644744 <NA> <NA> <NA>
## 1644745 <NA> <NA> <NA>
## 1644746 <NA> <NA> <NA>
Wow! So much information - this event along as over 10 thousand rows with 77 variables!
The first thing to notice is that there are 850 different timepoints in this particular
length(table(id303$game_clock))
## [1] 850
We will extract data to make several dataframes. We will use time 361.15. A different function is used to get the data for the ball, players and the convex hull.
Players
This shows the x and y locations of the 10 players on the court at this time.
playerdf <- player_position1(df=id303, eventid=303,gameclock=361.15)
playerdf
## ID X Y jersey
## 1 1610612750 6.56514 24.39048 32
## 2 1610612750 13.67987 34.34244 22
## 3 1610612759 16.08250 25.92709 30
## 4 1610612759 20.31601 46.23162 33
## 5 1610612750 15.89886 20.33154 9
## 6 1610612759 24.47131 10.80858 8
## 7 1610612750 11.03182 7.93721 8
## 8 1610612759 7.59973 16.73085 20
## 9 1610612750 20.11542 29.46991 5
## 10 1610612759 14.81859 2.42889 1
Convex Hull Plot
This function grabs the points required for plotting the convex hull of the player’s location. See Rajiv’s website for more info.
chulldf <- chull_plot(df=id303, eventid=303, gameclock=361.15)
chulldf
## ID X Y
## 1 1 11.03182 7.93721
## 2 1 6.56514 24.39048
## 3 1 13.67987 34.34244
## 4 1 20.11542 29.46991
## 5 1 11.03182 7.93721
## 6 2 24.47131 10.80858
## 7 2 14.81859 2.42889
## 8 2 7.59973 16.73085
## 9 2 20.31601 46.23162
## 10 2 24.47131 10.80858
Ball Info
This simply gets the ball information from the id303 dataframe. Notice this dataframe also includes ID and jersey variables (NA). This is so we can bind together with the player location dataframe later.
ballposdf <- ball_position1(df=id303, eventid=303, gameclock=361.15)
ballposdf
## ID X Y jersey
## 1 NA 7.80244 17.77459 <NA>
Plotting
Next these pieces of information are plotted using ggplot by overlaying it on a geom made by Ed Kupfer.
fullcourt() +
geom_point(data=playerdf,aes(x=X,y=Y,group=ID,color=factor(ID)),size=6) + #players
geom_text(data=playerdf,aes(x=X,y=Y,group=ID,label=jersey),color='black') + #jersey number
geom_polygon(data=chulldf,aes(x=X,y=Y,group=ID,fill=factor(ID)),alpha = 0.2) + #convex hull
geom_point(data=ballposdf,aes(x=X,y=Y),color='darkorange',size=3) + #ball
scale_color_manual(values=c("lightsteelblue2","orangered2")) +
scale_fill_manual(values=c("lightsteelblue2","orangered2")) +
theme(legend.position="none")
Because the above plot was made using ggplot, it is relatively trivial to make an animated plot by creating the same plot for every time unit and compiling together. This could be done in several ways - I will use gganimate for ease.
First, I will get the 3 dataframes (player position, convex hull info, ball position) for every time unit of play 303. I will then put everything into one big dataframe with an extra variable indicating which time unit (timebin) those observations belong to. To do this, we need to make sure that all 3 dataframes (player position, convex hull info, ball position) need to have the same variables. All that is required is to add a jersey number = NA to the convex hull dataframe. Lastly, we will add a variable that indicates what dataframe the observations originally come from - this is so we can only plot those data for each step of the ggplot.
Step 1. Store all the clocktimes for this event in ascending order. We will grab the player and ball positions for every unique clock time:
clocktimes= rev(sort(unique(id303$game_clock)))
Step 2. Now use a for loop to grab player positions, convex hull info and ball position for each time unit.
fulldf=list()
for(i in seq_along(clocktimes)){
dplayer <- player_position1(df=id303, 303,clocktimes[i]) #Gets positions of players
dchull <- chull_plot(df=id303, 303,clocktimes[i]) #Gets area of convex hull
ballpos <- ball_position1(df=id303, 303,clocktimes[i]) #Gets position of ball
dchull$jersey = "NA"
dplayer$valx = 'player'
dchull$valx = 'hull'
ballpos$valx = 'ball'
fulldf[[i]] = rbind(dplayer,dchull,ballpos)
}
length(fulldf) #850 elements
## [1] 850
fulldf = Map(cbind,fulldf,timebin=1:length(fulldf)) #add time unit
A few things to notice… The number of rows of this dataframe are not all the same.
table(lapply(fulldf,nrow) %>% unlist)
##
## 20 21 22 23 216 1397
## 45 320 334 149 1 1
This is primarily because the convex plot sometimes has a different number of rows. However, two timebins have ridiculously large number of observations…
which(lapply(fulldf,nrow) %>% unlist > 23)
## [1] 1 464
The timeunit = 1 is when the game clock is stopped as the ball is entering play. The players are running all over the place. I will not plot this here. 464 actually refers to the end of this particular play and similarly the ball and tracking data picks up on a lot of different info. Therefore, I will actually just plot the events 2-463.
playdf = data.table::rbindlist(fulldf)
playdf2 = playdf %>% filter(timebin!=1) %>% filter(timebin<464)
Now we can use the gganimate syntax to create a ggplot object. We add a “frame=” argument, telling it that we want to compile separate images over the timebin variable. We store this as “p”.
p = fullcourt() +
geom_point(data=playdf2 %>% filter(valx=="player"),aes(x=X,y=Y,group=ID,color=factor(ID),frame=timebin),size=6) +
geom_text(data=playdf2 %>% filter(valx=="player"),aes(x=X,y=Y,group=ID,frame=timebin,label=jersey),color='black') +
geom_polygon(data=playdf2 %>% filter(valx=="hull"),aes(x=X,y=Y,group=ID,fill=factor(ID),frame=timebin),alpha = 0.2) +
geom_point(data=playdf2 %>% filter(valx=="ball"),aes(x=X,y=Y,frame=timebin),color='darkorange',size=3) +
scale_color_manual(values=c("lightsteelblue2","orangered2")) +
scale_fill_manual(values=c("lightsteelblue2","orangered2")) +
theme(legend.position="none")
Next, we use gganimate to save the file as a gif to our working directory. You could tweak the interval parameter to improve the flow. This might will take some time depending on your machine’s umph:
gg_animate(p, "nbaplot.gif", title_frame =F, ani.width = 600, ani.height = 450, interval=0.1)
Animated NBA Plot
I hope this helps inspire you to play around with these data and make animated plots. Theoretically any spatiotemporal data could be plotted using this method. If you can improve on this method or have any suggestions - please let me know by email jc3181 AT columbia DOT edu or on twitter.