HS DSC Basketball play-by-play

Task: Based on a set of play-by-play data from a full regular season… predict the winning percentage for a set of playoff matchups.

Step 1: winning % for a set of ~30 games. Same as 2023, games are home-away or neutral site.
Step 2: presentation deck with methods, etc.

(same as HSDSC 2023)


File size:

number of rows (events): 7.616 × 10^5



Simplified play-by-play data

Event types reduced from 130 to ~20

Example data: minimum number of fields

id           game_id    event_seq  type_text                       score_value  scoring_play  shooting_play  period_number  clock  time_s  coordinate_y  coordinate_x
4014425354   401442535  1          Jumpball                        0            FALSE         FALSE          1              12:00  720     214748365     214748406.75
4014425357   401442535  2          Shooting Foul                   0            FALSE         FALSE          1              11:46  706     1             -36.75
4014425359   401442535  3          Free Throw - 1 of 2             1            TRUE          TRUE           1              11:46  706     0             28.00
40144253510  401442535  4          Free Throw - 2 of 2             1            TRUE          TRUE           1              11:46  706     0             28.00
40144253511  401442535  5          Floating Jump Shot              2            TRUE          TRUE           1              11:33  693     9             -36.75
40144253513  401442535  6          Turnaround Fade Away Jump Shot  0            FALSE         TRUE           1              11:11  671     6             27.75
40144253514  401442535  7          Defensive Rebound               0            FALSE         FALSE          1              11:09  669     -6            -27.75
40144253515  401442535  8          Jump Shot                       0            FALSE         TRUE           1              11:00  660     14            -19.75
40144253516  401442535  9          Defensive Rebound               0            FALSE         FALSE          1              10:57  657     -14           19.75
40144253517  401442535  10         Pullup Jump Shot                2            TRUE          TRUE           1              10:52  652     11            26.75
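
The example rows are whitespace-separated, but type_text itself contains spaces, so a naive split breaks. A minimal parsing sketch, assuming the column order shown above never changes: take the first 3 and last 8 tokens as fixed fields and join whatever is left in the middle as type_text. The names Event, parse_event, and clock_to_seconds are hypothetical helpers, not part of the data set.

```python
from dataclasses import dataclass


@dataclass
class Event:
    id: int
    game_id: int
    event_seq: int
    type_text: str
    score_value: int
    scoring_play: bool
    shooting_play: bool
    period_number: int
    clock: str
    time_s: int
    coordinate_y: float
    coordinate_x: float


def clock_to_seconds(clock: str) -> int:
    """Convert a 'MM:SS' game clock to seconds remaining in the period."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + int(seconds)


def parse_event(line: str) -> Event:
    """Parse one whitespace-separated row (assumed column order as in the example)."""
    tokens = line.split()
    head, tail = tokens[:3], tokens[-8:]
    type_text = " ".join(tokens[3:-8])  # everything between the fixed fields
    return Event(
        id=int(head[0]),
        game_id=int(head[1]),
        event_seq=int(head[2]),
        type_text=type_text,
        score_value=int(tail[0]),
        scoring_play=(tail[1] == "TRUE"),
        shooting_play=(tail[2] == "TRUE"),
        period_number=int(tail[3]),
        clock=tail[4],
        time_s=int(tail[5]),
        coordinate_y=float(tail[6]),
        coordinate_x=float(tail[7]),
    )


row = "40144253513 401442535 6 Turnaround Fade Away Jump Shot 0 FALSE TRUE 1 11:11 671 6 27.75"
event = parse_event(row)
print(event.type_text, event.time_s)
```

In the example rows, time_s matches the clock converted to seconds (11:11 is 671), so clock_to_seconds can serve as a consistency check on load.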


play-by-play model:

event level is sub-possession level



Inputs: Team parameters + Schedule + play-by-play model + ______ + _______


Input 1: Team parameters (season averages?)

Use observed data as starting point, expand observed distributions (maybe)

(last 3 seasons of NBA teams)



Input 2: Team schedules (similar to 2023 NSL)



Input 3: delta time = f(event type)
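
One way to estimate delta time = f(event type) from the data, as a sketch only: within each game and period, take the clock time elapsed since the previous event and attribute it to the event that ends the interval. That attribution rule is an assumption (the time could equally be charged to the event that starts the interval), and the function names are hypothetical. The tuples below are taken from the example rows.

```python
from collections import defaultdict
from statistics import mean

# (game_id, period_number, event_seq, type_text, time_s) from the example rows
EVENTS = [
    (401442535, 1, 1, "Jumpball", 720),
    (401442535, 1, 2, "Shooting Foul", 706),
    (401442535, 1, 3, "Free Throw - 1 of 2", 706),
    (401442535, 1, 4, "Free Throw - 2 of 2", 706),
    (401442535, 1, 5, "Floating Jump Shot", 693),
    (401442535, 1, 6, "Turnaround Fade Away Jump Shot", 671),
    (401442535, 1, 7, "Defensive Rebound", 669),
    (401442535, 1, 8, "Jump Shot", 660),
    (401442535, 1, 9, "Defensive Rebound", 657),
    (401442535, 1, 10, "Pullup Jump Shot", 652),
]


def delta_time_by_type(events):
    """Return {type_text: [seconds elapsed since the previous event, ...]}."""
    deltas = defaultdict(list)
    ordered = sorted(events, key=lambda e: (e[0], e[1], e[2]))
    prev = None
    for event in ordered:
        game_id, period, _seq, type_text, time_s = event
        # Only compare events in the same game and period; the clock counts down,
        # so elapsed time is previous time_s minus current time_s.
        if prev is not None and prev[0] == game_id and prev[1] == period:
            deltas[type_text].append(prev[4] - time_s)
        prev = event
    return dict(deltas)


def mean_delta_by_type(events):
    """Average elapsed seconds per event type."""
    return {t: mean(d) for t, d in delta_time_by_type(events).items()}


for type_text, seconds in sorted(mean_delta_by_type(EVENTS).items()):
    print(f"{type_text}: {seconds}")
```

The first event of each period has no predecessor, so it contributes no delta. With the full season the per-type lists give an empirical distribution to sample from, not just a mean.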



Problems:

My “simple” play-by-play model doesn’t capture the needed features. It will reproduce season averages (if I get the timing resolved). Point differential (points for minus points against) is all that is needed for predictions.



Input 3: Changes in play

Move from season-average rates per team to team rates that change based on:



Too-easy version of the solution: points for minus points against is pretty good.

There is so much information in points.
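
A possible baseline for the points-only version, not the method used in these notes: turn each team's season points for and against into an expected winning percentage with the Pythagorean expectation, then combine two teams with the log5 formula. Both are swapped-in standard techniques; the exponent of 14.0 is a placeholder assumption that would need fitting to this data set, and no home-court or neutral-site adjustment is included.

```python
def pythagorean_win_pct(points_for, points_against, exponent=14.0):
    """Expected winning percentage from season points for and against.

    The exponent is an assumed placeholder; fit it to the data before use.
    """
    pf = float(points_for) ** exponent
    pa = float(points_against) ** exponent
    return pf / (pf + pa)


def log5(p_a, p_b):
    """Probability that team A beats team B, given each team's winning percentage."""
    a_wins = p_a * (1.0 - p_b)
    b_wins = p_b * (1.0 - p_a)
    return a_wins / (a_wins + b_wins)


def matchup_win_prob(team_a, team_b, exponent=14.0):
    """team_a and team_b are (points_for, points_against) tuples."""
    p_a = pythagorean_win_pct(*team_a, exponent=exponent)
    p_b = pythagorean_win_pct(*team_b, exponent=exponent)
    return log5(p_a, p_b)


# Hypothetical season totals per game, for illustration only
team_a = (112.0, 108.0)
team_b = (109.0, 110.0)
print(round(matchup_win_prob(team_a, team_b), 3))
```

A home-court term could be layered on top by shifting the result on the log-odds scale, which would also cover the home-away versus neutral-site split in Step 1.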



HSDSC 2023 WH got pretty close on the HFA distance effect.