Here, I will attempt to split NBA history into statistically distinct “eras”. The purpose is to make it easier to stay focused on a
single era when looking at statistics as a whole, since mixing eras seems to cause problems for machine learning algorithms. By restricting the
training data to one era, more can be learned from the results of the models and algorithms. After all, how useful is a model trained on the ’80s and ’90s going to be in 2025? It needs to be trained on data from the start of the “modern era” onward.

I’ve gotten started with some data from Basketball Reference. It looks like this, only with many more rows, covering the years 1980-2024:

| Year | Team | G | MP | FG | FGA | FG% | 3P | 3PA | 3P% | 2P | 2PA | 2P% | FT | FTA | FT% | ORB | DRB | TRB | AST | STL | BLK | TOV | PF | PTS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1980 | Atlanta Hawks | 82 | 241.2 | 38.3 | 83.8 | 0.458 | 0.6 | 2.2 | 0.251 | 37.8 | 81.6 | 0.463 | 24.4 | 31.9 | 0.765 | 15.4 | 28.5 | 43.9 | 21.4 | 8.3 | 6.8 | 20.2 | 26.5 | 101.6 |
| 1980 | Portland Trail Blazers | 82 | 241.2 | 40.8 | 85.5 | 0.478 | 0.7 | 2.3 | 0.296 | 40.2 | 83.2 | 0.483 | 20.9 | 27.8 | 0.752 | 13.9 | 28.8 | 42.6 | 24.5 | 9.2 | 4.8 | 17.7 | 22.9 | 103.3 |
| 1980 | Seattle SuperSonics | 82 | 241.8 | 41.6 | 90.5 | 0.459 | 0.7 | 2.9 | 0.246 | 40.8 | 87.6 | 0.466 | 20.0 | 26.2 | 0.764 | 14.7 | 29.4 | 44.0 | 24.6 | 8.9 | 4.8 | 18.5 | 24.4 | 103.8 |
| 1980 | Philadelphia 76ers | 82 | 242.1 | 42.0 | 92.2 | 0.455 | 0.9 | 3.4 | 0.271 | 41.1 | 88.8 | 0.463 | 20.0 | 26.2 | 0.765 | 16.1 | 28.7 | 44.8 | 25.5 | 10.7 | 4.7 | 19.0 | 25.6 | 104.9 |
| 1980 | Kansas City Kings | 82 | 241.5 | 40.6 | 85.3 | 0.476 | 0.5 | 2.1 | 0.238 | 40.1 | 83.2 | 0.482 | 23.2 | 30.5 | 0.763 | 13.9 | 32.2 | 46.1 | 21.7 | 8.5 | 5.2 | 21.5 | 25.3 | 104.9 |
| 1980 | Boston Celtics | 82 | 242.4 | 41.9 | 89.2 | 0.470 | 0.9 | 3.2 | 0.286 | 41.0 | 86.0 | 0.477 | 20.9 | 27.1 | 0.770 | 14.2 | 28.0 | 42.2 | 22.8 | 8.4 | 5.1 | 19.9 | 25.1 | 105.7 |

I will use an algorithm called k-means, which measures the differences in the teams’ statistics across seasons and assigns similar team-seasons to the same cluster. If
done correctly, each cluster should be made up of consecutive years, giving us the NBA’s eras.
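
To make the idea concrete, here is a minimal, standard-library-only sketch of k-means (Lloyd's algorithm). It is not the code behind this analysis. The z-score step is an assumption, since nothing above says whether the features were scaled, but without it a stat like PTS (around 100) would dominate one like FG% (around 0.46) in the distance calculation. The names `standardize`, `kmeans`, and `toy_rows` are made up for illustration, the numbers are toy values rather than NBA data, and the evenly spaced starting centroids are a simplification chosen for reproducibility (libraries typically use randomized starts with several restarts).

```python
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column so large-valued stats do not swamp small ones."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sigmas = [pstdev(c) or 1.0 for c in cols]  # avoid dividing by zero on constant columns
    return [tuple((v - m) / s for v, m, s in zip(row, mus, sigmas)) for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Return (centroids, labels) after alternating assignment and update steps."""
    # Assumption: deterministic, evenly spaced starting centroids.
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # nothing moved, so the clustering has converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(mean(col) for col in zip(*members))
    return centroids, labels


# Toy rows (made-up values, two features per row), not real team-seasons.
toy_rows = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(standardize(toy_rows), k=2)
print(labels)
```

With the real table, each row would be one team-season's numeric columns, and the label assigned to each row is what gets compared against its year.
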
To start, we need an estimate of how many eras there even are. The following graph shows how much tighter the clusters become each time you
add a new one. Once the returns from adding another cluster get small enough, we know we’ve found the right number of clusters.
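
The quantity usually plotted for this kind of "diminishing returns" check is the within-cluster sum of squared distances, often called inertia (scikit-learn's `KMeans` exposes it as `inertia_`, though the text here does not say which library was used). The sketch below is a self-contained illustration on toy one-dimensional points, not the NBA data; `kmeans`, `inertia`, and `toy_points` are hypothetical names, and the deterministic starting centroids are an assumption made for reproducibility.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Minimal Lloyd's algorithm with evenly spaced starting centroids (an assumption)."""
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels


def inertia(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))


# Toy points forming three obvious groups (made-up values, not NBA data).
toy_points = [(1.0,), (2.0,), (3.0,), (10.0,), (11.0,), (12.0,), (20.0,), (21.0,), (22.0,)]

inertias = []
for k in range(1, 6):
    centroids, labels = kmeans(toy_points, k)
    inertias.append(inertia(toy_points, centroids, labels))

# How much each extra cluster reduces the inertia; the "elbow" is where this flattens.
drops = [before - after for before, after in zip(inertias, inertias[1:])]
for k, value in enumerate(inertias, start=1):
    print(k, round(value, 2))
```

On this toy input the drops are large up to three clusters and small afterwards, which is the shape the graph is meant to reveal. Real data rarely gives such a sharp bend, so picking the number of clusters remains a judgment call.
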

It appears that after 3 clusters, adding another no longer tells us anything useful, so we will use 3 clusters, or “eras”.

Let’s visualize these clusters to see whether it worked in dividing the NBA into eras, using 3-pointers attempted per game on the vertical axis to help
show the change over time.

Wow, that’s amazing: k-means near-perfectly divided the NBA into 3 distinct eras. It seems that machine learning models applied to today’s game
should perhaps only be trained on data starting around 2015, since what a model finds to be a good predictor in the other eras may be DIFFERENT!
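
The "consecutive years" claim can also be checked numerically rather than by eye. The sketch below is one hedged way to do it: for each cluster, find the first and last year it appears in, then test whether those ranges overlap. The helper names `cluster_year_spans` and `spans_are_disjoint` are hypothetical, and the years and labels are made-up toy values, not the output of the clustering above.

```python
def cluster_year_spans(years, labels):
    """Map each cluster label to the (first, last) year in which it appears."""
    spans = {}
    for year, label in zip(years, labels):
        first, last = spans.get(label, (year, year))
        spans[label] = (min(first, year), max(last, year))
    return spans


def spans_are_disjoint(spans):
    """True when no two clusters' year ranges overlap."""
    ordered = sorted(spans.values())
    return all(prev[1] < nxt[0] for prev, nxt in zip(ordered, ordered[1:]))


# Toy input: one entry per team-season (made-up values for illustration).
toy_years = [2000, 2000, 2001, 2001, 2002, 2002, 2003, 2003]
clean_labels = [0, 0, 0, 0, 1, 1, 1, 1]  # clusters line up with time
mixed_labels = [0, 1, 0, 1, 0, 1, 0, 1]  # clusters ignore time

clean_spans = cluster_year_spans(toy_years, clean_labels)
mixed_spans = cluster_year_spans(toy_years, mixed_labels)
print(clean_spans, spans_are_disjoint(clean_spans))
print(mixed_spans, spans_are_disjoint(mixed_spans))
```

Because the split here is only "near perfect", a strict no-overlap test would likely fail around the era boundaries. Counting how many team-seasons fall outside their cluster's main stretch of years would be a gentler version of the same check.
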