Here, I will attempt to divide NBA history into statistically distinct “eras”. The goal is to make it easier to focus on a single era when looking at league statistics as a whole, since the shifts between eras seem to cause problems for machine learning algorithms. Restricting the training data to one era should make the results of models and algorithms easier to learn from. After all, how useful is a model trained on the ’80s and ’90s going to be in 2025? It needs to be trained on data from the start of the “modern era” onward.

I’ve gotten started with some data from Basketball Reference. It looks like this, only with many more rows (one per team per season), covering the years 1980 to 2024:

year Team G MP FG FGA FG% 3P ThreePA 3P% 2P 2PA 2P% FT FTA FT% ORB DRB TRB AST STL BLK TOV PF PTS
1980 Atlanta Hawks 82 241.2 38.3 83.8 0.458 0.6 2.2 0.251 37.8 81.6 0.463 24.4 31.9 0.765 15.4 28.5 43.9 21.4 8.3 6.8 20.2 26.5 101.6
1980 Portland Trail Blazers 82 241.2 40.8 85.5 0.478 0.7 2.3 0.296 40.2 83.2 0.483 20.9 27.8 0.752 13.9 28.8 42.6 24.5 9.2 4.8 17.7 22.9 103.3
1980 Seattle SuperSonics 82 241.8 41.6 90.5 0.459 0.7 2.9 0.246 40.8 87.6 0.466 20.0 26.2 0.764 14.7 29.4 44.0 24.6 8.9 4.8 18.5 24.4 103.8
1980 Philadelphia 76ers 82 242.1 42.0 92.2 0.455 0.9 3.4 0.271 41.1 88.8 0.463 20.0 26.2 0.765 16.1 28.7 44.8 25.5 10.7 4.7 19.0 25.6 104.9
1980 Kansas City Kings 82 241.5 40.6 85.3 0.476 0.5 2.1 0.238 40.1 83.2 0.482 23.2 30.5 0.763 13.9 32.2 46.1 21.7 8.5 5.2 21.5 25.3 104.9
1980 Boston Celtics 82 242.4 41.9 89.2 0.470 0.9 3.2 0.286 41.0 86.0 0.477 20.9 27.1 0.770 14.2 28.0 42.2 22.8 8.4 5.1 19.9 25.1 105.7
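If the data is only on hand in the space-separated form shown above, the one wrinkle is that team names contain spaces. A minimal sketch of one way to handle that, assuming every row has the same 23 numeric columns after the team name (`parse_row`, `HEADER`, and `STAT_NAMES` are names I made up for illustration, not part of any library):

```python
HEADER = ("year Team G MP FG FGA FG% 3P ThreePA 3P% 2P 2PA 2P% FT FTA FT% "
          "ORB DRB TRB AST STL BLK TOV PF PTS").split()
STAT_NAMES = HEADER[2:]  # everything after year and Team is numeric


def parse_row(line):
    """Split one row into a dict; the team name may contain spaces."""
    tokens = line.split()
    n = len(STAT_NAMES)
    # The first token is the year, the last n tokens are stats,
    # and whatever sits in between is the team name.
    row = {"year": int(tokens[0]), "Team": " ".join(tokens[1:-n])}
    row.update(zip(STAT_NAMES, map(float, tokens[-n:])))
    return row


sample = ("1980 Philadelphia 76ers 82 242.1 42.0 92.2 0.455 0.9 3.4 0.271 41.1 88.8 "
          "0.463 20.0 26.2 0.765 16.1 28.7 44.8 25.5 10.7 4.7 19.0 25.6 104.9")
row = parse_row(sample)
print(row["Team"], row["ThreePA"], row["PTS"])  # Philadelphia 76ers 3.4 104.9
```

Counting the stat columns from the right is what keeps a name like "Philadelphia 76ers" intact, digits and all.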

I will use an algorithm called k-means, which measures how different the teams’ statistics are from one another and groups similar team-seasons together. If it works, each cluster should consist of consecutive years, giving us the NBA’s eras.
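To make the idea concrete, here is a rough sketch of the basic k-means loop (Lloyd's algorithm) in NumPy. It is not necessarily the implementation behind the results in this post. The function names `kmeans` and `nearest` are mine, and the nine rows of data are made up purely for illustration. The columns are standardized first, which I am assuming is wanted here because stats like FG% and PTS live on very different scales.

```python
import numpy as np


def nearest(X, centroids):
    """Index of the closest centroid (squared Euclidean distance) for each row."""
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct rows picked at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = nearest(X, centroids)
        # Move each centroid to the mean of its rows (leave it alone if it has none).
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return nearest(X, centroids), centroids


# Made-up team-seasons, NOT real data: columns are [3PA per game, points per game].
raw = np.array([
    [2.0, 105.0], [3.0, 108.0], [2.5, 104.0],
    [14.0, 96.0], [15.0, 98.0], [16.0, 97.0],
    [33.0, 112.0], [35.0, 114.0], [34.0, 111.0],
])
# Standardize so no column dominates just because its numbers are bigger.
X = (raw - raw.mean(axis=0)) / raw.std(axis=0)

labels, centroids = kmeans(X, k=3)
print(labels)
```

Note that the result depends on the random starting centroids, so a single run can settle on a poor grouping. Library implementations usually restart several times and keep the best run.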

To start, we need an estimate of how many eras there even are. The following graph shows how much tighter the clusters become each time a new one is added. Once the returns from adding another cluster get small enough, we know we’ve found the right number of clusters.
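This kind of diminishing-returns check is commonly called the elbow method. Below is a minimal sketch of it with scikit-learn, assuming the quantity being tracked is the within-cluster sum of squares (`inertia_`). The data is a synthetic stand-in with three obvious blobs, not the NBA table, and the range of `k` values is an arbitrary choice.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in data: three well-separated blobs in two dimensions.
centers = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 9.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])
X = StandardScaler().fit_transform(X)

# Fit k-means for each candidate k and record the within-cluster sum of squares.
inertias = {}
for k in range(1, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

# The "elbow" is where the drop from one k to the next stops being large.
for k in range(2, 8):
    drop = inertias[k - 1] - inertias[k]
    print(f"k={k}: inertia {inertias[k]:.2f}, drop {drop:.2f}")
```

On data like this, the drop should be large up to `k=3` and small afterwards, which is the same pattern to look for in the graph.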

It appears that after 3 clusters, adding another no longer tells us anything useful, so we will use 3 clusters, or “eras”.

Let’s visualize these clusters to see whether this worked in dividing the NBA into eras, using 3-pointers attempted per game on the vertical axis to help show the change over time.
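A plot is the quickest check, but the same question can also be asked numerically: does each cluster cover its own stretch of years, with no overlap? Here is a small sketch of that check. The helper names `era_spans` and `spans_overlap` are mine, and the labels are hypothetical, not the actual output of the clustering in this post.

```python
from collections import defaultdict


def era_spans(years, labels):
    """Map each cluster label to the (first, last) year it appears in."""
    seen = defaultdict(list)
    for year, label in zip(years, labels):
        seen[label].append(year)
    return {label: (min(ys), max(ys)) for label, ys in seen.items()}


def spans_overlap(spans):
    """True if any two clusters cover overlapping year ranges."""
    ordered = sorted(spans.values())
    return any(prev_end >= start
               for (_, prev_end), (start, _) in zip(ordered, ordered[1:]))


# Hypothetical labels for a handful of team-seasons, not the post's real output.
years = [1980, 1980, 1995, 1996, 2010, 2016, 2016, 2024]
labels = [0, 0, 0, 1, 1, 2, 2, 2]

spans = era_spans(years, labels)
print(spans)                 # {0: (1980, 1995), 1: (1996, 2010), 2: (2016, 2024)}
print(spans_overlap(spans))  # False
```

With real output, a few team-seasons near the boundaries will probably land in the neighbouring cluster, so some overlap is not a failure by itself. How many years overlap is the more useful number.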

Wow, that’s amazing: k-means dissected the NBA almost perfectly into 3 distinct eras. It seems that machine learning aimed at today’s game should perhaps only be trained on data starting around 2015, since what a model finds to be a good predictor in the other eras may be different!