import os
# Point Qt at the platform plugins of this machine's Anaconda install (machine-specific path)
os.environ['QT_QPA_PLATFORM_PLUGIN_PATH'] = 'D:/Anaconda3/Library/plugins/platforms'

Introduction

This notebook explores the top 10 MVP candidates for each year dating back to 1980. The data set includes basic statistics such as points per game, as well as more advanced ones such as efficiency rating. The goal of the exploration is to find which values of certain statistics indicate success.

Dataset

# Import data and packages
import numpy as np
import pandas as pd
from pandas.plotting import parallel_coordinates
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
import plotly.express as px
import seaborn as sns


data = pd.read_csv("//apporto.com/dfs/LOYOLA/Users/dthoule_loyola/Desktop/master_table.csv")
data.info()
# Drop the columns that should not be summarized, then calculate descriptive statistics for each remaining column
data1 = data.drop(columns=['Rank', 'Player', 'Tm', 'team'])


mean = data1.mean()
median = data1.median()
mode = data1.mode().iloc[0]
std_dev = data1.std()
variance = data1.var()
range_ = data1.max() - data1.min()  # use data1 on both sides so the dropped columns stay out
minimum = data1.min()
maximum = data1.max()
skewness = data1.skew()
kurtosis = data1.kurtosis()

print("Mean:\n", mean)
print("Median:\n", median)
print("Mode:\n", mode)
print("Standard deviation:\n", std_dev)
print("Variance:\n", variance)
print("Range:\n", range_)
print("Minimum:\n", minimum)
print("Maximum:\n", maximum)
print("Skewness:\n", skewness)
print("Kurtosis:\n", kurtosis)

This gives descriptive statistics for every remaining column in the data. For example, the mean age of candidates is 27.45, and the standard deviation of steals is only 0.5, so that statistic varies little from candidate to candidate.
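As a more compact alternative to the separate variables above, the same statistics can be collected into one table. This is only a sketch: the `toy` frame and its values are made up for illustration and are not taken from master_table.csv.

```python
import pandas as pd

# Hypothetical toy values, NOT taken from master_table.csv
toy = pd.DataFrame({
    "Age": [24, 27, 27, 31],
    "STL": [1.0, 1.5, 2.0, 1.5],
})

# One call builds a table with one row per statistic and one column per variable
summary = toy.agg(["mean", "median", "std", "min", "max", "skew"])

# Range is max minus min, with both taken from the same frame
summary.loc["range"] = summary.loc["max"] - summary.loc["min"]

print(summary.round(2))
```

Keeping everything in one frame makes it harder to mix up two different tables when computing the range.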

Findings

Some statistics are highly correlated with success, as we can see in this data.

Top 10 players by number of top 10 MVP voting finishes

Finishing in the top 10 of MVP voting is a strong indicator of success. Let’s see which players have had the most top 10 finishes over the past 40 years.


top_players = data['Player'].value_counts().head(10).index.tolist()

nba_mvp_df_top10 = data[data['Player'].isin(top_players)]

# Create a bar chart of the number of top 10 MVP voting finishes for each player
plt.figure(figsize=(10, 5))
sns.countplot(data=nba_mvp_df_top10, x='Player', order=nba_mvp_df_top10['Player'].value_counts().index)
plt.xticks(rotation=90)
plt.title('Top 10 players by number of top 10 MVP voting finishes')
plt.xlabel('Player')
plt.ylabel('Count')
plt.show()

LeBron James, Tim Duncan, and Karl Malone have the most top 10 MVP voting finishes in the last 40 years. Next, let’s look at the ages at which candidates are most successful.
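The counting step behind the bar chart is just `value_counts()` on the player column. Here is a minimal sketch with placeholder names ("Player A" and so on are hypothetical, not rows from the data set):

```python
import pandas as pd

# Hypothetical rows: one entry per top 10 finish, placeholder player names
finishes = pd.Series(
    ["Player A", "Player B", "Player A", "Player C", "Player A", "Player B"],
    name="Player",
)

# value_counts() sorts from most to fewest appearances, so head(n) keeps the top n
counts = finishes.value_counts().head(2)
print(counts)
```

Because `counts` is already sorted, it could also be plotted directly with `counts.plot(kind="bar")` instead of re-counting inside the plotting call.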

Player age distribution for MVP award candidates

This graph shows the age distribution of MVP candidates. Plotting the five-number summary for this variable indicates the ages at which players are in their “prime”.

plt.figure(figsize=(10, 5))
sns.boxplot(data=data, x='Age')
plt.title('Player age distribution for MVP award candidates')
plt.xlabel('Age')
plt.show()

The box plot suggests that ages 25 to 30 are the sweet spot.
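The numbers a box plot draws can also be computed directly. The sketch below uses a made-up list of ages (not the real candidates) and `np.percentile` to get the five-number summary:

```python
import numpy as np

# Hypothetical ages, NOT taken from master_table.csv
ages = np.array([22, 25, 26, 27, 28, 29, 30, 33, 36])

# Five-number summary: minimum, first quartile, median, third quartile, maximum
minimum, q1, median, q3, maximum = np.percentile(ages, [0, 25, 50, 75, 100])

# The box spans q1 to q3; its width is the interquartile range (IQR)
iqr = q3 - q1

print("min:", minimum, "Q1:", q1, "median:", median, "Q3:", q3, "max:", maximum)
print("IQR:", iqr)
```

The box itself covers the middle 50% of the ages, which is where a "prime" age range would be read from.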

NBA MVPs: Points, Rebounds, and Assists by Season

Points, rebounds, and assists are the three most notable stats in basketball. This graph shows the yearly totals that the top MVP candidates of the last 40 years have recorded in these stats.

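# Sum each stat over all candidates in a year (the string 'sum' avoids the warning newer pandas raises for the builtin)
stats_data = data.pivot_table(values=['PTS', 'TRB', 'AST'], index='year', aggfunc='sum')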

# Create stacked area chart
plt.figure(figsize=(10,6))
plt.stackplot(stats_data.index, stats_data.values.T, labels=stats_data.columns)
plt.title('NBA MVPs: Points, Rebounds, and Assists by Season')
plt.xlabel('Season')
plt.ylabel('Total')
plt.legend(loc='upper left')
plt.show()

Rebounds and points have declined over the last 40 years, but only slightly. The values of each stat give a rough guide to what future MVP candidates will record.
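To make clear what the pivot table feeds into the stacked area chart, here is a small sketch on invented numbers (the `toy` frame is hypothetical). It also shows the per-candidate average as an alternative to the yearly total, which may be easier to read as "what one candidate records":

```python
import pandas as pd

# Hypothetical rows: two candidates per year, values invented for illustration
toy = pd.DataFrame({
    "year": [1980, 1980, 1981, 1981],
    "PTS": [25.0, 20.0, 30.0, 22.0],
    "TRB": [10.0, 8.0, 9.0, 7.0],
    "AST": [5.0, 6.0, 4.0, 8.0],
})

# One row per year, one column per stat, summed over that year's candidates
totals = toy.pivot_table(values=["PTS", "TRB", "AST"], index="year", aggfunc="sum")

# Same shape, but averaged per candidate instead of summed
averages = toy.pivot_table(values=["PTS", "TRB", "AST"], index="year", aggfunc="mean")

print(totals)
print(averages)
```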

Success Factors Correlation

Now let’s see which stats are related to each other. These relationships can be useful when predicting future MVPs, and they show how these players affect the outcome of their team’s season.

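# Correlate numeric columns only; text columns such as Player cannot be correlated
heatmap = data.corr(numeric_only=True)
# Hide the upper triangle so each pair of variables appears only once
mask = np.triu(np.ones_like(heatmap, dtype=bool))
with sns.axes_style("white"):
    ax = sns.heatmap(heatmap, mask=mask, vmax=.3, square=True)
ax.set_title("Success Factors Correlation")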

We see that wins and playoff seed have a very strong negative correlation: the more a team wins, the lower (better) its seed in the playoffs.
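Each cell of the heatmap is a single Pearson correlation coefficient. As a sketch of what a strong negative value looks like, here is the calculation on invented win totals and seeds (these numbers are assumptions, not rows from the data set):

```python
import numpy as np

# Hypothetical win totals and playoff seeds for eight teams
wins = np.array([62, 58, 54, 50, 47, 44, 42, 40])
seed = np.array([1, 2, 3, 4, 5, 6, 7, 8])

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is r(wins, seed)
r = np.corrcoef(wins, seed)[0, 1]

print("Pearson r:", round(r, 3))
```

A value close to -1 means that as one variable goes up, the other goes down almost in a straight line.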

NBA playoff seed vs. number of wins

This figure visualizes the correlation found in the previous plot. It shows how many wins a team needs to secure the 1 seed in the playoffs, and how many it needs to squeak in.

seed = data['seed']
wins = data['W']

# Scatter plot of seed vs. wins

plt.scatter(seed, wins)
plt.xlabel('Seed')
plt.ylabel('Wins')
plt.title('NBA playoff seed vs. number of wins')

# Display the plot
plt.show()

55+ wins is a good predictor of securing the number 1 seed, while a team can make the playoffs with as few as 40.
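The scatter plot shows the trend but does not fit a line to it. As a rough sketch of how that could be done, the block below fits a least-squares line with `np.polyfit` (used here in place of the imported `LinearRegression`, purely to keep the example short). The seeds and win totals are invented for illustration.

```python
import numpy as np

# Hypothetical seeds and win totals, NOT taken from master_table.csv
seed = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
wins = np.array([62, 58, 54, 50, 47, 44, 42, 40], dtype=float)

# Degree-1 polyfit returns the slope and intercept of the least-squares line
slope, intercept = np.polyfit(seed, wins, deg=1)

# Wins the fitted line predicts for a 1 seed and an 8 seed
wins_for_seed_1 = slope * 1 + intercept
wins_for_seed_8 = slope * 8 + intercept

print("slope:", round(slope, 2), "intercept:", round(intercept, 2))
print("predicted wins, 1 seed:", round(wins_for_seed_1, 1))
print("predicted wins, 8 seed:", round(wins_for_seed_8, 1))
```

On the real data, the same two arrays would be `data['seed']` and `data['W']`, assuming neither column has missing values.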

Conclusion

By visualizing the data, we found that the average age of an MVP candidate is about 27.5 years, that LeBron James has been a fixture in the MVP race for a long time, and that a team that wins 55+ games in a season has a very good chance of being the number 1 seed come playoff time.