Each year, the WNBA Most Valuable Player is determined by a committee of sportswriters and broadcasters. Each committee member selects their top 5 players from that season, with the top choice receiving 10 points, the 2nd receiving 7 points, the 3rd receiving 5 points, the 4th receiving 3 points and the 5th receiving 1 point. This raises the question: how does the panel decide whom to vote for? That is, are certain statistics more valuable than others in the eyes of the committee?
This dataset is a collection of the top 10 WNBA MVP vote-getters from each of the last 20 seasons. The data includes each individual’s advanced metrics. Advanced metrics are statistics that go beyond a traditional box score (points, rebounds, assists, turnovers, etc.), such as Player Efficiency Rating and Win Shares. The underlying data is collected by tracking on-court actions, including points scored, rebounds, assists, steals, blocks and turnovers. From these raw numbers, derived statistics such as effective field goal percentage and player efficiency are calculated.
The advanced metrics that appear in this dataset are player
efficiency rating (PER), true shooting percentage
(TrueShoot_Perc), three point attempt rate
(ThreePoint_Att), free throw attempt rate
(FTr), offensive rebound percentage (ORB%),
defensive rebound percentage (DRB%), total rebound
percentage (TotReb_Perc), assist percentage
(Assist_Perc), steal percentage (Steal_Perc),
block percentage (Block_Perc), turnover percentage
(Turnover_Perc), usage percentage
(Usage_Perc), offensive win shares (OWS),
defensive win shares (DWS), win shares (WS), and win shares
per 48 minutes (WS/48).
The NBA first introduced these metrics during the 1996-97 season, and the WNBA officially introduced its own advanced stats pages in 2016. However, basketball-reference.com has data for all WNBA players and their advanced statistics dating back to the league’s inaugural season (1997). The site is a well-respected provider of sports statistics from Sports Reference, LLC that presents statistics for the WNBA, NBA, European Leagues and the ABA. Sports Reference, LLC also runs other similar pages such as Baseball Reference and Pro Football Reference. Additionally, the data includes each player’s ranking in their respective MVP race and whether or not they won a championship that year.
An ordinal logistic regression (or proportional odds model) is
used to examine this data. This type of regression is used to
analyze and model ordinal outcomes, which are categorical variables
whose levels have a natural order but no fixed spacing between them.
An ordinal regression is therefore useful when predicting the probability that an
outcome will fall into a particular category. In this case, it will be
used to predict where a player will fall in the MVP race (1-10). The
ordinal outcome is a player’s rank, and the independent
variables are a player’s advanced metrics.
The ordinal logistic regression uses log-odds of cumulative probabilities. First, let \(Y\) be the ordinal dependent variable with \(J\) ordered categories indexed by \(j = 1, \dots, J\). In this specific case, \(J = 10\):
\(j = 1\) (First in Voting)
\(j = 2\) (Second in Voting) …
\(j = 10\) (Tenth in Voting)
For each category, \(j\), the model defines the cumulative probability as: \[\pi_j = P(Y \le j |x)\] From there, the cumulative probabilities are transformed using logit. The logit is the logarithm of the odds of the probability of a certain event occurring.
\[L_j(x) = \operatorname{logit}(\pi_j) = \log\left(\frac{\pi_j}{1 - \pi_j}\right)\]
From this the multiple regression model becomes:
\[L_j(x) = \alpha_j - \beta_1(x_1) - \beta_2(x_2) - ... - \beta_k(x_k)\]
Here \(\alpha_j\) is the intercept, or “cutoff” point, specific to category \(j\); it is the cumulative log-odds when all predictors equal 0. \(\beta_k\) is the coefficient corresponding to the kth independent variable. It measures the effect of that variable on the cumulative log-odds of \(Y\) being in category \(j\) or below. Finally, \(x_k\) represents the kth predictor variable.
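To make the link between cumulative log-odds and individual category probabilities concrete, the following is a minimal sketch in base R. The thresholds in `alpha` and the linear predictor `eta` are made-up illustrative values, not estimates from this dataset, and the sketch assumes the minus-sign form of the model written above.

```r
# Sketch: turn cumulative log-odds into per-category probabilities.
# `alpha` and `eta` are made-up illustrative values, not estimates from the data.
alpha <- c(-2, -1, 0, 1)   # J - 1 = 4 increasing thresholds for J = 5 categories
eta   <- 0.5               # beta_1 * x_1 + ... + beta_k * x_k for one observation

# Parameterization used in the text: logit(P(Y <= j | x)) = alpha_j - eta
cum_prob <- plogis(alpha - eta)       # P(Y <= j) for j = 1, ..., J - 1
cat_prob <- diff(c(0, cum_prob, 1))   # P(Y = j)  for j = 1, ..., J
names(cat_prob) <- seq_along(cat_prob)

print(round(cat_prob, 4))
print(sum(cat_prob))   # the category probabilities sum to 1
```

Differencing the cumulative probabilities is what lets a single set of slopes produce a full probability distribution over the ordered categories.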
An ordinal logistic regression is quite similar to a standard logistic regression. Both models have categorical dependent variables, use link functions and use coefficients to measure the relationship between independent and dependent variables. The need for an ordinal regression arises when the dependent variable has more than two ordered categories. As a result, the ordinal regression estimates a separate threshold (\(\alpha_j\)) for each cumulative split, while a binary logistic regression has only a single intercept. Additionally, each coefficient describes the log-odds of being in a lower category relative to a higher one, instead of the log-odds of one binary outcome relative to the other.
In order to use an ordinal logistic regression, certain assumptions must be met. First, the dependent variable must be measured on an ordinal level. Next, there must be one or more independent variables, each of which is continuous, categorical or ordinal. There should be no multicollinearity among them. Finally, the proportional odds assumption states that each independent variable should have an identical effect on each cumulative split of the ordinal dependent variable.
It is immediately clear that the complete model,
containing all independent variables in the dataset, is going to contain
multicollinearity. For example, we would expect ORB%
(offensive rebound percentage), DRB% (defensive rebound
percentage), and TotReb_Perc to be highly correlated. This
is due to the fact that TotReb_Perc includes both offensive
rebounds and defensive rebounds in its calculation. Additionally,
WS (win shares) is calculated using two other
variables: OWS (offensive win shares) and DWS
(defensive win shares). WS/48 is also calculated using
WS. It’s expected that these groups of variables are highly
correlated. To check, each predictor’s variance inflation factor (VIF) will
be examined:
vif(full_model)
## MP PER TrueShoot_Perc ThreePoint_Att FTr
## 82.71899 500.11917 3448.90148 39289.71568 334.38633
## `ORB%` `DRB%` TotReb_Perc Assist_Perc Steal_Perc
## 1136.53590 2669.05095 11713.21812 25.54280 747.06501
## Block_Perc Turnover_Perc Usage_Perc OWS DWS
## 18.70567 35.35585 168.33862 254875.87140 204088.25090
## WS `WS/48`
## 117222.48847 3856.70296
If a predictor has a VIF that is greater than 5,
multicollinearity is present. None of these predictors has a VIF \(< 5\); in fact, most of them are
far larger than 5. Therefore, since ORB% and
DRB% are included in TotReb_Perc, they will be
discarded and only TotReb_Perc will be examined. Similarly,
of the win share variables, only WS and its effect on Rank will be
examined. The reduced model is:
\[\begin{aligned}
L_j(x) = \alpha_j &- \beta_1 (\text{MP}) - \beta_2 (\text{PER}) - \beta_3 (\text{TrueShoot\_Perc}) - \beta_4 (\text{ThreePoint\_Att}) \\
&- \beta_5 (\text{FTr}) - \beta_6 (\text{TotReb\_Perc}) - \beta_7 (\text{Assist\_Perc}) - \beta_8 (\text{Steal\_Perc}) \\
&- \beta_9 (\text{Block\_Perc}) - \beta_{10} (\text{Turnover\_Perc}) - \beta_{11} (\text{Usage\_Perc}) - \beta_{12} (\text{WS})
\end{aligned}\]
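As a rough illustration of what a variance inflation factor measures, the sketch below computes VIFs by hand in base R on simulated data. The variables `x1`, `x2` and `x3` are hypothetical stand-ins (with `x3` built almost entirely from the other two, much like a total that is the sum of its parts), and the calculation uses the textbook linear-regression definition \(1 / (1 - R_k^2)\); it is not necessarily how the `vif()` call in this analysis handles an ordinal model.

```r
# Sketch: variance inflation factors computed by hand on simulated data.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + x2 + rnorm(n, sd = 0.1)   # nearly the sum of x1 and x2
X  <- data.frame(x1, x2, x3)

# VIF_k = 1 / (1 - R^2_k), where R^2_k comes from regressing x_k on the others
vif_by_hand <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  fit <- lm(reformulate(others, response = v), data = X)
  1 / (1 - summary(fit)$r.squared)
})

print(round(vif_by_hand, 1))
```

Because each simulated variable is almost perfectly explained by the other two, all three VIFs land far above the threshold of 5.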
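The model can be fitted using either of the two R code chunks below. Note that polr uses the minus-sign form written above, whereas vglm’s cumulative family models \(L_j(x)\) as \(\alpha_j\) plus the linear predictor, so the slope coefficients reported by the two functions have opposite signs.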
library(MASS)  # provides polr()
library(VGAM)  # provides vglm() and cumulative()

# Proportional odds model fitted with MASS::polr
model <- polr(Rank ~ MP + PER + TrueShoot_Perc + ThreePoint_Att + FTr +
                TotReb_Perc + Assist_Perc + Steal_Perc +
                Block_Perc + Turnover_Perc + Usage_Perc + WS,
              data = WNBA_MVP,
              Hess = TRUE, method = 'logistic')

# The same model fitted with VGAM::vglm
po_model <- vglm(Rank ~ MP + PER + TrueShoot_Perc + ThreePoint_Att + FTr +
                   TotReb_Perc + Assist_Perc + Steal_Perc +
                   Block_Perc + Turnover_Perc + Usage_Perc + WS,
                 family = cumulative(parallel = TRUE), data = WNBA_MVP)
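Both fitting functions expect the response to be a factor, preferably an ordered one. Whether `Rank` is already stored that way in `WNBA_MVP` is not shown here, so the following is only a sketch of the encoding step, using a made-up data frame `toy` in place of the real data.

```r
# Sketch: encode MVP finishing position as an ordered factor.
# `toy` is a made-up data frame standing in for WNBA_MVP.
toy <- data.frame(Rank = c(3, 1, 10, 2, 7))
toy$Rank <- factor(toy$Rank, levels = 1:10, ordered = TRUE)

str(toy$Rank)
print(toy$Rank[2] < toy$Rank[1])   # rank 1 sorts below rank 3
```

Listing all ten levels explicitly keeps the ordering 1 through 10 even when some ranks are absent from a subset of the data.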
The dependent variable is measured on an ordinal level:
Rank is ordinal, with athletes categorized by their place in the MVP
race. Additionally, all independent variables are continuous.
Again, a VIF test will be used to check for multicollinearity:
vif(model)
## MP PER TrueShoot_Perc ThreePoint_Att FTr
## 47.66546 139.96466 7157.75946 18543.19042 6564.27856
## TotReb_Perc Assist_Perc Steal_Perc Block_Perc Turnover_Perc
## 13.00831 12.24155 17.13075 6.23259 16.61266
## Usage_Perc WS
## 78.67595 68.41207
Even with the exclusion of specific variables, multicollinearity is still present. This is likely due to multiple variables being correlated with the number of minutes an athlete plays.
A Brant test can be used to examine the proportional odds assumption.
\(H_0\): Proportional assumption holds: the relationship between the predictors and each pair of outcomes is the same
\(H_a\): Proportional assumption does not hold: the relationship between predictors and pair of outcomes is not the same
brant(model)
## --------------------------------------------
## Test for X2 df probability
## --------------------------------------------
## Omnibus -34.72 96 1
## MP 6.5 8 0.59
## PER 11.24 8 0.19
## TrueShoot_Perc 3.07 8 0.93
## ThreePoint_Att 9.04 8 0.34
## FTr 16.54 8 0.04
## TotReb_Perc 2.04 8 0.98
## Assist_Perc 4.58 8 0.8
## Steal_Perc 7.27 8 0.51
## Block_Perc 4 8 0.86
## Turnover_Perc 7.31 8 0.5
## Usage_Perc 9.9 8 0.27
## WS 9.8 8 0.28
## --------------------------------------------
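Because the omnibus p-value = 1 > 0.05, we fail to reject the
null hypothesis, so by this test the proportional odds assumption holds for the model as a whole.
Note, however, that the reported omnibus statistic is negative (-34.72). A chi-square statistic
cannot be negative, so this value likely reflects numerical instability, and the omnibus result
should be read with caution. For the individual independent variables, we fail to reject the null
hypothesis for all except FTr (free throw attempt rate, p = 0.04), so the
proportional odds assumption holds for every predictor other than FTr.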
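The idea behind the proportional odds check can be sketched in base R: fit one binary logistic regression for each cumulative split of the outcome and compare the slopes. The example below uses simulated data with a single hypothetical predictor `x`, constructed so that the assumption holds by design. It illustrates the underlying idea only and is not the Brant test itself.

```r
# Sketch: the idea behind the proportional odds check, on simulated data.
set.seed(42)
n <- 2000
x <- rnorm(n)

# Latent-variable construction: one slope shared by every cumulative split
latent <- 1.0 * x + rlogis(n)
y <- cut(latent, breaks = c(-Inf, -1, 0, 1, Inf), labels = FALSE)  # categories 1..4

# One binary logistic regression per cumulative split: I(Y <= j) versus I(Y > j)
slopes <- sapply(1:3, function(j) {
  coef(glm(as.integer(y <= j) ~ x, family = binomial))[["x"]]
})

# Under proportional odds the three slopes estimate the same value (here -1)
print(round(slopes, 3))
```

When the slopes differ substantially across splits for a given predictor, that predictor is a candidate for violating the assumption, which is what a small per-variable p-value in the Brant output signals.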
Because minutes played (MP) is highly correlated with every
predictor, removing it may solve the issue of multicollinearity:
model2 <- polr(Rank ~ PER + TrueShoot_Perc + ThreePoint_Att + FTr +
TotReb_Perc + Assist_Perc + Steal_Perc
+ Block_Perc + Turnover_Perc + Usage_Perc + WS,
data = WNBA_MVP,
Hess = TRUE, method = 'logistic')
vif(model2)
## PER TrueShoot_Perc ThreePoint_Att FTr TotReb_Perc
## 8.389933 4.351632 2.037506 1.521745 2.841988
## Assist_Perc Steal_Perc Block_Perc Turnover_Perc Usage_Perc
## 3.989450 1.920804 2.724908 3.751749 2.476418
## WS
## 2.546524
Now, every predictor’s VIF is less than 5 with the exception of
PER, whose VIF of 8.39 is above the threshold but far smaller than
the values seen before. Therefore, removing MP eliminates most of the
multicollinearity. However, the proportional odds assumption still needs
to be checked:
brant(model2)
## --------------------------------------------
## Test for X2 df probability
## --------------------------------------------
## Omnibus 385.54 88 0
## PER 7.77 8 0.46
## TrueShoot_Perc 3.21 8 0.92
## ThreePoint_Att 8.66 8 0.37
## FTr 16.88 8 0.03
## TotReb_Perc 2.14 8 0.98
## Assist_Perc 4.36 8 0.82
## Steal_Perc 6.26 8 0.62
## Block_Perc 3.86 8 0.87
## Turnover_Perc 7.24 8 0.51
## Usage_Perc 5.92 8 0.66
## WS 5.66 8 0.69
## --------------------------------------------
Because the omnibus p-value is reported as 0 < 0.05, the null hypothesis is
rejected: the proportional odds assumption does not hold for the model as a whole.
However, we again fail to reject the null hypothesis for each
individual independent variable except FTr (free throw attempt
rate), so the proportional odds assumption holds for every predictor other than
FTr.
By removing the MP (minutes played) variable from
our model, the multicollinearity assumption is largely met. However, doing
so changes the results of the Brant test: removing MP makes
the model as a whole violate the proportional odds assumption.
This is intuitive, as the minutes an athlete plays is an integral part of
the calculations for most advanced statistical metrics. However, its
inclusion is also what keeps the effect of the independent variables
consistent across thresholds. Therefore, the model including
MP can still be used to predict where players will fall
in the MVP rankings, but the significance of each independent variable
may be unstable.
summary(po_model)
## Call:
## vglm(formula = Rank ~ MP + PER + TrueShoot_Perc + ThreePoint_Att +
## FTr + TotReb_Perc + Assist_Perc + Steal_Perc + Block_Perc +
## Turnover_Perc + Usage_Perc + WS, family = cumulative(parallel = TRUE),
## data = WNBA_MVP)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept):1 -9.922090 3.345549 -2.966 0.00302 **
## (Intercept):2 -8.652552 3.330536 -2.598 0.00938 **
## (Intercept):3 -7.720604 3.317799 -2.327 0.01996 *
## (Intercept):4 -7.054029 3.309095 -2.132 0.03303 *
## (Intercept):5 -6.411014 3.301649 -1.942 0.05217 .
## (Intercept):6 -5.788910 3.295752 -1.756 0.07901 .
## (Intercept):7 -5.052761 3.290954 -1.535 0.12470
## (Intercept):8 -4.340005 3.289219 -1.319 0.18701
## (Intercept):9 -3.149244 3.293568 -0.956 0.33898
## MP -0.001824 0.001281 -1.424 0.15458
## PER 0.071897 0.134672 0.534 0.59343
## TrueShoot_Perc -12.635779 5.887748 -2.146 0.03186 *
## ThreePoint_Att 1.774746 1.111914 1.596 0.11046
## FTr -3.940708 1.563194 -2.521 0.01170 *
## TotReb_Perc 0.116411 0.045303 2.570 0.01018 *
## Assist_Perc 0.063922 0.028167 2.269 0.02324 *
## Steal_Perc -0.500202 0.191634 -2.610 0.00905 **
## Block_Perc 0.165364 0.118756 1.392 0.16378
## Turnover_Perc 0.070104 0.066368 1.056 0.29083
## Usage_Perc 0.235087 0.067327 3.492 0.00048 ***
## WS 1.097519 0.219434 5.002 5.69e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Number of linear predictors: 9
##
## Names of linear predictors: logitlink(P[Y<=1]), logitlink(P[Y<=2]),
## logitlink(P[Y<=3]), logitlink(P[Y<=4]), logitlink(P[Y<=5]), logitlink(P[Y<=6]),
## logitlink(P[Y<=7]), logitlink(P[Y<=8]), logitlink(P[Y<=9])
##
## Residual deviance: 801.2098 on 1860 degrees of freedom
##
## Log-likelihood: -400.6049 on 1860 degrees of freedom
##
## Number of Fisher scoring iterations: 7
##
## No Hauck-Donner effect found in any of the estimates
##
##
## Exponentiated coefficients:
## MP PER TrueShoot_Perc ThreePoint_Att FTr
## 0.9981776829 1.0745444682 0.0000032535 5.8987804699 0.0194344576
## TotReb_Perc Assist_Perc Steal_Perc Block_Perc Turnover_Perc
## 1.1234572283 1.0660091855 0.6064081002 1.1798226626 1.0726200712
## Usage_Perc WS
## 1.2650188379 2.9967205796
According to the output, the estimated first threshold, between a rank of 1 and 2, is statistically significant with a corresponding p-value of \(0.00302 < \alpha = 0.01\). This indicates that the threshold is significantly different from zero. The same holds true for the second threshold, between a rank of 2 and 3, as its p-value is 0.00938. The third and fourth thresholds are statistically significant at a significance level of \(\alpha = 0.05\), so both are significantly different from zero at that level. The fifth and sixth thresholds are only marginally significant: their p-values fall below \(\alpha = 0.1\) but not \(\alpha = 0.05\). The seventh, eighth and ninth thresholds are not statistically significant, indicating that there is no evidence that they differ from zero. One plausible reading is that committee members are more likely to align on their top vote-getters, with more variability in the players earning eighth, ninth and tenth place.
The predictors with the most significant effect on where a
WNBA player will rank in the MVP race are usage percentage
(Usage_Perc) and win shares (WS). Both
variables have p-values less than 0.001, at 0.00048 and 5.69e-07
respectively. These results are logically sound, as MVPs typically have the
ball in their hands more than their teammates and contribute a high
number of wins to their team.
However, PER and Usage_Perc may have
a joint influence on Rank. For example, a player with a
high usage percentage and a low PER often handles the
ball but is inefficient when doing so. An interaction term
allows the effect of one variable to depend on the level of the other.
Including it allows the model to capture the possibility that usage
percentage may have a stronger effect on Rank when PER is
high and a weaker effect when PER is low. Their joint influence is best
represented by an interaction term between the two:
\[\begin{aligned}
L_j(x) = \alpha_j &- \beta_1 (\text{MP}) - \beta_2 (\text{PER}) - \beta_3 (\text{TrueShoot\_Perc}) - \beta_4 (\text{ThreePoint\_Att}) \\
&- \beta_5 (\text{FTr}) - \beta_6 (\text{TotReb\_Perc}) - \beta_7 (\text{Assist\_Perc}) - \beta_8 (\text{Steal\_Perc}) \\
&- \beta_9 (\text{Block\_Perc}) - \beta_{10} (\text{Turnover\_Perc}) - \beta_{11} (\text{Usage\_Perc}) - \beta_{12} (\text{WS}) \\
&- \beta_{13} (\text{PER})(\text{Usage\_Perc})
\end{aligned}\]
As with the previous model, the ordinal-outcome and proportional odds assumptions are taken to hold, while the multicollinearity caveat remains.
po_model_int <- vglm(Rank ~ MP + PER + TrueShoot_Perc + ThreePoint_Att + FTr +
TotReb_Perc + Assist_Perc + Steal_Perc
+ Block_Perc + Turnover_Perc + Usage_Perc + WS + Usage_Perc:PER,
family = cumulative(parallel = TRUE), data = WNBA_MVP)
A lower AIC indicates a better-fitting model.
AIC(po_model)
## [1] 843.2098
AIC(po_model_int)
## [1] 841.701
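The two AIC values can be recovered from the residual deviances reported in the model summaries, since in that output the residual deviance equals minus twice the log-likelihood. The sketch below assumes the usual definition, AIC = deviance + 2k, with k counting the 9 thresholds plus the slopes. It also runs a likelihood ratio test on the one added term as a complementary check, which is an addition here and not part of the original analysis.

```r
# Sketch: recover the AIC values from the reported residual deviances.
dev_main <- 801.2098   # residual deviance, model without the interaction
dev_int  <- 797.701    # residual deviance, model with the interaction
k_main   <- 9 + 12     # 9 thresholds + 12 slopes
k_int    <- 9 + 13     # one extra slope for PER:Usage_Perc

aic_main <- dev_main + 2 * k_main
aic_int  <- dev_int  + 2 * k_int
print(c(aic_main, aic_int))

# Likelihood ratio test for the single added term (1 degree of freedom)
lr_stat <- dev_main - dev_int
p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE)
print(c(lr_stat, p_value))
```

The deviance drops by about 3.5 for one extra parameter, so the improvement is real but small.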
According to the AIC, the model with the interaction term is a better fit for the data than the model without it. The AIC for the model with the interaction term is 841.701, while the AIC for the model without it is 843.2098, a difference of about 1.5.
summary(po_model_int)
## Call:
## vglm(formula = Rank ~ MP + PER + TrueShoot_Perc + ThreePoint_Att +
## FTr + TotReb_Perc + Assist_Perc + Steal_Perc + Block_Perc +
## Turnover_Perc + Usage_Perc + WS + Usage_Perc:PER, family = cumulative(parallel = TRUE),
## data = WNBA_MVP)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept):1 -22.224267 7.561924 -2.939 0.00329 **
## (Intercept):2 -20.974131 7.559038 -2.775 0.00553 **
## (Intercept):3 -20.048000 7.552347 -2.655 0.00794 **
## (Intercept):4 -19.372914 7.546095 -2.567 0.01025 *
## (Intercept):5 -18.716520 7.539318 -2.483 0.01305 *
## (Intercept):6 -18.086310 7.532215 -2.401 0.01634 *
## (Intercept):7 -17.338105 7.522857 -2.305 0.02118 *
## (Intercept):8 -16.603413 7.512932 -2.210 0.02711 *
## (Intercept):9 -15.380689 7.496424 -2.052 0.04020 *
## MP -0.002048 0.001285 -1.594 0.11095
## PER 0.700228 0.366239 1.912 0.05588 .
## TrueShoot_Perc -15.359437 6.054216 -2.537 0.01118 *
## ThreePoint_Att 1.736820 1.111975 1.562 0.11831
## FTr -4.815861 1.636033 -2.944 0.00324 **
## TotReb_Perc 0.102018 0.045821 2.226 0.02599 *
## Assist_Perc 0.069165 0.028212 2.452 0.01422 *
## Steal_Perc -0.555505 0.193422 -2.872 0.00408 **
## Block_Perc 0.165165 0.118399 1.395 0.16302
## Turnover_Perc 0.094203 0.067247 1.401 0.16126
## Usage_Perc 0.743002 0.285024 2.607 0.00914 **
## WS 1.119619 0.219608 5.098 3.43e-07 ***
## PER:Usage_Perc -0.022532 0.012267 -1.837 0.06623 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Number of linear predictors: 9
##
## Names of linear predictors: logitlink(P[Y<=1]), logitlink(P[Y<=2]),
## logitlink(P[Y<=3]), logitlink(P[Y<=4]), logitlink(P[Y<=5]), logitlink(P[Y<=6]),
## logitlink(P[Y<=7]), logitlink(P[Y<=8]), logitlink(P[Y<=9])
##
## Residual deviance: 797.701 on 1859 degrees of freedom
##
## Log-likelihood: -398.8505 on 1859 degrees of freedom
##
## Number of Fisher scoring iterations: 7
##
## No Hauck-Donner effect found in any of the estimates
##
##
## Exponentiated coefficients:
## MP PER TrueShoot_Perc ThreePoint_Att FTr
## 9.979541e-01 2.014213e+00 2.135409e-07 5.679252e+00 8.100246e-03
## TotReb_Perc Assist_Perc Steal_Perc Block_Perc Turnover_Perc
## 1.107403e+00 1.071613e+00 5.737822e-01 1.179588e+00 1.098783e+00
## Usage_Perc WS PER:Usage_Perc
## 2.102237e+00 3.063687e+00 9.777196e-01
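With an interaction in the model, the slope of Usage_Perc is no longer a single number: it equals the Usage_Perc coefficient plus the interaction coefficient times PER. The sketch below evaluates that expression using the estimates printed above. The PER values are illustrative choices, and the interaction itself is only marginally significant (p = 0.066), so the pattern should be read loosely.

```r
# Sketch: slope of Usage_Perc at a given PER, using the reported estimates.
b_usage <- 0.743002    # Usage_Perc
b_int   <- -0.022532   # PER:Usage_Perc

usage_slope <- function(per) b_usage + b_int * per

per_values <- c(15, 20, 25, 30, 35)   # illustrative PER values
slopes <- usage_slope(per_values)
print(data.frame(PER = per_values, usage_slope = round(slopes, 4)))

# PER at which the estimated usage slope crosses zero
print(-b_usage / b_int)
```

Because the estimated interaction is negative, the usage slope shrinks as PER rises and reaches zero at a PER of roughly 33.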
According to the new output, all estimated thresholds are now
significant. The first three are significant at the
\(\alpha = 0.01\) significance level, and
all other thresholds are significant at \(\alpha = 0.05\). Therefore, all thresholds
are significantly different from 0 at a 0.05 significance level. Again,
win shares (WS) is the most significant predictor of
Rank, while usage percentage (p = 0.00914) is now significant at the
\(\alpha = 0.01\) level rather than the \(\alpha = 0.001\) level. Because vglm models the
log-odds of \(P(Y \le j)\) as the threshold plus the linear predictor, variables with
a positive coefficient in this model indicate that higher values for
those variables are associated with a higher probability of being ranked
closer to 1 (lower) than 10 (higher).
For example, for every one-unit increase in win shares, the log-odds of being in a lower ranking (closer to 1) increase by 1.119619. The exponentiated coefficient is 3.0637, so each additional win share multiplies the odds of being in a lower category by 3.0637, an increase of about 206%. Expressing this on the probability scale requires a baseline. For a player whose odds of being at or below a given rank are even (probability 0.5), one additional win share raises the odds to 3.0637 to 1, which corresponds to a probability of: \[\frac{3.0637}{3.0637 + 1} = 0.7539188\] Therefore, 75.4% is the probability reached from an even-odds baseline, not the percentage change in the odds; at other baselines the change in probability will differ.
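Examining Usage_Perc yields that for every one percentage point
increase, the log-odds of being in a lower ranking increase by 0.743002. Because the
model contains a PER by usage interaction, this coefficient describes the effect of usage
when PER equals 0. Its
corresponding exponentiated coefficient is 2.102237, meaning the odds roughly double
(an increase of about 110%). Starting from even odds, this corresponds to
a probability of: \[\frac{2.102237}{2.102237 + 1} = 0.677652\]
As with win shares, 67.8% is the probability reached from an even-odds baseline
rather than a percentage increase in the odds.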
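How much an odds ratio moves a probability depends on where that probability starts. The sketch below applies the win shares odds ratio from the output to a few baseline cumulative probabilities. The baselines are illustrative values chosen for the example, not quantities estimated from the data.

```r
# Sketch: what an odds ratio of about 3.06 (win shares) does at different baselines.
odds_ratio <- 3.063687

baseline_prob <- c(0.05, 0.25, 0.50, 0.75)   # illustrative values of P(Y <= j)
baseline_odds <- baseline_prob / (1 - baseline_prob)
new_odds      <- baseline_odds * odds_ratio
new_prob      <- new_odds / (1 + new_odds)

print(data.frame(baseline_prob, new_prob = round(new_prob, 4)))
print((odds_ratio - 1) * 100)   # percent increase in the odds themselves
```

The row with a baseline of 0.50 reproduces the 0.754 figure, while the other rows show that the same odds ratio produces different changes in probability elsewhere on the scale.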
One intriguing result comes from true shooting percentage. Its
resulting coefficient is -15.359437. Because the coefficient is negative,
higher true shooting is associated with lower log-odds of being in a lower
category, that is, with a worse finish (closer to 10). This is
quite surprising, as players with higher true shooting
percentages would be expected to perform better in the MVP race.
Note that TrueShoot_Perc is recorded as a proportion (for example, 0.583),
so a one-unit increase corresponds to 100 percentage points, which is why the
exponentiated coefficient, 2.135409e-07, is so extreme. For a more realistic change
of one percentage point (0.01), the log-odds change by
\(-15.359437 \times 0.01 \approx -0.1536\), and the odds of being in a lower category
are multiplied by: \[e^{-0.1536} \approx 0.858\] Therefore, holding the other
predictors fixed, each additional percentage point of true shooting is associated with
roughly a 14% decrease in the odds of being in a lower category (closer to 1).
Given the multicollinearity noted earlier, the sign and size of this estimate
should be interpreted with caution.
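Because TrueShoot_Perc appears to be stored on a 0 to 1 scale (the player values listed in this analysis are of the form 0.583), its coefficient refers to a change of a full unit. The sketch below rescales the reported estimate to a change of 0.01, which is one percentage point of true shooting. That the variable is a proportion is an assumption based on those listed values.

```r
# Sketch: rescale the true shooting coefficient to a 0.01 (one point) change.
b_ts <- -15.359437                 # reported estimate per full unit (0 to 1 scale)

or_per_unit  <- exp(b_ts)          # odds ratio for a 1.00 change
or_per_point <- exp(b_ts * 0.01)   # odds ratio for a 0.01 change

print(or_per_unit)
print(or_per_point)
```

The per-unit odds ratio matches the tiny exponentiated coefficient in the output, while the per-point odds ratio of about 0.86 is the more interpretable quantity.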
Caitlin Clark ended the 2024 season with the following advanced metrics:
MP: 1416
PER: 18.8
TrueShoot_Perc: 0.583
ThreePoint_Att: 0.612
FTr: 0.310
TotReb_Perc: 9.4
Assist_Perc: 39.1
Steal_Perc: 1.9
Block_Perc: 1.7
Turnover_Perc: 25.3
Usage_Perc: 27.7
WS: 3.0
## 1 2 3 4 5 6 7
## 1 0.02050411 0.04759407 0.08765848 0.1102342 0.1452917 0.1561895 0.1674518
## 8 9 10
## 1 0.1175882 0.0990197 0.04846819
According to the model, she would have a 2.05% chance to win the MVP race and an 11-17% chance of finishing in each of 4th through 7th place, with 7th the single most likely outcome (16.7%). Caitlin Clark ended up 4th in MVP voting in 2024.
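The call that produced these probabilities is not shown, but they can be approximately rebuilt by hand. The sketch below assumes the predictions come from the interaction model and uses vglm's parameterization, in which the log-odds of \(P(Y \le j)\) equal the threshold plus the linear predictor. The coefficients are copied from the printed summary, so small rounding differences are expected, and `PER_x_Usage` is a name introduced here for the interaction term.

```r
# Sketch: rebuild the predicted rank probabilities from the printed estimates.
alpha <- c(-22.224267, -20.974131, -20.048000, -19.372914, -18.716520,
           -18.086310, -17.338105, -16.603413, -15.380689)
beta <- c(MP = -0.002048, PER = 0.700228, TrueShoot_Perc = -15.359437,
          ThreePoint_Att = 1.736820, FTr = -4.815861, TotReb_Perc = 0.102018,
          Assist_Perc = 0.069165, Steal_Perc = -0.555505, Block_Perc = 0.165165,
          Turnover_Perc = 0.094203, Usage_Perc = 0.743002, WS = 1.119619,
          PER_x_Usage = -0.022532)

clark <- c(MP = 1416, PER = 18.8, TrueShoot_Perc = 0.583, ThreePoint_Att = 0.612,
           FTr = 0.310, TotReb_Perc = 9.4, Assist_Perc = 39.1, Steal_Perc = 1.9,
           Block_Perc = 1.7, Turnover_Perc = 25.3, Usage_Perc = 27.7, WS = 3.0)
x <- c(clark, PER_x_Usage = unname(clark["PER"] * clark["Usage_Perc"]))

# Assumed form: logit(P(Y <= j)) = alpha_j + sum(beta * x)
eta       <- sum(beta * x[names(beta)])
cum_prob  <- plogis(alpha + eta)
rank_prob <- diff(c(0, cum_prob, 1))
names(rank_prob) <- 1:10

print(round(rank_prob, 4))
```

The first entry comes out close to the 2.05% reported above, and 7th place is the most likely finish.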
A’ja Wilson ended the 2024 season with the following advanced metrics:
MP: 1308
PER: 34.9
TrueShoot_Perc: 0.591
ThreePoint_Att: 0.081
FTr: 0.370
TotReb_Perc: 19.9
Assist_Perc: 14.1
Steal_Perc: 2.6
Block_Perc: 6.3
Turnover_Perc: 5.3
Usage_Perc: 32.2
WS: 10.9
## 1 2 3 4 5 6
## 1 0.938447 0.04311016 0.01105567 0.003612589 0.001813079 0.0009160897
## 7 8 9 10
## 1 0.000550442 0.0002575021 0.0001675492 6.993284e-05
This model predicts that Wilson has a 93.8% chance to win the MVP race. A’ja Wilson did win league MVP in 2024, and she did so unanimously.
Finally, Nneka Ogwumike has been in the league for 12 years and won the MVP in 2016 with the LA Sparks. Her average advanced stats are:
MP: 901.4615385
PER: 24.26923077
TrueShoot_Perc: 0.6072307692
ThreePoint_Att: 0.09184615385
FTr: 0.3110769231
TotReb_Perc: 14.94615385
Assist_Perc: 12.76923077
Steal_Perc: 2.653846154
Block_Perc: 1.638461538
Turnover_Perc: 11.73846154
Usage_Perc: 23.4
WS: 5.307692308
## 1 2 3 4 5 6 7
## 1 0.007399768 0.01796397 0.03628841 0.05265044 0.08492443 0.1192191 0.1783747
## 8 9 10
## 1 0.1762209 0.2018258 0.1251326
If Ogwumike had a season in which her advanced metrics were consistent with her career averages, she would have a 0.7% chance to win MVP and most likely end up in 9th place (20.18% chance).
Based on these examples, the model appears to be a reasonable predictor of where players will land in the MVP race given their advanced statistical metrics. The model also indicated that win shares have the most significant impact on where a player will rank in MVP voting. However, because the no-multicollinearity assumption is violated, the resulting standard errors could be inflated, leading to unstable p-values. Therefore, this model may not be the best option when it comes to finding which metrics are statistically significant predictors of WNBA MVP rankings.