Introduction

Each year, the WNBA Most Valuable Player is determined by a committee of sportswriters and broadcasters. Each member of the committee selects their top 5 players from that season, with the top choice receiving 10 points, the 2nd receiving 7 points, the 3rd receiving 5 points, the 4th receiving 3 points and the 5th receiving 1 point. This raises the question: how does the panel decide whom to vote for? That is, are certain statistics more valuable than others in the eyes of the committee?

This dataset is a collection of each of the top 10 WNBA MVP vote-getters from the last 20 seasons. The data includes each individual’s advanced metrics. Advanced metrics are statistics that go beyond a traditional box score - points, rebounds, assists, turnovers, etc. - as they include stats such as Player Efficiency Rating and Win Shares. The data is collected by tracking on-court actions. These include points scored, rebounds, assists, steals, blocks, turnovers, etc. From these raw numbers, derived statistics such as effective field goal percentage and player efficiency rating are calculated.

The advanced metrics that appear in this dataset are player efficiency rating (PER), true shooting percentage (TrueShoot_Perc), three point attempt rate (ThreePoint_Att), free throw attempt rate (FTr), Offensive rebound percentage (ORB%), defensive rebound percentage (DRB%), total rebound percentage (TotReb_Perc), assist percentage (Assist_Perc), steal percentage (Steal_Perc), block percentage (Block_Perc), turnover percentage (Turnover_Perc), usage percentage (Usage_Perc), offensive win shares (OWS), defensive win shares (DWS), win shares (WS), and win shares per 48 minutes (WS/48).

The NBA first introduced these metrics during the 1996-97 season and the WNBA officially introduced its own advanced stats pages in 2016. However, basketball-reference.com has advanced statistics for all WNBA players dating back to the league’s inaugural season (1997). The site is a well respected provider of sports statistics from Sports Reference, LLC that presents statistics for the WNBA, NBA, European Leagues and the ABA. Sports Reference, LLC also runs other similar pages such as Baseball Reference and Pro Football Reference. Additionally, the data includes each player’s ranking in their respective MVP race and whether or not they won a championship that year.

Methodology

An ordinal logistic regression (or proportional odds model) is used to examine this data. This type of regression is used to analyze and model ordinal outcomes, which are categorical variables whose levels are ordered but sit on an arbitrary scale. An ordinal regression is therefore useful for predicting the probability that an outcome will fall into a particular category. In this case, it is used to predict where a player will fall in the MVP race (1-10). The ordinal outcome is a player’s rank, and the independent variables are a player’s advanced metrics.

The ordinal logistic regression uses log-odds of cumulative probabilities. First, let \(Y\) be the ordinal dependent variable and let \(j = 1, \dots, J\) index its ordered categories. In this specific case, \(J = 10\) and

\(j = 1\) (First in Voting)
\(j = 2\) (Second in Voting) …
\(j = 10\) (Tenth in Voting)

For each category, \(j\), the model defines the cumulative probability as: \[\pi_j = P(Y \le j |x)\] From there, the cumulative probabilities are transformed using logit. The logit is the logarithm of the odds of the probability of a certain event occurring.

\[L_j(x) = \operatorname{logit}(\pi_j) = \log\left(\frac{\pi_j}{1 - \pi_j}\right)\]

From this the multiple regression model becomes:

\[L_j(x) = \alpha_j - \beta_1(x_1) - \beta_2(x_2) - ... - \beta_k(x_k)\]

Here \(\alpha_j\) is the intercept, or “cutoff” point, specific to category \(j\): the cumulative log-odds when all predictors equal 0. \(\beta_k\) is the coefficient corresponding to the kth independent variable. It measures the effect of that variable on the cumulative log-odds of Y being in category j or below; because of the minus signs in this form, a positive \(\beta_k\) lowers those log-odds as \(x_k\) increases. Finally, \(x_k\) represents the kth predictor variable.
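As a small worked step (added here for illustration, not part of the original derivation), the cumulative probabilities can be turned back into the probability of each individual rank by inverting the logit and differencing adjacent categories, using the conventions \(\pi_0 = 0\) and \(\pi_J = 1\):

\[\pi_j = \frac{e^{L_j(x)}}{1 + e^{L_j(x)}}, \qquad P(Y = j \mid x) = \pi_j - \pi_{j-1}, \quad j = 1, \dots, J\]

Since \(\pi_J = 1\) by definition, only \(J - 1 = 9\) thresholds need to be estimated for the 10 ranks.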

An ordinal logistic regression is quite similar to a standard logistic regression. Both models have categorical dependent variables, use link functions and use coefficients to measure the relationship between independent and dependent variables. The need for an ordinal regression arises when the dependent variable has more than two ordered categories. As a result, the ordinal regression estimates one threshold (\(\alpha_j\)) for each cumulative split, while a binary logistic regression has a single intercept. Additionally, each coefficient describes the log-odds of being in a lower category relative to a higher one, instead of the log-odds of one binary outcome relative to the other.

In order to use an ordinal logistic regression, certain assumptions must be met. First, the dependent variable must be measured on an ordinal level. Next, one or more of the independent variables must be either continuous, categorical or ordinal. There should be no multicollinearity. Finally, the proportional odds assumption states that each independent variable should have an identical effect on each cumulative split of the ordinal dependent variable.

It is immediately clear that the complete model, containing all independent variables in the dataset, is going to contain multicollinearity. For example, we would expect ORB% (offensive rebound percentage), DRB% (defensive rebound percentage), and TotReb_Perc to be highly correlated. This is because TotReb_Perc includes both offensive rebounds and defensive rebounds in its calculation. Additionally, WS (win shares) is calculated using two other variables: OWS (offensive win shares) and DWS (defensive win shares). WS/48 is also calculated using WS. These groups of variables are therefore expected to be highly correlated. To check, each predictor’s variance inflation factor (VIF) will be examined:

vif(full_model)

##             MP            PER TrueShoot_Perc ThreePoint_Att            FTr 
##       82.71899      500.11917     3448.90148    39289.71568      334.38633 
##         `ORB%`         `DRB%`    TotReb_Perc    Assist_Perc     Steal_Perc 
##     1136.53590     2669.05095    11713.21812       25.54280      747.06501 
##     Block_Perc  Turnover_Perc     Usage_Perc            OWS            DWS 
##       18.70567       35.35585      168.33862   254875.87140   204088.25090 
##             WS        `WS/48` 
##   117222.48847     3856.70296
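For context, the textbook variance inflation factor for a predictor in a linear model is \(1 / (1 - R_k^2)\), where \(R_k^2\) comes from regressing that predictor on all the others. The sketch below illustrates the idea on synthetic data; the variable names `orb`, `drb` and `trb` and all values are made up for illustration and are not taken from the dataset. It shows how a total that is nearly a combination of two other columns produces large VIFs. The `vif()` call on a fitted `polr` object may be computed differently, so this is only meant to convey the concept.

```r
# Synthetic illustration only: these are not the WNBA data.
set.seed(1)
n   <- 200
orb <- rnorm(n, mean = 8,  sd = 2)           # hypothetical offensive rebound %
drb <- rnorm(n, mean = 20, sd = 4)           # hypothetical defensive rebound %
trb <- (orb + drb) / 2 + rnorm(n, sd = 0.2)  # total built from the other two
X   <- data.frame(orb, drb, trb)

# Textbook VIF: regress each predictor on the rest, then take 1 / (1 - R^2).
manual_vif <- sapply(names(X), function(v) {
  f  <- reformulate(setdiff(names(X), v), response = v)
  r2 <- summary(lm(f, data = X))$r.squared
  1 / (1 - r2)
})
print(round(manual_vif, 1))
```

All three values come out well above the usual cutoff of 5, because each column is almost fully determined by the other two.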

A common rule of thumb is that a VIF greater than 5 indicates problematic multicollinearity. None of these predictors has a VIF below 5; in fact, most are far larger than 5. Since ORB% and DRB% are both components of TotReb_Perc, they are dropped and only TotReb_Perc is kept. Similarly, OWS, DWS and WS/48 are dropped and only the effect of WS on Rank is examined. The reduced model is: \[L_j(x) = \alpha_j - \beta_1 (\text{MP}) - \beta_2 (\text{PER}) - \beta_3 (\text{TrueShoot\_Perc}) - \beta_4 (\text{ThreePoint\_Att}) - \beta_5 (\text{FTr}) - \beta_6 (\text{TotReb\_Perc}) - \] \[\beta_7 (\text{Assist\_Perc}) - \beta_8 (\text{Steal\_Perc})- \beta_9 (\text{Block\_Perc}) - \beta_{10} (\text{Turnover\_Perc}) - \beta_{11} (\text{Usage\_Perc}) - \beta_{12} (\text{WS})\]

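The model can be fitted with either of the two R code chunks below. polr (from the MASS package) uses the form written above, with minus signs in front of the coefficients, whereas vglm (from the VGAM package) with cumulative(parallel = TRUE) models \(L_j(x) = \alpha_j + \beta_1 x_1 + \dots + \beta_k x_k\). The two fits describe the same model, but the slope coefficients they report have opposite signs.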

library(MASS)  # provides polr()
# Rank must be an ordered factor (1 = first in voting, ..., 10 = tenth)
model <- polr(Rank ~ MP + PER + TrueShoot_Perc + ThreePoint_Att + FTr +
                TotReb_Perc + Assist_Perc + Steal_Perc +
                Block_Perc + Turnover_Perc + Usage_Perc + WS,
              data = WNBA_MVP,
              Hess = TRUE, method = 'logistic')

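library(VGAM)  # provides vglm() and cumulative()
# parallel = TRUE imposes the proportional odds (equal slopes) structure
po_model <- vglm(Rank ~ MP + PER + TrueShoot_Perc + ThreePoint_Att + FTr +
                   TotReb_Perc + Assist_Perc + Steal_Perc +
                   Block_Perc + Turnover_Perc + Usage_Perc + WS,
                 family = cumulative(parallel = TRUE), data = WNBA_MVP)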

Checking the New Model Assumptions

The dependent variable is measured on an ordinal level: Rank is ordinal, as athletes are categorized by their place in the MVP voting. Additionally, all independent variables are continuous. Again, a VIF test will be used to check for multicollinearity:

vif(model)

##             MP            PER TrueShoot_Perc ThreePoint_Att            FTr 
##       47.66546      139.96466     7157.75946    18543.19042     6564.27856 
##    TotReb_Perc    Assist_Perc     Steal_Perc     Block_Perc  Turnover_Perc 
##       13.00831       12.24155       17.13075        6.23259       16.61266 
##     Usage_Perc             WS 
##       78.67595       68.41207

Even with the exclusion of specific variables, there is still multicollinearity present. This is due to multiple variables being correlated with the number of minutes an athlete plays.

A Brant test can be used to examine the proportional odds assumption.

\(H_0\): The proportional odds assumption holds: the relationship between the predictors and each pair of outcomes is the same
\(H_a\): The proportional odds assumption does not hold: the relationship between the predictors and each pair of outcomes is not the same

brant(model)

## -------------------------------------------- 
## Test for X2  df  probability 
## -------------------------------------------- 
## Omnibus      -34.72  96  1
## MP       6.5 8   0.59
## PER      11.24   8   0.19
## TrueShoot_Perc   3.07    8   0.93
## ThreePoint_Att   9.04    8   0.34
## FTr      16.54   8   0.04
## TotReb_Perc  2.04    8   0.98
## Assist_Perc  4.58    8   0.8
## Steal_Perc   7.27    8   0.51
## Block_Perc   4   8   0.86
## Turnover_Perc    7.31    8   0.5
## Usage_Perc   9.9 8   0.27
## WS       9.8 8   0.28
## --------------------------------------------

Because the omnibus p-value = 1 > 0.05, we fail to reject the null hypothesis, so the test gives no evidence against the proportional odds assumption for the model as a whole. Note, however, that the reported omnibus statistic is negative (-34.72), which is impossible for a chi-square statistic and likely reflects numerical instability from the severe multicollinearity, so this result should be treated with caution. At the individual level, we fail to reject the null hypothesis for every independent variable except FTr (free throw attempt rate, p = 0.04).

Because minutes played (MP) is highly correlated with the other predictors, removing it may resolve the multicollinearity.

model2 <- polr(Rank ~ PER + TrueShoot_Perc + ThreePoint_Att + FTr + 
                   TotReb_Perc + Assist_Perc + Steal_Perc
              + Block_Perc + Turnover_Perc + Usage_Perc + WS, 
              data = WNBA_MVP, 
              Hess = TRUE, method = 'logistic')

vif(model2)

##            PER TrueShoot_Perc ThreePoint_Att            FTr    TotReb_Perc 
##       8.389933       4.351632       2.037506       1.521745       2.841988 
##    Assist_Perc     Steal_Perc     Block_Perc  Turnover_Perc     Usage_Perc 
##       3.989450       1.920804       2.724908       3.751749       2.476418 
##             WS 
##       2.546524

Now every predictor has a VIF below 5 except PER, whose VIF of 8.39 is still above the cutoff but far smaller than the values seen when MP was included. Therefore removing MP eliminates most of the multicollinearity. However, the proportional odds assumption still needs to be checked:

brant(model2)

## -------------------------------------------- 
## Test for X2  df  probability 
## -------------------------------------------- 
## Omnibus      385.54  88  0
## PER      7.77    8   0.46
## TrueShoot_Perc   3.21    8   0.92
## ThreePoint_Att   8.66    8   0.37
## FTr      16.88   8   0.03
## TotReb_Perc  2.14    8   0.98
## Assist_Perc  4.36    8   0.82
## Steal_Perc   6.26    8   0.62
## Block_Perc   3.86    8   0.87
## Turnover_Perc    7.24    8   0.51
## Usage_Perc   5.92    8   0.66
## WS       5.66    8   0.69
## --------------------------------------------

Because the omnibus p-value = 0 < 0.05, the null hypothesis is rejected. The proportional odds assumption does not hold for the model as a whole. However, at the individual level we again fail to reject the null hypothesis for every independent variable except FTr (free throw attempt rate).

By removing the MP (minutes played) variable from our model, the multicollinearity assumption is largely met. However, doing so changes the results of the Brant test: without MP, the model as a whole violates the proportional odds assumption. This is intuitive, as the minutes an athlete plays are an integral part of the calculations for most advanced statistical metrics. However, the inclusion of MP is also what keeps the effect of the independent variables consistent across thresholds. Therefore, the model including MP is still effective at predicting where players will fall in the MVP rankings, but the significance of each independent variable may be unstable.

Results and Conclusions

summary(po_model)

## Call:
## vglm(formula = Rank ~ MP + PER + TrueShoot_Perc + ThreePoint_Att + 
##     FTr + TotReb_Perc + Assist_Perc + Steal_Perc + Block_Perc + 
##     Turnover_Perc + Usage_Perc + WS, family = cumulative(parallel = TRUE), 
##     data = WNBA_MVP)
## 
## Coefficients: 
##                  Estimate Std. Error z value Pr(>|z|)    
## (Intercept):1   -9.922090   3.345549  -2.966  0.00302 ** 
## (Intercept):2   -8.652552   3.330536  -2.598  0.00938 ** 
## (Intercept):3   -7.720604   3.317799  -2.327  0.01996 *  
## (Intercept):4   -7.054029   3.309095  -2.132  0.03303 *  
## (Intercept):5   -6.411014   3.301649  -1.942  0.05217 .  
## (Intercept):6   -5.788910   3.295752  -1.756  0.07901 .  
## (Intercept):7   -5.052761   3.290954  -1.535  0.12470    
## (Intercept):8   -4.340005   3.289219  -1.319  0.18701    
## (Intercept):9   -3.149244   3.293568  -0.956  0.33898    
## MP              -0.001824   0.001281  -1.424  0.15458    
## PER              0.071897   0.134672   0.534  0.59343    
## TrueShoot_Perc -12.635779   5.887748  -2.146  0.03186 *  
## ThreePoint_Att   1.774746   1.111914   1.596  0.11046    
## FTr             -3.940708   1.563194  -2.521  0.01170 *  
## TotReb_Perc      0.116411   0.045303   2.570  0.01018 *  
## Assist_Perc      0.063922   0.028167   2.269  0.02324 *  
## Steal_Perc      -0.500202   0.191634  -2.610  0.00905 ** 
## Block_Perc       0.165364   0.118756   1.392  0.16378    
## Turnover_Perc    0.070104   0.066368   1.056  0.29083    
## Usage_Perc       0.235087   0.067327   3.492  0.00048 ***
## WS               1.097519   0.219434   5.002 5.69e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Number of linear predictors:  9 
## 
## Names of linear predictors: logitlink(P[Y<=1]), logitlink(P[Y<=2]), 
## logitlink(P[Y<=3]), logitlink(P[Y<=4]), logitlink(P[Y<=5]), logitlink(P[Y<=6]), 
## logitlink(P[Y<=7]), logitlink(P[Y<=8]), logitlink(P[Y<=9])
## 
## Residual deviance: 801.2098 on 1860 degrees of freedom
## 
## Log-likelihood: -400.6049 on 1860 degrees of freedom
## 
## Number of Fisher scoring iterations: 7 
## 
## No Hauck-Donner effect found in any of the estimates
## 
## 
## Exponentiated coefficients:
##             MP            PER TrueShoot_Perc ThreePoint_Att            FTr 
##   0.9981776829   1.0745444682   0.0000032535   5.8987804699   0.0194344576 
##    TotReb_Perc    Assist_Perc     Steal_Perc     Block_Perc  Turnover_Perc 
##   1.1234572283   1.0660091855   0.6064081002   1.1798226626   1.0726200712 
##     Usage_Perc             WS 
##   1.2650188379   2.9967205796

According to the output, the estimated first threshold, between ranks 1 and 2, is statistically significant, with a p-value of \(0.00302 < \alpha = 0.01\). This indicates that the threshold is significantly different from zero. The same holds for the second threshold, between ranks 2 and 3, whose p-value is 0.00938. The third and fourth thresholds are statistically significant at \(\alpha = 0.05\), so both are also significantly different from zero. The fifth and sixth thresholds are only marginally significant: their p-values fall below \(\alpha = 0.1\) but not below 0.05. The seventh, eighth and ninth thresholds are not statistically significant, so there is no evidence that they differ from zero. This is understandable, as committee members are more likely to agree on their top vote-getters, with more variability among the players finishing in eighth, ninth and tenth place.

The predictors with the most significant effect on where a WNBA player will rank in the MVP race are usage percentage (Usage_Perc) and win shares (WS). Both variables have p-values below 0.001, at 0.00048 and 5.69e-07 respectively. These results are logically sound, as MVPs typically have the ball in their hands more than their teammates and contribute a large share of their team’s wins.

Adding an Interaction

However, PER and Usage_Perc may have a joint influence on Rank. For example, a player with a high usage percentage and a low PER often handles the ball but is inefficient when doing so. Including an interaction term allows the model to capture the possibility that usage percentage has a stronger effect on Rank when PER is high and a weaker effect when PER is low, rather than a single effect that is the same at every PER. Their joint influence is best represented by an interaction term between the two. \[L_j(x) = \alpha_j - \beta_1 (\text{MP}) - \beta_2 (\text{PER}) - \beta_3 (\text{TrueShoot\_Perc}) - \beta_4 (\text{ThreePoint\_Att}) - \beta_5 (\text{FTr}) - \beta_6 (\text{TotReb\_Perc}) -\] \[\beta_7 (\text{Assist\_Perc}) - \beta_8 (\text{Steal\_Perc}) - \beta_9 (\text{Block\_Perc}) - \beta_{10} (\text{Turnover\_Perc}) - \beta_{11} (\text{Usage\_Perc}) -\] \[\beta_{12} (\text{WS}) - \beta_{13}(\text{PER})(\text{Usage\_Perc})\] The assumptions still hold in this model.

po_model_int <- vglm(Rank ~ MP + PER + TrueShoot_Perc + ThreePoint_Att + FTr + 
                   TotReb_Perc + Assist_Perc + Steal_Perc
                 + Block_Perc + Turnover_Perc + Usage_Perc + WS + Usage_Perc:PER, 
                 family = cumulative(parallel = TRUE), data = WNBA_MVP)

A lower AIC indicates a better-fitting model.

AIC(po_model)

## [1] 843.2098

AIC(po_model_int)

## [1] 841.701

According to the AIC, the model with the interaction term is a better fit for the data than the model without it. The AIC for the model with the interaction is 841.701, while the AIC for the model without it is 843.2098. The difference is only about 1.5, so the improvement is modest.
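Both AIC values can be checked by hand from the log-likelihoods printed in the two model summaries, using AIC = -2 × log-likelihood + 2k. The sketch below assumes k counts the 9 thresholds plus the slope coefficients (12 without the interaction, 13 with it); the helper name `aic_from_loglik` is made up for illustration.

```r
# AIC = -2 * logLik + 2 * k, where k is the number of estimated parameters.
aic_from_loglik <- function(loglik, k) -2 * loglik + 2 * k

# Log-likelihoods as printed in the vglm summaries.
aic_main <- aic_from_loglik(-400.6049, k = 9 + 12)  # 9 thresholds + 12 slopes
aic_int  <- aic_from_loglik(-398.8505, k = 9 + 13)  # one extra interaction slope

print(c(main = aic_main, interaction = aic_int,
        difference = aic_main - aic_int))
```

The extra parameter costs 2 AIC points, so the interaction model only comes out ahead because its deviance drops by a little more than that.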

Results: Model with Interaction

## Call:
## vglm(formula = Rank ~ MP + PER + TrueShoot_Perc + ThreePoint_Att + 
##     FTr + TotReb_Perc + Assist_Perc + Steal_Perc + Block_Perc + 
##     Turnover_Perc + Usage_Perc + WS + Usage_Perc:PER, family = cumulative(parallel = TRUE), 
##     data = WNBA_MVP)
## 
## Coefficients: 
##                  Estimate Std. Error z value Pr(>|z|)    
## (Intercept):1  -22.224267   7.561924  -2.939  0.00329 ** 
## (Intercept):2  -20.974131   7.559038  -2.775  0.00553 ** 
## (Intercept):3  -20.048000   7.552347  -2.655  0.00794 ** 
## (Intercept):4  -19.372914   7.546095  -2.567  0.01025 *  
## (Intercept):5  -18.716520   7.539318  -2.483  0.01305 *  
## (Intercept):6  -18.086310   7.532215  -2.401  0.01634 *  
## (Intercept):7  -17.338105   7.522857  -2.305  0.02118 *  
## (Intercept):8  -16.603413   7.512932  -2.210  0.02711 *  
## (Intercept):9  -15.380689   7.496424  -2.052  0.04020 *  
## MP              -0.002048   0.001285  -1.594  0.11095    
## PER              0.700228   0.366239   1.912  0.05588 .  
## TrueShoot_Perc -15.359437   6.054216  -2.537  0.01118 *  
## ThreePoint_Att   1.736820   1.111975   1.562  0.11831    
## FTr             -4.815861   1.636033  -2.944  0.00324 ** 
## TotReb_Perc      0.102018   0.045821   2.226  0.02599 *  
## Assist_Perc      0.069165   0.028212   2.452  0.01422 *  
## Steal_Perc      -0.555505   0.193422  -2.872  0.00408 ** 
## Block_Perc       0.165165   0.118399   1.395  0.16302    
## Turnover_Perc    0.094203   0.067247   1.401  0.16126    
## Usage_Perc       0.743002   0.285024   2.607  0.00914 ** 
## WS               1.119619   0.219608   5.098 3.43e-07 ***
## PER:Usage_Perc  -0.022532   0.012267  -1.837  0.06623 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Number of linear predictors:  9 
## 
## Names of linear predictors: logitlink(P[Y<=1]), logitlink(P[Y<=2]), 
## logitlink(P[Y<=3]), logitlink(P[Y<=4]), logitlink(P[Y<=5]), logitlink(P[Y<=6]), 
## logitlink(P[Y<=7]), logitlink(P[Y<=8]), logitlink(P[Y<=9])
## 
## Residual deviance: 797.701 on 1859 degrees of freedom
## 
## Log-likelihood: -398.8505 on 1859 degrees of freedom
## 
## Number of Fisher scoring iterations: 7 
## 
## No Hauck-Donner effect found in any of the estimates
## 
## 
## Exponentiated coefficients:
##             MP            PER TrueShoot_Perc ThreePoint_Att            FTr 
##   9.979541e-01   2.014213e+00   2.135409e-07   5.679252e+00   8.100246e-03 
##    TotReb_Perc    Assist_Perc     Steal_Perc     Block_Perc  Turnover_Perc 
##   1.107403e+00   1.071613e+00   5.737822e-01   1.179588e+00   1.098783e+00 
##     Usage_Perc             WS PER:Usage_Perc 
##   2.102237e+00   3.063687e+00   9.777196e-01

According to the new output, all estimated thresholds are now significant. The first three are significant at the \(\alpha = 0.01\) level, and the remaining six are significant at \(\alpha = 0.05\). Therefore, all thresholds are significantly different from 0 at a 0.05 significance level. Again, win shares (WS) is the most significant predictor of Rank, while usage percentage is now significant at the \(\alpha = 0.01\) level (p = 0.00914) rather than the 0.001 level. In this output, a positive coefficient indicates that higher values of the variable are associated with a higher probability of being ranked closer to 1 (lower) than 10 (higher).

For example, for every one unit increase in win shares, the log-odds of being in a lower ranking (closer to 1) increase by 1.119619, holding the other predictors fixed. The exponentiated coefficient is 3.0637, so each additional win share multiplies the odds of being in a lower category by 3.0637, an increase of about 206%. An odds ratio is not itself a probability, and the resulting change in probability depends on the starting point. For a player whose cumulative probability is 0.5 (odds of 1) before the increase, the probability afterwards is: \[\frac{3.0637}{3.0637 + 1} = 0.7539188\] That is, such a player’s probability of being in a lower category (closer to 1) rises from 50% to 75.4%.
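To see how an odds ratio maps to a probability, the sketch below applies the WS coefficient to a few assumed baseline cumulative probabilities (the baselines 0.05, 0.25 and 0.50 are arbitrary choices for illustration). It shifts the baseline log-odds by the coefficient and converts back with the inverse logit, so the change in probability depends on where the player starts.

```r
beta_ws    <- 1.119619        # WS coefficient from the interaction model
odds_ratio <- exp(beta_ws)    # multiplicative change in the cumulative odds

baseline <- c(0.05, 0.25, 0.50)                 # assumed P(Y <= j) beforehand
after    <- plogis(qlogis(baseline) + beta_ws)  # P(Y <= j) after +1 win share

print(round(odds_ratio, 4))
print(round(data.frame(baseline, after, change = after - baseline), 4))
```

Only the 0.50 baseline reproduces the 0.754 figure, which is the probability after the increase for a player starting at even odds.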

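For Usage_Perc, the coefficient 0.743002 and its exponentiated value 2.102237 must be read together with the interaction term: they describe the effect of a one percentage point increase in usage only when PER equals 0. At a given PER, the change in the log-odds of being in a lower ranking per percentage point of usage is \(0.743002 - 0.022532 \times \text{PER}\). Applying the same conversion as for win shares to the main-effect odds ratio gives: \[\frac{2.102237}{2.102237 + 1} = 0.677652\] This is the probability of being in a lower category (closer to 1) after a one point increase in usage for a hypothetical player with a PER of 0 who started at 50%; it is not a 67.8% increase in the odds.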
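Because the model contains the PER:Usage_Perc interaction, the effect of usage on the cumulative log-odds depends on PER. A minimal sketch of that calculation follows, using the printed coefficients and a few PER values chosen for illustration (18.8 and 34.9 are the 2024 figures quoted later for Caitlin Clark and A’ja Wilson; 25 is arbitrary).

```r
beta_usage <- 0.743002    # Usage_Perc coefficient (the effect when PER = 0)
beta_int   <- -0.022532   # PER:Usage_Perc interaction coefficient

per_values  <- c(18.8, 25, 34.9)
# Change in cumulative log-odds per percentage point of usage at each PER.
usage_slope <- beta_usage + beta_int * per_values

print(data.frame(PER = per_values,
                 log_odds_per_point = round(usage_slope, 4),
                 odds_ratio = round(exp(usage_slope), 4)))
```

The slope shrinks as PER rises, which is what the negative interaction coefficient implies.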

One intriguing result comes from true shooting percentage. Its coefficient is -15.359437, so higher true shooting is associated with lower cumulative log-odds, that is, with a ranking closer to 10, holding the other predictors fixed. This is quite surprising, as players with higher true shooting percentages would be expected to perform better in the MVP race than those with lower ones. The exponentiated coefficient is 2.135409e-07, but TrueShoot_Perc is recorded as a proportion (for example, 0.583), so a one unit increase spans the entire range from 0 to 1 and is not a realistic change. For a one percentage point increase (0.01), the odds of being in a lower category (closer to 1) are multiplied by: \[e^{-15.359437 \times 0.01} = e^{-0.15359437} \approx 0.858\] Therefore, each one percentage point increase in true shooting is associated with a decrease of about 14% in the odds of being in a lower category. The tiny exponentiated coefficient reflects the scale of the variable rather than a negligible effect, and the unexpected direction may be a symptom of the multicollinearity noted earlier.

The Model as a Predictor

Caitlin Clark ended the 2024 season with the following advanced metrics:

MP: 1416
PER: 18.8
TrueShoot_Perc: 0.583
ThreePoint_Att: 0.612
FTr: 0.310
TotReb_Perc: 9.4
Assist_Perc: 39.1
Steal_Perc: 1.9
Block_Perc: 1.7
Turnover_Perc: 25.3
Usage_Perc: 27.7
WS: 3.0

##            1          2          3         4         5         6         7
## 1 0.02050411 0.04759407 0.08765848 0.1102342 0.1452917 0.1561895 0.1674518
##           8         9         10
## 1 0.1175882 0.0990197 0.04846819

According to the model, she would have a 2.05% chance to win the MVP race, about an 11% chance of finishing 4th, and around a 14-17% chance of finishing in each of 5th, 6th or 7th. Caitlin Clark ended up 4th in MVP voting in 2024.

A’ja Wilson ended the 2024 season with the following advanced metrics:
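The probabilities for Clark can be approximately reproduced by hand from the printed coefficients of the interaction model. The sketch below assumes the VGAM form \(\operatorname{logit} P(Y \le j) = \alpha_j + x^\top\beta\) and uses the coefficients as rounded in the summary, so small rounding differences are expected. The label `PER_x_Usage` is a made-up name for the interaction term; in practice the fitted model’s own prediction method would be used instead.

```r
# Coefficients as printed in the interaction model summary (rounded).
alpha <- c(-22.224267, -20.974131, -20.048000, -19.372914, -18.716520,
           -18.086310, -17.338105, -16.603413, -15.380689)
beta <- c(MP = -0.002048, PER = 0.700228, TrueShoot_Perc = -15.359437,
          ThreePoint_Att = 1.736820, FTr = -4.815861, TotReb_Perc = 0.102018,
          Assist_Perc = 0.069165, Steal_Perc = -0.555505, Block_Perc = 0.165165,
          Turnover_Perc = 0.094203, Usage_Perc = 0.743002, WS = 1.119619,
          PER_x_Usage = -0.022532)

# Caitlin Clark's 2024 metrics, plus the PER * Usage_Perc product term.
x <- c(MP = 1416, PER = 18.8, TrueShoot_Perc = 0.583, ThreePoint_Att = 0.612,
       FTr = 0.310, TotReb_Perc = 9.4, Assist_Perc = 39.1, Steal_Perc = 1.9,
       Block_Perc = 1.7, Turnover_Perc = 25.3, Usage_Perc = 27.7, WS = 3.0)
x <- c(x, PER_x_Usage = unname(x["PER"] * x["Usage_Perc"]))

eta       <- sum(beta[names(x)] * x)     # linear predictor without thresholds
cum_prob  <- c(plogis(alpha + eta), 1)   # P(Y <= j) for j = 1, ..., 10
rank_prob <- diff(c(0, cum_prob))        # P(Y = j)
names(rank_prob) <- 1:10
print(round(rank_prob, 4))
```

The first entry comes out close to the 2.05% shown above, which suggests the reported predictions were produced with the interaction model.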

MP: 1308
PER: 34.9
TrueShoot_Perc: 0.591
ThreePoint_Att: 0.081
FTr: 0.370
TotReb_Perc: 19.9
Assist_Perc: 14.1
Steal_Perc: 2.6
Block_Perc: 6.3
Turnover_Perc: 5.3
Usage_Perc: 32.2
WS: 10.9

##          1          2          3           4           5            6
## 1 0.938447 0.04311016 0.01105567 0.003612589 0.001813079 0.0009160897
##             7            8            9           10
## 1 0.000550442 0.0002575021 0.0001675492 6.993284e-05

This model predicts that Wilson has a 93.8% chance to win the MVP race. A’ja Wilson did win league MVP in 2024, and she did so unanimously.

Finally, Nneka Ogwumike has been in the league for 12 years and won the MVP in 2016 with the LA Sparks. Her average advanced stats are:

MP: 901.4615385
PER: 24.26923077
TrueShoot_Perc: 0.6072307692
ThreePoint_Att: 0.09184615385
FTr: 0.3110769231
TotReb_Perc: 14.94615385
Assist_Perc: 12.76923077
Steal_Perc: 2.653846154
Block_Perc: 1.638461538
Turnover_Perc: 11.73846154
Usage_Perc: 23.4
WS: 5.307692308

##             1          2          3          4          5         6         7
## 1 0.007399768 0.01796397 0.03628841 0.05265044 0.08492443 0.1192191 0.1783747
##           8         9        10
## 1 0.1762209 0.2018258 0.1251326

If Ogwumike had a season in which her advanced metrics were consistent with her career averages, she would have a 0.7% chance to win MVP and most likely end up in 9th place (20.18% chance).

Discussion

It appears as though the model is a solid predictor of where players will land in the MVP race given their advanced statistical metrics. The model also found that win shares have the most significant impact on where a player will rank in MVP voting. However, because the no-multicollinearity assumption is violated, the resulting standard errors could be inflated, leading to unstable p-values. Therefore, this model may not be the best option when it comes to finding which metrics are statistically significant predictors of WNBA MVP rankings.

Advanced Metrics Impact on WNBA MVP Rankings - Ordinal Regression

Carly Martin